log2(TPM) is simply
the log2 of the Transcript Count Per Million.TPM stands for transcript per million, and the sum of all TPM values is the same in all samples, such that
a TPM value represents a relative expression level that, in principle, should be comparable between samples.TPM is very similar to RPKM and FPKM. The only difference is the order of operations. Here’s how you calculate TPM:
- Divide the read counts by the length of each gene in kilobases. This gives you reads per kilobase (RPK).
- Count up all the RPK values in a sample and divide this number by 1,000,000. This is your “per million” scaling factor.
- Divide the RPK values by the “per million” scaling factor. This gives you TPM.
So you see, when calculating TPM, the only difference is that you normalize for gene length first, and then normalize for sequencing depth second. However, the effects of this difference are quite profound.
When you use TPM, the sum of all TPMs in each sample are the same. This makes it easier to compare the proportion of reads that mapped to a gene in each sample. In contrast, with RPKM and FPKM, the sum of the normalized reads in each sample may be different, and this makes it harder to compare samples directly.