Differential expression analysis

Info

General info:

Differential expression analysis using limma/voom
Some values have been rounded

Datasets analysed:

mirna smrnaseq: TRUE
mirna excerpt: TRUE
pirna excerpt: FALSE
trna excerpt: TRUE
circrna excerpt: FALSE
gencode excerpt: TRUE

Treatment comparisons:

treatment1 - treatment2
treatment1 - treatment3

Total number of samples: 12

Number of samples in each treatment group:

treatment1: 4
treatment2: 4
treatment3: 4

The data was filtered using the filterByExpr function. The filtering keeps RNAs whose count-per-million (CPM) is at or above the CPM equivalent of min.count (10) reads, relative to the median library size, in at least n samples (12). In addition, each kept RNA is required to have at least min.total.count (15) reads across all the samples. From a statistical point of view, removing low-count RNAs allows the mean-variance relationship in the data to be estimated with greater reliability (Law et al., 2018).
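The filtering rule can be sketched in plain Python as below. This is a simplified illustration, not edgeR's implementation. It assumes a counts table given as one list of raw counts per RNA, computes library sizes as plain column sums (no normalization factors), and takes n directly as a hypothetical `n_samples` argument, whereas filterByExpr derives n from the design matrix and adjusts it for large groups. The function name is hypothetical.

```python
import statistics


def filter_by_expr_sketch(counts, n_samples, min_count=10, min_total_count=15):
    """Return True for each RNA (row of raw counts) that passes the filter.

    Hypothetical simplification of edgeR's filterByExpr: library sizes are
    column sums and n_samples is supplied directly instead of from a design.
    """
    n_libs = len(counts[0])
    lib_sizes = [sum(row[j] for row in counts) for j in range(n_libs)]
    # CPM equivalent of min_count reads in a library of median size
    cpm_cutoff = min_count / statistics.median(lib_sizes) * 1e6
    keep = []
    for row in counts:
        cpm = [c / lib * 1e6 for c, lib in zip(row, lib_sizes)]
        n_above = sum(1 for value in cpm if value >= cpm_cutoff)
        # both conditions must hold: enough samples above the CPM cutoff,
        # and enough reads in total across all samples
        keep.append(n_above >= n_samples and sum(row) >= min_total_count)
    return keep


example = [
    [100, 120, 90, 110],  # well expressed in every sample
    [0, 1, 0, 2],         # too few reads: removed
    [20, 25, 30, 15],
]
print(filter_by_expr_sketch(example, n_samples=4))  # [True, False, True]
```

The CPM cutoff is tied to the median library size so that the same threshold means roughly min.count reads in a typical sample, rather than a fixed CPM that would be too strict for small libraries and too lax for large ones.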

Normalization factors used to scale the raw library sizes are calculated with the calcNormFactors function, using the TMM normalization method. This method uses the weighted trimmed mean of M-values (to the reference) proposed by Robinson and Oshlack (2010), where the weights are from the delta method on Binomial data.
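A minimal sketch of TMM in plain Python follows, modelled on the description above. M-values (log-ratios) and A-values (average log-expression) are computed against a reference sample, the extremes of both are trimmed (30% of M-values and 5% of A-values from each end, the calcNormFactors defaults), and the remaining M-values are averaged with inverse delta-method variance weights. The choice of reference sample (upper quartile closest to the mean upper quartile) and the final rescaling of the factors to a geometric mean of 1 are assumptions mirroring edgeR's behaviour; all function names are hypothetical.

```python
import math
import statistics


def _average_ranks(values):
    """1-based ranks with ties averaged (like R's rank())."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def tmm_factor(obs, ref, logratio_trim=0.3, sum_trim=0.05):
    """TMM scaling factor of sample `obs` relative to sample `ref` (hypothetical)."""
    n_o, n_r = sum(obs), sum(ref)
    m_vals, a_vals, variances = [], [], []
    for o, r in zip(obs, ref):
        if o == 0 or r == 0:  # log of zero is undefined: drop these RNAs
            continue
        m_vals.append(math.log2((o / n_o) / (r / n_r)))
        a_vals.append((math.log2(o / n_o) + math.log2(r / n_r)) / 2)
        # delta-method variance of the log-ratio under binomial sampling
        variances.append((n_o - o) / n_o / o + (n_r - r) / n_r / r)
    if not m_vals or max(abs(m) for m in m_vals) < 1e-6:
        return 1.0
    n = len(m_vals)
    lo_l = math.floor(n * logratio_trim) + 1
    hi_l = n + 1 - lo_l
    lo_s = math.floor(n * sum_trim) + 1
    hi_s = n + 1 - lo_s
    rank_m, rank_a = _average_ranks(m_vals), _average_ranks(a_vals)
    keep = [i for i in range(n)
            if lo_l <= rank_m[i] <= hi_l and lo_s <= rank_a[i] <= hi_s]
    # precision-weighted mean of the retained M-values
    num = sum(m_vals[i] / variances[i] for i in keep)
    den = sum(1 / variances[i] for i in keep)
    return 2 ** (num / den) if den > 0 else 1.0


def tmm_norm_factors(samples):
    """Factors for each sample (list of per-RNA counts), scaled to geometric mean 1."""
    upper_q = []
    for s in samples:
        lib = sum(s)
        upper_q.append(statistics.quantiles([c / lib for c in s], n=4, method="inclusive")[2])
    mean_uq = statistics.fmean(upper_q)
    ref = min(range(len(samples)), key=lambda j: abs(upper_q[j] - mean_uq))
    factors = [tmm_factor(s, samples[ref]) for s in samples]
    geo_mean = math.exp(statistics.fmean(math.log(f) for f in factors))
    return [f / geo_mean for f in factors]


# Toy usage: sample b equals sample a except for one strongly up-regulated RNA,
# which inflates b's library size; the ratio of the two factors is about 200/1190.
a = [10] * 20
b = [10] * 19 + [1000]
print(tmm_norm_factors([a, b]))
```

Trimming matters because a handful of very highly expressed, differentially expressed RNAs can dominate a library's total count; scaling by raw library size alone would then make every other RNA look down-regulated.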

The count data is then transformed to log2-counts per million (logCPM) using the voom function, which also estimates a precision weight for each observation from the mean-variance trend. Linear models are fitted to each RNA with the lmFit function, and the comparisons/contrasts (treatment1 - treatment2, treatment1 - treatment3) are then estimated with the contrasts.fit function. Next, empirical Bayes moderation is carried out using the eBayes function, which borrows information across all RNAs to obtain more precise estimates of RNA-wise variability (Law et al., 2018). It computes moderated t-statistics, a moderated F-statistic, and log-odds of differential expression by empirical Bayes moderation of the standard errors towards a common value.
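Two formulas behind this paragraph can be sketched in plain Python. `log_cpm` applies the offsets voom uses (0.5 added to each count and 1 to each library size, where the library sizes would already be scaled by the TMM normalization factors). `moderated_t` shows how an RNA's residual variance is shrunk towards a prior value before the t-statistic is formed. Here the prior degrees of freedom `d0` and prior variance `s2_prior` are assumed to be known inputs; in limma they are estimated from all RNAs, which is the borrowing of information described above. Both function names are hypothetical.

```python
import math


def log_cpm(counts, lib_sizes):
    """log2-CPM for each RNA (row) and sample (column), with voom-style offsets."""
    return [[math.log2((c + 0.5) / (lib + 1) * 1e6) for c, lib in zip(row, lib_sizes)]
            for row in counts]


def moderated_t(coef, stdev_unscaled, sigma, df_residual, d0, s2_prior):
    """Moderated t-statistic for one RNA and one contrast.

    coef: estimated log2 fold change for the contrast
    stdev_unscaled: unscaled standard error of coef (from the design/contrast)
    sigma, df_residual: residual standard deviation and its degrees of freedom
    d0, s2_prior: prior degrees of freedom and prior variance (assumed known here)
    """
    # posterior variance: df-weighted average of the prior and the RNA's own variance
    s2_post = (d0 * s2_prior + df_residual * sigma ** 2) / (d0 + df_residual)
    return coef / (stdev_unscaled * math.sqrt(s2_post))


# An RNA whose residual variance (4) exceeds the prior (1) has it pulled down
# to 3.25, so its moderated t is larger than the ordinary
# t = coef / (stdev_unscaled * sigma) = 2.0.
print(moderated_t(coef=2.0, stdev_unscaled=0.5, sigma=2.0, df_residual=9, d0=3, s2_prior=1.0))
```

The moderated t-statistic is referred to a t-distribution with `df_residual + d0` degrees of freedom, so the extra prior information shows up as additional degrees of freedom, which is what gives more power with few replicates per group.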

The Benjamini and Hochberg method (Benjamini & Hochberg, 1995) was used in the topTreat function to adjust the p-values for multiple testing, controlling the false discovery rate. Both the limma/voom and DESeq2 differential expression methods use the same p-value adjustment method.
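The Benjamini-Hochberg step-up adjustment can be sketched in plain Python as follows. It is written to mirror the "BH" method of R's p.adjust: each p-value is multiplied by n divided by its rank, and a running minimum taken from the largest p-value downwards keeps the adjusted values monotone and capped at 1. The function name is hypothetical.

```python
def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    n = len(pvalues)
    # visit p-values from largest to smallest
    order = sorted(range(n), key=lambda i: pvalues[i], reverse=True)
    adjusted = [0.0] * n
    running_min = 1.0  # starting at 1 also caps adjusted values at 1
    for position, i in enumerate(order):
        rank = n - position  # rank of pvalues[i] in ascending order (1 = smallest)
        running_min = min(running_min, pvalues[i] * n / rank)
        adjusted[i] = running_min
    return adjusted


print(bh_adjust([0.01, 0.04, 0.03, 0.005]))  # approximately [0.02, 0.04, 0.04, 0.02]
```

Selecting the RNAs whose adjusted p-value is at or below a cut-off controls the expected proportion of false discoveries among the selected RNAs at that level, rather than the chance of any single false positive.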

Mean-variance plot

This plot shows the mean-variance relationship of the log-CPM values for this dataset. Typically, the “voom-plot” shows a decreasing trend between the means and variances, resulting from a combination of technical variation in the sequencing experiment and biological variation amongst the replicate samples of the different treatment groups. Experiments with high biological variation usually result in flatter trends, where variance values plateau at high expression values. Experiments with low biological variation tend to result in sharply decreasing trends (Law et al., 2018).

Moreover, the voom-plot provides a visual check on the level of filtering performed upstream. If filtering of lowly expressed RNAs is insufficient, a drop in variance levels can be observed at the low end of the expression scale due to very small counts. If this is observed, one should return to the earlier filtering step and increase the expression threshold applied to the dataset (Law et al., 2018).
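The coordinates of each point in a voom-plot can be sketched as below, assuming a design in which treatment group is the only factor, so that the residual standard deviation is the pooled within-group standard deviation. Following voom, the x-axis is the average log2 count (mean logCPM shifted back by the average log2 library size) and the y-axis is the square root of the residual standard deviation; voom then fits a lowess trend through these points, which is omitted here. The function name and inputs are hypothetical. Insufficiently filtered RNAs would sit at the far left of the x-axis, where the drop in variance described above appears.

```python
import math
import statistics


def voom_plot_points(log_cpm_rows, lib_sizes, groups):
    """(x, y) per RNA: x = average log2 count, y = sqrt(residual standard deviation).

    Assumes a one-factor (treatment group) design; hypothetical helper.
    """
    # convert mean log-CPM back to the log2 count scale, as voom does
    offset = statistics.fmean(math.log2(lib + 1) for lib in lib_sizes) - math.log2(1e6)
    levels = sorted(set(groups))
    df_residual = len(groups) - len(levels)  # samples minus fitted group means
    points = []
    for row in log_cpm_rows:
        rss = 0.0
        for level in levels:
            values = [v for v, g in zip(row, groups) if g == level]
            group_mean = statistics.fmean(values)
            rss += sum((v - group_mean) ** 2 for v in values)
        sigma = math.sqrt(rss / df_residual)
        points.append((statistics.fmean(row) + offset, math.sqrt(sigma)))
    return points


groups = ["treatment1"] * 2 + ["treatment2"] * 2
print(voom_plot_points([[1.0, 3.0, 5.0, 7.0]], [999_999] * 4, groups))
```

Plotting the square root of the standard deviation, rather than the variance, compresses the y-axis so that the trend across the whole expression range is easier to read.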

miRNAs (smrnaseq)

miRNAs (excerpt)

tRNAs (excerpt)

gencodes (excerpt)

Residual variances

This plot was created with an adapted version of the plotSA function (made interactive, with information on the individual RNA shown for each data point). It can be used to check the mean-variance relationship of the expression data after fitting a linear model.

Differential expression analysis - limma/voom

Date: 09/03/2022
