Differential expression analysis

Info

General info:

Differential expression analysis using DESeq2.
Some values have been rounded.

Datasets analysed:

miRNA (smrnaseq): TRUE
miRNA (excerpt): TRUE
piRNA (excerpt): FALSE
tRNA (excerpt): TRUE
circRNA (excerpt): FALSE
GENCODE (excerpt): TRUE

Treatment comparisons:

treatment1 - treatment2
treatment1 - treatment3

Total number of samples: 12

Number of samples in each treatment group:

treatment1: 4
treatment2: 4
treatment3: 4

The analysis was run with the DESeq function from the DESeq2 package, which performs three steps in sequence: estimation of size factors (estimateSizeFactors), estimation of dispersions (estimateDispersions), and negative binomial GLM fitting with Wald tests (nbinomWaldTest).
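By default, estimateSizeFactors uses the median-of-ratios method: each sample's size factor is the median, over genes, of the ratio of that sample's count to the gene's geometric mean across all samples. The minimal NumPy sketch below illustrates the idea, assuming a genes × samples matrix of raw counts; `median_of_ratios_size_factors` is a hypothetical helper, not a DESeq2 function.

```python
import numpy as np

def median_of_ratios_size_factors(counts):
    """Median-of-ratios size factors for a genes x samples count matrix."""
    counts = np.asarray(counts, dtype=float)
    # Only genes with a non-zero count in every sample have a non-zero geometric mean
    expressed = np.all(counts > 0, axis=1)
    log_counts = np.log(counts[expressed])
    # Log of each gene's geometric mean across samples
    log_geo_means = log_counts.mean(axis=1)
    # Log ratio of each count to its gene's geometric mean
    log_ratios = log_counts - log_geo_means[:, None]
    # Size factor per sample: median ratio over the retained genes
    return np.exp(np.median(log_ratios, axis=0))

counts = np.array([[10, 20], [100, 200], [5, 10], [0, 3]])
print(median_of_ratios_size_factors(counts))
```

The last gene is skipped because of its zero count. Small-RNA count tables can be sparse enough that few genes survive this filter; estimateSizeFactors also accepts other methods (for example type = "poscounts") for that situation.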

When estimating dispersions, a parametric fit type (fitType = "parametric") was used to fit the dispersions to the mean of the normalized counts. This fits a dispersion-mean relation of the form dispersion = asymptDisp + extraPois / mean via a robust gamma-family GLM (see McCarthy et al. (2012)).
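The parametric trend models each gene's dispersion as a0 + a1 / mean, fitted with a gamma-family GLM on an identity link. The sketch below fits that form by iteratively reweighted least squares, assuming gene-wise dispersion estimates and mean normalized counts are already available. It is an illustration, not DESeq2's code: the helper name is hypothetical, and the starting values and the trimming bounds (1e-4 and 15 on the ratio of observed to fitted dispersion) are assumptions chosen to resemble DESeq2's outlier-robust fitting loop.

```python
import numpy as np

def fit_parametric_trend(mean_counts, gene_disp, n_iter=10):
    """Fit dispersion ~ a0 + a1 / mean with a gamma GLM (identity link) via IRLS."""
    mean_counts = np.asarray(mean_counts, dtype=float)
    gene_disp = np.asarray(gene_disp, dtype=float)
    # Design matrix: intercept (asymptotic dispersion) and 1/mean (extra-Poisson term)
    x = np.column_stack([np.ones_like(mean_counts), 1.0 / mean_counts])
    coefs = np.array([0.1, 1.0])  # assumed starting values
    keep = np.ones(gene_disp.size, dtype=bool)
    for _ in range(n_iter):
        # Gamma variance V(mu) = mu^2 with identity link gives IRLS weights 1 / mu^2
        fitted = x @ coefs
        w = np.sqrt(1.0 / fitted[keep] ** 2)
        coefs, *_ = np.linalg.lstsq(x[keep] * w[:, None], gene_disp[keep] * w, rcond=None)
        # Robustness step: drop genes whose dispersion is far from the current trend
        ratio = gene_disp / (x @ coefs)
        keep = (ratio > 1e-4) & (ratio < 15)
    return coefs

means = np.array([1.0, 5.0, 10.0, 50.0, 100.0, 500.0, 1000.0])
print(fit_parametric_trend(means, 0.1 + 2.0 / means))
```

With noise-free input the fit recovers the generating coefficients (0.1, 2.0); with real gene-wise estimates, the trimming step limits the influence of highly variable genes on the trend.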

The Benjamini-Hochberg method (see Benjamini & Hochberg (1995)) was used in the results function to adjust the p-values for multiple testing, controlling the false discovery rate (FDR). Both the limma/voom and DESeq2 differential expression methods use this same adjustment.
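The Benjamini-Hochberg adjustment multiplies the i-th smallest of m p-values by m / i, then enforces monotonicity by taking a running minimum from the largest p-value downward, capping at 1. A minimal sketch (the function name is hypothetical). Note that DESeq2's results function, by default, also applies independent filtering on the mean of normalized counts and adjusts only the genes that pass it; this sketch does not reproduce that step.

```python
import numpy as np

def bh_adjust(pvals):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    p = np.asarray(pvals, dtype=float)
    m = p.size
    order = np.argsort(p)[::-1]      # indices from largest to smallest p-value
    ranks = np.arange(m, 0, -1)      # ascending ranks m, m-1, ..., 1 for that order
    # Scale by m / rank, then take the running minimum so adjusted values stay monotone
    adj = np.minimum.accumulate(p[order] * m / ranks)
    out = np.empty(m)
    out[order] = np.minimum(adj, 1.0)
    return out

print(bh_adjust([0.01, 0.04, 0.03, 0.20]))
```

For these four p-values the adjusted values are 0.04, 0.0533, 0.0533 and 0.20: the running minimum pulls 0.03 × 4 / 2 = 0.06 down to 0.04 × 4 / 3 ≈ 0.0533.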

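Additionally, a log2 fold change threshold of log2(1.1) ≈ 0.1375, corresponding to a 1.1-fold (10%) change, was passed to the results function (lfcThreshold). With the default alternative hypothesis, genes are therefore tested for an absolute log2 fold change greater than this threshold, rather than for any non-zero change.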
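To show how the threshold changes the Wald test, the sketch below computes the p-value for the null hypothesis |LFC| ≤ T, following our understanding of DESeq2's default altHypothesis = "greaterAbs": z = (|LFC| − T) / SE and p = min(1, 2 × upper-tail normal probability of z). The function name is hypothetical, and the log fold change and its standard error are assumed to come from an already fitted model.

```python
import math
from scipy.stats import norm

def thresholded_wald_pvalue(lfc, lfc_se, threshold=math.log2(1.1)):
    """Wald p-value for H0: |LFC| <= threshold (assumed "greaterAbs" form)."""
    # Distance of the estimate beyond the threshold, in standard-error units
    z = (abs(lfc) - threshold) / lfc_se
    # Doubled upper-tail probability, capped at 1
    return min(1.0, 2.0 * norm.sf(z))

print(thresholded_wald_pvalue(1.0, 0.3))        # tested against log2(1.1)
print(thresholded_wald_pvalue(1.0, 0.3, 0.0))   # ordinary two-sided Wald test
```

An estimate inside the threshold band, such as an LFC of 0.05, receives a p-value of 1, and the same estimate always gets a larger p-value with the threshold than without it.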

Count outlier detection

DESeq2 relies on the negative binomial distribution to make estimates and perform statistical inference on differences. While the negative binomial is versatile in having a mean and a dispersion parameter, extreme counts in individual samples may not be fitted well by it. For this reason, count outliers are detected automatically using Cook's distance, a measure of how much the fitted coefficients would change if an individual sample were removed (Cook 1977). For more on the implementation of Cook's distance, see the manual page for the results function. Below, the maximum value of Cook's distance for each row is plotted against the rank of the test statistic to justify its use as a filtering criterion (following the DESeq2 vignette).
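By default, DESeq2's results function sets the p-value to NA for any row in which some sample's Cook's distance exceeds the 0.99 quantile of the F(p, m - p) distribution, where p is the number of model coefficients and m the number of samples. The sketch below computes that cutoff and flags rows, assuming the Cook's distances are already available as a genes × samples array; `cooks_cutoff` and `flag_count_outliers` are hypothetical helpers, not DESeq2 functions.

```python
import numpy as np
from scipy.stats import f

def cooks_cutoff(n_coefficients, n_samples, quantile=0.99):
    """Cutoff on Cook's distance: quantile of F(p, m - p)."""
    return f.ppf(quantile, n_coefficients, n_samples - n_coefficients)

def flag_count_outliers(cooks, n_coefficients):
    """Boolean mask of rows whose maximum Cook's distance exceeds the cutoff."""
    cooks = np.asarray(cooks, dtype=float)
    cutoff = cooks_cutoff(n_coefficients, cooks.shape[1])
    return cooks.max(axis=1) > cutoff

# Assumed design ~ treatment with three levels: p = 3 (intercept + two effects), m = 12
print(cooks_cutoff(3, 12))
cooks = np.array([[0.5, 1.0, 12.0] + [0.1] * 9, [0.2] * 12])
print(flag_count_outliers(cooks, 3))
```

Under that assumed design the cutoff is roughly 7, so only the first example row is flagged. With four samples per group, below the default minReplicatesForReplace of 7 in DESeq, outlying counts are flagged in this way rather than replaced.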

Differential expression analysis - DESeq2

Date: 09/03/2022


[Figure: maximum Cook's distance per row versus rank of the test statistic, one panel per dataset: miRNAs (smrnaseq), miRNAs (excerpt), tRNAs (excerpt), GENCODE (excerpt)]