SUPPLEMENTARY WEB-PAGE
An Empirical Bayes Adjustment to Increase the Sensitivity of Detecting Differentially Expressed Genes in Microarray Experiments
Susmita Datta(1), Glen A. Satten(2), Dale J. Benos(3), Jiazeng Xia(3), Martin J. Heslin(4), Somnath Datta(5,*)
(1) Department of Mathematics and Statistics, Georgia State University, Atlanta, GA 30303.
(2) Centers for Disease Control and Prevention, Atlanta, GA 30333.
(3) Department of Physiology and Biophysics, University of Alabama at Birmingham, Birmingham, AL 35294.
(4) Department of Surgery, University of Alabama at Birmingham, Birmingham, AL 35294.
(5) Department of Statistics, University of Georgia, Athens, GA 30602.
* Corresponding author: somnath.datta@louisville.edu
Bioinformatics, 20, 235-242 (2004)
Motivation: Detection of differentially expressed genes is one of the major
goals of microarray experiments. Pairwise comparisons for each gene are not
appropriate without controlling the overall (experimentwise) type I error rate.
Dudoit et al. have advocated the use of permutation-based step-down P-value
adjustments to correct the observed significance levels of the individual
(i.e., gene-by-gene) two-sample t-tests.
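The permutation-based step-down adjustment mentioned above can be sketched as follows. This is a minimal illustration of the Westfall-Young step-down maxT procedure, assuming a genes-by-arrays matrix of log expression values, two groups coded 0/1, and a pooled-variance t-statistic (Welch's statistic is a common alternative). The function names and the default of 1000 permutations are illustrative choices, not taken from the paper or its R code.

```python
import numpy as np


def pooled_t(x, labels):
    """Pooled-variance two-sample t-statistic for every gene (row) of x.

    x      : genes-by-arrays matrix of log expression values
    labels : numpy array of 0/1 group codes, one per array (column)
    """
    a, b = x[:, labels == 0], x[:, labels == 1]
    na, nb = a.shape[1], b.shape[1]
    ss = (((a - a.mean(axis=1, keepdims=True)) ** 2).sum(axis=1)
          + ((b - b.mean(axis=1, keepdims=True)) ** 2).sum(axis=1))
    s2 = ss / (na + nb - 2)
    return (a.mean(axis=1) - b.mean(axis=1)) / np.sqrt(s2 * (1 / na + 1 / nb))


def maxT_adjusted_pvalues(x, labels, n_perm=1000, seed=0):
    """Westfall-Young step-down maxT adjusted P-values (illustrative sketch)."""
    labels = np.asarray(labels)
    rng = np.random.default_rng(seed)
    t_obs = np.abs(pooled_t(x, labels))
    order = np.argsort(-t_obs)            # most to least significant gene
    t_sorted = t_obs[order]
    counts = np.zeros(len(t_obs))
    for _ in range(n_perm):
        # permute whole arrays (columns), so gene-gene correlation is preserved
        t_perm = np.abs(pooled_t(x, rng.permutation(labels)))[order]
        # u[j] = max of permuted |t| over genes ranked j, j+1, ..., m
        u = np.maximum.accumulate(t_perm[::-1])[::-1]
        counts += (u >= t_sorted)
    # step-down monotonicity: adjusted P-values never decrease down the ranking
    p_sorted = np.maximum.accumulate(counts / n_perm)
    p_adj = np.empty_like(p_sorted)
    p_adj[order] = p_sorted
    return p_adj
```

Because each gene is compared with the maximum statistic over itself and all less significant genes within the same permutation, the procedure controls the experimentwise (family-wise) error rate under the usual subset-pivotality assumption, while permuting whole arrays keeps the dependence among genes intact.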
Results: In this paper, we consider an ANOVA formulation of the gene expression
levels corresponding to multiple tissue types. We provide resampling-based
step-down adjustments to correct the observed significance levels of the
individual ANOVA t-tests for each gene and each pairwise tissue-type
comparison. More importantly, we introduce a novel empirical Bayes adjustment
to the t-test statistics that can be incorporated into the step-down
procedure. Using simulated data, we show that the empirical Bayes adjustment
improved the sensitivity of detecting differentially expressed genes by up to
16% while maintaining a high level of specificity. The adjustment also reduces
the FNR (false non-discovery rate) to some degree, at the cost of a modest
increase in the FDR (false discovery rate). We illustrate our approach using a
human colon cancer data set consisting of oligonucleotide arrays of normal,
adenoma and carcinoma cells. About fifty genes were declared to have
statistically significant differential expression when comparing normal with
adenoma cells, and about five when comparing adenoma with carcinoma cells. The
genes identified include some previously known to be associated with colon
cancer as well as some novel genes.
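The pairwise ANOVA t-statistics described above can be sketched as follows, assuming a genes-by-arrays matrix of log expression values and one tissue label per array. The point of the ANOVA formulation is that the residual variance is pooled over all tissue types (degrees of freedom = number of arrays minus number of tissue types), not only the two being compared. The paper's own empirical Bayes adjustment is not reproduced here. As a stand-in, the optional `prior_df` and `prior_var` arguments (hypothetical names) shrink each gene's residual variance toward a common value in the style of moderated t-statistics, with the median variance as an assumed default center; this only illustrates in general terms how borrowing strength across genes changes the statistics.

```python
import numpy as np


def anova_pairwise_t(x, groups, i, j, prior_df=0.0, prior_var=None):
    """t-statistics for tissue type i versus j for every gene (row) of x.

    The residual variance comes from a one-way ANOVA over ALL tissue types.
    If prior_df > 0, it is shrunk toward prior_var (a generic variance-
    shrinkage stand-in, NOT the empirical Bayes adjustment of the paper).
    """
    groups = np.asarray(groups)
    levels = np.unique(groups)
    n_arrays, k = x.shape[1], len(levels)
    resid_ss = np.zeros(x.shape[0])
    for lev in levels:
        xs = x[:, groups == lev]
        resid_ss += ((xs - xs.mean(axis=1, keepdims=True)) ** 2).sum(axis=1)
    df = n_arrays - k                      # residual degrees of freedom
    s2 = resid_ss / df
    if prior_df > 0:
        if prior_var is None:
            prior_var = np.median(s2)      # hypothetical choice of prior center
        # weighted average of prior and observed variance
        s2 = (prior_df * prior_var + df * s2) / (prior_df + df)
    xi, xj = x[:, groups == i], x[:, groups == j]
    se = np.sqrt(s2 * (1 / xi.shape[1] + 1 / xj.shape[1]))
    return (xi.mean(axis=1) - xj.mean(axis=1)) / se
```

For example, with one gene measured as 1, 3 (normal), 4, 6 (adenoma) and 0, 2 (carcinoma), the pooled residual variance is 6/3 = 2 and the normal-versus-adenoma statistic is -3/sqrt(2). To obtain step-down adjusted P-values, the same function can be applied to data with permuted tissue labels and the maxima over genes used as in a maxT procedure.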
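The operating characteristics reported for the simulation study can be computed from a known truth and a declared gene list as below. This is a minimal sketch using the standard definitions (FDR: fraction of declared genes that are not truly differentially expressed; FNR: fraction of non-declared genes that are). The convention that an empty denominator gives 0 for FDR and FNR is an assumption here, and the paper's exact conventions may differ.

```python
import numpy as np


def error_rates(truly_de, declared):
    """Sensitivity, specificity, FDR and FNR of a declared gene list."""
    truly_de = np.asarray(truly_de, dtype=bool)
    declared = np.asarray(declared, dtype=bool)
    tp = np.sum(truly_de & declared)       # true discoveries
    fp = np.sum(~truly_de & declared)      # false discoveries
    fn = np.sum(truly_de & ~declared)      # missed DE genes
    tn = np.sum(~truly_de & ~declared)     # correctly ignored genes
    return {
        "sensitivity": tp / (tp + fn) if tp + fn else float("nan"),
        "specificity": tn / (tn + fp) if tn + fp else float("nan"),
        # share of declared genes that are not truly DE (0 if none declared)
        "FDR": fp / (tp + fp) if tp + fp else 0.0,
        # share of non-declared genes that are truly DE (0 if all declared)
        "FNR": fn / (fn + tn) if fn + tn else 0.0,
    }
```

Averaging these quantities over many simulated data sets gives estimates of a procedure's expected sensitivity, specificity, FDR and FNR, which is the kind of comparison used to weigh the adjusted against the unadjusted t-tests.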
##################################################################################
Colon cancer data set: download
For further details on the data set, contact Dr. Dale Benos (benos@physiology.uab.edu).
##################################################################################
R code used to calculate the P-values of the empirical Bayes adjusted t-tests for the colon cancer data: download
This code is distributed without warranty; use at your own risk!
##################################################################################
Mean (log) expression level versus standard deviation of the residuals for the colon cancer data:
For ease of visualization, only the top 2000 (most differentially expressed) genes are plotted.
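The plotted quantities can be computed along the following lines. This sketch assumes the expression matrix is already on the log scale, takes the mean over all arrays as the "mean (log) expression level", and uses the one-way ANOVA residual standard deviation. The page does not state how the "top 2000" genes were chosen; ranking by the ANOVA F-statistic is a hypothetical stand-in, as is the function name.

```python
import numpy as np


def mean_and_residual_sd(x, groups, top=2000):
    """Per-gene mean log expression and one-way ANOVA residual SD.

    x      : genes-by-arrays matrix of log expression values
    groups : tissue label per array (at least two distinct labels)
    Returns (means, residual SDs, row indices) for the `top` genes with the
    largest F-statistic (an assumed ranking criterion).
    """
    groups = np.asarray(groups)
    levels = np.unique(groups)
    n, k = x.shape[1], len(levels)
    grand = x.mean(axis=1)
    ss_within = np.zeros(x.shape[0])
    ss_between = np.zeros(x.shape[0])
    for lev in levels:
        xs = x[:, groups == lev]
        m = xs.mean(axis=1)
        ss_within += ((xs - m[:, None]) ** 2).sum(axis=1)
        ss_between += xs.shape[1] * (m - grand) ** 2
    resid_sd = np.sqrt(ss_within / (n - k))
    f_stat = (ss_between / (k - 1)) / (ss_within / (n - k))
    keep = np.argsort(-f_stat)[:top]       # most differentially expressed first
    return grand[keep], resid_sd[keep], keep
```

The first two returned arrays can be passed to a scatter plot (for example matplotlib's `pyplot.scatter`) to draw a figure of this kind.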
##################################################################################