How does an Affymetrix microarray work?
Affymetrix microarray data are processed in three steps: background correction, to adjust for hybridization effects unrelated to the interaction between probes and target DNA; normalization, to remove systematic errors and biases, thereby allowing data to be compared from one array to another; and summarization, which combines the multiple probe intensities from a probe set to yield a single value for each gene that best represents the expression level of the RNA transcript.
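The three steps can be illustrated with a minimal sketch. This is not the algorithm of any particular package: the global-minimum background estimate, quantile normalization, and median-of-log2 summary are stand-ins chosen for brevity, and the array values and probe-set layout are hypothetical.

```python
import math
from statistics import median

def background_correct(intensities, floor=1.0):
    """Subtract a crude global background (the array minimum), clamping at `floor`."""
    bg = min(intensities)
    return [max(x - bg, floor) for x in intensities]

def quantile_normalize(arrays):
    """Give every array the same intensity distribution (ties broken by position)."""
    n = len(arrays[0])
    sorted_arrays = [sorted(a) for a in arrays]
    # Reference distribution: mean of the k-th smallest value across arrays.
    reference = [sum(s[k] for s in sorted_arrays) / len(arrays) for k in range(n)]
    normalized = []
    for a in arrays:
        order = sorted(range(n), key=lambda i: a[i])
        out = [0.0] * n
        for rank, idx in enumerate(order):
            out[idx] = reference[rank]
        normalized.append(out)
    return normalized

def summarize(array, probe_sets):
    """Collapse each probe set to one value: the median log2 intensity."""
    return {gene: median(math.log2(array[i]) for i in idx)
            for gene, idx in probe_sets.items()}

# Hypothetical data: two arrays, four probes, two probe sets.
arrays = [[120, 340, 95, 410], [200, 560, 150, 700]]
probe_sets = {"geneA": [0, 1], "geneB": [2, 3]}

corrected = [background_correct(a) for a in arrays]
normalized = quantile_normalize(corrected)
expression = [summarize(a, probe_sets) for a in normalized]
```

Each published method makes different choices at each of the three stages, which is why their outputs can diverge on the same raw data.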
Numerous data extraction methods have been proposed in the literature to perform these crucial steps in processing Affymetrix oligonucleotide microarray data.
The first data extraction method provided as the Affymetrix default was the Average Difference (AD), a linear-scale measure that relied upon the difference PM-MM (perfect match minus mismatch probe intensity) to correct for non-specific binding. This measurement was superseded by the current standard MAS5. It was shown subsequently that one third of probe pairs consistently yield negative signals, showing that use of MM probes for detection of non-specific binding is unreliable [3, 4].
In this respect, Irizarry et al. Li and Wong [6] developed a statistical model for probe-level data, and their model-based expression index (MBEI) has been developed into dChip, one of the most popular software approaches used today.
Physical energy-based models have also been developed in an attempt to model the formation of DNA-RNA duplexes on oligonucleotide microarrays [7], most notably the positional dependent nearest neighbour (PDNN) model of Zhang et al. Following this idea, Wu et al. The number of methods available continues to grow, yet there is no consensus as to which is the most appropriate and reliable method for a given application.
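As a rough illustration of why PM-MM is problematic, the sketch below computes a plain mean of the pair differences and the fraction of pairs whose difference is negative. It assumes a simple unweighted mean and does not reproduce any trimming or outlier handling in the original Affymetrix software; the intensities are hypothetical.

```python
def average_difference(pm, mm):
    """Mean of PM - MM over the probe pairs of one probe set (no trimming)."""
    diffs = [p - m for p, m in zip(pm, mm)]
    return sum(diffs) / len(diffs)

def negative_fraction(pm, mm):
    """Fraction of probe pairs in which the mismatch probe is brighter than the perfect match."""
    negatives = sum(1 for p, m in zip(pm, mm) if p - m < 0)
    return negatives / len(pm)

# Hypothetical intensities for three probe pairs.
pm = [500, 300, 200]
mm = [200, 350, 100]

ad = average_difference(pm, mm)
neg = negative_fraction(pm, mm)
```

A negative difference has no meaningful interpretation as an expression level, and it cannot be log-transformed.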
Calibration datasets derived from mixture experiments [10], spike-in studies and dilution series [3, 5, 11-14] have been an invaluable resource to develop and assess data extraction methodology. The advantage of these benchmarking datasets is that the expected outcome of expression analysis is known in advance and so alternative expression measures can be compared in terms of the expected features.
This property has been exploited to develop a graphical tool for the evaluation and comparison of expression measures aimed at helping researchers to decipher the multitude of methods available [12, 14]. Studies utilizing benchmark datasets have typically observed a large effect of the normalization method on the outcome of the expression analyses [15-17].
However, the performance of 'spike-in' experiments can be affected by sources of systematic variation and it is not clear how this might affect evaluation of different data extraction methods [15]. One alternative strategy involved assessing the gene expression between males and females at Y-chromosome linked genes as a true biological internal control [18]. In this study, the performance of the method was measured by recording how many differentially expressed Y-chromosome linked genes were detected between male and female samples.
However, the general applicability of this kind of test is limited. More recently, Harr and Schlotterer [15] introduced an alternative strategy to evaluate normalization methods by exploiting the existence of bacterial operons in which genes are expected to have highly correlated expression levels.
This strategy effectively avoided the systematic biases inherent in the spike-in approach. It is increasingly evident that performance analyses using calibration datasets are not necessarily consistent with data from realistic biological studies [16, 20], suggesting the need to consider real biological studies in an attempt to evaluate the relative merits of Affymetrix data extraction methods.
In this article we present a comparison of the influence of seven commonly used data extraction methods on the detection of differentially expressed genes using a genome-wide gene expression dataset from eight genetically divergent barley lines. The major challenge arising from the use of this dataset is that one has no a priori knowledge of which genes are differentially expressed.
To address this challenge we used a novel strategy based on genes in which we detected single feature polymorphisms (SFPs). SFPs are genetic polymorphisms in observed expression within one particular feature (oligonucleotide probe) of a probe set (11 PM and MM probes on the array) [21]. Using two barley 'Genetical Genomics' datasets we have previously shown that SFPs mainly represent expression differences that are the result of polymorphism in cis-acting regulators [22].
On this basis we propose that differential expression detected in SFP-containing genes is more likely to reflect true differential expression, and so we use this as a criterion to assess the efficacy of the seven methods referred to above in the detection of differential gene expression. The present study implements seven methods commonly used in the literature to calculate expression indices from Affymetrix microarray gene expression data, which were collected from a well-designed genome-wide microarray hybridization experiment with eight genetically divergent barley cultivars.
We explore various statistical properties of the methods in modelling and analyzing the microarray dataset. The findings are compared with those based on an independent dataset of Affymetrix genome-wide gene expression profiled on two divergent yeast strains. To explore the consistency of the 22, barley gene expression indices estimated from the seven different methods, we calculated Pearson's product-moment correlation coefficients between the expression estimates; the correlation analyses are summarized in Table 2.
The corresponding results based on the yeast dataset are summarized in Table 4 [see Additional file 1]. The same pattern of correlation in gene expression estimates between these seven methods was also recovered in the analysis of gene expression profiles on two yeast strains. The diagonal elements in Table 2 represent means and standard deviations of correlation coefficients in gene expression indices between biological replicates.
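A pairwise correlation table of this kind can be sketched as follows. The method names and expression values are hypothetical placeholders, not results from the study, and the sketch covers only the off-diagonal (between-method) entries.

```python
import math
from itertools import combinations

def pearson(x, y):
    """Pearson's product-moment correlation coefficient of two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)

# Hypothetical expression indices for five genes from three methods.
estimates = {
    "method_A": [7.1, 8.4, 5.2, 9.9, 6.3],
    "method_B": [7.0, 8.9, 5.0, 9.5, 6.8],
    "method_C": [6.2, 8.1, 5.9, 9.0, 6.0],
}

# One correlation coefficient per pair of methods.
correlations = {
    (m1, m2): pearson(estimates[m1], estimates[m2])
    for m1, m2 in combinations(estimates, 2)
}
```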
They show that MAS5. We compared the ability of each method to calculate consistent gene expression values between biological replicates of a given barley variety using the intra-class correlation coefficients.
Statistical properties of estimated barley gene expression indices from seven data extraction methods. For each method the three columns from left to right correspond to FDR levels 0.
To explain the different performances of the methods illustrated above, we investigated the effect of each step in processing the microarray datasets on estimates of the expression indices in the barley dataset.
We tested use of different background correction methods but the same normalization and summarization steps in estimating the genome-wide gene expression indices, and calculated the correlation coefficient for each pair-wise comparison of background correction methods. The correlation coefficients for the MAS5.
Therefore the background correction methods did not have a significant effect on the correlation between methods. To compare the ability of the seven data extraction methods to detect differentially expressed genes among the barley varieties, our primary focus was sensitivity, defined as the total number of genes detected with significant differential expression at a given FDR level. Figures 1b and 2b [see Additional file 2] show the number of genes with significant differential expression called by the seven methods across a range of FDR levels, for the barley and yeast datasets respectively.
Across all FDR levels, there was marked variation among the seven methods in the number of genes detected as differentially expressed. The variation in FDR across the seven methods occurs for two reasons; firstly, variation in the number of genes detected significantly differentially expressed among the varieties and secondly, variation in the expected number of genes with detected significant differential expression when there is no real differential expression.
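Sensitivity at a given FDR level can be sketched with the Benjamini-Hochberg step-up procedure. This is an assumption made for illustration: the text does not state which FDR estimator the study used, and the p-values below are hypothetical.

```python
def count_significant_bh(p_values, fdr):
    """Return (number of genes called, p-value threshold) under Benjamini-Hochberg."""
    m = len(p_values)
    ranked = sorted(p_values)
    k = 0
    for i, p in enumerate(ranked, start=1):
        # Keep the largest rank i whose p-value lies under the step-up line.
        if p <= fdr * i / m:
            k = i
    threshold = ranked[k - 1] if k else 0.0
    return k, threshold

# Hypothetical per-gene p-values from one data extraction method.
p_values = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]

called_strict, threshold_strict = count_significant_bh(p_values, fdr=0.05)
called_loose, threshold_loose = count_significant_bh(p_values, fdr=0.25)
```

The second return value is the p-value cutoff implied by the chosen FDR level, so the same function also shows how that cutoff shifts with the distribution of p-values a method produces.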
Shedden et al. Figures 1c and 2c [see Additional file 1] show how the p-value threshold required to achieve a given FDR value differs substantially among the seven methods, for the barley and yeast datasets respectively. Notably, Figures 1b and 1c, and also Figures 2b and 2c [see Additional file 2], illustrate exactly the same order of the seven methods, showing that calibration plays an important role in determining sensitivity in detecting differential gene expression.
An important aspect in comparing the different methods is their ability to detect the same differentially expressed genes, that is, their mutual predictability. The MAS5. However, all pair-wise comparisons between methods showed that each method detected differentially expressed genes not detected by the other methods. This suggests that all methods contribute unique but important information on differential gene expression.
Interestingly, methods calling similar genes as differentially expressed did not exhibit greater expression similarity. Previous generations of all Affymetrix GeneChips are available for users seeking to add new data to existing gene-expression datasets.
Standard processing is available starting with as little as 50 ng of total RNA. A computer is used to record the pattern of fluorescence emission and DNA identification.
This technique of employing DNA chips is very rapid, besides being sensitive and specific for the identification of several DNA fragments simultaneously.
Tissue microarrays (TMAs) are similar to gene expression microarrays in having samples arrayed in rows and columns on a glass slide; they differ in that each element on the TMA slide corresponds to a single patient sample, allowing multiple patient samples to be assessed for a single molecular marker in one experiment, while gene expression arrays allow assessment of thousands of molecular markers on a single patient sample per experiment.
Tumor formation involves simultaneous changes in hundreds of cells and variations in genes. Microarrays can be a boon to researchers as they provide a platform for simultaneous testing of a large set of genetic samples. They help especially in the identification of single-nucleotide polymorphisms (SNPs) and mutations, classification of tumors, identification of target genes of tumor suppressors, identification of cancer biomarkers, identification of genes associated with chemoresistance, and drug discovery.
For example, we can compare the different patterns of gene expression levels between a group of cancer patients and a group of normal patients and identify the gene associated with that particular cancer. Gene microarrays have been used for comparative genomic hybridization. In this technique, genomic DNA is fluorescently labeled and used to determine the presence of gene loss or amplification.
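The comparison of expression levels between a cancer group and a normal group can be sketched for a single gene as below. The choice of Welch's t statistic on log2 expression values is an assumption made for illustration, and the values are hypothetical; a real analysis would repeat this for every gene and correct for multiple testing.

```python
import math
from statistics import mean, variance

def compare_groups(cancer, normal):
    """Return (difference in mean log2 expression, Welch t statistic) for one gene."""
    diff = mean(cancer) - mean(normal)
    # Standard error without assuming equal variances in the two groups.
    se = math.sqrt(variance(cancer) / len(cancer) + variance(normal) / len(normal))
    return diff, diff / se

# Hypothetical log2 expression values of one gene in four patients per group.
cancer = [8.1, 8.4, 7.9, 8.3]
normal = [6.0, 6.2, 5.9, 6.1]

log2_diff, t_stat = compare_groups(cancer, normal)
```

A large positive difference with a large t statistic flags the gene as a candidate for higher expression in the cancer group.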
The conversion of a non-invasive tumor to an invasive tumor also warrants research. Clark et al. Microarray-based expression profiling allows us to identify families of genes as well as the molecular and cellular events that might be important in complex processes like metastasis. Practical applications in future include diagnostic and prognostic management of patients. Clinicians will be able to use microarrays during early clinical trials to confirm the mechanisms of action of drugs and to assess drug sensitivity and toxicity.
They can be used to develop a new molecular taxonomy of cancer, including clustering of cancers according to prognostic groups on the basis of gene expression profiles.
An increase in the number of resistant bacteria and superadded infections has led to failure of antibiotics. Virulence of the bacterial strains also affects the outcome of the disease process. In the oral cavity, where anaerobic bacteria might be the infective agent, they are often not easily culturable, especially organisms such as Actinomyces. DNA microarray analysis helps because bacterial genomic DNA often outlasts the viability of the bacteria, and a diagnosis can be made using a small amount of DNA, as opposed to the large numbers of bacteria needed for culture.
In future, an abscess specimen might be sent not for culture and sensitivity testing, but rather for DNA microarray analysis. Leukoplakia or white lesions of the oral cavity may result from a myriad of reversible conditions. Currently, microscopic examination fails to identify the small subset of these lesions that progress to oral cancer.
Recent studies have illustrated the effectiveness of microarrays in oral cancer. Early diagnosis and management of oral cancer are correlated with increased survival. Identification and treatment of premalignant and early cancerous oral lesions may become one of the most valuable services provided in future.
This review has given a small outline of the technique behind microarray and the various steps involved. The technique, though limited at present in its applications due to the cost factor, may widen its prospects once there is increase in the availability of commercial products.
The manufacturing process for these chips is similar to that used in the semiconductor industry - a combination of chemistry and photolithography. Photolithography is a process of using light to control the manufacture of multiple layers of material. In this activity, you will learn about the type of process used to manufacture a GeneChip microarray - photolithography.
Then, in groups, you will use what you learned to build models of the process out of everyday household items. You will present this model to the rest of the class. Students will be organized into groups of five or six, and each group will be assigned a different scenario that uses one of three different types of DNA chips (Gene Expression, Resequencing, or Genotyping) in the research. Each scenario will have its own set of results that the group must analyze and interpret, and then present to the class in a short, five-minute presentation.
The presentation should include relevant experimental background, the experiment itself, the results, the analysis, and possible future research. The purpose of this activity is to challenge you to analyze and interpret data in a group setting and work out a real-life research problem. All scenarios and results are simplified, but are related to real research and recent medical studies.
Whether it is in medical research and drug manufacturing, insurance, reproductive technology, or public policy, the information from this technology affects everyone in one way or another.
It can change the way we live our lives! An educated and informed public is a key part of making sure this genetic information is used in the most ethical manner possible and for the benefit of all. However, as with all technology, there is the possibility of abuse in a manner that complicates, discriminates or endangers the lives of others.
Who has access to the information? In this activity, you will work in groups to analyze a given ethical scenario for either pros or cons or both. You will discuss and brainstorm ideas with your group, then report out to the class.