922 results for DNA-microarray data
Abstract:
A common goal in gene expression data analysis is to identify, from a large pool of candidate genes, those whose expression levels change significantly between a treatment and a control biological condition. This is usually done by computing a test statistic and comparing it with a cutoff value that separates differentially expressed genes from nondifferentially expressed ones. In this paper, we propose a Bayesian approach that identifies differentially expressed genes by sequentially calculating credibility intervals from predictive densities. These densities are constructed using the sampled mean treatment effect of all genes in the study, excluding the treatment effect of genes previously identified with statistical evidence for difference. We compare our Bayesian approach with the standard ones based on the t-test and modified t-tests via a simulation study, using the small sample sizes that are common in gene expression data analysis. The results provide evidence that the proposed approach performs better than the standard ones, especially for cases with mean differences and increases in treatment variance relative to control variance. We also apply the methodologies to a well-known, publicly available data set on the bacterium Escherichia coli.
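As a point of reference for the "statistic plus cutoff" baseline that the abstract compares against, the following is a minimal sketch of a per-gene Welch t-statistic with a fixed cutoff. It is not the Bayesian method proposed in the paper; the gene names, expression values, and cutoff are hypothetical and chosen only for illustration.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(treatment, control):
    """Welch's t-statistic for two small samples of expression values."""
    n_t, n_c = len(treatment), len(control)
    # Sample variances (n - 1 denominator); raises ZeroDivisionError if both are 0.
    standard_error = sqrt(variance(treatment) / n_t + variance(control) / n_c)
    return (mean(treatment) - mean(control)) / standard_error


def flag_genes(expression, cutoff):
    """Return, in sorted order, the genes whose |t| exceeds the cutoff."""
    return sorted(
        gene
        for gene, (treatment, control) in expression.items()
        if abs(welch_t(treatment, control)) > cutoff
    )


# Hypothetical log-expression values: three replicates per condition.
expression = {
    "gene_a": ([8.1, 8.3, 8.2], [5.0, 5.2, 5.1]),
    "gene_b": ([6.0, 6.4, 5.8], [6.1, 5.9, 6.3]),
}
flagged = flag_genes(expression, cutoff=4.0)
```

With so few replicates the variance estimates are unstable, which is the weakness that modified t-tests and the approach described above are meant to address.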
Abstract:
We describe the time evolution of gene expression levels by using a time translational matrix to predict future expression levels of genes based on their expression levels at some initial time. We deduce the time translational matrix for previously published DNA microarray gene expression data sets by modeling them within a linear framework by using the characteristic modes obtained by singular value decomposition. The resulting time translation matrix provides a measure of the relationships among the modes and governs their time evolution. We show that a truncated matrix linking just a few modes is a good approximation of the full time translation matrix. This finding suggests that the number of essential connections among the genes is small.
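The linear framework described above can be written out as follows. This is a sketch in assumed notation, since the abstract gives no equations: X is the genes-by-time-points expression matrix, v_k(t_j) is the value of the k-th characteristic mode at time point t_j, r is the number of modes retained, and M is the time translation matrix.

```latex
\begin{align}
X &= U \Sigma V^{\mathsf{T}}, \\
Y(t_j) &= \bigl(v_1(t_j), \dots, v_r(t_j)\bigr)^{\mathsf{T}}, \\
Y(t_{j+1}) &= M\, Y(t_j), \qquad M \in \mathbb{R}^{r \times r}, \\
x(t_{j+1}) &\approx U_r \Sigma_r\, M\, Y(t_j).
\end{align}
```

Here x(t_j) denotes column j of X, and U_r and Σ_r keep only the first r singular vectors and singular values. Under this reading, the truncation discussed in the abstract corresponds to choosing a small r, or to keeping only a few entries of M.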
Abstract:
The Stanford Microarray Database (SMD) stores raw and normalized data from microarray experiments, and provides web interfaces for researchers to retrieve, analyze and visualize their data. The two immediate goals for SMD are to serve as a storage site for microarray data from ongoing research at Stanford University, and to facilitate the public dissemination of that data once published, or released by the researcher. Of paramount importance is the connection of microarray data with the biological data that pertains to the DNA deposited on the microarray (genes, clones etc.). SMD makes use of many public resources to connect expression information to the relevant biology, including SGD [Ball,C.A., Dolinski,K., Dwight,S.S., Harris,M.A., Issel-Tarver,L., Kasarskis,A., Scafe,C.R., Sherlock,G., Binkley,G., Jin,H. et al. (2000) Nucleic Acids Res., 28, 77–80], YPD and WormPD [Costanzo,M.C., Hogan,J.D., Cusick,M.E., Davis,B.P., Fancher,A.M., Hodges,P.E., Kondu,P., Lengieza,C., Lew-Smith,J.E., Lingner,C. et al. (2000) Nucleic Acids Res., 28, 73–76], Unigene [Wheeler,D.L., Chappey,C., Lash,A.E., Leipe,D.D., Madden,T.L., Schuler,G.D., Tatusova,T.A. and Rapp,B.A. (2000) Nucleic Acids Res., 28, 10–14], dbEST [Boguski,M.S., Lowe,T.M. and Tolstoshev,C.M. (1993) Nature Genet., 4, 332–333] and SWISS-PROT [Bairoch,A. and Apweiler,R. (2000) Nucleic Acids Res., 28, 45–48] and can be accessed at http://genome-www.stanford.edu/microarray.
Abstract:
This paper considers a model-based approach to the clustering of tissue samples of a very large number of genes from microarray experiments. It is a nonstandard problem in parametric cluster analysis because the dimension of the feature space (the number of genes) is typically much greater than the number of tissues. Frequently in practice, there are also clinical data available on those cases on which the tissue samples have been obtained. Here we investigate how to use the clinical data in conjunction with the microarray gene expression data to cluster the tissue samples. We propose two mixture model-based approaches in which the number of components in the mixture model corresponds to the number of clusters to be imposed on the tissue samples. One approach specifies the components of the mixture model to be the conditional distributions of the microarray data given the clinical data with the mixing proportions also conditioned on the latter data. Another takes the components of the mixture model to represent the joint distributions of the clinical and microarray data. The approaches are demonstrated on some breast cancer data, as studied recently in van't Veer et al. (2002).
Abstract:
Mass spectrometry (MS)-based proteomics has seen significant technical advances during the past two decades, and mass spectrometry has become a central tool in many biosciences. Despite the popularity of MS-based methods, the handling of systematic non-biological variation in the data remains a common problem. This biasing variation can result from several sources, ranging from sample handling to differences caused by the instrumentation. Normalization is the procedure that aims to account for this biasing variation and make samples comparable. Many normalization methods commonly used in proteomics have been adapted from the DNA-microarray world. Studies comparing normalization methods on proteomics data sets using some variability measures exist. However, a more thorough comparison, looking at the quantitative and qualitative differences in the performance of the different normalization methods and at their ability to preserve the true differential expression signal of proteins, is lacking. In this thesis, several popular and widely used normalization methods (linear regression normalization, local regression normalization, variance stabilizing normalization, quantile normalization, median central tendency normalization, and variants of some of the aforementioned methods), representing different normalization strategies, are compared and evaluated with a benchmark spike-in proteomics data set. The normalization methods are evaluated in several ways. Their performance is evaluated qualitatively and quantitatively on a global scale and in pairwise comparisons of sample groups. In addition, we investigate whether performing the normalization globally on the whole data set or pairwise for the comparison pairs examined affects the performance of the normalization method in normalizing the data and preserving the true differential expression signal.
In this thesis, both major and minor differences in the performance of the different normalization methods were found. Also, the way in which the normalization was performed (global normalization of the whole data or pairwise normalization of the comparison pair) affected the performance of some of the methods in pairwise comparisons. Differences among variants of the same methods were also observed.
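To make one of the compared methods concrete, the following is a minimal sketch of quantile normalization in plain Python. It is an illustration only, not the implementation evaluated in the thesis; the intensities are hypothetical, and ties are broken by position rather than averaged, which is a simplification.

```python
def quantile_normalize(samples):
    """Quantile-normalize equal-length samples (one list of values per sample).

    Each value is replaced by the mean, across samples, of the values
    that share its rank. Ties are broken by position (a simplification).
    """
    n_features = len(samples[0])
    if any(len(sample) != n_features for sample in samples):
        raise ValueError("all samples must have the same number of features")

    # For each sample, the feature indices ordered from smallest to largest value.
    orders = [sorted(range(n_features), key=sample.__getitem__) for sample in samples]

    # Reference distribution: mean of the k-th smallest values across samples.
    reference = [
        sum(sample[order[rank]] for sample, order in zip(samples, orders)) / len(samples)
        for rank in range(n_features)
    ]

    normalized = []
    for order in orders:
        result = [0.0] * n_features
        for rank, index in enumerate(order):
            result[index] = reference[rank]
        normalized.append(result)
    return normalized


# Hypothetical intensities for four proteins measured in three samples.
samples = [
    [5.0, 2.0, 3.0, 4.0],
    [4.0, 1.0, 4.5, 2.0],
    [3.0, 4.0, 6.0, 8.0],
]
normalized = quantile_normalize(samples)
```

After normalization every sample has exactly the same set of values, so any difference between samples lies only in which protein holds which rank. Applying the function to all samples at once corresponds to the global strategy, and applying it to two sample groups at a time corresponds to the pairwise strategy discussed above.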
Abstract:
Several factors have recently converged, elevating the need for highly parallel diagnostic platforms that have the ability to detect many known, novel, and emerging pathogenic agents simultaneously. Panviral DNA microarrays represent the most robust approach for massively parallel viral surveillance and detection. The Virochip is a panviral DNA microarray that is capable of detecting all known viruses, as well as novel viruses related to known viral families, in a single assay and has been used to successfully identify known and novel viral agents in clinical human specimens. However, the usefulness and the sensitivity of the Virochip platform have not been tested on a set of clinical veterinary specimens with the high degree of genetic variance that is frequently observed with swine virus field isolates. In this report, we investigate the utility and sensitivity of the Virochip to positively detect swine viruses in both cell culture-derived samples and clinical swine samples. The Virochip successfully detected porcine reproductive and respiratory syndrome virus (PRRSV) in serum containing 6.10 × 10^2 viral copies per microliter and influenza A virus in lung lavage fluid containing 2.08 × 10^6 viral copies per microliter. The Virochip also successfully detected porcine circovirus type 2 (PCV2) in serum containing 2.50 × 10^8 viral copies per microliter and porcine respiratory coronavirus (PRCV) in turbinate tissue homogenate. Collectively, the data in this report demonstrate that the Virochip can successfully detect pathogenic viruses frequently found in swine in a variety of solid and liquid specimens, such as turbinate tissue homogenate and lung lavage fluid, as well as antemortem samples, such as serum.
Abstract:
Ochnaceae s.str. (Malpighiales) are a pantropical family of about 500 species and 27 genera of almost exclusively woody plants. Infrafamilial classification and relationships have been controversial partially due to the lack of a robust phylogenetic framework. Including all genera except Indosinia and Perissocarpa and DNA sequence data for five DNA regions (ITS, matK, ndhF, rbcL, trnL-F), we provide for the first time a nearly complete molecular phylogenetic analysis of Ochnaceae s.l. resolving most of the phylogenetic backbone of the family. Based on this, we present a new classification of Ochnaceae s.l., with Medusagynoideae and Quiinoideae included as subfamilies and the former subfamilies Ochnoideae and Sauvagesioideae recognized at the rank of tribe. Our data support a monophyletic Ochneae, but Sauvagesieae in the traditional circumscription is paraphyletic because Testulea emerges as sister to the rest of Ochnoideae, and the next clade shows Luxemburgia+Philacra as sister group to the remaining Ochnoideae. To avoid paraphyly, we classify Luxemburgieae and Testuleeae as new tribes. The African genus Lophira, which has switched between subfamilies (here tribes) in past classifications, emerges as sister to all other Ochneae. Thus, endosperm-free seeds and ovules with partly to completely united integuments (resulting in an apparently single integument) are characters that unite all members of that tribe. The relationships within its largest clade, Ochnineae (former Ochneae), are poorly resolved, but former Ochninae (Brackenridgea, Ochna) are polyphyletic. Within Sauvagesieae, the genus Sauvagesia in its broad circumscription is polyphyletic as Sauvagesia serrata is sister to a clade of Adenarake, Sauvagesia spp., and three other genera. Within Quiinoideae, in contrast to former phylogenetic hypotheses, Lacunaria and Touroulia form a clade that is sister to Quiina. 
Bayesian ancestral state reconstructions showed that zygomorphic flowers with adaptations to buzz-pollination (poricidal anthers), a syncarpous gynoecium (a near-apocarpous gynoecium evolved independently in Quiinoideae and Ochninae), numerous ovules, septicidal capsules, and winged seeds with endosperm are the ancestral condition in Ochnoideae. Although in some lineages poricidal anthers were lost secondarily, the evolution of poricidal superstructures secured the maintenance of buzz-pollination in some of these genera, indicating a strong selective pressure on keeping that specialized pollination system.
Abstract:
Background: High-density tiling arrays and new sequencing technologies are generating rapidly increasing volumes of transcriptome and protein-DNA interaction data. Visualization and exploration of this data is critical to understanding the regulatory logic encoded in the genome by which the cell dynamically affects its physiology and interacts with its environment. Results: The Gaggle Genome Browser is a cross-platform desktop program for interactively visualizing high-throughput data in the context of the genome. Important features include dynamic panning and zooming, keyword search and open interoperability through the Gaggle framework. Users may bookmark locations on the genome with descriptive annotations and share these bookmarks with other users. The program handles large sets of user-generated data using an in-process database and leverages the facilities of SQL and the R environment for importing and manipulating data. A key aspect of the Gaggle Genome Browser is interoperability. By connecting to the Gaggle framework, the genome browser joins a suite of interconnected bioinformatics tools for analysis and visualization with connectivity to major public repositories of sequences, interactions and pathways. To this flexible environment for exploring and combining data, the Gaggle Genome Browser adds the ability to visualize diverse types of data in relation to its coordinates on the genome. Conclusions: Genomic coordinates function as a common key by which disparate biological data types can be related to one another. In the Gaggle Genome Browser, heterogeneous data are joined by their location on the genome to create information-rich visualizations yielding insight into genome organization, transcription and its regulation and, ultimately, a better understanding of the mechanisms that enable the cell to dynamically respond to its environment.
Abstract:
Dermcidin (DCD) is a human gene mapped to the chromosome 12q13 region, which is co-amplified with multiple oncogenes with a well-established role in the growth, survival and progression of breast cancers. Here, we present a summary of a DNA microarray-based study that identified the genes that are up- and down-regulated in a human MDA-361 pLKO control clone and three clones expressing short hairpin RNA against three different regions of DCD mRNA. A list of 235 genes was differentially expressed among independent clones (> 3-fold change and P < 0.005). The expression of 208 genes was reduced and that of 27 genes was increased in the three DCD-RNAi clones compared to the pLKO control clone. The expression of 77 genes (37%) encoding enzymes involved in amino acid metabolism, glucose metabolism and oxidoreductase activity, and of several genes required for cell survival and DNA repair, was decreased. The expression of the EGFR/ErbB-1 gene, an important predictor of outcome in breast cancer, was reduced together with the genes for betacellulin and amphiregulin, two known ligands of EGFR/ErbB receptors. Many of the 27 genes up-regulated by DCD-RNAi expression have not yet been fully characterized; among those with known function, we identified the calcium-calmodulin-dependent protein kinase-II delta and calcineurin A alpha. We compared 132 up-regulated and 12 down-regulated genes in our dataset with those genes up- and down-regulated by inhibitors targeting various signaling pathway components. The analysis showed that the genes in the DCD pathway are aligned with those functionally influenced by the drugs sirolimus, LY-294002 and wortmannin. Therefore, DCD may exert its function by activating the PI3K/AKT/mTOR signaling pathway. Together, these bioinformatic approaches suggest the involvement of DCD in the regulation of genes for breast cancer cell metabolism, proliferation and survival.
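The selection criterion quoted above (more than 3-fold change and P < 0.005) can be sketched as a simple filter. This is only an illustration of the thresholding step under assumed inputs: the gene names, expression levels, and p-values below are made up, and the study's actual statistical pipeline is not described in the abstract.

```python
def classify_genes(results, min_fold=3.0, max_p=0.005):
    """Split genes into up- and down-regulated lists.

    `results` maps a gene name to (control_level, knockdown_level, p_value);
    levels are linear-scale, strictly positive expression values.
    """
    up, down = [], []
    for gene, (control, knockdown, p_value) in sorted(results.items()):
        if p_value >= max_p:
            continue  # not statistically significant at the chosen level
        ratio = knockdown / control
        if ratio > min_fold:
            up.append(gene)
        elif ratio < 1.0 / min_fold:
            down.append(gene)
    return up, down


# Hypothetical measurements; names and values are invented for illustration.
results = {
    "gene_1": (900.0, 200.0, 0.001),  # 4.5-fold lower, significant
    "gene_2": (100.0, 450.0, 0.002),  # 4.5-fold higher, significant
    "gene_3": (300.0, 80.0, 0.040),   # large change, but p is too high
    "gene_4": (500.0, 400.0, 0.001),  # significant, but change is too small
}
up, down = classify_genes(results)
```

Requiring both conditions keeps genes whose change is large and reproducible, at the cost of discarding small but consistent effects.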
Abstract:
Moniliophthora perniciosa is a hemibiotrophic fungus that causes witches' broom disease (WBD) in cacao. Marked dimorphism characterizes this fungus, showing a monokaryotic or biotrophic phase that causes disease symptoms and a later dikaryotic or saprotrophic phase. A combined strategy of DNA microarray, expressed sequence tag, and real-time reverse-transcriptase polymerase chain reaction analyses was employed to analyze differences between these two fungal stages in vitro. In all, 1,131 putative genes were hybridized with cDNA from different phases, resulting in 189 differentially expressed genes, and 4,595 reads were clustered, producing 1,534 unigenes. The analysis of these genes, which represent approximately 21% of the total genes, indicates that the biotrophic-like phase undergoes carbon and nitrogen catabolite repression that correlates to the expression of phytopathogenicity genes. Moreover, downregulation of mitochondrial oxidative phosphorylation and the presence of a putative ngr1 of Saccharomyces cerevisiae could help explain its lower growth rate. In contrast, the saprotrophic mycelium expresses genes related to the metabolism of hexoses, ammonia, and oxidative phosphorylation, which could explain its faster growth. Antifungal toxins were upregulated and could prevent the colonization by competing fungi. This work significantly contributes to our understanding of the molecular mechanisms of WBD and, to our knowledge, is the first to analyze differential gene expression of the different phases of a hemibiotrophic fungus.
Abstract:
Microarrays allow thousands of genes to be monitored simultaneously, quantifying the abundance of transcripts under the same experimental condition at the same time. Among the various available array technologies, two-channel cDNA microarray experiments, which are the focus of this work, have been adopted in numerous technical protocols associated with genomic studies. Microarray experiments involve many steps, and each one can affect the quality of the raw data. Background correction and normalization are preprocessing techniques that clean and correct the raw data when undesirable fluctuations arise from technical factors. Several recent studies have shown that no preprocessing strategy outperforms the others in all circumstances, so it seems difficult to provide general recommendations. In this work, we propose using exploratory techniques to visualize the effects of preprocessing methods on the statistical analysis of two-channel cancer microarray data sets in which the cancer types (classes) are known. The arrow plot was used to select differentially expressed genes, and the graph of profiles resulting from correspondence analysis was used to visualize the results. Six background correction methods and six normalization methods were used, giving 36 preprocessing combinations, which were applied to a published cDNA microarray database (Liver), available at http://genome-www5.stanford.edu/, in which the microarrays were already classified by cancer type. All statistical analyses were performed using the R statistical software.
Abstract:
Research project carried out during a stay at the Department for Feed and Food Hygiene of the National Veterinary Institute, Norway, between November and December 2006. Cereal grains can be contaminated with different Fusarium species capable of producing highly toxic secondary metabolites such as trichothecenes, fumonisins or moniliformin. Correct identification of these species is of great importance for risk assessment in human and animal health. Identifying Fusarium on the basis of its morphology requires taxonomic expertise and time, and most molecular methods allow the identification of only a single target species. In contrast, microarray technology offers the parallel analysis of a large number of DNA targets. In this work, an array was developed for the identification of the main toxigenic Fusarium species of Northern and Southern Europe. An existing array for the detection of trichothecene- and moniliformin-producing Fusarium species (predominant in Northern Europe) was extended with 18 DNA probes that identify the most abundant toxigenic species in Southern Europe, which mainly produce fumonisins. The capture probes were designed on the basis of the translation elongation factor 1 alpha (TEF-1alpha). Samples are analyzed by a single PCR that amplifies part of TEF-1alpha, followed by hybridization to the Fusarium chip. The results are visualized using a colorimetric detection method. The Fusarium chip developed here may become a useful tool of great interest for the analysis of cereals in the food chain.
Abstract:
BACKGROUND: The Nuclear Factor I (NFI) family of DNA binding proteins (also called CCAAT box transcription factors or CTF) is involved in both DNA replication and gene expression regulation. Using chromatin immuno-precipitation and high throughput sequencing (ChIP-Seq), we performed a genome-wide mapping of NFI DNA binding sites in primary mouse embryonic fibroblasts. RESULTS: We found that in vivo and in vitro NFI DNA binding specificities are indistinguishable, as in vivo ChIP-Seq NFI binding sites matched predictions based on previously established position weight matrix models of its in vitro binding specificity. Combining ChIP-Seq with mRNA profiling data, we found that NFI preferentially associates with highly expressed genes that it up-regulates, while binding sites were under-represented at expressed but unregulated genes. Genomic binding also correlated with markers of transcribed genes such as histone modifications H3K4me3 and H3K36me3, even outside of annotated transcribed loci, implicating NFI in the control of the deposition of these modifications. Positional correlation between + and - strand ChIP-Seq tags revealed that, in contrast to other transcription factors, NFI associates with a nucleosomal length of cleavage-resistant DNA, suggesting an interaction with positioned nucleosomes. In addition, NFI binding prominently occurred at boundaries displaying discontinuities in histone modifications specific to expressed and silent chromatin, such as loci subject to parental allele-specific imprinted expression. CONCLUSIONS: Our data thus suggest that NFI nucleosomal interaction may contribute to the partitioning of distinct chromatin domains and to epigenetic gene expression regulation. NFI ChIP-Seq and input control DNA data were deposited at the Gene Expression Omnibus (GEO) repository under accession number GSE15844. Gene expression microarray data for mouse embryonic fibroblasts are available under GEO accession number GSE15871.
Abstract:
SUMMARY: Large sets of data, such as expression profiles from many samples, require analytic tools to reduce their complexity. The Iterative Signature Algorithm (ISA) is a biclustering algorithm. It was designed to decompose a large set of data into so-called 'modules'. In the context of gene expression data, these modules consist of subsets of genes that exhibit a coherent expression profile only over a subset of microarray experiments. Genes and arrays may be attributed to multiple modules and the level of required coherence can be varied resulting in different 'resolutions' of the modular mapping. In this short note, we introduce two BioConductor software packages written in GNU R: The isa2 package includes an optimized implementation of the ISA and the eisa package provides a convenient interface to run the ISA, visualize its output and put the biclusters into biological context. Potential users of these packages are all R and BioConductor users dealing with tabular (e.g. gene expression) data. AVAILABILITY: http://www.unil.ch/cbg/ISA CONTACT: sven.bergmann@unil.ch
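The iterative idea behind the modules described above can be sketched in a few lines of Python. This is a deliberately simplified illustration and not the isa2 or eisa implementation (those are R packages): it uses binary membership, unweighted averages, and no pre-normalization of the matrix, and the function names, thresholds, and toy matrix are all hypothetical.

```python
from statistics import mean, pstdev


def select_above(scores, threshold):
    """Indices whose z-score (relative to all scores) exceeds the threshold."""
    center, spread = mean(scores), pstdev(scores)
    if spread == 0:
        return frozenset()
    return frozenset(
        i for i, score in enumerate(scores) if (score - center) / spread > threshold
    )


def isa_sketch(matrix, seed_genes, gene_threshold, condition_threshold, max_iterations=50):
    """Alternate between scoring conditions and genes until the sets stop changing.

    `matrix[g][c]` is the expression of gene g under condition c;
    `seed_genes` must be a non-empty collection of row indices.
    """
    n_genes, n_conditions = len(matrix), len(matrix[0])
    genes, conditions = frozenset(seed_genes), frozenset()
    for _ in range(max_iterations):
        # Score each condition by its average expression over the current genes.
        condition_scores = [mean(matrix[g][c] for g in genes) for c in range(n_conditions)]
        new_conditions = select_above(condition_scores, condition_threshold)
        if not new_conditions:
            return frozenset(), frozenset()
        # Score each gene by its average expression over the selected conditions.
        gene_scores = [mean(matrix[g][c] for c in new_conditions) for g in range(n_genes)]
        new_genes = select_above(gene_scores, gene_threshold)
        if not new_genes:
            return frozenset(), frozenset()
        if new_genes == genes and new_conditions == conditions:
            break  # fixed point reached: this pair of sets is a module
        genes, conditions = new_genes, new_conditions
    return genes, conditions


# Toy matrix: genes 0-2 are co-expressed under conditions 0-1 only.
matrix = [
    [5.0, 5.0, 0.0, 0.0, 0.0],
    [5.0, 5.0, 0.0, 0.0, 0.0],
    [5.0, 5.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0, 0.0],
]
module_genes, module_conditions = isa_sketch(
    matrix, seed_genes=[0], gene_threshold=0.5, condition_threshold=1.0
)
```

Raising the two thresholds demands stronger coherence and yields smaller modules, which mirrors the different "resolutions" mentioned in the abstract. Starting from different seeds can recover different, possibly overlapping, modules.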
Abstract:
The European Prospective Investigation into Cancer and nutrition (EPIC) is a long-term, multi-centric prospective study in Europe investigating the relationships between cancer and nutrition. This study has served as a basis for a number of Genome-Wide Association Studies (GWAS) and other types of genetic analyses. Over a period of 5 years, 52,256 EPIC DNA samples have been extracted using an automated DNA extraction platform. Here we have evaluated the pre-analytical factors affecting DNA yield, including anthropometric, epidemiological and technical factors such as center of subject recruitment, age, gender, body-mass index, disease case or control status, tobacco consumption, number of aliquots of buffy coat used for DNA extraction, extraction machine or procedure, DNA quantification method, degree of haemolysis and variations in the timing of sample processing. We show that the largest significant variations in DNA yield were observed with degree of haemolysis and with center of subject recruitment. Age, gender, body-mass index, cancer case or control status and tobacco consumption also significantly impacted DNA yield. Feedback from laboratories that have analyzed DNA with different SNP genotyping technologies demonstrates that the vast majority of samples (approximately 88%) performed adequately in different types of assays. To our knowledge, this study is the largest to date to evaluate the sources of pre-analytical variations in DNA extracted from peripheral leucocytes. The results provide a strong evidence-based rationale for standardized recommendations on blood collection and processing protocols for large-scale genetic studies.