922 resultados para DNA-microarray data


Relevância:

100.00% 100.00%

Publicador:

Resumo:

The microarray technology provides a high-throughput technique to study gene expression. Microarrays can help us diagnose different types of cancers, understand biological processes, assess host responses to drugs and pathogens, find markers for specific diseases, and much more. Microarray experiments generate large amounts of data. Thus, effective data processing and analysis are critical for making reliable inferences from the data. ^ The first part of dissertation addresses the problem of finding an optimal set of genes (biomarkers) to classify a set of samples as diseased or normal. Three statistical gene selection methods (GS, GS-NR, and GS-PCA) were developed to identify a set of genes that best differentiate between samples. A comparative study on different classification tools was performed and the best combinations of gene selection and classifiers for multi-class cancer classification were identified. For most of the benchmarking cancer data sets, the gene selection method proposed in this dissertation, GS, outperformed other gene selection methods. The classifiers based on Random Forests, neural network ensembles, and K-nearest neighbor (KNN) showed consistently god performance. A striking commonality among these classifiers is that they all use a committee-based approach, suggesting that ensemble classification methods are superior. ^ The same biological problem may be studied at different research labs and/or performed using different lab protocols or samples. In such situations, it is important to combine results from these efforts. The second part of the dissertation addresses the problem of pooling the results from different independent experiments to obtain improved results. Four statistical pooling techniques (Fisher inverse chi-square method, Logit method. Stouffer's Z transform method, and Liptak-Stouffer weighted Z-method) were investigated in this dissertation. These pooling techniques were applied to the problem of identifying cell cycle-regulated genes in two different yeast species. As a result, improved sets of cell cycle-regulated genes were identified. The last part of dissertation explores the effectiveness of wavelet data transforms for the task of clustering. Discrete wavelet transforms, with an appropriate choice of wavelet bases, were shown to be effective in producing clusters that were biologically more meaningful. ^

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Background: Understanding transcriptional regulation by genome-wide microarray studies can contribute to unravel complex relationships between genes. Attempts to standardize the annotation of microarray data include the Minimum Information About a Microarray Experiment (MIAME) recommendations, the MAGE-ML format for data interchange, and the use of controlled vocabularies or ontologies. The existing software systems for microarray data analysis implement the mentioned standards only partially and are often hard to use and extend. Integration of genomic annotation data and other sources of external knowledge using open standards is therefore a key requirement for future integrated analysis systems. Results: The EMMA 2 software has been designed to resolve shortcomings with respect to full MAGE-ML and ontology support and makes use of modern data integration techniques. We present a software system that features comprehensive data analysis functions for spotted arrays, and for the most common synthesized oligo arrays such as Agilent, Affymetrix and NimbleGen. The system is based on the full MAGE object model. Analysis functionality is based on R and Bioconductor packages and can make use of a compute cluster for distributed services. Conclusion: Our model-driven approach for automatically implementing a full MAGE object model provides high flexibility and compatibility. Data integration via SOAP-based web-services is advantageous in a distributed client-server environment as the collaborative analysis of microarray data is gaining more and more relevance in international research consortia. The adequacy of the EMMA 2 software design and implementation has been proven by its application in many distributed functional genomics projects. Its scalability makes the current architecture suited for extensions towards future transcriptomics methods based on high-throughput sequencing approaches which have much higher computational requirements than microarrays.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Background: Statistical analysis of DNA microarray data provides a valuable diagnostic tool for the investigation of genetic components of diseases. To take advantage of the multitude of available data sets and analysis methods, it is desirable to combine both different algorithms and data from different studies. Applying ensemble learning, consensus clustering and cross-study normalization methods for this purpose in an almost fully automated process and linking different analysis modules together under a single interface would simplify many microarray analysis tasks. Results: We present ArrayMining.net, a web-application for microarray analysis that provides easy access to a wide choice of feature selection, clustering, prediction, gene set analysis and cross-study normalization methods. In contrast to other microarray-related web-tools, multiple algorithms and data sets for an analysis task can be combined using ensemble feature selection, ensemble prediction, consensus clustering and cross-platform data integration. By interlinking different analysis tools in a modular fashion, new exploratory routes become available, e.g. ensemble sample classification using features obtained from a gene set analysis and data from multiple studies. The analysis is further simplified by automatic parameter selection mechanisms and linkage to web tools and databases for functional annotation and literature mining. Conclusion: ArrayMining.net is a free web-application for microarray analysis combining a broad choice of algorithms based on ensemble and consensus methods, using automatic parameter selection and integration with annotation databases.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

The Down syndrome (DS) immune phenotype is characterized by thymus hypotrophy, higher propensity to organ-specific autoimmune disorders, and higher susceptibility to infections, among other features. Considering that AIRE (autoimmune regulator) is located on 21q22.3, we analyzed protein and gene expression in surgically removed thymuses from 14 DS patients with congenital heart defects, who were compared with 42 age-matched controls with heart anomaly as an isolated malformation. Immunohistochemistry revealed 70.48 +/- 49.59 AIRE-positive cells/mm(2) in DS versus 154.70 +/- 61.16 AIRE-positive cells/mm(2) in controls (p < 0.0001), and quantitative PCR as well as DNA microarray data confirmed those results. The number of FOXP3-positive cells/mm(2) was equivalent in both groups. Thymus transcriptome analysis showed 407 genes significantly hypoexpressed in DS, most of which were related, according to network transcriptional analysis (FunNet), to cell division and to immunity. Immune response-related genes included those involved in 1) Ag processing and presentation (HLA-DQB1, HLA-DRB3, CD1A, CD1B, CD1C, ERAP) and 2) thymic T cell differentiation (IL2RG, RAG2, CD3D, CD3E, PRDX2, CDK6) and selection (SH2D1A, CD74). It is noteworthy that relevant AIRE-partner genes, such as TOP2A, LAMNB1, and NUP93, were found hypoexpressed in DNA microarrays and quantitative real-time PCR analyses. These findings on global thymic hypofunction in DS revealed molecular mechanisms underlying DS immune phenotype and strongly suggest that DS immune abnormalities are present since early development, rather than being a consequence of precocious aging, as widely hypothesized. Thus, DS should be considered as a non-monogenic primary immunodeficiency. The Journal of Immunology, 2011, 187: 3422-3430.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

In the present study we show that luxS of Bifidobacterium breve UCC2003 is involved in the production of the interspecies signaling molecule autoinducer-2 (AI-2), and that this gene is essential for gastrointestinal colonization of a murine host, while it is also involved in providing protection against Salmonella infection in Caenorhabditis elegans. We demonstrate that a B. breve luxS-insertion mutant is significantly more susceptible to iron chelators than the WT strain and that this sensitivity can be partially reverted in the presence of the AI-2 precursor DPD. Furthermore, we show that several genes of an iron starvation-induced gene cluster, which are downregulated in the luxS-insertion mutant and which encodes a presumed iron-uptake system, are transcriptionally upregulated under in vivo conditions. Mutation of two genes of this cluster in B. breve UCC2003 renders the derived mutant strains sensitive to iron chelators while deficient in their ability to confer gut pathogen protection to Salmonella-infected nematodes. Since a functional luxS gene is present in all tested members of the genus Bifidobacterium, we conclude that bifidobacteria operate a LuxS-mediated system for gut colonization and pathogen protection that is correlated with iron acquisition.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

In this study, we demonstrate that the prototype B. breve strain UCC2003 possesses specific metabolic pathways for the utilisation of lacto-N-tetraose (LNT) and lacto-N-neotetraose (LNnT), which represent the central moieties of Type I and Type II human milk oligosaccharides (HMOs), respectively. Using a combination of experimental approaches, the enzymatic machinery involved in the metabolism of LNT and LNnT was identified and characterised. Homologs of the key genetic loci involved in the utilisation of these HMO substrates were identified in B. breve, B. bifidum, B. longum subsp. infantis and B. longum subsp. longum using bioinformatic analyses, and were shown to be variably present among other members of the Bifidobacterium genus, with a distinct pattern of conservation among human-associated bifidobacterial species.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

We introduce a method of functionally classifying genes by using gene expression data from DNA microarray hybridization experiments. The method is based on the theory of support vector machines (SVMs). SVMs are considered a supervised computer learning method because they exploit prior knowledge of gene function to identify unknown genes of similar function from expression data. SVMs avoid several problems associated with unsupervised clustering methods, such as hierarchical clustering and self-organizing maps. SVMs have many mathematical features that make them attractive for gene expression analysis, including their flexibility in choosing a similarity function, sparseness of solution when dealing with large data sets, the ability to handle large feature spaces, and the ability to identify outliers. We test several SVMs that use different similarity metrics, as well as some other supervised learning methods, and find that the SVMs best identify sets of genes with a common function using expression data. Finally, we use SVMs to predict functional roles for uncharacterized yeast ORFs based on their expression data.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Xylella fastidiosa genome sequencing has generated valuable data by identifying genes acting either on metabolic pathways or in associated pathogenicity and virulence. Based on available information on these genes, new strategies for studying their expression patterns, such as microarray technology, were employed. A total of 2,600 primer pairs were synthesized and then used to generate fragments using the PCR technique. The arrays were hybridized against cDNAs labeled during reverse transcription reactions and which were obtained from bacteria grown under two different conditions (liquid XDM2 and liquid BCYE). All data were statistically analyzed to verify which genes were differentially expressed. In addition to exploring conditions for X. fastidiosa genome-wide transcriptome analysis, the present work observed the differential expression of several classes of genes (energy, protein, amino acid and nucleotide metabolism, transport, degradation of substances, toxins and hypothetical proteins, among others). The understanding of expressed genes in these two different media will be useful in comprehending the metabolic characteristics of X. fastidiosa, and in evaluating how important certain genes are for the functioning and survival of these bacteria in plants.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

The DNA microarray technology has arguably caught the attention of the worldwide life science community and is now systematically supporting major discoveries in many fields of study. The majority of the initial technical challenges of conducting experiments are being resolved, only to be replaced with new informatics hurdles, including statistical analysis, data visualization, interpretation, and storage. Two systems of databases, one containing expression data and one containing annotation data are quickly becoming essential knowledge repositories of the research community. This present paper surveys several databases, which are considered "pillars" of research and important nodes in the network. This paper focuses on a generalized workflow scheme typical for microarray experiments using two examples related to cancer research. The workflow is used to reference appropriate databases and tools for each step in the process of array experimentation. Additionally, benefits and drawbacks of current array databases are addressed, and suggestions are made for their improvement.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Background: In order to provide a cost-effective tool to analyse pharmacogenetic markers in malaria treatment, DNA microarray technology was compared with sequencing of polymerase chain reaction (PCR) fragments to detect single nucleotide polymorphisms (SNPs) in a larger number of samples. Methods: The microarray was developed to affordably generate SNP data of genes encoding the human cytochrome P450 enzyme family (CYP) and N-acetyltransferase-2 (NAT2) involved in antimalarial drug metabolisms and with known polymorphisms, i.e. CYP2A6, CYP2B6, CYP2C8, CYP2C9, CYP2C19, CYP2D6, CYP3A4, CYP3A5, and NAT2. Results: For some SNPs, i.e. CYP2A6*2, CYP2B6*5, CYP2C8*3, CYP2C9*3/*5, CYP2C19*3, CYP2D6*4 and NAT2*6/*7/*14, agreement between both techniques ranged from substantial to almost perfect (kappa index between 0.61 and 1.00), whilst for other SNPs a large variability from slight to substantial agreement (kappa index between 0.39 and 1.00) was found, e. g. CYP2D6*17 (2850C>T), CYP3A4*1B and CYP3A5*3. Conclusion: The major limit of the microarray technology for this purpose was lack of robustness and with a large number of missing data or with incorrect specificity.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

The pipeline for macro- and microarray analyses (PMmA) is a set of scripts with a web interface developed to analyze DNA array data generated by array image quantification software. PMmA is designed for use with single- or double-color array data and to work as a pipeline in five classes (data format, normalization, data analysis, clustering, and array maps). It can also be used as a plugin in the BioArray Software Environment, an open-source database for array analysis, or used in a local version of the web service. All scripts in PMmA were developed in the PERL programming language and statistical analysis functions were implemented in the R statistical language. Consequently, our package is a platform-independent software. Our algorithms can correctly select almost 90% of the differentially expressed genes, showing a superior performance compared to other methods of analysis. The pipeline software has been applied to 1536 expressed sequence tags macroarray public data of sugarcane exposed to cold for 3 to 48 h. PMmA identified thirty cold-responsive genes previously unidentified in this public dataset. Fourteen genes were up-regulated, two had a variable expression and the other fourteen were down-regulated in the treatments. These new findings certainly were a consequence of using a superior statistical analysis approach, since the original study did not take into account the dependence of data variability on the average signal intensity of each gene. The web interface, supplementary information, and the package source code are available, free, to non-commercial users at http://ipe.cbmeg.unicamp.br/pub/PMmA.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Fundação de Amparo à Pesquisa do Estado de São Paulo (FAPESP)