30 resultados para DNA-microarray data
em Consorci de Serveis Universitaris de Catalunya (CSUC), Spain
Resumo:
A number of experimental methods have been reported for estimating the number of genes in a genome, or the closely related coding density of a genome, defined as the fraction of base pairs in codons. Recently, DNA sequence data representative of the genome as a whole have become available for several organisms, making the problem of estimating coding density amenable to sequence analytic methods. Estimates of coding density for a single genome vary widely, so that methods with characterized error bounds have become increasingly desirable. We present a method to estimate the protein coding density in a corpus of DNA sequence data, in which a ‘coding statistic’ is calculated for a large number of windows of the sequence under study, and the distribution of the statistic is decomposed into two normal distributions, assumed to be the distributions of the coding statistic in the coding and noncoding fractions of the sequence windows. The accuracy of the method is evaluated using known data and application is made to the yeast chromosome III sequence and to C.elegans cosmid sequences. It can also be applied to fragmentary data, for example a collection of short sequences determined in the course of STS mapping.
Resumo:
DnaSP is a software package for a comprehensive analysis of DNA polymorphism data. Version 5 implements a number of new features and analytical methods allowing extensive DNA polymorphism analyses on large datasets. Among other features, the newly implemented methods allow for: (i) analyses on multiple data files; (ii) haplotype phasing; (iii) analyses on insertion/deletion polymorphism data; (iv) visualizing sliding window results integrated with available genome annotations in the UCSC browser.
Resumo:
In this work, we propose a copula-based method to generate synthetic gene expression data that account for marginal and joint probability distributions features captured from real data. Our method allows us to implant significant genes in the synthetic dataset in a controlled manner, giving the possibility of testing new detection algorithms under more realistic environments.
Resumo:
Background: We use an approach based on Factor Analysis to analyze datasets generated for transcriptional profiling. The method groups samples into biologically relevant categories, and enables the identification of genes and pathways most significantly associated to each phenotypic group, while allowing for the participation of a given gene in more than one cluster. Genes assigned to each cluster are used for the detection of pathways predominantly activated in that cluster by finding statistically significant associated GO terms. We tested the approach with a published dataset of microarray experiments in yeast. Upon validation with the yeast dataset, we applied the technique to a prostate cancer dataset. Results: Two major pathways are shown to be activated in organ-confined, non-metastatic prostate cancer: those regulated by the androgen receptor and by receptor tyrosine kinases. A number of gene markers (HER3, IQGAP2 and POR1) highlighted by the software and related to the later pathway have been validated experimentally a posteriori on independent samples. Conclusion: Using a new microarray analysis tool followed by a posteriori experimental validation of the results, we have confirmed several putative markers of malignancy associated with peptide growth factor signalling in prostate cancer and revealed others, most notably ERRB3 (HER3). Our study suggest that, in primary prostate cancer, HER3, together or not with HER4, rather than in receptor complexes involving HER2, could play an important role in the biology of these tumors. These results provide new evidence for the role of receptor tyrosine kinases in the establishment and progression of prostate cancer.
Resumo:
Projecte de recerca elaborat a partir d’una estada al Department for Feed and Food Hygiene del National Veterinary Institute, Noruega, entre novembre i desembre del 2006. Els grans de cereal poden estar contaminats amb diferents espècies de Fusarium capaces de produir metabolits secundaris altament tòxics com trichotecenes, fumonisines o moniliformines. La correcta identificació d’aquestes espècies és de gran importància per l’assegurament del risc en l’àmbit de la salut humana i animal. La identificació de Fusarium en base a la seva morfologia requereix coneixements taxonòmics i temps; la majoria dels mètodes moleculars permeten la identificació d’una única espècie diana. Per contra, la tecnologia de microarray ofereix l’anàlisi paral•lel d’un alt nombre de DNA dianes. En aquest treball, s’ha desenvolupat un array per a la identificació de les principals espècies de Fusarium toxigèniques del Nord i Sud d’Europa. S’ha ampliat un array ja existent, per a la detecció de les espècies de Fusarium productores de trichothecene i moniliformina (predominants al Nord d’Europa), amb l’addició de 18 sondes de DNA que permeten identificar les espècies toxigèniques més abundants al Sud d’Europa, les qual produeixen majoritàriament fumonisines. Les sondes de captura han estat dissenyades en base al factor d’elongació translació- 1 alpha (TEF-1alpha). L’anàlisi de les mostres es realitza mitjançant una única PCR que permet amplificar part del TEF-1alpha seguida de la hibridació al xip de Fusarium. Els resultats es visualitzen mitjançant un mètode de detecció colorimètric. El xip de Fusarium desenvolupat pot esdevenir una eina útil i de gran interès per a l’anàlisi de cereals presents en la cadena alimentària.
Resumo:
BACKGROUND: The trithorax group (trxG) and Polycomb group (PcG) proteins are responsible for the maintenance of stable transcriptional patterns of many developmental regulators. They bind to specific regions of DNA and direct the post-translational modifications of histones, playing a role in the dynamics of chromatin structure. RESULTS: We have performed genome-wide expression studies of trx and ash2 mutants in Drosophila melanogaster. Using computational analysis of our microarray data, we have identified 25 clusters of genes potentially regulated by TRX. Most of these clusters consist of genes that encode structural proteins involved in cuticle formation. This organization appears to be a distinctive feature of the regulatory networks of TRX and other chromatin regulators, since we have observed the same arrangement in clusters after experiments performed with ASH2, as well as in experiments performed by others with NURF, dMyc, and ASH1. We have also found many of these clusters to be significantly conserved in D. simulans, D. yakuba, D. pseudoobscura and partially in Anopheles gambiae. CONCLUSION: The analysis of genes governed by chromatin regulators has led to the identification of clusters of functionally related genes conserved in other insect species, suggesting this chromosomal organization is biologically important. Moreover, our results indicate that TRX and other chromatin regulators may act globally on chromatin domains that contain transcriptionally co-regulated genes.
Characterization of human gene expression changes after olive oil ingestion: an exploratory approach
Resumo:
Olive oil consumption is protective against risk factors for cardiovascular and cancer diseases. A nutrigenomic approach was performed to assess whether changes in gene expression could occur in human peripheral blood mononuclear cells after oli ve oil ingestion at postprandial state. Six healthy male volunteers ingested, at fasting state, 50 ml of olive oil. Prior to intervention a 1-week washout period with a controlled diet and sunflower oil as the only source of fat was followed. During the 3 days before and on the intervention day, a very low-phenolic compound diet was followed. At baseline (0 h) and at post-ingestion (6 h), total RNA was isolated and gene expression (29,082 genes) was evaluated by microarray. From microarray data, nutrient-gene interactions were observed in genes related to metabolism, cellular processes, cancer, and atherosclerosis (e.g. USP48 by 2.16; OGT by 1.68-fold change) and associated processes such as inflammation (e.g. AKAP13 by 2.30; IL-10 by 1.66-fold change) and DNA damage (e.g. DCLRE1C by 1.47; POLK by 1.44- fold change). When results obtained by microarray were verified by qRT-PCR in nine genes, full concordance was achieved only in the case of up-regulated genes. Changes were observed at a real-life dose of olive oil, as it is daily consumed in some Mediterranean areas. Our results support the hypothesis that postprandial protective changes related to olive oil consumption could be mediated through gene expression changes.
Resumo:
Background: The trithorax group (trxG) and Polycomb group (PcG) proteins are responsible for the maintenance of stable transcriptional patterns of many developmental regulators. They bind to specific regions of DNA and direct the post-translational modifications of histones, playing a role in the dynamics of chromatin structure.Results: We have performed genome-wide expression studies of trx and ash2 mutants in Drosophila melanogaster. Using computational analysis of our microarray data, we have identified 25 clusters of genes potentially regulated by TRX. Most of these clusters consist of genes that encode structural proteins involved in cuticle formation. This organization appears to be a distinctive feature of the regulatory networks of TRX and other chromatin regulators, since we have observed the same arrangement in clusters after experiments performed with ASH2, as well as in experiments performed by others with NURF, dMyc, and ASH1. We have also found many of these clusters to be significantly conserved in D. simulans, D. yakuba, D. pseudoobscura and partially in Anopheles gambiae.Conclusion: The analysis of genes governed by chromatin regulators has led to the identification of clusters of functionally related genes conserved in other insect species, suggesting this chromosomal organization is biologically important. Moreover, our results indicate that TRX and other chromatin regulators may act globally on chromatin domains that contain transcriptionally co-regulated genes.
Resumo:
DnaSP is a software package for the analysis of DNA polymorphism data. Present version introduces several new modules and features which, among other options allow: (1) handling big data sets (~5 Mb per sequence); (2) conducting a large number of coalescent-based tests by Monte Carlo computer simulations; (3) extensive analyses of the genetic differentiation and gene flow among populations; (4) analysing the evolutionary pattern of preferred and unpreferred codons; (5) generating graphical outputs for an easy visualization of results. Availability: The software package, including complete documentation and examples, is freely available to academic users from: http://www.ub.es/dnasp
Resumo:
BACKGROUND: DNA sequence polymorphisms analysis can provide valuable information on the evolutionary forces shaping nucleotide variation, and provides an insight into the functional significance of genomic regions. The recent ongoing genome projects will radically improve our capabilities to detect specific genomic regions shaped by natural selection. Current available methods and software, however, are unsatisfactory for such genome-wide analysis. RESULTS: We have developed methods for the analysis of DNA sequence polymorphisms at the genome-wide scale. These methods, which have been tested on a coalescent-simulated and actual data files from mouse and human, have been implemented in the VariScan software package version 2.0. Additionally, we have also incorporated a graphical-user interface. The main features of this software are: i) exhaustive population-genetic analyses including those based on the coalescent theory; ii) analysis adapted to the shallow data generated by the high-throughput genome projects; iii) use of genome annotations to conduct a comprehensive analyses separately for different functional regions; iv) identification of relevant genomic regions by the sliding-window and wavelet-multiresolution approaches; v) visualization of the results integrated with current genome annotations in commonly available genome browsers. CONCLUSION: VariScan is a powerful and flexible suite of software for the analysis of DNA polymorphisms. The current version implements new algorithms, methods, and capabilities, providing an important tool for an exhaustive exploratory analysis of genome-wide DNA polymorphism data.
Resumo:
BACKGROUND: DNA sequence polymorphisms analysis can provide valuable information on the evolutionary forces shaping nucleotide variation, and provides an insight into the functional significance of genomic regions. The recent ongoing genome projects will radically improve our capabilities to detect specific genomic regions shaped by natural selection. Current available methods and software, however, are unsatisfactory for such genome-wide analysis. RESULTS: We have developed methods for the analysis of DNA sequence polymorphisms at the genome-wide scale. These methods, which have been tested on a coalescent-simulated and actual data files from mouse and human, have been implemented in the VariScan software package version 2.0. Additionally, we have also incorporated a graphical-user interface. The main features of this software are: i) exhaustive population-genetic analyses including those based on the coalescent theory; ii) analysis adapted to the shallow data generated by the high-throughput genome projects; iii) use of genome annotations to conduct a comprehensive analyses separately for different functional regions; iv) identification of relevant genomic regions by the sliding-window and wavelet-multiresolution approaches; v) visualization of the results integrated with current genome annotations in commonly available genome browsers. CONCLUSION: VariScan is a powerful and flexible suite of software for the analysis of DNA polymorphisms. The current version implements new algorithms, methods, and capabilities, providing an important tool for an exhaustive exploratory analysis of genome-wide DNA polymorphism data.
Resumo:
DnaSP, DNA Sequence Polymorphism, is a software package for the analysis of nucleotide polymorphism from aligned DNA sequence data. DnaSP can estimate several measures of DNA sequence variation within and between populations (in noncoding, synonymous or nonsynonymous sites, or in various sorts of codon positions), as well as linkage disequilibrium, recombination, gene flow and gene conversion parameters. DnaSP can also carry out several tests of neutrality: Hudson, Kreitman and Aguadé (1987), Tajima (1989), McDonald and Kreitman (1991), Fu and Li (1993), and Fu (1997) tests. Additionally, DnaSP can estimate the confidence intervals of some test-statistics by the coalescent. The results of the analyses are displayed on tabular and graphic form.
Resumo:
In recent years, new analytical tools have allowed researchers to extract historical information contained in molecular data, which has fundamentally transformed our understanding of processes ruling biological invasions. However, the use of these new analytical tools has been largely restricted to studies of terrestrial organisms despite the growing recognition that the sea contains ecosystems that are amongst the most heavily affected by biological invasions, and that marine invasion histories are often remarkably complex. Here, we studied the routes of invasion and colonisation histories of an invasive marine invertebrate Microcosmus squamiger (Ascidiacea) using microsatellite loci, mitochondrial DNA sequence data and 11 worldwide populations. Discriminant analysis of principal components, clustering methods and approximate Bayesian computation (ABC) methods showed that the most likely source of the introduced populations was a single admixture event that involved populations from two genetically differentiated ancestral regions - the western and eastern coasts of Australia. The ABC analyses revealed that colonisation of the introduced range of M. squamiger consisted of a series of non-independent introductions along the coastlines of Africa, North America and Europe. Furthermore, we inferred that the sequence of colonisation across continents was in line with historical taxonomic records - first the Mediterranean Sea and South Africa from an unsampled ancestral population, followed by sequential introductions in California and, more recently, the NE Atlantic Ocean. We revealed the most likely invasion history for world populations of M. squamiger, which is broadly characterized by the presence of multiple ancestral sources and non-independent introductions within the introduced range. The results presented here illustrate the complexity of marine invasion routes and identify a cause-effect relationship between human-mediated transport and the success of widespread marine non-indigenous species, which benefit from stepping-stone invasions and admixture processes involving different sources for the spread and expansion of their range.
Resumo:
Emergent molecular measurement methods, such as DNA microarray, qRTPCR, andmany others, offer tremendous promise for the personalized treatment of cancer. Thesetechnologies measure the amount of specific proteins, RNA, DNA or other moleculartargets from tumor specimens with the goal of “fingerprinting” individual cancers. Tumorspecimens are heterogeneous; an individual specimen typically contains unknownamounts of multiple tissues types. Thus, the measured molecular concentrations resultfrom an unknown mixture of tissue types, and must be normalized to account for thecomposition of the mixture.For example, a breast tumor biopsy may contain normal, dysplastic and cancerousepithelial cells, as well as stromal components (fatty and connective tissue) and bloodand lymphatic vessels. Our diagnostic interest focuses solely on the dysplastic andcancerous epithelial cells. The remaining tissue components serve to “contaminate”the signal of interest. The proportion of each of the tissue components changes asa function of patient characteristics (e.g., age), and varies spatially across the tumorregion. Because each of the tissue components produces a different molecular signature,and the amount of each tissue type is specimen dependent, we must estimate the tissuecomposition of the specimen, and adjust the molecular signal for this composition.Using the idea of a chemical mass balance, we consider the total measured concentrationsto be a weighted sum of the individual tissue signatures, where weightsare determined by the relative amounts of the different tissue types. We develop acompositional source apportionment model to estimate the relative amounts of tissuecomponents in a tumor specimen. We then use these estimates to infer the tissuespecificconcentrations of key molecular targets for sub-typing individual tumors. Weanticipate these specific measurements will greatly improve our ability to discriminatebetween different classes of tumors, and allow more precise matching of each patient tothe appropriate treatment
Resumo:
When dealing with nonlinear blind processing algorithms (deconvolution or post-nonlinear source separation), complex mathematical estimations must be done giving as a result very slow algorithms. This is the case, for example, in speech processing, spike signals deconvolution or microarray data analysis. In this paper, we propose a simple method to reduce computational time for the inversion of Wiener systems or the separation of post-nonlinear mixtures, by using a linear approximation in a minimum mutual information algorithm. Simulation results demonstrate that linear spline interpolation is fast and accurate, obtaining very good results (similar to those obtained without approximation) while computational time is dramatically decreased. On the other hand, cubic spline interpolation also obtains similar good results, but due to its intrinsic complexity, the global algorithm is much more slow and hence not useful for our purpose.