954 results for Genomic data
Abstract:
Genome-wide association studies have failed to establish common variant risk for the majority of common human diseases. The underlying reasons for this failure are explained by recent studies involving the resequencing and comparison of over 1200 human genomes and 10 000 exomes, together with the delineation of DNA methylation patterns (epigenome) and full characterization of the coding and noncoding RNAs being transcribed (transcriptome). These studies have provided the most comprehensive catalogues of functional elements and genetic variants that are now available for global integrative analysis and experimental validation in prospective cohort studies. With these datasets, researchers will have unparalleled opportunities for the alignment, mining, and testing of hypotheses for the roles of specific genetic variants, including copy number variations, single nucleotide polymorphisms, and indels, as the cause of specific phenotypes and diseases. Through the use of next-generation sequencing technologies for genotyping and standardized ontological annotation to systematically analyze the effects of genomic variation on human and model organism phenotypes, we will be able to find candidate genes and new clues to disease etiology and treatment. This article describes essential concepts in genetics and genomic technologies as well as the emerging computational framework to comprehensively search websites and platforms available for the analysis and interpretation of genomic data.
Abstract:
In the post-genomic era, with its massive production of biological data, understanding the factors that affect protein stability is one of the most important and challenging tasks in clarifying the role of mutations in human disease. The problem lies at the basis of what is referred to as molecular medicine, whose underlying idea is that pathologies can be described at the molecular level. To this end, scientific efforts focus on characterising mutations that hamper protein function and thereby affect the biological processes underlying cell physiology. New techniques have been developed to catalogue single nucleotide polymorphisms (SNPs) across all human chromosomes, and as a result the information in dedicated databases is increasing exponentially. Mutations found at the DNA level, when they occur in transcribed regions, may lead to mutated proteins, and this can be a serious medical problem that largely affects the phenotype. Bioinformatics tools are urgently needed to cope with the flood of genomic data stored in databases and to analyse the role of SNPs at the protein level. In principle, several experimental and theoretical observations suggest that protein stability in the solvent-protein space is responsible for correct protein function. Mutations found to be disease related during DNA analysis are therefore often assumed to perturb protein stability as well. So far, however, no extensive analysis at the proteome level has investigated whether this is the case. Computational methods have also been developed to infer whether a mutation is disease related and, independently, whether it affects protein stability. Whether the perturbation of protein stability is related to what is routinely referred to as a disease therefore remains an open question.
In this work we explore, for the first time, the relation between mutations at the protein level and their relevance to disease through a large-scale computational study of data from different databases. To this aim, in the first part of the thesis we derive two probabilistic indices for each mutation type (for 141 out of 150 possible SNPs): the perturbing index (Pp), which indicates the probability that a given mutation affects protein stability considering all the in vitro thermodynamic data available, and the disease index (Pd), which indicates the probability that a mutation is disease related, given all the mutations that have been clinically associated so far. We find, with robust statistics, that the two indices correlate, with the exception of mutations related to somatic cancer. In this way each of the 150 mutations can be coded by two values that allow a direct comparison with database information. Furthermore, we implement a computational method that, starting from the protein structure, predicts the effect of a mutation on protein stability, and we find that it outperforms a set of other predictors performing the same task. The predictor is based on support vector machines and takes protein tertiary structures as input. We show that the predicted data correlate well with the data from the databases. All our efforts therefore add to the SNP annotation process and, more importantly, establish the relationship between protein stability perturbation and the human variome leading to the diseasome.
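The abstract does not say how the figure of 150 possible mutation types is obtained. One plausible reading, assumed here and not confirmed by the source, is that it counts the ordered amino-acid substitutions reachable by a single nucleotide change under the standard genetic code. The sketch below enumerates them; the names `CODON_TABLE` and `single_nucleotide_substitutions` are illustrative and do not come from the thesis.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order
# (third base varies fastest); "*" marks a stop codon.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def single_nucleotide_substitutions():
    """Return the set of ordered (wild-type, mutant) amino-acid pairs that
    can arise from changing exactly one base of a codon.
    Synonymous changes and changes to or from stop codons are excluded."""
    pairs = set()
    for codon, aa in CODON_TABLE.items():
        if aa == "*":
            continue
        for pos in range(3):
            for base in BASES:
                if base == codon[pos]:
                    continue
                mutant = codon[:pos] + base + codon[pos + 1:]
                new_aa = CODON_TABLE[mutant]
                if new_aa != "*" and new_aa != aa:
                    pairs.add((aa, new_aa))
    return pairs


if __name__ == "__main__":
    pairs = single_nucleotide_substitutions()
    print(len(pairs))  # 150 ordered pairs (75 unordered pairs)
```

Under this reading, each of the 150 pairs would receive its own (Pp, Pd) values, and the abstract reports that indices could be derived for 141 of them.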
Abstract:
Background: Rates of molecular evolution vary widely among species. While significant deviations from the molecular clock have been found in many taxa, the effects of life histories on molecular evolution are not fully understood. In plants, annual/perennial life history traits have long been suspected to influence evolutionary rates at the molecular level. To date, however, the number of genes investigated on this subject is limited and the conclusions are mixed. To evaluate the possible heterogeneity in evolutionary rates between annual and perennial plants at the genomic level, we investigated 85 nuclear housekeeping genes, 10 non-housekeeping families, and 34 chloroplast genes using genomic data from model plants: Arabidopsis thaliana and Medicago truncatula for annuals, and grape (Vitis vinifera) and poplar (Populus trichocarpa) for perennials.
Results: According to the cross-comparisons among the four species, 74-82% of the nuclear genes and 71-97% of the chloroplast genes suggested higher rates of molecular evolution in the two annuals than in the two perennials. The significant heterogeneity in evolutionary rate between annuals and perennials was found consistently at both nonsynonymous and synonymous sites. While a linear correlation of evolutionary rates in orthologous genes between species was observed at nonsynonymous sites, the correlation was weak or absent at synonymous sites. This tendency was clearer in nuclear genes than in chloroplast genes, in which the overall evolutionary rate was small. The slope of the regression line was consistently lower than unity, further confirming the higher evolutionary rate in annuals at the genomic level.
Conclusions: The higher evolutionary rate in annuals than in perennials appears to be a universal phenomenon in both the nuclear and chloroplast genomes of the four dicot model plants we investigated. Such heterogeneity in evolutionary rate should therefore result from factors that have genome-wide influence, most likely those associated with annual/perennial life history. Although we acknowledge the current limitations of this kind of study, mainly due to the small sample size available and the distant taxonomic relationship of the model organisms, our results indicate that a genome-wide survey is a promising approach toward further understanding of the mechanism determining the molecular evolutionary rate at the genomic level.
Abstract:
Simulation-based assessment is a popular and frequently necessary approach to evaluation of statistical procedures. Sometimes overlooked is the ability to take advantage of underlying mathematical relations and we focus on this aspect. We show how to take advantage of large-sample theory when conducting a simulation using the analysis of genomic data as a motivating example. The approach uses convergence results to provide an approximation to smaller-sample results, results that are available only by simulation. We consider evaluating and comparing a variety of ranking-based methods for identifying the most highly associated SNPs in a genome-wide association study, derive integral equation representations of the pre-posterior distribution of percentiles produced by three ranking methods, and provide examples comparing performance. These results are of interest in their own right and set the framework for a more extensive set of comparisons.
Abstract:
Gap junctions are clustered channels between contacting cells through which direct intercellular communication via diffusion of ions and metabolites can occur. Two hemichannels, each built up of six connexin protein subunits in the plasma membrane of adjacent cells, can dock to each other to form conduits between cells. We have recently screened mouse and human genomic databases and have found 19 connexin (Cx) genes in the mouse genome and 20 connexin genes in the human genome. One mouse connexin gene and two human connexin genes do not appear to have orthologs in the other genome. With three exceptions, the characterized connexin genes comprise two exons, with the complete reading frame located on the second exon. Targeted ablation of eleven mouse connexin genes revealed basic insights into the functional diversity of the connexin gene family. In addition, the phenotypes of human genetic disorders caused by mutated connexin genes further complement our understanding of connexin functions in the human organism. In this review we compare currently identified connexin genes in both the mouse and human genome and discuss the functions of gap junctions deduced from targeted mouse mutants and human genetic disorders.
Abstract:
The onset of lactation in dairy cows represents a major metabolic challenge that involves large adaptations in glucose, fatty acid, and mineral metabolism to support lactation and to avoid metabolic dysfunction. This complex system of adaptation can differ considerably between cows and may have a genetic basis. In the present review, the variation in adaptive reactions in dairy cows is discussed. The studies reviewed focused mainly on the liver, a key metabolic regulator, to understand the variation in adaptive performance of the dairy cow. Liver function was evaluated through gene expression measurements to explain the associated phenotypic variability and to identify descriptors of metabolic robustness in dairy cows. The genes identified thus act as a connecting link between the genotype encoded in the DNA and the phenotypic expression of the target factors at the protein level. The integration of phenotypic data, including gene expression profiles, with genomic data will facilitate a better characterization of the complex interplay between these levels and will improve the genetic understanding necessary to unravel a trait or multi-trait complex such as metabolic robustness in dairy cows.
Abstract:
Salmonella enterica subspecies I serovars are common bacterial pathogens causing diseases ranging from enterocolitis to systemic infections. Some serovars are adapted to specific hosts, whereas others have a broad host range. The molecular mechanisms defining the virulence characteristics and the host range of a given S. enterica serovar are unknown. Streptomycin-pretreated mice provide a surrogate host model for studying molecular aspects of the intestinal inflammation (colitis) caused by serovar Typhimurium (S. Hapfelmeier and W. D. Hardt, Trends Microbiol. 13:497-503, 2005). Here, we studied whether this animal model is also useful for studying other S. enterica subspecies I serovars. All three tested strains of the broad-host-range serovar Enteritidis (125109, 5496/98, and 832/99) caused pronounced colitis and systemic infection in streptomycin-pretreated mice. Different levels of virulence were observed among three tested strains of the host-adapted serovar Dublin (SARB13, SD2229, and SD3246). Several strains of host-restricted serovars were also studied. Two serovar Pullorum strains (X3543 and 449/87) caused intermediate levels of colitis. No intestinal inflammation was observed upon infection with three different serovar Paratyphi A strains (SARB42, 2804/96, and 5314/98) and one serovar Gallinarum strain (X3796). A second serovar Gallinarum strain (287/91) was highly virulent and caused severe colitis. This strain awaits future analysis. In conclusion, the streptomycin-pretreated mouse model can provide an additional tool to study virulence factors (i.e., those involved in enteropathogenesis) of various S. enterica subspecies I serovars. Five of these strains (125109, 2229, 287/91, 449/87, and SARB42) are the subject of Salmonella genome sequencing projects. The streptomycin-pretreated mouse model may be useful for testing hypotheses derived from these genomic data.
Abstract:
Phylogenetic reconstruction of the evolutionary history of closely related organisms may be difficult because of the presence of unsorted lineages and of a relatively high proportion of heterozygous sites that are usually not handled well by phylogenetic programs. Genomic data may provide enough fixed polymorphisms to resolve phylogenetic trees, but the diploid nature of sequence data remains analytically challenging. Here, we performed a phylogenomic reconstruction of the evolutionary history of the common vole (Microtus arvalis) with a focus on the influence of heterozygosity on the estimation of intraspecific divergence times. We used genome-wide sequence information from 15 voles distributed across the European range. We provide a novel approach to integrate heterozygous information in existing phylogenetic programs by repeated random haplotype sampling from sequences with multiple unphased heterozygous sites. We evaluated the impact of the use of full, partial, or no heterozygous information for tree reconstructions on divergence time estimates. All results consistently showed four deep and strongly supported evolutionary lineages in the vole data. Based on calibration with radiocarbon-dated paleontological material, these lineages, which are undergoing divergence processes, split only at the end of, or after, the last glacial maximum. However, the incorporation of information from heterozygous sites had a significant impact on absolute and relative branch length estimations. Ignoring heterozygous information led to an overestimation of divergence times between the evolutionary lineages of M. arvalis. We conclude that the exclusion of heterozygous sites from evolutionary analyses may cause biased and misleading divergence time estimates in closely related taxa.
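The repeated random haplotype sampling mentioned in this abstract can be illustrated with a minimal sketch. It assumes, without confirmation from the source, that unphased heterozygous sites are encoded with two-allele IUPAC ambiguity codes and that each site is resolved independently and uniformly at random. The function names and the toy sequences are hypothetical, and the authors' actual procedure may differ.

```python
import random

# Two-allele IUPAC ambiguity codes (assumed encoding of heterozygous sites).
IUPAC_HET = {"R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC"}


def sample_haplotype(genotype_seq, rng):
    """Resolve every heterozygous site to one of its two alleles at random;
    all other characters are copied unchanged."""
    return "".join(
        rng.choice(IUPAC_HET[base]) if base in IUPAC_HET else base
        for base in genotype_seq
    )


def sample_replicates(genotypes, n_reps, seed=0):
    """Build n_reps haploid alignments, each holding one randomly sampled
    haplotype per individual. Every replicate could then be passed to an
    existing phylogenetic program."""
    rng = random.Random(seed)
    return [
        {name: sample_haplotype(seq, rng) for name, seq in genotypes.items()}
        for _ in range(n_reps)
    ]


if __name__ == "__main__":
    # Toy data for illustration only, not from the study.
    genotypes = {"vole_1": "ACRTYGGA", "vole_2": "ACATCGKA"}
    for replicate in sample_replicates(genotypes, n_reps=3):
        print(replicate)
```

Summarising trees or divergence times over many such replicates is one way the variation introduced by unphased sites could be carried into the final estimates.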
Abstract:
The interaction between sibling species that share a zone of contact is a multifaceted relationship affected by climate change [1, 2]. Between sibling species, interactions may occur at whole-organism (direct or indirect competition) or genomic (hybridization and introgression) levels [3–5]. Tracking hybrid zone movements can provide insights about influences of environmental change on species interactions [1]. Here, we explore the extent and mechanism of movement of the contact zone between black-capped chickadees (Poecile atricapillus) and Carolina chickadees (Poecile carolinensis) at whole-organism and genomic levels. We find strong evidence that winter temperatures limit the northern extent of P. carolinensis by demonstrating a current-day association between the range limit of this species and minimum winter temperatures. We further show that this temperature limitation has been consistent over time because we are able to accurately hindcast the previous northern range limit under earlier climate conditions. Using genomic data, we confirm northward movement of this contact zone over the past decade and highlight temporally consistent differential, but limited, geographic introgression of alleles. Our results provide an informative example of the influence of climate change on a contact zone between sibling species.
Abstract:
Based on bacterial genomic data, we developed a one-step multiplex PCR assay to identify Salmonella and simultaneously differentiate the two invasive avian-adapted S. enterica serovar Gallinarum biotypes Gallinarum and Pullorum, and the most frequent, specific, and asymptomatic colonizers of chickens, serovars Enteritidis, Heidelberg, and Kentucky.
Abstract:
Alveolar echinococcosis, caused by the tapeworm Echinococcus multilocularis, is one of the most severe parasitic diseases in humans and represents one of the 17 neglected diseases prioritised by the World Health Organisation (WHO) in 2012. Considering the major medical and veterinary importance of this parasite, the phylogeny of the genus Echinococcus is of considerable importance; yet, despite numerous efforts with both mitochondrial and nuclear data, it has remained unresolved. The genus is clearly complex, and this is one of the reasons for the incomplete understanding of its taxonomy. Although taxonomic studies have recognised E. multilocularis as a separate entity from the Echinococcus granulosus complex and other members of the genus, it would be premature to draw firm conclusions about the taxonomy of the genus before the phylogeny of the whole genus is fully resolved. The recent sequencing of E. multilocularis and E. granulosus genomes opens new possibilities for performing in-depth phylogenetic analyses. In addition, whole genome data provide the possibility of inferring phylogenies based on a large number of functional genes, i.e. genes that trace the evolutionary history of adaptation in E. multilocularis and other members of the genus. Moreover, genomic data open new avenues for studying the molecular epidemiology of E. multilocularis: genotyping studies with larger panels of genetic markers allow the genetic diversity and spatial dynamics of parasites to be evaluated with greater precision. There is an urgent need for international coordination of genotyping of E. multilocularis isolates from animals and human patients. This could be fundamental for a better understanding of the transmission of alveolar echinococcosis and for designing efficient healthcare strategies.
Abstract:
Scientists typically sequence DNA from large numbers of people in order to determine which genes are associated with particular diseases. This makes it possible to improve modern healthcare and to provide a better understanding of the human genome. The price of a complete genome profile has fallen below $200, and this service is offered by a number of companies, most of them located in the USA. As a consequence, within a few years most individuals in developed countries will have the means to have their genomes sequenced. Around 0.5% of each person's DNA (which corresponds to several million nucleotides) differs from the reference genome, owing to genetic variation. The genome thus contains highly personal and sensitive information, and it represents our ultimate biological identity. Combining genomic data with information about one's environment or lifestyle (often easily obtainable from social networks) could make it possible to infer the individual's phenotype. Multiple genome-wide association studies (GWAS) performed in recent years have shown that a patient's susceptibility to particular diseases, such as Alzheimer's, cancer, or schizophrenia, can be partially predicted from sets of their SNPs (single nucleotide polymorphisms). These results can be used for personalized genomic medicine (facilitating preventive treatment and diagnosis), genetic paternity tests, ancestry and genealogical testing, and genetic compatibility tests to learn which diseases descendants may be susceptible to. These are some of the benefits we can obtain from genomic information, but if this information is not protected it can be used for criminal investigations and for insurance purposes. Such uses could lead to genetic discrimination. We can therefore conclude that genomic privacy is fundamental, because the genome contains information about our ethnic heritage and our predisposition to numerous physical and mental health conditions, as well as other phenotypic traits, and about our ancestors, siblings, and progeny, since the genomes of any two closely related individuals are 99.9% identical, in contrast with 99.5% for two random people. Current legislation does not offer sufficient technical information about safe and secure ways of storing and processing digitized genomes; more restrictive legislation is therefore needed.
Abstract:
Over the last few years, there has been a huge growth in biomedical data sources. The emergence of new techniques for generating genomic data, and of databases that contain this information, has created the need to store it so that it can be accessed and worked with. The information used in biomedical research is stored in databases, because databases allow data to be stored and managed quickly and simply. Databases come in a variety of formats, such as Excel, CSV or RDF. Currently, biomedical research is based on data analysis, which leads to the discovery of correlations that allow inferring, for example, new treatments or more effective therapies for a specific disease or ailment. The volume of data handled is very large and dissimilar, which makes it necessary to develop methods for automatically integrating and homogenizing heterogeneous data. The European project p-medicine (FP7-ICT-2009-270089) aims to assist medical researchers, in this case in cancer research, by providing them with new tools for managing data and creating new knowledge from the analysis of the managed data. The ingestion of data into the p-medicine platform and its subsequent processing with the provided tools aim to enable the generation of new models to support clinical decision making. Within this project there are different tools for the integration of heterogeneous data, the design and management of clinical trials, the simulation and visualization of tumors, and statistical data analysis. Particularly in the field of heterogeneous data integration, there is a need to add external information from public databases and to relate it to the existing information through semantic integration methods. To meet this need a tool has been created, called Term Searcher, which carries out this process semiautomatically. This work describes the development of the tool and the algorithms created for its operation. The tool provides functionality that did not previously exist within the p-medicine project for adding new data from public sources and semantically integrating them with private data.
Abstract:
The evolution of venom, one of the most complex mixtures in nature, has underpinned the successful diversification of numerous animal lineages. Slithering snakes and floating jellyfish alike use venom, a cocktail of pharmacologically active peptides, salts and organic molecules. These striking animals have aroused great fascination throughout human history. In this dissertation we propose a study of venom evolution in the phylum Cnidaria, encompassing proteomic and genomic data. The objectives of this project were: (1) to characterise and elucidate the evolution of venom composition in Cnidaria by comparing protein lists; (2) to test the hypothesis that variation in the cnidarian-specific toxin family has been the result of a regime of positive selection; and (3) to determine the extent to which gene duplication can be considered the main reason for toxin diversification in Cnidaria. The chapter "Comparative proteomics reveals common components of a powerful arsenal in the earliest animal venomous lineage, the cnidarians" presents the most complete comparative study of cnidarian venom composition and a hypothesis on the evolutionary assembly of the complex biochemical arsenal of cnidarians and of the ancestral venom of this basal group. Twenty-eight protein families were identified. Of these, 13 families were recorded for the first time in the proteome of Cnidaria. At least 15 toxin families were recruited into the cnidarian venom proteome before the diversification of the groups Anthozoa and Medusozoa. In the chapters "Evidence of episodic positive selection in the evolution of jellyfish toxins of the cnidarian venom" and "Gene duplications are extensive and contribute significantly to the toxic proteome of nematocysts isolated from Acropora digitifera (Cnidaria: Anthozoa: Scleractinia)", our analyses show that toxin families in cnidarians diversify extensively through gene duplication.
Moreover, in contrast with venom toxin families in most animal lineages, we identified a different pattern in the cnidarian-specific toxin family, in which there is purifying selection over long periods following long periods of diversification, or vice versa.