997 resultados para Databases, Genetic
Resumo:
The DNA microarray technology has arguably caught the attention of the worldwide life science community and is now systematically supporting major discoveries in many fields of study. The majority of the initial technical challenges of conducting experiments are being resolved, only to be replaced with new informatics hurdles, including statistical analysis, data visualization, interpretation, and storage. Two systems of databases, one containing expression data and one containing annotation data are quickly becoming essential knowledge repositories of the research community. This present paper surveys several databases, which are considered "pillars" of research and important nodes in the network. This paper focuses on a generalized workflow scheme typical for microarray experiments using two examples related to cancer research. The workflow is used to reference appropriate databases and tools for each step in the process of array experimentation. Additionally, benefits and drawbacks of current array databases are addressed, and suggestions are made for their improvement.
Resumo:
CONTEXT: Several genetic risk scores to identify asymptomatic subjects at high risk of developing type 2 diabetes mellitus (T2DM) have been proposed, but it is unclear whether they add extra information to risk scores based on clinical and biological data. OBJECTIVE: The objective of the study was to assess the extra clinical value of genetic risk scores in predicting the occurrence of T2DM. DESIGN: This was a prospective study, with a mean follow-up time of 5 yr. SETTING AND SUBJECTS: The study included 2824 nondiabetic participants (1548 women, 52 ± 10 yr). MAIN OUTCOME MEASURE: Six genetic risk scores for T2DM were tested. Four were derived from the literature and two were created combining all (n = 24) or shared (n = 9) single-nucleotide polymorphisms of the previous scores. A previously validated clinic + biological risk score for T2DM was used as reference. RESULTS: Two hundred seven participants (7.3%) developed T2DM during follow-up. On bivariate analysis, no differences were found for all but one genetic score between nondiabetic and diabetic participants. After adjusting for the validated clinic + biological risk score, none of the genetic scores improved discrimination, as assessed by changes in the area under the receiver-operating characteristic curve (range -0.4 to -0.1%), sensitivity (-2.9 to -1.0%), specificity (0.0-0.1%), and positive (-6.6 to +0.7%) and negative (-0.2 to 0.0%) predictive values. Similarly, no improvement in T2DM risk prediction was found: net reclassification index ranging from -5.3 to -1.6% and nonsignificant (P ≥ 0.49) integrated discrimination improvement. CONCLUSIONS: In this study, adding genetic information to a previously validated clinic + biological score does not seem to improve the prediction of T2DM.
Resumo:
Expert curation and complete collection of mutations in genes that affect human health is essential for proper genetic healthcare and research. Expert curation is given by the curators of gene-specific mutation databases or locus-specific databases (LSDBs). While there are over 700 such databases, they vary in their content, completeness, time available for curation, and the expertise of the curator. Curation and LSDBs have been discussed, written about, and protocols have been provided for over 10 years, but there have been no formal recommendations for the ideal form of these entities. This work initiates a discussion on this topic to assist future efforts in human genetics. Further discussion is welcome.
New genetic loci implicated in fasting glucose homeostasis and their impact on type 2 diabetes risk.
Resumo:
Levels of circulating glucose are tightly regulated. To identify new loci influencing glycemic traits, we performed meta-analyses of 21 genome-wide association studies informative for fasting glucose, fasting insulin and indices of beta-cell function (HOMA-B) and insulin resistance (HOMA-IR) in up to 46,186 nondiabetic participants. Follow-up of 25 loci in up to 76,558 additional subjects identified 16 loci associated with fasting glucose and HOMA-B and two loci associated with fasting insulin and HOMA-IR. These include nine loci newly associated with fasting glucose (in or near ADCY5, MADD, ADRA2A, CRY2, FADS1, GLIS3, SLC2A2, PROX1 and C2CD4B) and one influencing fasting insulin and HOMA-IR (near IGF1). We also demonstrated association of ADCY5, PROX1, GCK, GCKR and DGKB-TMEM195 with type 2 diabetes. Within these loci, likely biological candidate genes influence signal transduction, cell proliferation, development, glucose-sensing and circadian regulation. Our results demonstrate that genetic studies of glycemic traits can identify type 2 diabetes risk loci, as well as loci containing gene variants that are associated with a modest elevation in glucose levels but are not associated with overt diabetes.
Resumo:
BACKGROUND: The criteria for choosing relevant cell lines among a vast panel of available intestinal-derived lines exhibiting a wide range of functional properties are still ill-defined. The objective of this study was, therefore, to establish objective criteria for choosing relevant cell lines to assess their appropriateness as tumor models as well as for drug absorption studies. RESULTS: We made use of publicly available expression signatures and cell based functional assays to delineate differences between various intestinal colon carcinoma cell lines and normal intestinal epithelium. We have compared a panel of intestinal cell lines with patient-derived normal and tumor epithelium and classified them according to traits relating to oncogenic pathway activity, epithelial-mesenchymal transition (EMT) and stemness, migratory properties, proliferative activity, transporter expression profiles and chemosensitivity. For example, SW480 represent an EMT-high, migratory phenotype and scored highest in terms of signatures associated to worse overall survival and higher risk of recurrence based on patient derived databases. On the other hand, differentiated HT29 and T84 cells showed gene expression patterns closest to tumor bulk derived cells. Regarding drug absorption, we confirmed that differentiated Caco-2 cells are the model of choice for active uptake studies in the small intestine. Regarding chemosensitivity we were unable to confirm a recently proposed association of chemo-resistance with EMT traits. However, a novel signature was identified through mining of NCI60 GI50 values that allowed to rank the panel of intestinal cell lines according to their drug responsiveness to commonly used chemotherapeutics. CONCLUSIONS: This study presents a straightforward strategy to exploit publicly available gene expression data to guide the choice of cell-based models. While this approach does not overcome the major limitations of such models, introducing a rank order of selected features may allow selecting model cell lines that are more adapted and pertinent to the addressed biological question.
Resumo:
We previously introduced two new protein databases (trEST and trGEN) of hypothetical protein sequences predicted from EST and HTG sequences, respectively. Here, we present the updates made on these two databases plus a new database (trome), which uses alignments of EST data to HTG or full genomes to generate virtual transcripts and coding sequences. This new database is of higher quality and since it contains the information in a much denser format it is of much smaller size. These new databases are in a Swiss-Prot-like format and are updated on a weekly basis (trEST and trGEN) or every 3 months (trome). They can be downloaded by anonymous ftp from ftp://ftp.isrec.isb-sib.ch/pub/databases.
Resumo:
The advent of molecular markers has created opportunities for a better understanding of quantitative inheritance and for developing novel strategies for genetic improvement of agricultural species, using information on quantitative trait loci (QTL). A QTL analysis relies on accurate genetic marker maps. At present, most statistical methods used for map construction ignore the fact that molecular data may be read with error. Often, however, there is ambiguity about some marker genotypes. A Bayesian MCMC approach for inferences about a genetic marker map when random miscoding of genotypes occurs is presented, and simulated and real data sets are analyzed. The results suggest that unless there is strong reason to believe that genotypes are ascertained without error, the proposed approach provides more reliable inference on the genetic map.
Resumo:
OBJECTIVE: To determine the prevalence of fibroblast growth factor receptor 1 (FGFR1) mutations and their predicted functional consequences in patients with idiopathic hypogonadotropic hypogonadism (IHH). DESIGN: Cross-sectional study. SETTING: Multicentric. PATIENT(S): Fifty unrelated patients with IHH (21 with Kallmann syndrome and 29 with normosmic IHH). INTERVENTION(S): None. MAIN OUTCOME MEASURE(S): Patients were screened for mutations in FGFR1. The functional consequences of mutations were predicted by in silico structural and conservation analysis. RESULT(S): Heterozygous FGFR1 mutations were identified in six (12%) kindreds. These consisted of frameshift mutations (p.Pro33-Alafs*17 and p.Tyr654*) and missense mutations in the signal peptide (p.Trp4Cys), in the D1 extracellular domain (p.Ser96Cys) and in the cytoplasmic tyrosine kinase domain (p.Met719Val). A missense mutation was identified in the alternatively spliced exon 8A (p.Ala353Thr) that exclusively affects the D3 extracellular domain of FGFR1 isoform IIIb. Structure-based and sequence-based prediction methods and the absence of these variants in 200 normal controls were all consistent with a critical role for the mutations in the activity of the receptor. Oligogenic inheritance (FGFR1/CHD7/PROKR2) was found in one patient. CONCLUSION(S): Two FGFR1 isoforms, IIIb and IIIc, result from alternative splicing of exons 8A and 8B, respectively. Loss-of-function of isoform IIIc is a cause of IHH, whereas isoform IIIb is thought to be redundant. Ours is the first report of normosmic IHH associated with a mutation in the alternatively spliced exon 8A and suggests that this disorder can be caused by defects in either of the two alternatively spliced FGFR1 isoforms.
Resumo:
Among the largest resources for biological sequence data is the large amount of expressed sequence tags (ESTs) available in public and proprietary databases. ESTs provide information on transcripts but for technical reasons they often contain sequencing errors. Therefore, when analyzing EST sequences computationally, such errors must be taken into account. Earlier attempts to model error prone coding regions have shown good performance in detecting and predicting these while correcting sequencing errors using codon usage frequencies. In the research presented here, we improve the detection of translation start and stop sites by integrating a more complex mRNA model with codon usage bias based error correction into one hidden Markov model (HMM), thus generalizing this error correction approach to more complex HMMs. We show that our method maintains the performance in detecting coding sequences.
Resumo:
NR2E3, also called photoreceptor-specific nuclear receptor (PNR), is a transcription factor of the nuclear hormone receptor superfamily whose expression is uniquely restricted to photoreceptors. There, its physiological activity is essential for proper rod and cone photoreceptor development and maintenance. Thirty-two different mutations in NR2E3 have been identified in either homozygous or compound heterozygous state in the recessively inherited enhanced S-cone sensitivity syndrome (ESCS), Goldmann-Favre syndrome (GFS), and clumped pigmentary retinal degeneration (CPRD). The clinical phenotype common to all these patients is night blindness, rudimental or absent rod function, and hyperfunction of the "blue" S-cones. A single p.G56R mutation is inherited in a dominant manner and causes retinitis pigmentosa (RP). We have established a new locus-specific database for NR2E3 (www.LOVD.nl/eye), containing all reported mutations, polymorphisms, and unclassified sequence variants, including novel ones. A high proportion of mutations are located in the evolutionarily-conserved DNA-binding domains (DBDs) and ligand-binding domains (LBDs) of NR2E3. Based on homology modeling of these NR2E3 domains, we propose a structural localization of mutated residues. The high variability of clinical phenotypes observed in patients affected by NR2E3-linked retinal degenerations may be caused by different disease mechanisms, including absence of DNA-binding, altered interactions with transcriptional coregulators, and differential activity of modifier genes.
Resumo:
Microarray transcript profiling and RNA interference are two new technologies crucial for large-scale gene function studies in multicellular eukaryotes. Both rely on sequence-specific hybridization between complementary nucleic acid strands, inciting us to create a collection of gene-specific sequence tags (GSTs) representing at least 21,500 Arabidopsis genes and which are compatible with both approaches. The GSTs were carefully selected to ensure that each of them shared no significant similarity with any other region in the Arabidopsis genome. They were synthesized by PCR amplification from genomic DNA. Spotted microarrays fabricated from the GSTs show good dynamic range, specificity, and sensitivity in transcript profiling experiments. The GSTs have also been transferred to bacterial plasmid vectors via recombinational cloning protocols. These cloned GSTs constitute the ideal starting point for a variety of functional approaches, including reverse genetics. We have subcloned GSTs on a large scale into vectors designed for gene silencing in plant cells. We show that in planta expression of GST hairpin RNA results in the expected phenotypes in silenced Arabidopsis lines. These versatile GST resources provide novel and powerful tools for functional genomics.
Resumo:
Type 2 diabetes mellitus (T2DM) is a major disease affecting nearly 280 million people worldwide. Whilst the pathophysiological mechanisms leading to disease are poorly understood, dysfunction of the insulin-producing pancreatic beta-cells is key event for disease development. Monitoring the gene expression profiles of pancreatic beta-cells under several genetic or chemical perturbations has shed light on genes and pathways involved in T2DM. The EuroDia database has been established to build a unique collection of gene expression measurements performed on beta-cells of three organisms, namely human, mouse and rat. The Gene Expression Data Analysis Interface (GEDAI) has been developed to support this database. The quality of each dataset is assessed by a series of quality control procedures to detect putative hybridization outliers. The system integrates a web interface to several standard analysis functions from R/Bioconductor to identify differentially expressed genes and pathways. It also allows the combination of multiple experiments performed on different array platforms of the same technology. The design of this system enables each user to rapidly design a custom analysis pipeline and thus produce their own list of genes and pathways. Raw and normalized data can be downloaded for each experiment. The flexible engine of this database (GEDAI) is currently used to handle gene expression data from several laboratory-run projects dealing with different organisms and platforms. Database URL: http://eurodia.vital-it.ch.
Resumo:
The first scientific meeting of the newly established European SYSGENET network took place at the Helmholtz Centre for Infection Research (HZI) in Braunschweig, April 7-9, 2010. About 50 researchers working in the field of systems genetics using mouse genetic reference populations (GRP) participated in the meeting and exchanged their results, phenotyping approaches, and data analysis tools for studying systems genetics. In addition, the future of GRP resources and phenotyping in Europe was discussed.
Resumo:
BACKGROUND: DNA sequence integrity, mRNA concentrations and protein-DNA interactions have been subject to genome-wide analyses based on microarrays with ever increasing efficiency and reliability over the past fifteen years. However, very recently novel technologies for Ultra High-Throughput DNA Sequencing (UHTS) have been harnessed to study these phenomena with unprecedented precision. As a consequence, the extensive bioinformatics environment available for array data management, analysis, interpretation and publication must be extended to include these novel sequencing data types. DESCRIPTION: MIMAS was originally conceived as a simple, convenient and local Microarray Information Management and Annotation System focused on GeneChips for expression profiling studies. MIMAS 3.0 enables users to manage data from high-density oligonucleotide SNP Chips, expression arrays (both 3'UTR and tiling) and promoter arrays, BeadArrays as well as UHTS data using MIAME-compliant standardized vocabulary. Importantly, researchers can export data in MAGE-TAB format and upload them to the EBI's ArrayExpress certified data repository using a one-step procedure. CONCLUSION: We have vastly extended the capability of the system such that it processes the data output of six types of GeneChips (Affymetrix), two different BeadArrays for mRNA and miRNA (Illumina) and the Genome Analyzer (a popular Ultra-High Throughput DNA Sequencer, Illumina), without compromising on its flexibility and user-friendliness. MIMAS, appropriately renamed into Multiomics Information Management and Annotation System, is currently used by scientists working in approximately 50 academic laboratories and genomics platforms in Switzerland and France. MIMAS 3.0 is freely available via http://multiomics.sourceforge.net/.
Resumo:
Plasmodium falciparum is the parasite responsible for the most acute form of malaria in humans. Recently, the serine repeat antigen (SERA) in P. falciparum has attracted attention as a potential vaccine and drug target, and it has been shown to be a member of a large gene family. To clarify the relationships among the numerous P. falciparum SERAs and to identify orthologs to SERA5 and SERA6 in Plasmodium species affecting rodents, gene trees were inferred from nucleotide and amino acid sequence data for 33 putative SERA homologs in seven different species. (A distance method for nucleotide sequences that is specifically designed to accommodate differing GC content yielded results that were largely compatible with the amino acid tree. Standard-distance and maximum-likelihood methods for nucleotide sequences, on the other hand, yielded gene trees that differed in important respects.) To infer the pattern of duplication, speciation, and gene loss events in the SERA gene family history, the resulting gene trees were then "reconciled" with two competing Plasmodium species tree topologies that have been identified by previous phylogenetic studies. Parsimony of reconciliation was used as a criterion for selecting a gene tree/species tree pair and provided (1) support for one of the two species trees and for the core topology of the amino acid-derived gene tree, (2) a basis for critiquing fine detail in a poorly resolved region of the gene tree, (3) a set of predicted "missing genes" in some species, (4) clarification of the relationship among the P. falciparum SERA, and (5) some information about SERA5 and SERA6 orthologs in the rodent malaria parasites. Parsimony of reconciliation and a second criterion--implied mutational pattern at two key active sites in the SERA proteins-were also seen to be useful supplements to standard "bootstrap" analysis for inferred topologies.