979 results for Bioinformatics Analysis
Abstract:
Next-generation DNA sequencing platforms can effectively detect the entire spectrum of genomic variation and are emerging as a major tool for systematic exploration of the universe of variants and interactions in the entire genome. However, the data produced by next-generation sequencing technologies suffer from three basic problems: sequence errors, assembly errors, and missing data. Current statistical methods for genetic analysis are well suited for detecting the association of common variants, but are less suitable for rare variants. This poses a great challenge for sequence-based genetic studies of complex diseases. This dissertation used the genome continuum model as a general principle, and stochastic calculus and functional data analysis as tools, to develop novel and powerful statistical methods for the next generation of association studies of both qualitative and quantitative traits in the context of sequencing data, shifting the paradigm of association analysis from the current locus-by-locus analysis to the collective analysis of genome regions. In this project, functional principal component (FPC) methods coupled with high-dimensional data reduction techniques were used to develop novel and powerful methods for testing the association of the entire spectrum of genetic variation within a segment of the genome or a gene, regardless of whether the variants are common or rare. Classical quantitative genetics methods suffer from high type I error rates and low power for rare variants. To overcome these limitations for resequencing data, this project used functional linear models with scalar response to develop statistics for identifying quantitative trait loci (QTLs) for both common and rare variants. To illustrate their application, the functional linear models were applied to five quantitative traits in the Framingham Heart Study.
This project proposed a novel concept of gene-gene co-association, in which a gene or a genomic region is taken as the unit of association analysis, and used stochastic calculus to develop a unified framework for testing the association of multiple genes or genomic regions for both common and rare alleles. The proposed methods were applied to gene-gene co-association analysis of psoriasis in two independent GWAS datasets, which led to the discovery of networks significantly associated with psoriasis.
Abstract:
Systemic sclerosis (SSc), or scleroderma, is a complex disease whose etiopathogenesis remains unelucidated. Fibrosis in multiple organs is a key feature of SSc, and studies have shown that the transforming growth factor-β (TGF-β) pathway has a crucial role in fibrotic responses. For a complex disease such as SSc, expression quantitative trait loci (eQTL) analysis is a powerful tool for identifying genetic variations that affect the expression of genes involved in the disease. In this study, a multilevel model is described to perform a multivariate eQTL analysis for identifying genetic variation (SNPs) specifically associated with the expression of three members of the TGF-β pathway: CTGF, SPARC and COL3A1. The uniqueness of this model is that all three genes were included in one model, rather than one gene being examined at a time. A protein might contribute to multiple pathways, and this approach allows the identification of important genetic variations linked to multiple genes belonging to the same pathway. In this study, 29 SNPs were identified, 16 of which were located in known genes. Exploring the roles of these genes in TGF-β regulation will help elucidate the etiology of SSc, which will in turn help to better manage this complex disease.
Abstract:
Next-generation sequencing (NGS) technology has become a prominent tool in biological and biomedical research. However, NGS data analysis, such as de novo assembly, mapping and variant detection, is far from mature, and the high sequencing error rate is one of the major problems. To minimize the impact of sequencing errors, we developed a highly robust and efficient method, MTM, to correct the errors in NGS reads. We demonstrated the effectiveness of MTM on both single-cell data with highly non-uniform coverage and normal data with uniformly high coverage, indicating that MTM’s performance does not rely on the coverage of the sequencing reads. MTM was also compared with Hammer and Quake, the best methods for correcting non-uniform and uniform data respectively. For non-uniform data, MTM outperformed both Hammer and Quake. For uniform data, MTM showed better performance than Quake and comparable results to Hammer. Better error correction with MTM improved the quality of downstream analyses such as mapping and SNP detection. SNP calling is a major application of NGS technologies. However, the existence of sequencing errors complicates this process, especially for the low coverage (
Abstract:
The genomic era brought about by recent advances in next-generation sequencing technology makes genome-wide scans of natural selection a reality. Currently, almost all statistical tests and analytical methods for identifying genes under selection are performed on an individual-gene basis. Although these methods have the power to identify genes subject to strong selection, they have limited power to discover genes targeted by moderate or weak selection forces, which are crucial for understanding the molecular mechanisms of complex phenotypes and diseases. The recent availability and growing completeness of many gene network and protein-protein interaction databases open avenues for enhancing the power to discover genes under natural selection. The aim of the thesis is to explore and develop normal mixture model based methods that leverage gene network information to enhance the power of natural selection target gene discovery. The results show that the developed statistical method, which combines the posterior log odds of the standard normal mixture model and the Guilt-By-Association score of the gene network in a naïve Bayes framework, has the power to discover genes under moderate or weak selection that bridge the genes under strong selection, and that it aids our understanding of the biology underlying complex diseases and related natural selection phenotypes.
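The combination step named in this abstract can be sketched as follows. This is only an illustration under stated assumptions, not the thesis's implementation: it assumes a two-component mixture whose null component is the standard normal, it assumes the Guilt-By-Association score is already expressed as a log likelihood ratio, and every function and parameter name is hypothetical.

```python
import math


def log_normal_pdf(x, mean, sd):
    """Log density of a normal distribution, computed in log space."""
    z = (x - mean) / sd
    return -0.5 * math.log(2.0 * math.pi) - math.log(sd) - 0.5 * z * z


def mixture_posterior_log_odds(z, pi_sel, mean_sel, sd_sel):
    """log[P(selected | z) / P(neutral | z)] for a two-component mixture.

    Assumption: the neutral component is N(0, 1) and the selected
    component is N(mean_sel, sd_sel) with mixing weight pi_sel.
    """
    log_num = math.log(pi_sel) + log_normal_pdf(z, mean_sel, sd_sel)
    log_den = math.log(1.0 - pi_sel) + log_normal_pdf(z, 0.0, 1.0)
    return log_num - log_den


def combined_log_odds(z, gba_log_lr, pi_sel, mean_sel, sd_sel):
    """Naive Bayes combination: independent evidence adds on the log-odds scale.

    Assumption: gba_log_lr is the network score as a log likelihood ratio.
    """
    return mixture_posterior_log_odds(z, pi_sel, mean_sel, sd_sel) + gba_log_lr


if __name__ == "__main__":
    # Hypothetical gene: moderate test statistic, supportive network evidence.
    print(combined_log_odds(z=1.5, gba_log_lr=1.2, pi_sel=0.1, mean_sel=2.0, sd_sel=1.0))
```

Under the naive Bayes independence assumption, a gene with a modest statistic can still reach positive combined log odds when its network neighbours contribute enough evidence, which is the behaviour the abstract describes for moderate or weak selection.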
Abstract:
DNA extraction was carried out as described on the MICROBIS project pages (http://icomm.mbl.edu/microbis) using a commercially available extraction kit. We amplified the hypervariable regions V4-V6 of archaeal and bacterial 16S rRNA genes using PCR and several sets of forward and reverse primers (http://vamps.mbl.edu/resources/primers.php). Massively parallel tag sequencing of the PCR products was carried out on a 454 Life Sciences GS FLX sequencer at the Marine Biological Laboratory, Woods Hole, MA, following the same experimental conditions for all samples. Sequence reads were submitted to a rigorous quality control procedure based on mothur v30 (doi:10.1128/AEM.01541-09), including denoising of the flowgrams using an algorithm based on PyroNoise (doi:10.1038/nmeth.1361), removal of PCR errors, and a chimera check using uchime (doi:10.1093/bioinformatics/btr381). The reads were taxonomically assigned according to the SILVA taxonomy (SSURef v119, 07-2014; doi:10.1093/nar/gks1219) implemented in mothur and clustered at 98% ribosomal RNA gene V4-V6 sequence identity. V4-V6 amplicon sequence abundance tables were standardized to account for unequal sampling effort by randomly subsampling 1000 (Archaea) and 2300 (Bacteria) sequences without replacement in mothur, and were then used to calculate inverse Simpson diversity indices and Chao1 richness (doi:10.2307/4615964). Bray-Curtis dissimilarities (doi:10.2307/1942268) between all samples were calculated and used for 2-dimensional non-metric multidimensional scaling (NMDS) ordinations with 20 random starts (doi:10.1007/BF02289694). Stress values below 0.2 indicated that the multidimensional dataset was well represented by the 2D ordination. NMDS ordinations were compared and tested using Procrustes correlation analysis (doi:10.1007/BF02291478).
All analyses were carried out with the R statistical environment and the packages vegan (available at: http://cran.r-project.org/package=vegan) and labdsv (available at: http://cran.r-project.org/package=labdsv), as well as with custom R scripts. Operational taxonomic units at 98% sequence identity (OTU0.03) that occurred only once in the whole dataset were termed absolute single sequence OTUs (SSOabs; doi:10.1038/ismej.2011.132). OTU0.03 sequences that occurred only once in at least one sample, but may occur more often in other samples, were termed relative single sequence OTUs (SSOrel). SSOrel are particularly interesting for community ecology, since they comprise rare organisms that might become abundant when conditions change. 16S rRNA amplicons and metagenomic reads have been stored in the Sequence Read Archive under SRA project accession number SRP042162.
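The quantities named in this description can be illustrated with a short sketch. The study itself used mothur and R (vegan, labdsv); the Python below is a stand-in written for illustration, with hypothetical function names and toy counts. Two choices are assumptions: the bias-corrected form of Chao1, and the exclusion of SSOabs from the SSOrel set.

```python
import random


def rarefy(counts, depth, seed=0):
    """Subsample `depth` reads without replacement from a vector of OTU counts."""
    pool = [i for i, c in enumerate(counts) for _ in range(c)]
    picked = random.Random(seed).sample(pool, depth)
    out = [0] * len(counts)
    for i in picked:
        out[i] += 1
    return out


def inverse_simpson(counts):
    """Inverse Simpson index: 1 / sum of squared relative abundances."""
    n = sum(counts)
    return 1.0 / sum((c / n) ** 2 for c in counts if c > 0)


def chao1(counts):
    """Bias-corrected Chao1 richness (an assumed variant; defined when f2 == 0)."""
    s_obs = sum(1 for c in counts if c > 0)
    f1 = sum(1 for c in counts if c == 1)  # singletons
    f2 = sum(1 for c in counts if c == 2)  # doubletons
    return s_obs + f1 * (f1 - 1) / (2.0 * (f2 + 1))


def bray_curtis(a, b):
    """Bray-Curtis dissimilarity between two count vectors of equal length."""
    return sum(abs(x - y) for x, y in zip(a, b)) / (sum(a) + sum(b))


def single_sequence_otus(table):
    """Split OTUs into (SSOabs, SSOrel) from {sample: {otu: count}}.

    Assumption: SSOrel here excludes OTUs that already qualify as SSOabs.
    """
    totals = {}
    for otus in table.values():
        for otu, c in otus.items():
            totals[otu] = totals.get(otu, 0) + c
    sso_abs = {otu for otu, total in totals.items() if total == 1}
    once_somewhere = {otu for otus in table.values() for otu, c in otus.items() if c == 1}
    return sso_abs, once_somewhere - sso_abs


if __name__ == "__main__":
    sample_a = rarefy([40, 30, 20, 9, 1], depth=50)
    sample_b = rarefy([5, 5, 60, 25, 5], depth=50)
    print(inverse_simpson(sample_a), chao1(sample_a), bray_curtis(sample_a, sample_b))
```

Subsampling both samples to the same depth before computing the indices mirrors the standardization step described above, since richness and dissimilarity estimates are otherwise sensitive to unequal sequencing effort.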
Abstract:
Background: DCE@urLAB is a software application for the analysis of dynamic contrast-enhanced magnetic resonance imaging (DCE-MRI) data. The tool incorporates a friendly graphical user interface (GUI) to interactively select and analyze a region of interest (ROI) within the image set, taking into account the tissue concentration of the contrast agent (CA) and its effect on pixel intensity. Results: Pixel-wise model-based quantitative parameters are estimated by fitting DCE-MRI data to several pharmacokinetic models using the Levenberg-Marquardt algorithm (LMA). DCE@urLAB also includes the semi-quantitative parametric and heuristic analysis approaches commonly used in practice. This software application has been programmed in the Interactive Data Language (IDL) and tested both with publicly available simulated data and preclinical studies from tumor-bearing mouse brains. Conclusions: A user-friendly solution for applying pharmacokinetic and non-quantitative analysis of DCE-MRI in preclinical studies has been implemented and tested. The proposed tool has been specially designed for easy selection of multi-pixel ROIs. A public release of DCE@urLAB, together with the open source code and sample datasets, is available at http://www.die.upm.es/im/archives/DCEurLAB/.
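As a rough illustration of the model-fitting step, the sketch below fits a two-parameter Tofts-type model to a synthetic concentration curve with a hand-written Levenberg-Marquardt loop. It is not DCE@urLAB code (the tool is written in IDL). The mono-exponential plasma input, its constants `a` and `m`, the function names, and the synthetic data are all assumptions made for the example.

```python
import math


def tofts_mono(t, ktrans, kep, a=1.0, m=0.2):
    """Tissue concentration for an assumed plasma input a*exp(-m*t).

    Closed form of ktrans * integral_0^t a*exp(-m*u) * exp(-kep*(t-u)) du.
    """
    return ktrans * a * (math.exp(-m * t) - math.exp(-kep * t)) / (kep - m)


def residuals(params, times, conc):
    return [c - tofts_mono(t, params[0], params[1]) for t, c in zip(times, conc)]


def jacobian(params, times):
    """Forward-difference derivatives of the model w.r.t. (ktrans, kep)."""
    rows = []
    for t in times:
        base = tofts_mono(t, params[0], params[1])
        row = []
        for k in range(2):
            step = 1e-6 * max(1.0, abs(params[k]))
            shifted = list(params)
            shifted[k] += step
            row.append((tofts_mono(t, shifted[0], shifted[1]) - base) / step)
        rows.append(row)
    return rows


def fit_lm(times, conc, start, max_iter=100, damping=1e-2):
    """Minimal Levenberg-Marquardt loop for the two-parameter model."""
    params = list(start)
    res = residuals(params, times, conc)
    cost = sum(r * r for r in res)
    for _ in range(max_iter):
        jac = jacobian(params, times)
        a11 = sum(j[0] * j[0] for j in jac)
        a12 = sum(j[0] * j[1] for j in jac)
        a22 = sum(j[1] * j[1] for j in jac)
        g1 = sum(j[0] * r for j, r in zip(jac, res))
        g2 = sum(j[1] * r for j, r in zip(jac, res))
        # Damped normal equations: (J^T J + damping * diag(J^T J)) d = J^T r
        b11 = a11 * (1.0 + damping)
        b22 = a22 * (1.0 + damping)
        det = b11 * b22 - a12 * a12
        d1 = (g1 * b22 - a12 * g2) / det
        d2 = (b11 * g2 - a12 * g1) / det
        trial = [params[0] + d1, params[1] + d2]
        trial_res = residuals(trial, times, conc)
        trial_cost = sum(r * r for r in trial_res)
        if trial_cost < cost:
            # Accept the step and trust the Gauss-Newton direction more.
            params, res, cost = trial, trial_res, trial_cost
            damping /= 10.0
        else:
            # Reject the step and move towards gradient descent.
            damping *= 10.0
        if abs(d1) + abs(d2) < 1e-12:
            break
    return params


if __name__ == "__main__":
    times = [0.25 * i for i in range(21)]  # minutes, synthetic
    conc = [tofts_mono(t, 0.25, 1.0) for t in times]
    print(fit_lm(times, conc, start=(0.1, 0.5)))
```

In a pixel-wise analysis this fit would be repeated for each pixel's concentration curve inside the selected ROI. A production tool would normally rely on a tested optimizer and a measured arterial input function rather than this hand-rolled loop.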
Abstract:
Semantic interoperability is essential to facilitate efficient collaboration in heterogeneous multi-site healthcare environments. The deployment of a semantic interoperability solution has the potential to enable a wide range of informatics-supported applications in clinical care and research, both within a single healthcare organization and in a network of organizations. At the same time, building and deploying a semantic interoperability solution may require significant effort to carry out data transformation and to harmonize the semantics of the information in the different systems. Our approach to semantic interoperability leverages existing healthcare standards and ontologies, focusing first on specific clinical domains and key applications, and gradually expanding the solution when needed. An important objective of this work is to create a semantic link between clinical research and care environments to enable applications such as streamlining the execution of multi-centric clinical trials, including the identification of eligible patients for the trials. This paper presents an analysis of the suitability of several widely used medical ontologies in the clinical domain (SNOMED-CT, LOINC, and MedDRA) to capture the semantics of the clinical trial eligibility criteria, of the clinical trial data (e.g., Clinical Report Forms), and of the corresponding patient record data that would enable the automatic identification of eligible patients. In addition to the coverage provided by the ontologies, we evaluate and compare the sizes of the sets of relevant concepts and their relative frequency to estimate the cost of data transformation, of building the necessary semantic mappings, and of extending the solution to new domains. This analysis shows that our approach is both feasible and scalable.
Abstract:
A novel protein superfamily with over 600 members was discovered by iterative profile searches and analyzed with powerful bioinformatics and information visualization methods. Evidence exists that these proteins generate a radical species by reductive cleavage of S-adenosylmethionine (SAM) through an unusual Fe-S center. The superfamily (named here Radical SAM) provides evidence that radical-based catalysis is important in a number of previously well-studied but unresolved biochemical pathways and reflects an ancient conserved mechanistic approach to difficult chemistries. Radical SAM proteins catalyze diverse reactions, including unusual methylations, isomerization, sulfur insertion, ring formation, anaerobic oxidation and protein radical formation. They function in DNA precursor, vitamin, cofactor, antibiotic and herbicide biosynthesis and in biodegradation pathways. One eukaryotic member is interferon-inducible and is considered a candidate drug target for osteoporosis; another is observed to bind the neuronal Cdk5 activator protein. Five defining members not previously recognized as homologs are lysine 2,3-aminomutase, biotin synthase, lipoic acid synthase and the activating enzymes for pyruvate formate-lyase and anaerobic ribonucleotide reductase. Two functional predictions for unknown proteins are made based on integrating other data types such as motif, domain, operon and biochemical pathway into an organized view of similarity relationships.
Abstract:
Protein–protein interactions play crucial roles in the execution of various biological functions. Accordingly, their comprehensive description would contribute considerably to the functional interpretation of fully sequenced genomes, which are flooded with novel genes of unpredictable functions. We previously developed a system to examine two-hybrid interactions in all possible combinations between the ≈6,000 proteins of the budding yeast Saccharomyces cerevisiae. Here we have completed the comprehensive analysis using this system to identify 4,549 two-hybrid interactions among 3,278 proteins. Unexpectedly, these data do not largely overlap with those obtained by the other project [Uetz, P., et al. (2000) Nature (London) 403, 623–627] and hence have substantially expanded our knowledge on the protein interaction space or interactome of the yeast. Cumulative connection of these binary interactions generates a single huge network linking the vast majority of the proteins. Bioinformatics-aided selection of biologically relevant interactions highlights various intriguing subnetworks. They include, for instance, the one that had successfully foreseen the involvement of a novel protein in spindle pole body function as well as the one that may uncover a hitherto unidentified multiprotein complex potentially participating in the process of vesicular transport. Our data would thus significantly expand and improve the protein interaction map for the exploration of genome functions that eventually leads to thorough understanding of the cell as a molecular system.
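The observation that cumulatively connecting binary interactions yields one dominant network can be checked with a connected-components pass over the interaction pairs. The sketch below is an illustration only. The protein identifiers are hypothetical placeholders rather than data from the study, and the breadth-first search is one common way to do this, not necessarily the authors' procedure.

```python
from collections import defaultdict, deque


def connected_components(pairs):
    """Group proteins into connected components of the interaction graph.

    `pairs` is an iterable of (protein_a, protein_b) binary interactions.
    Returns a list of sets, largest component first.
    """
    adjacency = defaultdict(set)
    for a, b in pairs:
        adjacency[a].add(b)
        adjacency[b].add(a)

    seen = set()
    components = []
    for start in adjacency:
        if start in seen:
            continue
        component = {start}
        seen.add(start)
        queue = deque([start])
        while queue:  # breadth-first search from `start`
            node = queue.popleft()
            for neighbour in adjacency[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    component.add(neighbour)
                    queue.append(neighbour)
        components.append(component)
    return sorted(components, key=len, reverse=True)


if __name__ == "__main__":
    # Hypothetical interaction list; "P6" interacts only with itself.
    interactions = [("P1", "P2"), ("P2", "P3"), ("P4", "P5"), ("P6", "P6")]
    comps = connected_components(interactions)
    total = sum(len(c) for c in comps)
    print(f"largest component covers {len(comps[0])}/{total} proteins")
```

The fraction of proteins in the largest component is the number to compare against the abstract's claim that a single network links the vast majority of the proteins.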
Abstract:
Support for molecular biology researchers has been limited to traditional library resources and services in most academic health sciences libraries. The University of Washington Health Sciences Libraries have been providing specialized services to this user community since 1995. The library recruited a Ph.D. biologist to assess the molecular biological information needs of researchers and design strategies to enhance library resources and services. A survey of laboratory research groups identified areas of greatest need and led to the development of a three-pronged program: consultation, education, and resource development. Outcomes of this program include bioinformatics consultation services, library-based and graduate level courses, networking of sequence analysis tools, and a biological research Web site. Bioinformatics clients are drawn from diverse departments and include clinical researchers in need of tools that are not readily available outside of basic sciences laboratories. Evaluation and usage statistics indicate that researchers, regardless of departmental affiliation or position, require support to access molecular biology and genetics resources. Centralizing such services in the library is a natural synergy of interests and enhances the provision of traditional library resources. Successful implementation of a library-based bioinformatics program requires both subject-specific and library and information technology expertise.
Abstract:
Proteins secreted by and anchored on the surfaces of parasites are in intimate contact with host tissues. The transcriptome of infective cercariae of the blood fluke, Schistosoma mansoni, was screened using signal sequence trap to isolate cDNAs encoding predicted proteins with an N-terminal signal peptide. Twenty cDNA fragments were identified, most of which contained predicted signal peptides or transmembrane regions, including a novel putative seven-transmembrane receptor and a membrane-associated mitogen-activated protein kinase. The developmental expression pattern within different life-cycle stages ranged from ubiquitous to a transcript that was highly upregulated in the cercaria. A bioinformatics-based comparison of 100 signal peptides from each of schistosomes, humans, a parasitic nematode and Escherichia coli showed that differences in the sequence composition of signal peptides, notably the residues flanking the predicted cleavage site, might account for the negative bias exhibited in the processing of schistosome signal peptides in mammalian cells. (c) 2005 Federation of European Microbiological Societies. Published by Elsevier B.V. All rights reserved.
Abstract:
This study describes the identification of outer membrane proteins (OMPs) of the bacterial pathogen Pasteurella multocida and an analysis of how the expression of these proteins changes during infection of the natural host. We analysed the sarcosine-insoluble membrane fractions, which are highly enriched for OMPs, from bacteria grown under a range of conditions. Initially, the OMP-containing fractions were resolved by 2-DE and the proteins identified by MALDI-TOF MS. In addition, the OMP-containing fractions were separated by 1-D SDS-PAGE and protein identifications were made using nano LC MS/MS. Using these two methods a total of 35 proteins was identified from samples obtained from organisms grown in rich culture medium. Six of the proteins were identified only by 2-DE MALDI-TOF MS, whilst 17 proteins were identified only by 1-D LC MS/MS. We then analysed the OMPs from P. multocida which had been isolated from the bloodstream of infected chickens (a natural host) or grown in iron-depleted medium. Three proteins were found to be significantly up-regulated during growth in vivo and one of these (Pm0803) was also up-regulated during growth in iron-depleted medium. After bioinformatic analysis of the protein matches, it was predicted that over one third of the combined OMPs predicted by the bioinformatics sub-cellular localisation tools PSORTB and Proteome Analyst, had been identified during this study. This is the first comprehensive proteomic analysis of the P. multocida outer membrane and the first proteomic analysis of how a bacterial pathogen modifies its outer membrane proteome during infection.
Abstract:
At present, little is known about signal transduction mechanisms in schistosomes, which cause the disease schistosomiasis. The mitogen-activated protein kinase (MAPK) signaling pathways, which are evolutionarily conserved from yeast to Homo sapiens, play key roles in multiple cellular processes. Here, we reconstructed the hypothetical MAPK signaling pathways in Schistosoma japonicum and compared the schistosome pathways with those of model eukaryote species. We identified 60 homologous components in the S. japonicum MAPK signaling pathways. Among these, 27 were predicted to be full-length sequences. Phylogenetic analysis of these proteins confirmed the evolutionary conservation of the MAPK signaling pathways. Remarkably, we identified S. japonicum homologues of the GTP-binding protein beta and alpha-I subunits of the yeast mating pathway, which might also be involved in the regulation of different life stages and female sexual maturation processes in schistosomes. In addition, several pathway member genes, including ERK, JNK, Sja-DSP, MRAS and RAS, were determined through quantitative PCR analysis to be expressed in a stage-specific manner, with ERK, JNK and their inhibitor Sja-DSP markedly upregulated in adult female schistosomes. (c) 2006 Federation of European Biochemical Societies. Published by Elsevier B.V. All rights reserved.
Abstract:
An understanding of inheritance requires comprehension of genetic processes at all levels, from molecules to populations. Frequently genetics courses are separated into molecular and organismal genetics and students may fail to see the relationships between them. This is particularly true with human genetics, because of the difficulties in designing experimental approaches which are consistent with ethical restrictions, student abilities and background knowledge, and available time and materials. During 2005 we used analysis of single nucleotide polymorphisms (SNPs) in two genetic regions to enhance student learning and provide a practical experience in human genetics. Students scanned databases to discover SNPs in a gene of interest, used software to design PCR primers and a restriction enzyme based assay for the alleles, and carried out an analysis of the SNP on anonymous individual and family DNAs. The project occupied eight to ten hours per week for one semester, with some time spent in the laboratory and some spent in database searching, reading and writing the report. In completing their projects, students acquired a knowledge of Mendel’s first law (through looking at inheritance patterns), Mendel’s second law and the exceptions (the concepts of linkage and linkage disequilibrium), DNA structure (primer design and restriction enzyme analysis) and function (SNPs in coding and non-coding regions), population genetics and the statistical analysis of allele frequencies, genomics, bioinformatics and the ethical issues associated with the use of human samples. They also developed skills in presentation of results by publication and conference participation. Deficiencies in their understanding (for example of inheritance patterns, gene structure, statistical approaches and report writing) were detected and guidance given during the project. SNP analysis was found to be a powerful approach to enhance and integrate student understanding of genetic concepts.
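The statistical analysis of allele frequencies mentioned in this abstract can be illustrated with a small worked example. The abstract does not say which tests the students ran. The Hardy-Weinberg chi-square test below is a common classroom choice and is an assumption here, as are the function names and the genotype counts.

```python
def allele_frequencies(n_aa, n_ab, n_bb):
    """Frequencies (p, q) of alleles A and B from genotype counts AA, AB, BB."""
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)
    return p, 1.0 - p


def hwe_chi_square(n_aa, n_ab, n_bb):
    """Chi-square statistic (1 degree of freedom) against Hardy-Weinberg proportions."""
    n = n_aa + n_ab + n_bb
    p, q = allele_frequencies(n_aa, n_ab, n_bb)
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)


if __name__ == "__main__":
    # Hypothetical genotype counts from a restriction-enzyme SNP assay.
    p, q = allele_frequencies(30, 40, 30)
    print(p, q, hwe_chi_square(30, 40, 30))
```

With one degree of freedom, a statistic above roughly 3.84 corresponds to p < 0.05, so counts like these would prompt students to ask whether genotyping error or population structure explains the heterozygote deficit.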