931 results for Bioinformatics
Abstract:
Motivation: Understanding the patterns of association between polymorphisms at different loci in a population (linkage disequilibrium, LD) is of fundamental importance in various genetic studies. Many coefficients have been proposed for measuring the degree of LD, but they provide only a static view of the current LD structure. Generative models (GMs) were proposed to go beyond these measures, giving not only a description of the actual LD structure but also a tool to help understand the process that generated such structure. GMs based on coalescent theory have been the most appealing because they link LD to evolutionary factors. Nevertheless, the inference and parameter estimation of such models is still computationally challenging. Results: We present a more practical method to build GMs that describe LD. The method is based on learning weighted Bayesian network structures from haplotype data, extracting equivalence structure classes and using them to model LD. The results obtained on public data from the HapMap database showed that the method is a promising tool for modeling LD. The associations represented by the learned models are correlated with the traditional LD measure D′. The method was able to represent LD blocks found by standard tools. The granularity of the association blocks and the readability of the models can be controlled in the method. The results suggest that the causality information gained by our method can be useful to assess the conservability of the genetic markers and to guide the selection of a subset of representative markers.
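For readers unfamiliar with the traditional measure mentioned in this abstract, the following is a minimal sketch of how the normalized coefficient D′ is commonly computed for two biallelic loci. It is not the Bayesian network method of the paper. The function name `d_prime`, the input layout (one pair of alleles per haplotype) and the choice of the first haplotype's alleles as reference are assumptions made for illustration.

```python
def d_prime(haplotypes):
    """Return |D'| for two biallelic loci.

    haplotypes: sequence of (allele_at_locus_1, allele_at_locus_2) pairs.
    The alleles of the first haplotype are used as the reference alleles
    (an arbitrary choice; |D'| does not depend on it for biallelic loci).
    """
    haps = list(haplotypes)
    if not haps:
        raise ValueError("at least one haplotype is required")
    n = len(haps)
    a_ref, b_ref = haps[0]

    # Allele and haplotype frequencies.
    p_a = sum(1 for a, _ in haps if a == a_ref) / n
    p_b = sum(1 for _, b in haps if b == b_ref) / n
    p_ab = sum(1 for a, b in haps if a == a_ref and b == b_ref) / n

    # Raw disequilibrium coefficient.
    d = p_ab - p_a * p_b

    # Largest magnitude D could take given the allele frequencies.
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))

    if d_max == 0:
        # At least one locus is monomorphic, so LD is undefined; report 0.
        return 0.0
    return abs(d) / d_max
```

For example, `d_prime([("A", "B")] * 5 + [("a", "b")] * 5)` describes two loci whose alleles always travel together, while a sample containing the four possible haplotypes in equal numbers shows no association.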
Abstract:
The taxonomy of the N2-fixing bacteria belonging to the genus Bradyrhizobium is still poorly refined, mainly due to conflicting results obtained by the analysis of the phenotypic and genotypic properties. This paper presents an application of a method aiming at the identification of possible new clusters within a Brazilian collection of 119 Bradyrhizobium strains showing phenotypic characteristics of B. japonicum and B. elkanii. The stability was studied as a function of the number of restriction enzymes used in the RFLP-PCR analysis of three ribosomal regions with three restriction enzymes per region. The method proposed here uses clustering algorithms with distances calculated by average-linkage clustering. The stability analysis is carried out by introducing perturbations through sub-sampling techniques. The method showed efficacy in the grouping of the species B. japonicum and B. elkanii. Furthermore, two new clusters were clearly defined, indicating possible new species, and sub-clusters within each detected cluster. (C) 2008 Elsevier B.V. All rights reserved.
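The combination of average-linkage clustering and sub-sampling described in this abstract can be sketched as follows. This is only an illustration under stated assumptions, not the authors' procedure: the Hamming distance on binary band profiles, the 80% sub-sample size, the pairwise co-clustering agreement score and all function names are hypothetical choices.

```python
import random
from itertools import combinations


def hamming(p, q):
    """Number of positions where two band-presence profiles differ."""
    return sum(x != y for x, y in zip(p, q))


def average_linkage(points, k, dist):
    """Naive agglomerative clustering; returns k clusters as lists of indices."""
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > k:
        best = None
        for a, b in combinations(range(len(clusters)), 2):
            # Average of all pairwise distances between the two clusters.
            total = sum(dist(points[i], points[j])
                        for i in clusters[a] for j in clusters[b])
            d = total / (len(clusters[a]) * len(clusters[b]))
            if best is None or d < best[0]:
                best = (d, a, b)
        _, a, b = best  # a < b, so deleting b leaves index a valid
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return clusters


def _labels(clusters, ids):
    """Map original item id -> cluster number."""
    return {ids[i]: c for c, members in enumerate(clusters) for i in members}


def stability(points, k, dist, n_rounds=20, fraction=0.8, seed=0):
    """Mean agreement between the full clustering and sub-sample clusterings.

    Agreement is the share of item pairs that are grouped the same way
    (together or apart) in both clusterings.
    """
    rng = random.Random(seed)
    n = len(points)
    full = _labels(average_linkage(points, k, dist), list(range(n)))
    scores = []
    for _ in range(n_rounds):
        ids = sorted(rng.sample(range(n), max(k, int(fraction * n))))
        sub = _labels(average_linkage([points[i] for i in ids], k, dist), ids)
        pairs = list(combinations(ids, 2))
        agree = sum((full[i] == full[j]) == (sub[i] == sub[j]) for i, j in pairs)
        scores.append(agree / len(pairs))
    return sum(scores) / len(scores)
```

A score close to 1 means the grouping survives the removal of random strains; lower scores flag clusters that depend on a few particular samples.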
Abstract:
An important topic in genomic sequence analysis is the identification of protein coding regions. In this context, several coding DNA model-independent methods based on the occurrence of specific patterns of nucleotides at coding regions have been proposed. Nonetheless, these methods have not been completely suitable due to their dependence on an empirically predefined window length required for a local analysis of a DNA region. We introduce a method based on a modified Gabor-wavelet transform (MGWT) for the identification of protein coding regions. This novel transform is tuned to analyze periodic signal components and presents the advantage of being independent of the window length. We compared the performance of the MGWT with other methods by using eukaryote data sets. The results show that MGWT outperforms all assessed model-independent methods with respect to identification accuracy. These results indicate that the source of at least part of the identification errors produced by the previous methods is the fixed working scale. The new method not only avoids this source of errors but also makes a tool available for detailed exploration of the nucleotide occurrence.
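To illustrate the window-length dependence that this abstract criticizes, here is a minimal sketch of a conventional fixed-window, model-independent measure: the spectral power at frequency 1/3 of the four base indicator sequences, a periodicity often associated with coding regions. It is not the MGWT proposed in the paper, and the function names, the step size and the choice of this particular measure are assumptions made for illustration.

```python
import cmath


def period3_power(window):
    """Power at frequency 1/3, summed over the A, C, G and T indicator sequences."""
    total = 0.0
    for base in "ACGT":
        # Discrete Fourier component of the 0/1 indicator sequence of this base.
        component = sum(cmath.exp(-2j * cmath.pi * n / 3)
                        for n, ch in enumerate(window) if ch == base)
        total += abs(component) ** 2
    return total


def sliding_period3(seq, window_len, step=3):
    """Period-3 power for each window; window_len must be chosen by the user."""
    seq = seq.upper()
    return [period3_power(seq[i:i + window_len])
            for i in range(0, len(seq) - window_len + 1, step)]
```

The result depends directly on `window_len`, which has to be fixed empirically in advance: short windows give noisy profiles and long windows blur region boundaries. That is the limitation a window-independent transform is meant to avoid.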
Abstract:
Despite its importance to agriculture, the genetic basis of heterosis is still not well understood. The main competing hypotheses include dominance, overdominance, and epistasis. NC design III is an experimental design that has been used for estimating the average degree of dominance of quantitative trait loci (QTL) and also for studying heterosis. In this study, we first develop a multiple-interval mapping (MIM) model for design III that provides a platform to estimate the number, genomic positions, augmented additive and dominance effects, and epistatic interactions of QTL. The model can be used for parents with any generation of selfing. We apply the method to two data sets, one for maize and one for rice. Our results show that heterosis in maize is mainly due to dominant gene action, although overdominance of individual QTL could not completely be ruled out due to the mapping resolution and limitations of NC design III. For rice, the estimated QTL dominant effects could not explain the observed heterosis. There is evidence that additive × additive epistatic effects of QTL could be the main cause for the heterosis in rice. The difference in the genetic basis of heterosis seems to be related to open or self pollination of the two species. The MIM model for NC design III is implemented in Windows QTL Cartographer, a freely distributed software package.
Abstract:
Background: The aim of this study was to identify novel candidate biomarker proteins differentially expressed in the plasma of patients with early stage acute myocardial infarction (AMI) using SELDI-TOF-MS as a high throughput screening technology. Methods: Ten individuals with recent acute ischemic-type chest pain (< 12 h duration) and ST-segment elevation AMI (1STEMI) and after a second AMI (2STEMI) were selected. Blood samples were drawn at six times after STEMI diagnosis. The first stage (T(0)) was in the Emergency Unit before receiving any medication, the second was just after primary angioplasty (T(2)), and the next four stages occurred at 12 h intervals after T(0). Individuals (n = 7) with similar risk factors for cardiovascular disease and a normal ergometric test were selected as a control group (CG). Plasma proteomic profiling analysis was performed using top-down (i.e. intact proteins) SELDI-TOF-MS, after processing in a Multiple Affinity Removal Spin Cartridge System (Agilent). Results: Compared with the CG, the 1STEMI group exhibited 510 differentially expressed protein peaks in the first 48 h after the AMI (p < 0.05). The 2STEMI group had ~85% fewer differentially expressed protein peaks than those without previous history of AMI (76, p < 0.05). Among the 16 differentially regulated protein peaks common to both STEMI cohorts (compared with the CG at T(0)), 6 peaks were persistently down-regulated at more than one time-stage, and were also inversely correlated with serum protein markers (cTnI, CK and CKMB) during the 48 h period after AMI. Conclusions: Proteomic analysis by SELDI-TOF-MS technology combined with bioinformatics tools demonstrated differential expression during a 48 h time course, which suggests a potential role of some of these proteins as biomarkers for the very early stages of AMI, as well as for monitoring early cardiac ischemic recovery. (C) 2011 Elsevier B.V. All rights reserved.
Abstract:
Genetic recombination can produce heterogeneous phylogenetic histories within a set of homologous genes. Delineating recombination events is important in the study of molecular evolution, as inference of such events provides a clearer picture of the phylogenetic relationships among different gene sequences or genomes. Nevertheless, detecting recombination events can be a daunting task, as the performance of different recombination-detecting approaches can vary, depending on evolutionary events that take place after recombination. We previously evaluated the effects of post-recombination events on the prediction accuracy of recombination-detecting approaches using simulated nucleotide sequence data. The main conclusion, supported by other studies, is that one should not depend on a single method when searching for recombination events. In this paper, we introduce a two-phase strategy, applying three statistical measures to detect the occurrence of recombination events, and a Bayesian phylogenetic approach to delineate breakpoints of such events in nucleotide sequences. We evaluate the performance of these approaches using simulated data, and demonstrate the applicability of this strategy to empirical data. The two-phase strategy proves to be time-efficient when applied to large datasets, and yields high-confidence results.
Abstract:
Over recent years databases have become an extremely important resource for biomedical research. Immunology research is increasingly dependent on access to extensive biological databases to extract existing information, plan experiments, and analyse experimental results. This review describes 15 immunological databases that have appeared over the last 30 years. In addition, important issues regarding database design and the potential for misuse of information contained within these databases are discussed. Access pointers are provided for the major immunological databases and also for a number of other immunological resources accessible over the World Wide Web (WWW). (C) 2000 Elsevier Science B.V. All rights reserved.
Abstract:
The explosive growth in biotechnology combined with major advances in information technology has the potential to radically transform immunology in the postgenomics era. Not only do we now have ready access to vast quantities of existing data, but new data with relevance to immunology are being accumulated at an exponential rate. Resources for computational immunology include biological databases and methods for data extraction, comparison, analysis and interpretation. Publicly accessible biological databases of relevance to immunologists number in the hundreds and are growing daily. The ability to efficiently extract and analyse information from these databases is vital for efficient immunology research. Most importantly, a new generation of computational immunology tools enables modelling of peptide transport by the transporter associated with antigen processing (TAP), modelling of antibody binding sites, identification of allergenic motifs and modelling of T-cell receptor serial triggering.
Abstract:
Allergy is a major cause of morbidity worldwide. The number of characterized allergens and the volume of related information are increasing rapidly, creating demands for advanced information storage, retrieval and analysis. Bioinformatics provides useful tools for analysing allergens, and these are complementary to traditional laboratory techniques for the study of allergens. Specific applications include structural analysis of allergens, identification of B- and T-cell epitopes, assessment of allergenicity and cross-reactivity, and genome analysis. In this paper, the most important bioinformatic tools and methods with relevance to the study of allergy are reviewed.
Abstract:
Allergies represent a significant medical and industrial problem. Molecular and clinical data on allergens are growing exponentially and in this article we have reviewed nine specialized allergen databases and identified data sources related to protein allergens contained in general purpose molecular databases. An analysis of allergens contained in public databases indicates a high level of redundancy of entries and a relatively low coverage of allergens by individual databases. From this analysis we identify current database needs for allergy research and, in particular, highlight the need for a centralized reference allergen database.
Abstract:
Background: A major goal in the post-genomic era is to identify and characterise disease susceptibility genes and to apply this knowledge to disease prevention and treatment. Rodents and humans have remarkably similar genomes and share closely related biochemical, physiological and pathological pathways. In this work we utilised the latest information on the mouse transcriptome as revealed by the RIKEN FANTOM2 project to identify novel human disease-related candidate genes. We define a new term, patholog, to mean a homolog of a human disease-related gene encoding a product (transcript, anti-sense or protein) potentially relevant to disease. Rather than just focus on Mendelian inheritance, we applied the analysis to all potential pathologs regardless of their inheritance pattern. Results: Bioinformatic analysis and human curation of 60,770 RIKEN full-length mouse cDNA clones produced 2,578 sequences that showed similarity (70-85% identity) to known human-disease genes. Using a newly developed biological information extraction and annotation tool (FACTS) in parallel with human expert analysis of 17,051 MEDLINE scientific abstracts we identified 182 novel potential pathologs. Of these, 36 were identified by computational tools only, 49 by human expert analysis only and 97 by both methods. These pathologs were related to neoplastic (53%), hereditary (24%), immunological (5%), cardio-vascular (4%), or other (14%) disorders. Conclusions: Large scale genome projects continue to produce a vast amount of data with potential application to the study of human disease. For this potential to be realised we need intelligent strategies for data categorisation and the ability to link sequence data with relevant literature. This paper demonstrates the power of combining human expert annotation with FACTS, a newly developed bioinformatics tool, to identify novel pathologs from within large-scale mouse transcript datasets.
Abstract:
It has been reported that microRNAs (miRNA) may have allele-specific targeting for the 3′ untranslated region (3′ UTR) of the HLA-G locus. In a previous study, we reported 11 3′ UTR haplotypes encompassing the 14-bp insertion/deletion polymorphism and seven SNPs (+3003 T/C, +3010 C/G, +3027 C/A, +3035 C/T, +3142 C/G, +3187 A/G, and +3196 C/G), of which only the +3142 C/G SNP has been reported to influence the binding of miRNAs. Using bioinformatics analyses, we identified putative miRNA-binding sites considering the haplotypes encompassing these eight polymorphic sites, and we ranked the lowest free energies that could potentially lead to mRNA degradation or translational repression. When a specific haplotype or a particular SNP was associated with a miRNA-binding site, we defined a free energy difference of 4 kcal/mol between alleles to classify them as energetically distant. The best results were obtained for the miR-513a-5p, miR-518c*, miR-1262 and miR-92a-1*, miR-92a-2*, miR-661, miR-1224-5p, and miR-433 miRNAs, all influencing one or more of the +3003, +3010, +3027, and +3035 SNPs. The miR-2110, miR-93, miR-508-5p, miR-331-5p, miR-616, miR-513b, and miR-589* miRNAs targeted the 14-bp fragment region, and miR-148a, miR-19a*, miR-152, miR-148b, and miR-218-2 also influenced the +3142 C/G polymorphism. These results suggest that these miRNAs might play a relevant role in the HLA-G expression pattern. (C) 2009 Published by Elsevier Inc. on behalf of American Society for Histocompatibility and Immunogenetics.
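The ranking and thresholding steps described in this abstract can be sketched as below. This is a hedged illustration only: the abstract does not say whether the 4 kcal/mol cut-off is inclusive, so the `>=` comparison is an assumption, and the function names, the input layout and the placeholder labels and energies in the example are hypothetical, not values from the study.

```python
THRESHOLD_KCAL_PER_MOL = 4.0  # cut-off stated in the abstract


def energetically_distant(dg_allele_1, dg_allele_2,
                          threshold=THRESHOLD_KCAL_PER_MOL):
    """True when two alleles differ in binding free energy by at least the threshold.

    Energies are in kcal/mol; treating the cut-off as inclusive is an assumption.
    """
    return abs(dg_allele_1 - dg_allele_2) >= threshold


def rank_by_free_energy(sites):
    """Sort (label, free_energy) pairs from most to least stable.

    More negative free energy means a more stable predicted duplex,
    so the list is sorted in ascending order of energy.
    """
    return sorted(sites, key=lambda item: item[1])


if __name__ == "__main__":
    # Placeholder labels and energies, for illustration only.
    example = [("miRNA-x", -21.3), ("miRNA-y", -27.8), ("miRNA-z", -24.0)]
    print(rank_by_free_energy(example))
    print(energetically_distant(-27.8, -22.9))
```

Keeping the threshold as a named parameter makes it easy to check how sensitive the classification is to the chosen cut-off.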
Abstract:
Linkage studies have identified the human leukocyte antigen (HLA)-DRB1 as a putative rheumatoid arthritis (RA) susceptibility locus (SL). Nevertheless, it was estimated that its contribution was partial, suggesting that other non-HLA genes may play a role in RA susceptibility. To test this hypothesis, we conducted microarray transcription profiling of peripheral blood mononuclear cells in 15 RA patients and analyzed the data, using bioinformatics programs (significance analysis of microarrays method and GeneNetwork), which allowed us to determine the differentially expressed genes and to reconstruct transcriptional networks. The patients were grouped according to disease features or treatment with tumor necrosis factor blocker. Transcriptional networks that were reconstructed allowed us to identify the interactions occurring between RA SL and other genes, for example, HLA-DRB1 interacting with FNDC3A (fibronectin type III domain containing 3A). Given that fibronectin fragments can stimulate mediators of matrix and cartilage destruction in RA, this interaction is of special interest and may contribute to a clearer understanding of the functional role of HLA-DRB1 in RA pathogenesis.
Abstract:
Schistosoma mansoni is responsible for the neglected tropical disease schistosomiasis that affects 210 million people in 76 countries. Here we present analysis of the 363 megabase nuclear genome of the blood fluke. It encodes at least 11,809 genes, with an unusual intron size distribution, and new families of micro-exon genes that undergo frequent alternative splicing. As the first sequenced flatworm, and a representative of the Lophotrochozoa, it offers insights into early events in the evolution of the animals, including the development of a body pattern with bilateral symmetry, and the development of tissues into organs. Our analysis has been informed by the need to find new drug targets. The deficits in lipid metabolism that make schistosomes dependent on the host are revealed, and the identification of membrane receptors, ion channels and more than 300 proteases provides new insights into the biology of the life cycle and new targets. Bioinformatics approaches have identified metabolic chokepoints, and a chemogenomic screen has pinpointed schistosome proteins for which existing drugs may be active. The information generated provides an invaluable resource for the research community to develop much needed new control tools for the treatment and eradication of this important and neglected disease.
Abstract:
Directed evolution techniques have been used to improve the thermal stability of the xylanase A from Bacillus subtilis (XylA). Two generations of random mutant libraries generated by error-prone PCR coupled with a single generation of DNA shuffling produced a series of mutant proteins with increasing thermostability. The most thermostable XylA variant from the third generation contained four mutations (Q7H, G13R, S22P, and S179C) and showed an increase in melting temperature of 20 °C. The thermodynamic properties of a representative subset of nine XylA variants showing a range of thermostabilities were measured by thermal denaturation as monitored by the change in the far ultraviolet circular dichroism signal. Analysis of the data from these thermostable variants demonstrated a correlation between the decrease in the heat capacity change (ΔC(p)) and an increase in the midpoint of the transition temperature (T(m)) on transition from the native to the unfolded state. This result could not be interpreted within the context of the changes in accessible surface area of the protein on transition from the native to unfolded states. Since all the mutations are located at the surface of the protein, these results suggest that an explanation of the decrease in ΔC(p) should include effects arising from the protein/solvent interface.