40 results for Bioinformatics
Abstract:
Motivation: Understanding the patterns of association between polymorphisms at different loci in a population (linkage disequilibrium, LD) is of fundamental importance in various genetic studies. Many coefficients have been proposed for measuring the degree of LD, but they provide only a static view of the current LD structure. Generative models (GMs) were proposed to go beyond these measures, giving not only a description of the actual LD structure but also a tool to help understand the process that generated such structure. GMs based on coalescent theory have been the most appealing because they link LD to evolutionary factors. Nevertheless, the inference and parameter estimation of such models is still computationally challenging. Results: We present a more practical method to build GMs that describe LD. The method is based on learning weighted Bayesian network structures from haplotype data, extracting equivalence structure classes and using them to model LD. The results obtained on public data from the HapMap database showed that the method is a promising tool for modeling LD. The associations represented by the learned models are correlated with the traditional LD measure D′. The method was able to represent LD blocks found by standard tools. The granularity of the association blocks and the readability of the models can be controlled in the method. The results suggest that the causality information gained by our method can be useful to assess the conservability of the genetic markers and to guide the selection of a subset of representative markers.
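For reference, the traditional D′ measure that the learned models are compared against can be computed from phased two-locus haplotypes. The following is a minimal sketch of the standard Lewontin normalization, not the Bayesian-network method of the abstract; the function name `d_prime` and the upper/lower-case allele encoding are illustrative assumptions.

```python
from collections import Counter


def d_prime(haplotypes):
    """Signed Lewontin's D' for two biallelic loci.

    `haplotypes` is a list of 2-character strings such as "AB", "Ab",
    "aB", "ab" (hypothetical encoding: upper/lower case mark the two
    alleles at each locus).
    """
    n = len(haplotypes)
    counts = Counter(haplotypes)
    # Allele frequencies at locus 1 (A) and locus 2 (B).
    p_a = sum(c for h, c in counts.items() if h[0] == "A") / n
    p_b = sum(c for h, c in counts.items() if h[1] == "B") / n
    # Raw disequilibrium: observed AB frequency minus expectation.
    d = counts["AB"] / n - p_a * p_b
    if d == 0:
        return 0.0
    # Normalize by the largest |D| possible for these allele frequencies.
    if d > 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    return d / d_max
```

Many tools report the absolute value |D′|; the sign is kept here only to show the direction of the association.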
Abstract:
The taxonomy of the N₂-fixing bacteria belonging to the genus Bradyrhizobium is still poorly refined, mainly due to conflicting results obtained by the analysis of the phenotypic and genotypic properties. This paper presents an application of a method aiming at the identification of possible new clusters within a Brazilian collection of 119 Bradyrhizobium strains showing phenotypic characteristics of B. japonicum and B. elkanii. The stability was studied as a function of the number of restriction enzymes used in the RFLP-PCR analysis of three ribosomal regions with three restriction enzymes per region. The method proposed here uses clustering algorithms with distances calculated by average-linkage clustering. The stability analysis is performed by introducing perturbations using sub-sampling techniques. The method showed efficacy in the grouping of the species B. japonicum and B. elkanii. Furthermore, two new clusters were clearly defined, indicating possible new species, as well as sub-clusters within each detected cluster. (C) 2008 Elsevier B.V. All rights reserved.
Abstract:
An important topic in genomic sequence analysis is the identification of protein coding regions. In this context, several coding DNA model-independent methods based on the occurrence of specific patterns of nucleotides at coding regions have been proposed. Nonetheless, these methods have not been completely suitable due to their dependence on an empirically predefined window length required for a local analysis of a DNA region. We introduce a method based on a modified Gabor-wavelet transform (MGWT) for the identification of protein coding regions. This novel transform is tuned to analyze periodic signal components and presents the advantage of being independent of the window length. We compared the performance of the MGWT with other methods by using eukaryote data sets. The results show that MGWT outperforms all assessed model-independent methods with respect to identification accuracy. These results indicate that the source of at least part of the identification errors produced by the previous methods is the fixed working scale. The new method not only avoids this source of errors but also makes a tool available for detailed exploration of the nucleotide occurrence.
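The periodic signal component that model-independent methods look for in coding regions is the period-3 pattern of nucleotide occurrence. The sketch below measures it with a Gaussian-windowed Fourier component at frequency 1/3, using a single fixed scale `sigma`. It is therefore an illustration of the fixed-window style of analysis that the abstract criticizes, not the authors' MGWT; the function name, the default `sigma` and the 3-sigma truncation are assumptions made for this example.

```python
import cmath
import math


def period3_profile(seq, sigma=30.0):
    """Gaussian-windowed period-3 power at each position of a DNA sequence.

    For each base (A, C, G, T) a binary indicator sequence is projected
    onto exp(-2*pi*i*k/3) under a Gaussian window centred at the position;
    the squared magnitudes of the four projections are summed.
    """
    seq = seq.upper()
    n = len(seq)
    half = int(3 * sigma)  # truncate the window at 3 standard deviations
    profile = []
    for center in range(n):
        total = 0.0
        for base in "ACGT":
            acc = 0j
            for k in range(max(0, center - half), min(n, center + half + 1)):
                if seq[k] == base:
                    weight = math.exp(-0.5 * ((k - center) / sigma) ** 2)
                    acc += weight * cmath.exp(-2j * math.pi * k / 3)
            total += abs(acc) ** 2
        profile.append(total)
    return profile
```

A strictly 3-periodic sequence such as `"ACG" * 30` gives a much larger value in the middle of the profile than a homopolymer of the same length. The result depends on the choice of `sigma`, which is the scale dependence that the abstract identifies as a source of identification errors.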
Abstract:
Despite its importance to agriculture, the genetic basis of heterosis is still not well understood. The main competing hypotheses include dominance, overdominance, and epistasis. NC design III is an experimental design that has been used for estimating the average degree of dominance of quantitative trait loci (QTL) and also for studying heterosis. In this study, we first develop a multiple-interval mapping (MIM) model for design III that provides a platform to estimate the number, genomic positions, augmented additive and dominance effects, and epistatic interactions of QTL. The model can be used for parents with any generation of selfing. We apply the method to two data sets, one for maize and one for rice. Our results show that heterosis in maize is mainly due to dominant gene action, although overdominance of individual QTL could not completely be ruled out due to the mapping resolution and limitations of NC design III. For rice, the estimated QTL dominant effects could not explain the observed heterosis. There is evidence that additive × additive epistatic effects of QTL could be the main cause for the heterosis in rice. The difference in the genetic basis of heterosis seems to be related to open or self pollination of the two species. The MIM model for NC design III is implemented in Windows QTL Cartographer, a freely distributed software package.
Abstract:
Background: The aim of this study was to identify novel candidate biomarker proteins differentially expressed in the plasma of patients with early stage acute myocardial infarction (AMI) using SELDI-TOF-MS as a high throughput screening technology. Methods: Ten individuals with recent acute ischemic-type chest pain (< 12 h duration) and ST-segment elevation AMI (1STEMI) and after a second AMI (2STEMI) were selected. Blood samples were drawn at six time points after STEMI diagnosis. The first (T(0)) was in the Emergency Unit before receiving any medication, the second was just after primary angioplasty (T(2)), and the next four occurred at 12 h intervals after T(0). Individuals (n = 7) with similar risk factors for cardiovascular disease and a normal ergometric test were selected as a control group (CG). Plasma proteomic profiling analysis was performed using top-down (i.e. intact proteins) SELDI-TOF-MS, after processing in a Multiple Affinity Removal Spin Cartridge System (Agilent). Results: Compared with the CG, the 1STEMI group exhibited 510 differentially expressed protein peaks in the first 48 h after the AMI (p < 0.05). The 2STEMI group had ~85% fewer differentially expressed protein peaks than those without previous history of AMI (76, p < 0.05). Among the 16 differentially regulated protein peaks common to both STEMI cohorts (compared with the CG at T(0)), 6 peaks were persistently down-regulated at more than one time stage and were also inversely correlated with serum protein markers (cTnI, CK and CKMB) during the 48 h period after AMI. Conclusions: Proteomic analysis by SELDI-TOF-MS technology combined with bioinformatics tools demonstrated differential expression during a 48 h time course, which suggests a potential role of some of these proteins as biomarkers for the very early stages of AMI, as well as for monitoring early cardiac ischemic recovery. (C) 2011 Elsevier B.V. All rights reserved.
Abstract:
It has been reported that microRNAs (miRNA) may have allele-specific targeting for the 3′ untranslated region (3′ UTR) of the HLA-G locus. In a previous study, we reported 11 3′ UTR haplotypes encompassing the 14-bp insertion/deletion polymorphism and seven SNPs (+3003 T/C, +3010 C/G, +3027 C/A, +3035 C/T, +3142 C/G, +3187 A/G, and +3196 C/G), of which only the +3142 C/G SNP has been reported to influence the binding of miRNAs. Using bioinformatics analyses, we identified putative miRNA-binding sites considering the haplotypes encompassing these eight polymorphic sites, and we ranked the lowest free energies that could potentially lead to mRNA degradation or translational repression. When a specific haplotype or a particular SNP was associated with a miRNA-binding site, we defined a free energy difference of 4 kcal/mol between alleles to classify them as energetically distant. The best results were obtained for the miR-513a-5p, miR-518c*, miR-1262, miR-92a-1*, miR-92a-2*, miR-661, miR-1224-5p, and miR-433 miRNAs, all influencing one or more of the +3003, +3010, +3027, and +3035 SNPs. The miR-2110, miR-93, miR-508-5p, miR-331-5p, miR-616, miR-513b, and miR-589* miRNAs targeted the 14-bp fragment region, and miR-148a, miR-19a*, miR-152, miR-148b, and miR-218-2 also influenced the +3142 C/G polymorphism. These results suggest that these miRNAs might play a relevant role in the HLA-G expression pattern. (C) 2009 Published by Elsevier Inc. on behalf of American Society for Histocompatibility and Immunogenetics.
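The 4 kcal/mol criterion described above can be written as a one-line check. This is only a sketch: the function name is hypothetical, the free energies would come from whatever miRNA target prediction tool is used, and treating the threshold as inclusive (`>=`) is an assumption, since the abstract does not say whether a difference of exactly 4 kcal/mol qualifies.

```python
def energetically_distant(dg_allele_1, dg_allele_2, threshold=4.0):
    """True if two alleles differ in miRNA binding free energy (kcal/mol)
    by at least `threshold`. Inclusive comparison is an assumption."""
    return abs(dg_allele_1 - dg_allele_2) >= threshold
```

For example, hypothetical binding energies of -25.3 and -20.1 kcal/mol for the two alleles of a SNP differ by 5.2 kcal/mol and would be classified as energetically distant.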
Abstract:
Linkage studies have identified the human leukocyte antigen (HLA)-DRB1 as a putative rheumatoid arthritis (RA) susceptibility locus (SL). Nevertheless, it was estimated that its contribution was partial, suggesting that other non-HLA genes may play a role in RA susceptibility. To test this hypothesis, we conducted microarray transcription profiling of peripheral blood mononuclear cells in 15 RA patients and analyzed the data, using bioinformatics programs (significance analysis of microarrays method and GeneNetwork), which allowed us to determine the differentially expressed genes and to reconstruct transcriptional networks. The patients were grouped according to disease features or treatment with tumor necrosis factor blocker. Transcriptional networks that were reconstructed allowed us to identify the interactions occurring between RA SL and other genes, for example, HLA-DRB1 interacting with FNDC3A (fibronectin type III domain containing 3A). Given that fibronectin fragments can stimulate mediators of matrix and cartilage destruction in RA, this interaction is of special interest and may contribute to a clearer understanding of the functional role of HLA-DRB1 in RA pathogenesis.
Abstract:
Schistosoma mansoni is responsible for the neglected tropical disease schistosomiasis that affects 210 million people in 76 countries. Here we present analysis of the 363 megabase nuclear genome of the blood fluke. It encodes at least 11,809 genes, with an unusual intron size distribution, and new families of micro-exon genes that undergo frequent alternative splicing. As the first sequenced flatworm, and a representative of the Lophotrochozoa, it offers insights into early events in the evolution of the animals, including the development of a body pattern with bilateral symmetry, and the development of tissues into organs. Our analysis has been informed by the need to find new drug targets. The deficits in lipid metabolism that make schistosomes dependent on the host are revealed, and the identification of membrane receptors, ion channels and more than 300 proteases provides new insights into the biology of the life cycle and new targets. Bioinformatics approaches have identified metabolic chokepoints, and a chemogenomic screen has pinpointed schistosome proteins for which existing drugs may be active. The information generated provides an invaluable resource for the research community to develop much needed new control tools for the treatment and eradication of this important and neglected disease.
Abstract:
Directed evolution techniques have been used to improve the thermal stability of the xylanase A from Bacillus subtilis (XylA). Two generations of random mutant libraries generated by error-prone PCR coupled with a single generation of DNA shuffling produced a series of mutant proteins with increasing thermostability. The most thermostable XylA variant from the third generation contained four mutations (Q7H, G13R, S22P, and S179C) and showed an increase in melting temperature of 20 °C. The thermodynamic properties of a representative subset of nine XylA variants showing a range of thermostabilities were measured by thermal denaturation as monitored by the change in the far ultraviolet circular dichroism signal. Analysis of the data from these thermostable variants demonstrated a correlation between the decrease in the heat capacity change (ΔCp) and an increase in the midpoint of the transition temperature (Tm) on transition from the native to the unfolded state. This result could not be interpreted within the context of the changes in accessible surface area of the protein on transition from the native to unfolded states. Since all the mutations are located at the surface of the protein, these results suggest that an explanation of the decrease in ΔCp should include effects arising from the protein/solvent interface.
Abstract:
Familial idiopathic basal ganglia calcification, also known as "Fahr's disease" (FD), is a neuropsychiatric disorder with an autosomal dominant pattern of inheritance, characterized by symmetric calcifications in the basal ganglia and, occasionally, in other brain regions. Currently, there are three loci linked to this devastating disease. The first one (IBGC1) is located in 14q11.2-21.3 and the other two have been identified in 2q37 (IBGC2) and 8p21.1-q11.13 (IBGC3). Further studies identified a heterozygous variation (rs36060072), a cytosine-to-guanine change in the MGEA6/CTAGE5 gene, present in all affected members of the large American family linked to IBGC1. This missense substitution, which changes a proline to an alanine at position 521 (P521A) in a proline-rich and highly conserved protein domain, was considered a rare variation, with a minor allele frequency (MAF) of 0.0058 in the US population. Considering that the population frequency of a given variation is an indirect indicator of potential pathogenicity, we screened 200 chromosomes in a random control set of Brazilian samples and in two nuclear families, comparing with our previous analysis in a US population. In addition, we used bioinformatics programs to predict the pathogenicity of this variation. Our genetic screen found no P521A carriers. Pooling these data with those of the previous study in the USA, we now have a MAF of 0.0036, showing that this mutation is very rare. On the other hand, the bioinformatics analysis provided conflicting findings. There are currently various candidate genes and loci that could be involved in the underlying molecular basis of FD etiology, and other groups have suggested a possible role for genes in 2q37, related to calcium metabolism, and on chromosome 8 (NRG1 and SNTG1). Additional mutagenesis and in vivo studies are necessary to confirm the pathogenicity of the P521A variation in MGEA6.
Abstract:
Alzheimer's Disease (AD) is the most common type of dementia among the elderly, with devastating consequences for the patient, their relatives, and caregivers. More than 300 genetic polymorphisms have been implicated in AD, demonstrating that this condition is polygenic and has a complex pattern of inheritance. This paper aims to report and compare the results of AD genetics studies in case-control and familial analysis performed in Brazil since our first publication, 10 years ago. They include the following genes/markers: Apolipoprotein E (APOE), 5-hydroxytryptamine transporter length polymorphic region (5-HTTLPR), brain-derived neurotrophic factor (BDNF), monoamine oxidase A (MAO-A), and two simple-sequence tandem repeat polymorphisms (DXS1047 and D10S1423). Previously unpublished data on the interleukin-1 alpha (IL-1 alpha) and interleukin-1 beta (IL-1 beta) genes are reported here briefly. Results from other Brazilian studies with AD patients are also reported in this short review. Four local families studied with various markers on chromosomes 21, 19, 14, and 1 are briefly reported for the first time. The importance of studying DNA samples from Brazil is highlighted because of the uniqueness of its population, which presents both intense ethnic miscegenation, mainly on the east coast, and clusters with high inbreeding rates in rural areas of the countryside. We discuss the current stage of extending these studies using high-throughput methods of large-scale genotyping, such as single nucleotide polymorphism microarrays, associated with bioinformatics tools that allow the analysis of such an extensive number of genetic variables, with different levels of penetrance. There is still a long way between the huge amount of data gathered so far and the actual application toward the full understanding of AD, but the final goal is to develop precise tools for diagnosis and prognosis, creating new strategies for better treatments based on genetic profile.
Abstract:
Recurrent submicroscopic genomic copy number changes are the result of nonallelic homologous recombination (NAHR). Nonrecurrent aberrations, however, can result from different nonexclusive recombination-repair mechanisms. We previously described small microduplications at Xq28 containing MECP2 in four male patients with a severe neurological phenotype. Here, we report on the fine-mapping and breakpoint analysis of 16 unique microduplications. The size of the overlapping copy number changes varies between 0.3 and 2.3 Mb, and FISH analysis on three patients demonstrated a tandem orientation. Although eight of the 32 breakpoint regions coincide with low-copy repeats, none of the duplications are the result of NAHR. Bioinformatics analysis of the breakpoint regions demonstrated a 2.5-fold higher frequency of Alu interspersed repeats as compared with control regions, as well as a very high GC content (53%). Unexpectedly, we obtained the junction in only one patient by long-range PCR, which revealed nonhomologous end joining as the mechanism. Breakpoint analysis in two other patients by inverse PCR and subsequent array comparative genomic hybridization analysis demonstrated the presence of a second duplicated region more telomeric at Xq28, of which one copy was inserted in between the duplicated MECP2 regions. These data suggest a two-step mechanism in which part of Xq28 is first inserted near the MECP2 locus, followed by breakage-induced replication with strand invasion of the normal sister chromatid. Our results indicate that the mechanism by which copy number changes occur in regions with a complex genomic architecture can yield complex rearrangements.
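The two sequence statistics quoted for the breakpoint regions, GC content and fold enrichment of a repeat class relative to control regions, can be computed along the following lines. This is a hedged sketch: the abstract does not state how repeat frequency was normalized, so the per-base density used in `fold_enrichment` is an assumption, and both function names are hypothetical.

```python
def gc_content(seq):
    """Fraction of G and C among unambiguous bases (A, C, G, T) in `seq`.

    Ambiguous symbols such as N are ignored.
    """
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    total = gc + seq.count("A") + seq.count("T")
    if total == 0:
        raise ValueError("sequence contains no A, C, G or T bases")
    return gc / total


def fold_enrichment(hits_test, length_test, hits_control, length_control):
    """Ratio of per-base element densities (assumed normalization)
    between test regions and control regions."""
    return (hits_test / length_test) / (hits_control / length_control)
```

Under this normalization, a region with 5 repeat elements per kilobase compared with a control at 2 per kilobase yields a 2.5-fold enrichment; these counts are illustrative only and are not taken from the study.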
Abstract:
Motivation: DNA assembly programs classically perform an all-against-all comparison of reads to identify overlaps, followed by a multiple sequence alignment and generation of a consensus sequence. If the aim is to assemble a particular segment, instead of a whole genome or transcriptome, a target-specific assembly is a more sensible approach. GenSeed is a Perl program that implements a seed-driven recursive assembly consisting of cycles comprising a similarity search, read selection and assembly. The iterative process results in a progressive extension of the original seed sequence. GenSeed was tested and validated on many applications, including the reconstruction of nuclear genes or segments, full-length transcripts, and extrachromosomal genomes. The robustness of the method was confirmed through the use of a variety of DNA and protein seeds, including short sequences derived from SAGE and proteome projects.
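The cycle of similarity search, read selection and assembly that progressively extends a seed can be illustrated with a toy example. The sketch below is written in Python rather than Perl and is not GenSeed itself: it replaces the similarity search and the assembly step with a simple exact suffix-prefix overlap test, extends only to the right, and uses hypothetical names and parameters (`extend_seed`, `min_overlap`, `max_cycles`).

```python
def extend_seed(seed, reads, min_overlap=4, max_cycles=100):
    """Toy seed-driven extension: grow `seed` rightwards using `reads`.

    Each cycle (1) searches for reads whose prefix exactly matches a
    suffix of the current contig, (2) selects the read giving the longest
    extension, and (3) appends the non-overlapping part to the contig.
    The loop stops when no read extends the contig any further.
    """
    contig = seed
    for _ in range(max_cycles):
        best = ""
        for read in reads:
            # Try the longest possible overlap first; require that the
            # read sticks out past the contig end by at least one base.
            longest = min(len(contig), len(read) - 1)
            for k in range(longest, min_overlap - 1, -1):
                if contig.endswith(read[:k]):
                    extension = read[k:]
                    if len(extension) > len(best):
                        best = extension
                    break
        if not best:
            break  # no read overlaps the contig end: extension finished
        contig += best
    return contig
```

With overlapping reads drawn from a short target sequence and a seed taken from its first few bases, repeated cycles rebuild the target from the seed. The `max_cycles` limit guards against endless extension through repeats, which exact matching alone cannot resolve.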