43 resultados para Bioinformatics

em Biblioteca Digital da Produção Intelectual da Universidade de São Paulo


Relevância:

10.00% 10.00%

Publicador:

Resumo:

Recent studies have identified the genetic underpinnings of a growing number of diseases through targeted exome sequencing. However, this strategy ignores the large component of the genome that does not code for proteins, but is nonetheless biologically functional. To address the possible involvement of regulatory variation in congenital heart diseases (CHDs), we searched for regulatory mutations impacting the activity of TBX5, a dosage-dependent transcription factor with well-defined roles in the heart and limb development that has been associated with the HoltOram syndrome (hearthand syndrome), a condition that affects 1/100 000 newborns. Using a combination of genomics, bioinformatics and mouse genetic engineering, we scanned approximate to 700 kb of the TBX5 locus in search of cis-regulatory elements. We uncovered three enhancers that collectively recapitulate the endogenous expression pattern of TBX5 in the developing heart. We re-sequenced these enhancer elements in a cohort of non-syndromic patients with isolated atrial and/or ventricular septal defects, the predominant cardiac defects of the HoltOram syndrome, and identified a patient with a homozygous mutation in an enhancer approximate to 90 kb downstream of TBX5. Notably, we demonstrate that this single-base-pair mutation abrogates the ability of the enhancer to drive expression within the heart in vivo using both mouse and zebrafish transgenic models. Given the population-wide frequency of this variant, we estimate that 1/100 000 individuals would be homozygous for this variant, highlighting that a significant number of CHD associated with TBX5 dysfunction might arise from non-coding mutations in TBX5 heart enhancers, effectively decoupling the heart and hand phenotypes of the HoltOram syndrome.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

In [1], the authors proposed a framework for automated clustering and visualization of biological data sets named AUTO-HDS. This letter is intended to complement that framework by showing that it is possible to get rid of a user-defined parameter in a way that the clustering stage can be implemented more accurately while having reduced computational complexity

Relevância:

10.00% 10.00%

Publicador:

Resumo:

In silico analyses of Leishmania spp. genome data are a powerful resource to improve the understanding of these pathogens' biology. Trypanosomatids such as Leishmania spp. have their protein-coding genes grouped in long polycistronic units of functionally unrelated genes. The control of gene expression happens by a variety of posttranscriptional mechanisms. The high degree of synteny among Leishmania species is accompanied by highly conserved coding sequences (CDS) and poorly conserved intercoding untranslated sequences. To identify the elements involved in the control of gene expression, we conducted an in silico investigation to find conserved intercoding sequences (CICS) in the genomes of L major, L infantum, and L braziliensis. We used a combination of computational tools, such as Linux-Shell, PERL and R languages, BLAST, MSPcrunch, SSAKE, and Pred-A-Term algorithms to construct a pipeline which was able to: (i) search for conservation in target-regions, (ii) eliminate CICS redundancy and mask repeat elements, (iii) predict the mRNA's extremities, (iv) analyze the distribution of orthologous genes within the generated LeishCICS-clusters, (v) assign GO terms to the LeishCICS-clusters. and (vi) provide statistical support for the gene-enrichment annotation. We associated the LeishCICS-cluster data, generated at the end of the pipeline, with the expression profile oft. donovani genes during promastigote-amastigote differentiation, as previously evaluated by others (GEO accession: GSE21936). A Pearson's correlation coefficient greater than 0.5 was observed for 730 LeishCICS-clusters containing from 2 to 17 genes. The designed computational pipeline is a useful tool and its application identified potential regulatory cis elements and putative regulons in Leishmania. (C) 2012 Elsevier B.V. All rights reserved.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

Background: This paper addresses the prediction of the free energy of binding of a drug candidate with enzyme InhA associated with Mycobacterium tuberculosis. This problem is found within rational drug design, where interactions between drug candidates and target proteins are verified through molecular docking simulations. In this application, it is important not only to correctly predict the free energy of binding, but also to provide a comprehensible model that could be validated by a domain specialist. Decision-tree induction algorithms have been successfully used in drug-design related applications, specially considering that decision trees are simple to understand, interpret, and validate. There are several decision-tree induction algorithms available for general-use, but each one has a bias that makes it more suitable for a particular data distribution. In this article, we propose and investigate the automatic design of decision-tree induction algorithms tailored to particular drug-enzyme binding data sets. We investigate the performance of our new method for evaluating binding conformations of different drug candidates to InhA, and we analyze our findings with respect to decision tree accuracy, comprehensibility, and biological relevance. Results: The empirical analysis indicates that our method is capable of automatically generating decision-tree induction algorithms that significantly outperform the traditional C4.5 algorithm with respect to both accuracy and comprehensibility. In addition, we provide the biological interpretation of the rules generated by our approach, reinforcing the importance of comprehensible predictive models in this particular bioinformatics application. Conclusions: We conclude that automatically designing a decision-tree algorithm tailored to molecular docking data is a promising alternative for the prediction of the free energy from the binding of a drug candidate with a flexible-receptor.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

Background: Decreasing costs of DNA sequencing have made prokaryotic draft genome sequences increasingly common. A contig scaffold is an ordering of contigs in the correct orientation. A scaffold can help genome comparisons and guide gap closure efforts. One popular technique for obtaining contig scaffolds is to map contigs onto a reference genome. However, rearrangements that may exist between the query and reference genomes may result in incorrect scaffolds, if these rearrangements are not taken into account. Large-scale inversions are common rearrangement events in prokaryotic genomes. Even in draft genomes it is possible to detect the presence of inversions given sufficient sequencing coverage and a sufficiently close reference genome. Results: We present a linear-time algorithm that can generate a set of contig scaffolds for a draft genome sequence represented in contigs given a reference genome. The algorithm is aimed at prokaryotic genomes and relies on the presence of matching sequence patterns between the query and reference genomes that can be interpreted as the result of large-scale inversions; we call these patterns inversion signatures. Our algorithm is capable of correctly generating a scaffold if at least one member of every inversion signature pair is present in contigs and no inversion signatures have been overwritten in evolution. The algorithm is also capable of generating scaffolds in the presence of any kind of inversion, even though in this general case there is no guarantee that all scaffolds in the scaffold set will be correct. We compare the performance of SIS, the program that implements the algorithm, to seven other scaffold-generating programs. The results of our tests show that SIS has overall better performance. Conclusions: SIS is a new easy-to-use tool to generate contig scaffolds, available both as stand-alone and as a web server. The good performance of SIS in our tests adds evidence that large-scale inversions are widespread in prokaryotic genomes.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

In silico comparison of 34 putative pks genes in Aspergillus niger strain CBS 513.88 versus A. niger strain ATCC 1015 genome revealed significant nucleotide identity (>95% covering a minimum of 99% of the gene sequence) for 31 of these genes (approximately 91%). A. niger CBS 513.88 harbors three putative pks genes (An01g01130, An11g05940, and An15g07920), for which nucleotide identity was not found in A. niger ATCC 1015. To compare the results of the in silico analysis with the in vivo situation, experimental data were obtained for a large number of A. niger strains obtained from different substrates and geographical regions. Three putative Os genes that were found to be variable between the two A. niger strains using bioinformatics tools were in fact strain-specific genes based on experimental data. The PCR amplification signals for the An01g01130, An11g05940, and An15g07920 pks genes were detected in only 97%, 71%, and 26% of the strains, respectively. Southern blot analyses confirmed the PCR data. Because one of the strain-specific pits genes (An15g07920) is located in a putative ochratoxin cluster, we focused our investigation on that region. We assessed the ochratoxin production capability of the 119 A. niger strains and found a positive association between the presence of this pia gene and the capability of the respective strain to produce ochratoxin. (C) 2012 Elsevier B.V. All rights reserved.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

The time is ripe for a comprehensive mission to explore and document Earth's species. This calls for a campaign to educate and inspire the next generation of professional and citizen species explorers, investments in cyber-infrastructure and collections to meet the unique needs of the producers and consumers of taxonomic information, and the formation and coordination of a multi-institutional, international, transdisciplinary community of researchers, scholars and engineers with the shared objective of creating a comprehensive inventory of species and detailed map of the biosphere. We conclude that an ambitious goal to describe 10 million species in less than 50 years is attainable based on the strength of 250 years of progress, worldwide collections, existing experts, technological innovation and collaborative teamwork. Existing digitization projects are overcoming obstacles of the past, facilitating collaboration and mobilizing literature, data, images and specimens through cyber technologies. Charting the biosphere is enormously complex, yet necessary expertise can be found through partnerships with engineers, information scientists, sociologists, ecologists, climate scientists, conservation biologists, industrial project managers and taxon specialists, from agrostologists to zoophytologists. Benefits to society of the proposed mission would be profound, immediate and enduring, from detection of early responses of flora and fauna to climate change to opening access to evolutionary designs for solutions to countless practical problems. The impacts on the biodiversity, environmental and evolutionary sciences would be transformative, from ecosystem models calibrated in detail to comprehensive understanding of the origin and evolution of life over its 3.8 billion year history. The resultant cyber-enabled taxonomy, or cybertaxonomy, would open access to biodiversity data to developing nations, assure access to reliable data about species, and change how scientists and citizens alike access, use and think about biological diversity information.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

Modern sugarcane cultivars are complex hybrids resulting from crosses among several Saccharum species. Traditional breeding methods have been employed extensively in different countries over the past decades to develop varieties with increased sucrose yield and resistance to pests and diseases. Conventional variety improvement, however, may be limited by the narrow pool of suitable genes. Thus, molecular genetics is seen as a promising tool to assist in the process of developing improved varieties. The SUCEST-FUN Project (http://sucest-fun.org) aims to associate function with sugarcane genes using a variety of tools, in particular those that enable the study of the sugarcane transcriptome. An extensive analysis has been conducted to characterise, phenotypically, sugarcane genotypes with regard to their sucrose content, biomass and drought responses. Through the analysis of different cultivars, genes associated with sucrose content, yield, lignin and drought have been identified. Currently, tools are being developed to determine signalling and regulatory networks in grasses, and to sequence the sugarcane genome, as well as to identify sugarcane promoters. This is being implemented through the SUCEST-FUN (http://sucest-fun.org) and GRASSIUS databases (http://grassius.org), the cloning of sugarcane promoters, the identification of cis-regulatory elements (CRE) using Chromatin Immunoprecipitation-sequencing (ChIP-Seq) and the generation of a comprehensive Signal Transduction and Transcription gene catalogue (SUCAST Catalogue).

Relevância:

10.00% 10.00%

Publicador:

Resumo:

Intron splicing is one of the most important steps involved in the maturation process of a pre-mRNA. Although the sequence profiles around the splice sites have been studied extensively, the levels of sequence identity between the exonic sequences preceding the donor sites and the intronic sequences preceding the acceptor sites has not been examined as thoroughly. In this study we investigated identity patterns between the last 15 nucleotides of the exonic sequence preceding the 5' splice site and the intronic sequence preceding the 3' splice site in a set of human protein-coding genes that do not exhibit intron retention. We found that almost 60% of consecutive exons and introns in human protein-coding genes share at least two identical nucleotides at their 3' ends and, on average, the sequence identity length is 2.47 nucleotides. Based on our findings we conclude that the 3' ends of exons and introns tend to have longer identical sequences within a gene than when being taken from different genes. Our results hold even if the pairs are non-consecutive in the transcription order. (C) 2012 Elsevier Ltd. All rights reserved.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

Traditional methods for bacterial identification include Gram staining, culturing, and biochemical assays for phenotypic characterization of the causative organism. These methods can be time-consuming because they require in vitro cultivation of the microorganisms. Recently, however, it has become possible to obtain chemical profiles for lipids, peptides, and proteins that are present in an intact organism, particularly now that new developments have been made for the efficient ionization of biomolecules. MS has therefore become the state-of-the-art technology for microorganism identification in microbiological clinical diagnosis. Here, we introduce an innovative sample preparation method for nonculture-based identification of bacteria in milk. The technique detects characteristic profiles of intact proteins (mostly ribosomal) with the recently introduced MALDI SepsityperTM Kit followed by MALDI-MS. In combination with a dedicated bioinformatics software tool for databank matching, the method allows for almost real-time and reliable genus and species identification. We demonstrate the sensitivity of this protocol by experimentally contaminating pasteurized and homogenized whole milk samples with bacterial loads of 10(3)-10(8) colony-forming units (cfu) of laboratory strains of Escherichia coli, Enterococcus faecalis, and Staphylococcus aureus. For milk samples contaminated with a lower bacterial load (104 cfu mL-1), bacterial identification could be performed after initial incubation at 37 degrees C for 4 h. The sensitivity of the method may be influenced by the bacterial species and count, and therefore, it must be optimized for the specific application. The proposed use of protein markers for nonculture-based bacterial identification allows for high-throughput detection of pathogens present in milk samples. This method could therefore be useful in the veterinary practice and in the dairy industry, such as for the diagnosis of subclinical mastitis and for the sanitary monitoring of raw and processed milk products.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

In protein databases there is a substantial number of proteins structurally determined but without function annotation. Understanding the relationship between function and structure can be useful to predict function on a large scale. We have analyzed the similarities in global physicochemical parameters for a set of enzymes which were classified according to the four Enzyme Commission (EC) hierarchical levels. Using relevance theory we introduced a distance between proteins in the space of physicochemical characteristics. This was done by minimizing a cost function of the metric tensor built to reflect the EC classification system. Using an unsupervised clustering method on a set of 1025 enzymes, we obtained no relevant clustering formation compatible with EC classification. The distance distributions between enzymes from the same EC group and from different EC groups were compared by histograms. Such analysis was also performed using sequence alignment similarity as a distance. Our results suggest that global structure parameters are not sufficient to segregate enzymes according to EC hierarchy. This indicates that features essential for function are rather local than global. Consequently, methods for predicting function based on global attributes should not obtain high accuracy in main EC classes prediction without relying on similarities between enzymes from training and validation datasets. Furthermore, these results are consistent with a substantial number of studies suggesting that function evolves fundamentally by recruitment, i.e., a same protein motif or fold can be used to perform different enzymatic functions and a few specific amino acids (AAs) are actually responsible for enzyme activity. These essential amino acids should belong to active sites and an effective method for predicting function should be able to recognize them. (C) 2012 Elsevier Ltd. All rights reserved.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

Understanding alternative splicing is crucial to elucidate the mechanisms behind several biological phenomena, including diseases. The huge amount of expressed sequences available nowadays represents an opportunity and a challenge to catalog and display alternative splicing events (ASEs). Although several groups have faced this challenge with relative success, we still lack a computational tool that uses a simple and straightforward method to retrieve, name and present ASEs. Here we present SPLOOCE, a portal for the analysis of human splicing variants. SPLOOCE uses a method based on regular expressions for retrieval of ASEs. We propose a simple syntax that is able to capture the complexity of ASEs.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

The hierarchy of the segmentation cascade responsible for establishing the Drosophila body plan is composed by gap, pair-rule and segment polarity genes. However, no pair-rule stripes are formed in the anterior regions of the embryo. This lack of stripe formation, as well as other evidence from the literature that is further investigated here, led us to the hypothesis that anterior gap genes might be involved in a combinatorial mechanism responsible for repressing the cis-regulatory modules (CRMs) of hairy (h), even-skipped (eve), runt (run), and fushi-tarazu (ftz) anterior-most stripes. In this study, we investigated huckebein (hkb), which has a gap expression domain at the anterior tip of the embryo. Using genetic methods we were able to detect deviations from the wild-type patterns of the anterior-most pair-rule stripes in different genetic backgrounds, which were consistent with Hkb-mediated repression. Moreover, we developed an image processing tool that, for the most part, confirmed our assumptions. Using an hkb misexpression system, we further detected specific repression on anterior stripes. Furthermore, bioinformatics analysis predicted an increased significance of binding site clusters in the CRMs of h 1, eve 1, run 1 and ftz 1 when Hkb was incorporated in the analysis, indicating that Hkb plays a direct role in these CRMs. We further discuss that Hkb and Slp1, which is the other previously identified common repressor of anterior stripes, might participate in a combinatorial repression mechanism controlling stripe CRMs in the anterior parts of the embryo and define the borders of these anterior stripes. (C) 2011 Elsevier Inc. All rights reserved.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

Background: Ontologies have increasingly been used in the biomedical domain, which has prompted the emergence of different initiatives to facilitate their development and integration. The Open Biological and Biomedical Ontologies (OBO) Foundry consortium provides a repository of life-science ontologies, which are developed according to a set of shared principles. This consortium has developed an ontology called OBO Relation Ontology aiming at standardizing the different types of biological entity classes and associated relationships. Since ontologies are primarily intended to be used by humans, the use of graphical notations for ontology development facilitates the capture, comprehension and communication of knowledge between its users. However, OBO Foundry ontologies are captured and represented basically using text-based notations. The Unified Modeling Language (UML) provides a standard and widely-used graphical notation for modeling computer systems. UML provides a well-defined set of modeling elements, which can be extended using a built-in extension mechanism named Profile. Thus, this work aims at developing a UML profile for the OBO Relation Ontology to provide a domain-specific set of modeling elements that can be used to create standard UML-based ontologies in the biomedical domain. Results: We have studied the OBO Relation Ontology, the UML metamodel and the UML profiling mechanism. Based on these studies, we have proposed an extension to the UML metamodel in conformance with the OBO Relation Ontology and we have defined a profile that implements the extended metamodel. Finally, we have applied the proposed UML profile in the development of a number of fragments from different ontologies. Particularly, we have considered the Gene Ontology (GO), the PRotein Ontology (PRO) and the Xenopus Anatomy and Development Ontology (XAO). Conclusions: The use of an established and well-known graphical language in the development of biomedical ontologies provides a more intuitive form of capturing and representing knowledge than using only text-based notations. The use of the profile requires the domain expert to reason about the underlying semantics of the concepts and relationships being modeled, which helps preventing the introduction of inconsistencies in an ontology under development and facilitates the identification and correction of errors in an already defined ontology.