936 results for Biology, Bioinformatics|Computer Science
Abstract:
Acetohydroxyacid synthase (AHAS; EC 2.2.1.6) catalyzes the first common step in branched-chain amino acid biosynthesis. The enzyme is inhibited by several chemical classes of compounds and this inhibition is the basis of action of the sulfonylurea and imidazolinone herbicides. The commercial sulfonylureas contain a pyrimidine or a triazine ring that is substituted at both meta positions, thus obeying the initial rules proposed by Levitt. Here we assess the activity of 69 monosubstituted sulfonylurea analogs and related compounds as inhibitors of pure recombinant Arabidopsis thaliana AHAS and show that disubstitution is not absolutely essential as exemplified by our novel herbicide, monosulfuron (2-nitro-N-(4'-methyl-pyrimidin-2'-yl) phenyl-sulfonylurea), which has a pyrimidine ring with a single meta substituent. A subset of these compounds was tested for herbicidal activity and it was shown that their effect in vivo correlates well with their potency in vitro as AHAS inhibitors. Three-dimensional quantitative structure-activity relationships were developed using comparative molecular field analysis and comparative molecular similarity indices analysis. For the latter, the best result was obtained when steric, electrostatic, hydrophobic and H-bond acceptor factors were taken into consideration. The resulting fields were mapped on to the published crystal structure of the yeast enzyme and it was shown that the steric and hydrophobic fields are in good agreement with sulfonylurea-AHAS interaction geometry.
Abstract:
Beta-turns are important topological motifs for biological recognition of proteins and peptides. Organic molecules that sample the side chain positions of beta-turns, for example benzodiazepines, have shown broad binding capacity to multiple different receptors. Beta-turns have traditionally been classified into various types based on the backbone dihedral angles (phi 2, psi 2, phi 3 and psi 3). Indeed, 57-68% of beta-turns are currently classified into 8 different backbone families (Type I, Type II, Type I', Type II', Type VIII, Type VIa1, Type VIa2 and Type VIb), with Type IV representing unclassified beta-turns. Although this classification of beta-turns has been useful, the resulting beta-turn types are not ideal for the design of beta-turn mimetics as they do not reflect topological features of the recognition elements, the side chains. To overcome this, we have extracted beta-turns from a data set of non-homologous and high-resolution protein crystal structures. The side chain positions, as defined by C-alpha-C-beta vectors, of these turns have been clustered using the kth nearest neighbor clustering and filtered nearest centroid sorting algorithms. Nine clusters were obtained that cluster 90% of the data, and the average intra-cluster RMSD of the four C-alpha-C-beta vectors is 0.36. The nine clusters therefore represent the topology of the side chain scaffold architecture of the vast majority of beta-turns. The mean structures of the nine clusters are useful for the development of beta-turn mimetics and as biological descriptors for focusing combinatorial chemistry towards biologically relevant topological space.
Abstract:
In this paper, we first give an overview of PATRIMA, the French heritage project launched in 2011 as one of the Projets d'investissement pour l'avenir, a French funding program meant to last for the next ten years. The overall purpose of the PATRIMA project is to promote and fund research on various aspects of heritage presentation and preservation. Because such research is interdisciplinary, research groups in history, physics, chemistry, biology and computer science are involved in this project. The PATRIMA consortium involves research groups from universities and from the main museums or cultural heritage institutions in Paris and surroundings. More specifically, the main members of the consortium are the two universities of Cergy-Pontoise and Versailles Saint-Quentin and the following famous museums or cultural institutions: Musée du Louvre, Château de Versailles, Bibliothèque nationale de France, Musée du Quai Branly, Musée Rodin. In the second part of the paper, we focus on two projects funded by PATRIMA, named EDOP and Parcours, both dealing with data integration. The goal of the EDOP project is to provide users with a data space for the integration of heterogeneous information about heritage; Linked Open Data are considered for effective access to the corresponding data sources. The Parcours project, on the other hand, aims at building an ontology of the terminology used for restoration and/or conservation techniques. Such an ontology is meant to provide a common terminology to researchers using different databases and different vocabularies.
Abstract:
The generation of a correlation matrix from a large set of long gene sequences is a common requirement in many bioinformatics problems such as phylogenetic analysis. The generation is not only computationally intensive but also requires significant memory resources as, typically, few gene sequences can be simultaneously stored in primary memory. The standard practice in such computation is to use frequent input/output (I/O) operations. Therefore, minimizing the number of these operations will yield much faster run-times. This paper develops an approach for the faster and scalable computing of large-size correlation matrices through the full use of available memory and a reduced number of I/O operations. The approach is scalable in the sense that the same algorithms can be executed on different computing platforms with different amounts of memory and can be applied to different problems with different correlation matrix sizes. The significant performance improvement of the approach over the existing approaches is demonstrated through benchmark examples.
Abstract:
Molecular biology is a scientific discipline which has changed fundamentally in character over the past decade to rely on large-scale datasets (public and locally generated) and their computational analysis and annotation. Undergraduate education of biologists must increasingly couple this domain context with a data-driven computational scientific method. Yet modern programming and scripting languages and rich computational environments such as R and MATLAB present significant barriers to those with limited exposure to computer science, and may require substantial tutorial assistance over an extended period if progress is to be made. In this paper we report our experience of undergraduate bioinformatics education using the familiar, ubiquitous spreadsheet environment of Microsoft Excel. We describe a configurable extension called QUT.Bio.Excel, a custom ribbon supporting a rich set of data sources, external tools and interactive processing within the spreadsheet, and a range of problems to demonstrate its utility and success in addressing the needs of students over their studies.
Abstract:
Soldatova, L. N. and King, R. D. (2005) Are the Current Ontologies used in Biology Good Ontologies? Nature Biotechnology 23:1095-1098.
Abstract:
Ferré, S. and King, R. D. (2004) BLID: an Application of Logical Information Systems in Bioinformatics. In P. Eklund (editor), 2nd International Conference on Formal Concept Analysis (ICFCA), Feb 2004. LNCS 2961, Springer.
Abstract:
In the present paper, we introduce BioPatML.NET, an application library for the Microsoft Windows .NET framework [2] that implements the BioPatML pattern definition language and sequence search engine. BioPatML.NET is integrated with the Microsoft Biology Foundation (MBF) application library [3], unifying the parsers and annotation services supported or emerging through MBF with the language, search framework and pattern repository of BioPatML. End users who wish to exploit the BioPatML.NET engine and repository without engaging the services of a programmer may do so via the freely accessible web-based BioPatML Editor, which we describe below.
Abstract:
This item provides supplementary materials for the paper mentioned in the title, specifically a range of organisms used in the study. The full abstract for the main paper is as follows: Next Generation Sequencing (NGS) technologies have revolutionised molecular biology, allowing clinical sequencing to become a matter of routine. NGS data sets consist of short sequence reads obtained from the machine, given context and meaning through downstream assembly and annotation. For these techniques to operate successfully, the collected reads must be consistent with the assumed species or species group, and not corrupted in some way. The common bacterium Staphylococcus aureus may cause severe and life-threatening infections in humans, with some strains exhibiting antibiotic resistance. In this paper, we apply an SVM classifier to the important problem of distinguishing S. aureus sequencing projects from alternative pathogens, including closely related Staphylococci. Using a sequence k-mer representation, we achieve precision and recall above 95%, implicating features with important functional associations.
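The sequence k-mer representation mentioned in this abstract can be illustrated with a minimal sketch. The abstract does not give the value of k, the normalisation, or the feature selection, so the function names `kmer_counts` and `kmer_vector`, the choice k = 2, and the relative-frequency scaling are all assumptions made for illustration. The SVM itself is not shown.

```python
from collections import Counter
from itertools import product

def kmer_counts(read, k):
    """Count every overlapping substring of length k in one read."""
    return Counter(read[i:i + k] for i in range(len(read) - k + 1))

def kmer_vector(reads, k, alphabet="ACGT"):
    """Pool k-mer counts over all reads and return relative frequencies
    in a fixed order (hypothetical feature layout, not taken from the paper)."""
    vocabulary = ["".join(p) for p in product(alphabet, repeat=k)]
    totals = Counter()
    for read in reads:
        totals.update(kmer_counts(read, k))
    # k-mers containing symbols outside the alphabet (e.g. N) are ignored
    n = sum(totals[kmer] for kmer in vocabulary)
    return [totals[kmer] / n if n else 0.0 for kmer in vocabulary]

features = kmer_vector(["ACGTAC", "GGTACC"], k=2)
```

Each sequencing project becomes one fixed-length vector (here 4^k = 16 entries), which is the kind of input a classifier such as an SVM can consume.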
Abstract:
Genomic sequences are fundamentally text documents, admitting various representations according to need and tokenization. Gene expression depends crucially on binding of enzymes to the DNA sequence at small, poorly conserved binding sites, limiting the utility of standard pattern search. However, one may exploit the regular syntactic structure of the enzyme's component proteins and the corresponding binding sites, framing the problem as one of detecting grammatically correct genomic phrases. In this paper we propose new kernels based on weighted tree structures, traversing the paths within them to capture the features which underpin the task. Experimentally, we find that these kernels provide performance comparable with state-of-the-art approaches for this problem, while offering significant computational advantages over earlier methods. The methods proposed may be applied to a broad range of sequence or tree-structured data in molecular biology and other domains.
Abstract:
Next Generation Sequencing (NGS) has revolutionised molecular biology, resulting in an explosion of data sets and an increasing role in clinical practice. Such applications necessarily require rapid identification of the organism as a prelude to annotation and further analysis. NGS data consist of a substantial number of short sequence reads, given context through downstream assembly and annotation, a process requiring reads consistent with the assumed species or species group. Highly accurate results have been obtained for restricted sets using SVM classifiers, but such methods are difficult to parallelise and success depends on careful attention to feature selection. This work examines the problem at very large scale, using a mix of synthetic and real data with a view to determining the overall structure of the problem and the effectiveness of parallel ensembles of simpler classifiers (principally random forests) in addressing the challenges of large scale genomics.
Abstract:
The generation of a correlation matrix for a set of genomic sequences is a common requirement in many bioinformatics problems such as phylogenetic analysis. Each sequence may be millions of bases long and there may be thousands of such sequences which we wish to compare, so not all sequences may fit into main memory at the same time. Each sequence needs to be compared with every other sequence, so we will generally need to page some sequences in and out more than once. In order to minimize execution time we need to minimize this I/O. This paper develops an approach for faster and scalable computing of large-size correlation matrices through the maximal exploitation of available memory and a reduction in the number of I/O operations. The approach is scalable in the sense that the same algorithms can be executed on different computing platforms with different amounts of memory and can be applied to different bioinformatics problems with different correlation matrix sizes. The significant performance improvement of the approach over previous work is demonstrated through benchmark examples.
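The I/O argument in this abstract can be made concrete with a small counting sketch. This is not the paper's algorithm, which the abstract does not describe; it is a generic blocked schedule, assuming memory can hold two blocks of sequences at once, and the names `make_blocks`, `blocked_schedule` and `naive_loads` are hypothetical. It only counts how many sequence reads from disk each strategy needs.

```python
from itertools import combinations, product

def make_blocks(n_sequences, block_size):
    """Split sequence indices 0..n-1 into consecutive blocks."""
    return [list(range(start, min(start + block_size, n_sequences)))
            for start in range(0, n_sequences, block_size)]

def blocked_schedule(n_sequences, block_size):
    """Return (pairs, loads): every unordered pair exactly once,
    plus the number of sequence reads from disk the schedule needs."""
    blocks = make_blocks(n_sequences, block_size)
    pairs, loads = [], 0
    for i, resident in enumerate(blocks):
        loads += len(resident)                    # read the resident block once
        pairs.extend(combinations(resident, 2))   # pairs inside the resident block
        for streamed in blocks[i + 1:]:
            loads += len(streamed)                # read each later block once
            pairs.extend(product(resident, streamed))
    return pairs, loads

def naive_loads(n_sequences):
    """Reads needed if both sequences are fetched afresh for every pair."""
    return n_sequences * (n_sequences - 1)

pairs, loads = blocked_schedule(n_sequences=6, block_size=2)
print(len(pairs), loads, naive_loads(6))  # prints: 15 12 30
```

With larger blocks (more memory), fewer passes over the later blocks are needed, which is the sense in which the same schedule adapts to platforms with different amounts of memory.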
Abstract:
Determination of sequence similarity is a central issue in computational biology, a problem addressed primarily through BLAST, an alignment-based heuristic which has underpinned much of the analysis and annotation of the genomic era. Despite their success, alignment-based approaches scale poorly with increasing data set size, and are not robust under structural sequence rearrangements. Successive waves of innovation in sequencing technologies, the so-called Next Generation Sequencing (NGS) approaches, have led to an explosion in data availability, challenging existing methods and motivating novel approaches to sequence representation and similarity scoring, including adaptation of existing methods from other domains such as information retrieval. In this work, we investigate locality-sensitive hashing of sequences through binary document signatures, applying the method to a bacterial protein classification task. Here, the goal is to predict the gene family to which a given query protein belongs. Experiments carried out on a pair of small but biologically realistic datasets (the full protein repertoires of families of Chlamydia and Staphylococcus aureus genomes respectively) show that a measure of similarity obtained by locality-sensitive hashing gives highly accurate results while offering a number of avenues which will lead to substantial performance improvements over BLAST.
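One way to picture a binary signature for a protein sequence is the simhash-style sketch below. The abstract does not say how its signatures are built, so everything here is an assumption chosen for illustration: the 64-bit width, the 3-mer tokens, the use of BLAKE2b as the token hash, and the names `kmers`, `signature` and `hamming`.

```python
import hashlib

BITS = 64  # signature width (assumed, not taken from the paper)

def kmers(sequence, k=3):
    """Overlapping substrings of length k, used as the 'words' of the document."""
    return [sequence[i:i + k] for i in range(len(sequence) - k + 1)]

def signature(sequence, k=3):
    """Simhash-style signature: each k-mer votes +1 or -1 on every bit,
    and the sign of the total decides the bit."""
    votes = [0] * BITS
    for kmer in kmers(sequence, k):
        digest = hashlib.blake2b(kmer.encode("ascii"), digest_size=BITS // 8).digest()
        h = int.from_bytes(digest, "big")
        for bit in range(BITS):
            votes[bit] += 1 if (h >> bit) & 1 else -1
    return sum(1 << bit for bit in range(BITS) if votes[bit] > 0)

def hamming(a, b):
    """Number of differing bits between two signatures."""
    return bin(a ^ b).count("1")

query = signature("MKTAYIAKQRQISFVKSHFSRQ")
distance_to_self = hamming(query, query)  # identical sequences always give 0
```

Sequences that share many k-mers tend to agree on most bits, so a small Hamming distance serves as a cheap proxy for similarity, and a query could be assigned to the family of its nearest stored signature.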