37 results for local sequence alignment problem
in Consorci de Serveis Universitaris de Catalunya (CSUC), Spain
Abstract:
This article introduces a new interface for T-Coffee, a consistency-based multiple sequence alignment program. This interface provides easy and intuitive access to the most popular functionality of the package. This includes the default T-Coffee mode for protein and nucleic acid sequences, the M-Coffee mode that allows combining the output of any other aligners, and template-based modes of T-Coffee that deliver high-accuracy alignments while using structural or homology-derived templates. The three available template modes are Expresso for the alignment of proteins with a known 3D structure, R-Coffee to align RNA sequences with conserved secondary structures and PSI-Coffee to accurately align distantly related sequences using homology extension. The new server benefits from recent improvements of the T-Coffee algorithm and can align up to 150 sequences as long as 10 000 residues. It is available from both http://www.tcoffee.org and its main mirror http://tcoffee.crg.cat.
Abstract:
Multiple sequence alignment applications are prototypical examples of applications that require high computing power and memory. They stand out for the scientific relevance of their results to research in biomedicine, genetics and pharmacology. Multiple alignment applications are limited in that they cannot process thousands of sequences, so a model is needed to address this problem. Given the volume of data handled in the biological sciences and the complexity of sequence alignment algorithms, the only way to solve the problem is through parallel computing environments and high-performance computing. Our research aims to create a parallel model that allows multiple alignment algorithms to increase the number of sequences they can process, while trying to maintain the quality of the results in order to guarantee scientific accuracy. The model we propose is based on clustering the input sequences using biological criteria that preserve the quality of the results. In addition, the model focuses on reducing computing time and memory consumption. To present and validate the model we use T-Coffee as the development and research platform. The proposed model could be applied to any other multiple sequence alignment algorithm.
Abstract:
Since the start of the Human Genome Project and its success in 2001, the genomes of a multitude of species have been sequenced. Improvements in sequencing technologies have generated exponentially growing volumes of data. The project "Análisis bioinformáticos sobre la tecnología Hadoop" (Bioinformatic analyses on Hadoop technology) covers the parallel computation of biological data such as DNA sequences. The study has been shaped by the nature of the problem to be solved: the alignment of genetic sequences with the MapReduce paradigm.
Abstract:
Can rules be used to shield public resources from political interference? The Brazilian constitution and national tax code stipulate that revenue sharing transfers to municipal governments be determined by the size of counties in terms of estimated population. In this paper I document that the population estimates which went into the transfer allocation formula for the year 1991 were manipulated, resulting in significant transfer differentials throughout the 1990s. I test whether, conditional on county characteristics that might account for the manipulation, center-local party alignment, party popularity and the extent of interparty fragmentation at the county level are correlated with estimated populations in 1991. Results suggest that revenue sharing transfers were targeted at right-wing national deputies in electorally fragmented counties as well as aligned local executives.
Abstract:
The aim of the project is to develop a simple, easy-to-use add-in for sequence analysis and manipulation that integrates into the Microsoft Word environment, so that genetic sequences can be manipulated directly from Microsoft Word. This saves time, by avoiding constant switching between programs and formats to work with the sequences, and also spares the end user complications. The add-in has been developed in Visual Basic + VSTO and offers several sequence editing and analysis features, such as the complement, motif search and alignment.
Abstract:
Nowadays biology produces large amounts of data that only computing can handle. Bioinformatics applications are the most important analysis and comparison tool we have for understanding life and deciphering these data. This project focuses on the study of applications devoted to the alignment of genetic sequences, and more specifically on two optimal algorithms based on dynamic programming: Needleman&Wunsch and Smith&Waterman. With the aim of improving the performance of these algorithms for alignments of large sequences, we propose several implementation versions. We seek to improve performance in both time and space. To achieve better results we exploit parallelism. We compare the results of the analyses of the versions to obtain the data needed to assess cost, gain and performance.
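As an illustration of the dynamic-programming local alignment named in the abstract above, the following is a minimal sequential sketch of Smith-Waterman with a linear gap penalty. It is not the project's implementation: the function name `smith_waterman` and the scoring values (`match=2`, `mismatch=-1`, `gap=-2`) are assumptions chosen for the example, and none of the parallel versions the abstract mentions are shown.

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-2):
    """Return (best_score, aligned_a, aligned_b) for one optimal local alignment.

    Scoring values are illustrative assumptions, not taken from the abstract.
    """
    rows, cols = len(a) + 1, len(b) + 1
    # H[i][j] holds the best score of a local alignment ending at a[i-1], b[j-1].
    H = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)

    for i in range(1, rows):
        for j in range(1, cols):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            # The 0 option is what makes the alignment local: a path may restart.
            H[i][j] = max(0, H[i - 1][j - 1] + sub, H[i - 1][j] + gap, H[i][j - 1] + gap)
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)

    # Trace back from the highest-scoring cell until a zero cell is reached.
    i, j = best_pos
    out_a, out_b = [], []
    while i > 0 and j > 0 and H[i][j] > 0:
        sub = match if a[i - 1] == b[j - 1] else mismatch
        if H[i][j] == H[i - 1][j - 1] + sub:
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif H[i][j] == H[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1

    return best, "".join(reversed(out_a)), "".join(reversed(out_b))
```

For example, `smith_waterman("GGACGTCC", "TTACGTAA")` picks out the shared `ACGT` core. The full matrix costs O(len(a) x len(b)) time and memory, which is the cost that time- and space-optimized or parallel variants aim to reduce.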
Abstract:
Sequence alignment applications are an important tool for the scientific community. These bioinformatics applications are used in many different fields, such as medicine, biology, pharmacology and genetics. Today, sequence alignment algorithms have high complexity and must handle an ever larger volume of data. For this reason, alternatives must be sought so that these applications can cope with the day-to-day growth in the size of sequence databases. This project studies and investigates improvements to this type of application, such as the use of parallel systems, which can improve performance notably.
Abstract:
BACKGROUND: DNA sequence polymorphism analysis can provide valuable information on the evolutionary forces shaping nucleotide variation, and provides insight into the functional significance of genomic regions. The ongoing genome projects will radically improve our capabilities to detect specific genomic regions shaped by natural selection. Currently available methods and software, however, are unsatisfactory for such genome-wide analysis. RESULTS: We have developed methods for the analysis of DNA sequence polymorphisms at the genome-wide scale. These methods, which have been tested on coalescent-simulated and actual data files from mouse and human, have been implemented in the VariScan software package version 2.0. Additionally, we have incorporated a graphical user interface. The main features of this software are: i) exhaustive population-genetic analyses including those based on the coalescent theory; ii) analysis adapted to the shallow data generated by the high-throughput genome projects; iii) use of genome annotations to conduct comprehensive analyses separately for different functional regions; iv) identification of relevant genomic regions by the sliding-window and wavelet-multiresolution approaches; v) visualization of the results integrated with current genome annotations in commonly available genome browsers. CONCLUSION: VariScan is a powerful and flexible suite of software for the analysis of DNA polymorphisms. The current version implements new algorithms, methods, and capabilities, providing an important tool for an exhaustive exploratory analysis of genome-wide DNA polymorphism data.
Abstract:
Background: Parallel T-Coffee (PTC) was the first parallel implementation of the T-Coffee multiple sequence alignment tool. It is based on MPI and RMA mechanisms. Its purpose is to reduce the execution time of large-scale sequence alignments. It can be run on distributed memory clusters, allowing users to align data sets consisting of hundreds of proteins within a reasonable time. However, most of the potential users of this tool are not familiar with the use of grids or supercomputers. Results: In this paper we show how PTC can be easily deployed and controlled on a supercomputer architecture using a web portal developed using Rapid. Rapid is a tool for efficiently generating standardized portlets for a wide range of applications, and the approach described here is generic enough to be applied to other applications, or to deploy PTC on different HPC environments. Conclusions: The PTC portal allows users to upload a large number of sequences to be aligned by the parallel version of T-Coffee that cannot be aligned by a single machine due to memory and execution time constraints. The web portal provides a user-friendly solution.
Abstract:
Mitochondrial DNA (mtDNA), a maternally inherited 16.6-kb molecule crucial for energy production, is implicated in numerous human traits and disorders. It has been hypothesized that mutations in the mtDNA may contribute to the complex genetic basis of schizophrenia, given the evidence of maternal inheritance and the presence of schizophrenia symptoms in patients affected by a mitochondrial disorder related to an mtDNA mutation. The present project aims to study the association between mtDNA variants and an increased risk of schizophrenia in a cohort of patients and controls from the same population. The entire mtDNA of 55 schizophrenia patients with an apparent maternal transmission of the disease and 38 controls was sequenced by Next Generation Sequencing (Ion Torrent PGM, Life Technologies) and compared to the reference sequence. The current method for establishing mtDNA haplotypes is Sanger sequencing, which is laborious, time-consuming, and expensive. With the emergence of Next Generation Sequencing technologies, this sequencing process can be performed much more quickly and cost-efficiently. We have identified 14 variants that have not been previously reported. Two of them were missense variants, MTATP6 p.V113M and MTND5 p.F334L; we also found three variants encoding rRNA and one variant encoding tRNA. No significant differences have been found in the number of variants between the two groups. We found that the sequence alignment algorithm employed to align NGS reads played a significant role in the analysis of the data and the resulting mtDNA haplotypes. Further development of the bioinformatics analysis and annotation step would be desirable to facilitate the application of NGS in mtDNA analysis.
Abstract:
We address the problem of comparing and characterizing the promoter regions of genes with similar expression patterns. This remains a challenging problem in sequence analysis, because often the promoter regions of co-expressed genes do not show discernible sequence conservation. In our approach, thus, we have not directly compared the nucleotide sequence of promoters. Instead, we have obtained predictions of transcription factor binding sites, annotated the predicted sites with the labels of the corresponding binding factors, and aligned the resulting sequences of labels—to which we refer here as transcription factor maps (TF-maps). To obtain the global pairwise alignment of two TF-maps, we have adapted an algorithm initially developed to align restriction enzyme maps. We have optimized the parameters of the algorithm in a small, but well-curated, collection of human–mouse orthologous gene pairs. Results in this dataset, as well as in an independent much larger dataset from the CISRED database, indicate that TF-map alignments are able to uncover conserved regulatory elements, which cannot be detected by the typical sequence alignments.
Abstract:
We present a new technique for audio signal comparison based on tonal subsequence alignment and its application to detect cover versions (i.e., different performances of the same underlying musical piece). Cover song identification is a task whose popularity has increased in the Music Information Retrieval (MIR) community over the past years, as it provides a direct and objective way to evaluate music similarity algorithms. This article first presents a series of experiments carried out with two state-of-the-art methods for cover song identification. We have studied several components of these (such as chroma resolution and similarity, transposition, beat tracking or Dynamic Time Warping constraints), in order to discover which characteristics would be desirable for a competitive cover song identifier. After analyzing many cross-validated results, the importance of these characteristics is discussed, and the best-performing ones are finally applied to the newly proposed method. Multiple evaluations of this method confirm a large increase in identification accuracy when comparing it with alternative state-of-the-art approaches.
Abstract:
We study the existence of solutions to general measure-minimization problems over topological classes that are stable under localized Lipschitz homotopy, including the standard Plateau problem without the need for restrictive assumptions such as orientability or even rectifiability of surfaces. In the case of problems over an open and bounded domain we establish the existence of a “minimal candidate”, obtained as the limit for the local Hausdorff convergence of a minimizing sequence for which the measure is lower-semicontinuous. Although we do not yet give a way to control the topological constraint when taking the limit (except for some examples of topological classes preserving local separation or for periodic two-dimensional sets), we prove that this candidate is an Almgren-minimal set. Thus, using regularity results such as Jean Taylor’s theorem, this could be a way to find solutions to the above minimization problems under a generic setup in arbitrary dimension and codimension.
Abstract:
When underwater vehicles navigate close to the ocean floor, computer vision techniques can be applied to obtain quite accurate motion estimates. The most crucial step in the vision-based estimation of the vehicle motion consists of detecting matches between image pairs. Here we propose the extensive use of texture analysis as a tool to ameliorate the correspondence problem in underwater images. Once a robust set of correspondences has been found, the three-dimensional motion of the vehicle can be computed with respect to the seabed. Finally, motion estimates allow the construction of a map that could aid the navigation of the robot.
Abstract:
The statistical analysis of literary style is the part of stylometry that compares measurable characteristics in a text that are rarely controlled by the author with those in other texts. When the goal is to settle authorship questions, these characteristics should relate to the author’s style and not to the genre, epoch or editor, and they should be such that their variation between authors is larger than the variation within comparable texts from the same author. For an overview of the literature on stylometry and some of the techniques involved, see for example Mosteller and Wallace (1964, 82), Herdan (1964), Morton (1978), Holmes (1985), Oakes (1998) or Lebart, Salem and Berry (1998). Tirant lo Blanc, a chivalry book, is the main work in Catalan literature and it was hailed as “the best book of its kind in the world” by Cervantes in Don Quixote. Considered by writers like Vargas Llosa or Damaso Alonso to be the first modern novel in Europe, it has been translated several times into Spanish, Italian and French, with modern English translations by Rosenthal (1996) and La Fontaine (1993). The main body of this book was written between 1460 and 1465, but it was not printed until 1490. There is an intense and long-lasting debate around its authorship, stemming from its first edition, where the introduction states that the whole book is the work of Martorell (1413?-1468), while at the end it is stated that the last fourth of the book is by Galba (?-1490), after the death of Martorell. Some of the authors that support the theory of single authorship are Riquer (1990), Chiner (1993) and Badia (1993), while some of those supporting the double authorship are Riquer (1947), Coromines (1956) and Ferrando (1995). For an overview of this debate, see Riquer (1990). Neither of the two candidate authors left any text comparable to the one under study, and therefore discriminant analysis cannot be used to help classify chapters by author.
By using sample texts encompassing about ten percent of the book, and looking at word length and at the use of 44 conjunctions, prepositions and articles, Ginebra and Cabos (1998) detect heterogeneities that might indicate the existence of two authors. By analyzing the diversity of the vocabulary, Riba and Ginebra (2000) estimate that stylistic boundary to be near chapter 383. Following the lead of the extensive literature, this paper looks into word length, the use of the most frequent words and the use of vowels in each chapter of the book. Given that the features selected are categorical, that leads to three contingency tables of ordered rows and therefore to three sequences of multinomial observations. Section 2 explores these sequences graphically, observing a clear shift in their distribution. Section 3 describes the problem of the estimation of a sudden change-point in those sequences. In the following sections we propose various ways to estimate change-points in multinomial sequences: the method in Section 4 involves fitting models for polytomous data, the one in Section 5 fits gamma models onto the sequence of Chi-square distances between each row profile and the average profile, and the one in Section 6 fits models onto the sequence of values taken by the first component of the correspondence analysis as well as onto sequences of other summary measures like the average word length. In Section 7 we fit models onto the marginal binomial sequences to identify the features that distinguish the chapters before and after that boundary. Most methods rely heavily on the use of generalized linear models.
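The sequence of Chi-square distances between each row profile and the average profile, mentioned for Section 5 above, can be sketched as follows. This is only an illustrative computation under stated assumptions, not the authors' code: the function name `chi_square_distances` is hypothetical, the table is assumed to hold one row of category counts per chapter with no empty row or column, and the value returned is the squared distance in the usual correspondence-analysis sense. The gamma-model fitting step is not shown.

```python
def chi_square_distances(table):
    """Squared chi-square distance of each row profile from the average profile.

    `table` is a list of rows of non-negative counts (hypothetical layout:
    one row per chapter, one column per category). Every row total and
    every column total is assumed to be non-zero.
    """
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(col_totals)
    # Average profile: share of each category over the whole table.
    average = [c / grand_total for c in col_totals]

    distances = []
    for row in table:
        n = sum(row)
        profile = [x / n for x in row]  # row profile: within-row shares
        distances.append(
            sum((p - c) ** 2 / c for p, c in zip(profile, average))
        )
    return distances
```

Plotting the returned values against chapter order is one simple way to look for the kind of shift in distribution that a change-point model would then try to locate.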