59 resultados para Sequence alignment
em Consorci de Serveis Universitaris de Catalunya (CSUC), Spain
Resumo:
This article introduces a new interface for T-Coffee, a consistency-based multiple sequence alignment program. This interface provides an easy and intuitive access to the most popular functionality of the package. These include the default T-Coffee mode for protein and nucleic acid sequences, the M-Coffee mode that allows combining the output of any other aligners, and template-based modes of T-Coffee that deliver high accuracy alignments while using structural or homology derived templates. These three available template modes are Expresso for the alignment of protein with a known 3D-Structure, R-Coffee to align RNA sequences with conserved secondary structures and PSI-Coffee to accurately align distantly related sequences using homology extension. The new server benefits from recent improvements of the T-Coffee algorithm and can align up to 150 sequences as long as 10 000 residues and is available from both http://www.tcoffee.org and its main mirror http://tcoffee.crg.cat.
Resumo:
L'objectiu del projecte consisteix en el desenvolupament d'un add-in d'anàlisi i manipulació de seqüències, senzill i de fàcil ús, integrable en l'entorn Microsoft Word per permetre la manipulació de seqüències genètiques directament des de Microsoft Word, estalviant temps, en evitar haver de canviar constantment de programa i format per treballar amb elles; i, també, complicacions a l'usuari final. L'add-in ha estat desenvolupat en Visual Basic + VSTO i ofereix diverses funcionalitats d'edició i anàlisi de seqüències, com ara el complement, la recerca de motius o l'alineament.
Resumo:
Las aplicaciones de alineamiento múltiple de secuencias son prototipos de aplicaciones que requieren elevada potencia de cómputo y memoria. Se destacan por la relevancia científica que tienen los resultados que brindan a investigaciones científicas en el campo de la biomedicina, genética y farmacología. Las aplicaciones de alineamiento múltiple tienen la limitante de que no son capaces de procesar miles de secuencias, por lo que se hace necesario crear un modelo para resolver la problemática. Analizando el volumen de datos que se manipulan en el área de las ciencias biológica y la complejidad de los algoritmos de alineamiento de secuencias, la única vía de solución del problema es a través de la utilización de entornos de cómputo paralelos y la computación de altas prestaciones. La investigación realizada por nosotros tiene como objetivo la creación de un modelo paralelo que le permita a los algoritmos de alineamiento múltiple aumentar el número de secuencias a procesar, tratando de mantener la calidad en los resultados para garantizar la precisión científica. El modelo que proponemos emplea como base la clusterización de las secuencias de entrada utilizando criterios biológicos que permiten mantener la calidad de los resultados. Además, el modelo se enfoca en la disminución del tiempo de cómputo y consumo de memoria. Para presentar y validar el modelo utilizamos T-Coffee, como plataforma de desarrollo e investigación. El modelo propuesto pudiera ser aplicado a cualquier otro algoritmo de alineamiento múltiple de secuencias.
Resumo:
Avui en dia la biologia aporta grans quantitats de dades que només la informàtica pot tractar. Les aplicacions bioinformàtiques són la més important eina d’anàlisi i comparació que tenim per entendre la vida i aconseguir desxifrar aquestes dades. Aquest projecte centra el seu esforç en l’estudi de les aplicacions dedicades a l’alineament de seqüències genètiques, i més concretament a dos algoritmes, basats en programació dinàmica i òptims: el Needleman&Wunsch i el Smith&Waterman. Amb l’objectiu de millorar el rendiment d’aquests algoritmes per a alineaments de seqüències grans, proposem diferents versions d’implementació. Busquem millorar rendiments en temps i espai. Per a aconseguir millorar els resultats aprofitem el paral·lelisme. Els resultats dels anàlisis de les versions els comparem per obtenir les dades necessàries per valorar cost, guany i rendiment.
Resumo:
Desde el inicio del proyecto del genoma humano y su éxito en el año 2001 se han secuenciado genomas de multitud de especies. La mejora en las tecnologías de secuenciación ha generado volúmenes de datos con un crecimiento exponencial. El proyecto Análisis bioinformáticos sobre la tecnología Hadoop abarca la computación paralela de datos biológicos como son las secuencias de ADN. El estudio ha sido encauzado por la naturaleza del problema a resolver. El alineamiento de secuencias genéticas con el paradigma MapReduce.
Resumo:
Las aplicaciones de alineamiento de secuencias son una herramienta importante para la comunidad científica. Estas aplicaciones bioinformáticas son usadas en muchos campos distintos como pueden ser la medicina, la biología, la farmacología, la genética, etc. A día de hoy los algoritmos de alineamiento de secuencias tienen una complejidad elevada y cada día tienen que manejar un volumen de datos más grande. Por esta razón se deben buscar alternativas para que estas aplicaciones sean capaces de manejar el aumento de tamaño que los bancos de secuencias están sufriendo día a día. En este proyecto se estudian y se investigan mejoras en este tipo de aplicaciones como puede ser el uso de sistemas paralelos que pueden mejorar el rendimiento notablemente.
Resumo:
BACKGROUND: DNA sequence polymorphisms analysis can provide valuable information on the evolutionary forces shaping nucleotide variation, and provides an insight into the functional significance of genomic regions. The recent ongoing genome projects will radically improve our capabilities to detect specific genomic regions shaped by natural selection. Current available methods and software, however, are unsatisfactory for such genome-wide analysis. RESULTS: We have developed methods for the analysis of DNA sequence polymorphisms at the genome-wide scale. These methods, which have been tested on a coalescent-simulated and actual data files from mouse and human, have been implemented in the VariScan software package version 2.0. Additionally, we have also incorporated a graphical-user interface. The main features of this software are: i) exhaustive population-genetic analyses including those based on the coalescent theory; ii) analysis adapted to the shallow data generated by the high-throughput genome projects; iii) use of genome annotations to conduct a comprehensive analyses separately for different functional regions; iv) identification of relevant genomic regions by the sliding-window and wavelet-multiresolution approaches; v) visualization of the results integrated with current genome annotations in commonly available genome browsers. CONCLUSION: VariScan is a powerful and flexible suite of software for the analysis of DNA polymorphisms. The current version implements new algorithms, methods, and capabilities, providing an important tool for an exhaustive exploratory analysis of genome-wide DNA polymorphism data.
Resumo:
Background: Parallel T-Coffee (PTC) was the first parallel implementation of the T-Coffee multiple sequence alignment tool. It is based on MPI and RMA mechanisms. Its purpose is to reduce the execution time of the large-scale sequence alignments. It can be run on distributed memory clusters allowing users to align data sets consisting of hundreds of proteins within a reasonable time. However, most of the potential users of this tool are not familiar with the use of grids or supercomputers. Results: In this paper we show how PTC can be easily deployed and controlled on a super computer architecture using a web portal developed using Rapid. Rapid is a tool for efficiently generating standardized portlets for a wide range of applications and the approach described here is generic enough to be applied to other applications, or to deploy PTC on different HPC environments. Conclusions: The PTC portal allows users to upload a large number of sequences to be aligned by the parallel version of TC that cannot be aligned by a single machine due to memory and execution time constraints. The web portal provides a user-friendly solution.
Resumo:
Mitochondrial DNA (mtDNA), a maternally inherited 16.6-Kb molecule crucial for energy production, is implicated in numerous human traits and disorders. It has been hypothesized that the presence of mutations in the mtDNA may contribute to the complex genetic basis of schizophreniadisease, due to the evidence of maternal inheritance and the presence of schizophrenia symptoms in patients affected of a mitochondrial disorder related to a mtDNA mutation. The present project aims to study the association of variants of mitochondrial DNA (mtDNA), and an increased risk of schizophrenia in a cohort of patients and controls from the same population. The entire mtDNA of 55 schizophrenia patients with an apparent maternal transmission of the disease and 38 controls was sequenced by Next Generation Sequencing (Ion Torrent PGM, Life Technologies) and compared to the reference sequence. The current method for establishing mtDNA haplotypes is Sanger sequencing, which is laborious, timeconsuming, and expensive. With the emergence of Next Generation Sequencing technologies, this sequencing process can be much more quickly and cost-efficiently. We have identified 14 variants that have not been previously reported. Two of them were missense variants: MTATP6 p.V113M and MTND5 p.F334L ,and also three variants encoding rRNA and one variant encoding tRNA. Not significant differences have been found in the number of variants between the two groups. We found that the sequence alignment algorithm employed to align NGS reads played a significant role in the analysis of the data and the resulting mtDNA haplotypes. Further development of the bioinformatics analysis and annotation step would be desirable to facilitate the application of NGS in mtDNA analysis.
Resumo:
We address the problem of comparing and characterizing the promoter regions of genes with similar expression patterns. This remains a challenging problem in sequence analysis, because often the promoter regions of co-expressed genes do not show discernible sequence conservation. In our approach, thus, we have not directly compared the nucleotide sequence of promoters. Instead, we have obtained predictions of transcription factor binding sites, annotated the predicted sites with the labels of the corresponding binding factors, and aligned the resulting sequences of labels—to which we refer here as transcription factor maps (TF-maps). To obtain the global pairwise alignment of two TF-maps, we have adapted an algorithm initially developed to align restriction enzyme maps. We have optimized the parameters of the algorithm in a small, but well-curated, collection of human–mouse orthologous gene pairs. Results in this dataset, as well as in an independent much larger dataset from the CISRED database, indicate that TF-map alignments are able to uncover conserved regulatory elements, which cannot be detected by the typical sequence alignments.
Resumo:
L’accent nuclear ascendent-descendent de les oracions expressant desacord en occità consta de tres tons: LH+L*. En comptes de precedir el to asterisc (“starred tone”) a un interval fix en temps normalitzat (Pierrehumbert & Beckman 1989), els tons menadors (“leading tones”) L i H s’alineen amb determinats punts d’ancoratge de la cadena de segments: les fronteres dreta i esquerra de la síl•laba pretònica, respectivament. El model de Grice (1995b) per a l’estructura dels accents tonals permet donar compte d’aquest patró d’alineació incloent els tons menadors en un node diferent que precedeix el que domina to seguidor (“trailing tone”) i to asterisc.
Resumo:
The genetic diversity of three temperate fruit tree phytoplasmas ‘Candidatus Phytoplasma prunorum’, ‘Ca. P. mali’ and ‘Ca. P. pyri’ has been established by multilocus sequence analysis. Among the four genetic loci used, the genes imp and aceF distinguished 30 and 24 genotypes, respectively, and showed the highest variability. Percentage of substitution for imp ranged from 50 to 68% according to species. Percentage of substitution varied between 9 and 12% for aceF, whereas it was between 5 and 6% for pnp and secY. In the case of ‘Ca P. prunorum’ the three most prevalent aceF genotypes were detected in both plants and insect vectors, confirming that the prevalent isolates are propagated by insects. The four isolates known to be hypo-virulent had the same aceF sequence, indicating a possible monophyletic origin. Haplotype network reconstructed by eBURST revealed that among the 34 haplotypes of ‘Ca. P. prunorum’, the four hypo-virulent isolates also grouped together in the same clade. Genotyping of some Spanish and Azerbaijanese ‘Ca. P. pyri’ isolates showed that they shared some alleles with ‘Ca. P. prunorum’, supporting for the first time to our knowledge, the existence of inter-species recombination between these two species.
Resumo:
With the advent of High performance computing, it is now possible to achieve orders of magnitude performance and computation e ciency gains over conventional computer architectures. This thesis explores the potential of using high performance computing to accelerate whole genome alignment. A parallel technique is applied to an algorithm for whole genome alignment, this technique is explained and some experiments were carried out to test it. This technique is based in a fair usage of the available resource to execute genome alignment and how this can be used in HPC clusters. This work is a rst approximation to whole genome alignment and it shows the advantages of parallelism and some of the drawbacks that our technique has. This work describes the resource limitations of current WGA applications when dealing with large quantities of sequences. It proposes a parallel heuristic to distribute the load and to assure that alignment quality is mantained.
Resumo:
Seafloor imagery is a rich source of data for the study of biological and geological processes. Among several applications, still images of the ocean floor can be used to build image composites referred to as photo-mosaics. Photo-mosaics provide a wide-area visual representation of the benthos, and enable applications as diverse as geological surveys, mapping and detection of temporal changes in the morphology of biodiversity. We present an approach for creating globally aligned photo-mosaics using 3D position estimates provided by navigation sensors available in deep water surveys. Without image registration, such navigation data does not provide enough accuracy to produce useful composite images. Results from a challenging data set of the Lucky Strike vent field at the Mid Atlantic Ridge are reported
Resumo:
When underwater vehicles perform navigation close to the ocean floor, computer vision techniques can be applied to obtain quite accurate motion estimates. The most crucial step in the vision-based estimation of the vehicle motion consists on detecting matchings between image pairs. Here we propose the extensive use of texture analysis as a tool to ameliorate the correspondence problem in underwater images. Once a robust set of correspondences has been found, the three-dimensional motion of the vehicle can be computed with respect to the bed of the sea. Finally, motion estimates allow the construction of a map that could aid to the navigation of the robot