2 resultados para Independent Sequence
em Biblioteca Digital da Produção Intelectual da Universidade de São Paulo
Resumo:
mitochondrial genomes are generally thought to be under selection for compactness, due to their small size, consistent gene content, and a lack of introns or intergenic spacers. As more animal mitochondrial genomes are fully sequenced, rearrangements and partial duplications are being identified with increasing frequency, particularly in birds (Class Ayes). In this study, we investigate the evolutionary history of mitochondrial control region states within the avian order Psittaciformes (parrots and cockatoos). To this aim, we reconstructed a comprehensive multi-locus phylogeny of parrots, used PCR of three diagnostic fragments to classify the mitochondrial control region state as single or duplicated, and mapped these states onto the phylogeny. We further sequenced 44 selected species to validate these inferences of control region state. Ancestral state reconstruction using a range of weighting schemes identified six independent origins of mitochondrial control region duplications within Psittaciformes. Analysis of sequence data showed that varying levels of mitochondrial gene and tRNA homology and degradation were present within a given clade exhibiting duplications. Levels of divergence between control regions within an individual varied from 0-10.9% with the differences occurring mainly between 51 and 225 nucleotides 3' of the goose hairpin in domain I. Further investigations into the fates of duplicated mitochondrial genes, the potential costs and benefits of having a second control region, and the complex relationship between evolutionary rates, selection, and time since duplication are needed to fully explain these patterns in the mitochondrial genome. (C) 2012 Elsevier Inc. All rights reserved.
Resumo:
Abstract Background A large number of probabilistic models used in sequence analysis assign non-zero probability values to most input sequences. To decide when a given probability is sufficient the most common way is bayesian binary classification, where the probability of the model characterizing the sequence family of interest is compared to that of an alternative probability model. We can use as alternative model a null model. This is the scoring technique used by sequence analysis tools such as HMMER, SAM and INFERNAL. The most prevalent null models are position-independent residue distributions that include: the uniform distribution, genomic distribution, family-specific distribution and the target sequence distribution. This paper presents a study to evaluate the impact of the choice of a null model in the final result of classifications. In particular, we are interested in minimizing the number of false predictions in a classification. This is a crucial issue to reduce costs of biological validation. Results For all the tests, the target null model presented the lowest number of false positives, when using random sequences as a test. The study was performed in DNA sequences using GC content as the measure of content bias, but the results should be valid also for protein sequences. To broaden the application of the results, the study was performed using randomly generated sequences. Previous studies were performed on aminoacid sequences, using only one probabilistic model (HMM) and on a specific benchmark, and lack more general conclusions about the performance of null models. Finally, a benchmark test with P. falciparum confirmed these results. Conclusions Of the evaluated models the best suited for classification are the uniform model and the target model. However, the use of the uniform model presents a GC bias that can cause more false positives for candidate sequences with extreme compositional bias, a characteristic not described in previous studies. In these cases the target model is more dependable for biological validation due to its higher specificity.