964 results for Shaft Alignment
Abstract:
We investigate the performance of phylogenetic mixture models in reducing a well-known and pervasive artifact of phylogenetic inference known as the node-density effect, comparing them to partitioned analyses of the same data. The node-density effect refers to the tendency for the amount of evolutionary change in longer branches of phylogenies to be underestimated compared to that in regions of the tree where there are more nodes and thus branches are typically shorter. Mixture models allow more than one model of sequence evolution to describe the sites in an alignment without prior knowledge of the evolutionary processes that characterize the data or how they correspond to different sites. If multiple evolutionary patterns are common in sequence evolution, mixture models may be capable of reducing node-density effects by characterizing the evolutionary processes more accurately. In gene-sequence alignments simulated to have heterogeneous patterns of evolution, we find that mixture models can reduce node-density effects to negligible levels or remove them altogether, performing as well as partitioned analyses based on the known simulated patterns. The mixture models achieve this without knowledge of the patterns that generated the data and even in some cases without specifying the full or true model of sequence evolution known to underlie the data. The latter result is especially important in real applications, as the true model of evolution is seldom known. We find the same patterns of results for two real data sets with evidence of complex patterns of sequence evolution: mixture models substantially reduced node-density effects and returned better likelihoods compared to partitioning models specifically fitted to these data. 
We suggest that the presence of more than one pattern of evolution in the data is a common source of error in phylogenetic inference and that mixture models can often detect these patterns even without prior knowledge of their presence in the data. Routine use of mixture models alongside other approaches to phylogenetic inference may often reveal hidden or unexpected patterns of sequence evolution and can improve phylogenetic inference.
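As a rough illustration of the distinction drawn in this abstract, the sketch below contrasts how a mixture model and a partitioned analysis combine per-site likelihoods. It is a minimal sketch, not the authors' implementation: the function names, the input layout (`site_likelihoods[i][k]` is the likelihood of site i under component model k), and the numbers are assumptions for illustration. In a mixture model each site's likelihood is a weighted sum over all component models, so no site-to-model assignment is needed; in a partitioned analysis each site is assigned to exactly one model in advance.

```python
import math

def mixture_log_likelihood(site_likelihoods, weights):
    """Log-likelihood of an alignment under a mixture of models.

    site_likelihoods[i][k]: likelihood of site i under component model k
    (hypothetical precomputed values). weights[k]: mixture weight of model k.
    """
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("mixture weights must sum to 1")
    total = 0.0
    for per_model in site_likelihoods:
        # Each site is summed over all component models, weighted by weights[k].
        total += math.log(sum(w * l for w, l in zip(weights, per_model)))
    return total

def partitioned_log_likelihood(site_likelihoods, assignment):
    """Log-likelihood when each site i is pre-assigned to model assignment[i]."""
    return sum(math.log(per_model[k])
               for per_model, k in zip(site_likelihoods, assignment))

# Illustrative (made-up) per-site likelihoods for two sites and two models.
example_sites = [[0.2, 0.02], [0.01, 0.1]]
mix = mixture_log_likelihood(example_sites, [0.5, 0.5])
part = partitioned_log_likelihood(example_sites, [0, 1])
```

In practice the weights and model parameters would be estimated from the data rather than fixed by hand, as they are here.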
Abstract:
The 3' untranslated regions (3'UTRs) of flaviviruses are reviewed and analyzed in relation to short sequences conserved as direct repeats (DRs). Previously, alignments of the 3'UTRs have been constructed for three of the four recognized flavivirus groups, namely mosquito-borne, tick-borne, and nonclassified flaviviruses (MBFV, TBFV, and NCFV, respectively). This revealed (1) six long repeat sequences (LRSs) in the 3'UTR and open-reading frame (ORF) of the TBFV, (2) duplication of the 3'UTR of the NCFV by intramolecular recombination, and (3) the possibility of a common origin for all DRs within the MBFV. We have now extended this analysis and review it in the context of all previously published analyses. This has been achieved by constructing a robust alignment between all flaviviruses using the published DRs and secondary RNA structures as "anchors" to reveal additional homologies along the 3'UTR. This approach identified nucleotide regions within the MBFV, NKV (no-known vector viruses), and NCFV 3'UTRs that are homologous to different LRSs in the TBFV 3'UTR and ORF. The analysis revealed that some of the DRs and secondary RNA structures described individually within each flavivirus group share common evolutionary origins. The 3'UTR of flaviviruses, and possibly the ORF, therefore probably evolved through multiple duplication of an RNA domain, homologous to the LRS previously identified only in the TBFV. The short DRs in all virus groups appear to represent the evolutionary remnants of these domains rather than resulting from new duplications. The relevance of these flavivirus DRs to evolution, diversity, 3'UTR enhancer function, and virus transmission is reviewed.
Direct repeats in the flavivirus 3' untranslated region; a strategy for survival in the environment?
Abstract:
Previously, direct repeats (DRs) of 20-70 nucleotides were identified in the 3' untranslated regions (3'UTR) of flavivirus sequences. To address their functional significance, we have manually generated a pan-flavivirus 3'UTR alignment and correlated it with the corresponding predicted RNA secondary structures. This approach revealed that intra-group-conserved DRs evolved from six long repeated sequences (LRSs) which, as approximately 200-nucleotide domains, were preserved only in the genomes of the slowly evolving tick-borne flaviviruses. We propose that short DRs represent the evolutionary remnants of LRSs rather than distinct molecular duplications. The relevance of DRs to virus replication enhancer function, and thus survival, is discussed.
Abstract:
Background: Selecting the highest quality 3D model of a protein structure from a number of alternatives remains an important challenge in the field of structural bioinformatics. Many Model Quality Assessment Programs (MQAPs) have been developed which adopt various strategies in order to tackle this problem, ranging from the so-called "true" MQAPs capable of producing a single energy score based on a single model, to methods which rely on structural comparisons of multiple models or additional information from meta-servers. However, it is clear that no current method can separate the highest accuracy models from the lowest consistently. In this paper, a number of the top performing MQAP methods are benchmarked in the context of the potential value that they add to protein fold recognition. Two novel methods are also described: ModSSEA, which is based on the alignment of predicted secondary structure elements, and ModFOLD, which combines several true MQAP methods using an artificial neural network. Results: The ModSSEA method is found to be an effective model quality assessment program for ranking multiple models from many servers; however, further accuracy can be gained by using the consensus approach of ModFOLD. The ModFOLD method is shown to significantly outperform the true MQAPs tested and is competitive with methods which make use of clustering or additional information from multiple servers. Several of the true MQAPs are also shown to add value to most individual fold recognition servers by improving model selection, when applied as a post filter in order to re-rank models. Conclusion: MQAPs should be benchmarked appropriately for the practical context in which they are intended to be used. Clustering-based methods are the top performing MQAPs where many models are available from many servers; however, they often do not add value to individual fold recognition servers when limited models are available. 
Conversely, the true MQAP methods tested can often be used as effective post filters for re-ranking few models from individual fold recognition servers and further improvements can be achieved using a consensus of these methods.