4 resultados para phylogenetic tree
em DigitalCommons@The Texas Medical Center
Resumo:
(1) A mathematical theory for computing the probabilities of various nucleotide configurations is developed, and the probability of obtaining the correct phylogenetic tree (model tree) from sequence data is evaluated for six phylogenetic tree-making methods (UPGMA, distance Wagner method, transformed distance method, Fitch-Margoliash's method, maximum parsimony method, and compatibility method). The number of nucleotides (m*) necessary to obtain the correct tree with a probability of 95% is estimated with special reference to the human, chimpanzee, and gorilla divergence. m* is at least 4,200, but the availability of outgroup species greatly reduces m* for all methods except UPGMA. m* increases if transitions occur more frequently than transversions as in the case of mitochondrial DNA. (2) A new tree-making method called the neighbor-joining method is proposed. This method is applicable either for distance data or character state data. Computer simulation has shown that the neighbor-joining method is generally better than UPGMA, Farris' method, Li's method, and modified Farris method on recovering the true topology when distance data are used. A related method, the simultaneous partitioning method, is also discussed. (3) The maximum likelihood (ML) method for phylogeny reconstruction under the assumption of both constant and varying evolutionary rates is studied, and a new algorithm for obtaining the ML tree is presented. This method gives a tree similar to that obtained by UPGMA when constant evolutionary rate is assumed, whereas it gives a tree similar to that obtained by the maximum parsimony tree and the neighbor-joining method when varying evolutionary rate is assumed. ^
Resumo:
Historically morphological features were used as the primary means to classify organisms. However, the age of molecular genetics has allowed us to approach this field from the perspective of the organism's genetic code. Early work used highly conserved sequences, such as ribosomal RNA. The increasing number of complete genomes in the public data repositories provides the opportunity to look not only at a single gene, but at organisms' entire parts list. ^ Here the Sequence Comparison Index (SCI) and the Organism Comparison Index (OCI), algorithms and methods to compare proteins and proteomes, are presented. The complete proteomes of 104 sequenced organisms were compared. Over 280 million full Smith-Waterman alignments were performed on sequence pairs which had a reasonable expectation of being related. From these alignments a whole proteome phylogenetic tree was constructed. This method was also used to compare the small subunit (SSU) rRNA from each organism and a tree constructed from these results. The SSU rRNA tree by the SCI/OCI method looks very much like accepted SSU rRNA trees from sources such as the Ribosomal Database Project, thus validating the method. The SCI/OCI proteome tree showed a number of small but significant differences when compared to the SSU rRNA tree and proteome trees constructed by other methods. Horizontal gene transfer does not appear to affect the SCI/OCI trees until the transferred genes make up a large portion of the proteome. ^ As part of this work, the Database of Related Local Alignments (DaRLA) was created and contains over 81 million rows of sequence alignment information. DaRLA, while primarily used to build the whole proteome trees, can also be applied shared gene content analysis, gene order analysis, and creating individual protein trees. ^ Finally, the standard BLAST method for analyzing shared gene content was compared to the SCI method using 4 spirochetes. The SCI system performed flawlessly, finding all proteins from one organism against itself and finding all the ribosomal proteins between organisms. The BLAST system missed some proteins from its respective organism and failed to detect small ribosomal proteins between organisms. ^
Resumo:
Normal humans have one red and at least one green visual pigment genes. These genes are tightly linked as tandem repeats on the X chromosome and each of them has six exons. There is only one X-linked visual pigment gene in New World monkeys (NWMs) but the locus has three polymorphic alleles encoding red, yellow and green visual pigments, respectively. The spectral properties of the squirrel monkey and the marmoset (both NWMs) have been studied and partial sequences of the three alleles are available. To study the evolutionary history of these X-linked opsin genes in humans and NWMs, coding and intron sequences of the three squirrel monkey alleles and the three marmoset alleles were amplified by PCR followed by subcloning and sequencing. Introns 2 and 4 of the human red and green pigment genes were also sequenced. The results obtained are as follows: (1) The sequences of introns 2 and 4 of the human red and green opsin genes are significantly more similar between the two genes than are coding sequences, contrary to the usual situation where coding regions are better conserved in evolution than are introns. The high similarities in the two introns are probably due to recent gene conversion events during evolution of the human lineage. (2) Phylogenetic analysis of both intron and exon sequences indicates that the phylogenetic tree of the available primate opsin genes is the same as the species tree. The two human genes were derived from a gene duplication event after the divergence of the human and NWM lineages. The three alleles in each of the two NWM species diverged after the split of the two NWMs but have persisted in the population for at least 5 million years. (3) Allelic gene conversion might have occurred between the three squirrel monkey alleles. (4) A model of additive effect of hydroxyl-bearing amino acids on spectral tuning is proposed by treating some unknown variables as groups. Under the assumption that some residues have no effect, it is found that at least five amino acid residues, at positions 178 (3 nm), 180 (5 nm), 230 ($-$4 nm), 277 (9 nm) and 285 (13 nm), have linear spectral tuning effects. (5) Adaptive evolution of the opsin genes to different spectral peaks was observed at four residues that are important for spectral tuning. ^
Resumo:
Academic and industrial research in the late 90s have brought about an exponential explosion of DNA sequence data. Automated expert systems are being created to help biologists to extract patterns, trends and links from this ever-deepening ocean of information. Two such systems aimed on retrieving and subsequently utilizing phylogenetically relevant information have been developed in this dissertation, the major objective of which was to automate the often difficult and confusing phylogenetic reconstruction process. ^ Popular phylogenetic reconstruction methods, such as distance-based methods, attempt to find an optimal tree topology (that reflects the relationships among related sequences and their evolutionary history) by searching through the topology space. Various compromises between the fast (but incomplete) and exhaustive (but computationally prohibitive) search heuristics have been suggested. An intelligent compromise algorithm that relies on a flexible “beam” search principle from the Artificial Intelligence domain and uses the pre-computed local topology reliability information to adjust the beam search space continuously is described in the second chapter of this dissertation. ^ However, sometimes even a (virtually) complete distance-based method is inferior to the significantly more elaborate (and computationally expensive) maximum likelihood (ML) method. In fact, depending on the nature of the sequence data in question either method might prove to be superior. Therefore, it is difficult (even for an expert) to tell a priori which phylogenetic reconstruction method—distance-based, ML or maybe maximum parsimony (MP)—should be chosen for any particular data set. ^ A number of factors, often hidden, influence the performance of a method. For example, it is generally understood that for a phylogenetically “difficult” data set more sophisticated methods (e.g., ML) tend to be more effective and thus should be chosen. However, it is the interplay of many factors that one needs to consider in order to avoid choosing an inferior method (potentially a costly mistake, both in terms of computational expenses and in terms of reconstruction accuracy.) ^ Chapter III of this dissertation details a phylogenetic reconstruction expert system that selects a superior proper method automatically. It uses a classifier (a Decision Tree-inducing algorithm) to map a new data set to the proper phylogenetic reconstruction method. ^