11 results for Markov chains hidden Markov models Viterbi algorithm Forward-Backward algorithm maximum likelihood
in National Center for Biotechnology Information - NCBI
Abstract:
Frequencies of meiotic configurations in cytogenetic stocks are dependent on chiasma frequencies in segments defined by centromeres, breakpoints, and telomeres. The expectation maximization algorithm is proposed as a general method to perform maximum likelihood estimations of the chiasma frequencies in the intervals between such locations. The estimates can be translated via mapping functions into genetic maps of cytogenetic landmarks. One set of observational data was analyzed to exemplify application of these methods, results of which were largely concordant with other comparable data. The method was also tested by Monte Carlo simulation of frequencies of meiotic configurations from a monotelodisomic translocation heterozygote, assuming six different sample sizes. The estimate averages were always close to the values given initially to the parameters. The maximum likelihood estimation procedures can be extended readily to other kinds of cytogenetic stocks and allow the pooling of diverse cytogenetic data to collectively estimate lengths of segments, arms, and chromosomes.
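The abstract above does not say which mapping function is used to turn the estimates into map distances. As an illustration only, the sketch below assumes Haldane's mapping function (no crossover interference), which relates a recombination fraction r to a map distance d in Morgans by r = (1 - exp(-2d)) / 2. The function names are hypothetical and are not taken from the paper.

```python
import math


def haldane_recombination(map_distance_morgans):
    """Recombination fraction implied by a map distance (Haldane, no interference)."""
    return 0.5 * (1.0 - math.exp(-2.0 * map_distance_morgans))


def haldane_map_distance(recombination_fraction):
    """Map distance in Morgans implied by a recombination fraction (inverse of the above)."""
    if not 0.0 <= recombination_fraction < 0.5:
        raise ValueError("recombination fraction must lie in [0, 0.5)")
    return -0.5 * math.log(1.0 - 2.0 * recombination_fraction)


# Round trip for an assumed 10 cM (0.1 Morgan) interval.
r = haldane_recombination(0.1)
d = haldane_map_distance(r)
```

Other mapping functions (for example Kosambi's) would give different distances for the same recombination fraction, so the choice is a modelling assumption.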
Abstract:
We propose a general procedure for solving incomplete data estimation problems. The procedure can be used to find the maximum likelihood estimate or to solve estimating equations in difficult cases such as estimation with the censored or truncated regression model, the nonlinear structural measurement error model, and the random effects model. The procedure is based on the general principle of stochastic approximation and the Markov chain Monte-Carlo method. Applying the theory on adaptive algorithms, we derive conditions under which the proposed procedure converges. Simulation studies also indicate that the proposed procedure consistently converges to the maximum likelihood estimate for the structural measurement error logistic regression model.
Abstract:
Speech recognition involves three processes: extraction of acoustic indices from the speech signal, estimation of the probability that the observed index string was caused by a hypothesized utterance segment, and determination of the recognized utterance via a search among hypothesized alternatives. This paper is not concerned with the first process. Estimation of the probability of an index string involves a model of index production by any given utterance segment (e.g., a word). Hidden Markov models (HMMs) are used for this purpose [Makhoul, J. & Schwartz, R. (1995) Proc. Natl. Acad. Sci. USA 92, 9956-9963]. Their parameters are state transition probabilities and output probability distributions associated with the transitions. The Baum algorithm that obtains the values of these parameters from speech data via their successive reestimation will be described in this paper. The recognizer wishes to find the most probable utterance that could have caused the observed acoustic index string. That probability is the product of two factors: the probability that the utterance will produce the string and the probability that the speaker will wish to produce the utterance (the language model probability). Even if the vocabulary size is moderate, it is impossible to search for the utterance exhaustively. One practical algorithm is described [Viterbi, A. J. (1967) IEEE Trans. Inf. Theory IT-13, 260-267] that, given the index string, has a high likelihood of finding the most probable utterance.
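As a rough illustration of the search step described in the abstract above, here is a minimal sketch of the textbook Viterbi recursion for a small discrete HMM. It makes assumptions the abstract does not: outputs are attached to states rather than to transitions, the model is a toy with hypothetical states S1/S2 and symbols x/y/z, and no pruning is applied, so it returns the exact most probable state path rather than an approximate search over utterances.

```python
import math


def viterbi(obs, states, start_p, trans_p, emit_p):
    """Return (log_prob, path) of the most probable state sequence for obs."""

    def log(p):
        # Work in log space to avoid underflow; log(0) is treated as -inf.
        return math.log(p) if p > 0 else float("-inf")

    # best[t][s]: log-probability of the best path that ends in state s at time t.
    best = [{s: log(start_p[s]) + log(emit_p[s][obs[0]]) for s in states}]
    # back[t][s]: predecessor of s on that best path.
    back = [{}]

    for t in range(1, len(obs)):
        best.append({})
        back.append({})
        for s in states:
            prev, score = max(
                ((r, best[t - 1][r] + log(trans_p[r][s])) for r in states),
                key=lambda pair: pair[1],
            )
            best[t][s] = score + log(emit_p[s][obs[t]])
            back[t][s] = prev

    # Trace back from the best final state.
    last = max(states, key=lambda s: best[-1][s])
    path = [last]
    for t in range(len(obs) - 1, 0, -1):
        path.append(back[t][path[-1]])
    path.reverse()
    return best[-1][last], path


# Toy model (all numbers are illustrative assumptions).
states = ("S1", "S2")
start_p = {"S1": 0.6, "S2": 0.4}
trans_p = {"S1": {"S1": 0.7, "S2": 0.3}, "S2": {"S1": 0.4, "S2": 0.6}}
emit_p = {
    "S1": {"x": 0.5, "y": 0.4, "z": 0.1},
    "S2": {"x": 0.1, "y": 0.3, "z": 0.6},
}

log_prob, path = viterbi(("x", "y", "z"), states, start_p, trans_p, emit_p)
# path is ['S1', 'S1', 'S2'] with probability 0.01512
```

The run time grows linearly in the length of the observation string and quadratically in the number of states, which is why dynamic programming is preferred over enumerating all state sequences.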
Abstract:
Nuclear receptors regulate metabolic pathways in response to changes in the environment by appropriate alterations in gene expression of key metabolic enzymes. Here, a computational search approach based on iteratively built hidden Markov models of nuclear receptors was used to identify a human nuclear receptor, termed hPAR, that is expressed in liver and intestines. hPAR was found to be efficiently activated by pregnanes and by clinically used drugs including rifampicin, an antibiotic known to selectively induce human but not murine CYP3A expression. The CYP3A drug-metabolizing enzymes are expressed in gut and liver in response to environmental chemicals and clinically used drugs. Interestingly, hPAR is not activated by pregnenolone 16α-carbonitrile, which is a potent inducer of murine CYP3A genes and an activator of the mouse receptor PXR.1. Furthermore, hPAR was found to bind to and trans-activate through a conserved regulatory sequence present in human but not murine CYP3A genes. These results provide evidence that hPAR and PXR.1 may represent orthologous genes from different species that have evolved to regulate overlapping target genes in response to pharmacologically distinct CYP3A activators, and have potential implications for the in vitro identification of drug interactions important to humans.
Abstract:
Signature databases are vital tools for identifying distant relationships in novel sequences and hence for inferring protein function. InterPro is an integrated documentation resource for protein families, domains and functional sites, which amalgamates the efforts of the PROSITE, PRINTS, Pfam and ProDom database projects. Each InterPro entry includes a functional description, annotation, literature references and links back to the relevant member database(s). Release 2.0 of InterPro (October 2000) contains over 3000 entries, representing families, domains, repeats and sites of post-translational modification encoded by a total of 6804 different regular expressions, profiles, fingerprints and Hidden Markov Models. Each InterPro entry lists all the matches against SWISS-PROT and TrEMBL (more than 1 000 000 hits from 462 500 proteins in SWISS-PROT and TrEMBL). The database is accessible for text- and sequence-based searches at http://www.ebi.ac.uk/interpro/. Questions can be emailed to interhelp@ebi.ac.uk.
Abstract:
TIGRFAMs is a collection of protein families featuring curated multiple sequence alignments, hidden Markov models and associated information designed to support the automated functional identification of proteins by sequence homology. We introduce the term ‘equivalog’ to describe members of a set of homologous proteins that are conserved with respect to function since their last common ancestor. Related proteins are grouped into equivalog families where possible, and otherwise into protein families with other hierarchically defined homology types. TIGRFAMs currently contains over 800 protein families, available for searching or downloading at www.tigr.org/TIGRFAMs. Classification by equivalog family, where achievable, complements classification by orthology, superfamily, domain or motif. It provides the information best suited for automatic assignment of specific functions to proteins from large-scale genome sequencing projects.
Abstract:
A maximum likelihood estimator based on the coalescent for unequal migration rates and different subpopulation sizes is developed. The method uses a Markov chain Monte Carlo approach to investigate possible genealogies with branch lengths and with migration events. Properties of the new method are shown by using simulated data from a four-population n-island model and a source–sink population model. Our estimation method as coded in migrate is tested against genetree; both programs deliver a very similar likelihood surface. The algorithm converges to the estimates fairly quickly, even when the Markov chain is started from unfavorable parameters. The method was used to estimate gene flow in the Nile valley by using mtDNA data from three human populations.
Abstract:
Recently, the target function for crystallographic refinement has been improved through a maximum likelihood analysis, which makes proper allowance for the effects of data quality, model errors, and incompleteness. The maximum likelihood target reduces the significance of false local minima during the refinement process, but it does not completely eliminate them, necessitating the use of stochastic optimization methods such as simulated annealing for poor initial models. It is shown that the combination of maximum likelihood with cross-validation, which reduces overfitting, and simulated annealing by torsion angle molecular dynamics, which simplifies the conformational search problem, results in a major improvement of the radius of convergence of refinement and the accuracy of the refined structure. Torsion angle molecular dynamics and the maximum likelihood target function interact synergistically, the combination of both methods being significantly more powerful than each method individually. This is demonstrated in realistic test cases at two typical minimum Bragg spacings (d_min = 2.0 and 2.8 Å, respectively), illustrating the broad applicability of the combined method. In an application to the refinement of a new crystal structure, the combined method automatically corrected a mistraced loop in a poor initial model, moving the backbone by 4 Å.
Abstract:
A maximum likelihood approach of half tetrad analysis (HTA) based on multiple restriction fragment length polymorphism (RFLP) markers was developed. This procedure estimates the relative frequencies of 2n gametes produced by mechanisms genetically equivalent to first division restitution (FDR) or second division restitution and simultaneously locates the centromere within a linkage group of RFLP marker loci. The method was applied to the diploid alfalfa clone PG-F9 (2n = 2x = 16) previously selected because of its high frequency of 2n egg production. HTA was based on four RFLP loci for which PG-F9 was heterozygous with codominant alleles that were absent in the tetraploid tester. Models including three linked and one unlinked RFLP loci were developed and tested. Results of the HTA showed that PG-F9 produced 6% FDR and 94% second division restitution 2n eggs. Information from a marker locus belonging to one linkage group was used to more precisely locate the centromere on a different linkage group. HTA, together with previous cytological analysis, indicated that in PG-F9, FDR 2n eggs are likely produced by diplospory, a mechanism common among apomictic species. The occurrence of FDR 2n eggs in plant species and their importance for crop evolution and breeding is discussed together with the potential applicability of multilocus HTA in the study of reproductive mutants.
Abstract:
The reconstruction of multitaxon trees from molecular sequences is confounded by the variety of algorithms and criteria used to evaluate trees, making it difficult to compare the results of different analyses. A global method of multitaxon phylogenetic reconstruction described here, Bootstrappers Gambit, can be used with any four-taxon algorithm, including distance, maximum likelihood, and parsimony methods. It incorporates a Bayesian-Jeffreys'-bootstrap analysis to provide a uniform probability-based criterion for comparing the results from diverse algorithms. To examine the usefulness of the method, the origin of the eukaryotes has been investigated by the analysis of ribosomal small subunit RNA sequences. Three common algorithms (paralinear distances, Jukes-Cantor distances, and Kimura distances) support the eocyte topology, whereas one (maximum parsimony) supports the archaebacterial topology, suggesting that the eocyte prokaryotes are the closest prokaryotic relatives of the eukaryotes.
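Of the distance measures named in the abstract above, the Jukes-Cantor distance has a simple closed form: d = -(3/4) ln(1 - (4/3) p), where p is the proportion of aligned sites that differ. The sketch below assumes two pre-aligned sequences of equal length and skips any column containing a gap or ambiguity code; the function name and the example sequences are hypothetical and are not taken from the paper.

```python
import math


def jukes_cantor_distance(seq_a, seq_b):
    """Jukes-Cantor distance (substitutions per site) between two aligned sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")

    # Keep only columns where both sequences carry an unambiguous base.
    valid = set("ACGTU")
    pairs = [
        (a, b)
        for a, b in zip(seq_a.upper(), seq_b.upper())
        if a in valid and b in valid
    ]
    if not pairs:
        raise ValueError("no comparable sites")

    p = sum(a != b for a, b in pairs) / len(pairs)
    if p == 0:
        return 0.0
    if p >= 0.75:
        # The correction is undefined at or beyond saturation.
        return float("inf")
    return -0.75 * math.log(1.0 - (4.0 / 3.0) * p)


# One difference in ten compared sites, so p = 0.1.
d = jukes_cantor_distance("ACGUACGUAC", "ACGUACGUAG")
```

The correction only matters for divergent sequences: for small p the distance is close to p itself, and it grows without bound as p approaches 0.75.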
Abstract:
A large recombinant inbred population of soybean has been characterized for 220 restriction fragment-length polymorphism (RFLP) markers. Values for agronomic traits also have been measured. Quantitative trait loci (QTL) for height, yield, and maturity were located by their linkage to RFLP markers. QTL controlling large amounts of trait variation were analyzed for the dependence of trait variation on particular alleles at a second locus by comparing cumulative distributions of the trait for each genotype (four genotypes per pair of loci). Interesting pairs of loci were analyzed statistically with maximum likelihood and Monte Carlo comparison of additive and epistatic models. For each locus affecting height, variation was conditional upon the presence of a particular allele at a second unlinked locus that itself explained little or no trait variation. The results show that interactions between QTL are frequent and control large effects. Interactions distinguished between different QTL in a single linkage group and between QTL that affect different traits closely linked to one RFLP marker, i.e., distinguished between pleiotropy and closely linked genes. The implications for the evolution of inbreeding plants and for the construction of agronomic breeding strategies are discussed.