26 resultados para Markov chains hidden Markov models Viterbi algorithm Forward-Backward algorithm maximum likelihood
em University of Queensland eSpace - Australia
Resumo:
Mixture models implemented via the expectation-maximization (EM) algorithm are being increasingly used in a wide range of problems in pattern recognition such as image segmentation. However, the EM algorithm requires considerable computational time in its application to huge data sets such as a three-dimensional magnetic resonance (MR) image of over 10 million voxels. Recently, it was shown that a sparse, incremental version of the EM algorithm could improve its rate of convergence. In this paper, we show how this modified EM algorithm can be speeded up further by adopting a multiresolution kd-tree structure in performing the E-step. The proposed algorithm outperforms some other variants of the EM algorithm for segmenting MR images of the human brain. (C) 2004 Pattern Recognition Society. Published by Elsevier Ltd. All rights reserved.
Resumo:
Wurst is a protein threading program with an emphasis on high quality sequence to structure alignments (http://www.zbh.uni-hamburg.de/wurst). Submitted sequences are aligned to each of about 3000 templates with a conventional dynamic programming algorithm, but using a score function with sophisticated structure and sequence terms. The structure terms are a log-odds probability of sequence to structure fragment compatibility, obtained from a Bayesian classification procedure. A simplex optimization was used to optimize the sequence-based terms for the goal of alignment and model quality and to balance the sequence and structural contributions against each other. Both sequence and structural terms operate with sequence profiles.
Resumo:
This paper has three primary aims: to establish an effective means for modelling mainland-island metapopulations inhabiting a dynamic landscape: to investigate the effect of immigration and dynamic changes in habitat on metapopulation patch occupancy dynamics; and to illustrate the implications of our results for decision-making and population management. We first extend the mainland-island metapopulation model of Alonso and McKane [Bull. Math. Biol. 64:913-958,2002] to incorporate a dynamic landscape. It is shown, for both the static and the dynamic landscape models, that a suitably scaled version of the process converges to a unique deterministic model as the size of the system becomes large. We also establish that. under quite general conditions, the density of occupied patches, and the densities of suitable and occupied patches, for the respective models, have approximate normal distributions. Our results not only provide us with estimates for the means and variances that are valid at all stages in the evolution of the population, but also provide a tool for fitting the models to real metapopulations. We discuss the effect of immigration and habitat dynamics on metapopulations, showing that mainland-like patches heavily influence metapopulation persistence, and we argue for adopting measures to increase connectivity between this large patch and the other island-like patches. We illustrate our results with specific reference to examples of populations of butterfly and the grasshopper Bryodema tuberculata.
Resumo:
The chromodomain is 40-50 amino acids in length and is conserved in a wide range of chromatic and regulatory proteins involved in chromatin remodeling. Chromodomain-containing proteins can be classified into families based on their broader characteristics, in particular the presence of other types of domains, and which correlate with different subclasses of the chromodomains themselves. Hidden Markov model (HMM)-generated profiles of different subclasses of chromodomains were used here to identify sequences encoding chromodomain-containing proteins in the mouse transcriptome and genome. A total of 36 different loci encoding proteins containing chromodomains, including 17 novel loci, were identified. Six of these loci (including three apparent pseudogenes, a novel HP1 ortholog, and two novel Msl-3 transcription factor-like proteins) are not present in the human genome, whereas the human genome contains four loci (two CDY orthologs and two apparent CDY pseuclogenes) that are not present in mouse. A number of these loci exhibit alternative splicing to produce different isoforms, including 43 novel variants, some of which lack the chromodomain. The likely functions of these proteins are discussed in relation to the known functions of other chromodomain-containing proteins within the same family.
Resumo:
Promiscuous human leukocyte antigen (HLA) binding peptides are ideal targets for vaccine development. Existing computational models for prediction of promiscuous peptides used hidden Markov models and artificial neural networks as prediction algorithms. We report a system based on support vector machines that outperforms previously published methods. Preliminary testing showed that it can predict peptides binding to HLA-A2 and -A3 super-type molecules with excellent accuracy, even for molecules where no binding data are currently available.
Resumo:
MULTIPRED is a web-based computational system for the prediction of peptide binding to multiple molecules ( proteins) belonging to human leukocyte antigens (HLA) class I A2, A3 and class II DR supertypes. It uses hidden Markov models and artificial neural network methods as predictive engines. A novel data representation method enables MULTIPRED to predict peptides that promiscuously bind multiple HLA alleles within one HLA supertype. Extensive testing was performed for validation of the prediction models. Testing results show that MULTIPRED is both sensitive and specific and it has good predictive ability ( area under the receiver operating characteristic curve A(ROC) > 0.80). MULTIPRED can be used for the mapping of promiscuous T-cell epitopes as well as the regions of high concentration of these targets termed T-cell epitope hotspots. MULTIPRED is available at http:// antigen.i2r.a-star.edu.sg/ multipred/.
Resumo:
A generic method for the estimation of parameters for Stochastic Ordinary Differential Equations (SODEs) is introduced and developed. This algorithm, called the GePERs method, utilises a genetic optimisation algorithm to minimise a stochastic objective function based on the Kolmogorov-Smirnov statistic. Numerical simulations are utilised to form the KS statistic. Further, the examination of some of the factors that improve the precision of the estimates is conducted. This method is used to estimate parameters of diffusion equations and jump-diffusion equations. It is also applied to the problem of model selection for the Queensland electricity market. (C) 2003 Elsevier B.V. All rights reserved.
Resumo:
Euastacus crayfish are endemic to freshwater ecosystems of the eastern coast of Australia. While recent evolutionary studies have focused on a few of these species, here we provide a comprehensive phylogenetic estimate of relationships among the species within the genus. We sequenced three mitochondrial gene regions (COI, 16S, and 12S) and one nuclear region (28S) from 40 species of the genus Euastacus, as well as one undescribed species. Using these data, we estimated the phylogenetic relationships within the genus using maximum-likelihood, parsimony, and Bayesian Markov Chain Monte Carlo analyses. Using Bayes factors to test different model hypotheses, we found that the best phylogeny supports monophyletic groupings of all but two recognized species and suggests a widespread ancestor that diverged by vicariance. We also show that Eitastacus and Astacopsis are most likely monophyletic sister genera. We use the resulting phylogeny as a framework to test biogeographic hypotheses relating to the diversification of the genus. (c) 2005 Elsevier Inc. All rights reserved.
Finite mixture regression model with random effects: application to neonatal hospital length of stay
Resumo:
A two-component mixture regression model that allows simultaneously for heterogeneity and dependency among observations is proposed. By specifying random effects explicitly in the linear predictor of the mixture probability and the mixture components, parameter estimation is achieved by maximising the corresponding best linear unbiased prediction type log-likelihood. Approximate residual maximum likelihood estimates are obtained via an EM algorithm in the manner of generalised linear mixed model (GLMM). The method can be extended to a g-component mixture regression model with the component density from the exponential family, leading to the development of the class of finite mixture GLMM. For illustration, the method is applied to analyse neonatal length of stay (LOS). It is shown that identification of pertinent factors that influence hospital LOS can provide important information for health care planning and resource allocation. (C) 2002 Elsevier Science B.V. All rights reserved.
Resumo:
The expectation-maximization (EM) algorithm has been of considerable interest in recent years as the basis for various algorithms in application areas of neural networks such as pattern recognition. However, there exists some misconceptions concerning its application to neural networks. In this paper, we clarify these misconceptions and consider how the EM algorithm can be adopted to train multilayer perceptron (MLP) and mixture of experts (ME) networks in applications to multiclass classification. We identify some situations where the application of the EM algorithm to train MLP networks may be of limited value and discuss some ways of handling the difficulties. For ME networks, it is reported in the literature that networks trained by the EM algorithm using iteratively reweighted least squares (IRLS) algorithm in the inner loop of the M-step, often performed poorly in multiclass classification. However, we found that the convergence of the IRLS algorithm is stable and that the log likelihood is monotonic increasing when a learning rate smaller than one is adopted. Also, we propose the use of an expectation-conditional maximization (ECM) algorithm to train ME networks. Its performance is demonstrated to be superior to the IRLS algorithm on some simulated and real data sets.
Resumo:
Inferring the spatial expansion dynamics of invading species from molecular data is notoriously difficult due to the complexity of the processes involved. For these demographic scenarios, genetic data obtained from highly variable markers may be profitably combined with specific sampling schemes and information from other sources using a Bayesian approach. The geographic range of the introduced toad Bufo marinus is still expanding in eastern and northern Australia, in each case from isolates established around 1960. A large amount of demographic and historical information is available on both expansion areas. In each area, samples were collected along a transect representing populations of different ages and genotyped at 10 microsatellite loci. Five demographic models of expansion, differing in the dispersal pattern for migrants and founders and in the number of founders, were considered. Because the demographic history is complex, we used an approximate Bayesian method, based on a rejection-regression algorithm. to formally test the relative likelihoods of the five models of expansion and to infer demographic parameters. A stepwise migration-foundation model with founder events was statistically better supported than other four models in both expansion areas. Posterior distributions supported different dynamics of expansion in the studied areas. Populations in the eastern expansion area have a lower stable effective population size and have been founded by a smaller number of individuals than those in the northern expansion area. Once demographically stabilized, populations exchange a substantial number of effective migrants per generation in both expansion areas, and such exchanges are larger in northern than in eastern Australia. The effective number of migrants appears to be considerably lower than that of founders in both expansion areas. We found our inferences to be relatively robust to various assumptions on marker. demographic, and historical features. The method presented here is the only robust, model-based method available so far, which allows inferring complex population dynamics over a short time scale. It also provides the basis for investigating the interplay between population dynamics, drift, and selection in invasive species.
Resumo:
The Wet Tropics World Heritage Area in Far North Queens- land, Australia consists predominantly of tropical rainforest and wet sclerophyll forest in areas of variable relief. Previous maps of vegetation communities in the area were produced by a labor-intensive combination of field survey and air-photo interpretation. Thus,. the aim of this work was to develop a new vegetation mapping method based on imaging radar that incorporates topographical corrections, which could be repeated frequently, and which would reduce the need for detailed field assessments and associated costs. The method employed G topographic correction and mapping procedure that was developed to enable vegetation structural classes to be mapped from satellite imaging radar. Eight JERS-1 scenes covering the Wet Tropics area for 1996 were acquired from NASDA under the auspices of the Global Rainforest Mapping Project. JERS scenes were geometrically corrected for topographic distortion using an 80 m DEM and a combination of polynomial warping and radar viewing geometry modeling. An image mosaic was created to cover the Wet Tropics region, and a new technique for image smoothing was applied to the JERS texture bonds and DEM before a Maximum Likelihood classification was applied to identify major land-cover and vegetation communities. Despite these efforts, dominant vegetation community classes could only be classified to low levels of accuracy (57.5 percent) which were partly explained by the significantly larger pixel size of the DEM in comparison to the JERS image (12.5 m). In addition, the spatial and floristic detail contained in the classes of the original validation maps were much finer than the JERS classification product was able to distinguish. In comparison to field and aerial photo-based approaches for mapping the vegetation of the Wet Tropics, appropriately corrected SAR data provides a more regional scale, all-weather mapping technique for broader vegetation classes. Further work is required to establish an appropriate combination of imaging radar with elevation data and other environmental surrogates to accurately map vegetation communities across the entire Wet Tropics.
Resumo:
The purpose of this work was to model lung cancer mortality as a function of past exposure to tobacco and to forecast age-sex-specific lung cancer mortality rates. A 3-factor age-period-cohort (APC) model, in which the period variable is replaced by the product of average tar content and adult tobacco consumption per capita, was estimated for the US, UK, Canada and Australia by the maximum likelihood method. Age- and sex-specific tobacco consumption was estimated from historical data on smoking prevalence and total tobacco consumption. Lung cancer mortality was derived from vital registration records. Future tobacco consumption, tar content and the cohort parameter were projected by autoregressive moving average (ARIMA) estimation. The optimal exposure variable was found to be the product of average tar content and adult cigarette consumption per capita, lagged for 2530 years for both males and females in all 4 countries. The coefficient of the product of average tar content and tobacco consumption per capita differs by age and sex. In all models, there was a statistically significant difference in the coefficient of the period variable by sex. In all countries, male age-standardized lung cancer mortality rates peaked in the 1980s and declined thereafter. Female mortality rates are projected to peak in the first decade of this century. The multiplicative models of age, tobacco exposure and cohort fit the observed data between 1950 and 1999 reasonably well, and time-series models yield plausible past trends of relevant variables. Despite a significant reduction in tobacco consumption and average tar content of cigarettes sold over the past few decades, the effect on lung cancer mortality is affected by the time lag between exposure and established disease. As a result, the burden of lung cancer among females is only just reaching, or soon will reach, its peak but has been declining for I to 2 decades in men. Future sex differences in lung cancer mortality are likely to be greater in North America than Australia and the UK due to differences in exposure patterns between the sexes. (c) 2005 Wiley-Liss, Inc.