18 results for hidden semi markov models
at University of Queensland eSpace - Australia
Abstract:
The chromodomain is 40-50 amino acids in length and is conserved in a wide range of chromatin-associated and regulatory proteins involved in chromatin remodeling. Chromodomain-containing proteins can be classified into families based on their broader characteristics, in particular the presence of other types of domains, which correlate with different subclasses of the chromodomains themselves. Hidden Markov model (HMM)-generated profiles of different subclasses of chromodomains were used here to identify sequences encoding chromodomain-containing proteins in the mouse transcriptome and genome. A total of 36 different loci encoding proteins containing chromodomains, including 17 novel loci, were identified. Six of these loci (including three apparent pseudogenes, a novel HP1 ortholog, and two novel Msl-3 transcription factor-like proteins) are not present in the human genome, whereas the human genome contains four loci (two CDY orthologs and two apparent CDY pseudogenes) that are not present in mouse. A number of these loci exhibit alternative splicing to produce different isoforms, including 43 novel variants, some of which lack the chromodomain. The likely functions of these proteins are discussed in relation to the known functions of other chromodomain-containing proteins within the same family.
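The profile-based search described above can be illustrated with a toy log-odds scorer. The sketch below is a deliberately stripped-down stand-in for profile-HMM search (match states only, no insert/delete states, a 4-letter toy alphabet); the profile values are invented for illustration, not derived from real chromodomain alignments, and real tools such as HMMER do considerably more.

```python
import math

# Hypothetical 3-column "profile" over a toy alphabet, imagined as
# built from an alignment of related domain sequences.
BACKGROUND = {"A": 0.25, "C": 0.25, "D": 0.25, "E": 0.25}
PROFILE = [  # per-position emission probabilities (match states only)
    {"A": 0.70, "C": 0.10, "D": 0.10, "E": 0.10},
    {"A": 0.10, "C": 0.70, "D": 0.10, "E": 0.10},
    {"A": 0.10, "C": 0.10, "D": 0.70, "E": 0.10},
]

def log_odds_score(window):
    """Sum of per-position log-odds (profile vs background)."""
    return sum(math.log(PROFILE[i][aa] / BACKGROUND[aa])
               for i, aa in enumerate(window))

def best_window(seq):
    """Best-scoring ungapped window of profile length in seq."""
    L = len(PROFILE)
    return max((log_odds_score(seq[i:i + L]), i)
               for i in range(len(seq) - L + 1))

score, pos = best_window("EEACDEE")
```

A positive score marks a window that fits the profile better than the background model; scanning many sequences this way is the essence of profile-based domain identification.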
Abstract:
Wurst is a protein threading program with an emphasis on high-quality sequence-to-structure alignments (http://www.zbh.uni-hamburg.de/wurst). Submitted sequences are aligned to each of about 3000 templates with a conventional dynamic programming algorithm, but using a score function with sophisticated structure and sequence terms. The structure terms are a log-odds probability of sequence-to-structure fragment compatibility, obtained from a Bayesian classification procedure. A simplex optimization was used to optimize the sequence-based terms for the goal of alignment and model quality, and to balance the sequence and structural contributions against each other. Both sequence and structural terms operate with sequence profiles.
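The "conventional dynamic programming algorithm" mentioned above can be made concrete with a minimal global (Needleman-Wunsch) alignment. This sketch uses plain match/mismatch/gap scores in place of Wurst's combined sequence and structure terms, which are not reproduced here.

```python
def align_score(seq, template, match=2, mismatch=-1, gap=-2):
    """Global (Needleman-Wunsch) alignment score of seq vs template.
    The flat match/mismatch/gap scores stand in for a threading
    program's richer sequence + structure score function."""
    n, m = len(seq), len(template)
    # F[i][j] = best score aligning seq[:i] with template[:j]
    F = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        F[i][0] = i * gap
    for j in range(1, m + 1):
        F[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if seq[i - 1] == template[j - 1] else mismatch
            F[i][j] = max(F[i - 1][j - 1] + s,   # align residue pair
                          F[i - 1][j] + gap,     # gap in template
                          F[i][j - 1] + gap)     # gap in sequence
    return F[n][m]

score = align_score("GATTACA", "GCATGCA")
```

Threading repeats exactly this kind of scan once per template, keeping the highest-scoring template as the structural model.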
Abstract:
Promiscuous human leukocyte antigen (HLA) binding peptides are ideal targets for vaccine development. Existing computational models for prediction of promiscuous peptides used hidden Markov models and artificial neural networks as prediction algorithms. We report a system based on support vector machines that outperforms previously published methods. Preliminary testing showed that it can predict peptides binding to HLA-A2 and -A3 super-type molecules with excellent accuracy, even for molecules where no binding data are currently available.
Abstract:
MULTIPRED is a web-based computational system for the prediction of peptide binding to multiple molecules (proteins) belonging to human leukocyte antigen (HLA) class I A2 and A3 and class II DR supertypes. It uses hidden Markov models and artificial neural network methods as predictive engines. A novel data representation method enables MULTIPRED to predict peptides that promiscuously bind multiple HLA alleles within one HLA supertype. Extensive testing was performed for validation of the prediction models. Testing results show that MULTIPRED is both sensitive and specific, and it has good predictive ability (area under the receiver operating characteristic curve AROC > 0.80). MULTIPRED can be used for mapping promiscuous T-cell epitopes as well as regions with a high concentration of these targets, termed T-cell epitope hotspots. MULTIPRED is available at http://antigen.i2r.a-star.edu.sg/multipred/.
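The quoted AROC > 0.80 criterion can be made concrete with a small rank-based AUC computation, using the Mann-Whitney interpretation of the area under the ROC curve. The scores below are hypothetical, not MULTIPRED output.

```python
def roc_auc(pos_scores, neg_scores):
    """Area under the ROC curve via the Mann-Whitney statistic:
    the probability that a randomly chosen binder scores above a
    randomly chosen non-binder (ties count half)."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))

# Hypothetical predictor scores for binding vs non-binding peptides.
auc = roc_auc([0.9, 0.8, 0.35], [0.7, 0.3, 0.1])
```

An AUC of 1.0 means perfect ranking of binders above non-binders; 0.5 is chance, so the paper's > 0.80 threshold indicates a usefully discriminative predictor.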
Abstract:
We present new measurements of the luminosity function (LF) of luminous red galaxies (LRGs) from the Sloan Digital Sky Survey (SDSS) and the 2dF-SDSS LRG and Quasar (2SLAQ) survey. We have carefully quantified, and corrected for, uncertainties in the K and evolutionary corrections, differences in the colour selection methods, and the effects of photometric errors, thus ensuring we are studying the same galaxy population in both surveys. Using a limited subset of 6326 SDSS LRGs (with 0.17 < z < 0.24) and 1725 2SLAQ LRGs (with 0.5 < z < 0.6), for which the matching colour selection is most reliable, we find no evidence for any additional evolution in the LRG LF, over this redshift range, beyond that expected from a simple passive evolution model. This lack of additional evolution is quantified using the comoving luminosity density of SDSS and 2SLAQ LRGs brighter than M_0.2r - 5 log h_0.7 = -22.5, which are (2.51 ± 0.03) × 10^-7 and (2.44 ± 0.15) × 10^-7 L_⊙ Mpc^-3, respectively (< 10 per cent uncertainty). We compare our LFs to the COMBO-17 data and find excellent agreement over the same redshift range. Together, these surveys show no evidence for additional evolution (beyond passive) in the LF of LRGs brighter than M_0.2r - 5 log h_0.7 = -21 (or brighter than ~L*). We test our SDSS and 2SLAQ LFs against a simple 'dry merger' model for the evolution of massive red galaxies and find that at least half of the LRGs at z ≈ 0.2 must already have been well assembled (with more than half their stellar mass) by z ≈ 0.6. This limit is barely consistent with recent results from semi-analytical models of galaxy evolution.
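As a quick sanity check on the quoted luminosity densities, one can combine the two uncertainties in quadrature (a simplifying assumption; the paper's own error analysis is more involved) and see that the difference between the two surveys is well under one sigma, consistent with purely passive evolution.

```python
import math

# Comoving luminosity densities quoted in the abstract, in units of
# 1e-7 L_sun Mpc^-3: SDSS LRGs (z ~ 0.2) and 2SLAQ LRGs (z ~ 0.55).
sdss, sdss_err = 2.51, 0.03
slaq, slaq_err = 2.44, 0.15

diff = sdss - slaq
# Quadrature combination of the quoted errors; a simplifying
# assumption, ignoring any correlated systematics between surveys.
diff_err = math.hypot(sdss_err, slaq_err)
n_sigma = diff / diff_err   # significance of the difference
```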
Abstract:
We present a detailed investigation into the recent star formation histories of 5697 luminous red galaxies (LRGs) based on the Hδ (4101 Å) and [O II] (3727 Å) lines and the D4000 index. LRGs are luminous (L > 3L*) galaxies which have been selected to have photometric properties consistent with an old, passively evolving stellar population. For this study, we utilize LRGs from the recently completed 2dF-SDSS LRG and QSO Survey (2SLAQ). Equivalent widths of the Hδ and [O II] lines are measured and used to define three spectral types: those with only strong Hδ absorption (k+a), those with strong [O II] in emission (em) and those with both (em+a). All other LRGs are considered to have passive star formation histories. The vast majority of LRGs are found to be passive (~80 per cent); however, significant numbers of k+a (2.7 per cent), em+a (1.2 per cent) and em LRGs (8.6 per cent) are identified. An investigation into the redshift dependence of the fractions is also performed. A sample of SDSS MAIN galaxies with colours and luminosities consistent with the 2SLAQ LRGs is selected to provide a low-redshift comparison. While the em and em+a fractions are consistent with the low-redshift SDSS sample, the fraction of k+a LRGs is found to increase significantly with redshift. This result is interpreted as an indication of an increasing amount of recent star formation activity in LRGs with redshift. By considering the expected lifetime of the k+a phase, the number of LRGs which will undergo a k+a phase can be estimated. A crude comparison of this estimate with the predictions from semi-analytic models of galaxy formation shows that the predicted level of k+a and em+a activity is not sufficient to reconcile the predicted mass growth for massive early types in a hierarchical merging scenario.
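The equivalent-width-based typing described above amounts to a pair of threshold tests. In the sketch below the 2 Å cuts are illustrative placeholders, not the survey's actual selection criteria.

```python
def classify_lrg(ew_hdelta_abs, ew_oii_em,
                 hdelta_cut=2.0, oii_cut=2.0):
    """Assign a spectral type from equivalent widths (in Angstroms).
    The 2.0 A cuts are hypothetical placeholders for the survey's
    real thresholds."""
    strong_hd = ew_hdelta_abs > hdelta_cut   # strong Hdelta absorption
    strong_oii = ew_oii_em > oii_cut         # strong [O II] emission
    if strong_hd and strong_oii:
        return "em+a"
    if strong_hd:
        return "k+a"
    if strong_oii:
        return "em"
    return "passive"

types = [classify_lrg(0.5, 0.3), classify_lrg(4.0, 0.1),
         classify_lrg(0.8, 6.0), classify_lrg(5.0, 7.0)]
```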
Abstract:
New high-precision niobium (Nb) and tantalum (Ta) concentration data are presented for early Archaean metabasalts, metabasaltic komatiites and their erosion products (mafic metapelites) from SW Greenland and the Acasta gneiss complex, Canada. Individual datasets consistently show sub-chondritic Nb/Ta ratios averaging 15.1 ± 11.6. This finding is discussed with regard to two competing models for the solution of the Nb deficit that characterises the accessible Earth. Firstly, we test whether Nb could have been sequestered into the core due to its slightly siderophile (or chalcophile) character under very reducing conditions, as recently proposed from experimental evidence. We demonstrate that troilite inclusions of the Canyon Diablo iron meteorite have Nb and V concentrations in excess of typical chondrites, but that the metal phases of the Grant, Toluca and Canyon Diablo iron meteorites do not have significant concentrations of these lithophile elements. We find that if the entire accessible-Earth Nb deficit were explained by Nb in the core, only ca. 17% of the mantle could be depleted, and that by 3.7 Ga continental crust would have already achieved ca. 50% of its present mass. Nb/Ta systematics of late Archaean metabasalts compiled from the literature would further require that by 2.5 Ga, 90% of the present mass of continental crust was already in existence. As an alternative to this explanation, we propose that the average Nb/Ta ratio (15.1 ± 11.6) of Earth's oldest mafic rocks is a valid approximation for the bulk silicate Earth. This would require that ca. 13% of the terrestrial Nb resided in the Ta-free core. Since the partitioning of Nb between silicate and metal melts depends largely on oxygen fugacity and pressure, this finding could mean that metal/silicate segregation did not occur at the base of a deep magma ocean, or that the early mantle was slightly less reducing than generally assumed.
A bulk silicate Earth Nb/Ta ratio of 15.1 allows for depletion of up to 40% of the total mantle. This could indicate that, in addition to the upper mantle, a portion of the lower mantle is also depleted, or, if only the upper mantle were depleted, that an additional hidden high-Nb/Ta reservoir must exist. Comparison of Nb/Ta systematics between early and late Archaean metabasalts supports the latter idea and indicates that deeply subducted high-Nb/Ta eclogite slabs could reside in the mantle transition zone or the lower mantle. Accumulation of such slabs appears to have commenced between 2.5 and 2.0 Ga. Regardless of these complexities of terrestrial Nb/Ta systematics, it is shown that the depleted-mantle Nb/Th ratio is a very robust proxy for the amount of extracted continental crust, because the temporal evolution of this ratio is dominated by Th loss to the continents and not Nb retention in the mantle. We present a new parameterisation of the continental crust volume versus age curve that specifically explores the possibility of lithophile element loss to the core and storage of eclogite slabs in the transition zone. (C) 2003 Elsevier Science B.V. All rights reserved.
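The figure of "ca. 13% of the terrestrial Nb resided in the Ta-free core" follows from a one-line mass balance, provided one adopts a chondritic reference ratio. The chondritic value of 17.4 used below is an assumed reference for illustration, not a number taken from this abstract.

```python
# If a fraction f of Earth's Nb (and none of its Ta) is hidden in the
# core, the silicate Earth ratio is (1 - f) times the bulk ratio:
#   (Nb/Ta)_BSE = (1 - f) * (Nb/Ta)_chondritic
nb_ta_bse = 15.1          # proposed bulk silicate Earth ratio (abstract)
nb_ta_chondritic = 17.4   # assumed chondritic reference ratio

f_core = 1.0 - nb_ta_bse / nb_ta_chondritic   # fraction of Nb in core
```

With these inputs the mass balance gives roughly 13% of Earth's Nb in the core, matching the abstract's figure.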
Abstract:
A recent development of the Markov chain Monte Carlo (MCMC) technique is the emergence of MCMC samplers that allow transitions between different models. Such samplers make possible a range of computational tasks involving models, including model selection, model evaluation, model averaging and hypothesis testing. An example of this type of sampler is the reversible jump MCMC sampler, which is a generalization of the Metropolis-Hastings algorithm. Here, we present a new MCMC sampler of this type. The new sampler is a generalization of the Gibbs sampler, but somewhat surprisingly, it also turns out to encompass as particular cases all of the well-known MCMC samplers, including those of Metropolis, Barker, and Hastings. Moreover, the new sampler generalizes the reversible jump MCMC. It therefore appears to be a very general framework for MCMC sampling. This paper describes the new sampler and illustrates its use in three applications in Computational Biology, specifically determination of consensus sequences, phylogenetic inference and delineation of isochores via multiple change-point analysis.
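For readers new to MCMC, the Metropolis-Hastings algorithm that the new sampler generalizes can be sketched in a few lines. This is the plain fixed-dimension random-walk version sampling a standard normal, not the generalized Gibbs sampler of the paper.

```python
import math
import random

def metropolis_hastings(log_target, x0, n_steps, step=1.0, seed=0):
    """Random-walk Metropolis: symmetric Gaussian proposal, accept
    with probability min(1, target(x') / target(x))."""
    rng = random.Random(seed)
    x, samples = x0, []
    for _ in range(n_steps):
        proposal = x + rng.gauss(0.0, step)
        log_alpha = log_target(proposal) - log_target(x)
        if rng.random() < math.exp(min(0.0, log_alpha)):
            x = proposal           # accept the move
        samples.append(x)          # on rejection the old state repeats
    return samples

# Target: standard normal, via its log density up to a constant.
samples = metropolis_hastings(lambda x: -0.5 * x * x, 0.0, 20000)
mean = sum(samples) / len(samples)
```

Reversible jump and the paper's generalized Gibbs sampler extend this accept/reject recipe to proposals that change the model, and hence the dimension of the state, between steps.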
Abstract:
Mixture models implemented via the expectation-maximization (EM) algorithm are increasingly used in a wide range of pattern recognition problems such as image segmentation. However, the EM algorithm requires considerable computational time when applied to huge data sets such as a three-dimensional magnetic resonance (MR) image of over 10 million voxels. Recently, it was shown that a sparse, incremental version of the EM algorithm could improve its rate of convergence. In this paper, we show how this modified EM algorithm can be sped up further by adopting a multiresolution kd-tree structure in performing the E-step. The proposed algorithm outperforms some other variants of the EM algorithm for segmenting MR images of the human brain. (C) 2004 Pattern Recognition Society. Published by Elsevier Ltd. All rights reserved.
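A plain batch EM fit of a two-component 1-D Gaussian mixture, shown below, fixes ideas about the E- and M-steps being discussed; the paper's sparse, incremental E-step over a multiresolution kd-tree is not reproduced here.

```python
import math
import random

def em_gmm_1d(data, n_iter=50):
    """Standard batch EM for a two-component 1-D Gaussian mixture.
    (The paper accelerates the E-step with a sparse, incremental
    kd-tree scheme; this sketch is the plain version.)"""
    mu = [min(data), max(data)]          # crude initial means
    var = [1.0, 1.0]
    w = [0.5, 0.5]
    for _ in range(n_iter):
        # E-step: responsibility of each component for each point
        resp = []
        for x in data:
            p = [w[k] / math.sqrt(2 * math.pi * var[k])
                 * math.exp(-(x - mu[k]) ** 2 / (2 * var[k]))
                 for k in range(2)]
            s = p[0] + p[1]
            resp.append([p[0] / s, p[1] / s])
        # M-step: re-estimate weights, means and variances
        for k in range(2):
            nk = sum(r[k] for r in resp)
            w[k] = nk / len(data)
            mu[k] = sum(r[k] * x for r, x in zip(resp, data)) / nk
            var[k] = sum(r[k] * (x - mu[k]) ** 2
                         for r, x in zip(resp, data)) / nk
            var[k] = max(var[k], 1e-6)   # guard against collapse

    return w, mu, var

rng = random.Random(1)
data = [rng.gauss(0.0, 1.0) for _ in range(300)] + \
       [rng.gauss(6.0, 1.0) for _ in range(300)]
w, mu, var = em_gmm_1d(data)
```

In image segmentation each voxel intensity plays the role of a data point, which is why the per-point E-step dominates the cost and is the natural target for the kd-tree speed-up.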
Abstract:
We investigate whether the relative contributions of genetic and shared environmental factors are associated with an increased risk of melanoma. Data from the Queensland Familial Melanoma Project, comprising 15,907 subjects from 1912 families, were analyzed to estimate the additive genetic, common and unique environmental contributions to variation in the age at onset of melanoma. Two complementary approaches for analyzing correlated time-to-onset family data were considered: the generalized estimating equations (GEE) method, in which one can estimate relationship-specific dependence simultaneously with regression coefficients that describe the average population response to changing covariates; and a subject-specific Bayesian mixed model, in which heterogeneity in regression parameters is explicitly modeled and the different components of variation may be estimated directly. The proportional hazards and Weibull models were utilized, as both produce natural frameworks for estimating relative risks while adjusting for simultaneous effects of other covariates. A simple Markov chain Monte Carlo method for covariate imputation of missing data was used, and the actual implementation of the Bayesian model was based on Gibbs sampling using the freeware package BUGS. In addition, we also used a Bayesian model to investigate the relative contribution of genetic and environmental effects on the expression of naevi and freckles, which are known risk factors for melanoma.
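The Weibull proportional-hazards framework mentioned above can be sketched as follows; the shape, scale and effect-size values are hypothetical, chosen only to show how a covariate coefficient translates into a relative risk.

```python
import math

def weibull_hazard(t, shape, scale, beta, x):
    """Weibull proportional-hazards rate:
    h(t | x) = (shape/scale) * (t/scale)**(shape-1) * exp(beta * x)."""
    return (shape / scale) * (t / scale) ** (shape - 1) * math.exp(beta * x)

# Hypothetical effect size: beta = log(2) means carrying the covariate
# (x = 1 vs x = 0) doubles the hazard at every age, i.e. relative risk 2.
beta = math.log(2.0)
rr = weibull_hazard(50.0, 3.0, 70.0, beta, 1) / \
     weibull_hazard(50.0, 3.0, 70.0, beta, 0)
```

Because the covariate multiplies the baseline hazard at every age, the relative risk exp(beta) does not depend on t; that proportionality is what both models in the paper exploit.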
A simulation model of cereal-legume intercropping systems for semi-arid regions I. Model development
Abstract:
Cereal-legume intercropping plays an important role in subsistence food production in developing countries, especially in situations of limited water resources. Crop simulation can be used to assess risk for intercrop productivity over time and space. In this study, a simple model for intercropping was developed for cereal and legume growth and yield, under semi-arid conditions. The model is based on radiation interception and use, and incorporates a water stress factor. Total dry matter and yield are functions of photosynthetically active radiation (PAR), the fraction of radiation intercepted and radiation use efficiency (RUE). One of two PAR sub-models was used to estimate PAR from solar radiation; either PAR is 50% of solar radiation or the ratio of PAR to solar radiation (PAR/SR) is a function of the clearness index (K-T). The fraction of radiation intercepted was calculated either based on Beer's Law with crop extinction coefficients (K) from field experiments or from previous reports. RUE was calculated as a function of available soil water to a depth of 900 mm (ASW). Either the soil water balance method or the decay curve approach was used to determine ASW. Thus, two alternatives for each of three factors, i.e., PAR/SR, K and ASW, were considered, giving eight possible models (2^3 = 8 combinations). The model calibration and validation were carried out with maize-bean intercropping systems using data collected in a semi-arid region (Bloemfontein, Free State, South Africa) during seven growing seasons (1996/1997-2002/2003). The combination of PAR estimated from the clearness index, a crop extinction coefficient from the field experiment and the decay curve model gave the most reasonable and acceptable result. The intercrop model developed in this study is simple, so this modelling approach can be employed to develop other cereal-legume intercrop models for semi-arid regions. (c) 2004 Elsevier B.V. All rights reserved.
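The model's core relation (total dry matter as PAR multiplied by the Beer's-law intercepted fraction and by RUE) can be sketched directly. The parameter values below are hypothetical, the simpler "PAR = 50% of solar radiation" sub-model is used, and the water-stress factor on RUE is omitted for brevity.

```python
import math

def daily_dry_matter(solar_rad, lai, k, rue, par_fraction=0.5):
    """Daily dry matter gain (g m^-2) from the radiation-use framework:
    PAR * fraction intercepted (Beer's law) * RUE.
    solar_rad: MJ m^-2 day^-1; lai: leaf area index;
    k: canopy extinction coefficient; rue: g DM per MJ PAR.
    The water-stress reduction of RUE is deliberately omitted here."""
    par = par_fraction * solar_rad          # 50%-of-solar sub-model
    f_intercepted = 1.0 - math.exp(-k * lai)  # Beer's law
    return par * f_intercepted * rue

# Hypothetical maize-like parameters on a clear 22 MJ m^-2 day.
dm = daily_dry_matter(solar_rad=22.0, lai=3.0, k=0.6, rue=3.0)
```

Summing such daily increments over the season, with RUE scaled down by the available-soil-water factor, yields the seasonal total dry matter that the model partitions into yield.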
Abstract:
Let (Φ_t)_{t ∈ R+} be a Harris ergodic continuous-time Markov process on a general state space, with invariant probability measure π. We investigate the rates of convergence of the transition function P^t(x, ·) to π; specifically, we find conditions under which r(t) ||P^t(x, ·) - π|| → 0 as t → ∞, for suitable subgeometric rate functions r(t), where || · || denotes the usual total variation norm for a signed measure. We derive sufficient conditions for the convergence to hold, in terms of the existence of suitable points on which the first hitting time moments are bounded. In particular, for stochastically ordered Markov processes, explicit bounds on subgeometric rates of convergence are obtained. These results are illustrated in several examples.
Abstract:
We derive necessary and sufficient conditions for the existence of bounded or summable solutions to systems of linear equations associated with Markov chains. This substantially extends a famous result of G. E. H. Reuter, which provides a convenient means of checking various uniqueness criteria for birth-death processes. Our result allows chains with much more general transition structures to be accommodated. One application is to give a new proof of an important result of M. F. Chen concerning upwardly skip-free processes. We then use our generalization of Reuter's lemma to prove new results for downwardly skip-free chains, such as the Markov branching process and several of its many generalizations. This permits us to establish uniqueness criteria for several models, including the general birth, death, and catastrophe process, extended branching processes, and asymptotic birth-death processes, the latter being neither upwardly skip-free nor downwardly skip-free.
Abstract:
This paper has three primary aims: to establish an effective means for modelling mainland-island metapopulations inhabiting a dynamic landscape; to investigate the effect of immigration and dynamic changes in habitat on metapopulation patch occupancy dynamics; and to illustrate the implications of our results for decision-making and population management. We first extend the mainland-island metapopulation model of Alonso and McKane [Bull. Math. Biol. 64:913-958, 2002] to incorporate a dynamic landscape. It is shown, for both the static and the dynamic landscape models, that a suitably scaled version of the process converges to a unique deterministic model as the size of the system becomes large. We also establish that, under quite general conditions, the density of occupied patches, and the densities of suitable and occupied patches, for the respective models, have approximate normal distributions. Our results not only provide us with estimates for the means and variances that are valid at all stages in the evolution of the population, but also provide a tool for fitting the models to real metapopulations. We discuss the effect of immigration and habitat dynamics on metapopulations, showing that mainland-like patches heavily influence metapopulation persistence, and we argue for adopting measures to increase connectivity between this large patch and the other island-like patches. We illustrate our results with specific reference to examples of butterfly populations and the grasshopper Bryodema tuberculata.
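A deterministic mainland-island occupancy model of the kind whose limit is discussed above can be sketched as a Levins-type ODE with mainland immigration. This is a deliberately simplified stand-in for the Alonso-McKane formulation (no dynamic landscape, hypothetical rate values), integrated with Euler steps.

```python
def simulate_occupancy(c, m, e, p0=0.0, dt=0.01, t_end=200.0):
    """Euler integration of a Levins-type mainland-island model:
        dp/dt = (c * p + m) * (1 - p) - e * p
    where p is the density of occupied patches, c the internal
    colonization rate, m the immigration rate from the mainland and
    e the extinction rate."""
    p = p0
    for _ in range(int(t_end / dt)):
        p += dt * ((c * p + m) * (1.0 - p) - e * p)
    return p

# With mainland immigration (m > 0) the population persists even when
# internal colonization alone could not sustain it (here c < e).
p_star = simulate_occupancy(c=0.1, m=0.05, e=0.2)
```

Setting m = 0 in the same run drives occupancy to zero, which is the deterministic counterpart of the paper's finding that mainland-like patches heavily influence persistence.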