50 results for Penalized maximum likelihood

in University of Queensland eSpace - Australia


Relevance: 100.00%

Abstract:

Binning and truncation of data are common in data analysis and machine learning. This paper addresses the problem of fitting mixture densities to multivariate binned and truncated data. The EM approach proposed by McLachlan and Jones (Biometrics, 44: 2, 571-578, 1988) for the univariate case is generalized to multivariate measurements. The multivariate solution requires the evaluation of multidimensional integrals over each bin at each iteration of the EM procedure. Naive implementation of the procedure can lead to computationally inefficient results. To reduce the computational cost a number of straightforward numerical techniques are proposed. Results on simulated data indicate that the proposed methods can achieve significant computational gains with no loss in the accuracy of the final parameter estimates. Furthermore, experimental results suggest that with a sufficient number of bins and data points it is possible to estimate the true underlying density almost as well as if the data were not binned. The paper concludes with a brief description of an application of this approach to diagnosis of iron deficiency anemia, in the context of binned and truncated bivariate measurements of volume and hemoglobin concentration from an individual's red blood cells.
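The quantity that an EM procedure for binned and truncated data increases can be sketched in one dimension. The snippet below is an illustrative sketch, not the authors' multivariate implementation: it assumes a univariate normal mixture, contiguous bins given by their edges, and truncation to the range covered by the bins. The function names are hypothetical. Each bin probability is an integral of the mixture density over the bin (here a difference of normal CDFs), and the truncation is handled by renormalising over the observed bins.

```python
import math


def normal_cdf(x, mean, sd):
    """CDF of a normal distribution, via the error function."""
    return 0.5 * (1.0 + math.erf((x - mean) / (sd * math.sqrt(2.0))))


def bin_probabilities(edges, weights, means, sds):
    """Mixture probability mass falling in each bin [edges[j], edges[j+1])."""
    probs = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        p = sum(
            w * (normal_cdf(hi, m, s) - normal_cdf(lo, m, s))
            for w, m, s in zip(weights, means, sds)
        )
        probs.append(p)
    return probs


def binned_truncated_loglik(counts, edges, weights, means, sds):
    """Multinomial log-likelihood of bin counts, up to an additive constant.

    Data outside the bins are unobserved (truncated), so each bin
    probability is divided by the total mass inside the observed range.
    """
    probs = bin_probabilities(edges, weights, means, sds)
    total = sum(probs)
    return sum(n * math.log(p / total) for n, p in zip(counts, probs) if n > 0)
```

In the multivariate case described in the abstract, the CDF differences become integrals over multidimensional bins, which is where the computational cost arises.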

Relevance: 100.00%

Abstract:

There has been a resurgence of interest in the mean trace length estimator of Pahl for window sampling of traces. The estimator has been dealt with by Mauldon and by Zhang and Einstein in recent publications. The estimator is a very useful one in that it is non-parametric. However, despite some discussion regarding the statistical distribution of the estimator, none of the recent works or the original work by Pahl provide a rigorous basis for the determination of a confidence interval for the estimator or a confidence region for the estimator and the corresponding estimator of trace spatial intensity in the sampling window. This paper shows, by consideration of a simplified version of the problem but without loss of generality, that the estimator is in fact the maximum likelihood estimator (MLE) and that it can be considered essentially unbiased. As the MLE, it possesses the least variance of all estimators and confidence intervals or regions should therefore be available through application of classical ML theory. It is shown that valid confidence intervals can in fact be determined. The results of the work and the calculations of the confidence intervals are illustrated by example. (C) 2003 Elsevier Science Ltd. All rights reserved.

Relevance: 100.00%

Abstract:

We present a novel maximum-likelihood (ML) lattice-decoding algorithm for noncoherent block detection of QAM signals. The computational complexity is polynomial in the block length, making it feasible for implementation compared with the exhaustive search ML detector. The algorithm works by enumerating the nearest neighbor regions for a plane defined by the received vector, in a conceptually similar manner to sphere decoding. Simulations show that the new algorithm significantly outperforms existing approaches.

Relevance: 90.00%

Abstract:

A significant problem in the collection of responses to potentially sensitive questions, such as those relating to illegal, immoral or embarrassing activities, is non-sampling error due to refusal to respond or false responses. Eichhorn & Hayre (1983) suggested the use of scrambled responses to reduce this form of bias. This paper considers a linear regression model in which the dependent variable is unobserved but for which the sum or product with a scrambling random variable of known distribution is known. The performance of two likelihood-based estimators is investigated, namely a Bayesian estimator achieved through a Markov chain Monte Carlo (MCMC) sampling scheme, and a classical maximum-likelihood estimator. These two estimators and an estimator suggested by Singh, Joarder & King (1996) are compared. Monte Carlo results show that the Bayesian estimator outperforms the classical estimators in almost all cases, and the relative performance of the Bayesian estimator improves as the responses become more scrambled.

Relevance: 90.00%

Abstract:

In simultaneous analyses of multiple data partitions, the trees relevant when measuring support for a clade are the optimal tree, and the best tree lacking the clade (i.e., the most reasonable alternative). The parsimony-based method of partitioned branch support (PBS) forces each data set to arbitrate between the two relevant trees. This value is the amount each data set contributes to clade support in the combined analysis, and can be very different to support apparent in separate analyses. The approach used in PBS can also be employed in likelihood: a simultaneous analysis of all data retrieves the maximum likelihood tree, and the best tree without the clade of interest is also found. Each data set is fitted to the two trees and the log-likelihood difference calculated, giving partitioned likelihood support (PLS) for each data set. These calculations can be performed regardless of the complexity of the ML model adopted. The significance of PLS can be evaluated using a variety of resampling methods, such as the Kishino-Hasegawa test, the Shimodaira-Hasegawa test, or likelihood weights, although the appropriateness and assumptions of these tests remain debated.
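The PLS calculation described above reduces to a per-partition difference of log-likelihoods. The following is a minimal sketch under assumptions: the function name is hypothetical, and the log-likelihoods of each data set on the two trees are taken as already computed by a phylogenetics program.

```python
def partitioned_likelihood_support(lnl_ml_tree, lnl_without_clade):
    """Per-partition log-likelihood difference between the two relevant trees.

    lnl_ml_tree:       partition name -> log-likelihood on the ML tree
    lnl_without_clade: partition name -> log-likelihood on the best tree
                       that lacks the clade of interest
    A positive value means the partition supports the clade; a negative
    value means it favours the alternative tree.
    """
    if lnl_ml_tree.keys() != lnl_without_clade.keys():
        raise ValueError("both trees must be scored on the same partitions")
    return {
        name: lnl_ml_tree[name] - lnl_without_clade[name]
        for name in lnl_ml_tree
    }


# Hypothetical log-likelihoods, for illustration only.
pls = partitioned_likelihood_support(
    {"gene_a": -1200.5, "gene_b": -980.25},
    {"gene_a": -1203.0, "gene_b": -979.0},
)
total_support = sum(pls.values())
```

In this made-up example one partition supports the clade and the other conflicts with it, and the partition values sum to the log-likelihood difference of the combined analysis.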

Relevance: 80.00%

Abstract:

Marine invertebrate sperm proteins are particularly interesting because they are characterized by positive selection and are likely to be involved in prezygotic isolation and, thus, speciation. Here, we present the first survey of inter- and intraspecific variation of a bivalve sperm protein among a group of species that regularly hybridize in nature. M7 lysin is found in sperm acrosomes of mussels and dissolves the egg vitelline coat, permitting fertilization. We sequenced multiple alleles of the mature protein-coding region of M7 lysin from allopatric populations of mussels in the Mytilus edulis species group (M. edulis, M. galloprovincialis, and M. trossulus). A significant McDonald-Kreitman test showed an excess of fixed amino acid replacing substitutions between species, consistent with positive selection. In addition, Kolmogorov-Smirnov tests showed significant heterogeneity in polymorphism to divergence ratios for both synonymous variation and combined synonymous and non-synonymous variation within M. galloprovincialis. These results indicate that there has been adaptive evolution at M7 lysin and, furthermore, show that positive selection on sperm proteins can occur even when post-zygotic reproductive isolation is incomplete.

Relevance: 80.00%

Abstract:

The generalized Gibbs sampler (GGS) is a recently developed Markov chain Monte Carlo (MCMC) technique that enables Gibbs-like sampling of state spaces that lack a convenient representation in terms of a fixed coordinate system. This paper describes a new sampler, called the tree sampler, which uses the GGS to sample from a state space consisting of phylogenetic trees. The tree sampler is useful for a wide range of phylogenetic applications, including Bayesian, maximum likelihood, and maximum parsimony methods. A fast new algorithm to search for a maximum parsimony phylogeny is presented, using the tree sampler in the context of simulated annealing. The mathematics underlying the algorithm is explained and its time complexity is analyzed. The method is tested on two large data sets consisting of 123 sequences and 500 sequences, respectively. The new algorithm is shown to compare very favorably in terms of speed and accuracy to the program DNAPARS from the PHYLIP package.

Relevance: 80.00%

Abstract:

Most populations and some species of ticks of the genera Boophilus (5 spp.) and Rhipicephalus (ca. 75 spp.) cannot be distinguished phenotypically. Moreover, there is doubt about the validity of species in these genera. I studied the entire second internal transcribed spacer (ITS 2) rRNA of 16 populations of rhipicephaline ticks to address these problems: Boophilus microplus from Australia, Kenya, South Africa and Brazil (4 populations); Boophilus decoloratus from Kenya; Rhipicephalus appendiculatus from Kenya, Zimbabwe and Zambia (7 populations); Rhipicephalus zambeziensis from Zimbabwe (3 populations); and Rhipicephalus evertsi from Kenya. Each of the 16 populations had a unique ITS 2, but most of the nucleotide variation occurred among species and genera. ITS 2 rRNA can be used to distinguish the populations and species of Boophilus and Rhipicephalus studied here. Little support was found for the hypothesis that B. microplus from Australia and South Africa are different species. ITS 2 appears useful for phylogenetic inference in the Rhipicephalinae because in genetic distance, maximum likelihood, and maximum parsimony analyses, most branches leading to species had >95% bootstrap support. Rhipicephalus appendiculatus and R. zambeziensis are closely related, yet their ITS 2 sequences could be distinguished unambiguously. This lends weight to a previous proposal that Rhipicephalus sanguineus and Rhipicephalus turanicus, and Rhipicephalus pumilio and Rhipicephalus camicasi, respectively, are conspecific, because each of these pairs of species had identical sequences for ca. 250 bp of ITS 2 rRNA.

Relevance: 80.00%

Abstract:

A mixture model for long-term survivors has been adopted in various fields such as biostatistics and criminology where some individuals may never experience the type of failure under study. It is directly applicable in situations where the only information available from follow-up on individuals who will never experience this type of failure is in the form of censored observations. In this paper, we consider a modification to the model so that it still applies in the case where during the follow-up period it becomes known that an individual will never experience failure from the cause of interest. Unless a model allows for this additional information, a consistent survival analysis will not be obtained. A partial maximum likelihood (ML) approach is proposed that preserves the simplicity of the long-term survival mixture model and provides consistent estimators of the quantities of interest. Some simulation experiments are performed to assess the efficiency of the partial ML approach relative to the full ML approach for survival in the presence of competing risks.
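For orientation, the standard long-term survivor mixture model that the abstract starts from can be written as a short log-likelihood. This is a sketch of the unmodified model only, not the partial ML approach the paper proposes. It assumes, purely for illustration, an exponential failure-time distribution for susceptible individuals; the function and parameter names are hypothetical.

```python
import math


def cure_mixture_loglik(times, events, cure_fraction, rate):
    """Log-likelihood of a long-term survivor mixture model.

    times:         follow-up time for each individual
    events:        1 if failure was observed, 0 if censored
    cure_fraction: probability of never experiencing the failure
    rate:          exponential failure rate among susceptibles (assumed form)
    """
    loglik = 0.0
    for t, d in zip(times, events):
        surv_susceptible = math.exp(-rate * t)
        if d:
            # Observed failure: individual is susceptible and fails at t.
            loglik += math.log((1.0 - cure_fraction) * rate * surv_susceptible)
        else:
            # Censored: either a long-term survivor, or susceptible
            # but not yet failed by time t.
            loglik += math.log(
                cure_fraction + (1.0 - cure_fraction) * surv_susceptible
            )
    return loglik
```

The censored term shows why censoring is the only source of information on long-term survivors in this form of the model, which is the limitation the abstract's modification addresses.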

Relevance: 80.00%

Abstract:

Hemichordates were traditionally allied to the chordates, but recent molecular analyses have suggested that hemichordates are a sister group to the echinoderms, a relationship that has important consequences for the interpretation of the evolution of deuterostome body plans. However, the molecular phylogenetic analyses to date have not provided robust support for the hemichordate + echinoderm clade. We use a maximum likelihood framework, including the parametric bootstrap, to reanalyze DNA data from complete mitochondrial genomes and nuclear 18S rRNA. This approach provides the first statistically significant support for the hemichordate + echinoderm clade from molecular data. This grouping implies that the ancestral deuterostome had features that included an adult with a pharynx and a dorsal nerve cord and an indirectly developing dipleurula-like larva.

Relevance: 80.00%

Abstract:

Objective: To measure prevalence and model incidence of HIV infection. Setting: 2013 consecutive pregnant women attending public sector antenatal clinics in 1997 in Hlabisa health district, South Africa. Historical seroprevalence data, 1992-1995. Methods: Serum remaining from syphilis testing was tested anonymously for antibodies to HIV to determine seroprevalence. Two models, allowing for differential mortality between HIV-positive and HIV-negative people, were used. The first used serial seroprevalence data to estimate trends in annual incidence. The second, a maximum likelihood model, took account of changing force of infection and age-dependent risk of infection, to estimate age-specific HIV incidence in 1997. Multiple logistic regression provided adjusted odds ratios (OR) for risk factors for prevalent HIV infection. Results: Estimated annual HIV incidence increased from 4% in 1992/1993 to 10% in 1996/1997. In 1997, highest age-specific incidence was 16% among women aged between 20 and 24 years. In 1997, overall prevalence was 26% (95% confidence interval [CI], 24%-28%) and at 34% was highest among women aged between 20 and 24 years. Young age (<30 years; odds ratio [OR], 2.1; p = .001), unmarried status (OR 2.2; p = .001) and living in less remote parts of the district (OR 1.5; p = .002) were associated with HIV prevalence in univariate analysis. Associations were less strong in multivariate analysis. Partner's migration status was not associated with HIV infection. Substantial heterogeneity of HIV prevalence by clinic was observed (range 17%-31%; test for trend, p = .001). Conclusions: This community is experiencing an explosive HIV epidemic. Young, single women in the more developed parts of the district would form an appropriate cohort to test, and benefit from, interventions such as vaginal microbicides and HIV vaccines.

Relevance: 80.00%

Abstract:

Normal mixture models are being increasingly used to model the distributions of a wide variety of random phenomena and to cluster sets of continuous multivariate data. However, for a set of data containing a group or groups of observations with longer than normal tails or atypical observations, the use of normal components may unduly affect the fit of the mixture model. In this paper, we consider a more robust approach by modelling the data by a mixture of t distributions. The use of the ECM algorithm to fit this t mixture model is described and examples of its use are given in the context of clustering multivariate data in the presence of atypical observations in the form of background noise.
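The robustness of a t-based fit comes from the weights that EM-type algorithms attach to each observation. The sketch below is an illustration under simplifying assumptions, not the paper's ECM implementation: it uses a single univariate t component with fixed scale and degrees of freedom, and the function names are hypothetical. The weight formula (nu + p) / (nu + squared Mahalanobis distance), with dimension p = 1 here, is the standard one for t-distribution EM-type algorithms rather than something stated in the abstract.

```python
def t_weight(y, mu, sigma, nu):
    """Weight of observation y under a univariate t component.

    Observations far from the centre (large squared distance) get
    weights near zero, so they pull less on the location estimate.
    """
    delta = ((y - mu) / sigma) ** 2  # squared Mahalanobis distance, p = 1
    return (nu + 1.0) / (nu + delta)


def robust_mean_update(data, mu, sigma, nu):
    """One weighted-mean update of the location parameter."""
    weights = [t_weight(y, mu, sigma, nu) for y in data]
    return sum(w * y for w, y in zip(weights, data)) / sum(weights)


# Three typical points and one atypical observation (made-up data).
data = [-1.0, 0.0, 1.0, 10.0]
updated_mu = robust_mean_update(data, mu=0.0, sigma=1.0, nu=4.0)
```

With these made-up values the updated location stays close to the bulk of the data, whereas the ordinary mean (2.5) is dragged toward the atypical point. In a mixture, each weight would additionally be multiplied by the posterior probability of component membership.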

Relevance: 80.00%

Abstract:

We present a method of estimating HIV incidence rates in epidemic situations from data on age-specific prevalence and changes in the overall prevalence over time. The method is applied to women attending antenatal clinics in Hlabisa, a rural district of KwaZulu/Natal, South Africa, where transmission of HIV is overwhelmingly through heterosexual contact. A model which gives age-specific prevalence rates in the presence of a progressing epidemic is fitted to prevalence data for 1998 using maximum likelihood methods and used to derive the age-specific incidence. Error estimates are obtained using a Monte Carlo procedure. Although the method is quite general, some simplifying assumptions are made concerning the form of the risk function, and sensitivity analyses are performed to explore the importance of these assumptions. The analysis shows that in 1998 the annual incidence of infection per susceptible woman increased from 5.4 per cent (3.3-8.5 per cent; here and elsewhere ranges give 95 per cent confidence limits) at age 15 years to 24.5 per cent (20.6-29.1 per cent) at age 22 years and declined to 1.3 per cent (0.5-2.9 per cent) at age 50 years; standardized to a uniform age distribution, the overall incidence per susceptible woman aged 15 to 59 was 11.4 per cent (10.0-13.1 per cent); per woman in the population it was 8.4 per cent (7.3-9.5 per cent). Standardized to the age distribution of the female population the average incidence per woman was 9.6 per cent (8.4-11.0 per cent); standardized to the age distribution of women attending antenatal clinics, it was 11.3 per cent (9.8-13.3 per cent). The estimated incidence depends on the values used for the epidemic growth rate and the AIDS-related mortality. To ensure that, for this population, errors in these two parameters change the age-specific estimates of the annual incidence by less than the standard deviation of the estimates of the age-specific incidence, the AIDS-related mortality should be known to within +/-50 per cent and the epidemic growth rate to within +/-25 per cent, both of which conditions are met. In the absence of cohort studies to measure the incidence of HIV infection directly, useful estimates of the age-specific incidence can be obtained from cross-sectional, age-specific prevalence data and repeat cross-sectional data on the overall prevalence of HIV infection. Several assumptions were made because of the lack of data but sensitivity analyses show that they are unlikely to affect the overall estimates significantly. These estimates are important in assessing the magnitude of the public health problem, for designing vaccine trials and for evaluating the impact of interventions. Copyright (C) 2001 John Wiley & Sons, Ltd.