27 resultados para Logistic regression mixture models
em CentAUR: Central Archive University of Reading - UK
Resumo:
The contribution investigates the problem of estimating the size of a population, also known as the missing cases problem. Suppose a registration system is targeting to identify all cases having a certain characteristic such as a specific disease (cancer, heart disease, ...), disease related condition (HIV, heroin use, ...) or a specific behavior (driving a car without license). Every case in such a registration system has a certain notification history in that it might have been identified several times (at least once) which can be understood as a particular capture-recapture situation. Typically, cases are left out which have never been listed at any occasion, and it is this frequency one wants to estimate. In this paper modelling is concentrating on the counting distribution, e.g. the distribution of the variable that counts how often a given case has been identified by the registration system. Besides very simple models like the binomial or Poisson distribution, finite (nonparametric) mixtures of these are considered providing rather flexible modelling tools. Estimation is done using maximum likelihood by means of the EM algorithm. A case study on heroin users in Bangkok in the year 2001 is completing the contribution.
Resumo:
We investigate the performance of phylogenetic mixture models in reducing a well-known and pervasive artifact of phylogenetic inference known as the node-density effect, comparing them to partitioned analyses of the same data. The node-density effect refers to the tendency for the amount of evolutionary change in longer branches of phylogenies to be underestimated compared to that in regions of the tree where there are more nodes and thus branches are typically shorter. Mixture models allow more than one model of sequence evolution to describe the sites in an alignment without prior knowledge of the evolutionary processes that characterize the data or how they correspond to different sites. If multiple evolutionary patterns are common in sequence evolution, mixture models may be capable of reducing node-density effects by characterizing the evolutionary processes more accurately. In gene-sequence alignments simulated to have heterogeneous patterns of evolution, we find that mixture models can reduce node-density effects to negligible levels or remove them altogether, performing as well as partitioned analyses based on the known simulated patterns. The mixture models achieve this without knowledge of the patterns that generated the data and even in some cases without specifying the full or true model of sequence evolution known to underlie the data. The latter result is especially important in real applications, as the true model of evolution is seldom known. We find the same patterns of results for two real data sets with evidence of complex patterns of sequence evolution: mixture models substantially reduced node-density effects and returned better likelihoods compared to partitioning models specifically fitted to these data. We suggest that the presence of more than one pattern of evolution in the data is a common source of error in phylogenetic inference and that mixture models can often detect these patterns even without prior knowledge of their presence in the data. Routine use of mixture models alongside other approaches to phylogenetic inference may often reveal hidden or unexpected patterns of sequence evolution and can improve phylogenetic inference.
Resumo:
A statistical technique for fault analysis in industrial printing is reported. The method specifically deals with binary data, for which the results of the production process fall into two categories, rejected or accepted. The method is referred to as logistic regression, and is capable of predicting future fault occurrences by the analysis of current measurements from machine parts sensors. Individual analysis of each type of fault can determine which parts of the plant have a significant influence on the occurrence of such faults; it is also possible to infer which measurable process parameters have no significant influence on the generation of these faults. Information derived from the analysis can be helpful in the operator's interpretation of the current state of the plant. Appropriate actions may then be taken to prevent potential faults from occurring. The algorithm is being implemented as part of an applied self-learning expert system.
Resumo:
Investigation of preferred structures of planetary wave dynamics is addressed using multivariate Gaussian mixture models. The number of components in the mixture is obtained using order statistics of the mixing proportions, hence avoiding previous difficulties related to sample sizes and independence issues. The method is first applied to a few low-order stochastic dynamical systems and data from a general circulation model. The method is next applied to winter daily 500-hPa heights from 1949 to 2003 over the Northern Hemisphere. A spatial clustering algorithm is first applied to the leading two principal components (PCs) and shows significant clustering. The clustering is particularly robust for the first half of the record and less for the second half. The mixture model is then used to identify the clusters. Two highly significant extratropical planetary-scale preferred structures are obtained within the first two to four EOF state space. The first pattern shows a Pacific-North American (PNA) pattern and a negative North Atlantic Oscillation (NAO), and the second pattern is nearly opposite to the first one. It is also observed that some subspaces show multivariate Gaussianity, compatible with linearity, whereas others show multivariate non-Gaussianity. The same analysis is also applied to two subperiods, before and after 1978, and shows a similar regime behavior, with a slight stronger support for the first subperiod. In addition a significant regime shift is also observed between the two periods as well as a change in the shape of the distribution. The patterns associated with the regime shifts reflect essentially a PNA pattern and an NAO pattern consistent with the observed global warming effect on climate and the observed shift in sea surface temperature around the mid-1970s.
Resumo:
1. Jerdon's courser Rhinoptilus bitorquatus is a nocturnally active cursorial bird that is only known to occur in a small area of scrub jungle in Andhra Pradesh, India, and is listed as critically endangered by the IUCN. Information on its habitat requirements is needed urgently to underpin conservation measures. We quantified the habitat features that correlated with the use of different areas of scrub jungle by Jerdon's coursers, and developed a model to map potentially suitable habitat over large areas from satellite imagery and facilitate the design of surveys of Jerdon's courser distribution. 2. We used 11 arrays of 5-m long tracking strips consisting of smoothed fine soil to detect the footprints of Jerdon's coursers, and measured tracking rates (tracking events per strip night). We counted the number of bushes and trees, and described other attributes of vegetation and substrate in a 10-m square plot centred on each strip. We obtained reflectance data from Landsat 7 satellite imagery for the pixel within which each strip lay. 3. We used logistic regression models to describe the relationship between tracking rate by Jerdon's coursers and characteristics of the habitat around the strips, using ground-based survey data and satellite imagery. 4. Jerdon's coursers were most likely to occur where the density of large (>2 m tall) bushes was in the range 300-700 ha(-1) and where the density of smaller bushes was less than 1000 ha(-1). This habitat was detectable using satellite imagery. 5. Synthesis and applications. The occurrence of Jerdon's courser is strongly correlated with the density of bushes and trees, and is in turn affected by grazing with domestic livestock, woodcutting and mechanical clearance of bushes to create pasture, orchards and farmland. It is likely that there is an optimal level of grazing and woodcutting that would maintain or create suitable conditions for the species. Knowledge of the species' distribution is incomplete and there is considerable pressure from human use of apparently suitable habitats. Hence, distribution mapping is a high conservation priority. A two-step procedure is proposed, involving the use of ground surveys of bush density to calibrate satellite image-based mapping of potential habitat. These maps could then be used to select priority areas for Jerdon's courser surveys. The use of tracking strips to study habitat selection and distribution has potential in studies of other scarce and secretive species.
Resumo:
None of the current surveillance streams monitoring the presence of scrapie in Great Britain provide a comprehensive and unbiased estimate of the prevalence of the disease at the holding level. Previous work to estimate the under-ascertainment adjusted prevalence of scrapie in Great Britain applied multiple-list capture–recapture methods. The enforcement of new control measures on scrapie-affected holdings in 2004 has stopped the overlapping between surveillance sources and, hence, the application of multiple-list capture–recapture models. Alternative methods, still under the capture–recapture methodology, relying on repeated entries in one single list have been suggested in these situations. In this article, we apply one-list capture–recapture approaches to data held on the Scrapie Notifications Database to estimate the undetected population of scrapie-affected holdings with clinical disease in Great Britain for the years 2002, 2003, and 2004. For doing so, we develop a new diagnostic tool for indication of heterogeneity as well as a new understanding of the Zelterman and Chao’s lower bound estimators to account for potential unobserved heterogeneity. We demonstrate that the Zelterman estimator can be viewed as a maximum likelihood estimator for a special, locally truncated Poisson likelihood equivalent to a binomial likelihood. This understanding allows the extension of the Zelterman approach by means of logistic regression to include observed heterogeneity in the form of covariates—in case studied here, the holding size and country of origin. Our results confirm the presence of substantial unobserved heterogeneity supporting the application of our two estimators. The total scrapie-affected holding population in Great Britain is around 300 holdings per year. None of the covariates appear to inform the model significantly.
Resumo:
A cross-sectional serological survey of A. marginale was conducted on 200 randomly selected smallholder farms in each of the Tanga and Iringa Regions of Tanzania between January and April 1999. Sera, from dairy cattle of all ages, sexes and breeds were tested for antibodies against A. marginale using an indirect enzyme-linked immunosorbent assay. Antibodies to A. marginale were present in cattle throughout the study areas and the overall prevalence was 20% for Tanga and 37% for Iringa. The forces of infection based on the age seroprevalence profile were estimated at 8 for Tanga and 15 for Iringa per 100 cattle years-risk, respectively. In both regions, seroprevalence increased with age (β = 0.01 and 0.017 per year of age, p < 0.005, in Tanga and Iringa, respectively). Older animals in Iringa were significantly and negatively associated with decreased seropositivity (β = −0.002, p = 0.0029). Further results of logistic regression models reveal that geographic location of animals in Tanga was associated with seropositivity (odds ratio (OR) = 2.94, p = 0.005, for Tanga Rural and OR = 2.38, p = 0.066, for Muheza). Animals acquired as a gift in Iringa had higher odds for seropositivity than brought-in cattle (OR = 2.44, p = 0.005). Our study has identified and quantified some key risk factors that can guide planners devising disease control strategies.
Resumo:
A cross-sectional serological survey of A. marginale was conducted on 200 randomly selected smallholder farms in each of the Tanga and Iringa Regions of Tanzania between January and April 1999. Sera, from dairy cattle of all ages, sexes and breeds were tested for antibodies against A. marginale using an indirect enzyme-linked immunosorbent assay. Antibodies to A. marginale were present in cattle throughout the study areas and the overall prevalence was 20% for Tanga and 37% for Iringa. The forces of infection based on the age seroprevalence profile were estimated at 8 for Tanga and 15 for Iringa per 100 cattle years-risk, respectively. In both regions, seroprevalence increased with age (beta = 0.01 and 0.017 per year of age, p < 0.005, in Tanga and Iringa, respectively). Older animals in Iringa were significantly and negatively associated with decreased seropositivity (beta = -0.002, p = 0.0029). Further results of logistic regression models reveal that geographic location of animals in Tanga was associated with seropositivity (odds ratio (OR) = 2.94, p = 0.005, for Tanga Rural and OR = 2.38, p = 0.066, for Muheza). Animals acquired as a gift in Iringa had higher odds for seropositivity than brought-in cattle (OR = 2.44, p = 0.005). Our study has identified and quantified some key risk factors that can guide planners devising disease control strategies.
Resumo:
In this paper, Bayesian decision procedures are developed for dose-escalation studies based on bivariate observations of undesirable events and signs of therapeutic benefit. The methods generalize earlier approaches taking into account only the undesirable outcomes. Logistic regression models are used to model the two responses, which are both assumed to take a binary form. A prior distribution for the unknown model parameters is suggested and an optional safety constraint can be included. Gain functions to be maximized are formulated in terms of accurate estimation of the limits of a therapeutic window or optimal treatment of the next cohort of subjects, although the approach could be applied to achieve any of a wide variety of objectives. The designs introduced are illustrated through simulation and retrospective implementation to a completed dose-escalation study. Copyright © 2006 John Wiley & Sons, Ltd.
Resumo:
Population size estimation with discrete or nonparametric mixture models is considered, and reliable ways of construction of the nonparametric mixture model estimator are reviewed and set into perspective. Construction of the maximum likelihood estimator of the mixing distribution is done for any number of components up to the global nonparametric maximum likelihood bound using the EM algorithm. In addition, the estimators of Chao and Zelterman are considered with some generalisations of Zelterman’s estimator. All computations are done with CAMCR, a special software developed for population size estimation with mixture models. Several examples and data sets are discussed and the estimators illustrated. Problems using the mixture model-based estimators are highlighted.
Resumo:
None of the current surveillance streams monitoring the presence of scrapie in Great Britain provide a comprehensive and unbiased estimate of the prevalence of the disease at the holding level. Previous work to estimate the under-ascertainment adjusted prevalence of scrapie in Great Britain applied multiple-list capture-recapture methods. The enforcement of new control measures on scrapie-affected holdings in 2004 has stopped the overlapping between surveillance sources and, hence, the application of multiple-list capture-recapture models. Alternative methods, still under the capture-recapture methodology, relying on repeated entries in one single list have been suggested in these situations. In this article, we apply one-list capture-recapture approaches to data held on the Scrapie Notifications Database to estimate the undetected population of scrapie-affected holdings with clinical disease in Great Britain for the years 2002, 2003, and 2004. For doing so, we develop a new diagnostic tool for indication of heterogeneity as well as a new understanding of the Zelterman and Chao's lower bound estimators to account for potential unobserved heterogeneity. We demonstrate that the Zelterman estimator can be viewed as a maximum likelihood estimator for a special, locally truncated Poisson likelihood equivalent to a binomial likelihood. This understanding allows the extension of the Zelterman approach by means of logistic regression to include observed heterogeneity in the form of covariates-in case studied here, the holding size and country of origin. Our results confirm the presence of substantial unobserved heterogeneity supporting the application of our two estimators. The total scrapie-affected holding population in Great Britain is around 300 holdings per year. None of the covariates appear to inform the model significantly.
Resumo:
Objectives: To assess the potential source of variation that surgeon may add to patient outcome in a clinical trial of surgical procedures. Methods: Two large (n = 1380) parallel multicentre randomized surgical trials were undertaken to compare laparoscopically assisted hysterectomy with conventional methods of abdominal and vaginal hysterectomy; involving 43 surgeons. The primary end point of the trial was the occurrence of at least one major complication. Patients were nested within surgeons giving the data set a hierarchical structure. A total of 10% of patients had at least one major complication, that is, a sparse binary outcome variable. A linear mixed logistic regression model (with logit link function) was used to model the probability of a major complication, with surgeon fitted as a random effect. Models were fitted using the method of maximum likelihood in SAS((R)). Results: There were many convergence problems. These were resolved using a variety of approaches including; treating all effects as fixed for the initial model building; modelling the variance of a parameter on a logarithmic scale and centring of continuous covariates. The initial model building process indicated no significant 'type of operation' across surgeon interaction effect in either trial, the 'type of operation' term was highly significant in the abdominal trial, and the 'surgeon' term was not significant in either trial. Conclusions: The analysis did not find a surgeon effect but it is difficult to conclude that there was not a difference between surgeons. The statistical test may have lacked sufficient power, the variance estimates were small with large standard errors, indicating that the precision of the variance estimates may be questionable.
Resumo:
1. Jerdon's courser Rhinoptilus bitorquatus is a nocturnally active cursorial bird that is only known to occur in a small area of scrub jungle in Andhra Pradesh, India, and is listed as critically endangered by the IUCN. Information on its habitat requirements is needed urgently to underpin conservation measures. We quantified the habitat features that correlated with the use of different areas of scrub jungle by Jerdon's coursers, and developed a model to map potentially suitable habitat over large areas from satellite imagery and facilitate the design of surveys of Jerdon's courser distribution. 2. We used 11 arrays of 5-m long tracking strips consisting of smoothed fine soil to detect the footprints of Jerdon's coursers, and measured tracking rates (tracking events per strip night). We counted the number of bushes and trees, and described other attributes of vegetation and substrate in a 10-m square plot centred on each strip. We obtained reflectance data from Landsat 7 satellite imagery for the pixel within which each strip lay. 3. We used logistic regression models to describe the relationship between tracking rate by Jerdon's coursers and characteristics of the habitat around the strips, using ground-based survey data and satellite imagery. 4. Jerdon's coursers were most likely to occur where the density of large (>2 m tall) bushes was in the range 300-700 ha(-1) and where the density of smaller bushes was less than 1000 ha(-1). This habitat was detectable using satellite imagery. 5. Synthesis and applications. The occurrence of Jerdon's courser is strongly correlated with the density of bushes and trees, and is in turn affected by grazing with domestic livestock, woodcutting and mechanical clearance of bushes to create pasture, orchards and farmland. It is likely that there is an optimal level of grazing and woodcutting that would maintain or create suitable conditions for the species. Knowledge of the species' distribution is incomplete and there is considerable pressure from human use of apparently suitable habitats. Hence, distribution mapping is a high conservation priority. A two-step procedure is proposed, involving the use of ground surveys of bush density to calibrate satellite image-based mapping of potential habitat. These maps could then be used to select priority areas for Jerdon's courser surveys. The use of tracking strips to study habitat selection and distribution has potential in studies of other scarce and secretive species.