132 results for Robust Regression
Abstract:
In this paper we consider the estimation of population size from one-source capture–recapture data, that is, a list in which individuals can potentially be found repeatedly and where the question is how many individuals are missed by the list. As a typical example, we provide data from a drug user study in Bangkok from 2001 where the list consists of drug users who repeatedly contact treatment institutions. Drug users with 1, 2, 3, . . . contacts occur, but drug users with zero contacts are not present, requiring the size of this group to be estimated. Statistically, these data can be considered as stemming from a zero-truncated count distribution. We revisit an estimator for the population size suggested by Zelterman that is known to be robust under potential unobserved heterogeneity. We demonstrate that the Zelterman estimator can be viewed as a maximum likelihood estimator for a locally truncated Poisson likelihood which is equivalent to a binomial likelihood. This result allows the extension of the Zelterman estimator by means of logistic regression to include observed heterogeneity in the form of covariates. We also review an estimator proposed by Chao and explain why we are not able to obtain similar results for this estimator. The Zelterman estimator is applied in two case studies, the first a drug user study from Bangkok, the second an illegal immigrant study in the Netherlands. Our results suggest the new estimator should be used, in particular, if substantial unobserved heterogeneity is present.
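The abstract names the Zelterman and Chao estimators without stating them. As a minimal sketch, assuming the standard textbook forms (Zelterman: rate estimated as 2·f2/f1 and population size as n/(1 − exp(−rate)); Chao: n + f1²/(2·f2)), they can be written as follows. The function names and the example counts are hypothetical and are not taken from the Bangkok or Netherlands studies.

```python
import math


def zelterman(f1, f2, n):
    """Zelterman-type population size estimate.

    f1, f2: number of individuals observed exactly once / exactly twice.
    n: total number of distinct individuals observed.
    """
    rate = 2.0 * f2 / f1           # Poisson rate estimated from f1 and f2 only
    p_zero = math.exp(-rate)       # estimated probability of zero contacts
    return n / (1.0 - p_zero)      # scale observed n up by the detection probability


def chao(f1, f2, n):
    """Chao-type lower-bound estimate of population size."""
    return n + f1 * f1 / (2.0 * f2)


# Hypothetical frequencies, for illustration only.
estimate_z = zelterman(f1=60, f2=30, n=100)
estimate_c = chao(f1=60, f2=30, n=100)
```

Both estimators use only the counts of ones and twos, which is the reason usually given for their robustness to heterogeneity among individuals with many contacts.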
Abstract:
We propose a novel method for scoring the accuracy of protein binding site predictions – the Binding-site Distance Test (BDT) score. Recently, the Matthews Correlation Coefficient (MCC) has been used to evaluate binding site predictions, both by developers of new methods and by the assessors for the community wide prediction experiment – CASP8. Whilst being a rigorous scoring method, the MCC does not take into account the actual 3D location of the predicted residues from the observed binding site. Thus, an incorrectly predicted site that is nevertheless close to the observed binding site will obtain an identical score to the same number of nonbinding residues predicted at random. The MCC is somewhat affected by the subjectivity of determining observed binding residues and the ambiguity of choosing distance cutoffs. By contrast the BDT method produces continuous scores ranging between 0 and 1, relating to the distance between the predicted and observed residues. Residues predicted close to the binding site will score higher than those more distant, providing a better reflection of the true accuracy of predictions. The CASP8 function predictions were evaluated using both the MCC and BDT methods and the scores were compared. The BDT was found to strongly correlate with the MCC scores whilst also being less susceptible to the subjectivity of defining binding residues. We therefore suggest that this new simple score is a potentially more robust method for future evaluations of protein-ligand binding site predictions.
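To illustrate the limitation described above, the following sketch computes the MCC from confusion-matrix counts using its standard definition. Because only the counts enter the formula, two predictions with the same counts receive the same score regardless of how far the wrongly predicted residues lie from the true site. The BDT formula itself is not given in the abstract and is not reproduced here; the counts below are hypothetical.

```python
import math


def mcc(tp, fp, tn, fn):
    """Matthews Correlation Coefficient from confusion-matrix counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0  # common convention when a marginal count is zero
    return (tp * tn - fp * fn) / denom


# Five residues predicted, none of them observed binding residues.
# Whether the five lie next to the site or far away, the counts are identical,
# so the MCC is identical as well.
near_miss_score = mcc(tp=0, fp=5, tn=90, fn=5)
random_miss_score = mcc(tp=0, fp=5, tn=90, fn=5)
```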
Determinants of fruit and vegetable intake in England: a re-examination based on quantile regression
Abstract:
Objective: To examine the sociodemographic determinants of fruit and vegetable (F&V) consumption in England and determine the differential effects of socioeconomic variables at various parts of the intake distribution, with a special focus on severely inadequate intakes. Design: Quantile regression, expressing F&V intake as a function of sociodemographic variables, is employed. Here, quantile regression flexibly allows variables such as ethnicity to exert effects on F&V intake that vary depending on existing levels of intake. Setting: The 2003 Health Survey of England. Subjects: Data were from 11044 adult individuals. Results: The influence of particular sociodemographic variables is found to vary significantly across the intake distribution. We conclude that women consume more F&V than men, Asians and Blacks more than Whites, co-habiting individuals more than single-living ones. Increased incomes and education also boost intake. However, the key general finding of the present study is that the influence of most variables is relatively weak in the area of greatest concern, i.e. among those with the most inadequate intakes in any reference group. Conclusions: Our findings emphasise the importance of allowing the effects of socio-economic drivers to vary across the intake distribution. The main finding, that variables which exert significant influence on F&V intake at other parts of the conditional distribution have a relatively weak influence at the lower tail, is cause for concern. It implies that in any defined group, those consuming the least F&V are hard to influence using campaigns or policy levers.
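Quantile regression replaces the squared-error loss of ordinary regression with the asymmetric "pinball" (check) loss, which is why it can target the lower tail of the intake distribution directly. The sketch below shows only that loss and the intercept-only case, where minimising it recovers a sample quantile; it is not the study's model, and the intake values are invented for illustration.

```python
def pinball_loss(residuals, tau):
    """Average check loss for quantile level tau (0 < tau < 1)."""
    total = 0.0
    for r in residuals:
        # Positive residuals are weighted by tau, negative ones by (1 - tau).
        total += tau * r if r >= 0 else (tau - 1.0) * r
    return total / len(residuals)


def quantile_by_loss(values, tau):
    """Intercept-only quantile regression: pick the data value minimising the loss."""
    return min(values, key=lambda q: pinball_loss([v - q for v in values], tau))


# Hypothetical daily F&V portions with one very high consumer.
portions = [1, 2, 3, 4, 100]
median_fit = quantile_by_loss(portions, 0.5)     # central tendency
lower_tail_fit = quantile_by_loss(portions, 0.1) # focus on inadequate intakes
```

With covariates, the constant is replaced by a linear predictor and the same loss is minimised, giving a separate set of coefficients for each quantile level.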
Abstract:
This note presents a robust method for estimating response surfaces that consist of linear response regimes and a linear plateau. The linear response-and-plateau model has fascinated production scientists since von Liebig (1855) and, as Upton and Dalton indicated some years ago in this Journal, the response-and-plateau model seems to fit the data in many empirical studies. The estimation algorithm evolves from Bayesian implementation of a switching-regression (finite mixtures) model and demonstrates routine application of Gibbs sampling and data augmentation, techniques that are now in widespread application in other disciplines.
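The functional form being estimated can be sketched as a "law of the minimum": output follows whichever linear regime is most limiting, up to a plateau. The parameterisation below is an assumption made for illustration, since the note's exact specification is not given in the abstract, and the sketch covers only the response function, not the Gibbs sampler.

```python
def response_plateau(inputs, intercepts, slopes, plateau):
    """Linear response-and-plateau surface (hypothetical parameterisation).

    Each input has its own linear regime; the response is the smallest regime
    value, capped at the plateau.
    """
    regimes = [a + b * x for x, a, b in zip(inputs, intercepts, slopes)]
    return min(min(regimes), plateau)


# Two inputs (e.g. two nutrients), invented coefficients.
limited = response_plateau([2.0, 5.0], [1.0, 0.0], [2.0, 3.0], plateau=10.0)
saturated = response_plateau([6.0, 5.0], [1.0, 0.0], [2.0, 3.0], plateau=10.0)
```

Which regime is active for a given observation is unobserved, which is what makes a switching-regression (finite mixture) formulation natural.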
Abstract:
Fixed transactions costs that prohibit exchange engender bias in supply analysis due to censoring of the sample observations. The associated bias in conventional regression procedures applied to censored data and the construction of robust methods for mitigating bias have been preoccupations of applied economists since Tobin [Econometrica 26 (1958) 24]. This literature assumes that the true point of censoring in the data is zero and, when this is not the case, imparts a bias to parameter estimates of the censored regression model. We conjecture that this bias can be significant; affirm this from experiments; and suggest techniques for mitigating this bias using Bayesian procedures. The bias-mitigating procedures are based on modifications of the key step that facilitates Bayesian estimation of the censored regression model; are easy to implement; work well in both small and large samples; and lead to significantly improved inference in the censored regression model. These findings are important in light of the widespread use of the zero-censored Tobit regression and we investigate their consequences using data on milk-market participation in the Ethiopian highlands. (C) 2004 Elsevier B.V. All rights reserved.
Abstract:
Estimation of population size with missing zero-class is an important problem that is encountered in epidemiological assessment studies. Fitting a Poisson model to the observed data by the method of maximum likelihood and estimation of the population size based on this fit is an approach that has been widely used for this purpose. In practice, however, the Poisson assumption is seldom satisfied. Zelterman (1988) has proposed a robust estimator for unclustered data that works well in a wide class of distributions applicable for count data. In the work presented here, we extend this estimator to clustered data. The estimator requires fitting a zero-truncated homogeneous Poisson model by maximum likelihood and thereby using a Horvitz-Thompson estimator of population size. This was found to work well, when the data follow the hypothesized homogeneous Poisson model. However, when the true distribution deviates from the hypothesized model, the population size was found to be underestimated. In the search for a more robust estimator, we focused on three models that use all clusters with exactly one case, those clusters with exactly two cases and those with exactly three cases to estimate the probability of the zero-class and thereby use data collected on all the clusters in the Horvitz-Thompson estimator of population size. Loss in efficiency associated with gain in robustness was examined based on a simulation study. As a trade-off between gain in robustness and loss in efficiency, the model that uses data collected on clusters with at most three cases to estimate the probability of the zero-class was found to be preferred in general. In applications, we recommend obtaining estimates from all three models and making a choice considering the estimates from the three models, robustness and the loss in efficiency. (© 2008 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim)
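The baseline procedure mentioned above (zero-truncated homogeneous Poisson fit followed by a Horvitz-Thompson estimate) can be sketched for the simple unclustered case. The sketch assumes the standard result that the maximum likelihood rate solves rate/(1 − exp(−rate)) = observed mean count, and solves it by bisection; the function names are hypothetical and the clustered extension from the paper is not attempted.

```python
import math


def fit_zero_truncated_poisson(mean_count):
    """MLE of the Poisson rate from the mean of the observed (non-zero) counts."""
    if mean_count <= 1.0:
        raise ValueError("mean of zero-truncated counts must exceed 1")
    lo, hi = 0.0, mean_count  # the root lies strictly between 0 and the mean
    for _ in range(200):      # bisection; 200 halvings is ample for doubles
        mid = 0.5 * (lo + hi)
        # Truncated mean at rate mid; expm1 keeps precision for small mid.
        if mid / -math.expm1(-mid) < mean_count:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def horvitz_thompson(n_observed, rate):
    """Scale the observed count by the probability of being observed at least once."""
    return n_observed / -math.expm1(-rate)


# Hypothetical data: 100 observed units with a mean of 2 counts each.
rate_hat = fit_zero_truncated_poisson(2.0)
n_hat = horvitz_thompson(100, rate_hat)
```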
Abstract:
Inferring the spatial expansion dynamics of invading species from molecular data is notoriously difficult due to the complexity of the processes involved. For these demographic scenarios, genetic data obtained from highly variable markers may be profitably combined with specific sampling schemes and information from other sources using a Bayesian approach. The geographic range of the introduced toad Bufo marinus is still expanding in eastern and northern Australia, in each case from isolates established around 1960. A large amount of demographic and historical information is available on both expansion areas. In each area, samples were collected along a transect representing populations of different ages and genotyped at 10 microsatellite loci. Five demographic models of expansion, differing in the dispersal pattern for migrants and founders and in the number of founders, were considered. Because the demographic history is complex, we used an approximate Bayesian method, based on a rejection-regression algorithm, to formally test the relative likelihoods of the five models of expansion and to infer demographic parameters. A stepwise migration-foundation model with founder events was statistically better supported than the other four models in both expansion areas. Posterior distributions supported different dynamics of expansion in the studied areas. Populations in the eastern expansion area have a lower stable effective population size and have been founded by a smaller number of individuals than those in the northern expansion area. Once demographically stabilized, populations exchange a substantial number of effective migrants per generation in both expansion areas, and such exchanges are larger in northern than in eastern Australia. The effective number of migrants appears to be considerably lower than that of founders in both expansion areas. We found our inferences to be relatively robust to various assumptions on marker, demographic, and historical features.
The method presented here is the only robust, model-based method available so far, which allows inferring complex population dynamics over a short time scale. It also provides the basis for investigating the interplay between population dynamics, drift, and selection in invasive species.
Abstract:
In this paper, we list some new orthogonal main effects plans for three-level designs for 4, 5 and 6 factors in 18 runs and compare them with designs obtained from the existing L-18 orthogonal array. We show that these new designs have better projection properties and can provide better parameter estimates for a range of possible models. Additionally, we study designs in other smaller run-sizes when there are insufficient resources to perform an 18-run experiment. Plans for three-level designs for 4, 5 and 6 factors in 13 to 17 runs are given. We show that the best designs here are efficient and deserve strong consideration in many practical situations.
Abstract:
1. Wildlife managers often require estimates of abundance. Direct methods of estimation are often impractical, especially in closed-forest environments, so indirect methods such as dung or nest surveys are increasingly popular. 2. Dung and nest surveys typically have three elements: surveys to estimate abundance of the dung or nests; experiments to estimate the production (defecation or nest construction) rate; and experiments to estimate the decay or disappearance rate. The last of these is usually the most problematic, and was the subject of this study. 3. The design of experiments to allow robust estimation of mean time to decay was addressed. In most studies to date, dung or nests have been monitored until they disappear. Instead, we advocate that fresh dung or nests are located, with a single follow-up visit to establish whether the dung or nest is still present or has decayed. 4. Logistic regression was used to estimate probability of decay as a function of time, and possibly of other covariates. Mean time to decay was estimated from this function. 5. Synthesis and applications. Effective management of mammal populations usually requires reliable abundance estimates. The difficulty in estimating abundance of mammals in forest environments has increasingly led to the use of indirect survey methods, in which abundance of sign, usually dung (e.g. deer, antelope and elephants) or nests (e.g. apes), is estimated. Given estimated rates of sign production and decay, sign abundance estimates can be converted to estimates of animal abundance. Decay rates typically vary according to season, weather, habitat, diet and many other factors, making reliable estimation of mean time to decay of signs present at the time of the survey problematic. We emphasize the need for retrospective rather than prospective rates, propose a strategy for survey design, and provide analysis methods for estimating retrospective rates.
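Step 4 above can be sketched as follows, under stated assumptions: a logistic model gives the probability that a sign is still present at a given age, with hypothetical fitted coefficients b0 and b1 (b1 negative), and the mean time to decay is taken as the area under that curve. The logistic curve is slightly below 1 at age zero; this sketch ignores that, and the authors' exact estimator may differ.

```python
import math


def presence_probability(age_days, b0, b1):
    """Logistic probability that a sign of the given age is still present."""
    return 1.0 / (1.0 + math.exp(-(b0 + b1 * age_days)))


def mean_time_to_decay(b0, b1):
    """Area under the presence curve from age 0 to infinity (closed form)."""
    if b1 >= 0:
        raise ValueError("b1 must be negative so that presence declines with age")
    return math.log1p(math.exp(b0)) / (-b1)


def mean_time_numeric(b0, b1, upper=1000.0, step=0.05):
    """Trapezoidal check of the closed form on a finite grid."""
    n = int(upper / step)
    total = 0.5 * (presence_probability(0.0, b0, b1)
                   + presence_probability(upper, b0, b1))
    for i in range(1, n):
        total += presence_probability(i * step, b0, b1)
    return total * step


# Hypothetical coefficients, as if estimated from single follow-up visits.
closed_form = mean_time_to_decay(4.0, -0.05)
numeric = mean_time_numeric(4.0, -0.05)
```

Covariates such as season or habitat would enter the linear predictor alongside age, giving a mean time to decay specific to the survey conditions.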
Abstract:
Multiple regression analysis is a statistical technique which allows a dependent variable to be predicted from more than one independent variable and influential independent variables to be identified. Using experimental data, in this study multiple regression analysis is applied to predict the room mean velocity and determine the parameters that most influence the velocity. More than 120 experiments for four different heat source locations were carried out in a test chamber with a high level wall mounted air supply terminal at air change rates of 3-6 ach. The influence of environmental parameters such as supply air momentum, room heat load, Archimedes number and local temperature ratio was examined by two methods: a simple regression analysis incorporated into scatter matrix plots and multiple stepwise regression analysis. It is concluded that, when a heat source is located along the jet centre line, the supply momentum mainly influences the room mean velocity regardless of the plume strength. However, when the heat source is located outside the jet region, the local temperature ratio (the inverse of the local heat removal effectiveness) is a major influencing parameter.
Abstract:
We report rates of regression and associated findings in a population derived group of 255 children aged 9-14 years, participating in a prevalence study of autism spectrum disorders (ASD); 53 with narrowly defined autism, 105 with broader ASD and 97 with non-ASD neurodevelopmental problems, drawn from those with special educational needs within a population of 56,946 children. Language regression was reported in 30% with narrowly defined autism, 8% with broader ASD and less than 3% with developmental problems without ASD. A smaller group of children were identified who underwent a less clear setback. Regression was associated with higher rates of autistic symptoms and a deviation in developmental trajectory. Regression was not associated with epilepsy or gastrointestinal problems.
Abstract:
Using the classical Parzen window (PW) estimate as the target function, the sparse kernel density estimator is constructed in a forward constrained regression manner. The leave-one-out (LOO) test score is used for kernel selection. The jackknife parameter estimator subject to positivity constraint check is used for the parameter estimation of a single parameter at each forward step. As such the proposed approach is simple to implement and the associated computational cost is very low. An illustrative example is employed to demonstrate that the proposed approach is effective in constructing sparse kernel density estimators with comparable accuracy to that of the classical Parzen window estimate.
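The target function mentioned above, the classical Parzen window estimate, places one kernel on every sample with equal weight. A minimal one-dimensional sketch with a Gaussian kernel follows; the kernel choice, the bandwidth value and the sample are assumptions for illustration, and the sparse forward-selection procedure itself is not reproduced.

```python
import math


def parzen_density(x, samples, bandwidth):
    """Parzen window estimate at x with a Gaussian kernel on every sample."""
    norm = 1.0 / (math.sqrt(2.0 * math.pi) * bandwidth)
    total = 0.0
    for s in samples:
        z = (x - s) / bandwidth
        total += norm * math.exp(-0.5 * z * z)
    return total / len(samples)  # equal weight 1/N for every kernel


# Hypothetical sample; a sparse estimator would keep only a few of these
# kernels, with unequal non-negative weights.
data = [-1.2, -0.4, 0.0, 0.3, 1.1]
density_at_zero = parzen_density(0.0, data, bandwidth=0.5)
```

The cost of evaluating this estimate grows with the number of samples, which is the motivation for approximating it with a small number of weighted kernels.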