833 results for Data Warehouse
Abstract:
The method of generalized estimating equations (GEEs) provides consistent estimates of the regression parameters in a marginal regression model for longitudinal data, even when the working correlation model is misspecified (Liang and Zeger, 1986). However, the efficiency of a GEE estimate can be seriously affected by the choice of the working correlation model. This study addresses this problem by proposing a hybrid method that combines multiple GEEs based on different working correlation models, using the empirical likelihood method (Qin and Lawless, 1994). Analyses show that this hybrid method is more efficient than a GEE using a misspecified working correlation model. Furthermore, if one of the working correlation structures correctly models the within-subject correlations, then this hybrid method provides the most efficient parameter estimates. In simulations, the hybrid method's finite-sample performance is superior to a GEE under any of the commonly used working correlation models and is almost fully efficient in all scenarios studied. The hybrid method is illustrated using data from a longitudinal study of the respiratory infection rates in 275 Indonesian children.
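To make "working correlation model" concrete, here is a minimal sketch of two commonly used structures, exchangeable and AR(1), for a subject with n repeated measurements. The function names and parameters are hypothetical and are not taken from the paper; the hybrid empirical-likelihood method itself is not implemented here.

```python
def exchangeable(n, alpha):
    """n x n working correlation: 1 on the diagonal, alpha everywhere else."""
    return [[1.0 if i == j else alpha for j in range(n)] for i in range(n)]


def ar1(n, rho):
    """n x n AR(1) working correlation: entry (i, j) equals rho ** |i - j|."""
    return [[rho ** abs(i - j) for j in range(n)] for i in range(n)]
```

A GEE fitted under either structure remains consistent for the mean parameters; the choice affects efficiency, which is the problem the abstract addresses.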
Abstract:
We consider rank-based regression models for clustered data analysis. A weighted Wilcoxon rank method is proposed to take account of within-cluster correlations and varying cluster sizes. The asymptotic normality of the resulting estimators is established. A method to estimate the covariance of the estimators is also given, which bypasses estimation of the density function. Simulation studies are carried out to compare different estimators for a number of scenarios on the correlation structure, presence/absence of outliers and different correlation values. The proposed methods appear to perform well; in particular, the one incorporating the correlation in the weighting achieves the highest efficiency and robustness against misspecification of correlation structure and outliers. A real example is provided for illustration.
Abstract:
In analysis of longitudinal data, the variance matrix of the parameter estimates is usually estimated by the 'sandwich' method, in which the variance for each subject is estimated by its residual products. We propose smooth bootstrap methods by perturbing the estimating functions to obtain 'bootstrapped' realizations of the parameter estimates for statistical inference. Our extensive simulation studies indicate that the variance estimators by our proposed methods can not only correct the bias of the sandwich estimator but also improve the confidence interval coverage. We applied the proposed method to a data set from a clinical trial of antibiotics for leprosy.
Abstract:
This paper considers the one-sample sign test for data obtained from general ranked set sampling when the number of observations for each rank is not necessarily the same, and proposes a weighted sign test because observations with different ranks are not identically distributed. The optimal weight for each observation is distribution free and only depends on its associated rank. It is shown analytically that (1) the weighted version always improves the Pitman efficiency for all distributions; and (2) the optimal design is to select the median from each ranked set.
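For orientation, the ordinary (unweighted) one-sample sign test can be sketched as below. This is only the baseline the paper improves on: the rank-dependent optimal weights are not given in the abstract, so they are not implemented. The function name and the `median0` parameter are hypothetical.

```python
from math import comb


def sign_test_p_value(data, median0=0.0):
    """Exact two-sided p-value of the ordinary sign test for H0: median == median0."""
    # Observations equal to the hypothesized median carry no sign and are dropped.
    diffs = [x - median0 for x in data if x != median0]
    n = len(diffs)
    if n == 0:
        return 1.0
    positives = sum(1 for d in diffs if d > 0)
    k = min(positives, n - positives)
    # Lower tail of Binomial(n, 1/2), doubled for a two-sided test and capped at 1.
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)
```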
Abstract:
We consider the analysis of longitudinal data when the covariance function is modeled by additional parameters to the mean parameters. In general, inconsistent estimators of the covariance (variance/correlation) parameters will be produced when the "working" correlation matrix is misspecified, which may result in great loss of efficiency of the mean parameter estimators (although their consistency is preserved). We consider using different "working" correlation models for the variance and the mean parameters. In particular, we find that an independence working model should be used for estimating the variance parameters to ensure their consistency in case the correlation structure is misspecified. The designated "working" correlation matrices should be used for estimating the mean and the correlation parameters to attain high efficiency for estimating the mean parameters. Simulation studies indicate that the proposed algorithm performs very well. We also applied different estimation procedures to a data set from a clinical trial for illustration.
Abstract:
The Fabens method is commonly used to estimate growth parameters k and L-infinity in the von Bertalanffy model from tag-recapture data. However, the Fabens method of estimation has an inherent bias when individual growth is variable. This paper presents an asymptotically unbiased method using a maximum likelihood approach that takes account of individual variability in both maximum length and age-at-tagging. It is assumed that each individual's growth follows a von Bertalanffy curve with its own maximum length and age-at-tagging. The parameter k is assumed to be a constant to ensure that the mean growth follows a von Bertalanffy curve and to avoid overparameterization. Our method also makes more efficient use of the measurements at tag and recapture and includes diagnostic techniques for checking distributional assumptions. The method is reasonably robust and performs better than the Fabens method when individual growth differs from the von Bertalanffy relationship. When measurement error is negligible, the estimation involves maximizing the profile likelihood of one parameter only. The method is applied to tag-recapture data for the grooved tiger prawn (Penaeus semisulcatus) from the Gulf of Carpentaria, Australia.
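As a point of reference, the standard Fabens formulation predicts the growth increment between tagging and recapture from the length at tagging and the time at liberty. The sketch below shows only that textbook relationship, not the maximum likelihood method proposed in the paper; the function and argument names are hypothetical.

```python
from math import exp


def fabens_expected_increment(length_at_tagging, l_inf, k, time_at_liberty):
    """Expected growth increment under a von Bertalanffy curve (Fabens form).

    increment = (L_inf - L_tag) * (1 - exp(-k * dt))
    """
    return (l_inf - length_at_tagging) * (1.0 - exp(-k * time_at_liberty))
```

Because the increment depends on the gap between L-infinity and the length at tagging, treating L-infinity as common to all animals is what introduces bias when individual growth varies.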
Abstract:
The extended recruitment season for short-lived species such as prawns biases the estimation of growth parameters from length-frequency data when conventional methods are used. We propose a simple method for overcoming this bias given a time series of length-frequency data. The difficulties arising from extended recruitment are eliminated by predicting the growth of the succeeding samples and the length increments of the recruits in previous samples. This method requires that some maximum size at recruitment can be specified. The advantages of this multiple length-frequency method are: it is simple to use; it requires only three parameters; no specific distributions need to be assumed; and the actual seasonal recruitment pattern does not have to be specified. We illustrate the new method with length-frequency data on the tiger prawn Penaeus esculentus from the north-western Gulf of Carpentaria, Australia.
Abstract:
We propose a new model for estimating the size of a population from successive catches taken during a removal experiment. The data from these experiments often have excessive variation, known as overdispersion, as compared with that predicted by the multinomial model. The new model allows catchability to vary randomly among samplings, which accounts for overdispersion. When the catchability is assumed to have a beta distribution, the likelihood function, which is referred to as beta-multinomial, is derived, and hence the maximum likelihood estimates can be evaluated. Simulations show that in the presence of extravariation in the data, the confidence intervals have been substantially underestimated in previous models (Leslie-DeLury, Moran) and that the new model provides more reliable confidence intervals. The performance of these methods was also demonstrated using two real data sets: one with overdispersion, from smallmouth bass (Micropterus dolomieu), and the other without overdispersion, from rat (Rattus rattus).
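The constant-catchability baseline that the multinomial model assumes can be sketched as follows: each sampling removes a fixed fraction of the animals still present. This is an illustrative assumption-level sketch with hypothetical names, not the beta-multinomial model of the paper, in which catchability varies randomly among samplings.

```python
def expected_removal_catches(population_size, catchability, n_samplings):
    """Expected catch at each sampling when a fixed fraction is removed each time."""
    catches = []
    remaining = float(population_size)
    for _ in range(n_samplings):
        caught = remaining * catchability  # expected removals at this sampling
        catches.append(caught)
        remaining -= caught  # animals left for the next sampling
    return catches
```

Observed catches that scatter around this declining sequence more widely than multinomial sampling allows are what the abstract calls overdispersion.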
Abstract:
Robust estimation often relies on a dispersion function that is more slowly varying at large values than the square function. However, the choice of tuning constant in dispersion functions may impact the estimation efficiency to a great extent. For a given family of dispersion functions such as the Huber family, we suggest obtaining the "best" tuning constant from the data so that the asymptotic efficiency is maximized. This data-driven approach can automatically adjust the value of the tuning constant to provide the necessary resistance against outliers. Simulation studies show that substantial efficiency can be gained by this data-dependent approach compared with the traditional approach in which the tuning constant is fixed. We briefly illustrate the proposed method using two datasets.
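For readers unfamiliar with the Huber family, a minimal sketch of its dispersion function follows: quadratic for small residuals and linear beyond the tuning constant c. The default c = 1.345 is the conventional fixed choice, used here only as an illustration; the data-driven selection of c proposed in the paper is not implemented, and the function name is hypothetical.

```python
def huber_rho(residual, c=1.345):
    """Huber dispersion function: r^2 / 2 if |r| <= c, else c * |r| - c^2 / 2."""
    a = abs(residual)
    if a <= c:
        return 0.5 * a * a
    # Linear growth beyond c limits the influence of large residuals.
    return c * a - 0.5 * c * c
```

A small c gives more resistance to outliers at the cost of efficiency when the errors are well behaved, which is the trade-off the tuning constant controls.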
Abstract:
Robust methods are useful in making reliable statistical inferences when there are small deviations from the model assumptions. The widely used method of the generalized estimating equations can be "robustified" by replacing the standardized residuals with the M-residuals. If the Pearson residuals are assumed to be unbiased from zero, parameter estimators from the robust approach are asymptotically biased when error distributions are not symmetric. We propose a distribution-free method for correcting this bias. Our extensive numerical studies show that the proposed method can reduce the bias substantially. Examples are given for illustration.
Abstract:
The approach of generalized estimating equations (GEE) is based on the framework of generalized linear models but allows for specification of a working matrix for modeling within-subject correlations. The variance is often assumed to be a known function of the mean. This article investigates the impacts of misspecifying the variance function on estimators of the mean parameters for quantitative responses. Our numerical studies indicate that (1) correct specification of the variance function can improve the estimation efficiency even if the correlation structure is misspecified; (2) misspecification of the variance function affects estimators for within-cluster covariates much more than estimators for cluster-level covariates; and (3) if the variance function is misspecified, correct choice of the correlation structure may not necessarily improve estimation efficiency. We illustrate the impacts of different variance functions using a real data set from cow growth.
Abstract:
We consider estimation of mortality rates and growth parameters from length-frequency data of a fish stock and derive the underlying length distribution of the population and the catch when there is individual variability in the von Bertalanffy growth parameter L-infinity. The model is flexible enough to accommodate 1) any recruitment pattern as a function of both time and length, 2) length-specific selectivity, and 3) varying fishing effort over time. The maximum likelihood method gives consistent estimates, provided the underlying distribution for individual variation in growth is correctly specified. Simulation results indicate that our method is reasonably robust to violations in the assumptions. The method is applied to tiger prawn data (Penaeus semisulcatus) to obtain estimates of natural and fishing mortality.
Abstract:
This article develops a method for analysis of growth data with multiple recaptures when the initial ages for all individuals are unknown. The existing approaches either impute the initial ages or model them as random effects. Assumptions about the initial age are not verifiable because all the initial ages are unknown. We present an alternative approach that treats all the lengths including the length at first capture as correlated repeated measures for each individual. Optimal estimating equations are developed using the generalized estimating equations approach that only requires the first two moment assumptions. Explicit expressions for estimation of both mean growth parameters and variance components are given to minimize the computational complexity. Simulation studies indicate that the proposed method works well. Two real data sets are analyzed for illustration, one from whelks (Dicathais aegrota) and the other from southern rock lobster (Jasus edwardsii) in South Australia.
Abstract:
Statistical methods are often used to analyse commercial catch and effort data to provide standardised fishing effort and/or a relative index of fish abundance for input into stock assessment models. Achieving reliable results has proved difficult in Australia's Northern Prawn Fishery (NPF), due to a combination of such factors as the biological characteristics of the animals, some aspects of the fleet dynamics, and the changes in fishing technology. For this set of data, we compared four modelling approaches (linear models, mixed models, generalised estimating equations, and generalised linear models) with respect to the outcomes of the standardised fishing effort or the relative index of abundance. We also varied the number and form of vessel covariates in the models. Within a subset of data from this fishery, modelling correlation structures did not alter the conclusions from simpler statistical models. The random-effects models also yielded similar results. This is because the estimators are all consistent even if the correlation structure is mis-specified, and the data set is very large. However, the standard errors from different models differed, suggesting that different methods have different statistical efficiency. We suggest that there is value in modelling the variance function and the correlation structure, to make valid and efficient statistical inferences and gain insight into the data. We found that fishing power was separable from the indices of prawn abundance only when we offset the impact of vessel characteristics at assumed values from external sources. This may be due to the large degree of confounding within the data, and the extreme temporal changes in certain aspects of individual vessels, the fleet and the fleet dynamics.
Abstract:
A simple stochastic model of a fish population subject to natural and fishing mortalities is described. The fishing effort is assumed to vary over different periods but to be constant within each period. A maximum-likelihood approach is developed for estimating natural mortality (M) and the catchability coefficient (q) simultaneously from catch-and-effort data. If there is not enough contrast in the data to provide reliable estimates of both M and q, as is often the case in practice, the method can be used to obtain the best possible values of q for a range of possible values of M. These techniques are illustrated with tiger prawn (Penaeus semisulcatus) data from the Northern Prawn Fishery of Australia.
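The roles of M and q can be illustrated with the standard deterministic catch equation for a single period of constant effort, in which fishing mortality is F = q × effort. This is a sketch under that textbook assumption and is not necessarily the stochastic model of the paper; all names are hypothetical.

```python
from math import exp


def expected_catch(n_start, q, effort, m, duration=1.0):
    """Expected catch over one period with constant effort.

    F = q * effort, Z = F + M, catch = N * (F / Z) * (1 - exp(-Z * duration)).
    """
    f = q * effort  # fishing mortality rate in this period
    z = f + m       # total mortality rate
    if z == 0:
        return 0.0
    return n_start * (f / z) * (1.0 - exp(-z * duration))
```

When effort varies little between periods, different (M, q) pairs produce similar catches, which is why the abstract notes that both parameters often cannot be estimated reliably at once.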