Biblioteca Digital

601 resultados para Estimators

Fast Update of Conditional Simulation Ensembles

Relevância:

10.00% 10.00%

Publicador:

Resumo:

Gaussian random field (GRF) conditional simulation is a key ingredient in many spatial statistics problems for computing Monte-Carlo estimators and quantifying uncertainties on non-linear functionals of GRFs conditional on data. Conditional simulations are known to often be computer intensive, especially when appealing to matrix decomposition approaches with a large number of simulation points. This work studies settings where conditioning observations are assimilated batch sequentially, with one point or a batch of points at each stage. Assuming that conditional simulations have been performed at a previous stage, the goal is to take advantage of already available sample paths and by-products to produce updated conditional simulations at mini- mal cost. Explicit formulae are provided, which allow updating an ensemble of sample paths conditioned on n ≥ 0 observations to an ensemble conditioned on n + q observations, for arbitrary q ≥ 1. Compared to direct approaches, the proposed formulae proveto substantially reduce computational complexity. Moreover, these formulae explicitly exhibit how the q new observations are updating the old sample paths. Detailed complexity calculations highlighting the benefits of this approach with respect to state-of-the-art algorithms are provided and are complemented by numerical experiments.

Consistency of Concave Regression with an Application to Current-Status Data

Relevância:

10.00% 10.00%

Publicador:

Resumo:

We consider the problem of nonparametric estimation of a concave regression function F. We show that the supremum distance between the least square s estimatorand F on a compact interval is typically of order(log(n)/n)2/5. This entails rates of convergence for the estimator’s derivative. Moreover, we discuss the impact of additional constraints on F such as monotonicity and pointwise bounds. Then we apply these results to the analysis of current status data, where the distribution function of the event times is assumed to be concave.

New Algorithms for M-Estimation of Multivariate Scatter and Location

Relevância:

10.00% 10.00%

Publicador:

Resumo:

We present new algorithms for M-estimators of multivariate scatter and location and for symmetrized M-estimators of multivariate scatter. The new algorithms are considerably faster than currently used fixed-point and related algorithms. The main idea is to utilize a second order Taylor expansion of the target functional and to devise a partial Newton-Raphson procedure. In connection with symmetrized M-estimators we work with incomplete U-statistics to accelerate our procedures initially.

Stereological estimation of particle shape and orientation from volume tensors

Relevância:

10.00% 10.00%

Publicador:

Resumo:

In the present paper, we describe new robust methods of estimating cell shape and orientation in 3D from sections. The descriptors of 3D cell shape and orientation are based on volume tensors which are used to construct an ellipsoid, the Miles ellipsoid, approximating the average cell shape and orientation in 3D. The estimators of volume tensors are based on observations in several optical planes through sampled cells. This type of geometric sampling design is known as the optical rotator. The statistical behaviour of the estimator of the Miles ellipsoid is studied under a flexible model for 3D cell shape and orientation. In a simulation study, the lengths of the axes of the Miles ellipsoid can be estimated with CVs of about 2% if 100 cells are sampled. Finally, we illustrate the use of the developed methods in an example, involving neurons in the medial prefrontal cortex of rat.

Saddlepoint approximations

Relevância:

10.00% 10.00%

Publicador:

Resumo:

The saddlepoint method provides accurate approximations for the distributions of many test statistics, estimators and for important probabilities arising in various stochastic models. The saddlepoint approximation is a large deviations technique which is substantially more accurate than limiting normal or Edgeworth approximations, especially in presence of very small sample sizes or very small probabilities. The outstanding accuracy of the saddlepoint approximation can be explained by the fact that it has bounded relative error.

The Discontinuous Nature of Kriging Interpolation for Digital Terrain Modeling

Relevância:

10.00% 10.00%

Publicador:

Resumo:

Kriging is a widely employed method for interpolating and estimating elevations from digital elevation data. Its place of prominence is due to its elegant theoretical foundation and its convenient practical implementation. From an interpolation point of view, kriging is equivalent to a thin-plate spline and is one species among the many in the genus of weighted inverse distance methods, albeit with attractive properties. However, from a statistical point of view, kriging is a best linear unbiased estimator and, consequently, has a place of distinction among all spatial estimators because any other linear estimator that performs as well as kriging (in the least squares sense) must be equivalent to kriging, assuming that the parameters of the semivariogram are known. Therefore, kriging is often held to be the gold standard of digital terrain model elevation estimation. However, I prove that, when used with local support, kriging creates discontinuous digital terrain models, which is to say, surfaces with “rips” and “tears” throughout them. This result is general; it is true for ordinary kriging, kriging with a trend, and other forms. A U.S. Geological Survey (USGS) digital elevation model was analyzed to characterize the distribution of the discontinuities. I show that the magnitude of the discontinuity does not depend on surface gradient but is strongly dependent on the size of the kriging neighborhood.

Optimally Combining Censored and Uncensored Datasets

Relevância:

10.00% 10.00%

Publicador:

Resumo:

Economists and other social scientists often face situations where they have access to two datasets that they can use but one set of data suffers from censoring or truncation. If the censored sample is much bigger than the uncensored sample, it is common for researchers to use the censored sample alone and attempt to deal with the problem of partial observation in some manner. Alternatively, they simply use only the uncensored sample and ignore the censored one so as to avoid biases. It is rarely the case that researchers use both datasets together, mainly because they lack guidance about how to combine them. In this paper, we develop a tractable semiparametric framework for combining the censored and uncensored datasets so that the resulting estimators are consistent, asymptotically normal, and use all information optimally. When the censored sample, which we refer to as the master sample, is much bigger than the uncensored sample (which we call the refreshment sample), the latter can be thought of as providing identification where it is otherwise absent. In contrast, when the refreshment sample is large and could typically be used alone, our methodology can be interpreted as using information from the censored sample to increase effciency. To illustrate our results in an empirical setting, we show how to estimate the effect of changes in compulsory schooling laws on age at first marriage, a variable that is censored for younger individuals. We also demonstrate how refreshment samples for this application can be created by matching cohort information across census datasets.

Job Matching and Wage Growth in the U.S. and Germany

Relevância:

10.00% 10.00%

Publicador:

Resumo:

This paper examines the contribution of job matching to wage growth in the U.S. and Germany using data drawn from the Panel Study of Income Dynamics and the German Socio-Economic Panel from 1984 through 1992. Using a symmetrical set of variables and data handling procedures, real wage growth is found to be higher in the U.S. than in Germany during this period. Also, using two different estimators, job matches are found to enhance wage growth in the U.S. and retard it in Germany. The relationship of general skills to employment in each country appears responsible for this result.

Preclinical modeling of multi-drug cancer therapies

Relevância:

10.00% 10.00%

Publicador:

Resumo:

Anticancer drugs typically are administered in the clinic in the form of mixtures, sometimes called combinations. Only in rare cases, however, are mixtures approved as drugs. Rather, research on mixtures tends to occur after single drugs have been approved. The goal of this research project was to develop modeling approaches that would encourage rational preclinical mixture design. To this end, a series of models were developed. First, several QSAR classification models were constructed to predict the cytotoxicity, oral clearance, and acute systemic toxicity of drugs. The QSAR models were applied to a set of over 115,000 natural compounds in order to identify promising ones for testing in mixtures. Second, an improved method was developed to assess synergistic, antagonistic, and additive effects between drugs in a mixture. This method, dubbed the MixLow method, is similar to the Median-Effect method, the de facto standard for assessing drug interactions. The primary difference between the two is that the MixLow method uses a nonlinear mixed-effects model to estimate parameters of concentration-effect curves, rather than an ordinary least squares procedure. Parameter estimators produced by the MixLow method were more precise than those produced by the Median-Effect Method, and coverage of Loewe index confidence intervals was superior. Third, a model was developed to predict drug interactions based on scores obtained from virtual docking experiments. This represents a novel approach for modeling drug mixtures and was more useful for the data modeled here than competing approaches. The model was applied to cytotoxicity data for 45 mixtures, each composed of up to 10 selected drugs. One drug, doxorubicin, was a standard chemotherapy agent and the others were well-known natural compounds including curcumin, EGCG, quercetin, and rhein. Predictions of synergism/antagonism were made for all possible fixed-ratio mixtures, cytotoxicities of the 10 best-scoring mixtures were tested, and drug interactions were assessed. Predicted and observed responses were highly correlated (r2 = 0.83). Results suggested that some mixtures allowed up to an 11-fold reduction of doxorubicin concentrations without sacrificing efficacy. Taken together, the models developed in this project present a general approach to rational design of mixtures during preclinical drug development. ^

Assessing the improved discriminatory power of a new biomarker in prognostic models

Relevância:

10.00% 10.00%

Publicador:

Resumo:

Although the area under the receiver operating characteristic (AUC) is the most popular measure of the performance of prediction models, it has limitations, especially when it is used to evaluate the added discrimination of a new biomarker in the model. Pencina et al. (2008) proposed two indices, the net reclassification improvement (NRI) and integrated discrimination improvement (IDI), to supplement the improvement in the AUC (IAUC). Their NRI and IDI are based on binary outcomes in case-control settings, which do not involve time-to-event outcome. However, many disease outcomes are time-dependent and the onset time can be censored. Measuring discrimination potential of a prognostic marker without considering time to event can lead to biased estimates. In this dissertation, we have extended the NRI and IDI to survival analysis settings and derived the corresponding sample estimators and asymptotic tests. Simulation studies were conducted to compare the performance of the time-dependent NRI and IDI with Pencina’s NRI and IDI. For illustration, we have applied the proposed method to a breast cancer study.^ Key words: Prognostic model, Discrimination, Time-dependent NRI and IDI ^

Selection bias and the cross-validation of regression models for prediction

Relevância:

10.00% 10.00%

Publicador:

Resumo:

Strategies are compared for the development of a linear regression model with stochastic (multivariate normal) regressor variables and the subsequent assessment of its predictive ability. Bias and mean squared error of four estimators of predictive performance are evaluated in simulated samples of 32 population correlation matrices. Models including all of the available predictors are compared with those obtained using selected subsets. The subset selection procedures investigated include two stopping rules, C$\sb{\rm p}$ and S$\sb{\rm p}$, each combined with an 'all possible subsets' or 'forward selection' of variables. The estimators of performance utilized include parametric (MSEP$\sb{\rm m}$) and non-parametric (PRESS) assessments in the entire sample, and two data splitting estimates restricted to a random or balanced (Snee's DUPLEX) 'validation' half sample. The simulations were performed as a designed experiment, with population correlation matrices representing a broad range of data structures.^ The techniques examined for subset selection do not generally result in improved predictions relative to the full model. Approaches using 'forward selection' result in slightly smaller prediction errors and less biased estimators of predictive accuracy than 'all possible subsets' approaches but no differences are detected between the performances of C$\sb{\rm p}$ and S$\sb{\rm p}$. In every case, prediction errors of models obtained by subset selection in either of the half splits exceed those obtained using all predictors and the entire sample.^ Only the random split estimator is conditionally (on $\\beta$) unbiased, however MSEP$\sb{\rm m}$ is unbiased on average and PRESS is nearly so in unselected (fixed form) models. When subset selection techniques are used, MSEP$\sb{\rm m}$ and PRESS always underestimate prediction errors, by as much as 27 percent (on average) in small samples. Despite their bias, the mean squared errors (MSE) of these estimators are at least 30 percent less than that of the unbiased random split estimator. The DUPLEX split estimator suffers from large MSE as well as bias, and seems of little value within the context of stochastic regressor variables.^ To maximize predictive accuracy while retaining a reliable estimate of that accuracy, it is recommended that the entire sample be used for model development, and a leave-one-out statistic (e.g. PRESS) be used for assessment. ^

A Bayesian approach to estimating the intraclass correlation coefficient

Relevância:

10.00% 10.00%

Publicador:

Resumo:

A Bayesian approach to estimating the intraclass correlation coefficient was used for this research project. The background of the intraclass correlation coefficient, a summary of its standard estimators, and a review of basic Bayesian terminology and methodology were presented. The conditional posterior density of the intraclass correlation coefficient was then derived and estimation procedures related to this derivation were shown in detail. Three examples of applications of the conditional posterior density to specific data sets were also included. Two sets of simulation experiments were performed to compare the mean and mode of the conditional posterior density of the intraclass correlation coefficient to more traditional estimators. Non-Bayesian methods of estimation used were: the methods of analysis of variance and maximum likelihood for balanced data; and the methods of MIVQUE (Minimum Variance Quadratic Unbiased Estimation) and maximum likelihood for unbalanced data. The overall conclusion of this research project was that Bayesian estimates of the intraclass correlation coefficient can be appropriate, useful and practical alternatives to traditional methods of estimation. ^

POLYCHOTOMOUS LOGISTIC REGRESSION ANALYSIS (HYPERTENSION, CORONARY HEART DISEASE)

Relevância:

10.00% 10.00%

Publicador:

Resumo:

The history of the logistic function since its introduction in 1838 is reviewed, and the logistic model for a polychotomous response variable is presented with a discussion of the assumptions involved in its derivation and use. Following this, the maximum likelihood estimators for the model parameters are derived along with a Newton-Raphson iterative procedure for evaluation. A rigorous mathematical derivation of the limiting distribution of the maximum likelihood estimators is then presented using a characteristic function approach. An appendix with theorems on the asymptotic normality of sample sums when the observations are not identically distributed, with proofs, supports the presentation on asymptotic properties of the maximum likelihood estimators. Finally, two applications of the model are presented using data from the Hypertension Detection and Follow-up Program, a prospective, population-based, randomized trial of treatment for hypertension. The first application compares the risk of five-year mortality from cardiovascular causes with that from noncardiovascular causes; the second application compares risk factors for fatal or nonfatal coronary heart disease with those for fatal or nonfatal stroke. ^

MEASURING THE EFFECT OF ILLNESS ON THE EXPECTED REMAINING SERVICE TIME FOR THE ACTIVE DUTY ARMY: A LIFE TABLE APPROACH

Relevância:

10.00% 10.00%

Publicador:

Resumo:

A life table methodology was developed which estimates the expected remaining Army service time and the expected remaining Army sick time by years of service for the United States Army population. A measure of illness impact was defined as the ratio of expected remaining Army sick time to the expected remaining Army service time. The variances of the resulting estimators were developed on the basis of current data. The theory of partial and complete competing risks was considered for each type of decrement (death, administrative separation, and medical separation) and for the causes of sick time.^ The methodology was applied to world-wide U.S. Army data for calendar year 1978. A total of 669,493 enlisted personnel and 97,704 officers were reported on active duty as of 30 September 1978. During calendar year 1978, the Army Medical Department reported 114,647 inpatient discharges and 1,767,146 sick days. Although the methodology is completely general with respect to the definition of sick time, only sick time associated with an inpatient episode was considered in this study.^ Since the temporal measure was years of Army service, an age-adjusting process was applied to the life tables for comparative purposes. Analyses were conducted by rank (enlisted and officer), race and sex, and were based on the ratio of expected remaining Army sick time to expected remaining Army service time. Seventeen major diagnostic groups, classified by the Eighth Revision, International Classification of Diseases, Adapted for Use In The United States, were ranked according to their cumulative (across years of service) contribution to expected remaining sick time.^ The study results indicated that enlisted personnel tend to have more expected hospital-associated sick time relative to their expected Army service time than officers. Non-white officers generally have more expected sick time relative to their expected Army service time than white officers. This racial differential was not supported within the enlisted population. Females tend to have more expected sick time relative to their expected Army service time than males. This tendency remained after diagnostic groups 580-629 (Genitourinary System) and 630-678 (Pregnancy and Childbirth) were removed. Problems associated with the circulatory system, digestive system and musculoskeletal system were among the three leading causes of cumulative sick time across years of service. ^

A hybrid method in combining treatment effects from matched and unmatched studies

Relevância:

10.00% 10.00%

Publicador:

Resumo:

In the biomedical studies, the general data structures have been the matched (paired) and unmatched designs. Recently, many researchers are interested in Meta-Analysis to obtain a better understanding from several clinical data of a medical treatment. The hybrid design, which is combined two data structures, may create the fundamental question for statistical methods and the challenges for statistical inferences. The applied methods are depending on the underlying distribution. If the outcomes are normally distributed, we would use the classic paired and two independent sample T-tests on the matched and unmatched cases. If not, we can apply Wilcoxon signed rank and rank sum test on each case. ^ To assess an overall treatment effect on a hybrid design, we can apply the inverse variance weight method used in Meta-Analysis. On the nonparametric case, we can use a test statistic which is combined on two Wilcoxon test statistics. However, these two test statistics are not in same scale. We propose the Hybrid Test Statistic based on the Hodges-Lehmann estimates of the treatment effects, which are medians in the same scale.^ To compare the proposed method, we use the classic meta-analysis T-test statistic on the combined the estimates of the treatment effects from two T-test statistics. Theoretically, the efficiency of two unbiased estimators of a parameter is the ratio of their variances. With the concept of Asymptotic Relative Efficiency (ARE) developed by Pitman, we show ARE of the hybrid test statistic relative to classic meta-analysis T-test statistic using the Hodges-Lemann estimators associated with two test statistics.^ From several simulation studies, we calculate the empirical type I error rate and power of the test statistics. The proposed statistic would provide effective tool to evaluate and understand the treatment effect in various public health studies as well as clinical trials.^

«
1
2
...
29
30
31
32
33
34
35
...
40
41
»