835 resultados para Ranked Regression
Resumo:
Prognostic procedures can be based on ranked linear models. Ranked regression type models are designed on the basis of feature vectors combined with set of relations defined on selected pairs of these vectors. Feature vectors are composed of numerical results of measurements on particular objects or events. Ranked relations defined on selected pairs of feature vectors represent additional knowledge and can reflect experts' opinion about considered objects. Ranked models have the form of linear transformations of feature vectors on a line which preserve a given set of relations in the best manner possible. Ranked models can be designed through the minimization of a special type of convex and piecewise linear (CPL) criterion functions. Some sets of ranked relations cannot be well represented by one ranked model. Decomposition of global model into a family of local ranked models could improve representation. A procedures of ranked models decomposition is described in this paper.
Resumo:
This article is motivated by a lung cancer study where a regression model is involved and the response variable is too expensive to measure but the predictor variable can be measured easily with relatively negligible cost. This situation occurs quite often in medical studies, quantitative genetics, and ecological and environmental studies. In this article, by using the idea of ranked-set sampling (RSS), we develop sampling strategies that can reduce cost and increase efficiency of the regression analysis for the above-mentioned situation. The developed method is applied retrospectively to a lung cancer study. In the lung cancer study, the interest is to investigate the association between smoking status and three biomarkers: polyphenol DNA adducts, micronuclei, and sister chromatic exchanges. Optimal sampling schemes with different optimality criteria such as A-, D-, and integrated mean square error (IMSE)-optimality are considered in the application. With set size 10 in RSS, the improvement of the optimal schemes over simple random sampling (SRS) is great. For instance, by using the optimal scheme with IMSE-optimality, the IMSEs of the estimated regression functions for the three biomarkers are reduced to about half of those incurred by using SRS.
Resumo:
Purpose: To determine whether curve-fitting analysis of the ranked segment distributions of topographic optic nerve head (ONH) parameters, derived using the Heidelberg Retina Tomograph (HRT), provide a more effective statistical descriptor to differentiate the normal from the glaucomatous ONH. Methods: The sample comprised of 22 normal control subjects (mean age 66.9 years; S.D. 7.8) and 22 glaucoma patients (mean age 72.1 years; S.D. 6.9) confirmed by reproducible visual field defects on the Humphrey Field Analyser. Three 10°-images of the ONH were obtained using the HRT. The mean topography image was determined and the HRT software was used to calculate the rim volume, rim area to disc area ratio, normalised rim area to disc area ratio and retinal nerve fibre cross-sectional area for each patient at 10°-sectoral intervals. The values were ranked in descending order, and each ranked-segment curve of ordered values was fitted using the least squares method. Results: There was no difference in disc area between the groups. The group mean cup-disc area ratio was significantly lower in the normal group (0.204 ± 0.16) compared with the glaucoma group (0.533 ± 0.083) (p < 0.001). The visual field indices, mean deviation and corrected pattern S.D., were significantly greater (p < 0.001) in the glaucoma group (-9.09 dB ± 3.3 and 7.91 ± 3.4, respectively) compared with the normal group (-0.15 dB ± 0.9 and 0.95 dB ± 0.8, respectively). Univariate linear regression provided the best overall fit to the ranked segment data. The equation parameters of the regression line manually applied to the normalised rim area-disc area and the rim area-disc area ratio data, correctly classified 100% of normal subjects and glaucoma patients. In this study sample, the regression analysis of ranked segment parameters method was more effective than conventional ranked segment analysis, in which glaucoma patients were misclassified in approximately 50% of cases. Further investigation in larger samples will enable the calculation of confidence intervals for normality. These reference standards will then need to be investigated for an independent sample to fully validate the technique. Conclusions: Using a curve-fitting approach to fit ranked segment curves retains information relating to the topographic nature of neural loss. Such methodology appears to overcome some of the deficiencies of conventional ranked segment analysis, and subject to validation in larger scale studies, may potentially be of clinical utility for detecting and monitoring glaucomatous damage. © 2007 The College of Optometrists.
Resumo:
Sampling strategies are developed based on the idea of ranked set sampling (RSS) to increase efficiency and therefore to reduce the cost of sampling in fishery research. The RSS incorporates information on concomitant variables that are correlated with the variable of interest in the selection of samples. For example, estimating a monitoring survey abundance index would be more efficient if the sampling sites were selected based on the information from previous surveys or catch rates of the fishery. We use two practical fishery examples to demonstrate the approach: site selection for a fishery-independent monitoring survey in the Australian northern prawn fishery (NPF) and fish age prediction by simple linear regression modelling a short-lived tropical clupeoid. The relative efficiencies of the new designs were derived analytically and compared with the traditional simple random sampling (SRS). Optimal sampling schemes were measured by different optimality criteria. For the NPF monitoring survey, the efficiency in terms of variance or mean squared errors of the estimated mean abundance index ranged from 114 to 199% compared with the SRS. In the case of a fish ageing study for Tenualosa ilisha in Bangladesh, the efficiency of age prediction from fish body weight reached 140%.
Resumo:
We consider ranked-based regression models for clustered data analysis. A weighted Wilcoxon rank method is proposed to take account of within-cluster correlations and varying cluster sizes. The asymptotic normality of the resulting estimators is established. A method to estimate covariance of the estimators is also given, which can bypass estimation of the density function. Simulation studies are carried out to compare different estimators for a number of scenarios on the correlation structure, presence/absence of outliers and different correlation values. The proposed methods appear to perform well, in particular, the one incorporating the correlation in the weighting achieves the highest efficiency and robustness against misspecification of correlation structure and outliers. A real example is provided for illustration.
Resumo:
This paper considers the one-sample sign test for data obtained from general ranked set sampling when the number of observations for each rank are not necessarily the same, and proposes a weighted sign test because observations with different ranks are not identically distributed. The optimal weight for each observation is distribution free and only depends on its associated rank. It is shown analytically that (1) the weighted version always improves the Pitman efficiency for all distributions; and (2) the optimal design is to select the median from each ranked set.
Resumo:
Nahhas, Wolfe, and Chen (2002, Biometrics 58, 964-971) considered optimal set size for ranked set sampling (RSS) with fixed operational costs. This framework can be very useful in practice to determine whether RSS is beneficial and to obtain the optimal set size that minimizes the variance of the population estimator for a fixed total cost. In this article, we propose a scheme of general RSS in which more than one observation can be taken from each ranked set. This is shown to be more cost-effective in some cases when the cost of ranking is not so small. We demonstrate using the example in Nahhas, Wolfe, and Chen (2002, Biometrics 58, 964-971), by taking two or more observations from one set even with the optimal set size from the RSS design can be more beneficial.
Resumo:
We have considered a Bayesian approach for the nonlinear regression model by replacing the normal distribution on the error term by some skewed distributions, which account for both skewness and heavy tails or skewness alone. The type of data considered in this paper concerns repeated measurements taken in time on a set of individuals. Such multiple observations on the same individual generally produce serially correlated outcomes. Thus, additionally, our model does allow for a correlation between observations made from the same individual. We have illustrated the procedure using a data set to study the growth curves of a clinic measurement of a group of pregnant women from an obstetrics clinic in Santiago, Chile. Parameter estimation and prediction were carried out using appropriate posterior simulation schemes based in Markov Chain Monte Carlo methods. Besides the deviance information criterion (DIC) and the conditional predictive ordinate (CPO), we suggest the use of proper scoring rules based on the posterior predictive distribution for comparing models. For our data set, all these criteria chose the skew-t model as the best model for the errors. These DIC and CPO criteria are also validated, for the model proposed here, through a simulation study. As a conclusion of this study, the DIC criterion is not trustful for this kind of complex model.
Resumo:
Fundação de Amparo à Pesquisa do Estado de São Paulo (FAPESP)
Resumo:
We propose a nonparametric variance estimator when ranked set sampling (RSS) and judgment post stratification (JPS) are applied by measuring a concomitant variable. Our proposed estimator is obtained by conditioning on observed concomitant values and using nonparametric kernel regression.
Resumo:
The ordinal logistic regression models are used to analyze the dependant variable with multiple outcomes that can be ranked, but have been underutilized. In this study, we describe four logistic regression models for analyzing the ordinal response variable. ^ In this methodological study, the four regression models are proposed. The first model uses the multinomial logistic model. The second is adjacent-category logit model. The third is the proportional odds model and the fourth model is the continuation-ratio model. We illustrate and compare the fit of these models using data from the survey designed by the University of Texas, School of Public Health research project PCCaSO (Promoting Colon Cancer Screening in people 50 and Over), to study the patient’s confidence in the completion colorectal cancer screening (CRCS). ^ The purpose of this study is two fold: first, to provide a synthesized review of models for analyzing data with ordinal response, and second, to evaluate their usefulness in epidemiological research, with particular emphasis on model formulation, interpretation of model coefficients, and their implications. Four ordinal logistic models that are used in this study include (1) Multinomial logistic model, (2) Adjacent-category logistic model [9], (3) Continuation-ratio logistic model [10], (4) Proportional logistic model [11]. We recommend that the analyst performs (1) goodness-of-fit tests, (2) sensitivity analysis by fitting and comparing different models.^
Resumo:
Expert elicitation is the process of retrieving and quantifying expert knowledge in a particular domain. Such information is of particular value when the empirical data is expensive, limited, or unreliable. This paper describes a new software tool, called Elicitator, which assists in quantifying expert knowledge in a form suitable for use as a prior model in Bayesian regression. Potential environmental domains for applying this elicitation tool include habitat modeling, assessing detectability or eradication, ecological condition assessments, risk analysis, and quantifying inputs to complex models of ecological processes. The tool has been developed to be user-friendly, extensible, and facilitate consistent and repeatable elicitation of expert knowledge across these various domains. We demonstrate its application to elicitation for logistic regression in a geographically based ecological context. The underlying statistical methodology is also novel, utilizing an indirect elicitation approach to target expert knowledge on a case-by-case basis. For several elicitation sites (or cases), experts are asked simply to quantify their estimated ecological response (e.g. probability of presence), and its range of plausible values, after inspecting (habitat) covariates via GIS.