919 results for Rank regression
Abstract:
Objective HE4 has emerged as a promising biomarker in gynaecological oncology. The purpose of this study was to evaluate serum HE4 as a biomarker for high-risk phenotypes in a population-based endometrial cancer cohort. Methods Peri-operative serum HE4 and CA125 were measured in 373 patients identified from the prospective Australian National Endometrial Cancer Study (ANECS). HE4 and CA125 were quantified on the ARCHITECT instrument in a clinically accredited laboratory. Receiver operating characteristic (ROC) curves, Spearman rank correlation coefficients, and chi-squared and Mann–Whitney tests were used for statistical analysis. Survival analysis was performed using Kaplan–Meier and multivariate Cox regression analyses. Results Median CA125 and HE4 levels were higher in stage III and IV tumours (p < 0.001) and in tumours with outer-half myometrial invasion (p < 0.001). ROC analysis demonstrated that HE4 (area under the curve (AUC) = 0.76) was a better predictor of outer-half myometrial invasion than CA125 (AUC = 0.65), particularly in patients with low-grade endometrioid tumours (AUC 0.77 vs 0.64 for CA125). Cox multivariate analysis demonstrated that elevated HE4 was an independent predictor of recurrence-free survival (HR = 2.40, 95% CI 1.19–4.83, p = 0.014) after adjusting for stage and grade of disease, particularly in the endometrioid subtype (HR = 2.86, 95% CI 1.25–6.51, p = 0.012). Conclusion These findings, from a large population-based study, demonstrate the utility of serum HE4 as a prognostic biomarker in endometrial cancer. In particular, they highlight the utility of HE4 for pre-operative risk stratification, identifying high-risk patients among those with low-grade endometrioid endometrial cancer who might benefit from lymphadenectomy.
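The abstract pairs ROC analysis with Mann–Whitney tests, and the two are directly linked: the ROC AUC equals the Mann–Whitney U statistic divided by the number of (case, non-case) pairs. The Python sketch below computes AUC that way; the function name `auc_mann_whitney` and the marker values are hypothetical, and this is an illustration rather than the study's analysis code.

```python
from typing import Sequence


def auc_mann_whitney(positives: Sequence[float], negatives: Sequence[float]) -> float:
    """ROC AUC computed as U / (n_pos * n_neg).

    U counts, over all (positive, negative) pairs, how often the positive case
    has the higher marker value; tied pairs contribute one half.
    """
    u = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                u += 1.0
            elif p == n:
                u += 0.5
    return u / (len(positives) * len(negatives))


# Hypothetical marker levels (not study data): deep invasion vs. not.
deep = [3.0, 4.0, 5.0]
shallow = [1.0, 2.0, 3.0]
print(auc_mann_whitney(deep, shallow))  # 8.5 / 9, about 0.944
```

An AUC of 0.5 corresponds to a marker that ranks cases no better than chance; 1.0 means every case scores above every non-case.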
Abstract:
Hot spot identification (HSID) aims to identify potential sites (roadway segments, intersections, crosswalks, interchanges, ramps, etc.) with disproportionately high crash risk relative to similar sites. An inefficient HSID methodology might result in either identifying a safe site as high risk (false positive) or a high-risk site as safe (false negative), and consequently lead to the misuse of available public funds, poor investment decisions, and inefficient risk management practice. Current HSID methods suffer from issues such as the underreporting of minor-injury and property-damage-only (PDO) crashes, the challenge of accounting for crash severity in the methodology, and the selection of a proper safety performance function to model crash data that are often heavily skewed by a preponderance of zeros. To address these challenges, this paper proposes combining a PDO equivalency calculation with a quantile regression technique to identify hot spots in a transportation network. In particular, underreporting and crash severity are tackled by incorporating equivalent PDO crashes, whilst the non-count nature of equivalent PDO crashes and the skewness of crash data are addressed by the non-parametric quantile regression technique. The proposed method identifies covariate effects on various quantiles of a population, rather than on the population mean as most methods in practice do, which corresponds more closely with how black spots are identified in practice. The proposed methodology is illustrated using rural road segment data from Korea and compared against the traditional empirical Bayes (EB) method with negative binomial (NB) regression.
Applying a quantile regression model to equivalent PDO crashes enables identification of a set of high-risk sites that reflect the true safety costs to society, simultaneously reduces the influence of under-reported PDO and minor-injury crashes, and overcomes the limitations of the traditional NB model in dealing with a preponderance of zeros or right-skewed data.
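A rough Python sketch of the two ingredients, under loudly stated assumptions: the severity weights in `EPDO_WEIGHTS` are hypothetical, and the quantile model is intercept-only (no covariates), in which case minimising the quantile-regression check loss simply yields a sample quantile. The paper fits quantile regressions with covariates on Korean rural segment data; this fragment only shows how equivalent PDO (EPDO) scores are formed and how a high-quantile threshold flags candidate hot spots.

```python
from typing import Dict, List

# Hypothetical severity weights relative to one PDO crash (illustrative only;
# practical EPDO weights are derived from local crash-cost estimates).
EPDO_WEIGHTS: Dict[str, float] = {"fatal": 100.0, "injury": 10.0, "pdo": 1.0}


def epdo(counts: Dict[str, int]) -> float:
    """Equivalent PDO crashes for one site: severity-weighted sum of crash counts."""
    return sum(EPDO_WEIGHTS[severity] * n for severity, n in counts.items())


def check_loss(values: List[float], q: float, tau: float) -> float:
    """Quantile-regression check (pinball) loss of a constant prediction q at level tau."""
    return sum(tau * (v - q) if v >= q else (tau - 1.0) * (v - q) for v in values)


def intercept_only_quantile(values: List[float], tau: float) -> float:
    """Minimise the check loss over constants; a minimiser is always one of the data points."""
    return min(sorted(values), key=lambda q: check_loss(values, q, tau))


# Hypothetical crash histories for five sites.
sites = {
    "A": {"pdo": 3},
    "B": {"injury": 1, "pdo": 2},
    "C": {"fatal": 1},
    "D": {},
    "E": {"pdo": 1},
}
scores = {name: epdo(c) for name, c in sites.items()}
threshold = intercept_only_quantile(list(scores.values()), tau=0.7)
hot_spots = sorted(name for name, s in scores.items() if s > threshold)
print(threshold, hot_spots)  # 12.0 ['C']
```

Because the check loss penalises under- and over-prediction asymmetrically, it targets an upper quantile rather than the mean, which is why it copes with zero-heavy, right-skewed EPDO data without a count distribution.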
Abstract:
Visual localization in outdoor environments is often hampered by natural variation in appearance caused by weather phenomena, diurnal fluctuations in lighting, and seasonal change. Such changes are global across an environment and, in the case of global lighting changes and seasonal variation, occur in a regular, cyclic manner. Visual localization could be greatly improved if it were possible to predict the appearance of a particular location at a particular time, based on the appearance of the location in the past and knowledge of the nature of appearance change over time. In this paper, we investigate whether global appearance changes in an environment can be learned well enough to improve visual localization performance. We use time of day as a test case and generate transformations between morning and afternoon using sample images from a training set. We demonstrate that the learned transformation generalizes beyond the training data and that visual localization on a test set improves relative to raw image comparison. The improvement in localization persists when the area is revisited several weeks later.
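To make the idea concrete, here is a deliberately simple Python sketch that assumes the morning-to-afternoon change is a single global gain and offset on pixel intensities, fitted by least squares on paired training pixels; the transformation learned in the paper is not described in the abstract and is likely richer. Localization then picks the reference place whose predicted afternoon appearance is closest, by sum of absolute differences, to the query. All function names and the toy 4-pixel images are hypothetical.

```python
from typing import List, Sequence, Tuple


def fit_gain_offset(morning: Sequence[float], afternoon: Sequence[float]) -> Tuple[float, float]:
    """Least-squares fit of afternoon ~ gain * morning + offset over paired training pixels."""
    n = len(morning)
    mx = sum(morning) / n
    my = sum(afternoon) / n
    sxx = sum((x - mx) ** 2 for x in morning)
    sxy = sum((x - mx) * (y - my) for x, y in zip(morning, afternoon))
    gain = sxy / sxx
    return gain, my - gain * mx


def sad(a: Sequence[float], b: Sequence[float]) -> float:
    """Sum of absolute pixel differences."""
    return sum(abs(x - y) for x, y in zip(a, b))


def localize(query: Sequence[float], references: List[Sequence[float]],
             gain: float = 1.0, offset: float = 0.0) -> int:
    """Index of the reference whose predicted appearance best matches the query.

    gain=1, offset=0 corresponds to raw image comparison.
    """
    predicted = [[gain * p + offset for p in ref] for ref in references]
    scores = [sad(query, ref) for ref in predicted]
    return scores.index(min(scores))


gain, offset = fit_gain_offset([10, 20, 30, 40], [21, 41, 61, 81])  # training pair
morning_refs = [[10, 10, 40, 40], [20, 20, 80, 80]]                 # places 0 and 1
afternoon_query = [21, 21, 81, 81]                                   # place 0, afternoon
print(localize(afternoon_query, morning_refs))                # 1 (raw comparison picks the wrong place)
print(localize(afternoon_query, morning_refs, gain, offset))  # 0
```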
Abstract:
Given the knowledge gaps in urban stormwater quality processes, an in-depth understanding of model uncertainty can enhance decision making. Uncertainty in stormwater quality models can originate from a range of sources, such as the complexity of urban rainfall-runoff-stormwater pollutant processes and the paucity of observed data. Unfortunately, studies of epistemic uncertainty, which arises from the simplification of reality, are limited, and such uncertainty is often deemed largely unquantifiable. This paper presents a statistical modelling framework for ascertaining the epistemic uncertainty associated with pollutant wash-off under a regression modelling paradigm, using Ordinary Least Squares Regression (OLSR) and Weighted Least Squares Regression (WLSR) with a Bayesian/Gibbs sampling approach. The results confirmed that WLSR, assuming probability-distributed data, provides more realistic uncertainty estimates of the observed and predicted wash-off values than OLSR. The Bayesian/Gibbs sampling approach was also found to be superior to the classical statistical and deterministic approaches most commonly used in water quality modelling. The study outcomes confirmed that the prediction error associated with wash-off replication is relatively high due to limited data availability. The uncertainty analysis also highlighted that the wash-off modelling coefficient k varies as a function of complex physical processes, being primarily influenced by surface characteristics and rainfall intensity.
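The contrast between OLSR and WLSR shows up already in the closed-form fit of a straight line. The Python sketch below assumes a single predictor and hand-chosen inverse-variance weights, with made-up data; it is a frequentist illustration only, whereas the study embeds these regressions in a Bayesian model fitted by Gibbs sampling.

```python
from typing import Sequence, Tuple


def weighted_least_squares(x: Sequence[float], y: Sequence[float],
                           w: Sequence[float]) -> Tuple[float, float]:
    """Closed-form weighted fit of y = a + b*x; returns (a, b)."""
    sw = sum(w)
    xm = sum(wi * xi for wi, xi in zip(w, x)) / sw
    ym = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - xm) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - xm) * (yi - ym) for wi, xi, yi in zip(w, x, y))
    b = sxy / sxx
    return ym - b * xm, b


def ordinary_least_squares(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """OLS is WLS with equal weights."""
    return weighted_least_squares(x, y, [1.0] * len(x))


# Hypothetical predictor/wash-off pairs; the last observation is far noisier.
x = [1.0, 2.0, 3.0, 4.0]
y = [3.0, 5.0, 7.0, 20.0]
sigma = [1.0, 1.0, 1.0, 1000.0]
weights = [1.0 / s ** 2 for s in sigma]       # inverse-variance weights
print(ordinary_least_squares(x, y))           # approximately (-4.5, 5.3)
print(weighted_least_squares(x, y, weights))  # close to (1.0, 2.0)
```

OLS lets the noisy point drag the line, while WLS down-weights it almost to zero; the realism of the resulting uncertainty estimates depends on how well the weights reflect the actual error variances.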
Abstract:
A known limitation of the Probability Ranking Principle (PRP) is that it does not cater for dependence between documents. Recently, the Quantum Probability Ranking Principle (QPRP) has been proposed, which implicitly captures dependencies between documents through “quantum interference”. This paper explores whether this new ranking principle leads to improved performance for subtopic retrieval, where novelty and diversity are required. In a thorough empirical investigation, models based on the PRP, as well as other recently proposed ranking strategies for subtopic retrieval (i.e. Maximal Marginal Relevance (MMR) and Portfolio Theory (PT)), are compared against the QPRP. On the given task, the QPRP is shown to outperform these other ranking strategies. Moreover, unlike MMR and PT, the QPRP requires no parameter estimation or tuning, which makes it both simple and effective. This research demonstrates that applying quantum theory to problems within information retrieval can lead to significant improvements.
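A minimal Python sketch of greedy QPRP-style ranking, under an assumption the abstract does not state: the interference between a candidate d and an already-ranked document d' is taken as 2·sqrt(P(d)P(d'))·cos θ, with cos θ approximated by minus a similarity score in [-1, 1]. The probabilities, the similarity matrix and the function names are hypothetical. Note that the scoring has no trade-off parameter, unlike MMR.

```python
import math
from typing import List, Sequence


def interference(p_i: float, p_j: float, similarity: float) -> float:
    """Assumed interference term: 2*sqrt(p_i*p_j)*cos(theta), with cos(theta) ~ -similarity."""
    return 2.0 * math.sqrt(p_i * p_j) * (-similarity)


def qprp_rank(probs: Sequence[float], sim: Sequence[Sequence[float]]) -> List[int]:
    """Greedily build a ranking: at each step take the document maximising P(d)
    plus the sum of its interference with the documents already ranked."""
    ranked: List[int] = []
    remaining = set(range(len(probs)))
    while remaining:
        def score(i: int) -> float:
            return probs[i] + sum(interference(probs[i], probs[j], sim[i][j]) for j in ranked)

        best = max(sorted(remaining), key=score)  # sorted() makes tie-breaking deterministic
        ranked.append(best)
        remaining.remove(best)
    return ranked


# Documents 0 and 1 are near-duplicates; document 2 covers a different subtopic.
probs = [0.9, 0.85, 0.5]
sim = [
    [1.0, 0.9, 0.0],
    [0.9, 1.0, 0.0],
    [0.0, 0.0, 1.0],
]
print(qprp_rank(probs, sim))  # [0, 2, 1]; ranking by probability alone gives [0, 1, 2]
```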
Abstract:
In this paper we define two models of users who require diversity in search results; these models are theoretically grounded in the notions of intrinsic and extrinsic diversity. We then examine Intent-Aware Expected Reciprocal Rank (ERR-IA), one of the official measures used to assess diversity in TREC 2011-12, with respect to the proposed user models. By analyzing the ranking preferences expressed by the user models and those estimated by ERR-IA, we investigate whether ERR-IA assesses document rankings according to the requirements of the diversity retrieval task expressed by the two models. Empirical results demonstrate that ERR-IA neglects query-intent coverage by attributing excessive importance to redundant relevant documents. This behavior is contrary to the user models, which require measures to first assess diversity through the coverage of intents and then assess the redundancy of relevant intents. Furthermore, diversity should be considered separately from document relevance and from the documents' positions in the ranking.
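For reference, ERR-IA is the intent-probability-weighted average of Expected Reciprocal Rank computed separately for each query intent. In the standard ERR cascade, a document with grade g satisfies the user with probability (2^g - 1) / 2^g_max, and the user stops at the first satisfying document. The Python sketch follows that definition with made-up grades and hypothetical function names; it only shows how the measure is computed and does not reproduce the paper's user models or its finding about redundancy.

```python
from typing import Sequence


def err(grades: Sequence[int], max_grade: int) -> float:
    """Expected Reciprocal Rank for one intent; grades[k] is the grade at rank k + 1."""
    p_reach = 1.0  # probability that the user reaches the current rank
    total = 0.0
    for rank, g in enumerate(grades, start=1):
        stop = (2 ** g - 1) / 2 ** max_grade  # probability the user is satisfied here
        total += p_reach * stop / rank
        p_reach *= 1.0 - stop
    return total


def err_ia(grades_per_intent: Sequence[Sequence[int]],
           intent_probs: Sequence[float], max_grade: int) -> float:
    """Intent-aware ERR: intent-probability-weighted mean of per-intent ERR."""
    return sum(p * err(g, max_grade) for g, p in zip(grades_per_intent, intent_probs))


# Two equally likely intents, binary grades (max_grade = 1).
redundant = err_ia([[1, 1], [0, 0]], [0.5, 0.5], max_grade=1)  # both documents serve intent 1
diverse = err_ia([[1, 0], [0, 1]], [0.5, 0.5], max_grade=1)    # one document per intent
print(redundant, diverse)  # 0.3125 0.375
```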
Abstract:
PURPOSE To examine correlates and consequences of parents' encouragement of girls' physical activity (PA) for weight loss (ENCLOSS). METHODS Data were collected from 181 girls and their mothers and fathers when the girls were 9, 11, and 13 years old. Mothers and fathers completed a self-report questionnaire on ENCLOSS (e.g., “I have talked to my daughter about how to exercise to lose weight”). Correlates of ENCLOSS that were assessed included girls' Body Mass Index (BMI) z-score and parents' modeling of and logistic support for PA. Dependent variables assessed at age 13 included girls' self-reported and objectively measured PA, enjoyment of PA, and weight concerns. Associations between ENCLOSS, girls' BMI, and parents' support for PA were assessed using Spearman rank correlations. To examine links between ENCLOSS and the outcome variables, ENCLOSS scores were divided into tertiles at each age. Three groups were created: girls who were in the highest tertile at each age (high ENCLOSS), girls who were in the lowest tertile at each age (low ENCLOSS), and girls who varied in their tertile ranking (mid ENCLOSS). Group differences in the outcome variables were assessed using regression analysis (referent group: low ENCLOSS), controlling for girls' BMI and the outcome variable at age 9. RESULTS Girls with higher BMI had mothers and fathers who reported higher ENCLOSS (r = .61-.69, p < .0001). Parents' reports of ENCLOSS were not associated with modeling of or logistic support for PA. Girls in the high ENCLOSS group reported significantly lower enjoyment of PA and higher weight concerns at age 13, independent of covariates. No differences in PA were noted. CONCLUSION Parents who encourage their daughters to be active for weight loss do not model PA or facilitate girls' PA. Persistent encouragement of PA for weight loss may lead to low enjoyment of PA and higher weight concerns among adolescent girls.
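The Spearman correlations and the tertile-based grouping in the methods can be sketched in plain Python. Assumptions not stated in the abstract: tied values receive averaged ranks, tertiles are assigned by rank position, and girls who stay in the middle tertile are placed in the mid group together with those whose tertile changed. The helper names are hypothetical and this is not the study's code.

```python
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the mean of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho as the Pearson correlation of the rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    return cov / (vx * vy) ** 0.5


def tertiles(values: Sequence[float]) -> List[int]:
    """0 = lowest, 1 = middle, 2 = highest third, assigned by rank position."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    n = len(values)
    result = [0] * n
    for position, i in enumerate(order):
        result[i] = 3 * position // n
    return result


def encloss_group(tertile_by_age: Sequence[int]) -> str:
    """Classify one girl from her tertiles at ages 9, 11 and 13."""
    if all(t == 2 for t in tertile_by_age):
        return "high"
    if all(t == 0 for t in tertile_by_age):
        return "low"
    return "mid"


print(spearman([1, 2, 3, 4, 5], [1, 4, 9, 16, 25]))  # 1.0 (monotone, though not linear)
print(encloss_group([2, 1, 2]))                       # mid
```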
Abstract:
This paper develops a semiparametric estimation approach for mixed count regression models based on a series expansion of the unknown density of the unobserved heterogeneity. We use the generalized Laguerre series expansion around a gamma baseline density to model unobserved heterogeneity in a Poisson mixture model. We establish the consistency of the estimator and present a computational strategy to implement the proposed estimation techniques in the standard count model as well as in truncated, censored, and zero-inflated count regression models. Monte Carlo evidence shows that the estimator behaves well in finite samples. The paper applies the method to a model of individual shopping behavior. © 1999 Elsevier Science S.A. All rights reserved.
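As a point of reference for the series approach, the sketch below covers only the leading term of the expansion: with no Laguerre correction terms, the heterogeneity density is the gamma baseline itself, and the Poisson-gamma mixture is the negative binomial. The Python code checks this numerically by integrating the Poisson probability against a mean-one gamma density (shape = rate = theta, an assumed parameterisation) and comparing it with the closed-form negative binomial probability. The polynomial correction terms that make the paper's estimator semiparametric are not implemented.

```python
import math


def poisson_pmf(y: int, lam: float) -> float:
    return math.exp(y * math.log(lam) - lam - math.lgamma(y + 1))


def gamma_pdf(u: float, theta: float) -> float:
    """Gamma density with shape = rate = theta, so that E[u] = 1."""
    return math.exp(theta * math.log(theta) + (theta - 1) * math.log(u)
                    - theta * u - math.lgamma(theta))


def mixture_pmf(y: int, mu: float, theta: float,
                upper: float = 40.0, steps: int = 20000) -> float:
    """P(Y = y) = integral of Poisson(y; mu*u) * gamma(u) du, midpoint rule on (0, upper)."""
    h = upper / steps
    return h * sum(
        poisson_pmf(y, mu * u) * gamma_pdf(u, theta)
        for u in ((k + 0.5) * h for k in range(steps))
    )


def negbin_pmf(y: int, mu: float, theta: float) -> float:
    """Closed-form negative binomial probability with mean mu and dispersion theta."""
    log_p = (math.lgamma(y + theta) - math.lgamma(theta) - math.lgamma(y + 1)
             + theta * math.log(theta / (theta + mu))
             + y * math.log(mu / (theta + mu)))
    return math.exp(log_p)


for y in range(4):
    print(y, round(mixture_pmf(y, 3.0, 2.0), 6), round(negbin_pmf(y, 3.0, 2.0), 6))
```

The two columns should agree to the printed precision; the integration cut-off at 40 is safe here only because the gamma(2, 2) tail beyond it is negligible.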
Abstract:
Existing crowd counting algorithms rely on holistic, local or histogram-based features to capture crowd properties, and regression is then employed to estimate the crowd size. Insufficient testing across multiple datasets has made it difficult to compare and contrast different methodologies. This paper presents an evaluation across multiple datasets to compare holistic, local and histogram-based methods, and to compare various image features and regression models. A K-fold cross-validation protocol is followed to evaluate performance across five public datasets: UCSD, PETS 2009, Fudan, Mall and Grand Central. Image features are categorised into five types: size, shape, edges, keypoints and textures. The regression models evaluated are Gaussian process regression (GPR), linear regression, K-nearest neighbours (KNN) and neural networks (NN). The results demonstrate that local features outperform equivalent holistic and histogram-based features; that optimal performance is observed using all image features except textures; and that GPR outperforms linear, KNN and NN regression.
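The evaluation protocol can be illustrated with a toy version: K-fold cross-validation of a K-nearest-neighbours regressor that maps a single image feature to a crowd count. The feature (think of a foreground-pixel count), the data, and the unshuffled contiguous folds are assumptions made for brevity; the paper's features, datasets and fold assignments are not reproduced. To avoid confusing the two Ks, the parameters are called `n_folds` and `n_neighbours`.

```python
from typing import List, Sequence


def k_fold_indices(n: int, n_folds: int) -> List[List[int]]:
    """Split indices 0..n-1 into contiguous, near-equal folds (no shuffling)."""
    sizes = [n // n_folds + (1 if f < n % n_folds else 0) for f in range(n_folds)]
    folds, start = [], 0
    for size in sizes:
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def knn_predict(train_x: Sequence[float], train_y: Sequence[float],
                x: float, n_neighbours: int) -> float:
    """Average the counts of the n_neighbours training frames with the closest feature value."""
    nearest = sorted(range(len(train_x)), key=lambda i: abs(train_x[i] - x))[:n_neighbours]
    return sum(train_y[i] for i in nearest) / len(nearest)


def cross_validated_mae(xs: Sequence[float], ys: Sequence[float],
                        n_folds: int, n_neighbours: int) -> float:
    """Mean absolute error of KNN regression under K-fold cross-validation."""
    errors = []
    for fold in k_fold_indices(len(xs), n_folds):
        held_out = set(fold)
        train = [i for i in range(len(xs)) if i not in held_out]
        tx = [xs[i] for i in train]
        ty = [ys[i] for i in train]
        for i in fold:
            errors.append(abs(knn_predict(tx, ty, xs[i], n_neighbours) - ys[i]))
    return sum(errors) / len(errors)


# Hypothetical frames: one scalar feature per frame and its annotated crowd count.
feature = [0.0, 1.0, 2.0, 3.0, 4.0, 0.0, 1.0, 2.0, 3.0, 4.0]
count = [0.0, 10.0, 20.0, 30.0, 40.0, 0.0, 10.0, 20.0, 30.0, 40.0]
print(cross_validated_mae(feature, count, n_folds=5, n_neighbours=1))  # 0.0
```

Swapping `knn_predict` for another regressor while keeping the same folds is what makes comparisons between regression models like those in the abstract fair.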
Abstract:
Land-use regression (LUR) is a technique that can improve the accuracy of air pollution exposure assessment in epidemiological studies. Most LUR models are developed for single cities, which places limitations on their applicability to other locations. We sought to develop a model to predict nitrogen dioxide (NO2) concentrations with national coverage of Australia by using satellite observations of tropospheric NO2 columns combined with other predictor variables. We used a generalised estimating equation (GEE) model to predict annual and monthly average ambient NO2 concentrations measured by a national monitoring network from 2006 through 2011. The best annual model explained 81% of spatial variation in NO2 (absolute RMS error=1.4 ppb), while the best monthly model explained 76% (absolute RMS error=1.9 ppb). We applied our models to predict NO2 concentrations at the ~350,000 census mesh blocks across the country (a mesh block is the smallest spatial unit in the Australian census). National population-weighted average concentrations ranged from 7.3 ppb (2006) to 6.3 ppb (2011). We found that a simple approach using tropospheric NO2 column data yielded models with slightly better predictive ability than those produced using a more involved approach that required simulation of surface-to-column ratios. The models were capable of capturing within-urban variability in NO2, and offer the ability to estimate ambient NO2 concentrations at monthly and annual time scales across Australia from 2006–2011. We are making our model predictions freely available for research.
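A national population-weighted average concentration is a weighted mean of predicted concentrations, with each spatial unit (here, each mesh block) weighted by its population, while the RMS error summarises how far model predictions fall from monitor observations. A small Python sketch with made-up numbers, not the study's data:

```python
import math
from typing import Sequence


def population_weighted_mean(concentrations: Sequence[float],
                             populations: Sequence[float]) -> float:
    """Sum of population * concentration divided by total population."""
    total = sum(populations)
    return sum(c * p for c, p in zip(concentrations, populations)) / total


def rmse(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Root-mean-square error between monitor observations and model predictions."""
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / len(observed))


# Two hypothetical mesh blocks (ppb): a small high-NO2 block and a larger low-NO2 block.
print(population_weighted_mean([10.0, 2.0], [100, 300]))  # 4.0
print(rmse([1.0, 2.0, 3.0], [2.0, 2.0, 2.0]))
```

The unweighted mean of the two blocks would be 6.0 ppb; weighting by population pulls the average towards where most people live.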
Abstract:
To enhance the efficiency of regression parameter estimation by modeling the correlation structure of correlated binary error terms in quantile regression with repeated measurements, we propose a Gaussian pseudolikelihood approach for estimating correlation parameters and selecting the most appropriate working correlation matrix simultaneously. The induced smoothing method is applied to estimate the covariance of the regression parameter estimates, which bypasses density estimation of the errors. Extensive numerical studies indicate that the proposed method performs well in selecting an accurate correlation structure and improving regression parameter estimation efficiency. The proposed method is further illustrated by analyzing a dental dataset.
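The induced smoothing idea can be shown on the simplest quantile estimating function. In an intercept-only model with residuals e_i = y_i - beta, the score is the sum of tau - I(e_i < 0), a step function of beta; induced smoothing replaces the indicator with the normal CDF Phi(-e_i / r), giving a differentiable function whose slope is available in closed form, so no error density has to be estimated. In this Python sketch the smoothing scale `r` is a fixed, hypothetical constant (in induced smoothing it is tied to the estimated covariance of the regression estimates), and the correlation-structure selection is not shown.

```python
import math
from typing import Sequence


def normal_cdf(z: float) -> float:
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def quantile_score(residuals: Sequence[float], tau: float) -> float:
    """Unsmoothed estimating function: sum of tau - I(e < 0); a step function of beta."""
    return sum(tau - (1.0 if e < 0 else 0.0) for e in residuals)


def smoothed_quantile_score(residuals: Sequence[float], tau: float, r: float) -> float:
    """Induced-smoothing version: I(e < 0) replaced by Phi(-e / r)."""
    return sum(tau - normal_cdf(-e / r) for e in residuals)


def smoothed_score_derivative(residuals: Sequence[float], r: float) -> float:
    """d/d(beta) of the smoothed score when e_i = y_i - beta: -sum(phi(e_i / r)) / r."""
    return -sum(math.exp(-0.5 * (e / r) ** 2) / math.sqrt(2.0 * math.pi)
                for e in residuals) / r


res = [-2.0, -1.0, 1.0, 2.0, 3.0]
print(quantile_score(res, 0.5))                  # 0.5
print(smoothed_quantile_score(res, 0.5, 1e-3))   # essentially 0.5 for a tiny r
print(smoothed_score_derivative(res, 1.0))       # finite slope, usable in a sandwich variance
```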
Abstract:
In the Bayesian framework, a standard approach to model criticism is to compare some function of the observed data with a reference predictive distribution. The result of the comparison can be summarized in the form of a p-value, and it is well known that computing some kinds of Bayesian predictive p-values can be challenging. The use of regression adjustment approximate Bayesian computation (ABC) methods is explored for this task. Two problems are considered. The first is the calibration of posterior predictive p-values so that they are uniformly distributed under some reference distribution for the data. Computation is difficult because the calibration process requires repeated approximation of the posterior for different data sets under the reference distribution. The second problem is the approximation of distributions of prior predictive p-values for the purpose of choosing weakly informative priors when the model checking statistic is expensive to compute. Here the computation is difficult because of the need to repeatedly sample from a prior predictive distribution for different values of a prior hyperparameter. In both problems we argue that high accuracy in the computations is not required, which makes fast approximations such as regression adjustment ABC very useful. We illustrate our methods with several examples.
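Regression adjustment ABC is commonly implemented as rejection sampling followed by a kernel-weighted linear regression of parameter draws on simulated summaries, which shifts each accepted draw to the observed summary. Below is a minimal Python sketch for a scalar parameter and scalar summary, assuming an Epanechnikov kernel with tolerance `eps`; these choices, the toy normal model and the function name are illustrative assumptions, and the p-value calibration built on top of the adjustment is not shown.

```python
import random
from typing import List, Sequence, Tuple


def regression_adjust(thetas: Sequence[float], stats: Sequence[float],
                      s_obs: float, eps: float) -> Tuple[List[float], List[float]]:
    """Rejection step plus local-linear regression adjustment.

    Keeps draws whose simulated summary lies within eps of s_obs, weights them
    with an Epanechnikov kernel, regresses theta on the summary, and shifts each
    accepted theta to the observed summary. Returns (adjusted thetas, weights).
    """
    accepted = [(t, s) for t, s in zip(thetas, stats) if abs(s - s_obs) < eps]
    weights = [1.0 - ((s - s_obs) / eps) ** 2 for _, s in accepted]
    w_sum = sum(weights)
    s_mean = sum(w * s for w, (_, s) in zip(weights, accepted)) / w_sum
    t_mean = sum(w * t for w, (t, _) in zip(weights, accepted)) / w_sum
    s_xx = sum(w * (s - s_mean) ** 2 for w, (_, s) in zip(weights, accepted))
    s_xy = sum(w * (s - s_mean) * (t - t_mean) for w, (t, s) in zip(weights, accepted))
    slope = s_xy / s_xx
    adjusted = [t - slope * (s - s_obs) for t, s in accepted]
    return adjusted, weights


# Toy model: theta ~ N(0, 2^2) prior; summary = mean of 10 N(theta, 1) draws,
# i.e. summary ~ N(theta, variance 0.1).
rng = random.Random(0)
thetas = [rng.gauss(0.0, 2.0) for _ in range(50_000)]
stats = [rng.gauss(t, 0.1 ** 0.5) for t in thetas]
adjusted, weights = regression_adjust(thetas, stats, s_obs=1.0, eps=0.3)
posterior_mean = sum(w * a for w, a in zip(weights, adjusted)) / sum(weights)
print(len(adjusted), round(posterior_mean, 3))
```

The adjustment lets a fairly wide tolerance be used without much bias, which is the kind of fast, moderately accurate approximation the abstract argues is sufficient for p-value computations.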