53 results for Data selection

in QUB Research Portal - Research Directory and Institutional Repository for Queen's University Belfast


Relevance:

70.00%

Abstract:

In this paper we report on our attempts to fit the optimal data selection (ODS) model (Oaksford & Chater, 1994; Oaksford, Chater, & Larkin, 2000) to the selection task data reported in Feeney and Handley (2000) and Handley, Feeney, and Harper (2002). Although Oaksford (2002b) reports good fits to the data described in Feeney and Handley (2000), the model does not adequately capture the data described in Handley et al. (2002). Furthermore, across all six of the experiments modelled here, the ODS model does not predict participants' behaviour at the level of selection rates for individual cards. Finally, when people's probability estimates are used in the modelling exercise, the model adequately captures only 1 out of 18 conditions described in Handley et al. We discuss the implications of these results for models of the selection task and claim that they support deductive, rather than probabilistic, accounts of the task.

Relevance:

60.00%

Abstract:

Three experiments investigated the effect of rarity on people's selection and interpretation of data in a variant of the pseudodiagnosticity task. For familiar (Experiment 1) but not for arbitrary (Experiment 3) materials, participants were more likely to select evidence so as to complete a likelihood ratio when the initial evidence they received was a single likelihood concerning a rare feature. This rarity effect with familiar materials was replicated in Experiment 2 where it was shown that participants were relatively insensitive to explicit manipulations of the likely diagnosticity of rare evidence. In contrast to the effects for data selection, there was an effect of rarity on confidence ratings after receipt of a single likelihood for arbitrary but not for familiar materials. It is suggested that selecting diagnostic evidence necessitates explicit consideration of the alternative hypothesis and that consideration of the possible consequences of the evidence for the alternative weakens the rarity effect in confidence ratings. Paradoxically, although rarity effects in evidence selection and confidence ratings are in the spirit of Bayesian reasoning, the effect on confidence ratings appears to rely on participants thinking less about the alternative hypothesis.
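The diagnostic logic at issue here can be made concrete with Bayes' rule: a single likelihood, P(D|H1), says nothing about a hypothesis until it is paired with P(D|H2) to form a likelihood ratio. The short sketch below uses purely hypothetical numbers to illustrate why a rare feature (one with a small P(D|H2)) makes a large, diagnostic ratio more plausible.

```python
# Illustrative only: hypothetical numbers showing why a single likelihood is
# not diagnostic and why the full likelihood ratio matters (Bayes' rule).

def posterior_odds(prior_odds, p_data_given_h1, p_data_given_h2):
    """Posterior odds of H1 over H2 after observing the data."""
    likelihood_ratio = p_data_given_h1 / p_data_given_h2
    return prior_odds * likelihood_ratio

prior_odds = 1.0                      # H1 and H2 equally likely a priori

# A single likelihood, e.g. P(feature | H1) = 0.8, tells us nothing by itself:
# the posterior depends entirely on the unexamined P(feature | H2).
for p_h2 in (0.8, 0.1):
    odds = posterior_odds(prior_odds, 0.8, p_h2)
    print(f"P(D|H2) = {p_h2:.1f} -> posterior odds of H1 = {odds:.1f}")
# 0.8 -> odds 1.0 (non-diagnostic); 0.1 -> odds 8.0 (strongly favours H1).
# A rare feature makes a small P(D|H2), and hence a large ratio, more plausible.
```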

Relevance:

60.00%

Abstract:

Objective To investigate the effects of weaning protocols on the total duration of mechanical ventilation, mortality, adverse events, quality of life, weaning duration, and length of stay in the intensive care unit and hospital.

Design Systematic review.

Data sources Cochrane Central Register of Controlled Trials, Medline, Embase, CINAHL, LILACS, ISI Web of Science, ISI Conference Proceedings, Cambridge Scientific Abstracts, and reference lists of articles. We did not apply language restrictions.

Review methods We included randomised and quasi-randomised controlled trials of weaning from mechanical ventilation with and without protocols in critically ill adults.

Data selection Three authors independently assessed trial quality and extracted data. A priori subgroup and sensitivity analyses were performed. We contacted study authors for additional information.

Results Eleven trials that included 1971 patients met the inclusion criteria. Compared with usual care, the geometric mean duration of mechanical ventilation in the weaning protocol group was reduced by 25% (95% confidence interval 9% to 39%, P=0.006; 10 trials); the duration of weaning was reduced by 78% (31% to 93%, P=0.009; six trials); and the length of stay in the intensive care unit was reduced by 10% (2% to 19%, P=0.02; eight trials). There was significant heterogeneity among studies for the total duration of mechanical ventilation (I²=76%, P
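For readers unfamiliar with meta-analyses carried out on the log scale, the "reduced by X%" figures above are simple transformations of a pooled ratio of geometric means. The snippet below shows the conversion; the log-ratio is back-calculated from the reported 25% reduction purely for illustration.

```python
# A minimal sketch of how a pooled effect on the log scale maps onto the
# "reduced by X%" figures quoted above. The log-ratio here is back-calculated
# from the abstract purely for illustration.
import math

def percent_reduction(log_ratio):
    """Ratio of geometric means (protocol / usual care) -> % reduction."""
    return (1.0 - math.exp(log_ratio)) * 100.0

# ln(0.75) ~= -0.288 corresponds to the reported 25% reduction in the
# geometric mean duration of mechanical ventilation; confidence limits on the
# log scale transform to percentage limits in exactly the same way.
print(round(percent_reduction(math.log(0.75)), 1))   # 25.0
```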

Conclusion There is evidence of a reduction in the duration of mechanical ventilation, weaning, and stay in the intensive care unit when standardised weaning protocols are used, but there is significant heterogeneity among studies and an insufficient number of studies to investigate the source of this heterogeneity. Some studies suggest that organisational context could influence outcomes, but this could not be evaluated as it was outside the scope of this review.

Relevance:

60.00%

Abstract:

Introduction: HIV testing is a cornerstone of efforts to combat the HIV epidemic, and testing conducted as part of surveillance provides invaluable data on the spread of infection and the effectiveness of campaigns to reduce the transmission of HIV. However, participation in HIV testing can be low, and if respondents systematically select not to be tested because they know or suspect they are HIV positive (and fear disclosure), standard approaches to deal with missing data will fail to remove selection bias. We implemented Heckman-type selection models, which can be used to adjust for missing data that are not missing at random, and established the extent of selection bias in a population-based HIV survey in an HIV hyperendemic community in rural South Africa.

Methods: We used data from a population-based HIV survey carried out in 2009 in rural KwaZulu-Natal, South Africa. In this survey, 5565 women (35%) and 2567 men (27%) provided blood for an HIV test. We accounted for missing data using interviewer identity as a selection variable which predicted consent to HIV testing but was unlikely to be independently associated with HIV status. Our approach involved using this selection variable to examine the HIV status of residents who would ordinarily refuse to test, except that they were allocated a persuasive interviewer. Our copula model allows for flexibility when modelling the dependence structure between HIV survey participation and HIV status.
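As a rough illustration of the selection-model idea (though not the copula model or binary-outcome machinery actually used in this study), a classic two-step Heckman correction fits a probit selection equation and then adds the inverse Mills ratio as a regressor in the outcome equation. The sketch below assumes hypothetical column names, a continuous outcome, and interviewer dummies playing the role of the exclusion restriction described in the abstract.

```python
# Simplified two-step Heckman-type correction, sketched for a continuous
# outcome. The study itself uses a copula model for a binary HIV outcome;
# the column names here (tested, hiv_outcome, interviewer dummies,
# covariates) are hypothetical.
import numpy as np
import pandas as pd
import statsmodels.api as sm
from scipy.stats import norm

def heckman_two_step(df, y_col, select_col, outcome_X_cols, select_X_cols):
    # Step 1: probit model of who consents to testing (selection equation).
    Zs = sm.add_constant(df[select_X_cols])
    probit = sm.Probit(df[select_col], Zs).fit(disp=False)
    z_hat = np.asarray(Zs @ probit.params)

    # Inverse Mills ratio from the selection equation.
    imr = norm.pdf(z_hat) / norm.cdf(z_hat)

    # Step 2: outcome equation on tested respondents, with the IMR included.
    tested = df[df[select_col] == 1].copy()
    tested["imr"] = imr[df[select_col] == 1]
    Xo = sm.add_constant(tested[outcome_X_cols + ["imr"]])
    return sm.OLS(tested[y_col], Xo).fit()

# Example call (hypothetical columns): interviewer dummies act as the
# exclusion restriction, predicting consent but not HIV status directly.
# fit = heckman_two_step(survey, "hiv_outcome", "tested",
#                        ["age", "sex"], ["age", "sex", "interviewer_2"])
```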

Results: For women, our selection model generated an HIV prevalence estimate of 33% (95% CI 27–40) for all people eligible to consent to HIV testing in the survey. This estimate is higher than the estimate of 24% generated when only information from respondents who participated in testing is used in the analysis, and the estimate of 27% when imputation analysis is used to predict missing data on HIV status. For men, we found an HIV prevalence of 25% (95% CI 15–35) using the selection model, compared to 16% among those who participated in testing, and 18% estimated with imputation. We provide new confidence intervals that correct for the fact that the relationship between testing and HIV status is unknown and requires estimation.

Conclusions: We confirm the feasibility and value of adopting selection models to account for missing data in population-based HIV surveys and surveillance systems. Elements of survey design, such as interviewer identity, present the opportunity to adopt this approach in routine applications. Where non-participation is high, true confidence intervals are much wider than those generated by standard approaches to dealing with missing data suggest.

Relevance:

40.00%

Abstract:

This study investigates face recognition with partial occlusion, illumination variation and their combination, assuming no prior information about the mismatch and limited training data for each person. The authors extend their previous posterior union model (PUM) to give a new method capable of dealing with all these problems. PUM is an approach for selecting the optimal local image features for recognition to improve robustness to partial occlusion. The extension is in two stages. First, the authors extend PUM from a probability-based formulation to a similarity-based formulation, so that it operates with as little as a single training sample to offer robustness to partial occlusion. Second, they extend this new formulation to make it robust to illumination variation, and to combined illumination variation and partial occlusion, by a novel combination of multicondition relighting and optimal feature selection. To evaluate the new methods, a number of databases with various simulated and realistic occlusion/illumination mismatches have been used. The results have demonstrated the improved robustness of the new methods.
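A much-simplified sketch of the underlying idea of similarity-based local feature selection (not the posterior union model itself): compare probe and gallery faces block by block and score the match using only the best-matching blocks, so that occluded regions contribute little.

```python
# NOT the posterior union model, only an illustration of the general idea of
# similarity-based local feature selection for robustness to occlusion.
import numpy as np

def block_similarities(probe, gallery, block=16):
    """Cosine similarity between corresponding local blocks of two images."""
    sims = []
    h, w = probe.shape
    for i in range(0, h - block + 1, block):
        for j in range(0, w - block + 1, block):
            a = probe[i:i+block, j:j+block].ravel()
            b = gallery[i:i+block, j:j+block].ravel()
            sims.append(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))
    return np.array(sims)

def robust_score(probe, gallery, keep=0.6):
    """Score a match using only the top fraction of local similarities."""
    sims = np.sort(block_similarities(probe, gallery))[::-1]
    k = max(1, int(keep * len(sims)))
    return sims[:k].mean()
```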

Relevance:

40.00%

Abstract:

High-quality data from appropriate archives are needed for the continuing improvement of radiocarbon calibration curves. We discuss here the basic assumptions behind 14C dating that necessitate calibration and the relative strengths and weaknesses of archives from which calibration data are obtained. We also highlight the procedures, problems and uncertainties involved in determining atmospheric and surface ocean 14C/12C in these archives, including a discussion of the various methods used to derive an independent absolute timescale and uncertainty. The types of data required for the current IntCal database and calibration curve model are tabulated with examples.

Relevance:

40.00%

Abstract:

Model selection between competing models is a key consideration in the discovery of prognostic multigene signatures. The use of appropriate statistical performance measures, as well as verification of the biological significance of the signatures, is imperative to maximise the chance of external validation of the generated signatures. Current approaches in time-to-event studies often use only a single measure of performance in model selection, such as logrank test p-values, or dichotomise the follow-up times at some phase of the study to facilitate signature discovery. In this study we improve the prognostic signature discovery process through the application of the multivariate partial Cox model combined with the concordance index, hazard ratio of predictions, independence from available clinical covariates and biological enrichment as measures of signature performance. The proposed framework was applied to discover prognostic multigene signatures from early breast cancer data. The partial Cox model combined with the multiple performance measures was used both in guiding the selection of the optimal panel of prognostic genes and in prediction of risk within cross-validation, without dichotomising the follow-up times at any stage. The signatures were successfully externally validated in independent breast cancer datasets, yielding a hazard ratio of 2.55 [1.44, 4.51] for the top ranking signature.
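One building block of such a framework can be sketched as scoring a candidate gene panel by the cross-validated concordance index of a Cox model. The snippet below assumes the lifelines and scikit-learn libraries and hypothetical column names; it is not the authors' full partial-Cox and enrichment pipeline.

```python
# Sketch: rank a candidate gene panel by cross-validated concordance index.
# Column names ("time", "event", gene columns) are hypothetical.
from lifelines import CoxPHFitter
from lifelines.utils import concordance_index
from sklearn.model_selection import KFold

def cv_concordance(df, genes, time_col="time", event_col="event", folds=5):
    scores = []
    for train_idx, test_idx in KFold(folds, shuffle=True, random_state=0).split(df):
        train, test = df.iloc[train_idx], df.iloc[test_idx]
        cph = CoxPHFitter(penalizer=0.1)
        cph.fit(train[genes + [time_col, event_col]],
                duration_col=time_col, event_col=event_col)
        risk = cph.predict_partial_hazard(test[genes])
        # Higher predicted risk should mean shorter survival, hence -risk.
        scores.append(concordance_index(test[time_col], -risk, test[event_col]))
    return sum(scores) / len(scores)
```

Candidate panels can then be ranked by this score alongside hazard ratios, clinical independence and biological enrichment, in the spirit of the multiple performance measures described above.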

Relevance:

40.00%

Abstract:

This paper investigates the gene selection problem for microarray data with small samples and variant correlation. Most existing algorithms require expensive computational effort, especially when thousands of genes are involved. The main objective of this paper is to effectively select the most informative genes from microarray data while keeping the computational expense affordable. This is achieved by proposing a novel forward gene selection algorithm (FGSA). To overcome the small-sample problem, the augmented data technique is first employed to produce an augmented data set. Taking inspiration from other gene selection methods, the L2-norm penalty is then introduced into the recently proposed fast regression algorithm to achieve the group selection ability. Finally, by defining a proper regression context, the proposed method can be implemented efficiently in software, which significantly reduces the computational burden. Both computational complexity analysis and simulation results confirm the effectiveness of the proposed algorithm in comparison with other approaches.

Relevance:

40.00%

Abstract:

This paper considers a problem of identification for a high-dimensional nonlinear non-parametric system when only a limited data set is available. Algorithms are proposed for this purpose that exploit the relationship between the input variables and the output, and further the inter-dependence among the input variables, so that the importance of the input variables can be established. A key to these algorithms is the non-parametric two-stage input selection algorithm.
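In the same spirit (though not the authors' exact non-parametric two-stage algorithm), an mRMR-style sketch ranks candidate inputs by their dependence on the output while penalising redundancy with inputs already selected:

```python
# Sketch: select inputs by relevance to the output minus redundancy with
# already-selected inputs, estimated via mutual information. Illustrative
# only, not the algorithm proposed in the paper.
import numpy as np
from sklearn.feature_selection import mutual_info_regression

def select_inputs(X, y, n_select=5):
    relevance = mutual_info_regression(X, y)          # input-output dependence
    selected, remaining = [], list(range(X.shape[1]))
    while remaining and len(selected) < n_select:
        scores = []
        for j in remaining:
            if selected:
                redundancy = mutual_info_regression(
                    X[:, selected], X[:, j]).mean()   # input-input dependence
            else:
                redundancy = 0.0
            scores.append(relevance[j] - redundancy)
        j_best = remaining[int(np.argmax(scores))]
        selected.append(j_best)
        remaining.remove(j_best)
    return selected
```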

Relevance:

30.00%

Abstract:

Feature selection and feature weighting are useful techniques for improving the classification accuracy of the K-nearest-neighbour (K-NN) rule. The term feature selection refers to algorithms that select the best subset of the input feature set. In feature weighting, each feature is multiplied by a weight value proportional to the ability of the feature to distinguish pattern classes. In this paper, a novel hybrid approach is proposed for simultaneous feature selection and feature weighting of the K-NN rule based on the Tabu Search (TS) heuristic. The proposed TS heuristic in combination with the K-NN classifier is compared with several classifiers on various available data sets. The results indicate a significant improvement in classification accuracy. The proposed TS heuristic is also compared with various feature selection algorithms. The experiments performed revealed that the proposed hybrid TS heuristic is superior to both simple TS and sequential search algorithms. We also present results for the classification of prostate cancer using multispectral images, an important problem in biomedicine.
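A toy version of the idea, restricted to binary feature masks scored by K-NN cross-validation accuracy, is sketched below; the paper's hybrid additionally searches real-valued feature weights and uses a more elaborate tabu scheme.

```python
# Toy tabu search over binary feature masks for K-NN; illustrative only.
import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import cross_val_score

def knn_score(X, y, mask, k=3, cv=5):
    """Cross-validated K-NN accuracy using only the features where mask is True."""
    if not mask.any():
        return 0.0
    return cross_val_score(KNeighborsClassifier(n_neighbors=k),
                           X[:, mask], y, cv=cv).mean()

def tabu_feature_selection(X, y, iters=50, tenure=7, seed=0):
    rng = np.random.default_rng(seed)
    n = X.shape[1]
    current = rng.random(n) < 0.5                 # random initial feature mask
    best, best_score = current.copy(), knn_score(X, y, current)
    tabu = {}                                     # feature index -> release iteration
    for it in range(iters):
        candidates = [j for j in range(n) if tabu.get(j, -1) <= it]
        if not candidates:
            break
        # Evaluate every non-tabu single-feature flip and take the best move.
        scores = []
        for j in candidates:
            trial = current.copy()
            trial[j] = not trial[j]
            scores.append(knn_score(X, y, trial))
        j_move = candidates[int(np.argmax(scores))]
        current[j_move] = not current[j_move]
        tabu[j_move] = it + tenure                # forbid reversing this move for a while
        if max(scores) > best_score:
            best, best_score = current.copy(), max(scores)
    return best, best_score
```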

Relevance:

30.00%

Abstract:

The identification of non-linear systems using only observed finite datasets has become a mature research area over the last two decades. A class of linear-in-the-parameter models with universal approximation capabilities has been intensively studied and widely used due to the availability of many linear-learning algorithms and their inherent convergence conditions. This article presents a systematic overview of basic research on model selection approaches for linear-in-the-parameter models. One of the fundamental problems in non-linear system identification is to find the minimal model with the best model generalisation performance from observational data only. The important concepts used in various non-linear system-identification algorithms for achieving good model generalisation are first reviewed, including Bayesian parameter regularisation and model selection criteria based on cross-validation and experimental design. A significant advance in machine learning has been the development of the support vector machine as a means for identifying kernel models based on the structural risk minimisation principle. Developments in convex optimisation-based model construction algorithms, including support vector regression algorithms, are outlined. Input selection algorithms and on-line system identification algorithms are also included in this review. Finally, some industrial applications of non-linear models are discussed.
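As a concrete example of one of the cross-validation-based selective criteria mentioned above, the sketch below performs forward regressor selection for a linear-in-the-parameters model using the leave-one-out PRESS error; it is a generic illustration, not any specific published orthogonal-forward-regression code.

```python
# Forward selection of regressors by leave-one-out PRESS; generic sketch.
import numpy as np

def press(P, y):
    """Leave-one-out sum of squared errors for least squares on regressors P."""
    H = P @ np.linalg.pinv(P)                     # hat matrix
    residuals = y - H @ y
    loo = residuals / (1.0 - np.clip(np.diag(H), None, 1 - 1e-12))
    return float(loo @ loo)

def forward_select(candidates, y, max_terms=10):
    """candidates: (n_samples, n_terms) matrix of candidate basis functions."""
    chosen, remaining = [], list(range(candidates.shape[1]))
    best = np.inf
    while remaining and len(chosen) < max_terms:
        trials = {j: press(candidates[:, chosen + [j]], y) for j in remaining}
        j_best = min(trials, key=trials.get)
        if trials[j_best] >= best:
            break                                 # PRESS no longer decreasing
        best = trials[j_best]
        chosen.append(j_best)
        remaining.remove(j_best)
    return chosen, best
```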