7 results for selection methods

at Duke University


Relevance: 70.00%

Abstract:

Although many feature selection methods for classification have been developed, there is a need to identify genes in high-dimensional data with censored survival outcomes. Traditional methods for gene selection in classification problems have several drawbacks. First, the majority of gene selection approaches for classification are single-gene based. Second, many of the gene selection procedures are not embedded within the algorithm itself. The technique of random forests has been found to perform well in high-dimensional data settings with survival outcomes, and it has an embedded mechanism for identifying important variables. It is therefore an ideal candidate for gene selection in high-dimensional data with survival outcomes. In this paper, we develop a novel method based on random forests to identify a set of prognostic genes. We compare our method with several machine learning methods and various node split criteria using several real data sets. Our method performed well in both simulations and real data analysis. Additionally, we have shown the advantages of our approach over single-gene-based approaches. Our method incorporates multivariate correlations in microarray data for survival outcomes, allowing us to better utilize the information available from microarray data with survival outcomes.
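The abstract contains no code, but as a rough, hypothetical sketch of the general idea (not the authors' method), the scikit-survival package can fit a random survival forest to censored high-dimensional data and rank genes by permutation importance. All data, dimensions, and parameters below are synthetic.

```python
# Minimal sketch: random survival forest + permutation importance for
# ranking candidate prognostic genes (toy data; not the paper's method).
import numpy as np
from sksurv.ensemble import RandomSurvivalForest
from sksurv.util import Surv
from sklearn.inspection import permutation_importance

rng = np.random.default_rng(0)
n, p = 200, 50                           # samples x genes (toy dimensions)
X = rng.normal(size=(n, p))
time = rng.exponential(np.exp(X[:, 0]))  # gene 0 drives survival time
event = rng.random(n) < 0.7              # ~70% of events observed
y = Surv.from_arrays(event, time)        # structured survival outcome

rsf = RandomSurvivalForest(n_estimators=200, min_samples_leaf=10,
                           random_state=0)
rsf.fit(X, y)

# Importance of each gene: drop in concordance index when it is permuted.
imp = permutation_importance(rsf, X, y, n_repeats=5, random_state=0)
print("Top candidate prognostic genes:",
      np.argsort(imp.importances_mean)[::-1][:10])
```

Unlike single-gene screens, the permutation scores here come from a forest fit jointly on all genes, which is what lets the ranking reflect multivariate correlations.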

Relevance: 30.00%

Abstract:

We consider the problem of variable selection in regression modeling in high-dimensional spaces where there is known structure among the covariates. This is an unconventional variable selection problem for two reasons: (1) the dimension of the covariate space is comparable to, and often much larger than, the number of subjects in the study, and (2) the covariate space is highly structured, and in some cases it is desirable to incorporate this structural information into the model-building process. We approach this problem through the Bayesian variable selection framework, where we assume that the covariates lie on an undirected graph and formulate an Ising prior on the model space for incorporating structural information. Certain computational and statistical problems arise that are unique to such high-dimensional, structured settings, the most interesting being the phenomenon of phase transitions. We propose theoretical and computational schemes to mitigate these problems. We illustrate our methods on two different graph structures: the linear chain and the regular graph of degree k. Finally, we use our methods to study a specific application in genomics: the modeling of transcription factor binding sites in DNA sequences. © 2010 American Statistical Association.
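For concreteness, here is a minimal numeric sketch of an Ising prior of the kind described, with hypothetical sparsity (a) and smoothness (b) parameters; the paper's exact parameterization may differ.

```python
# Sketch of an Ising prior on the model space (hypothetical parameters).
import numpy as np

def ising_log_prior(gamma, edges, a=-2.0, b=1.0):
    """Unnormalized log prior for an inclusion vector gamma in {0,1}^p.

    a < 0 penalizes each included covariate (sparsity);
    b > 0 rewards jointly included covariates that are graph neighbors.
    """
    gamma = np.asarray(gamma)
    return a * gamma.sum() + b * sum(gamma[i] * gamma[j] for i, j in edges)

# Linear chain on p = 5 covariates, one of the paper's two graph structures.
edges = [(i, i + 1) for i in range(4)]
print(ising_log_prior([1, 1, 1, 0, 0], edges))  # contiguous selection: -4.0
print(ising_log_prior([1, 0, 1, 0, 1], edges))  # scattered selection:  -6.0
```

With three variables included in both cases, the contiguous configuration scores higher, which is exactly how such a prior steers selection toward the known covariate structure.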

Relevance: 30.00%

Abstract:

BACKGROUND: Few educational resources have been developed to inform patients' renal replacement therapy (RRT) selection decisions. Patients progressing toward end-stage renal disease (ESRD) must decide among multiple treatment options with varying characteristics. Complex information about treatments must be adequately conveyed to patients with different educational backgrounds and informational needs. Decisions about treatment options also require family input, as families often participate in patients' treatment and support patients' decisions. We describe the development, design, and preliminary evaluation of an informational, evidence-based, and patient- and family-centered decision aid for patients with ESRD and varying levels of health literacy, health numeracy, and cognitive function. METHODS: We designed a decision aid comprising a complementary video and informational handbook. We based our development process on data previously obtained from qualitative focus groups and systematic literature reviews. We developed the video and handbook simultaneously, in stages. For the video, the stages included (1) directed interviews with culturally appropriate patients and families and preliminary script development, (2) video production, and (3) screening the video with patients and their families. For the handbook, the stages comprised (1) preliminary content design, (2) a mixed-methods pilot study among diverse patients to assess comprehension of the handbook material, and (3) screening the handbook with patients and their families. RESULTS: The video and handbook both addressed potential benefits and trade-offs of treatment selections. The 50-minute video featured demographically diverse patients and their families describing their positive and negative experiences with selecting a treatment option. The video also incorporated health professionals' testimonials regarding various considerations that might influence patients' and families' treatment selections. The handbook comprised written text, pictures of patients and health care providers, and diagrams describing the findings and quality of scientific studies comparing treatments. The handbook text was written at a 4th- to 6th-grade reading level. Pilot study results demonstrated that a majority of patients could understand the information presented in the handbook. Patients and families screening the nearly completed video and handbook reviewed the materials favorably. CONCLUSIONS: This rigorously designed decision aid may help patients and families make informed decisions about their treatment options for RRT that are well aligned with their values.

Relevance: 30.00%

Abstract:

BACKGROUND: Little is known regarding the types of information African American and non-African American patients with chronic kidney disease (CKD) and their families need to inform renal replacement therapy (RRT) decisions. METHODS: In 20 structured group interviews, we elicited views of African American and non-African American patients with CKD and their families about factors that should be addressed in educational materials informing patients' RRT selection decisions. We asked participants to select factors from a list and obtained their open-ended feedback. RESULTS: Ten groups of patients (5 African American, 5 non-African American; total 68 individuals) and ten groups of family members (5 African American, 5 non-African American; total 62 individuals) participated. Patients and families had a range (none to extensive) of experiences with various RRTs. Patients identified morbidity or mortality, autonomy, treatment delivery, and symptoms as important factors to address. Family members identified similar factors but also cited the effects of RRT decisions on patients' psychological well-being and finances. Views of African American and non-African American participants were largely similar. CONCLUSIONS: Educational resources addressing the influence of RRT selection on patients' morbidity and mortality, autonomy, treatment delivery, and symptoms could help patients and their families select RRT options closely aligned with their values. Including information about the influence of RRT selection on patients' personal relationships and finances could enhance resources' cultural relevance for African Americans.

Relevance: 30.00%

Abstract:

© 2014, The International Biometric Society. A potential avenue for improving healthcare efficiency is to effectively tailor individualized treatment strategies by incorporating patient-level predictor information such as environmental exposures and biological and genetic marker measurements. Many useful statistical methods for deriving individualized treatment rules (ITRs) have become available in recent years. Prior to adopting any ITR in clinical practice, it is crucial to evaluate its value in improving patient outcomes. Existing methods for quantifying such values mainly consider either a single marker or semi-parametric methods that are subject to bias under model misspecification. In this article, we consider a general setting with multiple markers and propose a two-step robust method to derive ITRs and evaluate their values. We also propose procedures for comparing different ITRs, which can be used to quantify the incremental value of new markers in improving treatment selection. While working models are used in step 1 to approximate optimal ITRs, we add a layer of calibration to guard against model misspecification and further assess the value of the ITR non-parametrically, which ensures the validity of the inference. To account for the sampling variability of the estimated rules and their corresponding values, we propose a resampling procedure to provide valid confidence intervals for the value functions as well as for the incremental value of new markers for treatment selection. Our proposals are examined through extensive simulation studies and illustrated with data from a clinical trial that studied the effects of two drug combinations on HIV-1-infected patients.
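As an illustration of one ingredient here, the value of a fixed ITR in a randomized trial can be estimated nonparametrically by inverse probability weighting. The sketch below is a generic version of that idea, not the authors' two-step calibrated procedure, and all names and data are hypothetical.

```python
# Sketch: IPW value estimate V(d) for a treatment rule d in a 1:1 trial.
import numpy as np

def value_ipw(y, a, x, rule, p_treat=0.5):
    """Estimate E[outcome if everyone were treated per rule d].

    y: outcomes; a: observed treatments in {0, 1}; x: marker matrix;
    rule: maps a marker row to a recommended treatment;
    p_treat: known randomization probability of treatment 1.
    """
    d = np.array([rule(row) for row in x])
    prob = np.where(a == 1, p_treat, 1.0 - p_treat)  # P(A = observed a)
    w = (a == d) / prob                   # keep rule-concordant subjects
    return np.sum(w * y) / np.sum(w)      # stabilized IPW estimate

rng = np.random.default_rng(1)
n = 1000
x = rng.normal(size=(n, 2))
a = rng.integers(0, 2, size=n)                  # 1:1 randomization
y = x[:, 0] * (2 * a - 1) + rng.normal(size=n)  # marker 0 modifies the effect

print(value_ipw(y, a, x, lambda row: int(row[0] > 0)))  # marker-based rule
print(value_ipw(y, a, x, lambda row: 1))                # treat everyone
```

Bootstrapping subjects around an estimator like this is the flavor of resampling the abstract uses to attach confidence intervals to value functions and to the incremental value of new markers.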

Relevance: 30.00%

Abstract:

This thesis introduces two related lines of study on the classification of hyperspectral images with nonlinear methods. First, it describes a quantitative and systematic evaluation, by the author, of each major component in a pipeline for classifying hyperspectral images (HSI) developed earlier in a joint collaboration [23]. The pipeline, with its novel use of nonlinear classification methods, has surpassed the state of the art in classification accuracy on commonly used benchmarking HSI data [6], [13]. More importantly, it provides a clutter map with respect to a predetermined set of classes, addressing real application settings where image pixels do not necessarily fall into a predetermined set of classes to be identified, detected, or classified.

The particular components evaluated are a) band selection with band-wise entropy spread, b) feature transformation with spatial filters and spectral expansion with derivatives, c) graph spectral transformation via locally linear embedding for dimension reduction, and d) statistical ensembles for clutter detection. The quantitative evaluation of the pipeline verifies that these components are indispensable to high-accuracy classification.
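As a loose illustration of components (a) and (c), the sketch below scores bands by histogram entropy and embeds the retained bands with scikit-learn's locally linear embedding. The toy cube, bin count, and neighborhood size are assumptions, not the thesis's settings.

```python
# Sketch: entropy-based band selection, then LLE (toy HSI cube).
import numpy as np
from scipy.stats import entropy
from sklearn.manifold import LocallyLinearEmbedding

rng = np.random.default_rng(0)
cube = rng.random((50, 50, 100))            # toy cube: height x width x bands
pixels = cube.reshape(-1, cube.shape[-1])   # (H*W) x bands

# a) Band selection: keep bands whose intensity histograms carry the
#    most information (highest Shannon entropy).
def band_entropy(band, bins=64):
    hist, _ = np.histogram(band, bins=bins, density=True)
    return entropy(hist + 1e-12)

scores = np.array([band_entropy(pixels[:, b]) for b in range(pixels.shape[1])])
keep = np.argsort(scores)[::-1][:30]        # retain the 30 most informative

# c) Graph spectral transformation: LLE on the selected bands.
lle = LocallyLinearEmbedding(n_neighbors=10, n_components=8)
embedded = lle.fit_transform(pixels[:, keep])
print(embedded.shape)                       # (2500, 8)
```

A per-pixel classifier and the statistical ensemble for clutter detection (components b and d) would then operate on these embedded coordinates.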

Second, the work extends the HSI classification pipeline from a single HSI data cube to multiple HSI data cubes. Each cube, with its own feature variation, is to be classified into multiple classes. The main challenge is deriving the cube-wise classification from the pixel-wise classification. The thesis presents an initial attempt to circumvent this challenge and discusses the potential for further improvement.

Relevance: 30.00%

Abstract:

Fitting statistical models is computationally challenging when the sample size or the dimension of the dataset is huge. An attractive approach for down-scaling the problem size is to first partition the dataset into subsets and then fit using distributed algorithms. The dataset can be partitioned either horizontally (in the sample space) or vertically (in the feature space), and the challenge arises in defining an algorithm with low communication, theoretical guarantees, and excellent practical performance in general settings. For sample space partitioning, I propose a MEdian Selection Subset AGgregation Estimator (message) algorithm for solving these issues. The algorithm applies feature selection in parallel for each subset using regularized regression or a Bayesian variable selection method, calculates the "median" feature inclusion index, estimates coefficients for the selected features in parallel for each subset, and then averages these estimates. The algorithm is simple, involves minimal communication, scales efficiently in sample size, and has theoretical guarantees. I provide extensive experiments to show excellent performance in feature selection, estimation, prediction, and computation time relative to usual competitors.
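A minimal sketch of the message recipe, with the lasso standing in for the per-subset selector and toy sizes throughout (not the thesis implementation):

```python
# Sketch of message: per-subset selection, median vote, averaged refits.
import numpy as np
from sklearn.linear_model import LassoCV, LinearRegression

rng = np.random.default_rng(0)
n, p, m = 3000, 50, 5                     # samples, features, subsets
beta = np.zeros(p)
beta[:3] = [2.0, -1.5, 1.0]               # three truly active features
X = rng.normal(size=(n, p))
y = X @ beta + rng.normal(size=n)

subsets = np.array_split(rng.permutation(n), m)   # horizontal partition

# Step 1: feature selection on each subset (parallelizable; serial here).
inclusion = np.array([LassoCV(cv=5).fit(X[idx], y[idx]).coef_ != 0
                      for idx in subsets])
selected = np.median(inclusion, axis=0) >= 0.5    # median inclusion index

# Step 2: refit the voted features on each subset, then average.
coefs = np.mean([LinearRegression().fit(X[idx][:, selected], y[idx]).coef_
                 for idx in subsets], axis=0)
print(np.flatnonzero(selected), np.round(coefs, 2))
```

The only quantities that cross machine boundaries are the p-vector of inclusion indicators and the refitted coefficients, which is what keeps the communication cost minimal.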

While sample space partitioning is useful in handling datasets with large sample size, feature space partitioning is more effective when the data dimension is high. Existing methods for partitioning features, however, are either vulnerable to high correlations or inefficient in reducing the model dimension. In the thesis, I propose a new embarrassingly parallel framework named DECO for distributed variable selection and parameter estimation. In DECO, variables are first partitioned and allocated to m distributed workers. The decorrelated subset data within each worker are then fitted via any algorithm designed for high-dimensional problems. We show that by incorporating the decorrelation step, DECO can achieve consistent variable selection and parameter estimation on each subset with (almost) no assumptions. In addition, the convergence rate is nearly minimax optimal for both sparse and weakly sparse models and does not depend on the partition number m. Extensive numerical experiments are provided to illustrate the performance of the new framework.
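The decorrelation step can be sketched roughly as below; the ridge constant r and the 1/p scaling are illustrative assumptions rather than the thesis's exact formula.

```python
# Sketch of DECO-style decorrelation before a feature-space partition.
import numpy as np

rng = np.random.default_rng(0)
n, p, m = 100, 400, 4                     # p >> n, m workers
Sigma = 0.5 * np.eye(p) + 0.5 * np.ones((p, p))   # strongly correlated design
X = rng.normal(size=(n, p)) @ np.linalg.cholesky(Sigma).T
y = X[:, 0] - X[:, 1] + rng.normal(size=n)

# Premultiply X and y by F = (X X^T / p + r I)^{-1/2}, computed via an
# eigendecomposition, so the transformed columns behave nearly orthogonally.
r = 1.0
vals, vecs = np.linalg.eigh(X @ X.T / p + r * np.eye(n))
F = vecs @ np.diag(vals ** -0.5) @ vecs.T
X_tilde, y_tilde = F @ X, F @ y

# Feature-space partition: each worker receives one column block.
blocks = np.array_split(np.arange(p), m)
print([b.size for b in blocks], X_tilde.shape)
```

Each worker would then run any high-dimensional fit (e.g., the lasso) on its block of X_tilde against y_tilde, and the selections would be combined across workers.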

For datasets with both large sample sizes and high dimensionality, I propose a new "divide-and-conquer" framework, DEME (DECO-message), that leverages both the DECO and message algorithms. The new framework first partitions the dataset in the sample space into row cubes using message and then partitions the feature space of the cubes using DECO. This procedure is equivalent to partitioning the original data matrix into multiple small blocks, each small enough to be stored and fitted on a single machine in parallel. The results are then synthesized via the DECO and message algorithms in reverse order to produce the final output. The whole framework is extremely scalable.