7 results for Data selection

at Duke University


Relevance:

30.00%

Publisher:

Abstract:

Hydrologic research is a very demanding application of fiber-optic distributed temperature sensing (DTS) in terms of precision, accuracy, and calibration. The physics behind the most frequently used DTS instruments is considered as it applies to four calibration methods for single-ended DTS installations. The new methods presented are more accurate than the instrument-calibrated data, achieving accuracies on the order of tenths of a degree in root mean square error (RMSE) and mean bias. Effects of localized non-uniformities that violate the assumptions of single-ended calibration are explored and quantified. Experimental design considerations, such as the selection of integration times and of reference-section lengths, are discussed, and their impacts on calibrated temperatures are explored in two case studies.
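
The two accuracy metrics quoted above are straightforward to compute; the following is a minimal sketch comparing a calibrated DTS trace against an independent reference probe (the array values are hypothetical):

```python
import numpy as np

# Hypothetical calibrated DTS readings and a co-located reference probe (degC).
t_calibrated = np.array([12.31, 12.28, 12.40, 12.35])
t_reference = np.array([12.30, 12.30, 12.30, 12.30])

residual = t_calibrated - t_reference
rmse = np.sqrt(np.mean(residual ** 2))  # root mean square error
mean_bias = np.mean(residual)           # systematic offset
print(f"RMSE = {rmse:.3f} degC, mean bias = {mean_bias:.3f} degC")
```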

Relevance:

30.00%

Publisher:

Abstract:

BACKGROUND: Few educational resources have been developed to inform patients' renal replacement therapy (RRT) selection decisions. Patients progressing toward end-stage renal disease (ESRD) must decide among multiple treatment options with varying characteristics. Complex information about treatments must be adequately conveyed to patients with different educational backgrounds and informational needs. Decisions about treatment options also require family input, as families often participate in patients' treatment and support patients' decisions. We describe the development, design, and preliminary evaluation of an informational, evidence-based, and patient- and family-centered decision aid for patients with ESRD and varying levels of health literacy, health numeracy, and cognitive function. METHODS: We designed a decision aid comprising a complementary video and informational handbook. We based our development process on data previously obtained from qualitative focus groups and systematic literature reviews. We developed the video and handbook simultaneously in stages. For the video, the stages included (1) directed interviews with culturally appropriate patients and families and preliminary script development, (2) video production, and (3) screening the video with patients and their families. For the handbook, the stages comprised (1) preliminary content design, (2) a mixed-methods pilot study among diverse patients to assess comprehension of the handbook material, and (3) screening the handbook with patients and their families. RESULTS: The video and handbook both addressed potential benefits and trade-offs of treatment selections. The 50-minute video featured demographically diverse patients and their families describing their positive and negative experiences with selecting a treatment option. The video also incorporated health professionals' testimonials regarding various considerations that might influence patients' and families' treatment selections. The handbook comprised written text, pictures of patients and health care providers, and diagrams describing the findings and quality of scientific studies comparing treatments. The handbook text was written at a 4th- to 6th-grade reading level. Pilot study results demonstrated that a majority of patients could understand the information presented in the handbook. Patients and families screening the nearly completed video and handbook reviewed the materials favorably. CONCLUSIONS: This rigorously designed decision aid may help patients and families make informed decisions about their treatment options for RRT that are well aligned with their values.

Relevance:

30.00%

Publisher:

Abstract:

Although many feature selection methods for classification have been developed, there is a need to identify genes in high-dimensional data with censored survival outcomes. Traditional methods for gene selection in classification problems have several drawbacks. First, the majority of gene selection approaches for classification are single-gene based. Second, many of the gene selection procedures are not embedded within the algorithm itself. The technique of random forests has been found to perform well in high-dimensional settings with survival outcomes, and it has an embedded measure of variable importance. It is therefore an ideal candidate for gene selection in high-dimensional data with survival outcomes. In this paper, we develop a novel method based on random forests to identify a set of prognostic genes. We compare our method with several machine learning methods and various node-split criteria using several real data sets. Our method performed well in both simulations and real data analysis. Additionally, we have shown the advantages of our approach over single-gene-based approaches. Our method incorporates multivariate correlations in microarray data for survival outcomes, allowing us to better utilize the information available from microarray data with survival outcomes.
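
The paper's own algorithm is not reproduced here, but the general pattern it builds on, a survival forest with an embedded variable-importance ranking, can be sketched with scikit-survival (the simulated data, hyperparameters, and use of permutation importance are assumptions for illustration):

```python
import numpy as np
from sklearn.inspection import permutation_importance
from sksurv.ensemble import RandomSurvivalForest
from sksurv.util import Surv

rng = np.random.default_rng(0)
n_samples, n_genes = 200, 50
X = rng.normal(size=(n_samples, n_genes))       # hypothetical expression matrix
time = rng.exponential(scale=np.exp(-X[:, 0]))  # gene 0 drives survival time
event = rng.random(n_samples) < 0.7             # ~70% observed events
y = Surv.from_arrays(event=event, time=time)

rsf = RandomSurvivalForest(n_estimators=200, min_samples_leaf=10, random_state=0)
rsf.fit(X, y)

# Rank genes by the drop in concordance index when each one is permuted.
imp = permutation_importance(rsf, X, y, n_repeats=5, random_state=0)
top_genes = np.argsort(imp.importances_mean)[::-1][:10]
print("Top candidate prognostic genes:", top_genes)
```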

Relevance:

30.00%

Publisher:

Abstract:

The dynamics of a population undergoing selection is a central topic in evolutionary biology. This question is particularly intriguing in the case where selective forces act in opposing directions at two population scales. For example, a fast-replicating virus strain outcompetes slower-replicating strains at the within-host scale. However, if the fast-replicating strain causes host morbidity and is less frequently transmitted, it can be outcompeted by slower-replicating strains at the between-host scale. Here we consider a stochastic ball-and-urn process which models this type of phenomenon. We prove the weak convergence of this process under two natural scalings. The first scaling leads to a deterministic nonlinear integro-partial differential equation on the interval $[0,1]$ that depends on a single parameter, $\lambda$. We show that the fixed points of this differential equation are Beta distributions and that their stability depends on $\lambda$ and the behavior of the initial data around $1$. The second scaling leads to a measure-valued Fleming-Viot process, an infinite-dimensional stochastic process that frequently arises in population genetics.
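
For reference, the Beta family of fixed points mentioned above has the standard density on $[0,1]$ (the abstract does not specify how the parameters relate to $\lambda$ or to the initial data):

```latex
f_{\alpha,\beta}(x) = \frac{x^{\alpha-1}(1-x)^{\beta-1}}{B(\alpha,\beta)},
\qquad x \in [0,1], \quad \alpha, \beta > 0,
```

where $B(\alpha,\beta)$ is the Beta function.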

Relevance:

30.00%

Publisher:

Abstract:

A potential avenue for improving healthcare efficiency is to effectively tailor individualized treatment strategies by incorporating patient-level predictor information such as environmental exposures and biological and genetic marker measurements. Many useful statistical methods for deriving individualized treatment rules (ITRs) have become available in recent years. Prior to adopting any ITR in clinical practice, it is crucial to evaluate its value in improving patient outcomes. Existing methods for quantifying such values mainly consider either a single marker or semi-parametric methods that are subject to bias under model misspecification. In this article, we consider a general setting with multiple markers and propose a two-step robust method to derive ITRs and evaluate their values. We also propose procedures for comparing different ITRs, which can be used to quantify the incremental value of new markers in improving treatment selection. While working models are used in step I to approximate optimal ITRs, we add a layer of calibration to guard against model misspecification and further assess the value of the ITR non-parametrically, which ensures the validity of the inference. To account for the sampling variability of the estimated rules and their corresponding values, we propose a resampling procedure to provide valid confidence intervals for the value functions as well as for the incremental value of new markers for treatment selection. Our proposals are examined through extensive simulation studies and illustrated with data from a clinical trial that studied the effects of two drug combinations on HIV-1-infected patients.
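
The paper's calibration layer and resampling procedure are beyond a short example, but the basic two-step pattern, working models to derive a rule and a non-parametric estimate of its value, can be sketched as follows (the simulated trial, the plug-in rule, and the known 0.5 propensity are all assumptions):

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(1)
n, p = 500, 5
X = rng.normal(size=(n, p))                     # patient-level markers
A = rng.integers(0, 2, size=n)                  # randomized treatment, P(A=1)=0.5
Y = X[:, 0] * (2 * A - 1) + rng.normal(size=n)  # marker 0 determines who benefits

# Step I: working outcome models fitted within each treatment arm.
m1 = LinearRegression().fit(X[A == 1], Y[A == 1])
m0 = LinearRegression().fit(X[A == 0], Y[A == 0])
rule = (m1.predict(X) > m0.predict(X)).astype(int)  # recommend the arm with higher predicted outcome

# Step II: non-parametric (inverse-probability-weighted) estimate of the
# mean outcome if everyone followed the rule.
value = np.mean(Y * (A == rule) / 0.5)
print(f"Estimated value of the ITR: {value:.3f}")
```

Confidence intervals for this value could then be obtained by bootstrapping the whole two-step procedure, in the spirit of the resampling approach described above.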

Relevance:

30.00%

Publisher:

Abstract:

Fitting statistical models is computationally challenging when the sample size or the dimension of the dataset is huge. An attractive approach for scaling down the problem is to first partition the dataset into subsets and then fit using distributed algorithms. The dataset can be partitioned either horizontally (in the sample space) or vertically (in the feature space), and the challenge lies in defining an algorithm with low communication, theoretical guarantees, and excellent practical performance in general settings. For sample space partitioning, I propose a MEdian Selection Subset AGgregation Estimator (message) algorithm to address these issues. The algorithm applies feature selection in parallel to each subset using regularized regression or Bayesian variable selection, calculates the median feature inclusion index, estimates coefficients for the selected features in parallel for each subset, and then averages these estimates. The algorithm is simple, involves minimal communication, scales efficiently in sample size, and has theoretical guarantees. I provide extensive experiments showing excellent performance in feature selection, estimation, prediction, and computation time relative to the usual competitors.
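
A minimal sketch of these steps, row-partitioned feature selection, a median inclusion vote, and averaged refits, is given below (the lasso selector, the 0.5 vote threshold, and the simulated data are assumptions; the thesis allows other selectors):

```python
import numpy as np
from sklearn.linear_model import LassoCV, LinearRegression

rng = np.random.default_rng(2)
n, p, m = 3000, 50, 5        # samples, features, subsets
beta = np.zeros(p)
beta[:3] = [2.0, -1.5, 1.0]  # three truly active features
X = rng.normal(size=(n, p))
y = X @ beta + rng.normal(size=n)

subsets = np.array_split(rng.permutation(n), m)

# Step 1: feature selection on each subset (run in parallel in practice).
inclusion = np.zeros((m, p))
for k, idx in enumerate(subsets):
    inclusion[k] = LassoCV(cv=5).fit(X[idx], y[idx]).coef_ != 0

# Step 2: median feature inclusion index across subsets.
selected = np.flatnonzero(np.median(inclusion, axis=0) >= 0.5)

# Steps 3-4: refit on the selected features within each subset, then average.
coefs = np.array([LinearRegression().fit(X[idx][:, selected], y[idx]).coef_
                  for idx in subsets])
print("Selected features:", selected)
print("Averaged estimates:", coefs.mean(axis=0).round(2))
```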

While sample space partitioning is useful for handling datasets with a large sample size, feature space partitioning is more effective when the data dimension is high. Existing methods for partitioning features, however, are either vulnerable to high correlations or inefficient in reducing the model dimension. In the thesis, I propose a new embarrassingly parallel framework named DECO for distributed variable selection and parameter estimation. In DECO, variables are first partitioned and allocated to m distributed workers. The decorrelated subset data within each worker are then fitted via any algorithm designed for high-dimensional problems. We show that by incorporating the decorrelation step, DECO can achieve consistent variable selection and parameter estimation on each subset with (almost) no assumptions. In addition, the convergence rate is nearly minimax optimal for both sparse and weakly sparse models and does not depend on the partition number m. Extensive numerical experiments are provided to illustrate the performance of the new framework.
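
The decorrelation step can be illustrated roughly as follows: premultiply the data by an inverse square root of the row Gram matrix before partitioning the columns (the ridge term r, the scaling, and the lasso sub-fits are assumptions; the exact standardization in DECO differs in its constants):

```python
import numpy as np
from numpy.linalg import inv
from scipy.linalg import sqrtm
from sklearn.linear_model import LassoCV

rng = np.random.default_rng(3)
n, p, m = 200, 400, 4
# Strongly correlated design (AR(1) covariance), where naive column
# partitioning would break down.
Sigma = 0.7 ** np.abs(np.subtract.outer(np.arange(p), np.arange(p)))
X = rng.multivariate_normal(np.zeros(p), Sigma, size=n)
beta = np.zeros(p)
beta[[0, 100, 300]] = [3.0, -2.0, 2.5]
y = X @ beta + rng.normal(size=n)

# Decorrelation: premultiply rows by (X X^T / p + r I)^{-1/2}.
r = 1.0
F = np.real(sqrtm(inv(X @ X.T / p + r * np.eye(n))))
Xt, yt = F @ X, F @ y

# Partition the (decorrelated) features across m workers and select
# variables independently on each block.
selected = []
for cols in np.array_split(np.arange(p), m):
    fit = LassoCV(cv=5).fit(Xt[:, cols], yt)
    selected.extend(cols[fit.coef_ != 0].tolist())
print("Selected variables:", sorted(selected))
```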

For datasets with both large sample sizes and high dimensionality, I propose a new divide-and-conquer framework, DEME (DECO-message), that leverages both the DECO and the message algorithms. The new framework first partitions the dataset in the sample space into row cubes using message and then partitions the feature space of the cubes using DECO. This procedure is equivalent to partitioning the original data matrix into multiple small blocks, each with a feasible size that can be stored and fitted in a computer in parallel. The results are then synthesized via the DECO and message algorithms in reverse order to produce the final output. The whole framework is extremely scalable.
