9 resultados para Random Forests Classifier

em Duke University


Relevância:

100.00% 100.00%

Publicador:

Resumo:

Although many feature selection methods for classification have been developed, there is a need to identify genes in high-dimensional data with censored survival outcomes. Traditional methods for gene selection in classification problems have several drawbacks. First, the majority of the gene selection approaches for classification are single-gene based. Second, many of the gene selection procedures are not embedded within the algorithm itself. The technique of random forests has been found to perform well in high-dimensional data settings with survival outcomes. It also has an embedded feature to identify variables of importance. Therefore, it is an ideal candidate for gene selection in high-dimensional data with survival outcomes. In this paper, we develop a novel method based on the random forests to identify a set of prognostic genes. We compare our method with several machine learning methods and various node split criteria using several real data sets. Our method performed well in both simulations and real data analysis.Additionally, we have shown the advantages of our approach over single-gene-based approaches. Our method incorporates multivariate correlations in microarray data for survival outcomes. The described method allows us to better utilize the information available from microarray data with survival outcomes.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

Empirical studies of education programs and systems, by nature, rely upon use of student outcomes that are measurable. Often, these come in the form of test scores. However, in light of growing evidence about the long-run importance of other student skills and behaviors, the time has come for a broader approach to evaluating education. This dissertation undertakes experimental, quasi-experimental, and descriptive analyses to examine social, behavioral, and health-related mechanisms of the educational process. My overarching research question is simply, which inside- and outside-the-classroom features of schools and educational interventions are most beneficial to students in the long term? Furthermore, how can we apply this evidence toward informing policy that could effectively reduce stark social, educational, and economic inequalities?

The first study of three assesses mechanisms by which the Fast Track project, a randomized intervention in the early 1990s for high-risk children in four communities (Durham, NC; Nashville, TN; rural PA; and Seattle, WA), reduced delinquency, arrests, and health and mental health service utilization in adolescence through young adulthood (ages 12-20). A decomposition of treatment effects indicates that about a third of Fast Track’s impact on later crime outcomes can be accounted for by improvements in social and self-regulation skills during childhood (ages 6-11), such as prosocial behavior, emotion regulation and problem solving. These skills proved less valuable for the prevention of mental and physical health problems.

The second study contributes new evidence on how non-instructional investments – such as increased spending on school social workers, guidance counselors, and health services – affect multiple aspects of student performance and well-being. Merging several administrative data sources spanning the 1996-2013 school years in North Carolina, I use an instrumental variables approach to estimate the extent to which local expenditure shifts affect students’ academic and behavioral outcomes. My findings indicate that exogenous increases in spending on non-instructional services not only reduce student absenteeism and disciplinary problems (important predictors of long-term outcomes) but also significantly raise student achievement, in similar magnitude to corresponding increases in instructional spending. Furthermore, subgroup analyses suggest that investments in student support personnel such as social workers, health services, and guidance counselors, in schools with concentrated low-income student populations could go a long way toward closing socioeconomic achievement gaps.

The third study examines individual pathways that lead to high school graduation or dropout. It employs a variety of machine learning techniques, including decision trees, random forests with bagging and boosting, and support vector machines, to predict student dropout using longitudinal administrative data from North Carolina. I consider a large set of predictor measures from grades three through eight including academic achievement, behavioral indicators, and background characteristics. My findings indicate that the most important predictors include eighth grade absences, math scores, and age-for-grade as well as early reading scores. Support vector classification (with a high cost parameter and low gamma parameter) predicts high school dropout with the highest overall validity in the testing dataset at 90.1 percent followed by decision trees with boosting and interaction terms at 89.5 percent.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Continuing our development of a mathematical theory of stochastic microlensing, we study the random shear and expected number of random lensed images of different types. In particular, we characterize the first three leading terms in the asymptotic expression of the joint probability density function (pdf) of the random shear tensor due to point masses in the limit of an infinite number of stars. Up to this order, the pdf depends on the magnitude of the shear tensor, the optical depth, and the mean number of stars through a combination of radial position and the star's mass. As a consequence, the pdf's of the shear components are seen to converge, in the limit of an infinite number of stars, to shifted Cauchy distributions, which shows that the shear components have heavy tails in that limit. The asymptotic pdf of the shear magnitude in the limit of an infinite number of stars is also presented. All the results on the random microlensing shear are given for a general point in the lens plane. Extending to the general random distributions (not necessarily uniform) of the lenses, we employ the Kac-Rice formula and Morse theory to deduce general formulas for the expected total number of images and the expected number of saddle images. We further generalize these results by considering random sources defined on a countable compact covering of the light source plane. This is done to introduce the notion of global expected number of positive parity images due to a general lensing map. Applying the result to microlensing, we calculate the asymptotic global expected number of minimum images in the limit of an infinite number of stars, where the stars are uniformly distributed. This global expectation is bounded, while the global expected number of images and the global expected number of saddle images diverge as the order of the number of stars. © 2009 American Institute of Physics.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Genome rearrangement often produces chromosomes with two centromeres (dicentrics) that are inherently unstable because of bridge formation and breakage during cell division. However, mammalian dicentrics, and particularly those in humans, can be quite stable, usually because one centromere is functionally silenced. Molecular mechanisms of centromere inactivation are poorly understood since there are few systems to experimentally create dicentric human chromosomes. Here, we describe a human cell culture model that enriches for de novo dicentrics. We demonstrate that transient disruption of human telomere structure non-randomly produces dicentric fusions involving acrocentric chromosomes. The induced dicentrics vary in structure near fusion breakpoints and like naturally-occurring dicentrics, exhibit various inter-centromeric distances. Many functional dicentrics persist for months after formation. Even those with distantly spaced centromeres remain functionally dicentric for 20 cell generations. Other dicentrics within the population reflect centromere inactivation. In some cases, centromere inactivation occurs by an apparently epigenetic mechanism. In other dicentrics, the size of the alpha-satellite DNA array associated with CENP-A is reduced compared to the same array before dicentric formation. Extra-chromosomal fragments that contained CENP-A often appear in the same cells as dicentrics. Some of these fragments are derived from the same alpha-satellite DNA array as inactivated centromeres. Our results indicate that dicentric human chromosomes undergo alternative fates after formation. Many retain two active centromeres and are stable through multiple cell divisions. Others undergo centromere inactivation. This event occurs within a broad temporal window and can involve deletion of chromatin that marks the locus as a site for CENP-A maintenance/replenishment.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

© 2015 IOP Publishing Ltd & London Mathematical Society.This is a detailed analysis of invariant measures for one-dimensional dynamical systems with random switching. In particular, we prove the smoothness of the invariant densities away from critical points and describe the asymptotics of the invariant densities at critical points.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

© 2015 Society for Industrial and Applied Mathematics.We consider parabolic PDEs with randomly switching boundary conditions. In order to analyze these random PDEs, we consider more general stochastic hybrid systems and prove convergence to, and properties of, a stationary distribution. Applying these general results to the heat equation with randomly switching boundary conditions, we find explicit formulae for various statistics of the solution and obtain almost sure results about its regularity and structure. These results are of particular interest for biological applications as well as for their significant departure from behavior seen in PDEs forced by disparate Gaussian noise. Our general results also have applications to other types of stochastic hybrid systems, such as ODEs with randomly switching right-hand sides.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

© 2015 Published by Elsevier B.V.Throughout the southern US, past forest management practices have replaced large areas of native forests with loblolly pine plantations and have resulted in changes in forest response to extreme weather conditions. However, uncertainty remains about the response of planted versus natural species to drought across the geographical range of these forests. Taking advantage of a cluster of unmanaged stands (85-130year-old hardwoods) and managed plantations (17-20year-old loblolly pine) in coastal and Piedmont areas of North Carolina, tree water use, cavitation resistance, whole-tree hydraulic (Ktree) and stomatal (Gs) conductances were measured in four sites covering representative forests growing in the region. We also used a hydraulic model to predict the resilience of those sites to extreme soil drying. Our objectives were to determine: (1) if Ktree and stomatal regulation in response to atmospheric and soil droughts differ between species and sites; (2) how ecosystem type, through tree water use, resistance to cavitation and rooting profiles, affects the water uptake limit that can be reached under drought; and (3) the influence of stand species composition on critical transpiration that sets a functional water uptake limit under drought conditions. The results show that across sites, water stress affected the coordination between Ktree and Gs. As soil water content dropped below 20% relative extractable water, Ktree declined faster and thus explained the decrease in Gs and in its sensitivity to vapor pressure deficit. Compared to branches, the capability of roots to resist high xylem tension has a great impact on tree-level water use and ultimately had important implications for pine plantations resistance to future summer droughts. Model simulations revealed that the decline in Ktree due to xylem cavitation aggravated the effects of soil drying on tree transpiration. The critical transpiration rate (Ecrit), which corresponds to the maximum rate at which transpiration begins to level off to prevent irreversible hydraulic failure, was higher in managed forest plantations than in their unmanaged counterparts. However, even with this higher Ecrit, the pine plantations operated very close to their critical leaf water potentials (i.e. to their permissible water potentials without total hydraulic failure), suggesting that intensively managed plantations are more drought-sensitive and can withstand less severe drought than natural forests.