3 results for SUPPORT VECTOR MACHINES

at Duke University


Relevance:

100.00%

Publisher:

Abstract:

Empirical studies of education programs and systems, by nature, rely upon use of student outcomes that are measurable. Often, these come in the form of test scores. However, in light of growing evidence about the long-run importance of other student skills and behaviors, the time has come for a broader approach to evaluating education. This dissertation undertakes experimental, quasi-experimental, and descriptive analyses to examine social, behavioral, and health-related mechanisms of the educational process. My overarching research question is simply, which inside- and outside-the-classroom features of schools and educational interventions are most beneficial to students in the long term? Furthermore, how can we apply this evidence toward informing policy that could effectively reduce stark social, educational, and economic inequalities?

The first study of three assesses mechanisms by which the Fast Track project, a randomized intervention in the early 1990s for high-risk children in four communities (Durham, NC; Nashville, TN; rural PA; and Seattle, WA), reduced delinquency, arrests, and health and mental health service utilization in adolescence through young adulthood (ages 12-20). A decomposition of treatment effects indicates that about a third of Fast Track’s impact on later crime outcomes can be accounted for by improvements in social and self-regulation skills during childhood (ages 6-11), such as prosocial behavior, emotion regulation and problem solving. These skills proved less valuable for the prevention of mental and physical health problems.

The second study contributes new evidence on how non-instructional investments – such as increased spending on school social workers, guidance counselors, and health services – affect multiple aspects of student performance and well-being. Merging several administrative data sources spanning the 1996-2013 school years in North Carolina, I use an instrumental variables approach to estimate the extent to which local expenditure shifts affect students’ academic and behavioral outcomes. My findings indicate that exogenous increases in spending on non-instructional services not only reduce student absenteeism and disciplinary problems (important predictors of long-term outcomes) but also significantly raise student achievement, at a magnitude similar to that of corresponding increases in instructional spending. Furthermore, subgroup analyses suggest that investments in student support personnel and services (social workers, health services, and guidance counselors) in schools with concentrated low-income student populations could go a long way toward closing socioeconomic achievement gaps.
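As a rough illustration of the instrumental variables idea mentioned above, the following sketch runs two-stage least squares on simulated data. Everything in it is an assumption made for illustration: the variable names (`z`, `spending`, `outcome`), the coefficients, and the simulated confounder do not come from the dissertation, and the study's actual instrument and specification are not described in the abstract.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5000

# Simulated data (hypothetical, not the study's data).
z = rng.normal(size=n)   # instrument: shifts spending, affects outcome only through it
u = rng.normal(size=n)   # unobserved confounder
spending = 0.8 * z + u + rng.normal(size=n)
true_effect = 0.5
outcome = true_effect * spending - 1.0 * u + rng.normal(size=n)


def ols(X, y):
    """Least-squares coefficients for y regressed on the columns of X."""
    return np.linalg.lstsq(X, y, rcond=None)[0]


ones = np.ones(n)

# Naive OLS is biased because spending is correlated with the confounder u.
beta_ols = ols(np.column_stack([ones, spending]), outcome)[1]

# Stage 1: project the endogenous regressor onto the instrument.
Z = np.column_stack([ones, z])
spending_hat = Z @ ols(Z, spending)

# Stage 2: regress the outcome on the fitted (exogenous) part of spending.
beta_2sls = ols(np.column_stack([ones, spending_hat]), outcome)[1]

print(f"OLS estimate:  {beta_ols:.3f}")
print(f"2SLS estimate: {beta_2sls:.3f} (simulated true effect {true_effect})")
```

In this simulation the two-stage estimate should land closer to the simulated effect than naive OLS does. Standard errors from the second stage would need correcting in real use, which this sketch omits.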

The third study examines individual pathways that lead to high school graduation or dropout. It employs a variety of machine learning techniques, including decision trees, random forests with bagging and boosting, and support vector machines, to predict student dropout using longitudinal administrative data from North Carolina. I consider a large set of predictor measures from grades three through eight including academic achievement, behavioral indicators, and background characteristics. My findings indicate that the most important predictors include eighth grade absences, math scores, and age-for-grade as well as early reading scores. Support vector classification (with a high cost parameter and low gamma parameter) predicts high school dropout with the highest overall validity in the testing dataset at 90.1 percent followed by decision trees with boosting and interaction terms at 89.5 percent.
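A minimal sketch of support vector classification with a high cost parameter and a low gamma, using scikit-learn on synthetic data. The RBF kernel, the specific values `C=100.0` and `gamma=0.001`, the eight synthetic features, the 10 percent positive class rate, and the 70/30 split are all assumptions for illustration; the abstract does not report these details, and the data here are not the North Carolina records.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for the administrative data (NOT the study's data):
# 8 hypothetical predictors, roughly 10% positive ("dropout") class.
X, y = make_classification(
    n_samples=2000,
    n_features=8,
    n_informative=5,
    weights=[0.9, 0.1],
    random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

# Illustrative values only: a "high" cost C and a "low" gamma.
# Features are standardized first because the RBF kernel is scale-sensitive.
model = make_pipeline(
    StandardScaler(),
    SVC(kernel="rbf", C=100.0, gamma=0.001),
)
model.fit(X_train, y_train)

# Share of held-out cases classified correctly.
test_accuracy = model.score(X_test, y_test)
print(f"test accuracy: {test_accuracy:.3f}")
```

With an imbalanced outcome such as dropout, overall accuracy can look high even when few positive cases are identified, so per-class metrics are worth checking alongside it.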

Relevance:

80.00%

Publisher:

Abstract:

Purpose: To build a model that predicts the survival time of patients treated with stereotactic radiosurgery for brain metastases, using support vector machine (SVM) regression.

Methods and Materials: This study used data from 481 patients, who were randomly divided equally into training and validation datasets. The SVM model used a Gaussian RBF kernel, along with parameters such as the size of the epsilon-insensitive region and the cost parameter (C), which control the amount of error tolerated by the model. The predictor variables for the SVM model consisted of the actual survival time of the patient, the number of brain metastases, the graded prognostic assessment (GPA) and Karnofsky Performance Scale (KPS) scores, prescription dose, and the largest planning target volume (PTV). The response of the model is the survival time of the patient. The resulting survival time predictions were analyzed against the actual survival times by single-parameter classification and two-parameter classification. The predicted mean survival times within each classification were compared with the actual values to obtain the confidence interval associated with the model’s predictions. In addition to visualizing the data on plots using the means and error bars, the correlation coefficients between the actual and predicted means of the survival times were calculated during each step of the classification.
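The workflow described above can be sketched roughly as follows: fit an RBF-kernel support vector regression, then group validation patients by one parameter and correlate the actual and predicted group means. Only the cohort size (481) and the half-and-half split come from the abstract. The simulated relationship, the use of just two predictors (number of metastases and KPS), and the values `C=10.0` and `epsilon=0.5` are assumptions for illustration, not the study's settings or data.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

rng = np.random.default_rng(0)
n_patients = 481  # cohort size from the abstract; all values below are simulated

# Hypothetical predictors: number of metastases (1-5) and KPS score.
n_mets = rng.integers(1, 6, size=n_patients)
kps = rng.choice([60, 70, 80, 90, 100], size=n_patients)
X = np.column_stack([n_mets, kps]).astype(float)

# Made-up relationship between predictors and survival time in months.
survival = 2.0 + 0.15 * (kps - 50) - 1.0 * n_mets + rng.normal(0, 2.0, n_patients)
survival = np.clip(survival, 0.5, None)

# Random split into (nearly) equal training and validation halves.
X_train, X_val, y_train, y_val = train_test_split(
    X, survival, test_size=0.5, random_state=0
)

# epsilon sets the width of the insensitive region; C penalizes errors outside it.
model = make_pipeline(StandardScaler(), SVR(kernel="rbf", C=10.0, epsilon=0.5))
model.fit(X_train, y_train)
pred = model.predict(X_val)

# "Single-parameter classification": group validation patients by number of
# metastases and compare actual with predicted mean survival per group.
mets_val = X_val[:, 0]
groups = np.unique(mets_val)
actual_means = np.array([y_val[mets_val == g].mean() for g in groups])
predicted_means = np.array([pred[mets_val == g].mean() for g in groups])

r = np.corrcoef(actual_means, predicted_means)[0, 1]
print(f"correlation of group means: {r:.3f}")
```

A two-parameter version would group by a second variable within each first-level group and repeat the same comparison of means.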

Results: The number of metastases and the KPS score were consistently shown to be the strongest predictors in the single-parameter classification, and were subsequently used as first classifiers in the two-parameter classification. When the survival times were analyzed with the number of metastases as the first classifier, the best correlation was obtained for patients with 3 metastases, while patients with 4 or 5 metastases had significantly worse results. When the KPS score was used as the first classifier, patients with KPS scores of 60 and 90/100 had similarly strong correlation results. These mixed results are likely due to the limited data available for patients with more than 3 metastases or KPS scores of 60 or less.

Conclusions: The number of metastases and the KPS score both proved to be strong predictors of patient survival time. The model was less accurate for patients with more metastases and certain KPS scores due to the lack of training data.