4 resultados para Random Forests Classifier
em Dalarna University College Electronic Archive
Resumo:
In a global economy, manufacturers mainly compete with cost efficiency of production, as the price of raw materials are similar worldwide. Heavy industry has two big issues to deal with. On the one hand there is lots of data which needs to be analyzed in an effective manner, and on the other hand making big improvements via investments in cooperate structure or new machinery is neither economically nor physically viable. Machine learning offers a promising way for manufacturers to address both these problems as they are in an excellent position to employ learning techniques with their massive resource of historical production data. However, choosing modelling a strategy in this setting is far from trivial and this is the objective of this article. The article investigates characteristics of the most popular classifiers used in industry today. Support Vector Machines, Multilayer Perceptron, Decision Trees, Random Forests, and the meta-algorithms Bagging and Boosting are mainly investigated in this work. Lessons from real-world implementations of these learners are also provided together with future directions when different learners are expected to perform well. The importance of feature selection and relevant selection methods in an industrial setting are further investigated. Performance metrics have also been discussed for the sake of completion.
Resumo:
Unplanned hospital readmissions increase health and medical care costs and indicate lower the lower quality of the healthcare services. Hence, predicting patients at risk to be readmitted is of interest. Using administrative data of patients being treated in the medical centers and hospitals in the Dalarna County, Sweden, during 2008 – 2016 two risk prediction models of hospital readmission are built. The first model relies on the logistic regression (LR) approach, predicts correctly 2,648 out of 3,392 observed readmission in the test dataset, reaching a c-statistics of 0.69. The second model is built using random forests (RF) algorithm; correctly predicts 2,183 readmission (out of 3,366) and 13,198 non-readmission events (out of 18,982). The discriminating ability of the best performing RF model (c-statistic 0.60) is comparable to that of the logistic model. Although the discriminating ability of both LR and RF risk prediction models is relatively modest, still these models are capable to identify patients running high risk of hospital readmission. These patients can then be targeted with specific interventions, in order to prevent the readmission, improve patients’ quality of life and reduce health and medical care costs.
Resumo:
Parkinson’s disease is a clinical syndrome manifesting with slowness and instability. As it is a progressive disease with varying symptoms, repeated assessments are necessary to determine the outcome of treatment changes in the patient. In the recent past, a computer-based method was developed to rate impairment in spiral drawings. The downside of this method is that it cannot separate the bradykinetic and dyskinetic spiral drawings. This work intends to construct the computer method which can overcome this weakness by using the Hilbert-Huang Transform (HHT) of tangential velocity. The work is done under supervised learning, so a target class is used which is acquired from a neurologist using a web interface. After reducing the dimension of HHT features by using PCA, classification is performed. C4.5 classifier is used to perform the classification. Results of the classification are close to random guessing which shows that the computer method is unsuccessful in assessing the cause of drawing impairment in spirals when evaluated against human ratings. One promising reason is that there is no difference between the two classes of spiral drawings. Displaying patients self ratings along with the spirals in the web application is another possible reason for this, as the neurologist may have relied too much on this in his own ratings.
Resumo:
Random effect models have been widely applied in many fields of research. However, models with uncertain design matrices for random effects have been little investigated before. In some applications with such problems, an expectation method has been used for simplicity. This method does not include the extra information of uncertainty in the design matrix is not included. The closed solution for this problem is generally difficult to attain. We therefore propose an two-step algorithm for estimating the parameters, especially the variance components in the model. The implementation is based on Monte Carlo approximation and a Newton-Raphson-based EM algorithm. As an example, a simulated genetics dataset was analyzed. The results showed that the proportion of the total variance explained by the random effects was accurately estimated, which was highly underestimated by the expectation method. By introducing heuristic search and optimization methods, the algorithm can possibly be developed to infer the 'model-based' best design matrix and the corresponding best estimates.