956 results for random forest regression


Relevance:

90.00%

Publisher:

Abstract:

Accurate speed prediction is a crucial step in the development of a dynamic vehicle activated sign (VAS). A previous study showed that the optimal trigger speed of such signs needs to be pre-determined according to the nature of the site and the traffic conditions. The objective of this paper is to find an accurate predictive model based on historical traffic speed data to derive the optimal trigger speed for such signs. Adaptive neuro-fuzzy inference system (ANFIS), classification and regression tree (CART) and random forest (RF) models were developed to predict one-step-ahead speed at all times of the day. The developed models were evaluated and compared with the results obtained from an artificial neural network (ANN), multiple linear regression (MLR) and naïve prediction, using traffic speed data collected at four sites in Sweden. The data were aggregated into two periods, a short-term period (5 min) and a long-term period (1 hour). The results showed that RF is a promising method for predicting mean speed in both periods. It is concluded that, in terms of performance and computational complexity, simple input features to the predictive model gave a marked improvement in the model's response time while still delivering a low prediction error.
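
As a rough illustration of the naïve baseline and the two aggregation periods mentioned in this abstract, the sketch below assumes that "naïve prediction" means carrying the last observed mean speed forward one step, and that 1-hour values are plain averages of twelve consecutive 5-minute means. Neither detail is stated in the abstract, the function names are hypothetical, and the speed values are made up.

```python
from statistics import mean


def aggregate(values, block_size):
    """Average consecutive, non-overlapping blocks; a trailing partial block is dropped."""
    full = len(values) - len(values) % block_size
    return [mean(values[i:i + block_size]) for i in range(0, full, block_size)]


def naive_one_step_forecast(series):
    """Assumed naive baseline: the forecast for step t+1 is the observation at step t."""
    return list(series[:-1])


def mean_absolute_error(actual, predicted):
    """Mean of absolute differences between paired actual and predicted values."""
    return mean([abs(a - p) for a, p in zip(actual, predicted)])


# Hypothetical 5-minute mean speeds in km/h (two hours of data, not from the study).
speeds_5min = [52, 54, 51, 55, 53, 50, 49, 52, 54, 56, 55, 53,
               48, 47, 49, 50, 46, 45, 47, 48, 50, 49, 47, 46]

hourly_means = aggregate(speeds_5min, 12)          # long-term (1-hour) series
predicted = naive_one_step_forecast(speeds_5min)   # forecasts for steps 2..24
actual = speeds_5min[1:]                           # observations for steps 2..24
mae_5min = mean_absolute_error(actual, predicted)

print("hourly means:", hourly_means)
print("naive MAE on 5-minute data:", mae_5min)
```

Any learned model (RF, CART, ANFIS) would need to beat this error on the same series to justify its extra cost.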

Relevance:

90.00%

Publisher:

Abstract:

Fundação de Amparo à Pesquisa do Estado de São Paulo (FAPESP)

Relevance:

90.00%

Publisher:

Abstract:

The early detection of subjects with probable Alzheimer's disease (AD) is crucial for the effective application of treatment strategies. Here we explored the ability of a multitude of linear and non-linear classification algorithms to discriminate between the electroencephalograms (EEGs) of patients with varying degrees of AD and those of age-matched control subjects. Absolute and relative spectral power, distribution of spectral power, and measures of spatial synchronization were calculated from recordings of resting eyes-closed continuous EEGs of 45 healthy controls, 116 patients with mild AD and 81 patients with moderate AD, recruited in two different centers (Stockholm, New York). The applied classification algorithms were: principal component linear discriminant analysis (PC LDA), partial least squares LDA (PLS LDA), principal component logistic regression (PC LR), partial least squares logistic regression (PLS LR), bagging, random forest, support vector machines (SVM) and feed-forward neural network. Based on 10-fold cross-validation runs, it could be demonstrated that even though modern computer-intensive classification algorithms such as random forests, SVM and neural networks show a slight superiority, more classical classification algorithms performed nearly equally well. Using random forest classification, a sensitivity of up to 85% and a specificity of 78% were reached even for the test of only mild AD patients, whereas for the comparison of moderate AD vs. controls, using SVM and neural networks, values of 89% and 88% for sensitivity and specificity, respectively, were achieved. Such a remarkable performance proves the value of these classification algorithms for clinical diagnostics.
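
For readers less familiar with the two metrics reported in this abstract, here is a minimal sketch of how sensitivity and specificity follow from confusion-matrix counts. The counts are hypothetical round numbers chosen to reproduce 85% and 78%; they are not the study's data, and the function names are illustrative.

```python
def sensitivity(true_pos: int, false_neg: int) -> float:
    """Share of actual patients that the classifier flags as patients."""
    return true_pos / (true_pos + false_neg)


def specificity(true_neg: int, false_pos: int) -> float:
    """Share of actual controls that the classifier labels as controls."""
    return true_neg / (true_neg + false_pos)


# Hypothetical test set: 100 patients and 50 controls (not from the study).
sens = sensitivity(true_pos=85, false_neg=15)
spec = specificity(true_neg=39, false_pos=11)

print(f"sensitivity = {sens:.2f}, specificity = {spec:.2f}")
```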

Relevance:

90.00%

Publisher:

Abstract:

Landscape structure and heterogeneity play a potentially important, but little understood, role in predator-prey interactions and behaviourally mediated habitat selection. For example, habitat complexity may either reduce or enhance the efficiency of a predator's efforts to search for, track, capture, kill and consume prey. For prey, structural heterogeneity may affect predator detection, avoidance and defense, escape tactics, and the ability to exploit refuges. This study investigates whether and how vegetation and topographic structure influence the spatial patterns and distribution of moose (Alces alces) mortality due to predation and malnutrition at the local and landscape levels in Isle Royale National Park. A total of 230 locations where wolves (Canis lupus) killed moose during the winters between 2002 and 2010, and 182 moose starvation death sites for the period 1996-2010, were selected from the extensive Isle Royale Wolf-Moose Project carcass database. A variety of LiDAR-derived metrics were generated and used in a Random Forest model to identify, characterize, and classify three-dimensional variables significant to each of the mortality classes. Furthermore, spatial models were developed to predict and assess the likelihood of moose mortality at the landscape scale. This research found that the patterns of moose mortality by predation and malnutrition across the landscape are non-random and have a high degree of spatial variability, and that both mechanisms operate in contexts of comparable physiographic and vegetation structure. Wolf winter hunting locations on Isle Royale are more likely to be a result of their prey's habitat selection, although wolves seem to prioritize the overall areas with higher moose density in the winter. Furthermore, the findings suggest that the distribution of moose mortality by predation is habitat-specific to moose, and not to wolves.
In addition, moose sex, age, and health condition also affect mortality site selection, as revealed by subtle differences between sites in vegetation heights, vegetation density, and topography. Vegetation density in particular appears to differentiate mortality locations for distinct classes of moose. The results also emphasize the significance of fine-scale landscape and habitat features when addressing predator-prey interactions. These finer scale findings would be easily missed if analyses were limited to the broader landscape scale alone.

Relevance:

90.00%

Publisher:

Abstract:

Brain tumor is one of the most aggressive types of cancer in humans, with an estimated median survival time of 12 months and only 4% of patients surviving more than 5 years after diagnosis. Until recently, brain tumor prognosis has been based only on clinical information such as tumor grade and patient age, but there are reports indicating that molecular profiling of gliomas can reveal subgroups of patients with distinct survival rates. We hypothesize that coupling molecular profiling of brain tumors with clinical information might improve predictions of patient survival time and, consequently, better guide future treatment decisions. To evaluate this hypothesis, the general goal of this research is to build models for survival prediction of glioma patients using DNA molecular profiles (U133 Affymetrix gene expression microarrays) along with clinical information. First, a predictive Random Forest model is built for binary outcomes (i.e. short vs. long-term survival), and a small subset of genes whose expression values can be used to predict survival time is selected. Next, a new statistical methodology is developed for predicting time-to-death outcomes using Bayesian ensemble trees. Because of the large heterogeneity observed within the prognostic classes obtained by the Random Forest model, prediction can be improved by relating time-to-death to the gene expression profile directly. We propose a Bayesian ensemble model for survival prediction which is appropriate for high-dimensional data such as gene expression data. Our approach is based on the ensemble "sum-of-trees" model, which is flexible enough to incorporate additive and interaction effects between genes. We specify a fully Bayesian hierarchical approach and illustrate our methodology for the CPH, Weibull, and AFT survival models. We overcome the lack of conjugacy by using a latent variable formulation to model the covariate effects, which decreases computation time for model fitting.
Also, our proposed models provide a model-free way to select important predictive prognostic markers based on controlling false discovery rates. We compare the performance of our methods with baseline reference survival methods and apply our methodology to an unpublished data set of brain tumor survival times and gene expression data, selecting genes potentially related to the development of the disease under study. A closing discussion compares the results obtained by the Random Forest and Bayesian ensemble methods from the biological and clinical perspectives and highlights the statistical advantages and disadvantages of the new methodology in the context of DNA microarray data analysis.

Relevance:

90.00%

Publisher:

Abstract:

Extraction of both pelvic and femoral surface models of a hip joint from CT data for computer-assisted pre-operative planning of hip arthroscopy is addressed. We present a method for fully automatic image segmentation of a hip joint. Our method works by combining fast random forest (RF) regression-based landmark detection and atlas-based segmentation with articulated statistical shape model (aSSM) based hip joint reconstruction. The two fundamental contributions of our method are: (1) an improved fast Gaussian transform (IFGT) is used within the RF regression framework for fast and accurate landmark detection, which then allows for a fully automatic initialization of the atlas-based segmentation; and (2) aSSM-based fitting is used to preserve hip joint structure and to avoid penetration between the pelvic and femoral models. Validation on 30 hip CT images shows that our method achieves high performance in segmenting pelvis, left proximal femur, and right proximal femur surfaces, with an average accuracy of 0.59 mm, 0.62 mm, and 0.58 mm, respectively.

Relevance:

90.00%

Publisher:

Abstract:

Extraction of surface models of a hip joint from CT data is a prerequisite step for computer-assisted diagnosis and planning (CADP) of periacetabular osteotomy (PAO). Most existing CADP systems are based on manual segmentation, which is time-consuming and makes reproducible results hard to achieve. In this paper, we present a Fully Automatic CT Segmentation (FACTS) approach to simultaneously extract both pelvic and femoral models. Our approach works by combining fast random forest (RF) regression-based landmark detection and multi-atlas-based segmentation with articulated statistical shape model (aSSM) based fitting. The two fundamental contributions of our approach are: (1) an improved fast Gaussian transform (IFGT) is used within the RF regression framework for fast and accurate landmark detection, which then allows for a fully automatic initialization of the multi-atlas-based segmentation; and (2) aSSM-based fitting is used to preserve hip joint structure and to avoid penetration between the pelvic and femoral models. Taking manual segmentation as the ground truth, we evaluated the present approach on 30 hip CT images (60 hips) with 6-fold cross-validation. When the present approach was compared with manual segmentation, mean segmentation accuracies of 0.40, 0.36, and 0.36 mm were found for the pelvis, the left proximal femur, and the right proximal femur, respectively. When the models derived from both segmentations were used to compute the PAO diagnosis parameters, differences of 2.0 ± 1.5°, 2.1 ± 1.6°, and 3.5 ± 2.3% were found for anteversion, inclination, and acetabular coverage, respectively. The achieved accuracy is regarded as clinically sufficient for our target applications.

Relevance:

90.00%

Publisher:

Abstract:

To address growing concern over the effects of fisheries non-target catch on elasmobranchs worldwide, the accurate reporting of elasmobranch catch is essential. This requires data on a combination of measures, including reported landings, retained and discarded non-target catch, and post-discard survival. Identification of the factors influencing discard vs. retention is needed to improve catch estimates and to determine wasteful fishing practices. To do this we compared retention rates of elasmobranch non-target catch in a broad subset of fisheries throughout the world by taxon, fishing country, and gear. A regression tree and random forest analysis indicated that taxon was the most important determinant of retention in this dataset, but all three factors together explained 59% of the variance. Estimates of total elasmobranch removals were calculated by dividing the FAO global elasmobranch landings by average retention rates and suggest that total elasmobranch removals may exceed FAO reported landings by as much as 400%. This analysis is the first effort to directly characterize global drivers of discards for elasmobranch non-target catch. Our results highlight the importance of accurate quantification of retention and discard rates to improve assessments of the potential impacts of fisheries on these species.
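
The removals estimate described in this abstract is a single division, sketched below under stated assumptions. The landings and retention figures are hypothetical placeholders rather than FAO or study values, and the function names are illustrative. Under this formula, an average retention rate of 0.2 would put total removals at five times reported landings, that is, 400% above them.

```python
def estimate_total_removals(reported_landings: float, average_retention_rate: float) -> float:
    """Total removals = reported landings / fraction of the catch that is retained."""
    if not 0.0 < average_retention_rate <= 1.0:
        raise ValueError("retention rate must be in the interval (0, 1]")
    return reported_landings / average_retention_rate


def percent_above_landings(total_removals: float, reported_landings: float) -> float:
    """How far total removals exceed reported landings, in percent."""
    return (total_removals - reported_landings) / reported_landings * 100.0


landings_tonnes = 1000.0  # hypothetical reported landings
retention_rate = 0.2      # hypothetical average retention rate

total = estimate_total_removals(landings_tonnes, retention_rate)
excess = percent_above_landings(total, landings_tonnes)

print(f"estimated removals: {total:.0f} t ({excess:.0f}% above reported landings)")
```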

Relevance:

90.00%

Publisher:

Abstract:

Allergy is an overreaction by the immune system to a previously encountered, ordinarily harmless substance - typically proteins - resulting in skin rash, swelling of mucous membranes, sneezing or wheezing, or other abnormal conditions. The use of modified proteins is increasingly widespread: their presence in food, commercial products, such as washing powder, and medical therapeutics and diagnostics, makes predicting and identifying potential allergens a crucial societal issue. The prediction of allergens has been explored widely using bioinformatics, with many tools being developed in the last decade; many of these are freely available online. Here, we report a set of novel models for allergen prediction utilizing amino acid E-descriptors, auto- and cross-covariance transformation, and several machine learning methods for classification, including logistic regression (LR), decision tree (DT), naïve Bayes (NB), random forest (RF), multilayer perceptron (MLP) and k nearest neighbours (kNN). The best performing method was kNN with 85.3% accuracy at 5-fold cross-validation. The resulting model has been implemented in a revised version of the AllerTOP server (http://www.ddg-pharmfac.net/AllerTOP). © Springer-Verlag 2014.

Relevance:

90.00%

Publisher:

Abstract:

Thesis (Ph.D.)--University of Washington, 2016-08

Relevance:

90.00%

Publisher:

Abstract:

Credible spatial information characterizing the structure and site quality of forests is critical to sustainable forest management and planning, especially given the increasing demands on and threats to forest products and services. Forest managers and planners are required to evaluate forest conditions over a broad range of scales, contingent on operational or reporting requirements. Traditionally, forest inventory estimates are generated via a design-based approach that involves generalizing sample plot measurements to characterize an unknown population across a larger area of interest. However, field plot measurements are costly, and as a consequence spatial coverage is limited. Remote sensing technologies have shown remarkable success in augmenting limited sample plot data to generate stand- and landscape-level spatial predictions of forest inventory attributes. Further enhancement of forest inventory approaches that couple field measurements with cutting-edge remotely sensed and geospatial datasets is essential to sustainable forest management. We evaluated a novel Random Forest based k Nearest Neighbors (RF-kNN) imputation approach to couple remote sensing and geospatial data with field inventory data collected by different sampling methods to generate forest inventory information across large spatial extents. The forest inventory data collected by the FIA program of the US Forest Service were integrated with optical remote sensing and other geospatial datasets to produce biomass distribution maps for a part of the Lake States and species-specific site index maps for the entire Lake State. Targeting small-area application of state-of-the-art remote sensing, LiDAR (light detection and ranging) data were integrated with field data collected by an inexpensive method, called variable plot sampling, in the Ford Forest of Michigan Tech to derive a standing volume map in a cost-effective way.
The outputs of the RF-kNN imputation were compared with independent validation datasets and extant map products based on different sampling and modeling strategies. The RF-kNN modeling approach was found to be very effective, especially for large-area estimation, and produced results statistically equivalent to the field observations or the estimates derived from secondary data sources. The models are useful to resource managers for operational and strategic purposes.

Relevance:

90.00%

Publisher:

Abstract:

Security defects are common in large software systems because of their size and complexity. Although efficient development processes, testing, and maintenance policies are applied to software systems, a large number of vulnerabilities can still remain despite these measures. Some vulnerabilities stay in a system from one release to the next because they cannot be easily reproduced through testing. These vulnerabilities endanger the security of the systems. We propose vulnerability classification and prediction frameworks based on vulnerability reproducibility. The frameworks are effective at identifying the types and locations of vulnerabilities at an earlier stage and at improving the security of software in the next versions (referred to as releases). We expand an existing concept of software bug classification to vulnerability classification (easily reproducible and hard to reproduce) to develop a classification framework for differentiating between these vulnerabilities based on code fixes and textual reports. We then investigate the potential correlations between the vulnerability categories and the classical software metrics, together with some other runtime environmental factors of reproducibility, to develop a vulnerability prediction framework. The classification and prediction frameworks help developers adopt corresponding mitigation or elimination actions and develop appropriate test cases. The vulnerability prediction framework also helps security experts focus their effort on the top-ranked vulnerability-prone files. As a result, the frameworks decrease the number of attacks that exploit security vulnerabilities in the next versions of the software. To build the classification and prediction frameworks, different machine learning techniques (C4.5 Decision Tree, Random Forest, Logistic Regression, and Naive Bayes) are employed.
The effectiveness of the proposed frameworks is assessed based on collected software security defects of Mozilla Firefox.

Relevance:

90.00%

Publisher:

Abstract:

As a consequence of the diffusion of next-generation sequencing techniques, metagenomics databases have become one of the most promising repositories of information about the features and behavior of microorganisms. One of the subjects that can be studied from those data is bacterial populations. Next-generation sequencing techniques make it possible to study the bacterial population within an environment by sampling genetic material directly from it, without needing to culture a similar population in vitro and observe its behavior. As a drawback, it is quite complex to extract information from those data, and usually there is more than one way to do so; antimicrobial resistance (AMR) is no exception. In this study we discuss how the quantified AMR, which concerns the genotype of the bacteria, can be related to the bacterial phenotype and its actual level of resistance against a specific substance. To obtain quantitative information about the bacterial genotype, we evaluate the resistome from the read libraries by aligning them against the CARD database. With those data, we test various machine learning algorithms for predicting the bacterial phenotype. The samples that we use should resemble those that could be obtained from a natural context, but are actually produced by a read-library simulation tool. In this way we are able to design populations with bacteria of known genotype, so that we can rely on a secure ground truth for training and testing our algorithms.

Relevance:

90.00%

Publisher:

Abstract:

Day by day, machine learning is changing our lives in ways we could not have imagined just 5 years ago. ML expertise is increasingly in demand, yet only a limited number of ML engineers are available on the job market, and their knowledge is always limited by an inherent characteristic of theirs: they are human. This thesis explores the possibilities offered by meta-learning, a new field in ML that takes learning a level higher: models are trained on data about other models' training, such as features of the dataset they were trained on, inference times, and the performance obtained, to try to understand the relationship between a good model and the way it was obtained. The so-called metamodel was trained on data collected by OpenML, the largest publicly available ML metadata platform today. Datasets were analyzed to obtain meta-features that describe them, which were then tied to model performance in a regression task. The resulting metamodel predicts the expected performance of a given model type (e.g., a random forest) on a given ML task (e.g., classification on the UCI census dataset). This research was then integrated into a custom-made AutoML framework, to show that meta-learning is not an end in itself but can be used to further advance ML research. Encoding ML engineering expertise in a model allows better, faster, and more impactful ML applications across the whole world, while reducing the cost that is inevitably tied to human engineers.