862 results for Random Forests Classifier
Abstract:
Thesis (Master's)--University of Washington, 2016-08
Abstract:
Three types of forecasts of the total Australian production of macadamia nuts (t nut-in-shell) have been produced early each year since 2001. The first is a long-term forecast, based on the expected production from the tree census data held by the Australian Macadamia Society, suitably scaled up for missing data and assumed new plantings each year. These long-term forecasts range out to 10 years in the future, and form a basis for industry and market planning. Secondly, a statistical adjustment (termed the climate-adjusted forecast) is made annually for the coming crop. As the name suggests, climatic influences are the dominant factors in this adjustment process; however, other terms such as bienniality of bearing, prices and orchard aging are also incorporated. Thirdly, industry personnel are surveyed early each year, with their estimates integrated into a growers and pest-scouts forecast. Initially conducted on a 'whole-country' basis, these models are now constructed separately for the six main production regions of Australia, with these being combined for national totals. Ensembles or suites of step-forward regression models using biologically-relevant variables have been the major statistical method adopted; however, developing methodologies such as nearest-neighbour techniques, general additive models and random forests are continually being evaluated in parallel. The overall error rates average 14% for the climate forecasts, and 12% for the growers' forecasts. These compare with 7.8% for USDA almond forecasts (based on extensive early-crop sampling) and 6.8% for coconut forecasts in Sri Lanka. However, our somewhat disappointing results were mainly due to a series of poor crops attributed to human reasons, which have now been factored into the models. Notably, the 2012 and 2013 forecasts averaged 7.8 and 4.9% errors, respectively. Future models should also show continuing improvement, as more data-years become available.
Abstract:
BACKGROUND: The purpose of the present study was to investigate the diagnostic value of T2-mapping in acute myocarditis (ACM) and to define cut-off values for edema detection. METHODS: Cardiovascular magnetic resonance (CMR) data of 31 patients with ACM were retrospectively analyzed. 30 healthy volunteers (HV) served as a control. In addition to the routine CMR protocol, T2-mapping data were acquired at 1.5 T using a breath-hold Gradient-Spin-Echo T2-mapping sequence in six short axis slices. T2-maps were segmented according to the 16-segment AHA model, and segmental T2 values as well as the segmental pixel-standard deviation (SD) were analyzed. RESULTS: Mean differences of global myocardial T2 or pixel-SD between HV and ACM patients were small, lying in the normal range of HV. In contrast, variation of segmental T2 values and pixel-SD was much larger in ACM patients compared to HV. In random forests and multiple logistic regression analyses, the combination of the highest segmental T2 value within each patient (maxT2) and the mean absolute deviation (MAD) of log-transformed pixel-SD (madSD) over all 16 segments within each patient proved to be the best discriminators between HV and ACM patients, with an AUC of 0.85 in ROC-analysis. In classification trees, a combined cut-off of 0.22 for madSD and of 68 ms for maxT2 resulted in 83% specificity and 81% sensitivity for detection of ACM. CONCLUSIONS: The proposed cut-off values for maxT2 and madSD in the setting of ACM allow edema detection with high sensitivity and specificity and therefore have the potential to overcome the hurdles of T2-mapping for its integration into clinical routine.
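The two per-patient features named in this abstract can be sketched as follows. This is only an illustration under stated assumptions: the function name `t2_features` is hypothetical, the natural logarithm is assumed for the log transform, and the deviation is taken around the mean; the abstract does not specify either choice, nor how the classification tree combines the two cut-offs.

```python
import numpy as np

def t2_features(segment_t2_ms, segment_pixel_sd):
    """Return (maxT2, madSD) for one patient from 16 segmental values.

    Assumptions (not stated in the abstract): natural log, and mean
    absolute deviation measured around the mean of the log values.
    """
    t2 = np.asarray(segment_t2_ms, dtype=float)
    log_sd = np.log(np.asarray(segment_pixel_sd, dtype=float))
    max_t2 = t2.max()                                   # highest segmental T2
    mad_sd = np.mean(np.abs(log_sd - log_sd.mean()))    # MAD of log pixel-SD
    return max_t2, mad_sd
```

The resulting values could then be compared against the reported cut-offs (68 ms for maxT2, 0.22 for madSD), bearing in mind that a different log base or deviation centre would change the scale of madSD.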
Abstract:
Master's in Actuarial Science
Abstract:
Customer churn refers to a customer ending their relationship with a company. Companies generally consider a customer lost once a given period of time has passed since the customer's last interaction with the company's services. Reducing the churn rate is therefore a key business objective for any business. To retain customers who are about to leave, a company needs to predict in advance which customers will churn and to know which marketing actions will have the greatest impact on the retention of each particular customer. The goal of this thesis is the study and implementation of a churn prediction system for a chain of gyms; the system is built on behalf of Technogym, a leading company in the fitness market. Technogym already offers a churn-risk prediction service based on static rules. That service gives acceptable results, but it does not adapt automatically as customer characteristics change over time. This thesis exploits the potential of machine learning technologies to try to address the limits of the system historically used by the company. The thesis work comprised three macro-phases. The first phase was the understanding and analysis of the historical system, with the aim of understanding the structure of the data, improving its quality, and examining through statistical analysis its information content in relation to the features defined by the machine learning algorithms. The second phase covered the study, definition, and implementation of two ML models based on the same features but using two different technologies: Random Forest Classifier and Google's AutoML Tables service. The third phase focused on a comparative evaluation of the performance of the ML models against the historical system.
Abstract:
Estimating emissions, both during homologation and in standard driving, is one of the new challenges the automotive industry has to face. New European and American regulations will allow progressively lower carbon monoxide emissions and will require all vehicles to monitor their own pollutant production. Since numerical models are too computationally expensive and too approximate, new solutions based on Machine Learning are replacing standard techniques. In this project we considered a real V12 internal combustion engine and propose a novel approach that pushes Random Forests to generate meaningful predictions even in extreme cases (extrapolation, very high frequency peaks, noisy instrumentation, etc.). The present work also proposes a data preprocessing pipeline for strongly unbalanced datasets and a reinterpretation of the regression problem as a classification problem in a logarithmic quantized domain. Results have been evaluated for two different models representing a pure interpolation scenario (more standard) and an extrapolation scenario, to test the out-of-bounds robustness of the model. The metrics employed take into account different aspects that can affect the homologation procedure, so the final analysis focuses on combining all the specific performances to reach the overall conclusions.
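One way to picture the "classification in a logarithmic quantized domain" idea is the sketch below. It is not the thesis's pipeline: the function names, the base-10 logarithm, the equal-width bins in log space, and the `eps` floor are all assumptions made for illustration.

```python
import numpy as np

def log_quantize(y, n_bins, eps=1e-6):
    """Turn positive continuous targets into integer class labels
    by binning them on a log10 scale (hypothetical helper)."""
    y = np.asarray(y, dtype=float)
    log_y = np.log10(np.maximum(y, eps))      # eps guards against log(0)
    edges = np.linspace(log_y.min(), log_y.max(), n_bins + 1)
    # Use interior edges only so labels fall in 0..n_bins-1
    labels = np.digitize(log_y, edges[1:-1])
    return labels, edges

def bin_centers(edges):
    """Representative value of each class in original units
    (midpoint of the bin in log space)."""
    return 10 ** ((edges[:-1] + edges[1:]) / 2)
```

A classifier trained on `labels` predicts a bin; `bin_centers(edges)[predicted_label]` maps that prediction back to an emission value. Log-spaced bins give finer resolution to small values, which is one plausible motivation when rare high peaks dominate the raw scale.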
Abstract:
Combinatorial decision and optimization problems arise in numerous applications, such as logistics and scheduling, and can be solved with various approaches. Boolean Satisfiability and Constraint Programming solvers are among the most widely used, and their performance is significantly influenced by the model chosen to represent a given problem. This has led to the study of model reformulation methods, one of which is tabulation, which consists of rewriting the expression of a constraint in terms of a table constraint. To apply it, one should identify which constraints can help and which can hinder the solving process. So far this has been performed by hand, for example in MiniZinc, or automatically with manually designed heuristics, in Savile Row. However, it has been shown that the performance of these heuristics differs across problems and solvers, in some cases helping and in others hindering the solving procedure. Recent work in the field of combinatorial optimization has shown that Machine Learning (ML) can be increasingly useful in the model reformulation steps. This thesis aims to design an ML approach to identify the instances for which Savile Row's heuristics should be activated. Additionally, it is possible that the heuristics miss some good tabulation opportunities, so we perform an exploratory analysis for the creation of an ML classifier able to predict whether or not a constraint should be tabulated. The results for the first goal show that a random forest classifier improves the performance of four different solvers. The experimental results for the second task show that an ML approach could improve the performance of a solver for some problem classes.
Abstract:
Altitudinal tree lines are mainly constrained by temperature, but can also be influenced by factors such as human activity, particularly in the European Alps, where centuries of agricultural use have affected the tree-line. Over the last decades this trend has been reversed due to changing agricultural practices and land-abandonment. We aimed to combine a statistical land-abandonment model with a forest dynamics model, to take into account the combined effects of climate and human land-use on the Alpine tree-line in Switzerland. Land-abandonment probability was expressed by a logistic regression function of degree-day sum, distance from forest edge, soil stoniness, slope, proportion of employees in the secondary and tertiary sectors, proportion of commuters and proportion of full-time farms. This was implemented in the TreeMig spatio-temporal forest model. Distance from forest edge and degree-day sum vary through feed-back from the dynamics part of TreeMig and climate change scenarios, while the other variables remain constant for each grid cell over time. The new model, TreeMig-LAb, was tested on theoretical landscapes, where the variables in the land-abandonment model were varied one by one. This confirmed the strong influence of distance from forest and slope on the abandonment probability. Degree-day sum has a more complex role, with opposite influences on land-abandonment and forest growth. TreeMig-LAb was also applied to a case study area in the Upper Engadine (Swiss Alps), along with a model where abandonment probability was a constant. Two scenarios were used: natural succession only (100% probability) and a probability of abandonment based on past transition proportions in that area (2.1% per decade). The former showed new forest growing in all but the highest-altitude locations. The latter was more realistic as to numbers of newly forested cells, but their location was random and the resulting landscape heterogeneous. 
Using the logistic regression model gave results consistent with observed patterns of land-abandonment: existing forests expanded and gaps closed, leading to an increasingly homogeneous landscape.
Abstract:
Generally classifiers tend to overfit if there is noise in the training data or there are missing values. Ensemble learning methods are often used to improve a classifier's classification accuracy. Most ensemble learning approaches aim to improve the classification accuracy of decision trees. However, alternative classifiers to decision trees exist. The recently developed Random Prism ensemble learner for classification aims to improve an alternative classification rule induction approach, the Prism family of algorithms, which addresses some of the limitations of decision trees. However, Random Prism suffers like any ensemble learner from a high computational overhead due to replication of the data and the induction of multiple base classifiers. Hence even modest sized datasets may impose a computational challenge to ensemble learners such as Random Prism. Parallelism is often used to scale up algorithms to deal with large datasets. This paper investigates parallelisation for Random Prism, implements a prototype and evaluates it empirically using a Hadoop computing cluster.
Abstract:
The Optimum-Path Forest (OPF) classifier is a recent and promising method for pattern recognition, with a fast training algorithm and good accuracy results. Therefore, the investigation of a combining method for this kind of classifier can be important for many applications. In this paper we report a fast method to combine OPF-based classifiers trained with disjoint training subsets. Given a fixed number of subsets, the algorithm chooses random samples, without replacement, from the original training set. The accuracy of each subset is improved by a learning procedure. The final decision is given by majority vote. Experiments with simulated and real data sets showed that the proposed combining method is more efficient and effective than the naive approach, provided some conditions are met. It was also shown that the OPF training step runs faster for a series of small subsets than for the whole training set. The combining scheme was also designed to support parallel or distributed processing, speeding up the procedure even more. © 2011 Springer-Verlag.
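The two generic steps described here, drawing disjoint subsets without replacement and combining decisions by majority vote, can be sketched as below. This does not implement OPF or the per-subset learning procedure; the function names are hypothetical, and any base classifier could be trained on each subset.

```python
import numpy as np
from collections import Counter

def disjoint_subsets(n_samples, n_subsets, seed=0):
    """Split sample indices into disjoint random subsets.
    A single shuffle followed by a split is sampling without replacement."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(n_samples)
    return np.array_split(order, n_subsets)

def majority_vote(predictions):
    """Combine label predictions from several classifiers.
    `predictions` holds one sequence of labels per classifier;
    ties go to the label seen first among the voters."""
    votes = np.asarray(predictions)
    return [Counter(column).most_common(1)[0][0] for column in votes.T]
```

Because the subsets share no samples, each base classifier can be trained independently, which is what makes a parallel or distributed version straightforward.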
Abstract:
The research on multiple classifier systems includes the creation of an ensemble of classifiers and the proper combination of the decisions. In order to combine the decisions given by classifiers, methods related to fixed rules and decision templates are often used. Therefore, the influence and relationship between classifier decisions are often not considered in the combination schemes. In this paper we propose a framework to combine classifiers using a decision graph under a random field model and a game strategy approach to obtain the final decision. The results of combining Optimum-Path Forest (OPF) classifiers using the proposed model are reported, obtaining good performance in experiments using simulated and real data sets. The results encourage the combination of OPF ensembles and the framework to design multiple classifier systems. © 2011 Springer-Verlag.
Abstract:
Fungi are important members of soil microbial communities with a crucial role in biogeochemical processes. Although soil fungi are known to be highly diverse, little is known about factors influencing variations in their diversity and community structure among forests dominated by the same tree species but spread over different regions and under different managements. We analyzed the soil fungal diversity and community composition of managed and unmanaged European beech-dominated forests located in three German regions, the Schwäbische Alb in southwestern, the Hainich-Dün in central and the Schorfheide Chorin in northeastern Germany, using internal transcribed spacer (ITS) rDNA pyrotag sequencing. Multiple sequence quality filtering followed by sequence data normalization revealed 1655 fungal operational taxonomic units (OTUs). Further analysis based on 722 abundant fungal OTUs revealed the phylum Basidiomycota to be dominant (54%) and its community to comprise 71.4% of ectomycorrhizal taxa. Fungal community structure differed significantly (p≤0.001) among the three regions and was characterized by non-random fungal OTU co-occurrence. Soil parameters, herbaceous understory vegetation, and litter cover affected fungal community structure. However, within each study region we found no difference in fungal community structure between management types. Our results also showed region-specific significant correlation patterns between the dominant ectomycorrhizal fungal genera. This suggests that soil fungal communities are region-specific but nevertheless composed of functionally diverse and complementary taxa.
Abstract:
The Zagros oak forests in Western Iran are critically important to the sustainability of the region. These forests have undergone dramatic declines in recent decades. We evaluated the utility of the non-parametric Random Forest classification algorithm for land cover classification of Zagros landscapes, and selected the best spatial and spectral predictive variables. The algorithm resulted in high overall classification accuracies (>85%) and also equivalent classification accuracies for the datasets from the three different sensors. We evaluated the associations between trends in forest area and structure with trends in socioeconomic and climatic conditions, to identify the most likely driving forces creating deforestation and landscape structure change. We used available socioeconomic (urban and rural population, and rural income), and climatic (mean annual rainfall and mean annual temperature) data for two provinces in northern Zagros. The most correlated driving force of forest area loss was urban population, and climatic variables to a lesser extent. Landscape structure changes were more closely associated with rural population. We examined the effects of scale changes on the results from spatial pattern analysis. We assessed the impacts of eight years of protection in a protected area in northern Zagros at two different scales (both grain and extent). The effects of protection on the amount and structure of forests were scale-dependent. We evaluated the nature and magnitude of changes in forest area and structure over the entire Zagros region from 1972 to 2009. We divided the Zagros region into 167 Landscape Units and developed two measures, Deforestation Sensitivity (DS) and Connectivity Sensitivity (CS), for each landscape unit as the percent of the time steps in which forest area and ECA experienced a decrease of greater than 10% in either measure.
A considerable loss in forest area and connectivity was detected, but no sudden (nonlinear) changes were detected at the spatial and temporal scale of the study. Connectivity loss occurred more rapidly than forest loss due to the loss of connecting patches. More connectivity was lost in southern Zagros due to climatic differences and different forms of traditional land use.
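The sensitivity measures described in this abstract (the percent of time steps with a decrease greater than 10%) can be illustrated with a short sketch. The function name and the choice to measure each decrease relative to the previous time step are assumptions; the abstract does not state the reference value.

```python
def sensitivity(series, threshold=0.10):
    """Percent of time steps in which the value drops by more than
    `threshold`, measured relative to the previous step (assumed).

    `series` is a sequence of forest area (for DS) or connectivity
    values (for CS) for one landscape unit, ordered in time.
    """
    steps = list(zip(series[:-1], series[1:]))
    drops = sum(
        1 for prev, curr in steps
        if prev > 0 and (prev - curr) / prev > threshold
    )
    return 100.0 * drops / len(steps)
```

For a hypothetical unit with areas `[100, 85, 80, 60, 60]`, two of the four steps (100 to 85 and 80 to 60) exceed a 10% drop, so the measure is 50%.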
Abstract:
Over the last decade, a plethora of computer-aided diagnosis (CAD) systems have been proposed aiming to improve the accuracy of the physicians in the diagnosis of interstitial lung diseases (ILD). In this study, we propose a scheme for the classification of HRCT image patches with ILD abnormalities as a basic component towards the quantification of the various ILD patterns in the lung. The feature extraction method relies on local spectral analysis using a DCT-based filter bank. After convolving the image with the filter bank, q-quantiles are computed for describing the distribution of local frequencies that characterize image texture. Then, the gray-level histogram values of the original image are added forming the final feature vector. The classification of the already described patches is done by a random forest (RF) classifier. The experimental results prove the superior performance and efficiency of the proposed approach compared against the state-of-the-art.
Abstract:
This study explores the potential impact of a forest protection intervention on rural households' private fuel tree planting in Chiro district of eastern Ethiopia. The results revealed a robust and significant positive impact of the intervention on farmers' decisions to produce private household energy by growing fuel trees on their farms. As participation in private fuel tree planting is not random, the study confronts the methodological issue of investigating the causal effect of the forest protection intervention on rural farm households' private fuel tree planting through the non-parametric propensity score matching (PSM) method. The protection intervention has on average increased fuel tree planting by 503 (580.6%) compared to open access areas and indirectly contributed to slowing down the loss of biodiversity in the area. Land cover/use is a dynamic phenomenon that changes with time and space due to anthropogenic pressure and development. Forest cover and land use changes in Chiro District, Ethiopia, over a period of 40 years were studied using remotely sensed data. Multi-temporal Landsat satellite data were used to map and monitor the forest cover and land use changes that occurred at three points in time: 1972, 1986 and 2012. A pixel-based supervised image classification was used to map land use/land cover classes for the maps of each time period. The change detection analysis revealed that the area has shown remarkable land cover/land use changes in general and forest cover change in particular. Specifically, dense forest cover declined from 235 ha in 1972 to 51 ha in 1986. However, government interventions in forest protection in 1989 have slowed the drastic loss of dense forest cover around the protected area by reclaiming 1,300 hectares of deforested land through a reforestation program up to 2012.