897 resultados para Decision tree method


Relevância:

90.00% 90.00%

Publicador:

Resumo:

Hierarchical multi-label classification is a complex classification task where the classes involved in the problem are hierarchically structured and each example may simultaneously belong to more than one class in each hierarchical level. In this paper, we extend our previous works, where we investigated a new local-based classification method that incrementally trains a multi-layer perceptron for each level of the classification hierarchy. Predictions made by a neural network in a given level are used as inputs to the neural network responsible for the prediction in the next level. We compare the proposed method with one state-of-the-art decision-tree induction method and two decision-tree induction methods, using several hierarchical multi-label classification datasets. We perform a thorough experimental analysis, showing that our method obtains competitive results to a robust global method regarding both precision and recall evaluation measures.

Relevância:

90.00% 90.00%

Publicador:

Resumo:

[EN] [EN] In this paper we present a new method for image primitives tracking based on a CART (Classification and Regression Tree). Primitives tracking procedure uses lines and circles as primitives. We have applied the proposed method to sport event scenarios, specifically, soccer matches. We estimate CART parameters using a learning procedure based on RGB image channels. In order to illustrate its performance, it has been applied to real HD (High Definition) video sequences and some numerical experiments are shown. The quality of the primitives tracking with the decision tree is validated by the percentage error rates obtained and the comparison with other techniques as a morphological method. We also present applications of the proposed method to camera calibration and graphic object insertion in real video sequences.

Relevância:

90.00% 90.00%

Publicador:

Resumo:

BACKGROUND: In order to optimise the cost-effectiveness of active surveillance to substantiate freedom from disease, a new approach using targeted sampling of farms was developed and applied on the example of infectious bovine rhinotracheitis (IBR) and enzootic bovine leucosis (EBL) in Switzerland. Relevant risk factors (RF) for the introduction of IBR and EBL into Swiss cattle farms were identified and their relative risks defined based on literature review and expert opinions. A quantitative model based on the scenario tree method was subsequently used to calculate the required sample size of a targeted sampling approach (TS) for a given sensitivity. We compared the sample size with that of a stratified random sample (sRS) with regard to efficiency. RESULTS: The required sample sizes to substantiate disease freedom were 1,241 farms for IBR and 1,750 farms for EBL to detect 0.2% herd prevalence with 99% sensitivity. Using conventional sRS, the required sample sizes were 2,259 farms for IBR and 2,243 for EBL. Considering the additional administrative expenses required for the planning of TS, the risk-based approach was still more cost-effective than a sRS (40% reduction on the full survey costs for IBR and 8% for EBL) due to the considerable reduction in sample size. CONCLUSIONS: As the model depends on RF selected through literature review and was parameterised with values estimated by experts, it is subject to some degree of uncertainty. Nevertheless, this approach provides the veterinary authorities with a promising tool for future cost-effective sampling designs.

Relevância:

90.00% 90.00%

Publicador:

Resumo:

Objective: Suicide attempts are common in patients being treated for alcohol-use disorders (AUDs). However, clinical assessment of suicide risk is difficult. In this Swiss multisite study, we propose a decision tree to facilitate identification of profiles of AUD patients at high risk for suicidal behavior. Method: In this retrospective study, we used a sample of 700 patients (243 female), attending 1 of 12 treatment programs for AUDs in the German-speaking part of Switzerland. Sixty-nine patients who reported a suicide attempt in the 3 months before the index treatment were compared using risk factors with 631 patients without a suicide attempt. Receiver operating characteristic (ROC) analyses were used to identify patients at risk of having had a suicide attempt in the previous 3 months. Results: Consistent with previous empirical findings in AUD patients, a prior history of attempted suicide and severe symptoms of depression and aggression considerably increased the risk of a suicide attempt and, in combination, raised the likelihood of a prior suicide attempt to 52%. In addition, one third of AUD patients who had a history of suicide attempts and previous inpatient psychiatric treatment, or who were male and had previous inpatient psychiatric treatment, also reported a suicide attempt. Conclusions: The empirically supported decision tree helps to identify profiles of suicidal AUD patients in Switzerland and supplements clinicians' judgments in making triage decisions for suicide management.

Relevância:

90.00% 90.00%

Publicador:

Resumo:

Derivation of probability estimates complementary to geophysical data sets has gained special attention over the last years. Information about a confidence level of provided physical quantities is required to construct an error budget of higher-level products and to correctly interpret final results of a particular analysis. Regarding the generation of products based on satellite data a common input consists of a cloud mask which allows discrimination between surface and cloud signals. Further the surface information is divided between snow and snow-free components. At any step of this discrimination process a misclassification in a cloud/snow mask propagates to higher-level products and may alter their usability. Within this scope a novel probabilistic cloud mask (PCM) algorithm suited for the 1 km × 1 km Advanced Very High Resolution Radiometer (AVHRR) data is proposed which provides three types of probability estimates between: cloudy/clear-sky, cloudy/snow and clear-sky/snow conditions. As opposed to the majority of available techniques which are usually based on the decision-tree approach in the PCM algorithm all spectral, angular and ancillary information is used in a single step to retrieve probability estimates from the precomputed look-up tables (LUTs). Moreover, the issue of derivation of a single threshold value for a spectral test was overcome by the concept of multidimensional information space which is divided into small bins by an extensive set of intervals. The discrimination between snow and ice clouds and detection of broken, thin clouds was enhanced by means of the invariant coordinate system (ICS) transformation. The study area covers a wide range of environmental conditions spanning from Iceland through central Europe to northern parts of Africa which exhibit diverse difficulties for cloud/snow masking algorithms. The retrieved PCM cloud classification was compared to the Polar Platform System (PPS) version 2012 and Moderate Resolution Imaging Spectroradiometer (MODIS) collection 6 cloud masks, SYNOP (surface synoptic observations) weather reports, Cloud-Aerosol Lidar and Infrared Pathfinder Satellite Observations (CALIPSO) vertical feature mask version 3 and to MODIS collection 5 snow mask. The outcomes of conducted analyses proved fine detection skills of the PCM method with results comparable to or better than the reference PPS algorithm.

Relevância:

90.00% 90.00%

Publicador:

Resumo:

OBJECTIVES: Proteomics approaches to cardiovascular biology and disease hold the promise of identifying specific proteins and peptides or modification thereof to assist in the identification of novel biomarkers. METHOD: By using surface-enhanced laser desorption and ionization time of flight mass spectroscopy (SELDI-TOF-MS) serum peptide and protein patterns were detected enabling to discriminate between postmenopausal women with and without hormone replacement therapy (HRT). RESULTS: Serum of 13 HRT and 27 control subjects was analyzed and 42 peptides and proteins could be tentatively identified based on their molecular weight and binding characteristics on the chip surface. By using decision tree-based Biomarker Patternstrade mark Software classification and regression analysis a discriminatory function was developed allowing to distinguish between HRT women and controls correctly and, thus, yielding a sensitivity of 100% and a specificity of 100%. The results show that peptide and protein patterns have the potential to deliver novel biomarkers as well as pinpointing targets for improved treatment. The biomarkers obtained represent a promising tool to discriminate between HRT users and non-users. CONCLUSION: According to a tentative identification of the markers by their molecular weight and binding characteristics, most of them appear to be part of the inflammation induced acute-phase response

Relevância:

90.00% 90.00%

Publicador:

Resumo:

BackgroundConsensus-based approaches provide an alternative to evidence-based decision making, especially in situations where high-level evidence is limited. Our aim was to demonstrate a novel source of information, objective consensus based on recommendations in decision tree format from multiple sources.MethodsBased on nine sample recommendations in decision tree format a representative analysis was performed. The most common (mode) recommendations for each eventuality (each permutation of parameters) were determined. The same procedure was applied to real clinical recommendations for primary radiotherapy for prostate cancer. Data was collected from 16 radiation oncology centres, converted into decision tree format and analyzed in order to determine the objective consensus.ResultsBased on information from multiple sources in decision tree format, treatment recommendations can be assessed for every parameter combination. An objective consensus can be determined by means of mode recommendations without compromise or confrontation among the parties. In the clinical example involving prostate cancer therapy, three parameters were used with two cut-off values each (Gleason score, PSA, T-stage) resulting in a total of 27 possible combinations per decision tree. Despite significant variations among the recommendations, a mode recommendation could be found for specific combinations of parameters.ConclusionRecommendations represented as decision trees can serve as a basis for objective consensus among multiple parties.

Relevância:

90.00% 90.00%

Publicador:

Resumo:

Land degradation as well as land conservation maps at a (sub-) national scale are critical for pro-ject planning for sustainable land management. It has long been recognized that online accessible and low-cost raster data sets (e.g. Landsat imagery, SRTM-DEM’s) provide a readily available basis for land resource assessments for developing countries. However, choice of spatial, tempo-ral and spectral resolution of such data is often limited. Furthermore, while local expert knowl-edge on land degradation processes is abundant, difficulties are often encountered when linking existing knowledge with modern approaches including GIS and RS. The aim of this study was to develop an easily applicable, standardized workflow for preliminary spatial assessments of land degradation and conservation, which also allows the integration of existing expert knowledge. The core of the developed method consists of a workflow for rule-based land resource assess-ment. In a systematic way, this workflow leads from predefined land degradation and conserva-tion classes to field indicators, to suitable spatial proxy data, and finally to a set of rules for clas-sification of spatial datasets. Pre-conditions are used to narrow the area of interest. Decision tree models are used for integrating the different rules. It can be concluded that the workflow presented assists experts from different disciplines in col-laboration GIS/RS specialists in establishing a preliminary model for assessing land degradation and conservation in a spatially explicit manner. The workflow provides support when linking field indicators and spatial datasets, and when determining field indicators for groundtruthing.

Relevância:

90.00% 90.00%

Publicador:

Resumo:

Academic and industrial research in the late 90s have brought about an exponential explosion of DNA sequence data. Automated expert systems are being created to help biologists to extract patterns, trends and links from this ever-deepening ocean of information. Two such systems aimed on retrieving and subsequently utilizing phylogenetically relevant information have been developed in this dissertation, the major objective of which was to automate the often difficult and confusing phylogenetic reconstruction process. ^ Popular phylogenetic reconstruction methods, such as distance-based methods, attempt to find an optimal tree topology (that reflects the relationships among related sequences and their evolutionary history) by searching through the topology space. Various compromises between the fast (but incomplete) and exhaustive (but computationally prohibitive) search heuristics have been suggested. An intelligent compromise algorithm that relies on a flexible “beam” search principle from the Artificial Intelligence domain and uses the pre-computed local topology reliability information to adjust the beam search space continuously is described in the second chapter of this dissertation. ^ However, sometimes even a (virtually) complete distance-based method is inferior to the significantly more elaborate (and computationally expensive) maximum likelihood (ML) method. In fact, depending on the nature of the sequence data in question either method might prove to be superior. Therefore, it is difficult (even for an expert) to tell a priori which phylogenetic reconstruction method—distance-based, ML or maybe maximum parsimony (MP)—should be chosen for any particular data set. ^ A number of factors, often hidden, influence the performance of a method. For example, it is generally understood that for a phylogenetically “difficult” data set more sophisticated methods (e.g., ML) tend to be more effective and thus should be chosen. However, it is the interplay of many factors that one needs to consider in order to avoid choosing an inferior method (potentially a costly mistake, both in terms of computational expenses and in terms of reconstruction accuracy.) ^ Chapter III of this dissertation details a phylogenetic reconstruction expert system that selects a superior proper method automatically. It uses a classifier (a Decision Tree-inducing algorithm) to map a new data set to the proper phylogenetic reconstruction method. ^

Relevância:

90.00% 90.00%

Publicador:

Resumo:

The Wadden Sea is located in the southeastern part of the North Sea forming an extended intertidal area along the Dutch, German and Danish coast. It is a highly dynamic and largely natural ecosystem influenced by climatic changes and anthropogenic use of the North Sea. Changes in the environment of the Wadden Sea, natural or anthropogenic origin, cannot be monitored by the standard measurement methods alone, because large-area surveys of the intertidal flats are often difficult due to tides, tidal channels and unstable underground. For this reason, remote sensing offers effective monitoring tools. In this study a multi-sensor concept for classification of intertidal areas in the Wadden Sea has been developed. Basis for this method is a combined analysis of RapidEye (RE) and TerraSAR-X (TSX) satellite data coupled with ancillary vector data about the distribution of vegetation, mussel beds and sediments. The classification of the vegetation and mussel beds is based on a decision tree and a set of hierarchically structured algorithms which use object and texture features. The sediments are classified by an algorithm which uses thresholds and a majority filter. Further improvements focus on radiometric enhancement and atmospheric correction. First results show that we are able to identify vegetation and mussel beds with the use of multi-sensor remote sensing. The classification of the sediments in the tidal flats is a challenge compared to vegetation and mussel beds. The results demonstrate that the sediments cannot be classified with high accuracy by their spectral properties alone due to their similarity which is predominately caused by their water content.

Relevância:

90.00% 90.00%

Publicador:

Resumo:

Data mining, and in particular decision trees have been used in different fields: engineering, medicine, banking and finance, etc., to analyze a target variable through decision variables. The following article examines the use of the decision trees algorithm as a tool in territorial logistic planning. The decision tree built has estimated population density indexes for territorial units with similar logistics characteristics in a concise and practical way.

Relevância:

90.00% 90.00%

Publicador:

Resumo:

The energy demand for operating Information and Communication Technology (ICT) systems has been growing, implying in high operational costs and consequent increase of carbon emissions. Both in datacenters and telecom infrastructures, the networks represent a significant amount of energy spending. Given that, there is an increased demand for energy eficiency solutions, and several capabilities to save energy have been proposed. However, it is very dificult to orchestrate such energy eficiency capabilities, i.e., coordinate or combine them in the same network, ensuring a conflict-free operation and choosing the best one for a given scenario, ensuring that a capability not suited to the current bandwidth utilization will not be applied and lead to congestion or packet loss. Also, there is no way in the literature to do this taking business directives into account. In this regard, a method able to orchestrate diferent energy eficiency capabilities is proposed considering the possible combinations and conflicts among them, as well as the best option for a given bandwidth utilization and network characteristics. In the proposed method, the business policies specified in a high-level interface are refined down to the network level in order to bring highlevel directives into the operation, and a Utility Function is used to combine energy eficiency and performance requirements. A Decision Tree able to determine what to do in each scenario is deployed in a Software Defined Network environment. The proposed method was validated with diferent experiments, testing the Utility Function, checking the extra savings when combining several capabilities, the decision tree interpolation and dynamicity aspects. The orchestration proved to be valid to solve the problem of finding the best combination for a given scenario, achieving additional savings due to the combination, besides ensuring a conflict-free operation.

Relevância:

90.00% 90.00%

Publicador:

Resumo:

Fuzzy data has grown to be an important factor in data mining. Whenever uncertainty exists, simulation can be used as a model. Simulation is very flexible, although it can involve significant levels of computation. This article discusses fuzzy decision-making using the grey related analysis method. Fuzzy models are expected to better reflect decision-making uncertainty, at some cost in accuracy relative to crisp models. Monte Carlo simulation is used to incorporate experimental levels of uncertainty into the data and to measure the impact of fuzzy decision tree models using categorical data. Results are compared with decision tree models based on crisp continuous data.

Relevância:

90.00% 90.00%

Publicador:

Resumo:

Traditional vegetation mapping methods use high cost, labour-intensive aerial photography interpretation. This approach can be subjective and is limited by factors such as the extent of remnant vegetation, and the differing scale and quality of aerial photography over time. An alternative approach is proposed which integrates a data model, a statistical model and an ecological model using sophisticated Geographic Information Systems (GIS) techniques and rule-based systems to support fine-scale vegetation community modelling. This approach is based on a more realistic representation of vegetation patterns with transitional gradients from one vegetation community to another. Arbitrary, though often unrealistic, sharp boundaries can be imposed on the model by the application of statistical methods. This GIS-integrated multivariate approach is applied to the problem of vegetation mapping in the complex vegetation communities of the Innisfail Lowlands in the Wet Tropics bioregion of Northeastern Australia. The paper presents the full cycle of this vegetation modelling approach including sampling sites, variable selection, model selection, model implementation, internal model assessment, model prediction assessments, models integration of discrete vegetation community models to generate a composite pre-clearing vegetation map, independent data set model validation and model prediction's scale assessments. An accurate pre-clearing vegetation map of the Innisfail Lowlands was generated (0.83r(2)) through GIS integration of 28 separate statistical models. This modelling approach has good potential for wider application, including provision of. vital information for conservation planning and management; a scientific basis for rehabilitation of disturbed and cleared areas; a viable method for the production of adequate vegetation maps for conservation and forestry planning of poorly-studied areas. (c) 2006 Elsevier B.V. All rights reserved.

Relevância:

90.00% 90.00%

Publicador:

Resumo:

Objective: An estimation of cut-off points for the diagnosis of diabetes mellitus (DM) based on individual risk factors. Methods: A subset of the 1991 Oman National Diabetes Survey is used, including all patients with a 2h post glucose load >= 200 mg/dl (278 subjects) and a control group of 286 subjects. All subjects previously diagnosed as diabetic and all subjects with missing data values were excluded. The data set was analyzed by use of the SPSS Clementine data mining system. Decision Tree Learners (C5 and CART) and a method for mining association rules (the GRI algorithm) are used. The fasting plasma glucose (FPG), age, sex, family history of diabetes and body mass index (BMI) are input risk factors (independent variables), while diabetes onset (the 2h post glucose load >= 200 mg/dl) is the output (dependent variable). All three techniques used were tested by use of crossvalidation (89.8%). Results: Rules produced for diabetes diagnosis are: A- GRI algorithm (1) FPG>=108.9 mg/dl, (2) FPG>=107.1 and age>39.5 years. B- CART decision trees: FPG >=110.7 mg/dl. C- The C5 decision tree learner: (1) FPG>=95.5 and 54, (2) FPG>=106 and 25.2 kg/m2. (3) FPG>=106 and =133 mg/dl. The three techniques produced rules which cover a significant number of cases (82%), with confidence between 74 and 100%. Conclusion: Our approach supports the suggestion that the present cut-off value of fasting plasma glucose (126 mg/dl) for the diagnosis of diabetes mellitus needs revision, and the individual risk factors such as age and BMI should be considered in defining the new cut-off value.