974 resultados para Classification Tree Pruning


Relevância:

30.00% 30.00%

Publicador:

Resumo:

Analyzing geographical patterns by collocating events, objects or their attributes has a long history in surveillance and monitoring, and is particularly applied in environmental contexts, such as ecology or epidemiology. The identification of patterns or structures at some scales can be addressed using spatial statistics, particularly marked point processes methodologies. Classification and regression trees are also related to this goal of finding "patterns" by deducing the hierarchy of influence of variables on a dependent outcome. Such variable selection methods have been applied to spatial data, but, often without explicitly acknowledging the spatial dependence. Many methods routinely used in exploratory point pattern analysis are2nd-order statistics, used in a univariate context, though there is also a wide literature on modelling methods for multivariate point pattern processes. This paper proposes an exploratory approach for multivariate spatial data using higher-order statistics built from co-occurrences of events or marks given by the point processes. A spatial entropy measure, derived from these multinomial distributions of co-occurrences at a given order, constitutes the basis of the proposed exploratory methods. © 2010 Elsevier Ltd.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

The aims of the project were twofold: 1) To investigate classification procedures for remotely sensed digital data, in order to develop modifications to existing algorithms and propose novel classification procedures; and 2) To investigate and develop algorithms for contextual enhancement of classified imagery in order to increase classification accuracy. The following classifiers were examined: box, decision tree, minimum distance, maximum likelihood. In addition to these the following algorithms were developed during the course of the research: deviant distance, look up table and an automated decision tree classifier using expert systems technology. Clustering techniques for unsupervised classification were also investigated. Contextual enhancements investigated were: mode filters, small area replacement and Wharton's CONAN algorithm. Additionally methods for noise and edge based declassification and contextual reclassification, non-probabilitic relaxation and relaxation based on Markov chain theory were developed. The advantages of per-field classifiers and Geographical Information Systems were investigated. The conclusions presented suggest suitable combinations of classifier and contextual enhancement, given user accuracy requirements and time constraints. These were then tested for validity using a different data set. A brief examination of the utility of the recommended contextual algorithms for reducing the effects of data noise was also carried out.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

In this paper, we present syllable-based duration modelling in the context of a prosody model for Standard Yorùbá (SY) text-to-speech (TTS) synthesis applications. Our prosody model is conceptualised around a modular holistic framework. This framework is implemented using the Relational Tree (R-Tree) techniques. An important feature of our R-Tree framework is its flexibility in that it facilitates the independent implementation of the different dimensions of prosody, i.e. duration, intonation, and intensity, using different techniques and their subsequent integration. We applied the Fuzzy Decision Tree (FDT) technique to model the duration dimension. In order to evaluate the effectiveness of FDT in duration modelling, we have also developed a Classification And Regression Tree (CART) based duration model using the same speech data. Each of these models was integrated into our R-Tree based prosody model. We performed both quantitative (i.e. Root Mean Square Error (RMSE) and Correlation (Corr)) and qualitative (i.e. intelligibility and naturalness) evaluations on the two duration models. The results show that CART models the training data more accurately than FDT. The FDT model, however, shows a better ability to extrapolate from the training data since it achieved a better accuracy for the test data set. Our qualitative evaluation results show that our FDT model produces synthesised speech that is perceived to be more natural than our CART model. In addition, we also observed that the expressiveness of FDT is much better than that of CART. That is because the representation in FDT is not restricted to a set of piece-wise or discrete constant approximation. We, therefore, conclude that the FDT approach is a practical approach for duration modelling in SY TTS applications. © 2006 Elsevier Ltd. All rights reserved.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Today, due to globalization of the world the size of data set is increasing, it is necessary to discover the knowledge. The discovery of knowledge can be typically in the form of association rules, classification rules, clustering, discovery of frequent episodes and deviation detection. Fast and accurate classifiers for large databases are an important task in data mining. There is growing evidence that integrating classification and association rules mining, classification approaches based on heuristic, greedy search like decision tree induction. Emerging associative classification algorithms have shown good promises on producing accurate classifiers. In this paper we focus on performance of associative classification and present a parallel model for classifier building. For classifier building some parallel-distributed algorithms have been proposed for decision tree induction but so far no such work has been reported for associative classification.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

2000 Mathematics Subject Classification: 17A50, 05C05.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

ACM Computing Classification System (1998): G.2.2, F.2.2.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

ACM Computing Classification System (1998): I.4.9, I.4.10.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

This study aims at exploring the potential impact of forest protection intervention on rural households’ private fuel tree planting in Chiro district of eastern Ethiopia. The study results revealed a robust and significant positive impact of the intervention on farmers’ decisions to produce private household energy by growing fuel trees on their farm. As participation in private fuel tree planting is not random, the study confronts a methodological issue in investigating the causal effect of forest protection intervention on rural farm households’ private fuel tree planting through non-parametric propensity score matching (PSM) method. The protection intervention on average has increased fuel tree planting by 503 (580.6%) compared to open access areas and indirectly contributed to slowing down the loss of biodiversity in the area. Land cover/use is a dynamic phenomenon that changes with time and space due to anthropogenic pressure and development. Forest cover and land use changes in Chiro District, Ethiopia over a period of 40 years was studied using remotely sensed data. Multi temporal satellite data of Landsat was used to map and monitor forest cover and land use changes occurred during three point of time of 1972,1986 and 2012. A pixel base supervised image classification was used to map land use land cover classes for maps of both time set. The result of change detection analysis revealed that the area has shown a remarkable land cover/land use changes in general and forest cover change in particular. Specifically, the dense forest cover land declined from 235 ha in 1972 to 51 ha in 1986. However, government interventions in forest protection in 1989 have slowed down the drastic change of dense forest cover loss around the protected area through reclaiming 1,300 hectares of deforested land through reforestation program up to 2012.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Background and aims: Machine learning techniques for the text mining of cancer-related clinical documents have not been sufficiently explored. Here some techniques are presented for the pre-processing of free-text breast cancer pathology reports, with the aim of facilitating the extraction of information relevant to cancer staging.

Materials and methods: The first technique was implemented using the freely available software RapidMiner to classify the reports according to their general layout: ‘semi-structured’ and ‘unstructured’. The second technique was developed using the open source language engineering framework GATE and aimed at the prediction of chunks of the report text containing information pertaining to the cancer morphology, the tumour size, its hormone receptor status and the number of positive nodes. The classifiers were trained and tested respectively on sets of 635 and 163 manually classified or annotated reports, from the Northern Ireland Cancer Registry.

Results: The best result of 99.4% accuracy – which included only one semi-structured report predicted as unstructured – was produced by the layout classifier with the k nearest algorithm, using the binary term occurrence word vector type with stopword filter and pruning. For chunk recognition, the best results were found using the PAUM algorithm with the same parameters for all cases, except for the prediction of chunks containing cancer morphology. For semi-structured reports the performance ranged from 0.97 to 0.94 and from 0.92 to 0.83 in precision and recall, while for unstructured reports performance ranged from 0.91 to 0.64 and from 0.68 to 0.41 in precision and recall. Poor results were found when the classifier was trained on semi-structured reports but tested on unstructured.

Conclusions: These results show that it is possible and beneficial to predict the layout of reports and that the accuracy of prediction of which segments of a report may contain certain information is sensitive to the report layout and the type of information sought.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

The fruit of certain mango cultivars (e.g., 'Honey Gold') can develop blush on their skin. Skin blush due to red pigmentation is from the accumulation of anthocyanins. Anthocyanin biosynthesis is related to environmental determinants, including light received by the fruit. It has been observed that mango skin blush varies with position in the tree canopy. However, little investigation into this spatial relationship has been conducted. The objective of this preliminary study was to describe a 'Honey Gold' mango tree by capturing its three-dimensional (3D) architecture. A light path tracing model QuasiMC was then used to predict light received by fruit. The use of this 3D model was to better understand the relationship between mango fruit skin blush and fruit position in the canopy. The digitised mango tree mimicked the real tree at a high level of detail. Observations on mango skin blush distribution supported the proposition that sunlight exposure is an absolute requirement for anthocyanin development. No blush development occurred on shaded skin. It was affirmed that 3D mapping could allow for virtual experiments. For example, for virtual canopy thinning (e.g., 'window pruning') to admit more sunlight with a view to improve fruit blush. Improvements to 3D modelling of mango skin blush could focus on increasing accuracy, e.g., measurement of leaf light reflectance and transmission and the inclusion of the effect shading by branches.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

A self-organising model of macadamia, expressed using L-Systems, was used to explore aspects of canopy management. A small set of parameters control the basic architecture of the model, with a high degree of self-organisation occurring to determine the fate and growth of buds. Light was sensed at the leaf level and used to represent vigour and accumulated basipetally. Buds also sensed light so as to provide demand in the subsequent redistribution of the vigour. Empirical relationships were derived from a set of 24 completely digitised trees after conversion to multiscale tree graphs (MTG) and analysis with the OpenAlea software library. The ability to write MTG files was embedded within the model so that various tree statistics could be exported for each run of the model. To explore the parameter space a series of runs was completed using a high-throughput computing platform. When combined with MTG generation and analysis with OpenAlea it provided a convenient way in which thousands of simulations could be explored. We allowed the model trees to develop using self-organisation and simulated cultural practices such as hedging, topping, removal of the leader and limb removal within a small representation of an orchard. The model provides insight into the impact of these practices on potential for growth and the light distribution within the canopy and to the orchard floor by coupling the model with a path-tracing program to simulate the light environment. The lessons learnt from this will be applied to other evergreen, tropical fruit and nut trees.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

En este trabajo se propone un nuevo sistema híbrido para el análisis de sentimientos en clase múltiple basado en el uso del diccionario General Inquirer (GI) y un enfoque jerárquico del clasificador Logistic Model Tree (LMT). Este nuevo sistema se compone de tres capas, la capa bipolar (BL) que consta de un LMT (LMT-1) para la clasificación de la polaridad de sentimientos, mientras que la segunda capa es la capa de la Intensidad (IL) y comprende dos LMTs (LMT-2 y LMT3) para detectar por separado tres intensidades de sentimientos positivos y tres intensidades de sentimientos negativos. Sólo en la fase de construcción, la capa de Agrupación (GL) se utiliza para agrupar las instancias positivas y negativas mediante el empleo de 2 k-means, respectivamente. En la fase de Pre-procesamiento, los textos son segmentados por palabras que son etiquetadas, reducidas a sus raíces y sometidas finalmente al diccionario GI con el objetivo de contar y etiquetar sólo los verbos, los sustantivos, los adjetivos y los adverbios con 24 marcadores que se utilizan luego para calcular los vectores de características. En la fase de Clasificación de Sentimientos, los vectores de características se introducen primero al LMT-1, a continuación, se agrupan en GL según la etiqueta de clase, después se etiquetan estos grupos de forma manual, y finalmente las instancias positivas son introducidas a LMT-2 y las instancias negativas a LMT-3. Los tres árboles están entrenados y evaluados usando las bases de datos Movie Review y SenTube con validación cruzada estratificada de 10-pliegues. LMT-1 produce un árbol de 48 hojas y 95 de tamaño, con 90,88% de exactitud, mientras que tanto LMT-2 y LMT-3 proporcionan dos árboles de una hoja y uno de tamaño, con 99,28% y 99,37% de exactitud,respectivamente. Los experimentos muestran que la metodología de clasificación jerárquica propuesta da un mejor rendimiento en comparación con otros enfoques prevalecientes.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

The paper catalogues the procedures and steps involved in agroclimatic classification. These vary from conventional descriptive methods to modern computer-based numerical techniques. There are three mutually independent numerical classification techniques, namely Ordination, Cluster analysis, and Minimum spanning tree; and under each technique there are several forms of grouping techniques existing. The vhoice of numerical classification procedure differs with the type of data set. In the case of numerical continuous data sets with booth positive and negative values, the simple and least controversial procedures are unweighted pair group method (UPGMA) and weighted pair group method (WPGMA) under clustering techniques with similarity measure obtained either from Gower metric or standardized Euclidean metric. Where the number of attributes are large, these could be reduced to fewer new attributes defined by the principal components or coordinates by ordination technique. The first few components or coodinates explain the maximum variance in the data matrix. These revided attributes are less affected by noise in the data set. It is possible to check misclassifications using minimum spanning tree.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Ochnaceae s.str. (Malpighiales) are a pantropical family of about 500 species and 27 genera of almost exclusively woody plants. Infrafamilial classification and relationships have been controversial partially due to the lack of a robust phylogenetic framework. Including all genera except Indosinia and Perissocarpa and DNA sequence data for five DNA regions (ITS, matK, ndhF, rbcL, trnL-F), we provide for the first time a nearly complete molecular phylogenetic analysis of Ochnaceae s.l. resolving most of the phylogenetic backbone of the family. Based on this, we present a new classification of Ochnaceae s.l., with Medusagynoideae and Quiinoideae included as subfamilies and the former subfamilies Ochnoideae and Sauvagesioideae recognized at the rank of tribe. Our data support a monophyletic Ochneae, but Sauvagesieae in the traditional circumscription is paraphyletic because Testulea emerges as sister to the rest of Ochnoideae, and the next clade shows Luxemburgia+Philacra as sister group to the remaining Ochnoideae. To avoid paraphyly, we classify Luxemburgieae and Testuleeae as new tribes. The African genus Lophira, which has switched between subfamilies (here tribes) in past classifications, emerges as sister to all other Ochneae. Thus, endosperm-free seeds and ovules with partly to completely united integuments (resulting in an apparently single integument) are characters that unite all members of that tribe. The relationships within its largest clade, Ochnineae (former Ochneae), are poorly resolved, but former Ochninae (Brackenridgea, Ochna) are polyphyletic. Within Sauvagesieae, the genus Sauvagesia in its broad circumscription is polyphyletic as Sauvagesia serrata is sister to a clade of Adenarake, Sauvagesia spp., and three other genera. Within Quiinoideae, in contrast to former phylogenetic hypotheses, Lacunaria and Touroulia form a clade that is sister to Quiina. Bayesian ancestral state reconstructions showed that zygomorphic flowers with adaptations to buzz-pollination (poricidal anthers), a syncarpous gynoecium (a near-apocarpous gynoecium evolved independently in Quiinoideae and Ochninae), numerous ovules, septicidal capsules, and winged seeds with endosperm are the ancestral condition in Ochnoideae. Although in some lineages poricidal anthers were lost secondarily, the evolution of poricidal superstructures secured the maintenance of buzz-pollination in some of these genera, indicating a strong selective pressure on keeping that specialized pollination system.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Diabetic Retinopathy (DR) is a complication of diabetes that can lead to blindness if not readily discovered. Automated screening algorithms have the potential to improve identification of patients who need further medical attention. However, the identification of lesions must be accurate to be useful for clinical application. The bag-of-visual-words (BoVW) algorithm employs a maximum-margin classifier in a flexible framework that is able to detect the most common DR-related lesions such as microaneurysms, cotton-wool spots and hard exudates. BoVW allows to bypass the need for pre- and post-processing of the retinographic images, as well as the need of specific ad hoc techniques for identification of each type of lesion. An extensive evaluation of the BoVW model, using three large retinograph datasets (DR1, DR2 and Messidor) with different resolution and collected by different healthcare personnel, was performed. The results demonstrate that the BoVW classification approach can identify different lesions within an image without having to utilize different algorithms for each lesion reducing processing time and providing a more flexible diagnostic system. Our BoVW scheme is based on sparse low-level feature detection with a Speeded-Up Robust Features (SURF) local descriptor, and mid-level features based on semi-soft coding with max pooling. The best BoVW representation for retinal image classification was an area under the receiver operating characteristic curve (AUC-ROC) of 97.8% (exudates) and 93.5% (red lesions), applying a cross-dataset validation protocol. To assess the accuracy for detecting cases that require referral within one year, the sparse extraction technique associated with semi-soft coding and max pooling obtained an AUC of 94.2 ± 2.0%, outperforming current methods. Those results indicate that, for retinal image classification tasks in clinical practice, BoVW is equal and, in some instances, surpasses results obtained using dense detection (widely believed to be the best choice in many vision problems) for the low-level descriptors.