974 results for Classification Tree Pruning


Relevance:

30.00% 30.00%

Publisher:

Abstract:

In the upper Jequitinhonha valley, state of Minas Gerais, Brazil, there are large flat areas known as "chapadas", which are separated by areas dissected by tributaries of the Jequitinhonha and Araçuaí rivers. These dissected areas have a surface drainage system with tree, shrub, and grass vegetation, more commonly known as "veredas", i.e., palm swamps. The main purpose of this study was to characterize the soil physical, chemical and morphological properties of a representative toposequence in the watershed of the Vereda Lagoa do Leandro, a swamp near Minas Novas, MG, on "chapadas", the highlands of the Alto Jequitinhonha region. Different soil types are observed in the landscape: at the top, Typic Haplustox (LVA); on the middle slope, Xanthic Haplustox (LA); at the footslope, a gray Xanthic Haplustox, here called "Gray Haplustox" ("LAC"); and at the bottom of the palm swamp, Typic Albaquult (GXbd). These soils were first described morphologically; disturbed and undisturbed soil samples were then collected from all horizons and subhorizons to evaluate their essential physical and chemical properties, including the standard determination of Fe, Al, Mn, Ti and Si oxides after sulfuric extraction. The contents of Fe, Al and Mn extracted with dithionite-citrate-bicarbonate and oxalate treatments were also determined. In the well-drained soils of the slope positions, the typical morphological, physical and chemical properties of Oxisols were found. The GXbd sample, from the bottom of the palm swamp, is grayish and has a high texture gradient (B/A) and massive structure. The reduction in the proportion of crystalline iron compounds and the low crystallinity along the slope confirmed the loss of iron during pedogenesis, which is reflected in the current soil color. The Si and Al contents were lowest in the "LAC" soil. The Fe2O3/TiO2 ratio decreased downhill, indicating progressive drainage restriction along the toposequence.
The genesis and all physical and chemical properties of the soils at the footslope and the bottom of the palm swamp of the "chapadas" of the Alto Jequitinhonha region are strongly influenced by the occurrence of groundwater at or near the surface all year long, at present and/or in the past. Total concentrations of iron oxides, Fe_d and Fe_o in the soils of the toposequence studied are related to the past and/or present soil colors and drainage conditions.

In this paper we propose the use of the independent component analysis (ICA) [1] technique for improving the classification rate of decision trees and multilayer perceptrons [2], [3]. Using ICA in the preprocessing stage makes the structure of both classifiers simpler and therefore improves their generalization properties. The hypothesis behind the proposed preprocessing is that ICA will transform the feature space into a space where the components are independent and aligned with the axes, and therefore better suited to the way a decision tree is constructed. The inference of the weights of a multilayer perceptron will also be much easier, because the gradient search in the weight space will follow independent trajectories. The result is that the classifiers are less complex and, on some databases, the error rate is lower. This idea is also applicable to regression.

Biological systems are complex dynamical systems whose relationships with the environment have strong implications for their regulation and survival. A quite complex network of plant responses, rarely observed through classical analytical approaches, can emerge from the interactions between plant and environment. The objective of this study was to test the hypothesis that photosynthetic responses of different tree species to increasing irradiance are related to changes in the network connectance of gas exchange and of the photochemical apparatus, and to alterations in plant autonomy in relation to the environment. The heat dissipative capacity, measured through daily changes in leaf temperature, was also evaluated. It indicated that the early successional species (Citharexylum myrianthum Cham. and Rhamnidium elaeocarpum Reiss.) were more efficient as dissipative structures than the late successional one (Cariniana legalis (Mart.) Kuntze), suggesting that the parameter deltaT (T_air - T_leaf, in °C) could be a simple tool to help classify tropical trees into successional classes. Our results indicated a pattern of network responses and autonomy changes under high irradiance. Considering the maintenance of daily CO2 assimilation, the species tolerant to high irradiance (C. myrianthum and R. elaeocarpum) tended to keep the gas exchange network connectance stable and to increase their autonomy in relation to the environment. On the other hand, the late successional species (C. legalis) tended to lose autonomy, decreasing the network connectance of gas exchange. All species showed lower autonomy and higher network connectance of the photochemical apparatus under high irradiance.

We describe the floristic composition of the tree-shrub vegetation in 10 areas of rocky outcrop cerrado in Goiás State, Brazil. Ten 20 × 50 m plots (totaling 1 ha) were established, and all individuals with a diameter at 30 cm above soil level (DB30) ≥ 5 cm were included in the sampling. Comparative analyses of the flora were performed using similarity indices (Sørensen and Czekanowski), classification analysis (TWINSPAN), and the Mantel test. A total of 13,041 tree-shrub individuals were sampled, distributed among 219 species, 129 genera and 55 families. Fabaceae was the best-represented family, followed by Myrtaceae, Melastomataceae, Vochysiaceae, Malpighiaceae, and Rubiaceae. Fully 42.3% of the comparisons evaluated by the Sørensen index were >0.50, while all the values were <0.50 for the Czekanowski index, with the exception of the Jaraguá and Mara Rosa areas. The TWINSPAN classification generated four divisions and, in general, only the differences in population size were responsible for the groupings. The Mantel test indicated that there was no relationship between floristic similarity and the distances between the areas (r=0.32, P=0.05). It therefore appears that the areas of rocky outcrop cerrado in Goiás State are relatively floristically homogeneous and that they are principally distinguished by differences in the population sizes of their dominant species and by the presence of exclusive species in certain areas.
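
The contrast between the two similarity indices can be illustrated with a minimal sketch. It assumes the usual textbook forms: Sørensen on presence/absence, 2c / (a + b), and Czekanowski on abundances, 2 Σ min(x, y) / (Σ x + Σ y). The abstract does not state the exact formulas used, and the plot data and species names below are hypothetical.

```python
def sorensen(species_a, species_b):
    """Presence/absence similarity: 2c / (a + b), with c the shared species."""
    a, b = set(species_a), set(species_b)
    if not a and not b:
        return 0.0
    return 2 * len(a & b) / (len(a) + len(b))


def czekanowski(abund_a, abund_b):
    """Abundance-based similarity: 2 * sum(min) / (total_a + total_b)."""
    total = sum(abund_a.values()) + sum(abund_b.values())
    if total == 0:
        return 0.0
    shared = sum(min(n, abund_b.get(sp, 0)) for sp, n in abund_a.items())
    return 2 * shared / total


# Hypothetical plots: individuals counted per species.
plot_1 = {"sp_A": 120, "sp_B": 10, "sp_C": 5}
plot_2 = {"sp_A": 15, "sp_B": 12, "sp_D": 40}

qualitative = sorensen(plot_1, plot_2)       # species lists overlap strongly
quantitative = czekanowski(plot_1, plot_2)   # population sizes differ a lot
```

With these made-up counts the two plots share most species but differ strongly in population sizes, so the presence/absence index is high while the abundance-based index is low, which is the kind of pattern the abstract reports.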

In order to determine the variability of pequi tree (Caryocar brasiliense Camb.) populations, volatile compounds from fruits of eighteen trees representing five populations were extracted by headspace solid-phase microextraction and analyzed by gas chromatography-mass spectrometry. Seventy-seven compounds were identified, including esters, hydrocarbons, terpenoids, ketones, lactones, and alcohols. Several compounds had not been previously reported in the pequi fruit. The amount of total volatile compounds and the individual compound contents varied between plants. The volatile profile enabled the differentiation of all of the eighteen plants, indicating that there is a characteristic profile in terms of their origin. The use of Principal Component Analysis and Cluster Analysis enabled the establishment of markers (dendrolasin, ethyl octanoate, ethyl 2-octenoate and β-cis-ocimene) that discriminated among the pequi trees. According to the Cluster Analysis, the plants were classified into three main clusters, and four other plants showed a tendency to isolation. The results from multivariate analysis did not always group plants from the same population together, indicating that there is greater variability within the populations than between pequi tree populations.

In machine learning, classification is the process of assigning a new observation to a certain category. Classifiers that implement classification algorithms have been widely studied over the past decades. Traditional classifiers are based on algorithms such as SVMs and neural networks, and are generally run in software on CPUs, so the system suffers from a lack of performance and high energy consumption. Although GPUs can be used to accelerate the computation of some classifiers, their high power consumption prevents the technology from being deployed on portable devices such as embedded systems. To make the classification system lighter, classifiers should be able to run on a more compact hardware system instead of a group of CPUs or GPUs, and the classifiers themselves should be optimized for that hardware. In this thesis, we explore the implementation of a novel classifier on an FPGA-based hardware platform. The classifier, designed by Alain Tapp (Université de Montréal), is based on a large number of lookup tables that form tree-like circuits which carry out the classification tasks. The FPGA appears to be a tailor-made component for implementing this classifier, with its rich lookup-table resources and highly parallel architecture. Our work shows that FPGAs can implement several classifiers and perform classification on high-definition images at very high speed.

A new procedure for the classification of lower-case English language characters is presented in this work. The character image is binarised and the binary image is further grouped into sixteen smaller areas, called Cells. Each cell is assigned a name depending upon the contour present in the cell and the occupancy of the image contour in the cell. A data reduction procedure called Filtering is adopted to eliminate undesirable redundant information and so reduce complexity during further processing steps. The filtered data is fed into a primitive extractor, where the primitives are extracted. Syntactic methods are employed for the classification of the character. A decision tree is used for the interaction of the various components in the scheme, like the primitive extraction and character recognition. A character is recognized by the primitive-by-primitive construction of its description. Open-ended inventories are used for including variants of the characters and for adding new members to the general class. Computer implementation of the proposal is discussed at the end using handwritten character samples. Results are analyzed and suggestions for future studies are made. The advantages of the proposal are discussed in detail.

Decision trees are very powerful tools for classification in data mining tasks that involve different types of attributes. Numeric data sets are usually first converted to categorical types and then classified using information gain. Information gain is a very popular and useful concept that indicates whether splitting on a given attribute yields any benefit in terms of information content. But this process is computationally intensive for large data sets. Also, popular decision tree algorithms like ID3 cannot handle numeric data sets. This paper proposes statistical variance as an alternative to information gain, together with the statistical mean, to split attributes in completely numerical data sets. The new algorithm is shown to be competitive with its information gain counterpart C4.5 and with many existing decision tree algorithms on the standard UCI benchmark datasets, using the ANOVA test. The specific advantages of the proposed algorithm are that it avoids the computational overhead of information gain computation for large data sets with many attributes, and that it avoids converting huge numeric data sets to categorical data, which is also a time-consuming task. In summary, huge numeric datasets can be submitted directly to this algorithm without any attribute mappings or information gain computations. It also blends the two closely related fields of statistics and data mining.
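
One possible reading of the variance-and-mean idea is sketched below. The abstract does not say how variance ranks the attributes, so this sketch simply assumes that the attribute with the largest variance is chosen and that its mean is used as the split threshold. The function names and the toy data are hypothetical, and no feature scaling is applied.

```python
from statistics import mean, pvariance


def choose_split(rows):
    """Pick the attribute with the largest variance; split at its mean.

    rows: list of equal-length tuples of numeric attribute values.
    Returns (attribute_index, threshold). This selection rule is an
    assumption, not necessarily the criterion used in the paper.
    """
    columns = list(zip(*rows))
    variances = [pvariance(col) for col in columns]
    best = max(range(len(columns)), key=variances.__getitem__)
    return best, mean(columns[best])


def partition(rows, labels, attr, threshold):
    """Send each labelled row to the left (<= threshold) or right branch."""
    left = [(r, y) for r, y in zip(rows, labels) if r[attr] <= threshold]
    right = [(r, y) for r, y in zip(rows, labels) if r[attr] > threshold]
    return left, right


# Hypothetical numeric data set with two attributes and two classes.
rows = [(1.0, 5.1), (1.2, 4.9), (0.9, 9.8), (1.1, 10.2)]
labels = ["a", "a", "b", "b"]

attr, threshold = choose_split(rows)
left, right = partition(rows, labels, attr, threshold)
```

No entropy is computed and no attribute is discretised, which is the cost saving the abstract emphasises; applying `choose_split` recursively to each branch would grow a full tree.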

The aim of this study is to show the importance of two classification techniques, viz. decision tree and clustering, in the prediction of learning disabilities (LD) in school-age children. LDs affect about 10 percent of all children enrolled in schools. The problems of children with specific learning disabilities have been a cause of concern to parents and teachers for some time. Decision trees and clustering are powerful and popular tools used for classification and prediction in data mining. Different rules extracted from the decision tree are used for the prediction of learning disabilities. Clustering is the assignment of a set of observations into subsets, called clusters, which are useful in finding the different signs and symptoms (attributes) present in the LD-affected child. In this paper, the J48 algorithm is used for constructing the decision tree and the K-means algorithm is used for creating the clusters. By applying these classification techniques, LD in any child can be identified.
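
As a rough illustration of the clustering step, the following is a minimal K-means sketch (Lloyd's algorithm) in plain Python. The two-dimensional points are hypothetical toy data, not LD attributes, and the random initialisation with a fixed seed is an assumption made for reproducibility.

```python
import random
from math import dist


def kmeans(points, k, iterations=20, seed=0):
    """Plain K-means: returns (centroids, clusters)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # initial centroids drawn from the data
    clusters = [[] for _ in range(k)]
    for _ in range(iterations):
        # Assignment step: attach every point to its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster.
        new_centroids = [
            tuple(sum(c) / len(cluster) for c in zip(*cluster))
            if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
        if new_centroids == centroids:  # assignments no longer change
            break
        centroids = new_centroids
    return centroids, clusters


# Hypothetical observations forming two well-separated groups.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
          (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
centroids, clusters = kmeans(points, k=2)
```

Each resulting cluster groups observations with similar attribute values; in the setting of the abstract, those attributes would be the recorded signs and symptoms.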

This paper highlights the prediction of Learning Disabilities (LD) in school-age children using two classification methods, Support Vector Machine (SVM) and Decision Tree (DT), with an emphasis on applications of data mining. About 10% of children enrolled in school have a learning disability. Predicting learning disabilities in school-age children is a very complicated task, because they tend to be identified in elementary school, where there is no single sign to look for. By using either of the two classification methods, SVM and DT, we can easily and accurately predict LD in any child. We can also determine the merits and demerits of these two classifiers and select the better one for use in the relevant field. In this study, the Sequential Minimal Optimization (SMO) algorithm is used to train the SVM and the J48 algorithm is used to construct the decision trees.

This work analyzes the use of linear discriminant models, multi-layer perceptron neural networks and wavelet networks for corporate financial distress prediction. Although simple and easy to interpret, linear models require statistical assumptions that may be unrealistic. Neural networks are able to discriminate patterns that are not linearly separable, but the large number of parameters involved in a neural model often causes generalization problems. Wavelet networks are classification models that implement nonlinear discriminant surfaces as the superposition of dilated and translated versions of a single "mother wavelet" function. In this paper, an algorithm is proposed to select dilation and translation parameters that yield a wavelet network classifier with good parsimony characteristics. The models are compared in a case study involving failed and continuing British firms in the period 1997-2000. Problems associated with over-parameterized neural networks are illustrated and the Optimal Brain Damage pruning technique is employed to obtain a parsimonious neural model. The results, supported by a re-sampling study, show that both neural and wavelet networks may be a valid alternative to classical linear discriminant models.
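
The pruning step can be sketched as follows, assuming the standard Optimal Brain Damage saliency, s_k = h_kk * w_k^2 / 2, where h_kk is the diagonal second derivative of the error with respect to weight w_k. The weights and Hessian values below are hypothetical, and the retraining that normally follows each pruning round is omitted.

```python
def obd_saliencies(weights, hessian_diag):
    """Estimated increase in error if each weight were set to zero."""
    return [0.5 * h * w * w for w, h in zip(weights, hessian_diag)]


def prune_lowest(weights, hessian_diag, n_prune):
    """Zero out the n_prune weights with the smallest saliency."""
    saliencies = obd_saliencies(weights, hessian_diag)
    order = sorted(range(len(weights)), key=saliencies.__getitem__)
    pruned = set(order[:n_prune])
    return [0.0 if i in pruned else w for i, w in enumerate(weights)]


# Hypothetical weights and diagonal Hessian entries of a trained network.
weights = [0.8, -0.05, 1.5, 0.3]
hessian_diag = [2.0, 4.0, 0.01, 1.0]

pruned_weights = prune_lowest(weights, hessian_diag, n_prune=2)
```

In this toy example the largest weight (1.5) is pruned because the error surface is almost flat along it, whereas pruning by magnitude alone would have kept it.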

Analyzes the use of linear and neural network models for financial distress classification, with emphasis on the issues of input variable selection and model pruning. A data-driven method for selecting input variables (financial ratios, in this case) is proposed. A case study involving 60 British firms in the period 1997-2000 is used for illustration. It is shown that the use of the Optimal Brain Damage pruning technique can considerably improve the generalization ability of a neural model. Moreover, the set of financial ratios obtained with the proposed selection procedure is shown to be an appropriate alternative to the ratios usually employed by practitioners.

In a world where data is captured on a large scale, the major challenge for data mining algorithms is to be able to scale up to large datasets. There are two main approaches to inducing classification rules: one is the divide and conquer approach, also known as the top down induction of decision trees; the other is the separate and conquer approach. A considerable amount of work has been done on scaling up the divide and conquer approach. However, very little work has been conducted on scaling up the separate and conquer approach. In this work we describe a parallel framework that allows the parallelisation of a certain family of separate and conquer algorithms, the Prism family. Parallelisation helps the Prism family of algorithms to harvest additional computer resources in a network of computers in order to make the induction of classification rules scale better on large datasets. Our framework also incorporates a pre-pruning facility for parallel Prism algorithms.
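
To make the separate and conquer idea concrete, here is a minimal serial sketch in the spirit of the basic Prism algorithm for categorical attributes. It is not the parallel framework described in the abstract and it includes no pre-pruning; the function name, tie-breaking rule and toy data are assumptions.

```python
def induce_rules_for_class(instances, target):
    """Induce rules for one class by separate and conquer.

    instances: list of (attribute_dict, label) pairs, categorical values.
    Returns a list of rules, each a dict of attribute -> required value.
    """
    remaining = list(instances)
    rules = []
    while any(label == target for _, label in remaining):
        covered, rule = remaining, {}
        # Conquer: add terms until only the target class is covered.
        while any(label != target for _, label in covered):
            best = None
            for attr in covered[0][0]:
                if attr in rule:
                    continue
                for value in sorted({x[attr] for x, _ in covered}):
                    subset = [(x, y) for x, y in covered if x[attr] == value]
                    hits = sum(1 for _, y in subset if y == target)
                    score = (hits / len(subset), hits)  # precision, coverage
                    if best is None or score > best[0]:
                        best = (score, attr, value, subset)
            if best is None:  # no attribute left: accept an impure rule
                break
            _, attr, value, covered = best
            rule[attr] = value
        rules.append(rule)
        # Separate: remove the instances covered by the new rule.
        remaining = [pair for pair in remaining if pair not in covered]
    return rules


# Hypothetical training data.
data = [
    ({"outlook": "sunny", "windy": "no"}, "play"),
    ({"outlook": "sunny", "windy": "yes"}, "stay"),
    ({"outlook": "rainy", "windy": "no"}, "stay"),
    ({"outlook": "rainy", "windy": "yes"}, "stay"),
    ({"outlook": "overcast", "windy": "no"}, "play"),
    ({"outlook": "overcast", "windy": "yes"}, "play"),
]
play_rules = induce_rules_for_class(data, "play")
```

Unlike a decision tree, each rule is grown independently and the examples it covers are removed before the next rule is induced, so rules need not share a common root attribute.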

Advances in hardware and software in the past decade make it possible to capture, record and process fast data streams at a large scale. The research area of data stream mining has emerged as a consequence of these advances in order to cope with the real-time analysis of potentially large and changing data streams. Examples of data streams include Google searches, credit card transactions, telemetric data and data from continuous chemical production processes. In some cases the data can be processed in batches by traditional data mining approaches. However, some applications require the data to be analysed in real time, as soon as it is captured, for example when the data stream is infinite, fast changing, or simply too large to be stored. One of the most important data mining techniques on data streams is classification. This involves training the classifier on the data stream in real time and adapting it to concept drifts. Most data stream classifiers are based on decision trees. However, it is well known in the data mining community that there is no single optimal algorithm. An algorithm may work well on one or several datasets but badly on others. This paper introduces eRules, a new rule-based adaptive classifier for data streams, based on an evolving set of Rules. eRules induces a set of rules that is constantly evaluated and adapted to changes in the data stream by adding new rules and removing old ones. It differs from the more popular decision tree based classifiers in that it tends to leave data instances unclassified rather than force a classification that could be wrong. The ongoing development of eRules aims to improve its accuracy further through dynamic parameter setting, which will also address the problem of changing feature domain values.
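
The two behaviours highlighted above, abstaining when no rule applies and discarding rules that stop performing, can be sketched as follows. This is only an illustration of the general idea, not the eRules algorithm itself: the `Rule` class, the accuracy threshold and the minimum number of trials are all hypothetical, and the induction of new rules is left out.

```python
from dataclasses import dataclass


@dataclass
class Rule:
    conditions: dict   # attribute -> required value
    label: str
    correct: int = 0
    wrong: int = 0

    def covers(self, instance):
        return all(instance.get(a) == v for a, v in self.conditions.items())


def classify(rules, instance):
    """Return the label of the first covering rule, or None to abstain."""
    for rule in rules:
        if rule.covers(instance):
            return rule.label
    return None


def update(rules, instance, true_label, min_accuracy=0.5, min_trials=3):
    """Score covering rules on a labelled instance and drop poor ones."""
    kept = []
    for rule in rules:
        if rule.covers(instance):
            if rule.label == true_label:
                rule.correct += 1
            else:
                rule.wrong += 1
        trials = rule.correct + rule.wrong
        # Keep rules that are still untested or accurate enough.
        if trials < min_trials or rule.correct / trials >= min_accuracy:
            kept.append(rule)
    return kept


rules = [Rule({"colour": "red"}, "stop")]
unknown = classify(rules, {"colour": "green"})   # no rule covers: abstain
known = classify(rules, {"colour": "red"})

# Simulated concept drift: "red" now means "go", so the rule keeps failing.
for _ in range(3):
    rules = update(rules, {"colour": "red"}, "go")
```

After the simulated drift the outdated rule is removed, and the classifier abstains on red instances until a replacement rule is learned.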

Full-waveform laser scanning data acquired with a Riegl LMS-Q560 instrument were used to classify an orange orchard into orange trees, grass and ground using waveform parameters alone. Gaussian decomposition was performed on this data capture from the National Airborne Field Experiment in November 2006 using a custom peak-detection procedure and a trust-region-reflective algorithm for fitting Gauss functions. Calibration was carried out using waveforms returned from a road surface, and the backscattering coefficient c was derived for every waveform peak. The processed data were then analysed according to the number of returns detected within each waveform and classified into three classes based on pulse width and c. For single-peak waveforms the scatterplot of c versus pulse width was used to distinguish between ground, grass and orange trees. In the case of multiple returns, the relationship between first (or first plus middle) and last return c values was used to separate ground from other targets. Refinement of this classification, and further sub-classification into grass and orange trees was performed using the c versus pulse width scatterplots of last returns. In all cases the separation was carried out using a decision tree with empirical relationships between the waveform parameters. Ground points were successfully separated from orange tree points. The most difficult class to separate and verify was grass, but those points in general corresponded well with the grass areas identified in the aerial photography. The overall accuracy reached 91%, using photography and relative elevation as ground truth. The overall accuracy for two classes, orange tree and combined class of grass and ground, yielded 95%. Finally, the backscattering coefficient c of single-peak waveforms was also used to derive reflectance values of the three classes. 
The reflectance of the orange tree class (0.31) and ground class (0.60) are consistent with published values at the wavelength of the Riegl scanner (1550 nm). The grass class reflectance (0.46) falls in between the other two classes as might be expected, as this class has a mixture of the contributions of both vegetation and ground reflectance properties.
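
For reference, Gaussian decomposition usually models each recorded waveform as a sum of Gaussian pulses, one per detected return. The parametrisation below is the common textbook form and an assumption here, since the abstract does not give the exact model; the symbols w(t), N, A_i, mu_i and sigma_i are introduced only for illustration.

```latex
% Waveform amplitude as a sum of N Gaussian returns
w(t) \approx \sum_{i=1}^{N} A_i \exp\!\left( -\frac{(t - \mu_i)^2}{2\sigma_i^2} \right)

% Pulse width of return i, taken here as the full width at half maximum
\mathrm{FWHM}_i = 2\sqrt{2\ln 2}\,\sigma_i
```

Under this model, the peak-detection step supplies N and initial values for the amplitudes A_i and positions mu_i, and the fitting step refines them together with the widths sigma_i.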