906 results for decision tree
Abstract:
In this paper we propose a new algorithm for learning polyhedral classifiers. In contrast to existing methods for learning polyhedral classifiers, which solve a constrained optimization problem, our method solves an unconstrained optimization problem. Our method is based on a logistic-function-based model for the posterior probability function. We propose an alternating optimization algorithm, SPLA1 (Single Polyhedral Learning Algorithm 1), which maximizes the log-likelihood of the training data to learn the parameters. In SPLA2, we extend the method to make it independent of any user-specified parameter (e.g., the number of hyperplanes required to form a polyhedral set). We show the effectiveness of our approach with experiments on various synthetic and real-world datasets, and compare it with a standard decision tree method (OC1) and a constrained-optimization-based method for learning polyhedral sets.
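To make the term "polyhedral classifier" concrete, the following minimal sketch shows only the decision rule of such a classifier: a point is assigned to the positive class when it lies inside a polyhedral set, that is, an intersection of half-spaces. It is not SPLA1 or SPLA2 and it learns nothing; the function name, the `(w, b)` representation of a hyperplane, and the unit-square example are assumptions made for illustration.

```python
def in_polyhedral_set(x, hyperplanes):
    """Return True if x satisfies w . x + b >= 0 for every (w, b) pair.

    hyperplanes is a list of (w, b) tuples, where w is a sequence of
    weights with the same length as x and b is a scalar offset.
    (This representation is an assumption of the sketch.)
    """
    return all(
        sum(wi * xi for wi, xi in zip(w, x)) + b >= 0
        for w, b in hyperplanes
    )


# Hypothetical example: the unit square [0, 1] x [0, 1] as four half-spaces.
unit_square = [
    ((1.0, 0.0), 0.0),    # x1 >= 0
    ((-1.0, 0.0), 1.0),   # x1 <= 1
    ((0.0, 1.0), 0.0),    # x2 >= 0
    ((0.0, -1.0), 1.0),   # x2 <= 1
]

print(in_polyhedral_set((0.5, 0.5), unit_square))
print(in_polyhedral_set((1.5, 0.5), unit_square))
```

The number of `(w, b)` pairs in the list corresponds to the "number of hyperplanes" that the abstract mentions as a user-specified parameter.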
Abstract:
We perform a measurement of direct CP violation in $b \to s\gamma$, $A_{CP}$, and a measurement of the difference between $A_{CP}$ for neutral and charged B mesons, $\Delta A_{X_s\gamma}$, using 429 fb$^{-1}$ of data recorded at the $\Upsilon(4S)$ resonance with the BABAR detector. B mesons are reconstructed from 16 exclusive final states. Particle identification is done using an algorithm based on Error Correcting Output Code with an exhaustive matrix. Background rejection and best candidate selection are done using two decision tree-based classifiers. We found $A_{CP} = (1.73 \pm 1.93 \pm 1.02)\%$ and $\Delta A_{X_s\gamma} = (4.97 \pm 3.90 \pm 1.45)\%$, where the uncertainties are statistical and systematic, respectively. Based on the measured value of $\Delta A_{X_s\gamma}$, we determine a 90% confidence interval for $\mathrm{Im}(C_{8g}/C_{7\gamma})$, where $C_{7\gamma}$ and $C_{8g}$ are Wilson coefficients for New Physics amplitudes, of $-1.64 < \mathrm{Im}(C_{8g}/C_{7\gamma}) < 6.52$.
Abstract:
As borne out by everyday social experience, social cognition is highly dependent on context, modulated by a host of factors that arise from the social environment in which we live. While streamlined laboratory research provides excellent experimental control, it can be limited to telling us about the capabilities of the brain under artificial conditions, rather than elucidating the processes that come into play in the real world. Considering the impact of ecologically valid contextual cues on social cognition will improve the generalizability of social neuroscience findings, including to pathology such as psychiatric illness. To help bridge between laboratory research and social cognition as we experience it in the real world, this thesis investigates three themes: (1) increasing the naturalness of stimuli with richer contextual cues, (2) the potentially special contextual case of social cognition when two people interact directly, and (3) experimental believability, which runs in parallel to the first two themes. Focusing on the first two themes, in work with two patient populations, we explore neural contributions to two topics in social cognition. First, we document a basic approach bias in rare patients with bilateral lesions of the amygdala. This finding is then related to the contextual factor of ambiguity, and further investigated together with other contextual cues in a sample of healthy individuals tested over the internet, finally yielding a hierarchical decision tree for social threat evaluation. Second, we demonstrate that neural processing of eye gaze in brain structures related to face, gaze, and social processing is differently modulated by the direct presence of another live person. This question is investigated using fMRI in people with autism and controls.
Across a range of topics, we demonstrate that two themes of ecological validity — integration of naturalistic contextual cues, and social interaction — influence social cognition, that particular brain structures mediate this processing, and that it will be crucial to study interaction in order to understand disorders of social interaction such as autism.
Abstract:
This document describes an update of the implementation of the J48Consolidated class within the WEKA platform. The J48Consolidated class implements the CTC algorithm [2][3], which builds a single decision tree based on a set of samples. The J48Consolidated class extends WEKA’s J48 class, which implements the well-known C4.5 algorithm. This implementation was described in the technical report "J48Consolidated: An implementation of CTC algorithm for WEKA". The main, but not the only, change in this update is the integration of the notion of coverage in order to determine the number of samples to be generated to build a consolidated tree. We define coverage as the percentage of examples of the training sample present in (or covered by) the set of generated subsamples. So, depending on the type of samples used, more or fewer samples will be needed to achieve a specific value of coverage.
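The link between coverage and the number of samples can be sketched as follows. This is a minimal illustration, not the J48Consolidated code. It assumes that every subsample independently contains any given training example with the same probability `r` (a hypothetical parameter), so that the expected coverage after `n` subsamples is 1 - (1 - r)^n. The function names are invented for this example.

```python
def expected_coverage(r: float, n_samples: int) -> float:
    """Expected fraction of training examples present in at least one of
    n_samples subsamples, assuming each subsample independently contains
    a given example with probability r (an assumption of this sketch)."""
    return 1.0 - (1.0 - r) ** n_samples


def samples_for_coverage(r: float, coverage: float) -> int:
    """Smallest number of subsamples whose expected coverage reaches the
    requested value. coverage is a fraction in [0, 1), not a percentage."""
    if not 0.0 < r < 1.0:
        raise ValueError("r must be strictly between 0 and 1")
    if not 0.0 <= coverage < 1.0:
        raise ValueError("coverage must be in [0, 1)")
    n = 0
    # Add subsamples one at a time until the target coverage is reached.
    while expected_coverage(r, n) < coverage:
        n += 1
    return n


# Hypothetical comparison: large subsamples (r = 0.5) versus small ones (r = 0.1).
for r in (0.5, 0.1):
    print(r, samples_for_coverage(r, 0.99))
```

Under this assumption, subsamples that each hold a smaller share of the training data need many more repetitions to reach the same coverage, which is consistent with the remark that the required number of samples depends on the type of samples used.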
Abstract:
The Consolidated Tree Construction (CTC) algorithm is a machine learning paradigm designed to solve a class imbalance problem: a fraud detection problem in car insurance [1] where, in addition, an explanation of each classification was required. The algorithm is based on a decision tree construction algorithm, in this case the well-known C4.5, but it extracts knowledge from data using a set of samples instead of a single one, as C4.5 does. In contrast to other methodologies that build a classifier from several samples, such as bagging, CTC builds a single tree and, as a consequence, produces comprehensible classifiers. The main motivation of this work is to make an implementation of the CTC algorithm publicly available. To this end, we have implemented the algorithm within the well-known WEKA data mining environment (http://www.cs.waikato.ac.nz/ml/weka/). WEKA is an open source project that contains a collection of machine learning algorithms written in Java for data mining tasks. J48 is the implementation of the C4.5 algorithm within the WEKA package. We named our implementation of the CTC algorithm, which is based on the J48 Java class, J48Consolidated.
Abstract:
In this work, classification models were used to mine data on mathematics learning and on the profile of elementary school teachers. More specifically, the study addresses the factors related to educators in the State of Rio de Janeiro that positively and negatively influence the performance of 9th-year basic education students on mathematics tests. The data used to extract this information are made available by the Instituto Nacional de Estudos e Pesquisas Educacionais Anísio Teixeira, which evaluates the Brazilian educational system at several levels and modalities of education, including Basic Education, whose assessment, the focus of this study, is carried out through the Prova Brasil. The Knowledge Discovery in Databases (KDD) process, comprising the stages of data preparation, mining, and post-processing, was applied to this database. Patterns were extracted from the classification models generated by decision tree, rule induction, and Bayesian classifier techniques, whose algorithms are implemented in the Weka software (Waikato Environment for Knowledge Analysis). In addition, group (ensemble) methods and a methodology for making the classes uniformly distributed were applied in order to improve the accuracy of the resulting models. The results revealed important factors that contribute to the teaching and learning of mathematics, and also highlighted aspects that harm student performance. Finally, the extracted results give educators and public policy makers factors for an analysis that can support subsequent decision making.
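As an illustration of what making the classes uniformly distributed can look like, the sketch below balances a dataset by random undersampling. The abstract does not say which balancing methodology was used, so undersampling is purely an assumption here, and the function name, the `label_of` accessor, and the toy rows are hypothetical.

```python
import random
from collections import defaultdict


def undersample_to_uniform(rows, label_of, seed=0):
    """Randomly drop rows from the larger classes until every class has as
    many rows as the smallest one.

    Random undersampling is an assumption of this sketch; the study does
    not state which balancing method it applied.
    """
    if not rows:
        return []
    by_class = defaultdict(list)
    for row in rows:
        by_class[label_of(row)].append(row)
    target = min(len(group) for group in by_class.values())
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    balanced = []
    for group in by_class.values():
        balanced.extend(rng.sample(group, target))
    return balanced


# Hypothetical toy data: (teacher_id, performance_class) pairs.
rows = [(1, "high"), (2, "high"), (3, "high"), (4, "high"), (5, "low"), (6, "low")]
balanced = undersample_to_uniform(rows, label_of=lambda row: row[1])
print(len(balanced))
```

Undersampling discards data from the majority class; oversampling the minority class is a common alternative with the opposite trade-off.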
Abstract:
In this work, classification models were developed and applied to the mining of climate data in order to forecast extreme precipitation events one hour in advance. More specifically, the study used observational data recorded from 2008 to 2012 by the surface weather station located at the Instituto Politécnico of the Universidade do Estado do Rio de Janeiro in Nova Friburgo, RJ. The Knowledge Discovery in Databases (KDD) process, comprising the stages of data preparation, mining, and post-processing, was applied to these data. Artificial Neural Network and Decision Tree algorithms were used to extract patterns indicating accumulated precipitation greater than 10 mm in the hour following the measurement of the climate variables. This showed that using micro-scale meteorological observations for short-term forecasts is susceptible to high rates of false alarms (false positives). To work around this problem, the study used historical forecasts produced by the Eta Model at 15 km resolution, made available by the Centro de Previsão de Tempo e Estudos Climáticos of the Instituto Nacional de Pesquisas Espaciais (CPTEC/INPE). With these data, it was possible to compute the instability indices related to the formation of severe convective conditions in the Nova Friburgo region and to store them in structured form in a database, joining the micro-scale and meso-scale records. The results showed that joining the two databases was extremely important for reducing the false positive rates, which is an important contribution to meteorological studies carried out at surface weather stations.
Finally, the most accurate model was used to develop a real-time alert system that checks, for the studied region, the possibility of more than 10 mm of rain in the next hour.
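The false-alarm problem described above can be made concrete with a small sketch. It marks an hour as an event when the rain accumulated in the following hour exceeds 10 mm (the threshold stated in the abstract) and computes the false alarm ratio FP / (TP + FP). The function names and the sample numbers are invented for illustration and are not data from the study.

```python
THRESHOLD_MM = 10.0  # event threshold taken from the abstract


def label_events(next_hour_rain_mm):
    """Mark each record as an event when next-hour rain exceeds the threshold."""
    return [rain > THRESHOLD_MM for rain in next_hour_rain_mm]


def false_alarm_ratio(predicted, observed):
    """Share of issued alerts that were not followed by an event:
    FP / (TP + FP). Returns 0.0 when no alert was issued."""
    tp = sum(1 for p, o in zip(predicted, observed) if p and o)
    fp = sum(1 for p, o in zip(predicted, observed) if p and not o)
    alerts = tp + fp
    return fp / alerts if alerts else 0.0


# Made-up next-hour rainfall (mm) and made-up classifier alerts.
observed = label_events([0.0, 12.5, 3.2, 18.0, 0.4])
predicted = [True, True, False, True, True]
print(false_alarm_ratio(predicted, observed))
```

In this toy case two of the four alerts are followed by no event. A metric of this kind is one way to compare models trained with and without the meso-scale instability indices.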
Abstract:
This study investigated methods for identifying focus in Chinese text discourse and the relationship between accent and focus, using large-corpus analysis and decision trees. The main results are: 1. Based on the concept of focus and an understanding of the discourse, focus identification is consistent and stable; 2. Special focus markers and specific focus constructions have a greater influence than special constituent order on identifying focus in Chinese discourse, while information states also have a great influence; part of speech, information state, relative position in the sentence, focus-sensitive operators, specific focus constructions, contrast relations, and relations between sentences are important factors in focus identification; 3. Using multi-dimensional tagging and knowledge discovery, it is feasible to construct decision trees from the tagging results and employ them to identify focus; 4. Focus prediction also depends on the text type and style of the discourse, so separate decision trees should be constructed for different text types; 5. In monologue discourse, the most prominent accent falls on the focus word or within the scope of the focus; accent assignment in broad focus follows certain rules; and analyzing and classifying focus structure is necessary for research on the relationship between accent and focus.
Abstract:
M. Galea and Q. Shen. Simultaneous ant colony optimisation algorithms for learning linguistic fuzzy rules. A. Abraham, C. Grosan and V. Ramos (Eds.), Swarm Intelligence in Data Mining, pages 75-99.
Abstract:
Jasimuddin, Sajjad, 'Exploring knowledge transfer mechanisms: The case of a UK-based group within a high-tech global corporation', International Journal of Information Management (2007) 27(4) pp.294-300 RAE2008
Abstract:
A novel hybrid data-driven approach is developed for forecasting power system parameters, with the goal of increasing the efficiency of short-term forecasting studies for non-stationary time series. The proposed approach is based on mode decomposition and a feature analysis of initial retrospective data using the Hilbert-Huang transform and machine learning algorithms. The random forest and gradient boosting tree learning techniques were examined. The decision tree techniques were used to rank the importance of the variables employed in the forecasting models. The Mean Decrease Gini index is employed as an impurity function. The resulting hybrid forecasting models employ a radial basis function neural network and support vector regression. Apart from the introduction and references, the paper is organized as follows. The second section presents the background and a review of several approaches to short-term forecasting of power system parameters. In the third section, a hybrid machine-learning-based algorithm using the Hilbert-Huang transform is developed for short-term forecasting of power system parameters. The fourth section describes the decision tree learning algorithms used to assess variable importance. Finally, section six presents the experimental results for the following electric power problems: active power flow forecasting, electricity price forecasting, and wind speed and direction forecasting.
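The quantity behind a Mean Decrease Gini ranking can be sketched for a single split. The code below computes the Gini impurity of a set of class labels and the impurity decrease obtained by splitting a parent node into two children. It is a minimal illustration under assumed toy labels, not the authors' implementation; in a random forest, decreases of this kind are accumulated over every split that uses a given variable and averaged across trees.

```python
from collections import Counter


def gini(labels):
    """Gini impurity 1 - sum(p_k^2) of a collection of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def gini_decrease(parent, left, right):
    """Impurity of the parent node minus the size-weighted impurity of its
    two children. Larger values indicate a more useful split."""
    n = len(parent)
    weighted = (len(left) / n) * gini(left) + (len(right) / n) * gini(right)
    return gini(parent) - weighted


# Hypothetical labels: a split that separates the classes perfectly,
# and one that does not separate them at all.
parent = ["up", "up", "down", "down"]
print(gini_decrease(parent, ["up", "up"], ["down", "down"]))
print(gini_decrease(parent, ["up", "down"], ["up", "down"]))
```

A variable that repeatedly produces large decreases is ranked as more important than one whose splits leave the children as mixed as the parent.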
Abstract:
The tendency for island populations of mammalian taxa to diverge in body size from their mainland counterparts consistently in particular directions is both impressive for its regularity and, especially among rodents, troublesome for its exceptions. However, previous studies have largely ignored mainland body size variation, treating size differences of any magnitude as equally noteworthy. Here, we use distributions of mainland population body sizes to identify island populations as 'extremely' big or small, and we compare traits of extreme populations and their islands with those of island populations more typical in body size. We find that although insular rodents vary in the directions of body size change, 'extreme' populations tend towards gigantism. With classification tree methods, we develop a predictive model, which points to resource limitations as major drivers in the few cases of insular dwarfism. Highly successful in classifying our dataset, our model also successfully predicts change in untested cases.
Abstract:
Centropages typicus is a temperate neritic-coastal species of the North Atlantic Ocean, generally found between the latitudes of the Mediterranean and the Norwegian Sea. The species therefore experiences a wide range of environments and adjusts its life cycle in response to changes in key abiotic parameters such as temperature. Using data from the Continuous Plankton Recorder (CPR) Survey, we review the macroecology of C. typicus and the factors that influence its spatial distribution, phenology, and year-to-year to decadal variability. The ecological preferences are identified and quantified. Mechanisms that allow the species to occur in such different environments are discussed, and hypotheses are proposed as to how the species adapts to its environment. We show that temperature and both the quantity and quality of phytoplankton are important factors explaining the space and time variability of C. typicus. These results show that C. typicus will respond not only to temperature increase in the region but also to changes in phytoplankton abundance, structure and composition, and timing of occurrence. Methods such as a decision tree can help to forecast expected changes in the distribution of this species with hydro-climatic forcing. (C) 2007 Elsevier Ltd. All rights reserved.
Abstract:
An extensive literature base worldwide demonstrates how spatial differences in estuarine fish assemblages are related to those in the environment at (bio)regional, estuary-wide or local (within-estuary) scales. Few studies, however, have examined all three scales, and those including more than one have often focused at the level of individual environmental variables rather than scales as a whole. This study has identified those spatial scales of environmental differences, across regional, estuary-wide and local levels, that are most important in structuring ichthyofaunal composition throughout south-western Australian estuaries. It is the first to adopt this approach for temperate microtidal waters. To achieve this, we have employed a novel approach to the BIOENV routine in PRIMER v6 and a modified global BEST test in an alpha version of PRIMER v7. A combination of all three scales best matched the pattern of ichthyofaunal differences across the study area (rho = 0.59; P = 0.001), with estuary-wide and regional scales accounting for about twice the variability of local scales. A shade plot analysis showed these broader-scale ichthyofaunal differences were driven by a greater diversity of marine and estuarine species in the permanently-open west coast estuaries and higher numbers of several small estuarine species in the periodically-open south coast estuaries. When interaction effects were explored, strong but contrasting influences of local environmental scales were revealed within each region and estuary type. A quantitative decision tree for predicting the fish fauna at any nearshore estuarine site in south-western Australia has also been produced. The estuarine management implications of the above findings are highlighted.
Abstract:
High level environmental screening study for offshore wind farm developments – marine habitats and species
This report provides an awareness of the environmental issues related to marine habitats and species for developers and regulators of offshore wind farms. The information is also relevant to other offshore renewable energy developments. The marine habitats and species considered are those associated with the seabed, seabirds, and sea mammals. The report concludes that the following key ecological issues should be considered in the environmental assessment of offshore wind farm developments:
• likely changes in benthic communities within the affected area and resultant indirect impacts on fish populations and their predators such as seabirds and sea mammals;
• potential changes to the hydrography and wave climate over a wide area, and potential changes to coastal processes and the ecology of the region;
• likely effects on spawning or nursery areas of commercially important fish and shellfish species;
• likely effects on mating and social behaviour in sea mammals, including migration routes;
• likely effects on feeding water birds, seal pupping sites and damage of sensitive or important intertidal sites where cables come onshore;
• potential displacement of fish, seabirds and sea mammals from preferred habitats;
• potential effects on species and habitats of marine natural heritage importance;
• potential cumulative effects on seabirds, due to displacement of flight paths, and any mortality from bird strike, especially in sensitive rare or scarce species;
• possible effects of electromagnetic fields on feeding behaviour and migration, especially in sharks and rays, and
• potential marine conservation and biodiversity benefits of offshore wind farm developments as artificial reefs and 'no-take' zones.
The report provides an especially detailed assessment of likely sensitivity of seabed species and habitats in the proposed development areas.
Although these species and habitats are sensitive to some of the factors created by wind farm developments, they mainly have a high recovery potential. The report demonstrates how survey data can be linked to Marine Life Information Network (MarLIN) sensitivity assessments to produce maps of sensitivity to these factors. Assessing change to marine habitats and species as a result of wind farm developments has to take account of the natural variability of marine habitats, which might be high, especially in shallow sediment biotopes. There are several reasons for such changes, but physical disturbance of habitats and short-term climatic variability are likely to be especially important. Wind farm structures themselves will attract marine species, including those that attach to the towers and scour protection, fish that associate with offshore structures, and sea birds (especially sea duck) that may find food and shelter there. Nature conservation designations especially relevant to areas where wind farms might be developed are described, and the larger areas are mapped. There are few designated sites that extend offshore to where wind farms are likely to be developed. However, cable routes and landfalls may especially impinge on designated sites. The criteria that have been developed to assess the likely marine natural heritage importance of a location, or of the habitats and species that occur there, can be applied to survey information to assess whether there is anything of particular marine natural heritage importance in a development area. A decision tree is presented that can be used to apply ‘duty of care’ principles to any proposed development. The potential ‘gains’ for the local environment are explored. Wind farms will enhance the biodiversity of areas, could act as refugia for fish, and could be developed in a way that encourages enhancement of fish stocks including shellfish.