786 resultados para Data mining models


Relevância:

100.00% 100.00%

Publicador:

Resumo:

Data mining is one of the hottest research areas nowadays as it has got wide variety of applications in common man’s life to make the world a better place to live. It is all about finding interesting hidden patterns in a huge history data base. As an example, from a sales data base, one can find an interesting pattern like “people who buy magazines tend to buy news papers also” using data mining. Now in the sales point of view the advantage is that one can place these things together in the shop to increase sales. In this research work, data mining is effectively applied to a domain called placement chance prediction, since taking wise career decision is so crucial for anybody for sure. In India technical manpower analysis is carried out by an organization named National Technical Manpower Information System (NTMIS), established in 1983-84 by India's Ministry of Education & Culture. The NTMIS comprises of a lead centre in the IAMR, New Delhi, and 21 nodal centres located at different parts of the country. The Kerala State Nodal Centre is located at Cochin University of Science and Technology. In Nodal Centre, they collect placement information by sending postal questionnaire to passed out students on a regular basis. From this raw data available in the nodal centre, a history data base was prepared. Each record in this data base includes entrance rank ranges, reservation, Sector, Sex, and a particular engineering. From each such combination of attributes from the history data base of student records, corresponding placement chances is computed and stored in the history data base. From this data, various popular data mining models are built and tested. These models can be used to predict the most suitable branch for a particular new student with one of the above combination of criteria. Also a detailed performance comparison of the various data mining models is done.This research work proposes to use a combination of data mining models namely a hybrid stacking ensemble for better predictions. A strategy to predict the overall absorption rate for various branches as well as the time it takes for all the students of a particular branch to get placed etc are also proposed. Finally, this research work puts forward a new data mining algorithm namely C 4.5 * stat for numeric data sets which has been proved to have competent accuracy over standard benchmarking data sets called UCI data sets. It also proposes an optimization strategy called parameter tuning to improve the standard C 4.5 algorithm. As a summary this research work passes through all four dimensions for a typical data mining research work, namely application to a domain, development of classifier models, optimization and ensemble methods.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

This thesis introduces heat demand forecasting models which are generated by using data mining algorithms. The forecast spans one full day and this forecast can be used in regulating heat consumption of buildings. For training the data mining models, two years of heat consumption data from a case building and weather measurement data from Finnish Meteorological Institute are used. The thesis utilizes Microsoft SQL Server Analysis Services data mining tools in generating the data mining models and CRISP-DM process framework to implement the research. Results show that the built models can predict heat demand at best with mean average percentage errors of 3.8% for 24-h profile and 5.9% for full day. A deployment model for integrating the generated data mining models into an existing building energy management system is also discussed.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

In Maternity Care, a quick decision has to be made about the most suitable delivery type for the current patient. Guidelines are followed by physicians to support that decision; however, those practice recommendations are limited and underused. In the last years, caesarean delivery has been pursued in over 28% of pregnancies, and other operative techniques regarding specific problems have also been excessively employed. This study identifies obstetric and pregnancy factors that can be used to predict the most appropriate delivery technique, through the induction of data mining models using real data gathered in the perinatal and maternal care unit of Centro Hospitalar of Oporto (CHP). Predicting the type of birth envisions high-quality services, increased safety and effectiveness of specific practices to help guide maternity care decisions and facilitate optimal outcomes in mother and child. In this work was possible to acquire good results, achieving sensitivity and specificity values of 90.11% and 80.05%, respectively, providing the CHP with a model capable of correctly identify caesarean sections and vaginal deliveries.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Dissertação de mestrado integrado em Engenharia e Gestão de Sistemas de Informação

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Football is considered nowadays one of the most popular sports. In the betting world, it has acquired an outstanding position, which moves millions of euros during the period of a single football match. The lack of profitability of football betting users has been stressed as a problem. This lack gave origin to this research proposal, which it is going to analyse the possibility of existing a way to support the users to increase their profits on their bets. Data mining models were induced with the purpose of supporting the gamblers to increase their profits in the medium/long term. Being conscience that the models can fail, the results achieved by four of the seven targets in the models are encouraging and suggest that the system can help to increase the profits. All defined targets have two possible classes to predict, for example, if there are more or less than 7.5 corners in a single game. The data mining models of the targets, more or less than 7.5 corners, 8.5 corners, 1.5 goals and 3.5 goals achieved the pre-defined thresholds. The models were implemented in a prototype, which it is a pervasive decision support system. This system was developed with the purpose to be an interface for any user, both for an expert user as to a user who has no knowledge in football games.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Distributed and collaborative data stream mining in a mobile computing environment is referred to as Pocket Data Mining PDM. Large amounts of available data streams to which smart phones can subscribe to or sense, coupled with the increasing computational power of handheld devices motivates the development of PDM as a decision making system. This emerging area of study has shown to be feasible in an earlier study using technological enablers of mobile software agents and stream mining techniques [1]. A typical PDM process would start by having mobile agents roam the network to discover relevant data streams and resources. Then other (mobile) agents encapsulating stream mining techniques visit the relevant nodes in the network in order to build evolving data mining models. Finally, a third type of mobile agents roam the network consulting the mining agents for a final collaborative decision, when required by one or more users. In this paper, we propose the use of distributed Hoeffding trees and Naive Bayes classifers in the PDM framework over vertically partitioned data streams. Mobile policing, health monitoring and stock market analysis are among the possible applications of PDM. An extensive experimental study is reported showing the effectiveness of the collaborative data mining with the two classifers.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

La gran cantidad de datos que se registran diariamente en los sistemas de base de datos de las organizaciones ha generado la necesidad de analizarla. Sin embargo, se enfrentan a la complejidad de procesar enormes volúmenes de datos a través de métodos tradicionales de análisis. Además, dentro de un contexto globalizado y competitivo las organizaciones se mantienen en la búsqueda constante de mejorar sus procesos, para lo cual requieren herramientas que les permitan tomar mejores decisiones. Esto implica estar mejor informado y conocer su historia digital para describir sus procesos y poder anticipar (predecir) eventos no previstos. Estos nuevos requerimientos de análisis de datos ha motivado el desarrollo creciente de proyectos de minería de datos. El proceso de minería de datos busca obtener desde un conjunto masivo de datos, modelos que permitan describir los datos o predecir nuevas instancias en el conjunto. Implica etapas de: preparación de los datos, procesamiento parcial o totalmente automatizado para identificar modelos en los datos, para luego obtener como salida patrones, relaciones o reglas. Esta salida debe significar un nuevo conocimiento para la organización, útil y comprensible para los usuarios finales, y que pueda ser integrado a los procesos para apoyar la toma de decisiones. Sin embargo, la mayor dificultad es justamente lograr que el analista de datos, que interviene en todo este proceso, pueda identificar modelos lo cual es una tarea compleja y muchas veces requiere de la experiencia, no sólo del analista de datos, sino que también del experto en el dominio del problema. Una forma de apoyar el análisis de datos, modelos y patrones es a través de su representación visual, utilizando las capacidades de percepción visual del ser humano, la cual puede detectar patrones con mayor facilidad. Bajo este enfoque, la visualización ha sido utilizada en minería datos, mayormente en el análisis descriptivo de los datos (entrada) y en la presentación de los patrones (salida), dejando limitado este paradigma para el análisis de modelos. El presente documento describe el desarrollo de la Tesis Doctoral denominada “Nuevos Esquemas de Visualizaciones para Mejorar la Comprensibilidad de Modelos de Data Mining”. Esta investigación busca aportar con un enfoque de visualización para apoyar la comprensión de modelos minería de datos, para esto propone la metáfora de modelos visualmente aumentados. ABSTRACT The large amount of data to be recorded daily in the systems database of organizations has generated the need to analyze it. However, faced with the complexity of processing huge volumes of data over traditional methods of analysis. Moreover, in a globalized and competitive environment organizations are kept constantly looking to improve their processes, which require tools that allow them to make better decisions. This involves being bettered informed and knows your digital story to describe its processes and to anticipate (predict) unanticipated events. These new requirements of data analysis, has led to the increasing development of data-mining projects. The data-mining process seeks to obtain from a massive data set, models to describe the data or predict new instances in the set. It involves steps of data preparation, partially or fully automated processing to identify patterns in the data, and then get output patterns, relationships or rules. This output must mean new knowledge for the organization, useful and understandable for end users, and can be integrated into the process to support decision-making. However, the biggest challenge is just getting the data analyst involved in this process, which can identify models is complex and often requires experience not only of the data analyst, but also the expert in the problem domain. One way to support the analysis of the data, models and patterns, is through its visual representation, i.e., using the capabilities of human visual perception, which can detect patterns easily in any context. Under this approach, the visualization has been used in data mining, mostly in exploratory data analysis (input) and the presentation of the patterns (output), leaving limited this paradigm for analyzing models. This document describes the development of the doctoral thesis entitled "New Visualizations Schemes to Improve Understandability of Data-Mining Models". This research aims to provide a visualization approach to support understanding of data mining models for this proposed metaphor visually enhanced models.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Nowadays, data mining is based on low-level specications of the employed techniques typically bounded to a specic analysis platform. Therefore, data mining lacks a modelling architecture that allows analysts to consider it as a truly software-engineering process. Here, we propose a model-driven approach based on (i) a conceptual modelling framework for data mining, and (ii) a set of model transformations to automatically generate both the data under analysis (via data-warehousing technology) and the analysis models for data mining (tailored to a specic platform). Thus, analysts can concentrate on the analysis problem via conceptual data-mining models instead of low-level programming tasks related to the underlying-platform technical details. These tasks are now entrusted to the model-transformations scaffolding.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Data mining is one of the most important analysis techniques to automatically extract knowledge from large amount of data. Nowadays, data mining is based on low-level specifications of the employed techniques typically bounded to a specific analysis platform. Therefore, data mining lacks a modelling architecture that allows analysts to consider it as a truly software-engineering process. Bearing in mind this situation, we propose a model-driven approach which is based on (i) a conceptual modelling framework for data mining, and (ii) a set of model transformations to automatically generate both the data under analysis (that is deployed via data-warehousing technology) and the analysis models for data mining (tailored to a specific platform). Thus, analysts can concentrate on understanding the analysis problem via conceptual data-mining models instead of wasting efforts on low-level programming tasks related to the underlying-platform technical details. These time consuming tasks are now entrusted to the model-transformations scaffolding. The feasibility of our approach is shown by means of a hypothetical data-mining scenario where a time series analysis is required.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Data Mining, Learning from data, graphical models, possibility theory

Relevância:

100.00% 100.00%

Publicador:

Resumo:

The nation's freeway systems are becoming increasingly congested. A major contribution to traffic congestion on freeways is due to traffic incidents. Traffic incidents are non-recurring events such as accidents or stranded vehicles that cause a temporary roadway capacity reduction, and they can account for as much as 60 percent of all traffic congestion on freeways. One major freeway incident management strategy involves diverting traffic to avoid incident locations by relaying timely information through Intelligent Transportation Systems (ITS) devices such as dynamic message signs or real-time traveler information systems. The decision to divert traffic depends foremost on the expected duration of an incident, which is difficult to predict. In addition, the duration of an incident is affected by many contributing factors. Determining and understanding these factors can help the process of identifying and developing better strategies to reduce incident durations and alleviate traffic congestion. A number of research studies have attempted to develop models to predict incident durations, yet with limited success. ^ This dissertation research attempts to improve on this previous effort by applying data mining techniques to a comprehensive incident database maintained by the District 4 ITS Office of the Florida Department of Transportation (FDOT). Two categories of incident duration prediction models were developed: "offline" models designed for use in the performance evaluation of incident management programs, and "online" models for real-time prediction of incident duration to aid in the decision making of traffic diversion in the event of an ongoing incident. Multiple data mining analysis techniques were applied and evaluated in the research. The multiple linear regression analysis and decision tree based method were applied to develop the offline models, and the rule-based method and a tree algorithm called M5P were used to develop the online models. ^ The results show that the models in general can achieve high prediction accuracy within acceptable time intervals of the actual durations. The research also identifies some new contributing factors that have not been examined in past studies. As part of the research effort, software code was developed to implement the models in the existing software system of District 4 FDOT for actual applications. ^

Relevância:

100.00% 100.00%

Publicador:

Resumo:

The semiarid region of northeastern Brazil, the Caatinga, is extremely important due to its biodiversity and endemism. Measurements of plant physiology are crucial to the calibration of Dynamic Global Vegetation Models (DGVMs) that are currently used to simulate the responses of vegetation in face of global changes. In a field work realized in an area of preserved Caatinga forest located in Petrolina, Pernambuco, measurements of carbon assimilation (in response to light and CO2) were performed on 11 individuals of Poincianella microphylla, a native species that is abundant in this region. These data were used to calibrate the maximum carboxylation velocity (Vcmax) used in the INLAND model. The calibration techniques used were Multiple Linear Regression (MLR), and data mining techniques as the Classification And Regression Tree (CART) and K-MEANS. The results were compared to the UNCALIBRATED model. It was found that simulated Gross Primary Productivity (GPP) reached 72% of observed GPP when using the calibrated Vcmax values, whereas the UNCALIBRATED approach accounted for 42% of observed GPP. Thus, this work shows the benefits of calibrating DGVMs using field ecophysiological measurements, especially in areas where field data is scarce or non-existent, such as in the Caatinga

Relevância:

100.00% 100.00%

Publicador:

Resumo:

The principle of using induction rules based on spatial environmental data to model a soil map has previously been demonstrated Whilst the general pattern of classes of large spatial extent and those with close association with geology were delineated small classes and the detailed spatial pattern of the map were less well rendered Here we examine several strategies to improve the quality of the soil map models generated by rule induction Terrain attributes that are better suited to landscape description at a resolution of 250 m are introduced as predictors of soil type A map sampling strategy is developed Classification error is reduced by using boosting rather than cross validation to improve the model Further the benefit of incorporating the local spatial context for each environmental variable into the rule induction is examined The best model was achieved by sampling in proportion to the spatial extent of the mapped classes boosting the decision trees and using spatial contextual information extracted from the environmental variables.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Mestrado em Engenharia Electrotécnica – Sistemas Eléctricos de Energia