771 resultados para Multi-relational data mining


Relevância:

100.00% 100.00%

Publicador:

Resumo:

Dissertação apresentada na Faculdade de Ciências e Tecnologia da Universidade Nova de Lisboa para obtenção do grau de Mestre em Engenharia Electrotécnica e de Computadores

Relevância:

100.00% 100.00%

Publicador:

Resumo:

This paper discusses the results of applied research on the eco-driving domain based on a huge data set produced from a fleet of Lisbon's public transportation buses for a three-year period. This data set is based on events automatically extracted from the control area network bus and enriched with GPS coordinates, weather conditions, and road information. We apply online analytical processing (OLAP) and knowledge discovery (KD) techniques to deal with the high volume of this data set and to determine the major factors that influence the average fuel consumption, and then classify the drivers involved according to their driving efficiency. Consequently, we identify the most appropriate driving practices and styles. Our findings show that introducing simple practices, such as optimal clutch, engine rotation, and engine running in idle, can reduce fuel consumption on average from 3 to 5l/100 km, meaning a saving of 30 l per bus on one day. These findings have been strongly considered in the drivers' training sessions.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

More than ever, there is an increase of the number of decision support methods and computer aided diagnostic systems applied to various areas of medicine. In breast cancer research, many works have been done in order to reduce false-positives when used as a double reading method. In this study, we aimed to present a set of data mining techniques that were applied to approach a decision support system in the area of breast cancer diagnosis. This method is geared to assist clinical practice in identifying mammographic findings such as microcalcifications, masses and even normal tissues, in order to avoid misdiagnosis. In this work a reliable database was used, with 410 images from about 115 patients, containing previous reviews performed by radiologists as microcalcifications, masses and also normal tissue findings. Throughout this work, two feature extraction techniques were used: the gray level co-occurrence matrix and the gray level run length matrix. For classification purposes, we considered various scenarios according to different distinct patterns of injuries and several classifiers in order to distinguish the best performance in each case described. The many classifiers used were Naïve Bayes, Support Vector Machines, k-nearest Neighbors and Decision Trees (J48 and Random Forests). The results in distinguishing mammographic findings revealed great percentages of PPV and very good accuracy values. Furthermore, it also presented other related results of classification of breast density and BI-RADS® scale. The best predictive method found for all tested groups was the Random Forest classifier, and the best performance has been achieved through the distinction of microcalcifications. The conclusions based on the several tested scenarios represent a new perspective in breast cancer diagnosis using data mining techniques.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

This paper consists in the characterization of medium voltage (MV) electric power consumers based on a data clustering approach. It is intended to identify typical load profiles by selecting the best partition of a power consumption database among a pool of data partitions produced by several clustering algorithms. The best partition is selected using several cluster validity indices. These methods are intended to be used in a smart grid environment to extract useful knowledge about customers’ behavior. The data-mining-based methodology presented throughout the paper consists in several steps, namely the pre-processing data phase, clustering algorithms application and the evaluation of the quality of the partitions. To validate our approach, a case study with a real database of 1.022 MV consumers was used.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

This paper presents an electricity medium voltage (MV) customer characterization framework supportedby knowledge discovery in database (KDD). The main idea is to identify typical load profiles (TLP) of MVconsumers and to develop a rule set for the automatic classification of new consumers. To achieve ourgoal a methodology is proposed consisting of several steps: data pre-processing; application of severalclustering algorithms to segment the daily load profiles; selection of the best partition, corresponding tothe best consumers’ segmentation, based on the assessments of several clustering validity indices; andfinally, a classification model is built based on the resulting clusters. To validate the proposed framework,a case study which includes a real database of MV consumers is performed.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Harnessing idle PCs CPU cycles, storage space and other resources of networked computers to collaborative are mainly fixated on for all major grid computing research projects. Most of the university computers labs are occupied with the high puissant desktop PC nowadays. It is plausible to notice that most of the time machines are lying idle or wasting their computing power without utilizing in felicitous ways. However, for intricate quandaries and for analyzing astronomically immense amounts of data, sizably voluminous computational resources are required. For such quandaries, one may run the analysis algorithms in very puissant and expensive computers, which reduces the number of users that can afford such data analysis tasks. Instead of utilizing single expensive machines, distributed computing systems, offers the possibility of utilizing a set of much less expensive machines to do the same task. BOINC and Condor projects have been prosperously utilized for solving authentic scientific research works around the world at a low cost. In this work the main goal is to explore both distributed computing to implement, Condor and BOINC, and utilize their potency to harness the ideal PCs resources for the academic researchers to utilize in their research work. In this thesis, Data mining tasks have been performed in implementation of several machine learning algorithms on the distributed computing environment.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Data Mining (DM) methods are being increasingly used in prediction with time series data, in addition to traditional statistical approaches. This paper presents a literature review of the use of DM with time series data, focusing on short- time stocks prediction. This is an area that has been attracting a great deal of attention from researchers in the field. The main contribution of this paper is to provide an outline of the use of DM with time series data, using mainly examples related with short-term stocks prediction. This is important to a better understanding of the field. Some of the main trends and open issues will also be introduced.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Complex systems, i.e. systems composed of a large set of elements interacting in a non-linear way, are constantly found all around us. In the last decades, different approaches have been proposed toward their understanding, one of the most interesting being the Complex Network perspective. This legacy of the 18th century mathematical concepts proposed by Leonhard Euler is still current, and more and more relevant in real-world problems. In recent years, it has been demonstrated that network-based representations can yield relevant knowledge about complex systems. In spite of that, several problems have been detected, mainly related to the degree of subjectivity involved in the creation and evaluation of such network structures. In this Thesis, we propose addressing these problems by means of different data mining techniques, thus obtaining a novel hybrid approximation intermingling complex networks and data mining. Results indicate that such techniques can be effectively used to i) enable the creation of novel network representations, ii) reduce the dimensionality of analyzed systems by pre-selecting the most important elements, iii) describe complex networks, and iv) assist in the analysis of different network topologies. The soundness of such approach is validated through different validation cases drawn from actual biomedical problems, e.g. the diagnosis of cancer from tissue analysis, or the study of the dynamics of the brain under different neurological disorders.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Qualquer assunto relacionado com a saúde é sempre um tema sensível, pela importância que tem junto da população, já que interage diretamente com o bem-estar das pessoas e, essencialmente, com a sensação de segurança que as estas pretendem ter na prestação dos cuidados básicos de saúde. Dados estatísticos mostram que a população está cada vez mais envelhecida, reforçando a importância da existência de bons centros hospitalares e de um bom Sistema Nacional de Saúde (SNS) (Plano Nacional de Saúde, 2010). Em Portugal, caso os pacientes necessitem de cuidados mais urgentes, podem recorrer ao Serviço de Urgências disponibilizado para toda a população através do SNS. No entanto, a gestão e planeamento deste serviço é complexa, dado este serviço ser frequentemente utilizado por pacientes que não necessitam de cuidados urgentes, levando a que os hospitais deixem de conseguir dar a resposta esperada, implicando a prestação por vezes um serviço de menor qualidade. Neste sentido, analisaram-se dados de um hospital do norte do país com o intuito de perceber o ponto de situação das urgências, de forma a encontrar padrões relevantes através da análise de clusters e de regras de associação. Começando pela análise de clusters, utilizaram-se apenas as variáveis que foram consideradas importantes para o problema, resultando da análise final 3 clusters. O primeiro cluster é constituído por elementos do sexo masculino de todas as idades, o segundo cluster por elementos do sexo masculino mais jovens e por elementos do sexo feminino até aos 60 anos e o terceiro cluster apenas por elementos do sexo feminino a partir dos 40 anos. No final verificaram-se muitas semelhanças entre os clusters 1 e 3, pois ambos continham os pacientes mais idosos, havendo um padrão comum no seu comportamento. No ano 2012 não houve registo de nenhuma epidemia, não havendo por isso nenhuma doença que se destacasse comparativamente às restantes. Concluiu-se também que na maior parte dos casos houve a necessidade de uma intervenção urgente (pulseira de cor Amarela), no entanto a maioria dos pacientes observados conseguiu regressar às suas habitações após as consultas nas Urgências Hospitalares, sem intervenções médicas adicionais. Relativamente às regras de associação, houve a necessidade de transformar e eliminar algumas variáveis que enviesassem o estudo. Após o processo da criação das regras de associação, percebeu-se que as regras eram muito similares entre si, apresentando uma maior confiança nas variáveis que apareceram em maior número (“Pacientes com pulseira de cor Amarela”, “distrito do Porto” ou “Alta Médica para a Residência”).

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Hospitals are nowadays collecting vast amounts of data related with patient records. All this data hold valuable knowledge that can be used to improve hospital decision making. Data mining techniques aim precisely at the extraction of useful knowledge from raw data. This work describes an implementation of a medical data mining project approach based on the CRISP-DM methodology. Recent real-world data, from 2000 to 2013, were collected from a Portuguese hospital and related with inpatient hospitalization. The goal was to predict generic hospital Length Of Stay based on indicators that are commonly available at the hospitalization process (e.g., gender, age, episode type, medical specialty). At the data preparation stage, the data were cleaned and variables were selected and transformed, leading to 14 inputs. Next, at the modeling stage, a regression approach was adopted, where six learning methods were compared: Average Prediction, Multiple Regression, Decision Tree, Artificial Neural Network ensemble, Support Vector Machine and Random Forest. The best learning model was obtained by the Random Forest method, which presents a high quality coefficient of determination value (0.81). This model was then opened by using a sensitivity analysis procedure that revealed three influential input attributes: the hospital episode type, the physical service where the patient is hospitalized and the associated medical specialty. Such extracted knowledge confirmed that the obtained predictive model is credible and with potential value for supporting decisions of hospital managers.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Worldwide, around 9% of the children are born with less than 37 weeks of labour, causing risk to the premature child, whom it is not prepared to develop a number of basic functions that begin soon after the birth. In order to ensure that those risk pregnancies are being properly monitored by the obstetricians in time to avoid those problems, Data Mining (DM) models were induced in this study to predict preterm births in a real environment using data from 3376 patients (women) admitted in the maternal and perinatal care unit of Centro Hospitalar of Oporto. A sensitive metric to predict preterm deliveries was developed, assisting physicians in the decision-making process regarding the patients’ observation. It was possible to obtain promising results, achieving sensitivity and specificity values of 96% and 98%, respectively.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Lecture Notes in Computer Science, 9273

Relevância:

100.00% 100.00%

Publicador:

Resumo:

In Maternity Care, a quick decision has to be made about the most suitable delivery type for the current patient. Guidelines are followed by physicians to support that decision; however, those practice recommendations are limited and underused. In the last years, caesarean delivery has been pursued in over 28% of pregnancies, and other operative techniques regarding specific problems have also been excessively employed. This study identifies obstetric and pregnancy factors that can be used to predict the most appropriate delivery technique, through the induction of data mining models using real data gathered in the perinatal and maternal care unit of Centro Hospitalar of Oporto (CHP). Predicting the type of birth envisions high-quality services, increased safety and effectiveness of specific practices to help guide maternity care decisions and facilitate optimal outcomes in mother and child. In this work was possible to acquire good results, achieving sensitivity and specificity values of 90.11% and 80.05%, respectively, providing the CHP with a model capable of correctly identify caesarean sections and vaginal deliveries.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Rockburst is characterized by a violent explosion of a block causing a sudden rupture in the rock and is quite common in deep tunnels. It is critical to understand the phenomenon of rockburst, focusing on the patterns of occurrence so these events can be avoided and/or managed saving costs and possibly lives. The failure mechanism of rockburst needs to be better understood. Laboratory experiments are undergoing at the Laboratory for Geomechanics and Deep Underground Engineering (SKLGDUE) of Beijing and the system is described. A large number of rockburst tests were performed and their information collected, stored in a database and analyzed. Data Mining (DM) techniques were applied to the database in order to develop predictive models for the rockburst maximum stress (σRB) and rockburst risk index (IRB) that need the results of such tests to be determined. With the developed models it is possible to predict these parameters with high accuracy levels using data from the rock mass and specific project.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

The research aimed to establish tyre-road noise models by using a Data Mining approach that allowed to build a predictive model and assess the importance of the tested input variables. The data modelling took into account three learning algorithms and three metrics to define the best predictive model. The variables tested included basic properties of pavement surfaces, macrotexture, megatexture, and uneven- ness and, for the first time, damping. Also, the importance of those variables was measured by using a sensitivity analysis procedure. Two types of models were set: one with basic variables and another with complex variables, such as megatexture and damping, all as a function of vehicles speed. More detailed models were additionally set by the speed level. As a result, several models with very good tyre-road noise predictive capacity were achieved. The most relevant variables were Speed, Temperature, Aggregate size, Mean Profile Depth, and Damping, which had the highest importance, even though influenced by speed. Megatexture and IRI had the lowest importance. The applicability of the models developed in this work is relevant for trucks tyre-noise prediction, represented by the AVON V4 test tyre, at the early stage of road pavements use. Therefore, the obtained models are highly useful for the design of pavements and for noise prediction by road authorities and contractors.