19 resultados para process data
em Universidade do Minho
Resumo:
Programa Doutoral em Matemática e Aplicações.
Resumo:
As huge amounts of data become available in organizations and society, specific data analytics skills and techniques are needed to explore this data and extract from it useful patterns, tendencies, models or other useful knowledge, which could be used to support the decision-making process, to define new strategies or to understand what is happening in a specific field. Only with a deep understanding of a phenomenon it is possible to fight it. In this paper, a data-driven analytics approach is used for the analysis of the increasing incidence of fatalities by pneumonia in the Portuguese population, characterizing the disease and its incidence in terms of fatalities, knowledge that can be used to define appropriate strategies that can aim to reduce this phenomenon, which has increased more than 65% in a decade.
Resumo:
Hospitals are nowadays collecting vast amounts of data related with patient records. All this data hold valuable knowledge that can be used to improve hospital decision making. Data mining techniques aim precisely at the extraction of useful knowledge from raw data. This work describes an implementation of a medical data mining project approach based on the CRISP-DM methodology. Recent real-world data, from 2000 to 2013, were collected from a Portuguese hospital and related with inpatient hospitalization. The goal was to predict generic hospital Length Of Stay based on indicators that are commonly available at the hospitalization process (e.g., gender, age, episode type, medical specialty). At the data preparation stage, the data were cleaned and variables were selected and transformed, leading to 14 inputs. Next, at the modeling stage, a regression approach was adopted, where six learning methods were compared: Average Prediction, Multiple Regression, Decision Tree, Artificial Neural Network ensemble, Support Vector Machine and Random Forest. The best learning model was obtained by the Random Forest method, which presents a high quality coefficient of determination value (0.81). This model was then opened by using a sensitivity analysis procedure that revealed three influential input attributes: the hospital episode type, the physical service where the patient is hospitalized and the associated medical specialty. Such extracted knowledge confirmed that the obtained predictive model is credible and with potential value for supporting decisions of hospital managers.
Resumo:
During the last few years many research efforts have been done to improve the design of ETL (Extract-Transform-Load) systems. ETL systems are considered very time-consuming, error-prone and complex involving several participants from different knowledge domains. ETL processes are one of the most important components of a data warehousing system that are strongly influenced by the complexity of business requirements, their changing and evolution. These aspects influence not only the structure of a data warehouse but also the structures of the data sources involved with. To minimize the negative impact of such variables, we propose the use of ETL patterns to build specific ETL packages. In this paper, we formalize this approach using BPMN (Business Process Modelling Language) for modelling more conceptual ETL workflows, mapping them to real execution primitives through the use of a domain-specific language that allows for the generation of specific instances that can be executed in an ETL commercial tool.
Resumo:
Business Intelligence (BI) can be seen as a method that gathers information and data from information systems in order to help companies to be more accurate in their decision-making process. Traditionally BI systems were associated with the use of Data Warehouses (DW). The prime purpose of DW is to serve as a repository that stores all the relevant information required for making the correct decision. The necessity to integrate streaming data became crucial with the need to improve the efficiency and effectiveness of the decision process. In primary and secondary education, there is a lack of BI solutions. Due to the schools reality the main purpose of this study is to provide a Pervasive BI solution able to monitoring the schools and student data anywhere and anytime in real-time as well as disseminating the information through ubiquitous devices. The first task consisted in gathering data regarding the different choices made by the student since his enrolment in a certain school year until the end of it. Thereafter a dimensional model was developed in order to be possible building a BI platform. This paper presents the dimensional model, a set of pre-defined indicators, the Pervasive Business Intelligence characteristics and the prototype designed. The main contribution of this study was to offer to the schools a tool that could help them to make accurate decisions in real-time. Data dissemination was achieved through a localized application that can be accessed anywhere and anytime.
Resumo:
Worldwide, around 9% of the children are born with less than 37 weeks of labour, causing risk to the premature child, whom it is not prepared to develop a number of basic functions that begin soon after the birth. In order to ensure that those risk pregnancies are being properly monitored by the obstetricians in time to avoid those problems, Data Mining (DM) models were induced in this study to predict preterm births in a real environment using data from 3376 patients (women) admitted in the maternal and perinatal care unit of Centro Hospitalar of Oporto. A sensitive metric to predict preterm deliveries was developed, assisting physicians in the decision-making process regarding the patients’ observation. It was possible to obtain promising results, achieving sensitivity and specificity values of 96% and 98%, respectively.
Resumo:
Children are an especially vulnerable population, particularly in respect to drug administration. It is estimated that neonatal and pediatric patients are at least three times more vulnerable to damage due to adverse events and medication errors than adults are. With the development of this framework, it is intended the provision of a Clinical Decision Support System based on a prototype already tested in a real environment. The framework will include features such as preparation of Total Parenteral Nutrition prescriptions, table pediatric and neonatal emergency drugs, medical scales of morbidity and mortality, anthropometry percentiles (weight, length/height, head circumference and BMI), utilities for supporting medical decision on the treatment of neonatal jaundice and anemia and support for technical procedures and other calculators and widespread use tools. The solution in development means an extension of INTCare project. The main goal is to provide an approach to get the functionality at all times of clinical practice and outside the hospital environment for dissemination, education and simulation of hypothetical situations. The aim is also to develop an area for the study and analysis of information and extraction of knowledge from the data collected by the use of the system. This paper presents the architecture, their requirements and functionalities and a SWOT analysis of the solution proposed.
Resumo:
The MAP-i Doctoral Program of the Universities of Minho, Aveiro and Porto
Resumo:
DNA microarrays are one of the most used technologies for gene expression measurement. However, there are several distinct microarray platforms, from different manufacturers, each with its own measurement protocol, resulting in data that can hardly be compared or directly integrated. Data integration from multiple sources aims to improve the assertiveness of statistical tests, reducing the data dimensionality problem. The integration of heterogeneous DNA microarray platforms comprehends a set of tasks that range from the re-annotation of the features used on gene expression, to data normalization and batch effect elimination. In this work, a complete methodology for gene expression data integration and application is proposed, which comprehends a transcript-based re-annotation process and several methods for batch effect attenuation. The integrated data will be used to select the best feature set and learning algorithm for a brain tumor classification case study. The integration will consider data from heterogeneous Agilent and Affymetrix platforms, collected from public gene expression databases, such as The Cancer Genome Atlas and Gene Expression Omnibus.
Resumo:
Dissertação de mestrado em Systems Engineering
Resumo:
Dissertação de mestrado integrado em Mechanical Engineering
Resumo:
Dissertação de mestrado integrado em Engenharia e Gestão de Sistemas de Informação
Resumo:
Dissertação de mestrado integrado em Engenharia e Gestão de Sistemas de Informação
Resumo:
Dissertação de mestrado integrado em Engenharia e Gestão de Sistemas de Informação
Resumo:
Dissertação de mestrado integrado em Engenharia e Gestão de Sistemas de Informação