12 resultados para Multiple data
em Universidade do Minho
Resumo:
Dissertação de Mestrado em Engenharia Informática
Resumo:
Hospitals have multiple data sources, such as embedded systems, monitors and sensors. The number of data available is increasing and the information are used not only to care the patient but also to assist the decision processes. The introduction of intelligent environments in health care institutions has been adopted due their ability to provide useful information for health professionals, either in helping to identify prognosis or also to understand patient condition. Behind of this concept arises this Intelligent System to track patient condition (e.g. critic events) in health care. This system has the great advantage of being adaptable to the environment and user needs. The system is focused in identifying critic events from data streaming (e.g. vital signs and ventilation) which is particularly valuable for understanding the patient’s condition. This work aims to demonstrate the process of creating an intelligent system capable of operating in a real environment using streaming data provided by ventilators and vital signs monitors. Its development is important to the physician because becomes possible crossing multiple variables in real-time by analyzing if a value is critic or not and if their variation has or not clinical importance.
Resumo:
Hospitals are nowadays collecting vast amounts of data related with patient records. All this data hold valuable knowledge that can be used to improve hospital decision making. Data mining techniques aim precisely at the extraction of useful knowledge from raw data. This work describes an implementation of a medical data mining project approach based on the CRISP-DM methodology. Recent real-world data, from 2000 to 2013, were collected from a Portuguese hospital and related with inpatient hospitalization. The goal was to predict generic hospital Length Of Stay based on indicators that are commonly available at the hospitalization process (e.g., gender, age, episode type, medical specialty). At the data preparation stage, the data were cleaned and variables were selected and transformed, leading to 14 inputs. Next, at the modeling stage, a regression approach was adopted, where six learning methods were compared: Average Prediction, Multiple Regression, Decision Tree, Artificial Neural Network ensemble, Support Vector Machine and Random Forest. The best learning model was obtained by the Random Forest method, which presents a high quality coefficient of determination value (0.81). This model was then opened by using a sensitivity analysis procedure that revealed three influential input attributes: the hospital episode type, the physical service where the patient is hospitalized and the associated medical specialty. Such extracted knowledge confirmed that the obtained predictive model is credible and with potential value for supporting decisions of hospital managers.
Resumo:
The MAP-i Doctoral Program of the Universities of Minho, Aveiro and Porto
Resumo:
DNA microarrays are one of the most used technologies for gene expression measurement. However, there are several distinct microarray platforms, from different manufacturers, each with its own measurement protocol, resulting in data that can hardly be compared or directly integrated. Data integration from multiple sources aims to improve the assertiveness of statistical tests, reducing the data dimensionality problem. The integration of heterogeneous DNA microarray platforms comprehends a set of tasks that range from the re-annotation of the features used on gene expression, to data normalization and batch effect elimination. In this work, a complete methodology for gene expression data integration and application is proposed, which comprehends a transcript-based re-annotation process and several methods for batch effect attenuation. The integrated data will be used to select the best feature set and learning algorithm for a brain tumor classification case study. The integration will consider data from heterogeneous Agilent and Affymetrix platforms, collected from public gene expression databases, such as The Cancer Genome Atlas and Gene Expression Omnibus.
Resumo:
Compelling biological and epidemiological evidences point to a key role of genetic variants of the TERT and TERC genes in cancer development. We analyzed the genetic variability of these two gene regions using samples of 2,267 multiple myeloma (MM) cases and 2,796 healthy controls. We found that a TERT variant, rs2242652, is associated with reduced MM susceptibility (OR?=?0.81; 95% CI: 0.72-0.92; p?=?0.001). In addition we measured the leukocyte telomere length (LTL) in a subgroup of 140 cases who were chemotherapy-free at the time of blood donation and 468 controls, and found that MM patients had longer telomeres compared to controls (OR?=?1.19; 95% CI: 0.63-2.24; ptrend ?=?0.01 comparing the quartile with the longest LTL versus the shortest LTL). Our data suggest the hypothesis of decreased disease risk by genetic variants that reduce the efficiency of the telomerase complex. This reduced efficiency leads to shorter telomere ends, which in turn may also be a marker of decreased MM risk.
Resumo:
This paper describes the concept, technical realisation and validation of a largely data-driven method to model events with Z→ττ decays. In Z→μμ events selected from proton-proton collision data recorded at s√=8 TeV with the ATLAS experiment at the LHC in 2012, the Z decay muons are replaced by τ leptons from simulated Z→ττ decays at the level of reconstructed tracks and calorimeter cells. The τ lepton kinematics are derived from the kinematics of the original muons. Thus, only the well-understood decays of the Z boson and τ leptons as well as the detector response to the τ decay products are obtained from simulation. All other aspects of the event, such as the Z boson and jet kinematics as well as effects from multiple interactions, are given by the actual data. This so-called τ-embedding method is particularly relevant for Higgs boson searches and analyses in ττ final states, where Z→ττ decays constitute a large irreducible background that cannot be obtained directly from data control samples.
Resumo:
Dissertação de mestrado integrado em Engenharia e Gestão de Sistemas de Informação
Resumo:
Large scale distributed data stores rely on optimistic replication to scale and remain highly available in the face of net work partitions. Managing data without coordination results in eventually consistent data stores that allow for concurrent data updates. These systems often use anti-entropy mechanisms (like Merkle Trees) to detect and repair divergent data versions across nodes. However, in practice hash-based data structures are too expensive for large amounts of data and create too many false conflicts. Another aspect of eventual consistency is detecting write conflicts. Logical clocks are often used to track data causality, necessary to detect causally concurrent writes on the same key. However, there is a nonnegligible metadata overhead per key, which also keeps growing with time, proportional with the node churn rate. Another challenge is deleting keys while respecting causality: while the values can be deleted, perkey metadata cannot be permanently removed without coordination. Weintroduceanewcausalitymanagementframeworkforeventuallyconsistentdatastores,thatleveragesnodelogicalclocks(BitmappedVersion Vectors) and a new key logical clock (Dotted Causal Container) to provides advantages on multiple fronts: 1) a new efficient and lightweight anti-entropy mechanism; 2) greatly reduced per-key causality metadata size; 3) accurate key deletes without permanent metadata.
Resumo:
Dissertação de mestrado integrado em Engenharia Civil
Resumo:
The Supplementary Material for this article can be found online at: http://journal.frontiersin.org/article/10.3389/fmicb. 2016.00275
Resumo:
Recently, there has been a growing interest in the field of metabolomics, materialized by a remarkable growth in experimental techniques, available data and related biological applications. Indeed, techniques as Nuclear Magnetic Resonance, Gas or Liquid Chromatography, Mass Spectrometry, Infrared and UV-visible spectroscopies have provided extensive datasets that can help in tasks as biological and biomedical discovery, biotechnology and drug development. However, as it happens with other omics data, the analysis of metabolomics datasets provides multiple challenges, both in terms of methodologies and in the development of appropriate computational tools. Indeed, from the available software tools, none addresses the multiplicity of existing techniques and data analysis tasks. In this work, we make available a novel R package, named specmine, which provides a set of methods for metabolomics data analysis, including data loading in different formats, pre-processing, metabolite identification, univariate and multivariate data analysis, machine learning, and feature selection. Importantly, the implemented methods provide adequate support for the analysis of data from diverse experimental techniques, integrating a large set of functions from several R packages in a powerful, yet simple to use environment. The package, already available in CRAN, is accompanied by a web site where users can deposit datasets, scripts and analysis reports to be shared with the community, promoting the efficient sharing of metabolomics data analysis pipelines.