9 results for Data classification
at Universidade do Minho
Abstract:
The MAP-i Doctoral Program of the Universities of Minho, Aveiro and Porto
Abstract:
Propolis is a chemically complex biomass produced by honeybees (Apis mellifera) from plant resins mixed with salivary enzymes, beeswax, and pollen. The biological activities described for propolis have also been identified in the resins of donor plants, but standardizing the chemical composition and biological effects of propolis still requires a better understanding of how seasonality influences the chemical constituents of this raw material. Since propolis quality depends, among other variables, on the local flora, which is strongly influenced by (a)biotic factors over the seasons, unraveling the effect of harvest season on the propolis chemical profile is an issue of recognized importance. Fast, cheap, and robust analytical techniques seem to be the best choice for large-scale quality control processes in the most demanding markets, e.g., human health applications. To that end, UV-Visible (UV-Vis) scanning spectrophotometry was applied to hydroalcoholic extracts (HE) of seventy-three propolis samples collected in Southern Brazil over the seasons of 2014 (summer, spring, autumn, and winter) and 2015 (summer and autumn). Machine learning and chemometrics techniques were then applied to the UV-Vis dataset to gain insight into the effect of seasonality on the claimed chemical heterogeneity of propolis samples, which is determined by changes in the flora of the geographic region under study. Descriptive and classification models were built following a chemometric approach, i.e., principal component analysis (PCA) and hierarchical clustering analysis (HCA), supported by scripts written in the R language. The UV-Vis profiles combined with chemometric analysis allowed a typical pattern to be identified in propolis samples collected in the summer.
Importantly, the PCA-based discrimination could be improved by using the dataset of the fingerprint region of phenolic compounds (λ = 280-400 nm), suggesting that, besides their biological activities, those secondary metabolites also play a relevant role in the discrimination and classification of that complex matrix through bioinformatics tools. Finally, a series of machine learning approaches, e.g., partial least squares-discriminant analysis (PLS-DA), k-Nearest Neighbors (kNN), and Decision Trees, proved complementary to PCA and HCA, yielding relevant information for sample discrimination.
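As a rough illustration of the idea above (restricting the spectra to the 280-400 nm fingerprint window before PCA), the following Python sketch extracts the first principal component by power iteration. It is not the authors' R code: the function names, the four toy spectra, and the wavelength grid are all hypothetical, and a real analysis would use a tested PCA routine.

```python
import math

def select_window(wavelengths, spectrum, lo=280.0, hi=400.0):
    """Keep only absorbances whose wavelength (in nm) lies in [lo, hi]."""
    return [a for w, a in zip(wavelengths, spectrum) if lo <= w <= hi]

def mean_center(rows):
    """Subtract the column mean from every value (standard PCA preprocessing)."""
    n = len(rows)
    means = [sum(col) / n for col in zip(*rows)]
    return [[v - m for v, m in zip(row, means)] for row in rows]

def first_component(rows, iterations=200):
    """Power iteration on X^T X of the mean-centered data.

    Returns (loading, scores) for the first principal component.
    The sign of a principal component is arbitrary; here it is fixed
    by the all-positive starting vector.
    """
    x = mean_center(rows)
    p = len(x[0])
    v = [1.0 / math.sqrt(p)] * p
    for _ in range(iterations):
        scores = [sum(a * b for a, b in zip(row, v)) for row in x]
        w = [sum(s * row[j] for s, row in zip(scores, x)) for j in range(p)]
        norm = math.sqrt(sum(c * c for c in w))
        if norm == 0.0:  # degenerate data: no variance to explain
            break
        v = [c / norm for c in w]
    scores = [sum(a * b for a, b in zip(row, v)) for row in x]
    return v, scores

# Hypothetical toy data: two "summer" and two "winter" spectra.
wavelengths = [260.0, 280.0, 340.0, 400.0, 450.0]
spectra = [
    [0.2, 0.9, 1.1, 0.5, 0.1],  # summer
    [0.2, 1.0, 1.2, 0.6, 0.1],  # summer
    [0.2, 0.4, 0.5, 0.2, 0.1],  # winter
    [0.2, 0.5, 0.6, 0.3, 0.1],  # winter
]
windowed = [select_window(wavelengths, s) for s in spectra]
loading, scores = first_component(windowed)
print(scores)
```

In this toy case the two groups fall on opposite sides of zero along the first component, which is the kind of separation a PCA score plot would show.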
Abstract:
DNA microarrays are one of the most widely used technologies for gene expression measurement. However, there are several distinct microarray platforms from different manufacturers, each with its own measurement protocol, resulting in data that can hardly be compared or directly integrated. Data integration from multiple sources aims to improve the assertiveness of statistical tests, reducing the data dimensionality problem. The integration of heterogeneous DNA microarray platforms comprises a set of tasks that range from the re-annotation of the features used for gene expression to data normalization and batch effect elimination. In this work, a complete methodology for gene expression data integration and application is proposed, which comprises a transcript-based re-annotation process and several methods for batch effect attenuation. The integrated data will be used to select the best feature set and learning algorithm for a brain tumor classification case study. The integration will consider data from heterogeneous Agilent and Affymetrix platforms, collected from public gene expression databases such as The Cancer Genome Atlas and Gene Expression Omnibus.
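The abstract does not name its batch effect attenuation methods. As a minimal sketch of what such a step can look like, the Python example below uses per-batch mean-centering, one of the simplest approaches, chosen here only for illustration. The function name, batch labels, and expression values are hypothetical.

```python
from collections import defaultdict

def center_by_batch(expression, batches):
    """Subtract, for each gene, the mean of that gene within each batch.

    expression: list of samples, each a list of gene expression values.
    batches:    list with one batch label per sample.
    Assumption: batches are not confounded with the classes of interest;
    otherwise centering would also remove the biological signal.
    """
    groups = defaultdict(list)
    for idx, label in enumerate(batches):
        groups[label].append(idx)

    corrected = [list(sample) for sample in expression]
    for indices in groups.values():
        n_genes = len(expression[indices[0]])
        for g in range(n_genes):
            mean = sum(expression[i][g] for i in indices) / len(indices)
            for i in indices:
                corrected[i][g] = expression[i][g] - mean
    return corrected

# Hypothetical toy data: two genes, two samples per platform.
expression = [[10.0, 5.0], [12.0, 7.0], [2.0, 1.0], [4.0, 3.0]]
batches = ["agilent", "agilent", "affymetrix", "affymetrix"]
corrected = center_by_batch(expression, batches)
print(corrected)
```

After centering, both platforms share a common zero mean per gene, so the platform offset no longer dominates downstream feature selection.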
Abstract:
The chemical composition of propolis is affected by environmental factors and harvest season, making it difficult to standardize its extracts for medicinal usage. By detecting a typical chemical profile associated with propolis from a specific production region or season, certain types of propolis may be used to obtain a specific pharmacological activity. In this study, propolis samples from three agroecological regions (plain, plateau, and highlands) of southern Brazil, collected over the four seasons of 2010, were investigated through a novel NMR-based metabolomics data analysis workflow. Chemometrics and machine learning algorithms (PLS-DA and RF), including methods to estimate variable importance in classification, were used in this study. The machine learning and feature selection methods permitted the construction of models for propolis sample classification with high accuracy (>75%, reaching 90% in the best case), discriminating samples better by collection season than by harvest region. PLS-DA and RF allowed the identification of biomarkers for sample discrimination, expanding the set of discriminating features and adding relevant information for the identification of the class-determining metabolites. The NMR-based metabolomics analytical platform, coupled to bioinformatic tools, allowed characterization and classification of Brazilian propolis samples regarding the metabolite signature of important compounds, i.e., chemical fingerprint, harvest seasons, and production regions.
Abstract:
Olive oil quality grading is traditionally assessed by human sensory evaluation of positive and negative attributes (olfactory, gustatory, and final olfactory-gustatory sensations). However, it is not guaranteed that trained panelists can correctly classify monovarietal extra-virgin olive oils according to olive cultivar. In this work, the potential application of human (sensory panelists) and artificial (electronic tongue) sensory evaluation of olive oils was studied, aiming to discriminate eight single-cultivar extra-virgin olive oils. Linear discriminant, partial least squares discriminant, and sparse partial least squares discriminant analyses were evaluated. The best predictive classification was obtained using linear discriminant analysis with a simulated annealing selection algorithm. A low-level data fusion approach (18 electronic tongue signals and nine sensory attributes) enabled 100% correct classification in leave-one-out cross-validation, improving on the discrimination capability of sensor profiles or sensory attributes used individually (70% and 57% correct leave-one-out classifications, respectively). Thus, human sensory evaluation and electronic tongue analysis may be used as complementary tools, allowing successful monovarietal olive oil discrimination.
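The two ideas in this abstract that lend themselves to a short sketch are low-level data fusion (concatenating the two feature blocks per sample) and leave-one-out cross-validation. The Python example below illustrates both, with a nearest-centroid classifier as a simple stand-in for the linear discriminant analysis used in the study. The feature values, the cultivar labels "A" and "B", and all function names are hypothetical.

```python
import math

def fuse(sensor_signals, sensory_attributes):
    """Low-level fusion: concatenate both feature blocks for each sample.

    Assumption: features are already on comparable scales; in practice
    each block would usually be scaled before concatenation.
    """
    return [s + a for s, a in zip(sensor_signals, sensory_attributes)]

def nearest_centroid_predict(train_x, train_y, sample):
    """Predict the label whose class centroid is closest to the sample."""
    sums, counts = {}, {}
    for x, y in zip(train_x, train_y):
        if y not in sums:
            sums[y] = [0.0] * len(x)
            counts[y] = 0
        sums[y] = [a + b for a, b in zip(sums[y], x)]
        counts[y] += 1
    best_label, best_dist = None, math.inf
    for y, total in sums.items():
        centroid = [v / counts[y] for v in total]
        dist = math.dist(centroid, sample)
        if dist < best_dist:
            best_label, best_dist = y, dist
    return best_label

def leave_one_out_accuracy(xs, ys):
    """Hold out each sample once, train on the rest, and count correct predictions."""
    correct = 0
    for i in range(len(xs)):
        train_x = xs[:i] + xs[i + 1:]
        train_y = ys[:i] + ys[i + 1:]
        if nearest_centroid_predict(train_x, train_y, xs[i]) == ys[i]:
            correct += 1
    return correct / len(xs)

# Hypothetical toy data: 2 electronic-tongue signals and 1 panel attribute.
tongue = [[1.0, 0.9], [1.1, 1.0], [0.9, 1.1], [2.0, 2.1], [2.1, 1.9], [1.9, 2.0]]
panel = [[3.0], [3.2], [2.9], [5.0], [5.1], [4.8]]
labels = ["A", "A", "A", "B", "B", "B"]

fused = fuse(tongue, panel)
loo_accuracy = leave_one_out_accuracy(fused, labels)
print(loo_accuracy)
```

Because every sample is predicted by a model that never saw it, the leave-one-out rate estimates predictive rather than fitted performance, which is why the abstract reports it for the fused and the individual feature sets.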
Abstract:
Integrated master's dissertation in Information Systems Engineering and Management
Abstract:
Integrated master's dissertation in Biomedical Engineering (specialization in Medical Informatics)
Abstract:
An unsuitable patient flow and prolonged waiting lists in the gynecology and obstetrics emergency room of a maternity unit can affect the health of mother and child, leading to adverse events and consequences for their safety and satisfaction. Predicting the patients' waiting time in the emergency room is a means to avoid this problem. This study aims to predict the pre-triage waiting time in the emergency care of gynecology and obstetrics of Centro Materno Infantil do Norte (CMIN), the maternal and perinatal care unit of Centro Hospitalar of Oporto, situated in the north of Portugal. Data mining models were induced using data collected from the information systems and technologies available in CMIN. The models developed presented good results, reaching accuracy and specificity values of approximately 74% and 94%, respectively. Additionally, the number of patients and of triage professionals working in the emergency room, as well as some temporal variables, were identified as factors that directly increase the pre-triage waiting time. Implementing the attained knowledge in the decision support system and business intelligence platform deployed in CMIN supports optimizing the patient flow through the emergency room and improving the quality of services.
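For readers unfamiliar with the two metrics reported above, the following Python sketch shows how accuracy and specificity are derived from a binary confusion matrix. The class labels ("long" and "short" waiting time) and the toy predictions are hypothetical and are not taken from the study.

```python
def confusion_counts(actual, predicted, positive):
    """Return (tp, fp, tn, fn) for a binary problem with the given positive class."""
    tp = fp = tn = fn = 0
    for a, p in zip(actual, predicted):
        if p == positive:
            if a == positive:
                tp += 1
            else:
                fp += 1
        else:
            if a == positive:
                fn += 1
            else:
                tn += 1
    return tp, fp, tn, fn

def accuracy(tp, fp, tn, fn):
    """Share of all cases that were classified correctly."""
    return (tp + tn) / (tp + fp + tn + fn)

def specificity(tp, fp, tn, fn):
    """Share of actual negatives that were correctly predicted as negative."""
    return tn / (tn + fp)

# Hypothetical toy labels: "long" waiting time is the positive class.
actual = ["long", "short", "short", "short", "long", "short", "short", "long"]
predicted = ["long", "short", "short", "long", "short", "short", "short", "long"]

counts = confusion_counts(actual, predicted, positive="long")
acc = accuracy(*counts)
spec = specificity(*counts)
print(counts, acc, spec)
```

A high specificity paired with a lower accuracy, as in the abstract, means the negative class is recognized well while some positive cases are missed.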
Abstract:
Patient blood pressure is an important vital sign that helps physicians make decisions and better understand the patient's condition. In Intensive Care Units, blood pressure can be monitored because patients are under continuous monitoring through bedside monitors and sensors. Intensivists only have access to vital sign values when they look at the monitor or consult the values collected hourly. Most important is the sequence of values collected: a run of very high or very low values can signal a critical event, such as hypotension or hypertension, and bring future complications to the patient. These complications can lead to a set of dangerous diseases and side effects. The main goal of this work is to predict the probability that a patient will have a blood pressure critical event in the next hours by combining a set of patient data collected in real time with Data Mining classification techniques. As output, the models indicate the probability (%) that a patient will have a Blood Pressure Critical Event in the next hour. The results achieved are very promising, with sensitivity around 95%.