856 results for Data Driven Clustering
Abstract:
Integrated master's dissertation in Information Systems Engineering and Management
Abstract:
Integrated master's dissertation in Information Systems Engineering and Management
Abstract:
Doctoral thesis in Environmental and Molecular Biology
Abstract:
The algorithmic approach to data modelling has developed rapidly in recent years; in particular, methods based on data mining and machine learning have been used in a growing number of applications. These methods follow a data-driven methodology, aiming at providing the best possible generalization and predictive abilities instead of concentrating on the properties of the data model. One of the most successful groups of such methods is known as Support Vector algorithms. Following the fruitful developments in applying Support Vector algorithms to spatial data, this paper introduces a new extension of the traditional support vector regression (SVR) algorithm. This extension allows for the simultaneous modelling of environmental data at several spatial scales. The joint influence of environmental processes presenting different patterns at different scales is here learned automatically from data, providing the optimum mixture of short and large-scale models. The method is adaptive to the spatial scale of the data. With this advantage, it can provide efficient means to model local anomalies that may typically arise in situations at an early phase of an environmental emergency. However, the proposed approach still requires some prior knowledge of the possible existence of such short-scale patterns. This is a possible limitation of the method for its implementation in early warning systems. The purpose of this paper is to present the multi-scale SVR model and to illustrate its use with an application to the mapping of Cs137 activity given the measurements taken in the region of Briansk following the Chernobyl accident.
Abstract:
OBJECTIVES: To develop data-driven criteria for clinically inactive disease on and off therapy for juvenile dermatomyositis (JDM). METHODS: The Paediatric Rheumatology International Trials Organisation (PRINTO) database contains 275 patients with active JDM evaluated prospectively up to 24 months. Thirty-eight patients off therapy at 24 months were defined as clinically inactive and included in the reference group. These were compared with a random sample of 76 patients who had active disease at study baseline. Individual measures of muscle strength/endurance, muscle enzymes, physician's and parent's global disease activity/damage evaluations, inactive disease criteria derived from the literature and other ad hoc criteria were evaluated for sensitivity, specificity and Cohen's κ agreement. RESULTS: The individual measures that best characterised inactive disease (sensitivity and specificity >0.8 and Cohen's κ >0.8) were manual muscle testing (MMT) ≥78, physician global assessment of muscle activity=0, physician global assessment of overall disease activity (PhyGloVAS) ≤0.2, Childhood Myositis Assessment Scale (CMAS) ≥48, Disease Activity Score ≤3 and Myositis Disease Activity Assessment Visual Analogue Scale ≤0.2. The best combination of variables to classify a patient as being in a state of inactive disease on or off therapy is at least three of four of the following criteria: creatine kinase ≤150, CMAS ≥48, MMT ≥78 and PhyGloVAS ≤0.2. After 24 months, 30/31 patients (96.8%) were inactive off therapy and 69/145 (47.6%) were inactive on therapy. CONCLUSION: PRINTO established data-driven criteria with clearly evidence-based cut-off values to identify JDM patients with clinically inactive disease. These criteria can be used in clinical trials, in research and in clinical practice.
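The "at least three of four" rule stated in this abstract can be written as a small check. This is only an illustrative sketch: the function name and argument names are hypothetical, the cut-off values are copied from the abstract, and the measurement units are assumed to match those used in the study (the abstract does not state them).

```python
def is_clinically_inactive(creatine_kinase, cmas, mmt, phyglovas, min_met=3):
    """Return True when at least `min_met` of the four criteria are met.

    Cut-offs are taken from the abstract; units are assumed to be the
    ones used in the original study.
    """
    criteria = [
        creatine_kinase <= 150,  # creatine kinase <= 150
        cmas >= 48,              # Childhood Myositis Assessment Scale >= 48
        mmt >= 78,               # manual muscle testing >= 78
        phyglovas <= 0.2,        # physician global assessment <= 0.2
    ]
    # Booleans sum as 0/1, so this counts the criteria that hold.
    return sum(criteria) >= min_met
```

For example, a hypothetical patient with creatine kinase 120, CMAS 50, MMT 80 and PhyGloVAS 0.5 meets three of the four criteria and would be classified as inactive under this rule.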
Abstract:
The investigation of perceptual and cognitive functions with non-invasive brain imaging methods critically depends on the careful selection of stimuli for use in experiments. For example, it must be verified that any observed effects follow from the parameter of interest (e.g. semantic category) rather than other low-level physical features (e.g. luminance, or spectral properties). Otherwise, interpretation of results is confounded. Often, researchers circumvent this issue by including additional control conditions or tasks, both of which are flawed and also prolong experiments. Here, we present some new approaches for controlling classes of stimuli intended for use in cognitive neuroscience; however, these methods can be readily extrapolated to other applications and stimulus modalities. Our approach comprises two levels. The first level aims at equalizing individual stimuli in terms of their mean luminance. Each data point in the stimulus is adjusted to a standardized value based on a standard value across the stimulus battery. The second level analyzes two populations of stimuli along their spectral properties (i.e. spatial frequency) using a dissimilarity metric that equals the root mean square of the distance between two populations of objects as a function of spatial frequency along x- and y-dimensions of the image. Randomized permutations are used to obtain a minimal value between the populations to minimize, in a completely data-driven manner, the spectral differences between image sets. While another paper in this issue applies these methods in the case of acoustic stimuli (Aeschlimann et al., Brain Topogr 2008), we illustrate this approach here in detail for complex visual stimuli.
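The two levels described in this abstract might look roughly as follows. This is a loose sketch under stated assumptions, not the authors' implementation: images are assumed to be equally sized greyscale arrays, the "standard value" is assumed to be the battery-wide mean luminance, the spectral metric is assumed to compare mean 2-D amplitude spectra, and both function names are hypothetical. The permutation search over image sets is not shown.

```python
import numpy as np

def equalize_mean_luminance(images):
    """Shift each image so its mean equals the mean of the whole battery.

    images: array of shape (n_images, height, width).
    """
    images = np.asarray(images, dtype=float)
    target = images.mean()  # assumed "standard value" across the battery
    return images - images.mean(axis=(1, 2), keepdims=True) + target

def spectral_dissimilarity(set_a, set_b):
    """RMS difference between the mean amplitude spectra of two image sets."""
    spec_a = np.abs(np.fft.fft2(set_a, axes=(1, 2))).mean(axis=0)
    spec_b = np.abs(np.fft.fft2(set_b, axes=(1, 2))).mean(axis=0)
    # Root mean square over all (x, y) spatial-frequency bins.
    return float(np.sqrt(np.mean((spec_a - spec_b) ** 2)))
```

A permutation procedure would then repeatedly draw candidate image sets and keep the pairing with the smallest dissimilarity value.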
Multimodel inference and multimodel averaging in empirical modeling of occupational exposure levels.
Abstract:
Empirical modeling of exposure levels has been popular for identifying exposure determinants in occupational hygiene. Traditional data-driven methods used to choose a model on which to base inferences have typically not accounted for the uncertainty linked to the process of selecting the final model. Several new approaches propose making statistical inferences from a set of plausible models rather than from a single model regarded as 'best'. This paper introduces the multimodel averaging approach described in the monograph by Burnham and Anderson. In their approach, a set of plausible models is defined a priori by taking into account the sample size and previous knowledge of variables that influence exposure levels. The Akaike information criterion is then calculated to evaluate the relative support of the data for each model, expressed as Akaike weight, to be interpreted as the probability of the model being the best approximating model given the model set. The model weights can then be used to rank models, quantify the evidence favoring one over another, perform multimodel prediction, estimate the relative influence of the potential predictors and estimate multimodel-averaged effects of determinants. The whole approach is illustrated with the analysis of a data set of 1500 volatile organic compound exposure levels collected by the Institute for Work and Health (Lausanne, Switzerland) over 20 years, each concentration having been divided by the relevant Swiss occupational exposure limit and log-transformed before analysis. Multimodel inference represents a promising procedure for modeling exposure levels that incorporates the notion that several models can be supported by the data and allows model selection uncertainty, which is seldom mentioned in current practice, to be evaluated to a certain extent.
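The Akaike weights mentioned in this abstract follow the standard formula w_i = exp(-Δ_i/2) / Σ_j exp(-Δ_j/2), where Δ_i is the difference between model i's AIC and the smallest AIC in the set. The sketch below illustrates that calculation and a weighted average of per-model estimates; the function names are hypothetical and any AIC values passed in are the caller's own, not results from the paper.

```python
import math

def akaike_weights(aic_values):
    """Convert a list of AIC values into Akaike weights that sum to 1."""
    best = min(aic_values)
    # Relative likelihood of each model given its AIC difference to the best.
    relative = [math.exp(-0.5 * (aic - best)) for aic in aic_values]
    total = sum(relative)
    return [r / total for r in relative]

def model_averaged_estimate(estimates, weights):
    """Weighted average of per-model estimates of the same quantity."""
    return sum(w * e for w, e in zip(weights, estimates))
```

The ratio of two weights is the evidence ratio between the corresponding models, which is how the weights can be used to rank models and quantify support for one over another.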
Abstract:
Self-consciousness has mostly been approached by philosophical enquiry and not by empirical neuroscientific study, leading to an overabundance of diverging theories and an absence of data-driven theories. Using robotic technology, we achieved specific bodily conflicts and induced predictable changes in a fundamental aspect of self-consciousness by altering where healthy subjects experienced themselves to be (self-location). Functional magnetic resonance imaging revealed that temporo-parietal junction (TPJ) activity reflected experimental changes in self-location that also depended on the first-person perspective due to visuo-tactile and visuo-vestibular conflicts. Moreover, in a large lesion analysis study of neurological patients with a well-defined state of abnormal self-location, brain damage was also localized at TPJ, providing causal evidence that TPJ encodes self-location. Our findings reveal that multisensory integration at the TPJ reflects one of the most fundamental subjective feelings of humans: the feeling of being an entity localized at a position in space and perceiving the world from this position and perspective.
Abstract:
The subthalamic nucleus (STN) is a small, glutamatergic nucleus situated in the diencephalon. A critical component of normal motor function, it has become a key target for deep brain stimulation in the treatment of Parkinson's disease. Animal studies have demonstrated the existence of three functional sub-zones but these have never been shown conclusively in humans. In this work, a data-driven method with diffusion weighted imaging demonstrated that three distinct clusters exist within the human STN based on brain connectivity profiles. The STN was successfully sub-parcellated into these regions, demonstrating good correspondence with that described in the animal literature. The local connectivity of each sub-region supported the hypothesis of bilateral limbic, associative and motor regions occupying the anterior, mid and posterior portions of the nucleus respectively. This study is the first to achieve in-vivo, non-invasive anatomical parcellation of the human STN into three anatomical zones within normal diagnostic scan times, which has important future implications for deep brain stimulation surgery.
Abstract:
Uncertainty quantification of petroleum reservoir models is one of the present challenges, which is usually approached with a wide range of geostatistical tools linked with statistical optimisation and/or inference algorithms. Recent advances in machine learning offer a novel approach, alternative to geostatistics, to model the spatial distribution of petrophysical properties in complex reservoirs. The approach is based on semi-supervised learning, which handles both 'labelled' observed data and 'unlabelled' data, which have no measured value but describe prior knowledge and other relevant data in forms of manifolds in the input space where the modelled property is continuous. The proposed semi-supervised Support Vector Regression (SVR) model has demonstrated its capability to represent realistic geological features and describe stochastic variability and non-uniqueness of spatial properties. On the other hand, it is able to capture and preserve key spatial dependencies such as connectivity of high permeability geo-bodies, which is often difficult in contemporary petroleum reservoir studies. Semi-supervised SVR as a data-driven algorithm is designed to integrate various kinds of conditioning information and learn dependences from it. The semi-supervised SVR model is able to balance signal/noise levels and control the prior belief in available data. In this work, a stochastic semi-supervised SVR geomodel is integrated into a Bayesian framework to quantify uncertainty of reservoir production with multiple models fitted to past dynamic observations (production history). Multiple history matched models are obtained using stochastic sampling and/or MCMC-based inference algorithms, which evaluate the posterior probability distribution. Uncertainty of the model is described by the posterior probability of the model parameters that represent key geological properties: spatial correlation size, continuity strength, smoothness/variability of spatial property distribution.
The developed approach is illustrated with a fluvial reservoir case. The resulting probabilistic production forecasts are described by uncertainty envelopes. The paper compares the performance of the models with different combinations of unknown parameters and discusses sensitivity issues.
Abstract:
A ubiquitous assessment of swimming velocity (the main performance metric) is essential for the coach to provide tailored feedback to the trainee. We present a probabilistic framework for the data-driven estimation of the swimming velocity at every cycle using a low-cost wearable inertial measurement unit (IMU). The statistical validation of the method on 15 swimmers shows that an average relative error of 0.1 ± 9.6% and high correlation with the tethered reference system (r_{X,Y} = 0.91) is achievable. Besides, a simple tool to analyze the influence of sacrum kinematics on the performance is provided.
Abstract:
In 2000 the European Statistical Office published the guidelines for developing the Harmonized European Time Use Surveys system. Under such a unified framework, the first Time Use Survey of national scope was conducted in Spain during 2002–03. The aim of these surveys is to understand human behavior and the lifestyle of people. Time allocation data are of compositional nature in origin, that is, they are subject to non-negativity and constant-sum constraints. Thus, standard multivariate techniques cannot be directly applied to analyze them. The goal of this work is to identify homogeneous Spanish Autonomous Communities with regard to the typical activity pattern of their respective populations. To this end, a fuzzy clustering approach is followed. Rather than the hard partitioning of classical clustering, where objects are allocated to only a single group, fuzzy methods identify overlapping groups of objects by allowing them to belong to more than one group. Concretely, the probabilistic fuzzy c-means algorithm is conveniently adapted to deal with the Spanish Time Use Survey microdata. As a result, a map distinguishing Autonomous Communities with similar activity patterns is drawn.
Key words: Time use data; Fuzzy clustering; FCM; simplex space; Aitchison distance
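One common way to combine fuzzy c-means with compositional data is to apply the centred log-ratio (clr) transform first, since Euclidean distance between clr-transformed rows equals the Aitchison distance between the original compositions. The sketch below illustrates that idea only; it is not necessarily the adaptation used in this work. It assumes strictly positive parts (zeros would need separate treatment), and the function names, fuzziness value and iteration count are illustrative choices.

```python
import numpy as np

def clr(compositions):
    """Centred log-ratio transform; rows must be strictly positive."""
    logs = np.log(compositions)
    return logs - logs.mean(axis=1, keepdims=True)

def fuzzy_c_means(points, n_clusters, fuzziness=2.0, n_iter=100, seed=0):
    """Plain probabilistic fuzzy c-means with Euclidean distance."""
    rng = np.random.default_rng(seed)
    # Random initial memberships; each row sums to 1.
    memberships = rng.dirichlet(np.ones(n_clusters), size=len(points))
    for _ in range(n_iter):
        weights = memberships ** fuzziness
        # Cluster centres are membership-weighted means of the points.
        centres = (weights.T @ points) / weights.sum(axis=0)[:, None]
        dists = np.linalg.norm(points[:, None, :] - centres[None, :, :], axis=2)
        dists = np.fmax(dists, 1e-12)  # avoid division by zero
        inverse = dists ** (-2.0 / (fuzziness - 1.0))
        memberships = inverse / inverse.sum(axis=1, keepdims=True)
    return memberships, centres
```

Each row of the returned membership matrix sums to 1, so an object can belong partly to several groups rather than to exactly one.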
Abstract:
Gut microbiota has recently been proposed as a crucial environmental factor in the development of metabolic diseases such as obesity and type 2 diabetes, mainly due to its contribution to the modulation of several processes including host energy metabolism, gut epithelial permeability, gut peptide hormone secretion, and host inflammatory state. Since the symbiotic interaction between the gut microbiota and the host is essentially reflected in specific metabolic signatures, much expectation is placed on the application of metabolomic approaches to unveil the key mechanisms linking the gut microbiota composition and activity with disease development. The present review aims to summarize the gut microbial-host co-metabolites identified so far by targeted and untargeted metabolomic studies in humans, in association with impaired glucose homeostasis and/or obesity. An alteration of the co-metabolism of bile acids, branched fatty acids, choline, vitamins (i.e., niacin), purines, and phenolic compounds has been associated so far with the obese or diabese phenotype, with respect to healthy controls. Furthermore, anti-diabetic treatments such as metformin and sulfonylurea have been observed to modulate the gut microbiota or at least their metabolic profiles, thereby potentially affecting insulin resistance through indirect mechanisms still unknown. Despite the scarcity of the metabolomic studies currently available on the microbial-host crosstalk, the data-driven results largely confirmed findings independently obtained from in vitro and animal model studies, putting forward the mechanisms underlying the implication of a dysfunctional gut microbiota in the development of metabolic disorders.
Abstract:
Functional connectivity (FC) as measured by correlation between fMRI BOLD time courses of distinct brain regions has revealed meaningful organization of spontaneous fluctuations in the resting brain. However, an increasing amount of evidence points to non-stationarity of FC; i.e., FC dynamically changes over time reflecting additional and rich information about brain organization, but representing new challenges for analysis and interpretation. Here, we propose a data-driven approach based on principal component analysis (PCA) to reveal hidden patterns of coherent FC dynamics across multiple subjects. We demonstrate the feasibility and relevance of this new approach by examining the differences in dynamic FC between 13 healthy control subjects and 15 minimally disabled relapse-remitting multiple sclerosis patients. We estimated whole-brain dynamic FC of regionally-averaged BOLD activity using sliding time windows. We then used PCA to identify FC patterns, termed "eigenconnectivities", that reflect meaningful patterns in FC fluctuations. We then assessed the contributions of these patterns to the dynamic FC at any given time point and identified a network of connections centered on the default-mode network with altered contribution in patients. Our results complement traditional stationary analyses, and reveal novel insights into brain connectivity dynamics and their modulation in a neurodegenerative disease.
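The pipeline described in this abstract (sliding-window correlations followed by PCA) can be sketched for a single subject as follows. This is a simplified illustration, not the authors' implementation: the window length and step are free parameters chosen by the caller, the multi-subject concatenation is omitted, and the function names are hypothetical.

```python
import numpy as np

def sliding_window_fc(timecourses, window, step=1):
    """Windowed correlations between regions.

    timecourses: array of shape (n_timepoints, n_regions).
    Returns an array of shape (n_windows, n_region_pairs).
    """
    n_timepoints, n_regions = timecourses.shape
    upper = np.triu_indices(n_regions, k=1)  # unique region pairs
    rows = []
    for start in range(0, n_timepoints - window + 1, step):
        corr = np.corrcoef(timecourses[start:start + window], rowvar=False)
        rows.append(corr[upper])
    return np.array(rows)

def eigenconnectivities(dynamic_fc, n_components):
    """PCA of dynamic FC via SVD; returns patterns and their time courses."""
    # Remove the time-averaged (stationary) connectivity before PCA.
    centred = dynamic_fc - dynamic_fc.mean(axis=0, keepdims=True)
    _, _, vt = np.linalg.svd(centred, full_matrices=False)
    patterns = vt[:n_components]          # one connectivity pattern per row
    contributions = centred @ patterns.T  # weight of each pattern per window
    return patterns, contributions
```

The rows of `patterns` play the role of the "eigenconnectivities", and `contributions` gives how strongly each pattern is expressed in each time window.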
Abstract:
OBJECTIVE: To develop a provisional definition for the evaluation of response to therapy in juvenile dermatomyositis (DM) based on the Paediatric Rheumatology International Trials Organisation juvenile DM core set of variables. METHODS: Thirty-seven experienced pediatric rheumatologists from 27 countries achieved consensus on 128 difficult patient profiles as clinically improved or not improved using a stepwise approach (patient's rating, statistical analysis, definition selection). Using the physicians' consensus ratings as the "gold standard measure," chi-square, sensitivity, specificity, false-positive and false-negative rates, area under the receiver operating characteristic curve, and kappa agreement for candidate definitions of improvement were calculated. Definitions with kappa values >0.8 were multiplied by the face validity score to select the top definitions. RESULTS: The top definition of improvement was at least 20% improvement from baseline in 3 of 6 core set variables with no more than 1 of the remaining worsening by more than 30%, which cannot be muscle strength. The second-highest scoring definition was at least 20% improvement from baseline in 3 of 6 core set variables with no more than 2 of the remaining worsening by more than 25%, which cannot be muscle strength (definition P1 selected by the International Myositis Assessment and Clinical Studies group). The third is similar to the second with the maximum amount of worsening set to 30%. This indicates convergent validity of the process. CONCLUSION: We propose a provisional data-driven definition of improvement that reflects well the consensus rating of experienced clinicians, which incorporates clinically meaningful change in core set variables in a composite end point for the evaluation of global response to therapy in juvenile DM.