856 resultados para Data Driven Clustering
Resumo:
Detecting local differences between groups of connectomes is a great challenge in neuroimaging, because the large number of tests that have to be performed and the impact on multiplicity correction. Any available information should be exploited to increase the power of detecting true between-group effects. We present an adaptive strategy that exploits the data structure and the prior information concerning positive dependence between nodes and connections, without relying on strong assumptions. As a first step, we decompose the brain network, i.e., the connectome, into subnetworks and we apply a screening at the subnetwork level. The subnetworks are defined either according to prior knowledge or by applying a data driven algorithm. Given the results of the screening step, a filtering is performed to seek real differences at the node/connection level. The proposed strategy could be used to strongly control either the family-wise error rate or the false discovery rate. We show by means of different simulations the benefit of the proposed strategy, and we present a real application of comparing connectomes of preschool children and adolescents.
Resumo:
Uncertainty quantification of petroleum reservoir models is one of the present challenges, which is usually approached with a wide range of geostatistical tools linked with statistical optimisation or/and inference algorithms. The paper considers a data driven approach in modelling uncertainty in spatial predictions. Proposed semi-supervised Support Vector Regression (SVR) model has demonstrated its capability to represent realistic features and describe stochastic variability and non-uniqueness of spatial properties. It is able to capture and preserve key spatial dependencies such as connectivity, which is often difficult to achieve with two-point geostatistical models. Semi-supervised SVR is designed to integrate various kinds of conditioning data and learn dependences from them. A stochastic semi-supervised SVR model is integrated into a Bayesian framework to quantify uncertainty with multiple models fitted to dynamic observations. The developed approach is illustrated with a reservoir case study. The resulting probabilistic production forecasts are described by uncertainty envelopes.
Resumo:
This study investigated the spatial, spectral, temporal and functional proprieties of functional brain connections involved in the concurrent execution of unrelated visual perception and working memory tasks. Electroencephalography data was analysed using a novel data-driven approach assessing source coherence at the whole-brain level. Three connections in the beta-band (18-24 Hz) and one in the gamma-band (30-40 Hz) were modulated by dual-task performance. Beta-coherence increased within two dorsofrontal-occipital connections in dual-task conditions compared to the single-task condition, with the highest coherence seen during low working memory load trials. In contrast, beta-coherence in a prefrontal-occipital functional connection and gamma-coherence in an inferior frontal-occipitoparietal connection was not affected by the addition of the second task and only showed elevated coherence under high working memory load. Analysis of coherence as a function of time suggested that the dorsofrontal-occipital beta-connections were relevant to working memory maintenance, while the prefrontal-occipital beta-connection and the inferior frontal-occipitoparietal gamma-connection were involved in top-down control of concurrent visual processing. The fact that increased coherence in the gamma-connection, from low to high working memory load, was negatively correlated with faster reaction time on the perception task supports this interpretation. Together, these results demonstrate that dual-task demands trigger non-linear changes in functional interactions between frontal-executive and occipitoparietal-perceptual cortices.
Resumo:
Among the types of remote sensing acquisitions, optical images are certainly one of the most widely relied upon data sources for Earth observation. They provide detailed measurements of the electromagnetic radiation reflected or emitted by each pixel in the scene. Through a process termed supervised land-cover classification, this allows to automatically yet accurately distinguish objects at the surface of our planet. In this respect, when producing a land-cover map of the surveyed area, the availability of training examples representative of each thematic class is crucial for the success of the classification procedure. However, in real applications, due to several constraints on the sample collection process, labeled pixels are usually scarce. When analyzing an image for which those key samples are unavailable, a viable solution consists in resorting to the ground truth data of other previously acquired images. This option is attractive but several factors such as atmospheric, ground and acquisition conditions can cause radiometric differences between the images, hindering therefore the transfer of knowledge from one image to another. The goal of this Thesis is to supply remote sensing image analysts with suitable processing techniques to ensure a robust portability of the classification models across different images. The ultimate purpose is to map the land-cover classes over large spatial and temporal extents with minimal ground information. To overcome, or simply quantify, the observed shifts in the statistical distribution of the spectra of the materials, we study four approaches issued from the field of machine learning. First, we propose a strategy to intelligently sample the image of interest to collect the labels only in correspondence of the most useful pixels. This iterative routine is based on a constant evaluation of the pertinence to the new image of the initial training data actually belonging to a different image. Second, an approach to reduce the radiometric differences among the images by projecting the respective pixels in a common new data space is presented. We analyze a kernel-based feature extraction framework suited for such problems, showing that, after this relative normalization, the cross-image generalization abilities of a classifier are highly increased. Third, we test a new data-driven measure of distance between probability distributions to assess the distortions caused by differences in the acquisition geometry affecting series of multi-angle images. Also, we gauge the portability of classification models through the sequences. In both exercises, the efficacy of classic physically- and statistically-based normalization methods is discussed. Finally, we explore a new family of approaches based on sparse representations of the samples to reciprocally convert the data space of two images. The projection function bridging the images allows a synthesis of new pixels with more similar characteristics ultimately facilitating the land-cover mapping across images.
Resumo:
In this paper, we develop a data-driven methodology to characterize the likelihood of orographic precipitation enhancement using sequences of weather radar images and a digital elevation model (DEM). Geographical locations with topographic characteristics favorable to enforce repeatable and persistent orographic precipitation such as stationary cells, upslope rainfall enhancement, and repeated convective initiation are detected by analyzing the spatial distribution of a set of precipitation cells extracted from radar imagery. Topographic features such as terrain convexity and gradients computed from the DEM at multiple spatial scales as well as velocity fields estimated from sequences of weather radar images are used as explanatory factors to describe the occurrence of localized precipitation enhancement. The latter is represented as a binary process by defining a threshold on the number of cell occurrences at particular locations. Both two-class and one-class support vector machine classifiers are tested to separate the presumed orographic cells from the nonorographic ones in the space of contributing topographic and flow features. Site-based validation is carried out to estimate realistic generalization skills of the obtained spatial prediction models. Due to the high class separability, the decision function of the classifiers can be interpreted as a likelihood or susceptibility of orographic precipitation enhancement. The developed approach can serve as a basis for refining radar-based quantitative precipitation estimates and short-term forecasts or for generating stochastic precipitation ensembles conditioned on the local topography.
Resumo:
Résumé Suite aux recentes avancées technologiques, les archives d'images digitales ont connu une croissance qualitative et quantitative sans précédent. Malgré les énormes possibilités qu'elles offrent, ces avancées posent de nouvelles questions quant au traitement des masses de données saisies. Cette question est à la base de cette Thèse: les problèmes de traitement d'information digitale à très haute résolution spatiale et/ou spectrale y sont considérés en recourant à des approches d'apprentissage statistique, les méthodes à noyau. Cette Thèse étudie des problèmes de classification d'images, c'est à dire de catégorisation de pixels en un nombre réduit de classes refletant les propriétés spectrales et contextuelles des objets qu'elles représentent. L'accent est mis sur l'efficience des algorithmes, ainsi que sur leur simplicité, de manière à augmenter leur potentiel d'implementation pour les utilisateurs. De plus, le défi de cette Thèse est de rester proche des problèmes concrets des utilisateurs d'images satellite sans pour autant perdre de vue l'intéret des méthodes proposées pour le milieu du machine learning dont elles sont issues. En ce sens, ce travail joue la carte de la transdisciplinarité en maintenant un lien fort entre les deux sciences dans tous les développements proposés. Quatre modèles sont proposés: le premier répond au problème de la haute dimensionalité et de la redondance des données par un modèle optimisant les performances en classification en s'adaptant aux particularités de l'image. Ceci est rendu possible par un système de ranking des variables (les bandes) qui est optimisé en même temps que le modèle de base: ce faisant, seules les variables importantes pour résoudre le problème sont utilisées par le classifieur. Le manque d'information étiquétée et l'incertitude quant à sa pertinence pour le problème sont à la source des deux modèles suivants, basés respectivement sur l'apprentissage actif et les méthodes semi-supervisées: le premier permet d'améliorer la qualité d'un ensemble d'entraînement par interaction directe entre l'utilisateur et la machine, alors que le deuxième utilise les pixels non étiquetés pour améliorer la description des données disponibles et la robustesse du modèle. Enfin, le dernier modèle proposé considère la question plus théorique de la structure entre les outputs: l'intègration de cette source d'information, jusqu'à présent jamais considérée en télédétection, ouvre des nouveaux défis de recherche. Advanced kernel methods for remote sensing image classification Devis Tuia Institut de Géomatique et d'Analyse du Risque September 2009 Abstract The technical developments in recent years have brought the quantity and quality of digital information to an unprecedented level, as enormous archives of satellite images are available to the users. However, even if these advances open more and more possibilities in the use of digital imagery, they also rise several problems of storage and treatment. The latter is considered in this Thesis: the processing of very high spatial and spectral resolution images is treated with approaches based on data-driven algorithms relying on kernel methods. In particular, the problem of image classification, i.e. the categorization of the image's pixels into a reduced number of classes reflecting spectral and contextual properties, is studied through the different models presented. The accent is put on algorithmic efficiency and the simplicity of the approaches proposed, to avoid too complex models that would not be used by users. The major challenge of the Thesis is to remain close to concrete remote sensing problems, without losing the methodological interest from the machine learning viewpoint: in this sense, this work aims at building a bridge between the machine learning and remote sensing communities and all the models proposed have been developed keeping in mind the need for such a synergy. Four models are proposed: first, an adaptive model learning the relevant image features has been proposed to solve the problem of high dimensionality and collinearity of the image features. This model provides automatically an accurate classifier and a ranking of the relevance of the single features. The scarcity and unreliability of labeled. information were the common root of the second and third models proposed: when confronted to such problems, the user can either construct the labeled set iteratively by direct interaction with the machine or use the unlabeled data to increase robustness and quality of the description of data. Both solutions have been explored resulting into two methodological contributions, based respectively on active learning and semisupervised learning. Finally, the more theoretical issue of structured outputs has been considered in the last model, which, by integrating outputs similarity into a model, opens new challenges and opportunities for remote sensing image processing.
Resumo:
This paper presents multiple kernel learning (MKL) regression as an exploratory spatial data analysis and modelling tool. The MKL approach is introduced as an extension of support vector regression, where MKL uses dedicated kernels to divide a given task into sub-problems and to treat them separately in an effective way. It provides better interpretability to non-linear robust kernel regression at the cost of a more complex numerical optimization. In particular, we investigate the use of MKL as a tool that allows us to avoid using ad-hoc topographic indices as covariables in statistical models in complex terrains. Instead, MKL learns these relationships from the data in a non-parametric fashion. A study on data simulated from real terrain features confirms the ability of MKL to enhance the interpretability of data-driven models and to aid feature selection without degrading predictive performances. Here we examine the stability of the MKL algorithm with respect to the number of training data samples and to the presence of noise. The results of a real case study are also presented, where MKL is able to exploit a large set of terrain features computed at multiple spatial scales, when predicting mean wind speed in an Alpine region.
Resumo:
The recognition that colorectal cancer (CRC) is a heterogeneous disease in terms of clinical behaviour and response to therapy translates into an urgent need for robust molecular disease subclassifiers that can explain this heterogeneity beyond current parameters (MSI, KRAS, BRAF). Attempts to fill this gap are emerging. The Cancer Genome Atlas (TGCA) reported two main CRC groups, based on the incidence and spectrum of mutated genes, and another paper reported an EMT expression signature defined subgroup. We performed a prior free analysis of CRC heterogeneity on 1113 CRC gene expression profiles and confronted our findings to established molecular determinants and clinical, histopathological and survival data. Unsupervised clustering based on gene modules allowed us to distinguish at least five different gene expression CRC subtypes, which we call surface crypt-like, lower crypt-like, CIMP-H-like, mesenchymal and mixed. A gene set enrichment analysis combined with literature search of gene module members identified distinct biological motifs in different subtypes. The subtypes, which were not derived based on outcome, nonetheless showed differences in prognosis. Known gene copy number variations and mutations in key cancer-associated genes differed between subtypes, but the subtypes provided molecular information beyond that contained in these variables. Morphological features significantly differed between subtypes. The objective existence of the subtypes and their clinical and molecular characteristics were validated in an independent set of 720 CRC expression profiles. Our subtypes provide a novel perspective on the heterogeneity of CRC. The proposed subtypes should be further explored retrospectively on existing clinical trial datasets and, when sufficiently robust, be prospectively assessed for clinical relevance in terms of prognosis and treatment response predictive capacity. Original microarray data were uploaded to the ArrayExpress database (http://www.ebi.ac.uk/arrayexpress/) under Accession Nos E-MTAB-990 and E-MTAB-1026. © 2013 Swiss Institute of Bioinformatics. Journal of Pathology published by John Wiley & Sons Ltd on behalf of Pathological Society of Great Britain and Ireland.
Resumo:
The objective of this work was to evaluate the efficiency of EST‑SSR markers in the assessment of the genetic diversity of rubber tree genotypes (Hevea brasiliensis) and to verify the transferability of these markers for wild species of Hevea. Forty‑five rubber tree accessions from the Instituto Agronômico (Campinas, SP, Brazil) and six wild species were used. Information provided by modified Roger's genetic distance were used to analyze EST‑SSR data. UPGMA clustering divided the samples into two major groups with high genetic differentiation, while the software Structure distributed the 51 clones into eight groups. A parallel could be established between both clustering analyses. The 30 polymorphic EST‑SSRs showed from two to ten alleles and were efficient in amplifying the six wild species. Functional EST‑SSR microsatellites are efficient in evaluating the genetic diversity among rubber tree clones and can be used to translate the genetic differences among cultivars and to fingerprint closely related materials. The accessions from the Instituto Agronômico show high genetic diversity. The EST‑SSR markers, developed from Hevea brasiliensis, show transferability and are able to amplify other species of Hevea.
Resumo:
The hydrological and biogeochemical processes that operate in catchments influence the ecological quality of freshwater systems through delivery of fine sediment, nutrients and organic matter. Most models that seek to characterise the delivery of diffuse pollutants from land to water are reductionist. The multitude of processes that are parameterised in such models to ensure generic applicability make them complex and difficult to test on available data. Here, we outline an alternative - data-driven - inverse approach. We apply SCIMAP, a parsimonious risk based model that has an explicit treatment of hydrological connectivity. we take a Bayesian approach to the inverse problem of determining the risk that must be assigned to different land uses in a catchment in order to explain the spatial patterns of measured in-stream nutrient concentrations. We apply the model to identify the key sources of nitrogen (N) and phosphorus (P) diffuse pollution risk in eleven UK catchments covering a range of landscapes. The model results show that: 1) some land use generates a consistently high or low risk of diffuse nutrient pollution; but 2) the risks associated with different land uses vary both between catchments and between nutrients; and 3) that the dominant sources of P and N risk in the catchment are often a function of the spatial configuration of land uses. Taken on a case-by-case basis, this type of inverse approach may be used to help prioritise the focus of interventions to reduce diffuse pollution risk for freshwater ecosystems. (C) 2012 Elsevier B.V. All rights reserved.
Resumo:
Interactions between stimuli's acoustic features and experience-based internal models of the environment enable listeners to compensate for the disruptions in auditory streams that are regularly encountered in noisy environments. However, whether auditory gaps are filled in predictively or restored a posteriori remains unclear. The current lack of positive statistical evidence that internal models can actually shape brain activity as would real sounds precludes accepting predictive accounts of filling-in phenomenon. We investigated the neurophysiological effects of internal models by testing whether single-trial electrophysiological responses to omitted sounds in a rule-based sequence of tones with varying pitch could be decoded from the responses to real sounds and by analyzing the ERPs to the omissions with data-driven electrical neuroimaging methods. The decoding of the brain responses to different expected, but omitted, tones in both passive and active listening conditions was above chance based on the responses to the real sound in active listening conditions. Topographic ERP analyses and electrical source estimations revealed that, in the absence of any stimulation, experience-based internal models elicit an electrophysiological activity different from noise and that the temporal dynamics of this activity depend on attention. We further found that the expected change in pitch direction of omitted tones modulated the activity of left posterior temporal areas 140-200 msec after the onset of omissions. Collectively, our results indicate that, even in the absence of any stimulation, internal models modulate brain activity as do real sounds, indicating that auditory filling in can be accounted for by predictive activity.
Resumo:
BACKGROUND: Little is known about the long-term changes in the functioning of schizophrenia patients receiving maintenance therapy with olanzapine long-acting injection (LAI), and whether observed changes differ from those seen with oral olanzapine. METHODS: This study describes changes in the levels of functioning among outpatients with schizophrenia treated with olanzapine-LAI compared with oral olanzapine over 2 years. This was a secondary analysis of data from a multicenter, randomized, open-label, 2-year study comparing the long-term treatment effectiveness of monthly olanzapine-LAI (405 mg/4 weeks; n=264) with daily oral olanzapine (10 mg/day; n=260). Levels of functioning were assessed with the Heinrichs-Carpenter Quality of Life Scale. Functional status was also classified as 'good', 'moderate', or 'poor', using a previous data-driven approach. Changes in functional levels were assessed with McNemar's test and comparisons between olanzapine-LAI and oral olanzapine employed the Student's t-test. RESULTS: Over the 2-year study, the patients treated with olanzapine-LAI improved their level of functioning (per Quality of Life total score) from 64.0-70.8 (P<0.001). Patients on oral olanzapine also increased their level of functioning from 62.1-70.1 (P<0.001). At baseline, 19.2% of the olanzapine-LAI-treated patients had a 'good' level of functioning, which increased to 27.5% (P<0.05). The figures for oral olanzapine were 14.2% and 24.5%, respectively (P<0.001). Results did not significantly differ between olanzapine-LAI and oral olanzapine. CONCLUSION: In this 2-year, open-label, randomized study of olanzapine-LAI, outpatients with schizophrenia maintained or improved their favorable baseline level of functioning over time. Results did not significantly differ between olanzapine-LAI and oral olanzapine.
Resumo:
Robotic grasping has been studied increasingly for a few decades. While progress has been made in this field, robotic hands are still nowhere near the capability of human hands. However, in the past few years, the increase in computational power and the availability of commercial tactile sensors have made it easier to develop techniques that exploit the feedback from the hand itself, the sense of touch. The focus of this thesis lies in the use of this sense. The work described in this thesis focuses on robotic grasping from two different viewpoints: robotic systems and data-driven grasping. The robotic systems viewpoint describes a complete architecture for the act of grasping and, to a lesser extent, more general manipulation. Two central claims that the architecture was designed for are hardware independence and the use of sensors during grasping. These properties enables the use of multiple different robotic platforms within the architecture. Secondly, new data-driven methods are proposed that can be incorporated into the grasping process. The first of these methods is a novel way of learning grasp stability from the tactile and haptic feedback of the hand instead of analytically solving the stability from a set of known contacts between the hand and the object. By learning from the data directly, there is no need to know the properties of the hand, such as kinematics, enabling the method to be utilized with complex hands. The second novel method, probabilistic grasping, combines the fields of tactile exploration and grasp planning. By employing well-known statistical methods and pre-existing knowledge of an object, object properties, such as pose, can be inferred with related uncertainty. This uncertainty is utilized by a grasp planning process which plans for stable grasps under the inferred uncertainty.
Resumo:
Few people see both opportunities and threats coming from IT legacy in current world. On one hand, effective legacy management can bring substantial hard savings and smooth transition to the desired future state. On the other hand, its mismanagement contributes to serious operational business risks, as old systems are not as reliable as it is required by the business users. This thesis offers one perspective of dealing with IT legacy – through effective contract management, as a component towards achieving Procurement Excellence in IT, thus bridging IT delivery departments, IT procurement, business units, and suppliers. It developed a model for assessing the impact of improvements on contract management process and set of tools and advices with regards to analysis and improvement actions. The thesis conducted case study to present and justify the implementation of Lean Six Sigma in IT legacy contract management environment. Lean Six Sigma proved to be successful and this thesis presents and discusses all the steps necessary, and pitfalls to avoid, to achieve breakthrough improvement in IT contract management process performance. For the IT legacy contract management process two improvements require special attention and can be easily copied to any organization. First is the issue of diluted contract ownership that stops all the improvements, as people do not know who is responsible for performing those actions. Second is the contract management performance evaluation tool, which can be used for monitoring, identifying outlying contracts and opportunities for improvements in the process. The study resulted in a valuable insight on the benefits of applying Lean Six Sigma to improve IT legacy contract management, as well as on how Lean Six Sigma can be applied in IT environment. Managerial implications are discussed. It is concluded that the use of data-driven Lean Six Sigma methodology for improving the existing IT contract management processes is a significant addition to the existing best practices in contract management.
Resumo:
Vaikka liiketoimintatiedon hallintaa sekä johdon päätöksentekoa on tutkittu laajasti, näiden kahden käsitteen yhteisvaikutuksesta on olemassa hyvin rajallinen määrä tutkimustietoa. Tulevaisuudessa aiheen tärkeys korostuu, sillä olemassa olevan datan määrä kasvaa jatkuvasti. Yritykset tarvitsevat jatkossa yhä enemmän kyvykkyyksiä sekä resursseja, jotta sekä strukturoitua että strukturoimatonta tietoa voidaan hyödyntää lähteestä riippumatta. Nykyiset Business Intelligence -ratkaisut mahdollistavat tehokkaan liiketoimintatiedon hallinnan osana johdon päätöksentekoa. Aiemman kirjallisuuden pohjalta, tutkimuksen empiirinen osuus tunnistaa liiketoimintatiedon hyödyntämiseen liittyviä tekijöitä, jotka joko tukevat tai rajoittavat johdon päätöksentekoprosessia. Tutkimuksen teoreettinen osuus johdattaa lukijan tutkimusaiheeseen kirjallisuuskatsauksen avulla. Keskeisimmät tutkimukseen liittyvät käsitteet, kuten Business Intelligence ja johdon päätöksenteko, esitetään relevantin kirjallisuuden avulla – tämän lisäksi myös dataan liittyvät käsitteet analysoidaan tarkasti. Tutkimuksen empiirinen osuus rakentuu tutkimusteorian pohjalta. Tutkimuksen empiirisessä osuudessa paneudutaan tutkimusteemoihin käytännön esimerkein: kolmen tapaustutkimuksen avulla tutkitaan sekä kuvataan toisistaan irrallisia tapauksia. Jokainen tapaus kuvataan sekä analysoidaan teoriaan perustuvien väitteiden avulla – nämä väitteet ovat perusedellytyksiä menestyksekkäälle liiketoimintatiedon hyödyntämiseen perustuvalle päätöksenteolle. Tapaustutkimusten avulla alkuperäistä tutkimusongelmaa voidaan analysoida tarkasti huomioiden jo olemassa oleva tutkimustieto. Analyysin tulosten avulla myös yksittäisiä rajoitteita sekä mahdollistavia tekijöitä voidaan analysoida. Tulokset osoittavat, että rajoitteilla on vahvasti negatiivinen vaikutus päätöksentekoprosessin onnistumiseen. Toisaalta yritysjohto on tietoinen liiketoimintatiedon hallintaan liittyvistä positiivisista seurauksista, vaikka kaikkia mahdollisuuksia ei olisikaan hyödynnetty. Tutkimuksen merkittävin tulos esittelee viitekehyksen, jonka puitteissa johdon päätöksentekoprosesseja voidaan arvioida sekä analysoida. Despite the fact that the literature on Business Intelligence and managerial decision-making is extensive, relatively little effort has been made to research the relationship between them. This particular field of study has become important since the amount of data in the world is growing every second. Companies require capabilities and resources in order to utilize structured data and unstructured data from internal and external data sources. However, the present Business Intelligence technologies enable managers to utilize data effectively in decision-making. Based on the prior literature, the empirical part of the thesis identifies the enablers and constraints in computer-aided managerial decision-making process. In this thesis, the theoretical part provides a preliminary understanding about the research area through a literature review. The key concepts such as Business Intelligence and managerial decision-making are explored by reviewing the relevant literature. Additionally, different data sources as well as data forms are analyzed in further detail. All key concepts are taken into account when the empirical part is carried out. The empirical part obtains an understanding of the real world situation when it comes to the themes that were covered in the theoretical part. Three selected case companies are analyzed through those statements, which are considered as critical prerequisites for successful computer-aided managerial decision-making. The case study analysis, which is a part of the empirical part, enables the researcher to examine the relationship between Business Intelligence and managerial decision-making. Based on the findings of the case study analysis, the researcher identifies the enablers and constraints through the case study interviews. The findings indicate that the constraints have a highly negative influence on the decision-making process. In addition, the managers are aware of the positive implications that Business Intelligence has for decision-making, but all possibilities are not yet utilized. As a main result of this study, a data-driven framework for managerial decision-making is introduced. This framework can be used when the managerial decision-making processes are evaluated and analyzed.