856 results for Data Driven Clustering
Abstract:
We present a novel nonparametric density estimator and a new data-driven bandwidth selection method with excellent properties. The approach is inspired by the principles of the generalized cross entropy method. The proposed density estimation procedure has numerous advantages over the traditional kernel density estimator methods. Firstly, for the first time in the nonparametric literature, the proposed estimator allows for a genuine incorporation of prior information in the density estimation procedure. Secondly, the approach provides the first data-driven bandwidth selection method that is guaranteed to provide a unique bandwidth for any data. Lastly, simulation examples suggest the proposed approach outperforms the current state of the art in nonparametric density estimation in terms of accuracy and reliability.
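For context, the traditional kernel density estimator that this abstract compares against can be sketched as follows. This is only a minimal illustration of a Gaussian kernel estimator with the normal-reference rule-of-thumb bandwidth, not the generalized cross entropy estimator proposed in the paper; the function names and the sample data are hypothetical.

```python
import math
import statistics


def rule_of_thumb_bandwidth(samples):
    """Normal-reference rule of thumb: h = 1.06 * sigma * n^(-1/5).

    Assumption: this is a common default for Gaussian kernels, not the
    bandwidth selector described in the abstract.
    """
    n = len(samples)
    sigma = statistics.stdev(samples)
    return 1.06 * sigma * n ** (-1 / 5)


def gaussian_kde(samples, bandwidth):
    """Return a function x -> estimated density at x."""
    n = len(samples)
    norm = 1.0 / (n * bandwidth * math.sqrt(2 * math.pi))

    def density(x):
        # Sum one Gaussian bump per observation, centred on that observation.
        return norm * sum(
            math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples
        )

    return density


# Hypothetical one-dimensional sample.
samples = [1.2, 1.9, 2.1, 2.4, 3.0, 3.3, 4.8]
h = rule_of_thumb_bandwidth(samples)
density = gaussian_kde(samples, h)
```

The estimate depends strongly on `h`: a smaller bandwidth gives a spikier curve, a larger one oversmooths, which is why data-driven bandwidth selection matters.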
Abstract:
In this and a preceding paper, we provide an introduction to the Fujitsu VPP range of vector-parallel supercomputers and to some of the computational chemistry software available for the VPP. Here, we consider the implementation and performance of seven popular chemistry application packages. The codes discussed range from classical molecular dynamics to semiempirical and ab initio quantum chemistry. All have evolved from sequential codes, and have typically been parallelised using a replicated data approach. As such they are well suited to the large-memory/fast-processor architecture of the VPP. For one code, CASTEP, a distributed-memory data-driven parallelisation scheme is presented. (C) 2000 Published by Elsevier Science B.V. All rights reserved.
Abstract:
Functional magnetic resonance imaging (FMRI) analysis methods can be broadly divided into hypothesis-driven and data-driven approaches. The former are used in the majority of FMRI studies, where a specific haemodynamic response is modelled using knowledge of event timing during the scan and is tested against the data using a t test or a correlation analysis. These approaches often lack the flexibility to account for variability in haemodynamic response across subjects and brain regions, which is of specific interest in high-temporal-resolution event-related studies. Current data-driven approaches attempt to identify components of interest in the data, but do not use any physiological information to discriminate these components. Here we present a hypothesis-driven approach that is an extension of Friman's maximum correlation modelling method (NeuroImage 16, 454-464, 2002), specifically focused on discriminating the temporal characteristics of event-related haemodynamic activity. Test analyses, on both simulated and real event-related FMRI data, will be presented.
Abstract:
Simultaneous acquisition of electroencephalography (EEG) and functional magnetic resonance imaging (fMRI) aims to disentangle the description of brain processes by exploiting the advantages of each technique. Most studies in this field focus on exploring the relationships between fMRI signals and the power spectrum at specific frequency bands (alpha, beta, etc.). On the other hand, brain mapping of EEG signals (e.g., interictal spikes in epileptic patients) usually assumes a haemodynamic response function for a parametric analysis applying the GLM, as a rough approximation. The integration of the information provided by the high spatial resolution of MR images and the high temporal resolution of EEG may be improved by relating them through transfer functions, which allows the identification of neurally driven areas without strong assumptions about haemodynamic response shapes or the homogeneity of brain haemodynamics. The difference in sampling rates is the first obstacle to a full integration of EEG and fMRI information. Moreover, a parametric specification of a function representing the commonalities of both signals is not established. In this study, we introduce a new data-driven method for estimating the transfer function from the EEG signal to the fMRI signal at the EEG sampling rate. This approach avoids subsampling the EEG to fMRI time resolution and naturally provides a test for the predictive power of EEG over BOLD signal fluctuations, in a well-established statistical framework. We illustrate this concept in resting state (eyes closed) and visual simultaneous fMRI-EEG experiments. The results indicate that it is possible to predict the BOLD fluctuations in occipital cortex by using EEG measurements. (C) 2010 Elsevier Inc. All rights reserved.
Abstract:
Resting state functional magnetic resonance imaging (fMRI) reveals a distinct network of correlated brain function representing a default mode state of the human brain. The underlying structural basis of this functional connectivity pattern is still widely unexplored. We combined fractional anisotropy measures of fiber tract integrity derived from diffusion tensor imaging (DTI) and resting state fMRI data obtained at 3 Tesla from 20 healthy elderly subjects (56 to 83 years of age) to determine the white matter microstructure underlying default mode connectivity. We hypothesized that the functional connectivity between the posterior cingulate and hippocampus from resting state fMRI data would be associated with the white matter microstructure in the cingulate bundle and fiber tracts connecting posterior cingulate gyrus with lateral temporal lobes, medial temporal lobes, and precuneus. This was demonstrated at the p<0.001 level using a voxel-based multivariate analysis of covariance (MANCOVA) approach. In addition, we used a data-driven technique of joint independent component analysis (ICA) that uncovers spatial patterns that are linked across modalities. It revealed a pattern of white matter tracts including cingulate bundle and associated fiber tracts resembling the findings from the hypothesis-driven analysis and was linked to the pattern of default mode network (DMN) connectivity in the resting state fMRI data. Our findings support the notion that the functional connectivity between the posterior cingulate and hippocampus and the functional connectivity across the entire DMN is based on a distinct pattern of anatomical connectivity within the cerebral white matter. (C) 2009 Elsevier Inc. All rights reserved.
Abstract:
Objective: Although suicide is a leading cause of death worldwide, clinicians and researchers lack a data-driven method to assess the risk of suicide attempts. This study reports the results of an analysis of a large cross-national epidemiologic survey database that estimates the 12-month prevalence of suicidal behaviors, identifies risk factors for suicide attempts, and combines these factors to create a risk index for 12-month suicide attempts separately for developed and developing countries. Method: Data come from the World Health Organization (WHO) World Mental Health (WMH) Surveys (conducted 2001-2007), in which 108,705 adults from 21 countries were interviewed using the WHO Composite International Diagnostic Interview. The survey assessed suicidal behaviors and potential risk factors across multiple domains, including socio-demographic characteristics, parent psychopathology, childhood adversities, DSM-IV disorders, and history of suicidal behavior. Results: Twelve-month prevalence estimates of suicide ideation, plans, and attempts are 2.0%, 0.6%, and 0.3%, respectively, for developed countries and 2.1%, 0.7%, and 0.4%, respectively, for developing countries. Risk factors for suicidal behaviors in both developed and developing countries include female sex, younger age, lower education and income, unmarried status, unemployment, parent psychopathology, childhood adversities, and presence of diverse 12-month DSM-IV mental disorders. Combining risk factors from multiple domains produced risk indices that accurately predicted 12-month suicide attempts in both developed and developing countries (area under the receiver operating characteristic curve = 0.74-0.80). Conclusions: Suicidal behaviors occur at similar rates in both developed and developing countries. Risk indices assessing multiple domains can predict suicide attempts with fairly good accuracy and may be useful in aiding clinicians in the prediction of these behaviors. 
J Clin Psychiatry 2010;71(12):1617-1628 (C) Copyright 2010 Physicians Postgraduate Press, Inc.
Abstract:
Workflows have been successfully applied to express the decomposition of complex scientific applications. This has motivated many initiatives to develop scientific workflow tools. However, the existing tools still lack adequate support for important aspects, namely decoupling the enactment engine from the specification of workflow tasks, decentralizing the control of workflow activities, and allowing their tasks to run autonomously in distributed infrastructures, for instance on Clouds. Furthermore, many workflow tools only support the execution of Directed Acyclic Graphs (DAG) and lack the concept of iterations, in which activities are executed over millions of iterations during long periods of time, as well as support for dynamic workflow reconfiguration after a given iteration. We present the AWARD (Autonomic Workflow Activities Reconfigurable and Dynamic) model of computation, based on the Process Networks model, where the workflow activities (AWA) are autonomic processes with independent control that can run in parallel on distributed infrastructures, e.g. on Clouds. Each AWA executes a Task developed as a Java class that implements a generic interface, allowing end-users to code their applications without concern for low-level details. The data-driven coordination of AWA interactions is based on a shared tuple space that also enables support for dynamic workflow reconfiguration and monitoring of the execution of workflows. We describe how AWARD supports dynamic reconfiguration and discuss typical workflow reconfiguration scenarios. For evaluation we describe experimental results of AWARD workflow executions in several application scenarios, mapped to a small dedicated cluster and the Amazon (Elastic Computing EC2) Cloud.
Abstract:
Workflows have been successfully applied to express the decomposition of complex scientific applications. However, the existing tools still lack adequate support for important aspects, namely decoupling the enactment engine from the specification of tasks, decentralizing the control of workflow activities so that their tasks can run in distributed infrastructures, and supporting dynamic workflow reconfiguration. We present the AWARD (Autonomic Workflow Activities Reconfigurable and Dynamic) model of computation, based on Process Networks, where the workflow activities (AWA) are autonomic processes with independent control that can run in parallel on distributed infrastructures. Each AWA executes a task developed as a Java class with a generic interface, allowing end-users to code their applications without low-level details. The data-driven coordination of AWA interactions is based on a shared tuple space that also enables dynamic workflow reconfiguration. For evaluation we describe experimental results of AWARD workflow executions in several application scenarios, mapped to the Amazon (Elastic Computing EC2) Cloud.
Abstract:
This paper presents the characterization of high voltage (HV) electric power consumers based on a data clustering approach. The typical load profiles (TLP) are obtained by selecting the best partition of a power consumption database from a pool of data partitions produced by several clustering algorithms. The choice of the best partition is supported by several cluster validity indices. The proposed data-mining (DM) based methodology, which includes all steps of the process of knowledge discovery in databases (KDD), includes an application that preprocesses the initial database automatically, saving time and improving accuracy during this phase. These methods are intended to be used in a smart grid environment to extract useful knowledge about customers’ consumption behavior. To validate our approach, a case study with a real database of 185 HV consumers was used.
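The idea of choosing the best partition from a pool by means of a validity index can be sketched as follows. The abstract does not name the indices used, so the mean silhouette width here is an assumption chosen for illustration; the toy load profiles, the candidate partitions, and all function names are hypothetical.

```python
import math


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def mean_silhouette(points, labels):
    """Mean silhouette width; higher means tighter, better-separated clusters."""
    clusters = {}
    for idx, label in enumerate(labels):
        clusters.setdefault(label, []).append(idx)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")

    scores = []
    for i, label in enumerate(labels):
        own = [j for j in clusters[label] if j != i]
        if not own:
            scores.append(0.0)  # convention: a singleton cluster scores 0
            continue
        # a: mean distance to the other members of the same cluster.
        a = sum(euclidean(points[i], points[j]) for j in own) / len(own)
        # b: mean distance to the nearest other cluster.
        b = min(
            sum(euclidean(points[i], points[j]) for j in members) / len(members)
            for other, members in clusters.items()
            if other != label
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return sum(scores) / len(scores)


def best_partition(points, candidates):
    """Return the name of the candidate partition with the highest index."""
    return max(candidates, key=lambda name: mean_silhouette(points, candidates[name]))


# Hypothetical normalised load profiles (four time slots per consumer).
profiles = [
    (0.20, 0.30, 0.90, 0.40), (0.25, 0.35, 0.85, 0.45),
    (0.90, 0.80, 0.30, 0.20), (0.85, 0.75, 0.35, 0.25),
    (0.50, 0.50, 0.50, 0.50), (0.55, 0.50, 0.45, 0.50),
]
# Label lists standing in for the output of different clustering algorithms.
candidates = {
    "three_groups": [0, 0, 1, 1, 2, 2],
    "two_groups": [0, 0, 1, 1, 1, 1],
    "alternating": [0, 1, 0, 1, 0, 1],
}
chosen = best_partition(profiles, candidates)
```

In practice several indices would be compared, since a single index can favour a particular cluster shape or number of clusters.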
Abstract:
In this paper, a rule-based automatic syllabifier for Danish is described using the Maximal Onset Principle. Prior success rates of rule-based methods applied to Portuguese and Catalan syllabification modules were the basis for this work. The system was implemented and tested using a very small set of rules. The results yielded word accuracy rates of 96.9% and 98.7%, contrary to our initial expectations, given that Danish is a language with a complex syllabic structure and thus difficult to handle with rules. Comparison with a data-driven syllabification system using artificial neural networks showed a higher accuracy rate for the former system.
Abstract:
This dissertation seeks to investigate and document what is currently being done in data journalism (data-driven journalism) in Portugal. Because this is a new field within journalism, interviews are used to understand how the editors of Portuguese newspapers define, characterize, use, and perceive the potential of this new category of digital journalism. Examples of news stories with data journalism characteristics that were cited by the interviewees are also analyzed. Another aim of the research was to contextualize the evolution and importance of technology in the emergence of data journalism. The dissertation thus intends to present the state of the art of data journalism in Portuguese general-interest daily newspapers, in order to identify current trends in the field and to leave a record for future work on the subject.
Abstract:
The paradigm for evaluating higher education was changed in 2005 to take into account the number of graduating students in addition to the number of admissions. This change puts pressure on academic institutions to improve student performance. One phenomenon that becomes apparent when analyzing this performance is that it is neither uniform nor constant over a student's time in the degree programme. These variations are not being considered in efforts to improve academic performance, which motivates detecting the different performance profiles and using that knowledge to improve the performance of academic institutions. This document describes work towards a methodology for detecting patterns of academic performance in a higher education degree programme. Data mining techniques, more precisely clustering algorithms, are used as the analysis tool. The case study is the student population of the undergraduate degree (licenciatura) in Informatics Engineering at FCT-UNL. Two student models are proposed as the basis for the analysis. One model analyzes students according to their performance in a single academic year; the second analyzes students according to their academic path through the programme, from admission until they graduate, transfer, or drop out. The analysis is carried out using clustering algorithms: hierarchical agglomerative clustering, k-means, SOM, and SNN, among others.
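As an illustration of one of the clustering algorithms listed above, the following is a minimal k-means sketch. The student features (average grade on a 0-20 scale and fraction of courses passed in a year) are hypothetical stand-ins, since the abstract does not specify the attributes of either student model, and the deterministic initialisation is a simplification chosen to keep the example reproducible.

```python
def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iterations=20):
    """Plain k-means. Assumption: centroids start at the first k points."""
    centroids = [points[i] for i in range(k)]
    for _ in range(iterations):
        # Assignment step: attach every point to its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its members.
        updated = []
        for c, members in enumerate(clusters):
            if members:
                updated.append(tuple(sum(dim) / len(members) for dim in zip(*members)))
            else:
                updated.append(centroids[c])  # keep an empty cluster's centroid
        if updated == centroids:
            break  # converged
        centroids = updated
    labels = [
        min(range(k), key=lambda c: squared_distance(p, centroids[c]))
        for p in points
    ]
    return centroids, labels


# Hypothetical (average grade, fraction of courses passed) per student-year.
students = [
    (9.5, 0.40), (10.0, 0.50), (16.0, 0.95),
    (15.5, 0.90), (10.5, 0.45), (17.0, 1.00),
]
centroids, labels = kmeans(students, k=2)
```

With real data the features would normally be rescaled first, because the grade dimension otherwise dominates the distance.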
Abstract:
Polysaccharides are gaining increasing attention as potential environmentally friendly and sustainable building blocks in many fields of the (bio)chemical industry. The microbial production of polysaccharides is envisioned as a promising path, since higher biomass growth rates are possible and therefore higher productivities may be achieved compared to plant or animal polysaccharide sources. This Ph.D. thesis focuses on the modeling and optimization of the production of a particular microbial polysaccharide, namely extracellular polysaccharides (EPS) produced by the bacterial strain Enterobacter A47. Enterobacter A47 was found to be a metabolically versatile organism in terms of its adaptability to complex media, notably capable of achieving high growth rates in media containing glycerol byproduct from the biodiesel industry. However, the industrial implementation of this production process is still hampered by a largely unoptimized process. Kinetic rates from the bioreactor operation are heavily dependent on operational parameters such as temperature, pH, stirring and aeration rate. The increase of culture broth viscosity is a common feature of this culture and has a major impact on the overall performance. This fact complicates the mathematical modeling of the process, limiting the possibility to understand, control and optimize productivity. In order to tackle this difficulty, data-driven mathematical methodologies such as Artificial Neural Networks can be employed to incorporate additional process data to complement the known mathematical description of the fermentation kinetics. In this Ph.D. thesis, we have adopted such a hybrid modeling framework, which enabled the incorporation of temperature, pH and viscosity effects on the fermentation kinetics in order to improve the dynamical modeling and optimization of the process.
A model-based optimization method was implemented that enabled the design of optimal bioreactor control strategies in the sense of EPS productivity maximization. It is also critical to understand EPS synthesis at the level of the bacterial metabolism, since the production of EPS is a tightly regulated process. Methods of pathway analysis provide a means to unravel the fundamental pathways and their controls in bioprocesses. In the present Ph.D. thesis, a novel methodology called Principal Elementary Mode Analysis (PEMA) was developed and implemented that made it possible to identify which cellular fluxes are activated under different conditions of temperature and pH. It is shown that differences in these two parameters affect the chemical composition of EPS, hence they are critical for the regulation of the product synthesis. In future studies, the knowledge provided by PEMA could foster the development of metabolically meaningful control strategies that target the EPS sugar content and other product quality parameters.
Abstract:
Customer lifetime value (LTV) enables using client characteristics, such as recency, frequency and monetary (RFM) value, to describe the value of a client through time in terms of profitability. We present the concept of LTV applied to telemarketing for improving the return-on-investment, using a recent (from 2008 to 2013) and real case study of bank campaigns to sell long-term deposits. The goal was to benefit from past contacts history to extract additional knowledge. A total of twelve LTV input variables were tested, under a forward selection method and using a realistic rolling windows scheme, highlighting the validity of five new LTV features. The results achieved by our LTV data-driven approach using neural networks allowed an improvement of up to 4 percentage points (pp) in the Lift cumulative curve for targeting the deposit subscribers when compared with a baseline model (with no history data). Explanatory knowledge was also extracted from the proposed model, revealing two highly relevant LTV features: the last result of the previous campaign to sell the same product and the frequency of past client successes. The obtained results are particularly valuable for contact center companies, which can improve predictive performance without even having to ask the companies they serve for more information.
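A rough sketch of how RFM-style features might be derived from a client's contact history is given below. The twelve LTV variables tested in the paper are not listed in the abstract, so the record layout (`Contact`), the feature definitions, and the example history are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Contact:
    """Hypothetical record of one past campaign contact."""
    day: int          # day index on which the contact took place
    subscribed: bool  # whether the client subscribed the deposit
    amount: float     # subscribed amount (0.0 when not subscribed)


def rfm_features(history, today):
    """Derive illustrative recency, frequency and monetary features."""
    ordered = sorted(history, key=lambda c: c.day)
    successes = [c for c in ordered if c.subscribed]
    return {
        # Recency: days since the last successful contact (None if none).
        "recency": today - successes[-1].day if successes else None,
        # Frequency: number of past successes.
        "frequency": len(successes),
        # Monetary: total amount subscribed so far.
        "monetary": sum(c.amount for c in successes),
        # Outcome of the most recent contact, successful or not.
        "last_result": ordered[-1].subscribed if ordered else None,
    }


history = [
    Contact(day=10, subscribed=True, amount=500.0),
    Contact(day=40, subscribed=False, amount=0.0),
    Contact(day=70, subscribed=True, amount=300.0),
]
features = rfm_features(history, today=100)
```

Features of this kind can be computed from data a contact center already holds, which matches the abstract's point that no extra information is needed from the client companies.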
Abstract:
Results of a search for decays of massive particles to fully hadronic final states are presented. This search uses 20.3 fb$^{-1}$ of data collected by the ATLAS detector in $\sqrt{s} = 8$ TeV proton-proton collisions at the LHC. Signatures based on high jet multiplicities without requirements on the missing transverse momentum are used to search for R-parity-violating supersymmetric gluino pair production with subsequent decays to quarks. The analysis is performed using a requirement on the number of jets, in combination with separate requirements on the number of b-tagged jets, as well as a topological observable formed from the scalar sum of the mass values of large-radius jets in the event. Results are interpreted in the context of all possible branching ratios of direct gluino decays to various quark flavors. No significant deviation is observed from the expected Standard Model backgrounds estimated using jet-counting as well as data-driven templates of the total-jet-mass spectra. Gluino pair decays to ten or more quarks via intermediate neutralinos are excluded for a gluino with mass $m_{\tilde{g}} < 1$ TeV for a neutralino mass $m_{\tilde{\chi}_1^0} = 500$ GeV. Direct gluino decays to six quarks are excluded for $m_{\tilde{g}} < 917$ GeV for light-flavor final states, and results for various flavor hypotheses are presented.