944 resultados para discovery driven analysis
Resumo:
Discovery Driven Analysis (DDA) is a common feature of OLAP technology to analyze structured data. In essence, DDA helps analysts to discover anomalous data by highlighting 'unexpected' values in the OLAP cube. By giving indications to the analyst on what dimensions to explore, DDA speeds up the process of discovering anomalies and their causes. However, Discovery Driven Analysis (and OLAP in general) is only applicable on structured data, such as records in databases. We propose a system to extend DDA technology to semi-structured text documents, that is, text documents with a few structured data. Our system pipeline consists of two stages: first, the text part of each document is structured around user specified dimensions, using semi-PLSA algorithm; then, we adapt DDA to these fully structured documents, thus enabling DDA on text documents. We present some applications of this system in OLAP analysis and show how scalability issues are solved. Results show that our system can handle reasonable datasets of documents, in real time, without any need for pre-computation.
Wavelet correlation between subjects: A time-scale data driven analysis for brain mapping using fMRI
Resumo:
Functional magnetic resonance imaging (fMRI) based on BOLD signal has been used to indirectly measure the local neural activity induced by cognitive tasks or stimulation. Most fMRI data analysis is carried out using the general linear model (GLM), a statistical approach which predicts the changes in the observed BOLD response based on an expected hemodynamic response function (HRF). In cases when the task is cognitively complex or in cases of diseases, variations in shape and/or delay may reduce the reliability of results. A novel exploratory method using fMRI data, which attempts to discriminate between neurophysiological signals induced by the stimulation protocol from artifacts or other confounding factors, is introduced in this paper. This new method is based on the fusion between correlation analysis and the discrete wavelet transform, to identify similarities in the time course of the BOLD signal in a group of volunteers. We illustrate the usefulness of this approach by analyzing fMRI data from normal subjects presented with standardized human face pictures expressing different degrees of sadness. The results show that the proposed wavelet correlation analysis has greater statistical power than conventional GLM or time domain intersubject correlation analysis. (C) 2010 Elsevier B.V. All rights reserved.
Resumo:
The hadronic light-by-light contribution to the anomalous magnetic moment of the muon was recently analyzed in the framework of dispersion theory, providing a systematic formalism where all input quantities are expressed in terms of on-shell form factors and scattering amplitudes that are in principle accessible in experiment. We briefly review the main ideas behind this framework and discuss the various experimental ingredients needed for the evaluation of one- and two-pion intermediate states. In particular, we identify processes that in the absence of data for doubly-virtual pion–photon interactions can help constrain parameters in the dispersive reconstruction of the relevant input quantities, the pion transition form factor and the helicity partial waves for γ⁎γ⁎→ππ.
Resumo:
Tourist accommodation expenditure is a widely investigated topic as it represents a major contribution to the total tourist expenditure. The identification of the determinant factors is commonly based on supply-driven applications while little research has been made on important travel characteristics. This paper proposes a demand-driven analysis of tourist accommodation price by focusing on data generated from room bookings. The investigation focuses on modeling the relationship between key travel characteristics and the price paid to book the accommodation. To accommodate the distributional characteristics of the expenditure variable, the analysis is based on the estimation of a quantile regression model. The findings support the econometric approach used and enable the elaboration of relevant managerial implications.
Resumo:
Real-world objects are often endowed with features that violate Gestalt principles. In our experiment, we examined the neural correlates of binding under conflict conditions in terms of the binding-by-synchronization hypothesis. We presented an ambiguous stimulus ("diamond illusion") to 12 observers. The display consisted of four oblique gratings drifting within circular apertures. Its interpretation fluctuates between bound ("diamond") and unbound (component gratings) percepts. To model a situation in which Gestalt-driven analysis contradicts the perceptually explicit bound interpretation, we modified the original diamond (OD) stimulus by speeding up one grating. Using OD and modified diamond (MD) stimuli, we managed to dissociate the neural correlates of Gestalt-related (OD vs. MD) and perception-related (bound vs. unbound) factors. Their interaction was expected to reveal the neural networks synchronized specifically in the conflict situation. The synchronization topography of EEG was analyzed with the multivariate S-estimator technique. We found that good Gestalt (OD vs. MD) was associated with a higher posterior synchronization in the beta-gamma band. The effect of perception manifested itself as reciprocal modulations over the posterior and anterior regions (theta/beta-gamma bands). Specifically, higher posterior and lower anterior synchronization supported the bound percept, and the opposite was true for the unbound percept. The interaction showed that binding under challenging perceptual conditions is sustained by enhanced parietal synchronization. We argue that this distributed pattern of synchronization relates to the processes of multistage integration ranging from early grouping operations in the visual areas to maintaining representations in the frontal networks of sensory memory.
Resumo:
Affiliation: Département de biochimie, Faculté de médecine, Université de Montréal
Resumo:
Primate multisensory object perception involves distributed brain regions. To investigate the network character of these regions of the human brain, we applied data-driven group spatial independent component analysis (ICA) to a functional magnetic resonance imaging (fMRI) data set acquired during a passive audio-visual (AV) experiment with common object stimuli. We labeled three group-level independent component (IC) maps as auditory (A), visual (V), and AV, based on their spatial layouts and activation time courses. The overlap between these IC maps served as definition of a distributed network of multisensory candidate regions including superior temporal, ventral occipito-temporal, posterior parietal and prefrontal regions. During an independent second fMRI experiment, we explicitly tested their involvement in AV integration. Activations in nine out of these twelve regions met the max-criterion (A < AV > V) for multisensory integration. Comparison of this approach with a general linear model-based region-of-interest definition revealed its complementary value for multisensory neuroimaging. In conclusion, we estimated functional networks of uni- and multisensory functional connectivity from one dataset and validated their functional roles in an independent dataset. These findings demonstrate the particular value of ICA for multisensory neuroimaging research and using independent datasets to test hypotheses generated from a data-driven analysis.
Resumo:
In today's internet world, web browsers are an integral part of our day-to-day activities. Therefore, web browser security is a serious concern for all of us. Browsers can be breached in different ways. Because of the over privileged access, extensions are responsible for many security issues. Browser vendors try to keep safe extensions in their official extension galleries. However, their security control measures are not always effective and adequate. The distribution of unsafe extensions through different social engineering techniques is also a very common practice. Therefore, before installation, users should thoroughly analyze the security of browser extensions. Extensions are not only available for desktop browsers, but many mobile browsers, for example, Firefox for Android and UC browser for Android, are also furnished with extension features. Mobile devices have various resource constraints in terms of computational capabilities, power, network bandwidth, etc. Hence, conventional extension security analysis techniques cannot be efficiently used by end users to examine mobile browser extension security issues. To overcome the inadequacies of the existing approaches, we propose CLOUBEX, a CLOUd-based security analysis framework for both desktop and mobile Browser EXtensions. This framework uses a client-server architecture model. In this framework, compute-intensive security analysis tasks are generally executed in a high-speed computing server hosted in a cloud environment. CLOUBEX is also enriched with a number of essential features, such as client-side analysis, requirements-driven analysis, high performance, and dynamic decision making. At present, the Firefox extension ecosystem is most susceptible to different security attacks. Hence, the framework is implemented for the security analysis of the Firefox desktop and Firefox for Android mobile browser extensions. A static taint analysis is used to identify malicious information flows in the Firefox extensions. In CLOUBEX, there are three analysis modes. A dynamic decision making algorithm assists us to select the best option based on some important parameters, such as the processing speed of a client device and network connection speed. Using the best analysis mode, performance and power consumption are improved significantly. In the future, this framework can be leveraged for the security analysis of other desktop and mobile browser extensions, too.
Resumo:
Sequences of timestamped events are currently being generated across nearly every domain of data analytics, from e-commerce web logging to electronic health records used by doctors and medical researchers. Every day, this data type is reviewed by humans who apply statistical tests, hoping to learn everything they can about how these processes work, why they break, and how they can be improved upon. To further uncover how these processes work the way they do, researchers often compare two groups, or cohorts, of event sequences to find the differences and similarities between outcomes and processes. With temporal event sequence data, this task is complex because of the variety of ways single events and sequences of events can differ between the two cohorts of records: the structure of the event sequences (e.g., event order, co-occurring events, or frequencies of events), the attributes about the events and records (e.g., gender of a patient), or metrics about the timestamps themselves (e.g., duration of an event). Running statistical tests to cover all these cases and determining which results are significant becomes cumbersome. Current visual analytics tools for comparing groups of event sequences emphasize a purely statistical or purely visual approach for comparison. Visual analytics tools leverage humans' ability to easily see patterns and anomalies that they were not expecting, but is limited by uncertainty in findings. Statistical tools emphasize finding significant differences in the data, but often requires researchers have a concrete question and doesn't facilitate more general exploration of the data. Combining visual analytics tools with statistical methods leverages the benefits of both approaches for quicker and easier insight discovery. Integrating statistics into a visualization tool presents many challenges on the frontend (e.g., displaying the results of many different metrics concisely) and in the backend (e.g., scalability challenges with running various metrics on multi-dimensional data at once). I begin by exploring the problem of comparing cohorts of event sequences and understanding the questions that analysts commonly ask in this task. From there, I demonstrate that combining automated statistics with an interactive user interface amplifies the benefits of both types of tools, thereby enabling analysts to conduct quicker and easier data exploration, hypothesis generation, and insight discovery. The direct contributions of this dissertation are: (1) a taxonomy of metrics for comparing cohorts of temporal event sequences, (2) a statistical framework for exploratory data analysis with a method I refer to as high-volume hypothesis testing (HVHT), (3) a family of visualizations and guidelines for interaction techniques that are useful for understanding and parsing the results, and (4) a user study, five long-term case studies, and five short-term case studies which demonstrate the utility and impact of these methods in various domains: four in the medical domain, one in web log analysis, two in education, and one each in social networks, sports analytics, and security. My dissertation contributes an understanding of how cohorts of temporal event sequences are commonly compared and the difficulties associated with applying and parsing the results of these metrics. It also contributes a set of visualizations, algorithms, and design guidelines for balancing automated statistics with user-driven analysis to guide users to significant, distinguishing features between cohorts. This work opens avenues for future research in comparing two or more groups of temporal event sequences, opening traditional machine learning and data mining techniques to user interaction, and extending the principles found in this dissertation to data types beyond temporal event sequences.
Resumo:
La metodologia convenzionalmente adottata per l’analisi e progettazione di reti acquedottistiche è la Demand Driven (DD), ovvero un’analisi guidata dalla domanda idrica dell’utenza che viene imposta, dal punto di vista matematico, nelle equazioni del modello idraulico; la struttura di risoluzione matematica di tale approccio è costituita dalle sole equazioni di continuità e del moto, le cui incognite sono i carichi idraulici da garantire per soddisfare la richiesta idrica (dato noto). Purtroppo la DD non è in grado di simulare, in modo ottimale, scenari in cui le pressioni in rete risultano essere inferiori rispetto a quelle richieste per un servizio di erogazione corretto. Dunque, per una completa analisi di una rete di distribuzione idrica, è utile adottare la metodologia Pressure Driven (PD), che si basa sullo stesso modello matematico utilizzato per l’analisi DD al quale viene aggiunta un’equazione che lega la portata erogata alle utenze e la perdita idrica al carico idraulico disponibile in corrispondenza dei nodi. In questo caso un ulteriore dato incognito, oltre al carico idraulico nei nodi ed alla portata nelle condotte della rete, risulta essere la portata effettivamente prelevata ai nodi oppure persa (leakage) dalla rete. Il presente lavoro di tesi ha portato alla calibrazione del modello di un distretto della rete di distribuzione idrica della città di Rapallo. Le simulazioni sono state eseguite attraverso il codice InfoWorks applicando sia la Demand Driven Analysis che risulta funzionale per l’ottenimento del modello calibrato, sia la Pressure Driven Analysis che non agisce sul bilancio complessivo della rete, ma interviene localmente nelle zone in cui le problematiche di funzionamento riscontrate riguardano le pressioni
Resumo:
Resting state functional magnetic resonance imaging (fMRI) reveals a distinct network of correlated brain function representing a default mode state of the human brain The underlying structural basis of this functional connectivity pattern is still widely unexplored We combined fractional anisotropy measures of fiber tract integrity derived from diffusion tensor imaging (DTI) and resting state fMRI data obtained at 3 Tesla from 20 healthy elderly subjects (56 to 83 years of age) to determine white matter microstructure e 7 underlying default mode connectivity We hypothesized that the functional connectivity between the posterior cingulate and hippocampus from resting state fMRI data Would be associated with the white matter microstructure in the cingulate bundle and fiber tracts connecting posterior cingulate gyrus With lateral temporal lobes, medial temporal lobes, and precuneus This was demonstrated at the p<0001 level using a voxel-based multivariate analysis of covariance (MANCOVA) approach In addition, we used a data-driven technique of joint independent component analysis (ICA) that uncovers spatial pattern that are linked across modalities. It revealed a pattern of white matter tracts including cingulate bundle and associated fiber tracts resembling the findings from the hypothesis-driven analysis and was linked to the pattern of default mode network (DMN) connectivity in the resting state fMRI data Out findings support the notion that the functional connectivity between the posterior cingulate and hippocampus and the functional connectivity across the entire DMN is based oil distinct pattern of anatomical connectivity within the cerebral white matter (C) 2009 Elsevier Inc All rights reserved
Resumo:
A classical application of biosignal analysis has been the psychophysiological detection of deception, also known as the polygraph test, which is currently a part of standard practices of law enforcement agencies and several other institutions worldwide. Although its validity is far from gathering consensus, the underlying psychophysiological principles are still an interesting add-on for more informal applications. In this paper we present an experimental off-the-person hardware setup, propose a set of feature extraction criteria and provide a comparison of two classification approaches, targeting the detection of deception in the context of a role-playing interactive multimedia environment. Our work is primarily targeted at recreational use in the context of a science exhibition, where the main goal is to present basic concepts related with knowledge discovery, biosignal analysis and psychophysiology in an educational way, using techniques that are simple enough to be understood by children of different ages. Nonetheless, this setting will also allow us to build a significant data corpus, annotated with ground-truth information, and collected with non-intrusive sensors, enabling more advanced research on the topic. Experimental results have shown interesting findings and provided useful guidelines for future work. Pattern Recognition
Resumo:
Dissertação apresentada ao Instituto Superior de Contabilidade e Administração do Porto para obtenção do Grau de Mestre em Gestão das Organizações, Ramo Gestão de Empresas Orientador: Professor Doutor Eduardo Manuel Lopes de Sá e Silva Co-orientador: Mestre Maria de Fátima Mendes Monteiro