811 resultados para Data-driven analysis


Relevância:

100.00% 100.00%

Publicador:

Resumo:

Real world biological systems such as the human brain are inherently nonlinear and difficult to model. However, most of the previous studies have either employed linear models or parametric nonlinear models for investigating brain function. In this paper, a novel application of a nonlinear measure of phase synchronization based on recurrences, correlation between probabilities of recurrence (CPR), to study connectivity in the brain has been proposed. Being non-parametric, this method makes very few assumptions, making it suitable for investigating brain function in a data-driven way. CPR's utility with application to multichannel electroencephalographic (EEG) signals has been demonstrated. Brain connectivity obtained using thresholded CPR matrix of multichannel EEG signals showed clear differences in the number and pattern of connections in brain connectivity between (a) epileptic seizure and pre-seizure and (b) eyes open and eyes closed states. Corresponding brain headmaps provide meaningful insights about synchronization in the brain in those states. K-means clustering of connectivity parameters of CPR and linear correlation obtained from global epileptic seizure and pre-seizure showed significantly larger cluster centroid distances for CPR as opposed to linear correlation, thereby demonstrating the superior ability of CPR for discriminating seizure from pre-seizure. The headmap in the case of focal epilepsy clearly enables us to identify the focus of the epilepsy which provides certain diagnostic value. (C) 2013 Elsevier Ltd. All rights reserved.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Nonlinear multivariate statistical techniques on fast computers offer the potential to capture more of the dynamics of the high dimensional, noisy systems underlying financial markets than traditional models, while making fewer restrictive assumptions. This thesis presents a collection of practical techniques to address important estimation and confidence issues for Radial Basis Function networks arising from such a data driven approach, including efficient methods for parameter estimation and pruning, a pointwise prediction error estimator, and a methodology for controlling the "data mining'' problem. Novel applications in the finance area are described, including customized, adaptive option pricing and stock price prediction.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

It is estimated that the quantity of digital data being transferred, processed or stored at any one time currently stands at 4.4 zettabytes (4.4 × 2 70 bytes) and this figure is expected to have grown by a factor of 10 to 44 zettabytes by 2020. Exploiting this data is, and will remain, a significant challenge. At present there is the capacity to store 33% of digital data in existence at any one time; by 2020 this capacity is expected to fall to 15%. These statistics suggest that, in the era of Big Data, the identification of important, exploitable data will need to be done in a timely manner. Systems for the monitoring and analysis of data, e.g. stock markets, smart grids and sensor networks, can be made up of massive numbers of individual components. These components can be geographically distributed yet may interact with one another via continuous data streams, which in turn may affect the state of the sender or receiver. This introduces a dynamic causality, which further complicates the overall system by introducing a temporal constraint that is difficult to accommodate. Practical approaches to realising the system described above have led to a multiplicity of analysis techniques, each of which concentrates on specific characteristics of the system being analysed and treats these characteristics as the dominant component affecting the results being sought. The multiplicity of analysis techniques introduces another layer of heterogeneity, that is heterogeneity of approach, partitioning the field to the extent that results from one domain are difficult to exploit in another. The question is asked can a generic solution for the monitoring and analysis of data that: accommodates temporal constraints; bridges the gap between expert knowledge and raw data; and enables data to be effectively interpreted and exploited in a transparent manner, be identified? The approach proposed in this dissertation acquires, analyses and processes data in a manner that is free of the constraints of any particular analysis technique, while at the same time facilitating these techniques where appropriate. Constraints are applied by defining a workflow based on the production, interpretation and consumption of data. This supports the application of different analysis techniques on the same raw data without the danger of incorporating hidden bias that may exist. To illustrate and to realise this approach a software platform has been created that allows for the transparent analysis of data, combining analysis techniques with a maintainable record of provenance so that independent third party analysis can be applied to verify any derived conclusions. In order to demonstrate these concepts, a complex real world example involving the near real-time capturing and analysis of neurophysiological data from a neonatal intensive care unit (NICU) was chosen. A system was engineered to gather raw data, analyse that data using different analysis techniques, uncover information, incorporate that information into the system and curate the evolution of the discovered knowledge. The application domain was chosen for three reasons: firstly because it is complex and no comprehensive solution exists; secondly, it requires tight interaction with domain experts, thus requiring the handling of subjective knowledge and inference; and thirdly, given the dearth of neurophysiologists, there is a real world need to provide a solution for this domain

Relevância:

100.00% 100.00%

Publicador:

Resumo:

The detection of dense harmful algal blooms (HABs) by satellite remote sensing is usually based on analysis of chlorophyll-a as a proxy. However, this approach does not provide information about the potential harm of bloom, nor can it identify the dominant species. The developed HAB risk classification method employs a fully automatic data-driven approach to identify key characteristics of water leaving radiances and derived quantities, and to classify pixels into “harmful”, “non-harmful” and “no bloom” categories using Linear Discriminant Analysis (LDA). Discrimination accuracy is increased through the use of spectral ratios of water leaving radiances, absorption and backscattering. To reduce the false alarm rate the data that cannot be reliably classified are automatically labelled as “unknown”. This method can be trained on different HAB species or extended to new sensors and then applied to generate independent HAB risk maps; these can be fused with other sensors to fill gaps or improve spatial or temporal resolution. The HAB discrimination technique has obtained accurate results on MODIS and MERIS data, correctly identifying 89% of Phaeocystis globosa HABs in the southern North Sea and 88% of Karenia mikimotoi blooms in the Western English Channel. A linear transformation of the ocean colour discriminants is used to estimate harmful cell counts, demonstrating greater accuracy than if based on chlorophyll-a; this will facilitate its integration into a HAB early warning system operating in the southern North Sea.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Background: Long working hours might increase the risk of cardiovascular disease, but prospective evidence is scarce, imprecise, and mostly limited to coronary heart disease. We aimed to assess long working hours as a risk factor for incident coronary heart disease and stroke. 

Methods We identified published studies through a systematic review of PubMed and Embase from inception to Aug 20, 2014. We obtained unpublished data for 20 cohort studies from the Individual-Participant-Data Meta-analysis in Working Populations (IPD-Work) Consortium and open-access data archives. We used cumulative random-effects meta-analysis to combine effect estimates from published and unpublished data. 

Findings We included 25 studies from 24 cohorts in Europe, the USA, and Australia. The meta-analysis of coronary heart disease comprised data for 603 838 men and women who were free from coronary heart disease at baseline; the meta-analysis of stroke comprised data for 528 908 men and women who were free from stroke at baseline. Follow-up for coronary heart disease was 5·1 million person-years (mean 8·5 years), in which 4768 events were recorded, and for stroke was 3·8 million person-years (mean 7·2 years), in which 1722 events were recorded. In cumulative meta-analysis adjusted for age, sex, and socioeconomic status, compared with standard hours (35-40 h per week), working long hours (≥55 h per week) was associated with an increase in risk of incident coronary heart disease (relative risk [RR] 1·13, 95% CI 1·02-1·26; p=0·02) and incident stroke (1·33, 1·11-1·61; p=0·002). The excess risk of stroke remained unchanged in analyses that addressed reverse causation, multivariable adjustments for other risk factors, and different methods of stroke ascertainment (range of RR estimates 1·30-1·42). We recorded a dose-response association for stroke, with RR estimates of 1·10 (95% CI 0·94-1·28; p=0·24) for 41-48 working hours, 1·27 (1·03-1·56; p=0·03) for 49-54 working hours, and 1·33 (1·11-1·61; p=0·002) for 55 working hours or more per week compared with standard working hours (ptrend<0·0001).

Interpretation Employees who work long hours have a higher risk of stroke than those working standard hours; the association with coronary heart disease is weaker. These findings suggest that more attention should be paid to the management of vascular risk factors in individuals who work long hours. 

Relevância:

100.00% 100.00%

Publicador:

Resumo:

This study examines the efficiency of search engine advertising strategies employed by firms. The research setting is the online retailing industry, which is characterized by extensive use of Web technologies and high competition for market share and profitability. For Internet retailers, search engines are increasingly serving as an information gateway for many decision-making tasks. In particular, Search engine advertising (SEA) has opened a new marketing channel for retailers to attract new customers and improve their performance. In addition to natural (organic) search marketing strategies, search engine advertisers compete for top advertisement slots provided by search brokers such as Google and Yahoo! through keyword auctions. The rationale being that greater visibility on a search engine during a keyword search will capture customers' interest in a business and its product or service offerings. Search engines account for most online activities today. Compared with the slow growth of traditional marketing channels, online search volumes continue to grow at a steady rate. According to the Search Engine Marketing Professional Organization, spending on search engine marketing by North American firms in 2008 was estimated at $13.5 billion. Despite the significant role SEA plays in Web retailing, scholarly research on the topic is limited. Prior studies in SEA have focused on search engine auction mechanism design. In contrast, research on the business value of SEA has been limited by the lack of empirical data on search advertising practices. Recent advances in search and retail technologies have created datarich environments that enable new research opportunities at the interface of marketing and information technology. This research uses extensive data from Web retailing and Google-based search advertising and evaluates Web retailers' use of resources, search advertising techniques, and other relevant factors that contribute to business performance across different metrics. The methods used include Data Envelopment Analysis (DEA), data mining, and multivariate statistics. This research contributes to empirical research by analyzing several Web retail firms in different industry sectors and product categories. One of the key findings is that the dynamics of sponsored search advertising vary between multi-channel and Web-only retailers. While the key performance metrics for multi-channel retailers include measures such as online sales, conversion rate (CR), c1ick-through-rate (CTR), and impressions, the key performance metrics for Web-only retailers focus on organic and sponsored ad ranks. These results provide a useful contribution to our organizational level understanding of search engine advertising strategies, both for multi-channel and Web-only retailers. These results also contribute to current knowledge in technology-driven marketing strategies and provide managers with a better understanding of sponsored search advertising and its impact on various performance metrics in Web retailing.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Affiliation: Département de biochimie, Faculté de médecine, Université de Montréal

Relevância:

100.00% 100.00%

Publicador:

Resumo:

This article explores how data envelopment analysis (DEA), along with a smoothed bootstrap method, can be used in applied analysis to obtain more reliable efficiency rankings for farms. The main focus is the smoothed homogeneous bootstrap procedure introduced by Simar and Wilson (1998) to implement statistical inference for the original efficiency point estimates. Two main model specifications, constant and variable returns to scale, are investigated along with various choices regarding data aggregation. The coefficient of separation (CoS), a statistic that indicates the degree of statistical differentiation within the sample, is used to demonstrate the findings. The CoS suggests a substantive dependency of the results on the methodology and assumptions employed. Accordingly, some observations are made on how to conduct DEA in order to get more reliable efficiency rankings, depending on the purpose for which they are to be used. In addition, attention is drawn to the ability of the SLICE MODEL, implemented in GAMS, to enable researchers to overcome the computational burdens of conducting DEA (with bootstrapping).

Relevância:

100.00% 100.00%

Publicador:

Resumo:

This paper examines two hydrochemical time-series derived from stream samples taken in the Upper Hafren catchment, Plynlimon, Wales. One time-series comprises data collected at 7-hour intervals over 22 months (Neal et al., submitted, this issue), while the other is based on weekly sampling over 20 years. A subset of determinands: aluminium, calcium, chloride, conductivity, dissolved organic carbon, iron, nitrate, pH, silicon and sulphate are examined within a framework of non-stationary time-series analysis to identify determinand trends, seasonality and short-term dynamics. The results demonstrate that both long-term and high-frequency monitoring provide valuable and unique insights into the hydrochemistry of a catchment. The long-term data allowed analysis of long-termtrends, demonstrating continued increases in DOC concentrations accompanied by declining SO4 concentrations within the stream, and provided new insights into the changing amplitude and phase of the seasonality of the determinands such as DOC and Al. Additionally, these data proved invaluable for placing the short-term variability demonstrated within the high-frequency data within context. The 7-hour data highlighted complex diurnal cycles for NO3, Ca and Fe with cycles displaying changes in phase and amplitude on a seasonal basis. The high-frequency data also demonstrated the need to consider the impact that the time of sample collection can have on the summary statistics of the data and also that sampling during the hours of darkness provides additional hydrochemical information for determinands which exhibit pronounced diurnal variability. Moving forward, this research demonstrates the need for both long-term and high-frequency monitoring to facilitate a full and accurate understanding of catchment hydrochemical dynamics.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Current methods for estimating event-related potentials (ERPs) assume stationarity of the signal. Empirical Mode Decomposition (EMD) is a data-driven decomposition technique that does not assume stationarity. We evaluated an EMD-based method for estimating the ERP. On simulated data, EMD substantially reduced background EEG while retaining the ERP. EMD-denoised single trials also estimated shape, amplitude, and latency of the ERP better than raw single trials. On experimental data, EMD-denoised trials revealed event-related differences between two conditions (condition A and B) more effectively than trials lowpass filtered at 40 Hz. EMD also revealed event-related differences on both condition A and condition B that were clearer and of longer duration than those revealed by low-pass filtering at 40 Hz. Thus, EMD-based denoising is a promising data-driven, nonstationary method for estimating ERPs and should be investigated further.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Background: In many experimental pipelines, clustering of multidimensional biological datasets is used to detect hidden structures in unlabelled input data. Taverna is a popular workflow management system that is used to design and execute scientific workflows and aid in silico experimentation. The availability of fast unsupervised methods for clustering and visualization in the Taverna platform is important to support a data-driven scientific discovery in complex and explorative bioinformatics applications. Results: This work presents a Taverna plugin, the Biological Data Interactive Clustering Explorer (BioDICE), that performs clustering of high-dimensional biological data and provides a nonlinear, topology preserving projection for the visualization of the input data and their similarities. The core algorithm in the BioDICE plugin is Fast Learning Self Organizing Map (FLSOM), which is an improved variant of the Self Organizing Map (SOM) algorithm. The plugin generates an interactive 2D map that allows the visual exploration of multidimensional data and the identification of groups of similar objects. The effectiveness of the plugin is demonstrated on a case study related to chemical compounds. Conclusions: The number and variety of available tools and its extensibility have made Taverna a popular choice for the development of scientific data workflows. This work presents a novel plugin, BioDICE, which adds a data-driven knowledge discovery component to Taverna. BioDICE provides an effective and powerful clustering tool, which can be adopted for the explorative analysis of biological datasets.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

The use of bivariate distributions plays a fundamental role in survival and reliability studies. In this paper, we consider a location scale model for bivariate survival times based on the proposal of a copula to model the dependence of bivariate survival data. For the proposed model, we consider inferential procedures based on maximum likelihood. Gains in efficiency from bivariate models are also examined in the censored data setting. For different parameter settings, sample sizes and censoring percentages, various simulation studies are performed and compared to the performance of the bivariate regression model for matched paired survival data. Sensitivity analysis methods such as local and total influence are presented and derived under three perturbation schemes. The martingale marginal and the deviance marginal residual measures are used to check the adequacy of the model. Furthermore, we propose a new measure which we call modified deviance component residual. The methodology in the paper is illustrated on a lifetime data set for kidney patients.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Objective: To develop a method for objective quantification of PD motor symptoms related to Off episodes and peak dose dyskinesias, using spiral data gathered by using a touch screen telemetry device. The aim was to objectively characterize predominant motor phenotypes (bradykinesia and dyskinesia), to help in automating the process of visual interpretation of movement anomalies in spirals as rated by movement disorder specialists. Background: A retrospective analysis was conducted on recordings from 65 patients with advanced idiopathic PD from nine different clinics in Sweden, recruited from January 2006 until August 2010. In addition to the patient group, 10 healthy elderly subjects were recruited. Upper limb movement data were collected using a touch screen telemetry device from home environments of the subjects. Measurements with the device were performed four times per day during week-long test periods. On each test occasion, the subjects were asked to trace pre-drawn Archimedean spirals, using the dominant hand. The pre-drawn spiral was shown on the screen of the device. The spiral test was repeated three times per test occasion and they were instructed to complete it within 10 seconds. The device had a sampling rate of 10Hz and measured both position and time-stamps (in milliseconds) of the pen tip. Methods: Four independent raters (FB, DH, AJ and DN) used a web interface that animated the spiral drawings and allowed them to observe different kinematic features during the drawing process and to rate task performance. Initially, a number of kinematic features were assessed including ‘impairment’, ‘speed’, ‘irregularity’ and ‘hesitation’ followed by marking the predominant motor phenotype on a 3-category scale: tremor, bradykinesia and/or choreatic dyskinesia. There were only 2 test occasions for which all the four raters either classified them as tremor or could not identify the motor phenotype. Therefore, the two main motor phenotype categories were bradykinesia and dyskinesia. ‘Impairment’ was rated on a scale from 0 (no impairment) to 10 (extremely severe) whereas ‘speed’, ‘irregularity’ and ‘hesitation’ were rated on a scale from 0 (normal) to 4 (extremely severe). The proposed data-driven method consisted of the following steps. Initially, 28 spatiotemporal features were extracted from the time series signals before being presented to a Multilayer Perceptron (MLP) classifier. The features were based on different kinematic quantities of spirals including radius, angle, speed and velocity with the aim of measuring the severity of involuntary symptoms and discriminate between PD-specific (bradykinesia) and/or treatment-induced symptoms (dyskinesia). A Principal Component Analysis was applied on the features to reduce their dimensions where 4 relevant principal components (PCs) were retained and used as inputs to the MLP classifier. Finally, the MLP classifier mapped these components to the corresponding visually assessed motor phenotype scores for automating the process of scoring the bradykinesia and dyskinesia in PD patients whilst they draw spirals using the touch screen device. For motor phenotype (bradykinesia vs. dyskinesia) classification, the stratified 10-fold cross validation technique was employed. Results: There were good agreements between the four raters when rating the individual kinematic features with intra-class correlation coefficient (ICC) of 0.88 for ‘impairment’, 0.74 for ‘speed’, 0.70 for ‘irregularity’, and moderate agreements when rating ‘hesitation’ with an ICC of 0.49. When assessing the two main motor phenotype categories (bradykinesia or dyskinesia) in animated spirals the agreements between the four raters ranged from fair to moderate. There were good correlations between mean ratings of the four raters on individual kinematic features and computed scores. The MLP classifier classified the motor phenotype that is bradykinesia or dyskinesia with an accuracy of 85% in relation to visual classifications of the four movement disorder specialists. The test-retest reliability of the four PCs across the three spiral test trials was good with Cronbach’s Alpha coefficients of 0.80, 0.82, 0.54 and 0.49, respectively. These results indicate that the computed scores are stable and consistent over time. Significant differences were found between the two groups (patients and healthy elderly subjects) in all the PCs, except for the PC3. Conclusions: The proposed method automatically assessed the severity of unwanted symptoms and could reasonably well discriminate between PD-specific and/or treatment-induced motor symptoms, in relation to visual assessments of movement disorder specialists. The objective assessments could provide a time-effect summary score that could be useful for improving decision-making during symptom evaluation of individualized treatment when the goal is to maximize functional On time for patients while minimizing their Off episodes and troublesome dyskinesias.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Instrumentation and automation plays a vital role to managing the water industry. These systems generate vast amounts of data that must be effectively managed in order to enable intelligent decision making. Time series data management software, commonly known as data historians are used for collecting and managing real-time (time series) information. More advanced software solutions provide a data infrastructure or utility wide Operations Data Management System (ODMS) that stores, manages, calculates, displays, shares, and integrates data from multiple disparate automation and business systems that are used daily in water utilities. These ODMS solutions are proven and have the ability to manage data from smart water meters to the collaboration of data across third party corporations. This paper focuses on practical, utility successes in the water industry where utility managers are leveraging instantaneous access to data from proven, commercial off-the-shelf ODMS solutions to enable better real-time decision making. Successes include saving $650,000 / year in water loss control, safeguarding water quality, saving millions of dollars in energy management and asset management. Immediate opportunities exist to integrate the research being done in academia with these ODMS solutions in the field and to leverage these successes to utilities around the world.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

In this research the 3DVAR data assimilation scheme is implemented in the numerical model DIVAST in order to optimize the performance of the numerical model by selecting an appropriate turbulence scheme and tuning its parameters. Two turbulence closure schemes: the Prandtl mixing length model and the two-equation k-ε model were incorporated into DIVAST and examined with respect to their universality of application, complexity of solutions, computational efficiency and numerical stability. A square harbour with one symmetrical entrance subject to tide-induced flows was selected to investigate the structure of turbulent flows. The experimental part of the research was conducted in a tidal basin. A significant advantage of such laboratory experiment is a fully controlled environment where domain setup and forcing are user-defined. The research shows that the Prandtl mixing length model and the two-equation k-ε model, with default parameterization predefined according to literature recommendations, overestimate eddy viscosity which in turn results in a significant underestimation of velocity magnitudes in the harbour. The data assimilation of the model-predicted velocity and laboratory observations significantly improves model predictions for both turbulence models by adjusting modelled flows in the harbour to match de-errored observations. 3DVAR allows also to identify and quantify shortcomings of the numerical model. Such comprehensive analysis gives an optimal solution based on which numerical model parameters can be estimated. The process of turbulence model optimization by reparameterization and tuning towards optimal state led to new constants that may be potentially applied to complex turbulent flows, such as rapidly developing flows or recirculating flows.