942 results for Data Driven Modeling
Abstract:
An efficient data-based modeling algorithm for nonlinear system identification is introduced for radial basis function (RBF) neural networks with the aim of maximizing generalization capability based on the concept of leave-one-out (LOO) cross validation. Each of the RBF kernels has its own kernel width parameter, and the basic idea is to optimize the multiple pairs of regularization parameters and kernel widths, each of which is associated with a kernel, one at a time within the orthogonal forward regression (OFR) procedure. Thus, each OFR step consists of one model term selection based on the LOO mean square error (LOOMSE), followed by the optimization of the associated kernel width and regularization parameter, also based on the LOOMSE. Since the same LOOMSE is adopted for model selection, as in our previous state-of-the-art local regularization assisted orthogonal least squares (LROLS) algorithm, the proposed new OFR algorithm is also capable of producing a very sparse RBF model with excellent generalization performance. Unlike the LROLS algorithm, which requires an additional iterative loop to optimize the regularization parameters as well as an additional procedure to optimize the kernel width, the proposed new OFR algorithm optimizes both the kernel widths and regularization parameters within the single OFR procedure, and consequently the required computational complexity is dramatically reduced. Nonlinear system identification examples are included to demonstrate the effectiveness of this new approach in comparison to the well-known support vector machine and least absolute shrinkage and selection operator approaches, as well as the LROLS algorithm.
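The core idea can be illustrated with a greedy forward-selection sketch: each candidate RBF centre, together with a candidate kernel width and regularization parameter, is scored by the closed-form leave-one-out MSE of the regularized linear-in-parameters fit, and the best-scoring term is appended to the model. The sketch below is a simplified illustration of that idea in Python, not the authors' OFR implementation; the widths and regularizers are searched over small made-up grids rather than optimized, and the orthogonalization step is replaced by a direct ridge solve.

```python
import numpy as np

def loo_mse(Phi, y, lam):
    """Leave-one-out MSE of a ridge-regularized linear-in-parameters model,
    computed with the standard shortcut formula e_i / (1 - h_ii)."""
    A = Phi.T @ Phi + lam * np.eye(Phi.shape[1])
    H = Phi @ np.linalg.solve(A, Phi.T)          # hat (smoother) matrix
    resid = y - H @ y
    return np.mean((resid / (1.0 - np.diag(H))) ** 2)

def forward_select_rbf(X, y, n_terms=5, widths=(0.1, 0.3, 1.0, 3.0), lams=(1e-6, 1e-3, 1e-1)):
    """Greedy selection of RBF centres; each term gets its own width and regularizer."""
    chosen, cols = [], []
    for _ in range(n_terms):
        best = None
        for i, c in enumerate(X):                # every training sample is a candidate centre
            for w in widths:
                col = np.exp(-np.sum((X - c) ** 2, axis=1) / (2 * w ** 2))
                Phi = np.column_stack(cols + [col])
                for lam in lams:
                    score = loo_mse(Phi, y, lam)
                    if best is None or score < best[0]:
                        best = (score, i, w, lam, col)
        score, i, w, lam, col = best
        chosen.append((i, w, lam)); cols.append(col)
        print(f"term {len(chosen)}: centre {i}, width {w}, lambda {lam}, LOOMSE {score:.4g}")
    return chosen

# toy nonlinear identification example
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(200, 1))
y = np.sin(X[:, 0]) + 0.1 * rng.standard_normal(200)
forward_select_rbf(X, y)
```

In this toy version every candidate is re-scored from scratch, which is why the OFR formulation described in the abstract, with its incremental orthogonal decomposition, is so much cheaper in practice.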
Abstract:
Effective public policy to mitigate climate change footprints should build on data-driven analysis of firm-level strategies. This article’s conceptual approach augments the resource-based view (RBV) of the firm and identifies investments in four firm-level resource domains (Governance, Information management, Systems, and Technology [GISTe]) to develop capabilities in climate change impact mitigation. The authors denote the resulting framework as the GISTe model, which frames their analysis and public policy recommendations. This research uses the 2008 Carbon Disclosure Project (CDP) database, with high-quality information on firm-level climate change strategies for 552 companies from North America and Europe. In contrast to the widely accepted view that European firms perform better than North American ones, the authors find a more nuanced picture. Many firms, whether European or North American, do not just “talk” about climate change impact mitigation, but actually “walk the talk.” European firms appear to be better than their North American counterparts in “walk I,” denoting attention to governance, information management, and systems. But when it comes to “walk II,” meaning actual Technology-related investments, North American firms’ performance is equal or superior to that of the European companies. The authors formulate public policy recommendations to accelerate firm-level, sector-level, and cluster-level implementation of climate change strategies.
Abstract:
Empirical mode decomposition (EMD) is a data-driven method used to decompose data into oscillatory components. This paper examines to what extent the defined algorithm for EMD might be susceptible to the data format. Two key issues with EMD are its stability and computational speed. This paper shows that, for a given signal, there is no significant difference between results obtained with single (binary32) and double (binary64) floating-point precision. This implies that there is no benefit in increasing floating-point precision when performing EMD on devices optimised for the single floating-point format, such as graphics processing units (GPUs).
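A quick way to reproduce this kind of comparison is to run the same signal through an EMD implementation at both precisions and compare the resulting intrinsic mode functions (IMFs). The sketch below assumes the third-party PyEMD package (installed as EMD-signal); whether the library keeps the requested precision internally depends on its implementation, so this is an illustrative check rather than the experimental protocol of the paper.

```python
import numpy as np
from PyEMD import EMD  # pip install EMD-signal (assumed third-party implementation)

# toy test signal: two tones plus noise
t = np.linspace(0, 1, 2000)
rng = np.random.default_rng(0)
signal = (np.sin(2 * np.pi * 5 * t) + 0.5 * np.sin(2 * np.pi * 40 * t)
          + 0.05 * rng.standard_normal(t.size))

imfs64 = EMD().emd(signal.astype(np.float64))                       # double precision input
imfs32 = EMD().emd(signal.astype(np.float32)).astype(np.float64)    # single precision input

# compare the IMFs the two runs have in common
k = min(len(imfs64), len(imfs32))
for i in range(k):
    diff = np.max(np.abs(imfs64[i] - imfs32[i]))
    print(f"IMF {i}: max abs difference between binary64 and binary32 runs = {diff:.3e}")
print(f"number of IMFs: binary64 = {len(imfs64)}, binary32 = {len(imfs32)}")
```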
Abstract:
Empirical Mode Decomposition (EMD) is a data-driven technique for the extraction of oscillatory components from data. Although it was introduced over 15 years ago, its mathematical foundations are still missing, which also implies a lack of objective metrics for evaluating the decomposed set. The most common technique for assessing EMD results is visual inspection, which is very subjective. This article provides objective measures for assessing EMD results based on the original definition of oscillatory components.
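The article's own measures are not reproduced here, but a commonly quoted objective check in the EMD literature is the index of orthogonality between the extracted IMFs: near-zero cross-terms indicate a cleaner decomposition. A minimal numpy sketch, assuming the IMFs have already been computed by some EMD implementation:

```python
import numpy as np

def orthogonality_index(imfs):
    """Index of orthogonality: sum of cross-products between distinct IMFs,
    normalized by the energy of the reconstructed signal; ~0 means near-orthogonal."""
    imfs = np.asarray(imfs, dtype=float)
    total_energy = np.sum(np.sum(imfs, axis=0) ** 2)
    cross = 0.0
    for i in range(len(imfs)):
        for j in range(len(imfs)):
            if i != j:
                cross += np.sum(imfs[i] * imfs[j])
    return cross / total_energy

# example with two exactly orthogonal "IMFs"
t = np.linspace(0, 1, 1000, endpoint=False)
imfs = [np.sin(2 * np.pi * 3 * t), np.sin(2 * np.pi * 17 * t)]
print(orthogonality_index(imfs))  # close to 0 for orthogonal components
```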
Abstract:
We present a data-driven mathematical model of a key initiating step in platelet activation, a central process in the prevention of bleeding following injury. In vascular disease, this process is activated inappropriately and causes thrombosis, heart attacks and stroke. The collagen receptor GPVI is the primary trigger for platelet activation at sites of injury. Understanding the complex molecular mechanisms initiated by this receptor is important for the development of more effective antithrombotic medicines. In this work we developed a series of nonlinear ordinary differential equation models that are direct representations of biological hypotheses surrounding the initial steps in GPVI-stimulated signal transduction. At each stage, model simulations were compared to our own quantitative, high-temporal-resolution experimental data, which guided further experimental design, data collection and model refinement. Much is known about the linear forward reactions within platelet signalling pathways, but the roles of putative reverse reactions are poorly understood. An initial model, which includes a simple constitutively active phosphatase, was unable to explain the experimental data. Model revisions incorporating a complex pathway of interactions (and specifically the phosphatase TULA-2) provided a good description of the experimental data, both for observations of phosphorylation in samples from one donor and in those from a wider population. Our model was used to investigate the levels of proteins involved in regulating the pathway and the effect of the low GPVI levels that have been associated with disease. Results indicate a clear separation between healthy and GPVI-deficient states with respect to the signalling cascade dynamics associated with Syk tyrosine phosphorylation and activation. Our approach reveals the central importance of this negative feedback pathway, which results in the temporal regulation of a specific class of protein tyrosine phosphatases, in controlling the rate, and therefore extent, of GPVI-stimulated platelet activation.
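The kind of model described is a system of nonlinear ODEs in which receptor-driven kinase activation is counteracted by a phosphatase that is itself up-regulated by the pathway (negative feedback). The toy sketch below, written with scipy, is a generic illustration of that structure and is not the authors' GPVI/Syk/TULA-2 model; all species names and rate constants are made up for the example.

```python
import numpy as np
from scipy.integrate import solve_ivp

def toy_feedback_model(t, y, k_act, k_deph, k_ind, k_deg):
    """Toy receptor signalling model with phosphatase-mediated negative feedback.
    y[0]: fraction of phosphorylated (active) kinase, y[1]: active phosphatase."""
    kin_p, phos = y
    dkin_p = k_act * (1.0 - kin_p) - k_deph * phos * kin_p   # activation vs dephosphorylation
    dphos = k_ind * kin_p - k_deg * phos                     # kinase activity induces the phosphatase
    return [dkin_p, dphos]

params = (1.0, 5.0, 0.5, 0.1)            # hypothetical rate constants
sol = solve_ivp(toy_feedback_model, (0.0, 60.0), [0.0, 0.0],
                args=params, dense_output=True)

t = np.linspace(0, 60, 7)
kin_p, phos = sol.sol(t)
for ti, k, p in zip(t, kin_p, phos):
    print(f"t={ti:5.1f}  active kinase={k:.3f}  phosphatase={p:.3f}")
```

The delayed rise of the phosphatase produces a transient peak in kinase phosphorylation followed by a lower steady state, the qualitative behaviour of the temporal regulation described in the abstract.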
Abstract:
Nonlinear data assimilation is high on the agenda in all fields of the geosciences: with ever-increasing model resolution, the inclusion of more physical (and biological) processes, and more complex observation operators, the data-assimilation problem becomes more and more nonlinear. The suitability of particle filters for solving the nonlinear data assimilation problem in high-dimensional geophysical applications will be discussed. Several existing and new schemes will be presented, and it is shown that at least one of them, the Equivalent-Weights Particle Filter, does indeed beat the curse of dimensionality and provides a way forward for solving the problem of nonlinear data assimilation in high-dimensional systems.
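For context, the baseline against which such schemes are measured is the standard bootstrap (sequential importance resampling) particle filter, in which particles are propagated through the model, weighted by the observation likelihood, and resampled. A minimal numpy sketch for a one-dimensional toy state-space model follows; it is purely illustrative, and the Equivalent-Weights Particle Filter itself, which modifies the proposal step, is not shown.

```python
import numpy as np

rng = np.random.default_rng(1)

def model_step(x):
    """Toy nonlinear model (one-dimensional), with additive model error."""
    return 0.5 * x + 25.0 * x / (1.0 + x ** 2) + rng.standard_normal(x.shape)

n_particles, n_steps, obs_std = 500, 50, 1.0
x_true = 0.0
particles = rng.standard_normal(n_particles)

for t in range(n_steps):
    # propagate truth and particles through the (stochastic) model
    x_true = model_step(np.array([x_true]))[0]
    particles = model_step(particles)

    # observe the truth with noise, weight particles by the likelihood
    y = x_true + obs_std * rng.standard_normal()
    logw = -0.5 * ((y - particles) / obs_std) ** 2
    w = np.exp(logw - logw.max())
    w /= w.sum()

    # effective sample size diagnoses weight degeneracy (the curse of dimensionality)
    ess = 1.0 / np.sum(w ** 2)

    # multinomial resampling
    particles = rng.choice(particles, size=n_particles, p=w)

    if t % 10 == 0:
        print(f"step {t:2d}: truth={x_true:7.2f}  mean estimate={particles.mean():7.2f}  ESS={ess:6.1f}")
```

In high-dimensional geophysical systems the effective sample size of this basic filter collapses rapidly, which is precisely the degeneracy the schemes in the abstract are designed to avoid.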
Abstract:
Cloud streets are a common feature in the Amazon Basin. They form from the combination of the vertical trade wind stress and moist convection. Here, satellite imagery, data collected during the COBRA-PARA (Caxiuanã Observations in the Biosphere, River and Atmosphere of Pará) field campaign, and high-resolution modeling are used to understand the streets' formation and behavior. The observations show that the streets have an aspect ratio of about 3.5 and that they reach their maximum activity around 15:00 UTC, when the wind shear is weaker and the convective boundary layer reaches its maximum height. The simulations reveal that the onset of the cloud streets is caused by the local circulations and convection produced at the interfaces between the forest and rivers of the Amazon. The satellite data and modeling show that the large rivers anchor the cloud streets, producing a quasi-stationary horizontal pattern. The streets are associated with horizontal roll vortices parallel to the mean flow that organize the turbulence, causing advection of latent heat flux towards the upward branches. The streets have multiple warm plumes that promote a connection between the rolls. These spatial patterns provide fundamental insights into the interpretation of surface-atmosphere exchanges in the Amazon, with important consequences for the understanding of climate change.
Abstract:
A challenge for the clinical management of Parkinson's disease (PD) is the large within- and between-patient variability in symptom profiles, as well as the emergence of motor complications, which represent a significant source of disability in patients. This thesis deals with the development and evaluation of methods and systems for supporting the management of PD by using repeated measures, consisting of subjective assessments of symptoms and objective assessments of motor function through fine motor tests (spirography and tapping), collected by means of a telemetry touch screen device. One aim of the thesis was to develop methods for objective quantification and analysis of the severity of motor impairments represented in spiral drawings and tapping results. This was accomplished by first quantifying the digitized movement data with time series analysis and then using them in data-driven modelling to automate the assessment of symptom severity. The objective measures were then analysed with respect to subjective assessments of motor conditions. Another aim was to develop a method for providing information content comparable to clinical rating scales by combining subjective and objective measures into composite scores, using time series analysis and data-driven methods. The scores represent six symptom dimensions and an overall test score reflecting the global health condition of the patient. In addition, the thesis presents the development of a web-based system for providing a visual representation of symptoms over time, allowing clinicians to remotely monitor the symptom profiles of their patients. The quality of the methods was assessed by reporting different metrics of validity, reliability and sensitivity to treatment interventions and to natural PD progression over time. Results from two studies demonstrated that the methods developed for the fine motor tests had good metrics, indicating that they are appropriate for quantitatively and objectively assessing the severity of motor impairments of PD patients. The fine motor tests captured different symptoms: spiral drawing impairment and tapping accuracy related to dyskinesias (involuntary movements), whereas tapping speed related to bradykinesia (slowness of movements). A longitudinal data analysis indicated that the six symptom dimensions and the overall test score contained important elements of information of the clinical scales and can be used to measure the effects of PD treatment interventions and disease progression. A usability evaluation of the web-based system showed that the information presented in the system was comparable to qualitative clinical observations, and the system was recognized as a tool that will assist in the management of patients.
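As an illustration of the kind of objective quantification described (not the thesis' actual feature set), simple kinematic quantities such as drawing speed and the radial deviation from an ideal Archimedean spiral can be computed directly from the (x, y, t) samples recorded by the touch screen. All names and numbers below are hypothetical:

```python
import numpy as np

def spiral_kinematics(x, y, t_ms):
    """Basic kinematic quantities from a digitized spiral drawing.
    x, y: pen-tip coordinates relative to the spiral centre; t_ms: time stamps in ms."""
    x, y, t = np.asarray(x, float), np.asarray(y, float), np.asarray(t_ms, float) / 1000.0
    r = np.hypot(x, y)                                     # radial position of the pen tip
    theta = np.unwrap(np.arctan2(y, x))                    # cumulative drawing angle
    speed = np.hypot(np.diff(x), np.diff(y)) / np.diff(t)  # instantaneous drawing speed

    # an ideal Archimedean spiral satisfies r = a + b*theta, so the residual scatter
    # around a straight-line fit of r against theta indicates drawing impairment
    b, a = np.polyfit(theta, r, 1)
    radial_residual = float(np.std(r - (a + b * theta)))

    return {"mean_speed": float(np.mean(speed)),
            "speed_variability": float(np.std(speed)),
            "radial_residual": radial_residual}

# hypothetical 10 Hz recording (100 ms time stamps) of a slightly noisy spiral
rng = np.random.default_rng(0)
t_ms = np.arange(0, 10_000, 100)
theta = np.linspace(0.5, 6 * np.pi, t_ms.size)
r = 2.0 * theta + rng.normal(0, 0.5, t_ms.size)
print(spiral_kinematics(r * np.cos(theta), r * np.sin(theta), t_ms))
```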
Abstract:
Objective: To develop a method for objective quantification of PD motor symptoms related to Off episodes and peak-dose dyskinesias, using spiral data gathered with a touch screen telemetry device. The aim was to objectively characterize predominant motor phenotypes (bradykinesia and dyskinesia) to help automate the process of visual interpretation of movement anomalies in spirals, as rated by movement disorder specialists. Background: A retrospective analysis was conducted on recordings from 65 patients with advanced idiopathic PD from nine different clinics in Sweden, recruited from January 2006 until August 2010. In addition to the patient group, 10 healthy elderly subjects were recruited. Upper limb movement data were collected using a touch screen telemetry device in the home environments of the subjects. Measurements with the device were performed four times per day during week-long test periods. On each test occasion, the subjects were asked to trace pre-drawn Archimedean spirals using the dominant hand. The pre-drawn spiral was shown on the screen of the device. The spiral test was repeated three times per test occasion, and subjects were instructed to complete it within 10 seconds. The device had a sampling rate of 10 Hz and measured both position and time stamps (in milliseconds) of the pen tip. Methods: Four independent raters (FB, DH, AJ and DN) used a web interface that animated the spiral drawings and allowed them to observe different kinematic features during the drawing process and to rate task performance. Initially, a number of kinematic features were assessed, including ‘impairment’, ‘speed’, ‘irregularity’ and ‘hesitation’, followed by marking the predominant motor phenotype on a 3-category scale: tremor, bradykinesia and/or choreatic dyskinesia. There were only 2 test occasions for which all four raters either classified the phenotype as tremor or could not identify the motor phenotype; therefore, the two main motor phenotype categories were bradykinesia and dyskinesia. ‘Impairment’ was rated on a scale from 0 (no impairment) to 10 (extremely severe), whereas ‘speed’, ‘irregularity’ and ‘hesitation’ were rated on a scale from 0 (normal) to 4 (extremely severe). The proposed data-driven method consisted of the following steps. Initially, 28 spatiotemporal features were extracted from the time series signals before being presented to a Multilayer Perceptron (MLP) classifier. The features were based on different kinematic quantities of the spirals, including radius, angle, speed and velocity, with the aim of measuring the severity of involuntary symptoms and discriminating between PD-specific (bradykinesia) and/or treatment-induced (dyskinesia) symptoms. A Principal Component Analysis was applied to the features to reduce their dimensionality, and 4 relevant principal components (PCs) were retained and used as inputs to the MLP classifier. Finally, the MLP classifier mapped these components to the corresponding visually assessed motor phenotype scores, automating the scoring of bradykinesia and dyskinesia in PD patients whilst they draw spirals using the touch screen device. For motor phenotype (bradykinesia vs. dyskinesia) classification, the stratified 10-fold cross-validation technique was employed.
Results: There were good agreements between the four raters when rating the individual kinematic features, with intra-class correlation coefficients (ICC) of 0.88 for ‘impairment’, 0.74 for ‘speed’ and 0.70 for ‘irregularity’, and moderate agreement when rating ‘hesitation’, with an ICC of 0.49. When assessing the two main motor phenotype categories (bradykinesia or dyskinesia) in animated spirals, the agreements between the four raters ranged from fair to moderate. There were good correlations between the mean ratings of the four raters on individual kinematic features and the computed scores. The MLP classifier classified the motor phenotype, that is bradykinesia or dyskinesia, with an accuracy of 85% in relation to the visual classifications of the four movement disorder specialists. The test-retest reliability of the four PCs across the three spiral test trials was good, with Cronbach’s Alpha coefficients of 0.80, 0.82, 0.54 and 0.49, respectively. These results indicate that the computed scores are stable and consistent over time. Significant differences were found between the two groups (patients and healthy elderly subjects) in all the PCs except PC3. Conclusions: The proposed method automatically assessed the severity of unwanted symptoms and could reasonably well discriminate between PD-specific and/or treatment-induced motor symptoms, in relation to visual assessments of movement disorder specialists. The objective assessments could provide a time-effect summary score that could be useful for improving decision making during symptom evaluation of individualized treatment, when the goal is to maximize functional On time for patients while minimizing their Off episodes and troublesome dyskinesias.
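A compact way to illustrate the modelling pipeline described above (feature matrix, PCA with 4 retained components, MLP classifier, stratified 10-fold cross-validation) is the scikit-learn sketch below. It assumes the 28 spatiotemporal features have already been extracted into a matrix X with binary phenotype labels y; it is a schematic reconstruction with synthetic stand-in data, not the study's original code.

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score

# placeholder data: 300 test occasions x 28 spatiotemporal features,
# labels 0 = bradykinesia, 1 = dyskinesia (synthetic stand-in for the real dataset)
rng = np.random.default_rng(0)
X = rng.standard_normal((300, 28))
y = rng.integers(0, 2, size=300)

pipeline = make_pipeline(
    StandardScaler(),
    PCA(n_components=4),                       # retain 4 principal components
    MLPClassifier(hidden_layer_sizes=(10,), max_iter=2000, random_state=0),
)

cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
scores = cross_val_score(pipeline, X, y, cv=cv, scoring="accuracy")
print(f"stratified 10-fold accuracy: {scores.mean():.2f} +/- {scores.std():.2f}")
```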
Predictive models for chronic renal disease using decision trees, Naïve Bayes and case-based methods
Abstract:
Data mining can be used in the healthcare industry to “mine” clinical data and discover hidden information for intelligent and effective decision making. The discovery of hidden patterns and relationships often goes unexploited, yet advanced data mining techniques can serve as a remedy to this scenario. This thesis mainly deals with Intelligent Prediction of Chronic Renal Disease (IPCRD). The data cover blood tests, urine tests, and external symptoms used to predict chronic renal disease. Data from the database are initially transformed to Weka (3.6), and the Chi-Square method is used for feature selection. After normalizing the data, three classifiers are applied and the efficiency of the output is evaluated: Decision Tree, Naïve Bayes, and the K-Nearest Neighbour algorithm. Results show that each technique has its unique strength in realizing the objectives of the defined mining goals. The efficiency of the Decision Tree and KNN was almost the same, but Naïve Bayes showed a comparative edge over the others. Further, sensitivity and specificity are used as statistical measures to examine the performance of the binary classification. Sensitivity (also called recall in some fields) measures the proportion of actual positives that are correctly identified, while specificity measures the proportion of negatives that are correctly identified. The CRISP-DM methodology is applied to build the mining models. It consists of six major phases: business understanding, data understanding, data preparation, modeling, evaluation, and deployment.
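The workflow described (chi-square feature selection followed by Decision Tree, Naïve Bayes and K-Nearest Neighbour classifiers, evaluated with sensitivity and specificity) can be sketched as follows. The sketch uses scikit-learn rather than Weka 3.6 and synthetic stand-in attributes, so it illustrates the procedure rather than reproducing the thesis results.

```python
import numpy as np
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.tree import DecisionTreeClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import confusion_matrix

# synthetic stand-in for blood/urine/symptom attributes (non-negative, as chi2 requires)
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(400, 20))
y = (X[:, 0] + X[:, 3] + rng.normal(0, 2, 400) > 10).astype(int)   # 1 = chronic renal disease

X = SelectKBest(chi2, k=8).fit_transform(X, y)                      # chi-square feature selection
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0, stratify=y)

for name, clf in [("Decision Tree", DecisionTreeClassifier(random_state=0)),
                  ("Naive Bayes", GaussianNB()),
                  ("KNN", KNeighborsClassifier(n_neighbors=5))]:
    y_pred = clf.fit(X_tr, y_tr).predict(X_te)
    tn, fp, fn, tp = confusion_matrix(y_te, y_pred).ravel()
    sensitivity = tp / (tp + fn)          # recall: proportion of actual positives identified
    specificity = tn / (tn + fp)          # proportion of actual negatives identified
    print(f"{name:13s} sensitivity={sensitivity:.2f} specificity={specificity:.2f}")
```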
Abstract:
As the service life of water supply networks (WSNs) grows, the aging of pipe networks has become increasingly serious. Because an urban water supply network is a hidden underground asset, it is difficult for monitoring staff to directly classify the faults of the pipe network by means of modern detection technology. In this paper, based on basic property data (e.g. diameter, material, pressure, distance to pump, distance to tank, load, etc.) of the water supply network, a decision tree algorithm (C4.5) is used to classify the specific condition of water supply pipelines. Part of the historical data was used to establish a decision tree classification model, and the remaining historical data was used to validate this established model. Statistical methods were used to assess the decision tree model, including basic statistics, Receiver Operating Characteristic (ROC) curves and Recall-Precision Curves (RPC). These methods were successfully used to assess the accuracy of the established classification model of the water pipe network. The purpose of the classification model was to classify the specific condition of the water pipe network. It is important to maintain the pipeline according to the classification results, which include asset unserviceable (AU), near perfect condition (NPC) and serious deterioration (SD). Finally, this research focused on pipe classification, which plays a significant role in maintaining water supply networks in the future.
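A schematic version of this classification step is shown below with scikit-learn (whose decision tree is CART rather than C4.5) on synthetic stand-in attributes; the condition labels AU, NPC and SD follow the abstract, while the data, the label-generating rule and the split are illustrative only.

```python
import numpy as np
import pandas as pd
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report, roc_auc_score

# synthetic stand-in for the basic property data of pipe segments
rng = np.random.default_rng(0)
n = 600
pipes = pd.DataFrame({
    "diameter": rng.uniform(50, 600, n),          # mm
    "pressure": rng.uniform(1, 8, n),             # bar
    "dist_to_pump": rng.uniform(0, 5000, n),      # m
    "dist_to_tank": rng.uniform(0, 8000, n),      # m
    "load": rng.uniform(0, 100, n),
    "age": rng.uniform(0, 60, n),                 # years
})
# toy rule generating condition labels: AU, SD, NPC
score = 0.04 * pipes["age"] + 0.01 * pipes["load"] + rng.normal(0, 0.4, n)
labels = np.where(score > 2.5, "AU", np.where(score > 1.5, "SD", "NPC"))

X_tr, X_te, y_tr, y_te = train_test_split(pipes, labels, test_size=0.3,
                                          random_state=0, stratify=labels)
tree = DecisionTreeClassifier(max_depth=4, random_state=0).fit(X_tr, y_tr)

print(classification_report(y_te, tree.predict(X_te)))   # precision/recall per condition class
proba = tree.predict_proba(X_te)
print("one-vs-rest ROC AUC:", roc_auc_score(y_te, proba, multi_class="ovr"))
```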
Abstract:
An underwater gas pipeline is the portion of a pipeline that crosses a river beneath its bottom. Underwater gas pipelines are subject to increasing dangers as time goes by. An accident at an underwater gas pipeline can lead to a technological and environmental disaster on the scale of an entire region. Therefore, timely troubleshooting of all underwater gas pipelines in order to prevent any potential accidents will remain a pressing task for the industry. The most important aspect of resolving this challenge is the quality of the automated system in question. At present, the industry does not have any automated system that fully meets the needs of the experts working in the field maintaining underwater gas pipelines. Principal aim of this research: This work aims to develop a new system of automated monitoring which would simplify the process of evaluating the technical condition and making decisions on planning, preventive maintenance and repair work on underwater gas pipelines. Objectives: creation of a shared model for the new, automated system via IDEF3; development of a new database system to store all information about underwater gas pipelines; development of a new application that works with database servers and provides an explanation of the results obtained from the server; calculation of MTBF values for specified pipelines based on quantitative data obtained from tests of this system. Conclusion: The new, automated system PodvodGazExpert has been developed for the timely and qualitative determination of the physical condition of underwater gas pipelines; the mathematical analysis of this new, automated system is based on the principal component analysis method; the process of determining the physical condition of an underwater gas pipeline with this new, automated system increases the MTBF by a factor of 8.18 over the existing system used in the industry today.
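Since the abstract names principal component analysis as the mathematical basis of the system, a minimal illustration of how PCA can flag anomalous pipeline condition from multivariate inspection data is given below. The data, feature count and alarm threshold are synthetic; this is not the PodvodGazExpert implementation.

```python
import numpy as np
from sklearn.decomposition import PCA

# synthetic stand-in for multivariate inspection measurements of pipeline segments
rng = np.random.default_rng(0)
normal = rng.multivariate_normal([0, 0, 0],
                                 [[1.0, 0.6, 0.3], [0.6, 1.0, 0.4], [0.3, 0.4, 1.0]],
                                 size=200)
faulty = normal[:10] + np.array([3.0, -2.5, 2.0])      # segments with degraded condition
measurements = np.vstack([normal, faulty])

pca = PCA(n_components=2).fit(normal)                  # model of "healthy" variability
reconstruction = pca.inverse_transform(pca.transform(measurements))
error = np.linalg.norm(measurements - reconstruction, axis=1)

threshold = np.percentile(error[:200], 99)             # illustrative alarm threshold
flagged = np.where(error > threshold)[0]
print("segments flagged for inspection:", flagged)
```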
Abstract:
Due to the increase in water demand and hydropower energy, it is becoming more important to operate hydraulic structures in an efficient manner while sustaining multiple demands. In particular, companies, governmental agencies and consultancy offices require effective, practical, integrated tools and decision support frameworks to operate reservoirs, cascades of run-of-river plants and related elements such as canals, by merging hydrological and reservoir simulation/optimization models with various numerical weather predictions, radar and satellite data. Model performance is highly related to the streamflow forecast, the related uncertainty and its consideration in decision making. While deterministic weather predictions and their corresponding streamflow forecasts restrict the manager to single deterministic trajectories, probabilistic forecasts can be a key solution by including uncertainty in flow forecast scenarios for dam operation. The objective of this study is to compare deterministic and probabilistic streamflow forecasts on an earlier developed basin/reservoir model for short-term reservoir management. The study is applied to the Yuvacık Reservoir and its upstream basin, which is the main water supply of Kocaeli City located in the northwestern part of Turkey. The reservoir represents a typical example with its limited capacity, downstream channel restrictions and high snowmelt potential. Mesoscale Model 5 and Ensemble Prediction System data are used as the main inputs, and the flow forecasts are produced for the year 2012 using HEC-HMS. A hydrometeorological rule-based reservoir simulation model is built with HEC-ResSim and integrated with the forecasts. Since the EPS-based hydrological model produces a large number of equally probable scenarios, it indicates how uncertainty spreads into the future. Thus, it provides risk ranges in terms of spillway discharges and reservoir level for the operator when compared with the deterministic approach. The framework is fully data-driven, applicable and useful to the profession, and the knowledge can be transferred to other similar reservoir systems.
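The value of the ensemble approach can be illustrated with a toy mass-balance reservoir simulation: driving the same storage model with each equally probable inflow scenario yields a spread (risk range) of reservoir levels and spillway discharges, whereas a deterministic forecast yields a single trajectory. The sketch below is a simplified illustration with made-up numbers, not the HEC-ResSim rule-based model.

```python
import numpy as np

rng = np.random.default_rng(0)
n_scenarios, n_days = 50, 30
capacity, release, storage0 = 60.0, 2.0, 40.0       # hm3, hm3/day, hm3 (illustrative values)

# equally probable ensemble inflow scenarios plus one deterministic forecast (hm3/day)
ensemble_inflow = np.clip(rng.normal(2.0, 1.0, size=(n_scenarios, n_days)), 0, None)
deterministic_inflow = ensemble_inflow.mean(axis=0)

def simulate(inflow):
    """Daily mass balance with a fixed release; storage above capacity spills."""
    storage = storage0
    levels, spill = np.zeros(len(inflow)), np.zeros(len(inflow))
    for d, q in enumerate(inflow):
        storage = max(storage + q - release, 0.0)
        spill[d] = max(storage - capacity, 0.0)
        storage = min(storage, capacity)
        levels[d] = storage
    return levels, spill

levels = np.array([simulate(q)[0] for q in ensemble_inflow])
det_levels, _ = simulate(deterministic_inflow)

# risk range: 10th-90th percentile band of end-of-month storage across the ensemble
p10, p90 = np.percentile(levels[:, -1], [10, 90])
print(f"deterministic end-of-month storage: {det_levels[-1]:.1f} hm3")
print(f"ensemble 10-90% range:              {p10:.1f} - {p90:.1f} hm3")
```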
Abstract:
Mineral exploration guides are normally based on conceptual deposit models. These guides usually rest on the experience of geologists, on descriptive data and on genetic data. Numerical modelling, both probabilistic and non-probabilistic, for estimating the occurrence of mineral deposits is a newer procedure whose use and acceptance by the geological community grow every day. This thesis uses recent methodologies for the generation of mineral favorability maps. The so-called Ilha Cristalina de Rivera, an erosional window of the Paraná Basin located in the northern portion of Uruguay, was chosen as the case study for applying the methodologies. The mineral favorability maps were built from the following types of data, information and prospecting results: 1) orbital imagery; 2) geochemical prospecting; 3) airborne geophysical prospecting; 4) geological-structural mapping; and 5) altimetry. This information was selected and processed on the basis of a mineral deposit model (conceptual model) developed from the San Gregorio Gold Mine. The conceptual model (the San Gregorio model) included descriptive and genetic characteristics of the San Gregorio Mine, which encompasses the significant characteristic elements of the other known mineral occurrences on the Ilha Cristalina de Rivera. Generating the mineral favorability maps involved building a database, processing the data, and integrating the data. The construction and processing stages comprised collecting, selecting and treating the data so as to constitute the so-called Information Layers. These Information Layers were generated and processed in organized groupings so as to constitute the Integration Factors for favorability mapping on the Ilha Cristalina de Rivera. The data were integrated using two different methodologies: 1) Weights of Evidence (data-driven) and 2) Fuzzy Logic (knowledge-driven). The mineral favorability maps resulting from the two integration methodologies were first analysed and interpreted individually, and a comparative analysis of the results was then carried out. Both methodologies succeeded in identifying the known mineralized areas, as well as other areas not yet worked, as areas of high favorability. The favorability maps from the two methodologies coincided with respect to the areas of highest favorability. The Weights of Evidence methodology produced a favorability map that was more conservative in terms of areal extent, but more optimistic in terms of favorability values, than the maps resulting from the Fuzzy Logic methodology. New targets for mineral exploration were identified and should be investigated in detail.
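The data-driven half of the integration, Weights of Evidence, combines binary evidence layers with known deposit locations by computing positive and negative weights from conditional probabilities. A minimal numpy sketch of that computation for one binary evidence layer over a grid of unit cells follows (toy counts, not the thesis data):

```python
import numpy as np

def weights_of_evidence(evidence, deposits):
    """W+ and W- for one binary evidence layer.
    evidence: boolean array, True where the evidence pattern is present.
    deposits: boolean array, True for unit cells containing a known deposit."""
    evidence, deposits = np.asarray(evidence, bool), np.asarray(deposits, bool)
    # conditional probabilities of evidence presence given deposit presence/absence
    p_e_d  = np.mean(evidence[deposits])
    p_e_nd = np.mean(evidence[~deposits])
    w_plus  = np.log(p_e_d / p_e_nd)               # weight where the evidence is present
    w_minus = np.log((1 - p_e_d) / (1 - p_e_nd))   # weight where the evidence is absent
    contrast = w_plus - w_minus                    # overall spatial association
    return w_plus, w_minus, contrast

# toy grid: 10,000 unit cells; evidence covers ~20% of the area but 70% of the deposits
rng = np.random.default_rng(0)
evidence = rng.random(10_000) < 0.20
deposits = np.zeros(10_000, bool)
deposit_cells = np.concatenate([
    rng.choice(np.where(evidence)[0], 14, replace=False),    # deposits on the evidence pattern
    rng.choice(np.where(~evidence)[0], 6, replace=False),    # deposits off the pattern
])
deposits[deposit_cells] = True

print(weights_of_evidence(evidence, deposits))   # expect W+ > 0, W- < 0, positive contrast
```

Summing the weights of several (assumed conditionally independent) evidence layers cell by cell gives the posterior favorability surface that the data-driven maps in the thesis represent.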
Abstract:
In this dissertation, a Monte Carlo experiment was carried out to reveal some characteristics of the finite-sample distributions of the Backfitting (B) and Marginal Integration (MI) estimators for a bivariate additive regression. We are particularly interested in providing some evidence of how the different methods for selecting the bandwidth hn, such as plug-in methods, impact the small-sample properties of the estimators. We are also interested in providing evidence of the behaviour of different estimators of hn relative to the optimal sequence of hn that minimizes a chosen loss function. The impact of ignoring the dependence between the regressors when estimating the bandwidth is also investigated. This is a common practice and should have an impact on the performance of the estimators. Furthermore, there is currently no routine available in statistical/econometric packages for estimating additive regressions via the Backfitting and Marginal Integration methods. One of the objectives is the creation of routines in Gauss for the practical implementation of these estimators. Finally, unlike current practice, in which the B and MI estimators are used in a completely ad hoc manner, the objective is to provide users with information that allows a more objective choice of which estimator to use when working with a finite sample.
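For reference, the backfitting estimator for a bivariate additive model y = c + m1(x1) + m2(x2) + error alternates univariate smooths of partial residuals until convergence. The sketch below implements this with a simple Nadaraya-Watson (Gaussian kernel) smoother and a fixed bandwidth hn, in Python rather than Gauss; it is a didactic illustration under made-up data, not the dissertation's code.

```python
import numpy as np

def nw_smooth(x, y, h):
    """Nadaraya-Watson regression of y on x with a Gaussian kernel and bandwidth h,
    evaluated at the sample points themselves."""
    w = np.exp(-0.5 * ((x[:, None] - x[None, :]) / h) ** 2)
    return (w @ y) / w.sum(axis=1)

def backfit(x1, x2, y, h, n_iter=50, tol=1e-8):
    """Backfitting for y = c + m1(x1) + m2(x2) + error."""
    c = y.mean()
    m1, m2 = np.zeros_like(y), np.zeros_like(y)
    for _ in range(n_iter):
        m1_new = nw_smooth(x1, y - c - m2, h)
        m1_new -= m1_new.mean()                 # identifiability: centre each component
        m2_new = nw_smooth(x2, y - c - m1_new, h)
        m2_new -= m2_new.mean()
        converged = max(np.max(np.abs(m1_new - m1)), np.max(np.abs(m2_new - m2))) < tol
        m1, m2 = m1_new, m2_new
        if converged:
            break
    return c, m1, m2

# one Monte Carlo draw from a toy additive design with correlated regressors
rng = np.random.default_rng(0)
n, hn = 300, 0.25
x1 = rng.uniform(-2, 2, n)
x2 = 0.5 * x1 + 0.5 * rng.uniform(-2, 2, n)     # dependence between the regressors
y = np.sin(x1) + 0.5 * x2 ** 2 + 0.3 * rng.standard_normal(n)

c, m1, m2 = backfit(x1, x2, y, hn)
print("estimated intercept:", round(c, 3))
```

Repeating the last block over many draws, bandwidth choices and dependence structures is exactly the kind of Monte Carlo design the dissertation describes for comparing the B and MI estimators in finite samples.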