942 resultados para Data Driven Modeling
Resumo:
This study presents an approach to combine uncertainties of the hydrological model outputs predicted from a number of machine learning models. The machine learning based uncertainty prediction approach is very useful for estimation of hydrological models' uncertainty in particular hydro-metrological situation in real-time application [1]. In this approach the hydrological model realizations from Monte Carlo simulations are used to build different machine learning uncertainty models to predict uncertainty (quantiles of pdf) of the a deterministic output from hydrological model . Uncertainty models are trained using antecedent precipitation and streamflows as inputs. The trained models are then employed to predict the model output uncertainty which is specific for the new input data. We used three machine learning models namely artificial neural networks, model tree, locally weighted regression to predict output uncertainties. These three models produce similar verification results, which can be improved by merging their outputs dynamically. We propose an approach to form a committee of the three models to combine their outputs. The approach is applied to estimate uncertainty of streamflows simulation from a conceptual hydrological model in the Brue catchment in UK and the Bagmati catchment in Nepal. The verification results show that merged output is better than an individual model output. [1] D. L. Shrestha, N. Kayastha, and D. P. Solomatine, and R. Price. Encapsulation of parameteric uncertainty statistics by various predictive machine learning models: MLUE method, Journal of Hydroinformatic, in press, 2013.
Resumo:
When an accurate hydraulic network model is available, direct modeling techniques are very straightforward and reliable for on-line leakage detection and localization applied to large class of water distribution networks. In general, this type of techniques based on analytical models can be seen as an application of the well-known fault detection and isolation theory for complex industrial systems. Nonetheless, the assumption of single leak scenarios is usually made considering a certain leak size pattern which may not hold in real applications. Upgrading a leak detection and localization method based on a direct modeling approach to handle multiple-leak scenarios can be, on one hand, quite straightforward but, on the other hand, highly computational demanding for large class of water distribution networks given the huge number of potential water loss hotspots. This paper presents a leakage detection and localization method suitable for multiple-leak scenarios and large class of water distribution networks. This method can be seen as an upgrade of the above mentioned method based on a direct modeling approach in which a global search method based on genetic algorithms has been integrated in order to estimate those network water loss hotspots and the size of the leaks. This is an inverse / direct modeling method which tries to take benefit from both approaches: on one hand, the exploration capability of genetic algorithms to estimate network water loss hotspots and the size of the leaks and on the other hand, the straightforwardness and reliability offered by the availability of an accurate hydraulic model to assess those close network areas around the estimated hotspots. The application of the resulting method in a DMA of the Barcelona water distribution network is provided and discussed. The obtained results show that leakage detection and localization under multiple-leak scenarios may be performed efficiently following an easy procedure.
Resumo:
This paper presents a comparative analysis between the experimental characterization and the numerical simulation results for a three-dimensional FCC photonic crystal (PhC) based on a self-assembly synthesis of monodispersive latex spheres. Specifically, experimental optical characterization, by means of reflectance measurements under variable angles over the lattice plane family [1,1, 1], are compared to theoretical calculations based on the Finite Di®erence Time Domain (FDTD) method, in order to investigate the correlation between theoretical predictions and experimental data. The goal is to highlight the influence of crystal defects on the achieved performance.
Resumo:
Este trabalho apresenta resultados práticos de uma atenção sistemática dada ao processamento e à interpretação sísmica de algumas linhas terrestres do conjunto de dados do gráben do Tacutu (Brasil), sobre os quais foram aplicadas etapas fundamentais do sistema WIT de imageamento do empilhamento CRS (Superfície de Reflexão Comum) vinculado a dados. Como resultado, esperamos estabelecer um fluxograma para a reavaliação sísmica de bacias sedimentares. Fundamentado nos atributos de frente de onda resultantes do empilhamento CRS, um macro-modelo suave de velocidades foi obtido através de inversão tomográfica. Usando este macro-modelo, foi realizado uma migração à profundidade pré- e pós-empilhamento. Além disso, outras técnicas baseadas no empilhamento CRS foram realizadas em paralelo como correção estática residual e migração de abertura-limitada baseada na zona de Fresnel projetada. Uma interpretação geológica sobre as seções empilhadas e migradas foi esboçada. A partir dos detalhes visuais dos painéis é possível interpretar desconformidades, afinamentos, um anticlinal principal falhado com conjuntos de horstes e grábens. Também, uma parte da linha selecionada precisa de processamento mais detalhado para evidenciar melhor qualquer estrutura presente na subsuperfície.
Resumo:
Environmental computer models are deterministic models devoted to predict several environmental phenomena such as air pollution or meteorological events. Numerical model output is given in terms of averages over grid cells, usually at high spatial and temporal resolution. However, these outputs are often biased with unknown calibration and not equipped with any information about the associated uncertainty. Conversely, data collected at monitoring stations is more accurate since they essentially provide the true levels. Due the leading role played by numerical models, it now important to compare model output with observations. Statistical methods developed to combine numerical model output and station data are usually referred to as data fusion. In this work, we first combine ozone monitoring data with ozone predictions from the Eta-CMAQ air quality model in order to forecast real-time current 8-hour average ozone level defined as the average of the previous four hours, current hour, and predictions for the next three hours. We propose a Bayesian downscaler model based on first differences with a flexible coefficient structure and an efficient computational strategy to fit model parameters. Model validation for the eastern United States shows consequential improvement of our fully inferential approach compared with the current real-time forecasting system. Furthermore, we consider the introduction of temperature data from a weather forecast model into the downscaler, showing improved real-time ozone predictions. Finally, we introduce a hierarchical model to obtain spatially varying uncertainty associated with numerical model output. We show how we can learn about such uncertainty through suitable stochastic data fusion modeling using some external validation data. We illustrate our Bayesian model by providing the uncertainty map associated with a temperature output over the northeastern United States.
Resumo:
The immune system exhibits an enormous complexity. High throughput methods such as the "-omic'' technologies generate vast amounts of data that facilitate dissection of immunological processes at ever finer resolution. Using high-resolution data-driven systems analysis, causal relationships between complex molecular processes and particular immunological phenotypes can be constructed. However, processes in tissues, organs, and the organism itself (so-called higher level processes) also control and regulate the molecular (lower level) processes. Reverse systems engineering approaches, which focus on the examination of the structure, dynamics and control of the immune system, can help to understand the construction principles of the immune system. Such integrative mechanistic models can properly describe, explain, and predict the behavior of the immune system in health and disease by combining both higher and lower level processes. Moving from molecular and cellular levels to a multiscale systems understanding requires the development of methodologies that integrate data from different biological levels into multiscale mechanistic models. In particular, 3D imaging techniques and 4D modeling of the spatiotemporal dynamics of immune processes within lymphoid tissues are central for such integrative approaches. Both dynamic and global organ imaging technologies will be instrumental in facilitating comprehensive multiscale systems immunology analyses as discussed in this review.
Resumo:
WE INVESTIGATED HOW WELL STRUCTURAL FEATURES such as note density or the relative number of changes in the melodic contour could predict success in implicit and explicit memory for unfamiliar melodies. We also analyzed which features are more likely to elicit increasingly confident judgments of "old" in a recognition memory task. An automated analysis program computed structural aspects of melodies, both independent of any context, and also with reference to the other melodies in the testset and the parent corpus of pop music. A few features predicted success in both memory tasks, which points to a shared memory component. However, motivic complexity compared to a large corpus of pop music had different effects on explicit and implicit memory. We also found that just a few features are associated with different rates of "old" judgments, whether the items were old or new. Rarer motives relative to the testset predicted hits and rarer motives relative to the corpus predicted false alarms. This data-driven analysis provides further support for both shared and separable mechanisms in implicit and explicit memory retrieval, as well as the role of distinctiveness in true and false judgments of familiarity.
Resumo:
Wind energy has been one of the most growing sectors of the nation’s renewable energy portfolio for the past decade, and the same tendency is being projected for the upcoming years given the aggressive governmental policies for the reduction of fossil fuel dependency. Great technological expectation and outstanding commercial penetration has shown the so called Horizontal Axis Wind Turbines (HAWT) technologies. Given its great acceptance, size evolution of wind turbines over time has increased exponentially. However, safety and economical concerns have emerged as a result of the newly design tendencies for massive scale wind turbine structures presenting high slenderness ratios and complex shapes, typically located in remote areas (e.g. offshore wind farms). In this regard, safety operation requires not only having first-hand information regarding actual structural dynamic conditions under aerodynamic action, but also a deep understanding of the environmental factors in which these multibody rotating structures operate. Given the cyclo-stochastic patterns of the wind loading exerting pressure on a HAWT, a probabilistic framework is appropriate to characterize the risk of failure in terms of resistance and serviceability conditions, at any given time. Furthermore, sources of uncertainty such as material imperfections, buffeting and flutter, aeroelastic damping, gyroscopic effects, turbulence, among others, have pleaded for the use of a more sophisticated mathematical framework that could properly handle all these sources of indetermination. The attainable modeling complexity that arises as a result of these characterizations demands a data-driven experimental validation methodology to calibrate and corroborate the model. For this aim, System Identification (SI) techniques offer a spectrum of well-established numerical methods appropriated for stationary, deterministic, and data-driven numerical schemes, capable of predicting actual dynamic states (eigenrealizations) of traditional time-invariant dynamic systems. As a consequence, it is proposed a modified data-driven SI metric based on the so called Subspace Realization Theory, now adapted for stochastic non-stationary and timevarying systems, as is the case of HAWT’s complex aerodynamics. Simultaneously, this investigation explores the characterization of the turbine loading and response envelopes for critical failure modes of the structural components the wind turbine is made of. In the long run, both aerodynamic framework (theoretical model) and system identification (experimental model) will be merged in a numerical engine formulated as a search algorithm for model updating, also known as Adaptive Simulated Annealing (ASA) process. This iterative engine is based on a set of function minimizations computed by a metric called Modal Assurance Criterion (MAC). In summary, the Thesis is composed of four major parts: (1) development of an analytical aerodynamic framework that predicts interacted wind-structure stochastic loads on wind turbine components; (2) development of a novel tapered-swept-corved Spinning Finite Element (SFE) that includes dampedgyroscopic effects and axial-flexural-torsional coupling; (3) a novel data-driven structural health monitoring (SHM) algorithm via stochastic subspace identification methods; and (4) a numerical search (optimization) engine based on ASA and MAC capable of updating the SFE aerodynamic model.
An Early-Warning System for Hypo-/Hyperglycemic Events Based on Fusion of Adaptive Prediction Models
Resumo:
Introduction: Early warning of future hypoglycemic and hyperglycemic events can improve the safety of type 1 diabetes mellitus (T1DM) patients. The aim of this study is to design and evaluate a hypoglycemia / hyperglycemia early warning system (EWS) for T1DM patients under sensor-augmented pump (SAP) therapy. Methods: The EWS is based on the combination of data-driven online adaptive prediction models and a warning algorithm. Three modeling approaches have been investigated: (i) autoregressive (ARX) models, (ii) auto-regressive with an output correction module (cARX) models, and (iii) recurrent neural network (RNN) models. The warning algorithm performs postprocessing of the models′ outputs and issues alerts if upcoming hypoglycemic/hyperglycemic events are detected. Fusion of the cARX and RNN models, due to their complementary prediction performances, resulted in the hybrid autoregressive with an output correction module/recurrent neural network (cARN)-based EWS. Results: The EWS was evaluated on 23 T1DM patients under SAP therapy. The ARX-based system achieved hypoglycemic (hyperglycemic) event prediction with median values of accuracy of 100.0% (100.0%), detection time of 10.0 (8.0) min, and daily false alarms of 0.7 (0.5). The respective values for the cARX-based system were 100.0% (100.0%), 17.5 (14.8) min, and 1.5 (1.3) and, for the RNN-based system, were 100.0% (92.0%), 8.4 (7.0) min, and 0.1 (0.2). The hybrid cARN-based EWS presented outperforming results with 100.0% (100.0%) prediction accuracy, detection 16.7 (14.7) min in advance, and 0.8 (0.8) daily false alarms. Conclusion: Combined use of cARX and RNN models for the development of an EWS outperformed the single use of each model, achieving accurate and prompt event prediction with few false alarms, thus providing increased safety and comfort.
Resumo:
Correct predictions of future blood glucose levels in individuals with Type 1 Diabetes (T1D) can be used to provide early warning of upcoming hypo-/hyperglycemic events and thus to improve the patient's safety. To increase prediction accuracy and efficiency, various approaches have been proposed which combine multiple predictors to produce superior results compared to single predictors. Three methods for model fusion are presented and comparatively assessed. Data from 23 T1D subjects under sensor-augmented pump (SAP) therapy were used in two adaptive data-driven models (an autoregressive model with output correction - cARX, and a recurrent neural network - RNN). Data fusion techniques based on i) Dempster-Shafer Evidential Theory (DST), ii) Genetic Algorithms (GA), and iii) Genetic Programming (GP) were used to merge the complimentary performances of the prediction models. The fused output is used in a warning algorithm to issue alarms of upcoming hypo-/hyperglycemic events. The fusion schemes showed improved performance with lower root mean square errors, lower time lags, and higher correlation. In the warning algorithm, median daily false alarms (DFA) of 0.25%, and 100% correct alarms (CA) were obtained for both event types. The detection times (DT) before occurrence of events were 13.0 and 12.1 min respectively for hypo-/hyperglycemic events. Compared to the cARX and RNN models, and a linear fusion of the two, the proposed fusion schemes represents a significant improvement.
Resumo:
The objectives of this study were to identify and measure the average outcomes of the Open Door Mission's nine-month community-based substance abuse treatment program, identify predictors of successful outcomes, and make recommendations to the Open Door Mission for improving its treatment program.^ The Mission's program is exclusive to adult men who have limited financial resources: most of which were homeless or dependent on parents or other family members for basic living needs. Many, but not all, of these men are either chemically dependent or have a history of substance abuse.^ This study tracked a cohort of the Mission's graduates throughout this one-year study and identified various indicators of success at short-term intervals, which may be predictive of longer-term outcomes. We tracked various levels of 12-step program involvement, as well as other social and spiritual activities, such as church affiliation and recovery support.^ Twenty-four of the 66 subjects, or 36% met the Mission's requirements for success. Specific to this success criteria; Fifty-four, or 82% reported affiliation with a home church; Twenty-six, or 39% reported full-time employment; Sixty-one, or 92% did not report or were not identified as having any post-treatment arrests or incarceration, and; Forty, or 61% reported continuous abstinence from both drugs and alcohol.^ Five research-based hypotheses were developed and tested. The primary analysis tool was the web-based non-parametric dependency modeling tool, B-Course, which revealed some strong associations with certain variables, and helped the researchers generate and test several data-driven hypotheses. Full-time employment is the greatest predictor of abstinence: 95% of those who reported full time employment also reported continuous post-treatment abstinence, while 50% of those working part-time were abstinent and 29% of those with no employment were abstinent. Working with a 12-step sponsor, attending aftercare, and service with others were identified as predictors of abstinence.^ This study demonstrates that associations with abstinence and the ODM success criteria are not simply based on one social or behavioral factor. Rather, these relationships are interdependent, and show that abstinence is achieved and maintained through a combination of several 12-step recovery activities. This study used a simple assessment methodology, which demonstrated strong associations across variables and outcomes, which have practical applicability to the Open Door Mission for improving its treatment program. By leveraging the predictive capability of the various success determination methodologies discussed and developed throughout this study, we can identify accurate outcomes with both validity and reliability. This assessment instrument can also be used as an intervention that, if operationalized to the Mission’s clients during the primary treatment program, may measurably improve the effectiveness and outcomes of the Open Door Mission.^
Resumo:
Complex diseases such as cancer result from multiple genetic changes and environmental exposures. Due to the rapid development of genotyping and sequencing technologies, we are now able to more accurately assess causal effects of many genetic and environmental factors. Genome-wide association studies have been able to localize many causal genetic variants predisposing to certain diseases. However, these studies only explain a small portion of variations in the heritability of diseases. More advanced statistical models are urgently needed to identify and characterize some additional genetic and environmental factors and their interactions, which will enable us to better understand the causes of complex diseases. In the past decade, thanks to the increasing computational capabilities and novel statistical developments, Bayesian methods have been widely applied in the genetics/genomics researches and demonstrating superiority over some regular approaches in certain research areas. Gene-environment and gene-gene interaction studies are among the areas where Bayesian methods may fully exert its functionalities and advantages. This dissertation focuses on developing new Bayesian statistical methods for data analysis with complex gene-environment and gene-gene interactions, as well as extending some existing methods for gene-environment interactions to other related areas. It includes three sections: (1) Deriving the Bayesian variable selection framework for the hierarchical gene-environment and gene-gene interactions; (2) Developing the Bayesian Natural and Orthogonal Interaction (NOIA) models for gene-environment interactions; and (3) extending the applications of two Bayesian statistical methods which were developed for gene-environment interaction studies, to other related types of studies such as adaptive borrowing historical data. We propose a Bayesian hierarchical mixture model framework that allows us to investigate the genetic and environmental effects, gene by gene interactions (epistasis) and gene by environment interactions in the same model. It is well known that, in many practical situations, there exists a natural hierarchical structure between the main effects and interactions in the linear model. Here we propose a model that incorporates this hierarchical structure into the Bayesian mixture model, such that the irrelevant interaction effects can be removed more efficiently, resulting in more robust, parsimonious and powerful models. We evaluate both of the 'strong hierarchical' and 'weak hierarchical' models, which specify that both or one of the main effects between interacting factors must be present for the interactions to be included in the model. The extensive simulation results show that the proposed strong and weak hierarchical mixture models control the proportion of false positive discoveries and yield a powerful approach to identify the predisposing main effects and interactions in the studies with complex gene-environment and gene-gene interactions. We also compare these two models with the 'independent' model that does not impose this hierarchical constraint and observe their superior performances in most of the considered situations. The proposed models are implemented in the real data analysis of gene and environment interactions in the cases of lung cancer and cutaneous melanoma case-control studies. The Bayesian statistical models enjoy the properties of being allowed to incorporate useful prior information in the modeling process. Moreover, the Bayesian mixture model outperforms the multivariate logistic model in terms of the performances on the parameter estimation and variable selection in most cases. Our proposed models hold the hierarchical constraints, that further improve the Bayesian mixture model by reducing the proportion of false positive findings among the identified interactions and successfully identifying the reported associations. This is practically appealing for the study of investigating the causal factors from a moderate number of candidate genetic and environmental factors along with a relatively large number of interactions. The natural and orthogonal interaction (NOIA) models of genetic effects have previously been developed to provide an analysis framework, by which the estimates of effects for a quantitative trait are statistically orthogonal regardless of the existence of Hardy-Weinberg Equilibrium (HWE) within loci. Ma et al. (2012) recently developed a NOIA model for the gene-environment interaction studies and have shown the advantages of using the model for detecting the true main effects and interactions, compared with the usual functional model. In this project, we propose a novel Bayesian statistical model that combines the Bayesian hierarchical mixture model with the NOIA statistical model and the usual functional model. The proposed Bayesian NOIA model demonstrates more power at detecting the non-null effects with higher marginal posterior probabilities. Also, we review two Bayesian statistical models (Bayesian empirical shrinkage-type estimator and Bayesian model averaging), which were developed for the gene-environment interaction studies. Inspired by these Bayesian models, we develop two novel statistical methods that are able to handle the related problems such as borrowing data from historical studies. The proposed methods are analogous to the methods for the gene-environment interactions on behalf of the success on balancing the statistical efficiency and bias in a unified model. By extensive simulation studies, we compare the operating characteristics of the proposed models with the existing models including the hierarchical meta-analysis model. The results show that the proposed approaches adaptively borrow the historical data in a data-driven way. These novel models may have a broad range of statistical applications in both of genetic/genomic and clinical studies.
Resumo:
Abstract. The uptake of Linked Data (LD) has promoted the proliferation of datasets and their associated ontologies for describing different domains. Ac-cording to LD principles, developers should reuse as many available terms as possible to describe their data. Importing ontologies or referring to their terms’ URIs are the two main ways to reuse knowledge from available ontologies. In this paper, we have analyzed 18589 terms appearing within 196 ontologies in-cluded in the Linked Open Vocabularies (LOV) registry with the aim of under-standing the current state of ontology reuse in the LD context. In order to char-acterize the landscape of ontology reuse in this context, we have extracted sta-tistics about currently reused elements, calculated ratios for reuse, and drawn graphs about imports and references between ontologies. Keywords: ontology, vocabulary, reuse, linked data, ontology import
Resumo:
The uptake of Linked Data (LD) has promoted the proliferation of datasets and their associated ontologies for describing different domains. Par-ticular LD development characteristics such as agility and web-based architec-ture necessitate the revision, adaption, and lightening of existing methodologies for ontology development. This thesis proposes a lightweight method for ontol-ogy development in an LD context which will be based in data-driven agile de-velopments, existing resources to be reused, and the evaluation of the obtained products considering both classical ontological engineering principles and LD characteristics.
Resumo:
Enterprises are increasingly using a wide range of heterogeneous information systems for executing and governing their business activities. Even if the adoption of service orientation has improved loose coupling and reusability, applications are still isolated data silos whose integration requires complex transformations and mediations. However, by leveraging Linked Data principles those data silos can now be seamlessly integrated, and this opens the door to new data-driven approaches for Enterprise Application Integration (EAI). In this paper we present LDP4j, an open souce Java-based framework for the development of interoperable read-write Linked Data applications, based on the W3C Linked Data Platform (LDP) specification.