891 results for data-driven simulation
Abstract:
Background: In many experimental pipelines, clustering of multidimensional biological datasets is used to detect hidden structures in unlabelled input data. Taverna is a popular workflow management system that is used to design and execute scientific workflows and aid in silico experimentation. The availability of fast unsupervised methods for clustering and visualization in the Taverna platform is important to support data-driven scientific discovery in complex and explorative bioinformatics applications. Results: This work presents a Taverna plugin, the Biological Data Interactive Clustering Explorer (BioDICE), that performs clustering of high-dimensional biological data and provides a nonlinear, topology-preserving projection for the visualization of the input data and their similarities. The core algorithm in the BioDICE plugin is the Fast Learning Self Organizing Map (FLSOM), an improved variant of the Self Organizing Map (SOM) algorithm. The plugin generates an interactive 2D map that allows the visual exploration of multidimensional data and the identification of groups of similar objects. The effectiveness of the plugin is demonstrated on a case study related to chemical compounds. Conclusions: The number and variety of available tools, together with its extensibility, have made Taverna a popular choice for the development of scientific data workflows. This work presents a novel plugin, BioDICE, which adds a data-driven knowledge discovery component to Taverna. BioDICE provides an effective and powerful clustering tool, which can be adopted for the explorative analysis of biological datasets.
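To make the idea of a topology-preserving 2D projection more concrete, the sketch below trains a plain online Self Organizing Map with NumPy and maps each sample to its best-matching unit on the grid. This is the textbook SOM, not FLSOM and not the BioDICE code; the function names, grid size, and learning schedule are assumptions chosen only for illustration.

```python
import numpy as np

def train_som(data, rows=6, cols=6, epochs=20, lr0=0.5, sigma0=3.0, seed=0):
    """Train a plain online SOM; returns weights of shape (rows, cols, dim)."""
    rng = np.random.default_rng(seed)
    n, dim = data.shape
    weights = rng.random((rows, cols, dim))
    # Grid coordinates of every map unit, used by the neighbourhood function.
    grid = np.stack(
        np.meshgrid(np.arange(rows), np.arange(cols), indexing="ij"), axis=-1
    )
    total_steps = epochs * n
    step = 0
    for _ in range(epochs):
        for idx in rng.permutation(n):
            frac = step / total_steps
            lr = lr0 * (1.0 - frac)                  # linearly decaying learning rate
            sigma = max(sigma0 * (1.0 - frac), 0.5)  # shrinking neighbourhood radius
            x = data[idx]
            # Best-matching unit: the map unit whose weight vector is closest to x.
            dist = np.linalg.norm(weights - x, axis=-1)
            bmu = np.unravel_index(np.argmin(dist), dist.shape)
            # Gaussian neighbourhood around the BMU on the 2D grid.
            grid_d2 = np.sum((grid - np.array(bmu)) ** 2, axis=-1)
            h = np.exp(-grid_d2 / (2.0 * sigma ** 2))
            weights += lr * h[..., None] * (x - weights)
            step += 1
    return weights

def map_to_grid(data, weights):
    """Return the (row, col) grid position of the BMU for each sample."""
    flat = weights.reshape(-1, weights.shape[-1])
    d = np.linalg.norm(data[:, None, :] - flat[None, :, :], axis=-1)
    idx = np.argmin(d, axis=1)
    return np.column_stack(np.unravel_index(idx, weights.shape[:2]))
```

Samples that are similar in the original space tend to land on the same or neighbouring grid cells, which is what makes a 2D map of this kind usable for visual cluster exploration.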
Abstract:
Instrumentation and automation play a vital role in managing the water industry. These systems generate vast amounts of data that must be effectively managed to enable intelligent decision making. Time series data management software, commonly known as data historians, is used for collecting and managing real-time (time series) information. More advanced software solutions provide a data infrastructure or utility-wide Operations Data Management System (ODMS) that stores, manages, calculates, displays, shares, and integrates data from multiple disparate automation and business systems that are used daily in water utilities. These ODMS solutions are proven and can manage data ranging from smart water meters to the collaboration of data across third-party corporations. This paper focuses on practical utility successes in the water industry, where utility managers are leveraging instantaneous access to data from proven, commercial off-the-shelf ODMS solutions to enable better real-time decision making. Successes include saving $650,000 per year in water loss control, safeguarding water quality, and saving millions of dollars in energy management and asset management. Immediate opportunities exist to integrate the research being done in academia with these ODMS solutions in the field and to extend these successes to utilities around the world.
Abstract:
This study presents an approach to combine the uncertainties of hydrological model outputs predicted by a number of machine learning models. The machine learning based uncertainty prediction approach is very useful for estimating a hydrological model's uncertainty for a particular hydrometeorological situation in real-time applications [1]. In this approach, the hydrological model realizations from Monte Carlo simulations are used to build different machine learning uncertainty models that predict the uncertainty (quantiles of the pdf) of a deterministic output from the hydrological model. The uncertainty models are trained using antecedent precipitation and streamflows as inputs. The trained models are then employed to predict the model output uncertainty that is specific to the new input data. We used three machine learning models, namely artificial neural networks, model trees, and locally weighted regression, to predict output uncertainties. These three models produce similar verification results, which can be improved by merging their outputs dynamically. We propose an approach that forms a committee of the three models to combine their outputs. The approach is applied to estimate the uncertainty of streamflow simulations from a conceptual hydrological model in the Brue catchment in the UK and the Bagmati catchment in Nepal. The verification results show that the merged output is better than the output of any individual model. [1] D. L. Shrestha, N. Kayastha, D. P. Solomatine, and R. Price. Encapsulation of parametric uncertainty statistics by various predictive machine learning models: MLUE method, Journal of Hydroinformatics, in press, 2013.
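The abstract does not say how the committee merges its members, so the following is only one plausible reading of "merging their outputs dynamically": at each time step, every model is weighted by the inverse of its mean absolute error over a recent window. The function name, the window length, and the use of a past target quantile (for example, one derived from the Monte Carlo runs) are all assumptions.

```python
import numpy as np

def merge_committee(preds, target, window=10, eps=1e-9):
    """Merge quantile predictions from several models with time-varying weights.

    preds  : array (n_models, n_steps), each model's predicted quantile
    target : array (n_steps,), reference quantile, assumed known for past steps
    At step t only errors from steps t-window .. t-1 are used, so the weights
    never look at the value being predicted.
    """
    preds = np.asarray(preds, dtype=float)
    target = np.asarray(target, dtype=float)
    n_models, n_steps = preds.shape
    merged = np.empty(n_steps)
    for t in range(n_steps):
        if t == 0:
            # No history yet: fall back to a simple average.
            w = np.full(n_models, 1.0 / n_models)
        else:
            lo = max(0, t - window)
            mae = np.mean(np.abs(preds[:, lo:t] - target[lo:t]), axis=1)
            inv = 1.0 / (mae + eps)   # better recent skill -> larger weight
            w = inv / inv.sum()
        merged[t] = w @ preds[:, t]
    return merged
```

With this rule a member that has tracked the target well over the last few steps dominates the merged output, while a uniformly weighted average is recovered when all members perform alike.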
Abstract:
In this research the 3DVAR data assimilation scheme is implemented in the numerical model DIVAST in order to optimize the performance of the numerical model by selecting an appropriate turbulence scheme and tuning its parameters. Two turbulence closure schemes, the Prandtl mixing length model and the two-equation k-ε model, were incorporated into DIVAST and examined with respect to their universality of application, complexity of solutions, computational efficiency and numerical stability. A square harbour with one symmetrical entrance subject to tide-induced flows was selected to investigate the structure of turbulent flows. The experimental part of the research was conducted in a tidal basin. A significant advantage of such a laboratory experiment is a fully controlled environment in which domain setup and forcing are user-defined. The research shows that the Prandtl mixing length model and the two-equation k-ε model, with default parameterization predefined according to literature recommendations, overestimate eddy viscosity, which in turn results in a significant underestimation of velocity magnitudes in the harbour. The data assimilation of the model-predicted velocity and laboratory observations significantly improves model predictions for both turbulence models by adjusting modelled flows in the harbour to match de-errored observations. 3DVAR also makes it possible to identify and quantify shortcomings of the numerical model. Such comprehensive analysis gives an optimal solution from which numerical model parameters can be estimated. The process of turbulence model optimization by reparameterization and tuning towards the optimal state led to new constants that may potentially be applied to complex turbulent flows, such as rapidly developing flows or recirculating flows.
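As a rough illustration of the assimilation step, the sketch below computes the analysis state for a linear observation operator, for which the minimiser of the 3DVAR cost function has the closed form x_a = x_b + B Hᵀ (H B Hᵀ + R)⁻¹ (y − H x_b). The matrix names follow common data assimilation notation; the function name and any numbers fed to it are hypothetical and are not taken from the DIVAST study.

```python
import numpy as np

def analysis_update(x_b, B, H, R, y):
    """Closed-form 3DVAR analysis for a linear observation operator.

    x_b : background (model-predicted) state, shape (n,)
    B   : background error covariance, shape (n, n)
    H   : linear observation operator, shape (m, n)
    R   : observation error covariance, shape (m, m)
    y   : observations, shape (m,)
    """
    x_b = np.asarray(x_b, dtype=float)
    innovation = np.asarray(y, dtype=float) - H @ x_b   # observation minus model
    S = H @ B @ H.T + R                                  # innovation covariance
    # Solve S z = innovation instead of forming the inverse explicitly.
    z = np.linalg.solve(S, innovation)
    return x_b + B @ H.T @ z
```

In the scalar case with equal background and observation variances, the analysis lands halfway between the model value and the observation, which is a quick sanity check on any implementation.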
Abstract:
Due to the increasing demand for water and hydropower, it is becoming more important to operate hydraulic structures efficiently while sustaining multiple demands. In particular, companies, governmental agencies and consulting offices require effective, practical, integrated tools and decision support frameworks to operate reservoirs, cascades of run-of-river plants and related elements such as canals by merging hydrological and reservoir simulation/optimization models with various numerical weather predictions, radar and satellite data. The model performance is closely related to the streamflow forecast, its uncertainty, and how that uncertainty is considered in decision making. While deterministic weather predictions and their corresponding streamflow forecasts restrict the manager to single deterministic trajectories, probabilistic forecasts can be a key solution by including uncertainty in flow forecast scenarios for dam operation. The objective of this study is to compare deterministic and probabilistic streamflow forecasts on a previously developed basin/reservoir model for short-term reservoir management. The study is applied to the Yuvacık Reservoir and its upstream basin, which is the main water supply of Kocaeli City, located in the northwestern part of Turkey. The reservoir is a typical example because of its limited capacity, downstream channel restrictions and high snowmelt potential. Mesoscale Model 5 and Ensemble Prediction System (EPS) data are used as the main input, and the flow forecasts are produced for the year 2012 using HEC-HMS. A hydrometeorological rule-based reservoir simulation model is built with HEC-ResSim and integrated with the forecasts. Since the EPS-based hydrological model produces a large number of equally probable scenarios, it indicates how uncertainty spreads into the future. Thus, compared with the deterministic approach, it provides the operator with risk ranges in terms of spillway discharges and reservoir levels.
The framework is fully data-driven, practical and useful to the profession, and the knowledge gained can be transferred to other similar reservoir systems.
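One simple way to turn a set of equally probable ensemble scenarios into risk ranges is to take quantile bands and a per-step exceedance probability across the members. The sketch below assumes the ensemble has already been routed through the reservoir model; the function name, the chosen quantiles, and the threshold are illustrative assumptions, not values from the Yuvacık study.

```python
import numpy as np

def ensemble_risk(levels, threshold, quantiles=(0.1, 0.5, 0.9)):
    """Summarise an ensemble of simulated reservoir levels.

    levels    : array (n_members, n_steps), one simulated trajectory per member
    threshold : level above which the operator considers the state critical
    Returns the quantile bands, shape (len(quantiles), n_steps), and the
    fraction of members exceeding the threshold at each step.
    """
    levels = np.asarray(levels, dtype=float)
    bands = np.quantile(levels, quantiles, axis=0)
    # Members are treated as equally probable, so the exceedance probability
    # is simply the share of members above the threshold.
    p_exceed = np.mean(levels > threshold, axis=0)
    return bands, p_exceed
```

A deterministic forecast corresponds to a single row of `levels`, for which the bands collapse to one trajectory and the exceedance probability can only be 0 or 1.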
Abstract:
We study semiparametric two-step estimators which have the same structure as parametric doubly robust estimators in their second step. The key difference is that we do not impose any parametric restriction on the nuisance functions that are estimated in a first stage, but retain a fully nonparametric model instead. We call these estimators semiparametric doubly robust estimators (SDREs), and show that they possess superior theoretical and practical properties compared to generic semiparametric two-step estimators. In particular, our estimators have substantially smaller first-order bias, allow for a wider range of nonparametric first-stage estimates, rate-optimal choices of smoothing parameters and data-driven estimates thereof, and their stochastic behavior can be well-approximated by classical first-order asymptotics. SDREs exist for a wide range of parameters of interest, particularly in semiparametric missing data and causal inference models. We illustrate our method with a simulation exercise.
Abstract:
The C2* radical is used as a probe for reactive flow diagnostics; it was chosen because it occurs widely in plasma and combustion in aeronautics and aerospace applications. The rotational temperatures of C2* species were determined by comparing experimental and theoretical data. The simulation code was developed by the authors in C++ using the object-oriented paradigm, and it includes a set of new tools that increase the efficacy of the C2* probe in determining the rotational temperature of the system. A brute-force approach to the determination of spectral parameters was adopted in this version of the code. The statistical parameter χ² was used as an objective criterion to determine the best match between experimental and synthesized spectra. The results showed that the program works even with low-quality experimental data, typically collected by compact in situ airborne apparatus. The technique was applied to the flames of a Bunsen burner, and a rotational temperature of ca. 2100 K was calculated.
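The brute-force χ² matching described above can be sketched as a grid search over candidate temperatures. The example uses a deliberately simplified toy spectrum (a normalised Boltzmann distribution over rotational levels with an illustrative rotational constant `theta_rot`), not the authors' C++ code or a realistic C2* band model; every name and number here is an assumption.

```python
import numpy as np

def synthetic_spectrum(temperature, j_values, theta_rot=2.5):
    """Toy rotational line intensities, normalised to a peak of 1.

    theta_rot is an illustrative rotational constant expressed in kelvin;
    line-strength factors and instrument broadening are ignored.
    """
    pop = (2 * j_values + 1) * np.exp(
        -theta_rot * j_values * (j_values + 1) / temperature
    )
    return pop / pop.max()

def brute_force_fit(measured, j_values, t_grid, sigma=0.05):
    """Try every temperature on the grid and keep the one with the lowest chi-square."""
    chi2 = np.array([
        np.sum(((measured - synthetic_spectrum(t, j_values)) / sigma) ** 2)
        for t in t_grid
    ])
    best = int(np.argmin(chi2))
    return t_grid[best], chi2[best]
```

Because the search evaluates every grid point, it is slow but robust to noisy data with several local minima, which is consistent with the abstract's remark that the program copes with low-quality spectra.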
Abstract:
We present a measurement of the semileptonic mixing asymmetry for B⁰ mesons, a^d_sl, using two independent decay channels: B⁰ → μ⁺D⁻X, with D⁻ → K⁺π⁻π⁻; and B⁰ → μ⁺D*⁻X, with D*⁻ → D̄⁰π⁻, D̄⁰ → K⁺π⁻ (and charge conjugate processes). We use a data sample corresponding to 10.4 fb⁻¹ of pp̄ collisions at √s = 1.96 TeV, collected with the D0 experiment at the Fermilab Tevatron collider. We extract the charge asymmetries in these two channels as a function of the visible proper decay length of the B⁰ meson, correct for detector-related asymmetries using data-driven methods, and account for dilution from charge-symmetric processes using Monte Carlo simulation. The final measurement combines four signal visible proper decay length regions for each channel, yielding a^d_sl = [0.68 ± 0.45 (stat) ± 0.14 (syst)]%. This is the single most precise measurement of this parameter, with uncertainties smaller than the current world average of B factory measurements. © 2012 American Physical Society.
Abstract:
This work presents practical results of a systematic effort devoted to the seismic processing and interpretation of some land lines from the Tacutu graben (Brazil) data set, to which fundamental steps of the WIT system for data-driven CRS (Common Reflection Surface) stack imaging were applied. As a result, we expect to establish a workflow for the seismic re-evaluation of sedimentary basins. Based on the wavefront attributes resulting from the CRS stack, a smooth macro-velocity model was obtained through tomographic inversion. Using this macro-model, pre- and post-stack depth migration was performed. In addition, other CRS-stack-based techniques were carried out in parallel, such as residual static correction and limited-aperture migration based on the projected Fresnel zone. A geological interpretation of the stacked and migrated sections was sketched. From the visual details of the panels it is possible to interpret unconformities, thinning, and a main faulted anticline with sets of horsts and grabens. In addition, part of the selected line needs more detailed processing to better reveal any structure present in the subsurface.
Abstract:
Determination of the utility harmonic impedance based on measurements is a significant task for utility power-quality improvement and management. Compared with the well-established, accurate invasive methods, the noninvasive methods are more desirable since they work with natural variations of the loads connected to the point of common coupling (PCC), so that no intentional disturbance is needed. However, the accuracy of these methods has to be improved. In this context, this paper first points out that the critical problem of the noninvasive methods is how to select the measurements that can be used with confidence for utility harmonic impedance calculation. Then, this paper presents a new measurement technique based on complex-data least-squares regression, combined with two techniques of data selection. Simulation and field test results show that the proposed noninvasive method is practical and robust, so that it can be used with confidence to determine the utility harmonic impedances.
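A minimal sketch of least-squares regression on complex phasor data is given below. It assumes a Thevenin model of the utility side, V_pcc = V_u − Z_u · I_pcc, with the utility source V_u constant over the fitting window and the current taken as flowing from the utility towards the customer. This sign convention, the function name, and the absence of any data selection step are assumptions; the paper's two selection techniques are not described in the abstract and are not reproduced here.

```python
import numpy as np

def estimate_utility_impedance(v_pcc, i_pcc):
    """Fit V_pcc = V_u - Z_u * I_pcc to harmonic phasors by complex least squares.

    v_pcc, i_pcc : sequences of complex harmonic voltage and current phasors
                   measured at the point of common coupling
    Returns (Z_u, V_u) as complex numbers.
    """
    v = np.asarray(v_pcc, dtype=complex)
    i = np.asarray(i_pcc, dtype=complex)
    # Design matrix for the model v = V_u * 1 + slope * i, with slope = -Z_u.
    design = np.column_stack([np.ones_like(i), i])
    coeffs, *_ = np.linalg.lstsq(design, v, rcond=None)
    v_u, slope = coeffs
    return -slope, v_u
```

The regression is only meaningful when the variation in the samples comes from the customer side, which is why the abstract identifies measurement selection as the critical problem for noninvasive methods.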
Abstract:
Gossip protocols have been analyzed as a feasible solution for data dissemination on peer-to-peer networks. In this thesis, a new data dissemination protocol is proposed and compared with other known gossip mechanisms. Performance evaluation is based on simulation.
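For orientation, a simulation-based evaluation of a gossip mechanism can be as small as the following round-based push model, in which every informed node forwards the message to a few randomly chosen peers per round. This is a generic baseline, not the protocol proposed in the thesis; the function name, fanout, and network size are assumptions.

```python
import random

def simulate_push_gossip(n_nodes, fanout=2, seed=0, max_rounds=1000):
    """Simulate push gossip on a fully connected overlay.

    Node 0 starts with the message. In each round every informed node picks
    `fanout` distinct peers uniformly at random (possibly already informed,
    possibly itself) and pushes the message to them.
    Returns (rounds_needed, history), where history[r] is the number of
    informed nodes after r rounds.
    """
    rng = random.Random(seed)
    informed = {0}
    history = [1]
    rounds = 0
    while len(informed) < n_nodes and rounds < max_rounds:
        newly = set()
        for _node in sorted(informed):
            newly.update(rng.sample(range(n_nodes), fanout))
        informed |= newly
        history.append(len(informed))
        rounds += 1
    return rounds, history
```

Plotting `history` against the round number gives the dissemination curve on which different gossip variants are typically compared, alongside the total number of messages sent.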
Abstract:
The production of the Z boson in proton-proton collisions at the LHC serves as a standard candle at the ATLAS experiment during early data-taking. The decay of the Z into an electron-positron pair gives a clean signature in the detector that allows for calibration and performance studies. The cross-section of ~ 1 nb allows first LHC measurements of parton density functions. In this thesis, simulations of 10 TeV collisions at the ATLAS detector are studied. The challenges for an experimental measurement of the cross-section with an integrated luminosity of 100 pb−1 are discussed. In preparation for the cross-section determination, the single-electron efficiencies are determined via a simulation-based method and in a test of a data-driven ansatz. The two methods show very good agreement and differ by ~ 3% at most. The ingredients of an inclusive and a differential Z production cross-section measurement at ATLAS are discussed and their possible contributions to systematic uncertainties are presented. For a combined sample of signal and background, the expected uncertainty on the inclusive cross-section for an integrated luminosity of 100 pb−1 is determined to be ±1.5% (stat) ±4.2% (syst) ±10% (lumi). The possibilities for single-differential cross-section measurements in rapidity and transverse momentum of the Z boson, which are important quantities because of their impact on parton density functions and the capability to check for non-perturbative effects in pQCD, are outlined. The issues of an efficiency correction based on electron efficiencies as a function of the electron's transverse momentum and pseudorapidity are studied. A possible alternative is demonstrated by expanding the two-dimensional efficiencies with the additional dimension of the invariant mass of the two leptons of the Z decay.
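To get a feel for the relative size of the three uncertainty components quoted above, they can be combined in quadrature. The abstract quotes the components separately and does not state a total, so treating them as independent is an assumption made here purely for illustration.

```python
import math

def combine_in_quadrature(*relative_uncertainties):
    """Combine independent relative uncertainties (in percent) in quadrature."""
    return math.sqrt(sum(u * u for u in relative_uncertainties))

# Components quoted in the abstract, in percent: statistical, systematic, luminosity.
stat, syst, lumi = 1.5, 4.2, 10.0
total = combine_in_quadrature(stat, syst, lumi)

# Share of the total variance that comes from the luminosity term alone.
lumi_share = lumi ** 2 / total ** 2
```

Under this assumption the total is about 10.9%, and the luminosity term accounts for roughly 83% of the variance, which shows why the luminosity determination dominates the early cross-section measurement.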
Abstract:
This is the first part of a study investigating a model-based transient calibration process for diesel engines. The motivation is to populate hundreds of parameters (which can be calibrated) in a methodical and optimum manner by using model-based optimization in conjunction with the manual process so that, relative to the manual process used by itself, a significant improvement in transient emissions and fuel consumption and a sizable reduction in calibration time and test cell requirements are achieved. Empirical transient modelling and optimization are addressed in the second part of this work, while the data required for model training and generalization are the focus of the current work. Transient and steady-state data from a turbocharged multicylinder diesel engine have been examined from a model training perspective. A single-cylinder engine with external air-handling has been used to expand the steady-state data to encompass the transient parameter space. Based on comparative model performance and differences in the non-parametric space, primarily driven by a large difference between exhaust and intake manifold pressures (engine ΔP) during transients, it has been recommended that transient emission models be trained with transient training data. It has been shown that electronic control module (ECM) estimates of transient charge flow and the exhaust gas recirculation (EGR) fraction cannot be accurate at the high engine ΔP frequently encountered during transient operation, and that such estimates do not account for cylinder-to-cylinder variation. The effects of high engine ΔP must therefore be incorporated empirically by using transient data generated from a spectrum of transient calibrations. Specific recommendations on how to choose such calibrations, how much data to acquire, and how to specify transient segments for data acquisition have been made. Methods to process transient data to account for transport delays and sensor lags have been developed.
The processed data have then been visualized using statistical means to understand transient emission formation. Two modes of transient opacity formation have been observed and described. The first mode is driven by high engine ΔP and low fresh air flowrates, while the second mode is driven by high engine ΔP and high EGR flowrates. The EGR fraction is inaccurately estimated in both modes, while EGR distribution has been shown to be present but unaccounted for by the ECM. The two modes and associated phenomena are essential to understanding why transient emission models are calibration dependent and, furthermore, how to choose training data that will result in good model generalization.
Abstract:
In-cylinder pressure transducers have been used for decades to record combustion pressure inside a running engine. However, due to the extreme operating environment, transducer design and installation must be considered in order to minimize measurement error. One such error is caused by thermal shock, where the pressure transducer experiences a high heat flux that can distort the pressure transducer diaphragm and also change the crystal sensitivity. This research focused on investigating the effects of thermal shock on in-cylinder pressure transducer data quality using a 2.0L, four-cylinder, spark-ignited, direct-injected, turbo-charged GM engine. Cylinder four was modified with five ports to accommodate pressure transducers of different manufacturers. They included an AVL GH14D, an AVL GH15D, a Kistler 6125C, and a Kistler 6054AR. The GH14D, GH15D, and 6054AR were M5 size transducers. The 6125C was a larger, 6.2mm transducer. Note that both of the AVL pressure transducers utilized a PH03 flame arrestor. Sweeps of ignition timing (spark sweep), engine speed, and engine load were performed to study the effects of thermal shock on each pressure transducer. The project consisted of two distinct phases which included experimental engine testing as well as simulation using a commercially available software package. A comparison was performed to characterize the quality of the data between the actual cylinder pressure and the simulated results. This comparison was valuable because the simulation results did not include thermal shock effects. All three sets of tests showed the peak cylinder pressure was basically unaffected by thermal shock. Comparison of the experimental data with the simulated results showed very good correlation. 
The spark sweep was performed at 1300 RPM and 3.3 bar NMEP and showed that the differences between the simulated results (no thermal shock) and the experimental data for the indicated mean effective pressure (IMEP) and the pumping mean effective pressure (PMEP) were significantly less than the published accuracies. All transducers had a percent difference of less than 0.038% for IMEP and less than 0.32% for PMEP. Kistler and AVL publish that the accuracy of their pressure transducers is within plus or minus 1% for the IMEP (AVL 2011; Kistler 2011). In addition, the difference in average exhaust absolute pressure between the simulated results and experimental data was the greatest for the two Kistler pressure transducers. The location and lack of a flame arrestor are believed to be the cause of the increased error. For the engine speed sweep, the torque output was held constant at 203 Nm (150 ft-lbf) from 1500 to 4000 RPM. The difference in IMEP was less than 0.01% and the difference in PMEP was less than 1%, except for the AVL GH14D, which was 5%, and the AVL GH15DK, which was 2.25%. A noticeable error in PMEP appeared as the load increased during the engine speed sweeps, as expected. The load sweep was conducted at 2000 RPM over a range of NMEP from 1.1 to 14 bar. The difference in IMEP values was less than 0.08%, while the PMEP values were below 1% except for the AVL GH14D, which was 1.8%, and the AVL GH15DK, which was 1.25%. In-cylinder pressure transducer data quality was effectively analyzed using a combination of experimental data and simulation results. Several criteria can be used to investigate the impact of thermal shock on data quality as well as to determine the best location and thermal protection for various transducers.
Abstract:
Correct predictions of future blood glucose levels in individuals with Type 1 Diabetes (T1D) can be used to provide early warning of upcoming hypo-/hyperglycemic events and thus to improve the patient's safety. To increase prediction accuracy and efficiency, various approaches have been proposed which combine multiple predictors to produce superior results compared to single predictors. Three methods for model fusion are presented and comparatively assessed. Data from 23 T1D subjects under sensor-augmented pump (SAP) therapy were used in two adaptive data-driven models (an autoregressive model with output correction - cARX, and a recurrent neural network - RNN). Data fusion techniques based on i) Dempster-Shafer Evidential Theory (DST), ii) Genetic Algorithms (GA), and iii) Genetic Programming (GP) were used to merge the complementary performances of the prediction models. The fused output is used in a warning algorithm to issue alarms of upcoming hypo-/hyperglycemic events. The fusion schemes showed improved performance with lower root mean square errors, lower time lags, and higher correlation. In the warning algorithm, median daily false alarms (DFA) of 0.25% and 100% correct alarms (CA) were obtained for both event types. The detection times (DT) before occurrence of events were 13.0 and 12.1 min respectively for hypo-/hyperglycemic events. Compared to the cARX and RNN models, and a linear fusion of the two, the proposed fusion schemes represent a significant improvement.
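The linear fusion that serves as a comparison baseline above can be sketched as an ordinary least-squares blend of two predictors, scored by root mean square error. The abstract does not describe how the baseline was fitted, so the intercept term, the function names, and the in-sample fit are assumptions; the DST, GA, and GP schemes are not reproduced.

```python
import numpy as np

def rmse(pred, ref):
    """Root mean square error between a prediction and a reference series."""
    pred = np.asarray(pred, dtype=float)
    ref = np.asarray(ref, dtype=float)
    return float(np.sqrt(np.mean((pred - ref) ** 2)))

def fit_linear_fusion(pred_a, pred_b, reference):
    """Least-squares weights (w_a, w_b, bias) for blending two predictors."""
    design = np.column_stack([pred_a, pred_b, np.ones(len(reference))])
    weights, *_ = np.linalg.lstsq(design, np.asarray(reference, dtype=float), rcond=None)
    return weights

def apply_linear_fusion(weights, pred_a, pred_b):
    """Blend two prediction series with previously fitted weights."""
    return (weights[0] * np.asarray(pred_a, dtype=float)
            + weights[1] * np.asarray(pred_b, dtype=float)
            + weights[2])
```

On the data used for fitting, this blend can never have a higher RMSE than either input predictor, because using one predictor alone is a special case of the weights; a fair comparison therefore needs held-out data.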