888 resultados para data-driven simulation
Resumo:
Currently, the quality of the Indonesian national road network is inadequate due to several constraints, including overcapacity and overloaded trucks. The high deterioration rate of the road infrastructure in developing countries along with major budgetary restrictions and high growth in traffic have led to an emerging need for improving the performance of the highway maintenance system. However, the high number of intervening factors and their complex effects require advanced tools to successfully solve this problem. The high learning capabilities of Data Mining (DM) are a powerful solution to this problem. In the past, these tools have been successfully applied to solve complex and multi-dimensional problems in various scientific fields. Therefore, it is expected that DM can be used to analyze the large amount of data regarding the pavement and traffic, identify the relationship between variables, and provide information regarding the prediction of the data. In this paper, we present a new approach to predict the International Roughness Index (IRI) of pavement based on DM techniques. DM was used to analyze the initial IRI data, including age, Equivalent Single Axle Load (ESAL), crack, potholes, rutting, and long cracks. This model was developed and verified using data from an Integrated Indonesia Road Management System (IIRMS) that was measured with the National Association of Australian State Road Authorities (NAASRA) roughness meter. The results of the proposed approach are compared with the IIRMS analytical model adapted to the IRI, and the advantages of the new approach are highlighted. We show that the novel data-driven model is able to learn (with high accuracy) the complex relationships between the IRI and the contributing factors of overloaded trucks
Advanced mapping of environmental data: Geostatistics, Machine Learning and Bayesian Maximum Entropy
Resumo:
This book combines geostatistics and global mapping systems to present an up-to-the-minute study of environmental data. Featuring numerous case studies, the reference covers model dependent (geostatistics) and data driven (machine learning algorithms) analysis techniques such as risk mapping, conditional stochastic simulations, descriptions of spatial uncertainty and variability, artificial neural networks (ANN) for spatial data, Bayesian maximum entropy (BME), and more.
Resumo:
This paper presents a review of methodology for semi-supervised modeling with kernel methods, when the manifold assumption is guaranteed to be satisfied. It concerns environmental data modeling on natural manifolds, such as complex topographies of the mountainous regions, where environmental processes are highly influenced by the relief. These relations, possibly regionalized and nonlinear, can be modeled from data with machine learning using the digital elevation models in semi-supervised kernel methods. The range of the tools and methodological issues discussed in the study includes feature selection and semisupervised Support Vector algorithms. The real case study devoted to data-driven modeling of meteorological fields illustrates the discussed approach.
Resumo:
In recent years there has been an explosive growth in the development of adaptive and data driven methods. One of the efficient and data-driven approaches is based on statistical learning theory (Vapnik 1998). The theory is based on Structural Risk Minimisation (SRM) principle and has a solid statistical background. When applying SRM we are trying not only to reduce training error ? to fit the available data with a model, but also to reduce the complexity of the model and to reduce generalisation error. Many nonlinear learning procedures recently developed in neural networks and statistics can be understood and interpreted in terms of the structural risk minimisation inductive principle. A recent methodology based on SRM is called Support Vector Machines (SVM). At present SLT is still under intensive development and SVM find new areas of application (www.kernel-machines.org). SVM develop robust and non linear data models with excellent generalisation abilities that is very important both for monitoring and forecasting. SVM are extremely good when input space is high dimensional and training data set i not big enough to develop corresponding nonlinear model. Moreover, SVM use only support vectors to derive decision boundaries. It opens a way to sampling optimization, estimation of noise in data, quantification of data redundancy etc. Presentation of SVM for spatially distributed data is given in (Kanevski and Maignan 2004).
Resumo:
The paper presents the Multiple Kernel Learning (MKL) approach as a modelling and data exploratory tool and applies it to the problem of wind speed mapping. Support Vector Regression (SVR) is used to predict spatial variations of the mean wind speed from terrain features (slopes, terrain curvature, directional derivatives) generated at different spatial scales. Multiple Kernel Learning is applied to learn kernels for individual features and thematic feature subsets, both in the context of feature selection and optimal parameters determination. An empirical study on real-life data confirms the usefulness of MKL as a tool that enhances the interpretability of data-driven models.
Resumo:
Radioactive soil-contamination mapping and risk assessment is a vital issue for decision makers. Traditional approaches for mapping the spatial concentration of radionuclides employ various regression-based models, which usually provide a single-value prediction realization accompanied (in some cases) by estimation error. Such approaches do not provide the capability for rigorous uncertainty quantification or probabilistic mapping. Machine learning is a recent and fast-developing approach based on learning patterns and information from data. Artificial neural networks for prediction mapping have been especially powerful in combination with spatial statistics. A data-driven approach provides the opportunity to integrate additional relevant information about spatial phenomena into a prediction model for more accurate spatial estimates and associated uncertainty. Machine-learning algorithms can also be used for a wider spectrum of problems than before: classification, probability density estimation, and so forth. Stochastic simulations are used to model spatial variability and uncertainty. Unlike regression models, they provide multiple realizations of a particular spatial pattern that allow uncertainty and risk quantification. This paper reviews the most recent methods of spatial data analysis, prediction, and risk mapping, based on machine learning and stochastic simulations in comparison with more traditional regression models. The radioactive fallout from the Chernobyl Nuclear Power Plant accident is used to illustrate the application of the models for prediction and classification problems. This fallout is a unique case study that provides the challenging task of analyzing huge amounts of data ('hard' direct measurements, as well as supplementary information and expert estimates) and solving particular decision-oriented problems.
Resumo:
The neural mechanisms determining the timing of even simple actions, such as when to walk or rest, are largely mysterious. One intriguing, but untested, hypothesis posits a role for ongoing activity fluctuations in neurons of central action selection circuits that drive animal behavior from moment to moment. To examine how fluctuating activity can contribute to action timing, we paired high-resolution measurements of freely walking Drosophila melanogaster with data-driven neural network modeling and dynamical systems analysis. We generated fluctuation-driven network models whose outputs-locomotor bouts-matched those measured from sensory-deprived Drosophila. From these models, we identified those that could also reproduce a second, unrelated dataset: the complex time-course of odor-evoked walking for genetically diverse Drosophila strains. Dynamical models that best reproduced both Drosophila basal and odor-evoked locomotor patterns exhibited specific characteristics. First, ongoing fluctuations were required. In a stochastic resonance-like manner, these fluctuations allowed neural activity to escape stable equilibria and to exceed a threshold for locomotion. Second, odor-induced shifts of equilibria in these models caused a depression in locomotor frequency following olfactory stimulation. Our models predict that activity fluctuations in action selection circuits cause behavioral output to more closely match sensory drive and may therefore enhance navigation in complex sensory environments. Together these data reveal how simple neural dynamics, when coupled with activity fluctuations, can give rise to complex patterns of animal behavior.
Inference for nonparametric high-frequency estimators with an application to time variation in betas
Resumo:
We consider the problem of conducting inference on nonparametric high-frequency estimators without knowing their asymptotic variances. We prove that a multivariate subsampling method achieves this goal under general conditions that were not previously available in the literature. We suggest a procedure for a data-driven choice of the bandwidth parameters. Our simulation study indicates that the subsampling method is much more robust than the plug-in method based on the asymptotic expression for the variance. Importantly, the subsampling method reliably estimates the variability of the Two Scale estimator even when its parameters are chosen to minimize the finite sample Mean Squared Error; in contrast, the plugin estimator substantially underestimates the sampling uncertainty. By construction, the subsampling method delivers estimates of the variance-covariance matrices that are always positive semi-definite. We use the subsampling method to study the dynamics of financial betas of six stocks on the NYSE. We document significant variation in betas within year 2006, and find that tick data captures more variation in betas than the data sampled at moderate frequencies such as every five or twenty minutes. To capture this variation we estimate a simple dynamic model for betas. The variance estimation is also important for the correction of the errors-in-variables bias in such models. We find that the bias corrections are substantial, and that betas are more persistent than the naive estimators would lead one to believe.
Resumo:
This thesis Entitled “modelling and analysis of recurrent event data with multiple causes.Survival data is a term used for describing data that measures the time to occurrence of an event.In survival studies, the time to occurrence of an event is generally referred to as lifetime.Recurrent event data are commonly encountered in longitudinal studies when individuals are followed to observe the repeated occurrences of certain events. In many practical situations, individuals under study are exposed to the failure due to more than one causes and the eventual failure can be attributed to exactly one of these causes.The proposed model was useful in real life situations to study the effect of covariates on recurrences of certain events due to different causes.In Chapter 3, an additive hazards model for gap time distributions of recurrent event data with multiple causes was introduced. The parameter estimation and asymptotic properties were discussed .In Chapter 4, a shared frailty model for the analysis of bivariate competing risks data was presented and the estimation procedures for shared gamma frailty model, without covariates and with covariates, using EM algorithm were discussed. In Chapter 6, two nonparametric estimators for bivariate survivor function of paired recurrent event data were developed. The asymptotic properties of the estimators were studied. The proposed estimators were applied to a real life data set. Simulation studies were carried out to find the efficiency of the proposed estimators.
Resumo:
One of the major applications of underwater acoustic sensor networks (UWASN) is ocean environment monitoring. Employing data mules is an energy efficient way of data collection from the underwater sensor nodes in such a network. A data mule node such as an autonomous underwater vehicle (AUV) periodically visits the stationary nodes to download data. By conserving the power required for data transmission over long distances to a remote data sink, this approach extends the network life time. In this paper we propose a new MAC protocol to support a single mobile data mule node to collect the data sensed by the sensor nodes in periodic runs through the network. In this approach, the nodes need to perform only short distance, single hop transmission to the data mule. The protocol design discussed in this paper is motivated to support such an application. The proposed protocol is a hybrid protocol, which employs a combination of schedule based access among the stationary nodes along with handshake based access to support mobile data mules. The new protocol, RMAC-M is developed as an extension to the energy efficient MAC protocol R-MAC by extending the slot time of R-MAC to include a contention part for a hand shake based data transfer. The mobile node makes use of a beacon to signal its presence to all the nearby nodes, which can then hand-shake with the mobile node for data transfer. Simulation results show that the new protocol provides efficient support for a mobile data mule node while preserving the advantages of R-MAC such as energy efficiency and fairness.
Resumo:
Caches are known to consume up to half of all system power in embedded processors. Co-optimizing performance and power of the cache subsystems is therefore an important step in the design of embedded systems, especially those employing application specific instruction processors. In this project, we propose an analytical cache model that succinctly captures the miss performance of an application over the entire cache parameter space. Unlike exhaustive trace driven simulation, our model requires that the program be simulated once so that a few key characteristics can be obtained. Using these application-dependent characteristics, the model can span the entire cache parameter space consisting of cache sizes, associativity and cache block sizes. In our unified model, we are able to cater for direct-mapped, set and fully associative instruction, data and unified caches. Validation against full trace-driven simulations shows that our model has a high degree of fidelity. Finally, we show how the model can be coupled with a power model for caches such that one can very quickly decide on pareto-optimal performance-power design points for rapid design space exploration.
Resumo:
Pullpipelining, a pipeline technique where data is pulled from successor stages from predecessor stages is proposed Control circuits using a synchronous, a semi-synchronous and an asynchronous approach are given. Simulation examples for a DLX generic RISC datapath show that common control pipeline circuit overhead is avoided using the proposal. Applications to linear systolic arrays in cases when computation is finished at early stages in the array are foreseen. This would allow run-time data-driven digital frequency modulation of synchronous pipelined designs. This has applications to implement algorithms exhibiting average-case processing time using a synchronous approach.
Resumo:
This contribution introduces a new digital predistorter to compensate serious distortions caused by memory high power amplifiers (HPAs) which exhibit output saturation characteristics. The proposed design is based on direct learning using a data-driven B-spline Wiener system modeling approach. The nonlinear HPA with memory is first identified based on the B-spline neural network model using the Gauss-Newton algorithm, which incorporates the efficient De Boor algorithm with both B-spline curve and first derivative recursions. The estimated Wiener HPA model is then used to design the Hammerstein predistorter. In particular, the inverse of the amplitude distortion of the HPA's static nonlinearity can be calculated effectively using the Newton-Raphson formula based on the inverse of De Boor algorithm. A major advantage of this approach is that both the Wiener HPA identification and the Hammerstein predistorter inverse can be achieved very efficiently and accurately. Simulation results obtained are presented to demonstrate the effectiveness of this novel digital predistorter design.
Resumo:
Background: In many experimental pipelines, clustering of multidimensional biological datasets is used to detect hidden structures in unlabelled input data. Taverna is a popular workflow management system that is used to design and execute scientific workflows and aid in silico experimentation. The availability of fast unsupervised methods for clustering and visualization in the Taverna platform is important to support a data-driven scientific discovery in complex and explorative bioinformatics applications. Results: This work presents a Taverna plugin, the Biological Data Interactive Clustering Explorer (BioDICE), that performs clustering of high-dimensional biological data and provides a nonlinear, topology preserving projection for the visualization of the input data and their similarities. The core algorithm in the BioDICE plugin is Fast Learning Self Organizing Map (FLSOM), which is an improved variant of the Self Organizing Map (SOM) algorithm. The plugin generates an interactive 2D map that allows the visual exploration of multidimensional data and the identification of groups of similar objects. The effectiveness of the plugin is demonstrated on a case study related to chemical compounds. Conclusions: The number and variety of available tools and its extensibility have made Taverna a popular choice for the development of scientific data workflows. This work presents a novel plugin, BioDICE, which adds a data-driven knowledge discovery component to Taverna. BioDICE provides an effective and powerful clustering tool, which can be adopted for the explorative analysis of biological datasets.
Resumo:
Instrumentation and automation plays a vital role to managing the water industry. These systems generate vast amounts of data that must be effectively managed in order to enable intelligent decision making. Time series data management software, commonly known as data historians are used for collecting and managing real-time (time series) information. More advanced software solutions provide a data infrastructure or utility wide Operations Data Management System (ODMS) that stores, manages, calculates, displays, shares, and integrates data from multiple disparate automation and business systems that are used daily in water utilities. These ODMS solutions are proven and have the ability to manage data from smart water meters to the collaboration of data across third party corporations. This paper focuses on practical, utility successes in the water industry where utility managers are leveraging instantaneous access to data from proven, commercial off-the-shelf ODMS solutions to enable better real-time decision making. Successes include saving $650,000 / year in water loss control, safeguarding water quality, saving millions of dollars in energy management and asset management. Immediate opportunities exist to integrate the research being done in academia with these ODMS solutions in the field and to leverage these successes to utilities around the world.