48 results for data-driven simulation


Relevance:

100.00%

Publisher:

Abstract:

In big-data-driven traffic flow prediction systems, the robustness of prediction performance depends on accuracy and timeliness. This paper presents a new MapReduce-based nearest neighbor (NN) approach for traffic flow prediction using correlation analysis (TFPC) on a Hadoop platform. In particular, we develop a real-time prediction system comprising two key modules: offline distributed training (ODT) and online parallel prediction (OPP). Moreover, we build a parallel k-nearest neighbor optimization classifier, which incorporates correlation information among traffic flows into the classification process. Finally, we propose a novel prediction calculation method that combines the current data observed in OPP with the classification results obtained from large-scale historical data in ODT to generate traffic flow predictions in real time. An empirical study on real-world traffic flow big data using leave-one-out cross validation shows that TFPC significantly outperforms four state-of-the-art prediction approaches (autoregressive integrated moving average, Naïve Bayes, multilayer perceptron neural networks, and NN regression) in terms of accuracy, which improves by up to 90.07% in the best case, with an average mean absolute percent error of 5.53%. In addition, it displays excellent speedup, scaleup, and sizeup.

Relevance:

90.00%

Publisher:

Abstract:

We consider a random design model based on independent and identically distributed (iid) pairs of observations (Xi, Yi), where the regression function m(x) is given by m(x) = E(Yi | Xi = x) with one independent variable. In a nonparametric setting, the aim is to produce a reasonable approximation to the unknown function m(x) when we have no precise information about the form of the true density f(x) of X. We describe a procedure for estimating a nonparametric regression model at a given point by an appropriately constructed fixed-width (2d) confidence interval with a confidence coefficient of at least 1 − α. Here, d (> 0) and α ∈ (0, 1) are two preassigned values. Fixed-width confidence intervals are developed using both Nadaraya-Watson and local linear kernel estimators of nonparametric regression with data-driven bandwidths.

The sample size was optimized using purely sequential and two-stage sequential procedures together with the asymptotic properties of the Nadaraya-Watson and local linear estimators. A large-scale simulation study was performed to compare their coverage accuracy. The numerical results indicate that the confidence bands based on the local linear estimator perform better than those constructed using the Nadaraya-Watson estimator. However, both estimators are shown to have asymptotically correct coverage properties.
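As an illustration of the Nadaraya-Watson estimator named in this abstract, the following is a minimal sketch of the textbook estimator at a single point. It assumes a Gaussian kernel and a bandwidth `h` supplied by the caller; the function names are hypothetical, and the paper's data-driven bandwidth selection and sequential sampling procedures are not reproduced here.

```python
import math


def gaussian_kernel(u):
    """Standard normal density, used here as the smoothing kernel (an assumption)."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def nadaraya_watson(x0, xs, ys, h):
    """Kernel-weighted average of ys, estimating m(x0) = E(Y | X = x0).

    h is a fixed bandwidth chosen by the caller; it is not selected from the data.
    """
    if h <= 0:
        raise ValueError("bandwidth h must be positive")
    weights = [gaussian_kernel((x0 - x) / h) for x in xs]
    total = sum(weights)
    if total == 0.0:
        raise ValueError("all kernel weights are zero; try a larger bandwidth")
    return sum(w * y for w, y in zip(weights, ys)) / total


# Usage: estimate m(0) from three observations with bandwidth 1.
estimate = nadaraya_watson(0.0, [-1.0, 0.0, 1.0], [1.0, 0.0, 1.0], 1.0)
```

The estimate is a weighted mean of the observed responses, so observations close to the evaluation point dominate when `h` is small and the fit flattens toward the global mean as `h` grows.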

Relevance:

90.00%

Publisher:

Abstract:

We consider a random design model based on independent and identically distributed pairs of observations (Xi, Yi), where the regression function m(x) is given by m(x) = E(Yi | Xi = x) with one independent variable. In a nonparametric setting, the aim is to produce a reasonable approximation to the unknown function m(x) when we have no precise information about the form of the true density f(x) of X. We describe a procedure for estimating a nonparametric regression model at a given point by an appropriately constructed fixed-width (2d) confidence interval with a confidence coefficient of at least 1 − α. Here, d (> 0) and α ∈ (0, 1) are two preassigned values. Fixed-width confidence intervals are developed using both Nadaraya-Watson and local linear kernel estimators of nonparametric regression with data-driven bandwidths. The sample size was optimized using purely sequential and two-stage sequential procedures together with the asymptotic properties of the Nadaraya-Watson and local linear estimators. A large-scale simulation study was performed to compare their coverage accuracy. The numerical results indicate that the confidence bands based on the local linear estimator perform better than those constructed using the Nadaraya-Watson estimator. However, both estimators are shown to have asymptotically correct coverage properties.

Relevance:

90.00%

Publisher:

Abstract:

Precise and reliable modelling of polymerization reactors is challenging due to their complex reaction mechanisms and non-linear nature. Researchers often make several assumptions when deriving theories and developing models for polymerization reactors. As a result, the traditional models available suffer from high prediction error. In contrast, data-driven modelling techniques provide a powerful framework for describing the dynamic behaviour of a polymerization reactor. However, the prediction performance of traditional neural networks (NNs) drops significantly in the presence of polymerization process disturbances. The uncertainty caused by disturbances in reactor operation can be properly quantified by constructing prediction intervals (PIs) for model outputs. In this study, we propose and apply a PI-based neural network (PI-NN) model for the free radical polymerization system. This strategy avoids the assumptions made in traditional modelling techniques for polymerization reactor systems. The lower upper bound estimation (LUBE) method is used to develop the PI-NN model for uncertainty quantification. To further improve the quality of the model, a new method is proposed for aggregating the upper and lower bounds of PIs obtained from individual PI-NN models. Simulation results reveal that the performance of the combined PI-NN is superior to that of the individual PI-NN models in terms of PI quality. In addition, the constructed PIs properly quantify the effects of uncertainties in reactor operation, and can later be used as part of the control process. © 2014 Taiwan Institute of Chemical Engineers.

Relevance:

90.00%

Publisher:

Abstract:

Autism Spectrum Disorder (ASD) is growing at a staggering rate, but little is known about the cause of this condition. Inferring learning patterns from therapeutic performance data, and subsequently clustering ASD children into subgroups, is important for understanding this domain and, more importantly, for informing evidence-based intervention. However, this data-driven task was difficult in the past because there was insufficient data to perform reliable analysis. For the first time, using data from a recent application for early intervention in autism (TOBY Play pad), whose download count now exceeds 4500, we present in this paper the automatic discovery of learning patterns across 32 skills in sensory, imitation and language. We use unsupervised learning methods for this task, but a notorious problem with existing methods is that the number of patterns must be specified correctly in advance, which in our case is even more difficult due to the complexity of the data. To this end, we appeal to recent Bayesian nonparametric methods, in particular Bayesian Nonparametric Factor Analysis. This model uses the Indian Buffet Process (IBP) as a prior on a binary matrix with infinitely many columns to allocate groups of intervention skills to children. The optimal number of learning patterns and the subgroup assignments are inferred automatically from the data. Our experimental results follow an exploratory approach, presenting different newly discovered learning patterns. To provide quantitative results, we also report the clustering evaluation against K-means and nonnegative matrix factorization (NMF). In addition to the novelty of this new problem, we were able to demonstrate the suitability of Bayesian nonparametric models over parametric rivals.

Relevance:

90.00%

Publisher:

Abstract:

Despite several years of research, the type reduction (TR) operation in an interval type-2 fuzzy logic system (IT2FLS) cannot perform as fast as a type-1 defuzzifier. In particular, the widely used Karnik-Mendel (KM) TR algorithm is computationally much more demanding than alternative TR approaches. In this work, a data-driven framework is proposed to quickly, yet accurately, estimate the output of the KM TR algorithm using simple regression models. A comprehensive simulation performed in this study shows that the centroid end-points of the KM algorithm can be approximated with a mean absolute percentage error as low as 0.4%. Also, switch point prediction accuracy can be as high as 100%. Given that a simple regression model can be trained with data generated using the exhaustive defuzzification method, this work shows the potential of the proposed method to provide a highly accurate, yet extremely fast, TR approximation. The speed of the proposed method should theoretically outperform all available TR methods while keeping the uncertainty information intact in the process.
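For context on what the regression models in this abstract approximate, the following is a minimal sketch of the iterative Karnik-Mendel procedure in its commonly described textbook form, not the authors' code. It assumes the sample points `xs` are sorted in ascending order and that `lower` and `upper` hold the lower and upper membership grades at those points; the function name and parameters are hypothetical.

```python
def km_endpoint(xs, lower, upper, left=True, max_iter=100, tol=1e-12):
    """Return (end-point, switch point) of the type-reduced interval.

    left=True computes the left end-point, left=False the right one.
    The switch point k is the number of sample points not exceeding the end-point.
    """
    n = len(xs)

    def centroid(theta):
        denom = sum(theta)
        if denom <= 0.0:
            raise ValueError("membership weights must not all be zero")
        return sum(x * t for x, t in zip(xs, theta)) / denom

    # Start from the midpoint of each membership interval.
    theta = [(lo + hi) / 2.0 for lo, hi in zip(lower, upper)]
    y = centroid(theta)
    k = 0
    for _ in range(max_iter):
        k = sum(1 for x in xs if x <= y)  # current switch point
        if left:
            # Upper grades left of the switch point pull the centroid down.
            theta = [upper[i] if i < k else lower[i] for i in range(n)]
        else:
            # Upper grades right of the switch point push the centroid up.
            theta = [lower[i] if i < k else upper[i] for i in range(n)]
        y_new = centroid(theta)
        if abs(y_new - y) <= tol:
            return y_new, k
        y = y_new
    return y, k


# Usage: a small symmetric interval type-2 set on five sample points.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
lower = [0.1, 0.3, 0.5, 0.3, 0.1]
upper = [0.4, 0.7, 1.0, 0.7, 0.4]
y_left, k_left = km_endpoint(xs, lower, upper, left=True)
y_right, k_right = km_endpoint(xs, lower, upper, left=False)
```

The loop re-evaluates the centroid every time the switch point moves, which is the iterative cost that a trained regression model would replace with a single evaluation.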

Relevance:

90.00%

Publisher:

Abstract:

The Karnik-Mendel (KM) algorithm is the most used and researched type reduction (TR) algorithm in the literature. This algorithm is iterative in nature and, despite consistent long-term effort, no general closed-form formula has been found to replace this computationally expensive algorithm. In this research work, we demonstrate that the outcome of the KM algorithm can be approximated by simple linear regression techniques. Since most applications will have a fixed range of inputs with small-scale variations, it is possible to handle those complexities in the design phase and build a fuzzy logic system (FLS) with a low run-time computational burden. This objective can be well served by the application of regression techniques. This work presents an overview of the feasibility of regression techniques for the design of data-driven type reducers while keeping the uncertainty bound in the FLS intact. Simulation results demonstrate that the approximation error is less than 2%. Thus our work preserves the essence of the Karnik-Mendel algorithm and meets the requirement of low computational complexity.

Relevance:

90.00%

Publisher:

Abstract:

The Karnik-Mendel (KM) algorithm is the most widely used type reduction (TR) method in the literature for the design of interval type-2 fuzzy logic systems (IT2FLS). Its iterative nature for finding left and right switch points is its Achilles heel. Despite a decade of research, none of the alternative TR methods offers uncertainty measures equivalent to those of the KM algorithm. This paper takes a data-driven approach to tackle the computational burden of this algorithm while keeping its key features. We propose a regression method to approximate the left and right switch points found by the KM algorithm. The approximator uses only the firing intervals, rule centroids, and FLS structural features as inputs. Once training is done, it can precisely approximate the left and right switch points through basic vector multiplications. Comprehensive simulation results demonstrate that the approximation accuracy for a wide variety of FLSs is 100%. Flexibility, ease of implementation, and speed are other features of the proposed method.

Relevance:

90.00%

Publisher:

Abstract:

Extracting knowledge from the transaction records and personal data of credit card holders has great profit potential for the banking industry. The challenge is to detect or predict bankruptcies and to keep and recruit profitable customers. However, grouping and targeting credit card customers by traditional data-driven mining often does not directly meet the needs of the banking industry, because data-driven mining automatically generates classification outputs that are imprecise, meaningless, and beyond users' control. In this paper, we provide a novel domain-driven classification method that takes advantage of multiple criteria and multiple constraint-level programming for intelligent credit scoring. The method uses credit scoring to produce a set of customer scores that makes the classification results actionable and controllable through human interaction during the scoring process. Domain knowledge and experts' experience parameters are built into the criteria and constraint functions of mathematical programming, and a human-machine conversation is employed to generate an efficient and precise solution. Experiments based on various data sets validated the effectiveness and efficiency of the proposed methods. © 2006 IEEE.

Relevance:

90.00%

Publisher:

Abstract:

In this paper we develop a data-driven weight learning method for weighted quasi-arithmetic means where the observed data may vary in dimension.
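To make the object of this abstract concrete, the following is a minimal sketch of a weighted quasi-arithmetic mean in its standard definition, M(x) = g⁻¹(Σ wᵢ g(xᵢ)) with weights summing to one. The function name is hypothetical, the weight normalization is an assumption made for convenience, and the paper's weight learning method is not shown.

```python
import math


def quasi_arithmetic_mean(xs, ws, g, g_inv):
    """Weighted quasi-arithmetic mean with generator g and its inverse g_inv.

    Weights are normalized to sum to one (an assumption of this sketch).
    """
    if len(xs) != len(ws):
        raise ValueError("xs and ws must have the same length")
    total = sum(ws)
    if total <= 0:
        raise ValueError("weights must have a positive sum")
    return g_inv(sum(w * g(x) for x, w in zip(xs, ws)) / total)


# Usage: the same inputs and weights under three different generators.
arithmetic = quasi_arithmetic_mean([1.0, 4.0], [0.5, 0.5], lambda t: t, lambda t: t)
geometric = quasi_arithmetic_mean([1.0, 4.0], [0.5, 0.5], math.log, math.exp)
harmonic = quasi_arithmetic_mean([1.0, 3.0], [1.0, 1.0], lambda t: 1.0 / t, lambda t: 1.0 / t)
```

Changing the generator recovers the arithmetic, geometric, and harmonic means as special cases, while the weight vector is the quantity a learning method would fit to observed data.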

Relevance:

80.00%

Publisher:

Abstract:

Although the development of geographic information system (GIS) technology and digital data manipulation techniques has enabled practitioners in the geographical and geophysical sciences to make more efficient use of resource information, many of the methods used in forming spatial prediction models are still inherently based on traditional techniques of map stacking in which layers of data are combined under the guidance of a theoretical domain model. This paper describes a data-driven approach by which Artificial Neural Networks (ANNs) can be trained to represent a function characterising the probability that an instance of a discrete event, such as the presence of a mineral deposit or the sighting of an endangered animal species, will occur over some grid element of the spatial area under consideration. A case study describes the application of the technique to the task of mineral prospectivity mapping in the Castlemaine region of Victoria using a range of geological, geophysical and geochemical input variables. Comparison of the maps produced using neural networks with maps produced using a density estimation-based technique demonstrates that the maps can reliably be interpreted as representing probabilities. However, while the neural network model and the density estimation-based model yield similar results under an appropriate choice of values for the respective parameters, the neural network approach has several advantages, especially in high dimensional input spaces.

Relevance:

80.00%

Publisher:

Abstract:

Mineral potential mapping is the process of combining a set of input maps, each representing a distinct geo-scientific variable, to produce a single map which ranks areas according to their potential to host deposits of a particular type. The maps are combined using a mapping function which must be either provided by an expert (knowledge-driven approach) or induced from sample data (data-driven approach). Current data-driven approaches using multilayer perceptrons (MLPs) to represent the mapping function have several inherent problems: they rely heavily on subjective judgment in selecting training data and are highly sensitive to this selection; they do not utilize the contextual information provided by unlabeled data; and there is no objective interpretation of the values output by the MLP. This paper presents a novel approach which overcomes these three problems.

Relevance:

80.00%

Publisher:

Abstract:

This paper describes the design and evaluation of an efficient peer-to-peer (P2P) web cache indexing and lookup system, which can be used to integrate the resources of locally available web pages into a globally addressable index using a distributed hash table. The salient feature of the indexing system's design is the efficient dissemination of cache index information using a next-url index, which allows cache clients to determine ahead of time whether linked content is also available at a remote cache. In addition, conventional optimizations such as in-browser caching and batching of index write requests are also used. These optimizations are evaluated using trace-driven simulation, and the results show that these design trade-offs improve cache lookup performance.

Relevance:

80.00%

Publisher:

Abstract:

The advent of commodity-based high-performance clusters has raised parallel and distributed computing to a new level. However, in order to achieve the best possible performance improvements for large-scale computing problems as well as good resource utilization, efficient resource management and scheduling is required. This paper proposes a new two-level adaptive space-sharing scheduling policy for non-dedicated heterogeneous commodity-based high-performance clusters. Using trace-driven simulation, the performance of the proposed scheduling policy is compared with existing adaptive space-sharing policies. Results of the simulation show that the proposed policy performs substantially better than the existing policies.