893 results for "data driven approach"


Relevance: 90.00%

Publisher:

Abstract:

Enterprises are increasingly using a wide range of heterogeneous information systems for executing and governing their business activities. Even though the adoption of service orientation has improved loose coupling and reusability, applications are still isolated data silos whose integration requires complex transformations and mediations. However, by leveraging Linked Data principles those data silos can now be seamlessly integrated, and this opens the door to new data-driven approaches for Enterprise Application Integration (EAI). In this paper we present LDP4j, an open source Java-based framework for the development of interoperable read-write Linked Data applications, based on the W3C Linked Data Platform (LDP) specification.
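
As an illustration of what "read-write Linked Data" means in practice, the sketch below exercises the interaction pattern prescribed by the W3C LDP specification (POST a Turtle representation to a container, then GET the created resource back) using Python's requests library against a hypothetical container URL. It illustrates the HTTP protocol an LDP4j-hosted application would expose, not LDP4j's own Java API.

```python
# Minimal sketch of the W3C LDP read-write pattern against a hypothetical
# container URL. This shows the HTTP protocol, not LDP4j's Java API.
import requests

CONTAINER = "http://localhost:8080/ldp4j/api/orders/"  # hypothetical LDP container

new_resource = """
@prefix ex: <http://example.org/vocab#> .
<> a ex:PurchaseOrder ;
   ex:customer "ACME" ;
   ex:total 120.50 .
"""

# Create a new member resource inside the container.
resp = requests.post(
    CONTAINER,
    data=new_resource.encode("utf-8"),
    headers={"Content-Type": "text/turtle", "Slug": "order-1"},
)
resp.raise_for_status()
created = resp.headers["Location"]          # URI minted by the server for the new resource

# Read it back; LDP servers return an ETag that is used for conditional updates.
resp = requests.get(created, headers={"Accept": "text/turtle"})
print(resp.headers.get("ETag"), resp.text)
```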

Relevance: 90.00%

Publisher:

Abstract:

Model-driven Engineering (MDE) approaches are often acknowledged to improve the maintainability of the resulting applications. However, there is a scarcity of empirical evidence backing their claimed benefits and limitations with respect to code-centric approaches. The purpose of this paper is to compare the performance and satisfaction of junior software maintainers while executing maintainability tasks on Web applications with two different development approaches, one being OOH4RIA, a model-driven approach, and the other being a code-centric approach based on Visual Studio .NET and the Agile Unified Process. We conducted a quasi-experiment with 27 recent graduates of the University of Alicante. They were randomly divided into two groups, and each group was assigned to a different Web application on which they performed a set of maintainability tasks. The results show that maintaining Web applications with OOH4RIA clearly improves the performance of subjects. It also tips the satisfaction balance in favor of OOH4RIA, although not significantly. Model-driven development methods seem to improve both the developers' objective performance and their subjective opinions on ease of use of the method. This notwithstanding, further experimentation is needed before the results can be generalized to different populations, methods, languages and tools, domains and application sizes.
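
The abstract does not state which statistical tests were used; as a hedged illustration of how such a between-group comparison is commonly analysed, the sketch below applies a non-parametric test to invented task-completion times for the two groups.

```python
# Hypothetical analysis sketch for a two-group quasi-experiment:
# compare maintenance-task completion times (minutes) between a
# model-driven group and a code-centric group. The data are invented.
from scipy.stats import mannwhitneyu

model_driven = [31, 28, 35, 26, 30, 29, 33, 27, 32, 28, 30, 25, 29]
code_centric = [42, 39, 47, 36, 41, 44, 38, 45, 40, 43, 37, 46, 39, 41]

# Non-parametric test: are model-driven times systematically lower?
stat, p_value = mannwhitneyu(model_driven, code_centric, alternative="less")
print(f"U = {stat:.1f}, one-sided p = {p_value:.4f}")
```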

Relevance: 90.00%

Publisher:

Abstract:

In this study, a methodology based on a dynamical framework is proposed to incorporate additional sources of information into normalized difference vegetation index (NDVI) time series of agricultural observations for a phenological state estimation application. The proposed implementation is based on the particle filter (PF) scheme, which is able to integrate multiple sources of data. Moreover, the dynamics-led design can conduct real-time (online) estimation, i.e., without waiting until the end of the campaign. The algorithm is evaluated by estimating the phenological states over a set of rice fields in Seville (SW Spain). A Landsat-5/7 NDVI series of images is complemented with two distinct sources of information: SAR images from the TerraSAR-X satellite and air temperature measurements from a ground-based station. An improvement in the overall estimation accuracy is obtained, especially when the NDVI time series is incomplete. Sensitivity to different development intervals and the mitigation of discontinuities in the time series are also addressed in this work, demonstrating the benefits of this dynamics-based data fusion approach.
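
The sketch below shows a minimal bootstrap particle filter of the kind the abstract describes, with toy growth dynamics and a toy Gaussian NDVI observation model standing in for the paper's actual state, transition and likelihood definitions.

```python
# Bootstrap particle filter sketch for tracking a (toy) phenological state
# from an NDVI-like observation stream. Dynamics and likelihood are
# illustrative placeholders, not the models used in the paper.
import numpy as np

rng = np.random.default_rng(0)
N = 500                                           # number of particles

def propagate(particles):
    """Toy growth dynamics: the state advances with small random increments."""
    return particles + rng.normal(0.05, 0.02, size=particles.shape)

def likelihood(ndvi_obs, particles):
    """Toy observation model: expected NDVI saturates with the state."""
    expected_ndvi = 0.2 + 0.6 * np.tanh(particles)
    return np.exp(-0.5 * ((ndvi_obs - expected_ndvi) / 0.05) ** 2)

def resample(particles, weights):
    idx = rng.choice(len(particles), size=len(particles), p=weights)
    return particles[idx]

particles = rng.uniform(0.0, 0.2, size=N)         # initial state cloud
for ndvi_obs in [0.25, 0.31, 0.40, 0.52, 0.63]:   # placeholder NDVI series
    particles = propagate(particles)              # predict
    w = likelihood(ndvi_obs, particles)           # weight (SAR / temperature could be fused here)
    w /= w.sum()
    particles = resample(particles, w)            # resample
    print(f"obs={ndvi_obs:.2f}  estimated state={particles.mean():.3f}")
```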

Relevance: 90.00%

Publisher:

Abstract:

Thesis (Ph.D.)--University of Washington, 2016-06

Relevance: 90.00%

Publisher:

Abstract:

Thesis (Ph.D.)--University of Washington, 2016-06

Relevance: 90.00%

Publisher:

Abstract:

Electricity market price forecasting is a challenging yet very important task for electricity market managers and participants. Due to the complexity and uncertainties in the power grid, electricity prices are highly volatile and normally carry spikes, which may be tens or even hundreds of times higher than the normal price. Such price spikes are very difficult to predict. So far, most research on electricity price forecasting has been based on normal-range electricity prices. This paper proposes a data mining based electricity price forecast framework, which can predict the normal price as well as the price spikes. The normal price can be predicted by a previously proposed wavelet and neural network based forecast model, while the spikes are forecasted with a data mining approach. This paper focuses on spike prediction and explores the reasons for price spikes based on the measurement of a proposed composite supply-demand balance index (SDI) and relative demand index (RDI). These indices are able to reflect the relationship among electricity demand, electricity supply and electricity reserve capacity. The proposed model is based on a mining database including market clearing price, trading hour, electricity demand, electricity supply and reserve. Bayesian classification and similarity searching techniques are used to mine the database and find the internal relationships between electricity price spikes and these proposed indices. The mining results are used to form the price spike forecast model, which is able to generate the forecasted price spike, the level of the spike and an associated forecast confidence level. The model is tested with Queensland electricity market data with promising results. Crown Copyright (C) 2004 Published by Elsevier B.V. All rights reserved.
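
As a hedged sketch of the spike-classification step, the code below computes simple supply-demand features in the spirit of the proposed SDI and RDI (their exact definitions are not given in the abstract, so the formulas are placeholders) and trains a naive Bayes classifier to flag spike hours on synthetic data.

```python
# Sketch of spike classification from supply/demand features.
# The SDI/RDI formulas below are illustrative placeholders; the paper's
# exact index definitions are not given in the abstract.
import numpy as np
from sklearn.naive_bayes import GaussianNB

rng = np.random.default_rng(1)
n = 1000
demand  = rng.normal(6000, 800, n)            # MW, synthetic
supply  = demand + rng.normal(700, 400, n)    # available generation, synthetic
reserve = np.clip(supply - demand, 0, None)

sdi = (supply - demand) / supply              # placeholder supply-demand balance index
rdi = demand / demand.max()                   # placeholder relative demand index
X = np.column_stack([sdi, rdi, reserve])

# Synthetic labels: spikes become more likely when the supply margin is tight.
spike = (rng.random(n) < 1 / (1 + np.exp(40 * (sdi - 0.05)))).astype(int)

clf = GaussianNB().fit(X, spike)
print("spike probability for a tight hour:",
      clf.predict_proba([[0.02, 0.95, 150.0]])[0, 1].round(3))
```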

Relevance: 90.00%

Publisher:

Abstract:

In this thesis work we develop a new generative model of social networks belonging to the family of Time Varying Networks. Correctly modelling the mechanisms that shape the growth of a network and the dynamics of edge activation and inactivation is of central importance in network science. Indeed, by means of generative models that mimic the real-world dynamics of contacts in social networks it is possible to forecast the outcome of an epidemic process, optimize an immunization campaign or optimally spread information among individuals. This task can now be tackled thanks to the recent availability of large-scale, high-quality and time-resolved datasets. This wealth of digital data has allowed us to deepen our understanding of the structure and properties of many real-world networks. Moreover, the empirical evidence of a temporal dimension in networks prompted a switch of paradigm from a static representation of graphs to a time-varying one. In this work we exploit the Activity-Driven paradigm (a modeling tool belonging to the family of Time-Varying Networks) to develop a general dynamical model that encodes two fundamental mechanisms shaping the social network's topology and its temporal structure: social capital allocation and burstiness. The former accounts for the fact that individuals do not invest their time and social interactions at random but rather allocate them toward nodes of the network they already know. The latter accounts for the heavy-tailed distributions of inter-event times in social networks. We empirically measure the properties of these two mechanisms in seven real-world datasets, develop a data-driven model and solve it analytically. We then check the results against numerical simulations and test our predictions with real-world datasets, finding good agreement between the two. Moreover, we find and characterize a non-trivial interplay between burstiness and social capital allocation in the parameter phase space. Finally, we present a novel approach to the development of a complete generative model of Time-Varying Networks. This model is inspired by Kauffman's adjacent-possible theory and is based on a generalized version of Pólya's urn. Remarkably, most of the complex and heterogeneous features of real-world social networks are naturally reproduced by this dynamical model, together with many higher-order topological properties (clustering coefficient, community structure, etc.).
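
A hedged sketch of the underlying modelling ingredients follows: an activity-driven temporal network in which active nodes either reinforce ties they already have (a simple stand-in for social capital allocation) or contact new nodes. Parameter values and the memory rule are illustrative, not those measured in the thesis.

```python
# Sketch of an activity-driven temporal network with a simple memory rule:
# with probability p_repeat an active node contacts someone it already knows,
# otherwise a random new node. Parameters are illustrative.
import numpy as np

rng = np.random.default_rng(2)
N, m, steps, p_repeat = 200, 2, 50, 0.8
activity = rng.pareto(2.5, N) * 0.05 + 0.01      # heterogeneous node activities
known = [set() for _ in range(N)]                # ego networks built so far
snapshots = []

for t in range(steps):
    edges = []
    active = np.where(rng.random(N) < activity)[0]
    for i in active:
        for _ in range(m):
            if known[i] and rng.random() < p_repeat:
                j = rng.choice(list(known[i]))   # reinforce an existing tie
            else:
                j = rng.integers(N)              # explore a new contact
                while j == i:
                    j = rng.integers(N)
            known[i].add(int(j)); known[int(j)].add(int(i))
            edges.append((int(i), int(j)))
    snapshots.append(edges)

print("mean instantaneous degree:",
      round(2 * np.mean([len(e) for e in snapshots]) / N, 3))
```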

Relevance: 90.00%

Publisher:

Abstract:

We describe a method of recognizing handwritten digits by fitting generative models that are built from deformable B-splines with Gaussian "ink generators" spaced along the length of the spline. The splines are adjusted using a novel elastic matching procedure based on the Expectation-Maximization (EM) algorithm that maximizes the likelihood of the model generating the data. This approach has many advantages. (1) After identifying the model most likely to have generated the data, the system not only produces a classification of the digit but also a rich description of the instantiation parameters, which can yield information such as the writing style. (2) During the process of explaining the image, generative models can perform recognition-driven segmentation. (3) The method involves a relatively small number of parameters, and hence training is relatively easy and fast. (4) Unlike many other recognition schemes it does not rely on some form of pre-normalization of input images, but can handle arbitrary scalings, translations and a limited degree of image rotation. We have demonstrated that our method of fitting models to images does not get trapped in poor local minima. The main disadvantage of the method is that it requires much more computation than more standard OCR techniques.
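
As a simplified stand-in for the elastic-matching idea, the sketch below fits a handful of Gaussian "ink generators" to the inked pixel coordinates of a synthetic stroke with EM; the actual method additionally constrains the generator means to lie along a deformable B-spline, which is omitted here.

```python
# Simplified stand-in for the paper's elastic matching: fit Gaussian
# "ink generators" to inked pixel coordinates with EM. The real method
# constrains the means to lie on a deformable B-spline; that constraint
# is omitted in this sketch.
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(3)
# Synthetic "ink": noisy points along a vertical stroke shaped like a '1'.
t = rng.random(300)
ink = np.column_stack([14 + rng.normal(0, 0.6, 300),
                       4 + 20 * t + rng.normal(0, 0.6, 300)])

# Eight spherical generators spaced along the stroke, fitted by EM.
gmm = GaussianMixture(n_components=8, covariance_type="spherical",
                      max_iter=200, random_state=0).fit(ink)

# The fitted means play the role of instantiation parameters describing the stroke.
print(np.round(gmm.means_[np.argsort(gmm.means_[:, 1])], 1))
```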

Relevance: 90.00%

Publisher:

Abstract:

We analyse how the Generative Topographic Mapping (GTM) can be modified to cope with missing values in the training data. Our approach is based on an Expectation-Maximisation (EM) method which estimates the parameters of the mixture components and at the same time deals with the missing values. We incorporate this algorithm into a hierarchical GTM. We verify the method on a toy data set (using a single GTM) and a realistic data set (using a hierarchical GTM). The results show that our algorithm can help to construct informative visualisation plots, even when some of the training points are corrupted with missing values.
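
A hedged sketch of the core trick follows, shown for a plain diagonal-covariance Gaussian mixture rather than GTM's constrained mixture: responsibilities are computed from the observed dimensions only, and missing entries are replaced by their expected values under each component inside the M-step.

```python
# EM for a diagonal-covariance Gaussian mixture with missing values (NaNs),
# as a stand-in for the GTM case: responsibilities use only observed
# dimensions, and missing entries are filled with their expectation under
# each component in the M-step.
import numpy as np

def em_missing(X, K=2, n_iter=50, seed=0):
    rng = np.random.default_rng(seed)
    N, D = X.shape
    obs = ~np.isnan(X)
    mu = np.nanmean(X, axis=0) + rng.normal(0, 0.5, (K, D))
    var = np.ones((K, D))
    mix = np.full(K, 1.0 / K)
    for _ in range(n_iter):
        # E-step: responsibilities from the observed dimensions only.
        logr = np.zeros((N, K))
        for k in range(K):
            ll = -0.5 * (np.log(2 * np.pi * var[k]) + (X - mu[k]) ** 2 / var[k])
            logr[:, k] = np.log(mix[k]) + np.where(obs, ll, 0.0).sum(axis=1)
        logr -= logr.max(axis=1, keepdims=True)
        r = np.exp(logr)
        r /= r.sum(axis=1, keepdims=True)
        # M-step: missing entries replaced by their expectation under component k.
        for k in range(K):
            Xhat = np.where(obs, X, mu[k])        # E[x_nd | component k]
            extra = np.where(obs, 0.0, var[k])    # extra variance carried by missing entries
            Nk = r[:, k].sum()
            mu[k] = (r[:, [k]] * Xhat).sum(axis=0) / Nk
            var[k] = (r[:, [k]] * ((Xhat - mu[k]) ** 2 + extra)).sum(axis=0) / Nk + 1e-6
            mix[k] = Nk / N
    return mix, mu, var

# Toy check: two Gaussian clusters with roughly 20% of the entries removed.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 1, (100, 3)), rng.normal(4, 1, (100, 3))])
X[rng.random(X.shape) < 0.2] = np.nan
print("estimated component means:\n", np.round(em_missing(X)[1], 2))
```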

Relevance: 90.00%

Publisher:

Abstract:

Data visualization algorithms and feature selection techniques are both widely used in bioinformatics but as distinct analytical approaches. Until now there has been no method of measuring feature saliency while training a data visualization model. We derive a generative topographic mapping (GTM) based data visualization approach which estimates feature saliency simultaneously with the training of the visualization model. The approach not only provides a better projection by modeling irrelevant features with a separate noise model but also gives feature saliency values which help the user to assess the significance of each feature. We compare the quality of projection obtained using the new approach with the projections from traditional GTM and self-organizing maps (SOM) algorithms. The results obtained on a synthetic and a real-life chemoinformatics dataset demonstrate that the proposed approach successfully identifies feature significance and provides coherent (compact) projections. © 2006 IEEE.

Relevance: 90.00%

Publisher:

Abstract:

This thesis introduces a flexible visual data exploration framework which combines advanced projection algorithms from the machine learning domain with visual representation techniques developed in the information visualisation domain to help a user to explore and understand large multi-dimensional datasets effectively. The advantage of such a framework over other techniques currently available to domain experts is that the user is directly involved in the data mining process and advanced machine learning algorithms are employed for better projection. A hierarchical visualisation model guided by a domain expert allows them to obtain an informed segmentation of the input space. Two other components of this thesis exploit properties of these principled probabilistic projection algorithms to develop a guided mixture of local experts algorithm, which provides robust prediction, and a model to estimate feature saliency simultaneously with the training of a projection algorithm. Local models are useful since a single global model cannot capture the full variability of a heterogeneous data space such as the chemical space. Probabilistic hierarchical visualisation techniques provide an effective soft segmentation of an input space by a visualisation hierarchy whose leaf nodes represent different regions of the input space. We use this soft segmentation to develop a guided mixture of local experts (GME) algorithm which is appropriate for the heterogeneous datasets found in chemoinformatics problems. Moreover, in this approach the domain experts are more involved in the model development process, which suits an intuition- and domain-knowledge-driven task such as drug discovery. We also derive a generative topographic mapping (GTM) based data visualisation approach which estimates feature saliency simultaneously with the training of a visualisation model.
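
The sketch below illustrates the general mixture-of-local-experts idea on toy data: a Gaussian mixture provides a soft segmentation of the input space, one linear expert is fitted per component using the responsibilities as sample weights, and predictions are the responsibility-weighted blend. It is a generic illustration, not the thesis's GME algorithm, whose gating comes from a visualisation hierarchy.

```python
# Generic mixture-of-local-experts sketch: a Gaussian mixture soft-segments
# the input space, a linear expert is fitted per component with the
# responsibilities as sample weights, and predictions are the
# responsibility-weighted blend of the experts.
import numpy as np
from sklearn.mixture import GaussianMixture
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(4)
# Heterogeneous toy data: two regimes with different local behaviour.
X = np.sort(rng.uniform(-3, 3, 400)).reshape(-1, 1)
y = np.where(X[:, 0] < 0, 2 * X[:, 0] + 1, -X[:, 0] + 4) + rng.normal(0, 0.2, 400)

gate = GaussianMixture(n_components=2, random_state=0).fit(X)
R = gate.predict_proba(X)                       # responsibilities (soft segmentation)

experts = [LinearRegression().fit(X, y, sample_weight=R[:, k]) for k in range(2)]

def predict(X_new):
    R_new = gate.predict_proba(X_new)
    preds = np.column_stack([e.predict(X_new) for e in experts])
    return (R_new * preds).sum(axis=1)          # responsibility-weighted blend

print(np.round(predict(np.array([[-2.0], [2.0]])), 2))
```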

Relevance: 90.00%

Publisher:

Abstract:

The thesis presents an account of an attempt to utilize expert systems within the domain of production planning and control. The use of expert systems was proposed due to the problematical nature of a particular function within British Steel Strip Products' Operations Department: the function of Order Allocation, allocating customer orders to a production week and site. Approaches to tackling problems within production planning and control are reviewed, as are the general capabilities of expert systems. The conclusions drawn are that the domain of production planning and control contains both 'soft' and 'hard' problems, and that while expert systems appear to be a useful technology for this domain, this usefulness has by no means yet been demonstrated. Also, it is argued that the mainstream methodology for developing expert systems is unsuited to the domain. A problem-driven approach is developed and used to tackle the Order Allocation function. The resulting system, UAAMS, contained two expert components. One of these, the scheduling procedure, was not fully implemented due to inadequate software. The second expert component, the product routing procedure, was untroubled by such difficulties, though it was unusable on its own; thus a second system was developed. This system, MICRO-X10, duplicated the function of X10, a complex database query routine used daily by Order Allocation. A prototype version of MICRO-X10 proved too slow to be useful but allowed implementation and maintenance issues to be analysed. In conclusion, the usefulness of the problem-driven approach to expert systems development within production planning and control is demonstrated, but restrictions imposed by current expert system software are highlighted: the limited ability of such software to cope with 'hard' scheduling constructs, together with its slow processing speed, restricts the current usefulness of expert systems within production planning and control.

Relevance: 90.00%

Publisher:

Abstract:

This thesis addresses the problem of information hiding in low dimensional digital data, focussing on issues of privacy and security in Electronic Patient Health Records (EPHRs). The thesis proposes a new security protocol based on data hiding techniques for EPHRs, and contends that embedding sensitive patient information inside the EPHR is the most appropriate solution currently available to resolve the issues of security in EPHRs. Watermarking techniques are applied to one-dimensional time series data such as the electroencephalogram (EEG) to show that they add a level of confidence (in terms of privacy and security) to an individual's diverse bio-profile (the digital fingerprint of an individual's medical history), ensuring that the data being analysed does indeed belong to the correct person and that it is not being accessed by unauthorised personnel. Embedding information inside single-channel biomedical time series data is more difficult than the standard application to images because of the reduced redundancy. A data hiding approach with an in-built capability to protect against illegal data snooping is developed. The capability of this secure method is enhanced by embedding not just a single message but multiple messages into an example one-dimensional EEG signal. Embedding multiple messages of similar characteristics, for example the identities of clinicians accessing the medical record, helps create an access log, while embedding multiple messages of dissimilar characteristics into an EPHR enhances confidence in the use of the EPHR. The novel method of embedding multiple messages of both similar and dissimilar characteristics into a single-channel EEG demonstrated in this thesis shows how this embedding of data supports the secure implementation and use of the EPHR.
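
As a generic, hedged illustration of hiding a message inside a one-dimensional signal (not the thesis's actual embedding scheme), the sketch below writes the bits of a short access-log message into the least-significant bit of quantised EEG-like samples and recovers them.

```python
# Generic illustration of hiding a short message in a 1-D signal by
# quantising samples and replacing the least-significant bit.
# This is NOT the thesis's embedding scheme; it only shows the idea of
# carrying hidden bits inside biomedical time-series data.
import numpy as np

rng = np.random.default_rng(5)
eeg = rng.normal(0, 50, 4096)                    # synthetic EEG-like samples (microvolts)

def embed(signal, message, scale=0.1):
    bits = np.unpackbits(np.frombuffer(message.encode(), dtype=np.uint8))
    q = np.round(signal[: bits.size] / scale).astype(np.int64)
    q = (q & ~1) | bits                          # write each message bit into the LSB
    marked = signal.copy()
    marked[: bits.size] = q * scale
    return marked, bits.size

def extract(signal, n_bits, scale=0.1):
    q = np.round(signal[:n_bits] / scale).astype(np.int64)
    return np.packbits((q & 1).astype(np.uint8)).tobytes().decode()

marked, n = embed(eeg, "Clinician A accessed this record")
print(extract(marked, n))
```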

Relevance: 90.00%

Publisher:

Abstract:

Many tests of financial contagion require a definition of the dates separating calm from crisis periods. We propose to use a battery of break search procedures for individual time series to objectively identify potential break dates in relationships between countries. Applied to the biggest European stock markets and combined with two well established tests for financial contagion, this approach results in break dates which correctly identify the timing of changes in cross-country transmission mechanisms. Application of break search procedures breathes new life into the established contagion tests, allowing for an objective, data-driven timing of crisis periods.
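
A hedged sketch of such a data-driven break search follows: a rolling correlation between two synthetic national return series is scanned by a generic change-point procedure (the ruptures package is used as one possible implementation); the data and settings are placeholders.

```python
# Data-driven break-date detection sketch: rolling correlation between two
# synthetic national return series, with break dates proposed by a generic
# change-point search. The ruptures package is one possible implementation;
# the data and tuning are placeholders.
import numpy as np
import ruptures as rpt

rng = np.random.default_rng(6)
T = 600
market_a = rng.normal(0, 1, T)
# Cross-market transmission strengthens after day 400 (the "crisis").
market_b = np.where(np.arange(T) < 400, 0.2, 0.8) * market_a + rng.normal(0, 1, T)

window = 60
roll_corr = np.array([np.corrcoef(market_a[t - window:t], market_b[t - window:t])[0, 1]
                      for t in range(window, T)])

# Let a binary-segmentation search propose the break dates objectively.
algo = rpt.Binseg(model="l2").fit(roll_corr.reshape(-1, 1))
print("candidate break indices:", algo.predict(n_bkps=1))
```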

Relevance: 90.00%

Publisher:

Abstract:

This paper demonstrates that the conventional approach of using official liberalisation dates as the only existing breakdates can lead to inaccurate conclusions as to the effect of the underlying liberalisation policies. It also proposes an alternative paradigm for obtaining more robust estimates of volatility changes around official liberalisation dates and/or other important market events. By focusing on five East Asian emerging markets, all of which liberalised their financial markets in the late, and by using recent advances in the econometrics of structural change, it shows that (i) the detected breakdates in the volatility of stock market returns can be dramatically different from official liberalisation dates and (ii) the use of official liberalisation dates as breakdates can readily entail inaccurate inference. In contrast, the use of data-driven techniques for the detection of multiple structural changes reveals a richer and inevitably more accurate pattern of volatility evolution than focusing on official liberalisation dates alone.
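
In the same spirit, the sketch below detects multiple volatility breaks directly from a synthetic return series, using squared returns as a crude volatility proxy and a penalised multiple change-point search (PELT via the ruptures package, as one possible choice rather than the paper's estimator).

```python
# Sketch of multiple structural breaks in return volatility: squared returns
# serve as a volatility proxy and a penalised multiple change-point search
# proposes the break dates. The return series and penalty are placeholders.
import numpy as np
import ruptures as rpt

rng = np.random.default_rng(7)
# Synthetic daily returns whose volatility shifts twice.
returns = np.concatenate([rng.normal(0, 0.8, 300),
                          rng.normal(0, 2.0, 200),
                          rng.normal(0, 1.2, 300)])

proxy = (returns ** 2).reshape(-1, 1)            # simple volatility proxy
breaks = rpt.Pelt(model="l2", min_size=60).fit(proxy).predict(pen=20)
print("detected volatility break dates (observation indices):", breaks[:-1])
```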