21 resultados para File processing (Computer science)

em Aston University Research Archive


Relevância:

100.00% 100.00%

Publicador:

Resumo:

Complex Event processing (CEP) has emerged over the last ten years. CEP systems are outstanding in processing large amount of data and responding in a timely fashion. While CEP applications are fast growing, performance management in this area has not gain much attention. It is critical to meet the promised level of service for both system designers and users. In this paper, we present a benchmark for complex event processing systems: CEPBen. The CEPBen benchmark is designed to evaluate CEP functional behaviours, i.e., filtering, transformation and event pattern detection and provides a novel methodology of evaluating the performance of CEP systems. A performance study by running the CEPBen on Esper CEP engine is described and discussed. The results obtained from performance tests demonstrate the influences of CEP functional behaviours on the system performance. © 2014 Springer International Publishing Switzerland.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

A major problem in modern probabilistic modeling is the huge computational complexity involved in typical calculations with multivariate probability distributions when the number of random variables is large. Because exact computations are infeasible in such cases and Monte Carlo sampling techniques may reach their limits, there is a need for methods that allow for efficient approximate computations. One of the simplest approximations is based on the mean field method, which has a long history in statistical physics. The method is widely used, particularly in the growing field of graphical models. Researchers from disciplines such as statistical physics, computer science, and mathematical statistics are studying ways to improve this and related methods and are exploring novel application areas. Leading approaches include the variational approach, which goes beyond factorizable distributions to achieve systematic improvements; the TAP (Thouless-Anderson-Palmer) approach, which incorporates correlations by including effective reaction terms in the mean field theory; and the more general methods of graphical models. Bringing together ideas and techniques from these diverse disciplines, this book covers the theoretical foundations of advanced mean field methods, explores the relation between the different approaches, examines the quality of the approximation obtained, and demonstrates their application to various areas of probabilistic modeling.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Recently, there has been a considerable research activity in extending topographic maps of vectorial data to more general data structures, such as sequences or trees. However, the representational capabilities and internal representations of the models are not well understood. We rigorously analyze a generalization of the Self-Organizing Map (SOM) for processing sequential data, Recursive SOM (RecSOM [1]), as a non-autonomous dynamical system consisting off a set of fixed input maps. We show that contractive fixed input maps are likely to produce Markovian organizations of receptive fields o the RecSOM map. We derive bounds on parameter $\beta$ (weighting the importance of importing past information when processing sequences) under which contractiveness of the fixed input maps is guaranteed.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

We present the prototype tool CADS* for the computer-aided development of an important class of self-* systems, namely systems whose components can be modelled as Markov chains. Given a Markov chain representation of the IT components to be included into a self-* system, CADS* automates or aids (a) the development of the artifacts necessary to build the self-* system; and (b) their integration into a fully-operational self-* solution. This is achieved through a combination of formal software development techniques including model transformation, model-driven code generation and dynamic software reconfiguration.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Retrospective clinical data presents many challenges for data mining and machine learning. The transcription of patient records from paper charts and subsequent manipulation of data often results in high volumes of noise as well as a loss of other important information. In addition, such datasets often fail to represent expert medical knowledge and reasoning in any explicit manner. In this research we describe applying data mining methods to retrospective clinical data to build a prediction model for asthma exacerbation severity for pediatric patients in the emergency department. Difficulties in building such a model forced us to investigate alternative strategies for analyzing and processing retrospective data. This paper describes this process together with an approach to mining retrospective clinical data by incorporating formalized external expert knowledge (secondary knowledge sources) into the classification task. This knowledge is used to partition the data into a number of coherent sets, where each set is explicitly described in terms of the secondary knowledge source. Instances from each set are then classified in a manner appropriate for the characteristics of the particular set. We present our methodology and outline a set of experiential results that demonstrate some advantages and some limitations of our approach. © 2008 Springer-Verlag Berlin Heidelberg.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

We introduce a type of 2-tier convolutional neural network model for learning distributed paragraph representations for a special task (e.g. paragraph or short document level sentiment analysis and text topic categorization). We decompose the paragraph semantics into 3 cascaded constitutes: word representation, sentence composition and document composition. Specifically, we learn distributed word representations by a continuous bag-of-words model from a large unstructured text corpus. Then, using these word representations as pre-trained vectors, distributed task specific sentence representations are learned from a sentence level corpus with task-specific labels by the first tier of our model. Using these sentence representations as distributed paragraph representation vectors, distributed paragraph representations are learned from a paragraph-level corpus by the second tier of our model. It is evaluated on DBpedia ontology classification dataset and Amazon review dataset. Empirical results show the effectiveness of our proposed learning model for generating distributed paragraph representations.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Text classification is essential for narrowing down the number of documents relevant to a particular topic for further pursual, especially when searching through large biomedical databases. Protein-protein interactions are an example of such a topic with databases being devoted specifically to them. This paper proposed a semi-supervised learning algorithm via local learning with class priors (LL-CP) for biomedical text classification where unlabeled data points are classified in a vector space based on their proximity to labeled nodes. The algorithm has been evaluated on a corpus of biomedical documents to identify abstracts containing information about protein-protein interactions with promising results. Experimental results show that LL-CP outperforms the traditional semisupervised learning algorithms such as SVMand it also performs better than local learning without incorporating class priors.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

This paper summarizes the scientific work presented at the 32nd European Conference on Information Retrieval. It demonstrates that information retrieval (IR) as a research area continues to thrive with progress being made in three complementary sub-fields, namely IR theory and formal methods together with indexing and query representation issues, furthermore Web IR as a primary application area and finally research into evaluation methods and metrics. It is the combination of these areas that gives IR its solid scientific foundations. The paper also illustrates that significant progress has been made in other areas of IR. The keynote speakers addressed three such subject fields, social search engines using personalization and recommendation technologies, the renewed interest in applying natural language processing to IR, and multimedia IR as another fast-growing area.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

This paper aims to identify the communication goal(s) of a user's information-seeking query out of a finite set of within-domain goals in natural language queries. It proposes using Tree-Augmented Naive Bayes networks (TANs) for goal detection. The problem is formulated as N binary decisions, and each is performed by a TAN. Comparative study has been carried out to compare the performance with Naive Bayes, fully-connected TANs, and multi-layer neural networks. Experimental results show that TANs consistently give better results when tested on the ATIS and DARPA Communicator corpora.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

The Multiple Pheromone Ant Clustering Algorithm (MPACA) models the collective behaviour of ants to find clusters in data and to assign objects to the most appropriate class. It is an ant colony optimisation approach that uses pheromones to mark paths linking objects that are similar and potentially members of the same cluster or class. Its novelty is in the way it uses separate pheromones for each descriptive attribute of the object rather than a single pheromone representing the whole object. Ants that encounter other ants frequently enough can combine the attribute values they are detecting, which enables the MPACA to learn influential variable interactions. This paper applies the model to real-world data from two domains. One is logistics, focusing on resource allocation rather than the more traditional vehicle-routing problem. The other is mental-health risk assessment. The task for the MPACA in each domain was to predict class membership where the classes for the logistics domain were the levels of demand on haulage company resources and the mental-health classes were levels of suicide risk. Results on these noisy real-world data were promising, demonstrating the ability of the MPACA to find patterns in the data with accuracy comparable to more traditional linear regression models. © 2013 Polish Information Processing Society.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Clinical Decision Support Systems (CDSSs) need to disseminate expertise in formats that suit different end users and with functionality tuned to the context of assessment. This paper reports research into a method for designing and implementing knowledge structures that facilitate the required flexibility. A psychological model of expertise is represented using a series of formally specified and linked XML trees that capture increasing elements of the model, starting with hierarchical structuring, incorporating reasoning with uncertainty, and ending with delivering the final CDSS. The method was applied to the Galatean Risk and Safety Tool, GRiST, which is a web-based clinical decision support system (www.egrist.org) for assessing mental-health risks. Results of its clinical implementation demonstrate that the method can produce a system that is able to deliver expertise targetted and formatted for specific patient groups, different clinical disciplines, and alternative assessment settings. The approach may be useful for developing other real-world systems using human expertise and is currently being applied to a logistics domain. © 2013 Polish Information Processing Society.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

The inference and optimization in sparse graphs with real variables is studied using methods of statistical mechanics. Efficient distributed algorithms for the resource allocation problem are devised. Numerical simulations show excellent performance and full agreement with the theoretical results. © Springer-Verlag Berlin Heidelberg 2006.