986 results for Pattern matching


Relevance: 60.00%

Abstract:

In this thesis we present and evaluate two pattern matching based methods for answer extraction in textual question answering systems. A textual question answering system is a system that seeks answers to natural language questions from unstructured text. Textual question answering is an important research problem because, as the amount of natural language text in digital format keeps growing, the need for novel methods for pinpointing important knowledge in vast textual databases becomes more and more urgent. We concentrate on developing methods for the automatic creation of answer extraction patterns. A new type of extraction pattern is also developed. The pattern matching based approach chosen is interesting because of its language and application independence. The answer extraction methods are developed in the framework of our own question answering system. Publicly available datasets in English are used as training and evaluation data for the methods. The techniques developed are based on the well-known methods of sequence alignment and hierarchical clustering. The similarity metric used is based on edit distance. The main conclusion of the research is that answer extraction patterns consisting of the most important words of the question and of the following information extracted from the answer context: plain words, part-of-speech tags, punctuation marks and capitalization patterns, can be used in the answer extraction module of a question answering system. This type of pattern and the two new methods for generating answer extraction patterns provide average results when compared to those produced by other systems using the same dataset. However, most answer extraction methods in the question answering systems tested with the same dataset are both hand crafted and based on a system-specific and fine-grained question classification. The new methods developed in this thesis require no manual creation of answer extraction patterns. 
As a source of knowledge, they require a dataset of sample questions and answers, as well as a set of text documents that contain answers to most of the questions. The question classification used in the training data is a standard one and provided already in the publicly available data.
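
The abstract states only that the similarity metric is based on edit distance; it does not give the exact formulation. As a minimal sketch, assuming plain Levenshtein distance normalised by the longer sequence length, two answer contexts could be compared as follows. The function names `edit_distance` and `similarity` and the `<ANSWER>` placeholder token are illustrative assumptions, not taken from the thesis.

```python
def edit_distance(a, b):
    """Levenshtein distance between two sequences (strings or token lists)."""
    # prev[j] holds the distance between the processed prefix of a and b[:j].
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(min(prev[j] + 1,          # delete x
                            curr[j - 1] + 1,      # insert y
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def similarity(a, b):
    """Hypothetical normalisation: 1.0 for identical sequences, 0.0 for maximally different."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - edit_distance(a, b) / longest


# Two token-level answer contexts that differ in one word.
ctx1 = ["was", "born", "in", "<ANSWER>"]
ctx2 = ["was", "born", "on", "<ANSWER>"]
print(edit_distance(ctx1, ctx2), similarity(ctx1, ctx2))
```

A pairwise similarity of this kind is the sort of input a hierarchical clustering step needs, which is one plausible reason for pairing the two techniques.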

Relevance: 60.00%

Abstract:

When augmented with the longest common prefix (LCP) array and some other structures, the suffix array can solve many string processing problems in optimal time and space. A compressed representation of the LCP array is also one of the main building blocks in many compressed suffix tree proposals. In this paper, we describe a new compressed LCP representation: the sampled LCP array. We show that when used with a compressed suffix array (CSA), the sampled LCP array often offers better time/space trade-offs than the existing alternatives. We also show how to construct the compressed representations of the LCP array directly from a CSA.
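
For readers unfamiliar with the structures involved, the following sketch builds a plain (uncompressed) suffix array and its LCP array. It is not the sampled LCP representation the paper proposes: the suffix array is built by naive sorting, and the LCP array uses the well-known linear-time algorithm of Kasai et al. The function names are illustrative.

```python
def suffix_array(text):
    """Naive suffix array: start positions of all suffixes in sorted order."""
    return sorted(range(len(text)), key=lambda i: text[i:])


def lcp_array(text, sa):
    """lcp[k] = length of the longest common prefix of suffixes sa[k-1] and sa[k]; lcp[0] = 0."""
    n = len(text)
    rank = [0] * n
    for k, start in enumerate(sa):
        rank[start] = k
    lcp = [0] * n
    h = 0  # current match length, reused between consecutive text positions
    for i in range(n):
        if rank[i] > 0:
            j = sa[rank[i] - 1]  # suffix preceding suffix i in sorted order
            while i + h < n and j + h < n and text[i + h] == text[j + h]:
                h += 1
            lcp[rank[i]] = h
            if h > 0:
                h -= 1  # the next suffix shares at least h - 1 characters
        else:
            h = 0
    return lcp


sa = suffix_array("banana")
print(sa)                       # [5, 3, 1, 0, 4, 2]
print(lcp_array("banana", sa))  # [0, 1, 3, 0, 0, 2]
```

The plain LCP array stores one integer per text position, which is the space cost that compressed representations aim to reduce.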

Relevance: 60.00%

Abstract:

This doctoral dissertation takes a buy side perspective to third-party logistics (3PL) providers’ service tiering by applying a linear serial dyadic view to transactions. It takes its point of departure not only from the unalterable focus on the dyad levels as units of analysis and how to manage them, but also the characteristics both creating and determining purposeful conditions for a longer duration. A conceptual framework is proposed and evaluated on its ability to capture logistics service buyers’ perceptions of service tiering. The problem discussed is in the theoretical context of logistics and reflects value appropriation, power dependencies, visibility in linear serial dyads, a movement towards the more market governed modes of transactions (i.e. service tiering) and buyers’ risk perception of broader utilisation of the logistics services market. Service tiering, in a supply chain setting, with the lack of multilateral agreements between supply chain members, is new. The deductive research approach applied, in which theoretically based propositions are empirically tested with quantitative and qualitative data, provides new insight into (contractual) transactions in 3PL. The study findings imply that the understanding of power dependencies and supply chain dynamics in a 3PL context is still in its infancy. The issues found include separation of service responsibilities, supply chain visibility, price-making behaviour and supply chain strategies under changing circumstances or influence of non-immediate supply chain actors. Understanding (or failing to understand) these issues may mean remarkable implications for the industry. Thus, the contingencies may trigger more open-book policies, larger liability scope of 3PL service providers or insourcing of critical logistics activities from the first-tier buyer core business and customer service perspectives. 
In addition, a sufficient understanding of the issues surrounding service tiering enables proactive responses to devise appropriate supply chain strategies. The author concludes that qualitative research designs, facilitating data collection on multiple supply chain actors, may capture and increase understanding of the impact of broader supply chain strategies. This would enable pattern-matching through an examination of two or more sides of exchange transactions to measure relational symmetries across linear serial dyads. Indeed, the performance of the firm depends not only on how efficiently it cooperates with its partners, but also on how well exchange partners cooperate with an organisation’s own business.

Relevance: 60.00%

Abstract:

Over the past few years, studies of cultured neuronal networks have opened up avenues for understanding the ion channels, receptor molecules, and synaptic plasticity that may form the basis of learning and memory. Hippocampal neurons from rats are dissociated and cultured on a surface containing a grid of 64 electrodes. The signals from these 64 electrodes are acquired using a fast data acquisition system, MED64 (Alpha MED Sciences, Japan), at a sampling rate of 20 K samples with a precision of 16 bits per sample. A few minutes of acquired data runs into a few hundred megabytes. The data processing for the neural analysis is highly compute-intensive because the volume of data is huge. The major processing requirements include noise removal, pattern recovery, pattern matching, and clustering. In order to interface a neuronal colony to the physical world, these computations need to be performed in real time. A single processor such as a desktop computer may not be adequate to meet these computational requirements. Parallel computing is a method used to satisfy the real-time computational requirements of a neuronal system that interacts with an external world while increasing the flexibility and scalability of the application. In this work, we developed a parallel neuronal system using a multi-node digital signal processing system. With 8 processors, the system is able to compute and map incoming signals segmented over a period of 200 ms into an action in a trained cluster system in real time.

Relevance: 60.00%

Abstract:

Fragment Finder 2.0 is a web-based interactive computing server which can be used to retrieve structurally similar protein fragments from 25 and 90% nonredundant data sets. The computing server identifies structurally similar fragments using the protein backbone C alpha angles. In addition, the identified fragments can be superimposed using either of the two structural superposition programs, STAMP and PROFIT, provided in the server. The freely available Java plug-in Jmol has been interfaced with the server for the visualization of the query and superposed fragments. The server is the updated version of a previously developed search engine and employs an in-house-developed fast pattern matching algorithm. This server can be accessed freely over the World Wide Web through the URL http://cluster.physics.iisc.ernet.in/ff/.

Relevance: 60.00%

Abstract:

Background: The function of a protein can be deciphered with higher accuracy from its structure than from its amino acid sequence. Due to the huge gap in the available protein sequence and structural space, tools that can generate functionally homogeneous clusters using only the sequence information, hold great importance. For this, traditional alignment-based tools work well in most cases and clustering is performed on the basis of sequence similarity. But, in the case of multi-domain proteins, the alignment quality might be poor due to varied lengths of the proteins, domain shuffling or circular permutations. Multi-domain proteins are ubiquitous in nature, hence alignment-free tools, which overcome the shortcomings of alignment-based protein comparison methods, are required. Further, existing tools classify proteins using only domain-level information and hence miss out on the information encoded in the tethered regions or accessory domains. Our method, on the other hand, takes into account the full-length sequence of a protein, consolidating the complete sequence information to understand a given protein better. Results: Our web-server, CLAP (Classification of Proteins), is one such alignment-free software for automatic classification of protein sequences. It utilizes a pattern-matching algorithm that assigns local matching scores (LMS) to residues that are a part of the matched patterns between two sequences being compared. CLAP works on full-length sequences and does not require prior domain definitions. Pilot studies undertaken previously on protein kinases and immunoglobulins have shown that CLAP yields clusters, which have high functional and domain architectural similarity. Moreover, parsing at a statistically determined cut-off resulted in clusters that corroborated with the sub-family level classification of that particular domain family. 
Conclusions: CLAP is a useful protein-clustering tool, independent of domain assignment, domain order, sequence length and domain diversity. Our method can be used for any set of protein sequences, yielding functionally relevant clusters with high domain architectural homogeneity. The CLAP web server is freely available for academic use at http://nslab.mbu.iisc.ernet.in/clap/.

Relevance: 60.00%

Abstract:

VODIS II, a research system in which recognition is based on the conventional one-pass connected-word algorithm extended in two ways, is described. Syntactic constraints can now be applied directly via context-free-grammar rules, and the algorithm generates a lattice of candidate word matches rather than a single globally optimal sequence. This lattice is then processed by a chart parser and an intelligent dialogue controller to obtain the most plausible interpretations of the input. A key feature of the VODIS II architecture is that the concept of an abstract word model allows the system to be used with different pattern-matching technologies and hardware. The current system implements the word models on a real-time dynamic-time-warping recognizer.
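
The dynamic-time-warping matching mentioned at the end of this abstract can be illustrated with a minimal sketch. It assumes scalar samples and an absolute-difference local cost, whereas a speech recognizer would compare frames of acoustic feature vectors. The function name `dtw_distance` is illustrative and not part of VODIS II.

```python
def dtw_distance(a, b):
    """Cost of the cheapest monotonic alignment between two numeric sequences."""
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j] = best cumulative cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            cost[i][j] = local + min(cost[i - 1][j],      # a advances, b repeats
                                     cost[i][j - 1],      # b advances, a repeats
                                     cost[i - 1][j - 1])  # both advance
    return cost[n][m]


# The same shape spoken "slower" aligns at zero cost.
print(dtw_distance([1, 2, 3], [1, 2, 2, 3]))
```

Because the alignment may stretch or compress either sequence, a stored word template can match an utterance of a different duration.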

Relevance: 60.00%

Abstract:

A parallel processing network derived from Kanerva's associative memory theory (Kanerva, 1984) is shown to be able to train rapidly on connected speech data and recognize further speech data with a label error rate of 0·68%. This modified Kanerva model can be trained substantially faster than other networks with comparable pattern discrimination properties. Kanerva presented his theory of a self-propagating search in 1984, and showed theoretically that large-scale versions of his model would have powerful pattern matching properties. This paper describes how the design for the modified Kanerva model is derived from Kanerva's original theory. Several designs are tested to discover which form may be implemented fastest while still maintaining versatile recognition performance. A method is developed to deal with the time-varying nature of the speech signal by recognizing static patterns together with a fixed quantity of contextual information. In order to recognize speech features in different contexts it is necessary for a network to be able to model disjoint pattern classes. This type of modelling cannot be performed by a single layer of links. Network research was once held back by the inability of single-layer networks to solve this sort of problem, and the lack of a training algorithm for multi-layer networks. Rumelhart, Hinton & Williams (1985) provided one solution by demonstrating the "back propagation" training algorithm for multi-layer networks. A second alternative is used in the modified Kanerva model. A non-linear fixed transformation maps the pattern space into a space of higher dimensionality in which the speech features are linearly separable. A single-layer network may then be used to perform the recognition. The advantage of this solution over the other using multi-layer networks lies in the greater power and speed of the single-layer network training algorithm. © 1989.
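
The idea of a fixed non-linear expansion followed by a single trainable layer can be shown on a toy problem. The sketch below is not the modified Kanerva model: it swaps in a hand-picked product feature as the fixed transformation and the classic perceptron rule as the single-layer trainer, and it uses XOR as a stand-in for disjoint pattern classes. All names are illustrative.

```python
def expand(x1, x2):
    """Fixed (untrained) non-linear map from 2 inputs to 4 features."""
    return [1.0, x1, x2, x1 * x2]


def train_perceptron(samples, epochs=1000):
    """Train one layer of weights with the perceptron rule on (features, label) pairs."""
    weights = [0.0] * len(samples[0][0])
    for _ in range(epochs):
        errors = 0
        for features, label in samples:
            activation = sum(w * f for w, f in zip(weights, features))
            predicted = 1 if activation > 0 else 0
            if predicted != label:
                sign = 1 if label == 1 else -1
                weights = [w + sign * f for w, f in zip(weights, features)]
                errors += 1
        if errors == 0:  # every sample classified correctly
            break
    return weights


def predict(weights, x1, x2):
    return 1 if sum(w * f for w, f in zip(weights, expand(x1, x2))) > 0 else 0


# XOR is not linearly separable in (x1, x2) but is separable after expansion.
XOR = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]
weights = train_perceptron([(expand(*x), y) for x, y in XOR])
print([predict(weights, a, b) for (a, b), _ in XOR])
```

Only the final weight vector is trained; the expansion itself never changes, which is what keeps training as simple as in a single-layer network.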

Relevance: 60.00%

Abstract:

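This paper analyzes the BM algorithm, currently the most popular algorithm on the network, and its improved variant BMH, and on that basis proposes BMH2, an improved version of the BMH algorithm. Taking the characteristics of the pattern string itself into account, a new shift array is added alongside the original shift-distance array, so that the features of the pattern string are fully exploited to make larger shifts and the algorithm achieves higher efficiency. Experiments show that the improved algorithm increases the right-shift distance of the "bad character" method and effectively improves the matching speed.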
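
As background for the improvement described above, here is a minimal sketch of the baseline Boyer-Moore-Horspool (BMH) search. It implements only the standard single shift table; the additional shift array of BMH2 is not specified in the abstract and is not reproduced here. The function name `horspool_search` is illustrative.

```python
def horspool_search(text, pattern):
    """Return the index of the first occurrence of pattern in text, or -1."""
    m, n = len(pattern), len(text)
    if m == 0:
        return 0
    # Shift table: distance from the rightmost occurrence of each character
    # (excluding the last pattern position) to the end of the pattern.
    shift = {ch: m - 1 - i for i, ch in enumerate(pattern[:-1])}
    pos = 0
    while pos <= n - m:
        if text[pos:pos + m] == pattern:
            return pos
        # Shift by the text character aligned with the last pattern position;
        # characters absent from the table allow a full shift of m.
        pos += shift.get(text[pos + m - 1], m)
    return -1


print(horspool_search("here is a simple example", "example"))  # 17
```

The shift depends only on one text character per alignment, so any refinement that safely enlarges that shift reduces the number of alignments tried.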

Relevance: 60.00%

Abstract:

A Function Definition Language (FDL) is presented. Though designed for describing specifications, FDL is also a general-purpose functional programming language. It uses context-free language as data type, supports pattern matching definition of functions, offers several function definition forms, and is executable. It is shown that FDL has strong expressiveness, is easy to use and describes algorithms concisely and naturally. An interpreter of FDL is introduced. Experiments and discussion are included.

Relevance: 60.00%

Abstract:

C.H. Orgill, N.W. Hardy, M.H. Lee, and K.A.I. Sharpe. An application of a multiple agent system for flexible assembly tasks. In Knowledge based environments for industrial applications including cooperating expert systems in control. IEE London, 1989.

Relevance: 60.00%

Abstract:

Murphy, L., Lewandowski, G., McCauley, R., Simon, B., Thomas, L., and Zander, C. 2008. Debugging: the good, the bad, and the quirky -- a qualitative analysis of novices' strategies. SIGCSE Bull. 40, 1 (Feb. 2008), 163-167

Relevance: 60.00%

Abstract:

Simulation of pedestrian evacuations of smart buildings in emergencies is a powerful tool for building analysis, dynamic evacuation planning and real-time response to the evolving state of evacuations. Macroscopic pedestrian models are low-complexity models that are well suited to algorithmic analysis and planning, but are quite abstract. Microscopic simulation models allow for a high level of simulation detail but can be computationally intensive. By combining micro- and macro-models we can use each to overcome the shortcomings of the other and enable new capabilities and applications for pedestrian evacuation simulation that would not be possible with either alone. We develop the EvacSim multi-agent pedestrian simulator and procedurally generate macroscopic flow graph models of building space, integrating micro- and macroscopic approaches to simulation of the same emergency space. By “coupling” flow graph parameters to microscopic simulation results, the graph model captures some of the higher detail and fidelity of the complex microscopic simulation model. The coupled flow graph is used for analysis and prediction of the movement of pedestrians in the microscopic simulation, and we investigate the performance of dynamic evacuation planning in simulated emergencies using a variety of strategies for allocation of macroscopic evacuation routes to microscopic pedestrian agents. The predictive capability of the coupled flow graph is exploited for the decomposition of microscopic simulation space into multiple future states in a scalable manner. Simulating multiple future states of the emergency in short time frames enables a sensing strategy based on simulation scenario pattern matching, which we show to achieve fast scenario matching, enabling rich, real-time feedback in emergencies in buildings with meagre sensing capabilities.

Relevance: 60.00%

Abstract:

BACKGROUND: Despite the impact of hypertension and widely accepted target values for blood pressure (BP), interventions to improve BP control have had limited success. OBJECTIVES: We describe the design of a 'translational' study that examines the implementation, impact, sustainability, and cost of an evidence-based nurse-delivered tailored behavioral self-management intervention to improve BP control as it moves from a research context to healthcare delivery. The study addresses four specific aims: assess the implementation of an evidence-based behavioral self-management intervention to improve BP levels; evaluate the clinical impact of the intervention as it is implemented; assess organizational factors associated with the sustainability of the intervention; and assess the cost of implementing and sustaining the intervention. METHODS: The project involves three geographically diverse VA intervention facilities and nine control sites. We first conduct an evaluation of barriers and facilitators for implementing the intervention at intervention sites. We examine the impact of the intervention by comparing 12-month pre/post changes in BP control between patients in intervention sites versus patients in the matched control sites. Next, we examine the sustainability of the intervention and organizational factors facilitating or hindering the sustained implementation. Finally, we examine the costs of intervention implementation. Key outcomes are acceptability and costs of the program, as well as changes in BP. Outcomes will be assessed using mixed methods (e.g., qualitative analyses--pattern matching; quantitative methods--linear mixed models). DISCUSSION: The study results will provide information about the challenges and costs to implement and sustain the intervention, and what clinical impact can be expected.

Relevance: 60.00%

Abstract:

Based on an algorithm for pattern matching in character strings, we implement a pattern matching machine that searches for occurrences of patterns in multidimensional time series. Before the search process takes place, time series are encoded in user-designed alphabets. The patterns, on the other hand, are formulated as regular expressions that are composed of letters from these alphabets and operators. Furthermore, we develop a genetic algorithm to breed patterns that maximize a user-defined fitness function. In an application to financial data, we show that patterns bred to predict high exchange rate volatility in training samples retain statistically significant predictive power in validation samples.
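
The encode-then-match idea can be sketched with Python's standard `re` module. This is a simplified, one-dimensional illustration under assumed conventions: the three-letter alphabet (U, D, F), the `threshold` parameter and the function names are hypothetical and not taken from the paper, which handles multidimensional series and breeds its patterns with a genetic algorithm.

```python
import re


def encode(series, threshold=0.5):
    """Encode each step-to-step change as a letter: U (up), D (down), F (flat)."""
    letters = []
    for prev, curr in zip(series, series[1:]):
        change = curr - prev
        if change > threshold:
            letters.append("U")
        elif change < -threshold:
            letters.append("D")
        else:
            letters.append("F")
    return "".join(letters)


def find_pattern(series, regex, threshold=0.5):
    """Return (start, end) index pairs, in the encoded string, of every regex match."""
    encoded = encode(series, threshold)
    return [(m.start(), m.end()) for m in re.finditer(regex, encoded)]


series = [1, 3, 5, 5, 2, 0, 1.2]
print(encode(series))                      # UUFDDU
print(find_pattern(series, r"U{2,}F*D+"))  # [(0, 5)]
```

Here the regular expression `U{2,}F*D+` describes a rise of at least two steps, an optional plateau, and then a fall; position k in the encoded string corresponds to the change between samples k and k + 1.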