988 resultados para finite-state transducer


Relevância:

100.00% 100.00%

Publicador:

Resumo:

This is a Named Entity Based Question Answering System for Malayalam Language. Although a vast amount of information is available today in digital form, no effective information access mechanism exists to provide humans with convenient information access. Information Retrieval and Question Answering systems are the two mechanisms available now for information access. Information systems typically return a long list of documents in response to a user’s query which are to be skimmed by the user to determine whether they contain an answer. But a Question Answering System allows the user to state his/her information need as a natural language question and receives most appropriate answer in a word or a sentence or a paragraph. This system is based on Named Entity Tagging and Question Classification. Document tagging extracts useful information from the documents which will be used in finding the answer to the question. Question Classification extracts useful information from the question to determine the type of the question and the way in which the question is to be answered. Various Machine Learning methods are used to tag the documents. Rule-Based Approach is used for Question Classification. Malayalam belongs to the Dravidian family of languages and is one of the four major languages of this family. It is one of the 22 Scheduled Languages of India with official language status in the state of Kerala. It is spoken by 40 million people. Malayalam is a morphologically rich agglutinative language and relatively of free word order. Also Malayalam has a productive morphology that allows the creation of complex words which are often highly ambiguous. Document tagging tools such as Parts-of-Speech Tagger, Phrase Chunker, Named Entity Tagger, and Compound Word Splitter are developed as a part of this research work. No such tools were available for Malayalam language. Finite State Transducer, High Order Conditional Random Field, Artificial Immunity System Principles, and Support Vector Machines are the techniques used for the design of these document preprocessing tools. This research work describes how the Named Entity is used to represent the documents. Single sentence questions are used to test the system. Overall Precision and Recall obtained are 88.5% and 85.9% respectively. This work can be extended in several directions. The coverage of non-factoid questions can be increased and also it can be extended to include open domain applications. Reference Resolution and Word Sense Disambiguation techniques are suggested as the future enhancements

Relevância:

100.00% 100.00%

Publicador:

Resumo:

This paper describes a preprocessing module for improving the performance of a Spanish into Spanish Sign Language (Lengua de Signos Espanola: LSE) translation system when dealing with sparse training data. This preprocessing module replaces Spanish words with associated tags. The list with Spanish words (vocabulary) and associated tags used by this module is computed automatically considering those signs that show the highest probability of being the translation of every Spanish word. This automatic tag extraction has been compared to a manual strategy achieving almost the same improvement. In this analysis, several alternatives for dealing with non-relevant words have been studied. Non-relevant words are Spanish words not assigned to any sign. The preprocessing module has been incorporated into two well-known statistical translation architectures: a phrase-based system and a Statistical Finite State Transducer (SFST). This system has been developed for a specific application domain: the renewal of Identity Documents and Driver's License. In order to evaluate the system a parallel corpus made up of 4080 Spanish sentences and their LSE translation has been used. The evaluation results revealed a significant performance improvement when including this preprocessing module. In the phrase-based system, the proposed module has given rise to an increase in BLEU (Bilingual Evaluation Understudy) from 73.8% to 81.0% and an increase in the human evaluation score from 0.64 to 0.83. In the case of SFST, BLEU increased from 70.6% to 78.4% and the human evaluation score from 0.65 to 0.82.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

This paper describes a categorization module for improving the performance of a Spanish into Spanish Sign Language (LSE) translation system. This categorization module replaces Spanish words with associated tags. When implementing this module, several alternatives for dealing with non-relevant words have been studied. Non-relevant words are Spanish words not relevant in the translation process. The categorization module has been incorporated into a phrase-based system and a Statistical Finite State Transducer (SFST). The evaluation results reveal that the BLEU has increased from 69.11% to 78.79% for the phrase-based system and from 69.84% to 75.59% for the SFST.

Relevância:

90.00% 90.00%

Publicador:

Resumo:

Many evolutionary algorithm applications involve either fitness functions with high time complexity or large dimensionality (hence very many fitness evaluations will typically be needed) or both. In such circumstances, there is a dire need to tune various features of the algorithm well so that performance and time savings are optimized. However, these are precisely the circumstances in which prior tuning is very costly in time and resources. There is hence a need for methods which enable fast prior tuning in such cases. We describe a candidate technique for this purpose, in which we model a landscape as a finite state machine, inferred from preliminary sampling runs. In prior algorithm-tuning trials, we can replace the 'real' landscape with the model, enabling extremely fast tuning, saving far more time than was required to infer the model. Preliminary results indicate much promise, though much work needs to be done to establish various aspects of the conditions under which it can be most beneficially used. A main limitation of the method as described here is a restriction to mutation-only algorithms, but there are various ways to address this and other limitations.

Relevância:

90.00% 90.00%

Publicador:

Resumo:

This paper deals with a stochastic stability concept for discrete-time Markovian jump linear systems. The random jump parameter is associated to changes between the system operation modes due to failures or repairs, which can be well described by an underlying finite-state Markov chain. In the model studied, a fixed number of failures or repairs is allowed, after which, the system is brought to a halt for maintenance or for replacement. The usual concepts of stochastic stability are related to pure infinite horizon problems, and are not appropriate in this scenario. A new stability concept is introduced, named stochastic tau-stability that is tailored to the present setting. Necessary and sufficient conditions to ensure the stochastic tau-stability are provided, and the almost sure stability concept associated with this class of processes is also addressed. The paper also develops equivalences among second order concepts that parallels the results for infinite horizon problems. (C) 2003 Elsevier B.V. All rights reserved.

Relevância:

90.00% 90.00%

Publicador:

Resumo:

Since the 1950s, the theory of deterministic and nondeterministic finite automata (DFAs and NFAs, respectively) has been a cornerstone of theoretical computer science. In this dissertation, our main object of study is minimal NFAs. In contrast with minimal DFAs, minimal NFAs are computationally challenging: first, there can be more than one minimal NFA recognizing a given language; second, the problem of converting an NFA to a minimal equivalent NFA is NP-hard, even for NFAs over a unary alphabet. Our study is based on the development of two main theories, inductive bases and partials, which in combination form the foundation for an incremental algorithm, ibas, to find minimal NFAs. An inductive basis is a collection of languages with the property that it can generate (through union) each of the left quotients of its elements. We prove a fundamental characterization theorem which says that a language can be recognized by an n-state NFA if and only if it can be generated by an n-element inductive basis. A partial is an incompletely-specified language. We say that an NFA recognizes a partial if its language extends the partial, meaning that the NFA's behavior is unconstrained on unspecified strings; it follows that a minimal NFA for a partial is also minimal for its language. We therefore direct our attention to minimal NFAs recognizing a given partial. Combining inductive bases and partials, we generalize our characterization theorem, showing that a partial can be recognized by an n-state NFA if and only if it can be generated by an n-element partial inductive basis. We apply our theory to develop and implement ibas, an incremental algorithm that finds minimal partial inductive bases generating a given partial. In the case of unary languages, ibas can often find minimal NFAs of up to 10 states in about an hour of computing time; with brute-force search this would require many trillions of years.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

O desenvolvimento actual de aplicações paralelas com processamento intensivo (HPC - High Performance Computing) para alojamento em computadores organizados em Cluster baseia-se muito no modelo de passagem de mensagens, do qual é de realçar os esforços de definição de standards, por exemplo, MPI - Message - Passing Interface. Por outro lado, com a generalização do paradigma de programação orientado aos objectos para ambientes distribuídos (Java RMI, .NET Remoting), existe a possibilidade de considerar que a execução de uma aplicação, de processamento paralelo e intensivo, pode ser decomposta em vários fluxos de execução paralela, em que cada fluxo é constituído por uma ou mais tarefas executadas no contexto de objectos distribuídos. Normalmente, em ambientes baseados em objectos distribuídos, a especificação, controlo e sincronização dos vários fluxos de execução paralela, é realizada de forma explicita e codificada num programa principal (hard-coded), dificultando possíveis e necessárias modificações posteriores. No entanto, existem, neste contexto, trabalhos que propõem uma abordagem de decomposição, seguindo o paradigma de workflow com interacções entre as tarefas por, entre outras, data-flow, control-flow, finite - state - machine. Este trabalho consistiu em propor e explorar um modelo de execução, sincronização e controlo de múltiplas tarefas, que permita de forma flexível desenhar aplicações de processamento intensivo, tirando partido da execução paralela de tarefas em diferentes máquinas. O modelo proposto e consequente implementação, num protótipo experimental, permite: especificar aplicações usando fluxos de execução; submeter fluxos para execução e controlar e monitorizar a execução desses fluxos. As tarefas envolvidas nos fluxos de execução podem executar-se num conjunto de recursos distribuídos. As principais características a realçar no modelo proposto, são a expansibilidade e o desacoplamento entre as diferentes componentes envolvidas na execução dos fluxos de execução. São ainda descritos casos de teste que permitiram validar o modelo e o protótipo implementado. Tendo consciência da necessidade de continuar no futuro esta linha de investigação, este trabalho é um contributo para demonstrar que o paradigma de workflow é adequado para expressar e executar, de forma paralela e distribuída, aplicações complexas de processamento intensivo.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

Amorphous glass/ZnO-Al/p(a-Si:H)/i(a-Si:H)/n(a-Si1-xCx:H)/Al imagers with different n-layer resistivities were produced by plasma enhanced chemical vapour deposition technique (PE-CVD). An image is projected onto the sensing element and leads to spatially confined depletion regions that can be readout by scanning the photodiode with a low-power modulated laser beam. The essence of the scheme is the analog readout, and the absence of semiconductor arrays or electrode potential manipulations to transfer the information coming from the transducer. The influence of the intensity of the optical image projected onto the sensor surface is correlated with the sensor output characteristics (sensitivity, linearity blooming, resolution and signal-to-noise ratio) are analysed for different material compositions (0.5 < x < 1). The results show that the responsivity and the spatial resolution are limited by the conductivity of the doped layers. An enhancement of one order of magnitude in the image intensity signal and on the spatial resolution are achieved at 0.2 mW cm(-2) light flux by decreasing the n-layer conductivity by the same amount. A physical model supported by electrical simulation gives insight into the image-sensing technique used.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

We consider an optimal control problem with a deterministic finite horizon and state variable dynamics given by a Markov-switching jump–diffusion stochastic differential equation. Our main results extend the dynamic programming technique to this larger family of stochastic optimal control problems. More specifically, we provide a detailed proof of Bellman’s optimality principle (or dynamic programming principle) and obtain the corresponding Hamilton–Jacobi–Belman equation, which turns out to be a partial integro-differential equation due to the extra terms arising from the Lévy process and the Markov process. As an application of our results, we study a finite horizon consumption– investment problem for a jump–diffusion financial market consisting of one risk-free asset and one risky asset whose coefficients are assumed to depend on the state of a continuous time finite state Markov process. We provide a detailed study of the optimal strategies for this problem, for the economically relevant families of power utilities and logarithmic utilities.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

Counter automata are more powerful versions of finite state automata where addition and subtraction operations are permitted on a set of n integer registers, called counters. We show that the word problem of Zn is accepted by a nondeterministic m-counter automaton if and only if m &= n.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

En este artículo se presentan los resultados y conclusiones del trabajo deinvestigación llevado a cabo sobre herramientas informáticas para representación de grafos de autómatas de estado finitos. El principal resultado de esta investigación es el desarrollo de una nueva herramienta, que permita dibujar el grafo de forma totalmente automática, partiendo de una tabla de transiciones donde se describe al autómata en cuestión.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

Sequential randomized prediction of an arbitrary binary sequence isinvestigated. No assumption is made on the mechanism of generating the bit sequence. The goal of the predictor is to minimize its relative loss, i.e., to make (almost) as few mistakes as the best ``expert'' in a fixed, possibly infinite, set of experts. We point out a surprising connection between this prediction problem and empirical process theory. First, in the special case of static (memoryless) experts, we completely characterize the minimax relative loss in terms of the maximum of an associated Rademacher process. Then we show general upper and lower bounds on the minimaxrelative loss in terms of the geometry of the class of experts. As main examples, we determine the exact order of magnitude of the minimax relative loss for the class of autoregressive linear predictors and for the class of Markov experts.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

In this paper we propose an endpoint detection system based on the use of several features extracted from each speech frame, followed by a robust classifier (i.e Adaboost and Bagging of decision trees, and a multilayer perceptron) and a finite state automata (FSA). We present results for four different classifiers. The FSA module consisted of a 4-state decision logic that filtered false alarms and false positives. We compare the use of four different classifiers in this task. The look ahead of the method that we propose was of 7 frames, which are the number of frames that maximized the accuracy of the system. The system was tested with real signals recorded inside a car, with signal to noise ratio that ranged from 6 dB to 30dB. Finally we present experimental results demonstrating that the system yields robust endpoint detection.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

Advancements in high-throughput technologies to measure increasingly complex biological phenomena at the genomic level are rapidly changing the face of biological research from the single-gene single-protein experimental approach to studying the behavior of a gene in the context of the entire genome (and proteome). This shift in research methodologies has resulted in a new field of network biology that deals with modeling cellular behavior in terms of network structures such as signaling pathways and gene regulatory networks. In these networks, different biological entities such as genes, proteins, and metabolites interact with each other, giving rise to a dynamical system. Even though there exists a mature field of dynamical systems theory to model such network structures, some technical challenges are unique to biology such as the inability to measure precise kinetic information on gene-gene or gene-protein interactions and the need to model increasingly large networks comprising thousands of nodes. These challenges have renewed interest in developing new computational techniques for modeling complex biological systems. This chapter presents a modeling framework based on Boolean algebra and finite-state machines that are reminiscent of the approach used for digital circuit synthesis and simulation in the field of very-large-scale integration (VLSI). The proposed formalism enables a common mathematical framework to develop computational techniques for modeling different aspects of the regulatory networks such as steady-state behavior, stochasticity, and gene perturbation experiments.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

The article describes some concrete problems that were encountered when writing a two-level model of Mari morphology. Mari is an agglutinative Finno-Ugric language spoken in Russia by about 600 000 people. The work was begun in the 1980s on the basis of K. Koskenniemi’s Two-Level Morphology (1983), but in the latest stage R. Beesley’s and L. Karttunen’s Finite State Morphology (2003) was used. Many of the problems described in the article concern the inexplicitness of the rules in Mari grammars and the lack of information about the exact distribution of some suffixes, e.g. enclitics. The Mari grammars usually give complete paradigms for a few unproblematic verb stems, whereas the difficult or unclear forms of certain verbs are only superficially discussed. Another example of phenomena that are poorly described in grammars is the way suffixes with an initial sibilant combine to stems ending in a sibilant. The help of informants and searches from electronic corpora were used to overcome such difficulties in the development of the two-level model of Mari. The variation of the order of plural markers, case suffixes and possessive suffixes is a typical feature of Mari. The morphotactic rules constructed for Mari declensional forms tend to be recursive and their productivity must be limited by some technical device, such as filters. In the present model, certain plural markers were treated like nouns. The positional and functional versatility of the possessive suffixes can be regarded as the most challenging phenomenon in attempts to formalize the Mari morphology. Cyrillic orthography, which was used in the model, also caused problems. For instance, a Cyrillic letter may represent a sequence of two sounds, the first being part of the word stem while the other belongs to a suffix. In some cases, letters for voiced consonants are also generalized to represent voiceless consonants. Such orthographical conventions distance a morphological model based on orthography from the actual (morpho)phonological processes in the language.