226 results for lazy parsing


Relevance:

10.00% 10.00%

Publisher:

Abstract:

For more than forty years, research has been ongoing into the use of computers for processing natural language. During this period methods have evolved, with various parsing techniques and grammars coming to prominence. Problems still exist, not least in the field of Machine Translation. However, one of the successes in this field is the translation of sublanguage. The present work reports on Deterministic Parsing, a relatively new parsing technique, and its application to the sublanguage of an aircraft maintenance manual for Machine Translation. The aim has been to investigate the practicability of using Deterministic Parsers in the analysis stage of a Machine Translation system. Machine Translation, Sublanguage and Parsing are described in general terms, and the Deterministic Parsing systems pertinent to this research are reviewed in detail. The interaction between Machine Translation, Sublanguage and Parsing, including Deterministic Parsing, is also highlighted. Two types of Deterministic Parser have been investigated: a Marcus-type parser, based on the basic design of the original Deterministic Parser (Marcus, 1980), and an LR-type Deterministic Parser for natural language, based on the LR parsing algorithm. In total, four Deterministic Parsers have been built and are described in the thesis. Two of the Deterministic Parsers are prototypes from which the remaining two parsers, to be used on sublanguage, have been developed. This thesis reports the results of parsing by the prototypes, a Marcus-type parser and an LR-type parser, which have a similar grammatical and linguistic range to the original Marcus parser. The Marcus-type parser uses a grammar of production rules, whereas the LR-type parser employs a Definite Clause Grammar (DCG).

Relevance:

10.00% 10.00%

Publisher:

Abstract:

Conventional structured methods of software engineering are often based on the use of functional decomposition coupled with the Waterfall development process model. This approach is argued to be inadequate for coping with the evolutionary nature of large software systems. Alternative development paradigms, including the operational paradigm and the transformational paradigm, have been proposed to address the inadequacies of this conventional view of software development, and these are reviewed. JSD is presented as an example of an operational approach to software engineering, and is contrasted with other well-documented examples. The thesis shows how aspects of JSD can be characterised with reference to formal language theory and automata theory. In particular, it is noted that Jackson structure diagrams are equivalent to regular expressions and can be thought of as specifying corresponding finite automata. The thesis discusses the automatic transformation of structure diagrams into finite automata using an algorithm adapted from compiler theory, and then extends the technique to deal with areas of JSD which are not strictly formalisable in terms of regular languages. In particular, an elegant and novel method for dealing with so-called recognition (or parsing) difficulties is described. Various applications of the extended technique are described. They include a new method of automatically implementing the dismemberment transformation; an efficient way of implementing inversion in languages lacking a goto-statement; and a new in-the-large implementation strategy.
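The equivalence between structure diagrams and regular expressions noted in this abstract can be illustrated with a minimal sketch. The abstract does not say which compiler-theory algorithm the thesis adapts, so the code below assumes a Thompson-style construction of a nondeterministic finite automaton. The tuple encoding of a diagram ("leaf", "seq", "sel", "iter") and the names `build_nfa`, `closure`, `accepts` and `FILE` are hypothetical, chosen only for this illustration.

```python
from itertools import count

# Hypothetical encoding of a structure diagram as nested tuples:
#   ("leaf", symbol), ("seq", part, ...), ("sel", option, ...), ("iter", body)

def build_nfa(node, eps, delta, ids):
    """Add an NFA fragment for `node`; return its (start, accept) states."""
    start, accept = next(ids), next(ids)
    kind = node[0]
    if kind == "leaf":
        # A leaf consumes exactly one input symbol.
        delta.setdefault((start, node[1]), set()).add(accept)
    elif kind == "seq":
        # Sequence: chain the fragments with epsilon moves.
        prev = start
        for part in node[1:]:
            s, a = build_nfa(part, eps, delta, ids)
            eps.setdefault(prev, set()).add(s)
            prev = a
        eps.setdefault(prev, set()).add(accept)
    elif kind == "sel":
        # Selection: branch into every option, then rejoin.
        for option in node[1:]:
            s, a = build_nfa(option, eps, delta, ids)
            eps.setdefault(start, set()).add(s)
            eps.setdefault(a, set()).add(accept)
    elif kind == "iter":
        # Iteration: zero or more repetitions of the body.
        s, a = build_nfa(node[1], eps, delta, ids)
        eps.setdefault(start, set()).update({s, accept})
        eps.setdefault(a, set()).update({s, accept})
    else:
        raise ValueError(f"unknown node kind: {kind}")
    return start, accept

def closure(states, eps):
    """All states reachable from `states` using epsilon moves only."""
    stack, seen = list(states), set(states)
    while stack:
        for nxt in eps.get(stack.pop(), ()):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen

def accepts(diagram, symbols):
    """Simulate the NFA built from `diagram` on a sequence of symbols."""
    eps, delta = {}, {}
    start, accept = build_nfa(diagram, eps, delta, count())
    current = closure({start}, eps)
    for sym in symbols:
        moved = set()
        for state in current:
            moved |= delta.get((state, sym), set())
        current = closure(moved, eps)
    return accept in current

# A file: one header, any number of records (debit or credit), one trailer.
FILE = ("seq", ("leaf", "header"),
        ("iter", ("sel", ("leaf", "debit"), ("leaf", "credit"))),
        ("leaf", "trailer"))

print(accepts(FILE, ["header", "debit", "credit", "trailer"]))
print(accepts(FILE, ["header", "debit"]))
```

The sketch covers only the strictly regular part of a diagram; the recognition difficulties mentioned in the abstract are outside its scope.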

Relevance:

10.00% 10.00%

Publisher:

Abstract:

We address the question of how to communicate among distributed processes values such as real numbers, continuous functions and geometrical solids with arbitrary precision, yet efficiently. We extend the established concept of lazy communication using streams of approximants by introducing explicit queries. We formalise this approach using protocols of a query-answer nature. Such protocols enable processes to provide valid approximations with certain accuracy and focusing on certain locality as demanded by the receiving processes through queries. A lattice-theoretic denotational semantics of channel and process behaviour is developed. The query space is modelled as a continuous lattice in which the top element denotes the query demanding all the information, whereas other elements denote queries demanding partial and/or local information. Answers are interpreted as elements of lattices constructed over suitable domains of approximations to the exact objects. An unanswered query is treated as an error and denoted using the top element. The major novel characteristic of our semantic model is that it reflects the dependency of answers on queries. This enables the definition and analysis of an appropriate concept of convergence rate, by assigning an effort indicator to each query and a measure of information content to each answer. Thus we capture not only what function a process computes, but also how a process transforms the convergence rates from its inputs to its outputs. In future work these indicators can be used to capture further computational complexity measures. A robust prototype implementation of our model is available.
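As a rough illustration of the query-answer idea in this abstract, the sketch below models a producer that refines an approximation of a real number only as far as each query demands. It is not the authors' protocol or prototype. It assumes that a query is just a requested interval width (the locality aspect is left out), that an answer is a rational interval, and that the value being communicated is the square root of 2. The names `sqrt2_process` and `ask` are hypothetical.

```python
from fractions import Fraction

def sqrt2_process():
    """Coroutine that answers accuracy queries about sqrt(2).

    Each query is a requested interval width. Each answer is a rational
    interval (lo, hi) that contains sqrt(2) and is no wider than requested.
    """
    lo, hi = Fraction(1), Fraction(2)
    width = yield  # wait for the first query
    while True:
        # Refine by bisection only as far as the current query requires.
        while hi - lo > width:
            mid = (lo + hi) / 2
            if mid * mid <= 2:
                lo = mid
            else:
                hi = mid
        width = yield (lo, hi)

def ask(process, width):
    """Send one query to the process and return its answer."""
    return process.send(Fraction(width))

producer = sqrt2_process()
next(producer)  # run the coroutine up to its first yield

coarse = ask(producer, Fraction(1, 10))   # cheap, low-accuracy answer
fine = ask(producer, Fraction(1, 1000))   # costs further bisection steps
print(coarse, fine)
```

The work done by the producer grows with the accuracy demanded, which is the kind of dependency of answers on queries that the abstract describes.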

Relevance:

10.00% 10.00%

Publisher:

Abstract:

We propose a hybrid generative/discriminative framework for semantic parsing which combines the hidden vector state (HVS) model and hidden Markov support vector machines (HM-SVMs). The HVS model is an extension of the basic discrete Markov model in which context is encoded as a stack-oriented state vector. The HM-SVMs combine the advantages of hidden Markov models and support vector machines. By employing a modified K-means clustering method, a small set of the most representative sentences can be automatically selected from an un-annotated corpus. These sentences, together with their abstract annotations, are used to train an HVS model which can subsequently be applied to the whole corpus to generate semantic parsing results. The most confident semantic parsing results are selected to generate a fully-annotated corpus which is used to train the HM-SVMs. The proposed framework has been tested on the DARPA Communicator Data. Experimental results show that the hybrid framework improves on the baseline HVS parser. When compared with the HM-SVMs trained from the fully-annotated corpus, the hybrid framework gave a comparable performance with only a small set of lightly annotated sentences. © 2008. Licensed under the Creative Commons.

Relevance:

10.00% 10.00%

Publisher:

Abstract:

The biggest threat to any business is a lack of timely and accurate information. Without all the facts, businesses are pressured to make critical decisions and assess risks and opportunities based largely on guesswork, sometimes resulting in financial losses and missed opportunities. The meteoric rise of Databases (DB) appears to confirm the adage that “information is power”, but the stark reality is that information is useless if one has no way to find what one needs to know. It is perhaps more accurate to state that “the ability to find information is power”. In this paper we show how the Instantaneous Database Access System (IDAS) can make a crucial difference by pulling data together and allowing users to summarise information quickly from all areas of a business organisation.

Relevance:

10.00% 10.00%

Publisher:

Abstract:

A general technique for transforming a timed finite state automaton into an equivalent automated planning domain based on a numerical parameter model is introduced. Timed transition automata have many applications in control systems and agent models; they are used to describe sequential processes, where actions label automaton transitions subject to temporal constraints. The language of timed words accepted by a timed automaton, that is, the possible sequences of system or agent behaviour, can be described in terms of an appropriate planning domain encapsulating the timed action patterns and constraints. The timed word recognition problem is then posed as a planning problem where the goal is to reach a final state by a sequence of actions, which corresponds to the timed symbols labelling the automaton transitions. The transformation is proved to be correct and complete, and it is linear in space and time in the automaton size. Experimental results show that the performance of the planning domain obtained by transformation is scalable for real-world applications. A major advantage of the planning-based approach, besides solving the parsing problem, is that it represents problems of plan recognition, plan synthesis and plan optimisation in a single automated reasoning framework.

Relevance:

10.00% 10.00%

Publisher:

Abstract:

The article discusses the formal approach and the main content of the methodology of formalized design.

Relevance:

10.00% 10.00%

Publisher:

Abstract:

Methods for accessing data on the Web have been the focus of active research over the past few years. In this thesis we propose a method for representing Web sites as data sources. We designed Data Extractor, a data retrieval solution that allows us to define queries to Web sites and process the resulting data sets. Data Extractor is being integrated into the MSemODB heterogeneous database management system. With its help, database queries can be distributed over both local and Web data sources within the MSemODB framework. Data Extractor treats Web sites as data sources, controlling query execution and data retrieval. It works as an intermediary between the applications and the sites. Data Extractor utilizes a twofold “custom wrapper” approach for information retrieval. Wrappers for the majority of sites are easily built using a powerful and expressive scripting language, while complex cases are processed using Java-based wrappers that utilize a specially designed library of data retrieval, parsing and Web access routines. In addition to wrapper development, we thoroughly investigate issues associated with Web site selection, analysis and processing. Data Extractor is designed to act as a data retrieval server, as well as an embedded data retrieval solution. We also use it to create mobile agents that are shipped over the Internet to the client's computer to perform data retrieval on behalf of the user. This approach allows Data Extractor to distribute and scale well. This study confirms the feasibility of building custom wrappers for Web sites. This approach provides accuracy of data retrieval, and power and flexibility in the handling of complex cases.

Relevance:

10.00% 10.00%

Publisher:

Abstract:

Large read-only or read-write transactions with a large read set and a small write set constitute an important class of transactions used in such applications as data mining, data warehousing, statistical applications, and report generators. Such transactions are best supported with optimistic concurrency, because locking large amounts of data for extended periods of time is not an acceptable solution. The abort rate in regular optimistic concurrency algorithms increases exponentially with the size of the transaction. The algorithm proposed in this dissertation solves this problem by using a new transaction scheduling technique that allows a large transaction to commit safely with a probability that can be several orders of magnitude greater than with regular optimistic concurrency algorithms. A performance simulation study and a formal proof of serializability and external consistency of the proposed algorithm are also presented. This dissertation also proposes a new query optimization technique (lazy queries). Lazy Queries is an adaptive query execution scheme which optimizes itself as the query runs. Lazy queries can be used to find an intersection of sub-queries in a very efficient way, which requires neither full execution of large sub-queries nor any statistical knowledge about the data. An efficient optimistic concurrency control algorithm used in a massively parallel B-tree with variable-length keys is introduced. B-trees with variable-length keys can be effectively used in a variety of database types. In particular, we show how such a B-tree was used in our implementation of a semantic object-oriented DBMS. The concurrency control algorithm uses semantically safe optimistic virtual "locks" that achieve very fine granularity in conflict detection. This algorithm ensures serializability and external consistency by using logical clocks and backward validation of transactional queries. A formal proof of correctness of the proposed algorithm is also presented.
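The abstract does not describe how lazy queries work internally, so the following is only a generic sketch of how an intersection of sub-queries can be found without fully executing the large ones. It assumes each sub-query can return its matching ids in sorted order and can skip forward to a given id, and it uses a leapfrog-style merge. The class `SortedSource`, its `seek` method and the function `lazy_intersection` are hypothetical names, not taken from the dissertation.

```python
from bisect import bisect_left

class SortedSource:
    """Hypothetical sub-query result: sorted integer ids with a seek operation."""

    def __init__(self, ids):
        self.ids = sorted(ids)
        self.pos = 0
        self.probes = 0  # how many times this source was consulted

    def seek(self, target):
        """Advance to the first id >= target; return it, or None if exhausted."""
        self.pos = bisect_left(self.ids, target, self.pos)
        self.probes += 1
        return self.ids[self.pos] if self.pos < len(self.ids) else None

def lazy_intersection(sources):
    """Yield ids present in every source, skipping over the gaps."""
    if not sources:
        return
    candidate = sources[0].seek(float("-inf"))
    agreed = 1  # number of sources known to contain the candidate
    i = 1 % len(sources)
    while candidate is not None:
        if agreed == len(sources):
            yield candidate
            # Move on; integer ids are assumed, so candidate + 1 is the next id.
            candidate = sources[i].seek(candidate + 1)
            agreed = 1
        else:
            found = sources[i].seek(candidate)
            if found == candidate:
                agreed += 1
            else:
                # This source jumped ahead (or ran out): adopt its id as candidate.
                candidate, agreed = found, 1
        i = (i + 1) % len(sources)

a = SortedSource(range(10000))         # large sub-query
b = SortedSource([5, 5000, 9999])      # highly selective sub-query
c = SortedSource(range(0, 10000, 5))   # medium sub-query
result = list(lazy_intersection([a, b, c]))
print(result, a.probes)
```

In this example the selective source drives the search, so the large source is consulted only a handful of times instead of being enumerated in full, and no statistics about the data are needed beforehand.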

Relevance:

10.00% 10.00%

Publisher:

Abstract:

Methods for accessing data on the Web have been the focus of active research over the past few years. In this thesis we propose a method for representing Web sites as data sources. We designed Data Extractor, a data retrieval solution that allows us to define queries to Web sites and process the resulting data sets. Data Extractor is being integrated into the MSemODB heterogeneous database management system. With its help, database queries can be distributed over both local and Web data sources within the MSemODB framework. Data Extractor treats Web sites as data sources, controlling query execution and data retrieval. It works as an intermediary between the applications and the sites. Data Extractor utilizes a two-fold "custom wrapper" approach for information retrieval. Wrappers for the majority of sites are easily built using a powerful and expressive scripting language, while complex cases are processed using Java-based wrappers that utilize a specially designed library of data retrieval, parsing and Web access routines. In addition to wrapper development, we thoroughly investigate issues associated with Web site selection, analysis and processing. Data Extractor is designed to act as a data retrieval server, as well as an embedded data retrieval solution. We also use it to create mobile agents that are shipped over the Internet to the client's computer to perform data retrieval on behalf of the user. This approach allows Data Extractor to distribute and scale well. This study confirms the feasibility of building custom wrappers for Web sites. This approach provides accuracy of data retrieval, and power and flexibility in the handling of complex cases.

Relevance:

10.00% 10.00%

Publisher:

Abstract:

Coral reefs are increasingly threatened by global and local anthropogenic stressors, such as rising seawater temperature and nutrient enrichment. These two stressors vary widely across the reef face and parsing out their influence on coral communities at reef system scales has been particularly challenging. Here, we investigate the influence of temperature and nutrients on coral community traits and life history strategies on lagoonal reefs across the Belize Mesoamerican Barrier Reef System (MBRS). A novel metric was developed using ultra-high-resolution sea surface temperatures (SST) to classify reefs as enduring low (lowTP), moderate (modTP), or extreme (extTP) temperature parameters over 10 years (2003 to 2012). Chlorophyll-a (chl a) records obtained for the same interval were employed as a proxy for bulk nutrients and these records were complemented with in situ measurements to "sea truth" nutrient content across the three reef types. Chl a concentrations were highest at extTP sites, medial at modTP sites and lowest at lowTP sites. Coral species richness, abundance, diversity, density, and percent cover were lower at extTP sites compared to lowTP and modTP sites, but these reef community traits did not differ between lowTP and modTP sites. Coral life history strategy analyses showed that extTP sites were dominated by hardy stress-tolerant and fast-growing weedy coral species, while lowTP and modTP sites consisted of competitive, generalist, weedy, and stress-tolerant coral species. These results suggest that differences in coral community traits and life history strategies between extTP and lowTP/modTP sites were driven primarily by temperature differences with differences in nutrients across site types playing a lesser role. 
Dominance of weedy and stress-tolerant genera at extTP sites suggests that corals utilizing these two life history strategies may be better suited to cope with warmer oceans and thus may warrant further protective status during this climate change interval. Data associated with this project are archived here, including: -SST data -Satellite Chl a data -Nutrient measurements -Raw coral community survey data For questions contact Justin Baumann (j.baumann3 gmail.com)

Relevance:

10.00% 10.00%

Publisher:

Abstract:

The semantic model developed in this research was a response to the difficulty a group of mathematics learners had with conventional mathematical language and their interpretation of mathematical constructs. In order to develop the model, ideas from linguistics, psycholinguistics, cognitive psychology, formal languages and natural language processing were investigated. This investigation led to the identification of four main processes: the parsing process, syntactic processing, semantic processing and conceptual processing. The model showed the complex interdependency between these four processes and provided a theoretical framework in which the behaviour of the mathematics learner could be analysed. The model was then extended to include the use of technological artefacts in the learning process. To facilitate this aspect of the research, the theory of instrumentation was incorporated into the semantic model. The conclusion of this research was that although the cognitive processes were interdependent, they could develop at different rates until mastery of a topic was achieved. It also found that the introduction of a technological artefact into the learning environment introduced another layer of complexity, both in terms of the learning process and the underlying relationship between the four cognitive processes.

Relevance:

10.00% 10.00%

Publisher:

Abstract:

Dynamically typed programming languages such as JavaScript and Python defer type checking until run time. To optimize the performance of these languages, virtual machine implementations for dynamic languages must attempt to eliminate redundant dynamic type tests. This is usually done using a type inference analysis. However, such analyses are often costly and involve trade-offs between compilation time and the precision of the results obtained. This has led to the design of increasingly complex VM architectures. We propose lazy basic block versioning, a simple just-in-time compilation technique that effectively eliminates redundant dynamic type tests on critical execution paths. This new approach lazily generates specialized versions of basic blocks while propagating context-dependent type information. Our technique does not require costly program analyses, is not constrained by the precision limitations of traditional type inference analyses, and avoids the complexity of speculative optimization techniques. Three extensions are made to basic block versioning to give it interprocedural optimization capabilities. A first extension enables it to attach type information to object properties and global variables. Next, entry point specialization allows it to pass type information from calling functions to called functions. Finally, call continuation specialization makes it possible to transmit the types of return values from called functions to their callers at no dynamic cost. We demonstrate empirically that these extensions allow basic block versioning to eliminate more dynamic type tests than any static type inference analysis.

Relevance:

10.00% 10.00%

Publisher:

Abstract:

This paper presents the Accurate Google Cloud Simulator (AGOCS), a novel high-fidelity Cloud workload simulator based on parsing real workload traces, which can be conveniently used on a desktop machine for day-to-day research. Our simulation is based on real-world workload traces from a Google Cluster with 12.5K nodes, over a period of a calendar month. The framework is able to reveal very precise and detailed parameters of the executed jobs, tasks and nodes, as well as to provide actual resource usage statistics. The system has been implemented in the Scala language with a focus on parallel execution and an easy-to-extend design concept. The paper presents the detailed structural framework for AGOCS and discusses our main design decisions, whilst also suggesting alternative and possibly performance-enhancing future approaches. The framework is available via the Open Source GitHub repository.

Relevance:

10.00% 10.00%

Publisher:

Abstract:

In this paper we introduce the online version of our ReaderBench framework, which includes multi-lingual comprehension-centered web services designed to address a wide range of individual and collaborative learning scenarios, as follows. First, students can be engaged in reading course material and then eliciting their understanding of it; the reading strategies component provides an in-depth perspective on comprehension processes. Second, students can write an essay or a summary; the automated essay grading component provides them with access to more than 200 textual complexity indices covering lexical, syntactic, semantic and discourse structure measurements. Third, students can start a discussion in a chat or a forum; the Computer Supported Collaborative Learning (CSCL) component provides in-depth conversation analysis by evaluating each member’s involvement in the CSCL environment. Finally, the sentiment analysis, semantic model and topic mining components give a clearer perspective on learners’ points of view and underlying interests.