82 resultados para Parser


Relevância:

10.00% 10.00%

Publicador:

Resumo:

La tesi ha lo scopo di indagare le tecnologie disponibili per la realizzazione di linguaggi di programmazione e linguaggi domain specific in ambiente Java. In particolare, vengono proposti e analizzati tre strumenti presenti sul mercato: JavaCC, ANTLR e Xtext. Al termine dell’elaborato, il lettore dovrebbe avere un’idea generale dei principali meccanismi e sistemi utilizzati (come lexer, parser, AST, parse trees, etc.), oltre che del funzionamento dei tre tools presentati. Inoltre, si vogliono individuare vantaggi e svantaggi di ciascuno strumento attraverso un’analisi delle funzionalità offerte, così da fornire un giudizio critico per la scelta e la valutazione dei sistemi da utilizzare.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

Grammars for programming languages are traditionally specified statically. They are hard to compose and reuse due to ambiguities that inevitably arise. PetitParser combines ideas from scannerless parsing, parser combinators, parsing expression grammars and packrat parsers to model grammars and parsers as objects that can be reconfigured dynamically. Through examples and benchmarks we demonstrate that dynamic grammars are not only flexible but highly practical.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

This paper proposes a sequential coupling of a Hidden Markov Model (HMM) recognizer for offline handwritten English sentences with a probabilistic bottom-up chart parser using Stochastic Context-Free Grammars (SCFG) extracted from a text corpus. Based on extensive experiments, we conclude that syntax analysis helps to improve recognition rates significantly.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

Systems must co-evolve with their context. Reverse engineering tools are a great help in this process of required adaption. In order for these tools to be flexible, they work with models, abstract representations of the source code. The extraction of such information from source code can be done using a parser. However, it is fairly tedious to build new parsers. And this is made worse by the fact that it has to be done over and over again for every language we want to analyze. In this paper we propose a novel approach which minimizes the knowledge required of a certain language for the extraction of models implemented in that language by reflecting on the implementation of preparsed ASTs provided by an IDE. In a second phase we use a technique referred to as Model Mapping by Example to map platform dependent models onto domain specific model.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

In order to analyze software systems, it is necessary to model them. Static software models are commonly imported by parsing source code and related data. Unfortunately, building custom parsers for most programming languages is a non-trivial endeavour. This poses a major bottleneck for analyzing software systems programmed in languages for which importers do not already exist. Luckily, initial software models do not require detailed parsers, so it is possible to start analysis with a coarse-grained importer, which is then gradually refined. In this paper we propose an approach to "agile modeling" that exploits island grammars to extract initial coarse-grained models, parser combinators to enable gradual refinement of model importers, and various heuristics to recognize language structure, keywords and other language artifacts.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

The domain of context-free languages has been extensively explored and there exist numerous techniques for parsing (all or a subset of) context-free languages. Unfortunately, some programming languages are not context-free. Using standard context-free parsing techniques to parse a context-sensitive programming language poses a considerable challenge. Im- plementors of programming language parsers have adopted various techniques, such as hand-written parsers, special lex- ers, or post-processing of an ambiguous parser output to deal with that challenge. In this paper we suggest a simple extension of a top-down parser with contextual information. Contrary to the tradi- tional approach that uses only the input stream as an input to a parsing function, we use a parsing context that provides ac- cess to a stream and possibly to other context-sensitive infor- mation. At a same time we keep the context-free formalism so a grammar definition stays simple without mind-blowing context-sensitive rules. We show that our approach can be used for various purposes such as indent-sensitive parsing, a high-precision island parsing or XML (with arbitrary el- ement names) parsing. We demonstrate our solution with PetitParser, a parsing-expression grammar based, top-down, parser combinator framework written in Smalltalk.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

Imprecise manipulation of source code (semi-parsing) is useful for tasks such as robust parsing, error recovery, lexical analysis, and rapid development of parsers for data extraction. An island grammar precisely defines only a subset of a language syntax (islands), while the rest of the syntax (water) is defined imprecisely. Usually, water is defined as the negation of islands. Albeit simple, such a definition of water is naive and impedes composition of islands. When developing an island grammar, sooner or later a programmer has to create water tailored to each individual island. Such an approach is fragile, however, because water can change with any change of a grammar. It is time-consuming, because water is defined manually by a programmer and not automatically. Finally, an island surrounded by water cannot be reused because water has to be defined for every grammar individually. In this paper we propose a new technique of island parsing - bounded seas. Bounded seas are composable, robust, reusable and easy to use because island-specific water is created automatically. We integrated bounded seas into a parser combinator framework as a demonstration of their composability and reusability.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

This paper presents a shallow dialogue analysis model, aimed at human-human dialogues in the context of staff or business meetings. Four components of the model are defined, and several machine learning techniques are used to extract features from dialogue transcripts: maximum entropy classifiers for dialogue acts, latent semantic analysis for topic segmentation, or decision tree classifiers for discourse markers. A rule-based approach is proposed for solving cross-modal references to meeting documents. The methods are trained and evaluated thanks to a common data set and annotation format. The integration of the components into an automated shallow dialogue parser opens the way to multimodal meeting processing and retrieval applications.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

Este trabajo corresponde con la implementación de componentes software dentro de la Plataforma COMPUTAPLEX, la cual tiene como objetivo facilitar a los investigadores la realización de tareas del proceso experimental de ingeniería de software. Uno de los aportes a esta plataforma tecnológica corresponde con el desarrolló de los componentes necesarios para la recuperación de datos experimentales disponibles en diversas fuentes de datos, para ello se hizo uso de un mecanismo capaz de unificar la extracción de información de MySQL, ficheros excel y ficheros SPSS. Con ello diferentes grupos de investigación asociados pueden compartir y tener acceso a repositorios experimentales que se mantienen tanto de manera local como externa. Por otra parte, se ha realizado un estudio de la tecnología de agentes en la que se describe sus definiciones, lenguajes de comunicación, especificación FIPA, JADE como implementación FIPA y parser XML. Además para este trabajo se ha definido e implementado una ontología de comunicación entre agentes, la misma que fue diseñada en la herramienta Protégé. En lo que se refiere al desarrollo de componentes se hizo uso de una amplía variedad de tecnologías que incluye lenguaje de programación Java, framework JADE para el desarrollo de agentes, librería JENA para manejo de ontologías, librería SAXParser para lectura de archivos XML y patrón de diseño Factory. Finalmente se describe la metodología de trabajo utilizada en el proyecto, la cual por medio de la realización de varios ciclos iterativos permitió obtener prototipos que poco a poco fueron cubriendo las necesidades del producto software.----ABSTRACT---- This work relates to the implementation of software components within the platform Computaplex, which aims to enable researchers to conduct experimental software engineering process tasks. One of the contributions to this platform technology corresponds to the development of components which are necessary for the recovery of experimental data available in different data sources, to archive this goal a mechanism able to unify the extraction of information from MySQL, Excel and SPSS files was made. Therefore, associated research groups can share and access experimental repositories that remain both locally and externally. Moreover, it has been conducted a study of agent technology in its definition is described, languages communication, FIPA, JADE and FIPA implementation and XML parser. In addition to this work, it has been defined and implemented an ontology for communication between agents, the same as was designed in the Protégé tool. In what refers to the development of components, a wide range of technologies have been made which includes Java programming language, framework JADE for agent development, JENA library for handling ontologies, SAXParser for reading XML files and Factory design pattern. Finally, describing the work methodology used in this project, which through the implementation of several iterative cycles allowed to obtain prototypes were gradually meeting the needs of the software product.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

Este Trabajo de Fin de Grado recoge el diseño e implementación de un compilador y una librería de entorno de ejecución para el lenguaje específico del dominio TESL, un lenguaje de alto nivel para el análisis de series temporales diseñado por un grupo de investigación de la Universidad Politécnica de Madrid. Este compilador es el primer compilador completo disponible para TESL y sirve como base para la continuación del desarrollo del lenguaje, estando ideado para permitir su adaptación a cambios en el mismo. El compilador ha sido implementado en Java siguiendo la arquitectura clásica para este tipo de aplicaciones, incluyendo un Analizador Léxico, Sintáctico y Semántico, así como un Generador de Código. Se ha documentado su arquitectura y las decisiones de diseño que han conducido a la misma. Además, se ha demostrado su funcionamiento con un caso práctico de análisis de eventos en métricas de servidores. Por último, se ha documentado el lenguaje TESL, en cuyo desarrollo se ha colaborado. ---ABSTRACT---This Bachelor’s Thesis describes the design and implementation of a compiler and a runtime library for the domain-specific language TESL, a high-level language for analyzing time series events developed by a research group from the Technical University of Madrid. This is the first fully implemented TESL compiler, and serves as basis for the continuation of the development of the language. The compiler has been implemented in Java following the classical architecture for this kind of systems, having a four phase compilation with a Lexer, a Parser, a Semantic Analyzer and a Code Generator. Its architecture and the design decisions that lead to it have been documented. Its use has been demonstrated in an use-case in the domain of server metrics. Finally, the TESL language itself has been extended and documented.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

In this paper we present a whole Natural Language Processing (NLP) system for Spanish. The core of this system is the parser, which uses the grammatical formalism Lexical-Functional Grammars (LFG). Another important component of this system is the anaphora resolution module. To solve the anaphora, this module contains a method based on linguistic information (lexical, morphological, syntactic and semantic), structural information (anaphoric accessibility space in which the anaphor obtains the antecedent) and statistical information. This method is based on constraints and preferences and solves pronouns and definite descriptions. Moreover, this system fits dialogue and non-dialogue discourse features. The anaphora resolution module uses several resources, such as a lexical database (Spanish WordNet) to provide semantic information and a POS tagger providing the part of speech for each word and its root to make this resolution process easier.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

Abstract Imprecise manipulation of source code (semi-parsing) is useful for tasks such as robust parsing, error recovery, lexical analysis, and rapid development of parsers for data extraction. An island grammar precisely defines only a subset of a language syntax (islands), while the rest of the syntax (water) is defined imprecisely. Usually water is defined as the negation of islands. Albeit simple, such a definition of water is naive and impedes composition of islands. When developing an island grammar, sooner or later a language engineer has to create water tailored to each individual island. Such an approach is fragile, because water can change with any change of a grammar. It is time-consuming, because water is defined manually by an engineer and not automatically. Finally, an island surrounded by water cannot be reused because water has to be defined for every grammar individually. In this paper we propose a new technique of island parsing —- bounded seas. Bounded seas are composable, robust, reusable and easy to use because island-specific water is created automatically. Our work focuses on applications of island parsing to data extraction from source code. We have integrated bounded seas into a parser combinator framework as a demonstration of their composability and reusability.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

For more than forty years, research has been on going in the use of the computer in the processing of natural language. During this period methods have evolved, with various parsing techniques and grammars coming to prominence. Problems still exist, not least in the field of Machine Translation. However, one of the successes in this field is the translation of sublanguage. The present work reports Deterministic Parsing, a relatively new parsing technique, and its application to the sublanguage of an aircraft maintenance manual for Machine Translation. The aim has been to investigate the practicability of using Deterministic Parsers in the analysis stage of a Machine Translation system. Machine Translation, Sublanguage and parsing are described in general terms with a review of Deterministic parsing systems, pertinent to this research, being presented in detail. The interaction between machine Translation, Sublanguage and Parsing, including Deterministic parsing, is also highlighted. Two types of Deterministic Parser have been investigated, a Marcus-type parser, based on the basic design of the original Deterministic parser (Marcus, 1980) and an LR-type Deterministic Parser for natural language, based on the LR parsing algorithm. In total, four Deterministic Parsers have been built and are described in the thesis. Two of the Deterministic Parsers are prototypes from which the remaining two parsers to be used on sublanguage have been developed. This thesis reports the results of parsing by the prototypes, a Marcus-type parser and an LR-type parser which have a similar grammatical and linguistic range to the original Marcus parser. The Marcus-type parser uses a grammar of production rules, whereas the LR-type parser employs a Definite Clause Grammar(DGC).

Relevância:

10.00% 10.00%

Publicador:

Resumo:

Unified Modeling Language (UML) is the most comprehensive and widely accepted object-oriented modeling language due to its multi-paradigm modeling capabilities and easy to use graphical notations, with strong international organizational support and industrial production quality tool support. However, there is a lack of precise definition of the semantics of individual UML notations as well as the relationships among multiple UML models, which often introduces incomplete and inconsistent problems for software designs in UML, especially for complex systems. Furthermore, there is a lack of methodologies to ensure a correct implementation from a given UML design. The purpose of this investigation is to verify and validate software designs in UML, and to provide dependability assurance for the realization of a UML design.^ In my research, an approach is proposed to transform UML diagrams into a semantic domain, which is a formal component-based framework. The framework I proposed consists of components and interactions through message passing, which are modeled by two-layer algebraic high-level nets and transformation rules respectively. In the transformation approach, class diagrams, state machine diagrams and activity diagrams are transformed into component models, and transformation rules are extracted from interaction diagrams. By applying transformation rules to component models, a (sub)system model of one or more scenarios can be constructed. Various techniques such as model checking, Petri net analysis techniques can be adopted to check if UML designs are complete or consistent. A new component called property parser was developed and merged into the tool SAM Parser, which realize (sub)system models automatically. The property parser generates and weaves runtime monitoring code into system implementations automatically for dependability assurance. The framework in the investigation is creative and flexible since it not only can be explored to verify and validate UML designs, but also provides an approach to build models for various scenarios. As a result of my research, several kinds of previous ignored behavioral inconsistencies can be detected.^

Relevância:

10.00% 10.00%

Publicador:

Resumo:

The increasing amount of available semistructured data demands efficient mechanisms to store, process, and search an enormous corpus of data to encourage its global adoption. Current techniques to store semistructured documents either map them to relational databases, or use a combination of flat files and indexes. These two approaches result in a mismatch between the tree-structure of semistructured data and the access characteristics of the underlying storage devices. Furthermore, the inefficiency of XML parsing methods has slowed down the large-scale adoption of XML into actual system implementations. The recent development of lazy parsing techniques is a major step towards improving this situation, but lazy parsers still have significant drawbacks that undermine the massive adoption of XML. ^ Once the processing (storage and parsing) issues for semistructured data have been addressed, another key challenge to leverage semistructured data is to perform effective information discovery on such data. Previous works have addressed this problem in a generic (i.e. domain independent) way, but this process can be improved if knowledge about the specific domain is taken into consideration. ^ This dissertation had two general goals: The first goal was to devise novel techniques to efficiently store and process semistructured documents. This goal had two specific aims: We proposed a method for storing semistructured documents that maps the physical characteristics of the documents to the geometrical layout of hard drives. We developed a Double-Lazy Parser for semistructured documents which introduces lazy behavior in both the pre-parsing and progressive parsing phases of the standard Document Object Model’s parsing mechanism. ^ The second goal was to construct a user-friendly and efficient engine for performing Information Discovery over domain-specific semistructured documents. This goal also had two aims: We presented a framework that exploits the domain-specific knowledge to improve the quality of the information discovery process by incorporating domain ontologies. We also proposed meaningful evaluation metrics to compare the results of search systems over semistructured documents. ^