226 results for lazy parsing
Abstract:
In combinator parsing, the text of parsers resembles BNF notation. We present the basic method, and a number of extensions. We address the special problems presented by white-space, and parsers with separate lexical and syntactic phases. In particular, a combining form for handling the offside rule is given. Other extensions to the basic method include an "into" combining form with many useful applications, and a simple means by which combinator parsers can produce more informative error messages.
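To illustrate how the text of a combinator parser can resemble BNF, here is a minimal Python sketch. It is not taken from the paper: the "list of successes" representation and the names `symbol`, `alt`, `seq`, `using`, `many` and `parse_all` are assumptions chosen for illustration.

```python
from typing import Any, Callable, List, Tuple

# Assumption: a parser maps the remaining input to a list of (value, rest)
# pairs; an empty list means failure.
Parser = Callable[[str], List[Tuple[Any, str]]]

def satisfy(pred: Callable[[str], bool]) -> Parser:
    """Accept one character for which pred holds."""
    def parse(s: str):
        return [(s[0], s[1:])] if s and pred(s[0]) else []
    return parse

def symbol(ch: str) -> Parser:
    """Accept exactly the character ch."""
    return satisfy(lambda c: c == ch)

def succeed(value: Any) -> Parser:
    """Consume nothing and return value."""
    return lambda s: [(value, s)]

def alt(p: Parser, q: Parser) -> Parser:
    """BNF alternation: p | q."""
    return lambda s: p(s) + q(s)

def seq(p: Parser, q: Parser) -> Parser:
    """BNF sequencing: p followed by q, yielding a pair of results."""
    return lambda s: [((a, b), r2) for a, r1 in p(s) for b, r2 in q(r1)]

def using(p: Parser, f: Callable[[Any], Any]) -> Parser:
    """Apply f to every result value of p."""
    return lambda s: [(f(a), rest) for a, rest in p(s)]

def many(p: Parser) -> Parser:
    """many p ::= p (many p) | empty. p must consume input to terminate."""
    def parse(s: str):
        return alt(using(seq(p, parse), lambda ab: [ab[0]] + ab[1]), succeed([]))(s)
    return parse

digit = satisfy(str.isdigit)

# number ::= digit (many digit)
number = using(seq(digit, many(digit)), lambda ab: int("".join([ab[0]] + ab[1])))

# expr ::= number '+' expr | number
def expr(s: str):
    return alt(
        using(seq(number, seq(symbol("+"), expr)), lambda t: t[0] + t[1][1]),
        number,
    )(s)

def parse_all(p: Parser, s: str) -> List[Any]:
    """Keep only the results that consumed the whole input."""
    return [value for value, rest in p(s) if rest == ""]
```

Each definition mirrors one grammar rule, which is the resemblance to BNF that the abstract describes. Returning every partial parse keeps the sketch short at the cost of efficiency.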
Abstract:
This paper is a tutorial on defining recursive descent parsers in Haskell. In the spirit of one-stop shopping, the paper combines material from three areas into a single source. The three areas are functional parsers, the use of monads to structure functional programs, and the use of special syntax for monadic programs in Haskell. More specifically, the paper shows how to define monadic parsers using do notation in Haskell. The paper is targeted at the level of a good undergraduate student who is familiar with Haskell, and has completed a grammars and parsing course. Some knowledge of functional parsers would be useful, but no experience with monads is assumed.
Abstract:
Humans use their grammatical knowledge in more than one way. On one hand, they use it to understand what others say. On the other hand, they use it to say what they want to convey to others (or to themselves). In either case, they need to assemble the structure of sentences in a systematic fashion, in accordance with the grammar of their language. Despite the fact that the structures that comprehenders and speakers assemble are systematic in an identical fashion (i.e., obey the same grammatical constraints), the two ‘modes’ of assembling sentence structures might or might not be performed by the same cognitive mechanisms. Currently, the field of psycholinguistics implicitly adopts the position that they are supported by different cognitive mechanisms, as is evident from the fact that most psycholinguistic models seek to explain either comprehension or production phenomena. The potential existence of two independent cognitive systems underlying linguistic performance doubles the problem of linking the theory of linguistic knowledge and the theory of linguistic performance, making the integration of linguistics and psycholinguistics harder. This thesis thus aims to unify the structure building system in comprehension, i.e., parser, and the structure building system in production, i.e., generator, into one, so that the linking theory between knowledge and performance can also be unified into one. I will discuss and unify both existing and new data pertaining to how structures are assembled in understanding and speaking, and attempt to show that the unification between parsing and generation is at least a plausible research enterprise. In Chapter 1, I will discuss the previous and current views on how parsing and generation are related to each other. I will outline the challenges for the current view that the parser and the generator are the same cognitive mechanism. This single system view is discussed and evaluated in the rest of the chapters. 
In Chapter 2, I will present new experimental evidence suggesting that the grain size of the pre-compiled structural units (henceforth simply structural units) is rather small, contrary to some models of sentence production. In particular, I will show that the internal structure of the verb phrase in a ditransitive sentence (e.g., The chef is donating the book to the monk) is not specified at the onset of speech, but is specified before the first internal argument (the book) needs to be uttered. I will also show that this timing of structural processes with respect to the verb phrase structure is earlier than the lexical processes of verb internal arguments. These two results in concert show that the size of structure building units in sentence production is rather small, contrary to some models of sentence production, yet structural processes still precede lexical processes. I argue that this view of generation resembles the widely accepted model of parsing that utilizes both top-down and bottom-up structure building procedures. In Chapter 3, I will present new experimental evidence suggesting that the structural representation strongly constrains the subsequent lexical processes. In particular, I will show that conceptually similar lexical items interfere with each other only when they share the same syntactic category in sentence production. A mechanism that I call syntactic gating will be proposed; this mechanism characterizes how the structural and lexical processes interact in generation. I will present two Event Related Potential (ERP) experiments that show that the lexical retrieval in (predictive) comprehension is also constrained by syntactic categories. I will argue that the syntactic gating mechanism is operative both in parsing and generation, and that the interaction between structural and lexical processes in both parsing and generation can be characterized in the same fashion. 
In Chapter 4, I will present a series of experiments examining the timing at which verbs’ lexical representations are planned in sentence production. It will be shown that verbs are planned before the articulation of their internal arguments, regardless of the target language (Japanese or English) and regardless of the sentence type (active object-initial sentence in Japanese, passive sentences in English, and unaccusative sentences in English). I will discuss how this result sheds light on the notion of incrementality in generation. In Chapter 5, I will synthesize the experimental findings presented in this thesis and in previous research to address the challenges to the single system view I outlined in Chapter 1. I will then conclude by presenting a preliminary single system model that can potentially capture both the key sentence comprehension and sentence production data without assuming distinct mechanisms for each.
Abstract:
Existing parsers for textual model representation formats such as XMI and HUTN are unforgiving and fail upon even the smallest inconsistency between the structure and naming of metamodel elements and the contents of serialised models. In this paper, we demonstrate how a fuzzy parsing approach can transparently and automatically resolve a number of these inconsistencies, and how it can eventually turn XML into a human-readable and editable textual model representation format for particular classes of models.
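The abstract does not say how the fuzzy parser resolves mismatched names. As a rough sketch under that caveat, one simple option is approximate string matching of a name found in a serialised model against the names the metamodel declares. The function `resolve_name`, the cutoff value and the example names are all hypothetical.

```python
import difflib
from typing import Iterable, Optional

def resolve_name(found: str, known: Iterable[str], cutoff: float = 0.6) -> Optional[str]:
    """Map a name from a serialised model to the closest metamodel name.

    Returns None when no declared name is similar enough, so the caller
    can still report a genuine error instead of guessing.
    """
    candidates = list(known)
    if found in candidates:
        return found  # exact match, nothing to repair
    # Compare case-insensitively, but return the name as declared.
    lowered = {name.lower(): name for name in candidates}
    matches = difflib.get_close_matches(found.lower(), list(lowered), n=1, cutoff=cutoff)
    return lowered[matches[0]] if matches else None

# Hypothetical metamodel element names.
metamodel_names = ["State", "Transition"]
print(resolve_name("Stat", metamodel_names))
print(resolve_name("transitions", metamodel_names))
print(resolve_name("Widget", metamodel_names))
```

A strict parser would reject the first two inputs outright. The cutoff controls how forgiving the match is, and a value that is too low risks silently binding a name to the wrong element.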
Abstract:
This paper tries to show that the developments in the linguistic sciences are better viewed as stages in a single research program than as different ideological -isms. The first part contains an overview of the structuralists' beliefs about the universality and equivalence of human languages, and their search for syntactic universals. In the second part, we will see that the generative program, in its turn, tries to answer why language is a universal faculty in the human species and addresses questions about its form, its development and its use. Finally, the paper gives a brief glimpse of the tentative answers the program has been giving to each of these issues.
Abstract:
One critical feature of the operational definition of the agrammatic comprehension deficit (ACD) that frequently co-occurs with Broca’s aphasia is above-chance performance on well-formedness judgment tasks for many syntactic constructions, but impaired performance where syntactic binding of traces to their antecedents occurs. However, the methodologies used to establish this aspect of the performance profile of the ACD have been predominantly offline. Offline well-formedness tasks entail extralinguistic processing (e.g. perception, attention, short-term memory, conscious reflection) in varying amounts and the influence of such processes on parsing mechanisms is yet to be fully established. In order to (a) further understand the role of extralinguistic processing on parsing, and (b) gain a more direct insight into the online nature of parsing in Broca’s aphasia, 8 subjects underwent a series of well-formedness judgment investigations using both offline and online test batteries. The sentence types and error types used were motivated by three current theories about the nature of the ACD, namely, the Trace-Based Account (Grodzinsky, 2000), the Mapping Hypothesis (Linebarger et al., 1983) and Capacity proposals (e.g. Frazier & Friederici, 1991). The results from the present investigation speak directly to the three aforementioned theories and also demonstrate the important role that extralinguistic processing plays during offline assessment. The clinical implications of the different outcomes from the offline vs. online tasks are also discussed.
Abstract:
The chapters of this dissertation were organized to first establish an overview of the history of public policy over time and its relationship with culture, covering the concepts of public cultural policy and analyzing models of cultural policy and cultural management in a democracy. Next, cultural policy from the 1980s onward was studied, followed by an analysis of the participation of public institutions in the development process after 1988. In light of the new constitution, an analytical look was taken at its effects on the field of culture from neoliberalism through the second decade of the 21st century, along with MinC's view of contemporary art. Only then were public cultural policies in the state of Espírito Santo researched in greater depth, considering the role of public institutions in the development process, made possible by the application of the Editais (public calls for proposals) and their ramifications, linked to each area of activity of the beneficiary cultural segments, and also focusing on the dimensions of culture and the dilemmas and alternatives of public cultural policies with regard to the excluded. From there, the various artistic segments of the state, their activities and their needs were addressed, using the Editais as a point of reference. The State Culture Plan (Plano Estadual de Cultura) was analyzed in the context of its relationship with the cultural segments, considering its original conception and its current state. The implications of cross-cutting actions between the various government bodies and culture were also analyzed, with a view to equalizing public cultural policies for the state.
Abstract:
Admission controllers are used to prevent overload in systems with dynamically arriving tasks. Typically, these admission controllers are based on sufficient (but not necessary) capacity bounds in order to maintain a low computational complexity. In this paper we show how exact admission control for aperiodic tasks can be efficiently obtained. Our first result is an admission controller for purely aperiodic task sets where the test has the same runtime complexity as utilization-based tests. Our second result is an extension of the previous controller for a baseload of periodic tasks. The runtime complexity of this test is lower than for any known exact admission controller. In addition to presenting our main algorithm and evaluating its performance, we also discuss some general issues concerning admission controllers and their implementation.
Abstract:
Arguably, the most difficult task in text classification is to choose an appropriate set of features that allows machine learning algorithms to provide accurate classification. Most state-of-the-art techniques for this task involve careful feature engineering and a pre-processing stage, which may be too expensive in the emerging context of massive collections of electronic texts. In this paper, we propose efficient methods for text classification based on information-theoretic dissimilarity measures, which are used to define dissimilarity-based representations. These methods dispense with any feature design or engineering, by mapping texts into a feature space using universal dissimilarity measures; in this space, classical classifiers (e.g. nearest neighbor or support vector machines) can then be used. The reported experimental evaluation of the proposed methods, on sentiment polarity analysis and authorship attribution problems, reveals that they approximate, and sometimes even outperform, previous state-of-the-art techniques, despite being much simpler, in the sense that they do not require any text pre-processing or feature engineering.
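The abstract does not name the specific dissimilarity measures used. As one well-known example of an information-theoretic, feature-free measure, the sketch below uses the normalized compression distance (NCD) with `zlib` as the compressor, followed by a nearest-neighbor decision. The choice of NCD, the function names and the toy data are assumptions, not the paper's method.

```python
import zlib
from typing import List, Tuple

def compressed_size(data: bytes) -> int:
    """Length in bytes of the zlib-compressed data (maximum compression level)."""
    return len(zlib.compress(data, 9))

def ncd(x: str, y: str) -> float:
    """Normalized compression distance between two texts.

    NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y)),
    where C is the compressed size. Smaller values mean more shared structure.
    """
    bx, by = x.encode("utf-8"), y.encode("utf-8")
    cx, cy = compressed_size(bx), compressed_size(by)
    cxy = compressed_size(bx + by)
    return (cxy - min(cx, cy)) / max(cx, cy)

def classify(text: str, labelled: List[Tuple[str, str]]) -> str:
    """1-nearest-neighbor: return the label of the closest training text under NCD."""
    _, best_label = min(labelled, key=lambda pair: ncd(text, pair[0]))
    return best_label

# Toy training set with hypothetical labels; no tokenization or feature design.
training = [
    ("ab" * 200, "repetitive"),
    ("The quick brown fox jumps over the lazy dog near 42 zebras!", "varied"),
]
print(classify("ab" * 150, training))
```

The texts go to the compressor as raw bytes, which is the sense in which such methods need no pre-processing. Real corpora would need far longer texts than this toy example for the distances to be meaningful.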
Abstract:
The study of electricity markets operation has been gaining increasing importance in recent years, as a result of the new challenges that the restructuring process produced. Currently, a lot of information concerning electricity markets is available, as market operators provide, after a period of confidentiality, data regarding market proposals and transactions. These data can be used as a source of knowledge to define realistic scenarios, which are essential for understanding and forecasting the behaviour of electricity markets. The development of tools able to extract, transform, store and dynamically update data is of great importance for advancing the understanding of electricity markets and of the behaviour of the involved entities. In this paper an adaptable tool capable of downloading, parsing and storing data from market operators’ websites is presented, assuring constant updating and reliability of the stored data.
Abstract:
The study of Electricity Markets operation has been gaining increasing importance in recent years, as a result of the new challenges that the restructuring produced. Currently, a lot of information concerning Electricity Markets is available, as market operators provide, after a period of confidentiality, data regarding market proposals and transactions. These data can be used as a source of knowledge to define realistic scenarios, essential for understanding and forecasting the behaviour of Electricity Markets. The development of tools able to extract, transform, store and dynamically update data is of great importance for advancing the understanding of Electricity Markets and the behaviour of the involved entities. In this paper we present an adaptable tool capable of downloading, parsing and storing data from market operators’ websites, assuring that the stored data are kept up to date and reliable.
Abstract:
Forensic document analysis is one of the areas of the forensic sciences, responsible for verifying the authenticity of documents. Documents can be of different types, with currency and handwriting being the forensic evidence that most frequently motivates analysis. Bringing new technologies into this analysis process allows a better assessment of such evidence and makes the process faster. This thesis is based on the forensic analysis of two types of documents: euro banknotes and forms filled in by hand. The aim of this work was to develop image processing and analysis techniques for evidence of these types, in order to extract measurements that make it possible to assess their authenticity. The banknote images were acquired by spectral imaging, with four acquisition modalities defined: transmitted visible light, reflected visible light, ultraviolet A and ultraviolet C. For each of these acquisition modalities, two protocols were also defined: front and back. The images of the handwritten documents were acquired by scanning them with the automatic scanner of a multifunction device. For the banknote images, several image processing and analysis algorithms were developed, specific to this type of evidence. These algorithms allow segmentation of the region of interest of the image, segmentation of the sub-regions that contain the security features to be evaluated, and extraction of some characteristics. For the images of the handwritten documents, segmentation algorithms were also developed that obtain all the sub-regions of interest of the forms, so that the various elements can be analyzed. 
For this type of evidence, an analysis algorithm was also developed for the elements corresponding to a handwritten numeric sequence, which yields the images corresponding to the individual characters. The work carried out and the results obtained made it possible to define image acquisition protocols for these types of evidence. The automatic segmentation and analysis algorithms developed throughout this work can be valuable aids in the process of analyzing the authenticity of documents, which until now has been done manually. The results of the studies carried out on the various pieces of evidence are also presented, namely the performance of the various algorithms analyzed, as well as some of the difficulties encountered during the process. A discussion of the adopted methodology and of the results is also presented, along with proposals for continuing this work, namely the extraction of features and the implementation of classifiers capable of assessing the authenticity of documents.
Abstract:
Dissertation submitted in fulfillment of the requirements for the degree of Master in Language Sciences (Ciências da Linguagem)
Abstract:
These are the proceedings for the eighth national conference on XML, its Associated Technologies and its Applications (XATA'2010). The paper selection resulted in 33% of papers accepted as full papers, and 33% of papers accepted as short papers. While these two types of papers were distinguished during the conference and had different talk durations, they all had the same limit of 12 pages. We are happy that the selected papers cover both aspects of the conference: XML technologies, and XML applications. In the first group we can include the articles on parsing and transformation technologies, like “Processing XML: a rewriting system approach”, “Visual Programming of XSLT from examples”, “A Refactoring Model for XML Documents”, “A Performance based Approach for Processing Large XML Files in Multicore Machines”, “XML to paper publishing with manual intervention” and “Parsing XML Documents in Java using Annotations”. XML-core related papers are also available, focusing on XML tool testing in “Test::XML::Generator: Generating XML for Unit Testing” and “XML Archive for Testing: a benchmark for GuessXQ”. XML as the base for application development is also present, being discussed in different areas, like “Web Service for Interactive Products and Orders Configuration”, “XML Description for Automata Manipulations”, “Integration of repositories in Moodle”, “XML, Annotations and Database: a Comparative Study of Metadata Definition Strategies for Frameworks”, “CardioML: Integrating Personal Cardiac Information for Ubiquous Diagnosis and Analysis”, “A Semantic Representation of Users Emotions when Watching Videos” and “Integrating SVG and SMIL in DAISY DTB production to enhance the contents accessibility in the Open Library for Higher Education”. The wide range of subjects makes us believe that, for the time being, XML is here to stay, which underlines the importance of gathering this community to discuss related science and technology. 
Small conferences are going through a difficult period. Authors look for impact and numbers and only submit their work to big conferences sponsored by the right institutions. However, the group of people behind this conference still believes that spaces like this should be preserved and maintained. This 8th gathering marks the beginning of a new cycle. We know who we are and what our identity is, and we will keep working to preserve that. We hope the publication containing the works of this year's edition will attract the same attention and interest as the previous editions and, above all, that this publication helps others in their work. Finally, we would like to thank all authors for their work and interest in the conference, and the scientific committee members for their review work.
Abstract:
Dissertation submitted for the degree of Master in Informatics Engineering (Engenharia Informática)