226 results for lazy parsing
Abstract:
This paper presents two Artificial Immune System (AIS) approaches to pattern recognition, CLONALG and Parallel AIRS2, for automatically classifying the stages of the well drilling operation. Classification is carried out by analyzing selected mud-logging parameters. To validate the performance of the AIS techniques, the results were compared with other classification methods: neural networks, support vector machines, and lazy learning.
Abstract:
Coordenação de Aperfeiçoamento de Pessoal de Nível Superior (CAPES)
Abstract:
Part-of-speech tagging is a basic task required by many natural language processing applications, such as syntactic parsing and machine translation, and by speech processing applications, such as speech synthesis. The task consists of labeling the words of a sentence with their grammatical categories. Although these applications call for taggers with higher precision, state-of-the-art taggers still reach accuracy of only 96 to 97%. This thesis investigates corpus and software resources for developing a tagger for Brazilian Portuguese with accuracy above the state of the art. Centered on a hybrid solution that combines probabilistic tagging with rule-based tagging, the thesis proposal is an exploratory study of the tagging method and of the size, quality, tagset, and genre of the training and test corpora; it also evaluates the disambiguation of new or unknown words in the texts to be tagged. Four corpora were used in the experiments: CETENFolha, Bosque CF 7.4, Mac-Morpho, and Selva Científica. The proposed tagging model started from transformation-based learning (TBL), to which three strategies were added, combined in an architecture that integrates the outputs (tagged texts) of two freely available tools, TreeTagger and -TBL, with the modules added to the model. With the tagger model trained on the Mac-Morpho corpus, of the journalistic genre, accuracy rates of 98.05% were obtained on Mac-Morpho texts and 98.27% on Bosque CF 7.4 texts, both of the journalistic genre. The performance of the proposed hybrid tagger model was also evaluated on texts from the Selva Científica corpus, of the scientific genre.
Needs for adjustments to the tagger and to the corpora were identified and, as a result, accuracy rates of 98.07% on Selva Científica, 98.06% on the Mac-Morpho test set, and 98.30% on Bosque CF 7.4 texts were achieved. These results are significant because the accuracy rates exceed the state of the art, validating the proposed model in the pursuit of a more reliable part-of-speech tagger.
Abstract:
This study examines the thought of José Veríssimo (Brazil) and José Ingenieros (Argentina) on race and education. It is framed as a comparative study of the thought of these two intellectuals. The central question is: how does the thought of José Veríssimo and José Ingenieros articulate the relationship between race and education in Latin America at the end of the nineteenth century and the beginning of the twentieth? The general objective was to analyze, through a comparative study, the thought of José Veríssimo and José Ingenieros on education, highlighting its interactions with the concept of race in nineteenth-century Latin America. The specific objectives are: 1) to describe the historical context of the educational thought of José Veríssimo and José Ingenieros; 2) to identify in these authors' works the relationships between race and education, and to correlate their thought on race and education with the history of Latin American intellectual thought. Methodologically, the study belongs to the fields of Intellectual History and Cultural History. The research corpus consists of two works by each author. For José Veríssimo, the works are As Populações indígenas e mestiças da Amazônia: sua linguagem, suas crenças e seus costumes (1887) and Educação nacional (1906). For José Ingenieros, they are El hombre mediocre (1913) and Las fuerzas morales (posthumous work). The results indicate that the way theories of race arrived in Latin America is fundamental to understanding the authors' thought. Accordingly, a brief reflection was needed on the theoretical discussions that the topic of race raised in nineteenth-century Latin America, since both José Veríssimo and José Ingenieros were born and lived part of their lives in that period. The former was born in the far north of Brazil, in the state of Pará, and lived from 1857 to 1916.
He devoted himself to literary criticism and reflected on education, treating it as a necessary instrument for raising the country's mixed-race population to the condition of civilized. The latter was born in Palermo, Italy, but migrated to Argentina as a child and became an Argentine citizen. He devoted himself to psychiatry, but turned in particular to criminal anthropology. In discussing the mental disorders of individuals in Argentine society, José Ingenieros refers to colonization and to the material conditions of the subjects. For him, at the end of the nineteenth century the inferior races continued to represent an obstacle to Argentina's development. At first sight, the mediocre man of Ingenieros closely resembles the indolent man of Veríssimo. Both states, mediocre and indolent, represented for these intellectuals a backward state no longer present in civilized man. They therefore argue for different objective external conditions so that, in both Argentina and Brazil, the internal changes determined by race, which resulted in the mediocre and indolent man, could be overcome. Among these external conditions, education emerges as a necessary element for overcoming indolence and mediocrity.
Abstract:
The web services (WS) technology provides a comprehensive solution for representing, discovering, and invoking services in a wide variety of environments, including Service Oriented Architectures (SOA) and grid computing systems. At the core of WS technology lie a number of XML-based standards, such as the Simple Object Access Protocol (SOAP), that have successfully ensured WS extensibility, transparency, and interoperability. Nonetheless, there is an increasing demand to enhance WS performance, which is severely impaired by XML's verbosity. SOAP communications produce considerable network traffic, making them unfit for distributed, loosely coupled, and heterogeneous computing environments such as the open Internet. They also introduce higher latency and processing delays than other technologies, like Java RMI and CORBA. WS research has recently focused on SOAP performance enhancement. Many approaches build on the observation that SOAP message exchange usually involves highly similar messages (those created by the same implementation usually have the same structure, and those sent from a server to multiple clients tend to show similarities in structure and content). Similarity evaluation and differential encoding have thus emerged as SOAP performance enhancement techniques. The main idea is to identify the common parts of SOAP messages, so that they are processed only once, avoiding a large amount of overhead. Other approaches investigate nontraditional processor architectures, including micro- and macrolevel parallel processing solutions, so as to further increase the processing rates of SOAP/XML software toolkits. This survey paper provides a concise yet comprehensive review of the research efforts aimed at SOAP performance enhancement. A unified view of the problem is provided, covering almost every phase of SOAP processing, ranging over message parsing, serialization, deserialization, compression, multicasting, security evaluation, and data/instruction-level processing.
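The differential encoding idea mentioned in the abstract above can be illustrated with a minimal Python sketch. It is not taken from the survey: the function names `encode_delta` and `decode_delta`, the character-level diff from the standard `difflib` module, and the two sample messages are all assumptions chosen for illustration. A message is expressed as copy ranges over a shared template plus the literal text that differs.

```python
from difflib import SequenceMatcher

def encode_delta(template: str, message: str) -> list:
    """Describe `message` as copy/insert operations relative to `template`."""
    ops = []
    matcher = SequenceMatcher(None, template, message, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            # Shared part: reference the template instead of resending it.
            ops.append(("copy", i1, i2))
        elif tag in ("replace", "insert"):
            # Differing part: carry the literal text.
            ops.append(("insert", message[j1:j2]))
        # "delete": the template span is simply skipped.
    return ops

def decode_delta(template: str, ops: list) -> str:
    """Rebuild the message from the template and the operations."""
    parts = []
    for op in ops:
        if op[0] == "copy":
            parts.append(template[op[1]:op[2]])
        else:
            parts.append(op[1])
    return "".join(parts)

# Hypothetical, simplified SOAP-like messages that differ in one value.
template = ("<soap:Envelope><soap:Body><getQuote><symbol>IBM</symbol>"
            "</getQuote></soap:Body></soap:Envelope>")
message = ("<soap:Envelope><soap:Body><getQuote><symbol>ACME</symbol>"
           "</getQuote></soap:Body></soap:Envelope>")

ops = encode_delta(template, message)
literal_chars = sum(len(op[1]) for op in ops if op[0] == "insert")
assert decode_delta(template, ops) == message
```

In this sketch only `literal_chars` characters travel as literal text; everything else is a reference into a template that both sides are assumed to hold already. Real systems would diff at the XML-structure level rather than character by character.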
Abstract:
Matita (which means pencil in Italian) is a new interactive theorem prover under development at the University of Bologna. Compared with state-of-the-art proof assistants, Matita presents both traditional and innovative aspects. The underlying calculus of the system, the Calculus of (Co)Inductive Constructions (CIC for short), is well known and is the basis of another mainstream proof assistant, Coq, with which Matita is to some extent compatible. As in several other systems, proof authoring is conducted by the user as a goal-directed proof search, using a script to store textual commands for the system. In the tradition of LCF, the proof language of Matita is procedural and relies on tactics and tacticals to proceed toward proof completion. The interaction paradigm offered to the user is based on the script management technique behind the popularity of the Proof General generic interface for interactive theorem provers: while editing a script, the user can move the execution point forward to deliver commands to the system, or back to retract (or "undo") past commands. Matita has been developed from scratch over the past 8 years by several members of the Helm research group, of which the author of this thesis is one. Matita is now a full-fledged proof assistant with a library of about 1,000 concepts. Several innovative solutions were spun off from this development effort. This thesis is about the design and implementation of some of those solutions, in particular those relevant to user interaction with theorem provers and to which the author of this thesis was a major contributor. Joint work with other members of the research group is pointed out where needed. The main topics discussed in this thesis are briefly summarized below. Disambiguation. Most activities connected with interactive proving require the user to input mathematical formulae.
Because mathematical notation is ambiguous, parsing formulae typeset the way mathematicians write them on paper is a challenging task, one neglected by several theorem provers, which usually prefer to fix an unambiguous input syntax. Exploiting features of the underlying calculus, Matita offers an efficient disambiguation engine that permits typing formulae in familiar mathematical notation. Step-by-step tacticals. Tacticals are higher-order constructs used in proof scripts to combine tactics. With tacticals, scripts can be made shorter, more readable, and more resilient to changes. Unfortunately, they are de facto incompatible with state-of-the-art user interfaces based on script management. Such interfaces do not allow the execution point to be positioned inside complex tacticals, thus introducing a trade-off between the usefulness of structuring scripts and a tedious big-step execution behavior during script replaying. In Matita we break this trade-off with tinycals: an alternative to a subset of LCF tacticals that can be evaluated in a more fine-grained manner. Extensible yet meaningful notation. Proof assistant users often need to create new mathematical notation to ease the use of new concepts. The framework used in Matita for extensible notation both accounts for high-quality bidimensional rendering of formulae (with the expressivity of MathML Presentation) and provides meaningful notation, where presentational fragments are kept synchronized with the semantic representation of terms. With our approach, interoperability with other systems can be achieved at the content level, and direct manipulation of formulae acting on their rendered forms is possible too. Publish/subscribe hints. Automation plays an important role in interactive proving, as users like to delegate tedious proving sub-tasks to decision procedures or external reasoners.
Exploiting the Web-friendliness of Matita, we experimented with a broker and a network of web services (called tutors) that can independently try to complete open sub-goals of a proof currently being authored in Matita. The user receives hints from the tutors on how to complete sub-goals and can interactively or automatically apply them to the current proof. Another innovative aspect of Matita, only marginally touched on by this thesis, is the embedded content-based search engine Whelp, which is exploited to various ends, from automatic theorem proving to avoiding duplicate work for the user. We also discuss the (potential) reusability in other systems of the widgets presented in this thesis and how we envisage the evolution of user interfaces for interactive theorem provers in the Web 2.0 era.
Abstract:
There is increasing interest in wireless sensor networks (WSN) for environmental monitoring systems, because they can be used to improve quality of life and because living conditions are becoming a major concern to people. This paper describes the design and development of a real-time monitoring system based on a ZigBee WSN characterized by low energy consumption, low cost, reduced dimensions, and fast adaptation to the network tree topology. The developed system encompasses an optimized sensing process for environmental parameters, low-rate transmission from sensor nodes to the gateway, packet parsing and data storage in a remote database, and real-time visualization through a web server.
Abstract:
[EN] In this paper, a clothes segmentation method for fashion parsing is described. The method does not rely on a prior pose estimation but on people segmentation. Therefore, novel and classic segmentation techniques have been considered and improved in order to achieve accurate people segmentation. Unlike other methods described in the literature, the output is the bounding box and the predominant color of the different clothes rather than a pixel-level segmentation. The proposal is based on dividing the person area into an initial fixed number of stripes, which are later fused according to similar color distribution. To assess the quality of the proposed method, the experiments are carried out with the Fashionista dataset, which is widely used in the fashion parsing community.
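The stripe-and-fuse step described in the abstract above can be sketched in a few lines of Python. This is an illustrative reading, not the authors' implementation: horizontal stripes, quantized color labels instead of pixel values, histogram intersection as the similarity measure, the 0.5 threshold, and all function names are assumptions. The sketch also assumes the person area has at least as many rows as stripes.

```python
from collections import Counter

def stripe_histograms(rows, n_stripes):
    """Split the person area (rows of color labels) into horizontal stripes."""
    height = len(rows)
    bounds = [k * height // n_stripes for k in range(n_stripes + 1)]
    stripes = []
    for top, bottom in zip(bounds, bounds[1:]):
        hist = Counter(c for row in rows[top:bottom] for c in row)
        stripes.append((top, bottom, hist))
    return stripes

def similarity(h1, h2):
    """Histogram intersection of two normalized color histograms, in [0, 1]."""
    n1, n2 = sum(h1.values()), sum(h2.values())
    return sum(min(h1[c] / n1, h2[c] / n2) for c in h1.keys() | h2.keys())

def fuse_stripes(stripes, threshold=0.5):
    """Merge adjacent stripes whose color distributions are similar.

    Returns (top_row, bottom_row, predominant_color) per fused region.
    """
    merged = [stripes[0]]
    for top, bottom, hist in stripes[1:]:
        m_top, _, m_hist = merged[-1]
        if similarity(m_hist, hist) >= threshold:
            merged[-1] = (m_top, bottom, m_hist + hist)
        else:
            merged.append((top, bottom, hist))
    return [(t, b, h.most_common(1)[0][0]) for t, b, h in merged]

# Toy person area: a red upper half and a blue lower half, 8 rows by 4 columns.
person = [["red"] * 4 for _ in range(4)] + [["blue"] * 4 for _ in range(4)]
regions = fuse_stripes(stripe_histograms(person, n_stripes=4))
# regions == [(0, 4, "red"), (4, 8, "blue")]
```

Each fused region carries a row range and a predominant color, which matches the kind of output the abstract describes (a bounding box and predominant color per garment) without any pixel-level labeling.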
Abstract:
Ontology design and population, core aspects of semantic technologies, have recently become fields of great interest due to the increasing need for domain-specific knowledge bases that can boost the use of the Semantic Web. For building such knowledge resources, state-of-the-art tools for ontology design require a lot of human work. Producing meaningful schemas and populating them with domain-specific data is in fact a very difficult and time-consuming task, even more so if the task consists in modelling knowledge at web scale. The primary aim of this work is to investigate a novel and flexible methodology for automatically learning ontologies from textual data, lightening the human workload required for conceptualizing domain-specific knowledge and populating an extracted schema with real data, and speeding up the whole ontology production process. Here computational linguistics plays a fundamental role, from automatically identifying facts in natural language and extracting frames of relations among recognized entities, to producing linked data with which to extend existing knowledge bases or create new ones. In the state of the art, automatic ontology learning systems are mainly based on plain pipelined linguistic classifiers performing tasks such as Named Entity recognition, Entity resolution, and Taxonomy and Relation extraction [11]. These approaches present some weaknesses, especially in capturing the structures through which the meaning of complex concepts is expressed [24]. Humans, in fact, tend to organize knowledge in well-defined patterns, which include participant entities and meaningful relations linking entities with each other. In the literature, these structures have been called Semantic Frames by Fillmore [20], or more recently Knowledge Patterns [23]. Some NLP studies have recently shown the possibility of performing more accurate deep parsing with the ability to logically understand the structure of discourse [7].
In this work, some of these technologies have been investigated and employed to produce accurate ontology schemas. The long-term goal is to collect large amounts of semantically structured information from the web of crowds, through an automated process, in order to identify and investigate the cognitive patterns used by humans to organize their knowledge.
Abstract:
Violent, corrupt, and lazy, or rather law-abiding, helpful, and friendly? This study deals with the working methods of the Beninese police, the images it creates of itself, and the impressions it leaves on citizens. It provides insights into the structure and working methods of the Beninese police. It also points to the police's competitive relationship with other security forces, such as the gendarmerie, and shows that the police operate in various gray zones: of legality, of statehood, and of formality. Informal strategies, creeping privatization, and corruption secure the functioning of the institution to a certain extent. These weaknesses of the institution, however, have negative effects on the image of the police and on its relationship with citizens. It is not the propagated ideal of a police force but the real interactions with it that dominate citizens' perception of the organization.
Abstract:
Grammars for programming languages are traditionally specified statically. They are hard to compose and reuse due to ambiguities that inevitably arise. PetitParser combines ideas from scannerless parsing, parser combinators, parsing expression grammars and packrat parsers to model grammars and parsers as objects that can be reconfigured dynamically. Through examples and benchmarks we demonstrate that dynamic grammars are not only flexible but highly practical.
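To make the idea of grammars and parsers as reconfigurable objects concrete, here is a minimal Python sketch of parser combinators with PEG-style ordered choice. It does not reproduce PetitParser's API; the class names `Literal`, `Sequence`, and `Choice` are hypothetical, and packrat memoization is left out for brevity.

```python
class Parser:
    """Base class: parse() returns (value, new_pos) or None on failure."""
    def parse(self, text, pos=0):
        raise NotImplementedError

class Literal(Parser):
    """Match a fixed string directly on characters (no separate scanner)."""
    def __init__(self, s):
        self.s = s
    def parse(self, text, pos=0):
        if text.startswith(self.s, pos):
            return self.s, pos + len(self.s)
        return None

class Sequence(Parser):
    """Match each part in order; fail if any part fails."""
    def __init__(self, *parts):
        self.parts = list(parts)
    def parse(self, text, pos=0):
        values = []
        for part in self.parts:
            result = part.parse(text, pos)
            if result is None:
                return None
            value, pos = result
            values.append(value)
        return values, pos

class Choice(Parser):
    """Ordered choice as in PEGs: the first alternative that succeeds wins."""
    def __init__(self, *alternatives):
        self.alternatives = list(alternatives)
    def parse(self, text, pos=0):
        for alternative in self.alternatives:
            result = alternative.parse(text, pos)
            if result is not None:
                return result
        return None

# Build a small grammar as an object graph.
greeting = Choice(Literal("hello"), Literal("hi"))
sentence = Sequence(greeting, Literal(" world"))

before = sentence.parse("hey world")       # None: "hey" is not in the grammar
greeting.alternatives.append(Literal("hey"))  # reconfigure the live grammar
after = sentence.parse("hey world")        # (["hey", " world"], 9)
```

Because `sentence` holds a reference to the `greeting` object rather than a compiled table, extending `greeting` at run time changes what `sentence` accepts without rebuilding anything. That is the sense of "dynamic" this sketch is meant to show.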
Abstract:
The benefits animals derive from living in social groups have produced the evolution of many forms of cooperative behavior. To cooperate, two or more individuals coordinate their actions to accomplish a common goal. One cognitive process that has the potential to influence cooperation is self control. Individuals delaying their impulsive choice for an immediate reward may potentially receive a larger reward later by cooperating with others. In this study, I measured whether brown capuchin monkeys (Cebus apella) were capable of impulse control and whether impulse control was related to cooperation. Impulse control and cooperation were measured using a lazy susan-like apparatus, on which animals could turn a wheel to receive food rewards. The capuchins went through two training phases that taught them how to turn the wheel efficiently to obtain rewards and how to turn the wheel to obtain the larger of two rewards. After training, I tested impulse control by giving the capuchins a choice between a smaller and a larger reward placed at shorter or more distant locations on the wheel. The capuchins demonstrated impulse control in that they tended to inhibit the impulse to select the smaller reward when it was closer and easier to reach and instead selected the larger reward when it was farther away. Cooperation was tested in all possible dyads of seven individuals, a total of 21 dyads, by allowing each dyad 10 trials to work together with effort on the lazy-susan so that each would obtain a reward. Seventeen out of 21 dyads cooperated by simultaneously moving the wheel in the same direction. The correlation between how often a particular dyad cooperated and their average impulse control score was not statistically significant, r(21) = -.125, p = .591. Capuchins demonstrated impulse control and cooperation using this novel apparatus but the two abilities were not related. 
Other factors, such as the unique social relationship between two individuals, may play a more prominent role in the motivation to cooperate than the cognitive capacity to inhibit behavior.
Abstract:
Mr. Kubon's project was inspired by the growing need for an automatic syntactic analyser (parser) of Czech, which could be used in the syntactic processing of large amounts of text. Mr. Kubon notes that such a tool would be very useful, especially in the field of corpus linguistics, where creating a large-scale "tree bank" (a collection of syntactic representations of natural language sentences) is a very important step towards the investigation of the properties of a given language. The work involved in syntactically parsing a whole corpus in order to get a representative set of syntactic structures would be almost inconceivable without the help of some kind of robust (semi)automatic parser. The need for the automatic natural language parser to be robust increases with the size of the linguistic data in the corpus or in any other kind of text which is going to be parsed. Practical experience shows that apart from syntactically correct sentences, there are many sentences which contain a "real" grammatical error. These sentences may be corrected in small-scale texts, but not generally in the whole corpus. In order to be able to complete the overall project, it was necessary to address a number of smaller problems. These were: 1. the adaptation of a suitable formalism able to describe the formal grammar of the system; 2. the definition of the structure of the system's dictionary containing all relevant lexico-syntactic information; 3. the development of a formal grammar able to robustly parse Czech sentences from the test suite; 4. filling the syntactic dictionary with sample data allowing the system to be tested and debugged during its development (about 1000 words); 5. the development of a set of sample sentences containing a reasonable amount of grammatical and ungrammatical phenomena covering some of the most typical syntactic constructions used in Czech. Number 3, building a formal grammar, was the main task of the project.
The grammar is of course far from complete (Mr. Kubon notes that it is debatable whether any formal grammar describing a natural language may ever be complete), but it covers the most frequent syntactic phenomena, allowing for the representation of the syntactic structure of simple clauses and also the structure of certain types of complex sentences. The stress was not so much on building a wide-coverage grammar as on the description and demonstration of a method. This method uses an approach similar to that of grammar-based grammar checking. The problem of reconstructing the "correct" form of the syntactic representation of a sentence is closely related to the problem of localisation and identification of syntactic errors. Without precise knowledge of the nature and location of syntactic errors it is not possible to build a reliable estimate of a "correct" syntactic tree. The incremental way of building the grammar used in this project is also an important methodological issue. Experience from previous projects showed that building a grammar by creating a huge block of metarules is more complicated than the incremental method, which begins with the metarules covering the most common syntactic phenomena and adds less important ones later, especially from the point of view of testing and debugging the grammar. The sample of the syntactic dictionary containing lexico-syntactic information (task 4) now has slightly more than 1000 lexical items representing all classes of words. During the creation of the dictionary it turned out that assigning complete and correct lexico-syntactic information to verbs is a very complicated and time-consuming process which would itself be worth a separate project. The final task undertaken in this project was the development of a method allowing effective testing and debugging of the grammar during its development.
The problem of the consistency of new and modified rules of the formal grammar with the rules already existing is one of the crucial problems of every project aiming at the development of a large-scale formal grammar of a natural language. This method allows for the detection of any discrepancy or inconsistency of the grammar with respect to a test-bed of sentences containing all syntactic phenomena covered by the grammar. This is not only the first robust parser of Czech, but also one of the first robust parsers of a Slavic language. Since Slavic languages display a wide range of common features, it is reasonable to claim that this system may serve as a pattern for similar systems in other languages. To transfer the system into any other language it is only necessary to revise the grammar and to change the data contained in the dictionary (but not necessarily the structure of primary lexico-syntactic information). The formalism and methods used in this project can be used in other Slavic languages without substantial changes.
Abstract:
Brain electric activity is viewed as sequences of momentary maps of potential distribution. Frequency-domain source modeling, estimation of the complexity of the trajectory of the mapped brain field distributions in state space, and microstate parsing were used as analysis tools. Input-presentation as well as task-free (spontaneous thought) data collection paradigms were employed. We found: Alpha EEG field strength is more affected by visualizing mentation than by abstract mentation, whether input-driven or self-generated. There are different neuronal populations and brain locations of the electric generators for different temporal frequencies of the brain field. Different alpha frequencies execute different brain functions, as revealed by canonical correlations with mentation profiles. Different modes of mentation engage the same temporal frequencies at different brain locations. The basic structure of alpha electric fields implies inhomogeneity over time: alpha consists of concatenated global microstates in the sub-second range, characterized by quasi-stable field topographies and rapid transitions between the microstates. In general, brain activity is strongly discontinuous, indicating that parsing into field landscape-defined microstates is appropriate. Different modes of spontaneous and induced mentation are associated with different brain electric microstates; these are proposed as candidates for psychophysiological "atoms of thought".
Abstract:
This paper addresses the problem of service development based on GSM handset signaling. The aim is to achieve this goal without the participation of the users, which requires the use of a passive GSM receiver on the uplink. Since no tool for GSM uplink capturing was available, we developed a new method that can synchronize to multiple mobile devices by simply overhearing traffic between them and the network. Our work includes the implementation of modules for signal recovery, message reconstruction and parsing. The method has been validated against a benchmark solution on GSM downlink and independently evaluated on uplink channels. Initial evaluations show up to 99% success rate in message decoding, which is a very promising result. Moreover, we conducted measurements that reveal insights on the impact of signal power on the capturing performance and investigate possible reactive measures.