200 resultados para parsing
Resumo:
As part of a larger research project in musical structure, a program has been written which "reads" scores encoded in an input language isomorphic to music notation. The program is believed to be the first of its kind. From a small number of parsing rules the program derives complex configurations, each of which is associated with a set of reference points in a numerical representation of a time-continuum. The logical structure of the program is such that all and only the defined classes of events are represented in the output. Because the basis of the program is syntactic (in the sense that parsing operations are performed on formal structures in the input string), many extensions and refinements can be made without excessive difficulty. The program can be applied to any music which can be represented in the input language. At present, however, it constitutes the first stage in the development of a set of analytic tools for the study of so-called atonal music, the revolutionary and little understood music which has exerted a decisive influence upon contemporary practice of the art. The program and the approach to automatic data-structuring may be of interest to linguists and scholars in other fields concerned with basic studies of complex structures produced by human beings.
Resumo:
Clare, A. and King R.D. (2003) Data mining the yeast genome in a lazy functional language. In Practical Aspects of Declarative Languages (PADL'03) (won Best/Most Practical Paper award).
Resumo:
By utilizing structure sharing among its parse trees, a GB parser can increase its efficiency dramatically. Using a GB parser which has as its phrase structure recovery component an implementation of Tomita's algorithm (as described in [Tom86]), we investigate how a GB parser can preserve the structure sharing output by Tomita's algorithm. In this report, we discuss the implications of using Tomita's algorithm in GB parsing, and we give some details of the structuresharing parser currently under construction. We also discuss a method of parallelizing a GB parser, and relate it to the existing literature on parallel GB parsing. Our approach to preserving sharing within a shared-packed forest is applicable not only to GB parsing, but anytime we want to preserve structure sharing in a parse forest in the presence of features.
Resumo:
We describe a GB parser implemented along the lines of those written by Fong [4] and Dorr [2]. The phrase structure recovery component is an implementation of Tomita's generalized LR parsing algorithm (described in [10]), with recursive control flow (similar to Fong's implementation). The major principles implemented are government, binding, bounding, trace theory, case theory, θ-theory, and barriers. The particular version of GB theory we use is that described by Haegeman [5]. The parser is minimal in the sense that it implements the major principles needed in a GB parser, and has fairly good coverage of linguistically interesting portions of the English language.
Resumo:
In this paper we examine a number of admission control and scheduling protocols for high-performance web servers based on a 2-phase policy for serving HTTP requests. The first "registration" phase involves establishing the TCP connection for the HTTP request and parsing/interpreting its arguments, whereas the second "service" phase involves the service/transmission of data in response to the HTTP request. By introducing a delay between these two phases, we show that the performance of a web server could be potentially improved through the adoption of a number of scheduling policies that optimize the utilization of various system components (e.g. memory cache and I/O). In addition, to its premise for improving the performance of a single web server, the delineation between the registration and service phases of an HTTP request may be useful for load balancing purposes on clusters of web servers. We are investigating the use of such a mechanism as part of the Commonwealth testbed being developed at Boston University.
Resumo:
The mechanisms underlying the parsing of a spatial distribution of velocity vectors into two adjacent (spatially segregated) or overlapping (transparent) motion surfaces were examined using random dot kinematograms. Parsing might occur using either of two principles. Surfaces might be defined on the basis of similarity of motion vectors and then sharp perceptual boundaries drawn between different surfaces (continuity-based segmentation). Alternatively, detection of a high gradient of direction or speed separating the motion surfaces might drive the process (discontinuity-based segmentation). To establish which method is used, we examined the effect of blurring the motion direction gradient. In the case of a sharp direction gradient, each dot had one of two directions differing by 135°. With a shallow gradient, most dots had one of two directions but the directions of the remainder spanned the range between one motion-defined surface and the other. In the spatial segregation case the gradient defined a central boundary separating two regions. In the transparent version the dots were randomly positioned. In both cases all dots moved with the same speed and existed for only two frames before being randomly replaced. The ability of observers to parse the motion distribution was measured in terms of their ability to discriminate the direction of one of the two surfaces. Performance was hardly affected by spreading the gradient over at least 25% of the dots (corresponding to a 1° strip in the segregation case). We conclude that detection of sharp velocity gradients is not necessary for distinguishing different motion surfaces.
Resumo:
Recent advances in neuroimaging technologies have allowed ever more detailed studies of the human brain. The combination of neuroimaging techniques with genetics may provide a more sensitive measure of the influence of genetic variants on cognitive function than behavioural measures alone. Here we present a review of functional magnetic resonance imaging (fMRI) studies of genetic links to executive functions, focusing on sustained attention, working memory and response inhibition. In addition to studies in the normal population, we also address findings from three clinical populations: schizophrenia, ADHD and autism spectrum disorders. While the findings in the populations studied do not always converge, they all point to the usefulness of neuroimaging techniques such as fMRI as potential endophenotypes for parsing the genetic aetiology of executive function. (C) 2007 Elsevier B.V. All rights reserved.
Resumo:
We present three natural language marking strategies based on fast and reliable shallow parsing techniques, and on widely available lexical resources: lexical substitution, adjective conjunction swaps, and relativiser switching. We test these techniques on a random sample of the British National Corpus. Individual candidate marks are checked for goodness of structural and semantic fit, using both lexical resources, and the web as a corpus. A representative sample of marks is given to 25 human judges to evaluate for acceptability and preservation of meaning. This establishes a correlation between corpus based felicity measures and perceived quality, and makes qualified predictions. Grammatical acceptability correlates with our automatic measure strongly (Pearson's r = 0.795, p = 0.001), allowing us to account for about two thirds of variability in human judgements. A moderate but statistically insignificant (Pearson's r = 0.422, p = 0.356) correlation is found with judgements of meaning preservation, indicating that the contextual window of five content words used for our automatic measure may need to be extended. © 2007 SPIE-IS&T.
Resumo:
Story understanding involves many perceptual and cognitive subprocesses, from perceiving individual words, to parsing sentences, to understanding the relationships among the story characters. We present an integrated computational model of reading that incorporates these and additional subprocesses, simultaneously discovering their fMRI signatures. Our model predicts the fMRI activity associated with reading arbitrary text passages, well enough to distinguish which of two story segments is being read with 74% accuracy. This approach is the first to simultaneously track diverse reading subprocesses during complex story processing and predict the detailed neural representation of diverse story features, ranging from visual word properties to the mention of different story characters and different actions they perform. We construct brain representation maps that replicate many results from a wide range of classical studies that focus each on one aspect of language processing and offer new insights on which type of information is processed by different areas involved in language processing. Additionally, this approach is promising for studying individual differences: it can be used to create single subject maps that may potentially be used to measure reading comprehension and diagnose reading disorders.
Resumo:
The physiological basis of human cerebral asymmetry for language remains mysterious. We have used simultaneous physiological and anatomical measurements to investigate the issue. Concentrating on neural oscillatory activity in speech-specific frequency bands and exploring interactions between gestural (motor) and auditory-evoked activity, we find, in the absence of language-related processing, that left auditory, somatosensory, articulatory motor, and inferior parietal cortices show specific, lateralized, speech-related physiological properties. With the addition of ecologically valid audiovisual stimulation, activity in auditory cortex synchronizes with left-dominant input from the motor cortex at frequencies corresponding to syllabic, but not phonemic, speech rhythms. Our results support theories of language lateralization that posit a major role for intrinsic, hardwired perceptuomotor processing in syllabic parsing and are compatible both with the evolutionary view that speech arose from a combination of syllable-sized vocalizations and meaningful hand gestures and with developmental observations suggesting phonemic analysis is a developmentally acquired process.
Resumo:
Dynamic logic is an extension of modal logic originally intended for reasoning about computer programs. The method of proving correctness of properties of a computer program using the well-known Hoare Logic can be implemented by utilizing the robustness of dynamic logic. For a very broad range of languages and applications in program veri cation, a theorem prover named KIV (Karlsruhe Interactive Veri er) Theorem Prover has already been developed. But a high degree of automation and its complexity make it di cult to use it for educational purposes. My research work is motivated towards the design and implementation of a similar interactive theorem prover with educational use as its main design criteria. As the key purpose of this system is to serve as an educational tool, it is a self-explanatory system that explains every step of creating a derivation, i.e., proving a theorem. This deductive system is implemented in the platform-independent programming language Java. In addition, a very popular combination of a lexical analyzer generator, JFlex, and the parser generator BYacc/J for parsing formulas and programs has been used.
Resumo:
Dans cette dissertation, nous présentons plusieurs techniques d’apprentissage d’espaces sémantiques pour plusieurs domaines, par exemple des mots et des images, mais aussi à l’intersection de différents domaines. Un espace de représentation est appelé sémantique si des entités jugées similaires par un être humain, ont leur similarité préservée dans cet espace. La première publication présente un enchaînement de méthodes d’apprentissage incluant plusieurs techniques d’apprentissage non supervisé qui nous a permis de remporter la compétition “Unsupervised and Transfer Learning Challenge” en 2011. Le deuxième article présente une manière d’extraire de l’information à partir d’un contexte structuré (177 détecteurs d’objets à différentes positions et échelles). On montrera que l’utilisation de la structure des données combinée à un apprentissage non supervisé permet de réduire la dimensionnalité de 97% tout en améliorant les performances de reconnaissance de scènes de +5% à +11% selon l’ensemble de données. Dans le troisième travail, on s’intéresse à la structure apprise par les réseaux de neurones profonds utilisés dans les deux précédentes publications. Plusieurs hypothèses sont présentées et testées expérimentalement montrant que l’espace appris a de meilleures propriétés de mixage (facilitant l’exploration de différentes classes durant le processus d’échantillonnage). Pour la quatrième publication, on s’intéresse à résoudre un problème d’analyse syntaxique et sémantique avec des réseaux de neurones récurrents appris sur des fenêtres de contexte de mots. Dans notre cinquième travail, nous proposons une façon d’effectuer de la recherche d’image ”augmentée” en apprenant un espace sémantique joint où une recherche d’image contenant un objet retournerait aussi des images des parties de l’objet, par exemple une recherche retournant des images de ”voiture” retournerait aussi des images de ”pare-brises”, ”coffres”, ”roues” en plus des images initiales.
Resumo:
This thesis summarizes the results on the studies on a syntax based approach for translation between Malayalam, one of Dravidian languages and English and also on the development of the major modules in building a prototype machine translation system from Malayalam to English. The development of the system is a pioneering effort in Malayalam language unattempted by previous researchers. The computational models chosen for the system is first of its kind for Malayalam language. An in depth study has been carried out in the design of the computational models and data structures needed for different modules: morphological analyzer , a parser, a syntactic structure transfer module and target language sentence generator required for the prototype system. The generation of list of part of speech tags, chunk tags and the hierarchical dependencies among the chunks required for the translation process also has been done. In the development process, the major goals are: (a) accuracy of translation (b) speed and (c) space. Accuracy-wise, smart tools for handling transfer grammar and translation standards including equivalent words, expressions, phrases and styles in the target language are to be developed. The grammar should be optimized with a view to obtaining a single correct parse and hence a single translated output. Speed-wise, innovative use of corpus analysis, efficient parsing algorithm, design of efficient Data Structure and run-time frequency-based rearrangement of the grammar which substantially reduces the parsing and generation time are required. The space requirement also has to be minimised
Resumo:
This thesis aims at empowering software customers with a tool to build software tests them selves, based on a gradual refinement of natural language scenarios into executable visual test models. The process is divided in five steps: 1. First, a natural language parser is used to extract a graph of grammatical relations from the textual scenario descriptions. 2. The resulting graph is transformed into an informal story pattern by interpreting structurization rules based on Fujaba Story Diagrams. 3. While the informal story pattern can already be used by humans the diagram still lacks technical details, especially type information. To add them, a recommender based framework uses web sites and other resources to generate formalization rules. 4. As a preparation for the code generation the classes derived for formal story patterns are aligned across all story steps, substituting a class diagram. 5. Finally, a headless version of Fujaba is used to generate an executable JUnit test. The graph transformations used in the browser application are specified in a textual domain specific language and visualized as story pattern. Last but not least, only the heavyweight parsing (step 1) and code generation (step 5) are executed on the server side. All graph transformation steps (2, 3 and 4) are executed in the browser by an interpreter written in JavaScript/GWT. This result paves the way for online collaboration between global teams of software customers, IT business analysts and software developers.
Resumo:
Free-word order languages have long posed significant problems for standard parsing algorithms. This thesis presents an implemented parser, based on Government-Binding (GB) theory, for a particular free-word order language, Warlpiri, an aboriginal language of central Australia. The words in a sentence of a free-word order language may swap about relatively freely with little effect on meaning: the permutations of a sentence mean essentially the same thing. It is assumed that this similarity in meaning is directly reflected in the syntax. The parser presented here properly processes free word order because it assigns the same syntactic structure to the permutations of a single sentence. The parser also handles fixed word order, as well as other phenomena. On the view presented here, there is no such thing as a "configurational" or "non-configurational" language. Rather, there is a spectrum of languages that are more or less ordered. The operation of this parsing system is quite different in character from that of more traditional rule-based parsing systems, e.g., context-free parsers. In this system, parsing is carried out via the construction of two different structures, one encoding precedence information and one encoding hierarchical information. This bipartite representation is the key to handling both free- and fixed-order phenomena. This thesis first presents an overview of the portion of Warlpiri that can be parsed. Following this is a description of the linguistic theory on which the parser is based. The chapter after that describes the representations and algorithms of the parser. In conclusion, the parser is compared to related work. The appendix contains a substantial list of test cases ??th grammatical and ungrammatical ??at the parser has actually processed.