36 results for Computational linguistics

in Helda - Digital Repository of University of Helsinki


Relevance:

60.00%

Abstract:

We have presented an overview of the FSIG approach and related FSIG grammars to issues of very low complexity and parsing strategy. We ended up with serious optimism according to which most FSIG grammars could be decomposed in a reasonable way and then processed efficiently.

Relevance:

60.00%

Abstract:

Language software applications encounter new words, e.g., acronyms, technical terminology, names or compounds of such words. In order to add new words to a lexicon, we need to indicate their inflectional paradigm. We present a new generally applicable method for creating an entry generator, i.e. a paradigm guesser, for finite-state transducer lexicons. As a guesser tends to produce numerous suggestions, it is important that the correct suggestions be among the first few candidates. We prove some formal properties of the method and evaluate it on Finnish, English and Swedish full-scale transducer lexicons. We use the open-source Helsinki Finite-State Technology to create finite-state transducer lexicons from existing lexical resources and automatically derive guessers for unknown words. The method has a recall of 82-87% and a precision of 71-76% for the three test languages. The model needs no external corpus and can therefore serve as a baseline.
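The idea of a paradigm guesser that ranks its suggestions can be illustrated with a minimal sketch. This is not the transducer-based method of the abstract: it is a plain suffix-matching heuristic, and the toy lexicon, the paradigm labels and the function names are all assumptions made for illustration.

```python
from collections import Counter, defaultdict

# Hypothetical toy lexicon: word -> paradigm label (labels are invented).
LEXICON = {
    "talo": "N1", "valo": "N1", "pallo": "N1",
    "kala": "N9", "pala": "N9",
    "nainen": "N38", "hevonen": "N38",
    "ovi": "N7", "kivi": "N7",
    "risti": "N5", "lasi": "N5",
}

def build_suffix_index(lexicon, max_len=4):
    """Count how often each word-final suffix occurs with each paradigm."""
    index = defaultdict(Counter)
    for word, paradigm in lexicon.items():
        for n in range(1, min(max_len, len(word)) + 1):
            index[word[-n:]][paradigm] += 1
    return index

def guess_paradigms(word, index, max_len=4):
    """Rank candidate paradigms for an unknown word.

    Paradigms supported by a longer matching suffix come first; within one
    suffix length, more frequent paradigms come first.
    """
    ranked = []
    for n in range(min(max_len, len(word)), 0, -1):
        counts = index.get(word[-n:])
        if not counts:
            continue
        for paradigm, _ in counts.most_common():
            if paradigm not in ranked:
                ranked.append(paradigm)
    return ranked

INDEX = build_suffix_index(LEXICON)
print(guess_paradigms("sininen", INDEX))
print(guess_paradigms("savi", INDEX))
```

Ordering by suffix length is one simple way to keep the most plausible paradigm among the first few candidates; the ranking used in the paper may differ.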

Relevance:

30.00%

Abstract:

Finite-state methods have been adopted widely in computational morphology and related linguistic applications. To enable efficient development of finite-state based linguistic descriptions, these methods should be a freely available resource for academic language research and the language technology industry. The following needs can be identified: (i) a registry that maps the existing approaches, implementations and descriptions, (ii) managing the incompatibilities of the existing tools, (iii) increasing synergy and complementary functionality of the tools, (iv) persistent availability of the tools used to manipulate the archived descriptions, (v) an archive for free finite-state based tools and linguistic descriptions. Addressing these challenges contributes to building a common research infrastructure for advanced language technology.

Relevance:

20.00%

Abstract:

This dissertation is a theoretical study of finite-state based grammars used in natural language processing. The study is concerned with certain varieties of finite-state intersection grammars (FSIG) whose parsers define regular relations between surface strings and annotated surface strings. The study focuses on the following three aspects of FSIGs:

(i) Computational complexity of grammars under limiting parameters. In the study, the computational complexity in practical natural language processing is approached through performance-motivated parameters on structural complexity. Each parameter splits some grammars in the Chomsky hierarchy into an infinite set of subset approximations. When the approximations are regular, they seem to fall into the logarithmic-time hierarchy and the dot-depth hierarchy of star-free regular languages. This theoretical result is important and possibly relevant to grammar induction.

(ii) Linguistically applicable structural representations. Related to the linguistically applicable representations of syntactic entities, the study contains new bracketing schemes that cope with dependency links, left- and right-branching, crossing dependencies and spurious ambiguity. New grammar representations that resemble the Chomsky-Schützenberger representation of context-free languages are presented in the study, and they include, in particular, representations for mildly context-sensitive non-projective dependency grammars whose performance-motivated approximations are linear-time parseable.

(iii) Compilation and simplification of linguistic constraints. Efficient compilation methods for certain regular operations such as generalized restriction are presented. These include an elegant algorithm that has already been adopted as the approach in a proprietary finite-state tool. In addition to the compilation methods, an approach to on-the-fly simplifications of finite-state representations for parse forests is sketched.

These findings are tightly coupled with each other under the theme of locality. I argue that the findings help us to develop better, linguistically oriented formalisms for finite-state parsing and to develop more efficient parsers for natural language processing.

Keywords: syntactic parsing, finite-state automata, dependency grammar, first-order logic, linguistic performance, star-free regular approximations, mildly context-sensitive grammars
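The core operation behind a finite-state intersection grammar, combining independently stated constraints by intersecting their automata, can be sketched with the standard product construction. This is a minimal illustration, not the dissertation's compilation method; the `DFA` class, the tag alphabet and the two toy constraints are assumptions.

```python
class DFA:
    """Deterministic automaton; a missing transition means rejection."""

    def __init__(self, start, accepting, delta):
        self.start = start
        self.accepting = set(accepting)
        self.delta = delta  # dict: (state, symbol) -> next state

    def accepts(self, symbols):
        state = self.start
        for sym in symbols:
            if (state, sym) not in self.delta:
                return False
            state = self.delta[(state, sym)]
        return state in self.accepting

def intersect(a, b):
    """Product construction restricted to reachable state pairs."""
    start = (a.start, b.start)
    delta, accepting = {}, set()
    stack, seen = [start], {start}
    while stack:
        p, q = stack.pop()
        if p in a.accepting and q in b.accepting:
            accepting.add((p, q))
        for (src, sym), dst_a in a.delta.items():
            if src != p or (q, sym) not in b.delta:
                continue
            nxt = (dst_a, b.delta[(q, sym)])
            delta[((p, q), sym)] = nxt
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return DFA(start, accepting, delta)

# Toy constraint 1 (hypothetical): every DET is immediately followed by N.
det_then_n = DFA(0, {0}, {(0, "DET"): 1, (0, "N"): 0, (0, "V"): 0, (1, "N"): 0})
# Toy constraint 2 (hypothetical): the tag sequence contains exactly one V.
one_verb = DFA(0, {1}, {(0, "DET"): 0, (0, "N"): 0, (0, "V"): 1,
                        (1, "DET"): 1, (1, "N"): 1})

grammar = intersect(det_then_n, one_verb)
print(grammar.accepts(["DET", "N", "V"]))
print(grammar.accepts(["DET", "V", "N"]))
```

A tag sequence is accepted by `grammar` only if it satisfies both constraints, which is why each constraint can be written and tested separately before they are combined.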

Relevance:

20.00%

Abstract:

This thesis discusses how concepts and methodology developed in evolutionary biology can be applied to the explanation and study of language change. The parallel nature of the mechanisms of biological evolution and language change is explored along with the history of the exchange of ideas between these two disciplines. Against this background, computational methods developed in evolutionary biology are considered in terms of their applicability to the study of historical relationships between languages. Different phylogenetic methods are explained in common terminology, avoiding the technical language of statistics. The thesis is on the one hand a synthesis of earlier scientific discussion, and on the other an attempt to map out the problems of earlier approaches and to find new guidelines for the study of language change on their basis. The source material consists primarily of literature about the connections between evolutionary biology and language change, along with research articles describing applications of phylogenetic methods to language change. The thesis starts out by describing the initial development of the disciplines of evolutionary biology and historical linguistics, a process which right from the beginning can be seen to have involved an exchange of ideas concerning the mechanisms of language change and biological evolution. The historical discussion lays the foundation for the treatment of the generalised account of selection developed during recent decades. This account aims to create a theoretical framework capable of explaining both biological evolution and cultural change as selection processes acting on self-replicating entities. This thesis focusses on the capacity of the generalised account of selection to describe language change as a process of this kind. In biology, the mechanisms of evolution are seen to form populations of genetically related organisms through time. 
One of the central questions explored in this thesis is whether selection theory makes it possible to picture languages as forming populations of a similar kind, and what a perspective like this can offer to the understanding of language in general. In historical linguistics, the comparative method and other, complementary methods have traditionally been used to study the development of languages from a common ancestral language. Computational, quantitative methods have not become widely used as part of the central methodology of historical linguistics. After the limited popularity enjoyed by the lexicostatistical method since the 1950s faded, only in recent years have the computational methods of phylogenetic inference used in evolutionary biology also been applied to the study of early language history. In this thesis the possibilities offered by the traditional methodology of historical linguistics and the new phylogenetic methods are compared. The methods are approached through the ways in which they have been applied to the Indo-European languages, the language family most thoroughly investigated using both the traditional and the phylogenetic methods. The problems of these applications, along with the optimal form of the linguistic data used in these methods, are explored in the thesis. The mechanisms of biological evolution are seen in the thesis as parallel in a limited sense to the mechanisms of language change, but sufficiently so that the development of a generalised account of selection is deemed possibly fruitful for understanding language change. These similarities are also seen to support the validity of using phylogenetic methods in the study of language history, although the use of linguistic data and the models of language change employed by these methods are seen to await further development.
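As a rough illustration of the quantitative methods mentioned above, a lexicostatistical comparison can be reduced to counting shared cognate classes over a common word list. The sketch below is an assumption-laden toy: the language names, meanings and cognate-class codes are invented, and real phylogenetic inference uses far richer models.

```python
from itertools import combinations

# Hypothetical cognate codings: for each language, meaning -> cognate class id.
# Two languages share a cognate for a meaning when the class ids match.
CODINGS = {
    "A": {"hand": 1, "water": 1, "stone": 1, "fire": 1},
    "B": {"hand": 1, "water": 1, "stone": 2, "fire": 1},
    "C": {"hand": 2, "water": 1, "stone": 3, "fire": 2},
}

def distance(x, y):
    """One minus the fraction of shared meanings coded as cognate."""
    shared = [meaning for meaning in x if meaning in y]
    if not shared:
        raise ValueError("no meanings in common")
    same = sum(1 for meaning in shared if x[meaning] == y[meaning])
    return 1 - same / len(shared)

def distance_matrix(codings):
    """Pairwise distances keyed by (language, language) tuples."""
    return {
        (a, b): distance(codings[a], codings[b])
        for a, b in combinations(sorted(codings), 2)
    }

def closest_pair(codings):
    """The pair of languages with the smallest distance."""
    matrix = distance_matrix(codings)
    return min(matrix, key=matrix.get)

print(distance_matrix(CODINGS))
print(closest_pair(CODINGS))
```

A distance matrix of this kind is the usual input to distance-based tree-building methods, while character-based phylogenetic methods work on the cognate codings directly.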

Relevance:

20.00%

Abstract:

This thesis combines a computational analysis of a comprehensive corpus of Finnish lake names with a theoretical background in cognitive linguistics. The combination results on the one hand in a description of the toponymic system and the processes involved in analogy-based naming, and on the other hand in some adjustments to Construction Grammar. Finnish lake names are suitable for this kind of study, as they are to a large extent semantically transparent even when relatively old. There is also a large number of them, and they are comprehensively collected in a computer database. The current work starts with an exploratory computational analysis of co-location patterns between different lake names. Such an analysis makes it possible to assess the importance of analogy and patterns in naming. Prior research has suggested that analogy plays an important role, often also in cases where there are other motivations for the name, and the current study confirms this. However, it also appears that naming patterns are very fuzzy and that their nature is somewhat hard to define in an essentially structuralist tradition. In describing toponymic structure and the processes involved in naming, cognitive linguistics presents itself as a promising theoretical basis. The descriptive formalism of Construction Grammar seems especially well suited for the task. However, now productivity becomes a problem: it is not nearly as clear-cut as the latter theory often assumes, and this is even more apparent in names than in more traditional linguistic material. The varying degree of productivity is most naturally described by a prototype-based theory. Such an approach, however, requires some adjustments to Construction Grammar. 
Based on all this, the thesis proposes a descriptive model where a new name -- or more generally, a new linguistic expression -- can be formed by conceptual integration from either a single prior example or a construction generalised from a number of different prior ones. The new model accounts nicely for various aspects of naming that are problematic for the traditional description based on analogy and patterns.

Relevance:

20.00%

Abstract:

Atherosclerosis is a disease of the arteries; its characteristic features include chronic inflammation, extra- and intracellular lipid accumulation, extracellular matrix remodeling, and an increase in extracellular matrix volume. The underlying mechanisms in the pathogenesis of advanced atherosclerotic plaques that involve local acidity of the extracellular fluid are still incompletely understood. In this thesis project, my co-workers and I studied the different mechanisms by which local extracellular acidity could promote accumulation of the atherogenic apolipoprotein B-100 (apoB-100)-containing plasma lipoprotein particles in the inner layer of the arterial wall, the intima. We found that lipolysis of atherogenic apoB-100-containing plasma lipoprotein particles (LDL, IDL, and sVLDL) by the secretory phospholipase A2 group V (sPLA2-V) enzyme was increased at acidic pH. Also, the binding of apoB-100-containing plasma lipoprotein particles to human aortic proteoglycans was dramatically enhanced at acidic pH. Additionally, lipolysis by the sPLA2-V enzyme further increased this binding. Using proteoglycan-affinity chromatography, we found that sVLDL lipoprotein particles consist of populations differing in their affinities toward proteoglycans. These populations also contained different amounts of apolipoprotein E (apoE) and apolipoprotein C-III (apoC-III); the amounts of apoC-III and apoE per particle were highest in the population with the lowest affinity toward proteoglycans. Since PLA2-modification of LDL particles has been shown to change their aggregation behavior, we also studied the effect of acidic pH on the monolayer structure covering lipoprotein particles after PLA2-induced hydrolysis. Using molecular dynamics simulations, we found that, in acidity, the monolayer is more tightly packed laterally; moreover, its spontaneous curvature is negative, suggesting that acidity may promote lipoprotein particle fusion. 
In addition to extracellular lipid accumulation, the apoB-100-containing plasma lipoprotein particles can be taken up by inflammatory cells, namely macrophages. Using radiolabeled lipoprotein particles and cell cultures, we showed that sPLA2-V-modification of LDL, IDL, and sVLDL lipoprotein particles, at neutral or acidic pH, increased their uptake by human monocyte-derived macrophages.