53 results for Computational-linguistic domain

in Helda - Digital Repository of University of Helsinki


Relevance:

30.00%

Abstract:

We have presented an overview of the FSIG approach and related FSIG grammars to issues of very low complexity and parsing strategy. We ended with considerable optimism: most FSIG grammars could, we believe, be decomposed in a reasonable way and then processed efficiently.

Relevance:

30.00%

Abstract:

In this dissertation I study language complexity from a typological perspective. Since the structuralist era, it has been assumed that local complexity differences in languages are balanced out in cross-linguistic comparisons and that complexity is not affected by the geopolitical or sociocultural aspects of the speech community. However, these assumptions have seldom been studied systematically from a typological point of view. My objective is to define complexity so that it is possible to compare it across languages and to approach its variation with the methods of quantitative typology. My main empirical research questions are: i) does language complexity vary in any systematic way in local domains, and ii) can language complexity be affected by the geographical or social environment? These questions are studied in three articles, whose findings are summarized in the introduction to the dissertation. In order to enable cross-language comparison, I measure complexity as the description length of the regularities in an entity; I separate it from difficulty, focus on local instead of global complexity, and break it up into different types. This approach helps avoid the problems that plagued earlier metrics of language complexity. My approach to grammar is functional-typological in nature, and the theoretical framework is basic linguistic theory. I delimit the empirical research functionally to the marking of core arguments (the basic participants in the sentence). I assess the distributions of complexity in this domain with multifactorial statistical methods and use different sampling strategies, implementing, for instance, the Greenbergian view of universals as diachronic laws of type preference. My data come from large and balanced samples (up to approximately 850 languages), drawn mainly from reference grammars. 
The results suggest that various significant trends occur in the marking of core arguments in regard to complexity and that complexity in this domain correlates with population size. These results provide evidence that linguistic patterns interact among themselves in terms of complexity, that language structure adapts to the social environment, and that there may be cognitive mechanisms that limit complexity locally. My approach to complexity and language universals can therefore be successfully applied to empirical data and may serve as a model for further research in these areas.

Relevance:

30.00%

Abstract:

In an earlier study, we reported on the excitation of large-scale vortices in Cartesian hydrodynamical convection models subject to sufficiently rapid rotation. In that study, the conditions for the onset of the instability were investigated in terms of the Reynolds (Re) and Coriolis (Co) numbers in models located at the stellar North pole. In this study, we extend our investigation to varying domain sizes and increasing stratification, and we place the box at different latitudes. Increasing the box size increases the sizes of the generated structures, so that the principal vortex always fills roughly half of the computational domain. The instability also becomes stronger, in the sense that the temperature anomaly and the change in the radial velocity are enhanced. The model with the smallest box size is found to be stable against the instability, suggesting that a sufficient scale separation between the convective eddies and the scale of the domain is required for the instability to work. The instability can be seen up to a colatitude of 30 degrees, beyond which the flow becomes dominated by other types of mean flows. The instability can also be seen in a model with larger stratification. Unlike in the weakly stratified cases, the temperature anomaly caused by the vortex structures depends on depth.
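The onset criterion above is phrased in terms of Re and Co, but the abstract does not define them. A convention common in Cartesian convection models of this kind, assumed here rather than taken from the study, uses the wavenumber k_f = 2π/d of the energy-carrying eddies in a layer of depth d, with Re = u_rms/(ν k_f) and Co = 2Ω/(u_rms k_f). A minimal Python sketch under that assumption, with hypothetical non-dimensional input values:

```python
import math


def forcing_wavenumber(depth):
    """Assumed eddy scale: k_f = 2*pi/d for a layer of depth d."""
    return 2.0 * math.pi / depth


def reynolds_number(u_rms, nu, depth):
    """Re = u_rms / (nu * k_f) under the assumed convention."""
    return u_rms / (nu * forcing_wavenumber(depth))


def coriolis_number(u_rms, omega, depth):
    """Co = 2*Omega / (u_rms * k_f): rotation rate relative to eddy turnover rate."""
    return 2.0 * omega / (u_rms * forcing_wavenumber(depth))


# Hypothetical, non-dimensional example values (not taken from the study).
re = reynolds_number(u_rms=0.1, nu=1e-3, depth=1.0)
co = coriolis_number(u_rms=0.1, omega=1.0, depth=1.0)
print(f"Re = {re:.2f}, Co = {co:.2f}")
```

At fixed u_rms, Co grows linearly with the rotation rate Ω, so "rapid enough rotation" can be read as Co exceeding some threshold; the threshold itself is what the study investigates and is not encoded here.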

Relevance:

20.00%

Abstract:

This dissertation is a theoretical study of finite-state based grammars used in natural language processing. The study is concerned with certain varieties of finite-state intersection grammars (FSIG) whose parsers define regular relations between surface strings and annotated surface strings. The study focuses on the following three aspects of FSIGs. (i) Computational complexity of grammars under limiting parameters: the computational complexity of practical natural language processing is approached through performance-motivated parameters on structural complexity. Each parameter splits some grammars in the Chomsky hierarchy into an infinite set of subset approximations. When the approximations are regular, they seem to fall into the logarithmic-time hierarchy and the dot-depth hierarchy of star-free regular languages. This theoretical result is important and possibly relevant to grammar induction. (ii) Linguistically applicable structural representations: for the representation of syntactic entities, the study contains new bracketing schemes that cope with dependency links, left- and right-branching, crossing dependencies and spurious ambiguity. New grammar representations that resemble the Chomsky-Schützenberger representation of context-free languages are presented; in particular, they include representations for mildly context-sensitive non-projective dependency grammars whose performance-motivated approximations are parseable in linear time. (iii) Compilation and simplification of linguistic constraints: efficient compilation methods for certain regular operations, such as generalized restriction, are presented. These include an elegant algorithm that has already been adopted as the approach in a proprietary finite-state tool. In addition to the compilation methods, an approach to on-the-fly simplification of finite-state representations of parse forests is sketched.
These findings are tightly coupled with each other under the theme of locality. I argue that the findings help us develop better, linguistically oriented formalisms for finite-state parsing and more efficient parsers for natural language processing. Keywords: syntactic parsing, finite-state automata, dependency grammar, first-order logic, linguistic performance, star-free regular approximations, mildly context-sensitive grammars
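The intersection underlying an FSIG parser can be illustrated with the textbook product construction for deterministic automata: a sentence automaton encodes every possible reading, each grammar rule is a constraint automaton, and the parses are the strings accepted by all of them. The Python sketch below is a minimal illustration under simplifying assumptions (partial DFAs stored as dictionaries, a hypothetical two-word sentence with N/V tag ambiguity, and two hypothetical constraints); it does not reproduce the dissertation's bracketing schemes or compilation methods.

```python
def intersect(a, b):
    """Product construction: state (p, q) accepts iff p and q both accept.

    DFAs are dicts with "start", "accept" (set of states) and "delta"
    (mapping (state, symbol) -> state); a missing transition means rejection.
    Only states reachable from the start pair are built.
    """
    start = (a["start"], b["start"])
    symbols = {s for _, s in a["delta"]} & {s for _, s in b["delta"]}
    delta, accept = {}, set()
    seen, stack = {start}, [start]
    while stack:
        p, q = stack.pop()
        if p in a["accept"] and q in b["accept"]:
            accept.add((p, q))
        for sym in symbols:
            if (p, sym) in a["delta"] and (q, sym) in b["delta"]:
                nxt = (a["delta"][(p, sym)], b["delta"][(q, sym)])
                delta[((p, q), sym)] = nxt
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
    return {"start": start, "accept": accept, "delta": delta}


def accepted_strings(dfa):
    """Enumerate the language of an acyclic DFA (sentence automata are acyclic)."""
    result = []

    def walk(state, prefix):
        if state in dfa["accept"]:
            result.append(prefix)
        for (src, sym), dst in dfa["delta"].items():
            if src == state:
                walk(dst, prefix + (sym,))

    walk(dfa["start"], ())
    return sorted(result)


# Hypothetical sentence of two words, each ambiguous between N and V.
sentence = {"start": 0, "accept": {2},
            "delta": {(0, "N"): 1, (0, "V"): 1, (1, "N"): 2, (1, "V"): 2}}
# Hypothetical constraint 1: no two consecutive V tags (state 1 = last tag was V).
no_vv = {"start": 0, "accept": {0, 1},
         "delta": {(0, "N"): 0, (0, "V"): 1, (1, "N"): 0}}
# Hypothetical constraint 2: at least one V tag (state 1 = a V has been seen).
some_v = {"start": 0, "accept": {1},
          "delta": {(0, "N"): 0, (0, "V"): 1, (1, "N"): 1, (1, "V"): 1}}

parses = accepted_strings(intersect(intersect(sentence, no_vv), some_v))
print(parses)  # [('N', 'V'), ('V', 'N')]
```

In a realistic grammar the constraints are compiled from rules over a much richer annotated alphabet, and intermediate products can grow multiplicatively in size, which is where decomposition and efficient compilation become important.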

Relevance:

20.00%

Abstract:

The methodology of designing normative terminological products has been described in several guides and international standards. However, this methodology is not always applicable to designing translation-oriented terminological products which differ greatly from normative ones in terms of volume, function, and primary target group. This dissertation has three main goals. The first is to revise and enrich the stock of concepts and terms required in the process of designing an LSP dictionary for translators. The second is to detect, classify, and describe the factors which determine the characteristics of an LSP dictionary for translators and affect the process of its compilation. The third goal is to provide recommendations on different aspects of dictionary design. The study is based on an analysis of dictionaries, dictionary reviews, literature on translation-oriented lexicography, material from several dictionary projects, and the results of questionnaires. Thorough analysis of the concept of a dictionary helped us to compile a list of designable characteristics of a dictionary. These characteristics include target group, function, links to other resources, data carrier, list of lemmata, information about the lemmata, composition of other parts of the dictionary, compression of the data, structure of the data, and access structure. The factors which determine the characteristics of a dictionary have been divided into those derived from the needs of the intended users and those reflecting the restrictions of the real world (e.g. characteristics of the data carrier and organizational factors) and attitudes (e.g. traditions and scientific paradigms). The designer of a dictionary is recommended to take the intended users' needs as the starting point and aim at finding the best compromise between the conflicting factors. 
When designing an LSP dictionary, much depends on the level of knowledge of the intended users about the domain in question as well as their general linguistic competence, LSP competence, and lexicographic competence. This dissertation discusses the needs of LSP translators and the role of the dictionary in the process of translation of an LSP text. It also emphasizes the importance of planning lexicographic products and activities, and addresses many practical aspects of dictionary design.

Relevance:

20.00%

Abstract:

This dissertation is a cross-linguistic study of lexical iconicity. The study is based on a genealogically stratified sample of 237 languages. Its aim is to contribute an empirical study to the growing dialogue on different forms of lexical iconicity. The conceptual framework of the present study is based on an analysis of the types and means of lexical iconicity in the sample languages. Archaeological and cultural evidence is used to tie lexical iconicity to its context. Phenomena related to lexical iconicity are studied both cross-linguistically and language-specifically. The cognitive difference between imitation and symbolism is essential. Lexical iconicity is not only about the iconic relationship between form and referent, but also about how certain iconic properties may become conventional means of creating sound symbolism. All the sample languages show some evidence of lexical iconicity, demonstrating that it is a universal feature. Nine comparisons of onomatopoeic verbs and nouns, with samples varying between six and 141 languages, show that typologically highly different languages use similar means for creating words based on sound imitation. Two cross-linguistic comparisons of bird names demonstrate that a vast majority of the Eurasian names for the common cuckoo, and of the worldwide names for crow and raven in the sample of 141 genera, are onomatopoeic.

Relevance:

20.00%

Abstract:

The methodology of extracting information from texts has been widely described in the current literature. However, it has been developed mainly for fields other than terminology science, and the research has been oriented towards English. Therefore, there are no satisfactory language-independent methods for extracting terminological information from texts. The aim of the present study is to lay the foundation for further improvement of methods for extracting terminological information. A further aim is to determine differences in term extraction between subject groups with and without knowledge of the special field in question. The study is based on the theory of terminology and takes a mainly qualitative approach. The research material consists of machine-readable specialized texts in the subject domain of maritime safety, including textbooks, conference papers, research reports and articles from professional journals in Finnish and in Russian. The thesis first deals with two term extraction methods: manual term identification and semi-automatic term extraction, the latter carried out using three commercial computer programs. The results of term extraction were compared, and the recall and precision of the methods were evaluated. The latter part of the study is dedicated to the identification of concept relations. Certain linguistic expressions, which some researchers call knowledge probes, were applied to identify concept relations. The results suggest that special field knowledge is an advantage in manual term identification. However, in the candidate term lists, the variation between subject groups was not as remarkable as the variation between individual subjects. The term extraction software tested here produces candidate term lists that can be useful, but only after some manual work. The work therefore emphasizes the need to develop term extraction software further.
Furthermore, the analyses indicate that a certain number of terms were extracted by all the subjects and by the software; we call these core terms. As a result of the experiment on linguistic expressions that signal concept relations, a set of Finnish and Russian knowledge probes for the field of maritime safety was proposed. The main finding was that it would be useful to combine the use of knowledge probes with semi-automatic term extraction, since knowledge probes usually occur in the vicinity of terms.
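Recall and precision, used above to evaluate the extraction methods, compare a candidate term list with a reference list, and the core terms are simply the intersection of every subject's list with the software's list. A minimal Python sketch, assuming a hand-made reference list exists; the English maritime terms below are hypothetical stand-ins for the study's Finnish and Russian data:

```python
def precision_recall(candidates, reference):
    """Precision = correct candidates / all candidates;
    recall = correct candidates / all reference terms."""
    cand, ref = set(candidates), set(reference)
    hits = len(cand & ref)
    precision = hits / len(cand) if cand else 0.0
    recall = hits / len(ref) if ref else 0.0
    return precision, recall


def core_terms(*term_lists):
    """Terms that occur in every list (all subjects and the software)."""
    sets = [set(terms) for terms in term_lists]
    return set.intersection(*sets) if sets else set()


# Hypothetical data for illustration only.
reference = ["ballast water", "lifeboat", "bulk carrier", "stability"]
software = ["lifeboat", "ballast water", "ship", "stability", "report"]
subject_a = ["lifeboat", "ballast water", "bulk carrier"]
subject_b = ["lifeboat", "stability", "ballast water"]

print(precision_recall(software, reference))               # (0.6, 0.75)
print(sorted(core_terms(subject_a, subject_b, software)))  # ['ballast water', 'lifeboat']
```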

Relevance:

20.00%

Abstract:

My dissertation is a corpus-based study of non-finite constructions in Old English (OE). It revisits the question of Latin influence on OE syntax, offering a new evaluation of syntactic interference between Latin and OE and, more generally, of the contact situation in the OE period, drawing on methods used in studying grammaticalization and language contact. I address three non-finite constructions: the absolute participial construction, the accusative-and-infinitive construction, and the nominative-and-infinitive construction, exemplified respectively in present-day English as follows:
- She looked like a pixie sometimes, her eyes darting here and there, forever watchful (BNC CCM 98);
- My first acquaintance with her was when I heard her sing (BNC CFY 2215);
- Charles the Bald was said to resemble his grandfather physically (BNC HPT 175).
This study compares data from translated texts against the background of original OE writings, establishing dependencies and differences between the two. Although contrastive analysis of source and target texts is one of the major methods employed in the study, translation and translation strategies as such are only my secondary foci. The emphasis is rather on what source/target comparison can tell us about OE non-finite syntax and the typological differences between Latin and OE in this domain, and on whether contact-induced change can originate in translation. As a theoretical framework, I have adopted a functional-typological approach, which rests on the principles of iconicity and event integration and which, to the best of my knowledge, has not been applied systematically to OE non-finite constructions. A further aim of the dissertation is therefore to test this framework and to see how OE fits into the cross-linguistic picture of non-finites.
My research corpus consists of two samples: 1) written OE closely dependent on the Latin originals, based on editions of two gloss texts, five translations, and the Latin originals of these texts, representing four text types: hymns, religious regulations, homily/life narrative, and biblical narrative (180,622 words); and 2) written OE as independent from Latin as possible, based on a selection from the York-Toronto-Helsinki Parsed Corpus of Old English Prose (YCOE) and representing five text types: laws, charters, correspondence, chronicle narrative, and homily/life narrative (274,757 words).

Relevance:

20.00%

Abstract:

This thesis discusses how concepts and methodology developed in evolutionary biology can be applied to explaining and researching language change. The parallels between the mechanisms of biological evolution and language change are explored, along with the history of the exchange of ideas between the two disciplines. Against this background, computational methods developed in evolutionary biology are considered in terms of their applicability to the study of historical relationships between languages. Different phylogenetic methods are explained in plain terms, avoiding the technical language of statistics. The thesis is, on the one hand, a synthesis of earlier scientific discussion and, on the other, an attempt to map out the problems of earlier approaches and to find new guidelines for the study of language change on their basis. The source material consists primarily of literature on the connections between evolutionary biology and language change, along with research articles describing applications of phylogenetic methods to language change. The thesis starts by describing the initial development of the disciplines of evolutionary biology and historical linguistics, a process which from the very beginning involved an exchange of ideas concerning the mechanisms of language change and biological evolution. This historical discussion lays the foundation for treating the generalised account of selection developed during the last few decades. This account aims to provide a theoretical framework capable of explaining both biological evolution and cultural change as selection processes acting on self-replicating entities. The thesis focuses on the capacity of the generalised account of selection to describe language change as a process of this kind. In biology, the mechanisms of evolution are seen to form populations of genetically related organisms through time.
One of the central questions explored in this thesis is whether selection theory makes it possible to picture languages as forming populations of a similar kind, and what such a perspective can offer to the understanding of language in general. In historical linguistics, the comparative method and other, complementary methods have traditionally been used to study the development of languages from a common ancestral language. Computational, quantitative methods have not become widely used as part of the central methodology of historical linguistics. After the fading of the limited popularity enjoyed by the lexicostatistical method from the 1950s onwards, the computational methods of phylogenetic inference used in evolutionary biology have only in recent years also been applied to the study of early language history. In this thesis, the possibilities offered by the traditional methodology of historical linguistics and by the new phylogenetic methods are compared. The methods are approached through the ways in which they have been applied to the Indo-European languages, the language family most thoroughly investigated with both the traditional and the phylogenetic methods. The problems of these applications, along with the optimal form of the linguistic data used in these methods, are explored in the thesis. The thesis regards the mechanisms of biological evolution as parallel to the mechanisms of language change only in a limited sense, yet sufficiently so that developing a generalised account of selection is deemed potentially fruitful for understanding language change. These similarities are also seen to support the validity of using phylogenetic methods in the study of language history, although the linguistic data and the models of language change employed by these methods still await further development.
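The lexicostatistical method mentioned above rests on a simple quantity: the share of meanings in a fixed list (such as a Swadesh list) for which two languages lack cognate words. Cognate-coded meaning lists of this kind are also a common input to computational phylogenetic methods, which model them in far more elaborate ways. A minimal Python sketch with hypothetical cognate-class codes (None marks a missing entry):

```python
from itertools import combinations


def lexicostatistic_distance(a, b):
    """Share of compared meanings whose cognate classes differ; None = no data."""
    compared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if not compared:
        raise ValueError("the two languages share no coded meanings")
    return sum(x != y for x, y in compared) / len(compared)


def distance_matrix(data):
    """Distances for every unordered pair of languages."""
    return {(m, n): lexicostatistic_distance(data[m], data[n])
            for m, n in combinations(sorted(data), 2)}


# Hypothetical cognate-class codes for five meanings in three languages.
data = {
    "A": [1, 1, 2, 1, 3],
    "B": [1, 1, 2, 2, 3],
    "C": [2, 1, 1, 2, None],
}
matrix = distance_matrix(data)
closest = min(matrix, key=matrix.get)
print(matrix)   # {('A', 'B'): 0.2, ('A', 'C'): 0.75, ('B', 'C'): 0.5}
print(closest)  # ('A', 'B')
```

A tree-building step would normally follow; as the abstract notes, both the linguistic data and the models of change used at that stage still await further development.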

Relevance:

20.00%

Abstract:

This dissertation consists of four articles and an introduction. The five parts address the same topic, nonverbal predication in Erzya, from different perspectives. The work belongs at the same time to linguistic typology and to Uralic studies. The findings, based on a large corpus of empirical Erzya data that was collected using several different methods and included recordings of the spoken language, made it possible for the present study to apply, then test and finally discuss previous theories based on cross-linguistic data. Erzya makes use of multiple predication patterns, which range from fully analytic to morphologically very complex. Nonverbal predicate clause types are classified on the basis of propositional acts in clauses denoting class membership, identity, property and location. The predicates of these clauses are nouns, adjectives and locational expressions, respectively. Three predication strategies can be identified in Erzya nonverbal predication: i. the zero-copula construction, ii. the predicative suffix construction and iii. the copula construction. It has been suggested that verbs and nouns cannot be clearly distinguished on morphological grounds when functioning as predicates in Erzya. This study shows that, even though predicativity should not be considered a sufficient tool for defining parts of speech in any language, the Erzya lexical classes of adjective, noun and verb can be distinguished from each other in predicate position as well. The relative frequency and degree of obligatoriness of the predicative suffix construction decrease when moving from left to right on the scale verb > adjective/locative > noun (> identificational statement). The predicative suffix is the main pattern in the present tense over the whole domain of nonverbal predication in Standard Erzya; when it is replaced, the replacement is most likely a zero-copula construction in a nominal predication.
This study exploits the theory of (a)symmetry for the first time in order to describe verbal vs. nonverbal predication. It is shown that the asymmetry of paradigms and constructions differentiates the lexical classes. Asymmetrical structures are motivated by functional level asymmetry. Variation in predication as such adds to the complexity of the grammar. When symmetric structures are employed, the functional complexity of grammar decreases, even though morphological complexity increases. The genre affects the employment of predication strategies in Erzya. There are differences in the relative frequency of the patterns, and some patterns are totally lacking from some of the data. The clearest difference is that the past tense predicative suffix construction occurs relatively frequently in Standard Erzya, while it occurs infrequently in the other data. Also, the predicative suffixes of the present tense are used more regularly in written Standard Erzya than in any other genre. The genre also affects the incidence of the translative in uľ(ń)ems copula constructions. In translations from Russian to Erzya the translative case is employed relatively frequently in comparison to other data. This study reveals differences between the two Mordvinic languages Erzya and Moksha. The predicative suffixes (bound person markers) of the present tense are used more regularly in Moksha in all kinds of nonverbal predicate clauses compared to Erzya. It should further be observed that identificational statements are encoded with a predicative suffix in Moksha, but seldom in Erzya. Erzya clauses are more frequently encoded using zero-constructions, displaying agreement in number only.

Relevance:

20.00%

Abstract:

This thesis combines a computational analysis of a comprehensive corpus of Finnish lake names with a theoretical background in cognitive linguistics. The combination results, on the one hand, in a description of the toponymic system and the processes involved in analogy-based naming and, on the other hand, in some adjustments to Construction Grammar. Finnish lake names are suitable for this kind of study, as they are to a large extent semantically transparent even when relatively old. There are also a large number of them, and they are comprehensively collected in a computer database. The current work starts with an exploratory computational analysis of co-location patterns between different lake names. Such an analysis makes it possible to assess the importance of analogy and patterns in naming. Prior research has suggested that analogy plays an important role, often also in cases where there are other motivations for the name, and the current study confirms this. However, it also appears that naming patterns are very fuzzy and that their nature is somewhat hard to define within an essentially structuralist tradition. For describing toponymic structure and the processes involved in naming, cognitive linguistics presents itself as a promising theoretical basis, and the descriptive formalism of Construction Grammar seems especially well suited to the task. Here, however, productivity becomes a problem: it is not nearly as clear-cut as the latter theory often assumes, and this is even more apparent in names than in more traditional linguistic material. The varying degree of productivity is most naturally described by a prototype-based theory. Such an approach, however, requires some adjustments to Construction Grammar.
Based on all this, the thesis proposes a descriptive model where a new name -- or more generally, a new linguistic expression -- can be formed by conceptual integration from either a single prior example or a construction generalised from a number of different prior ones. The new model accounts nicely for various aspects of naming that are problematic for the traditional description based on analogy and patterns.
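The abstract does not say how co-location was measured. One simple way to explore it, assumed here purely for illustration, is to count how often two different names occur within a fixed distance of each other; a real analysis would then compare these counts with what chance co-location predicts. The lake records and planar coordinates below are hypothetical:

```python
import math
from collections import Counter
from itertools import combinations


def colocation_counts(lakes, radius_km):
    """Count unordered pairs of different names whose lakes lie within radius_km.

    lakes: list of (name, x_km, y_km) on a planar grid (a simplifying assumption).
    """
    counts = Counter()
    for (n1, x1, y1), (n2, x2, y2) in combinations(lakes, 2):
        if n1 != n2 and math.hypot(x1 - x2, y1 - y2) <= radius_km:
            counts[tuple(sorted((n1, n2)))] += 1
    return counts


# Hypothetical lakes: two "big lake / small lake" pairs and one "black / white" pair.
lakes = [
    ("Isojärvi", 0.0, 0.0), ("Pienijärvi", 1.0, 0.0),
    ("Mustajärvi", 10.0, 10.0), ("Valkeajärvi", 10.5, 10.0),
    ("Isojärvi", 20.0, 0.0), ("Pienijärvi", 20.0, 1.5),
]
for pair, n in colocation_counts(lakes, radius_km=2.0).most_common():
    print(pair, n)
```

A pair that co-occurs far more often than its names' overall frequencies would predict is a candidate for analogy-based naming, although, as the thesis stresses, such patterns tend to be fuzzy.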

Relevance:

20.00%

Abstract:

Atherosclerosis is a disease of the arteries; its characteristic features include chronic inflammation, extra- and intracellular lipid accumulation, extracellular matrix remodeling, and an increase in extracellular matrix volume. The mechanisms by which local acidity of the extracellular fluid contributes to the pathogenesis of advanced atherosclerotic plaques are still incompletely understood. In this thesis project, my co-workers and I studied the different mechanisms by which local extracellular acidity could promote accumulation of the atherogenic apolipoprotein B-100 (apoB-100)-containing plasma lipoprotein particles in the inner layer of the arterial wall, the intima. We found that lipolysis of atherogenic apoB-100-containing plasma lipoprotein particles (LDL, IDL, and sVLDL) by the secretory phospholipase A2 group V (sPLA2-V) enzyme was increased at acidic pH. The binding of apoB-100-containing plasma lipoprotein particles to human aortic proteoglycans was also dramatically enhanced at acidic pH, and lipolysis by the sPLA2-V enzyme further increased this binding. Using proteoglycan-affinity chromatography, we found that sVLDL particles consist of subpopulations differing in their affinities toward proteoglycans. These subpopulations also contained different amounts of apolipoprotein E (apoE) and apolipoprotein C-III (apoC-III); the amounts of apoC-III and apoE per particle were highest in the subpopulation with the lowest affinity toward proteoglycans. Since PLA2 modification of LDL particles has been shown to change their aggregation behavior, we also studied the effect of acidic pH on the monolayer covering lipoprotein particles after PLA2-induced hydrolysis. Using molecular dynamics simulations, we found that at acidic pH the monolayer is more tightly packed laterally and its spontaneous curvature is negative, suggesting that acidity may promote lipoprotein particle fusion.
In addition to contributing to extracellular lipid accumulation, apoB-100-containing plasma lipoprotein particles can be taken up by inflammatory cells, namely macrophages. Using radiolabeled lipoprotein particles and cell cultures, we showed that sPLA2-V modification of LDL, IDL, and sVLDL particles, at neutral or acidic pH, increased their uptake by human monocyte-derived macrophages.

Relevance:

20.00%

Abstract:

The molecular-level structure of water-alcohol mixtures is very complicated and has been intensively researched in recent years, using both experimental and computational methods. One way to study intra- and intermolecular bonding in such mixtures is through so-called difference Compton profiles, which provide information about changes in the electron wave functions. In Compton scattering, a photon scatters inelastically from an electron. The Compton profile, which is obtained from the electron wave functions, is directly proportional to the probability that a photon is scattered with a given energy into a given solid angle. In this work, we develop a method to compute Compton profiles numerically for liquid mixtures. To obtain the electronic wave functions needed to calculate the Compton profiles, we need statistical information about atomic coordinates. Acquiring this with ab initio molecular dynamics is beyond our computational capabilities, so we use classical molecular dynamics to model the movement of atoms in the mixture. We discuss the validity of this choice in view of the simulation results. There are some difficulties in using classical molecular dynamics as input for quantum mechanical calculations, but these can possibly be overcome by parameter tuning. The calculations predict clear differences between the Compton profiles of different mixtures. This prediction needs to be tested experimentally to find out whether the approximations made are valid.
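In the impulse approximation, the Compton profile is a projection of the electron momentum density n(p): J(q) = ∬ n(p) dp_x dp_y, which for an isotropic density reduces to J(q) = 2π ∫_{|q|}^∞ p n(p) dp, and a difference profile is ΔJ(q) = J_B(q) - J_A(q) for two systems. The Python sketch below evaluates this one-dimensional integral numerically for the hydrogen-like 1s density, whose exact profile 8Z^5 / (3π (Z^2 + q^2)^3) (atomic units) is known, so the quadrature can be checked. It illustrates the quantity only: the study derives wave functions from molecular-dynamics snapshots instead, and the toy comparison of Z = 1.0 with Z = 1.1 is an assumption made for illustration.

```python
import math


def hydrogenic_1s_density(p, z=1.0):
    """Momentum density of a hydrogen-like 1s orbital (atomic units, one electron)."""
    return 8.0 * z**5 / (math.pi**2 * (z * z + p * p) ** 4)


def isotropic_compton_profile(q, density, p_max=50.0, n=20000):
    """J(q) = 2*pi * integral_{|q|}^{p_max} p * n(p) dp, composite Simpson rule.

    p_max truncates the infinite upper limit; n must be even.
    """
    a = abs(q)
    if a >= p_max:
        return 0.0
    h = (p_max - a) / n
    total = a * density(a) + p_max * density(p_max)
    for i in range(1, n):
        p = a + i * h
        total += (4 if i % 2 else 2) * p * density(p)
    return 2.0 * math.pi * total * h / 3.0


def exact_profile(q, z=1.0):
    """Closed-form hydrogen-like 1s profile, used as a check."""
    return 8.0 * z**5 / (3.0 * math.pi * (z * z + q * q) ** 3)


# Toy difference profile: a slightly contracted orbital (z = 1.1) minus the reference.
for q in (0.0, 0.5, 1.0, 2.0):
    j_ref = isotropic_compton_profile(q, hydrogenic_1s_density)
    j_mod = isotropic_compton_profile(q, lambda p: hydrogenic_1s_density(p, z=1.1))
    print(f"q={q:.1f}  J={j_ref:.5f}  exact={exact_profile(q):.5f}  dJ={j_mod - j_ref:+.5f}")
```

A contracted orbital spreads its momentum density to larger p, so the toy ΔJ is negative near q = 0 and positive in the tails; measured difference profiles of mixtures are read in a similar way, as shifts of electron momentum density.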