960 results for Syntactic Projection
Abstract:
Information available on company websites can help people navigate to the offices of groups and individuals within the company. Automatically retrieving this within-organisation spatial information is a challenging AI problem. This paper introduces a novel unsupervised pattern-based method that extracts within-organisation spatial information by taking advantage of HTML structure patterns, together with a novel Conditional Random Fields (CRF) based method that identifies different categories of within-organisation spatial information. The results show that the proposed method achieves high performance in terms of F-score, indicating that this purely syntactic method, based on web search and an analysis of HTML structure, is well suited to retrieving within-organisation spatial information.
Abstract:
A method for reconstruction of an object f(x), x = (x, y, z), from a limited set of cone-beam projection data has been developed. This method uses a modified form of convolution back-projection and projection onto convex sets (POCS) for handling the limited (or incomplete) data problem. In cone-beam tomography, one needs a complete geometry to completely reconstruct the original three-dimensional object. While complete geometries do exist, they are of little use in practical implementations. The most common trajectory used in practical scanners is circular, which is incomplete. It is, however, possible to recover some of the information of the original signal f(x) based on a priori knowledge of the nature of f(x). If this knowledge can be posed in a convex set framework, then POCS can be utilized. In this report, we utilize this a priori knowledge as convex set constraints to reconstruct f(x) using POCS. While we demonstrate the effectiveness of our algorithm for circular trajectories, it is essentially geometry independent and will be useful in any limited-view cone-beam reconstruction.
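The POCS idea described above can be sketched on a toy problem. This is only an illustrative sketch, not the report's cone-beam algorithm: it assumes a 2x2 "object" whose limited data are row and column sums, replaces convolution back-projection with plain hyperplane projections, and uses non-negativity as the a priori convex constraint. All names (`project_hyperplane`, `project_nonnegative`, `pocs`) and the numbers are hypothetical.

```python
def project_hyperplane(x, a, b):
    """Orthogonal projection of x onto the convex set {y : a . y = b}."""
    norm_sq = sum(ai * ai for ai in a)
    residual = (sum(ai * xi for ai, xi in zip(a, x)) - b) / norm_sq
    return [xi - residual * ai for ai, xi in zip(a, x)]


def project_nonnegative(x):
    """Orthogonal projection of x onto the convex set {y : y >= 0}."""
    return [max(xi, 0.0) for xi in x]


def pocs(rows, data, size, iterations=200):
    """Cycle through the projections onto every constraint set."""
    x = [0.0] * size
    for _ in range(iterations):
        # Data-consistency sets: one hyperplane per measurement.
        for a, b in zip(rows, data):
            x = project_hyperplane(x, a, b)
        # A priori knowledge: the object is non-negative.
        x = project_nonnegative(x)
    return x


# Toy 2x2 object, flattened as [f00, f01, f10, f11] (made-up values).
truth = [1.0, 0.0, 2.0, 3.0]
# Limited "views": two row sums and two column sums.
rows = [[1, 1, 0, 0], [0, 0, 1, 1], [1, 0, 1, 0], [0, 1, 0, 1]]
data = [sum(a * t for a, t in zip(row, truth)) for row in rows]

estimate = pocs(rows, data, size=4)
print(estimate)
```

Because the data are incomplete, the estimate is only guaranteed to be consistent with the measurements and the constraint; it need not equal the true object. Stronger a priori constraints shrink the set of consistent candidates.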
Abstract:
This dissertation studies the language of Latin letters that were written in Egypt and Vindolanda (in northern Britain) during the period from the 1st century BC to the 3rd century AD on papyri, ostraca, and wooden tablets. The majority of the texts are, in one way or another, connected with the Roman army. The focus of the study is on syntax and pragmatics. Besides traditional philological methods, modern syntactic theory is used as well, especially in the pragmatic analysis. The study begins with a critical survey of certain concepts that are current in the research on the Latin language, most importantly the concept of 'vulgar Latin', which, it is argued, seems to be used as an abstract noun for 'variation and change in Latin'. Further, it is necessary to treat even the non-literary material primarily as written texts and not as straightforward reflections of spoken language. An examination of letter phraseology shows that there is considerable variation between the two major geographical areas of provenance. Latin letter writing in Egypt was influenced by Greek. The study highlights the importance of seeing the letters as a text type, with recurring phraseological elements appearing in the body text as well. It is argued that recognising these elements is essential for the correct analysis of the syntax. Three areas of syntax are discussed in detail: sentence connection (mainly parataxis), syntactically incoherent structures and word order (the order of the object and the verb). For certain types of sentence connection we may plausibly posit an origin in spoken Latin, but for many other linguistic phenomena attested in this material the issue of spoken Latin is anything but simple. Concerning the study of historical syntax, the letters offer information about the changing status of the accusative case. 
Incoherent structures may reflect contaminations in spoken language but usually the reason for them is the inability of the writer to put his thoughts into writing, especially when there is something more complicated to be expressed. Many incoherent expressions reflect the need to start the predication with a thematic constituent. Latin word order is seen as resulting from an interaction of syntactic and pragmatic factors. The preference for an order where the topic is placed sentence-initially can be seen in word order more generally as well. Furthermore, there appears a difference between Egypt and Vindolanda. The letters from Vindolanda show the order O(bject) V(erb) clearly more often than the letters from Egypt. Interestingly, this difference correlates with another, namely the use of the anaphoric pronoun is. This is an interesting observation in view of the fact that both of these are traditional Latin features, as opposed to those that foreshadow the Romance development (VO order and use of the anaphoric ille). However, it is difficult to say whether this is an indication of social or regional variation.
Abstract:
This dissertation is a theoretical study of finite-state based grammars used in natural language processing. The study is concerned with certain varieties of finite-state intersection grammars (FSIG) whose parsers define regular relations between surface strings and annotated surface strings. The study focuses on the following three aspects of FSIGs: (i) Computational complexity of grammars under limiting parameters. In the study, the computational complexity in practical natural language processing is approached through performance-motivated parameters on structural complexity. Each parameter splits some grammars in the Chomsky hierarchy into an infinite set of subset approximations. When the approximations are regular, they seem to fall into the logarithmic-time hierarchy and the dot-depth hierarchy of star-free regular languages. This theoretical result is important and possibly relevant to grammar induction. (ii) Linguistically applicable structural representations. Related to the linguistically applicable representations of syntactic entities, the study contains new bracketing schemes that cope with dependency links, left- and right-branching, crossing dependencies and spurious ambiguity. New grammar representations that resemble the Chomsky-Schützenberger representation of context-free languages are presented in the study, and they include, in particular, representations for mildly context-sensitive non-projective dependency grammars whose performance-motivated approximations are linear-time parseable. (iii) Compilation and simplification of linguistic constraints. Efficient compilation methods for certain regular operations such as generalized restriction are presented. These include an elegant algorithm that has already been adopted as the approach in a proprietary finite-state tool. In addition to the compilation methods, an approach to on-the-fly simplifications of finite-state representations for parse forests is sketched. 
These findings are tightly coupled with each other under the theme of locality. I argue that the findings help us to develop better, linguistically oriented formalisms for finite-state parsing and to develop more efficient parsers for natural language processing. Keywords: syntactic parsing, finite-state automata, dependency grammar, first-order logic, linguistic performance, star-free regular approximations, mildly context-sensitive grammars
Abstract:
Relative Constructions with Pronominal Heads in Contemporary Russian Chapter 1 introduces the distinctive syntactic and semantic properties of Russian relative constructions (RCs), which are then divided into two main classes according to the type of the head phrase. The study concentrates on RCs with pronominal heads, which are systematically compared with noun-headed RCs. Chapter 2 clarifies the categorization of pronouns in Russian. The conclusion is that Russian pronouns include only personal, reflexive and wh-pronouns. The remaining words that are traditionally seen as pronouns are actually functional equivalents of determiners. This idea leads to the suggestion that RCs with these determiner-like words as the only constituent of the head phrase are actually headed by zero pronouns. In the other type of RCs with pronominal heads, the head position is occupied by wh-pronouns with clitics expressing different types of indefiniteness and quantification. Comparison of the two types of pronoun-headed RCs shows that the wh-heads and zero-heads share a number of common properties with respect to the grammatical gender, number and person as well as to the semantic distinction between animates and inanimates. The rest of Chapter 2 gives an overview of various uses of wh-pronouns in Russian and an experimental analysis of RCs headed by pronominal adverbs. Chapter 3 discusses fundamental differences between RCs with noun and pronominal heads. One of the main findings is that the choice of the relative pronoun (kto 'who' and chto 'what' versus kotoryj 'which') is motivated by a tendency to reproduce maximally the essential grammatical and semantic properties of the antecedent. Chapter 4 gives a detailed description of the determiner-like words and wh-based heads used in the two types of RCs with pronominal heads. In addition, several issues related to the syntax and semantics of free relatives are discussed. 
The conclusion is that there is no need to establish a separate category of free relatives in Russian. Chapter 5 discusses the syntax and semantics of correlative and free concessive constructions. They share a number of properties with pronoun-headed RCs and the two are often confused in Russian linguistics. However, a detailed analysis shows that these constructions must be distinguished from RCs. The study combines the methods of functionally-oriented Russian structuralism with some insights from generative syntax.
Abstract:
Coordination and juxtaposed sentences. The object of this study is the examination of the relations between juxtaposed clauses in contemporary French. The sentences in question are composed of several clauses adjoined without a conjunction or other connector, as in: Je détournai les yeux, mon cœur se mit à battre. The aim of the study is to determine the nature of the relation in these sentences and the part that coordination plays in them. It further asks what this relation of coordination is, which, according to some grammars, manifests itself through a coordinating conjunction but which, according to others, is marked in juxtaposed sentences through different features. The study is based on a corpus of written French from literary and journalistic text sources. Syntactic, semantic and textual properties of the clauses are discussed. The analysis focuses on differences: in each case, it has been noted whether one of the clauses is affirmative and the other negative, and whether the subject is not repeated in the second clause. An analysis has also been made on the basis of tense, mood, phrase structure type, and thematic structure, taking into account, in each case, whether the clauses are identical or different. Punctuation has been one of the properties considered. The final aim has been to eliminate subordinate sentences gradually, based on the partition of properties, so that only the hard core of coordinate sentences remains. In this way, coordination could be defined in the same way as the phoneme is defined, as a group of distinctive features. The quantitative analyses have led to the conclusion that the sentences which, from a semantic point of view, are interpreted as coordinating contain the fewest of these differences, while the sentences which can be considered as subordinating present the most. 
The conditions of coordination are, in that sense, hierarchical, so that the syntactic constraints have to make room for semantic, textual and cognitive factors. It is interesting to notice that everyone has the ability to produce correct coordinating structures and recognize incorrect ones. This can be explained by the human ability to categorize, which has been widely researched in prototype semantics. The study suggests that coordination and subordination could be considered as prototypical cognitive categories based on different linguistic and pragmatic features.
Abstract:
Information structure and Kabyle constructions: three sentence types in the Construction Grammar framework. The study examines three Kabyle sentence types and their variants. These sentence types have been chosen because they code the same state of affairs but have different syntactic structures. The sentence types are Dislocated sentence, Cleft sentence, and Canonical sentence. I argue first that a proper description of these sentence types should include information structure and, second, that a description which takes into account information structure is possible in the Construction Grammar framework. The study thus constitutes a testing ground for Construction Grammar for its applicability to a less known language. It constitutes a testing ground notably because the differentiation between the three types of sentences cannot be done without information structure categories and, consequently, these categories must be integrated also in the grammatical description. The information structure analysis is based on the model outlined by Knud Lambrecht. In that model, information structure is considered as a component of sentence grammar that assures the pragmatically correct sentence forms. The work starts with an examination of the three sentence types and the analyses that have been done in André Martinet's functional grammar framework. This introduces the sentence types chosen as the object of study and discusses the difficulties related to their analysis. After a presentation of the state of the art, including earlier and more recent models, the principles and notions of Construction Grammar and of Lambrecht's model are introduced and explicated. The information structure analysis is presented in three chapters, each treating one of the three sentence types. The analyses are based on spoken language data and elicitation. Prosody is included in the study when a syntactic structure seems to code two different focus structures. 
In such cases, it is pertinent to investigate whether these are coded by prosody. The final chapter presents the constructions that have been established and the problems encountered in analysing them. It also discusses the impact of the study on the theories used and on the theory of syntax in general.
Abstract:
The book consists of an Introduction and four articles published both in Finland and abroad, written in English or Russian. They present the studies of eight Finnish and Russian idiomatic constructions that appear in the following examples: Ikkuna rikki — Окно сломано, lit.: ‘the window broken’, Äiti täällä — Мама здесь, lit.: ‘mother here’, Kaikki myymälöihin! — Все в магазин, lit.: ‘all to the shops’, Пить так пить! ≈ ‘When I drink, I drink (a lot)!’, etc. The aim of the studies is to reconstruct the origins and to trace the development of the above-mentioned constructions up to their modern usages. To this end, the constructions are investigated both from historical and from comparative perspectives. Finally, the case studies provide a possibility to develop more general bases of development of these 'ungrammatical' items. By attempting to answer the question why such constructions develop even though they destroy the harmonious structure of а language, some principles of idiomatization are postulated in the Conclusion.
Abstract:
Expressing generalized-personal meaning in Russian. Based on data from Russian, this doctoral dissertation examines generalized-personal meaning, that is, generic expressions referring to all human beings, people in general, each or any person (e.g. S vozrastom načinaeš' cenit' prostye vešči 'With age you start to appreciate simple things'). The study shares its basic theoretical orientation with functional approaches going 'from meaning to form'. The objective of the thesis is to determine and describe the various linguistic means which can be used by the speaker to express generalized-personal meaning. The main material of the study consists of 2,000 examples collected from modern Russian literature, newspapers, and magazines. The linguistic means of expressing generalized-personal meaning are divided into three main classes. Morphological and lexico-grammatical means (22% of the material) include the use of personal pronouns and personal verbal endings. In Russian, all personal forms except the 3rd person singular can be used in a generalized-personal meaning. Lexical means (14% of the material) involve, above all, pronouns like vse 'all', každyj 'everyone', nikto 'no one', as well as the nouns čelovek 'man' and ljudi 'people'. In emotional speech, generalized-personal meaning can also be conveyed lexically by using utterances like daže idiot znaet 'even an idiot knows'. In rhetorical questions the pronoun kto 'who' can appear in this meaning (cf. Kto ne ljubit moroženoe?! 'Who doesn't like ice cream?!'). The third main class, syntactic means (64% of the material), consists of constructions in which the generic person is not expressed at the surface level. This class mainly includes two-component structures in which the infinitive relates to a modal predicative adverb (e.g. možno 'can, be allowed to', nado 'must'), modal verb (e.g. stoit 'be worth(while)', sleduet 'must, be obliged to'), or predicative adverb ending in -o (e.g. trudno 'it is hard to', neprilično 'is not appropriate'). 
Other syntactic means are: one-component infinitive structures, so-called embedded structures, structures with a processual noun, passive constructions, and gerund constructions. The different forms of expression available in Russian are not interchangeable in all contexts. Even if a given context tolerates the substitution of one construction for another, the two expressions are never entirely synonymous. In addition to determining the range of forms which can express generalized-personal meaning, the study aims to compare these forms and to specify the conditions and possible restrictions (contextual, semantic, syntactic, stylistic, etc.) associated with the use of each construction. In Russian linguistics, the generalized-personal meaning has not been extensively studied from a functional perspective. The advantage of a meaning-based functional approach is that it gives a comprehensive picture of the diversity and distribution of the phenomenon.
Abstract:
We have carried out an analysis of crystal structure data on prolyl and hydroxyprolyl moieties in small molecules. The flexibility of the pyrrolidine ring due to the pyramidal character of nitrogen has been defined in terms of two projection angles δ1 and δ2. The distribution of these parameters in the crystal structures is found to be consistent with results of the energy calculations carried out on prolyl moieties in our laboratory.
Abstract:
In this dissertation, I present an overall methodological framework for studying linguistic alternations, focusing specifically on lexical variation in denoting a single meaning, that is, synonymy. As the practical example, I employ the synonymous set of the four most common Finnish verbs denoting THINK, namely ajatella, miettiä, pohtia and harkita ‘think, reflect, ponder, consider’. As a continuation to previous work, I describe in considerable detail the extension of statistical methods from dichotomous linguistic settings (e.g., Gries 2003; Bresnan et al. 2007) to polytomous ones, that is, concerning more than two possible alternative outcomes. The applied statistical methods are arranged into a succession of stages with increasing complexity, proceeding from univariate via bivariate to multivariate techniques in the end. As the central multivariate method, I argue for the use of polytomous logistic regression and demonstrate its practical implementation to the studied phenomenon, thus extending the work by Bresnan et al. (2007), who applied simple (binary) logistic regression to a dichotomous structural alternation in English. The results of the various statistical analyses confirm that a wide range of contextual features across different categories are indeed associated with the use and selection of the selected think lexemes; however, a substantial part of these features are not exemplified in current Finnish lexicographical descriptions. The multivariate analysis results indicate that the semantic classifications of syntactic argument types are on the average the most distinctive feature category, followed by overall semantic characterizations of the verb chains, and then syntactic argument types alone, with morphological features pertaining to the verb chain and extra-linguistic features relegated to the last position. 
In terms of overall performance of the multivariate analysis and modeling, the prediction accuracy seems to reach a ceiling at a Recall rate of roughly two-thirds of the sentences in the research corpus. The analysis of these results suggests a limit to what can be explained and determined within the immediate sentential context and applying the conventional descriptive and analytical apparatus based on currently available linguistic theories and models. The results also support Bresnan’s (2007) and others’ (e.g., Bod et al. 2003) probabilistic view of the relationship between linguistic usage and the underlying linguistic system, in which only a minority of linguistic choices are categorical, given the known context – represented as a feature cluster – that can be analytically grasped and identified. Instead, most contexts exhibit degrees of variation as to their outcomes, resulting in proportionate choices over longer stretches of usage in texts or speech.
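To make the step from binary to polytomous logistic regression concrete, the following is a minimal sketch of a multinomial (softmax) model fitted by stochastic gradient descent. It is not the dissertation's model: the five binary "contextual features" and the eight training rows are invented purely for illustration, the fitting procedure is an assumption, and the function names `softmax`, `predict_proba`, and `fit` are hypothetical. Only the four verb labels come from the abstract.

```python
import math


def softmax(scores):
    """Turn one score per outcome into probabilities that sum to 1."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def predict_proba(weights, classes, x):
    """Probability of each outcome class for feature vector x."""
    xb = x + [1.0]  # append a constant bias input
    return softmax([sum(w * v for w, v in zip(weights[c], xb)) for c in classes])


def fit(features, labels, classes, epochs=300, rate=0.5):
    """Fit one weight vector per outcome by stochastic gradient descent."""
    weights = {c: [0.0] * (len(features[0]) + 1) for c in classes}
    for _ in range(epochs):
        for x, y in zip(features, labels):
            probs = predict_proba(weights, classes, x)
            xb = x + [1.0]
            for c, p in zip(classes, probs):
                error = p - (1.0 if c == y else 0.0)
                for j, xj in enumerate(xb):
                    weights[c][j] -= rate * error * xj
    return weights


# Four outcomes instead of two: the THINK verbs named in the abstract.
classes = ["ajatella", "miettiä", "pohtia", "harkita"]
# Invented data: features 0-3 each favour one verb, feature 4 is uninformative.
features = [
    [1, 0, 0, 0, 1], [1, 0, 0, 0, 0],
    [0, 1, 0, 0, 1], [0, 1, 0, 0, 0],
    [0, 0, 1, 0, 1], [0, 0, 1, 0, 0],
    [0, 0, 0, 1, 1], [0, 0, 0, 1, 0],
]
labels = ["ajatella", "ajatella", "miettiä", "miettiä",
          "pohtia", "pohtia", "harkita", "harkita"]

weights = fit(features, labels, classes)
predictions = []
for x in features:
    probs = predict_proba(weights, classes, x)
    predictions.append(classes[probs.index(max(probs))])
print(predictions)
```

The model returns a probability for every outcome rather than a single categorical choice, which is in line with the probabilistic view of usage discussed above. On real corpus data the contexts would rarely separate the outcomes as cleanly as in this toy set.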
Abstract:
The work is based on the assumption that words with similar syntactic usage have similar meaning, which was proposed by Zellig S. Harris (1954, 1968). We study his assumption from two aspects: firstly, different meanings (word senses) of a word should manifest themselves in different usages (contexts), and secondly, similar usages (contexts) should lead to similar meanings (word senses). If we start with the different meanings of a word, we should be able to find distinct contexts for the meanings in text corpora. We separate the meanings by grouping and labeling contexts in an unsupervised or weakly supervised manner (Publication 1, 2 and 3). We are confronted with the question of how best to represent contexts in order to induce effective classifiers of contexts, because differences in context are the only means we have to separate word senses. If we start with words in similar contexts, we should be able to discover similarities in meaning. We can do this monolingually or multilingually. In the monolingual material, we find synonyms and other related words in an unsupervised way (Publication 4). In the multilingual material, we find translations by supervised learning of transliterations (Publication 5). In both the monolingual and multilingual case, we first discover words with similar contexts, i.e., synonym or translation lists. In the monolingual case we also aim at finding structure in the lists by discovering groups of similar words, e.g., synonym sets. In this introduction to the publications of the thesis, we consider the larger background issues of how meaning arises, how it is quantized into word senses, and how it is modeled. We also consider how to define, collect and represent contexts. We discuss how to evaluate the trained context classifiers and discovered word sense classifications, and finally we present the word sense discovery and disambiguation methods of the publications. 
This work supports Harris' hypothesis by implementing three new methods modeled on his hypothesis. The methods have practical consequences for creating thesauruses and translation dictionaries, e.g., for information retrieval and machine translation purposes. Keywords: Word senses, Context, Evaluation, Word sense disambiguation, Word sense discovery.
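The underlying idea, that words occurring in similar contexts tend to be similar in meaning, can be illustrated with a small sketch. It is not one of the thesis methods: it assumes a made-up five-sentence corpus, treats every other word in the same sentence as context, and compares words by cosine similarity of raw co-occurrence counts. The function names `context_vectors` and `cosine` are hypothetical.

```python
import math
from collections import Counter, defaultdict


def context_vectors(sentences):
    """Count, for each word, the other words sharing a sentence with it."""
    vectors = defaultdict(Counter)
    for sentence in sentences:
        tokens = sentence.split()
        for i, word in enumerate(tokens):
            for j, other in enumerate(tokens):
                if i != j:
                    vectors[word][other] += 1
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * v[word] for word, count in u.items())
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


# Invented toy corpus; a real study would use large text corpora.
corpus = [
    "the cat drinks milk",
    "the dog drinks water",
    "the cat chases mice",
    "the dog chases balls",
    "the car needs fuel",
]

vectors = context_vectors(corpus)
sim_cat_dog = cosine(vectors["cat"], vectors["dog"])
sim_cat_car = cosine(vectors["cat"], vectors["car"])
print(sim_cat_dog, sim_cat_car)
```

In this toy corpus "cat" and "dog" share more contexts than "cat" and "car", so their similarity is higher. How contexts are defined and weighted is exactly the representation question raised above, and raw sentence co-occurrence is only one simple choice.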
Abstract:
This dissertation provides a synchronic grammatical description of Mauwake, a Papuan (Trans-New Guinea) language of about 2000 speakers on the North Coast of the Madang Province in Papua New Guinea. The theoretical background is that of Basic Linguistic Theory (BLT), used extensively in analysing and writing descriptive grammars. The chapters from morphology to clause level are described from form to function; in the later chapters the function is taken more often as the starting point. Any theory-specific terminology is kept to the minimum and formalisms have been avoided in accordance with BLT principles. Mauwake has a classic 5-vowel system and 14 consonant phonemes. With its simple phonology it is a typical representative of the Madang North Coast languages. For a Papuan language there are relatively few morphophonological alternations. Nouns are either alienably or inalienably possessed. There is no obligatory number marking in nouns or noun phrases. Pronouns have several different forms: five for case and three for other functions. The dative pronouns are treated as [+human] locatives, and they have also grammaticalised as possessives. The verbal morphology is agglutinative and mainly suffixal. Unusual features include two distributive suffixes, and the interaction of the derivational benefactive and the inflectional beneficiary suffixes. The applicative suffix has either transitivising or causative but not benefactive function. The switch-reference system distinguishes between simultaneous and sequential action, as well as same or different subject in relation to the following clause. There are several verbs denoting coming and going, and they may combine with one of three prefixes to indicate bringing and taking. Mauwake is a nominative-accusative type language, and the basic constituent order in a clause is SOV. Subject and object are the only syntactic arguments. There is no indirect object, but a clause can have two or even three objects. 
A nominalised clause with a finite verb functions as a relative clause or a complement clause; one with a nominalised verb has several different functions. Functional domains described include modality, negation, deixis, quantification, possession and comparison. As there are four negators, Mauwake has more variation in negative expressions than is usual in Papuan languages. Clause chaining is the preferred strategy for joining clauses into sentences, but coordination and subordination of finite clauses are also common. The form of a complement clause depends on whether it is of the fact, action or potential type. Tail-head linkage is used as a cohesive device between sentences. The discourse-level features described are topic and focus.
Abstract:
This dissertation is a synchronic description of the phonology and grammar of two dialects of the Rajbanshi language (Eastern Indo-Aryan) as spoken in Jhapa, Nepal. I have primarily confined the analysis to the oral expression, since the emerging literary form is still in its infancy. The grammatical analysis is therefore based, for the most part, on a corpus of oral narrative text which was recorded and transcribed from three informants from north-east Jhapa. An informant, speaking a dialect from south-west Jhapa cross checked this text corpus and provided additional elicited material. I have described the phonology, morphology and syntax of the language, and also one aspect of its discourse structure. For the most part the phonology follows the basic Indo-Aryan pattern. Derivational morphology, compounding, reduplication, echo formation and onomatopoeic constructions are considered, as well as number, noun classes (their assignment and grammatical function), pronouns, and case and postpositions. In verbal morphology I cover causative stems, the copula, primary and secondary agreement, tense, aspect, mood, auxiliary constructions and non-finite forms. The term secondary agreement here refers to genitive agreement, dative-subject agreement and patient (and sometimes patient-agent) agreement. The breaking of default agreement rules has a range of pragmatic inferences. I argue that a distinction, based on formal, semantic and statistical grounds, should be made between conjunct verbs, derivational compound verbs and quasi-aspectual compound verbs. Rajbanshi has an open set of adjectives, and it additionally makes use of a restricted set of nouns which can function as adjectives. Various particles, and the emphatic and conjunctive clitics are also considered. The syntactic structures studied include: non-declarative speech acts, phrase-internal and clause-internal constituent order, negation, subordination, coordination and valence adjustment. 
I explain how the future, present and past tenses in Rajbanshi oral narratives do not seem to maintain a time reference, but rather to indicate a distinction between background and foreground information. I call this 'tense neutralisation'.
Abstract:
This study examined an aspect of adolescent writing development, specifically whether teaching secondary school students to use strategies to enhance succinctness in their essays changed the grammatical sophistication of their sentences. A quasi-experimental intervention was used to compare changes in syntactic complexity and lexical density between one-draft and polished essays. No link was demonstrated between the intervention and the changes. A thematic analysis of teacher interviews explored links between changes to student texts and teaching approaches. The study has implications for making syntactic complexity an explicit goal of student drafting.