746 results for Syntax
Abstract:
Relative Constructions with Pronominal Heads in Contemporary Russian. Chapter 1 introduces the distinctive syntactic and semantic properties of Russian relative constructions (RCs), which are then divided into two main classes according to the type of the head phrase. The study concentrates on RCs with pronominal heads, which are systematically compared with noun-headed RCs. Chapter 2 clarifies the categorization of pronouns in Russian. The conclusion is that Russian pronouns include only personal, reflexive and wh-pronouns. The remaining words that are traditionally seen as pronouns are actually functional equivalents of determiners. This idea leads to the suggestion that RCs with these determiner-like words as the only constituent of the head phrase are actually headed by zero pronouns. In the other type of RCs with pronominal heads, the head position is occupied by wh-pronouns with clitics expressing different types of indefiniteness and quantification. Comparison of the two types of pronoun-headed RCs shows that the wh-heads and zero-heads share a number of properties with respect to grammatical gender, number and person as well as the semantic distinction between animates and inanimates. The rest of Chapter 2 gives an overview of various uses of wh-pronouns in Russian and an experimental analysis of RCs headed by pronominal adverbs. Chapter 3 discusses fundamental differences between RCs with noun and pronominal heads. One of the main findings is that the choice of the relative pronoun (kto 'who' and chto 'what' versus kotoryj 'which') is motivated by a tendency to reproduce maximally the essential grammatical and semantic properties of the antecedent. Chapter 4 gives a detailed description of the determiner-like words and wh-based heads used in the two types of RCs with pronominal heads. In addition, several issues related to the syntax and semantics of free relatives are discussed. 
The conclusion is that there is no need to establish a separate category of free relatives in Russian. Chapter 5 discusses the syntax and semantics of correlative and free concessive constructions. They share a number of properties with pronoun-headed RCs and the two are often confused in Russian linguistics. However, a detailed analysis shows that these constructions must be distinguished from RCs. The study combines the methods of functionally-oriented Russian structuralism with some insights from generative syntax.
Abstract:
Information structure and Kabyle constructions: Three sentence types in the Construction Grammar framework. The study examines three Kabyle sentence types and their variants. These sentence types have been chosen because they code the same state of affairs but have different syntactic structures. The sentence types are Dislocated sentence, Cleft sentence, and Canonical sentence. I argue first that a proper description of these sentence types should include information structure and, second, that a description which takes into account information structure is possible in the Construction Grammar framework. The study thus serves as a testing ground for the applicability of Construction Grammar to a lesser-known language. It does so notably because the three sentence types cannot be differentiated without information structure categories and, consequently, these categories must also be integrated in the grammatical description. The information structure analysis is based on the model outlined by Knud Lambrecht. In that model, information structure is considered a component of sentence grammar that ensures pragmatically correct sentence forms. The work starts with an examination of the three sentence types and the analyses that have been done in André Martinet's functional grammar framework. This introduces the sentence types chosen as the object of study and discusses the difficulties related to their analysis. After a presentation of the state of the art, including earlier and more recent models, the principles and notions of Construction Grammar and of Lambrecht's model are introduced and explicated. The information structure analysis is presented in three chapters, each treating one of the three sentence types. The analyses are based on spoken language data and elicitation. Prosody is included in the study when a syntactic structure seems to code two different focus structures. 
In such cases, it is pertinent to investigate whether these are coded by prosody. The final chapter presents the constructions that have been established and the problems encountered in analysing them. It also discusses the impact of the study on the theories used and on the theory of syntax in general.
Abstract:
A new method of specifying the syntax of programming languages, known as hierarchical language specifications (HLS), is proposed. Efficient parallel algorithms for parsing languages generated by HLS are presented. These algorithms run on an exclusive-read exclusive-write parallel random-access machine. They require O(n) processors and O(log2n) time, where n is the length of the string to be parsed. The most important feature of these algorithms is that they do not use a stack.
Abstract:
In this study I look at what people want to express when they talk about time in Russian and Finnish, and why they use the means they use. The material consists of expressions of time: 1087 from Russian and 1141 from Finnish. They have been collected from dictionaries, usage guides, corpora, and the Internet. An expression means here an idiomatic set of words in a preset form, a collocation or construction. They are studied as lexical entities, without a context, and analysed and categorized according to various features. The theoretical background for the study includes two completely different approaches. Functional Syntax is used in order to find out what general meanings the speaker wishes to convey when talking about time and how these meanings are expressed in specific languages. Conceptual metaphor theory is used for explaining why the expressions are as they are, i.e. what kind of conceptual metaphors (transfers from one conceptual domain to another) they include. The study has resulted in a grammatically glossed list of time expressions in Russian and Finnish, a list of 56 general meanings involved in these time expressions and an account of the means (constructions) that these languages have for expressing the general meanings defined. It also includes an analysis of conceptual metaphors behind the expressions. The general meanings involved turned out to revolve around expressing duration, point in time, period of time, frequency, sequence, passing of time, suitable time and the right time, life as time, limitedness of time, and some other notions having less obvious semantic relations to the others. 
Conceptual metaphor analysis of the material has shown that time is conceptualized in Russian and Finnish according to the metaphors Time Is Space (Time Is Container, Time Has Direction, Time Is Cycle, and the Time Line Metaphor), Time Is Resource (and its submapping Time Is Substance), Time Is Actor; and some characteristics are added to these conceptualizations with the help of the secondary metaphors Time Is Nature and Time Is Life. The limits between different conceptual metaphors and the connections these metaphors have with one another are looked at with the help of the theory of conceptual integration (the blending theory) and its schemas. The results of the study show that although Russian and Finnish are typologically different, they are very similar both in the needs of expression their speakers have concerning time, and in the conceptualizations behind expressing time. This study introduces both theoretical and methodological novelties in the nature of material used, in developing empirical methodology for conceptual metaphor studies, in the exactness of defining the limits of different conceptual metaphors, and in seeking unity among the different facets of time. Keywords: time, metaphor, time expression, idiom, conceptual metaphor theory, functional syntax, blending theory
Abstract:
This dissertation is a synchronic description of the phonology and grammar of two dialects of the Rajbanshi language (Eastern Indo-Aryan) as spoken in Jhapa, Nepal. I have primarily confined the analysis to the oral expression, since the emerging literary form is still in its infancy. The grammatical analysis is therefore based, for the most part, on a corpus of oral narrative text which was recorded and transcribed from three informants from north-east Jhapa. An informant speaking a dialect from south-west Jhapa cross-checked this text corpus and provided additional elicited material. I have described the phonology, morphology and syntax of the language, and also one aspect of its discourse structure. For the most part the phonology follows the basic Indo-Aryan pattern. Derivational morphology, compounding, reduplication, echo formation and onomatopoeic constructions are considered, as well as number, noun classes (their assignment and grammatical function), pronouns, and case and postpositions. In verbal morphology I cover causative stems, the copula, primary and secondary agreement, tense, aspect, mood, auxiliary constructions and non-finite forms. The term secondary agreement here refers to genitive agreement, dative-subject agreement and patient (and sometimes patient-agent) agreement. The breaking of default agreement rules has a range of pragmatic inferences. I argue that a distinction, based on formal, semantic and statistical grounds, should be made between conjunct verbs, derivational compound verbs and quasi-aspectual compound verbs. Rajbanshi has an open set of adjectives, and it additionally makes use of a restricted set of nouns which can function as adjectives. Various particles, and the emphatic and conjunctive clitics, are also considered. The syntactic structures studied include: non-declarative speech acts, phrase-internal and clause-internal constituent order, negation, subordination, coordination and valence adjustment. 
I explain how the future, present and past tenses in Rajbanshi oral narratives do not seem to maintain a time reference, but rather to indicate a distinction between background and foreground information. I call this tense neutralisation.
Abstract:
My dissertation is a corpus-based study of non-finite constructions in Old English (OE). It revisits the question of Latin influence on OE syntax, offering a new evaluation of syntactic interference between Latin and OE, and, more generally, of the contact situation in the OE period, drawing on methods used in studying grammaticalization and language contact. I address three non-finite constructions: the absolute participial construction, the accusative-and-infinitive construction, and the nominative-and-infinitive construction, exemplified respectively in present-day English as: "She looked like a pixie sometimes, her eyes darting here and there, forever watchful" (BNC CCM 98); "My first acquaintance with her was when I heard her sing" (BNC CFY 2215); "Charles the Bald was said to resemble his grandfather physically" (BNC HPT 175). This study compares data from translated texts against the background of original OE writings, establishing dependencies and differences between the two. Although the contrastive analysis of source and target texts is one of the major methods employed in the study, translation and translation strategies as such are only my secondary foci. The emphasis is rather on what source/target comparison can tell us about OE non-finite syntax and the typological differences between Latin and OE in this domain, and on whether contact-induced change can originate in translation. In terms of theoretical framework, I have adopted a functional-typological approach, which rests on the principles of iconicity and event integration and, to the best of my knowledge, has not been applied systematically to OE non-finite constructions. Therefore one more aim of the dissertation is to test this framework and to see how OE fits into the cross-linguistic picture of non-finites. 
My research corpus consists of two samples: 1) written OE closely dependent on the Latin originals, based on editions of two gloss texts, five translations, and Latin originals of these texts, representing four text types: hymns, religious regulations, homily/life narrative, and biblical narrative (180,622 words); and 2) written OE as independent of Latin as possible, based on a selection from the York-Toronto-Helsinki Parsed Corpus of Old English Prose (YCOE) and representing five text types: laws, charters, correspondence, chronicle narrative, and homily/life narrative (274,757 words).
Abstract:
Anyone with some awareness of language will have noticed that Finland Swedish and Sweden Swedish differ in certain respects. The differences are found at various linguistic levels. The best known and most widely discussed are the lexical differences, i.e. differences at the word level. Considerably less attention has been paid to syntactic differences, i.e. differences in how clauses and sentences are constructed. To increase knowledge of Finland Swedish syntax, the Language Science Committee (Språkvetenskapliga nämnden) of the Society of Swedish Literature in Finland (Svenska litteratursällskapet i Finland) initiated the project Svenskan i Finland – syntaktiska drag i ett jämförande perspektiv (Swedish in Finland: syntactic features in a comparative perspective), which ran from 2004 to 2006. My dissertation was written within that project. Prepositions (e.g. av, i, på, för, till, åt) are so-called function words whose task is to bind the more meaning-bearing words together into clauses and sentences. Finland Swedish preposition use differs to some extent from Sweden Swedish use, and "åt" is one of the prepositions often cited as an example. Finland Swedes say, for example, "han gav en bok åt Lena" ('he gave a book to Lena') instead of "han gav en bok till Lena" or "han gav Lena en bok". They say "berätta något åt någon" ('tell something to someone', instead of using "för") and "ringa åt någon" ('phone someone') instead of "ringa någon". A main aim of my study is to find out how large the differences are when one considers all occurrences of "åt" in a body of material and not only those that attract attention because they are known to deviate in Finland Swedish. The study is corpus-based. This means that I have searched for all occurrences of combinations of a verb and the preposition "åt" in fairly large bodies of text available in electronic form. The material is held in the Language Bank of Finland (Språkbanken i Finland) and consists mainly of newspaper text and fiction. I have used a body of text totalling approximately 40 million running words, just over 23 million Finland Swedish and just over 19 million Sweden Swedish. 
That material yielded about 20,000 occurrences of åt to study, and it turned out, somewhat unexpectedly, that "åt" is not at all more common in Finland Swedish than in Sweden Swedish as far as written language is concerned, at least not in the language of professional writers. If one compensates for the fact that the Finland Swedish and Sweden Swedish corpora are not entirely alike in genre distribution and age, one arrives at roughly the same frequency for "åt" in both corpora. For the closer analysis of the patterns shown by the åt occurrences I have drawn first and foremost on construction grammar, but also on frame semantics and valency theory. Construction grammar is not a unified theory, but the idea of grammatical constructions is shared. Constructions represent everything from general syntactic patterns to specific patterns for individual linguistic items. Views on what the concept should include vary, but the definition of "construction" as "pairings (or constellations) of form and meaning" is shared. "Construction" never refers to concrete occurrences in texts or utterances but always to the abstract pattern behind them. And every utterance is the result of a large number of constructions interacting. In my analysis I have assumed that the occurrences with "åt" can be traced back to different constructions or patterns on the basis of what groups of occurrences have in common. I have looked at the function that the åt-phrase has, in interaction with the verb, in the occurrences. An åt-phrase is syntactically a prepositional phrase and consists of a preposition and a complement (Swedish: rektion). For example, the word pair "åt skogen" constitutes a prepositional phrase in which "skogen" is the complement. From my material I have been able to abstract five overarching patterns in which the referent of the complement has different so-called semantic roles. In combination with the verb, the åt-phrase can indicate a goal or landmark, as in svänga åt höger 'turn to the right', dra åt helvete 'go to hell', ta sig åt hjärtat 'clutch at one's heart', luta åt en seger för IFK 'be heading towards a win for IFK'. 
Secondly, it can indicate a recipient (e.g. ge varsin kaka åt hundarna 'give each of the dogs a biscuit', bygga en bastu åt sina svärföräldrar 'build a sauna for one's parents-in-law', skaffa biljetter åt en kompis 'get tickets for a friend'). Thirdly, the åt-phrase can denote a referent that benefits (or suffers) from an action (e.g. klippa häcken åt grannen 'trim the hedge for the neighbour', ställa in digitalboxen åt sin moster 'set up the set-top box for one's aunt'). Finally, the åt-phrase can denote the person or thing that is the object either of a communicative action (vinka åt sin son 'wave to one's son', skratta åt eländet 'laugh at the misery') or of an attitude or feeling (glädja sig åt framgången 'rejoice at the success'). In addition to these main patterns there are a number of minor groups of occurrences that form patterns of their own, but together they account for less than 3% in both corpora. Within the groups, sub-patterns can be distinguished. In the recipient group, for example, "ge varsin kaka åt hundarna" represents a transfer construction, "bygga en bastu åt sina svärföräldrar" a production construction and "skaffa biljetter åt en kompis" a procurement construction. All types are common to both sets of material, but the proportion of occurrences representing the different types differs considerably. In the Sweden Swedish material, for example, the pattern in which the åt-phrase denotes a goal or landmark accounts for a much larger share of the occurrences than in Finland Swedish. The share of occurrences in which the åt-phrase denotes someone who benefits (or suffers) from an action is also much higher in the Sweden Swedish material. In the Finland Swedish material, on the other hand, recipient occurrences account for over 50% of the occurrences, whereas the share in the Sweden Swedish material is only 30%. Within the group, moreover, occurrences of the production and procurement types make up a smaller share in the Finland Swedish material than in the Sweden Swedish. In function these are close to the type that denotes the one who benefits from the action. 
The concrete occurrences of transfer (ge varsin kaka åt hundarna) make up a larger share in the Finland Swedish material than in the Sweden Swedish (about 8% versus 3%), but typical of both is a high degree of collocation ("collocation" refers to pairs or groups of words that occur together more often than they statistically would if they occurred entirely at random). Most of the recipient occurrences consist of phrases of the type "ge arbete åt någon, ge eftertryck åt något, ge liv åt något; ägna tid åt något, ägna sitt liv åt något, ägna uppmärksamhet åt något" ('give work to someone, give emphasis to something, give life to something; devote time to something, devote one's life to something, devote attention to something'). These conclusions thus apply to written language. In spoken language the distribution looks different. A high degree of collocation is typical of the preposition "åt" in general. It seems that language users have clear, ready-made templates for where "åt" can come in. The only pattern that appears fully productive, in the sense that the elements can be combined more or less freely, is the combination of verb and åt-phrase in which the åt-phrase denotes the one who benefits from something. That someone does something on someone's behalf seems generally expressible with the preposition "åt": e.g. "tvätta bilen åt pappa, ringa efter en taxi åt kunden" ('wash the car for Dad, call a taxi for the customer'). Even occurrences of the type "hon drömde åt honom att bli ordinarie adjunkt" (roughly 'she dreamed on his behalf of his becoming a permanent lecturer') occur to some extent. The construction is productive in both varieties, but it is evident that the recipient construction takes interpretive precedence in certain cases in Finland Swedish: "Filip skrev ett brev åt sin syster" is interpreted by Sweden Swedes as meaning that Filip wrote the letter on his sister's behalf, whereas Finland Swedes by and large evidently interpret it as meaning that Filip wrote to his sister, that is, that the sister was the recipient of the letter. About 20% of all occurrences in both sets of material represent cases where "åt" is a particle. Verb and "åt" are more closely bound to each other than when "åt" is an ordinary preposition. Examples of particle occurrences are "han kom inte åt strömbrytaren, det gick åt mängder med saft, landet får dra åt svångremmen, de roffade åt sig de bästa platserna" ('he could not reach the switch, huge amounts of juice were used up, the country has to tighten its belt, they grabbed the best seats for themselves'). 
At a general level, the particle material also looks very similar in the two varieties. The greatest difference is shown by the reflexive type "roffa åt sig" ('grab for oneself'). While the type is very homogeneous in the Sweden Swedish material, the variation is greater in the Finland Swedish. More verbs appear in the combination (han köpte åt sig ett par jeans 'he bought himself a pair of jeans'), and the word order varies (han nappade åt sig ett paraply ~ han nappade ett paraply åt sig 'he snatched an umbrella for himself'). The fact that "åt" is used more in certain functions in Finland Swedish is usually explained by influence from the Finnish allative (the ending -lle: hän antoi kirjan Astalle > hon gav en bok åt Asta 'she gave a book to Asta'). However, everything indicates that the Finland Swedish use of åt is partly a relic. In older Sweden Swedish sources one comes across "åt" in contexts that are now typical of Finland Swedish. The Finland Swedish language area lies on the periphery in relation to the linguistic centre from which changes spread (for Swedish, mainly the Stockholm area), and it is typical of peripheral areas that they show archaic features even when no contact phenomena are involved. The allative may of course have helped to preserve the use of "åt" in Finland Swedish. That it is precisely "åt" that is used is probably because, in purely cognitive terms, this preposition shares the most functions with the allative when compared with the considerably more frequent prepositions "till" and "för". It is also evident that the use of åt has, in addition, a life of its own in Finland Swedish. In certain varieties of Finland Swedish one can hear, for example, utterances of the type "alla fiskarna dog åt dom" (roughly 'all the fish died on them'). As an individual linguistic item it has no Finnish model with the allative. The utterance is an example of the stretching of a Swedish construction. There is a model partly in the pattern in which åt denotes the one who benefits or suffers from something, partly in the relational use of "åt": han är hantlangare åt Eriksson ~ han är Erikssons hantlangare ('he is a helper to Eriksson ~ he is Eriksson's helper'). 
In language contact it is generally constructions that have a model in the borrowing language that are borrowed from the donor language, whereas constructions that lack such a model are considerably less likely to gain a foothold.
Abstract:
This study concerns the most common word pair in spoken Swedish, de e (it is, third person pronoun + copula verb in present tense). The aim of the study is twofold, with an empirical aim and a theoretical aim. The empirical aim is to investigate if and how the string de e can be understood and described as a construction in its own right with characteristics that distinguish it from other structures and resources in spoken Swedish. The theoretical aim is to test how two different linguistic theories and methods, interactional linguistics and construction grammar, can be combined and used to describe and explain patterns in languaging that traditional grammar does not take into account. The empirical analysis is done within the interactional linguistic framework with sequence analyses of excerpts from authentic conversation data. The data consist of approximately ten hours of recorded conversation from Finland and Sweden. The sequence analysis suggests that the string de e really is used as a resource in its own right. In most cases, the string is also used in ways consistent with abstract grammatical patterns described by traditional grammar. Nevertheless, there are instances where de e is used in ways not described before: with numerals and infinitive phrases as complements, without any complements at all and together with certain complements (bra, de) in idiomatic ways. Furthermore, in the instances where de e is used according to known grammatical patterns the function of the particular string de e is clearly contextually specific and in various ways linked to the micro-context in which it is used. A new model is suggested for understanding and summarizing the results from the sequence analyses. It consists of two different types of constructions: grammatical and interactional. The grammatical constructions show how the string is used in eleven structurally different ways. 
The interactional constructions show seven different sequential positions and functions in which the string occurs. The two types of constructions are also linked to each other as potentials. This is a new way to describe how interactants use and respond to a concrete string like de e in conversation.
Abstract:
This study seeks to answer the question of what the language of administrative press releases is like, and how and why it has changed over the past few decades. The theoretical basis of the study is provided by critical text analysis, supplemented with, e.g., the metafunction theory of Systemic Functional Grammar, the theory of poetic function, and Finnish research into syntax. The data includes 83 press releases by the City of Helsinki Public Works Department, 14 of which were written between 1979 and 1980 (old press releases), and 69 of which were written between 1998 and 1999 (new press releases). The analysis focuses on the linguistic characteristics of the releases, their changes and variation, their relation to other texts and the extra linguistic context, as well as their genre. The core research method is linguistic text analysis. It is supplemented with an analysis of the communicative environment, based on the authors' interviews and written documents. The results can be applied to the improvement of texts produced by the authorities and even by other organizations. The linguistic analysis focuses on features that transform the texts in the data making them guiding, detailed, and poetic. The releases guide the residents of the city using modal verbal expressions and performative verbs that enable the mass media to publish the guiding expressions on their own behalf as such. The guiding is more persuasive in the new press releases than in the old ones, and the new ones also include imperative clauses and verbless directives that construct direct interaction. The language of the releases is made concrete and structurally detailed by, e.g., concrete vocabulary, proper nouns and terms, as well as definitions, adverbials and comparisons, which are used specifically to present places and administrative organizations in detail. 
The rhetorical features in the releases include alliteration and metaphors, which are found in the new releases especially in the titles. The emphasized features are used to draw the readers' attention and to highlight the core contents of the texts. The new releases also include words that are colloquial in style, making the communicative situations less official. Structurally, the releases have changed from being letter-like to a more newsflash-like format. The changes in the releases can be explained by the development towards more professional communications and the more market-oriented ideology adopted in the communicative environment. Key words: change in administrative language, press releases, critical text analysis, linguistic text analysis
Abstract:
A formal chemical nomenclature system WISENOM based on a context-free grammar and graph coding is described. The system is unique, unambiguous, easily pronounceable, encodable, and decodable for organic compounds. Being a formal system, every name is provable as a theorem or derivable as a terminal sentence by using the basic axioms and rewrite rules. The syntax in Backus-Naur form, examples of name derivations, and the corresponding derivation trees are provided. Encoding procedures to convert connectivity tables to WISENOM, parsing, and decoding are described.
Abstract:
This thesis is a study of a rather new logic called dependence logic and its closure under classical negation, team logic. In this thesis, dependence logic is investigated from several aspects. Some rules are presented for quantifier swapping in dependence logic and team logic. Such rules are among the basic tools one must be familiar with in order to gain the required intuition for using the logic for practical purposes. The thesis compares Ehrenfeucht-Fraïssé (EF) games of first order logic and dependence logic and defines a third EF game that characterises a mixed case where first order formulas are measured in the formula rank of dependence logic. The thesis contains detailed proofs of several translations between dependence logic, team logic, second order logic and its existential fragment. Translations are useful for showing relationships between the expressive powers of logics. Also, by inspecting the form of the translated formulas, one can see how an aspect of one logic can be expressed in the other logic. The thesis makes preliminary investigations into proof theory of dependence logic. Attempts focus on finding a complete proof system for a modest yet nontrivial fragment of dependence logic. A key problem is identified and addressed in adapting a known proof system of classical propositional logic to become a proof system for the fragment, namely that the rule of contraction is needed but is unsound in its unrestricted form. A proof system is suggested for the fragment and its completeness conjectured. Finally, the thesis investigates the very foundation of dependence logic. An alternative semantics called 1-semantics is suggested for the syntax of dependence logic. There are several key differences between 1-semantics and other semantics of dependence logic. 1-semantics is derived from first order semantics by a natural type shift. Therefore 1-semantics reflects an established semantics in a coherent manner. 
Negation in 1-semantics is a semantic operation and satisfies the law of excluded middle. A translation is provided from unrestricted formulas of existential second order logic into 1-semantics. Game theoretic semantics are also considered in the light of 1-semantics.
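To make the notion of dependence concrete, the following minimal Python sketch checks a dependence atom =(x, y) against a team, i.e. a set of assignments, using the standard team-semantics condition that any two assignments agreeing on x must also agree on y. The function name satisfies_dependence and the dictionary representation of assignments are assumptions made for this illustration; they are not taken from the thesis, and the sketch does not implement the 1-semantics that the thesis proposes.

```python
from itertools import combinations


def satisfies_dependence(team, xs, y):
    """Return True if the team satisfies the dependence atom =(xs, y).

    team: list of assignments, each a dict from variable name to value
          (hypothetical representation chosen for this sketch).
    xs:   tuple of variable names that y is claimed to depend on.
    y:    variable name whose value must be determined by xs.
    """
    for s, t in combinations(team, 2):
        agree_on_xs = all(s[x] == t[x] for x in xs)
        if agree_on_xs and s[y] != t[y]:
            # Same values for xs but different values for y: not a function.
            return False
    return True


team = [
    {"x": 0, "y": 1},
    {"x": 1, "y": 2},
    {"x": 0, "y": 1},
]

# In this team the value of y is determined by the value of x.
print(satisfies_dependence(team, ("x",), "y"))

# Adding an assignment with x = 0 but a different y breaks the dependence.
print(satisfies_dependence(team + [{"x": 0, "y": 5}], ("x",), "y"))
```

Note that the empty team satisfies every dependence atom under this condition, since there are no pairs of assignments to compare.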
Abstract:
XML documents are becoming more and more common in various environments. In particular, enterprise-scale document management is commonly centred around XML, and desktop applications as well as online document collections are soon to follow. The growing number of XML documents increases the importance of appropriate indexing methods and search tools in keeping the information accessible. Therefore, we focus on content that is stored in XML format as we develop such indexing methods. Because XML is used for different kinds of content ranging all the way from records of data fields to narrative full-texts, the methods for Information Retrieval are facing a new challenge in identifying which content is subject to data queries and which should be indexed for full-text search. In response to this challenge, we analyse the relation of character content and XML tags in XML documents in order to separate the full-text from data. As a result, we are able to both reduce the size of the index by 5-6% and improve the retrieval precision as we select the XML fragments to be indexed. Besides being challenging, XML comes with many unexplored opportunities which have received little attention in the literature. For example, authors often tag the content they want to emphasise by using a typeface that stands out. The tagged content constitutes phrases that are descriptive of the content and useful for full-text search. They are simple to detect in XML documents, but also easy to confuse with other inline-level text. Nonetheless, the search results seem to improve when the detected phrases are given additional weight in the index. Similar improvements are reported when related content, including titles, captions, and references, is associated with the indexed full-text. Experimental results show that for certain types of document collections, at least, the proposed methods help us find the relevant answers. 
Even when we know nothing about the document structure but the XML syntax, we are able to take advantage of the XML structure when the content is indexed for full-text search.
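The idea of separating full-text from data by relating character content to XML tags can be sketched as follows. This is only an illustration under stated assumptions: the measure text_density (characters of text per element in a subtree), the threshold of 20 characters, and the function names are all hypothetical and are not the measure or parameters used in the work summarized above.

```python
import xml.etree.ElementTree as ET


def text_density(elem):
    """Characters of text per element in the subtree rooted at elem.

    Hypothetical measure: whitespace-trimmed text lengths divided by the
    number of elements (the subtree root included).
    """
    n_chars = sum(len(t.strip()) for t in elem.itertext())
    n_elems = sum(1 for _ in elem.iter())
    return n_chars / n_elems


def split_fragments(root, threshold=20.0):
    """Split the children of root into full-text and data fragments.

    The threshold is an assumed value chosen for this toy example only.
    """
    full_text, data = [], []
    for child in root:
        if text_density(child) >= threshold:
            full_text.append(child.tag)
        else:
            data.append(child.tag)
    return full_text, data


doc = ET.fromstring(
    "<article>"
    "<meta><year>2006</year><pages>12</pages><lang>en</lang></meta>"
    "<body><p>XML is used for many kinds of content, from data records "
    "to <em>narrative</em> full text.</p></body>"
    "</article>"
)

# The densely tagged <meta> record is treated as data and the prose in
# <body> as full text.
print(split_fragments(doc))
```

Only the XML syntax is consulted here, with no knowledge of the document type, which is the situation the last sentence of the abstract describes.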
Abstract:
Simple formalized rules are proposed for automatic phonetic transcription of Tamil words into Roman script. These rules are syntax-directed and require a one-symbol look-ahead facility, and are hence easily automated on a digital computer. Some suggestions are also put forth for the linearization of Tamil script so that it can be handled by modern machinery.
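The role of a one-symbol look-ahead can be illustrated with a toy Python sketch. It relies only on a general property of Tamil script: a consonant letter carries an inherent vowel "a" unless the next symbol is a vowel sign or the pulli, which suppresses the vowel. The tiny symbol tables, the Roman spellings, and the function name transliterate are assumptions for this example; they are not the rules proposed in the paper.

```python
# Toy symbol tables (assumptions, far from complete).
VOWELS = {"\u0b85": "a", "\u0b86": "aa", "\u0b87": "i"}    # independent vowels
CONSONANTS = {"\u0b95": "k", "\u0ba4": "t", "\u0bae": "m"}  # consonant letters
VOWEL_SIGNS = {"\u0bbe": "aa", "\u0bbf": "i"}               # dependent vowel signs
PULLI = "\u0bcd"                                            # suppresses inherent vowel


def transliterate(word):
    """Romanize a Tamil word using the toy tables and one symbol of look-ahead."""
    out = []
    i = 0
    while i < len(word):
        ch = word[i]
        nxt = word[i + 1] if i + 1 < len(word) else ""  # one-symbol look-ahead
        if ch in VOWELS:
            out.append(VOWELS[ch])
            i += 1
        elif ch in CONSONANTS:
            if nxt == PULLI:
                # Bare consonant: consume the pulli as well.
                out.append(CONSONANTS[ch])
                i += 2
            elif nxt in VOWEL_SIGNS:
                # Consonant plus explicit vowel sign.
                out.append(CONSONANTS[ch] + VOWEL_SIGNS[nxt])
                i += 2
            else:
                # No modifier follows: emit the inherent vowel.
                out.append(CONSONANTS[ch] + "a")
                i += 1
        else:
            raise ValueError(f"symbol not covered by the toy tables: {ch!r}")
    return "".join(out)


# The word for 'mother': a + m + pulli + m + vowel sign aa.
print(transliterate("\u0b85\u0bae\u0bcd\u0bae\u0bbe"))
```

Each symbol is decided by inspecting at most the one symbol that follows it, so the procedure runs in a single left-to-right pass.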
Abstract:
The trees in the Penn Treebank have a standard representation that involves complete balanced bracketing. In this article, an alternative to this standard representation of the treebank is proposed. The proposed representation for the trees is lossless, but it reduces the total number of brackets by 28%. This is possible by omitting the redundant pairs of special brackets that encode initial and final embedding, using a technique proposed by Krauwer and des Tombe (1981). In terms of the paired brackets, the maximum nesting depth in sentences decreases by 78%. A coverage of 99.9% is achieved with only five non-top levels of paired brackets. The observed shallowness of the reduced bracketing suggests that finite-state based methods for parsing and searching could be a feasible option for treebank processing.
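The two quantities compared in the abstract, the number of bracket pairs and the maximum nesting depth, can be measured on a fully bracketed tree with a short Python sketch. The function name bracket_stats and the example sentence are assumptions for illustration; the sketch only measures the standard balanced bracketing and does not implement the reduced representation or the technique of Krauwer and des Tombe.

```python
def bracket_stats(tree):
    """Return (number of bracket pairs, maximum nesting depth) of a tree string.

    Assumes round brackets mark constituents, as in Penn Treebank style
    notation, and raises ValueError if the brackets are not balanced.
    """
    depth = 0
    max_depth = 0
    pairs = 0
    for ch in tree:
        if ch == "(":
            depth += 1
            pairs += 1
            max_depth = max(max_depth, depth)
        elif ch == ")":
            depth -= 1
            if depth < 0:
                raise ValueError("closing bracket without a matching opener")
    if depth != 0:
        raise ValueError("unclosed bracket")
    return pairs, max_depth


tree = "(S (NP (DT the) (NN cat)) (VP (VBD sat)))"

# Six constituents, nested at most three levels deep.
print(bracket_stats(tree))
```

Because the depth is tracked with a single counter, a fixed bound on the nesting depth is exactly what would allow a finite-state device to stand in for the counter.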
Abstract:
We have presented an overview of the FSIG approach and related FSIG grammars to issues of very low complexity and parsing strategy. We ended up with serious optimism according to which most FSIG grammars could be decomposed in a reasonable way and then processed efficiently.