3 results for "Slavic languages" in Central European University - Research Support Scheme


Abstract:

Mr. Kubon's project was inspired by the growing need for an automatic syntactic analyser (parser) of Czech that could be used to process large amounts of text syntactically. Mr. Kubon notes that such a tool would be very useful, especially in corpus linguistics, where creating a large-scale "tree bank" (a collection of syntactic representations of natural language sentences) is a very important step towards investigating the properties of a given language. Syntactically parsing a whole corpus in order to obtain a representative set of syntactic structures would be almost inconceivable without the help of some kind of robust (semi)automatic parser. The need for robustness grows with the size of the corpus, or of any other body of text to be parsed: practical experience shows that, apart from syntactically correct sentences, there are many sentences which contain a "real" grammatical error. Such sentences may be corrected in small-scale texts, but not, in general, throughout a whole corpus. In order to complete the overall project, it was necessary to address a number of smaller problems. These were: 1. the adaptation of a suitable formalism able to describe the formal grammar of the system; 2. the definition of the structure of the system's dictionary containing all relevant lexico-syntactic information; 3. the development of a formal grammar able to robustly parse Czech sentences from the test suite; 4. filling the syntactic dictionary with sample data (about 1000 words), allowing the system to be tested and debugged during its development; 5. the development of a set of sample sentences containing a reasonable number of grammatical and ungrammatical phenomena, covering some of the most typical syntactic constructions used in Czech. Number 3, building a formal grammar, was the main task of the project. 
The grammar is of course far from complete (Mr. Kubon notes that it is debatable whether any formal grammar describing a natural language can ever be complete), but it covers the most frequent syntactic phenomena, allowing the representation of the syntactic structure of simple clauses and of certain types of complex sentences. The emphasis was not so much on building a wide-coverage grammar as on describing and demonstrating a method. This method uses an approach similar to that of grammar-based grammar checking: the problem of reconstructing the "correct" form of the syntactic representation of a sentence is closely related to the problem of locating and identifying syntactic errors. Without precise knowledge of the nature and location of syntactic errors, it is not possible to reliably estimate a "correct" syntactic tree. The incremental way of building the grammar used in this project is also an important methodological issue. Experience from previous projects showed that, especially from the point of view of testing and debugging, building a grammar as one huge block of metarules is more complicated than the incremental method, which begins with metarules covering the most common syntactic phenomena and adds less important ones later. The sample of the syntactic dictionary containing lexico-syntactic information (task 4) now has slightly more than 1000 lexical items representing all word classes. During the creation of the dictionary, it turned out that assigning complete and correct lexico-syntactic information to verbs is a very complicated and time-consuming process which would itself be worth a separate project. The final task undertaken in this project was the development of a method allowing effective testing and debugging of the grammar during its development. 
The consistency of new and modified rules of the formal grammar with the rules already in place is one of the crucial problems of every project aiming to develop a large-scale formal grammar of a natural language. The method allows any discrepancy or inconsistency of the grammar to be detected with respect to a test-bed of sentences containing all the syntactic phenomena covered by the grammar. This is not only the first robust parser of Czech, but also one of the first robust parsers of any Slavic language. Since Slavic languages share a wide range of features, it is reasonable to claim that this system may serve as a pattern for similar systems in other languages. To transfer the system to another language, it is only necessary to revise the grammar and to change the data contained in the dictionary (but not necessarily the structure of the primary lexico-syntactic information). The formalism and methods used in this project can be applied to other Slavic languages without substantial changes.
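The test-bed check described above amounts to regression testing of the grammar: after every change, each test-bed sentence is parsed again and the outcome is compared with the recorded expectation. The Python sketch below illustrates only that comparison loop, under stated assumptions: the `parse` callable, the outcome labels and the names `SuiteEntry` and `find_discrepancies` are hypothetical stand-ins, not part of Mr. Kubon's system, whose formalism is not described here.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Tuple


@dataclass(frozen=True)
class SuiteEntry:
    """One test-bed sentence with the outcome the grammar is expected to give.

    `expected` is a hypothetical label, e.g. "ok" for a well-formed sentence
    or the name of the syntactic error the robust parser should report.
    """
    sentence: str
    expected: str


def find_discrepancies(
    parse: Callable[[str], str], test_bed: Iterable[SuiteEntry]
) -> List[Tuple[str, str, str]]:
    """Re-parse every test-bed sentence and collect mismatches.

    Returns (sentence, expected, actual) triples; an empty list means the
    current version of the grammar is consistent with the whole test-bed.
    """
    report = []
    for entry in test_bed:
        actual = parse(entry.sentence)  # run the current version of the grammar
        if actual != entry.expected:
            report.append((entry.sentence, entry.expected, actual))
    return report
```

In an incremental workflow the loop would be rerun after every new or modified rule, and any non-empty report points to the sentences whose analysis the change has broken. A real harness would more likely compare whole syntactic trees than a single label.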

Abstract:

Through studying German, Polish and Czech publications on Silesia, Mr. Kamusella found that most of them, instead of trying to analyse the past objectively, are devoted to proving some essential "Germanness", "Polishness" or "Czechness" of the region. He believes that the terminology and thought-patterns of nationalist ideology are so deeply entrenched in the minds of researchers that they do not consider themselves nationalist. However, he notes that, owing to the spread of the results of the latest studies on ethnicity and nationalism (by Gellner, Hobsbawm, Smith, Eriksen and Billig, amongst others), German publications on Silesia have become quite objective since the 1980s, and the same process (impeded by underfunding) has been taking place in Poland and the Czech Republic since 1989. His own research totals some 500 pages in English, presented on disc. So what are the traps into which historians have been inclined to fall? There is a tendency to treat Silesia as an entity which has existed forever, though Mr. Kamusella points out that it emerged as a region only at the beginning of the 11th century. These same historians speak of Poles, Czechs and Germans in Silesia, though Mr. Kamusella found that before the mid-19th century people identified with their local area, religion or dynasty. In fact, a German national identity started to be forged in Prussian Silesia only during the War of Liberation against Napoleon (1813-1815). It was given concrete form in 1861 by the first Prussian census, in which the language a citizen spoke was equated with his or her nationality. A similar census was carried out in Austrian Silesia only in 1881. The censuses forced Silesians to choose a nationality despite their multiethnic, multicultural identities. 
It was the active promotion of a German identity in Prussian Silesia, and Vienna's uneasy acceptance of national identities in Austrian Silesia, that stimulated the development of Polish national, Moravian ethnic and Upper Silesian ethnic/regional identities in Upper Silesia, and of Polish national, Czech national, Moravian ethnic and Silesian ethnic identities in Austrian Silesia. While traditional historians speak of the "nationalist struggle" as though it were a permanent characteristic of Silesia, Mr. Kamusella points out that such a struggle only developed in earnest after 1918. What is more, he shows how conveniently it has been forgotten that, besides the national players, there were also significant ethnic movements of Moravians, Upper Silesians, Silesians and the tutejsi (i.e. those who still chose to identify with their locality). At this point Mr. Kamusella moves into the area of linguistics. While historians have traditionally spoken of conflicts between the three national languages (German, Polish and Czech), Mr. Kamusella reminds us that the standardised forms of these languages, which we choose to dub "national", were developed only in the mid-18th century (German), after 1869 (when Polish became the official language in Galicia) and after the 1870s (when Czech became the official language in Bohemia). Standard German, moreover, was only widely promoted in Silesia from the mid-19th century onwards. In fact, the majority of the population of Prussian Upper Silesia and Austrian Silesia was bilingual or even multilingual. What is more, the "Polish" and "Czech" that Silesians spoke were not the standard languages we know today, but a continuum of West Slavic dialects in the countryside and a continuum of West Slavic/German creoles in the urbanised areas. 
Such was the linguistic confusion that, from time to time, some ethnic/regional and Church activists strove to create a distinctive Upper Silesian/Silesian language on the basis of these dialects/creoles, but their efforts were thwarted by the staunch promotion of standard German and, after 1918, of standard Polish and Czech. Still on the subject of language, Mr. Kamusella draws attention to a problem surrounding place names and personal names. Polish historians use current Polish versions of Silesian place names, Czechs use current Polish/Czech versions, and Germans use the German versions which were in use in Silesia up to 1945. Mr. Kamusella attempted to avoid what he sees as a nationalist tendency by using the version of a place name appropriate to a given period and providing its modern counterpart in parentheses. For modern place names, he gives the German version in parentheses. As for the names of historical figures, he strove to use the name entered on the birth certificate of the person concerned, and thereby avoid such confusion as, for instance, surrounds the Austrian Silesian pastor L.J. Sherschnik, who in German became Scherschnick, in Polish Szersznik, and in Czech Šeršník. Indeed, the prospective Silesian scholar should, Mr. Kamusella suggests, know not only the three languages directly involved in the area itself, but also English and French, since many documents and books on the subject have been published in these languages, and even Latin when dealing in depth with the period before the mid-19th century. Mr. Kamusella divides the policies of ethnic cleansing into two categories. The first he classifies as soft, meaning that policy is confined to the educational system, the army, the civil service and the church, and the aim is that everyone learn the language of the dominant group. The second comprises the hard policies, which amount to what is popularly labelled ethnic cleansing. 
This category of policy aims at the total assimilation and/or physical liquidation of non-dominant groups that do not fit the ideal of a homogeneous nation-state. Mr. Kamusella found that soft policies were consciously and systematically employed by Prussia/Germany in Prussian Silesia from the 1860s to 1918, whereas in Austrian Silesia, Vienna dabbled in them quite inconsistently from the 1880s to 1917. In the inter-war period, the emergence of the nation-states of Poland and Czechoslovakia led to full employment of the soft policies and partial employment of the hard ones (curbed by the League of Nations minorities protection system) in Czechoslovak Silesia, German Upper Silesia and the Polish parts of Upper and Austrian Silesia. In 1939-1945, Berlin began consistently using all the "hard" methods to homogenise Polish and Czechoslovak Silesia, which fell in their entirety within the Reich's borders. After World War II, Czechoslovakia regained its pre-war part of Silesia, while Poland was given its pre-war section plus almost the whole of the pre-war German province. Subsequently, with the active involvement and support of the Soviet Union, Warsaw and Prague expelled the majority of Germans from Silesia in 1945-1948 (there were also instances of Poles expelling Upper Silesian Czechs/Moravians, and of Czechs expelling Czech Silesian Poles/pro-Polish Silesians). During the period of communist rule, the same two countries carried out a thorough Polonisation and Czechisation of Silesia, submerging the region in a new administrative division that disregarded its historical boundaries. Democratisation after the fall of communism, and a gradual retreat from the nationalist ideal of the homogeneous nation-state with a view to possible membership of the European Union, led to the abolition of the "hard" policies and the phasing out of the "soft" ones. 
Consequently, limited revivals of various ethnic/national minorities have been observed in Czech and Polish Silesia, whereas Silesian regionalism has become popular in the westernmost part of Silesia, which remained part of Germany. Mr. Kamusella believes that, as European politics moves beyond the nation-state discourse, and now that the expression of multiethnicity and multilingualism has become the cause of the day in Silesia, regionalism may come to hold sway in the region, uniting its ethnically and nationally variegated population in accordance with the principle of subsidiarity championed by the European Union.

Abstract:

This study describes the sociolinguistic situation of the indigenous Hungarian national minorities in Slovakia (c. 600,000), Ukraine (c. 180,000), Romania (c. 2,000,000), Yugoslavia (c. 300,000), Slovenia (c. 8,000) and Austria (c. 6,000). Following the guidelines of Hans Goebl et al., the historical sociolinguistic portrait of each minority is presented from 1920 through to the mid-1990s. Each country's report includes sections on geography and demography, history, politics, economy, culture and religion, language policy and planning, and language use (domains of minority and/or majority language use, proficiency, attitudes, etc.). The team's findings were presented in the form of 374 pages of manuscripts, articles and tables, written in Hungarian and English. The core of the team's research lies in the results of an empirical survey designed to study the social characteristics of Hungarian-minority bilingualism in the six project countries, and the linguistic similarities and differences between the six contact varieties of Hungarian and Hungarian in Hungary. The respondents were grouped by age, education and settlement type: city vs. village, and local majority vs. local minority. The first observation is that Hungarian tends to be spoken less to children than to parents and grandparents, a familiar pattern of language shift. In contact varieties of Hungarian, analytic constructions may be used where monolingual Hungarians would use a more synthetic form. Mr. Kontra gives as an example the Standard Hungarian compound tagdíj "membership fee", which in contact Hungarian is replaced by the two-word phrase tagsági díj. Another example concerns the synthetic verb hegedült "played the violin" and the analytic expression hegedűn játszott (literally "played on the violin"). 
The contrast is especially striking between the Hungarians in the northern Slavic countries, who use the synthetic form frequently, and those in the southern Slavic countries, who mainly use the analytic form. Mr. Kontra notes that, from a structural point of view, there is no immediate explanation for this, since Slovak or Ukrainian is as likely to cause interference as Serbian is. He postulates instead that the difference may have a sociohistorical cause, and points out that the Turkish occupation of what is today Voivodina interrupted the Hungarian presence in the region, with the result that Hungarians were resettled there only two and a half centuries ago. The Hungarians in today's Slovakia and Ukraine, by contrast, have lived alongside Slavic peoples continuously for over a millennium. It may be, he suggests, that 250 years of interethnic coexistence is less than is needed for such a contact-induced change to run its course. Mr. Kontra then moves on to what he terms "mental maps and morphology". In Hungarian, most names of Hungarian cities and villages take the surface case (e.g. Budapest-en "in Budapest"), whereas some names of Hungarian settlements and all names of foreign cities take the interior case (e.g. Tihany-ban "in Tihany" and Boston-ban "in Boston"). The role of the semantic feature "foreign" in suffix choice can be illustrated by minimal pairs such as Velencé-n "in Velence, a village in Hungary" versus Velencé-ben "in Velence [= Venice], a city in Italy", and Pécs-en "in Pécs, a city in Hungary" versus Bécs-ben "in Bécs, i.e. Vienna". This Hungarian vs. foreign distinction is often interpreted as "belonging to historical (pre-1920) Hungary" vs. "outside historical Hungary", and is also expressed in the dichotomy "home" vs. "abroad". The 1920 border changes have affected the mental maps of both majority and minority Hungarians, and it is these maps that govern the choice of surface vs. interior case with placenames. 
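The suffix choice described above can be sketched as a small rule: a placename on the speaker's "home" map takes the surface (superessive) case, anything else takes the interior (inessive) case, and a short list of home names such as Tihany takes the interior case anyway. The Python sketch below is a simplification under stated assumptions: vowel harmony is decided only by the last vowel of the name, the `home` flag and the exception set stand in for an individual speaker's mental map, and the function names `locative` and `last_vowel` are hypothetical.

```python
# Simplified Hungarian vowel classes (precomposed characters).
BACK_VOWELS = set("aáoóuú")
FRONT_ROUNDED_VOWELS = set("öőüű")
VOWELS = BACK_VOWELS | FRONT_ROUNDED_VOWELS | set("eéií")

# A word-final short a/e lengthens before a suffix: Velence -> Velencé-n.
FINAL_LENGTHENING = {"a": "á", "e": "é"}


def last_vowel(word: str) -> str:
    """Return the last vowel of `word`, used here as a crude vowel-harmony cue."""
    for ch in reversed(word.lower()):
        if ch in VOWELS:
            return ch
    raise ValueError(f"no vowel found in {word!r}")


def locative(name: str, home: bool,
             interior_exceptions: frozenset = frozenset()) -> str:
    """Hypothetical helper: attach the 'in <place>' suffix to a placename.

    home: True if the place lies on the speaker's 'home' mental map.
    interior_exceptions: home names that still take the interior case.
    """
    harmony = last_vowel(name)
    stem = name
    if name[-1] in FINAL_LENGTHENING:
        stem = name[:-1] + FINAL_LENGTHENING[name[-1]]

    if home and name not in interior_exceptions:
        # Surface case (superessive): -n after a vowel, else -on / -en / -ön.
        if name[-1].lower() in VOWELS:
            return stem + "n"
        if harmony in BACK_VOWELS:
            return stem + "on"
        if harmony in FRONT_ROUNDED_VOWELS:
            return stem + "ön"
        return stem + "en"

    # Interior case (inessive): -ban after a back vowel, -ben otherwise.
    return stem + ("ban" if harmony in BACK_VOWELS else "ben")
```

Flipping `home` for the same name reproduces the minimal pairs quoted above, e.g. `locative("Velence", home=True)` gives Velencén and `locative("Velence", home=False)` gives Velencében; the Craiova/Koszovó question in the survey amounts to asking which value of `home` a speaker applies. Real Hungarian morphology has complications this sketch ignores: a neutral i or í after a back vowel, as in papír, still takes back suffixes (papírban), which the last-vowel rule gets wrong.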
As the mental maps of majority and minority Hungarians grow further apart, so their use of the placename suffixes will increasingly diverge. Two placenames were chosen to scratch the surface of this complex problem: Craiova (a city in Oltenia, Romania) and Kosovo (Hungarian Koszovó), an autonomous region in southeast Yugoslavia. The hypothesis to be tested was that both placenames would be used categorically with the inessive (interior) suffix by Hungarians in Hungary, but near-categorically with the superessive suffix (signalling "home") by Hungarians in Romania and Yugoslavia (Voivodina), while minority Hungarians in the other countries would show no difference from majority Hungarians in Hungary. In fact, the data show that, contrary to expectation, there is considerable variation within Hungary. Moreover, although Koszovó is used with the "home" suffix by 61% of the informants in Yugoslavia, as expected, the same suffix is used by an even higher percentage of the subjects in Slovenia. Mr. Kontra's team suggests that one factor at play here might be the persistence of the former Yugoslav mentality among the Hungarians of Slovenia, at least from a geographical point of view. The contact varieties of Hungarian also show important grammatical differences from Hungarian in Hungary. One of these concerns the variable use of null subjects (whether the subject pronoun is expressed or omitted). When informants were asked to insert either megkértem or megkértem őt ("I asked her") into a test sentence, 54.9% of the respondents in Ukraine chose the second phrase, compared with only 27.4% in Hungary. Although Mr. Kontra and his team concentrated more on the differences between contact Hungarian and Standard Hungarian, they also discovered a number of similarities. One such similarity is evident in the distribution of what Mr. Kontra calls an ongoing syntactic merger in Hungarian in Hungary. 
In effect, two existing constructions merge to form a third. For instance, the two sentences Valószínűleg külföldre fognak költözni and Valószínű, hogy külföldre fognak költözni merge to form the new construction Valószínűleg, hogy külföldre fognak költözni ("They will probably move abroad"). When asked to choose the most natural of the sentences, one respondent in four chose the new construction, and a chi-square test showed that the responses were homogeneous across the sample. In other words, this syntactic change is spreading across the entire Hungarian-speaking region of the Carpathian Basin. Mr. Kontra believes that politicians, educators and other interested parties now have reliable and up-to-date information about each Hungarian minority. An awareness of Hungarian as a pluricentric language is developing which raises the status of the contact varieties of Hungarian used by the minorities, an essential process, he believes, if minority languages are to be maintained.
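The chi-square test of homogeneity mentioned above compares, for each group of respondents, how many chose the new construction with how many would be expected if every group behaved alike. The sketch below computes Pearson's statistic and its degrees of freedom for such a contingency table; the function name and the table layout (one row per country, one column per response) are assumptions for illustration, and no survey counts are reproduced here.

```python
from typing import List, Tuple


def chi_square_homogeneity(table: List[List[float]]) -> Tuple[float, int]:
    """Pearson chi-square statistic and degrees of freedom for an r x c table.

    Rows are groups (e.g. respondents per country), columns are response
    categories (e.g. chose the merged construction / chose another sentence).
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)

    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if all groups shared the same response distribution.
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (observed - expected) ** 2 / expected

    dof = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, dof
```

A statistic below the chi-square critical value for `dof` degrees of freedom (for example 5.99 at the 5% level with 2 degrees of freedom) means homogeneity is not rejected, i.e. the groups do not differ detectably in how often they choose the new construction. `scipy.stats.chi2_contingency` returns the same statistic together with a p-value, though by default it applies Yates' continuity correction when there is only one degree of freedom.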