Abstract:
The city of Malaga underwent considerable growth in the 19th and 20th centuries. Its territorial expansion, paired with a massive influx of immigrants, occurred in three waves, and as a consequence the city remains divided into three distinct parts to this day. The differences between these three areas of the city lie in the types of housing, the cultural and industrial activities, the socioeconomic level and, most interestingly, also in speech. The aim of this study is therefore to examine the interrelation between speech (phonetic features) and urban space in Malaga. A combination of quantitative and qualitative analysis was used, based on two types of data: 1) production data stemming from recordings of 120 speakers; 2) perception data (salience, estimated frequency of use, attitude, spatial and social perception, imitation) collected from several surveys with 120 participants each. The results show that the speech production data divides the city of Malaga clearly into three parts, and this tripartition is confirmed by the analysis of the perception data. Moreover, the inhabitants of these three areas are perceived as different social types, to whom a range of social features is attributed. That is, certain linguistic features, the different neighbourhoods of the city and the social characteristics associated with them are undergoing a process of indexicalization and iconization. As a result, the linguistic features in question function as identity markers at the intraurban level.
Abstract:
OntoTag - A Linguistic and Ontological Annotation Model Suitable for the Semantic Web
1. INTRODUCTION. LINGUISTIC TOOLS AND ANNOTATIONS: THEIR LIGHTS AND SHADOWS
Computational Linguistics is already a consolidated research area. It builds upon the results of two other major areas, namely Linguistics and Computer Science and Engineering, and it aims at developing computational models of human language (or natural language, as it is termed in this area). Its most well-known applications are probably the various tools developed so far for processing human language, such as machine translation systems and speech recognizers or dictation programs.
These tools for processing human language are commonly referred to as linguistic tools. Apart from the examples mentioned above, there are also other types of linguistic tools that perhaps are not so well-known, but on which most of the other applications of Computational Linguistics are built. These other types of linguistic tools comprise POS taggers, natural language parsers and semantic taggers, amongst others. All of them can be termed linguistic annotation tools.
Linguistic annotation tools are important assets. In fact, POS and semantic taggers (and, to a lesser extent, also natural language parsers) have become critical resources for the computer applications that process natural language. Hence, any computer application that has to analyse a text automatically and ‘intelligently’ will include at least a module for POS tagging. The more an application needs to ‘understand’ the meaning of the text it processes, the more linguistic tools and/or modules it will incorporate and integrate.
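As a minimal illustration of such a module (a toy, dictionary-based sketch with an invented tagset, not the API of any actual tagger), POS tagging can be reduced to a lexicon lookup:

```python
# Toy dictionary-based POS tagger: a minimal stand-in for the POS-tagging
# module that most text-processing applications embed. The lexicon and
# tagset below are invented for the example.
LEXICON = {
    "the": "DET", "a": "DET",
    "cat": "NOUN", "dog": "NOUN", "mat": "NOUN",
    "sat": "VERB", "ran": "VERB",
    "on": "ADP",
}

def pos_tag(sentence):
    """Return (token, tag) pairs; out-of-lexicon words get the tag 'UNK'."""
    return [(tok, LEXICON.get(tok, "UNK")) for tok in sentence.lower().split()]

print(pos_tag("The cat sat on the mat"))
# [('the', 'DET'), ('cat', 'NOUN'), ('sat', 'VERB'), ('on', 'ADP'),
#  ('the', 'DET'), ('mat', 'NOUN')]
```

Real taggers add context-sensitive disambiguation and handling of unknown words, which is precisely where the error rates discussed below come from.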
However, linguistic annotation tools have still some limitations, which can be summarised as follows:
1. Normally, they perform annotations only at a certain linguistic level (that is, Morphology, Syntax, Semantics, etc.).
2. They usually introduce a certain rate of errors and ambiguities when tagging. This error rate ranges from 10 percent up to 50 percent of the units annotated for unrestricted, general texts.
3. Their annotations are most frequently formulated in terms of an annotation schema designed and implemented ad hoc.
A priori, it seems that the interoperation and the integration of several linguistic tools into an appropriate software architecture could most likely solve the limitations stated in (1). Besides, integrating several linguistic annotation tools and making them interoperate could also minimise the limitation stated in (2). Nevertheless, in the latter case, all these tools should produce annotations for a common level, which would have to be combined in order to correct their corresponding errors and inaccuracies. Yet, the limitation stated in (3) prevents both types of integration and interoperation from being easily achieved.
In addition, most high-level annotation tools rely on other, lower-level annotation tools and their outputs to generate their own. For example, sense-tagging tools (operating at the semantic level) often use POS taggers (operating at a lower level, i.e., the morphosyntactic one) to identify the grammatical category of the word or lexical unit they are annotating. Accordingly, if a faulty or inaccurate low-level annotation tool is to be used by another, higher-level one in its process, the errors and inaccuracies of the former should be minimised in advance. Otherwise, these errors and inaccuracies would be transferred to (and even magnified in) the annotations of the high-level annotation tool.
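This propagation effect can be sketched with a two-stage toy pipeline (all names and sense labels below are hypothetical, chosen only to make the mechanism concrete):

```python
# Toy pipeline showing error propagation: a faulty POS tag forces the
# downstream sense tagger into the wrong reading.
SENSES = {
    ("book", "NOUN"): "book/printed-volume",
    ("book", "VERB"): "book/make-a-reservation",
}

def faulty_pos_tagger(word):
    # Low-level error: always tags "book" as a NOUN, even in verbal uses.
    return "NOUN"

def sense_tagger(word, pos):
    # Higher-level tool: relies entirely on the POS tag handed down
    # by the lower level.
    return SENSES.get((word, pos), "unknown")

# In "Please book a flight", "book" is a verb, but the upstream error
# propagates: the sense tagger returns the printed-volume reading.
print(sense_tagger("book", faulty_pos_tagger("book")))
# book/printed-volume
```

The sense tagger itself is flawless here; the wrong output is entirely inherited from the stage below it.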
Therefore, it would be quite useful to find a way to
(i) correct or, at least, reduce the errors and the inaccuracies of lower-level linguistic tools;
(ii) unify the annotation schemas of different linguistic annotation tools or, more generally speaking, make these tools (as well as their annotations) interoperate.
Clearly, solving (i) and (ii) should ease the automatic annotation of web pages by means of linguistic tools, and their transformation into Semantic Web pages (Berners-Lee, Hendler and Lassila, 2001). Yet, as stated above, (ii) is a type of interoperability problem. Then again, ontologies (Gruber, 1993; Borst, 1997) have been successfully applied thus far to solve several interoperability problems. Hence, ontologies should also help solve the aforementioned problems and limitations of linguistic annotation tools.
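As a toy illustration of (ii), two ad hoc tagsets can be made to interoperate by mapping both onto a single shared vocabulary, in the spirit of an ontology-mediated mapping (every tag name below is invented for the example, not taken from any real tagger):

```python
# Sketch: unifying two ad hoc annotation schemas by mapping both onto a
# shared vocabulary of linguistic categories.
TAGGER_A_TO_COMMON = {"NN": "Noun", "VB": "Verb", "DT": "Determiner"}
TAGGER_B_TO_COMMON = {"noun": "Noun", "verb": "Verb", "det": "Determiner"}

def to_common(tagged, mapping):
    """Translate (token, tag) pairs into the shared vocabulary."""
    return [(tok, mapping[tag]) for tok, tag in tagged]

a_out = [("the", "DT"), ("cat", "NN"), ("sat", "VB")]
b_out = [("the", "det"), ("cat", "noun"), ("sat", "verb")]

# Once both outputs share one vocabulary, they can be compared tag by tag,
# e.g. to vote away individual tagger errors.
assert to_common(a_out, TAGGER_A_TO_COMMON) == to_common(b_out, TAGGER_B_TO_COMMON)
```

The comparison in the last line is exactly what limitation (3) blocks when each tool keeps its own ad hoc schema.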
Thus, to summarise, the main aim of the present work was to somehow combine these separate approaches, mechanisms and tools for annotation from Linguistics and Ontological Engineering (and the Semantic Web) into a sort of hybrid (linguistic and ontological) annotation model, suitable for both areas. This hybrid (semantic) annotation model should (a) benefit from the advances, models, techniques, mechanisms and tools of these two areas; (b) minimise (and even solve, when possible) some of the problems found in each of them; and (c) be suitable for the Semantic Web. The concrete goals that helped attain this aim are presented in the following section.
2. GOALS OF THE PRESENT WORK
As mentioned above, the main goal of this work was to specify a hybrid (that is, linguistically-motivated and ontology-based) model of annotation suitable for the Semantic Web (i.e. it had to produce a semantic annotation of web page contents). This entailed that the tags included in the annotations of the model had to (1) represent linguistic concepts (or linguistic categories, as they are termed in ISO/DCR (2008)), in order for this model to be linguistically-motivated; (2) be ontological terms (i.e., use an ontological vocabulary), in order for the model to be ontology-based; and (3) be structured (linked) as a collection of ontology-based
Abstract:
In the information society, large amounts of information are constantly being generated and transmitted, especially in the most natural way for humans, i.e., natural language. Social networks, blogs, forums, and Q&A sites form a dynamic, large knowledge repository. Thus, Web 2.0 contains structured data, but the largest amount of information is still expressed in natural language. Linguistic structures for text recognition enable the extraction of structured information from texts. However, the expressiveness of current structures is limited, as they have been designed with a strict order in their phrases, which limits their applicability to other languages and makes them more sensitive to grammatical errors. To overcome these limitations, in this paper we present a linguistic structure named "linguistic schema", with a richer expressiveness that introduces fewer implicit constraints over annotations.
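The contrast this abstract draws between strict-order patterns and less constrained ones can be sketched as follows (a toy approximation of the idea, not the authors' actual formalism):

```python
# Toy contrast: a strict-order extraction pattern vs. an order-insensitive
# one. The relaxed variant tolerates word-order variation, at the cost of
# some precision.
def strict_pattern(tokens, pattern):
    """Match only if the pattern words appear contiguously, in order."""
    n = len(pattern)
    return any(tokens[i:i + n] == pattern for i in range(len(tokens) - n + 1))

def relaxed_pattern(tokens, pattern):
    """Match if all pattern words appear, in any order."""
    return set(pattern) <= set(tokens)

tokens = "born was Turing in London".split()  # scrambled word order
print(strict_pattern(tokens, ["was", "born", "in"]))   # False: order differs
print(relaxed_pattern(tokens, ["was", "born", "in"]))  # True
```

A pattern that encodes fewer implicit ordering constraints, as in the relaxed variant, keeps matching when the phrase order changes, which is the property the abstract argues for across languages and noisy text.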
Abstract:
In this paper we describe the specification of a model for the semantically interoperable representation of language resources for sentiment analysis. The model integrates "lemon", an RDF-based model for the specification of ontology-lexica (Buitelaar et al., 2009), which is used increasingly for the representation of language resources as Linked Data, with Marl, an RDF-based model for the representation of sentiment annotations (Westerski et al., 2011; Sánchez-Rada et al., 2013).
Abstract:
The application of Linked Data technology to the publication of linguistic data promises to facilitate interoperability of these data and has led to the emergence of the so-called Linguistic Linked Data Cloud (LLD), in which linguistic data is published following the Linked Data principles. Three essential issues need to be addressed for such data to be easily exploitable by language technologies: i) appropriate machine-readable licensing information is needed for each dataset, ii) minimum quality standards for Linguistic Linked Data need to be defined, and iii) appropriate vocabularies for publishing Linguistic Linked Data resources are needed. We propose the notion of Licensed Linguistic Linked Data (3LD), in which different licensing models may co-exist, ranging from totally open licenses, through more restrictive ones, to completely closed datasets.
Abstract:
The conversion of text to speech is seen as an analysis of the input text to obtain a common underlying linguistic description, followed by a synthesis of the output speech waveform from this fundamental specification. Hence, the comprehensive linguistic structure serving as the substrate for an utterance must be discovered by analysis from the text. The pronunciation of individual words in unrestricted text is determined by morphological analysis or letter-to-sound conversion, followed by specification of the word-level stress contour. In addition, many text character strings, such as titles, numbers, and acronyms, are abbreviations for ordinary words, whose full forms must be derived. To further refine these pronunciations and to discover the prosodic structure of the utterance, the part of speech of each word must be computed, followed by a phrase-level parsing. From this structure the prosodic structure of the utterance can be determined, which is needed in order to specify the durational framework and fundamental frequency contour of the utterance. In discourse contexts, several factors such as the specification of new and old information, contrast, and pronominal reference can be used to further modify the prosodic specification. When the prosodic correlates have been computed and the segmental sequence is assembled, a complete input suitable for speech synthesis has been determined. Lastly, multilingual systems utilizing rule frameworks are mentioned, and future directions are characterized.
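The first analysis step described above, expanding abbreviations and numbers into ordinary words, can be sketched as a simple normalization pass (the expansion tables below are illustrative only; production TTS front ends use far larger, context-sensitive rules):

```python
# Toy text-normalization pass of a TTS front end: expanding abbreviations
# and small numbers into ordinary pronounceable words.
ABBREVIATIONS = {"Dr.": "Doctor", "St.": "Street"}
NUMBERS = {"2": "two", "3": "three", "10": "ten"}

def normalize(text):
    """Replace each token found in the expansion tables; keep the rest."""
    words = []
    for tok in text.split():
        tok = ABBREVIATIONS.get(tok, tok)
        tok = NUMBERS.get(tok, tok)
        words.append(tok)
    return " ".join(words)

print(normalize("Dr. Smith lives at 10 Elm St."))
# Doctor Smith lives at ten Elm Street
```

Note that even this toy case hides real ambiguity ("St." may be "Street" or "Saint", "Dr." may be "Doctor" or "Drive"), which is why the abstract ties normalization to the later part-of-speech and parsing stages.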
Abstract:
Linguistic systems are the human tools to understand reality. But is it possible to attain this reality? The reality that we perceive, is it just a fragmented reality of which we are part? In this paper the authors present an attempt to address this question from an epistemological and philosophical-linguistic point of view.
Abstract:
Reality contains information (the signifier) that becomes significances in the mind of the observer. Language is the human instrument to understand reality. But is it possible to attain this reality? Is there an absolute reality, as certain philosophical schools tell us? The reality that we perceive, is it just a fragmented reality of which we are part? The work that the authors present is an attempt to address this question from an epistemological, linguistic and logical-mathematical point of view.
Abstract:
The article analyzes the role of constitutional courts in Bosnia and Kosovo, both characterized by their partly internationalized membership, in the adjudication of cases that are highly controversial between the different ethno-political factions. The main focus is on the Constitutional Court of Bosnia, which presents one of the richest and most interesting examples of "lawfare" in divided societies. The concept of lawfare has been adapted here to refer to the continuation of political battles by ethno-political actors through legal means, in this case, constitutional adjudication. In Kosovo, the Constitutional Court has been an important defender of diversity, although its primary focus and merit lie in having contributed to the establishment of a concept of democracy close to the people of Kosovo. The article concludes that constitutional courts represent important institutions of internal conflict resolution in divided societies, which have been instrumental in shaping multiculturalism in these post-conflict societies divided by deep ethnic cleavages.
Abstract:
Basic grammatical categories may carry social meaning irrespective of their semantic content. In a set of four studies, we demonstrate that verbs – a basic linguistic category present and distinguishable in most languages – are related to the perception of agency, a fundamental dimension in social perception. In an archival analysis on actual language use in Polish and German, we found that targets stereotypically associated with high agency (men and young people) are presented in the immediate neighborhood of a verb more often than non-agentic social targets (women and old people). Moreover, in three experiments using a pseudo-word paradigm, verbs (but not adjectives and nouns) were consistently associated with agency (but not communion). These results provide consistent evidence that verbs, as grammatical vehicles of action, are linguistic markers of agency. In demonstrating meta-semantic effects of language, these studies corroborate the view of language as a social tool and of language as an integral part of social perception.