971 resultados para Language Model
Resumo:
This paper examines the adaptations of the writing system in Internet language in mainland China from a sociolinguistic perspective. A comparison is also made of the adaptations in mainland China with those that Su (2003) found in Taiwan. In Computer-Mediated Communication (CMC), writing systems are often adapted to compensate for their inherent inadequacies (such as difficulty in input). Su (2003) investigates the creative uses of the writing system on the electronic bulletin boards (BBS) of two college student organizations in Taipei, Taiwan, and identifies four popular and creative uses of the Chinese writing system: stylized English, stylized Taiwanese-accented Mandarin, stylized Taiwanese, and the recycling of a transliteration alphabet used in elementary education. According to Coupland (2001; cited in Su 2003), stylization is “the knowing deployment of culturally familiar styles and identities that are marked as deviating from those predictably associated with the current speaking context”. Within this framework and drawing on the data in previous publications on Internet language and online sources, this study identifies five types of adaptations in mainland China’s Internet language: stylized Mandarin (e.g., 漂漂 piāopiāo for 漂亮 ‘beautiful’), stylized dialect-accented Mandarin (e.g., 灰常 huīcháng for 非常 ‘very much’), stylized English (e.g., 伊妹儿 yīmèier for ‘email’), stylized initials (e.g., bt 变态 biàntài for ‘abnormal’; pk, short form for ‘player kill’), and stylized numbers (e.g., 9494 jiùshi jiùshi 就是就是 ‘that is it’). The Internet community is composed of highly mobile individuals and thus forms a weak-tie social network. According to Milroy and Milroy (1992), a social network with weak ties is often where language innovation takes place. Adaptations of the Chinese writing system in Internet language provide interesting evidence for the innovations within a weak-tie social network. Our comparison of adaptations in mainland China and Taiwan shows that, in maximizing the effectiveness and functionality of their communication, participants of Internet communication are confronted with different language resources and situations, including differences in Romanization systems, English proficiency level, and attitudes towards English usage. As argued by Milroy and Milroy (1992), a weak-tie social network model can bridge the social class and social network. In the Internet community, the degree of diversity of the stylized linguistic varieties indexes the virtual and/or social status of its participants: the more diversified one’s Internet language is, the higher is his/her virtual and/or social status.
Resumo:
Stemmatology, or the reconstruction of the transmission history of texts, is a field that stands particularly to gain from digital methods. Many scholars already take stemmatic approaches that rely heavily on computational analysis of the collated text (e.g. Robinson and O’Hara 1996; Salemans 2000; Heikkilä 2005; Windram et al. 2008 among many others). Although there is great value in computationally assisted stemmatology, providing as it does a reproducible result and allowing access to the relevant methodological process in related fields such as evolutionary biology, computational stemmatics is not without its critics. The current state-of-the-art effectively forces scholars to choose between a preconceived judgment of the significance of textual differences (the Lachmannian or neo-Lachmannian approach, and the weighted phylogenetic approach) or to make no judgment at all (the unweighted phylogenetic approach). Some basis for judgment of the significance of variation is sorely needed for medieval text criticism in particular. By this, we mean that there is a need for a statistical empirical profile of the text-genealogical significance of the different sorts of variation in different sorts of medieval texts. The rules that apply to copies of Greek and Latin classics may not apply to copies of medieval Dutch story collections; the practices of copying authoritative texts such as the Bible will most likely have been different from the practices of copying the Lives of local saints and other commonly adapted texts. It is nevertheless imperative that we have a consistent, flexible, and analytically tractable model for capturing these phenomena of transmission. In this article, we present a computational model that captures most of the phenomena of text variation, and a method for analysis of one or more stemma hypotheses against the variation model. We apply this method to three ‘artificial traditions’ (i.e. texts copied under laboratory conditions by scholars to study the properties of text variation) and four genuine medieval traditions whose transmission history is known or deduced in varying degrees. Although our findings are necessarily limited by the small number of texts at our disposal, we demonstrate here some of the wide variety of calculations that can be made using our model. Certain of our results call sharply into question the utility of excluding ‘trivial’ variation such as orthographic and spelling changes from stemmatic analysis.
Resumo:
When masculine forms are used to refer to men and women, this causes male-biased cognitive representations and behavioral consequences, as numerous studies have shown. This effect can be avoided or reduced with the help of gender-fair language. In this talk, we will present different approaches that aim at influencing people’s use of and attitudes towards gender-fair language. Firstly, we tested the influence of gender-fair input on people’s own use of gender-fair language. Based on Irmen and Linner’s (2005) adaptation of the scenario mapping and focus approach (Sanford & Garrod, 1998), we found that after reading a text with gender-fair forms women produced more gender-fair forms than women who read gender-neutral texts or texts containing masculine generics. Men were not affected. Secondly, we examined reactions to arguments which followed the Elaboration Likelihood Model (Petty &Cacioppo, 1986). We assumed that strong pros and cons would be more effective than weak arguments or control statements. The results indicated that strong pros could convince some, but not all participants, suggesting a complex interplay of diverse factors in reaction to attempts at persuasion. The influence of people’s initial characteristics will be discussed. Currently, we are investigating how self-generated refutations, in addition to arguments, may influence initial attitudes. Based on the resistance appraisal hypothesis (Tormala, 2008), we assume that individuals are encouraged in their initial attitude if they manage to refute strong counter-arguments. The results of our studies will be discussed regarding their practical implications.
Resumo:
Recent studies using diffusion tensor imaging (DTI) have advanced our knowledge of the organization of white matter subserving language function. It remains unclear, however, how DTI may be used to predict accurately a key feature of language organization: its asymmetric representation in one cerebral hemisphere. In this study of epilepsy patients with unambiguous lateralization on Wada testing (19 left and 4 right lateralized subjects; no bilateral subjects), the predictive value of DTI for classifying the dominant hemisphere for language was assessed relative to the existing standard-the intra-carotid Amytal (Wada) procedure. Our specific hypothesis is that language laterality in both unilateral left- and right-hemisphere language dominant subjects may be predicted by hemispheric asymmetry in the relative density of three white matter pathways terminating in the temporal lobe implicated in different aspects of language function: the arcuate (AF), uncinate (UF), and inferior longitudinal fasciculi (ILF). Laterality indices computed from asymmetry of high anisotropy AF pathways, but not the other pathways, classified the majority (19 of 23) of patients using the Wada results as the standard. A logistic regression model incorporating information from DTI of the AF, fMRI activity in Broca's area, and handedness was able to classify 22 of 23 (95.6%) patients correctly according to their Wada score. We conclude that evaluation of highly anisotropic components of the AF alone has significant predictive power for determining language laterality, and that this markedly asymmetric distribution in the dominant hemisphere may reflect enhanced connectivity between frontal and temporal sites to support fluent language processes. Given the small sample reported in this preliminary study, future research should assess this method on a larger group of patients, including subjects with bi-hemispheric dominance.
Resumo:
Software corpora facilitate reproducibility of analyses, however, static analysis for an entire corpus still requires considerable effort, often duplicated unnecessarily by multiple users. Moreover, most corpora are designed for single languages increasing the effort for cross-language analysis. To address these aspects we propose Pangea, an infrastructure allowing fast development of static analyses on multi-language corpora. Pangea uses language-independent meta-models stored as object model snapshots that can be directly loaded into memory and queried without any parsing overhead. To reduce the effort of performing static analyses, Pangea provides out-of-the box support for: creating and refining analyses in a dedicated environment, deploying an analysis on an entire corpus, using a runner that supports parallel execution, and exporting results in various formats. In this tool demonstration we introduce Pangea and provide several usage scenarios that illustrate how it reduces the cost of analysis.
Resumo:
When people use generic masculine language instead of more gender-inclusive forms, they communicate gender stereotypes and sometimes exclusion of women from certain social roles. Past research related gender-inclusive language use to sexist beliefs and attitudes. Given that this aspect of language use may be transparent to users, it is unclear whether people explicitly act on these beliefs when using gender-exclusive language forms or whether these are more implicit, habitual patterns. In two studies with German-speaking participants, we showed that spontaneous use of gender-inclusive personal nouns is guided by explicitly favorable intentions as well as habitual processes involving past use of such language. Further indicating the joint influence of deliberate and habitual processes, Study 2 revealed that language-use intentions are embedded in explicit sexist ideologies. As anticipated in our decision-making model, the effects of sexist beliefs on language emerged through deliberate mechanisms involving attitudes and intentions.
Resumo:
The aim was to examine to what extent the dimensions of the BPS map the five factors derived from the PANSS in order to explore the level of agreement of these alternative dimensional approaches in patients with schizophrenia. 149 inpatients with schizophrenia spectrum disorders were recruited. Psychopathological symptoms were assessed with the Bern Psychopathology Scale (BPS) and the Positive and Negative Syndrome Scale (PANSS). Linear regression analyses were conducted to explore the association between the factors and the items of the BPS. The robustness of patterns was evaluated. An understandable overlap of both approaches was found for positive and negative symptoms and excitement. The PANSS positive factor was associated with symptoms of the affect domain in terms of both inhibition and disinhibition, the PANSS negative factor with symptoms of all three domains of the BPS as an inhibition and the PANSS excitement factor with an inhibition of the affect domain and a disinhibition of the language and motor domains. The results show that here is only a partial overlap between the system-specific approach of the BPS and the five-factor PANSS model. A longitudinal assessment of psychopathological symptoms would therefore be of interest.
Resumo:
The goal of the present thesis was to investigate the production of code-switched utterances in bilinguals’ speech production. This study investigates the availability of grammatical-category information during bilingual language processing. The specific aim is to examine the processes involved in the production of Persian-English bilingual compound verbs (BCVs). A bilingual compound verb is formed when the nominal constituent of a compound verb is replaced by an item from the other language. In the present cases of BCVs the nominal constituents are replaced by a verb from the other language. The main question addressed is how a lexical element corresponding to a verb node can be placed in a slot that corresponds to a noun lemma. This study also investigates how the production of BCVs might be captured within a model of BCVs and how such a model may be integrated within incremental network models of speech production. In the present study, both naturalistic and experimental data were used to investigate the processes involved in the production of BCVs. In the first part of the present study, I collected 2298 minutes of a popular Iranian TV program and found 962 code-switched utterances. In 83 (8%) of the switched cases, insertions occurred within the Persian compound verb structure, hence, resulting in BCVs. As to the second part of my work, a picture-word interference experiment was conducted. This study addressed whether in the case of the production of Persian-English BCVs, English verbs compete with the corresponding Persian compound verbs as a whole, or whether English verbs compete with the nominal constituents of Persian compound verbs only. Persian-English bilinguals named pictures depicting actions in 4 conditions in Persian (L1). In condition 1, participants named pictures of action using the whole Persian compound verb in the context of its English equivalent distractor verb. In condition 2, only the nominal constituent was produced in the presence of the light verb of the target Persian compound verb and in the context of a semantically closely related English distractor verb. In condition 3, the whole Persian compound verb was produced in the context of a semantically unrelated English distractor verb. In condition 4, only the nominal constituent was produced in the presence of the light verb of the target Persian compound verb and in the context of a semantically unrelated English distractor verb. The main effect of linguistic unit was significant by participants and items. Naming latencies were longer in the nominal linguistic unit compared to the compound verb (CV) linguistic unit. That is, participants were slower to produce the nominal constituent of compound verbs in the context of a semantically closely related English distractor verb compared to producing the whole compound verbs in the context of a semantically closely related English distractor verb. The three-way interaction between version of the experiment (CV and nominal versions), linguistic unit (nominal and CV linguistic units), and relation (semantically related and unrelated distractor words) was significant by participants. In both versions, naming latencies were longer in the semantically related nominal linguistic unit compared to the response latencies in the semantically related CV linguistic unit. In both versions, naming latencies were longer in the semantically related nominal linguistic unit compared to response latencies in the semantically unrelated nominal linguistic unit. Both the analysis of the naturalistic data and the results of the experiment revealed that in the case of the production of the nominal constituent of BCVs, a verb from the other language may compete with a noun from the base language, suggesting that grammatical category does not necessarily provide a constraint on lexical access during the production of the nominal constituent of BCVs. There was a minimal context in condition 2 (the nominal linguistic unit) in which the nominal constituent was produced in the presence of its corresponding light verb. The results suggest that generating words within a context may not guarantee that the effect of grammatical class becomes available. A model is proposed in order to characterize the processes involved in the production of BCVs. Implications for models of bilingual language production are discussed.
Resumo:
The European Union has been promoting linguistic diversity for many years as one of its main educational goals. This is an element that facilitates student mobility and student exchanges between different universities and countries and enriches the education of young undergraduates. In particular, a higher degree of competence in the English language is becoming essential for engineers, architects and researchers in general, as English has become the lingua franca that opens up horizons to internationalisation and the transfer of knowledge in today’s world. Many experts point to the Integrated Approach to Contents and Foreign Languages System as being an option that has certain benefits over the traditional method of teaching a second language that is exclusively based on specific subjects. This system advocates teaching the different subjects in the syllabus in a language other than one’s mother tongue, without prioritising knowledge of the language over the subject. This was the idea that in the 2009/10 academic year gave rise to the Second Language Integration Programme (SLI Programme) at the Escuela Arquitectura Técnica in the Universidad Politécnica Madrid (EUATM-UPM), just at the beginning of the tuition of the new Building Engineering Degree, which had been adapted to the European Higher Education Area (EHEA) model. This programme is an interdisciplinary initiative for the set of subjects taught during the semester and is coordinated through the Assistant Director Office for Educational Innovation. The SLI Programme has a dual goal; to familiarise students with the specific English terminology of the subject being taught, and at the same time improve their communication skills in English. A total of thirty lecturers are taking part in the teaching of eleven first year subjects and twelve in the second year, with around 120 students who have voluntarily enrolled in a special group in each semester. During the 2010/2011 academic year the degree of acceptance and the results of the SLI Programme have been monitored. Tools have been designed to aid interdisciplinary coordination and to analyse satisfaction, such as coordination records and surveys. The results currently available refer to the first and second year and are divided into specific aspects of the different subjects involved and into general aspects of the ongoing experience.
Resumo:
OntoTag - A Linguistic and Ontological Annotation Model Suitable for the Semantic Web
1. INTRODUCTION. LINGUISTIC TOOLS AND ANNOTATIONS: THEIR LIGHTS AND SHADOWS
Computational Linguistics is already a consolidated research area. It builds upon the results of other two major ones, namely Linguistics and Computer Science and Engineering, and it aims at developing computational models of human language (or natural language, as it is termed in this area). Possibly, its most well-known applications are the different tools developed so far for processing human language, such as machine translation systems and speech recognizers or dictation programs.
These tools for processing human language are commonly referred to as linguistic tools. Apart from the examples mentioned above, there are also other types of linguistic tools that perhaps are not so well-known, but on which most of the other applications of Computational Linguistics are built. These other types of linguistic tools comprise POS taggers, natural language parsers and semantic taggers, amongst others. All of them can be termed linguistic annotation tools.
Linguistic annotation tools are important assets. In fact, POS and semantic taggers (and, to a lesser extent, also natural language parsers) have become critical resources for the computer applications that process natural language. Hence, any computer application that has to analyse a text automatically and ‘intelligently’ will include at least a module for POS tagging. The more an application needs to ‘understand’ the meaning of the text it processes, the more linguistic tools and/or modules it will incorporate and integrate.
However, linguistic annotation tools have still some limitations, which can be summarised as follows:
1. Normally, they perform annotations only at a certain linguistic level (that is, Morphology, Syntax, Semantics, etc.).
2. They usually introduce a certain rate of errors and ambiguities when tagging. This error rate ranges from 10 percent up to 50 percent of the units annotated for unrestricted, general texts.
3. Their annotations are most frequently formulated in terms of an annotation schema designed and implemented ad hoc.
A priori, it seems that the interoperation and the integration of several linguistic tools into an appropriate software architecture could most likely solve the limitations stated in (1). Besides, integrating several linguistic annotation tools and making them interoperate could also minimise the limitation stated in (2). Nevertheless, in the latter case, all these tools should produce annotations for a common level, which would have to be combined in order to correct their corresponding errors and inaccuracies. Yet, the limitation stated in (3) prevents both types of integration and interoperation from being easily achieved.
In addition, most high-level annotation tools rely on other lower-level annotation tools and their outputs to generate their own ones. For example, sense-tagging tools (operating at the semantic level) often use POS taggers (operating at a lower level, i.e., the morphosyntactic) to identify the grammatical category of the word or lexical unit they are annotating. Accordingly, if a faulty or inaccurate low-level annotation tool is to be used by other higher-level one in its process, the errors and inaccuracies of the former should be minimised in advance. Otherwise, these errors and inaccuracies would be transferred to (and even magnified in) the annotations of the high-level annotation tool.
Therefore, it would be quite useful to find a way to
(i) correct or, at least, reduce the errors and the inaccuracies of lower-level linguistic tools;
(ii) unify the annotation schemas of different linguistic annotation tools or, more generally speaking, make these tools (as well as their annotations) interoperate.
Clearly, solving (i) and (ii) should ease the automatic annotation of web pages by means of linguistic tools, and their transformation into Semantic Web pages (Berners-Lee, Hendler and Lassila, 2001). Yet, as stated above, (ii) is a type of interoperability problem. There again, ontologies (Gruber, 1993; Borst, 1997) have been successfully applied thus far to solve several interoperability problems. Hence, ontologies should help solve also the problems and limitations of linguistic annotation tools aforementioned.
Thus, to summarise, the main aim of the present work was to combine somehow these separated approaches, mechanisms and tools for annotation from Linguistics and Ontological Engineering (and the Semantic Web) in a sort of hybrid (linguistic and ontological) annotation model, suitable for both areas. This hybrid (semantic) annotation model should (a) benefit from the advances, models, techniques, mechanisms and tools of these two areas; (b) minimise (and even solve, when possible) some of the problems found in each of them; and (c) be suitable for the Semantic Web. The concrete goals that helped attain this aim are presented in the following section.
2. GOALS OF THE PRESENT WORK
As mentioned above, the main goal of this work was to specify a hybrid (that is, linguistically-motivated and ontology-based) model of annotation suitable for the Semantic Web (i.e. it had to produce a semantic annotation of web page contents). This entailed that the tags included in the annotations of the model had to (1) represent linguistic concepts (or linguistic categories, as they are termed in ISO/DCR (2008)), in order for this model to be linguistically-motivated; (2) be ontological terms (i.e., use an ontological vocabulary), in order for the model to be ontology-based; and (3) be structured (linked) as a collection of ontology-based
Resumo:
This article describes a knowledge-based method for generating multimedia descriptions that summarize the behavior of dynamic systems. We designed this method for users who monitor the behavior of a dynamic system with the help of sensor networks and make decisions according to prefixed management goals. Our method generates presentations using different modes such as text in natural language, 2D graphics and 3D animations. The method uses a qualitative representation of the dynamic system based on hierarchies of components and causal influences. The method includes an abstraction generator that uses the system representation to find and aggregate relevant data at an appropriate level of abstraction. In addition, the method includes a hierarchical planner to generate a presentation using a model with dis- course patterns. Our method provides an efficient and flexible solution to generate concise and adapted multimedia presentations that summarize thousands of time series. It is general to be adapted to differ- ent dynamic systems with acceptable knowledge acquisition effort by reusing and adapting intuitive rep- resentations. We validated our method and evaluated its practical utility by developing several models for an application that worked in continuous real time operation for more than 1 year, summarizing sen- sor data of a national hydrologic information system in Spain.
Resumo:
A new formalism, called Hiord, for defining type-free higherorder logic programming languages with predicate abstraction is introduced. A model theory, based on partial combinatory algebras, is presented, with respect to which the formalism is shown sound. A programming language built on a subset of Hiord, and its implementation are discussed. A new proposal for defining modules in this framework is considered, along with several examples.
Resumo:
Andorra-I is the first implementation of a language based on the Andorra Principie, which states that determinate goals can (and shonld) be run before other goals, and even in a parallel fashion. This principie has materialized in a framework called the Basic Andorra model, which allows or-parallelism as well as (dependent) and-parallelism for determinate goals. In this report we show that it is possible to further extend this model in order to allow general independent and-parallelism for nondeterminate goals, withont greatly modifying the underlying implementation machinery. A simple an easy way to realize such an extensión is to make each (nondeterminate) independent goal determinate, by using a special "bagof" constract. We also show that this can be achieved antomatically by compile-time translation from original Prolog programs. A transformation that fulfüls this objective and which can be easily antomated is presented in this report.
Resumo:
There have been several previous proposals for the integration of Object Oriented Programming features into Logic Programming, resulting in much support theory and several language proposals. However, none of these proposals seem to have made it into the mainstream. Perhaps one of the reasons for these is that the resulting languages depart too much from the standard logic programming languages to entice the average Prolog programmer. Another reason may be that most of what can be done with object-oriented programming can already be done in Prolog through the meta- and higher-order programming facilities that the language includes, albeit sometimes in a more cumbersome way. In light of this, in this paper we propose an alternative solution which is driven by two main objectives. The first one is to include only those characteristics of object-oriented programming which are cumbersome to implement in standard Prolog systems. The second one is to do this in such a way that there is minimum impact on the syntax and complexity of the language, i.e., to introduce the minimum number of new constructs, declarations, and concepts to be learned. Finally, we would like the implementation to be as straightforward as possible, ideally based on simple source to source expansions.
Resumo:
The term "Logic Programming" refers to a variety of computer languages and execution models which are based on the traditional concept of Symbolic Logic. The expressive power of these languages offers promise to be of great assistance in facing the programming challenges of present and future symbolic processing applications in Artificial Intelligence, Knowledge-based systems, and many other areas of computing. The sequential execution speed of logic programs has been greatly improved since the advent of the first interpreters. However, higher inference speeds are still required in order to meet the demands of applications such as those contemplated for next generation computer systems. The execution of logic programs in parallel is currently considered a promising strategy for attaining such inference speeds. Logic Programming in turn appears as a suitable programming paradigm for parallel architectures because of the many opportunities for parallel execution present in the implementation of logic programs. This dissertation presents an efficient parallel execution model for logic programs. The model is described from the source language level down to an "Abstract Machine" level suitable for direct implementation on existing parallel systems or for the design of special purpose parallel architectures. Few assumptions are made at the source language level and therefore the techniques developed and the general Abstract Machine design are applicable to a variety of logic (and also functional) languages. These techniques offer efficient solutions to several areas of parallel Logic Programming implementation previously considered problematic or a source of considerable overhead, such as the detection and handling of variable binding conflicts in AND-Parallelism, the specification of control and management of the execution tree, the treatment of distributed backtracking, and goal scheduling and memory management issues, etc. A parallel Abstract Machine design is offered, specifying data areas, operation, and a suitable instruction set. This design is based on extending to a parallel environment the techniques introduced by the Warren Abstract Machine, which have already made very fast and space efficient sequential systems a reality. Therefore, the model herein presented is capable of retaining sequential execution speed similar to that of high performance sequential systems, while extracting additional gains in speed by efficiently implementing parallel execution. These claims are supported by simulations of the Abstract Machine on sample programs.