962 results for Spanish as a foreign language (SFL)
Abstract:
This paper presents a dynamic language model (LM) adaptation based on the topic identified in a speech segment. We use latent semantic analysis (LSA) and the topic labels given in the training dataset to build the topic models. The proposed dynamic LM adaptation aims to improve recognition performance in a two-stage AST system. The final stage makes use of topic identification in two variants: the first uses only the most probable topic, while the second depends on the relative distances of the topics that have been identified. The LM is adapted by linear interpolation between a background model and a topic-based LM, and the interpolation weight is dynamically adapted according to different parameters. The proposed method is evaluated on the Spanish partition of the EPPS speech database, achieving a relative word error rate (WER) reduction of 11.13% over a baseline system that uses a single background LM.
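The interpolation step described above can be sketched as follows. This is a minimal Python sketch under stated assumptions, not the authors' implementation: topic scores are treated as similarities (higher means closer), the "relative distances" variant is modelled by simply normalizing those scores, and the `dynamic_lambda` rule (scaling a hypothetical maximum weight `lam_max` by the share of the top topic) is purely illustrative, since the abstract does not say which parameters drive the weight.

```python
from typing import Dict


def interpolate(p_background: float, p_topic: float, lam: float) -> float:
    """P(w|h) = lam * P_topic(w|h) + (1 - lam) * P_background(w|h)."""
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    return lam * p_topic + (1.0 - lam) * p_background


def most_probable_topic(scores: Dict[str, float]) -> str:
    """Variant 1: keep only the highest-scoring topic."""
    return max(scores, key=scores.get)


def topic_weights(scores: Dict[str, float]) -> Dict[str, float]:
    """Variant 2 (assumed form): normalize topic scores so they sum to 1."""
    total = sum(scores.values())
    return {topic: s / total for topic, s in scores.items()}


def dynamic_lambda(scores: Dict[str, float], lam_max: float = 0.6) -> float:
    """Hypothetical rule: trust the topic LM more when one topic dominates."""
    return lam_max * max(topic_weights(scores).values())


def adapted_prob(p_background: float, p_topics: Dict[str, float],
                 scores: Dict[str, float], lam: float,
                 use_all_topics: bool = False) -> float:
    """Probability of one word, given its history, under the adapted LM."""
    if use_all_topics:
        # Variant 2: mix all topic LMs according to their normalized scores.
        weights = topic_weights(scores)
        p_topic = sum(w * p_topics[t] for t, w in weights.items())
    else:
        # Variant 1: use only the most probable topic's LM.
        p_topic = p_topics[most_probable_topic(scores)]
    return interpolate(p_background, p_topic, lam)


# Toy probabilities of one word under each model (illustrative values only).
scores = {"economy": 3.0, "health": 1.0}
p_topics = {"economy": 0.02, "health": 0.005}
lam = dynamic_lambda(scores)
print(adapted_prob(0.01, p_topics, scores, lam))
print(adapted_prob(0.01, p_topics, scores, lam, use_all_topics=True))
```

Keeping the topic LM as a single interpolated component means the background model always contributes probability mass, so words unseen in the topic data are never assigned zero probability.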
Abstract:
In recent years, coinciding with adjustments to the Bologna process, many European universities have attempted to improve their international profile by increasing course offerings in English. According to the Institute of International Education (IIE), Spain has notably increased its English-taught higher education programs, ranking fifth among European countries by number of English-taught Master's programs in 2013. This article presents the goals and preliminary results of an ongoing educational innovation project (TechEnglish) that aims to promote course offerings in English at the Technical University of Madrid (Universidad Politécnica de Madrid, UPM). The UPM is the oldest and largest of all technical universities in Spain. It offers graduate and postgraduate programs that cover all engineering disciplines as well as architecture. Currently, the UPM has no specific bilingual/multilingual program to promote teaching in English, although there is an Educational Model Whitepaper (focused on undergraduate degrees) that promotes activities such as an International Semester or a unique shared curriculum. The TechEnglish project is an attempt to foster courses taught in English at 7 UPM Technical Schools, involving students and 80 faculty members. Four tasks were identified: (1) to design a university-wide framework to increase course offerings, (2) to identify administrative difficulties, (3) to increase the visibility of the courses offered, and (4) to disseminate the results of the project. First, to design a framework, we analyzed existing programs at other Spanish universities as well as other projects and efforts already under way at the UPM.
A total of 13 plans were analyzed and classified according to their relation to students (learning), professors (teaching), administration, course offerings, other actors/institutions within the university (e.g., language departments), funds and projects, dissemination activities, mobility plans and quality control. Second, to begin identifying administrative and organizational difficulties in implementing teaching in English, we estimated the current and potential course offerings at the undergraduate level at the UPM through a survey covering student, teacher and administrative demand, level of English, and willingness to work in English. Third, to make the course offerings more attractive to both Spanish and international students, we examined how the most prestigious universities in Spain and Europe try to improve the visibility of their academic offerings in English. Finally, to disseminate the results of the project, we created a web page and a workspace on the Moodle education platform and prepared conferences and workshops within the UPM. Preliminary results show that increasing course offerings in English is an important step in promoting the internationalization of the University. The main difficulties identified at the UPM concerned how to acknowledge/certify the departments, teachers or students involved in English-taught courses, how students should register for the courses, how departments should split and schedule the courses (Spanish and English), and the lack of qualified personnel. A concerted effort could be made to increase the visibility of English-taught programs offered online.
Abstract:
This thesis presents a translatological analysis of the Spanish proverbs collected by Charles Cahier in Quelque six mille proverbes et aphorismes usuels empruntés à notre âge et aux siècles derniers. Proverbs and other sententious sayings are part of our day-to-day life, and they are used more or less intensively depending on the culture and type of speech. They have always existed in every civilisation, and there is no denying that their purpose is to convey past experience. They have been quoted by major philosophers and writers of all periods, and, as a result of the interest they have aroused, books of proverbs have been published for many centuries in many countries. Proverbs can be found everywhere, in professional and personal settings alike, or in a conversation between friends. In France, these sayings are more common in literature than in spoken language, whereas in Spain proverbs appear at all levels of communication. In this regard, comparing the translations of international works brings to light a number of misunderstandings in the interpretation of paremiological elements, which is why translating proverbs is a genuinely complex problem. This thesis, aimed at Spanish and French speakers (both native and non-native), has a double application (translatological and linguistic) and falls within the field of translatological and comparative paremiology...
Abstract:
This paper reflects on the increasing diversity of the United States and the resulting need for mental health providers who can offer psychotherapy in more than one language. A review of the current literature on clinicians who provide bilingual services highlights the challenges and rewards of working in a second language. This literature focuses on clinicians who are bilingual in English and Spanish; however, there is little to no research on clinicians who can provide psychotherapy in three languages. The author describes her experience growing up in a bilingual Vietnamese-English household in Southern California and her journey to becoming fluent in Spanish. Lastly, she offers recommendations to training programs on how to support trainees who aim to provide psychotherapy services in multiple languages.
Abstract:
Abundant research has shown that poverty negatively influences young children's academic and psychosocial development, and disparities in school readiness between low- and high-income children can be seen as early as the first year of life. The largest federal early care and education intervention for these vulnerable children is Early Head Start (EHS). To reduce these disparities in child outcomes, EHS seeks to provide flexible, community-based programming for infants, toddlers and their families. Because these programs have been offered only relatively recently, little is known about the nuances of how EHS affects infant and toddler language and psychosocial development. Using a Community Based Participatory Research (CBPR) framework, this paper had five goals: 1) to characterize the associations between domain-specific and cumulative risk and child outcomes; 2) to validate and explore these risk-outcome associations separately for children of Hispanic immigrants (COHIs); 3) to explore relationships among family characteristics, multiple environmental factors and dosage patterns in different EHS program types; 4) to examine the relationship between EHS dosage and child outcomes; and 5) to examine how EHS compliance affects child internalizing and externalizing behaviors and emerging language abilities. The results showed that risks were differentially related to child outcomes. Poor maternal mental health was related to child internalizing and externalizing behaviors but not to emerging child language skills, which were instead related to economic hardship. Additionally, parent-level Spanish use and heritage orientation were associated with positive child outcomes. These relationships also differed when COHIs and children with native-born parents were examined separately.
Further, distinct patterns emerged in EHS program use; for example, families who participated in home-based care were less likely to comply with EHS attendance requirements. These findings offer tangible suggestions for EHS stakeholders, namely the need to develop effective programming that targets engagement of the diverse families enrolled in EHS programs.
Abstract:
This paper presents a system for recognizing temporal expressions in Spanish and resolving their temporal reference. Identification and recognition of temporal expressions are based on a temporal expression grammar, and resolution on an inference engine that holds the information needed to perform date operations on the recognized expressions. To support further processing, the output is expressed as XML tags that add standard information about the resolution obtained. Other kinds of temporal expression annotation are described in [WILSON2001][KATZ2001]. The evaluation of our proposal yielded successful results.
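The grammar-plus-resolution pipeline described above can be illustrated with a toy example. The sketch below is an assumption throughout: a handful of regular-expression rules stand in for the temporal expression grammar, a day offset from a reference date stands in for the inference engine, and the `TIMEX`/`VAL` tag and attribute names are chosen for illustration only, not taken from the paper.

```python
import re
from datetime import date, timedelta

# Tiny lexicon of spelled-out numbers (hypothetical, for illustration).
NUMBERS = {"un": 1, "dos": 2, "tres": 3}
NUM = r"(\d+|un|dos|tres)"


def _count(token: str) -> int:
    token = token.lower()
    return int(token) if token.isdigit() else NUMBERS[token]


# Hypothetical mini-grammar: each rule pairs a pattern with a function that
# resolves a match to a day offset relative to the reference date.
# Longer expressions come first so they take priority over their substrings.
RULES = [
    (re.compile(r"\bpasado mañana\b", re.I), lambda m: 2),
    (re.compile(r"\bmañana\b", re.I), lambda m: 1),
    (re.compile(r"\bhoy\b", re.I), lambda m: 0),
    (re.compile(r"\bayer\b", re.I), lambda m: -1),
    (re.compile(r"\bhace " + NUM + r" días?\b", re.I),
     lambda m: -_count(m.group(1))),
    (re.compile(r"\bdentro de " + NUM + r" días?\b", re.I),
     lambda m: _count(m.group(1))),
]


def annotate(text: str, ref: date) -> str:
    """Wrap each recognized expression in a tag carrying its resolved date."""
    spans = []  # (start, end, resolved date)
    for pattern, offset in RULES:
        for m in pattern.finditer(text):
            # Skip matches overlapping a span claimed by an earlier rule.
            if any(m.start() < e and s < m.end() for s, e, _ in spans):
                continue
            spans.append((m.start(), m.end(), ref + timedelta(days=offset(m))))
    out, pos = [], 0
    for s, e, d in sorted(spans):
        out.append(text[pos:s])
        out.append(f'<TIMEX VAL="{d.isoformat()}">{text[s:e]}</TIMEX>')
        pos = e
    out.append(text[pos:])
    return "".join(out)


print(annotate("Llegó ayer y se irá pasado mañana.", date(2024, 3, 1)))
```

Because rules are tried in order and overlapping matches are skipped, `pasado mañana` is tagged as a whole rather than as `mañana`. A realistic grammar would also have to handle ambiguity, for instance *mañana* used to mean "morning".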
Abstract:
In this paper we present a complete Natural Language Processing (NLP) system for Spanish. The core of the system is the parser, which uses the Lexical-Functional Grammar (LFG) formalism. Another important component is the anaphora resolution module, which uses a method based on linguistic information (lexical, morphological, syntactic and semantic), structural information (the anaphoric accessibility space in which the anaphor finds its antecedent) and statistical information. This method is based on constraints and preferences, and it resolves pronouns and definite descriptions. Moreover, the system handles the features of both dialogue and non-dialogue discourse. The anaphora resolution module relies on several resources, such as a lexical database (Spanish WordNet) that provides semantic information and a POS tagger that provides the part of speech and root of each word, which facilitates the resolution process.
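The constraints-and-preferences strategy mentioned above can be sketched in a few lines. The features used here (gender/number agreement as the constraint, same-sentence and subject preferences with made-up weights, a two-sentence accessibility window, and recency as the tie-breaker) are hypothetical simplifications chosen for illustration; the abstract does not list the system's actual rules.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Mention:
    text: str
    gender: str        # "m" or "f"
    number: str        # "sg" or "pl"
    sentence: int      # index of the sentence containing the mention
    is_subject: bool = False


def accessible(anaphor: Mention, cand: Mention) -> bool:
    """Assumed accessibility space: same or immediately preceding sentence."""
    return anaphor.sentence - cand.sentence in (0, 1)


def agrees(anaphor: Mention, cand: Mention) -> bool:
    """Constraint: morphological agreement in gender and number."""
    return anaphor.gender == cand.gender and anaphor.number == cand.number


def preference_score(anaphor: Mention, cand: Mention) -> int:
    """Preferences (hypothetical weights): same sentence, then subject role."""
    score = 0
    if cand.sentence == anaphor.sentence:
        score += 2
    if cand.is_subject:
        score += 1
    return score


def resolve(anaphor: Mention, candidates: List[Mention]) -> Optional[Mention]:
    """Filter candidates by constraints, then pick the best by preferences.

    `candidates` are assumed to precede the anaphor, in textual order, so a
    higher index means a more recent mention (used to break ties).
    """
    survivors = [c for c in candidates
                 if accessible(anaphor, c) and agrees(anaphor, c)]
    if not survivors:
        return None
    best = max(range(len(survivors)),
               key=lambda i: (preference_score(anaphor, survivors[i]), i))
    return survivors[best]


# "Juan compró unos libros. Los leyó."
juan = Mention("Juan", "m", "sg", 0, is_subject=True)
libros = Mention("libros", "m", "pl", 0)
los = Mention("Los", "m", "pl", 1)
print(resolve(los, [juan, libros]).text)
```

Constraints act as hard filters, whereas preferences only rank the survivors, so the resolver still proposes an antecedent when no candidate satisfies every preference.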
Abstract:
Paper presented at the Cross-Language Evaluation Forum (CLEF 2008), Aarhus, Denmark, September 17-19, 2008.
Abstract:
There is no question today about the powerful international status of English on a global scale and, consequently, about its presence at different levels in non-English-speaking countries. Linguistically speaking, English is one of the languages that have most influenced Spanish throughout its history, especially since the late 1960s. This study considers the impact of English on Spanish in the language of sports; in particular, it analyses sports Anglicisms and false Anglicisms. Due attention is paid to the different forms that an Anglicism may adopt and to which of those forms are more widely accepted or rejected by prescriptivists and by speakers at large, in light of a contrastive analysis of their appearance in the Nuevo diccionario de anglicismos, the Diccionario de la Real Academia Española and the Corpus de Referencia del Español Actual.
Abstract:
In this paper we describe Fénix, a data model for exchanging information between Natural Language Processing applications. The proposed format is intended to be flexible enough to cover both current and future data structures used in Computational Linguistics. The Fénix architecture is divided into four separate layers: conceptual, logical, persistence and physical. This division provides a simple interface that abstracts users from low-level implementation details, such as the programming languages and data storage employed, allowing them to focus on the concepts and processes to be modelled. The architecture is accompanied by a set of programming libraries that facilitate access to and manipulation of the structures created in this framework. We also show how the architecture has already been applied successfully in several research projects.
Abstract:
The English language and the Internet, both separately and together, are now widely acknowledged as powerful forces that influence the lexico-grammatical characteristics of other languages worldwide. In fact, many authors, such as Crystal (2004), have pointed out the emergence of so-called Netspeak, that is, the language used on the Net or World Wide Web; as Crystal himself (2004: 19) puts it, ‘a type of language displaying features that are unique to the Internet […] arising out of its character as a medium which is electronic, global and interactive’. This ‘language’, however, may be understood in two ways: either as an adaptation of the English language proper to Internet requirements and purposes, or as a new, rapidly changing and developing language resulting from the rapid evolution or adaptation of almost all world languages to Internet requirements, with English as the trendsetter. If the second, and probably more plausible, interpretation is adopted, ‘Netspeak’ has three salient features: (a) the rapid spread of all its new linguistic developments through the Internet itself, which may lead to the generalization and widespread acceptance of new words, coinages or meanings hundreds of times faster than was the case with print media; (b) as noted above, the visible influence of English, the most prevalent language on the Internet; and, consequently, (c) a tendency to reduce the ‘distance’ between English and other languages, as well as the unfamiliarity of speakers of other languages with English, since the ‘Netspeak’ version of those languages adopts grammatical, syntactic and lexical features of English. Thus, linguistic differences may even disappear when code-switching and/or borrowing occurs, as whole fragments of English appear in other-language contexts.
As a consequence of this new situation, an ideal context emerges for interlanguage or multilingual word formation to thrive: puns, blends, compounds and word creativity in general find on the web the ideal place to gain rapid acceptance worldwide, whether through fashion, coincidence or the sheer merit of the new linguistic proposals.
Abstract:
Hospitals attached to the Spanish Ministry of Health currently use the International Classification of Diseases, Ninth Revision, Clinical Modification (ICD9-CM) to classify health discharge records. At present, this coding is done manually by experts. This paper tackles the automatic classification of real discharge records in Spanish according to the ICD9-CM standard. The challenge is that the discharge records are written in spontaneous language. We explore several machine learning techniques for this classification problem; Random Forest proved the most competitive, achieving an F-measure of 0.876.
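The abstract does not state how its F-measure is averaged over records and codes. As an illustration only, the sketch below computes micro-averaged precision, recall and F-measure for the case where each discharge record may receive several ICD9-CM codes; both the multi-label setting and micro-averaging are assumptions, and the codes in the example are placeholders rather than data from the paper.

```python
from typing import Iterable, Set, Tuple


def f_measure(precision: float, recall: float, beta: float = 1.0) -> float:
    """F_beta = (1 + beta^2) * P * R / (beta^2 * P + R); beta = 1 gives F1."""
    if precision == 0.0 and recall == 0.0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


def micro_prf(gold: Iterable[Set[str]],
              pred: Iterable[Set[str]]) -> Tuple[float, float, float]:
    """Pool true/false positives and false negatives over all records."""
    tp = fp = fn = 0
    for g, p in zip(gold, pred):
        tp += len(g & p)   # codes correctly assigned
        fp += len(p - g)   # codes assigned but not in the gold standard
        fn += len(g - p)   # gold codes that were missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall, f_measure(precision, recall)


# Two toy records with placeholder ICD9-CM codes.
gold = [{"250.00", "401.9"}, {"486"}]
pred = [{"250.00"}, {"486", "401.9"}]
print(micro_prf(gold, pred))
```

Micro-averaging weights frequent codes more heavily than rare ones; a macro-averaged F-measure, which averages per-code scores, would typically give a different figure on the same predictions.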
Abstract:
This introduction provides an overview of the state of the art in Applications of Natural Language to Information Systems. Specifically, we analyze the need for such technologies to address the new challenges of modern information systems, in which exploiting the Web as a main data source for business systems has become a key requirement. We also discuss why Human Language Technologies themselves have shifted their focus to new areas of interest directly linked to the development of technology for processing and understanding Web 2.0 content. These new technologies are expected to become the interfaces of future information systems. Moreover, we review current topics of interest to this research community and present the manuscripts chosen by the NLDB 2011 program committee as representative cornerstone research works, highlighting their contribution to the advancement of such technologies.
Abstract:
This paper addresses the automatic recognition and classification of temporal expressions and events in human language. Efficacy in these tasks is crucial if the broader task of temporal information processing is to be performed successfully. We analyze whether applying semantic knowledge to these tasks improves the performance of current approaches. To this end, we present and evaluate a data-driven approach as part of a system, TIPSem. Our approach uses lexical semantics and semantic roles as additional information to extend classical approaches, which are based principally on morphosyntax. The results obtained for English show that semantic knowledge aids temporal expression and event recognition, achieving error reductions of 59% and 21%, respectively, whereas its contribution to classification is limited. From the analysis of the results, we conclude that applying semantic knowledge leads to more general models and aids in recognizing temporal entities that are ambiguous at shallower levels of language analysis. We also found that lexical semantics and semantic roles have complementary advantages and that it is useful to combine them. Finally, we carried out the same analysis for Spanish and obtained comparable advantages. This supports the hypothesis that the proposed semantic knowledge may be useful for different languages.
Abstract:
Natural Language Interfaces to Query Databases (NLIDBs) have been an active research field since the 1960s, yet they have not been widely adopted. This article explores some of the biggest challenges in building NLIDBs and the approaches taken to address them, and it proposes techniques to reduce implementation and adoption costs. It describes AskMe*, a new system that leverages some of these approaches and adds an innovative feature: query-authoring services, which lower the entry barrier for end users. Experiments demonstrate the advantages of these approaches: even though AskMe* can be automatically reconfigured for multiple domains, its accuracy is comparable to that of domain-specific NLIDBs.