18 resultados para Portuguese as a non-native language
em Universidad Politécnica de Madrid
Resumo:
For most of us, speaking in a non-native language involves deviating to some extent from native pronunciation norms. However, the detailed basis for foreign accent (FA) remains elusive, in part due to methodological challenges in isolating segmental from suprasegmental factors. The current study examines the role of segmental features in conveying FA through the use of a generative approach in which accent is localised to single consonantal segments. Three techniques are evaluated: the first requires a highly-proficiency bilingual to produce words with isolated accented segments; the second uses cross-splicing of context-dependent consonants from the non-native language into native words; the third employs hidden Markov model synthesis to blend voice models for both languages. Using English and Spanish as the native/non-native languages respectively, listener cohorts from both languages identified words and rated their degree of FA. All techniques were capable of generating accented words, but to differing degrees. Naturally-produced speech led to the strongest FA ratings and synthetic speech the weakest, which we interpret as the outcome of over-smoothing. Nevertheless, the flexibility offered by synthesising localised accent encourages further development of the method.
Resumo:
Management of certain populations requires the preservation of its pure genetic background. When, for different reasons, undesired alleles are introduced, the original genetic conformation must be recovered. The present study tested, through computer simulations, the power of recovery (the ability for removing the foreign information) from genealogical data. Simulated scenarios comprised different numbers of exogenous individuals taking partofthe founder population anddifferent numbers of unmanaged generations before the removal program started. Strategies were based on variables arising from classical pedigree analyses such as founders? contribution and partial coancestry. The ef?ciency of the different strategies was measured as the proportion of native genetic information remaining in the population. Consequences on the inbreeding and coancestry levels of the population were also evaluated. Minimisation of the exogenous founders? contributions was the most powerful method, removing the largest amount of genetic information in just one generation.However, as a side effect, it led to the highest values of inbreeding. Scenarios with a large amount of initial exogenous alleles (i.e. high percentage of non native founders), or many generations of mixing became very dif?cult to recover, pointing out the importance of being careful about introgression events in population
Resumo:
This paper describes the participation of DAEDALUS at the LogCLEF lab in CLEF 2011. This year, the objectives of our participation are twofold. The first topic is to analyze if there is any measurable effect on the success of the search queries if the native language and the interface language chosen by the user are different. The idea is to determine if this difference may condition the way in which the user interacts with the search application. The second topic is to analyze the user context and his/her interaction with the system in the case of successful queries, to discover out any relation among the user native language, the language of the resource involved and the interaction strategy adopted by the user to find out such resource. Only 6.89% of queries are successful out of the 628,607 queries in the 320,001 sessions with at least one search query in the log. The main conclusion that can be drawn is that, in general for all languages, whether the native language matches the interface language or not does not seem to affect the success rate of the search queries. On the other hand, the analysis of the strategy adopted by users when looking for a particular resource shows that people tend to use the simple search tool, frequently first running short queries build up of just one specific term and then browsing through the results to locate the expected resource
Resumo:
El paradigma de procesamiento de eventos CEP plantea la solución al reto del análisis de grandes cantidades de datos en tiempo real, como por ejemplo, monitorización de los valores de bolsa o el estado del tráfico de carreteras. En este paradigma los eventos recibidos deben procesarse sin almacenarse debido a que el volumen de datos es demasiado elevado y a las necesidades de baja latencia. Para ello se utilizan sistemas distribuidos con una alta escalabilidad, elevado throughput y baja latencia. Este tipo de sistemas son usualmente complejos y el tiempo de aprendizaje requerido para su uso es elevado. Sin embargo, muchos de estos sistemas carecen de un lenguaje declarativo de consultas en el que expresar la computación que se desea realizar sobre los eventos recibidos. En este trabajo se ha desarrollado un lenguaje declarativo de consultas similar a SQL y un compilador que realiza la traducción de este lenguaje al lenguaje nativo del sistema de procesamiento masivo de eventos. El lenguaje desarrollado en este trabajo es similar a SQL, con el que se encuentran familiarizados un gran número de desarrolladores y por tanto aprender este lenguaje no supondría un gran esfuerzo. Así el uso de este lenguaje logra reducir los errores en ejecución de la consulta desplegada sobre el sistema distribuido al tiempo que se abstrae al programador de los detalles de este sistema.---ABSTRACT---The complex event processing paradigm CEP has become the solution for high volume data analytics which demand scalability, high throughput, and low latency. Examples of applications which use this paradigm are financial processing or traffic monitoring. A distributed system is used to achieve the performance requisites. These same requisites force the distributed system not to store the events but to process them on the fly as they are received. These distributed systems are complex systems which require a considerably long time to learn and use. The majority of such distributed systems lack a declarative language in which to express the computation to perform over incoming events. In this work, a new SQL-like declarative language and a compiler have been developed. This compiler translates this new language to the distributed system native language. Due to its similarity with SQL a vast amount of developers who are already familiar with SQL will need little time to learn this language. Thus, this language reduces the execution failures at the time the programmer no longer needs to know every single detail of the underlying distributed system to submit a query.
Resumo:
Esta tesis doctoral, que es la culminación de mis estudios de doctorado impartidos por el Departamento de Lingüística Aplicada a la Ciencia y a la Tecnología de la Universidad Politécnica de Madrid, aborda el análisis del uso de la matización (hedging) en el lenguaje legal inglés siguiendo los postulados y principios de la análisis crítica de género (Bhatia, 2004) y empleando las herramientas de análisis de córpora WordSmith Tools versión 6 (Scott, 2014). Como refleja el título, el estudio se centra en la descripción y en el análisis contrastivo de las variedades léxico-sintácticas de los matizadores del discurso (hedges) y las estrategias discursivas que con ellos se llevan a cabo, además de las funciones que éstas desempeñan en un corpus de sentencias del Tribunal Supremo de EE. UU., y de artículos jurídicos de investigación americanos, relacionando, en la medida posible, éstas con los rasgos determinantes de los dos géneros, desde una perspectiva socio-cognitiva. El elemento innovador que ofrece es que, a pesar de los numerosos estudios que se han podido realizar sobre los matizadores del discurso en el inglés general (Lakoff, 1973; Hübler, 1983; Clemen, 1997; Markkanen and Schröder, 1997; Mauranen, 1997; Fetzer 2010; y Finnegan, 2010 entre otros) académico (Crompton, 1997; Meyer, 1997; Skelton, 1997; Martín Butragueňo, 2003) científico (Hyland, 1996a, 1996c, 1998c, 2007; Grabe and Kaplan, 1997; Salager-Meyer, 1997 Varttala, 2001) médico (Prince, 1982; Salager-Meyer, 1994; Skelton, 1997), y, en menor medida el inglés legal (Toska, 2012), no existe ningún tipo de investigación que vincule los distintos usos de la matización a las características genéricas de las comunicaciones profesionales. Dentro del lenguaje legal, la matización confirma su dependencia tanto de las expectativas a macro-nivel de la comunidad de discurso, como de las intenciones a micro-nivel del escritor de la comunicación, variando en función de los propósitos comunicativos del género ya sean éstos educativos, pedagógicos, interpersonales u operativos. El estudio pone de relieve el uso predominante de los verbos modales epistémicos y de los verbos léxicos como matizadores del discurso, estos últimos divididos en cuatro tipos (Hyland 1998c; Palmer 1986, 1990, 2001) especulativos, citativos, deductivos y sensoriales. La realización léxico-sintáctica del matizador puede señalar una de cuatro estrategias discursivas particulares (Namsaraev, 1997; Salager-Meyer, 1994), la indeterminación, la despersonalización, la subjectivisación, o la matización camuflada (camouflage hedging), cuya incidencia y función varia según género. La identificación y cuantificación de los distintos matizadores y estrategias empleados en los diferentes géneros del discurso legal puede tener implicaciones pedagógicos para los estudiantes de derecho no nativos que tienen que demostrar una competencia adecuada en su uso y procesamiento. ABSTRACT This doctoral thesis, which represents the culmination of my doctoral studies undertaken in the Department of Linguistics Applied to Science and Technology of the Universidad Politécnica de Madrid, focusses on the analysis of hedging in legal English following the principles of Critical Genre Analysis (Bhatia, 2004), and using WordSmith Tools version 6 (Scott, 2014) corpus analysis tools. As the title suggests, this study centers on the description and contrastive analysis of lexico-grammatical realizations of hedges and the discourse strategies which they can indicate, as well as the functions they can carry out, in a corpus of U.S. Supreme Court opinions and American law review articles. The study relates realization, incidence and function of hedging to the predominant generic characteristics of the two genres from a socio-cognitive perspective. While there have been numerous studies on hedging in general English (Lakoff, 1973; Hübler, 1983; Clemen, 1997; Markkanen and Schröder, 1997; Mauranen, 1997; Fetzer 2010; and Finnegan, 2010 among others) academic English (Crompton, 1997; Meyer, 1997; Skelton, 1997; Martín Butragueňo, 2003) scientific English (Hyland, 1996a, 1996c, 1998c, 2007; Grabe and Kaplan, 1997; Salager-Meyer, 1997 Varttala, 2001) medical English (Prince, 1982; Salager-Meyer, 1994; Skelton, 1997), and, to a lesser degree, legal English (Toska, 2012), this study is innovative in that it links the different realizations and functions of hedging to the generic characteristics of a particular professional communication. Within legal English, hedging has been found to depend on not only the macro-level expectations of the discourse community for a specific genre, but also on the micro-level intentions of the author of a communication, varying according to the educational, pedagogical, interpersonal or operative purposes the genre may have. The study highlights the predominance of epistemic modal verbs and lexical verbs as hedges, dividing the latter into four types (Hyland, 1998c; Palmer, 1986, 1990, 2001): speculative, quotative, deductive and sensorial. Lexical-grammatical realizations of hedges can signal one of four discourse strategies (Namsaraev, 1997; Salager-Meyer, 1994), indetermination, depersonalization, subjectivization and camouflage hedging, as well as fulfill a variety of functions. The identification and quantification of the different hedges and hedging strategies and functions in the two genres may have pedagogical implications for non-native law students who must demonstrate adequate competence in the production and interpretation of hedged discourse.
Resumo:
El objetivo del presente trabajo es el estudio, diseño e implementación de una herramienta software, con interfaz gráfica de usuario, que permita aplicar diversas técnicas de análisis de textos de forma simple. Las técnicas de análisis, que serán implementadas en la herramienta, extraerán información de textos escritos en un lenguaje humano, es decir un lenguaje no artificial, y se le presentará al usuario. La herramienta permite la obtención de tres tipos de información: categorías a las que pertenece un texto, dentro de un conjunto de categorías predeterminadas; grupos de textos que son similares entre sí; y la polaridad de opinión expresada en un texto hacia el tema u objeto del que trata, que puede ser neutra, positiva o negativa.---ABSTRACT---The aim of this work is to study, design and implement a software tool, with graphical user interface, which will enable a user to easily apply various text analysis techniques. The techniques implemented in the tool will extract information from texts written in natural language, i.e. a non artificial language, and will present it to the user. The tool will extract three different types of information about a given set of texts: their categories (from a predefined set of categories), groups of similar texts, the polarity of the attitude expressed in the texts towards their topic.
Resumo:
Ozone (O3) phytototoxicity has been reported on a wide range of plantspecies, inducing the appearance of specific foliar injury or increasing leaf senescence. No information regarding the sensitivity of plantspecies from dehesa Mediterranean grasslands has been provided in spite of their great biological diversity. A screening study was carried out in open-top chambers (OTCs) to assess the O3-sensitivity of 22 representative therophytes of these ecosystems based on the appearance and extent of foliar injury. A distinction was made between specific O3injury and non-specific discolorations. Three O3 treatments (charcoal-filtered air, non-filtered air and non-filtered air supplemented with 40 nl l−1 O3 during 5 days per week) and three OTCs per treatment were used. The Papilionaceae species were more sensitive to O3 than the Poaceae species involved in the experiment since ambient levels induced foliar symptoms in 67% and 27%, respectively, of both plant families. An O3-sensitivity ranking of the species involved in the assessment is provided, which could be useful for bioindication programmes in Mediterranean areas. The assessed Trifoliumspecies were particularly sensitive since foliar symptoms were apparent in association with O3 accumulated exposures well below the current critical level for the prevention of this kind of effect. The exposure indices involving lower cut-off values (i.e. 30 nl l−1) were best related with the extent of O3-induced injury on these species.
Resumo:
This paper describes a preprocessing module for improving the performance of a Spanish into Spanish Sign Language (Lengua de Signos Espanola: LSE) translation system when dealing with sparse training data. This preprocessing module replaces Spanish words with associated tags. The list with Spanish words (vocabulary) and associated tags used by this module is computed automatically considering those signs that show the highest probability of being the translation of every Spanish word. This automatic tag extraction has been compared to a manual strategy achieving almost the same improvement. In this analysis, several alternatives for dealing with non-relevant words have been studied. Non-relevant words are Spanish words not assigned to any sign. The preprocessing module has been incorporated into two well-known statistical translation architectures: a phrase-based system and a Statistical Finite State Transducer (SFST). This system has been developed for a specific application domain: the renewal of Identity Documents and Driver's License. In order to evaluate the system a parallel corpus made up of 4080 Spanish sentences and their LSE translation has been used. The evaluation results revealed a significant performance improvement when including this preprocessing module. In the phrase-based system, the proposed module has given rise to an increase in BLEU (Bilingual Evaluation Understudy) from 73.8% to 81.0% and an increase in the human evaluation score from 0.64 to 0.83. In the case of SFST, BLEU increased from 70.6% to 78.4% and the human evaluation score from 0.65 to 0.82.
Resumo:
This paper proposes the use of Factored Translation Models (FTMs) for improving a Speech into Sign Language Translation System. These FTMs allow incorporating syntactic-semantic information during the translation process. This new information permits to reduce significantly the translation error rate. This paper also analyses different alternatives for dealing with the non-relevant words. The speech into sign language translation system has been developed and evaluated in a specific application domain: the renewal of Identity Documents and Driver’s License. The translation system uses a phrase-based translation system (Moses). The evaluation results reveal that the BLEU (BiLingual Evaluation Understudy) has improved from 69.1% to 73.9% and the mSER (multiple references Sign Error Rate) has been reduced from 30.6% to 24.8%.
Resumo:
This paper describes a categorization module for improving the performance of a Spanish into Spanish Sign Language (LSE) translation system. This categorization module replaces Spanish words with associated tags. When implementing this module, several alternatives for dealing with non-relevant words have been studied. Non-relevant words are Spanish words not relevant in the translation process. The categorization module has been incorporated into a phrase-based system and a Statistical Finite State Transducer (SFST). The evaluation results reveal that the BLEU has increased from 69.11% to 78.79% for the phrase-based system and from 69.84% to 75.59% for the SFST.
Resumo:
Andorra-I is the first implementation of a language based on the Andorra Principie, which states that determinate goals can (and shonld) be run before other goals, and even in a parallel fashion. This principie has materialized in a framework called the Basic Andorra model, which allows or-parallelism as well as (dependent) and-parallelism for determinate goals. In this report we show that it is possible to further extend this model in order to allow general independent and-parallelism for nondeterminate goals, withont greatly modifying the underlying implementation machinery. A simple an easy way to realize such an extensión is to make each (nondeterminate) independent goal determinate, by using a special "bagof" constract. We also show that this can be achieved antomatically by compile-time translation from original Prolog programs. A transformation that fulfüls this objective and which can be easily antomated is presented in this report.
Resumo:
We discuss a framework for the application of abstract interpretation as an aid during program development, rather than in the more traditional application of program optimization. Program validation and detection of errors is first performed statically by comparing (partial) specifications written in terms of assertions against information obtained from (global) static analysis of the program. The results of this process are expressed in the user assertion language. Assertions (or parts of assertions) which cannot be checked statically are translated into run-time tests. The framework allows the use of assertions to be optional. It also allows using very general properties in assertions, beyond the predefined set understandable by the static analyzer and including properties defined by user programs. We also report briefly on an implementation of the framework. The resulting tool generates and checks assertions for Prolog, CLP(R), and CHIP/CLP(fd) programs, and integrates compile-time and run-time checking in a uniform way. The tool allows using properties such as types, modes, non-failure, determinacy, and computational cost, and can treat modules separately, performing incremental analysis.
Resumo:
We describe some of the novel aspects and motivations behind the design and implementation of the Ciao multiparadigm programming system. An important aspect of Ciao is that it provides the programmer with a large number of useful features from different programming paradigms and styles, and that the use of each of these features can be turned on and off at will for each program module. Thus, a given module may be using e.g. higher order functions and constraints, while another module may be using objects, predicates, and concurrency. Furthermore, the language is designed to be extensible in a simple and modular way. Another important aspect of Ciao is its programming environment, which provides a powerful preprocessor (with an associated assertion language) capable of statically finding non-trivial bugs, verifying that programs comply with specifications, and performing many types of program optimizations. Such optimizations produce code that is highly competitive with other dynamic languages or, when the highest levéis of optimization are used, even that of static languages, all while retaining the interactive development environment of a dynamic language. The environment also includes a powerful auto-documenter. The paper provides an informal overview of the language and program development environment. It aims at illustrating the design philosophy rather than at being exhaustive, which would be impossible in the format of a paper, pointing instead to the existing literature on the system.
Resumo:
We present a framework for the application of abstract interpretation as an aid during program development, rather than in the more traditional application of program optimization. Program validation and detection of errors is first performed statically by comparing (partial) specifications written in terms of assertions against information obtained from static analysis of the program. The results of this process are expressed in the user assertion language. Assertions (or parts of assertions) which cannot be verified statically are translated into run-time tests. The framework allows the use of assertions to be optional. It also allows using very general properties in assertions, beyond the predefined set understandable by the static analyzer and including properties defined by means of user programs. We also report briefly on an implementation of the framework. The resulting tool generates and checks assertions for Prolog, CLP(R), and CHIP/CLP(fd) programs, and integrates compile-time and run-time checking in a uniform way. The tool allows using properties such as types, modes, non-failure, determinacy, and computational cost, and can treat modules separately, performing incremental analysis. In practice, this modularity allows detecting statically bugs in user programs even if they do not contain any assertions.
Resumo:
This paper introduces a semantic language developed with the objective to be used in a semantic analyzer based on linguistic and world knowledge. Linguistic knowledge is provided by a Combinatorial Dictionary and several sets of rules. Extra-linguistic information is stored in an Ontology. The meaning of the text is represented by means of a series of RDF-type triples of the form predicate (subject, object). Semantic analyzer is one of the options of the multifunctional ETAP-3 linguistic processor. The analyzer can be used for Information Extraction and Question Answering. We describe semantic representation of expressions that provide an assessment of the number of objects involved and/or give a quantitative evaluation of different types of attributes. We focus on the following aspects: 1) parametric and non-parametric attributes; 2) gradable and non-gradable attributes; 3) ontological representation of different classes of attributes; 4) absolute and relative quantitative assessment; 5) punctual and interval quantitative assessment; 6) intervals with precise and fuzzy boundaries