918 results for second language, spelling errors
Abstract:
This paper describes two methods to cancel the effect of two kinds of leakage signals which may be present when an antenna is measured in a planar near-field range. The first method reduces leakage bias errors from the receiver's quadrature detector and is based on estimating the bias constant added to every near-field data sample. That constant is then subtracted from the data, removing its undesired effect on the far-field pattern. The estimation is performed by back-propagating the field from the scan plane to the antenna under test (AUT) plane and averaging all the data located outside the AUT aperture. The second method cancels the effect of leakage from faulty transmission lines, connectors or rotary joints. This method is also based on a reconstruction process that determines the field distribution on the AUT plane. Once this distribution is known, spatial filtering is applied to cancel the contribution due to those faulty elements. After that, a near-field-to-far-field transformation is applied, yielding a new radiation pattern from which the leakage effects have been removed. To verify the effectiveness of both methods, several examples are presented.
Abstract:
OntoTag - A Linguistic and Ontological Annotation Model Suitable for the Semantic Web
1. INTRODUCTION. LINGUISTIC TOOLS AND ANNOTATIONS: THEIR LIGHTS AND SHADOWS
Computational Linguistics is already a consolidated research area. It builds upon the results of two other major ones, namely Linguistics and Computer Science and Engineering, and it aims at developing computational models of human language (or natural language, as it is termed in this area). Possibly, its best-known applications are the different tools developed so far for processing human language, such as machine translation systems and speech recognizers or dictation programs.
These tools for processing human language are commonly referred to as linguistic tools. Apart from the examples mentioned above, there are also other types of linguistic tools that perhaps are not so well-known, but on which most of the other applications of Computational Linguistics are built. These other types of linguistic tools comprise POS taggers, natural language parsers and semantic taggers, amongst others. All of them can be termed linguistic annotation tools.
Linguistic annotation tools are important assets. In fact, POS and semantic taggers (and, to a lesser extent, also natural language parsers) have become critical resources for the computer applications that process natural language. Hence, any computer application that has to analyse a text automatically and ‘intelligently’ will include at least a module for POS tagging. The more an application needs to ‘understand’ the meaning of the text it processes, the more linguistic tools and/or modules it will incorporate and integrate.
However, linguistic annotation tools still have some limitations, which can be summarised as follows:
1. Normally, they perform annotations only at a certain linguistic level (that is, Morphology, Syntax, Semantics, etc.).
2. They usually introduce a certain rate of errors and ambiguities when tagging. This error rate ranges from 10 percent up to 50 percent of the units annotated for unrestricted, general texts.
3. Their annotations are most frequently formulated in terms of an annotation schema designed and implemented ad hoc.
A priori, it seems that the interoperation and the integration of several linguistic tools into an appropriate software architecture could most likely solve the limitations stated in (1). Besides, integrating several linguistic annotation tools and making them interoperate could also minimise the limitation stated in (2). Nevertheless, in the latter case, all these tools should produce annotations for a common level, which would have to be combined in order to correct their corresponding errors and inaccuracies. Yet, the limitation stated in (3) prevents both types of integration and interoperation from being easily achieved.
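The combination step described above can be sketched as a simple majority vote. This is only an illustrative assumption: the text does not say how annotations are combined, and the tool names, tagsets and the `TO_COMMON` mapping below are hypothetical. Each tool's ad hoc tags are first mapped to a shared tagset (addressing limitation 3), and then the tags for each token are voted on (addressing limitation 2).

```python
from collections import Counter

# Hypothetical mappings from each tool's ad hoc tagset to a shared tagset.
TO_COMMON = {
    "tagger_a": {"NN": "NOUN", "VB": "VERB", "JJ": "ADJ"},
    "tagger_b": {"N": "NOUN", "V": "VERB", "A": "ADJ"},
    "tagger_c": {"noun": "NOUN", "verb": "VERB", "adj": "ADJ"},
}


def combine(annotations):
    """Combine per-token tags from several tools by majority vote.

    annotations maps a tool name to its list of tool-specific tags,
    one tag per token. Ties are reported as "AMBIGUOUS".
    """
    lengths = {len(tags) for tags in annotations.values()}
    if len(lengths) != 1:
        raise ValueError("all tools must annotate the same token sequence")
    combined = []
    for i in range(lengths.pop()):
        # Translate every tool's tag into the shared tagset before voting.
        votes = Counter(TO_COMMON[tool][tags[i]] for tool, tags in annotations.items())
        top, top_count = votes.most_common(1)[0]
        tied = [tag for tag, count in votes.items() if count == top_count]
        combined.append(top if len(tied) == 1 else "AMBIGUOUS")
    return combined


if __name__ == "__main__":
    # tagger_b mislabels the second token; the other two tools outvote it.
    print(combine({
        "tagger_a": ["NN", "VB", "JJ"],
        "tagger_b": ["N", "N", "A"],
        "tagger_c": ["noun", "verb", "adj"],
    }))
```

The vote only works because all three tools annotate the same level with tags that can be mapped onto one another, which is exactly the precondition the paragraph states.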
In addition, most high-level annotation tools rely on other lower-level annotation tools and their outputs to generate their own. For example, sense-tagging tools (operating at the semantic level) often use POS taggers (operating at a lower level, i.e., the morphosyntactic) to identify the grammatical category of the word or lexical unit they are annotating. Accordingly, if a faulty or inaccurate low-level annotation tool is to be used by another higher-level one in its process, the errors and inaccuracies of the former should be minimised in advance. Otherwise, these errors and inaccuracies would be transferred to (and even magnified in) the annotations of the high-level annotation tool.
Therefore, it would be quite useful to find a way to
(i) correct or, at least, reduce the errors and the inaccuracies of lower-level linguistic tools;
(ii) unify the annotation schemas of different linguistic annotation tools or, more generally speaking, make these tools (as well as their annotations) interoperate.
Clearly, solving (i) and (ii) should ease the automatic annotation of web pages by means of linguistic tools, and their transformation into Semantic Web pages (Berners-Lee, Hendler and Lassila, 2001). Yet, as stated above, (ii) is a type of interoperability problem. Ontologies (Gruber, 1993; Borst, 1997) have been successfully applied thus far to solve several interoperability problems. Hence, ontologies should also help solve the aforementioned problems and limitations of linguistic annotation tools.
Thus, to summarise, the main aim of the present work was to combine these separate approaches, mechanisms and tools for annotation from Linguistics and Ontological Engineering (and the Semantic Web) in a hybrid (linguistic and ontological) annotation model, suitable for both areas. This hybrid (semantic) annotation model should (a) benefit from the advances, models, techniques, mechanisms and tools of these two areas; (b) minimise (and even solve, when possible) some of the problems found in each of them; and (c) be suitable for the Semantic Web. The concrete goals that helped attain this aim are presented in the following section.
2. GOALS OF THE PRESENT WORK
As mentioned above, the main goal of this work was to specify a hybrid (that is, linguistically-motivated and ontology-based) model of annotation suitable for the Semantic Web (i.e. it had to produce a semantic annotation of web page contents). This entailed that the tags included in the annotations of the model had to (1) represent linguistic concepts (or linguistic categories, as they are termed in ISO/DCR (2008)), in order for this model to be linguistically-motivated; (2) be ontological terms (i.e., use an ontological vocabulary), in order for the model to be ontology-based; and (3) be structured (linked) as a collection of ontology-based
Abstract:
We discuss a framework for the application of abstract interpretation as an aid during program development, rather than in the more traditional application of program optimization. Program validation and detection of errors is first performed statically by comparing (partial) specifications written in terms of assertions against information obtained from (global) static analysis of the program. The results of this process are expressed in the user assertion language. Assertions (or parts of assertions) which cannot be checked statically are translated into run-time tests. The framework allows the use of assertions to be optional. It also allows using very general properties in assertions, beyond the predefined set understandable by the static analyzer and including properties defined by user programs. We also report briefly on an implementation of the framework. The resulting tool generates and checks assertions for Prolog, CLP(R), and CHIP/CLP(fd) programs, and integrates compile-time and run-time checking in a uniform way. The tool allows using properties such as types, modes, non-failure, determinacy, and computational cost, and can treat modules separately, performing incremental analysis.
Abstract:
We present a framework for the application of abstract interpretation as an aid during program development, rather than in the more traditional application of program optimization. Program validation and detection of errors is first performed statically by comparing (partial) specifications written in terms of assertions against information obtained from static analysis of the program. The results of this process are expressed in the user assertion language. Assertions (or parts of assertions) which cannot be verified statically are translated into run-time tests. The framework allows the use of assertions to be optional. It also allows using very general properties in assertions, beyond the predefined set understandable by the static analyzer and including properties defined by means of user programs. We also report briefly on an implementation of the framework. The resulting tool generates and checks assertions for Prolog, CLP(R), and CHIP/CLP(fd) programs, and integrates compile-time and run-time checking in a uniform way. The tool allows using properties such as types, modes, non-failure, determinacy, and computational cost, and can treat modules separately, performing incremental analysis. In practice, this modularity allows bugs in user programs to be detected statically even if the programs do not contain any assertions.
Abstract:
We present two approaches to cluster dialogue-based information obtained by the speech understanding module and the dialogue manager of a spoken dialogue system. The purpose is to estimate a language model related to each cluster, and use them to dynamically modify the model of the speech recognizer at each dialogue turn. In the first approach we build the cluster tree using local decisions based on a Maximum Normalized Mutual Information criterion. In the second one we take global decisions, based on the optimization of the global perplexity of the combination of the cluster-related LMs. Our experiments show a relative reduction of the word error rate of 15.17%, which helps to improve the performance of the understanding and the dialogue manager modules.
Abstract:
This work addresses the problem of modeling dynamical systems based on the study of their time series, using a standard language that aims to be a universal abstraction of dynamical systems, irrespective of their deterministic, stochastic or hybrid nature. Deterministic and stochastic models are developed separately and subsequently merged into a hybrid model, which allows the study of generic mixed systems, that is to say, those having both deterministic and random behavior. This model combines two components. One is deterministic and consists of a difference equation derived from an autocorrelation study; the other is stochastic and models the errors made by the deterministic one. The stochastic component is a universal generator of probability distributions based on a process consisting of random variables distributed uniformly within an interval that varies in time. This universal generator is derived in the thesis from a new theory of supply and demand for a generic resource. The resulting model can be visualized as an entity with three fundamental elements: an engine generating deterministic dynamics, an internal source of noise generating uncertainty, and an exposure to the environment which represents the interactions between the real system and the external world. In the applications these three elements are adjusted to the history of the time series of the dynamical system.
Once its components have been adjusted, the model behaves adaptively, taking the new time series values from the system as inputs and calculating predictions about its future behavior. Every prediction is provided as an interval within which every value is equally probable, while all values outside it have zero probability. In this way the model computes the future behavior and its level of uncertainty based on the current state of the system. The model is applied to quite different systems in this thesis and proves very flexible when facing the study of fields of diverse nature. The exchange of traffic between telephony operators, the evolution of financial markets and the flow of information between servers on the Internet are studied in depth in this thesis. All these systems are successfully modeled using the same "language", in spite of the fact that they are physically very different systems. The study of telephony networks shows that the traffic patterns have a strong weekly pseudo-periodicity mixed with a great amount of noise, especially in the case of international calls. It is shown that the underlying nature of financial markets is random with a moderate range of variability. A part of this thesis is devoted to explaining some of the most important empirical observations in financial markets, such as "fat tails", "power laws" and "volatility clustering". Finally, it is shown that the communication between two servers on the Internet has, as in the case of financial markets, underlying random dynamics but with a narrow range of variability, this lack of variability becoming more marked as the distance between servers increases. Two aspects of the model stand out as being the most important: its adaptability and its universality. The first is due to the fact that, once the general parameters have been adjusted, the model is "fed" the observable manifestations of the system in order to calculate its future behavior.
Although the model has fixed parameters, the variability in the observable manifestations of the system, which are used as inputs of the model, leads to a great variability in the possible outputs. The second aspect is due to the general "language" used in the formulation of the hybrid model and to the fact that its parameters are adjusted based on external manifestations of the system under study instead of its physical characteristics. These factors make the model suitable for use in a great variety of fields. Lastly, this thesis proposes other fields in which preliminary and promising results have been obtained, such as the modeling of financial risk, the development of routing algorithms for telecommunication networks and the assessment of climate change.
Abstract:
We present an approach to dynamically adapt the language models (LMs) used by a speech recognizer that is part of a spoken dialogue system. We have developed a grammar generation strategy that automatically adapts the LMs using the semantic information that the user provides (represented as dialogue concepts), together with the information regarding the intentions of the speaker (inferred by the dialogue manager, and represented as dialogue goals). We carry out the adaptation as a linear interpolation between a background LM and one or more of the LMs associated with the dialogue elements (concepts or goals) addressed by the user. The interpolation weights between those models are automatically estimated on each dialogue turn, using measures such as the posterior probabilities of concepts and goals, estimated as part of the inference procedure to determine the actions to be carried out. We propose two approaches to handle the LMs related to concepts and goals. Whereas in the first one we estimate an LM for each one of them, in the second one we apply several clustering strategies to group together those elements that share some common properties, and estimate an LM for each cluster. Our evaluation shows how the system can estimate a dynamic model adapted to each dialogue turn, which helps to improve the performance of the speech recognition (up to a 14.82% relative improvement), which in turn leads to an improvement in both the language understanding and the dialogue management tasks.
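The interpolation described in this abstract can be illustrated with a minimal sketch. It makes several assumptions not found in the abstract: the LMs are reduced to unigram distributions stored as dictionaries, the background model receives a fixed weight (`background_weight`, a hypothetical parameter), and the remaining weight is shared among the element LMs in proportion to their posterior probabilities. The element names `destination` and `price` are invented for illustration.

```python
def interpolation_weights(posteriors, background_weight=0.5):
    """Derive interpolation weights that sum to 1.

    posteriors maps a dialogue element (concept or goal) to its posterior
    probability. The background LM keeps background_weight (an assumed
    fixed share); the rest is split in proportion to the posteriors.
    """
    if not 0.0 <= background_weight <= 1.0:
        raise ValueError("background_weight must lie in [0, 1]")
    total = sum(posteriors.values())
    if total == 0:
        # No evidence for any element: fall back to the background LM alone.
        return {"background": 1.0}
    scale = (1.0 - background_weight) / total
    weights = {"background": background_weight}
    for name, p in posteriors.items():
        weights[name] = p * scale
    return weights


def interpolate(models, weights):
    """Linear interpolation: P(w) = sum_i weight_i * P_i(w)."""
    vocab = set()
    for name in weights:
        vocab.update(models[name])
    return {
        w: sum(weights[name] * models[name].get(w, 0.0) for name in weights)
        for w in vocab
    }


if __name__ == "__main__":
    models = {
        "background": {"to": 0.5, "madrid": 0.25, "euros": 0.25},
        "destination": {"to": 0.5, "madrid": 0.5},
        "price": {"euros": 1.0},
    }
    weights = interpolation_weights({"destination": 0.8, "price": 0.2})
    print(interpolate(models, weights))
```

Because every component is a probability distribution and the weights sum to 1, the interpolated model is again a distribution; a turn in which the `destination` concept is likely shifts probability mass toward the words of its LM.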
Abstract:
The main focus of this paper is on hydrodynamic modelling of a semisubmersible platform (which can support a 1.5 MW wind turbine and is composed of three buoyant columns connected by bracings) with special emphasis on the estimation of the wave drift components and their effects on the design of the mooring system. Indeed, with natural periods of drift around 60 seconds, accurate computation of the low-frequency second-order components is not a straightforward task. As methods usually adopted when dealing with the slow drifts of deep-water moored systems, such as Newman's approximation, have their errors increased by the relatively low resonant periods, and as the effects of depth cannot be ignored, the wave diffraction analysis must be based on full Quadratic Transfer Function (QTF) computations. A discussion on the numerical aspects of performing such computations is presented, making use of the second-order module available with the seakeeping software WAMIT®. Finally, the paper also provides a preliminary verification of the accuracy of the numerical predictions based on the results obtained in a series of model tests with the structure fixed in bichromatic waves.
Abstract:
Neuro-evolutive development from birth until the age of six years is a decisive factor in a child's quality of life. Early detection of development disorders in early childhood can facilitate necessary diagnosis and/or treatment. Primary-care pediatricians play a key role in this detection, as they can undertake the preventive and therapeutic actions required to promote a child's optimal development. However, the lack of time and limited specific knowledge in primary care prevent the continuous application of early anomaly-detection procedures. This research paper focuses on the deployment and evaluation of a smart system that enhances the screening of language disorders in primary care. Pediatricians get support to proceed with early referral of language disorders. The proposed model provides them with a decision-support tool for referral actions to trigger essential diagnostic and/or therapeutic actions for a comprehensive individual development. The research started from a sample of 60 cases of children with language disorders. Validation was carried out through two complementary steps: first, by including a team of seven experts from the fields of neonatology, pediatrics, neurology and language therapy, and, second, through the evaluation of 21 more previously diagnosed cases. The results show that the therapists accepted the system's proposal in 18 cases (86%) and suggested redesigning the system for single referral to a speech therapist in the three remaining cases.
Abstract:
Speech is a major function whose emergence and development radically change the whole course of the formation of a child's identity from early childhood. While language and speech development in singleton children is now quite well investigated, in twins this process has hardly been studied. Our research was carried out to study the distinctive way in which the opposite-sex children of a twin pair acquire speech, within the communicative-pragmatic approach (T.N. Ushakov, G. V. Chirkina). Applying this approach to the analysis of communication between twins allowed us to identify the particular techniques and means of communication that they develop functionally in the twin-pair situation, which lets them show speech phenomena not found in their singleton peers. In this work, results of the observation and study of a pair of opposite-sex twins in their second year of life, carried out using a technique developed by us under the scientific guidance of G. V. Chirkina
Abstract:
The new requirement placed on students in tertiary settings in Spain to demonstrate a B1 or a B2 proficiency level of English, in accordance with the Common European Framework of Reference for Languages (CEFRL), has led most Spanish universities to develop a program of certification or accreditation of the required level. The first part of this paper aims to provide a rationale for the type of test that has been developed at the Universidad Politécnica de Madrid for the accreditation of a B2 level, a multiple choice version, and to describe how it was constructed and validated. Then, in the second part of the paper, the results from its application to 924 students enrolled in different degree courses at a variety of schools and faculties at the university are analyzed based on an item analysis of the final test version. To conclude, some theoretical as well as practical conclusions about testing grammar that affect the teaching and learning process are drawn.
Abstract:
Background: Early and effective identification of developmental disorders during childhood remains a critical task for the international community. Language delays, which are frequently the first symptoms of a possible disorder, are the second most prevalent of the common developmental disorders in children. Objective: This paper evaluates a Web-based Clinical Decision Support System (CDSS) whose aim is to enhance the screening of language disorders at a nursery school. The common lack of early diagnosis of language disorders led us to deploy an easy-to-use CDSS in order to evaluate its accuracy in early detection of language pathologies. This CDSS can be used by pediatricians to support the screening of language disorders in primary care. Methods: This paper details the evaluation results of the "Gades" CDSS at a nursery school with 146 children, 12 educators, and 1 language therapist. The methodology comprises two consecutive phases. The first stage involves the observation of each child's language abilities, carried out by the educators, to facilitate the evaluation of language acquisition level performed by a language therapist. Next, the same language therapist evaluates the reliability of the observed results. Results: The Gades CDSS was integrated to provide the language therapist with the required clinical information. The validation process showed a global 83.6% (122/146) success rate in language evaluation and a 7% (7/94) rate of non-accepted system decisions within the range of children from 0 to 3 years old. The system helped language therapists to identify new children with potential disorders who required further evaluation. This process will revalidate the CDSS output and allow the enhancement of early detection of language disorders in children. The system does need minor refinement, since the therapists disagreed with some questions from the CDSS knowledge base (KB) and suggested adding a few questions about speech production and pragmatic abilities.
The refinement of the KB will address these issues and include the requested improvements, with the support of the experts who took part in the original KB development. Conclusions: This research demonstrated the benefit of a Web-based CDSS to monitor children's neurodevelopment via the early detection of language delays at a nursery school. Next steps focus on the design of a model that includes pseudo auto-learning capacity, supervised by experts.
Abstract:
This paper describes the application of language translation technologies for generating bus information in Spanish Sign Language (LSE: Lengua de Signos Española). In this work, two main systems have been developed: the first for translating text messages from information panels and the second for translating spoken Spanish into natural conversations at the information point of the bus company. Both systems are made up of a natural language translator (for converting a word sentence into a sequence of LSE signs), and a 3D avatar animation module (for playing back the signs). For the natural language translator, two technological approaches have been analyzed and integrated: an example-based strategy and a statistical translator. When translating spoken utterances, it is also necessary to incorporate a speech recognizer for decoding the spoken utterance into a word sequence, prior to the language translation module. This paper includes a detailed description of the field evaluation carried out in this domain. This evaluation has been carried out at the customer information office in Madrid involving both real bus company employees and deaf people. The evaluation includes objective measurements from the system and information from questionnaires. In the field evaluation, the whole translation presents an SER (Sign Error Rate) of less than 10% and a BLEU greater than 90%.
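The SER figure reported above can be made concrete with a short sketch. The abstract does not define how SER is computed; the code below assumes it is defined analogously to word error rate, that is, the minimum number of sign substitutions, deletions and insertions needed to turn the system output into the reference, divided by the reference length. The sign glosses in the example are hypothetical.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences of sign glosses."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        cur = [i]
        for j, h in enumerate(hyp, 1):
            cur.append(min(
                prev[j] + 1,             # deletion of a reference sign
                cur[j - 1] + 1,          # insertion of an extra sign
                prev[j - 1] + (r != h),  # substitution (free if equal)
            ))
        prev = cur
    return prev[-1]


def sign_error_rate(ref, hyp):
    """Assumed definition: edit distance divided by reference length."""
    if not ref:
        raise ValueError("reference sequence must not be empty")
    return edit_distance(ref, hyp) / len(ref)


if __name__ == "__main__":
    reference = ["BUS", "FIVE", "ARRIVE", "SOON"]
    hypothesis = ["BUS", "ARRIVE", "SOON"]  # one sign missing
    print(sign_error_rate(reference, hypothesis))
```

Under this definition, an SER below 10% means fewer than one sign in ten had to be corrected on average.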
Abstract:
Traffic flow time series data are usually high dimensional and very complex. They are also sometimes imprecise and distorted due to data collection sensor malfunction. Additionally, events like congestion caused by traffic accidents add more uncertainty to real-time traffic conditions, making traffic flow forecasting a complicated task. This article presents a new data preprocessing method targeting multidimensional time series with a very high number of dimensions and shows its application to real traffic flow time series from the California Department of Transportation (PEMS web site). The proposed method consists of three main steps. First, based on a language for defining events in multidimensional time series, mTESL, we identify a number of types of events in time series that correspond to either incorrect data or data with interference. Second, each event type is restored utilizing an original method that combines real observations, local forecasted values and historical data. Third, an exponential smoothing procedure is applied globally to eliminate noise interference and other random errors so as to provide good quality source data for future work.
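The third step can be sketched as follows. The abstract does not state which variant of exponential smoothing is used or how it is parameterised, so this example assumes simple (single) exponential smoothing applied to one dimension of the series, with a hypothetical smoothing factor `alpha`.

```python
def exponential_smoothing(series, alpha=0.3):
    """Simple exponential smoothing: s_t = alpha * x_t + (1 - alpha) * s_{t-1}.

    The first smoothed value is initialised to the first observation.
    alpha is an assumed parameter; values closer to 1 follow the raw data
    more closely, values closer to 0 smooth more aggressively.
    """
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must lie in (0, 1]")
    smoothed = []
    for x in series:
        if not smoothed:
            smoothed.append(float(x))
        else:
            smoothed.append(alpha * x + (1.0 - alpha) * smoothed[-1])
    return smoothed


if __name__ == "__main__":
    # Hypothetical vehicle counts per interval from a single sensor.
    flow = [10, 20, 30, 25, 90, 28]
    print(exponential_smoothing(flow, alpha=0.5))
```

For a multidimensional series, the same function would be applied to each dimension separately; the spike at 90 in the example is damped rather than removed, which is why the method restores faulty events in an earlier step before smoothing.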