7 resultados para Statistical language models

em Universidad de Alicante


Relevância:

50.00% 50.00%

Publicador:

Resumo:

Statistical machine translation (SMT) is an approach to Machine Translation (MT) that uses statistical models whose parameter estimation is based on the analysis of existing human translations (contained in bilingual corpora). From a translation student’s standpoint, this dissertation aims to explain how a phrase-based SMT system works, to determine the role of the statistical models it uses in the translation process and to assess the quality of the translations provided that system is trained with in-domain goodquality corpora. To that end, a phrase-based SMT system based on Moses has been trained and subsequently used for the English to Spanish translation of two texts related in topic to the training data. Finally, the quality of this output texts produced by the system has been assessed through a quantitative evaluation carried out with three different automatic evaluation measures and a qualitative evaluation based on the Multidimensional Quality Metrics (MQM).

Relevância:

40.00% 40.00%

Publicador:

Resumo:

Ecological models written in a mathematical language L(M) or model language, with a given style or methodology can be considered as a text. It is possible to apply statistical linguistic laws and the experimental results demonstrate that the behaviour of a mathematical model is the same of any literary text of any natural language. A text has the following characteristics: (a) the variables, its transformed functions and parameters are the lexic units or LUN of ecological models; (b) the syllables are constituted by a LUN, or a chain of them, separated by operating or ordering LUNs; (c) the flow equations are words; and (d) the distribution of words (LUM and CLUN) according to their lengths is based on a Poisson distribution, the Chebanov's law. It is founded on Vakar's formula, that is calculated likewise the linguistic entropy for L(M). We will apply these ideas over practical examples using MARIOLA model. In this paper it will be studied the problem of the lengths of the simple lexic units composed lexic units and words of text models, expressing these lengths in number of the primitive symbols, and syllables. The use of these linguistic laws renders it possible to indicate the degree of information given by an ecological model.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

The exponential growth of the subjective information in the framework of the Web 2.0 has led to the need to create Natural Language Processing tools able to analyse and process such data for multiple practical applications. They require training on specifically annotated corpora, whose level of detail must be fine enough to capture the phenomena involved. This paper presents EmotiBlog – a fine-grained annotation scheme for subjectivity. We show the manner in which it is built and demonstrate the benefits it brings to the systems using it for training, through the experiments we carried out on opinion mining and emotion detection. We employ corpora of different textual genres –a set of annotated reported speech extracted from news articles, the set of news titles annotated with polarity and emotion from the SemEval 2007 (Task 14) and ISEAR, a corpus of real-life self-expressed emotion. We also show how the model built from the EmotiBlog annotations can be enhanced with external resources. The results demonstrate that EmotiBlog, through its structure and annotation paradigm, offers high quality training data for systems dealing both with opinion mining, as well as emotion detection.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

In recent years, several explanatory models have been developed which attempt to analyse the predictive worth of various factors in relation to academic achievement, as well as the direct and indirect effects that they produce. The aim of this study was to examine a structural model incorporating various cognitive and motivational variables which influence student achievement in the two basic core skills in the Spanish curriculum: Spanish Language and Mathematics. These variables included differential aptitudes, specific self-concept, goal orientations, effort and learning strategies. The sample comprised 341 Spanish students in their first year of Compulsory Secondary Education. Various tests and questionnaires were used to assess each student, and Structural Equation Modelling (SEM) was employed to study the relationships in the initial model. The proposed model obtained a satisfactory fit for the two subjects studied, and all the relationships hypothesised were significant. The variable with the most explanatory power regarding academic achievement was mathematical and verbal aptitude. Also notable was the direct influence of specific self-concept on achievement, goal-orientation and effort, as was the mediatory effect that effort and learning strategies had between academic goals and final achievement.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

This paper addresses the problem of the automatic recognition and classification of temporal expressions and events in human language. Efficacy in these tasks is crucial if the broader task of temporal information processing is to be successfully performed. We analyze whether the application of semantic knowledge to these tasks improves the performance of current approaches. We therefore present and evaluate a data-driven approach as part of a system: TIPSem. Our approach uses lexical semantics and semantic roles as additional information to extend classical approaches which are principally based on morphosyntax. The results obtained for English show that semantic knowledge aids in temporal expression and event recognition, achieving an error reduction of 59% and 21%, while in classification the contribution is limited. From the analysis of the results it may be concluded that the application of semantic knowledge leads to more general models and aids in the recognition of temporal entities that are ambiguous at shallower language analysis levels. We also discovered that lexical semantics and semantic roles have complementary advantages, and that it is useful to combine them. Finally, we carried out the same analysis for Spanish. The results obtained show comparable advantages. This supports the hypothesis that applying the proposed semantic knowledge may be useful for different languages.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

The mathematical models of the complex reality are texts belonging to a certain literature that is written in a semi-formal language, denominated L(MT) by the authors whose laws linguistic mathematics have been previously defined. This text possesses linguistic entropy that is the reflection of the physical entropy of the processes of real world that said text describes. Through the temperature of information defined by Mandelbrot, the authors begin a text-reality thermodynamic theory that drives to the existence of information attractors, or highly structured point, settling down a heterogeneity of the space text, the same one that of ontologic space, completing the well-known law of Saint Mathew, of the General Theory of Systems and formulated by Margalef saying: “To the one that has more he will be given, and to the one that doesn't have he will even be removed it little that it possesses.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

One saw previously that indications of diversity IT and the one of Shannon permits to characterize globally by only one number one fundamental aspects of the text structure. However a more precise knowledge of this structure requires specific abundance distributions and the use, to represent this one, of a suitable mathematical model. Among the numerous models that would be either susceptible to be proposed, the only one that present a real convenient interest are simplest. One will limit itself to study applied three of it to the language L(MT): the log-linear, the log-normal and Mac Arthur's models very used for the calculation of the diversity of the species of ecosystems, and used, we believe that for the first time, in the calculation of the diversity of a text written in a certain language, in our case L(MT). One will show advantages and inconveniences of each of these model types, methods permitting to adjust them to text data and in short tests that permit to decide if this adjustment is acceptable.