986 results for Machine Translation
Abstract:
We present an approach to parsing relative clauses in Arabic in the tradition of the Paninian Grammar Framework [2], which leads to deriving a common logical form for equivalent sentences. Particular attention is paid to the analysis of resumptive pronouns in the retrieval of syntactico-semantic relationships. The analysis arises from the development of a lexicalised dependency grammar for Arabic that has application for machine translation.
Abstract:
Yorick Wilks is a central figure in the fields of Natural Language Processing and Artificial Intelligence. His influence extends to many areas and includes contributions to Machine Translation, word sense disambiguation, dialogue modeling and Information Extraction. This book celebrates the work of Yorick Wilks in the form of a selection of his papers which are intended to reflect the range and depth of his work. The volume accompanies a Festschrift which celebrates his contribution to the fields of Computational Linguistics and Artificial Intelligence. The papers include early work carried out at Cambridge University, descriptions of groundbreaking work on Machine Translation and Preference Semantics as well as more recent works on belief modeling and computational semantics. The selected papers reflect Yorick's contribution to both practical and theoretical aspects of automatic language processing.
Abstract:
Yorick Wilks is a central figure in the fields of Natural Language Processing and Artificial Intelligence. His influence extends to many areas of these fields and includes contributions to Machine Translation, word sense disambiguation, dialogue modeling and Information Extraction. This book celebrates the work of Yorick Wilks from the perspective of his peers. It consists of original chapters, each of which analyses an aspect of his work and links it to current thinking in that area. His work has spanned over four decades but is shown to be pertinent to recent developments in language processing such as the Semantic Web. This volume forms a two-part set together with Words and Intelligence I, Selected Works by Yorick Wilks, by the same editors.
Abstract:
The best results in applying computational systems to automatic translation are obtained in text processing when the texts pertain to specific thematic areas, with well-defined structures and a concise, limited lexicon. In this article we present a plan of systematic work for the analysis and generation of language applied to the field of pharmaceutical leaflets, a type of document characterized by rigidity of format and precision in the use of the lexicon. We propose a solution based on the use of an interlingua as a pivot language between the source and target languages; in this application we consider Spanish and Arabic.
Abstract:
The article briefly reviews a bilingual Slovak-Bulgarian/Bulgarian-Slovak parallel and aligned corpus. The corpus was collected and developed as a result of collaboration within the framework of a joint research project between the Institute of Mathematics and Informatics, Bulgarian Academy of Sciences, and the Ľ. Štúr Institute of Linguistics, Slovak Academy of Sciences. Multilingual corpora are large repositories of language data with an important role in preserving and supporting the world's cultural heritage, because natural language is an outstanding part of human cultural values and collective memory, and a bridge between cultures. This bilingual corpus will be widely applicable to contrastive studies of the two Slavic languages and will also be a useful resource for language engineering research and development, especially in machine translation.
Abstract:
Users seeking information may not find relevant material in a specific language: the information may be available in a language different from their own, one they do not know, and they may therefore have difficulty accessing it. Since the retrieval process depends on the translation of the user query, there are many issues in obtaining the right translation. For a pair of languages chosen by a user, the available resources, such as an incomplete dictionary or an inaccurate machine translation system, may be insufficient to map the query terms in one language to their equivalent terms in the other. Also, for a given query there may exist multiple correct translations. The underlying corpus evidence may suggest a clue for selecting a probable set of translations that could eventually yield better information retrieval. In this paper, we present a cross-language information retrieval approach to effectively retrieve information present in a language other than the language of the user query using a corpus-driven query suggestion approach. The idea is to utilize the corpus-based evidence of one language to improve the retrieval and re-ranking of news documents in the other language. We use the FIRE corpora - Tamil and English news collections - in our experiments and illustrate the effectiveness of the proposed cross-language information retrieval approach.
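A rough sketch of the corpus-driven selection among candidate query translations described above; the scoring heuristic, the toy document collection and the candidate lists are illustrative assumptions, not the authors' system.

```python
# Illustrative sketch (not the authors' system): choose among candidate query-term
# translations using simple target-language corpus evidence (document frequency
# plus co-occurrence with the other query terms' candidates).

def df(term, docs):
    """Document frequency of a term in the target-language corpus."""
    return sum(1 for d in docs if term in d)

def cooccurrence(t1, t2, docs):
    """Number of documents in which two candidate translations co-occur."""
    return sum(1 for d in docs if t1 in d and t2 in d)

def suggest_translations(candidates_per_term, docs, top_k=1):
    """For each source-query term, rank its candidate translations by corpus
    evidence, a crude proxy for translation coherence across the whole query."""
    suggestions = {}
    for term, cands in candidates_per_term.items():
        others = [c for t, cs in candidates_per_term.items() if t != term for c in cs]
        scored = sorted(
            ((df(c, docs) + sum(cooccurrence(c, o, docs) for o in others), c)
             for c in cands),
            reverse=True)
        suggestions[term] = [c for _, c in scored[:top_k]]
    return suggestions

# Toy English document collection standing in for the FIRE news corpus;
# the source-language query terms and their dictionary candidates are made up.
docs = [{"election", "results", "poll"}, {"election", "vote", "poll"},
        {"survey", "results"}]
candidates = {"theervu": ["results", "verdict"], "therthal": ["election", "poll"]}
print(suggest_translations(candidates, docs))
```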
Abstract:
With the development of information technology, the theory and methodology of complex networks have been introduced into language research, representing the language system as a complex network composed of nodes and edges for quantitative analysis of language structure. The development of dependency grammar provides theoretical support for the construction of a treebank corpus, making a statistical analysis of complex networks possible. This paper introduces the theory and methodology of complex networks and builds dependency syntactic networks based on the treebank of speeches from the EEE-4 oral test. Through an analysis of the overall characteristics of the networks, including the number of edges, the number of nodes, the average degree, the average path length, the network centrality and the degree distribution, it aims to find potential differences and similarities in the networks between various grades of speaking performance. Through clustering analysis, this research intends to demonstrate the discriminating power of the network parameters and provide a potential reference for scoring speaking performance.
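A minimal sketch of how such a dependency syntactic network can be built and its global measures computed, assuming the networkx library and toy dependency pairs in place of the EEE-4 treebank.

```python
# Sketch (assumed tooling: networkx) of turning dependency-parsed sentences into
# a syntactic network and computing the global measures named in the abstract.
import networkx as nx
from collections import Counter

# Each sentence is a list of (dependent, head) word pairs, e.g. from a treebank.
sentences = [
    [("the", "cat"), ("cat", "sat"), ("on", "sat"), ("the", "mat"), ("mat", "on")],
    [("a", "dog"), ("dog", "barked"), ("loudly", "barked")],
]

G = nx.Graph()  # undirected dependency network; word types are nodes
for sent in sentences:
    for dep, head in sent:
        if G.has_edge(dep, head):
            G[dep][head]["weight"] += 1
        else:
            G.add_edge(dep, head, weight=1)

print("nodes:", G.number_of_nodes(), "edges:", G.number_of_edges())
print("average degree:", 2 * G.number_of_edges() / G.number_of_nodes())
if nx.is_connected(G):  # average path length is defined only for connected graphs
    print("average path length:", nx.average_shortest_path_length(G))
print("degree centrality:", nx.degree_centrality(G))
print("degree distribution:", Counter(dict(G.degree()).values()))
```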
Abstract:
Is phraseology the third articulation of language? Fresh insights into a theoretical conundrum
Jean-Pierre Colson, University of Louvain (Louvain-la-Neuve, Belgium)

Although the notion of phraseology is now used across a wide range of linguistic disciplines, its definition and the classification of phraseological units remain a subject of intense debate. It is generally agreed that phraseology implies polylexicality, but this term is problematic as well, because it brings us back to one of the most controversial topics in modern linguistics: the definition of a word. On the other hand, another widely accepted principle of language is the double articulation or duality of patterning (Martinet 1960): the first articulation consists of morphemes and the second of phonemes. The very definition of morphemes, however, also poses several problems, and the situation becomes even more confused if we wish to take phraseology into account.

In this contribution, I will take the view that a corpus-based and computational approach to phraseology may shed some new light on this theoretical conundrum. A better understanding of the basic units of meaning is necessary for more efficient language learning and translation, especially in the case of machine translation. Previous research (Colson 2011, 2012, 2013, 2014; Corpas Pastor 2000, 2007, 2008, 2013, 2015; Corpas Pastor & Leiva Rojo 2011; Leiva Rojo 2013) has shown the paramount importance of phraseology for translation. A tentative step towards a coherent explanation of the role of phraseology in language has been proposed by Mejri (2006): it is postulated that a third articulation of language intervenes at the level of words, including simple morphemes, sequences of free and bound morphemes, but also phraseological units.

I will present results from experiments with statistical associations of morphemes across several languages, and point out that (mainly) isolating languages such as Chinese are interesting for a better understanding of the interplay between morphemes and phraseological units. Named entities, in particular, are an extreme example of intertwining cultural, statistical and linguistic elements. Other examples show that the many borrowings and influences that characterize European languages tend to give a somewhat blurred vision of the interplay between morphology and phraseology. From a statistical point of view, the cpr-score (Colson 2016) provides a methodology for adapting the automatic extraction of phraseological units to the morphological structure of each language. The results obtained can therefore be used for testing hypotheses about the interaction between morphology, phraseology and culture. Experiments with the cpr-score on the extraction of Chinese phraseological units show that results depend on how the basic units of meaning are defined: a morpheme-based approach yields good results, which corroborates the claim by Beck and Mel'čuk (2011) that the association of morphemes into words may be similar to the association of words into phraseological units. A cross-linguistic experiment carried out for English, French, Spanish and Chinese also reveals that the results are quite compatible with Mejri's hypothesis (2006) of a third articulation of language. Such findings, if confirmed, also corroborate the notion of statistical semantics in language.
To illustrate this point, I will present the PhraseoRobot (Colson 2016), a computational tool for extracting phraseological associations around key words from the media, such as Brexit. The results confirm a previous study on the term globalization (Colson 2016): a significant part of the sociolinguistic associations prevailing in the media is related to phraseology in the broad sense, and can therefore be partly extracted by means of statistical scores.

References
Beck, D. & I. Mel'čuk (2011). Morphological phrasemes and Totonacan verbal morphology. Linguistics 49/1: 175-228.
Colson, J.-P. (2011). La traduction spécialisée basée sur les corpus : une expérience dans le domaine informatique. In: Sfar, I. & S. Mejri, La traduction de textes spécialisés : retour sur des lieux communs. Synergies Tunisie n° 2. Gerflint, Agence universitaire de la Francophonie, p. 115-123.
Colson, J.-P. (2012). Traduire le figement en langue de spécialité : une expérience de phraséologie informatique. In: Mogorrón Huerta, P. & S. Mejri (dirs.), Lenguas de especialidad, traducción, fijación / Langues spécialisées, figement et traduction. Encuentros Mediterráneos / Rencontres Méditerranéennes, N° 4. Universidad de Alicante, p. 159-171.
Colson, J.-P. (2013). Pratique traduisante et idiomaticité : l'importance des structures semi-figées. In: Mogorrón Huerta, P., Gallego Hernández, D., Masseau, P. & Tolosa Igualada, M. (eds.), Fraseología, Opacidad y Traducción. Studien zur romanischen Sprachwissenschaft und interkulturellen Kommunikation (Herausgegeben von Gerd Wotjak). Frankfurt am Main: Peter Lang, p. 207-218.
Colson, J.-P. (2014). La phraséologie et les corpus dans les recherches traductologiques. Communication lors du colloque international Europhras 2014, Association Européenne de Phraséologie. Université de Paris Sorbonne, 10-12 septembre 2014.
Colson, J.-P. (2016). Set phrases around globalization: an experiment in corpus-based computational phraseology. In: F. Alonso Almeida, I. Ortega Barrera, E. Quintana Toledo & M. Sánchez Cuervo (eds.), Input a Word, Analyse the World: Selected Approaches to Corpus Linguistics. Newcastle upon Tyne: Cambridge Scholars Publishing, p. 141-152.
Corpas Pastor, G. (2000). Acerca de la (in)traducibilidad de la fraseología. In: G. Corpas Pastor (ed.), Las lenguas de Europa: Estudios de fraseología, fraseografía y traducción. Granada: Comares, p. 483-522.
Corpas Pastor, G. (2007). Europäismen - von Natur aus phraseologische Äquivalente? Von blauem Blut und sangre azul. In: M. Emsel & J. Cuartero Otal (eds.), Brücken: Übersetzen und interkulturelle Kommunikationen. Festschrift für Gerd Wotjak zum 65. Geburtstag. Fráncfort: Peter Lang, p. 65-77.
Corpas Pastor, G. (2008). Investigar con corpus en traducción: los retos de un nuevo paradigma [Studien zur romanischen Sprachwissenschaft und interkulturellen Kommunikation, 49]. Fráncfort: Peter Lang.
Corpas Pastor, G. (2013). Detección, descripción y contraste de las unidades fraseológicas mediante tecnologías lingüísticas. In: Olza, I. & R. Elvira Manero (eds.), Fraseopragmática. Berlin: Frank & Timme, p. 335-373.
Leiva Rojo, J. (2013). La traducción de unidades fraseológicas (alemán-español/español-alemán) como parámetro para la evaluación y revisión de traducciones. In: Mellado Blanco, C., Buján, P., Iglesias, N.M., Losada, M.C. & A. Mansilla (eds.), La fraseología del alemán y el español: lexicografía y traducción. ELS, Etudes Linguistiques / Linguistische Studien, Band 11. München: Peniope, p. 31-42.
Leiva Rojo, J. & G. Corpas Pastor (2011). Placing Italian idioms in a foreign milieu: a case study. In: Pamies Bertrán, A., Luque Nadal, L., Bretana, J. & M. Pazos (eds.), Multilingual phraseography. Second Language Learning and Translation Applications. Baltmannsweiler: Schneider Verlag (Colección: Phraseologie und Parömiologie, 28), p. 289-298.
Martinet, A. (1966). Eléments de linguistique générale. Paris: Colin.
Mejri, S. (2006). Polylexicalité, monolexicalité et double articulation. Cahiers de Lexicologie 2: 209-221.
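As a generic illustration of corpus-based extraction of candidate phraseological units by a statistical association score, the sketch below ranks adjacent word pairs by pointwise mutual information; PMI is a simple stand-in, and the cpr-score itself (Colson 2016) is not reproduced here.

```python
# Generic sketch: extract candidate phraseological units from a corpus by ranking
# adjacent token pairs with an association score (PMI as a stand-in measure).
import math
from collections import Counter

def extract_candidates(tokens, min_count=2, top_k=10):
    """Rank adjacent token pairs by pointwise mutual information."""
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    n = len(tokens)
    scored = []
    for (w1, w2), c in bigrams.items():
        if c < min_count:
            continue  # ignore rare pairs with unreliable statistics
        pmi = math.log((c / (n - 1)) / ((unigrams[w1] / n) * (unigrams[w2] / n)))
        scored.append((pmi, w1, w2, c))
    return sorted(scored, reverse=True)[:top_k]

# Tiny made-up sample around the key word "brexit".
tokens = ("the hard brexit deal and the soft brexit deal divide public opinion ; "
          "a hard brexit remains possible").split()
for pmi, w1, w2, c in extract_candidates(tokens):
    print(f"{w1} {w2}\tcount={c}\tPMI={pmi:.2f}")
```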
Abstract:
Natural language processing has achieved great success in a wide range of applications, producing both commercial language services and open-source language tools. However, most methods take a static or batch approach, assuming that the model has all the information it needs and makes a one-time prediction. In this dissertation, we study dynamic problems where the input comes in a sequence instead of all at once, and the output must be produced while the input is arriving. In these problems, predictions are often made based only on partial information. We see this dynamic setting in many real-time, interactive applications. These problems usually involve a trade-off between the amount of input received (cost) and the quality of the output prediction (accuracy). Therefore, the evaluation considers both objectives (e.g., plotting a Pareto curve). Our goal is to develop a formal understanding of sequential prediction and decision-making problems in natural language processing and to propose efficient solutions. Toward this end, we present meta-algorithms that take an existing batch model and produce a dynamic model to handle sequential inputs and outputs. We build our framework upon the theory of Markov Decision Processes (MDPs), which allows learning to trade off competing objectives in a principled way. The main machine learning techniques we use are from imitation learning and reinforcement learning, and we advance current techniques to tackle problems arising in our settings. We evaluate our algorithm on a variety of applications, including dependency parsing, machine translation, and question answering. We show that our approach achieves a better cost-accuracy trade-off than the batch approach and heuristic-based decision-making approaches. We first propose a general framework for cost-sensitive prediction, where different parts of the input come at different costs. We formulate a decision-making process that selects pieces of the input sequentially, and the selection is adaptive to each instance. Our approach is evaluated on both standard classification tasks and a structured prediction task (dependency parsing). We show that it achieves similar prediction quality to methods that use all the input, while inducing a much smaller cost. Next, we extend the framework to problems where the input is revealed incrementally in a fixed order. We study two applications: simultaneous machine translation and quiz bowl (incremental text classification). We discuss challenges in this setting and show that adding domain knowledge eases the decision-making problem. A central theme throughout the chapters is an MDP formulation of a challenging problem with sequential input/output and trade-off decisions, accompanied by a learning algorithm that solves the MDP.
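A minimal sketch of the kind of incremental decision loop described above, in which an agent trades input consumed (cost) against prediction quality; the hand-written confidence threshold stands in for the learned imitation/reinforcement-learning policy and is not the dissertation's algorithm.

```python
# Minimal sketch of a sequential decision loop: at each step the agent sees a
# prefix of the input and chooses to WAIT (read more, paying a delay cost) or
# COMMIT (predict now). The confidence threshold is a hand-written stand-in
# for a learned policy.
from dataclasses import dataclass
from typing import Callable, List, Tuple

@dataclass
class Outcome:
    prediction: str
    cost: int  # number of input tokens consumed before committing

def run_episode(tokens: List[str],
                predictor: Callable[[List[str]], Tuple[str, float]],
                confidence_threshold: float = 0.8) -> Outcome:
    """Consume the input token by token; commit once the batch predictor's
    confidence on the current prefix clears the threshold."""
    prefix: List[str] = []
    prediction, confidence = "", 0.0
    for tok in tokens:
        prefix.append(tok)                       # WAIT: read one more token
        prediction, confidence = predictor(prefix)
        if confidence >= confidence_threshold:
            break                                # COMMIT early
    return Outcome(prediction, len(prefix))

# Toy "batch model": confidence grows with the amount of input seen.
def toy_predictor(prefix: List[str]) -> Tuple[str, float]:
    return " ".join(prefix).upper(), min(1.0, len(prefix) / 4)

print(run_episode("wir haben das problem gelöst".split(), toy_predictor))
```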
Abstract:
The primary objective of this Project Work is to analyse the translation, from Portuguese into English, of economic and financial texts using the ISTRION Machine Translation (MT) platform. The translation of selected content from the Newsletter Económico-Financeira Maximus Report is carried out on this platform, complemented by other language-processing support tools considered relevant. This Project Work also aims to analyse the potential of the platform and to measure the translation results. Finally, it seeks to frame, test, study and measure the criteria by which the translation of these texts can be made more efficient.
Abstract:
Master's dissertation, Natural Language Processing and Language Industries, Faculdade de Ciências Humanas e Sociais, Universidade do Algarve, 2014
Abstract:
Sequence problems are among the most challenging interdisciplinary topics of our time. They are ubiquitous in science and daily life and occur, for example, in the form of DNA sequences encoding all the information of an organism, as text (natural or formal), or in the form of a computer program. Sequence problems therefore occur in many variations in computational biology (drug development), coding theory, data compression, and quantitative and computational linguistics (e.g. machine translation). In recent years, several proposals have appeared to formulate sequence problems such as the closest string problem (CSP) and the farthest string problem (FSP) as integer linear programming problems (ILPP). In the present talk we present a novel general approach to reduce the size of the ILPP by grouping isomorphous columns of the string matrix together. The approach is of practical use, since the solution of sequence problems is very time consuming, in particular when the sequences are long.
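For reference, a sketch of the standard ILP formulation of the closest string problem on a toy instance, assuming the PuLP library; the column-grouping reduction proposed in the talk is not shown.

```python
# Sketch of the standard ILP formulation of the closest string problem (CSP):
# find a center string minimising the maximum Hamming distance d to the inputs.
from pulp import (LpProblem, LpMinimize, LpVariable, lpSum,
                  LpBinary, LpInteger, PULP_CBC_CMD)

strings = ["ACGT", "AGGT", "ACGA"]          # toy instance
m = len(strings[0])
alphabet = sorted(set("".join(strings)))

prob = LpProblem("closest_string", LpMinimize)
d = LpVariable("d", lowBound=0, cat=LpInteger)          # max Hamming distance
x = {(j, c): LpVariable(f"x_{j}_{c}", cat=LpBinary)     # x[j,c]=1: char c at pos j
     for j in range(m) for c in alphabet}

prob += d                                               # objective: minimise d
for j in range(m):                                      # exactly one char per column
    prob += lpSum(x[j, c] for c in alphabet) == 1
for s in strings:                                       # Hamming distance to s <= d
    prob += lpSum(1 - x[j, s[j]] for j in range(m)) <= d

prob.solve(PULP_CBC_CMD(msg=False))
center = "".join(c for j in range(m) for c in alphabet if x[j, c].value() > 0.5)
print("center string:", center, "radius:", int(d.value()))
```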
Abstract:
Neural interface devices and the melding of mind and machine challenge the law in determining where civil liability for injury, damage or loss should lie. The ability of the human mind to instruct and control these devices means that in a negligence action against a person with a neural interface device, determining the standard of care owed by him or her will be of paramount importance. This article considers some of the factors that may influence the court's determination of the appropriate standard of care to be applied in this situation, leading to the conclusion that a new standard of care might evolve.
Abstract:
Climate change impact assessment studies involve downscaling large-scale atmospheric predictor variables (LSAPVs) simulated by general circulation models (GCMs) to site-scale meteorological variables. This article presents a least-square support vector machine (LS-SVM)-based methodology for multi-site downscaling of maximum and minimum daily temperature series. The methodology involves (1) delineation of sites in the study area into clusters based on the correlation structure of the predictands, (2) downscaling LSAPVs to monthly time series of predictands at a representative site identified in each of the clusters, (3) translation of the downscaled information in each cluster from the representative site to the other sites using LS-SVM inter-site regression relationships, and (4) disaggregation of the information at each site from the monthly to the daily time scale using a k-nearest-neighbour disaggregation methodology. The effectiveness of the methodology is demonstrated by application to data pertaining to four sites in the catchment of the Beas river basin, India. Simulations of the Canadian coupled global climate model (CGCM3.1/T63) for four IPCC SRES scenarios, namely A1B, A2, B1 and COMMIT, were downscaled to future projections of the predictands in the study area. Comparison of results with those based on the recently proposed multivariate multiple linear regression (MMLR) downscaling method and the multi-site multivariate statistical downscaling (MMSD) method indicates that the proposed method is promising and can be considered a feasible choice in statistical downscaling studies. The performance of the method in downscaling daily minimum temperature was found to be better than that in downscaling daily maximum temperature. Results indicate an increase in annual average maximum and minimum temperatures at all the sites for the A1B, A2 and B1 scenarios. The projected increment is highest for the A2 scenario, followed by the A1B, B1 and COMMIT scenarios. Projections, in general, indicate an increase in mean monthly maximum and minimum temperatures during January to February and October to December.
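A rough sketch of the inter-site translation step (3) above, with scikit-learn's SVR standing in for the LS-SVM used in the study and synthetic monthly series in place of the Beas basin observations.

```python
# Rough sketch of inter-site regression: translate a downscaled monthly series at
# a cluster's representative site to another site. SVR is a stand-in for LS-SVM;
# the data are synthetic placeholders, not the Beas basin observations.
import numpy as np
from sklearn.svm import SVR

rng = np.random.default_rng(0)
# 240 synthetic monthly mean-temperature values at the representative site.
rep_site = 20 + 10 * np.sin(np.linspace(0, 4 * np.pi, 240)) + rng.normal(0, 1, 240)
# A correlated neighbouring site within the same cluster.
other_site = 0.9 * rep_site + 2.0 + rng.normal(0, 0.5, 240)

X_train, y_train = rep_site[:180].reshape(-1, 1), other_site[:180]
X_test, y_test = rep_site[180:].reshape(-1, 1), other_site[180:]

model = SVR(kernel="rbf", C=10.0, epsilon=0.1).fit(X_train, y_train)
pred = model.predict(X_test)
rmse = float(np.sqrt(np.mean((pred - y_test) ** 2)))
print("RMSE at the non-representative site:", rmse)
```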