991 resultados para machine translation programs


Relevância:

80.00% 80.00%

Publicador:

Resumo:

Is phraseology the third articulation of language? Fresh insights into a theoretical conundrum Jean-Pierre Colson University of Louvain (Louvain-la-Neuve, Belgium) Although the notion of phraseology is now used across a wide range of linguistic disciplines, its definition and the classification of phraseological units remain a subject of intense debate. It is generally agreed that phraseology implies polylexicality, but this term is problematic as well, because it brings us back to one of the most controversial topics in modern linguistics: the definition of a word. On the other hand, another widely accepted principle of language is the double articulation or duality of patterning (Martinet 1960): the first articulation consists of morphemes and the second of phonemes. The very definition of morphemes, however, also poses several problems, and the situation becomes even more confused if we wish to take phraseology into account. In this contribution, I will take the view that a corpus-based and computational approach to phraseology may shed some new light on this theoretical conundrum. A better understanding of the basic units of meaning is necessary for more efficient language learning and translation, especially in the case of machine translation. Previous research (Colson 2011, 2012, 2013, 2014), Corpas Pastor (2000, 2007, 2008, 2013, 2015), Corpas Pastor & Leiva Rojo (2011), Leiva Rojo (2013), has shown the paramount importance of phraseology for translation. A tentative step towards a coherent explanation of the role of phraseology in language has been proposed by Mejri (2006): it is postulated that a third articulation of language intervenes at the level of words, including simple morphemes, sequences of free and bound morphemes, but also phraseological units. I will present results from experiments with statistical associations of morphemes across several languages, and point out that (mainly) isolating languages such as Chinese are interesting for a better understanding of the interplay between morphemes and phraseological units. Named entities, in particular, are an extreme example of intertwining cultural, statistical and linguistic elements. Other examples show that the many borrowings and influences that characterize European languages tend to give a somewhat blurred vision of the interplay between morphology and phraseology. From a statistical point of view, the cpr-score (Colson 2016) provides a methodology for adapting the automatic extraction of phraseological units to the morphological structure of each language. The results obtained can therefore be used for testing hypotheses about the interaction between morphology, phraseology and culture. Experiments with the cpr-score on the extraction of Chinese phraseological units show that results depend on how the basic units of meaning are defined: a morpheme-based approach yields good results, which corroborates the claim by Beck and Mel'čuk (2011) that the association of morphemes into words may be similar to the association of words into phraseological units. A cross-linguistic experiment carried out for English, French, Spanish and Chinese also reveals that the results are quite compatible with Mejri’s hypothesis (2006) of a third articulation of language. Such findings, if confirmed, also corroborate the notion of statistical semantics in language. To illustrate this point, I will present the PhraseoRobot (Colson 2016), a computational tool for extracting phraseological associations around key words from the media, such as Brexit. The results confirm a previous study on the term globalization (Colson 2016): a significant part of sociolinguistic associations prevailing in the media is related to phraseology in the broad sense, and can therefore be partly extracted by means of statistical scores. References Beck, D. & I. Mel'čuk (2011). Morphological phrasemes and Totonacan verbal morphology. Linguistics 49/1: 175-228. Colson, J.-P. (2011). La traduction spécialisée basée sur les corpus : une expérience dans le domaine informatique. In : Sfar, I. & S. Mejri, La traduction de textes spécialisés : retour sur des lieux communs. Synergies Tunisie n° 2. Gerflint, Agence universitaire de la Francophonie, p. 115-123. Colson, J.-P. (2012). Traduire le figement en langue de spécialité : une expérience de phraséologie informatique. In : Mogorrón Huerta, P. & S. Mejri (dirs.), Lenguas de especialidad, traducción, fijación / Langues spécialisées, figement et traduction. Encuentros Mediterráneos / Rencontres Méditerranéennes, N°4. Universidad de Alicante, p. 159-171. Colson, J.-P. (2013). Pratique traduisante et idiomaticité : l’importance des structures semi-figées. In : Mogorrón Huerta, P., Gallego Hernández, D., Masseau, P. & Tolosa Igualada, M. (eds.), Fraseología, Opacidad y Traduccíon. Studien zur romanischen Sprachwissenschaft und interkulturellen Kommunikation (Herausgegeben von Gerd Wotjak). Frankfurt am Main, Peter Lang, p. 207-218. Colson, J.-P. (2014). La phraséologie et les corpus dans les recherches traductologiques. Communication lors du colloque international Europhras 2014, Association Européenne de Phraséologie. Université de Paris Sorbonne, 10-12 septembre 2014. Colson, J-P. (2016). Set phrases around globalization : an experiment in corpus-based computational phraseology. In: F. Alonso Almeida, I. Ortega Barrera, E. Quintana Toledo and M. Sánchez Cuervo (eds.), Input a Word, Analyse the World: Selected Approaches to Corpus Linguistics. Newcastle upon Tyne: Cambridge Scholars Publishing, p. 141-152. Corpas Pastor, G. (2000). Acerca de la (in)traducibilidad de la fraseología. In: G. Corpas Pastor (ed.), Las lenguas de Europa: Estudios de fraseología, fraseografía y traducción. Granada: Comares, p. 483-522. Corpas Pastor, G. (2007). Europäismen - von Natur aus phraseologische Äquivalente? Von blauem Blut und sangre azul. In: M. Emsel y J. Cuartero Otal (eds.), Brücken: Übersetzen und interkulturelle Kommunikationen. Festschrift für Gerd Wotjak zum 65. Geburtstag, Fráncfort: Peter Lang, p. 65-77. Corpas Pastor, G. (2008). Investigar con corpus en traducción: los retos de un nuevo paradigma [Studien zur romanische Sprachwissenschaft und interkulturellen Kommunikation, 49], Fráncfort: Peter Lang. Corpas Pastor, G. (2013). Detección, descripción y contraste de las unidades fraseológicas mediante tecnologías lingüísticas. In Olza, I. & R. Elvira Manero (eds.) Fraseopragmática. Berlin: Frank & Timme, p. 335-373. Leiva Rojo, J. (2013). La traducción de unidades fraseológicas (alemán-español/español-alemán) como parámetro para la evaluación y revisión de traducciones. In: Mellado Blanco, C., Buján, P, Iglesias N.M., Losada M.C. & A. Mansilla (eds), La fraseología del alemán y el español: lexicografía y traducción. ELS, Etudes Linguistiques / Linguistische Studien, Band 11. München: Peniope, p. 31-42. Leiva Rojo, J. & G. Corpas Pastor (2011). Placing Italian idioms in a foreign milieu: a case study. In: Pamies Bertrán, A., Luque Nadal, L., Bretana, J. &; M. Pazos (eds), (2011). Multilingual phraseography. Second Language Learning and Translation Applications. Baltmannsweiler: Schneider Verlag (Colección: Phraseologie und Parömiologie, 28), p. 289-298. Martinet, A. (1966). Eléments de linguistique générale. Paris: Colin. Mejri, S. (2006). Polylexicalité, monolexicalité et double articulation. Cahiers de Lexicologie 2: 209-221.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

Natural language processing has achieved great success in a wide range of ap- plications, producing both commercial language services and open-source language tools. However, most methods take a static or batch approach, assuming that the model has all information it needs and makes a one-time prediction. In this disser- tation, we study dynamic problems where the input comes in a sequence instead of all at once, and the output must be produced while the input is arriving. In these problems, predictions are often made based only on partial information. We see this dynamic setting in many real-time, interactive applications. These problems usually involve a trade-off between the amount of input received (cost) and the quality of the output prediction (accuracy). Therefore, the evaluation considers both objectives (e.g., plotting a Pareto curve). Our goal is to develop a formal understanding of sequential prediction and decision-making problems in natural language processing and to propose efficient solutions. Toward this end, we present meta-algorithms that take an existent batch model and produce a dynamic model to handle sequential inputs and outputs. Webuild our framework upon theories of Markov Decision Process (MDP), which allows learning to trade off competing objectives in a principled way. The main machine learning techniques we use are from imitation learning and reinforcement learning, and we advance current techniques to tackle problems arising in our settings. We evaluate our algorithm on a variety of applications, including dependency parsing, machine translation, and question answering. We show that our approach achieves a better cost-accuracy trade-off than the batch approach and heuristic-based decision- making approaches. We first propose a general framework for cost-sensitive prediction, where dif- ferent parts of the input come at different costs. We formulate a decision-making process that selects pieces of the input sequentially, and the selection is adaptive to each instance. Our approach is evaluated on both standard classification tasks and a structured prediction task (dependency parsing). We show that it achieves similar prediction quality to methods that use all input, while inducing a much smaller cost. Next, we extend the framework to problems where the input is revealed incremen- tally in a fixed order. We study two applications: simultaneous machine translation and quiz bowl (incremental text classification). We discuss challenges in this set- ting and show that adding domain knowledge eases the decision-making problem. A central theme throughout the chapters is an MDP formulation of a challenging problem with sequential input/output and trade-off decisions, accompanied by a learning algorithm that solves the MDP.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

Este Trabalho de Projeto tem como objetivo primordial analisar a tradução, de português para inglês, de textos económico-financeiros, utilizando a plataforma de Tradução Automática (TA) ISTRION. A tradução de conteúdos selecionados da Newsletter Económico-Financeira Maximus Report é efetuada com base na referida plataforma, complementada com outras ferramentas de apoio ao processamento linguístico que sejam consideradas relevantes. Visa-se igualmente com este Trabalho de Projeto analisar as potencialidades desta plataforma, bem como medir os resultados da tradução. Por último pretende-se enquadrar, testar, estudar e medir quais os critérios em que se poderá tornar mais eficiente a tradução destes textos.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

Dissertação de Mestrado, Processamento de Linguagem Natural e Indústrias da Língua, Faculdade de Ciências Humanas e Sociais, Universidade do Algarve, 2014

Relevância:

80.00% 80.00%

Publicador:

Resumo:

Sequence problems belong to the most challenging interdisciplinary topics of the actuality. They are ubiquitous in science and daily life and occur, for example, in form of DNA sequences encoding all information of an organism, as a text (natural or formal) or in form of a computer program. Therefore, sequence problems occur in many variations in computational biology (drug development), coding theory, data compression, quantitative and computational linguistics (e.g. machine translation). In recent years appeared some proposals to formulate sequence problems like the closest string problem (CSP) and the farthest string problem (FSP) as an Integer Linear Programming Problem (ILPP). In the present talk we present a general novel approach to reduce the size of the ILPP by grouping isomorphous columns of the string matrix together. The approach is of practical use, since the solution of sequence problems is very time consuming, in particular when the sequences are long.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

Sin dal suo ingresso nel mercato della traduzione, la machine translation (MT) è stata impiegata in numerosi settori professionali e ha suscitato l’interesse del grande pubblico. Con l’avvento della traduzione automatica neurale, la MT ha avuto un ulteriore impulso e ci si è iniziati a interrogare sulla possibilità di estenderne il raggio d’azione all’ambito letterario, che sembra tuttora non toccato dagli automatismi delle macchine in virtù della funzione espressiva e del contenuto creativo dei testi. Il presente elaborato nasce quindi con l’obiettivo di valutare l’effettiva applicabilità della traduzione automatica in ambito letterario attraverso un’attenta analisi dell’output prodotto dalla MT supportata da un confronto con la traduzione svolta da un professionista. Questa analisi si baserà sulla traduzione di un estratto del romanzo Hunger Games di Suzanne Collins e le traduzioni verranno effettuate da due dei migliori software di traduzione automatica neurale attualmente sul mercato: ModernMT e DeepL. Una volta terminata la valutazione dei singoli output verranno mostrati i risultati e determinati i limiti e i vantaggi della machine translation.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

Natural Language Processing (NLP) has seen tremendous improvements over the last few years. Transformer architectures achieved impressive results in almost any NLP task, such as Text Classification, Machine Translation, and Language Generation. As time went by, transformers continued to improve thanks to larger corpora and bigger networks, reaching hundreds of billions of parameters. Training and deploying such large models has become prohibitively expensive, such that only big high tech companies can afford to train those models. Therefore, a lot of research has been dedicated to reducing a model’s size. In this thesis, we investigate the effects of Vocabulary Transfer and Knowledge Distillation for compressing large Language Models. The goal is to combine these two methodologies to further compress models without significant loss of performance. In particular, we designed different combination strategies and conducted a series of experiments on different vertical domains (medical, legal, news) and downstream tasks (Text Classification and Named Entity Recognition). Four different methods involving Vocabulary Transfer (VIPI) with and without a Masked Language Modelling (MLM) step and with and without Knowledge Distillation are compared against a baseline that assigns random vectors to new elements of the vocabulary. Results indicate that VIPI effectively transfers information of the original vocabulary and that MLM is beneficial. It is also noted that both vocabulary transfer and knowledge distillation are orthogonal to one another and may be applied jointly. The application of knowledge distillation first before subsequently applying vocabulary transfer is recommended. Finally, model performance due to vocabulary transfer does not always show a consistent trend as the vocabulary size is reduced. Hence, the choice of vocabulary size should be empirically selected by evaluation on the downstream task similar to hyperparameter tuning.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

Throughout the years, technology has had an undeniable impact on the AVT field. It has revolutionized the way audiovisual content is consumed by allowing audiences to easily access it at any time and on any device. Especially after the introduction of OTT streaming platforms such as Netflix, Amazon Prime Video, Disney+, Apple TV+, and HBO Max, which offer a vast catalog of national and international products, the consumption of audiovisual products has been on a constant rise and, consequently, the demand for localized content too. In turn, the AVT industry resorts to new technologies and practices to handle the ever-growing workload and the faster turnaround times. Due to the numerous implications that it has on the industry, technological advancement can be considered an area of research of particular interest for the AVT studies. However, in the case of dubbing, research and discussion regarding the topic is lagging behind because of the more limited impact that technology has had on the very conservative dubbing industry. Therefore, the aim of the dissertation is to offer an overview of some of the latest technological innovations and practices that have already been implemented (i.e. cloud dubbing and DeepDub technology) or that are still under development and research (i.e. automatic speech recognition and respeaking, machine translation and post-editing, audio-based and visual-based dubbing techniques, text-based editing of talking-head videos, and automatic dubbing), and respectively discuss their reception by the industry professionals, and make assumptions about their future implementation in the dubbing field.

Relevância:

40.00% 40.00%

Publicador:

Relevância:

40.00% 40.00%

Publicador:

Resumo:

Although the sequential execution speed of logic programs has been greatly improved by the concepts introduced in the Warren Abstract Machine (WAM), parallel execution represents the only way to increase this speed beyond the natural limits of sequential systems. However, most proposed parallel logic programming execution models lack the performance optimizations and storage efficiency of sequential systems. This paper presents a parallel abstract machine which is an extension of the WAM and is thus capable of supporting ANDParallelism without giving up the optimizations present in sequential implementations. A suitable instruction set, which can be used as a target by a variety of logic programming languages, is also included. Special instructions are provided to support a generalized version of "Restricted AND-Parallelism" (RAP), a technique which reduces the overhead traditionally associated with the run-time management of variable binding conflicts to a series of simple run-time checks, which select one out of a series of compiled execution graphs.

Relevância:

40.00% 40.00%

Publicador:

Resumo:

The term "Logic Programming" refers to a variety of computer languages and execution models which are based on the traditional concept of Symbolic Logic. The expressive power of these languages offers promise to be of great assistance in facing the programming challenges of present and future symbolic processing applications in Artificial Intelligence, Knowledge-based systems, and many other areas of computing. The sequential execution speed of logic programs has been greatly improved since the advent of the first interpreters. However, higher inference speeds are still required in order to meet the demands of applications such as those contemplated for next generation computer systems. The execution of logic programs in parallel is currently considered a promising strategy for attaining such inference speeds. Logic Programming in turn appears as a suitable programming paradigm for parallel architectures because of the many opportunities for parallel execution present in the implementation of logic programs. This dissertation presents an efficient parallel execution model for logic programs. The model is described from the source language level down to an "Abstract Machine" level suitable for direct implementation on existing parallel systems or for the design of special purpose parallel architectures. Few assumptions are made at the source language level and therefore the techniques developed and the general Abstract Machine design are applicable to a variety of logic (and also functional) languages. These techniques offer efficient solutions to several areas of parallel Logic Programming implementation previously considered problematic or a source of considerable overhead, such as the detection and handling of variable binding conflicts in AND-Parallelism, the specification of control and management of the execution tree, the treatment of distributed backtracking, and goal scheduling and memory management issues, etc. A parallel Abstract Machine design is offered, specifying data areas, operation, and a suitable instruction set. This design is based on extending to a parallel environment the techniques introduced by the Warren Abstract Machine, which have already made very fast and space efficient sequential systems a reality. Therefore, the model herein presented is capable of retaining sequential execution speed similar to that of high performance sequential systems, while extracting additional gains in speed by efficiently implementing parallel execution. These claims are supported by simulations of the Abstract Machine on sample programs.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Demand response can play a very relevant role in the context of power systems with an intensive use of distributed energy resources, from which renewable intermittent sources are a significant part. More active consumers participation can help improving the system reliability and decrease or defer the required investments. Demand response adequate use and management is even more important in competitive electricity markets. However, experience shows difficulties to make demand response be adequately used in this context, showing the need of research work in this area. The most important difficulties seem to be caused by inadequate business models and by inadequate demand response programs management. This paper contributes to developing methodologies and a computational infrastructure able to provide the involved players with adequate decision support on demand response programs and contracts design and use. The presented work uses DemSi, a demand response simulator that has been developed by the authors to simulate demand response actions and programs, which includes realistic power system simulation. It includes an optimization module for the application of demand response programs and contracts using deterministic and metaheuristic approaches. The proposed methodology is an important improvement in the simulator while providing adequate tools for demand response programs adoption by the involved players. A machine learning method based on clustering and classification techniques, resulting in a rule base concerning DR programs and contracts use, is also used. A case study concerning the use of demand response in an incident situation is presented.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

This work deals with the cooling of high-speed electric machines, such as motors and generators, through an air gap. It consists of numerical and experimental modelling of gas flow and heat transfer in an annular channel. Velocity and temperature profiles are modelled in the air gap of a high-speed testmachine. Local and mean heat transfer coefficients and total friction coefficients are attained for a smooth rotor-stator combination at a large velocity range. The aim is to solve the heat transfer numerically and experimentally. The FINFLO software, developed at Helsinki University of Technology, has been used in the flow solution, and the commercial IGG and Field view programs for the grid generation and post processing. The annular channel is discretized as a sector mesh. Calculation is performed with constant mass flow rate on six rotational speeds. The effect of turbulence is calculated using three turbulence models. The friction coefficient and velocity factor are attained via total friction power. The first part of experimental section consists of finding the proper sensors and calibrating them in a straight pipe. After preliminary tests, a RdF-sensor is glued on the walls of stator and rotor surfaces. Telemetry is needed to be able to measure the heat transfer coefficients at the rotor. The mean heat transfer coefficients are measured in a test machine on four cooling air mass flow rates at a wide Couette Reynolds number range. The calculated values concerning the friction and heat transfer coefficients are compared with measured and semi-empirical data. Heat is transferred from the hotter stator and rotor surfaces to the coolerair flow in the air gap, not from the rotor to the stator via the air gap, althought the stator temperature is lower than the rotor temperature. The calculatedfriction coefficients fits well with the semi-empirical equations and precedingmeasurements. On constant mass flow rate the rotor heat transfer coefficient attains a saturation point at a higher rotational speed, while the heat transfer coefficient of the stator grows uniformly. The magnitudes of the heat transfer coefficients are almost constant with different turbulence models. The calibrationof sensors in a straight pipe is only an advisory step in the selection process. Telemetry is tested in the pipe conditions and compared to the same measurements with a plain sensor. The magnitudes of the measured data and the data from the semi-empirical equation are higher for the heat transfer coefficients than thenumerical data considered on the velocity range. Friction and heat transfer coefficients are presented in a large velocity range in the report. The goals are reached acceptably using numerical and experimental research. The next challenge is to achieve results for grooved stator-rotor combinations. The work contains also results for an air gap with a grooved stator with 36 slots. The velocity field by the numerical method does not match in every respect the estimated flow mode. The absence of secondary Taylor vortices is evident when using time averagednumerical simulation.