30 results for Machine Translation (MT)
in Biblioteca Digital da Produção Intelectual da Universidade de São Paulo (BDPI/USP)
Abstract:
Establishing metrics to assess machine translation (MT) systems automatically is now crucial owing to the widespread use of MT over the web. In this study we show that such evaluation can be done by modeling text as complex networks. Specifically, we extend our previous work by employing additional metrics of complex networks, whose results were used as input for machine learning methods and allowed MT texts of distinct qualities to be distinguished. Also shown is that the node-to-node mapping between source and target texts (English-Portuguese and Spanish-Portuguese pairs) can be improved by adding further hierarchical levels for the metrics out-degree, in-degree, hierarchical common degree, cluster coefficient, inter-ring degree, intra-ring degree and convergence ratio. The results presented here amount to a proof-of-principle that the possible capturing of a wider context with the hierarchical levels may be combined with machine learning methods to yield an approach for assessing the quality of MT systems. (C) 2010 Elsevier B.V. All rights reserved.
Abstract:
Complex networks have been increasingly used in text analysis, including in connection with natural language processing tools, as important text features appear to be captured by the topology and dynamics of the networks. Following previous works that apply complex network concepts to text quality measurement, summary evaluation, and author characterization, we now focus on machine translation (MT). In this paper we assess the possible representation of texts as complex networks to evaluate cross-linguistic issues inherent in manual and machine translation. We show that translations of different quality generated by MT tools can be distinguished from their manual counterparts by means of metrics such as in-degree (ID), out-degree (OD), clustering coefficient (CC), and shortest paths (SP). For instance, we demonstrate that the average OD in networks of automatic translations consistently exceeds the values obtained for manual ones, and that the CC values of source texts are not preserved for manual translations, but are for good automatic translations. This probably reflects the text rearrangements humans perform during manual translation. We envisage that such findings could lead to better MT tools and automatic evaluation metrics.
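The metrics named in this abstract can be illustrated with a minimal sketch. It assumes a directed word-adjacency network (an edge from each word to the word that follows it), whitespace tokenization, and no stopword removal or lemmatization; the abstract does not describe the authors' preprocessing, so these choices and all function names are illustrative only.

```python
def build_word_network(text):
    """Directed word-adjacency network: edge u -> v when word v follows word u."""
    words = text.lower().split()
    successors = {w: set() for w in words}
    predecessors = {w: set() for w in words}
    for u, v in zip(words, words[1:]):
        if u != v:  # ignore self-loops from repeated words
            successors[u].add(v)
            predecessors[v].add(u)
    return set(words), successors, predecessors


def average_out_degree(nodes, successors):
    """Mean number of distinct words that directly follow each word."""
    return sum(len(successors[n]) for n in nodes) / len(nodes)


def clustering_coefficient(node, successors, predecessors):
    """Fraction of ordered neighbour pairs (a, b) joined by an edge a -> b."""
    neighbours = (successors[node] | predecessors[node]) - {node}
    k = len(neighbours)
    if k < 2:
        return 0.0
    links = sum(1 for a in neighbours for b in successors[a] if b in neighbours)
    return links / (k * (k - 1))


if __name__ == "__main__":
    nodes, succ, pred = build_word_network("the cat sat on the mat")
    print(average_out_degree(nodes, succ))
    print(clustering_coefficient("the", succ, pred))
```

Comparing such values between a source text and its translations is the kind of analysis the abstract describes; the toy sentence here only shows the mechanics.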
Abstract:
Due to idiosyncrasies in their syntax, semantics or frequency, Multiword Expressions (MWEs) have received special attention from the NLP community, as the methods and techniques developed for the treatment of simplex words are not necessarily suitable for them. This is certainly the case for the automatic acquisition of MWEs from corpora. A lot of effort has been directed to the task of automatically identifying them, with considerable success. In this paper, we propose an approach for the identification of MWEs in a multilingual context, as a by-product of a word alignment process, that not only deals with the identification of possible MWE candidates, but also associates some multiword expressions with semantics. The results obtained indicate the feasibility and low costs in terms of tools and resources demanded by this approach, which could, for example, facilitate and speed up lexicographic work.
Abstract:
Identifying the correct sense of a word in context is crucial for many tasks in natural language processing (machine translation is an example). State-of-the-art methods for Word Sense Disambiguation (WSD) build models using hand-crafted features that usually capture shallow linguistic information. Complex background knowledge, such as semantic relationships, is typically either not used, or used in a specialised manner, due to the limitations of the feature-based modelling techniques used. On the other hand, empirical results from the use of Inductive Logic Programming (ILP) systems have repeatedly shown that they can use diverse sources of background knowledge when constructing models. In this paper, we investigate whether this ability of ILP systems could be used to improve the predictive accuracy of models for WSD. Specifically, we examine the use of a general-purpose ILP system as a method to construct a set of features using semantic, syntactic and lexical information. This feature-set is then used by a common modelling technique in the field (a support vector machine) to construct a classifier for predicting the sense of a word. In our investigation we examine one-shot and incremental approaches to feature-set construction applied to monolingual and bilingual WSD tasks. The monolingual tasks use 32 verbs and 85 verbs and nouns (in English) from the SENSEVAL-3 and SemEval-2007 benchmarks, while the bilingual WSD task consists of 7 highly ambiguous verbs in translating from English to Portuguese. The results are encouraging: the ILP-assisted models show substantial improvements over those that simply use shallow features. In addition, incremental feature-set construction appears to identify smaller and better sets of features. Taken together, the results suggest that the use of ILP with diverse sources of background knowledge provides a way of making substantial progress in the field of WSD.
Abstract:
This study evaluated the microbiological quality of hamburgers and the microbial community on the hands of vendors in Cuiabá, Mato Grosso, Brazil, in relation to vendors' awareness as to what constitutes acceptable food-handling practices, as part of a broad-spectrum research programme on street foods in Brazil. Sale of the hamburger known as the 'baguncinha' is common and widespread in urban Cuiabá, Mato Grosso, Brazil. Food inspectors encounter various difficulties in carrying out inspections. One hundred and five hamburger samples were evaluated using conventional methods including tests for facultative aerobic and/or anaerobic mesophilic bacteria, coliform counts at 45 °C, the coagulase test for Staphylococcus, Gram-staining for the presence of Bacillus cereus, Clostridium sulphite reductase and Salmonella spp. The hamburgers were categorized as unsuitable for human consumption in 31.4% of samples, with those testing positive for coliforms and Staphylococcus at unacceptably high levels by Brazilian standards. High levels of microbiological contamination were detected on the hands of the food handlers and mesophilic bacterial counts reached 1.8 × 10⁴ CFU/hand. Interviews were carried out by means of questionnaires to evaluate levels of awareness as to acceptable food handling practices and it was found that 80.1% of vendors had never participated in any kind of training.
Abstract:
This work proposes a new approach using a committee machine of artificial neural networks to classify masses found in mammograms as benign or malignant. Three shape factors, three edge-sharpness measures, and 14 texture measures are used for the classification of 20 regions of interest (ROIs) related to malignant tumors and 37 ROIs related to benign masses. A group of multilayer perceptrons (MLPs) is employed as a committee machine of neural network classifiers. The classification results are reached by combining the responses of the individual classifiers. Experiments involving changes in the learning algorithm of the committee machine are conducted. The classification accuracy is evaluated using the area A(z) under the receiver operating characteristics (ROC) curve. The A(z) result for the committee machine is compared with the A(z) results obtained using MLPs and single-layer perceptrons (SLPs), as well as a linear discriminant analysis (LDA) classifier. Tests are carried out using the Student's t-distribution. The committee machine classifier outperforms the MLP, SLP, and LDA classifiers in the following cases: with the shape measure of spiculation index, the A(z) values of the four methods are, in order, 0.93, 0.84, 0.75, and 0.76; and with the edge-sharpness measure of acutance, the values are 0.79, 0.70, 0.69, and 0.74. Although the features with which improvement is obtained with the committee machines are not the same as those that provided the maximal value of A(z) (A(z) = 0.99 with some shape features, with or without the committee machine), they correspond to features that are not critically dependent on the accuracy of the boundaries of the masses, which is an important result. (c) 2008 SPIE and IS&T.
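As a rough illustration of the evaluation described in this abstract, the sketch below combines the outputs of several classifiers and computes an empirical, rank-based area under the ROC curve. Two assumptions are made that the abstract does not confirm: the committee combines its members by simple averaging, and the area is estimated by pairwise comparison rather than by fitting a ROC model. The function names are hypothetical.

```python
def committee_score(member_outputs):
    """Combine classifier outputs by simple averaging (an assumed combining rule)."""
    return sum(member_outputs) / len(member_outputs)


def roc_auc(scores_malignant, scores_benign):
    """Empirical area under the ROC curve.

    Equals the probability that a malignant case receives a higher score
    than a benign case, with ties counted as one half.
    """
    wins = 0.0
    for p in scores_malignant:
        for n in scores_benign:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_malignant) * len(scores_benign))


if __name__ == "__main__":
    # Toy scores only, not data from the study.
    malignant = [0.9, 0.8, 0.4]
    benign = [0.7, 0.3]
    print(roc_auc(malignant, benign))
```

An area of 1.0 means every malignant ROI is scored above every benign one, and 0.5 corresponds to chance-level ranking.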
Abstract:
With the relentless quest for improved performance driving ever tighter tolerances for manufacturing, machine tools are sometimes unable to meet the desired requirements. One option to improve the tolerances of machine tools is to compensate for their errors. Among all possible sources of machine tool error, thermally induced errors are, in general for newer machines, the most important. The present work demonstrates the evaluation and modelling of the behaviour of the thermal errors of a CNC cylindrical grinding machine during its warm-up period.
Abstract:
Paper products show dimensional changes when subjected to moisture content modification. Hygroexpansivity was investigated in a commercial paper machine operating at 1256 m/min by a set of measurements on 75 g/m² reprographic bleached eucalyptus pulp paper samples. The present work shows hygroexpansivity development in different sections of the paper machine along the manufacturing direction. The measurement results demonstrate the effects of papermaking process operations on paper hygroexpansivity and lead to the confirmation of fiber orientation degree, drying restraint and shrinkage, and paper tension as significant influencing factors. Structural, strength and elastic properties of paper were also measured as a function of machine direction position and presented for discussion purposes.
Abstract:
This paper addresses the minimization of the mean absolute deviation from a common due date in a two-machine flowshop scheduling problem. We present heuristics that use an algorithm, based on proposed properties, which obtains an optimal schedule for a given job sequence. A new set of benchmark problems is presented with the purpose of evaluating the heuristics. Computational experiments show that the developed heuristics outperform results found in the literature for problems with up to 500 jobs. (C) 2007 Elsevier Ltd. All rights reserved.
Abstract:
This paper addresses the non-preemptive single machine scheduling problem to minimize total tardiness. We are interested in the online version of this problem, where orders arrive at the system at random times. Jobs have to be scheduled without knowledge of what jobs will come afterwards. The processing times and the due dates become known when the order is placed. The order release date occurs only at the beginning of periodic intervals. A customized approximate dynamic programming method is introduced for this problem. The authors also present numerical experiments that assess the reliability of the new approach and show that it performs better than a myopic policy.
Abstract:
This paper addresses the single machine scheduling problem with a common due date aiming to minimize earliness and tardiness penalties. Due to its complexity, most of the previous studies in the literature deal with this problem using heuristic and metaheuristic approaches. With the intention of contributing to the study of this problem, a branch-and-bound algorithm is proposed. Lower bounds and pruning rules that exploit properties of the problem are introduced. The proposed approach is examined through a computational comparative study with 280 problems involving different due date scenarios. In addition, the values of optimal solutions for small problems from a known benchmark are provided.
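The objective in this abstract can be made concrete with a small sketch. It assumes each job has a processing time and per-unit earliness and tardiness penalties, and that the machine starts at time 0 with no idle time between jobs; these modelling details are assumptions, not taken from the abstract. The exhaustive search shown is a plain enumeration for tiny instances, not the branch-and-bound algorithm the paper proposes.

```python
from itertools import permutations


def et_cost(sequence, due_date, start=0):
    """Total weighted earliness/tardiness of a job sequence.

    sequence: iterable of (processing_time, earliness_penalty, tardiness_penalty)
    due_date: common due date shared by all jobs
    start:    time at which the first job begins (assumed 0, no idle time)
    """
    t = start
    cost = 0
    for processing_time, alpha, beta in sequence:
        t += processing_time  # completion time of this job
        cost += alpha * max(0, due_date - t) + beta * max(0, t - due_date)
    return cost


def brute_force_best(jobs, due_date):
    """Enumerate every ordering; feasible only for a handful of jobs."""
    return min(permutations(jobs), key=lambda seq: et_cost(seq, due_date))


if __name__ == "__main__":
    # Toy instance, not from the paper's benchmark.
    jobs = [(2, 1, 1), (3, 1, 1), (4, 1, 1)]
    best = brute_force_best(jobs, due_date=5)
    print(best, et_cost(best, 5))
```

Because the number of orderings grows factorially, enumeration like this stops being practical almost immediately, which is why lower bounds and pruning rules matter for exact methods.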
Abstract:
This article examines book illustrations through the prism of Translation Studies. It mainly suggests that the pictures in illustrated books are (intersemiotic) translations of the text and that, as such, they can be analyzed making use of the same tools applied to verbal interlingual translation. The first section deals with the theoretical bases upon which illustrations can be regarded as translations, concentrating on theories of re-creation, as illustration is viewed essentially as the re-creation of the text in visual form. One of the claims in this section is that, because illustration is carried out in ways very similar to interlingual translation itself, the term "intersemiotic" relates more to the (obvious) difference of medium. For this reason the word is most often referred to in parentheses. The second section discusses three particular ways through which illustrations can translate the text, namely, by reproducing the textual elements literally in the picture, by emphasizing a specific narrative element, and by adapting the pictures to a certain ideology or artistic trend. The example illustrations are extracted from different kinds of publication and media, ranging from Virgil's Aeneid, Lewis Carroll's Alice in Wonderland and Mark Twain's Adventures of Huckleberry Finn to an online comic version of Shakespeare's Hamlet.
Abstract:
There is no specific test to diagnose Alzheimer's disease (AD). Its diagnosis should be based upon clinical history, neuropsychological and laboratory tests, neuroimaging and electroencephalography (EEG). Therefore, new approaches are necessary to enable earlier and more accurate diagnosis and to follow treatment results. In this study we used a Machine Learning (ML) technique, named Support Vector Machine (SVM), to search for patterns in EEG epochs to differentiate AD patients from controls. As a result, we developed a quantitative EEG (qEEG) processing method for automatic differentiation of patients with AD from normal individuals, as a complement to the diagnosis of probable dementia. We studied EEGs from 19 normal subjects (14 females/5 males, mean age 71.6 years) and 16 patients with probable AD and mild to moderate symptoms (14 females/2 males, mean age 73.4 years). The results obtained from analysis of EEG epochs were accuracy 79.9% and sensitivity 83.2%. The analysis considering the diagnosis of each individual patient reached 87.0% accuracy and 91.7% sensitivity.
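The two figures reported in this abstract follow standard definitions, sketched below: accuracy is the share of all cases classified correctly, and sensitivity is the share of true AD cases that the classifier labels as AD. The label strings and the function name are illustrative assumptions, and the example labels are toy values rather than data from the study.

```python
def accuracy_and_sensitivity(true_labels, predicted_labels, positive="AD"):
    """Return (accuracy, sensitivity) for a binary classification.

    accuracy    = correct predictions / all predictions
    sensitivity = true positives / (true positives + false negatives)
    """
    correct = true_pos = false_neg = 0
    for truth, predicted in zip(true_labels, predicted_labels):
        if truth == predicted:
            correct += 1
        if truth == positive:
            if predicted == positive:
                true_pos += 1
            else:
                false_neg += 1
    accuracy = correct / len(true_labels)
    sensitivity = true_pos / (true_pos + false_neg)
    return accuracy, sensitivity


if __name__ == "__main__":
    truth = ["AD", "AD", "AD", "control", "control"]
    predicted = ["AD", "AD", "control", "control", "AD"]
    print(accuracy_and_sensitivity(truth, predicted))
```

The same function applies at either level the abstract mentions: per EEG epoch, or per patient once epoch-level decisions have been aggregated into one label per individual.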