37 results for sistemi integrati, CAT tools, machine translation
in Biblioteca Digital da Produção Intelectual da Universidade de São Paulo (BDPI/USP)
Abstract:
Establishing metrics to assess machine translation (MT) systems automatically is now crucial owing to the widespread use of MT over the web. In this study we show that such evaluation can be done by modeling text as complex networks. Specifically, we extend our previous work by employing additional complex-network metrics, whose results were used as input for machine learning methods and allowed MT texts of distinct qualities to be distinguished. We also show that the node-to-node mapping between source and target texts (English-Portuguese and Spanish-Portuguese pairs) can be improved by adding further hierarchical levels for the metrics out-degree, in-degree, hierarchical common degree, clustering coefficient, inter-ring degree, intra-ring degree and convergence ratio. The results presented here amount to a proof of principle that capturing a wider context through the hierarchical levels can be combined with machine learning methods to yield an approach for assessing the quality of MT systems. (C) 2010 Elsevier B.V. All rights reserved.
Abstract:
Complex networks have been increasingly used in text analysis, including in connection with natural language processing tools, as important text features appear to be captured by the topology and dynamics of the networks. Following previous works that apply complex network concepts to text quality measurement, summary evaluation, and author characterization, we now focus on machine translation (MT). In this paper we assess the possible representation of texts as complex networks to evaluate cross-linguistic issues inherent in manual and machine translation. We show that translations of differing quality generated by MT tools can be distinguished from their manual counterparts by means of metrics such as in-degree (ID), out-degree (OD), clustering coefficient (CC), and shortest paths (SP). For instance, we demonstrate that the average OD in networks of automatic translations consistently exceeds the values obtained for manual ones, and that the CC values of source texts are not preserved in manual translations, but are in good automatic translations. This probably reflects the text rearrangements humans perform during manual translation. We envisage that such findings could lead to better MT tools and automatic evaluation metrics.
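A minimal, self-contained sketch of the metrics named above, computed on a toy word-adjacency network in plain Python. The adjacency model (each word points to the word that follows it) and the undirected clustering definition are standard complex-network conventions, not the authors' exact implementation:

```python
from collections import defaultdict

def word_adjacency_network(text):
    # Directed word-adjacency model: each word points to the word
    # that immediately follows it in the text.
    words = text.lower().split()
    edges = defaultdict(set)
    for a, b in zip(words, words[1:]):
        if a != b:  # ignore self-loops
            edges[a].add(b)
    return set(words), edges

def average_out_degree(nodes, edges):
    return sum(len(edges[n]) for n in nodes) / len(nodes)

def average_clustering(nodes, edges):
    # Mean local clustering coefficient on the undirected projection:
    # for each node, the fraction of neighbour pairs that are linked.
    und = defaultdict(set)
    for a, targets in edges.items():
        for b in targets:
            und[a].add(b)
            und[b].add(a)
    total = 0.0
    for n in nodes:
        nbrs, k = und[n], len(und[n])
        if k < 2:
            continue
        links = sum(1 for u in nbrs for v in nbrs if u < v and v in und[u])
        total += 2 * links / (k * (k - 1))
    return total / len(nodes)

nodes, edges = word_adjacency_network("the cat saw the dog saw the cat")
print(average_out_degree(nodes, edges))            # 1.25
print(round(average_clustering(nodes, edges), 3))  # 0.833
```

On real texts these averages would be computed per translation and compared across MT and manual versions, as the abstract describes.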
Abstract:
Due to idiosyncrasies in their syntax, semantics or frequency, Multiword Expressions (MWEs) have received special attention from the NLP community, as the methods and techniques developed for the treatment of simplex words are not necessarily suitable for them. This is certainly the case for the automatic acquisition of MWEs from corpora. Much effort has been directed to the task of automatically identifying them, with considerable success. In this paper, we propose an approach for the identification of MWEs in a multilingual context, as a by-product of a word alignment process, which not only identifies possible MWE candidates but also associates some of them with semantics. The results obtained indicate that this approach is feasible and demands little in terms of tools and resources, which could, for example, facilitate and speed up lexicographic work.
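A toy sketch of the by-product idea: when one source word is aligned to a contiguous span of two or more target words, that span is taken as an MWE candidate. The alignment indices here are hand-made for illustration, not the output of a real aligner, and the heuristic is a simplification of what an alignment-based extractor would do:

```python
def mwe_candidates(source, target, alignments):
    # One source word aligned to a contiguous run of 2+ target words
    # suggests a multiword expression on the target side.
    candidates = []
    for s_idx, t_idxs in sorted(alignments.items()):
        span = sorted(t_idxs)
        if len(span) >= 2 and span == list(range(span[0], span[-1] + 1)):
            candidates.append((source[s_idx],
                               " ".join(target[i] for i in span)))
    return candidates

source = ["I", "had", "breakfast"]
target = ["Eu", "tomei", "café", "da", "manhã"]
alignments = {0: [0], 1: [1], 2: [2, 3, 4]}
print(mwe_candidates(source, target, alignments))
# [('breakfast', 'café da manhã')]
```

The pairing also carries the semantics mentioned in the abstract: the single source word ("breakfast") labels the extracted target expression.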
Abstract:
Identifying the correct sense of a word in context is crucial for many tasks in natural language processing (machine translation is an example). State-of-the-art methods for Word Sense Disambiguation (WSD) build models using hand-crafted features that usually capture shallow linguistic information. Complex background knowledge, such as semantic relationships, is typically either not used or used in a specialised manner, due to the limitations of the feature-based modelling techniques employed. On the other hand, empirical results from the use of Inductive Logic Programming (ILP) systems have repeatedly shown that they can use diverse sources of background knowledge when constructing models. In this paper, we investigate whether this ability of ILP systems can be used to improve the predictive accuracy of models for WSD. Specifically, we examine the use of a general-purpose ILP system to construct a set of features using semantic, syntactic and lexical information. This feature set is then used by a common modelling technique in the field (a support vector machine) to construct a classifier for predicting the sense of a word. In our investigation we examine one-shot and incremental approaches to feature-set construction, applied to monolingual and bilingual WSD tasks. The monolingual tasks use 32 verbs and 85 verbs and nouns (in English) from the SENSEVAL-3 and SemEval-2007 benchmarks, while the bilingual WSD task consists of 7 highly ambiguous verbs in translating from English to Portuguese. The results are encouraging: the ILP-assisted models show substantial improvements over those that use only shallow features. In addition, incremental feature-set construction appears to identify smaller and better sets of features. Taken together, the results suggest that the use of ILP with diverse sources of background knowledge provides a way to make substantial progress in the field of WSD.
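The ILP step itself is beyond a short sketch; the toy below shows only the downstream idea of classifying a word sense from a constructed feature set. The mini-corpus, the offset-tagged lexical features, and the overlap-count classifier (standing in for the SVM) are all hypothetical:

```python
from collections import Counter, defaultdict

def context_features(tokens, i, window=2):
    # Shallow lexical features: surrounding words tagged with their
    # relative offset from the target word.
    feats = set()
    for off in range(-window, window + 1):
        j = i + off
        if off != 0 and 0 <= j < len(tokens):
            feats.add(f"{off}:{tokens[j]}")
    return feats

class FeatureVoteWSD:
    # Per-sense feature counts; predicts the sense whose training
    # features overlap most with the test context.
    def __init__(self):
        self.counts = defaultdict(Counter)
    def fit(self, examples):
        for feats, sense in examples:
            self.counts[sense].update(feats)
    def predict(self, feats):
        return max(self.counts,
                   key=lambda s: sum(self.counts[s][f] for f in feats))

wsd = FeatureVoteWSD()
wsd.fit([
    (context_features("deposit money in the bank".split(), 4), "finance"),
    (context_features("the bank raised interest rates".split(), 1), "finance"),
    (context_features("sat on the river bank fishing".split(), 4), "river"),
    (context_features("the muddy bank of the river".split(), 2), "river"),
])
print(wsd.predict(
    context_features("withdraw money from the bank today".split(), 4)))
# finance
```

In the paper's setting, the feature constructor would be replaced by ILP-derived semantic/syntactic features and the voter by an SVM.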
Abstract:
With the relentless quest for improved performance driving ever tighter manufacturing tolerances, machine tools are sometimes unable to meet the desired requirements. One option for improving the tolerances of machine tools is to compensate for their errors. Among all possible sources of machine tool error, thermally induced errors are, in general, the most important for newer machines. The present work demonstrates the evaluation and modelling of the thermal error behaviour of a CNC cylindrical grinding machine during its warm-up period.
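The abstract does not state the model form used. Purely as an illustration of error compensation from a thermal model, a first-order least-squares fit of measured drift against a temperature reading (warm-up data below is hypothetical) could look like:

```python
def fit_line(xs, ys):
    # Ordinary least squares for y = a*x + b.
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    a = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
         / sum((x - mx) ** 2 for x in xs))
    return a, my - a * mx

# Hypothetical warm-up data: spindle temperature rise (deg C) vs
# measured positioning drift (um).
temps = [0.0, 2.0, 4.0, 6.0, 8.0]
drift = [0.0, 3.1, 5.9, 9.1, 12.0]
a, b = fit_line(temps, drift)
# Compensation: command the axis by the negative of the predicted drift.
compensation = -(a * 5.0 + b)  # counter-move at a 5 deg C rise
```

Real thermal-error models are typically multivariate and may be nonlinear, but the compensate-by-predicted-error principle is the same.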
Abstract:
This article examines book illustrations through the prism of Translation Studies. It mainly suggests that the pictures in illustrated books are (intersemiotic) translations of the text and that, as such, they can be analyzed using the same tools applied to verbal interlingual translation. The first section deals with the theoretical bases upon which illustrations can be regarded as translations, concentrating on theories of re-creation, as illustration is viewed essentially as the re-creation of the text in visual form. One of the claims in this section is that, because illustration is carried out in ways very similar to interlingual translation itself, the term "intersemiotic" relates more to the (obvious) difference of medium; for this reason the word is most often placed in parentheses. The second section discusses three particular ways in which illustrations can translate the text: by reproducing the textual elements literally in the picture, by emphasizing a specific narrative element, and by adapting the pictures to a certain ideology or artistic trend. The example illustrations are extracted from different kinds of publication and media, ranging from Virgil's Aeneid, Lewis Carroll's Alice in Wonderland and Mark Twain's Adventures of Huckleberry Finn to an online comic version of Shakespeare's Hamlet.
Abstract:
Species' potential distribution modelling consists of building a representation of the fundamental ecological requirements of a species from the biotic and abiotic conditions where the species is known to occur. Such models can be valuable tools for understanding the biogeography of species and for supporting the prediction of their presence/absence under a particular environmental scenario. This paper investigates the use of different supervised machine learning techniques to model the potential distribution of 35 plant species from Latin America. Each technique was able to extract a different representation of the relations between the environmental conditions and the distribution profile of the species. The experimental results highlight the good performance of random tree classifiers, indicating this particular technique as a promising candidate for modelling species' potential distribution. (C) 2010 Elsevier Ltd. All rights reserved.
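As the simplest possible illustration of "modelling from conditions where the species is known to occur", here is a toy climatic-envelope model. This is not one of the paper's learners (random trees would need far more code); the presence records and variables are hypothetical:

```python
def fit_envelope(presences):
    # Record the min/max of each environmental variable over the
    # presence records (a classic BIOCLIM-style envelope).
    dims = range(len(presences[0]))
    lows = [min(p[i] for p in presences) for i in dims]
    highs = [max(p[i] for p in presences) for i in dims]
    return lows, highs

def predict_presence(model, conditions):
    # Predict presence when every variable falls inside the envelope.
    lows, highs = model
    return all(lo <= c <= hi for c, lo, hi in zip(conditions, lows, highs))

# Hypothetical presence records: (mean temperature deg C, annual rainfall mm)
model = fit_envelope([(22.0, 1500.0), (25.0, 1800.0), (24.0, 1650.0)])
print(predict_presence(model, (23.5, 1700.0)))  # True
print(predict_presence(model, (18.0, 900.0)))   # False
```

The supervised learners compared in the paper would replace this rectangular envelope with decision boundaries learned from both presence and absence data.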
Abstract:
Visceral leishmaniasis (VL) is a widely spread zoonotic disease. In Brazil the disease is caused by Leishmania (Leishmania) infantum chagasi. Peridomestic sandflies acquire the etiological agent by feeding on the blood of infected reservoir animals, such as dogs or wildlife. The disease is endemic in Brazil, and epidemic foci have been reported in densely populated cities all over the country. Many clinical features of Leishmania infection are related to the host-parasite relationship, and many candidate virulence factors in parasites that cause VL have been studied, such as the A2 genes. The A2 gene was first isolated in 1994, and in 2005 three new alleles were described in Leishmania (Leishmania) infantum. In the present study we amplified by polymerase chain reaction (PCR), cloned, and sequenced the A2 gene from the genome of a clonal population of L. (L.) infantum chagasi VL parasites. The amplified fragment showed approximately 90% similarity with another A2 allele amplified in Leishmania (Leishmania) donovani and in L. (L.) infantum described in the literature. However, nucleotide translation shows differences in the protein amino acid sequence, which may be essential to determine the variability of A2 genes in the species of the L. (L.) donovani complex and represents an additional tool to help understand the role this gene family may have in establishing virulence and immunity in visceral leishmaniasis. This knowledge is important for the development of more accurate diagnostic tests and effective tools for disease control.
Abstract:
Due to the imprecise nature of biological experiments, biological data are often characterized by the presence of redundant and noisy data. This may be due to errors that occurred during data collection, such as contamination of laboratory samples. This is the case for gene expression data, where the equipment and tools currently used frequently produce noisy biological data. Machine Learning algorithms have been successfully used in gene expression data analysis. Although many Machine Learning algorithms can deal with noise, detecting and removing noisy instances from the training data set can help the induction of the target hypothesis. This paper evaluates the use of distance-based pre-processing techniques for noise detection in gene expression data classification problems. The evaluation analyzes the effectiveness of the investigated techniques in removing noisy data, measured by the accuracy obtained by different Machine Learning classifiers over the pre-processed data.
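The abstract does not name a specific filter; a common distance-based pre-processing step of this kind is Edited Nearest Neighbours, sketched here on toy two-dimensional points (real gene-expression vectors would be high-dimensional):

```python
def euclidean(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def enn_filter(X, y, k=3):
    # Edited Nearest Neighbours: keep an instance only if the
    # majority of its k nearest neighbours shares its label.
    keep = []
    for i, xi in enumerate(X):
        nearest = sorted((euclidean(xi, xj), j)
                         for j, xj in enumerate(X) if j != i)[:k]
        agree = sum(1 for _, j in nearest if y[j] == y[i])
        if 2 * agree > k:
            keep.append(i)
    return keep

X = [(0, 0), (0, 1), (1, 0), (1, 1),   # class "a" cluster
     (5, 5), (5, 6), (6, 5), (6, 6),   # class "b" cluster
     (5.5, 5.5)]                       # mislabelled point inside cluster "b"
y = ["a", "a", "a", "a", "b", "b", "b", "b", "a"]
print(enn_filter(X, y))  # keeps indices 0-7; the noisy point 8 is dropped
```

A classifier trained on the kept indices is what the paper's evaluation would then measure for accuracy against training on the raw, noisy set.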
Abstract:
This paper proposes an architecture for machining process and production monitoring to be applied in machine tools with open computer numerical control (CNC). A brief description of the advantages of using open CNC for machining process and production monitoring is presented, with an emphasis on a CNC architecture using a personal computer (PC)-based human-machine interface. The proposed architecture uses CNC data and sensors to gather information about the machining process and production. It allows the development of different levels of monitoring systems with minimum investment, minimal sensor installation, and low intrusiveness to the process. Successful examples of the utilization of this architecture in a laboratory environment are briefly described. In conclusion, it is shown that a wide range of monitoring solutions can be implemented in production processes using the proposed architecture.
Abstract:
This work proposes a new approach using a committee machine of artificial neural networks to classify masses found in mammograms as benign or malignant. Three shape factors, three edge-sharpness measures, and 14 texture measures are used for the classification of 20 regions of interest (ROIs) related to malignant tumors and 37 ROIs related to benign masses. A group of multilayer perceptrons (MLPs) is employed as a committee machine of neural network classifiers. The classification results are reached by combining the responses of the individual classifiers. Experiments involving changes in the learning algorithm of the committee machine are conducted. The classification accuracy is evaluated using the area A(z) under the receiver operating characteristic (ROC) curve. The A(z) result for the committee machine is compared with the A(z) results obtained using MLPs and single-layer perceptrons (SLPs), as well as a linear discriminant analysis (LDA) classifier. Tests are carried out using Student's t-distribution. The committee machine classifier outperforms the MLP, SLP, and LDA classifiers in the following cases: with the shape measure of spiculation index, the A(z) values of the four methods are, in order, 0.93, 0.84, 0.75, and 0.76; and with the edge-sharpness measure of acutance, the values are 0.79, 0.70, 0.69, and 0.74. Although the features with which improvement is obtained with the committee machines are not the same as those that provided the maximal value of A(z) (A(z) = 0.99 with some shape features, with or without the committee machine), they correspond to features that are not critically dependent on the accuracy of the boundaries of the masses, which is an important result. (c) 2008 SPIE and IS&T.
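A minimal sketch of the committee idea in isolation: member scores are averaged and the mean is thresholded. The per-feature scorers below are hypothetical stand-ins; in the paper the members are trained MLPs and the inputs are real shape, edge-sharpness and texture measures:

```python
def committee_predict(members, features, threshold=0.5):
    # Average the members' scores; call the mass malignant when the
    # mean score exceeds the threshold (simple averaging committee).
    mean = sum(score(features) for score in members) / len(members)
    return "malignant" if mean > threshold else "benign"

# Hypothetical per-feature scorers standing in for trained MLPs.
members = [
    lambda f: f["spiculation"],
    lambda f: f["acutance"],
    lambda f: 0.5 * (f["spiculation"] + f["acutance"]),
]
print(committee_predict(members, {"spiculation": 0.9, "acutance": 0.7}))
# malignant
print(committee_predict(members, {"spiculation": 0.2, "acutance": 0.3}))
# benign
```

Sweeping the threshold over held-out scores is what generates the ROC curve whose area A(z) the paper reports.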
Abstract:
Balance problems in hemiparetic patients after stroke can be caused by different impairments in the physiological systems involved in postural control, including sensory afferents, movement strategies, biomechanical constraints, cognitive processing, and perception of verticality. Balance impairments and disabilities must be appropriately addressed. This article reviews the most common balance abnormalities in hemiparetic patients with stroke and the main tools used to diagnose them.
Abstract:
We study the star/galaxy classification efficiency of 13 different decision tree algorithms applied to photometric objects in the Sloan Digital Sky Survey Data Release Seven (SDSS-DR7). Each algorithm is defined by a set of parameters which, when varied, produce different final classification trees. We extensively explore the parameter space of each algorithm, using the set of 884,126 SDSS objects with spectroscopic data as the training set. The efficiency of star/galaxy separation is measured using the completeness function. We find that the Functional Tree (FT) algorithm yields the best results as measured by the mean completeness in two magnitude intervals: 14 <= r <= 21 (85.2%) and r >= 19 (82.1%). We compare the performance of the tree generated with the optimal FT configuration to the classifications provided by the SDSS parametric classifier, 2DPHOT, and Ball et al. We find that our FT classifier is comparable to or better in completeness over the full magnitude range 15 <= r <= 21, with much lower contamination than all but the Ball et al. classifier. At the faintest magnitudes (r > 19), our classifier is the only one that maintains high completeness (> 80%) while simultaneously achieving low contamination (~2.5%). We also examine the SDSS parametric classifier (psfMag - modelMag) to see if the dividing line between stars and galaxies can be adjusted to improve the classifier. We find that stars in close pairs are currently often misclassified as galaxies, and suggest a new cut to improve the classifier. Finally, we apply our FT classifier to separate stars from galaxies in the full set of 69,545,326 SDSS photometric objects in the magnitude range 14 <= r <= 21.
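The two figures of merit used above, completeness and contamination, can be computed directly from a comparison of true and predicted labels. A sketch on toy labels (not SDSS data):

```python
def completeness(true, pred, cls):
    # Fraction of true members of `cls` that the classifier recovers.
    tp = sum(t == cls and p == cls for t, p in zip(true, pred))
    return tp / sum(t == cls for t in true)

def contamination(true, pred, cls):
    # Fraction of objects labelled `cls` that are really something else.
    fp = sum(t != cls and p == cls for t, p in zip(true, pred))
    return fp / sum(p == cls for p in pred)

true = ["gal", "gal", "gal", "star", "star"]
pred = ["gal", "gal", "star", "gal", "star"]
print(completeness(true, pred, "gal"))   # 2/3: two of three galaxies found
print(contamination(true, pred, "gal"))  # 1/3: one of three "gal" labels is a star
```

In the paper, these quantities are evaluated as functions of magnitude (the completeness function) rather than as single numbers.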
Abstract:
The aim of the study was to evaluate the possible relationships between stress tolerance, training load, banal infections and salivary parameters during 4 weeks of regular training in fifteen basketball players. The Daily Analysis of Life Demands for Athletes questionnaire (sources and symptoms of stress) and the Wisconsin Upper Respiratory Symptom Survey were used on a weekly basis. Salivary cortisol and salivary immunoglobulin A (SIgA) were collected at the beginning and at the end of the study, and measured by enzyme-linked immunosorbent assay (ELISA). Ratings of perceived exertion (training load) were also obtained. The results from ANOVA with repeated measures showed greater training loads, number of upper respiratory tract infection episodes, and negative sensation to both symptoms and sources of stress at week 2 (p < 0.05). Significant increases in cortisol levels and decreases in SIgA secretion rate were noted from the beginning to the end of the study. Negative sensations to symptoms of stress at week 4 were inversely and significantly correlated with SIgA secretion rate. A positive and significant relationship between sources and symptoms of stress at week 4 and cortisol levels was verified. In summary, an approach combining psychometric tools and salivary biomarkers could be an efficient means of monitoring reaction to stress in sport. Copyright (C) 2010 John Wiley & Sons, Ltd.
Abstract:
The main purpose of this paper is to present the architecture of an automated system that allows monitoring and tracking, in real time (online), the possible occurrence of faults and electromagnetic transients observed in primary power distribution networks. Through the interconnection of this automated system to the utility operation center, it will be possible to provide an efficient tool to assist decision-making by the operation center. In short, the aim is to have all the tools necessary to identify, almost instantaneously, the occurrence of faults and transient disturbances in the primary power distribution system, as well as to determine their respective origin and probable location. The compiled results from the application of this automated system show that the developed techniques provide accurate results, identifying and locating several occurrences of faults observed in the distribution system.