965 results for Sentence alignment
Abstract:
Following the internationalization of contemporary higher education, academic institutions based in non-English speaking countries are increasingly urged to produce content in English to address international prospective students and personnel, as well as to increase their attractiveness. The demand for English translations in the institutional academic domain is consequently increasing at a rate exceeding the capacity of the translation profession. Resources for assisting non-native authors and translators in the production of appropriate texts in L2 are therefore required in order to help academic institutions and professionals streamline their translation workload. Some of these resources include: (i) parallel corpora to train machine translation systems and multilingual authoring tools; and (ii) translation memories for computer-aided tools. The purpose of this study is to create and evaluate reference resources like the ones mentioned in (i) and (ii) through the automatic sentence alignment of a large set of Italian and English as a Lingua Franca (ELF) institutional academic texts given as equivalent but not necessarily parallel (i.e. translated). In this framework, a set of alignment algorithms and alignment tools is examined in order to identify the most profitable one(s) in terms of accuracy and time- and cost-effectiveness. In order to determine the text pairs to align, a sample is selected according to document length similarity (in characters) and subsequently evaluated in terms of extent of noisiness/parallelism, alignment accuracy and content leverageability. The results of these analyses serve as the basis for the creation of an aligned bilingual corpus of academic course descriptions, which is eventually used to create a translation memory in TMX format.
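The selection of text pairs by document length similarity can be illustrated with a minimal sketch. The abstract does not say how similarity was computed or which cut-off was used, so the ratio measure, the function names `length_similarity` and `select_pairs`, and the 0.8 threshold below are all assumptions made for illustration.

```python
from typing import List, Tuple


def length_similarity(a: str, b: str) -> float:
    """Ratio of the shorter to the longer text, measured in characters.

    Returns 1.0 for texts of equal length and values near 0.0 for very
    unequal lengths. This measure is an assumption, not the study's own.
    """
    longer = max(len(a), len(b))
    if longer == 0:
        return 1.0  # two empty texts are trivially the same length
    return min(len(a), len(b)) / longer


def select_pairs(pairs: List[Tuple[str, str]],
                 threshold: float = 0.8) -> List[Tuple[str, str]]:
    """Keep only the (Italian, English) pairs whose lengths are similar enough.

    The threshold is a hypothetical value chosen for this example.
    """
    return [(it, en) for it, en in pairs
            if length_similarity(it, en) >= threshold]
```

A filter of this kind only suggests which document pairs are worth aligning; it says nothing about whether the texts are actually translations of each other, which is why the study goes on to evaluate noisiness and alignment accuracy separately.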
Abstract:
The objective of the PANACEA ICT-2007.2.2 EU project is to build a platform that automates the stages involved in the acquisition, production, updating and maintenance of the large language resources required by, among others, MT systems. The development of a Corpus Acquisition Component (CAC) for extracting monolingual and bilingual data from the web is one of the most innovative building blocks of PANACEA. The CAC, which is the first stage in the PANACEA pipeline for building Language Resources, adopts an efficient and distributed methodology to crawl for web documents with rich textual content in specific languages and predefined domains. The CAC includes modules that can acquire parallel data from sites with in-domain content available in more than one language. In order to extrinsically evaluate the CAC methodology, we have conducted several experiments that used crawled parallel corpora for the identification and extraction of parallel sentences using sentence alignment. The corpora were then successfully used for domain adaptation of Machine Translation Systems.
Abstract:
This paper demonstrates a novel distributed architecture to facilitate the acquisition of Language Resources. We build a factory that automates the stages involved in the acquisition, production, updating and maintenance of these resources. The factory is designed as a platform where functionalities are deployed as web services, which can be combined in complex acquisition chains using workflows. We show a case study, which acquires a Translation Memory for a given pair of languages and a domain using web services for crawling, sentence alignment and conversion to TMX.
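The final step of such a chain, conversion of aligned sentence pairs to TMX, could look roughly like the following sketch. It is not the paper's web service: the function name `to_tmx`, the header attribute values and the sample data are assumptions, and only the general TMX 1.4 element layout (`tmx`, `header`, `body`, `tu`, `tuv`, `seg`) is relied upon.

```python
import xml.etree.ElementTree as ET
from typing import Iterable, Tuple

# Qualified name that ElementTree serializes as the attribute "xml:lang".
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def to_tmx(pairs: Iterable[Tuple[str, str]], src_lang: str, tgt_lang: str) -> str:
    """Serialize aligned (source, target) sentence pairs as a TMX 1.4 string."""
    root = ET.Element("tmx", {"version": "1.4"})
    # Header values are placeholders chosen for this example.
    ET.SubElement(root, "header", {
        "creationtool": "example-aligner",  # hypothetical tool name
        "creationtoolversion": "0.1",
        "segtype": "sentence",
        "o-tmf": "plain",
        "adminlang": "en",
        "srclang": src_lang,
        "datatype": "plaintext",
    })
    body = ET.SubElement(root, "body")
    for source, target in pairs:
        tu = ET.SubElement(body, "tu")  # one translation unit per aligned pair
        for lang, text in ((src_lang, source), (tgt_lang, target)):
            tuv = ET.SubElement(tu, "tuv", {XML_LANG: lang})
            ET.SubElement(tuv, "seg").text = text
    return ET.tostring(root, encoding="unicode")
```

Calling `to_tmx([("Ciao", "Hello")], "it", "en")` returns a single-unit translation memory as a string, which can then be written to a `.tmx` file.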
Abstract:
To enrich parallel bilingual corpus data, it can be worthwhile to work with so-called comparable corpora. In this type of corpus, even if the target-language documents are not exact translations of the source-language ones, they may still contain words or sentences that stand in a translation relation. The free encyclopedia Wikipedia constitutes a multilingual comparable corpus of several million documents. Our work consists in finding a general, endogenous method for extracting as many parallel sentences as possible. We work with the French-English language pair, but our method, which uses no external bilingual resource, can be applied to any other language pair. It breaks down into two steps. The first detects the article pairs most likely to contain translations. For this we use a neural network trained on a small data set of articles aligned at the sentence level. The second step selects sentence pairs using another neural network, whose outputs are then reinterpreted by a combinatorial optimization algorithm and an extension heuristic. Adding the roughly 560,000 sentence pairs extracted from Wikipedia to the training corpus of a baseline statistical machine translation system improves the quality of the translations produced. We make the aligned data and the extracted corpus available to the scientific community.
Abstract:
Statistical translation requires large quantities of parallel corpora. Obtaining such corpora involves automatic alignment at the sentence level. The alignment of parallel corpora received a great deal of attention in the 1980s, and the community considers this step solved. We show in this thesis that this is not the case and propose a new aligner, which we compare with state-of-the-art algorithms. Our aligner is simple and fast, and can align very large amounts of data. It often produces better results than those of the most elaborate aligners. We analyze the robustness of our aligner with respect to the genre of the texts to be aligned and the noise they contain. To this end, our experiments fall into two main parts. In the first, we work on the BAF corpus, measuring alignment quality as a function of noise, which reaches 60%. In the second, we work on the Europarl corpus, revisiting the alignment procedure with which Europarl was prepared, and show that better statistical machine translation performance can be obtained by using our aligner.
Abstract:
This paper investigates certain training methods adopted in a Statistical Machine Translator (SMT) from English to Malayalam. In English-Malayalam SMT, word-to-word translation is determined by training on the parallel corpus. Our primary goal is to improve the alignment model by reducing the number of possible alignments of all sentence pairs present in the bilingual corpus. Incorporating morphological information into the parallel corpus with the help of a part-of-speech tagger has brought about better training results with improved accuracy.
Abstract:
In Statistical Machine Translation from English to Malayalam, an unseen English sentence is translated into its equivalent Malayalam sentence using statistical models. An English-Malayalam parallel corpus is used in the training phase. Word-to-word alignments have to be set among the sentence pairs of the source and target languages before they are used for training. This paper deals with certain techniques that can be adopted to improve the alignment model of SMT. Methods for incorporating part-of-speech information into the bilingual corpus have eliminated many of the insignificant alignments. Identifying the named entities and cognates present in the sentence pairs has also proved advantageous when setting up the alignments. The presence of Malayalam words with predictable translations has further contributed to reducing the insignificant alignments. Moreover, the reduction of unwanted alignments has brought better training results. Experiments conducted on a sample corpus have generated reasonably good Malayalam translations, and the results are verified with the F-measure, BLEU and WER evaluation metrics.
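Of the evaluation metrics named above, WER (word error rate) is the simplest to show concretely. The sketch below uses the standard definition, word-level edit distance divided by the reference length. The abstract does not describe the tokenization or tooling the authors used, so the whitespace tokenization and the function name `wer` are assumptions.

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edit distance divided by reference length.

    Tokenization is plain whitespace splitting, which is an assumption; a
    real evaluation would use the same tokenizer as the translation system.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words (row-by-row Levenshtein).
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)
```

For example, a hypothesis that drops one word from a six-word reference has a WER of 1/6. Lower values are better, and the score can exceed 1.0 when the hypothesis is much longer than the reference.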
Abstract:
We present the first spin alignment measurements for the $K^{*0}(892)$ and $\phi(1020)$ vector mesons produced at midrapidity with transverse momenta up to 5 GeV/c at $\sqrt{s_{NN}} = 200$ GeV at RHIC. The diagonal spin-density matrix elements with respect to the reaction plane in Au+Au collisions are $\rho_{00} = 0.32 \pm 0.04$ (stat) $\pm\, 0.09$ (syst) for the $K^{*0}$ ($0.8 < p_T < 5.0$ GeV/c) and $\rho_{00} = 0.34 \pm 0.02$ (stat) $\pm\, 0.03$ (syst) for the $\phi$ ($0.4 < p_T < 5.0$ GeV/c) and are constant with transverse momentum and collision centrality. The data are consistent with the unpolarized expectation of 1/3 and thus no evidence is found for the transfer of the orbital angular momentum of the colliding system to the vector-meson spins. Spin alignments for $K^{*0}$ and $\phi$ in Au+Au collisions were also measured with respect to the particle's production plane. The $\phi$ result, $\rho_{00} = 0.41 \pm 0.02$ (stat) $\pm\, 0.04$ (syst), is consistent with that in p+p collisions, $\rho_{00} = 0.39 \pm 0.03$ (stat) $\pm\, 0.06$ (syst), also measured in this work. The measurements thus constrain the possible size of polarization phenomena in the production dynamics of vector mesons.
Abstract:
Objective: Postural assessment through photography is a simple method that allows the acquisition of quantitative values to define the alignment of body segments. The purpose of this study was to quantitatively assess the postural alignment of several body segments in standing through anterior, posterior, and lateral views. Methods: In this cross-sectional study, 122 subjects were initially evaluated. Seven subjects were excluded from the study after cluster analysis. The final sample had 115 subjects, 75% women, with a mean age of 26 ± 7 years. Photographs were taken from anterior, posterior, and lateral views after placement of markers on specific anatomical points. Photographs were analyzed using free Postural Analysis Software/Software of Postural Analysis (PAS/SAPO). Quantitative values for postural analysis variables were ascertained for head, upper and lower limbs, and trunk, along with the frequency of inclinations to the left and to the right. Results: Regarding the head, 88% of the sample presented some inclination, 67% of which was to the right. There was a predominance of right inclination of the shoulder and pelvis in 68% and 43% of study subjects, respectively. Lower limbs presented mean alignment of 178° in the anterior view, and the trunk showed predominant right inclination in 66% of participants. Conclusion: Small asymmetries were observed in anterior and posterior views. This study suggests that there is no symmetry in postural alignment and that small asymmetries represent the normative standard for posture in standing. (J Manipulative Physiol Ther 2011;34:371-380)
Abstract:
Study design: Radiographic analysis of sagittal spinal alignment of paraplegics in a standing position under surface neuromuscular electrical stimulation (NMES). Objectives: Describing the radiographic parameters of the sagittal spinal alignment of paraplegics going through a rehabilitation program with NMES. Setting: The University Hospital's Ambulatory (UNICAMP), Campinas, Sao Paulo, Brazil. Methods: Panoramic X-ray images in profile were taken for 10 paraplegics. All patients participated in the rehabilitation program and were able to perform gait through NMES of the femoral quadriceps muscles. The radiographic parameters used for the analysis were the same as those described in the literature for healthy people. The results were didactically organized into three groups: anatomical shape of the spine, morphology and kinetics of the pelvis and spinopelvic alignment. Results: The physiological curvature of the spine in paraplegics showed average values similar to those described in the literature for healthy patients. The inversion of the pelvic tilt and the increase in the sacral slope were defined by the anterior backward rotation of the pelvis. The existing theoretical mathematical formulas that define lumbar lordosis, pelvic incidence and pelvic tilt showed normal values, despite the anterior intense sagittal imbalance. Conclusions: The adaptive posture of the spine in paraplegics standing through the stimulation of the femoral quadriceps does not allow for a neutral sagittal alignment. This novel radiographic detailed description of the various segments of the spine can be of assistance toward the understanding of the global postural control for such subjects. Spinal Cord (2010) 48, 251-256; doi: 10.1038/sc.2009.123; published online 29 September 2009
Abstract:
A gap has been identified in the literature on the diagnosis and monitoring of the degree of strategic alignment. The main objective of this article is to diagnose and analyze the strategic alignment profile using the alignment diagnostic profile (ADP) tool, which enables organizations to show visually their degree of strategic alignment. The methodological approach adopted is multiple-case studies, which were conducted at five organizations in the medical diagnostics sector. The results indicate that the ADP enables organizations to understand the steps required to improve their level of alignment and to identify and locate gaps and conflicts.
Abstract:
Conventionally, protein structure prediction via threading relies on some nonoptimal method to align a protein sequence to each member of a library of known structures. We show how a score function (force field) can be modified so as to allow the direct application of a dynamic programming algorithm to the problem. This involves an approximation whose damage can be minimized by an optimization process during score function parameter determination. The method is compared to sequence to structure alignments using a more conventional pair-wise score function and the frozen approximation. The new method produces results comparable to the frozen approximation, but is faster and has fewer adjustable parameters. It is also free of memory of the template's original amino acid sequence, and does not suffer from a problem of nonconvergence, which can be shown to occur with the frozen approximation. Alignments generated by the simplified score function can then be ranked using a second score function with the approximations removed. (C) 1999 John Wiley & Sons, Inc.
Abstract:
We illustrate the flow behaviour of fluids with isotropic and anisotropic microstructure (internal length, layering with bending stiffness) by means of numerical simulations of silo discharge and flow alignment in simple shear. The Cosserat theory is used to provide an internal length in the constitutive model through bending stiffness to describe isotropic microstructure and this theory is coupled to a director theory to add specific orientation of grains to describe anisotropic microstructure. The numerical solution is based on an implicit form of the Material Point Method developed by Moresi et al. [1].
Abstract:
Objective: To evaluate influences of vitrification and warming of metaphase II (MII) mouse oocytes on survival, spindle dynamics, spindle morphology, and chromatin alignment on metaphase plates. Design: Experimental animal study. Setting: University animal laboratory. Animal(s): Eight-week-old B6D2F1 mice. Intervention(s): Denuded MII oocytes were used fresh (control), exposed to vitrification/warming solutions (Sol Expos), or vitrified and warmed (Vitr). Main Outcome Measure(s): Oocyte recovery and survival after warming and the influence of solution exposure and cryopreservation on spindle dynamics and chromatin alignment. Result(s): Cryopreservation of two or 10 oocytes per straw resulted in recovery (100% ± 0% and 95% ± 4%, respectively; mean ± SE) and survival (95% ± 2% and 98% ± 2%, respectively). Immediately after warming (Vitr), significantly fewer oocytes assessed with immunocytochemistry contained spindles, compared with control and Sol Expos. When oocytes were placed into a 37°C environment for 2 hours after exposure or warming, the ability to recognize spindles by immunocytochemistry was not significantly different between groups. Using live-cell time-lapse imaging with LC-Polscope, similar time-dependent spindle formation dynamics were observed. At 2 hours after collection or treatment, spindle morphology and length were not significantly different between the groups, nor was the incidence of aberrant alignment of chromatin on metaphase plates. Conclusion(s): Immediately after warming of vitrified MII oocytes, beta-tubulin is depolymerized and chromatin remains condensed on the metaphase plate. Within a 2-hour period, beta-tubulin repolymerizes, forming morphologically normal metaphase spindles with properly aligned chromatin.