782 results for Parallel texts alignment


Relevance:

40.00%

Publisher:

Abstract:

Following the internationalization of contemporary higher education, academic institutions based in non-English speaking countries are increasingly urged to produce content in English to address international prospective students and personnel, as well as to increase their attractiveness. The demand for English translations in the institutional academic domain is consequently increasing at a rate exceeding the capacity of the translation profession. Resources for assisting non-native authors and translators in the production of appropriate texts in L2 are therefore required in order to help academic institutions and professionals streamline their translation workload. Some of these resources include: (i) parallel corpora to train machine translation systems and multilingual authoring tools; and (ii) translation memories for computer-aided tools. The purpose of this study is to create and evaluate reference resources like the ones mentioned in (i) and (ii) through the automatic sentence alignment of a large set of Italian and English as a Lingua Franca (ELF) institutional academic texts given as equivalent but not necessarily parallel (i.e. translated). In this framework, a set of alignment algorithms and alignment tools is examined in order to identify the most profitable one(s) in terms of accuracy and time- and cost-effectiveness. In order to determine the text pairs to align, a sample is selected according to document length similarity (in characters) and subsequently evaluated in terms of extent of noisiness/parallelism, alignment accuracy and content leverageability. The results of these analyses serve as the basis for the creation of an aligned bilingual corpus of academic course descriptions, which is eventually used to create a translation memory in TMX format.
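The selection of text pairs by character-length similarity and the final TMX export described in this abstract can be sketched roughly as follows. This is an illustrative sketch only, not the study's pipeline: the function names, the 0.8 ratio threshold, and the TMX header values are assumptions, and the segment pairs passed to `to_tmx` are taken to be already aligned.

```python
import xml.etree.ElementTree as ET

# ElementTree serializes this qualified name as the xml:lang attribute.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def length_ratio(text_a, text_b):
    """Character-length similarity in [0, 1]; 1.0 means equal length."""
    longer = max(len(text_a), len(text_b))
    if longer == 0:
        return 1.0
    return min(len(text_a), len(text_b)) / longer


def select_pairs(doc_pairs, threshold=0.8):
    """Keep (italian, english) document pairs whose lengths are similar.

    The threshold is an assumed value, not one reported by the study.
    """
    return [pair for pair in doc_pairs if length_ratio(*pair) >= threshold]


def to_tmx(segment_pairs, src_lang="it", tgt_lang="en"):
    """Serialize aligned (source, target) segments as a minimal TMX 1.4 string."""
    tmx = ET.Element("tmx", version="1.4")
    ET.SubElement(tmx, "header", {
        "creationtool": "sketch", "creationtoolversion": "0.1",
        "segtype": "sentence", "o-tmf": "plain", "adminlang": "en",
        "srclang": src_lang, "datatype": "plaintext",
    })
    body = ET.SubElement(tmx, "body")
    for source, target in segment_pairs:
        unit = ET.SubElement(body, "tu")
        for lang, text in ((src_lang, source), (tgt_lang, target)):
            variant = ET.SubElement(unit, "tuv", {XML_LANG: lang})
            ET.SubElement(variant, "seg").text = text
    return ET.tostring(tmx, encoding="unicode")
```

A length filter of this kind only removes obviously mismatched document pairs; the noisiness and alignment-accuracy evaluation mentioned in the abstract would still be needed afterwards.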

Relevance:

40.00%

Publisher:

Abstract:

Memoir of the editor (with portrait): vol. I, p. [xiii]-xvi.

Relevance:

30.00%

Publisher:

Abstract:

With the advent of high performance computing (HPC), it is now possible to achieve orders-of-magnitude gains in performance and computational efficiency over conventional computer architectures. This thesis explores the potential of using high performance computing to accelerate whole genome alignment (WGA). A parallel technique is applied to an algorithm for whole genome alignment; the technique is explained, and experiments were carried out to test it. The technique is based on a fair usage of the available resources to execute genome alignment, and the thesis examines how it can be used in HPC clusters. This work is a first approximation to whole genome alignment, and it shows the advantages of parallelism as well as some of the drawbacks of our technique. It describes the resource limitations of current WGA applications when dealing with large quantities of sequences, and it proposes a parallel heuristic to distribute the load and to ensure that alignment quality is maintained.

Relevance:

30.00%

Publisher:

Abstract:

The word tradition has a very specific meaning in linguistics: the passing down of a text, which may have been completed or corrected by different copyists at different times, when the concept of authorship was not the same as it is today. When reading an ancient text, the word tradition must be in the reader's mind. To discuss one of the problems an ancient text poses to its modern readers, this work deals with one of the first printed medical texts in Portuguese, the Regimento proueytoso contra ha pestenença, and draws a parallel between it and two related texts, A moche profitable treatise against the pestilence, and the Recopilaçam das cousas que conuem guardar se no modo de preseruar à Cidade de Lixboa E os sãos, & curar os que esteuerem enfermos de Peste. The problems which arise out of the textual structure of those books show how difficult it is to establish a tradition of another type, the medical tradition. The linguistic study of the innumerable medieval plague treatises may throw light on the continuities and on the disruptions of the so-called Hippocratic-Galenic medical tradition.

Relevance:

30.00%

Publisher:

Abstract:

The objective of this study was to increase understanding of the link between the identification of required HR competences and the alignment of competence management with business strategy in a global Finnish company employing over 8,000 people and about 100 HR professionals. This aim was approached by analyzing the data collected in focus group interviews using a grounded theory method and, in parallel, reviewing the literature on strategic human resource management, competence-based strategic management, strategy and foresight. The literature on competence management in different contexts lacks in-depth discussion of the foresight process, and individuals are often forgotten in strategic frameworks. However, corporate foresight helps in the detection of emerging opportunities for innovations and in the implementation of strategy. The empirical findings indicate a lack of strategic leadership and of alignment between HR and business. Accordingly, the most important HR competence areas identified were the need for increasing business understanding and enabling change. As a result, the study provided a holistic model for competence foresight, which introduces HR professionals as strategic change agents in the role of organizational futurists at the heart of the company: facilitating competence foresight and competence development on individual as well as organizational levels, resulting in an agile organization with increased business understanding, sensitive sensors and adaptive actions to enable change.

Relevance:

30.00%

Publisher:

Abstract:

This paper investigates certain methods of training adopted in the Statistical Machine Translator (SMT) from English to Malayalam. In English-Malayalam SMT, the word-to-word translation is determined by training on the parallel corpus. Our primary goal is to improve the alignment model by reducing the number of possible alignments of all sentence pairs present in the bilingual corpus. Incorporating morphological information into the parallel corpus with the help of a part-of-speech tagger has brought about better training results with improved accuracy.

Relevance:

30.00%

Publisher:

Abstract:

In Statistical Machine Translation (SMT) from English to Malayalam, an unseen English sentence is translated into its equivalent Malayalam sentence using statistical models. A parallel English-Malayalam corpus is used in the training phase. Word-to-word alignments have to be set among the sentence pairs of the source and target language before subjecting them to training. This paper deals with certain techniques which can be adopted for improving the alignment model of SMT. Methods to incorporate part-of-speech information into the bilingual corpus have resulted in eliminating many of the insignificant alignments. Identifying the named entities and cognates present in the sentence pairs has also proved to be advantageous while setting up the alignments. The presence of Malayalam words with predictable translations has also contributed to reducing the insignificant alignments. Moreover, reduction of the unwanted alignments has brought better training results. Experiments conducted on a sample corpus have generated reasonably good Malayalam translations, and the results are verified with the F-measure, BLEU and WER evaluation metrics.
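The idea of using part-of-speech information to discard insignificant word alignments might look roughly like the sketch below. It is a simplified illustration, not the paper's method: it assumes both sides share one tag set, keeps only links between words with identical tags, and uses placeholder tokens (`ml_she`, `ml_books`, `ml_reads`) in place of real Malayalam words.

```python
def all_links(source, target):
    """Every possible word-to-word link between two tagged sentences."""
    return [(i, j) for i in range(len(source)) for j in range(len(target))]


def pos_filtered_links(source, target):
    """Keep only links whose source and target words carry the same tag.

    Each sentence is a list of (word, tag) tuples. Requiring identical
    tags is an assumed compatibility rule chosen for this sketch.
    """
    return [
        (i, j)
        for i, (_, source_tag) in enumerate(source)
        for j, (_, target_tag) in enumerate(target)
        if source_tag == target_tag
    ]


# Hypothetical example; the target tokens are placeholders and the
# subject-object-verb order is only meant to show reordering.
english = [("she", "PRON"), ("reads", "VERB"), ("books", "NOUN")]
malayalam = [("ml_she", "PRON"), ("ml_books", "NOUN"), ("ml_reads", "VERB")]

unfiltered = all_links(english, malayalam)
filtered = pos_filtered_links(english, malayalam)
```

In this toy case the candidate set shrinks from nine links to three, which is the kind of reduction an alignment model could then be trained on.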

Relevance:

30.00%

Publisher:

Abstract:

We investigate thin films of cylinder-forming diblock copolymer confined between electrically charged parallel plates, using self-consistent-field theory (SCFT) combined with an exact treatment for linear dielectric materials. Our study focuses on the competition between the surface interactions, which tend to orient cylinder domains parallel to the plates, and the electric field, which favors a perpendicular orientation. The effect of the electric field on the relative stability of the competing morphologies is demonstrated with equilibrium phase diagrams, calculated with the aid of a weak-field approximation. As hoped, modest electric fields are shown to have a significant stabilizing effect on perpendicular cylinders, particularly for thicker films. Our improved SCFT-based treatment removes most of the approximations implemented by previous approaches, thereby managing to resolve outstanding qualitative inconsistencies among different approximation schemes.

Relevance:

30.00%

Publisher:

Abstract:

The alignment of model amyloid peptide YYKLVFFC is investigated in bulk and at a solid surface using a range of spectroscopic methods employing polarized radiation. The peptide is based on a core sequence of the amyloid beta (A beta) peptide, KLVFF. The attached tyrosine and cysteine units are exploited to yield information on alignment and possible formation of disulfide or dityrosine links. Polarized Raman spectroscopy on aligned stalks provides information on tyrosine orientation, which complements data from linear dichroism (LD) on aqueous solutions subjected to shear in a Couette cell. LD provides a detailed picture of alignment of peptide strands and aromatic residues and was also used to probe the kinetics of self-assembly. This suggests initial association of phenylalanine residues, followed by subsequent registry of strands and orientation of tyrosine residues. X-ray diffraction (XRD) data from aligned stalks is used to extract orientational order parameters from the 0.48 nm reflection in the cross-beta pattern, from which an orientational distribution function is obtained. X-ray diffraction on solutions subject to capillary flow confirmed orientation in situ at the level of the cross-beta pattern. The information on fibril and tyrosine orientation from polarized Raman spectroscopy is compared with results from NEXAFS experiments on samples prepared as films on silicon. This indicates fibrils are aligned parallel to the surface, with phenyl ring normals perpendicular to the surface. Possible disulfide bridging leading to peptide dimer formation was excluded by Raman spectroscopy, whereas dityrosine formation was probed by fluorescence experiments and was found not to occur except under alkaline conditions. Congo red binding was found not to influence the cross-beta XRD pattern.

Relevance:

30.00%

Publisher:

Abstract:

The capillary flow alignment of the thermotropic liquid crystal 4-n-octyl-4′-cyanobiphenyl in the nematic and smectic phases is investigated using time-resolved synchrotron small-angle x-ray scattering. Samples were cooled from the isotropic phase to erase prior orientation. Upon cooling through the nematic phase under Poiseuille flow in a circular capillary, a transition from the alignment of mesogens along the flow direction to the alignment of layers along the flow direction (mesogens perpendicular to flow) appears to occur continuously at the cooling rate applied. The transition is centered on a temperature at which the Leslie viscosity coefficient α3 changes sign. The configuration with layers aligned along the flow direction is also observed in the smectic phase. The transition in the nematic phase on cooling has previously been ascribed to an aligning-nonaligning or tumbling transition. At high flow rates there is evidence for tumbling around an average alignment of layers along the flow direction. At lower flow rates this orientation is more clearly defined. The layer alignment is ascribed to surface-induced ordering propagating into the bulk of the capillary, an observation supported by the parallel alignment of layers observed for a static sample at low temperatures in the nematic phase.

Relevance:

30.00%

Publisher:

Abstract:

The InteGrade middleware intends to exploit the idle time of computing resources in computer laboratories. In this work we investigate the performance of running parallel applications with communication among processors on the InteGrade grid. As costly communication on a grid can be prohibitive, we explore the so-called systolic or wavefront paradigm to design the parallel algorithms in which no global communication is used. To evaluate the InteGrade middleware we considered three parallel algorithms that solve the matrix chain product problem, the 0-1 Knapsack Problem, and the local sequence alignment problem, respectively. We show that these three applications running under the InteGrade middleware and MPI take slightly more time than the same applications running on a cluster with only LAM-MPI support. The results can be considered promising and the time difference between the two is not substantial. The overhead of the InteGrade middleware is acceptable, in view of the benefits obtained to facilitate the use of grid computing by the user. These benefits include job submission, checkpointing, security, job migration, etc. Copyright (C) 2009 John Wiley & Sons, Ltd.
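The wavefront idea mentioned in this abstract can be illustrated on the local sequence alignment problem: every cell on one anti-diagonal of the scoring matrix depends only on cells from the two previous anti-diagonals, so the cells of a diagonal could be computed by different processors that exchange data only with their neighbours. The sketch below is a sequential illustration of that dependency order with assumed scoring parameters; it is not the InteGrade or MPI implementation.

```python
def local_alignment_wavefront(seq_a, seq_b, match=2, mismatch=-1, gap=-1):
    """Best Smith-Waterman local alignment score, filled by anti-diagonals.

    The scoring values are assumptions chosen for this sketch.
    """
    rows, cols = len(seq_a), len(seq_b)
    score = [[0] * (cols + 1) for _ in range(rows + 1)]
    best = 0
    # diagonal = i + j; the first interior cell (1, 1) lies on diagonal 2.
    for diagonal in range(2, rows + cols + 1):
        first_row = max(1, diagonal - cols)
        last_row = min(rows, diagonal - 1)
        # Cells in this inner loop are mutually independent, so a parallel
        # version could assign each one to a different processor.
        for i in range(first_row, last_row + 1):
            j = diagonal - i
            step = match if seq_a[i - 1] == seq_b[j - 1] else mismatch
            score[i][j] = max(
                0,
                score[i - 1][j - 1] + step,  # align the two characters
                score[i - 1][j] + gap,       # gap in seq_b
                score[i][j - 1] + gap,       # gap in seq_a
            )
            best = max(best, score[i][j])
    return best
```

Because each cell reads only its upper, left, and upper-left neighbours, a distributed version needs no global communication step, which is the property the abstract highlights.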

Relevance:

30.00%

Publisher:

Abstract:

Fundação de Amparo à Pesquisa do Estado de São Paulo (FAPESP)

Relevance:

30.00%

Publisher:

Abstract:

Time series are ubiquitous. The acquisition and processing of continuously measured data is found in all areas of the natural sciences, medicine and finance. The enormous growth in recorded data volumes, whether from automated monitoring systems or integrated sensors, calls for exceptionally fast algorithms in theory and practice. Consequently, this thesis deals with the efficient computation of subsequence alignments. Complex algorithms such as anomaly detection, motif queries or the unsupervised extraction of prototypical building blocks in time series make excessive use of these alignments, which is why fast implementations are needed. This thesis is divided into three approaches that address this challenge. These comprise four alignment algorithms and their parallelization on CUDA-enabled hardware, an algorithm for the segmentation of data streams, and a unified treatment of Lie-group-valued time series.

The first contribution is a complete CUDA port of the UCR Suite, the world-leading implementation of subsequence alignment. This includes a new computation scheme for determining local alignment quality scores using the z-normalized Euclidean distance, which can be used on any parallel hardware that supports the fast Fourier transform. Furthermore, we give a SIMT-compatible implementation of the lower-bound cascade of the UCR Suite for the efficient computation of local alignment quality scores under Dynamic Time Warping (DTW). Both CUDA implementations enable computation that is one to two orders of magnitude faster than established methods.

Second, we investigate two linear-time approximations for the elastic alignment of subsequences. On the one hand, we treat a SIMT-compatible relaxation scheme for Greedy DTW and its efficient CUDA parallelization. On the other hand, we introduce a new local distance measure, the Gliding Elastic Match (GEM), which can be computed with the same asymptotic time complexity as Greedy DTW but offers a complete relaxation of the penalty matrix. Further improvements include invariance to trends on the measurement axis and to uniform scaling on the time axis. In addition, an extension of GEM to multi-shape segmentation is discussed and evaluated on motion data. Both CUDA parallelizations show runtime improvements of up to two orders of magnitude.

The treatment of time series in the literature is usually restricted to real-valued measurement data. The third contribution comprises a unified method for treating Lie-group-valued time series. Building on this, distance measures on the rotation group SO(3) and on the Euclidean group SE(3) are treated. Furthermore, memory-efficient representations and group-compatible extensions of elastic measures are discussed.
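To make the notion of subsequence alignment under the z-normalized Euclidean distance concrete, here is a direct sliding-window sketch. It is deliberately the naive O(n·m) computation, not the FFT-based scheme or the CUDA port described in the thesis, and the function names are assumptions.

```python
import math


def z_normalize(values):
    """Shift to zero mean and scale to unit standard deviation."""
    mean = sum(values) / len(values)
    variance = sum((v - mean) ** 2 for v in values) / len(values)
    std = math.sqrt(variance)
    if std == 0:
        # A constant window carries no shape information.
        return [0.0] * len(values)
    return [(v - mean) / std for v in values]


def best_subsequence_match(query, series):
    """Return (start index, distance) of the window most similar to query."""
    normalized_query = z_normalize(query)
    length = len(query)
    best_start, best_distance = -1, math.inf
    for start in range(len(series) - length + 1):
        window = z_normalize(series[start:start + length])
        distance = math.sqrt(
            sum((q - w) ** 2 for q, w in zip(normalized_query, window))
        )
        if distance < best_distance:
            best_start, best_distance = start, distance
    return best_start, best_distance
```

Because each window is normalized before comparison, a query such as `[1, 2, 3, 2, 1]` matches a window `[10, 20, 30, 20, 10]` with a distance of essentially zero: the measure compares shape rather than offset or amplitude.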

Relevance:

30.00%

Publisher:

Abstract:

In this paper, we question the homogeneity of a large parallel corpus by measuring the similarity between various sub-parts. We compare results obtained using a general measure of lexical similarity based on χ² and by counting the number of discourse connectives. We argue that discourse connectives provide a more sensitive measure, revealing differences that are not visible with the general measure. We also provide evidence for the existence of specific characteristics defining translated texts as opposed to non-translated ones, due to a universal tendency for explicitation.

Relevance:

30.00%

Publisher:

Abstract:

We show a method for parallelizing top-down dynamic programs in a straightforward way by a careful choice of a lock-free shared hash table implementation and randomization of the order in which the dynamic program computes its subproblems. This generic approach is applied to dynamic programs for knapsack, shortest paths, and RNA structure alignment, as well as to a state-of-the-art solution for minimizing the maximum number of open stacks. Experimental results are provided on three different modern multicore architectures which show that this parallelization is effective and reasonably scalable. In particular, we obtain over 10 times speedup for 32 threads on the open stacks problem.
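The general shape of this approach can be sketched for the knapsack problem: several threads run the same memoized top-down recursion from the root, share one table of solved subproblems, and each visits the subproblems in its own random order so that they tend to fill different parts of the table. The sketch below only illustrates the structure. A plain Python `dict` stands in for the lock-free hash table, and under CPython's global interpreter lock no real speedup should be expected; the function and parameter names are assumptions.

```python
import random
import threading


def parallel_knapsack(weights, values, capacity, n_threads=4, seed=0):
    """0-1 knapsack via a top-down recursion shared by several threads."""
    memo = {}  # shared table; stand-in for a lock-free hash table

    def solve(item, remaining, rng):
        key = (item, remaining)
        if key in memo:
            return memo[key]
        if item == len(weights):
            result = 0
        else:
            branches = ["skip", "take"]
            rng.shuffle(branches)  # randomized subproblem order per thread
            result = 0
            for branch in branches:
                if branch == "skip":
                    candidate = solve(item + 1, remaining, rng)
                elif weights[item] <= remaining:
                    candidate = values[item] + solve(
                        item + 1, remaining - weights[item], rng
                    )
                else:
                    continue
                result = max(result, candidate)
        # Threads may recompute the same entry; they always store the same
        # value, so duplicated writes are harmless.
        memo[key] = result
        return result

    threads = [
        threading.Thread(
            target=solve, args=(0, capacity, random.Random(seed + t))
        )
        for t in range(n_threads)
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return memo[(0, capacity)]
```

The design relies on every subproblem having a single deterministic answer, which is what lets threads share results without coordinating who computes what.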