889 resultados para Corpus parallèles
Resumo:
La traduction statistique requiert des corpus parallèles en grande quantité. L’obtention de tels corpus passe par l’alignement automatique au niveau des phrases. L’alignement des corpus parallèles a reçu beaucoup d’attention dans les années quatre vingt et cette étape est considérée comme résolue par la communauté. Nous montrons dans notre mémoire que ce n’est pas le cas et proposons un nouvel aligneur que nous comparons à des algorithmes à l’état de l’art. Notre aligneur est simple, rapide et permet d’aligner une très grande quantité de données. Il produit des résultats souvent meilleurs que ceux produits par les aligneurs les plus élaborés. Nous analysons la robustesse de notre aligneur en fonction du genre des textes à aligner et du bruit qu’ils contiennent. Pour cela, nos expériences se décomposent en deux grandes parties. Dans la première partie, nous travaillons sur le corpus BAF où nous mesurons la qualité d’alignement produit en fonction du bruit qui atteint les 60%. Dans la deuxième partie, nous travaillons sur le corpus EuroParl où nous revisitons la procédure d’alignement avec laquelle le corpus Europarl a été préparé et montrons que de meilleures performances au niveau des systèmes de traduction statistique peuvent être obtenues en utilisant notre aligneur.
Resumo:
Les travaux entrepris dans le cadre de la présente thèse portent sur l’analyse de l’équivalence terminologique en corpus parallèle et en corpus comparable. Plus spécifiquement, nous nous intéressons aux corpus de textes spécialisés appartenant au domaine du changement climatique. Une des originalités de cette étude réside dans l’analyse des équivalents de termes simples. Les bases théoriques sur lesquelles nous nous appuyons sont la terminologie textuelle (Bourigault et Slodzian 1999) et l’approche lexico-sémantique (L’Homme 2005). Cette étude poursuit deux objectifs. Le premier est d’effectuer une analyse comparative de l’équivalence dans les deux types de corpus afin de vérifier si l’équivalence terminologique observable dans les corpus parallèles se distingue de celle que l’on trouve dans les corpus comparables. Le deuxième consiste à comparer dans le détail les équivalents associés à un même terme anglais, afin de les décrire et de les répertorier pour en dégager une typologie. L’analyse détaillée des équivalents français de 343 termes anglais est menée à bien grâce à l’exploitation d’outils informatiques (extracteur de termes, aligneur de textes, etc.) et à la mise en place d’une méthodologie rigoureuse divisée en trois parties. La première partie qui est commune aux deux objectifs de la recherche concerne l’élaboration des corpus, la validation des termes anglais et le repérage des équivalents français dans les deux corpus. La deuxième partie décrit les critères sur lesquels nous nous appuyons pour comparer les équivalents des deux types de corpus. La troisième partie met en place la typologie des équivalents associés à un même terme anglais. Les résultats pour le premier objectif montrent que sur les 343 termes anglais analysés, les termes présentant des équivalents critiquables dans les deux corpus sont relativement peu élevés (12), tandis que le nombre de termes présentant des similitudes d’équivalence entre les corpus est très élevé (272 équivalents identiques et 55 équivalents non critiquables). L’analyse comparative décrite dans ce chapitre confirme notre hypothèse selon laquelle la terminologie employée dans les corpus parallèles ne se démarque pas de celle des corpus comparables. Les résultats pour le deuxième objectif montrent que de nombreux termes anglais sont rendus par plusieurs équivalents (70 % des termes analysés). Il est aussi constaté que ce ne sont pas les synonymes qui forment le groupe le plus important des équivalents, mais les quasi-synonymes. En outre, les équivalents appartenant à une autre partie du discours constituent une part importante des équivalents. Ainsi, la typologie élaborée dans cette thèse présente des mécanismes de l’équivalence terminologique peu décrits aussi systématiquement dans les travaux antérieurs.
Resumo:
The QUT-NOISE-TIMIT corpus consists of 600 hours of noisy speech sequences designed to enable a thorough evaluation of voice activity detection (VAD) algorithms across a wide variety of common background noise scenarios. In order to construct the final mixed-speech database, a collection of over 10 hours of background noise was conducted across 10 unique locations covering 5 common noise scenarios, to create the QUT-NOISE corpus. This background noise corpus was then mixed with speech events chosen from the TIMIT clean speech corpus over a wide variety of noise lengths, signal-to-noise ratios (SNRs) and active speech proportions to form the mixed-speech QUT-NOISE-TIMIT corpus. The evaluation of five baseline VAD systems on the QUT-NOISE-TIMIT corpus is conducted to validate the data and show that the variety of noise available will allow for better evaluation of VAD systems than existing approaches in the literature.
Resumo:
Extracellular matrix regulates many cellular processes likely to be important for development and regression of corpora lutea. Therefore, we identified the types and components of the extracellular matrix of the human corpus luteum at different stages of the menstrual cycle. Two different types of extracellular matrix were identified by electron microscopy; subendothelial basal laminas and an interstitial matrix located as aggregates at irregular intervals between the non-vascular cells. No basal laminas were associated with luteal cells. At all stages, collagen type IV α1 and laminins α5, β2 and γ1 were localized by immunohistochemistry to subendothelial basal laminas, and collagen type IV α1 and laminins α2, α5, β1 and β2 localized in the interstitial matrix. Laminin α4 and β1 chains occurred in the subendothelial basal lamina from mid-luteal stage to regression; at earlier stages, a punctate pattern of staining was observed. Therefore, human luteal subendothelial basal laminas potentially contain laminin 11 during early luteal development and, additionally, laminins 8, 9 and 10 at the mid-luteal phase. Laminin α1 and α3 chains were not detected in corpora lutea. Versican localized to the connective tissue extremities of the corpus luteum. Thus, during the formation of the human corpus luteum, remodelling of extracellular matrix does not result in basal laminas as present in the adrenal cortex or ovarian follicle. Instead, novel aggregates of interstitial matrix of collagen and laminin are deposited within the luteal parenchyma, and it remains to be seen whether this matrix is important for maintaining the luteal cell phenotype.
Resumo:
In this paper, we describe a machine-translated parallel English corpus for the NTCIR Chinese, Japanese and Korean (CJK) Wikipedia collections. This document collection is named CJK2E Wikipedia XML corpus. The corpus could be used by the information retrieval research community and knowledge sharing in Wikipedia in many ways; for example, this corpus could be used for experimentations in cross-lingual information retrieval, cross-lingual link discovery, or omni-lingual information retrieval research. Furthermore, the translated CJK articles could be used to further expand the current coverage of the English Wikipedia.
Resumo:
Measures of semantic similarity between medical concepts are central to a number of techniques in medical informatics, including query expansion in medical information retrieval. Previous work has mainly considered thesaurus-based path measures of semantic similarity and has not compared different corpus-driven approaches in depth. We evaluate the effectiveness of eight common corpus-driven measures in capturing semantic relatedness and compare these against human judged concept pairs assessed by medical professionals. Our results show that certain corpus-driven measures correlate strongly (approx 0.8) with human judgements. An important finding is that performance was significantly affected by the choice of corpus used in priming the measure, i.e., used as evidence from which corpus-driven similarities are drawn. This paper provides guidelines for the implementation of semantic similarity measures for medical informatics and concludes with implications for medical information retrieval.
Resumo:
This paper evaluates the efficiency of a number of popular corpus-based distributional models in performing discovery on very large document sets, including online collections. Literature-based discovery is the process of identifying previously unknown connections from text, often published literature, that could lead to the development of new techniques or technologies. Literature-based discovery has attracted growing research interest ever since Swanson's serendipitous discovery of the therapeutic effects of fish oil on Raynaud's disease in 1986. The successful application of distributional models in automating the identification of indirect associations underpinning literature-based discovery has been heavily demonstrated in the medical domain. However, we wish to investigate the computational complexity of distributional models for literature-based discovery on much larger document collections, as they may provide computationally tractable solutions to tasks including, predicting future disruptive innovations. In this paper we perform a computational complexity analysis on four successful corpus-based distributional models to evaluate their fit for such tasks. Our results indicate that corpus-based distributional models that store their representations in fixed dimensions provide superior efficiency on literature-based discovery tasks.
Resumo:
Although LH is essential for survival and function of the corpus luteum (CL) in higher primates, luteolysis occurs during nonfertile cycles without a discernible decrease in circulating LH levels. Using genome-wide expression analysis, several experiments were performed to examine the processes of luteolysis and rescue of luteal function in monkeys. Induced luteolysis with GnRH receptor antagonist (Cetrorelix) resulted in differential regulation of 3949 genes, whereas replacement with exogenous LH (Cetrorelix plus LH) led to regulation of 4434 genes (1563 down-regulation and 2871 up-regulation). A model system for prostaglandin (PG) F-2 alpha-induced luteolysis in the monkey was standardized and demonstrated that PGF(2 alpha) regulated expression of 2290 genes in the CL. Analysis of the LH-regulated luteal transcriptome revealed that 120 genes were regulated in an antagonistic fashion by PGF(2 alpha). Based on the microarray data, 25 genes were selected for validation by real-time RT-PCR analysis, and expression of these genes was also examined in the CL throughout the luteal phase and from monkeys treated with human chorionic gonadotropin (hCG) to mimic early pregnancy. The results indicated changes in expression of genes favorable to PGF(2 alpha) action during the late to very late luteal phase, and expressions of many of these genes were regulated in an opposite manner by exogenous hCG treatment. Collectively, the findings suggest that curtailment of expression of downstream LH-target genes possibly through PGF(2 alpha) action on the CL is among the mechanisms underlying cross talk between the luteotropic and luteolytic signaling pathways that result in the cessation of luteal function, but hCG is likely to abrogate the PGF(2 alpha)-responsive gene expression changes resulting in luteal rescue crucial for the maintenance of early pregnancy. (Endocrinology 150: 1473-1484, 2009)
Resumo:
The objective of the current study was to investigate the mechanism by which the corpus luteum (CL) of the monkey undergoes desensitization to luteinizing hormone following exposure to increasing concentration of human chorionic gonadotrophin (hCG) as it occurs in pregnancy. Female bonnet monkeys were injected (im) increasing doses of hCG or dghCG beginning from day 6 or 12 of the luteal phase for either 10 or 4 or 2 days. The day of oestrogen surge was considered as day '0' of luteal phase. Luteal cells obtained from CL of these animals were incubated with hCG (2 and 200 pg/ml) or dbcAMP (2.5, 25 and 100 mu M) for 3 h at 37 degrees C and progesterone secreted was estimated. Corpora lutea of normal cycling monkeys on day 10/16/22 of the luteal phase were used as controls, In addition the in vivo response to CG and deglycosylated hCG (dghCG) was assessed by determining serum steroid profiles following their administration. hCG (from 15-90 IU) but not dghCG (15-90 IU) treatment in vivo significantly (P < 0.05) elevated serum progesterone and oestradiol levels. Serum progesterone, however, could not be maintained at a elevated level by continuous treatment with hCG (from day 6-15), the progesterone level declining beyond day 13 of luteal phase. Administering low doses of hCG (15-90 IU/day) from day 6-9 or high doses (600 IU/day) on days 8 and 9 of the luteal phase resulted in significant increase (about 10-fold over corresponding control P < 0.005) in the ability of luteal cells to synthesize progesterone (incubated controls) in vitro. The luteal cells of the treated animals responded to dbcAMP (P < 0.05) but not to hCG added in vitro, The in vitro response of luteal cells to added hCG was inhibited by 0, 50 and 100% if the animals were injected with low (15-90 IU) or medium (100 IU) between day 6-9 of luteal phase and high (600 IU on day 8 and 9 of luteal phase) doses of dghCG respectively; such treatment had no effect on responsivity of the cells to dbcAMP, The luteal cell responsiveness to dbcAMP in vitro was also blocked if hCG was administered for 10 days beginning day 6 of the luteal phase. Though short term hCG treatment during late luteal phase (from days 12-15) had no effect on luteal function, 10 day treatment beginning day 12 of luteal phase resulted in regain of in vitro responsiveness to both hCG (P < 0.05) and dbcAMP (P < 0.05) suggesting that luteal rescue can occur even at this late stage. In conclusion, desensitization of the CL to hCG appears to be governed by the dose/period for which it is exposed to hCG/dghCG. That desensitization is due to receptor occupancy is brought out by the fact that (i) this can be achieved by giving a larger dose of hCG over a 2 day period instead of a lower dose of the hormone for a longer (4 to 10 days) period and (ii) the effect can largely be reproduced by using dghCG instead of hCG to block the receptor sites. It appears that to achieve desensitization to dbcAMP also it is necessary to expose the luteal cell to relatively high dose of hCG for more than 4 days.
Resumo:
The ontogeny of muscarinic receptors was studied in human fetal striatum, brainstem, and cerebellum to investigate general principles of synaptogenesis as well as the physiological balance between various chemical synapses during development in a given region of the brain. [3H]Quinuclidinyl benzilate ([-'H]QNB) binding was assayed in total particulate fraction (TPF) from various parts of brain. In the corpus striatum, QNB binding sites are present at 16 weeks of gestation (average concentration 180 fmol/mg protein of TPF), slowly increase up to 24 weeks (average concentration 217 fmol/mg protein), and rapidly increase during the third trimester to 480 fmol/mg protein of TPF. In contrast, dopaminergic receptors exist as two subpopulations. one with low affinity and the other with high affinity up to the 24th week of gestation; all of them acquire the highaffinity characteristic during the third trimester. In brainstem, the muscarinic receptors show maximum concentration by 16 weeks of age (360 fmolimg protein of TPF). Subsequently the muscarinic receptor concentration shows a gradual decline in the brainstem. In cerebellum, except for a slight increase at 24 weeks (average concentration 90 fmol/mg protein of TPF), the receptor concentration remained nearly constant at about 60-70 fmolimg protein of TPF throughout fetal life. This study demonstrates that the ontogeny of muscarinic receptors varies among the different regions, and the patterns observed suggest that receptor formation occurs principally in the third trimester. Also noteworthy is the finding that the QNB binding sites decreased in all regions of the human brain during adult life. Key Words: Cholinergic muscarinic receptors-Quinuclidinyl benzilate-Corpus striaturn-Brainstem-Cerebellum. Ravikumar B. V. and Sastry P. S. Cholinergic muscarinic receptors in human fetal brain: Ontogeny of [3H]quinuclidinyl benzilate binding sites in corpus striatum, brainstem, and cerebellum. J. Neurochem. 45, 1948- 1950 (1985).