985 results for Cross-Lingual, CallHome, Indonesian, Recognition, ASR
Abstract:
This paper presents the overall methodology that has been used to encode both the Brazilian Portuguese WordNet (WordNet.Br) standard language-independent conceptual-semantic relations (hyponymy, co-hyponymy, meronymy, cause, and entailment) and the so-called cross-lingual conceptual-semantic relations between different wordnets. Accordingly, after contextualizing the project and outlining the current lexical database structure and statistics, it describes the WordNet.Br editing GUI that was designed to aid the linguist in carrying out the tasks of building synsets, selecting sample sentences from corpora, writing synset concept glosses, and encoding both language-independent conceptual-semantic relations and cross-lingual conceptual-semantic relations between WordNet.Br and Princeton WordNet.
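The relation inventory described above lends itself to a compact data model. The following is a minimal sketch, assuming a hypothetical synset structure and relation names (e.g. `eq_synonym` for the cross-lingual link); it is not the actual WordNet.Br schema.

```python
# Hypothetical data model (not the WordNet.Br schema): synsets carry
# language-independent relations, plus cross-lingual links from
# WordNet.Br synsets to Princeton WordNet synsets.
from dataclasses import dataclass, field

@dataclass
class Synset:
    synset_id: str
    lang: str                       # "pt-BR" or "en"
    words: list[str]
    gloss: str = ""
    relations: dict[str, list[str]] = field(default_factory=dict)

def add_relation(synsets, source_id, rel_type, target_id):
    """Encode a conceptual-semantic relation (e.g. hyponymy, meronymy)
    or a cross-lingual equivalence link between two synsets."""
    synsets[source_id].relations.setdefault(rel_type, []).append(target_id)

synsets = {
    "br-001": Synset("br-001", "pt-BR", ["cão", "cachorro"], "mamífero doméstico"),
    "en-001": Synset("en-001", "en", ["dog", "domestic dog"]),
}
add_relation(synsets, "br-001", "eq_synonym", "en-001")  # cross-lingual link
add_relation(synsets, "br-001", "hyponym_of", "br-002")  # language-independent
```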
Abstract:
Speech processing has increasingly become a technology based on the automatic modeling of vast amounts of data. The success of research in this area is therefore directly tied to the existence of public-domain corpora and other specific resources, such as a phonetic dictionary. In Brazil, unlike the situation for English, for example, there is currently no public-domain large-vocabulary Automatic Speech Recognition (ASR) system for Brazilian Portuguese. Against this backdrop, the main goal of this work is to discuss efforts within the FalaBrasil initiative [1], created by the Signal Processing Laboratory (LaPS) at UFPA, presenting research and software in the area of ASR for Brazilian Portuguese. More specifically, this work discusses the implementation of a large-vocabulary speech recognition system for Brazilian Portuguese using the HTK toolkit, which is based on hidden Markov models (HMMs), and the creation of a grapheme-to-phoneme conversion module built with machine learning techniques.
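As a rough illustration of a machine-learned grapheme-to-phoneme (G2P) converter in the spirit described above, the sketch below trains a decision tree on character windows. The toy lexicon, its one-phone-per-letter alignment, and the window size are illustrative assumptions, not the FalaBrasil module.

```python
# Sketch of a machine-learned G2P converter: classify each letter's phone
# from its surrounding characters. Toy data; real lexica require alignment.
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder
from sklearn.tree import DecisionTreeClassifier

# toy aligned lexicon: one phone per letter (an assumption for brevity)
lexicon = [("casa", "k a z a"), ("cedo", "s e d u"), ("gato", "g a t u")]

def window(word, i, size=2):
    """Characters around position i, padded at the word boundaries."""
    padded = "#" * size + word + "#" * size
    return [padded[i + k] for k in range(2 * size + 1)]

X, y = [], []
for word, phones in lexicon:
    for i, phone in enumerate(phones.split()):
        X.append(window(word, i))
        y.append(phone)

model = make_pipeline(OneHotEncoder(handle_unknown="ignore"),
                      DecisionTreeClassifier(random_state=0)).fit(X, y)

# expected: ['s'] ('c' before 'e', as learned from "cedo")
print(model.predict([window("cena", 0)]))
```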
Abstract:
An Automatic Speech Recognition system can include a task called Phonetic Classification, in which the system decides, from a speech sample, which phoneme was uttered by a speaker. To ease classification and highlight the most distinctive characteristics of each phoneme, the speech samples are normally pre-processed by a front-end. A front-end typically extracts a set of parameters for each speech sample. After this processing, the parameters are fed into a classification algorithm which (once properly trained) tries to decide which phoneme was uttered. There is a tendency for classification accuracy to improve as the number of parameters used by the system grows; the trade-off is the higher computational cost involved. Feature Selection techniques aim to identify the most relevant (or most used) parameters in a classification task, making it possible to discover redundant parameters that contribute little (or nothing) to the classification. This work proposes to apply an SVM classifier to phonetic classification on the TIMIT database and to discover the most relevant parameters by applying a Boosting-based Feature Selection technique.
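A minimal sketch of this pipeline follows: an SVM phone classifier plus a boosting-based feature ranking. Random vectors stand in for TIMIT frames (which cannot be redistributed here), and the 39-dimensional feature size and five phone classes are illustrative assumptions.

```python
# Sketch: train an SVM on acoustic-style feature vectors, then rank feature
# relevance with a boosting ensemble. Data is random, so accuracy is near
# chance; the point is the shape of the pipeline, not the numbers.
import numpy as np
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.normal(size=(600, 39))     # e.g. 39-dim MFCC-style vectors (assumption)
y = rng.integers(0, 5, size=600)   # 5 phone classes (toy stand-in)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

svm = SVC(kernel="rbf").fit(X_tr, y_tr)
print("SVM accuracy:", svm.score(X_te, y_te))

# Boosting-based ranking: features used most by the weak learners
booster = AdaBoostClassifier(n_estimators=100, random_state=0).fit(X_tr, y_tr)
ranking = np.argsort(booster.feature_importances_)[::-1]
print("most relevant feature indices:", ranking[:10])
```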
Abstract:
PURPOSE: Cross-sectional imaging is currently gaining recognition and wide use in forensic medicine as a non-invasive examination approach. The computed tomography (CT) and magnetic resonance imaging techniques available to patients today are unable to provide tissue information at the cellular level in a non-invasive manner, and diatom detection, DNA, bacteriological, chemical-toxicological and other specific tissue analyses are impossible using radiology alone. We hypothesised that post-mortem minimally invasive tissue sampling using needle biopsies under CT guidance might significantly enhance the potential of virtual autopsy. The purpose of this study was to test the use of a clinically approved biopsy needle for minimally invasive post-mortem sampling of tissue specimens under CT guidance. MATERIAL AND METHODS: ACN III biopsy core needles (14 gauge x 160 mm) with an automatic pistol device were used on three bodies dedicated to research from the local anatomical institute. Tissue samples from the brain, heart, lung, liver, spleen, kidney and muscle were obtained under CT fluoroscopy. RESULTS: CT fluoroscopy enabled accurate placement of the needle within the organs and tissues. The needles allowed sampling of tissue cores with a mean width of 1.7 mm (range 1.2-2 mm) and a maximal length of 20 mm at all locations. The obtained tissue specimens were of sufficient size and adequate quality for histological analysis. CONCLUSION: Our results indicate that, similar to clinical experience but in many more organs, the tissue specimens obtained using the clinically approved biopsy needle are of sufficient size and adequate quality for histological examination. We suggest that post-mortem biopsy using the ACN III needle under CT guidance may become a reliable method for targeted sampling of tissue specimens from the body.
Abstract:
This study addresses whether the frequency of generation and the in vivo cross-reactivity of highly immunogenic tumor clones induced in a single parental murine fibrosarcoma cell line, MCA-F, are more closely related to the agent used to induce the Imm⁺ clone, or whether these characteristics are independent of the inducing agent. These questions were addressed by treating the parental tumor cell line MCA-F with UV-B radiation (UV-B), 1-methyl-3-nitro-1-nitrosoguanidine (MNNG), or 5-aza-2′-deoxycytidine (5-azaCdR). The frequency of Imm⁺ variant generation was similarly high for the three different agents, suggesting that the frequency of Imm⁺ generation is related more closely to the cell line than to the inducing agent used. Cross-reactivity was tested with two Imm⁺ clones from each treatment group in a modified immunoprotection assay that selectively engendered antivariant, but not antiparental, immunity. Under these conditions each clone, except one, immunized against itself. The MNNG-induced clones engendered stronger antivariant immunity, but a weaker variant cross-reactive immunity could also be detected. This study also characterized the lymphocyte populations responsible for antivariant and antiparental immunity in vivo. Using the local adoptive transfer assay (LATA) and antibody-plus-complement depletion of T-cell subsets, we showed that immunity induced by the Imm⁺ variants against the parent MCA-F was transferred by the Thy1.2⁺, L3T4a⁺, Lyt2.1⁻ (CD4⁺) population, without an apparent contribution by Thy1.2⁺, L3T4a⁻, Lyt2.1⁺ (CD8⁺) cells. A role for Lyt2.1⁺ T lymphocytes in antivariant, but not antiparent, immunity was supported by the results of LATA and CTL assays. Immunization with low numbers of viable Imm⁺ cells, or with high numbers of non-viable Imm⁺ cells, engendered only antivariant immunity without parental cross-protection. The associative recognition of parental antigens and variant neoantigens resulting in strong antiparent immunity was investigated using somatic cell hybrids of Imm⁺ variants of MCA-F and an antigenically distinct tumor, MCA-D. An unexpected result of these latter experiments was the expression of a unique tumor-specific antigen by the hybrid cells. These studies demonstrate that the parental tumor-specific antigen and the variant neoantigen must be coexpressed on the cell surface to engender parental cross-protective immunity. (Abstract shortened with permission of author.)
Abstract:
This paper presents the 2005 Miracle team's approach to the Ad-Hoc Information Retrieval tasks. The goal of this year's experiments was twofold: to continue testing the effect of combination approaches on information retrieval tasks, and to improve our basic processing and indexing tools, adapting them to new languages with unusual encoding schemes. The starting point was a set of basic components: stemming, transforming, filtering, proper noun extraction, paragraph extraction, and pseudo-relevance feedback. Some of these basic components were used in different combinations and orders of application for document indexing and for query processing. Second-order combinations were also tested, by averaging or selectively combining the documents retrieved by different approaches for a particular query. In the multilingual track, we concentrated our work on the process of merging the results of monolingual runs to obtain the overall multilingual result, relying on available translations. In both cross-lingual tracks, we used available translation resources, and in some cases a combination approach.
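The merging step can be pictured with a small score-fusion sketch. The min-max normalisation and CombSUM-style sum below are common choices in the literature and only stand in for the merging strategies actually tested.

```python
# Sketch of second-order combination: merge per-language (or per-approach)
# ranked lists into one list by summing min-max normalised scores.
def normalise(run):
    lo, hi = min(run.values()), max(run.values())
    return {d: (s - lo) / (hi - lo) if hi > lo else 0.0 for d, s in run.items()}

def merge(runs):
    fused = {}
    for run in map(normalise, runs):
        for doc, score in run.items():
            fused[doc] = fused.get(doc, 0.0) + score
    return sorted(fused, key=fused.get, reverse=True)

run_en = {"doc1": 12.0, "doc2": 7.5, "doc3": 3.1}  # monolingual English run
run_pt = {"doc2": 0.9, "doc4": 0.7}                # monolingual Portuguese run
print(merge([run_en, run_pt]))                     # fused multilingual ranking
```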
Abstract:
The Web has witnessed an enormous growth in the amount of semantic information published in recent years. This growth has been stimulated to a large extent by the emergence of Linked Data. Although this brings us a big step closer to the vision of a Semantic Web, it also raises new issues such as the need for dealing with information expressed in different natural languages. Indeed, although the Web of Data can contain any kind of information in any language, it still lacks explicit mechanisms to automatically reconcile such information when it is expressed in different languages. This leads to situations in which data expressed in a certain language is not easily accessible to speakers of other languages. The Web of Data shows the potential for being extended to a truly multilingual web as vocabularies and data can be published in a language-independent fashion, while associated language-dependent (linguistic) information supporting the access across languages can be stored separately. In this sense, the multilingual Web of Data can be realized in our view as a layer of services and resources on top of the existing Linked Data infrastructure adding i) linguistic information for data and vocabularies in different languages, ii) mappings between data with labels in different languages, and iii) services to dynamically access and traverse Linked Data across different languages. In this article we present this vision of a multilingual Web of Data. We discuss challenges that need to be addressed to make this vision come true and discuss the role that techniques such as ontology localization, ontology mapping, and cross-lingual ontology-based information access and presentation will play in achieving this. Further, we propose an initial architecture and describe a roadmap that can provide a basis for the implementation of this vision.
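A minimal sketch of point iii), dynamically accessing Linked Data across languages: query a public SPARQL endpoint and select language-tagged labels of a language-independent resource. The endpoint, resource and language set are illustrative, and SPARQLWrapper is just one common Python client.

```python
# Sketch: one resource, many language-tagged labels; a cross-lingual access
# service can filter by language tag. Uses the public DBpedia endpoint.
from SPARQLWrapper import SPARQLWrapper, JSON

sparql = SPARQLWrapper("https://dbpedia.org/sparql")
sparql.setQuery("""
    PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
    SELECT ?label WHERE {
        <http://dbpedia.org/resource/Semantic_Web> rdfs:label ?label .
        FILTER (lang(?label) IN ("en", "es", "de"))
    }
""")
sparql.setReturnFormat(JSON)
for row in sparql.query().convert()["results"]["bindings"]:
    print(row["label"]["xml:lang"], row["label"]["value"])
```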
Abstract:
The present is marked by the availability of large volumes of heterogeneous data, whose management is extremely complex. While the treatment of factual data has been widely studied, the processing of subjective information still poses important challenges. This is especially true in tasks that combine Opinion Analysis with other challenges, such as the ones related to Question Answering. In this paper, we describe the different approaches we employed in the NTCIR 8 MOAT monolingual English (opinionatedness, relevance, answerness and polarity) and cross-lingual English-Chinese tasks, implemented in our OpAL system. The results obtained with different settings of the system, as well as the error analysis performed after the competition, offered us clear insights into the best combination of techniques, one that balances precision and recall. Contrary to our initial intuitions, we also found that including specialized Natural Language Processing tools dealing with Temporality or Anaphora Resolution lowers system performance, while topic detection using faceted search with Wikipedia and Latent Semantic Analysis leads to satisfactory performance in both the monolingual and the multilingual settings.
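To make the Latent Semantic Analysis component concrete, here is a minimal sketch of LSA-based topic scoring: texts are projected into a latent space and a sentence is scored against the question topic by cosine similarity. The corpus, topic phrase and number of latent dimensions are toy assumptions rather than the OpAL configuration.

```python
# Sketch: LSA topic relevance via truncated SVD over TF-IDF vectors.
from sklearn.decomposition import TruncatedSVD
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

docs = [
    "the film was praised by critics for its direction",
    "the movie received awful reviews from the audience",
    "stock prices fell sharply after the announcement",
]
topic = "opinions about the movie"

vec = TfidfVectorizer().fit(docs + [topic])
svd = TruncatedSVD(n_components=2, random_state=0).fit(vec.transform(docs))
d_lsa = svd.transform(vec.transform(docs))
t_lsa = svd.transform(vec.transform([topic]))
print(cosine_similarity(t_lsa, d_lsa))  # higher score = more on-topic
```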
Abstract:
In this work we present preliminary results obtained by applying a new semantic graph construction technique to the task of word sense disambiguation in a multilingual setting. Using this unsupervised technique, we induce the senses associated with the translations of the ambiguous word into the target language. We then use the translations of the words in the context of the ambiguous word in the source language to select the most likely sense of the translation. The system was evaluated on the dataset of the cross-lingual disambiguation task proposed at the SemEval-2010 competition, outperforming all the unsupervised systems that participated in that task.
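The sense-selection step can be sketched as follows: each candidate translation of the ambiguous word is scored by its association with the translations of the context words. The word pair, lexicon and co-occurrence counts below are invented for illustration; the paper's induced semantic graphs are not reproduced here.

```python
# Sketch: pick the translation (sense) best supported by the translated
# context. Counts would come from a target-language corpus (assumed).
from collections import Counter

# candidate Spanish->English translations of the ambiguous word "banco"
candidates = ["bank", "bench"]

# translations of the source-language context words
context_translations = ["money", "account", "deposit"]

# toy co-occurrence counts (missing pairs count as zero)
cooc = Counter({("bank", "money"): 40, ("bank", "account"): 35,
                ("bank", "deposit"): 20, ("bench", "park"): 25,
                ("bench", "money"): 1})

def score(candidate):
    return sum(cooc[(candidate, w)] for w in context_translations)

best = max(candidates, key=score)
print(best, {c: score(c) for c in candidates})  # -> bank
```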
Abstract:
Document classification is a supervised machine learning process, where predefined category labels are assigned to documents based on a hypothesis derived from a training set of labelled documents. Documents cannot be directly interpreted by a computer system unless they have been modelled as a collection of computable features. Rogati and Yang [M. Rogati and Y. Yang, Resource selection for domain-specific cross-lingual IR, in SIGIR 2004: Proceedings of the 27th Annual International Conference on Research and Development in Information Retrieval, ACM Press, Sheffield, United Kingdom, pp. 154-161] pointed out that the effectiveness of a document classification system may vary across domains. This implies that the quality of the document model contributes to the effectiveness of document classification. Conventionally, model evaluation is accomplished by comparing the effectiveness scores of classifiers on model candidates. However, this kind of evaluation method may encounter either under-fitting or over-fitting problems, because the effectiveness scores are restricted by the learning capacities of the classifiers. We propose a model fitness evaluation method to determine whether a model is sufficient to distinguish positive and negative instances while remaining competent to provide satisfactory effectiveness with a small feature subset. Our experiments demonstrate how the fitness of models is assessed. The results of our work contribute to research on feature selection, dimensionality reduction and document classification.
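One way to picture "sufficient to distinguish positive and negative instances with a small feature subset" is a separability check on a selected subset, as in the sketch below. The chi-square selector and the centroid-distance score are our assumptions for illustration, not the paper's fitness measure.

```python
# Sketch: select a small feature subset, then score class separability
# directly instead of relying only on a classifier's accuracy.
import numpy as np
from sklearn.feature_selection import SelectKBest, chi2

rng = np.random.default_rng(1)
X = rng.poisson(1.0, size=(200, 500)).astype(float)  # toy term counts
y = rng.integers(0, 2, size=200)
X[y == 1, :10] += 3                                  # plant informative features

X_small = SelectKBest(chi2, k=10).fit_transform(X, y)

def separability(Xs, y):
    """Centroid distance between classes, relative to within-class spread."""
    c0, c1 = Xs[y == 0].mean(axis=0), Xs[y == 1].mean(axis=0)
    return np.linalg.norm(c0 - c1) / Xs.std(axis=0).mean()

print("separability with 10 features:", separability(X_small, y))
```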
Abstract:
False friends are pairs of words in two languages that are perceived as similar but have different meanings. We present an improved algorithm for acquiring false friends from a sentence-level aligned parallel corpus, based on statistical observations of word occurrences and co-occurrences in the parallel sentences. The results are compared with an entirely semantic measure of cross-lingual similarity between words, which uses the Web as a corpus by analyzing the words' local contexts extracted from the text snippets returned by Google searches. The statistical and semantic measures are further combined into an improved algorithm for the identification of false friends that achieves almost twice the performance of previously known algorithms. The evaluation is performed on identifying cognates between Bulgarian and Russian, but the proposed methods could be adopted for other language pairs for which parallel corpora and bilingual glossaries are available.
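The statistical signal can be sketched in a few lines: a pair with high orthographic similarity but low co-occurrence across aligned sentence pairs is flagged as a likely false friend. The transliterated toy corpus and the thresholds below are illustrative assumptions, not the paper's exact measures.

```python
# Sketch: similar spelling + rare co-occurrence in aligned sentences
# suggests a false friend (e.g. BG "gora" = forest vs RU "gora" = mountain).
from difflib import SequenceMatcher

aligned = [                              # toy transliterated BG-RU pairs
    ("otivame v gora", "my idem v les"),  # RU "les" = forest
    ("vizhdam gora", "ya vizhu les"),
]

def cooccurrence(w1, w2, corpus):
    n1 = sum(w1 in s1.split() for s1, _ in corpus)
    both = sum(w1 in s1.split() and w2 in s2.split() for s1, s2 in corpus)
    return both / n1 if n1 else 0.0

def looks_like_false_friend(w1, w2, corpus, sim_thr=0.8, cooc_thr=0.2):
    similar = SequenceMatcher(None, w1, w2).ratio() >= sim_thr
    return similar and cooccurrence(w1, w2, corpus) < cooc_thr

print(looks_like_false_friend("gora", "gora", aligned))  # -> True
```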
Abstract:
Recent empirical studies of the neurological executive nature of reading in bilinguals differ in their evaluations of the degree of selectivity manifested in lexical access, as implicated by data from early and late reading measures in the eye-tracking paradigm. Currently two scenarios are plausible: (1) lexical access in reading is fundamentally language non-selective, and top-down effects from semantic context can influence the degree of selectivity in lexical access; (2) cross-lingual lexical activation is actuated via bottom-up processes without being affected by top-down effects from sentence context. In an attempt to test these hypotheses empirically, this study analyzed reader-text events arising when cognate facilitation and semantic constraint interact, in a 2×2 factorially designed experiment tracking the eye movements of 26 Swedish-English bilinguals reading in their L2. Stimulus conditions consisted of high- and low-constraint sentences embedded with either a cognate or a non-cognate control word. The results showed clear signs of cognate facilitation in both early and late reading measures and in both sentence conditions. This evidence in favour of the non-selective hypothesis indicates that the manifestation of non-selective lexical access in reading is not constrained by top-down effects from semantic context.
Abstract:
A regional cross-calibration between the first Delay-Doppler altimetry dataset from CryoSat-2 and a retracked Envisat dataset is presented here, in order to test the benefits of the Delay-Doppler processing and to extend the Envisat time series in the coastal ocean. The Indonesian Seas are chosen for the calibration, since the availability of altimetry data in this region is particularly beneficial due to the lack of in-situ measurements and the region's importance for global ocean circulation. The Envisat data in the region are retracked with the Adaptive Leading Edge Subwaveform (ALES) retracker, which has previously been validated and applied successfully to coastal sea level research. The study demonstrates that CryoSat-2 decreases the 1-Hz noise of sea level estimations by 0.3 cm within 50 km of the coast, when compared to the ALES-reprocessed Envisat dataset. It also shows that Envisat can be confidently used for detailed oceanographic research after the orbit change of October 2010. Cross-calibration at the crossover points indicates that, in the region of study, a sea state bias correction equal to 5% of the significant wave height is an acceptable approximation for Delay-Doppler altimetry. The analysis of the joint sea level time series reveals the geographic extent of the semiannual signal caused by Kelvin waves during the monsoon transitions, the larger amplitudes of the annual signal due to the Java Coastal Current, and the impact of the strong La Niña event of 2010 on rising sea level trends.
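Stated as a formula, the reported approximation is simply the following, with $H_s$ the significant wave height; how the correction enters the sea surface height budget (its sign convention) is not specified in the abstract and is left open here.

```latex
\mathrm{SSB}_{\text{correction}} = 5\% \times H_s = 0.05\, H_s
```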
Abstract:
The cross-recognition of peptides by cytotoxic T lymphocytes is a key element in immunology and in particular in peptide-based immunotherapy. Here we develop three-dimensional (3D) quantitative structure-activity relationships (QSARs) to predict the cross-recognition by Melan-A-specific cytotoxic T lymphocytes of peptides bound to HLA A*0201 (hereafter referred to as HLA A2). First, we predict the structure of a set of self- and pathogen-derived peptides bound to HLA A2 using a previously developed ab initio structure prediction approach [Fagerberg et al., J. Mol. Biol., 521-46 (2006)]. Second, shape and electrostatic energy calculations are performed on a 3D grid to produce similarity matrices, which are combined with a genetic neural network method [So et al., J. Med. Chem., 4347-59 (1997)] to generate 3D-QSAR models. The models are extensively validated using several different approaches. During model generation, the leave-one-out cross-validated correlation coefficient (q²) is used as the fitness criterion, and all obtained models are evaluated based on their q² values. Moreover, the best model obtained for a partitioned data set is evaluated by its correlation coefficient (r = 0.92 for the external test set). The physical relevance of all models is tested using a functional dependence analysis, and the robustness of the models obtained for the entire data set is confirmed using y-randomization. Finally, the validated models are tested for their utility in the setting of rational peptide design: their ability to discriminate between peptides that only contain side-chain substitutions in a single secondary anchor position is evaluated. In addition, the predicted cross-recognition of the mono-substituted peptides is confirmed experimentally in chromium-release assays. These results underline the utility of 3D-QSARs in peptide mimetic design and suggest that the properties of the unbound epitope are sufficient to capture most of the information determining cross-recognition.
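For reference, the leave-one-out cross-validated q² used as the fitness criterion takes its standard form, where $\hat{y}_{(i)}$ is the prediction for sample $i$ by a model trained with that sample left out and $\bar{y}$ is the mean observed activity:

```latex
q^2 \;=\; 1 \;-\; \frac{\sum_{i=1}^{N}\left(y_i - \hat{y}_{(i)}\right)^2}
                       {\sum_{i=1}^{N}\left(y_i - \bar{y}\right)^2}
```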