921 results for cross-language information retrieval


Abstract:

Graduate Program in Information Science - FFC

Abstract:

The classification of texts has become a major endeavor with so much electronic material available, for it is an essential task in several applications, including search engines and information retrieval. There are different ways to define similarity for grouping similar texts into clusters, as the concept of similarity may depend on the purpose of the task. For instance, in topic extraction similar texts are those within the same semantic field, whereas in author recognition stylistic features should be considered. In this study, we introduce ways to classify texts employing concepts of complex networks, which may be able to capture syntactic, semantic and even pragmatic features. The interplay between various metrics of the complex networks is analyzed with three applications, namely identification of machine translation (MT) systems, evaluation of the quality of machine-translated texts, and authorship recognition. We shall show that topological features of the networks representing texts can enhance the ability to identify MT systems in particular cases. For evaluating the quality of MT texts, on the other hand, high correlation was obtained with methods capable of capturing the semantics. This was expected because the gold standards used are themselves based on word co-occurrence. Nevertheless, the Katz similarity, which involves both semantics and structure in the comparison of texts, achieved the highest correlation with the NIST measurement, indicating that in some cases the combination of both approaches can improve the ability to quantify quality in MT. In authorship recognition, the topological features were again relevant in some contexts, though for the books and authors analyzed, good results were also obtained with semantic features.
Because hybrid approaches encompassing semantic and topological features have not been extensively used, we believe that the methodology proposed here may be useful to enhance text classification considerably, as it combines well-established strategies. (c) 2012 Elsevier B.V. All rights reserved.
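The abstract does not spell out how the networks are built or which metrics are used. As a minimal sketch, assuming the common word-adjacency model (one node per distinct word, an undirected edge between two words that appear next to each other), the Python below computes two standard topological measurements, average degree and average clustering coefficient. The function names are hypothetical, this is not presented as the authors' pipeline, and preprocessing such as stopword removal or lemmatization is omitted.

```python
from collections import defaultdict
from itertools import combinations


def cooccurrence_network(text: str) -> dict[str, set[str]]:
    """Word-adjacency network: one node per distinct word, with an undirected
    edge between two different words that appear next to each other."""
    tokens = [w.strip(".,;:!?\"'()").lower() for w in text.split()]
    tokens = [w for w in tokens if w]
    graph: dict[str, set[str]] = defaultdict(set)
    for a, b in zip(tokens, tokens[1:]):
        if a != b:  # ignore self-loops such as "very very"
            graph[a].add(b)
            graph[b].add(a)
    return dict(graph)


def average_degree(graph: dict[str, set[str]]) -> float:
    """Mean number of distinct neighbours per word."""
    return sum(len(nbrs) for nbrs in graph.values()) / len(graph) if graph else 0.0


def clustering_coefficient(graph: dict[str, set[str]], node: str) -> float:
    """Fraction of pairs of the node's neighbours that are themselves linked."""
    nbrs = graph[node]
    k = len(nbrs)
    if k < 2:
        return 0.0
    links = sum(1 for u, v in combinations(nbrs, 2) if v in graph[u])
    return 2 * links / (k * (k - 1))


def average_clustering(graph: dict[str, set[str]]) -> float:
    """Mean local clustering coefficient over all words."""
    if not graph:
        return 0.0
    return sum(clustering_coefficient(graph, n) for n in graph) / len(graph)


if __name__ == "__main__":
    net = cooccurrence_network("The cat saw the dog. The dog saw the cat.")
    print(average_degree(net), average_clustering(net))
```

Per-text measurements like these could be stacked into a feature vector and passed to any classifier; richer measures such as the Katz index mentioned above would be added in the same way.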

Abstract:

This work introduces the basic concepts of Natural Language Processing, focusing on Information Extraction and analyzing its application areas, its main tasks, and how it differs from Information Retrieval. It then examines the Named Entity Recognition process, concentrating on the main issues in text annotation and on methods for evaluating the quality of entity extraction. Finally, it gives an overview of GATE/ANNIE, an open-source software platform for language processing, describing its architecture and main components, with a closer look at the tools GATE offers for the rule-based approach to Named Entity Recognition.
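Evaluating entity extraction usually comes down to precision, recall and F1 computed against a gold-standard annotation. The sketch below shows the strict (exact-match) variant, assuming entities are represented as hypothetical `(start, end, label)` character-offset triples. It does not use GATE's own evaluation tools, and lenient variants that give credit for partially overlapping spans are also common.

```python
from typing import NamedTuple


class Entity(NamedTuple):
    start: int   # character offset where the mention begins
    end: int     # character offset just past the mention
    label: str   # entity type, e.g. "Person" or "Location"


def evaluate_entities(gold: set[Entity], predicted: set[Entity]) -> tuple[float, float, float]:
    """Strict evaluation: a predicted entity counts as correct only if its
    start, end and label all coincide with a gold annotation."""
    true_positives = len(gold & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    gold = {Entity(0, 5, "Person"), Entity(15, 21, "Location"), Entity(30, 34, "Organization")}
    predicted = {Entity(0, 5, "Person"), Entity(15, 21, "Organization")}
    print(evaluate_entities(gold, predicted))
```

Here the second prediction has the right span but the wrong type, so strict matching counts it as both a false positive and a missed gold entity.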

Abstract:

This paper reports on the FuXML project carried out at the FernUniversität Hagen. FuXML is a Learning Content Management System (LCMS) aimed at providing a practical and efficient solution to the issues of authoring, maintaining, producing and distributing online and offline distance learning material. The paper presents the environment for which the system was conceived and describes its technical realisation. We discuss the reasons for specific implementation decisions and also address the integration of the system within the organisational and technical infrastructure of the university.

Abstract:

Web-scale knowledge retrieval can be enabled by distributed information retrieval, clustering Web clients into a large-scale computing infrastructure for knowledge discovery from Web documents. Based on this infrastructure, we propose to apply semiotic (i.e., sub-syntactical) and inductive (i.e., probabilistic) methods for inferring concept associations in human knowledge. These associations can be combined to form a fuzzy (i.e., gradual) semantic net representing a map of the knowledge in the Web. On this basis, we propose to provide interactive visualizations of these cognitive concept maps to end users, who can browse and search the Web in a human-oriented, visual, and associative interface.
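The abstract does not name the inductive estimator. As one illustrative assumption, a directed association strength can be estimated as the conditional probability P(b | a) that a document mentioning concept a also mentions concept b. Because every weight lies in [0, 1], the result can be read as a graded (fuzzy) semantic net. The sketch assumes documents have already been reduced to lists of concept terms; the extraction step, the function name and the `min_weight` threshold are hypothetical.

```python
from collections import Counter
from itertools import permutations


def concept_associations(documents: list[list[str]],
                         min_weight: float = 0.0) -> dict[tuple[str, str], float]:
    """Directed association strength w(a -> b) = P(b | a): the fraction of
    documents mentioning concept a that also mention concept b."""
    doc_freq = Counter()   # number of documents containing each concept
    pair_freq = Counter()  # number of documents containing each ordered pair
    for doc in documents:
        concepts = set(doc)  # count each concept once per document
        doc_freq.update(concepts)
        pair_freq.update(permutations(concepts, 2))
    return {
        (a, b): n_ab / doc_freq[a]
        for (a, b), n_ab in pair_freq.items()
        if n_ab / doc_freq[a] > min_weight  # keep only sufficiently strong links
    }


if __name__ == "__main__":
    docs = [["web", "search"], ["web", "search", "map"], ["web", "browser"]]
    for (a, b), w in sorted(concept_associations(docs, min_weight=0.5).items()):
        print(f"{a} -> {b}: {w:.2f}")
```

The asymmetry is deliberate: in the example every document mentioning "search" also mentions "web", but not the other way round, which a symmetric co-occurrence count would hide.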

Abstract:

OBJECTIVE: To determine whether algorithms developed for the World Wide Web can be applied to the biomedical literature in order to identify articles that are important as well as relevant. DESIGN AND MEASUREMENTS: A direct comparison of eight algorithms: simple PubMed queries, clinical queries (sensitive and specific versions), vector cosine comparison, citation count, journal impact factor, PageRank, and machine learning based on polynomial support vector machines. The objective was to prioritize important articles, defined as being included in a pre-existing bibliography of important literature in surgical oncology. RESULTS: Citation-based algorithms were more effective than noncitation-based algorithms at identifying important articles. The most effective strategies were simple citation count and PageRank, which on average identified over six important articles in the first 100 results compared to 0.85 for the best noncitation-based algorithm (p < 0.001). The authors saw similar differences between citation-based and noncitation-based algorithms at 10, 20, 50, 200, 500, and 1,000 results (p < 0.001). Citation lag affects performance of PageRank more than simple citation count. However, in spite of citation lag, citation-based algorithms remain more effective than noncitation-based algorithms. CONCLUSION: Algorithms that have proved successful on the World Wide Web can be applied to biomedical information retrieval. Citation-based algorithms can help identify important articles within large sets of relevant results. Further studies are needed to determine whether citation-based algorithms can effectively meet actual user information needs.
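The two strongest strategies in this comparison, simple citation count and PageRank, can both be computed from a citation graph. The sketch below is a plain power-iteration PageRank over a hypothetical mapping from each article to the articles it cites. The damping factor of 0.85 and the fixed iteration count are conventional defaults assumed here, not parameters reported in the study. Articles that cite nothing ("dangling" nodes) spread their score uniformly so that scores keep summing to 1.

```python
def citation_counts(citations: dict[str, list[str]]) -> dict[str, int]:
    """Number of times each article is cited within the collection."""
    counts = {article: 0 for article in citations}
    for cited_list in citations.values():
        for cited in cited_list:
            counts[cited] = counts.get(cited, 0) + 1
    return counts


def pagerank(citations: dict[str, list[str]], damping: float = 0.85,
             iterations: int = 100) -> dict[str, float]:
    """Power-iteration PageRank over a citation graph (citing -> cited)."""
    nodes = set(citations)
    for cited_list in citations.values():
        nodes.update(cited_list)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iterations):
        new = {v: (1 - damping) / n for v in nodes}
        # Articles without outgoing citations share their score with everyone.
        dangling = sum(rank[v] for v in nodes if not citations.get(v))
        for v in nodes:
            new[v] += damping * dangling / n
        for src, cited_list in citations.items():
            if cited_list:
                share = damping * rank[src] / len(cited_list)
                for dst in cited_list:
                    new[dst] += share
        rank = new
    return rank


def top_k(scores: dict[str, float], k: int) -> list[str]:
    """Identifiers of the k highest-scoring articles."""
    return sorted(scores, key=scores.get, reverse=True)[:k]


if __name__ == "__main__":
    graph = {"A": ["C"], "B": ["C"], "C": [], "D": ["A", "C"]}
    print(top_k(citation_counts(graph), 2), top_k(pagerank(graph), 2))
```

Both scores depend on citations that accumulate over time, which is why recent articles are disadvantaged (the citation lag noted above); PageRank is additionally affected because a new article's citers are themselves new and weakly ranked.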

Abstract:

Molecular beacons (MBs) are stem-loop DNA probes used for identifying and reporting the presence and localization of nucleic acid targets in vitro and in vivo via target-dependent dequenching of fluorescence. A drawback of conventional MB design lies in the stem sequence, which is necessary to keep the MBs in a closed conformation in the absence of a target but can participate in target binding in the open (target-on) conformation, giving rise to the possibility of false-positive results. To circumvent this problem, we designed MBs in which the stem was replaced by an orthogonal DNA analog that does not cross-pair with natural nucleic acids. Homo-DNA seemed especially suited, as it forms stable adenine-adenine base pairs of the reversed Hoogsteen type, potentially reducing the number of building blocks needed for stem design to one. We found that MBs in which the stem part was replaced by homo-adenylate residues can easily be synthesized using conventional automated DNA synthesis. Like conventional MBs, such hybrid MBs show cooperative hairpin-to-coil transitions in the absence of a DNA target, indicating stable homo-DNA base pair formation in the closed conformation. Furthermore, our results show that the homo-adenylate stem is excluded from DNA target binding, which leads to a significant increase in target binding selectivity.

Abstract:

Software corpora facilitate reproducibility of analyses; however, static analysis of an entire corpus still requires considerable effort, often duplicated unnecessarily by multiple users. Moreover, most corpora are designed for single languages, which increases the effort required for cross-language analysis. To address these issues, we propose Pangea, an infrastructure allowing fast development of static analyses on multi-language corpora. Pangea uses language-independent meta-models stored as object model snapshots that can be loaded directly into memory and queried without any parsing overhead. To reduce the effort of performing static analyses, Pangea provides out-of-the-box support for: creating and refining analyses in a dedicated environment, deploying an analysis on an entire corpus, using a runner that supports parallel execution, and exporting results in various formats. In this tool demonstration, we introduce Pangea and provide several usage scenarios that illustrate how it reduces the cost of analysis.
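The run-everywhere-then-export workflow described above can be illustrated with a small parallel runner. This is a hypothetical sketch, not Pangea's API: plain dictionaries stand in for the preloaded, language-independent model snapshots, and `run_on_corpus`, `export_csv` and `count_classes` are invented names. It shows the general pattern of applying one analysis function to every project in parallel and exporting the results as CSV.

```python
import csv
import io
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable


def run_on_corpus(corpus: dict[str, Any],
                  analysis: Callable[[Any], dict[str, Any]],
                  max_workers: int = 4) -> dict[str, dict[str, Any]]:
    """Apply `analysis` to every project model in the corpus in parallel.
    corpus: hypothetical mapping of project name -> already-loaded model."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {name: pool.submit(analysis, model) for name, model in corpus.items()}
        return {name: future.result() for name, future in futures.items()}


def export_csv(results: dict[str, dict[str, Any]]) -> str:
    """One row per project, one column per metric (missing metrics left blank)."""
    metrics = sorted({metric for row in results.values() for metric in row})
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(["project", *metrics])
    for name in sorted(results):
        writer.writerow([name, *(results[name].get(m, "") for m in metrics)])
    return buffer.getvalue()


def count_classes(model: dict[str, Any]) -> dict[str, Any]:
    """Hypothetical language-independent analysis over a toy model."""
    return {"classes": len(model["classes"]), "language": model["language"]}


if __name__ == "__main__":
    corpus = {
        "p1": {"language": "java", "classes": ["A", "B"]},
        "p2": {"language": "python", "classes": ["C"]},
    }
    print(export_csv(run_on_corpus(corpus, count_classes)))
```

Threads keep the sketch simple; in CPython, CPU-bound analyses would need a process pool (with picklable models and functions) to actually run in parallel.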

Abstract:

As recent research documents, there has been an impressive effort to study the entrepreneurial orientation construct and its nomological role. However, most research has been very context-specific and based on the analysis of cross-sectional information. We study the causal performance effects of entrepreneurial orientation and its key dimensions in two economic contexts: developed and emerging markets. We explore our hypotheses using data gathered at two time points from a sample of 94 firms in a developed-market context and 108 firms in an emerging-market context. The results suggest that in a developed economy entrepreneurial orientation has a positive impact on firm performance, whereas in the emerging-market context this effect is negative. Furthermore, we assess the contribution of each dimension to the aggregate construct and reveal the importance of risk-taking in both contexts. Finally, we highlight the role of environmental dynamism and explain its varying effect.

Abstract:

Genome-wide DNA remodelling in the ciliate Paramecium is ensured by RNA-mediated trans-nuclear crosstalk between the germline and the somatic genomes during sexual development. The rearrangements include elimination of transposable elements, minisatellites and tens of thousands of non-coding elements called internally eliminated sequences (IESs). The trans-nuclear genome comparison process employs a distinct class of germline small RNAs (scnRNAs) that are compared against the parental somatic genome to select the germline-specific subset of scnRNAs that subsequently target DNA elimination in the progeny genome. Only a handful of proteins involved in this process have been identified so far, and the mechanism of DNA targeting is unknown. Here we describe a chromatin assembly factor-1-like protein (PtCAF-1), which we show is required for the survival of sexual progeny and localizes first in the parental and later in the newly developing macronucleus. Gene silencing shows that PtCAF-1 is required for the elimination of transposable elements and a subset of IESs. PtCAF-1 depletion also impairs the selection of germline-specific scnRNAs during development. We identify specific histone modifications appearing during Paramecium development that are strongly reduced in PtCAF-1-depleted cells. Our results demonstrate the importance of PtCAF-1 for the epigenetic trans-nuclear crosstalk mechanism.

Abstract:

In Xenopus oocytes, in vitro transcribed mouse U7 RNA is assembled into small nuclear ribonucleoproteins (snRNPs) that are functional in histone RNA 3' processing. If the special Sm binding site of U7 (AAUUUGUCUAG, U7 Sm WT) is converted into the canonical Sm sequence derived from the major snRNAs (AAUUUUUGGAG, U7 Sm OPT), the RNA assembles into a particle that accumulates more efficiently in the nucleus but is non-functional. U7 RNA with a heavily mutated Sm binding site (AACGCGUCAUG, U7 Sm MUT) is deficient in nuclear accumulation and function. By UV cross-linking, U7 Sm WT RNA can be linked to three proteins, i.e., the common snRNP proteins G and B/B' and an apparently U7-specific protein of 40 kDa. As a result of altering the Sm binding site, U7 Sm OPT RNA cannot be cross-linked to the 40 kDa protein, and no cross-links are obtained with U7 Sm MUT RNA. The fact that the Sm site also interacts with at least one U7-specific protein is so far unique to U7 RNA and may explain the atypical sequence of this site. All described RNA-protein interactions, including that with the 40 kDa protein, already occur in the cytoplasm. An additional cytoplasmic photoadduct obtained with U7 Sm WT and U7 Sm OPT RNAs, but not with U7 Sm MUT RNA, is indicative of a protein of 60-80 kDa. The m7G cap structure of U7 Sm WT and U7 Sm OPT RNA becomes hypermethylated. However, the m3G cap enhances, but is not required for, nuclear accumulation. Finally, U7 Sm WT RNA is functional in histone RNA processing even when bearing an ApppG cap.

Abstract:

Early Employee Assistance Programs (EAPs) had their origin in humanitarian motives, and there was little concern for their cost/benefit ratios. However, as some programs began accumulating data and analyzing it over time, even with single variables such as absenteeism, it became apparent that the humanitarian reasons for a program could be reinforced by cost savings, particularly when the existence of the program was subject to justification.

Today there is general agreement that cost/benefit analyses of EAPs are desirable, but specific models for such analyses, particularly those making use of sophisticated yet simple computer-based data management systems, are few.

The purpose of this research and development project was to develop a method, a design, and a prototype for gathering, managing, and presenting information about EAPs. This scheme provides information retrieval and analyses relevant to such aspects of EAP operations as: (1) EAP personnel activities, (2) supervisory training effectiveness, (3) client population demographics, (4) assessment and referral effectiveness, (5) treatment network efficacy, and (6) the economic worth of the EAP.

This scheme has been implemented and made operational at The University of Texas Employee Assistance Programs for more than three years.

Application of the scheme in the various programs has identified certain variables that remained necessary in all programs. Depending on how aggressively program personnel pursue data acquisition, other program-specific variables are also defined.

Abstract:

This thesis examines, within the field of Information Technology, the various developments in the automatic interpretation of text semantics and their relationship to Information Retrieval Systems. Starting from a selective literature review, it seeks to systematize the documentation by tracing the evolution of the main antecedents and techniques, synthesizing the fundamental concepts, and highlighting the aspects that justify choosing one procedure over another when solving the problems.
