891 resultados para 080704 Information Retrieval and Web Search
Resumo:
This paper discusses an document discovery tool based on formal concept analysis. The program allows users to navigate email using a visual lattice metaphor rather than a tree. It implements a virtual file structure over email where files and entire directories can appear in multiple positions. The content and shape of the lattice formed by the conceptual ontology can assist in email discovery. The system described provides more flexibility in retrieving stored emails than what is normally available in email clients. The paper discusses how conceptual ontologies can leverage traditional document retrieval systems.
Resumo:
"August 1997."
Resumo:
Mode of access: Internet.
Resumo:
Arabidopsis thaliana, a small annual plant belonging to the mustard family, is the subject of study by an estimated 7000 researchers around the world. In addition to the large body of genetic, physiological and biochemical data gathered for this plant, it will be the first higher plant genome to be completely sequenced, with completion expected at the end of the year 2000. The sequencing effort has been coordinated by an international collaboration, the Arabidopsis Genome Initiative (AGI). The rationale for intensive investigation of Arabidopsis is that it is an excellent model for higher plants. In order to maximize use of the knowledge gained about this plant, there is a need for a comprehensive database and information retrieval and analysis system that will provide user-friendly access to Arabidopsis information. This paper describes the initial steps we have taken toward realizing these goals in a project called The Arabidopsis Information Resource (TAIR) (www.arabidopsis.org).
Resumo:
Le présent mémoire cherche à comprendre et à cerner le lien entre la stratégie de recherche d’information par le journaliste sur le web et les exigences de sa profession. Il vise à appréhender les précautions que prend le journaliste lors de sa recherche d’information sur le web en rapport avec les contraintes que lui imposent les règles de sa profession pour assurer la qualité des sources d’informations qu’il exploite. Nous avons examiné cette problématique en choisissant comme cadre d’étude Radio-Canada où nous avons rencontré quelques journalistes. Ceux-ci ont été suivis en situation de recherche d’information puis questionnés sur leurs expériences de recherche. L’arrivée d’internet et la révolution technologique qui en a découlé ont profondément bouleversé les pratiques journalistiques. La recherche d’information représente ainsi une zone importante de cette mutation des pratiques. Cette transformation amène surtout à s’interroger sur la façon dont la nouvelle façon de rechercher les sources d’information influence le travail du journaliste, et surtout les balises que se donne celui-ci pour résister aux pièges découlant de sa nouvelle méthode de travail.
Resumo:
Aiming to address requirements concerning integration of services in the context of ?big data?, this paper presents an innovative approach that (i) ensures a flexible, adaptable and scalable information and computation infrastructure, and (ii) exploits the competences of stakeholders and information workers to meaningfully confront information management issues such as information characterization, classification and interpretation, thus incorporating the underlying collective intelligence. Our approach pays much attention to the issues of usability and ease-of-use, not requiring any particular programming expertise from the end users. We report on a series of technical issues concerning the desired flexibility of the proposed integration framework and we provide related recommendations to developers of such solutions. Evaluation results are also discussed.
Resumo:
Early Employee Assistance Programs (EAPs) had their origin in humanitarian motives, and there was little concern for their cost/benefit ratios; however, as some programs began accumulating data and analyzing it over time, even with single variables such as absenteeism, it became apparent that the humanitarian reasons for a program could be reinforced by cost savings particularly when the existence of the program was subject to justification.^ Today there is general agreement that cost/benefit analyses of EAPs are desirable, but the specific models for such analyses, particularly those making use of sophisticated but simple computer based data management systems, are few.^ The purpose of this research and development project was to develop a method, a design, and a prototype for gathering managing and presenting information about EAPS. This scheme provides information retrieval and analyses relevant to such aspects of EAP operations as: (1) EAP personnel activities, (2) Supervisory training effectiveness, (3) Client population demographics, (4) Assessment and Referral Effectiveness, (5) Treatment network efficacy, (6) Economic worth of the EAP.^ This scheme has been implemented and made operational at The University of Texas Employee Assistance Programs for more than three years.^ Application of the scheme in the various programs has defined certain variables which remained necessary in all programs. Depending on the degree of aggressiveness for data acquisition maintained by program personnel, other program specific variables are also defined. ^
Resumo:
With the dramatic growth of text information, there is an increasing need for powerful text mining systems that can automatically discover useful knowledge from text. Text is generally associated with all kinds of contextual information. Those contexts can be explicit, such as the time and the location where a blog article is written, and the author(s) of a biomedical publication, or implicit, such as the positive or negative sentiment that an author had when she wrote a product review; there may also be complex context such as the social network of the authors. Many applications require analysis of topic patterns over different contexts. For instance, analysis of search logs in the context of the user can reveal how we can improve the quality of a search engine by optimizing the search results according to particular users; analysis of customer reviews in the context of positive and negative sentiments can help the user summarize public opinions about a product; analysis of blogs or scientific publications in the context of a social network can facilitate discovery of more meaningful topical communities. Since context information significantly affects the choices of topics and language made by authors, in general, it is very important to incorporate it into analyzing and mining text data. In general, modeling the context in text, discovering contextual patterns of language units and topics from text, a general task which we refer to as Contextual Text Mining, has widespread applications in text mining. In this thesis, we provide a novel and systematic study of contextual text mining, which is a new paradigm of text mining treating context information as the ``first-class citizen.'' We formally define the problem of contextual text mining and its basic tasks, and propose a general framework for contextual text mining based on generative modeling of text. This conceptual framework provides general guidance on text mining problems with context information and can be instantiated into many real tasks, including the general problem of contextual topic analysis. We formally present a functional framework for contextual topic analysis, with a general contextual topic model and its various versions, which can effectively solve the text mining problems in a lot of real world applications. We further introduce general components of contextual topic analysis, by adding priors to contextual topic models to incorporate prior knowledge, regularizing contextual topic models with dependency structure of context, and postprocessing contextual patterns to extract refined patterns. The refinements on the general contextual topic model naturally lead to a variety of probabilistic models which incorporate different types of context and various assumptions and constraints. These special versions of the contextual topic model are proved effective in a variety of real applications involving topics and explicit contexts, implicit contexts, and complex contexts. We then introduce a postprocessing procedure for contextual patterns, by generating meaningful labels for multinomial context models. This method provides a general way to interpret text mining results for real users. By applying contextual text mining in the ``context'' of other text information management tasks, including ad hoc text retrieval and web search, we further prove the effectiveness of contextual text mining techniques in a quantitative way with large scale datasets. The framework of contextual text mining not only unifies many explorations of text analysis with context information, but also opens up many new possibilities for future research directions in text mining.
Resumo:
Extracting the semantic relatedness of terms is an important topic in several areas, including data mining, information retrieval and web recommendation. This paper presents an approach for computing the semantic relatedness of terms using the knowledge base of DBpedia — a community effort to extract structured information from Wikipedia. Several approaches to extract semantic relatedness from Wikipedia using bag-of-words vector models are already available in the literature. The research presented in this paper explores a novel approach using paths on an ontological graph extracted from DBpedia. It is based on an algorithm for finding and weighting a collection of paths connecting concept nodes. This algorithm was implemented on a tool called Shakti that extract relevant ontological data for a given domain from DBpedia using its SPARQL endpoint. To validate the proposed approach Shakti was used to recommend web pages on a Portuguese social site related to alternative music and the results of that experiment are reported in this paper.
Resumo:
Web-scale knowledge retrieval can be enabled by distributed information retrieval, clustering Web clients to a large-scale computing infrastructure for knowledge discovery from Web documents. Based on this infrastructure, we propose to apply semiotic (i.e., sub-syntactical) and inductive (i.e., probabilistic) methods for inferring concept associations in human knowledge. These associations can be combined to form a fuzzy (i.e.,gradual) semantic net representing a map of the knowledge in the Web. Thus, we propose to provide interactive visualizations of these cognitive concept maps to end users, who can browse and search the Web in a human-oriented, visual, and associative interface.
Resumo:
In this paper we present a complete system for the treatment of both geographical and temporal dimensions in text and its application to information retrieval. This system has been evaluated in both the GeoTime task of the 8th and 9th NTCIR workshop in the years 2010 and 2011 respectively, making it possible to compare the system to contemporary approaches to the topic. In order to participate in this task we have added the temporal dimension to our GIR system. The system proposed here has a modular architecture in order to add or modify features. In the development of this system, we have followed a QA-based approach as well as multi-search engines to improve the system performance.
Resumo:
Most of the existing open-source search engines, utilize keyword or tf-idf based techniques to find relevant documents and web pages relative to an input query. Although these methods, with the help of a page rank or knowledge graphs, proved to be effective in some cases, they often fail to retrieve relevant instances for more complicated queries that would require a semantic understanding to be exploited. In this Thesis, a self-supervised information retrieval system based on transformers is employed to build a semantic search engine over the library of Gruppo Maggioli company. Semantic search or search with meaning can refer to an understanding of the query, instead of simply finding words matches and, in general, it represents knowledge in a way suitable for retrieval. We chose to investigate a new self-supervised strategy to handle the training of unlabeled data based on the creation of pairs of ’artificial’ queries and the respective positive passages. We claim that by removing the reliance on labeled data, we may use the large volume of unlabeled material on the web without being limited to languages or domains where labeled data is abundant.
Resumo:
Dissertation submitted in partial fulfilment of the requirements for the Degree of Master of Science in Geospatial Technologies