924 resultados para XML, Information, Retrieval, Query, Language
USO DE TEORIAS NO CAMPO DE SISTEMAS DE INFORMAÇÃO: MAPEAMENTO USANDO TÉCNICAS DE MINERAÇÃO DE TEXTOS
Resumo:
Esta dissertação visa apresentar o mapeamento do uso das teorias de sistemas de informações, usando técnicas de recuperação de informação e metodologias de mineração de dados e textos. As teorias abordadas foram Economia de Custos de Transações (Transactions Costs Economics TCE), Visão Baseada em Recursos da Firma (Resource-Based View-RBV) e Teoria Institucional (Institutional Theory-IT), sendo escolhidas por serem teorias de grande relevância para estudos de alocação de investimentos e implementação em sistemas de informação, tendo como base de dados o conteúdo textual (em inglês) do resumo e da revisão teórica dos artigos dos periódicos Information System Research (ISR), Management Information Systems Quarterly (MISQ) e Journal of Management Information Systems (JMIS) no período de 2000 a 2008. Os resultados advindos da técnica de mineração textual aliada à mineração de dados foram comparadas com a ferramenta de busca avançada EBSCO e demonstraram uma eficiência maior na identificação de conteúdo. Os artigos fundamentados nas três teorias representaram 10% do total de artigos dos três períodicos e o período mais profícuo de publicação foi o de 2001 e 2007.(AU)
Resumo:
This thesis initially presents an 'assay' of the literature pertaining to individual differences in human-computer interaction. A series of experiments is then reported, designed to investigate the association between a variety of individual characteristics and various computer task and interface factors. Predictor variables included age, computer expertise, and psychometric tests of spatial visualisation, spatial memory, logical reasoning, associative memory, and verbal ability. These were studied in relation to a variety of computer-based tacks, including: (1) word processing and its component elements; (ii) the location of target words within passages of text; (iii) the navigation of networks and menus; (iv) command generation using menus and command line interfaces; (v) the search and selection of icons and text labels; (vi) information retrieval. A measure of self-report workload was also included in several of these experiments. The main experimental findings included: (i) an interaction between spatial ability and the manipulation of semantic but not spatial interface content; (ii) verbal ability being only predictive of certain task components of word processing; (iii) age differences in word processing and information retrieval speed but not accuracy; (iv) evidence of compensatory strategies being employed by older subjects; (v) evidence of performance strategy differences which disadvantaged high spatial subjects in conditions of low spatial information content; (vi) interactive effects of associative memory, expertise and command strategy; (vii) an association between logical reasoning and word processing but not information retrieval; (viii) an interaction between expertise and cognitive demand; and (ix) a stronger association between cognitive ability and novice performance than expert performance.
Resumo:
Web document cluster analysis plays an important role in information retrieval by organizing large amounts of documents into a small number of meaningful clusters. Traditional web document clustering is based on the Vector Space Model (VSM), which takes into account only two-level (document and term) knowledge granularity but ignores the bridging paragraph granularity. However, this two-level granularity may lead to unsatisfactory clustering results with “false correlation”. In order to deal with the problem, a Hierarchical Representation Model with Multi-granularity (HRMM), which consists of five-layer representation of data and a twophase clustering process is proposed based on granular computing and article structure theory. To deal with the zero-valued similarity problemresulted from the sparse term-paragraphmatrix, an ontology based strategy and a tolerance-rough-set based strategy are introduced into HRMM. By using granular computing, structural knowledge hidden in documents can be more efficiently and effectively captured in HRMM and thus web document clusters with higher quality can be generated. Extensive experiments show that HRMM, HRMM with tolerancerough-set strategy, and HRMM with ontology all outperform VSM and a representative non VSM-based algorithm, WFP, significantly in terms of the F-Score.
Resumo:
The practice of evidence-based medicine involves consulting documents from repositories such as Scopus, PubMed, or the Cochrane Library. The most common approach for presenting retrieved documents is in the form of a list, with the assumption that the higher a document is on a list, the more relevant it is. Despite this list-based presentation, it is seldom studied how physicians perceive the importance of the order of documents presented in a list. This paper describes an empirical study that elicited and modeled physicians' preferences with regard to list-based results. Preferences were analyzed using a GRIP method that relies on pairwise comparisons of selected subsets of possible rank-ordered lists composed of 3 documents. The results allow us to draw conclusions regarding physicians' attitudes towards the importance of having documents ranked correctly on a result list, versus the importance of retrieving relevant but misplaced documents. Our findings should help developers of clinical information retrieval applications when deciding how retrieved documents should be presented and how performance of the application should be assessed. © 2012 Springer-Verlag Berlin Heidelberg.
Resumo:
Timeline generation is an important research task which can help users to have a quick understanding of the overall evolution of any given topic. It thus attracts much attention from research communities in recent years. Nevertheless, existing work on timeline generation often ignores an important factor, the attention attracted to topics of interest (hereafter termed "social attention"). Without taking into consideration social attention, the generated timelines may not reflect users' collective interests. In this paper, we study how to incorporate social attention in the generation of timeline summaries. In particular, for a given topic, we capture social attention by learning users' collective interests in the form of word distributions from Twitter, which are subsequently incorporated into a unified framework for timeline summary generation. We construct four evaluation sets over six diverse topics. We demonstrate that our proposed approach is able to generate both informative and interesting timelines. Our work sheds light on the feasibility of incorporating social attention into traditional text mining tasks. Copyright © 2013 ACM.
Resumo:
Two studies aiming to identify the nature and extent of problems that people have when completing theory of planned behaviour (TPB) questionnaires, using a cognitive interviewing approach are reported. Both studies required participants to 'think aloud' as they completed TPB questionnaires about: (a) increasing physical activity (six general public participants); and (b) binge drinking (13 students). Most people had no identifiable problems with the majority of questions. However, there were problems common to both studies, relating to information retrieval and to participants answering different questions from those intended by researchers. Questions about normative influence were particularly problematic. The standard procedure for developing TPB questionnaires may systematically produce problematic questions. Suggestions are made for improving this procedure. Copyright © 2007 SAGE Publications.
Resumo:
In this paper we study some of the characteristics of the art painting image color semantics. We analyze the color features of differ- ent artists and art movements. The analysis includes exploration of hue, saturation and luminance. We also use quartile’s analysis to obtain the dis- tribution of the dispersion of defined groups of paintings and measure the degree of purity for these groups. A special software system “Art Paint- ing Image Color Semantics” (APICSS) for image analysis and retrieval was created. The obtained result can be used for automatic classification of art paintings in image retrieval systems, where the indexing is based on color characteristics.
Resumo:
In this paper we present algorithms which work on pairs of 0,1- matrices which multiply again a matrix of zero and one entries. When applied over a pair, the algorithms change the number of non-zero entries present in the matrices, meanwhile their product remains unchanged. We establish the conditions under which the number of 1s decreases. We recursively define as well pairs of matrices which product is a specific matrix and such that by applying on them these algorithms, we minimize the total number of non-zero entries present in both matrices. These matrices may be interpreted as solutions for a well known information retrieval problem, and in this case the number of 1 entries represent the complexity of the retrieve and information update operations.
Resumo:
Search engines sometimes apply the search on the full text of documents or web-pages; but sometimes they can apply the search on selected parts of the documents only, e.g. their titles. Full-text search may consume a lot of computing resources and time. It may be possible to save resources by applying the search on the titles of documents only, assuming that a title of a document provides a concise representation of its content. We tested this assumption using Google search engine. We ran search queries that have been defined by users, distinguishing between two types of queries/users: queries of users who are familiar with the area of the search, and queries of users who are not familiar with the area of the search. We found that searches which use titles provide similar and sometimes even (slightly) better results compared to searches which use the full-text. These results hold for both types of queries/users. Moreover, we found an advantage in title-search when searching in unfamiliar areas because the general terms used in queries in unfamiliar areas match better with general terms which tend to be used in document titles.
Resumo:
In the context of Software Reuse providing techniques to support source code retrieval has been widely experimented. However, much effort is required in order to find how to match classical Information Retrieval and source code characteristics and implicit information. Introducing linguistic theories in the software development process, in terms of documentation standardization may produce significant benefits when applying Information Retrieval techniques. The goal of our research is to provide a tool to improve source code search and retrieval In order to achieve this goal we apply some linguistic rules to the development process.
Resumo:
The paper discusses the Europeana Creative project which aims to facilitate re-use of cultural heritage metadata and content by the creative industries. The paper focuses on the contribution of Ontotext to the project activities. The Europeana Data Model (EDM) is further discussed as a new proposal for structuring the data that Europeana will ingest, manage and publish. The advantages of using EDM instead of the current ESE metadata set are highlighted. Finally, Ontotext’s EDM Endpoint is presented, based on OWLIM semantic repository and SPARQL query language. A user-friendly RDF view is presented in order to illustrate the possibilities of Forest - an extensible modular user interface framework for creating linked data and semantic web applications.
Resumo:
Since multimedia data, such as images and videos, are way more expressive and informative than ordinary text-based data, people find it more attractive to communicate and express with them. Additionally, with the rising popularity of social networking tools such as Facebook and Twitter, multimedia information retrieval can no longer be considered a solitary task. Rather, people constantly collaborate with one another while searching and retrieving information. But the very cause of the popularity of multimedia data, the huge and different types of information a single data object can carry, makes their management a challenging task. Multimedia data is commonly represented as multidimensional feature vectors and carry high-level semantic information. These two characteristics make them very different from traditional alpha-numeric data. Thus, to try to manage them with frameworks and rationales designed for primitive alpha-numeric data, will be inefficient. An index structure is the backbone of any database management system. It has been seen that index structures present in existing relational database management frameworks cannot handle multimedia data effectively. Thus, in this dissertation, a generalized multidimensional index structure is proposed which accommodates the atypical multidimensional representation and the semantic information carried by different multimedia data seamlessly from within one single framework. Additionally, the dissertation investigates the evolving relationships among multimedia data in a collaborative environment and how such information can help to customize the design of the proposed index structure, when it is used to manage multimedia data in a shared environment. Extensive experiments were conducted to present the usability and better performance of the proposed framework over current state-of-art approaches.
Resumo:
With the explosive growth of the volume and complexity of document data (e.g., news, blogs, web pages), it has become a necessity to semantically understand documents and deliver meaningful information to users. Areas dealing with these problems are crossing data mining, information retrieval, and machine learning. For example, document clustering and summarization are two fundamental techniques for understanding document data and have attracted much attention in recent years. Given a collection of documents, document clustering aims to partition them into different groups to provide efficient document browsing and navigation mechanisms. One unrevealed area in document clustering is that how to generate meaningful interpretation for the each document cluster resulted from the clustering process. Document summarization is another effective technique for document understanding, which generates a summary by selecting sentences that deliver the major or topic-relevant information in the original documents. How to improve the automatic summarization performance and apply it to newly emerging problems are two valuable research directions. To assist people to capture the semantics of documents effectively and efficiently, the dissertation focuses on developing effective data mining and machine learning algorithms and systems for (1) integrating document clustering and summarization to obtain meaningful document clusters with summarized interpretation, (2) improving document summarization performance and building document understanding systems to solve real-world applications, and (3) summarizing the differences and evolution of multiple document sources.
Resumo:
Online Social Network (OSN) services provided by Internet companies bring people together to chat, share the information, and enjoy the information. Meanwhile, huge amounts of data are generated by those services (they can be regarded as the social media ) every day, every hour, even every minute, and every second. Currently, researchers are interested in analyzing the OSN data, extracting interesting patterns from it, and applying those patterns to real-world applications. However, due to the large-scale property of the OSN data, it is difficult to effectively analyze it. This dissertation focuses on applying data mining and information retrieval techniques to mine two key components in the social media data — users and user-generated contents. Specifically, it aims at addressing three problems related to the social media users and contents: (1) how does one organize the users and the contents? (2) how does one summarize the textual contents so that users do not have to go over every post to capture the general idea? (3) how does one identify the influential users in the social media to benefit other applications, e.g., Marketing Campaign? The contribution of this dissertation is briefly summarized as follows. (1) It provides a comprehensive and versatile data mining framework to analyze the users and user-generated contents from the social media. (2) It designs a hierarchical co-clustering algorithm to organize the users and contents. (3) It proposes multi-document summarization methods to extract core information from the social network contents. (4) It introduces three important dimensions of social influence, and a dynamic influence model for identifying influential users.
Resumo:
In the last decade, large numbers of social media services have emerged and been widely used in people's daily life as important information sharing and acquisition tools. With a substantial amount of user-contributed text data on social media, it becomes a necessity to develop methods and tools for text analysis for this emerging data, in order to better utilize it to deliver meaningful information to users. Previous work on text analytics in last several decades is mainly focused on traditional types of text like emails, news and academic literatures, and several critical issues to text data on social media have not been well explored: 1) how to detect sentiment from text on social media; 2) how to make use of social media's real-time nature; 3) how to address information overload for flexible information needs. In this dissertation, we focus on these three problems. First, to detect sentiment of text on social media, we propose a non-negative matrix tri-factorization (tri-NMF) based dual active supervision method to minimize human labeling efforts for the new type of data. Second, to make use of social media's real-time nature, we propose approaches to detect events from text streams on social media. Third, to address information overload for flexible information needs, we propose two summarization framework, dominating set based summarization framework and learning-to-rank based summarization framework. The dominating set based summarization framework can be applied for different types of summarization problems, while the learning-to-rank based summarization framework helps utilize the existing training data to guild the new summarization tasks. In addition, we integrate these techneques in an application study of event summarization for sports games as an example of how to better utilize social media data.