871 resultados para Interactive Information Retrieval
Resumo:
In the context of Software Reuse providing techniques to support source code retrieval has been widely experimented. However, much effort is required in order to find how to match classical Information Retrieval and source code characteristics and implicit information. Introducing linguistic theories in the software development process, in terms of documentation standardization may produce significant benefits when applying Information Retrieval techniques. The goal of our research is to provide a tool to improve source code search and retrieval In order to achieve this goal we apply some linguistic rules to the development process.
Resumo:
In this paper, we present an innovative topic segmentation system based on a new informative similarity measure that takes into account word co-occurrence in order to avoid the accessibility to existing linguistic resources such as electronic dictionaries or lexico-semantic databases such as thesauri or ontology. Topic segmentation is the task of breaking documents into topically coherent multi-paragraph subparts. Topic segmentation has extensively been used in information retrieval and text summarization. In particular, our architecture proposes a language-independent topic segmentation system that solves three main problems evidenced by previous research: systems based uniquely on lexical repetition that show reliability problems, systems based on lexical cohesion using existing linguistic resources that are usually available only for dominating languages and as a consequence do not apply to less favored languages and finally systems that need previously existing harvesting training data. For that purpose, we only use statistics on words and sequences of words based on a set of texts. This solution provides a flexible solution that may narrow the gap between dominating languages and less favored languages thus allowing equivalent access to information.
Resumo:
Since multimedia data, such as images and videos, are way more expressive and informative than ordinary text-based data, people find it more attractive to communicate and express with them. Additionally, with the rising popularity of social networking tools such as Facebook and Twitter, multimedia information retrieval can no longer be considered a solitary task. Rather, people constantly collaborate with one another while searching and retrieving information. But the very cause of the popularity of multimedia data, the huge and different types of information a single data object can carry, makes their management a challenging task. Multimedia data is commonly represented as multidimensional feature vectors and carry high-level semantic information. These two characteristics make them very different from traditional alpha-numeric data. Thus, to try to manage them with frameworks and rationales designed for primitive alpha-numeric data, will be inefficient. An index structure is the backbone of any database management system. It has been seen that index structures present in existing relational database management frameworks cannot handle multimedia data effectively. Thus, in this dissertation, a generalized multidimensional index structure is proposed which accommodates the atypical multidimensional representation and the semantic information carried by different multimedia data seamlessly from within one single framework. Additionally, the dissertation investigates the evolving relationships among multimedia data in a collaborative environment and how such information can help to customize the design of the proposed index structure, when it is used to manage multimedia data in a shared environment. Extensive experiments were conducted to present the usability and better performance of the proposed framework over current state-of-art approaches.
Resumo:
With the explosive growth of the volume and complexity of document data (e.g., news, blogs, web pages), it has become a necessity to semantically understand documents and deliver meaningful information to users. Areas dealing with these problems are crossing data mining, information retrieval, and machine learning. For example, document clustering and summarization are two fundamental techniques for understanding document data and have attracted much attention in recent years. Given a collection of documents, document clustering aims to partition them into different groups to provide efficient document browsing and navigation mechanisms. One unrevealed area in document clustering is that how to generate meaningful interpretation for the each document cluster resulted from the clustering process. Document summarization is another effective technique for document understanding, which generates a summary by selecting sentences that deliver the major or topic-relevant information in the original documents. How to improve the automatic summarization performance and apply it to newly emerging problems are two valuable research directions. To assist people to capture the semantics of documents effectively and efficiently, the dissertation focuses on developing effective data mining and machine learning algorithms and systems for (1) integrating document clustering and summarization to obtain meaningful document clusters with summarized interpretation, (2) improving document summarization performance and building document understanding systems to solve real-world applications, and (3) summarizing the differences and evolution of multiple document sources.
Resumo:
The Everglades Online Thesaurus is a structured vocabulary of concepts and terms relating to the south Florida environment. Designed as an information management tool for both researchers and metadata creators, the Thesaurus is intended to improve information retrieval across the many disparate information systems, databases, and web sites that provide Everglades-related information. The vocabulary provided by the Everglades Online Thesaurus expresses each relevant concept using a single ‘preferred term’, whereas in natural language many terms may exist to express that same concept. In this way, the Thesaurus offers the possibility of standardizing the terminology used to describe Everglades-related information — an important factor in predictable and successful resource discovery.
Resumo:
Online Social Network (OSN) services provided by Internet companies bring people together to chat, share the information, and enjoy the information. Meanwhile, huge amounts of data are generated by those services (they can be regarded as the social media ) every day, every hour, even every minute, and every second. Currently, researchers are interested in analyzing the OSN data, extracting interesting patterns from it, and applying those patterns to real-world applications. However, due to the large-scale property of the OSN data, it is difficult to effectively analyze it. This dissertation focuses on applying data mining and information retrieval techniques to mine two key components in the social media data — users and user-generated contents. Specifically, it aims at addressing three problems related to the social media users and contents: (1) how does one organize the users and the contents? (2) how does one summarize the textual contents so that users do not have to go over every post to capture the general idea? (3) how does one identify the influential users in the social media to benefit other applications, e.g., Marketing Campaign? The contribution of this dissertation is briefly summarized as follows. (1) It provides a comprehensive and versatile data mining framework to analyze the users and user-generated contents from the social media. (2) It designs a hierarchical co-clustering algorithm to organize the users and contents. (3) It proposes multi-document summarization methods to extract core information from the social network contents. (4) It introduces three important dimensions of social influence, and a dynamic influence model for identifying influential users.
Resumo:
In the last decade, large numbers of social media services have emerged and been widely used in people's daily life as important information sharing and acquisition tools. With a substantial amount of user-contributed text data on social media, it becomes a necessity to develop methods and tools for text analysis for this emerging data, in order to better utilize it to deliver meaningful information to users. Previous work on text analytics in last several decades is mainly focused on traditional types of text like emails, news and academic literatures, and several critical issues to text data on social media have not been well explored: 1) how to detect sentiment from text on social media; 2) how to make use of social media's real-time nature; 3) how to address information overload for flexible information needs. In this dissertation, we focus on these three problems. First, to detect sentiment of text on social media, we propose a non-negative matrix tri-factorization (tri-NMF) based dual active supervision method to minimize human labeling efforts for the new type of data. Second, to make use of social media's real-time nature, we propose approaches to detect events from text streams on social media. Third, to address information overload for flexible information needs, we propose two summarization framework, dominating set based summarization framework and learning-to-rank based summarization framework. The dominating set based summarization framework can be applied for different types of summarization problems, while the learning-to-rank based summarization framework helps utilize the existing training data to guild the new summarization tasks. In addition, we integrate these techneques in an application study of event summarization for sports games as an example of how to better utilize social media data.
Resumo:
In line with the process of financialization and globalization of capital, which has intensified in all latitudes of the globe, the world of work is permeated by his determinations arising and also has been (re) setting from numerous changes expressed by example, in the unbridled expansion of temporary forms of work activities, and flexible outsourced by the growth of informality, forming a new morphology of work. However, regardless of how these forms are expressed in concrete materiality, there is something that unifies: all of them are marked by exponentiation of insecurity and hence the numerous negative effects on the lives of individuals who need to sell their labor power to survive. Given this premise, the present work is devoted to study, within the framework of the Brazilian particularities of transition between Fordism and Toyotism, what we call composite settings of the conditions and labor relations processed within the North river- textile industry Grande. To this end, guided by historical and dialectical materialism, we made use of social research in its qualitative aspect, using semi-structured interviews, in addition to literature review, information retrieval and use of field notes. From our raids, we note that between the time span stretching from the 1990s to the current year, the Natal textile industry has been undergoing a process of successive and intense changes in their modus operandi, geared specifically to the organization and labor management causing, concomitantly, several repercussions for the entire working class.
Resumo:
In line with the process of financialization and globalization of capital, which has intensified in all latitudes of the globe, the world of work is permeated by his determinations arising and also has been (re) setting from numerous changes expressed by example, in the unbridled expansion of temporary forms of work activities, and flexible outsourced by the growth of informality, forming a new morphology of work. However, regardless of how these forms are expressed in concrete materiality, there is something that unifies: all of them are marked by exponentiation of insecurity and hence the numerous negative effects on the lives of individuals who need to sell their labor power to survive. Given this premise, the present work is devoted to study, within the framework of the Brazilian particularities of transition between Fordism and Toyotism, what we call composite settings of the conditions and labor relations processed within the North river- textile industry Grande. To this end, guided by historical and dialectical materialism, we made use of social research in its qualitative aspect, using semi-structured interviews, in addition to literature review, information retrieval and use of field notes. From our raids, we note that between the time span stretching from the 1990s to the current year, the Natal textile industry has been undergoing a process of successive and intense changes in their modus operandi, geared specifically to the organization and labor management causing, concomitantly, several repercussions for the entire working class.
Resumo:
Over the years there has been a broader definition of the term health. At the same time it was found also an evolution of the concept of health care which in turn has led to changes in the approach to delivery of health services and hence in its management. In this regard, currently the nephrology services have been searching for quality technical and social need. In view of these innovations and the quest for quality, it elaborated the general objective: to develop a quality assessment protocol for dialysis service Onofre Lopes University Hospital. It is an intervention project effected through an action research, which consisted of 4 steps. Initially was identified through a literature search in scientific literature, which quality indicators would apply to a dialysis unit being selected as follows: infection rate in hemodialysis access site, microbiological control of water used for hemodialysis and Index User satisfaction. Through critical reflection on the theme researched in the previous step, it was drawn up three data collection instruments, interview form type, applied between the months of October and November 2015. In addition to the information obtained, also made up of the use of information retrieval technique. The results were organized in graphs and tables and analyzed using qualitative and exploratory technical approach. Then a reflective analysis of the data obtained and the diagnosis of reality studied was traced and confronted with the literature was performed. The data produced in this study revealed that the Dialysis Unit of HUOL is much to be desired, considering that some weaknesses have been identified in its structure. Faced with this finding have been proposed, as a contribution and aiming to guide the development of future actions, suggestions for improvement that should be implemented and monitored to be assured overcoming these difficulties, allowing an appropriate organizational restructuring, and resulting in improved service public offered. It was concluded that for hemodialysis treatment results are achieved and positive, it is necessary to have physical structure and adequate infrastructure, multidisciplinary team specialized, trained and in sufficient quantity, well designed processes for professionals to have standards to be followed decreasing the chance to err, and a risk management system to detect and control situations that endanger patient safety.
Resumo:
Over the years there has been a broader definition of the term health. At the same time it was found also an evolution of the concept of health care which in turn has led to changes in the approach to delivery of health services and hence in its management. In this regard, currently the nephrology services have been searching for quality technical and social need. In view of these innovations and the quest for quality, it elaborated the general objective: to develop a quality assessment protocol for dialysis service Onofre Lopes University Hospital. It is an intervention project effected through an action research, which consisted of 4 steps. Initially was identified through a literature search in scientific literature, which quality indicators would apply to a dialysis unit being selected as follows: infection rate in hemodialysis access site, microbiological control of water used for hemodialysis and Index User satisfaction. Through critical reflection on the theme researched in the previous step, it was drawn up three data collection instruments, interview form type, applied between the months of October and November 2015. In addition to the information obtained, also made up of the use of information retrieval technique. The results were organized in graphs and tables and analyzed using qualitative and exploratory technical approach. Then a reflective analysis of the data obtained and the diagnosis of reality studied was traced and confronted with the literature was performed. The data produced in this study revealed that the Dialysis Unit of HUOL is much to be desired, considering that some weaknesses have been identified in its structure. Faced with this finding have been proposed, as a contribution and aiming to guide the development of future actions, suggestions for improvement that should be implemented and monitored to be assured overcoming these difficulties, allowing an appropriate organizational restructuring, and resulting in improved service public offered. It was concluded that for hemodialysis treatment results are achieved and positive, it is necessary to have physical structure and adequate infrastructure, multidisciplinary team specialized, trained and in sufficient quantity, well designed processes for professionals to have standards to be followed decreasing the chance to err, and a risk management system to detect and control situations that endanger patient safety.
Resumo:
Online Social Network (OSN) services provided by Internet companies bring people together to chat, share the information, and enjoy the information. Meanwhile, huge amounts of data are generated by those services (they can be regarded as the social media ) every day, every hour, even every minute, and every second. Currently, researchers are interested in analyzing the OSN data, extracting interesting patterns from it, and applying those patterns to real-world applications. However, due to the large-scale property of the OSN data, it is difficult to effectively analyze it. This dissertation focuses on applying data mining and information retrieval techniques to mine two key components in the social media data — users and user-generated contents. Specifically, it aims at addressing three problems related to the social media users and contents: (1) how does one organize the users and the contents? (2) how does one summarize the textual contents so that users do not have to go over every post to capture the general idea? (3) how does one identify the influential users in the social media to benefit other applications, e.g., Marketing Campaign? The contribution of this dissertation is briefly summarized as follows. (1) It provides a comprehensive and versatile data mining framework to analyze the users and user-generated contents from the social media. (2) It designs a hierarchical co-clustering algorithm to organize the users and contents. (3) It proposes multi-document summarization methods to extract core information from the social network contents. (4) It introduces three important dimensions of social influence, and a dynamic influence model for identifying influential users.
Resumo:
The Semantic Annotation component is a software application that provides support for automated text classification, a process grounded in a cohesion-centered representation of discourse that facilitates topic extraction. The component enables the semantic meta-annotation of text resources, including automated classification, thus facilitating information retrieval within the RAGE ecosystem. It is available in the ReaderBench framework (http://readerbench.com/) which integrates advanced Natural Language Processing (NLP) techniques. The component makes use of Cohesion Network Analysis (CNA) in order to ensure an in-depth representation of discourse, useful for mining keywords and performing automated text categorization. Our component automatically classifies documents into the categories provided by the ACM Computing Classification System (http://dl.acm.org/ccs_flat.cfm), but also into the categories from a high level serious games categorization provisionally developed by RAGE. English and French languages are already covered by the provided web service, whereas the entire framework can be extended in order to support additional languages.
Resumo:
We consider the problem of resource selection in clustered Peer-to-Peer Information Retrieval (P2P IR) networks with cooperative peers. The clustered P2P IR framework presents a significant departure from general P2P IR architectures by employing clustering to ensure content coherence between resources at the resource selection layer, without disturbing document allocation. We propose that such a property could be leveraged in resource selection by adapting well-studied and popular inverted lists for centralized document retrieval. Accordingly, we propose the Inverted PeerCluster Index (IPI), an approach that adapts the inverted lists, in a straightforward manner, for resource selection in clustered P2P IR. IPI also encompasses a strikingly simple peer-specific scoring mechanism that exploits the said index for resource selection. Through an extensive empirical analysis on P2P IR testbeds, we establish that IPI competes well with the sophisticated state-of-the-art methods in virtually every parameter of interest for the resource selection task, in the context of clustered P2P IR.
Resumo:
MEDEIROS, Rildeci; MELO, Erica S. F.; NASCIMENTO, M. S. Hemeroteca digital temática: socialização da informação em cinema.In:SEMINÁRIO NACIONAL DE BIBLIOTECAS UNIVERSITÁRIAS,15.,2008,São Paulo. Anais eletrônicos... São Paulo:CRUESP,2008. Disponível em: http://www.sbu.unicamp.br/snbu2008/anais/site/pdfs/3018.pdf