29 results for Information retrieval, Web search behavior, Cognitive style
in CentAUR: Central Archive University of Reading - UK
Abstract:
In data mining applications, automated text retrieval and text-and-image retrieval are needed. This becomes essential with the growth of the Internet and digital libraries. Our approach is based on latent semantic indexing (LSI) and the corresponding term-by-document matrix suggested by Berry and his co-authors. Instead of using deterministic methods to find the required number of first "k" singular triplets, we propose a stochastic approach. First, we use a Monte Carlo method to sample and build a much smaller term-by-document matrix (e.g. a k x k matrix), from which we then find the first "k" triplets using standard deterministic methods. Second, we investigate how we can reduce the problem to finding the "k" largest eigenvalues using parallel Monte Carlo methods. We apply these methods to the initial matrix and also to the reduced one. The algorithms run on a cluster of workstations under MPI. We present results of experiments arising in textual retrieval of Web documents, as well as a comparison of the stochastic methods proposed. (C) 2003 IMACS. Published by Elsevier Science B.V. All rights reserved.
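The sampling step described in the abstract above can be illustrated with a minimal sketch. It is not the authors' algorithm: it assumes one common Monte Carlo scheme, in which columns are drawn with probability proportional to their squared norm and rescaled, and it uses a made-up toy matrix. The names `truncated_svd` and `sample_columns` are hypothetical helpers introduced only for this example.

```python
import numpy as np

def truncated_svd(A, k):
    """Return the k largest singular triplets (U_k, s_k, Vt_k) of A."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return U[:, :k], s[:k], Vt[:k, :]

def sample_columns(A, c, rng):
    """Draw c columns of A with probability proportional to their squared
    norm, rescaling each so that E[C @ C.T] equals A @ A.T.
    (Assumed sampling scheme, not taken from the paper.)"""
    sq_norms = np.sum(A * A, axis=0)
    p = sq_norms / sq_norms.sum()
    idx = rng.choice(A.shape[1], size=c, replace=True, p=p)
    return A[:, idx] / np.sqrt(c * p[idx])

# Toy term-by-document matrix: rows are terms, columns are documents.
A = np.array([
    [2.0, 0.0, 1.0, 0.0, 3.0, 0.0],
    [1.0, 0.0, 2.0, 0.0, 1.0, 0.0],
    [0.0, 3.0, 0.0, 1.0, 0.0, 2.0],
    [0.0, 1.0, 0.0, 2.0, 0.0, 1.0],
])

rng = np.random.default_rng(0)
C = sample_columns(A, c=4, rng=rng)              # smaller sampled matrix
U_exact, s_exact, _ = truncated_svd(A, k=2)      # deterministic reference
U_approx, s_approx, _ = truncated_svd(C, k=2)    # estimate from the sample

print("exact singular values: ", s_exact)
print("approx singular values:", s_approx)
```

With this particular rescaling, the sampled matrix has the same Frobenius norm as the original, and the left singular vectors of the sample serve as estimates of the term-space basis used by LSI. The quality of the estimate depends on how many columns are drawn.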
Abstract:
Search has become a hot topic in Internet computing, with rival search engines battling to become the de facto Web portal, harnessing search algorithms to wade through information on a scale undreamed of by early information retrieval (IR) pioneers. This article examines how search has matured from its roots in specialized IR systems to become a key foundation of the Web. The authors describe new challenges posed by the Web's scale, and show how search is changing the nature of the Web as much as the Web has changed the nature of search.
Abstract:
In general, ranking entities (resources) on the Semantic Web (SW) is subject to importance, relevance, and query length. Few existing SW search systems cover all of these aspects. Moreover, many existing efforts simply reuse technologies from conventional Information Retrieval (IR), which are not designed for SW data. This paper proposes a ranking mechanism that includes all three categories of ranking and is tailored to SW data.
Abstract:
A large volume of visual content is inaccessible until effective and efficient indexing and retrieval of such data is achieved. In this paper, we introduce the DREAM system, a knowledge-assisted, semantic-driven, context-aware visual information retrieval system applied in the film post-production domain. We mainly focus on the automatic labelling and topic map related aspects of the framework. The use of context-related collateral knowledge, represented by a novel probability-based visual keyword co-occurrence matrix, proved effective in the experiments conducted during system evaluation. The automatically generated semantic labels were fed into the Topic Map Engine, which can automatically construct ontological networks using Topic Maps technology. This dramatically enhances the indexing and retrieval performance of the system towards an even higher semantic level.
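The abstract above does not specify how the visual keyword co-occurrence matrix is built. As a rough illustration only, one simple probabilistic reading is to estimate, from a set of labelled items, the conditional probability that keyword b is present given that keyword a is present. The function name and the toy labels below are assumptions made for this sketch, not part of the DREAM system.

```python
from collections import Counter
from itertools import permutations

def cooccurrence_probabilities(labelled_items):
    """Estimate P(b present | a present) for every ordered keyword pair
    (a, b) that co-occurs in at least one labelled item."""
    keyword_counts = Counter()
    pair_counts = Counter()
    for labels in labelled_items:
        unique = set(labels)                      # ignore repeated labels
        keyword_counts.update(unique)
        pair_counts.update(permutations(unique, 2))
    return {
        (a, b): n / keyword_counts[a]
        for (a, b), n in pair_counts.items()
    }

# Hypothetical labelled frames from a film clip.
frames = [
    {"fire", "smoke"},
    {"fire", "smoke"},
    {"fire", "water"},
]
probs = cooccurrence_probabilities(frames)
print(probs[("fire", "smoke")], probs[("smoke", "fire")])
```

A labeller could consult such a table as collateral context: if "smoke" has already been assigned to a region, "fire" becomes a more plausible label for a neighbouring one.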
Abstract:
The Web's link structure (termed the Web Graph) is a richly connected set of Web pages. Current applications use this graph for indexing and information retrieval purposes. In contrast, the relationship between Web Graph and application is reversed here by letting the structure of the Web Graph influence the behaviour of an application. Presents a novel Web crawling agent, AlienBot, the output of which is orthogonally coupled to the enemy generation strategy of a computer game. The Web Graph guides AlienBot, causing it to generate a stochastic process. Shows the effectiveness of such unorthodox coupling to both the playability of the game and the heuristics of the Web crawler. In addition, presents the results of the sample of Web pages collected by the crawling process. In particular, shows: how AlienBot was able to identify the power law inherent in the link structure of the Web; that 61.74 per cent of Web pages use some form of scripting technology; that the size of the Web can be estimated at just over 5.2 billion pages; and that less than 7 per cent of Web pages fully comply with some variant of (X)HTML.
Abstract:
Material encoded with reference to the self is better remembered. One interpretation of this effect is that the self operates to organise retrieval of memories. We were motivated to find out whether this organisational principle extended to everyday information and for material not explicitly related to the self. Participants generated friends' birthdays from memory and then gave their own birthday. We found that participants were particularly likely to recall birthdays from on or around the date of their own birthday. Thus, memory for birthdays clusters around self-relevant information, even when there is no specific attempt to recall self-related material. Birthdays clustered somewhat around the time of testing, important dates in the calendar, and for a close other, but not to the extent of the participants' birthdays. We suggest this is a demonstration of the organisational structure of the self in memory. Copyright (C) 2010 John Wiley & Sons, Ltd.
Abstract:
This paper describes the implementation of a semantic web search engine on conversation-style transcripts. Our choice of data is Hansard, a publicly available conversation-style transcript of parliamentary debates. The current search engine implementation on Hansard is limited to running search queries based on keywords or phrases, and hence lacks the ability to make semantic inferences from user queries. By making use of knowledge such as the relationships between members of parliament, constituencies and terms of office, as well as topics of debates, the search results can be improved in terms of both relevance and coverage. Our contribution is not algorithmic; instead, we describe how we exploit a collection of external data sources, ontologies, semantic web vocabularies and named entity extraction in the analysis of the underlying semantics of user queries, as well as in the semantic enrichment of the search index, thereby improving the quality of results.
Abstract:
Accessing information that is spread across multiple sources in a structured and connected way is a general problem for enterprises. A unified structure for knowledge representation is urgently needed to enable integration of heterogeneous information resources. Topic Maps seem to be a solution to this problem. Topic Map technology enables information to be connected across multiple systems through concepts, their relationships, and their occurrences. In this paper, we address this problem by describing a framework built on topic maps to support the current needs of knowledge management. New approaches for information integration, intelligent search and topic map exploration are introduced within this framework.
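To make the three Topic Maps notions mentioned above concrete (topics, associations between them, and occurrences pointing into source systems), here is a minimal in-memory sketch. It is not the framework from the paper; the class names, the `search` method and the resource locators are all hypothetical and chosen only for illustration.

```python
from __future__ import annotations
from dataclasses import dataclass, field

@dataclass
class Topic:
    name: str
    # Occurrences: locators of resources in other systems about this topic.
    occurrences: list[str] = field(default_factory=list)

@dataclass
class TopicMap:
    topics: dict[str, Topic] = field(default_factory=dict)
    # Associations stored as (source topic, relation, target topic).
    associations: list[tuple[str, str, str]] = field(default_factory=list)

    def add_topic(self, name: str, *occurrences: str) -> Topic:
        topic = self.topics.setdefault(name, Topic(name))
        topic.occurrences.extend(occurrences)
        return topic

    def associate(self, source: str, relation: str, target: str) -> None:
        self.add_topic(source)
        self.add_topic(target)
        self.associations.append((source, relation, target))

    def related(self, name: str) -> list[str]:
        """Names of topics directly associated with the given topic."""
        linked = set()
        for source, _, target in self.associations:
            if source == name:
                linked.add(target)
            elif target == name:
                linked.add(source)
        return sorted(linked)

    def search(self, name: str) -> list[str]:
        """Occurrences of a topic, then those of its direct neighbours."""
        hits = list(self.topics[name].occurrences)
        for other in self.related(name):
            hits.extend(self.topics[other].occurrences)
        return hits

# Hypothetical enterprise sources linked through two topics.
tm = TopicMap()
tm.add_topic("Invoice", "erp://invoices/2024")
tm.add_topic("Customer", "crm://customers", "wiki://customer-handbook")
tm.associate("Invoice", "issued_to", "Customer")
print(tm.search("Invoice"))
```

Because occurrences are only locators, the underlying systems stay where they are; the map supplies the connecting layer that a search or exploration tool can traverse.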
Abstract:
Individual differences in cognitive style can be characterized along two dimensions: ‘systemizing’ (S, the drive to analyze or build ‘rule-based’ systems) and ‘empathizing’ (E, the drive to identify another's mental state and respond to this with an appropriate emotion). Discrepancies between these two dimensions in one direction (S > E) or the other (E > S) are associated with sex differences in cognition: on average more males show an S > E cognitive style, while on average more females show an E > S profile. The neurobiological basis of these different profiles remains unknown. Since individuals may be typical or atypical for their sex, it is important to move away from the study of sex differences and towards the study of differences in cognitive style. Using structural magnetic resonance imaging we examined how neuroanatomy varies as a function of the discrepancy between E and S in 88 adult males from the general population. Selecting just males allows us to study discrepant E-S profiles in a pure way, unconfounded by other factors related to sex and gender. An increasing S > E profile was associated with increased gray matter volume in cingulate and dorsal medial prefrontal areas which have been implicated in processes related to cognitive control, monitoring, error detection, and probabilistic inference. An increasing E > S profile was associated with larger hypothalamic and ventral basal ganglia regions which have been implicated in neuroendocrine control, motivation and reward. These results suggest an underlying neuroanatomical basis linked to the discrepancy between these two important dimensions of individual differences in cognitive style.
Abstract:
Social networks have gained remarkable attention in the last decade. Accessing social network sites such as Twitter, Facebook, LinkedIn and Google+ through the internet and Web 2.0 technologies has become more affordable. People are becoming more interested in and reliant on social networks for information, news and the opinions of other users on diverse subject matters. The heavy reliance on social network sites causes them to generate massive data characterised by three computational issues, namely size, noise and dynamism. These issues often make social network data very complex to analyse manually, resulting in the pertinent use of computational means of analysing them. Data mining provides a wide range of techniques for detecting useful knowledge, such as trends, patterns and rules, from massive datasets [44]. Data mining techniques are used for information retrieval, statistical modelling and machine learning. These techniques employ data pre-processing, data analysis, and data interpretation processes in the course of data analysis. This survey discusses different data mining techniques used in mining diverse aspects of the social network over decades, going from the historical techniques to the up-to-date models, including our novel technique named TRCM. All the techniques covered in this survey are listed in Table 1, including the tools employed as well as the names of their authors.