16 results for Information Retrieval, Document Databases, Digital Libraries

in CentAUR: Central Archive University of Reading - UK


Relevance:

100.00% 100.00%

Publisher:

Abstract:

In any data mining application, automated retrieval of text, or of text and images, is needed. This becomes essential with the growth of the Internet and digital libraries. Our approach is based on latent semantic indexing (LSI) and the corresponding term-by-document matrix suggested by Berry and his co-authors. Instead of using deterministic methods to find the required number of first "k" singular triplets, we propose a stochastic approach. First, we use a Monte Carlo method to sample and build a much smaller term-by-document matrix (e.g. a k x k matrix), from which we then find the first "k" triplets using standard deterministic methods. Second, we investigate how the problem can be reduced to finding the "k" largest eigenvalues using parallel Monte Carlo methods. We apply these methods to the initial matrix and also to the reduced one. The algorithms run on a cluster of workstations under MPI. We present the results of experiments arising in textual retrieval of Web documents, together with a comparison of the proposed stochastic methods. (C) 2003 IMACS. Published by Elsevier Science B.V. All rights reserved.
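
The sampling idea in this abstract can be illustrated with a small sketch. It is not the authors' algorithm: it swaps in a well-known length-squared column sampling scheme to build a Monte Carlo estimate of A A^T for a term-by-document matrix A, then runs plain power iteration on that small matrix to recover only the largest singular value. The function names `sample_gram_matrix` and `largest_singular_value` are hypothetical, and the code is serial, with no MPI.

```python
import math
import random


def sample_gram_matrix(A, n_samples, rng):
    """Monte Carlo estimate of A @ A^T (terms x terms).

    Columns (documents) are drawn with probability proportional to their
    squared Euclidean norm. This is an assumed sampling scheme, not the one
    described in the abstract.
    """
    m, n = len(A), len(A[0])
    col_norm_sq = [sum(A[i][j] ** 2 for i in range(m)) for j in range(n)]
    total = sum(col_norm_sq)
    picks = rng.choices(range(n), weights=col_norm_sq, k=n_samples)
    G = [[0.0] * m for _ in range(m)]
    for j in picks:
        # 1 / (n_samples * p_j) with p_j = col_norm_sq[j] / total keeps the
        # estimate unbiased.
        scale = total / (n_samples * col_norm_sq[j])
        for r in range(m):
            for c in range(m):
                G[r][c] += scale * A[r][j] * A[c][j]
    return G


def largest_singular_value(G, iterations=200):
    """Power iteration on the symmetric matrix G; returns sqrt(top eigenvalue)."""
    m = len(G)
    v = [1.0 / math.sqrt(m)] * m
    eigenvalue = 0.0
    for _ in range(iterations):
        w = [sum(G[r][c] * v[c] for c in range(m)) for r in range(m)]
        eigenvalue = math.sqrt(sum(x * x for x in w))
        if eigenvalue == 0.0:
            break
        v = [x / eigenvalue for x in w]
    return math.sqrt(eigenvalue)


# Toy rank-1 "term-by-document" matrix A = u v^T (3 terms, 3 documents).
u = [1.0, 2.0, 2.0]
v = [3.0, 0.0, 4.0]
A = [[ui * vj for vj in v] for ui in u]
G = sample_gram_matrix(A, n_samples=100, rng=random.Random(0))
sigma_1 = largest_singular_value(G)
```

For a rank-1 matrix every sampled, rescaled column contributes the same outer product, so the estimate coincides with the exact value |u| |v| up to rounding. For general matrices the estimate is only approximate and improves as the number of samples grows.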

Relevance:

100.00% 100.00%

Publisher:

Abstract:

One of the main tasks of the mathematical knowledge management community must surely be to enhance access to mathematics on digital systems. In this paper we present a spectrum of approaches to solving the various problems inherent in this task, arguing that a variety of approaches is both necessary and useful. The main ideas presented concern the differences between digitised mathematics, digitally represented mathematics and formalised mathematics. Each has its part to play in managing mathematical information in a connected world. Digitised material is that which is embodied in a computer file, accessible and displayable locally or globally. Represented material is digital material in which there is some structure (usually syntactic in nature) that maps to the mathematics contained in the digitised information. Formalised material is that in which both the syntax and the semantics of the represented material are automatically accessible. Given the range of mathematical information to which access is desired, and the limited resources available for managing that information, we must ensure that these resources are applied to digitise, form representations of, or formalise existing and new mathematical information in such a way as to extract the most benefit from the least expenditure of resources. We also analyse some of the social and legal issues that surround these practical tasks.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

This article is concerned with the risks associated with the monopolisation of information that is available from a single source only. Although there is a longstanding consensus that sole-source databases should not receive protection under the EU Database Directive, and there are legislative provisions to ensure that lawful users have access to a database’s contents, Ryanair v PR Aviation challenges this assumption by affirming that the use of non-protected databases can be restricted by contract. Owners of non-protected databases can contractually exclude lawful users from taking the benefit of statutorily permitted uses, because such databases are not covered by the legislation that declares this kind of contract null and void. We argue that this judgment is not consistent with the legislative history and can have a profound impact on the functioning of the digital single market, where new information services, such as meta-search engines or price-comparison websites, base their operation on the systematic extraction and re-utilisation of materials available from online sources. This is an issue that the Commission should address in a forthcoming evaluation of the Database Directive.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

A large volume of visual content remains inaccessible until it can be indexed and retrieved effectively and efficiently. In this paper, we introduce the DREAM system, a knowledge-assisted, semantic-driven, context-aware visual information retrieval system applied in the film post-production domain. We focus mainly on the automatic labelling and topic map aspects of the framework. The use of context-related collateral knowledge, represented by a novel probability-based visual keyword co-occurrence matrix, was shown to be effective in the experiments conducted during system evaluation. The automatically generated semantic labels were fed into the Topic Map Engine, which automatically constructs ontological networks using Topic Maps technology. This substantially improves the indexing and retrieval performance of the system and moves it towards a higher semantic level.
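
As a rough illustration of what a probability-based keyword co-occurrence matrix can look like, the sketch below estimates P(b | a), the share of labelled items containing keyword a that also contain keyword b. The abstract does not specify how DREAM builds its matrix, so the function `cooccurrence_probabilities`, the conditional-frequency definition and the sample labels are all assumptions made for illustration.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_probabilities(labelled_items):
    """Return {(a, b): P(b | a)} estimated from lists of visual keywords.

    Hypothetical helper: P(b | a) is the fraction of items labelled with a
    that are also labelled with b.
    """
    keyword_counts = Counter()
    pair_counts = Counter()
    for keywords in labelled_items:
        unique = set(keywords)
        keyword_counts.update(unique)
        for a, b in combinations(sorted(unique), 2):
            pair_counts[(a, b)] += 1
            pair_counts[(b, a)] += 1
    return {
        (a, b): count / keyword_counts[a]
        for (a, b), count in pair_counts.items()
    }


# Made-up labels for four image regions.
items = [
    ["sky", "cloud"],
    ["sky", "sea"],
    ["sky", "cloud", "sea"],
    ["fire"],
]
probs = cooccurrence_probabilities(items)
```

A labelling step could consult such a table to prefer candidate keywords that are plausible given keywords already assigned nearby. Pairs that never co-occur are simply absent from the dictionary.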

Relevance:

100.00% 100.00%

Publisher:

Abstract:

We describe the CHARMe project, which aims to link climate datasets with publications, user feedback and other items of "commentary metadata". The system will help users learn from previous community experience and select datasets that best suit their needs, as well as providing direct traceability between conclusions and the data that supported them. The project applies the principles of Linked Data and adopts the Open Annotation standard to record and publish commentary information. CHARMe contributes to the emerging landscape of "climate services", which will provide climate data and information to influence policy and decision-making. Although the project focuses on climate science, the technologies and concepts are very general and could be applied to other fields.
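
To make the idea of "commentary metadata" concrete, the sketch below builds a minimal annotation in the style of the Open Annotation data model, linking a dataset (the target) to a publication (the body). The `oa:` terms come from the Open Annotation vocabulary. The helper name `make_annotation` and all `example.org` URIs are placeholders, and the actual CHARMe record structure may differ.

```python
import json


def make_annotation(annotation_uri, dataset_uri, publication_uri):
    """Build a JSON-LD style dictionary for one commentary annotation.

    Hypothetical helper: the body is the commentary (here a publication)
    and the target is the climate dataset it comments on.
    """
    return {
        "@context": {"oa": "http://www.w3.org/ns/oa#"},
        "@id": annotation_uri,
        "@type": "oa:Annotation",
        "oa:motivatedBy": {"@id": "oa:linking"},
        "oa:hasBody": {"@id": publication_uri},
        "oa:hasTarget": {"@id": dataset_uri},
    }


# All URIs below are placeholders, not real CHARMe identifiers.
annotation = make_annotation(
    "http://example.org/annotation/1",
    "http://example.org/dataset/sea-surface-temperature",
    "http://example.org/publication/42",
)
serialised = json.dumps(annotation, indent=2)
```

Because the target and body are plain URIs, the same dataset can accumulate any number of annotations from different sources, which is what allows conclusions to be traced back to the data that supported them.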

Relevance:

100.00% 100.00%

Publisher:

Abstract:

Social networks have gained remarkable attention in the last decade. Accessing social network sites such as Twitter, Facebook, LinkedIn and Google+ through the internet and Web 2.0 technologies has become more affordable. People are becoming more interested in, and more reliant on, social networks for information, news and the opinions of other users on diverse subject matters. This heavy reliance causes social network sites to generate massive data characterised by three computational issues, namely size, noise and dynamism. These issues often make social network data very complex to analyse manually, which makes computational means of analysis necessary. Data mining provides a wide range of techniques for detecting useful knowledge, such as trends, patterns and rules, in massive datasets [44]. Data mining techniques are used for information retrieval, statistical modelling and machine learning. These techniques employ data pre-processing, data analysis and data interpretation processes in the course of data analysis. This survey discusses the data mining techniques that have been used over the decades to mine diverse aspects of social networks, from historical techniques to up-to-date models, including our novel technique named TRCM. All the techniques covered in this survey are listed in Table 1, together with the tools employed and the names of their authors.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

Goal modelling is a well-known, rigorous method for analysing problem rationale and developing requirements. Under the pressures typical of time-constrained projects, its benefits are not accessible. This is because of the effort and time needed to create the graph, and because reading the results can be difficult owing to the effects of crosscutting concerns. Here we introduce an adaptation of KAOS to meet the needs of rapid turnaround and clarity. The main aim is to help the stakeholders gain an insight into the larger issues that might be overlooked if they make a premature start on implementation. The method emphasises the use of obstacles, accepts under-refined goals, and has new methods for managing crosscutting concerns and strategic decision making. It is expected to be of value to agile as well as traditional processes.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

In general, ranking entities (resources) on the Semantic Web (SW) is subject to importance, relevance, and query length. Few existing SW search systems cover all of these aspects. Moreover, many existing efforts simply reuse technologies from conventional Information Retrieval (IR), which are not designed for SW data. This paper proposes a ranking mechanism that includes all three categories of ranking and is tailored to SW data.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

Search has become a hot topic in Internet computing, with rival search engines battling to become the de facto Web portal, harnessing search algorithms to wade through information on a scale undreamed of by early information retrieval (IR) pioneers. This article examines how search has matured from its roots in specialized IR systems to become a key foundation of the Web. The authors describe new challenges posed by the Web's scale, and show how search is changing the nature of the Web as much as the Web has changed the nature of search.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

The Web's link structure (termed the Web Graph) is a richly connected set of Web pages. Current applications use this graph for indexing and information retrieval purposes. In contrast, this work reverses the relationship between Web Graph and application by letting the structure of the Web Graph influence the behaviour of an application. Presents a novel Web crawling agent, AlienBot, the output of which is orthogonally coupled to the enemy generation strategy of a computer game. The Web Graph guides AlienBot, causing it to generate a stochastic process. Shows the effectiveness of such unorthodox coupling for both the playability of the game and the heuristics of the Web crawler. In addition, presents results from the sample of Web pages collected by the crawling process. In particular, shows: how AlienBot was able to identify the power law inherent in the link structure of the Web; that 61.74 per cent of Web pages use some form of scripting technology; that the size of the Web can be estimated at just over 5.2 billion pages; and that less than 7 per cent of Web pages fully comply with some variant of (X)HTML.
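
One common way to check for a power law in crawled link data is to fit a straight line to the degree distribution on log-log axes. The sketch below does this with ordinary least squares. It is not taken from the AlienBot paper: the function name `fit_power_law_exponent` and the synthetic counts are assumptions, and a log-log regression is a crude estimator compared with maximum-likelihood fitting.

```python
import math


def fit_power_law_exponent(degree_counts):
    """Estimate gamma in count(k) ~ C * k ** (-gamma).

    degree_counts maps a link degree k to the number of pages with that
    degree. Non-positive degrees or counts are skipped because their
    logarithm is undefined.
    """
    points = [
        (math.log(k), math.log(c))
        for k, c in degree_counts.items()
        if k > 0 and c > 0
    ]
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    # The slope of the log-log line is -gamma.
    return -sxy / sxx


# Synthetic, noise-free degree distribution with exponent 2.1.
synthetic = {k: 1e6 * k ** -2.1 for k in range(1, 51)}
gamma = fit_power_law_exponent(synthetic)
```

On noise-free synthetic data the fit recovers the exponent almost exactly. On real crawl data the sparse, noisy tail of the distribution usually calls for binning or a more robust estimator.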

Relevance:

100.00% 100.00%

Publisher:

Abstract:

A quasi-optical interferometric technique capable of measuring antenna phase patterns without the need for a heterodyne receiver is presented. It is particularly suited to the characterization of terahertz antennas feeding power detectors or mixers employing quasi-optical local oscillator injection. Examples of recorded antenna phase patterns at frequencies of 1.4 and 2.5 THz using homodyne detectors are presented. To our knowledge, these are the highest frequency antenna phase patterns ever recovered. Knowledge of both the amplitude and phase patterns in the far field enables a Gauss-Hermite or Gauss-Laguerre beam-mode analysis to be carried out for the antenna. Such an analysis is important in performance optimization calculations, such as those for antenna gain and beam efficiency, at the design and prototype stages of antenna development. A full description of the beam would also be required if the antenna is to be used to feed a quasi-optical system in the near-field to far-field transition region. This situation could often arise when the device is fitted directly at the back of telescopes in flying observatories. A further benefit of the proposed technique is its simplicity for characterizing systems in situ, an advantage of considerable importance because, in many situations, the components may not be removable for further characterization once assembled. The proposed methodology is generic and should be useful across the wider sensing community, e.g., in single detector acoustic imaging or in adaptive imaging array applications. Furthermore, it is applicable across other frequencies of the EM spectrum, provided adequate spatial and temporal phase stability of the source can be maintained throughout the measurement process. Phase information retrieval is also of importance to emergent research areas, such as band-gap structure characterization, meta-materials research, electromagnetic cloaking, slow light, super-lens design as well as near-field and virtual imaging applications.
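
The reason beam-mode analysis needs both amplitude and phase is that each mode coefficient is an overlap integral of the complex field with a mode function. The sketch below shows this for a one-dimensional Gauss-Hermite expansion. It is a simplified illustration, not the paper's procedure: it assumes the field is sampled on a uniform grid in a plane where the modes have a flat phase front (the beam waist) and that the beam radius `w` is already known. The function names are hypothetical.

```python
import math


def hermite(n, t):
    """Physicists' Hermite polynomial H_n(t) via the three-term recurrence."""
    if n == 0:
        return 1.0
    h_prev, h = 1.0, 2.0 * t
    for k in range(1, n):
        h_prev, h = h, 2.0 * t * h - 2.0 * k * h_prev
    return h


def gauss_hermite_mode(n, x, w):
    """Orthonormal 1-D Gauss-Hermite mode of order n at the waist (radius w)."""
    norm = (2.0 / math.pi) ** 0.25 / math.sqrt(2.0 ** n * math.factorial(n) * w)
    return norm * hermite(n, math.sqrt(2.0) * x / w) * math.exp(-(x / w) ** 2)


def mode_coefficients(field, xs, w, n_modes):
    """Overlap integrals of a sampled complex field with the first n_modes modes."""
    dx = xs[1] - xs[0]  # uniform grid assumed
    return [
        sum(gauss_hermite_mode(n, x, w) * f for x, f in zip(xs, field)) * dx
        for n in range(n_modes)
    ]


# Synthetic field: 0.8 of mode 0 plus mode 2 with a 90 degree phase shift.
w = 1.0
xs = [-8.0 + 0.01 * i for i in range(1601)]
field = [
    0.8 * gauss_hermite_mode(0, x, w) + 0.6j * gauss_hermite_mode(2, x, w)
    for x in xs
]
coeffs = mode_coefficients(field, xs, w, n_modes=4)
```

The recovered coefficient of mode 2 is purely imaginary, so an amplitude-only measurement of the same field could not reproduce this decomposition.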

Relevance:

100.00% 100.00%

Publisher:

Abstract:

Information systems integration becomes critical in enhancing organisational competitiveness through effective use of the information resources provided by the whole host of information systems. Information systems integration is by nature a process of bringing about the capability of communication and information exchange between systems, while interoperability, often the result of systems integration, is such a capability. However, there is currently a lack of theoretical foundation for representing and measuring interoperability in organisations. Organisational semiotics provides a theoretical foundation for systems interoperability. A notion of ‘semiotic interoperability’ is proposed in this paper as a paradigm for guiding systems integration and measuring the degree of interoperability. It covers aspects ranging from physical properties and the transmission structure of signs, through the communication of meaning and intention, to the social consequences of information.