998 resultados para Entity-oriented Retrieval


Relevância:

20.00% 20.00%

Publicador:

Resumo:

The iRODS system, created by the San Diego Supercomputing Centre, is a rule oriented data management system that allows the user to create sets of rules to define how the data is to be managed. Each rule corresponds to a particular action or operation (such as checksumming a file) and the system is flexible enough to allow the user to create new rules for new types of operations. The iRODS system can interface to any storage system (provided an iRODS driver is built for that system) and relies on its’ metadata catalogue to provide a virtual file-system that can handle files of any size and type. However, some storage systems (such as tape systems) do not handle small files efficiently and prefer small files to be packaged up (or “bundled”) into larger units. We have developed a system that can bundle small data files of any type into larger units - mounted collections. The system can create collection families and contains its’ own extensible metadata, including metadata on which family the collection belongs to. The mounted collection system can work standalone and is being incorporated into the iRODS system to enhance the systems flexibility to handle small files. In this paper we describe the motivation for creating a mounted collection system, its’ architecture and how it has been incorporated into the iRODS system. We describe different technologies used to create the mounted collection system and provide some performance numbers.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

In any data mining applications, automated text and text and image retrieval of information is needed. This becomes essential with the growth of the Internet and digital libraries. Our approach is based on the latent semantic indexing (LSI) and the corresponding term-by-document matrix suggested by Berry and his co-authors. Instead of using deterministic methods to find the required number of first "k" singular triplets, we propose a stochastic approach. First, we use Monte Carlo method to sample and to build much smaller size term-by-document matrix (e.g. we build k x k matrix) from where we then find the first "k" triplets using standard deterministic methods. Second, we investigate how we can reduce the problem to finding the "k"-largest eigenvalues using parallel Monte Carlo methods. We apply these methods to the initial matrix and also to the reduced one. The algorithms are running on a cluster of workstations under MPI and results of the experiments arising in textual retrieval of Web documents as well as comparison of the stochastic methods proposed are presented. (C) 2003 IMACS. Published by Elsevier Science B.V. All rights reserved.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Automatic indexing and retrieval of digital data poses major challenges. The main problem arises from the ever increasing mass of digital media and the lack of efficient methods for indexing and retrieval of such data based on the semantic content rather than keywords. To enable intelligent web interactions, or even web filtering, we need to be capable of interpreting the information base in an intelligent manner. For a number of years research has been ongoing in the field of ontological engineering with the aim of using ontologies to add such (meta) knowledge to information. In this paper, we describe the architecture of a system (Dynamic REtrieval Analysis and semantic metadata Management (DREAM)) designed to automatically and intelligently index huge repositories of special effects video clips, based on their semantic content, using a network of scalable ontologies to enable intelligent retrieval. The DREAM Demonstrator has been evaluated as deployed in the film post-production phase to support the process of storage, indexing and retrieval of large data sets of special effects video clips as an exemplar application domain. This paper provides its performance and usability results and highlights the scope for future enhancements of the DREAM architecture which has proven successful in its first and possibly most challenging proving ground, namely film production, where it is already in routine use within our test bed Partners' creative processes. (C) 2009 Published by Elsevier B.V.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

In this paper we describe an exploratory assessment of the effect of aspect-oriented programming on software maintainability. An experiment was conducted in which 11 software professionals were asked to carry out maintenance tasks on one of two programs. The first program was written in Java and the second in AspectJ. Both programs implement a shopping system according to the same set of requirements. A number of statistical hypotheses were tested. The results did seem to suggest a slight advantage for the subjects using the object-oriented system since in general it took the subjects less time to answer the questions on this system. Also, both systems appeared to be equally difficult to modify. However, the results did not show a statistically significant influence of aspect-oriented programming at the 5% level. We are aware that the results of this single small study cannot be generalized. We conclude that more empirical research is necessary in this area to identify the benefits of aspect-oriented programming and we hope that this paper will encourage such research.

Relevância:

20.00% 20.00%

Publicador:

Relevância:

20.00% 20.00%

Publicador:

Resumo:

A novel framework referred to as collaterally confirmed labelling (CCL) is proposed, aiming at localising the visual semantics to regions of interest in images with textual keywords. Both the primary image and collateral textual modalities are exploited in a mutually co-referencing and complementary fashion. The collateral content and context-based knowledge is used to bias the mapping from the low-level region-based visual primitives to the high-level visual concepts defined in a visual vocabulary. We introduce the notion of collateral context, which is represented as a co-occurrence matrix of the visual keywords. A collaborative mapping scheme is devised using statistical methods like Gaussian distribution or Euclidean distance together with collateral content and context-driven inference mechanism. We introduce a novel high-level visual content descriptor that is devised for performing semantic-based image classification and retrieval. The proposed image feature vector model is fundamentally underpinned by the CCL framework. Two different high-level image feature vector models are developed based on the CCL labelling of results for the purposes of image data clustering and retrieval, respectively. A subset of the Corel image collection has been used for evaluating our proposed method. The experimental results to-date already indicate that the proposed semantic-based visual content descriptors outperform both traditional visual and textual image feature models. (C) 2007 Elsevier B.V. All rights reserved.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

The atmospheric electrical Potential Gradient (PG) arises from global thunderstorm activity, but surface measurements of the atmospheric Potential Gradient (PG) are influenced by global thunderstorms and local aerosol concentration changes. The local aerosol change can be monitored independently, and in some cases the concentration changes are closely related to PG changes. For these circumstances, a general theory to remove the local aerosol influence on PG measurements has been developed. Continuous measurements of PG and aerosol mass concentration were made during 24–31 Dec, 2005 within an urban environment at Reading, UK. The average diurnal variation of PG showed a double diurnal cycle, with maxima in the early morning and evening hours. The aerosol concentration has similar double maxima. Removing the aerosol using from the PG and aerosol correlation returns a single diurnal cycle, suggestive of the more global PG diurnal cycle.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

A new Bayesian algorithm for retrieving surface rain rate from Tropical Rainfall Measuring Mission (TRMM) Microwave Imager (TMI) over the ocean is presented, along with validations against estimates from the TRMM Precipitation Radar (PR). The Bayesian approach offers a rigorous basis for optimally combining multichannel observations with prior knowledge. While other rain-rate algorithms have been published that are based at least partly on Bayesian reasoning, this is believed to be the first self-contained algorithm that fully exploits Bayes’s theorem to yield not just a single rain rate, but rather a continuous posterior probability distribution of rain rate. To advance the understanding of theoretical benefits of the Bayesian approach, sensitivity analyses have been conducted based on two synthetic datasets for which the “true” conditional and prior distribution are known. Results demonstrate that even when the prior and conditional likelihoods are specified perfectly, biased retrievals may occur at high rain rates. This bias is not the result of a defect of the Bayesian formalism, but rather represents the expected outcome when the physical constraint imposed by the radiometric observations is weak owing to saturation effects. It is also suggested that both the choice of the estimators and the prior information are crucial to the retrieval. In addition, the performance of the Bayesian algorithm herein is found to be comparable to that of other benchmark algorithms in real-world applications, while having the additional advantage of providing a complete continuous posterior probability distribution of surface rain rate.

Relevância:

20.00% 20.00%

Publicador:

Relevância:

20.00% 20.00%

Publicador:

Resumo:

We test the response of the Oxford-RAL Aerosol and Cloud (ORAC) retrieval algorithm for MSG SEVIRI to changes in the aerosol properties used in the dust aerosol model, using data from the Dust Outflow and Deposition to the Ocean (DODO) flight campaign in August 2006. We find that using the observed DODO free tropospheric aerosol size distribution and refractive index increases simulated top of the atmosphere radiance at 0.55 µm assuming a fixed erosol optical depth of 0.5 by 10–15 %, reaching a maximum difference at low solar zenith angles. We test the sensitivity of the retrieval to the vertical distribution f the aerosol and find that this is unimportant in determining simulated radiance at 0.55 µm. We also test the ability of the ORAC retrieval when used to produce the GlobAerosol dataset to correctly identify continental aerosol outflow from the African continent and we find that it poorly constrains aerosol speciation. We develop spatially and temporally resolved prior distributions of aerosols to inform the retrieval which incorporates five aerosol models: desert dust, maritime, biomass burning, urban and continental. We use a Saharan Dust Index and the GEOS-Chem chemistry transport model to describe dust and biomass burning aerosol outflow, and compare AOD using our speciation against the GlobAerosol retrieval during January and July 2006. We find AOD discrepancies of 0.2–1 over regions of intense biomass burning outflow, where AOD from our aerosol speciation and GlobAerosol speciation can differ by as much as 50 - 70 %.