18 results for indexing
in QUB Research Portal - Research Directory and Institutional Repository for Queen's University Belfast
Abstract:
As a result of resource limitations, state in branch predictors is frequently shared between uncorrelated branches. This interference can significantly limit prediction accuracy. In current predictor designs, the branches that share prediction information are determined by their branch addresses, so branch groups are effectively chosen arbitrarily at compile time. This feasibility study explores a more analytic and systematic approach to classifying branches into clusters with similar behavioral characteristics. We present several ways to incorporate this cluster information as an additional information source in branch predictors.
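As an illustration of the clustering step, here is a minimal sketch that groups static branches by simple profile features (taken rate and transition rate). The features, cluster count, and library are assumptions chosen for illustration, not the study's actual method.

```python
# A minimal sketch of behavior-based branch clustering; the profile
# features and cluster count are illustrative assumptions.
import numpy as np
from sklearn.cluster import KMeans

# Each row: [taken_rate, transition_rate] for one static branch,
# gathered from a hypothetical profiling run.
profiles = np.array([
    [0.95, 0.05],   # strongly biased taken
    [0.10, 0.08],   # strongly biased not-taken
    [0.50, 0.90],   # alternating pattern
    [0.55, 0.45],   # hard to predict
])

clusters = KMeans(n_clusters=2, n_init=10).fit_predict(profiles)
# Branches assigned to the same cluster could then share predictor
# state, instead of being grouped by branch address.
print(clusters)
```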
Abstract:
Latent semantic indexing (LSI) is a technique used for intelligent information retrieval (IR). It can be used as an alternative to traditional keyword-matching IR and is attractive in this respect because of its ability to overcome problems with synonymy and polysemy. This study investigates two aspects of LSI: the effect of the Haar wavelet transform (HWT) as a preprocessing step for the singular value decomposition (SVD) in the key stage of the LSI process, and the effect of different threshold types in the HWT on the search results. The developed method allows the visualisation and processing of the term-document matrix, generated in the LSI process, using the HWT. The results show that precision can be increased by applying the HWT as a preprocessing step, with better results for hard thresholding than soft thresholding, whereas standard SVD-based LSI remains the most effective way of searching in terms of recall value.
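To make the pipeline concrete, here is a minimal sketch of HWT pre-processing ahead of the SVD stage, assuming a toy corpus and an illustrative hard threshold; it is not the paper's exact configuration.

```python
# A minimal sketch of wavelet-preprocessed LSI on a toy corpus.
import numpy as np
import pywt
from sklearn.feature_extraction.text import CountVectorizer

docs = ["wavelet transforms for retrieval",
        "singular value decomposition in indexing",
        "latent semantic indexing and retrieval"]

vec = CountVectorizer()
A = vec.fit_transform(docs).toarray().T.astype(float)   # terms x documents

# Haar wavelet transform of each term vector; hard-threshold the
# detail coefficients (keep a coefficient only if |c| >= tau).
coeffs = pywt.wavedec(A, "haar", axis=0)
tau = 0.5                                               # illustrative value
coeffs[1:] = [np.where(np.abs(c) >= tau, c, 0.0) for c in coeffs[1:]]
A_hwt = pywt.waverec(coeffs, "haar", axis=0)[: A.shape[0]]

# Rank-k SVD of the pre-processed matrix gives the latent space.
U, s, Vt = np.linalg.svd(A_hwt, full_matrices=False)
k = 2
q = vec.transform(["semantic retrieval"]).toarray().ravel()
q_hat = q @ U[:, :k] / s[:k]   # fold the query into the latent space
scores = Vt[:k].T @ q_hat      # unnormalised document scores
```

Soft thresholding, the variant the abstract reports as less precise, would instead shrink the surviving coefficients: np.sign(c) * np.maximum(np.abs(c) - tau, 0.0).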
Abstract:
PURPOSE: To assess the Medical Subject Headings (MeSH) indexing of articles that employed time-to-event analyses to report outcomes of dental treatment in patients.
MATERIALS AND METHODS: Articles published in 2008 in the 50 dental journals with the highest impact factors were hand searched to identify articles reporting dental treatment outcomes over time in human subjects with time-to-event statistics (included, n = 95), without time-to-event statistics (active controls, n = 91), and all other articles (passive controls, n = 6,769). The search was systematic (kappa 0.92 for screening, 0.86 for eligibility). Outcome-, statistic-, and time-related MeSH were identified, and differences in allocation between groups were analyzed with chi-square and Fisher exact statistics.
RESULTS: The most frequently allocated MeSH for included and active control articles were "dental restoration failure" (77% and 52%, respectively) and "treatment outcome" (54% and 48%, respectively). Allocation of outcome MeSH was similar between these groups (86% and 77%, respectively) and significantly greater than for passive controls (10%, P < .001). Significantly more statistical MeSH were allocated to the included articles than to the active or passive controls (67%, 15%, and 1%, respectively; P < .001). Sixty-nine included articles specifically used Kaplan-Meier or life table analyses, but only 42% (n = 29) were indexed as such. Significantly more time-related MeSH were allocated to the included articles than to the active controls (92% and 79%, respectively; P = .02) or the passive controls (22%, P < .001).
CONCLUSIONS: MeSH allocation within MEDLINE to time-to-event dental articles was inaccurate and inconsistent. Statistical MeSH were omitted from 30% of the included articles and incorrectly allocated to 15% of active controls. Such errors adversely impact search accuracy.
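For readers wanting to reproduce this style of group comparison, the sketch below runs chi-square and Fisher exact tests on a 2x2 allocation table. The counts are approximate reconstructions from the reported percentages (67% of 95 included vs 15% of 91 active controls), not the study's raw data.

```python
# A minimal sketch of the statistical comparison; counts are
# approximate reconstructions, not the study's data.
from scipy.stats import chi2_contingency, fisher_exact

included_with, included_total = 64, 95    # ~67% with statistical MeSH
control_with, control_total = 14, 91      # ~15% with statistical MeSH

table = [[included_with, included_total - included_with],
         [control_with, control_total - control_with]]

odds_ratio, p_fisher = fisher_exact(table)
chi2, p_chi2, dof, _ = chi2_contingency(table)
print(f"Fisher exact p = {p_fisher:.2g}, chi-square p = {p_chi2:.2g}")
```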
Abstract:
Quantifying the similarity between two trajectories is a fundamental operation in the analysis of spatio-temporal databases. While a number of distance functions exist, the recent shift in the dynamics of the trajectory generation procedure violates one of their core assumptions: a consistent and uniform sampling rate. In this paper, we formulate a robust distance function called Edit Distance with Projections (EDwP) to match trajectories under inconsistent and variable sampling rates through dynamic interpolation. This is achieved by deploying the idea of projections, which goes beyond matching only the sampled points when aligning trajectories. To enable efficient trajectory retrievals using EDwP, we design an index structure called TrajTree. TrajTree derives its pruning power from a unique combination of bounding boxes and Lipschitz embedding. Extensive experiments on real trajectory databases demonstrate EDwP to be up to 5 times more accurate than state-of-the-art distance functions. Additionally, TrajTree increases the efficiency of trajectory retrievals by up to an order of magnitude over existing techniques.
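The core of the projection idea can be shown in a few lines: a sampled point from one trajectory is interpolated onto the nearest position along the other trajectory's segment, rather than being forced to match a sampled point. The sketch below shows only this projection primitive, under assumed 2D coordinates; the full EDwP recurrence and the TrajTree index are beyond its scope.

```python
# A minimal sketch of point-to-segment projection, the interpolation
# primitive behind EDwP-style alignment; not the full distance function.
import numpy as np

def project_onto_segment(p, a, b):
    """Project point p onto segment ab; return the interpolated point
    and its distance from p."""
    ab = b - a
    denom = ab @ ab
    t = 0.0 if denom == 0.0 else float(np.clip((p - a) @ ab / denom, 0.0, 1.0))
    proj = a + t * ab
    return proj, float(np.linalg.norm(p - proj))

p = np.array([1.0, 1.0])
a, b = np.array([0.0, 0.0]), np.array([2.0, 0.0])
proj, dist = project_onto_segment(p, a, b)   # proj = (1, 0), dist = 1.0
```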
Abstract:
The analytic advantages of central concepts from linguistics and information theory, together with the analogies demonstrated between them, are developed for understanding patterns of retrieval from full-text indexes to documents. The interaction between the syntagm and the paradigm in computational operations on written language in indexing, searching, and retrieval is used to account for transformations of the signified, or meaning, between documents and their representations and between queries and the documents retrieved. Characteristics of the message, and of messages for selection, in written language are brought in to explain the relative frequency of occurrence of words and multiple-word sequences in documents. The examples given in the companion article are revisited and a fuller example is introduced. The signified of the sequence "stood for" (the term classically used in definitions of the sign, as something standing for something else) can itself change rapidly according to its syntagm. An understanding of patterns in retrieval beyond that of ordinary discourse is obtained.
Abstract:
In a typical shoeprint classification and retrieval system, the first step is to segment meaningful basic shapes and patterns from a noisy shoeprint image. This step has a significant influence on shape descriptors and shoeprint indexing in the later stages. In this paper, we extend a recently developed denoising technique proposed by Buades, called non-local means filtering, into a more general model. In this model, the expected result of an operation on a pixel can be estimated by performing the same operation on all of its reference pixels in the same image. A working pixel's reference pixels are those pixels whose neighbourhoods are similar to the working pixel's neighbourhood, where similarity is based on the correlation between the local neighbourhoods of the two pixels. We incorporate a special instance of this general model into the thresholding of very noisy shoeprint images. Visual and quantitative comparisons with two benchmark techniques, those of Otsu and Kittler, are presented in the final section, giving evidence of the effectiveness of our method for thresholding noisy shoeprint images.
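In the same spirit, a two-stage baseline (non-local means denoising followed by Otsu thresholding) can be assembled from standard OpenCV calls, as sketched below. This is an assumed approximation for illustration; the paper folds the non-local estimation into the thresholding operation itself rather than running it as a separate stage.

```python
# A minimal sketch: NLM denoising, then Otsu thresholding. The file
# name and parameter values are hypothetical.
import cv2

img = cv2.imread("shoeprint.png", cv2.IMREAD_GRAYSCALE)

# Non-local means: each pixel is re-estimated from pixels with
# similar neighbourhoods elsewhere in the image.
denoised = cv2.fastNlMeansDenoising(img, h=10,
                                    templateWindowSize=7,
                                    searchWindowSize=21)

# Otsu's method picks the global threshold automatically.
_, binary = cv2.threshold(denoised, 0, 255,
                          cv2.THRESH_BINARY + cv2.THRESH_OTSU)
cv2.imwrite("shoeprint_binary.png", binary)
```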
Abstract:
Information retrieval in the age of Internet search engines has become part of ordinary discourse and everyday practice: "Google" is a verb in common usage. Thus far, more attention has been given to practical understanding of information retrieval than to a full theoretical account. In Human Information Retrieval, Julian Warner offers a comprehensive overview of information retrieval, synthesizing theories from different disciplines (information and computer science, librarianship and indexing, and information society discourse) and incorporating such disparate systems as WorldCat and Google into a single, robust theoretical framework. There is a need for such a theoretical treatment, he argues, one that reveals the structure and underlying patterns of this complex field while remaining congruent with everyday practice. Warner presents a labor theoretic approach to information retrieval, building on his previously formulated distinction between semantic and syntactic mental labor, arguing that the description and search labor of information retrieval can be understood as both semantic and syntactic in character. Warner's information science approach is rooted in the humanities and the social sciences but informed by an understanding of information technology and information theory. The chapters offer a progressive exposition of the topic, with illustrative examples to explain the concepts presented. Neither narrowly practical nor largely speculative, Human Information Retrieval meets the contemporary need for a broader treatment of information and information systems.
Abstract:
Latent semantic indexing (LSI) is a popular technique in information retrieval (IR) applications. This paper presents a novel evaluation strategy based on the use of image processing tools. The authors evaluate the use of the discrete cosine transform (DCT) and the Cohen-Daubechies-Feauveau 9/7 (CDF 9/7) wavelet transform as a pre-processing step for the singular value decomposition (SVD) step of the LSI system. In addition, the effect of different threshold types on the search results is examined. The results show that accuracy can be increased by applying either transform as a pre-processing step, with better performance for the hard-threshold function; the choice of threshold value is a key factor in the transform process. This paper also describes the most effective structure for the database to facilitate efficient searching in the LSI system.
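The DCT variant of the pre-processing step can be sketched in the same way as the wavelet example earlier in this listing; the threshold value and stand-in matrix below are illustrative assumptions.

```python
# A minimal sketch of DCT pre-processing before the SVD stage of LSI.
import numpy as np
from scipy.fft import dct, idct

A = np.random.rand(64, 10)                 # stand-in term-document matrix

C = dct(A, axis=0, norm="ortho")           # DCT of each term vector
C = np.where(np.abs(C) >= 0.1, C, 0.0)     # hard threshold (illustrative)
A_dct = idct(C, axis=0, norm="ortho")      # back to term space

U, s, Vt = np.linalg.svd(A_dct, full_matrices=False)  # latent space as usual
```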
Abstract:
Objectives: Acute lung injury and the acute respiratory distress syndrome are characterized by noncardiogenic pulmonary edema, which can be assessed by measurement of extravascular lung water. Traditionally, extravascular lung water has been indexed to actual body weight (mL/kg). Because lung size depends on height rather than weight, we hypothesized that indexing to predicted body weight may be a better predictor of mortality in acute lung injury/acute respiratory distress syndrome.
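The difference between the two indexing schemes is simple arithmetic, sketched below with the ARDSNet predicted body weight formula; the formula choice and the numbers are illustrative assumptions, not values from the study.

```python
# A minimal sketch contrasting EVLW indexed to actual vs predicted
# body weight. PBW formula: ARDSNet (an assumption; the study may use
# a different equation). All values are illustrative.
def predicted_body_weight_kg(height_cm: float, male: bool) -> float:
    base = 50.0 if male else 45.5
    return base + 0.91 * (height_cm - 152.4)

evlw_ml = 900.0                      # hypothetical extravascular lung water
actual_kg = 95.0                     # hypothetical actual body weight
pbw_kg = predicted_body_weight_kg(175.0, male=True)   # ~70.6 kg

print(f"indexed to actual weight:    {evlw_ml / actual_kg:.1f} mL/kg")
print(f"indexed to predicted weight: {evlw_ml / pbw_kg:.1f} mL/kg")
```

For a tall but heavy patient the two indices diverge substantially, which is why the choice of denominator can change which patients appear to have severe edema.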
Abstract:
The importance and use of text extraction from camera-based coloured scene images is increasing rapidly. Text within a camera-grabbed image can contain a large amount of metadata about that scene. Such metadata can be useful for identification, indexing, and retrieval purposes. While the segmentation and recognition of text from document images is quite successful, detection of coloured scene text is a new challenge for all camera-based images. Common problems for text extraction from camera-based images are the lack of prior knowledge of text features such as colour, font, size, and orientation, as well as of the location of the probable text regions. In this paper, we document the development of a fully automatic and extremely robust text segmentation technique that can be used on any camera-grabbed frame, be it a single image or video. A new algorithm is proposed which can overcome the current problems of text segmentation. The algorithm exploits text appearance in terms of colour and spatial distribution. When tested on a variety of camera-based images, the new text extraction technique was found to outperform existing techniques. The proposed technique also overcomes problems that can arise from an unconstrained complex background. The novelty of the work arises from the fact that this is the first time colour and spatial information have been used simultaneously for text extraction.
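As a rough illustration of combining colour and spatial information, the sketch below clusters pixel colours and then filters connected components by text-like geometry; the cluster count, size, and aspect-ratio limits are assumptions for illustration, not the paper's algorithm.

```python
# A minimal sketch: colour clustering + spatially coherent component
# filtering for candidate text regions. File name and thresholds are
# hypothetical.
import cv2
import numpy as np

img = cv2.imread("scene.jpg")                       # camera-grabbed frame
pixels = img.reshape(-1, 3).astype(np.float32)

# Cluster pixel colours with k-means.
criteria = (cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER, 10, 1.0)
_, labels, _ = cv2.kmeans(pixels, 4, None, criteria, 3,
                          cv2.KMEANS_PP_CENTERS)
labels = labels.reshape(img.shape[:2])

# Within each colour cluster, keep components whose size and aspect
# ratio look text-like.
candidates = []
for k in range(4):
    mask = (labels == k).astype(np.uint8)
    n, _, stats, _ = cv2.connectedComponentsWithStats(mask)
    for i in range(1, n):                           # component 0 is background
        x, y, w, h, area = stats[i]
        if 10 < area < 5000 and 0.1 < w / h < 10:
            candidates.append((x, y, w, h))
```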