812 resultados para Indexing (of information)


Relevância:

100.00% 100.00%

Publicador:

Resumo:

Searching in a dataset for elements that are similar to a given query element is a core problem in applications that manage complex data, and has been aided by metric access methods (MAMs). A growing number of applications require indices that must be built faster and repeatedly, also providing faster response for similarity queries. The increase in the main memory capacity and its lowering costs also motivate using memory-based MAMs. In this paper. we propose the Onion-tree, a new and robust dynamic memory-based MAM that slices the metric space into disjoint subspaces to provide quick indexing of complex data. It introduces three major characteristics: (i) a partitioning method that controls the number of disjoint subspaces generated at each node; (ii) a replacement technique that can change the leaf node pivots in insertion operations; and (iii) range and k-NN extended query algorithms to support the new partitioning method, including a new visit order of the subspaces in k-NN queries. Performance tests with both real-world and synthetic datasets showed that the Onion-tree is very compact. Comparisons of the Onion-tree with the MM-tree and a memory-based version of the Slim-tree showed that the Onion-tree was always faster to build the index. The experiments also showed that the Onion-tree significantly improved range and k-NN query processing performance and was the most efficient MAM, followed by the MM-tree, which in turn outperformed the Slim-tree in almost all the tests. (C) 2010 Elsevier B.V. All rights reserved.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

This paper reports a research to evaluate the potential and the effects of use of annotated Paraconsistent logic in automatic indexing. This logic attempts to deal with contradictions, concerned with studying and developing inconsistency-tolerant systems of logic. This logic, being flexible and containing logical states that go beyond the dichotomies yes and no, permits to advance the hypothesis that the results of indexing could be better than those obtained by traditional methods. Interactions between different disciplines, as information retrieval, automatic indexing, information visualization, and nonclassical logics were considered in this research. From the methodological point of view, an algorithm for treatment of uncertainty and imprecision, developed under the Paraconsistent logic, was used to modify the values of the weights assigned to indexing terms of the text collections. The tests were performed on an information visualization system named Projection Explorer (PEx), created at Institute of Mathematics and Computer Science (ICMC - USP Sao Carlos), with available source code. PEx uses traditional vector space model to represent documents of a collection. The results were evaluated by criteria built in the information visualization system itself, and demonstrated measurable gains in the quality of the displays, confirming the hypothesis that the use of the para-analyser under the conditions of the experiment has the ability to generate more effective clusters of similar documents. This is a point that draws attention, since the constitution of more significant clusters can be used to enhance information indexing and retrieval. It can be argued that the adoption of non-dichotomous (non-exclusive) parameters provides new possibilities to relate similar information.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

The biomedical literature is extensively catalogued and indexed in MEDLINE. MEDLINE indexing is done by trained human indexers, who identify the most important concepts in each article, and is expensive and inconsistent. Automating the indexing task is difficult: the National Library of Medicine produces the Medical Text Indexer (MTI), which suggests potential indexing terms to the indexers. MTI’s output is not good enough to work unattended. In my thesis, I propose a different way to approach the indexing task called MEDRank. MEDRank creates graphs representing the concepts in biomedical articles and their relationships within the text, and applies graph-based ranking algorithms to identify the most important concepts in each article. I evaluate the performance of several automated indexing solutions, including my own, by comparing their output to the indexing terms selected by the human indexers. MEDRank outperformed all other evaluated indexing solutions, including MTI, in general indexing performance and precision. MEDRank can be used to cluster documents, index any kind of biomedical text with standard vocabularies, or could become part of MTI itself.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

This paper argues that in order to systematically comprehend the diversity of information organization frameworks, we must look at how aesthetic concerns and economic concerns manifest in decisions made about designing and deploying work practices, structures, and discourse. In order to do this I construct an analytical rubric borne by a definition of aesthetic and economics pertinent to indexing regimes. I take the position that we need to move into a more descriptive stance on practices of knowledge organization, not only in documentary heritage institutions (libraries, archives, and museums), but also into the cultural and artistic realms. By expanding the scope of inquiry we can interrogate the integrity of my assertion above, namely, that a chief concern in systematically understanding information organization frameworks, lies in understanding how such frameworks wrestle with, and manifest along a spectrum drawn from economic to aesthetic decision-making.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

This paper outlines the purposes, predications, functions, and contexts of information organization frameworks; including: bibliographic control, information retrieval, resource discovery, resource description, open access scholarly indexing, personal information management protocols, and social tagging in order to compare and contrast those purposes, predications, functions, and contexts. Information organization frameworks, for the purpose of this paper, consist of information organization systems (classification schemes, taxonomies, ontologies, bibliographic descriptions, etc.), methods of conceiving of and creating the systems, and the work processes involved in maintaining these systems. The paper first outlines the theoretical literature of these information organization frameworks. In conclusion, this paper establishes the first part of an evaluation rubric for a function, predication, purpose, and context analysis.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

In 2003, the “ICT Curriculum Integration Performance Measurement Instrument” was developed froman extensive review ofthe contemporary international and Australian research pertaining to the definition and measurement of ICT curriculum integration in classrooms (Proctor, Watson, & Finger, 2003). The 45-item instrument that resulted was based on theories and methodologies identified by the literature review. This paper describes psychometric results from a large-scale evaluation of the instrument subsequently conducted, as recommended by Proctor, Watson, and Finger (2003). The resultant 20-item, two-factor instrument, now called “Learning with ICTs: Measuring ICT Use in the Curriculum,” is both statistically and theoretically robust. This paper should be read in association with the original paper published in Computers in the Schools(Proctor, Watson, & Finger, 2003) that described in detail the theoretical framework underpinning the development of the instrument.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Information graphics have become increasingly important in representing, organising and analysing information in a technological age. In classroom contexts, information graphics are typically associated with graphs, maps and number lines. However, all students need to become competent with the broad range of graphics that they will encounter in mathematical situations. This paper provides a rationale for creating a test to measure students’ knowledge of graphics. This instrument can be used in mass testing and individual (in-depth) situations. Our analysis of the utility of this instrument informs policy and practice. The results provide an appreciation of the relative difficulty of different information graphics; and provide the capacity to benchmark information about students’ knowledge of graphics. The implications for practice include the need to support the development of students’ knowledge of graphics, the existence of gender differences, the role of cross-curriculum applications in learning about graphics, and the need to explicate the links among graphics.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Objective: To examine the reliability of work-related activity coding for injury-related hospitalisations in Australia. Method: A random sample of 4373 injury-related hospital separations from 1 July 2002 to 30 June 2004 were obtained from a stratified random sample of 50 hospitals across 4 states in Australia. From this sample, cases were identified as work-related if they contained an ICD-10-AM work-related activity code (U73) allocated by either: (i) the original coder; (ii) an independent auditor, blinded to the original code; or (iii) a research assistant, blinded to both the original and auditor codes, who reviewed narrative text extracted from the medical record. The concordance of activity coding and number of cases identified as work-related using each method were compared. Results: Of the 4373 cases sampled, 318 cases were identified as being work-related using any of the three methods for identification. The original coder identified 217 and the auditor identified 266 work-related cases (68.2% and 83.6% of the total cases identified, respectively). Around 10% of cases were only identified through the text description review. The original coder and auditor agreed on the assignment of work-relatedness for 68.9% of cases. Conclusions and Implications: The current best estimates of the frequency of hospital admissions for occupational injury underestimate the burden by around 32%. This is a substantial underestimate that has major implications for public policy, and highlights the need for further work on improving the quality and completeness of routine, administrative data sources for a more complete identification of work-related injuries.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

The generic IS-success constructs first identified by DeLone and McLean (1992) continue to be widely employed in research. Yet, recent work by Petter et al (2007) has cast doubt on the validity of many mainstream constructs employed in IS research over the past 3 decades; critiquing the almost universal conceptualization and validation of these constructs as reflective when in many studies the measures appear to have been implicitly operationalized as formative. Cited examples of proper specification of the Delone and McLean constructs are few, particularly in light of their extensive employment in IS research. This paper introduces a four-stage formative construct development framework: Conceive > Operationalize > Respond > Validate (CORV). Employing the CORV framework in an archival analysis of research published in top outlets 1985-2007, the paper explores the extent of possible problems with past IS research due to potential misspecification of the four application-related success dimensions: Individual-Impact, Organizational-Impact, System-Quality and Information-Quality. Results suggest major concerns where there is a mismatch of the Respond and Validate stages. A general dearth of attention to the Operationalize and Respond stages in methodological writings is also observed.