2 resultados para INFORMATION DISCOVERY
em Illinois Digital Environment for Access to Learning and Scholarship Repository
Collection-Level Subject Access in Aggregations of Digital Collections: Metadata Application and Use
Resumo:
Problems in subject access to information organization systems have been under investigation for a long time. Focusing on item-level information discovery and access, researchers have identified a range of subject access problems, including quality and application of metadata, as well as the complexity of user knowledge required for successful subject exploration. While aggregations of digital collections built in the United States and abroad generate collection-level metadata of various levels of granularity and richness, no research has yet focused on the role of collection-level metadata in user interaction with these aggregations. This dissertation research sought to bridge this gap by answering the question “How does collection-level metadata mediate scholarly subject access to aggregated digital collections?” This goal was achieved using three research methods: • in-depth comparative content analysis of collection-level metadata in three large-scale aggregations of cultural heritage digital collections: Opening History, American Memory, and The European Library • transaction log analysis of user interactions, with Opening History, and • interview and observation data on academic historians interacting with two aggregations: Opening History and American Memory. It was found that subject-based resource discovery is significantly influenced by collection-level metadata richness. The richness includes such components as: 1) describing collection’s subject matter with mutually-complementary values in different metadata fields, and 2) a variety of collection properties/characteristics encoded in the free-text Description field, including types and genres of objects in a digital collection, as well as topical, geographic and temporal coverage are the most consistently represented collection characteristics in free-text Description fields. Analysis of user interactions with aggregations of digital collections yields a number of interesting findings. Item-level user interactions were found to occur more often than collection-level interactions. Collection browse is initiated more often than search, while subject browse (topical and geographic) is used most often. Majority of collection search queries fall within FRBR Group 3 categories: object, concept, and place. Significantly more object, concept, and corporate body searches and less individual person, event and class of persons searches were observed in collection searches than in item searches. While collection search is most often satisfied by Description and/or Subjects collection metadata fields, it would not retrieve a significant proportion of collection records without controlled-vocabulary subject metadata (Temporal Coverage, Geographic Coverage, Subjects, and Objects), and free-text metadata (the Description field). Observation data shows that collection metadata records in Opening History and American Memory aggregations are often viewed. Transaction log data show a high level of engagement with collection metadata records in Opening History, with the total page views for collections more than 4 times greater than item page views. Scholars observed viewing collection records valued descriptive information on provenance, collection size, types of objects, subjects, geographic coverage, and temporal coverage information. They also considered the structured display of collection metadata in Opening History more useful than the alternative approach taken by other aggregations, such as American Memory, which displays only the free-text Description field to the end-user. The results extend the understanding of the value of collection-level subject metadata, particularly free-text metadata, for the scholarly users of aggregations of digital collections. The analysis of the collection metadata created by three large-scale aggregations provides a better understanding of collection-level metadata application patterns and suggests best practices. This dissertation is also the first empirical research contribution to test the FRBR model as a conceptual and analytic framework for studying collection-level subject access.
Resumo:
Discovery Driven Analysis (DDA) is a common feature of OLAP technology to analyze structured data. In essence, DDA helps analysts to discover anomalous data by highlighting 'unexpected' values in the OLAP cube. By giving indications to the analyst on what dimensions to explore, DDA speeds up the process of discovering anomalies and their causes. However, Discovery Driven Analysis (and OLAP in general) is only applicable on structured data, such as records in databases. We propose a system to extend DDA technology to semi-structured text documents, that is, text documents with a few structured data. Our system pipeline consists of two stages: first, the text part of each document is structured around user specified dimensions, using semi-PLSA algorithm; then, we adapt DDA to these fully structured documents, thus enabling DDA on text documents. We present some applications of this system in OLAP analysis and show how scalability issues are solved. Results show that our system can handle reasonable datasets of documents, in real time, without any need for pre-computation.