88 results for DOCUMENT COLLECTIONS
Abstract:
Many organizations realize that increasing amounts of data (“Big Data”) need to be dealt with intelligently in order to compete with other organizations in terms of efficiency, speed and services. The goal is not to collect as much data as possible, but to turn event data into valuable insights that can be used to improve business processes. However, data-oriented analysis approaches fail to relate event data to process models. At the same time, large organizations are generating piles of process models that are disconnected from the real processes and information systems. In this chapter we propose to manage large collections of process models and event data in an integrated manner. Observed and modeled behavior need to be continuously compared and aligned. This results in a “liquid” business process model collection, i.e. a collection of process models that is in sync with the actual organizational behavior. The collection should self-adapt to evolving organizational behavior and incorporate relevant execution data (e.g. process performance and resource utilization) extracted from the logs, thereby allowing insightful reports to be produced from factual organizational data.
Abstract:
Collecting has become a popular hobby within Western society, with collectables including anything from ‘bottle tops’ to ‘skyscrapers’. As the nature and size of these collections can affect the use of space in the home, the purpose of this study is to explore the relationship between collections, space in the home and the impacts on others. This qualitative study explores the experiences of 11 Australian collectors, investigating the motivations, practices and adaptation techniques used within their urban home environment. The themes of sentimentality, sociability and spatial tensions (physical space, personal space and use of space) are discussed within the context of their home and family environments. Overall, the practice of collecting objects is a complex, varied, sentimental and sociable activity, providing enjoyment, knowledge and friendships. Space can be a central consideration in the practice of collecting, as collections shape and are shaped by the available space in a household.
Abstract:
QUT Library Research Support has simplified and streamlined the process of research data management planning, storage, discovery and reuse through collaboration, the use of integrated and tailored online tools, and a simplification of the metadata schema. This poster presents the integrated data management services at QUT, including QUT’s Data Management Planning Tool, Research Data Finder, Spatial Data Finder and Software Finder, and information on the simplified Registry Interchange Format – Collections and Services (RIF-CS) Schema. The QUT Data Management Planning (DMP) Tool was built using the Digital Curation Centre’s DMP Online Tool and modified to suit QUT’s needs and policies. The tool allows researchers and Higher Degree Research students to plan how to handle research data throughout the active phase of their research. The plan is promoted as a ‘live document’ and researchers are encouraged to update it as required. The information entered into the plan can be made private or shared with supervisors, project members and external examiners. A plan is mandatory when requesting storage space on the QUT Research Data Storage Service. QUT’s Research Data Finder is integrated with QUT’s Academic Profiles and the Data Management Planning Tool to create a seamless data management process. This process aims to encourage the creation of high-quality, rich records which facilitate discovery and reuse of quality data. The Registry Interchange Format – Collections and Services (RIF-CS) Schema that is used in the QUT Research Data Finder was simplified to “RIF-CS lite” to reflect mandatory and optional metadata requirements. RIF-CS lite removed schema fields that were underused or extra to the needs of the users and system. This has reduced the number of metadata fields required from users and made integration of systems far simpler: field content is easily shared across services, making the process of collecting metadata as transparent as possible.
Abstract:
The proliferation of the web presents an unsolved problem of automatically analyzing billions of pages of natural language. We introduce a scalable algorithm that clusters hundreds of millions of web pages into hundreds of thousands of clusters. It does this on a single mid-range machine using efficient algorithms and compressed document representations. It is applied to two web-scale crawls covering tens of terabytes. ClueWeb09 and ClueWeb12 contain 500 and 733 million web pages and were clustered into 500,000 to 700,000 clusters. To the best of our knowledge, such fine-grained clustering has not been previously demonstrated. Previous approaches clustered a sample, which limits the maximum number of discoverable clusters. The proposed EM-tree algorithm uses the entire collection in clustering and produces several orders of magnitude more clusters than the existing algorithms. Fine-grained clustering is necessary for meaningful clustering in massive collections where the number of distinct topics grows linearly with collection size. These fine-grained clusters show an improved cluster quality when assessed with two novel evaluations using ad hoc search relevance judgments and spam classifications for external validation. These evaluations solve the problem of assessing the quality of clusters where categorical labeling is unavailable and infeasible.
Abstract:
Developing and maintaining a successful institutional repository for research publications requires a considerable investment by the institution. Most of the money is spent on developing the skill-sets of existing staff or hiring new staff with the necessary skills. The return on this investment can be magnified by using this valuable infrastructure to curate collections of other materials such as learning objects, student work, conference proceedings and institutional or local community heritage materials. When Queensland University of Technology (QUT) implemented its repository for research publications (QUT ePrints) over 11 years ago, it was one of the first institutional repositories to be established in Australia. Currently, the repository holds over 29,000 open access research publications and the cumulative total number of full-text downloads for these documents now exceeds 16 million. The full-text deposit rate for recently published peer-reviewed papers (currently over 74%) shows how well the repository has been embraced by QUT researchers. The success of QUT ePrints has resulted in requests to accommodate a plethora of materials which are ‘out of scope’ for this repository. QUT Library saw this as an opportunity to use its repository infrastructure (software, technical know-how and policies) to develop and implement a metadata repository for its research datasets (QUT Research Data Finder), a repository for research-related software (QUT Software Finder) and to curate a number of digital collections of institutional and local community heritage materials (QUT Digital Collections). This poster describes the repositories and digital collections curated by QUT Library and outlines the value delivered to the institution, and the wider community, by these initiatives.
Abstract:
This cross disciplinary study was conducted as two research and development projects. The outcome is a multimodal and dynamic chronicle, which incorporates the tracking of spatial, temporal and visual elements of performative practice-led and design-led research journeys. The distilled model provides a strong new approach to demonstrate rigour in non-traditional research outputs including provenance and an 'augmented web of facticity'.
Abstract:
The reliance on police data for the counting of road crash injuries can be problematic: it is well known that not all road crash injuries are reported to police, so police data under-estimate the overall burden of road crash injuries. The aim of this study was to use multiple linked data sources to estimate the extent of under-reporting of road crash injuries to police in the Australian state of Queensland. Data from the Queensland Road Crash Database (QRCD), the Queensland Hospital Admitted Patients Data Collection (QHAPDC), Emergency Department Information System (EDIS), and the Queensland Injury Surveillance Unit (QISU) for the year 2009 were linked. The completeness of road crash cases reported to police was examined via discordance rates between the police data (QRCD) and the hospital data collections. In addition, the potential bias of this discordance (under-reporting) was assessed based on gender, age, road user group, and regional location. Results showed that the level of under-reporting varied depending on the data set with which the police data were compared. When all hospital data collections were examined together, the estimated population of road crash injuries was approximately 28,000, with around two-thirds not linking to any record in the police data. The results also showed that under-reporting was more likely for motorcyclists, cyclists, males, young people, and injuries occurring in Remote and Inner Regional areas. These results have important implications for road safety research and policy in terms of: prioritising funding and resources; targeting road safety interventions into areas of higher risk; and estimating the burden of road crash injuries.
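To make the notion of a discordance rate concrete, the sketch below computes the share of hospital-recorded cases that have no matching police record. It is only an illustration: the function name, the use of a shared linked case identifier, and the toy identifiers are all assumptions, and the study's own linkage and discordance definitions may differ.

```python
def under_reporting_rate(hospital_case_ids, police_case_ids):
    """Share of hospital-recorded cases with no linked police record.

    Both arguments are iterables of linked case identifiers
    (a hypothetical key; the abstract does not describe the linkage key).
    """
    hospital = set(hospital_case_ids)
    police = set(police_case_ids)
    if not hospital:
        raise ValueError("at least one hospital case is required")
    # Cases seen in hospital data that never link to a police record.
    unlinked = hospital - police
    return len(unlinked) / len(hospital)


# Toy data: three hospital cases, one of which also appears in police data.
rate = under_reporting_rate({"p1", "p2", "p3"}, {"p1", "p9"})
print(f"{rate:.2f}")
```

Computing the same rate within subgroups (for example by road user group or region) would give the kind of comparison the abstract describes when assessing bias.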
Abstract:
This research examined the function of Queensland Health's Root Cause Analysis (RCA) to improve patient safety through an investigation of patient harm events where permanent harm and preventable death, Severity Assessment Code 1, were the outcome of healthcare. Unedited and highly legislated RCAs from across Queensland Health public hospitals from 2009, 2010 and 2011 comprised the data. A document analysis revealed the RCAs opposed organisational policy and dominant theoretical directives. If we accept the prevailing assumption that patient harm is a systemic issue, then the RCA is failing to address harm events in healthcare.
Abstract:
Previous qualitative research has highlighted that temporality plays an important role in relevance for clinical records search. In this study, an investigation is undertaken to determine the effect that the timespan of events within a patient record has on relevance in a retrieval scenario. In addition, based on the standard practice of document length normalisation, a document timespan normalisation model that specifically accounts for timespans is proposed. Initial analysis revealed that, in general, relevant patient records tended to cover a longer timespan of events than non-relevant patient records. However, an empirical evaluation using the TREC Medical Records track supports the opposite view that shorter documents (in terms of timespan) are better for retrieval. These findings highlight that the role of temporality in relevance is complex, and how to deal effectively with temporality within a retrieval scenario remains an open question.
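One way to picture timespan normalisation is by direct analogy with the length normaliser used in BM25-style ranking, substituting a record's timespan for its length. The sketch below is that analogy only, not the model proposed in the paper: the function name, the parameter `b`, and the use of days as the unit are assumptions.

```python
def timespan_normalised_score(raw_score, timespan_days, avg_timespan_days, b=0.75):
    """Scale a retrieval score by a BM25-style normaliser over timespan.

    b = 0 disables normalisation; b = 1 normalises fully, so records that
    span more time than average are penalised and shorter ones are boosted.
    """
    if avg_timespan_days <= 0:
        raise ValueError("average timespan must be positive")
    if not 0.0 <= b <= 1.0:
        raise ValueError("b must lie in [0, 1]")
    normaliser = (1.0 - b) + b * (timespan_days / avg_timespan_days)
    return raw_score / normaliser


# A record spanning twice the average timespan, with moderate normalisation.
print(timespan_normalised_score(3.0, 200, 100, b=0.5))
```

A normaliser of this shape favours records with shorter timespans, which matches the direction of the empirical result reported above rather than the initial analysis.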
Abstract:
Document clustering is one of the prominent methods for mining important information from the vast amount of data available on the web. However, document clustering generally suffers from the curse of dimensionality. Fortunately, in high-dimensional space, data points tend to be more concentrated in some areas of clusters. We take advantage of this phenomenon by introducing a novel concept of dynamic cluster representation named loci. Clusters’ loci are efficiently calculated using documents’ ranking scores generated from a search engine. We propose a fast loci-based semi-supervised document clustering algorithm that uses clusters’ loci instead of conventional centroids for assigning documents to clusters. Empirical analysis on real-world datasets shows that the proposed method produces cluster solutions with promising quality and is substantially faster than several benchmarked centroid-based semi-supervised document clustering methods.
Abstract:
This paper discusses the use of observational video recordings to document young children’s use of technology in their homes. Although observational research practices have been used for decades, often with video-based techniques, the participant group in this study (i.e., very young children) and the setting (i.e., private homes), provide a rich space for exploring the benefits and limitations of qualitative observation. The data gathered in this study point to a number of key decisions and issues that researchers must face in designing observational research, particularly where non-researchers (in this case, parents) act as surrogates for the researcher at the data collection stage. The involvement of parents and children as research videographers in the home resulted in very rich and detailed data about children’s use of technology in their daily lives. However, limitations noted in the dataset (e.g., image quality) provide important guidance for researchers developing projects using similar methods in future. The paper provides recommendations for future observational designs in similar settings and/or with similar participant groups.
Abstract:
The hobby of collecting represents the passionate acquisition and possession of a specific type(s) of object, creating a ‘spectacle’ to be shared with others. The fundamentality of the physical objects in collections against the backdrop of the growing ubiquity of computing provides a unique and compelling avenue for design. Based on interviews with 11 self-identified collectors, this paper discusses the role collectors have in informing HCI design and, in turn, the potential HCI has in designing technology to assist collectors in sharing what we term the ‘spectacle’ of collecting. Toward this, we suggest two ideas for future designs: building personal histories of individual collectable items, and developing a simple digital means of connecting proximate collectors with those who stand to benefit from collectors’ unique and item-specific knowledge.
Abstract:
Multi-document summarization, which addresses the problem of information overload, has been widely used in various real-world applications. Most existing approaches adopt a term-based representation for documents, which limits the performance of multi-document summarization systems. In this paper, we propose a novel pattern-based topic model (PBTMSum) for the task of multi-document summarization. PBTMSum combines pattern mining techniques with LDA topic modelling to generate discriminative and semantically rich representations for topics and documents, so that the most representative and non-redundant sentences can be selected to form a succinct and informative summary. Extensive experiments are conducted on data from the Document Understanding Conference (DUC) 2007. The results demonstrate the effectiveness and efficiency of our proposed approach.