15 resultados para Data compression (Computer science)
em Aston University Research Archive
Resumo:
Digital image processing is exploited in many diverse applications but the size of digital images places excessive demands on current storage and transmission technology. Image data compression is required to permit further use of digital image processing. Conventional image compression techniques based on statistical analysis have reached a saturation level so it is necessary to explore more radical methods. This thesis is concerned with novel methods, based on the use of fractals, for achieving significant compression of image data within reasonable processing time without introducing excessive distortion. Images are modelled as fractal data and this model is exploited directly by compression schemes. The validity of this is demonstrated by showing that the fractal complexity measure of fractal dimension is an excellent predictor of image compressibility. A method of fractal waveform coding is developed which has low computational demands and performs better than conventional waveform coding methods such as PCM and DPCM. Fractal techniques based on the use of space-filling curves are developed as a mechanism for hierarchical application of conventional techniques. Two particular applications are highlighted: the re-ordering of data during image scanning and the mapping of multi-dimensional data to one dimension. It is shown that there are many possible space-filling curves which may be used to scan images and that selection of an optimum curve leads to significantly improved data compression. The multi-dimensional mapping property of space-filling curves is used to speed up substantially the lookup process in vector quantisation. Iterated function systems are compared with vector quantisers and the computational complexity or iterated function system encoding is also reduced by using the efficient matching algcnithms identified for vector quantisers.
Resumo:
The growth and advances made in computer technology have led to the present interest in picture processing techniques. When considering image data compression the tendency is towards trans-form source coding of the image data. This method of source coding has reached a stage where very high reductions in the number of bits representing the data can be made while still preserving image fidelity. The point has thus been reached where channel errors need to be considered, as these will be inherent in any image comnunication system. The thesis first describes general source coding of images with the emphasis almost totally on transform coding. The transform technique adopted is the Discrete Cosine Transform (DCT) which becomes common to both transform coders. Hereafter the techniques of source coding differ substantially i.e. one technique involves zonal coding, the other involves threshold coding. Having outlined the theory and methods of implementation of the two source coders, their performances are then assessed first in the absence, and then in the presence, of channel errors. These tests provide a foundation on which to base methods of protection against channel errors. Six different protection schemes are then proposed. Results obtained, from each particular, combined, source and channel error protection scheme, which are described in full are then presented. Comparisons are made between each scheme and indicate the best one to use given a particular channel error rate.
Resumo:
Retrospective clinical data presents many challenges for data mining and machine learning. The transcription of patient records from paper charts and subsequent manipulation of data often results in high volumes of noise as well as a loss of other important information. In addition, such datasets often fail to represent expert medical knowledge and reasoning in any explicit manner. In this research we describe applying data mining methods to retrospective clinical data to build a prediction model for asthma exacerbation severity for pediatric patients in the emergency department. Difficulties in building such a model forced us to investigate alternative strategies for analyzing and processing retrospective data. This paper describes this process together with an approach to mining retrospective clinical data by incorporating formalized external expert knowledge (secondary knowledge sources) into the classification task. This knowledge is used to partition the data into a number of coherent sets, where each set is explicitly described in terms of the secondary knowledge source. Instances from each set are then classified in a manner appropriate for the characteristics of the particular set. We present our methodology and outline a set of experiential results that demonstrate some advantages and some limitations of our approach. © 2008 Springer-Verlag Berlin Heidelberg.
Resumo:
Optical data communication systems are prone to a variety of processes that modify the transmitted signal, and contribute errors in the determination of 1s from 0s. This is a difficult, and commercially important, problem to solve. Errors must be detected and corrected at high speed, and the classifier must be very accurate; ideally it should also be tunable to the characteristics of individual communication links. We show that simple single layer neural networks may be used to address these problems, and examine how different input representations affect the accuracy of bit error correction. Our results lead us to conclude that a system based on these principles can perform at least as well as an existing non-trainable error correction system, whilst being tunable to suit the individual characteristics of different communication links.
Resumo:
The research presented in this paper is part of an ongoing investigation into how best to incorporate speech-based input within mobile data collection applications. In our previous work [1], we evaluated the ability of a single speech recognition engine to support accurate, mobile, speech-based data input. Here, we build on our previous research to compare the achievable speaker-independent accuracy rates of a variety of speech recognition engines; we also consider the relative effectiveness of different speech recognition engine and microphone pairings in terms of their ability to support accurate text entry under realistic mobile conditions of use. Our intent is to provide some initial empirical data derived from mobile, user-based evaluations to support technological decisions faced by developers of mobile applications that would benefit from, or require, speech-based data entry facilities.
Resumo:
Linked Data semantic sources, in particular DBpedia, can be used to answer many user queries. PowerAqua is an open multi-ontology Question Answering (QA) system for the Semantic Web (SW). However, the emergence of Linked Data, characterized by its openness, heterogeneity and scale, introduces a new dimension to the Semantic Web scenario, in which exploiting the relevant information to extract answers for Natural Language (NL) user queries is a major challenge. In this paper we discuss the issues and lessons learned from our experience of integrating PowerAqua as a front-end for DBpedia and a subset of Linked Data sources. As such, we go one step beyond the state of the art on end-users interfaces for Linked Data by introducing mapping and fusion techniques needed to translate a user query by means of multiple sources. Our first informal experiments probe whether, in fact, it is feasible to obtain answers to user queries by composing information across semantic sources and Linked Data, even in its current form, where the strength of Linked Data is more a by-product of its size than its quality. We believe our experiences can be extrapolated to a variety of end-user applications that wish to scale, open up, exploit and re-use what possibly is the greatest wealth of data about everything in the history of Artificial Intelligence. © 2010 Springer-Verlag.
Resumo:
Most of the existing work on information integration in the Semantic Web concentrates on resolving schema-level problems. Specific issues of data-level integration (instance coreferencing, conflict resolution, handling uncertainty) are usually tackled by applying the same techniques as for ontology schema matching or by reusing the solutions produced in the database domain. However, data structured according to OWL ontologies has its specific features: e.g., the classes are organized into a hierarchy, the properties are inherited, data constraints differ from those defined by database schema. This paper describes how these features are exploited in our architecture KnoFuss, designed to support data-level integration of semantic annotations.
Resumo:
The research presented in this paper is part of an ongoing investigation into how best to incorporate speech-based input within mobile data collection applications. In our previous work [1], we evaluated the ability of a single speech recognition engine to support accurate, mobile, speech-based data input. Here, we build on our previous research to compare the achievable speaker-independent accuracy rates of a variety of speech recognition engines; we also consider the relative effectiveness of different speech recognition engine and microphone pairings in terms of their ability to support accurate text entry under realistic mobile conditions of use. Our intent is to provide some initial empirical data derived from mobile, user-based evaluations to support technological decisions faced by developers of mobile applications that would benefit from, or require, speech-based data entry facilities.
Resumo:
Optical data communication systems are prone to a variety of processes that modify the transmitted signal, and contribute errors in the determination of 1s from 0s. This is a difficult, and commercially important, problem to solve. Errors must be detected and corrected at high speed, and the classifier must be very accurate; ideally it should also be tunable to the characteristics of individual communication links. We show that simple single layer neural networks may be used to address these problems, and examine how different input representations affect the accuracy of bit error correction. Our results lead us to conclude that a system based on these principles can perform at least as well as an existing non-trainable error correction system, whilst being tunable to suit the individual characteristics of different communication links.
Resumo:
Population measures for genetic programs are defined and analysed in an attempt to better understand the behaviour of genetic programming. Some measures are simple, but do not provide sufficient insight. The more meaningful ones are complex and take extra computation time. Here we present a unified view on the computation of population measures through an information hypertree (iTree). The iTree allows for a unified and efficient calculation of population measures via a basic tree traversal. © Springer-Verlag 2004.
Resumo:
In this paper, we address the problem of robust information embedding in digital data. Such a process is carried out by introducing modifications to the original data that one would like to keep minimal. It assumes that the data, which includes the embedded information, is corrupted before the extraction is carried out. We propose a principled way to tailor an efficient embedding process for given data and noise statistics. © Springer-Verlag Berlin Heidelberg 2005.
Resumo:
Most current 3D landscape visualisation systems either use bespoke hardware solutions, or offer a limited amount of interaction and detail when used in realtime mode. We are developing a modular, data driven 3D visualisation system that can be readily customised to specific requirements. By utilising the latest software engineering methods and bringing a dynamic data driven approach to geo-spatial data visualisation we will deliver an unparalleled level of customisation in near-photo realistic, realtime 3D landscape visualisation. In this paper we show the system framework and describe how this employs data driven techniques. In particular we discuss how data driven approaches are applied to the spatiotemporal management aspect of the application framework, and describe the advantages these convey. © Springer-Verlag Berlin Heidelberg 2006.
Resumo:
At the moment, the phrases “big data” and “analytics” are often being used as if they were magic incantations that will solve all an organization’s problems at a stroke. The reality is that data on its own, even with the application of analytics, will not solve any problems. The resources that analytics and big data can consume represent a significant strategic risk if applied ineffectively. Any analysis of data needs to be guided, and to lead to action. So while analytics may lead to knowledge and intelligence (in the military sense of that term), it also needs the input of knowledge and intelligence (in the human sense of that term). And somebody then has to do something new or different as a result of the new insights, or it won’t have been done to any purpose. Using an analytics example concerning accounts payable in the public sector in Canada, this paper reviews thinking from the domains of analytics, risk management and knowledge management, to show some of the pitfalls, and to present a holistic picture of how knowledge management might help tackle the challenges of big data and analytics.
Resumo:
In order to reduce serious health incidents, individuals with high risks need to be identified as early as possible so that effective intervention and preventive care can be provided. This requires regular and efficient assessments of risk within communities that are the first point of contacts for individuals. Clinical Decision Support Systems CDSSs have been developed to help with the task of risk assessment, however such systems and their underpinning classification models are tailored towards those with clinical expertise. Communities where regular risk assessments are required lack such expertise. This paper presents the continuation of GRiST research team efforts to disseminate clinical expertise to communities. Based on our earlier published findings, this paper introduces the framework and skeleton for a data collection and risk classification model that evaluates data redundancy in real-time, detects the risk-informative data and guides the risk assessors towards collecting those data. By doing so, it enables non-experts within the communities to conduct reliable Mental Health risk triage.