989 resultados para data matching
Resumo:
We present the results of applying automated machine learning techniques to the problem of matching different object catalogues in astrophysics. In this study, we take two partially matched catalogues where one of the two catalogues has a large positional uncertainty. The two catalogues we used here were taken from the H I Parkes All Sky Survey (HIPASS) and SuperCOSMOS optical survey. Previous work had matched 44 per cent (1887 objects) of HIPASS to the SuperCOSMOS catalogue. A supervised learning algorithm was then applied to construct a model of the matched portion of our catalogue. Validation of the model shows that we achieved a good classification performance (99.12 per cent correct). Applying this model to the unmatched portion of the catalogue found 1209 new matches. This increases the catalogue size from 1887 matched objects to 3096. The combination of these procedures yields a catalogue that is 72 per cent matched.
Resumo:
Beyond the inherent technical challenges, current research into the three dimensional surface correspondence problem is hampered by a lack of uniform terminology, an abundance of application specific algorithms, and the absence of a consistent model for comparing existing approaches and developing new ones. This paper addresses these challenges by presenting a framework for analysing, comparing, developing, and implementing surface correspondence algorithms. The framework uses five distinct stages to establish correspondence between surfaces. It is general, encompassing a wide variety of existing techniques, and flexible, facilitating the synthesis of new correspondence algorithms. This paper presents a review of existing surface correspondence algorithms, and shows how they fit into the correspondence framework. It also shows how the framework can be used to analyse and compare existing algorithms and develop new algorithms using the framework's modular structure. Six algorithms, four existing and two new, are implemented using the framework. Each implemented algorithm is used to match a number of surface pairs. Results demonstrate that the correspondence framework implementations are faithful implementations of existing algorithms, and that powerful new surface correspondence algorithms can be created. (C) 2004 Elsevier Inc. All rights reserved.
Resumo:
The edge-to-edge matching model, which was originally developed for predicting crystallographic features in diffusional phase transformations in solids, has been used to understand the formation of in-plane textures in TiSi2 (C49) thin films on Si single crystal (001)si surface. The model predicts all the four previously reported orientation relationships between C49 and Si substrate based on the actual atom matching across the interface and the basic crystallographic data only. The model has strong potential to be used to develop new thin film materials. (c) 2006 Acta Materialia Inc. Published by Elsevier Ltd. All rights reserved.
Resumo:
We consider the statistical problem of catalogue matching from a machine learning perspective with the goal of producing probabilistic outputs, and using all available information. A framework is provided that unifies two existing approaches to producing probabilistic outputs in the literature, one based on combining distribution estimates and the other based on combining probabilistic classifiers. We apply both of these to the problem of matching the HI Parkes All Sky Survey radio catalogue with large positional uncertainties to the much denser SuperCOSMOS catalogue with much smaller positional uncertainties. We demonstrate the utility of probabilistic outputs by a controllable completeness and efficiency trade-off and by identifying objects that have high probability of being rare. Finally, possible biasing effects in the output of these classifiers are also highlighted and discussed.
Resumo:
The basis of the present authors' edge-to-edge matching model for understanding the crystallography of partially coherent precipitates is the minimization of the energy of the interface between the two phases. For relatively simple crystal structures, this energy minimization occurs when close-packed, or relatively close-packed, rows of atoms match across the interface. Hence, the fundamental principle behind edge-to-edge matching is that the directions in each phase that correspond to the edges of the planes that meet in the interface should be close-packed, or relatively close-packed, rows of atoms. A few of the recently reported examples of what is termed edge-to-edge matching appear to ignore this fundamental principle. By comparing theoretical predictions with available experimental data, this article will explore the validity of this critical atom-row coincidence condition, in situations where the two phases have simple crystal Structures and in those where the precipitate has a more complex structure.
Resumo:
An emerging issue in the field of astronomy is the integration, management and utilization of databases from around the world to facilitate scientific discovery. In this paper, we investigate application of the machine learning techniques of support vector machines and neural networks to the problem of amalgamating catalogues of galaxies as objects from two disparate data sources: radio and optical. Formulating this as a classification problem presents several challenges, including dealing with a highly unbalanced data set. Unlike the conventional approach to the problem (which is based on a likelihood ratio) machine learning does not require density estimation and is shown here to provide a significant improvement in performance. We also report some experiments that explore the importance of the radio and optical data features for the matching problem.
Resumo:
One of critical challenges in automatic recognition of TV commercials is to generate a unique, robust and compact signature. Uniqueness indicates the ability to identify the similarity among the commercial video clips which may have slight content variation. Robustness means the ability to match commercial video clips containing the same content but probably with different digitalization/encoding, some noise data, and/or transmission and recording distortion. Efficiency is about the capability of effectively matching commercial video sequences with a low computation cost and storage overhead. In this paper, we present a binary signature based method, which meets all the three criteria above, by combining the techniques of ordinal and color measurements. Experimental results on a real large commercial video database show that our novel approach delivers a significantly better performance comparing to the existing methods.
Resumo:
Mirror neurons in the tree of life rappresenta lo sviluppo e l' evoluzione del sistema dei neuroni specchio nei primati umani, non - umani e di alcune specie di uccelli, utilizzando metodi cooptati dalla filosofia della biologia e la biologia teorica, per integrare dati relativi al sistema nervoso e al comportamento delle specie in esame.
Resumo:
How do signals from the 2 eyes combine and interact? Our recent work has challenged earlier schemes in which monocular contrast signals are subject to square-law transduction followed by summation across eyes and binocular gain control. Much more successful was a new 'two-stage' model in which the initial transducer was almost linear and contrast gain control occurred both pre- and post-binocular summation. Here we extend that work by: (i) exploring the two-dimensional stimulus space (defined by left- and right-eye contrasts) more thoroughly, and (ii) performing contrast discrimination and contrast matching tasks for the same stimuli. Twenty-five base-stimuli made from 1 c/deg patches of horizontal grating, were defined by the factorial combination of 5 contrasts for the left eye (0.3-32%) with five contrasts for the right eye (0.3-32%). Other than in contrast, the gratings in the two eyes were identical. In a 2IFC discrimination task, the base-stimuli were masks (pedestals), where the contrast increment was presented to one eye only. In a matching task, the base-stimuli were standards to which observers matched the contrast of either a monocular or binocular test grating. In the model, discrimination depends on the local gradient of the observer's internal contrast-response function, while matching equates the magnitude (rather than gradient) of response to the test and standard. With all model parameters fixed by previous work, the two-stage model successfully predicted both the discrimination and the matching data and was much more successful than linear or quadratic binocular summation models. These results show that performance measures and perception (contrast discrimination and contrast matching) can be understood in the same theoretical framework for binocular contrast vision. © 2007 VSP.
Resumo:
This report presents and evaluates a novel idea for scalable lossy colour image coding with Matching Pursuit (MP) performed in a transform domain. The benefits of the idea of MP performed in the transform domain are analysed in detail. The main contribution of this work is extending MP with wavelets to colour coding and proposing a coding method. We exploit correlations between image subbands after wavelet transformation in RGB colour space. Then, a new and simple quantisation and coding scheme of colour MP decomposition based on Run Length Encoding (RLE), inspired by the idea of coding indexes in relational databases, is applied. As a final coding step arithmetic coding is used assuming uniform distributions of MP atom parameters. The target application is compression at low and medium bit-rates. Coding performance is compared to JPEG 2000 showing the potential to outperform the latter with more sophisticated than uniform data models for arithmetic coder. The results are presented for grayscale and colour coding of 12 standard test images.
Resumo:
Digital image processing is exploited in many diverse applications but the size of digital images places excessive demands on current storage and transmission technology. Image data compression is required to permit further use of digital image processing. Conventional image compression techniques based on statistical analysis have reached a saturation level so it is necessary to explore more radical methods. This thesis is concerned with novel methods, based on the use of fractals, for achieving significant compression of image data within reasonable processing time without introducing excessive distortion. Images are modelled as fractal data and this model is exploited directly by compression schemes. The validity of this is demonstrated by showing that the fractal complexity measure of fractal dimension is an excellent predictor of image compressibility. A method of fractal waveform coding is developed which has low computational demands and performs better than conventional waveform coding methods such as PCM and DPCM. Fractal techniques based on the use of space-filling curves are developed as a mechanism for hierarchical application of conventional techniques. Two particular applications are highlighted: the re-ordering of data during image scanning and the mapping of multi-dimensional data to one dimension. It is shown that there are many possible space-filling curves which may be used to scan images and that selection of an optimum curve leads to significantly improved data compression. The multi-dimensional mapping property of space-filling curves is used to speed up substantially the lookup process in vector quantisation. Iterated function systems are compared with vector quantisers and the computational complexity or iterated function system encoding is also reduced by using the efficient matching algcnithms identified for vector quantisers.
Resumo:
The key to the correct application of ANOVA is careful experimental design and matching the correct analysis to that design. The following points should therefore, be considered before designing any experiment: 1. In a single factor design, ensure that the factor is identified as a 'fixed' or 'random effect' factor. 2. In more complex designs, with more than one factor, there may be a mixture of fixed and random effect factors present, so ensure that each factor is clearly identified. 3. Where replicates can be grouped or blocked, the advantages of a randomised blocks design should be considered. There should be evidence, however, that blocking can sufficiently reduce the error variation to counter the loss of DF compared with a randomised design. 4. Where different treatments are applied sequentially to a patient, the advantages of a three-way design in which the different orders of the treatments are included as an 'effect' should be considered. 5. Combining different factors to make a more efficient experiment and to measure possible factor interactions should always be considered. 6. The effect of 'internal replication' should be taken into account in a factorial design in deciding the number of replications to be used. Where possible, each error term of the ANOVA should have at least 15 DF. 7. Consider carefully whether a particular factorial design can be considered to be a split-plot or a repeated measures design. If such a design is appropriate, consider how to continue the analysis bearing in mind the problem of using post hoc tests in this situation.
Resumo:
We present and evaluate a novel idea for scalable lossy colour image coding with Matching Pursuit (MP) performed in a transform domain. The idea is to exploit correlations in RGB colour space between image subbands after wavelet transformation rather than in the spatial domain. We propose a simple quantisation and coding scheme of colour MP decomposition based on Run Length Encoding (RLE) which can achieve comparable performance to JPEG 2000 even though the latter utilises careful data modelling at the coding stage. Thus, the obtained image representation has the potential to outperform JPEG 2000 with a more sophisticated coding algorithm.
Resumo:
This thesis considers sparse approximation of still images as the basis of a lossy compression system. The Matching Pursuit (MP) algorithm is presented as a method particularly suited for application in lossy scalable image coding. Its multichannel extension, capable of exploiting inter-channel correlations, is found to be an efficient way to represent colour data in RGB colour space. Known problems with MP, high computational complexity of encoding and dictionary design, are tackled by finding an appropriate partitioning of an image. The idea of performing MP in the spatio-frequency domain after transform such as Discrete Wavelet Transform (DWT) is explored. The main challenge, though, is to encode the image representation obtained after MP into a bit-stream. Novel approaches for encoding the atomic decomposition of a signal and colour amplitudes quantisation are proposed and evaluated. The image codec that has been built is capable of competing with scalable coders such as JPEG 2000 and SPIHT in terms of compression ratio.
Resumo:
Most of the existing work on information integration in the Semantic Web concentrates on resolving schema-level problems. Specific issues of data-level integration (instance coreferencing, conflict resolution, handling uncertainty) are usually tackled by applying the same techniques as for ontology schema matching or by reusing the solutions produced in the database domain. However, data structured according to OWL ontologies has its specific features: e.g., the classes are organized into a hierarchy, the properties are inherited, data constraints differ from those defined by database schema. This paper describes how these features are exploited in our architecture KnoFuss, designed to support data-level integration of semantic annotations.