973 results for Digital image classification
Abstract:
Efficient and effective feature detection and representation is an important consideration when processing videos, and a large number of applications, such as motion analysis, 3D scene understanding and tracking, depend on it. Amongst several feature description methods, local features are becoming increasingly popular for representing videos because of their simplicity and efficiency. While they achieve state-of-the-art performance with low computational complexity, their performance is still too limited for real-world applications. Furthermore, rapid increases in the uptake of mobile devices have increased the demand for algorithms that can run with reduced memory and computational requirements. In this paper we propose a semi-binary feature detector-descriptor based on the BRISK detector, which can detect and represent videos with significantly reduced computational requirements while achieving performance comparable to state-of-the-art spatio-temporal feature descriptors. First, the BRISK feature detector is applied on a frame-by-frame basis to detect interest points; the detected key points are then compared against consecutive frames for significant motion. Key points with significant motion are encoded with the BRISK descriptor in the spatial domain and the Motion Boundary Histogram in the temporal domain. This descriptor is not only lightweight but also has lower memory requirements because of the binary nature of the BRISK descriptor, opening the possibility of applications on hand-held devices. We evaluate the detector-descriptor combination in the context of action classification with a standard, popular bag-of-features with SVM framework. Experiments are carried out on two popular datasets of varying complexity, and we demonstrate performance comparable to other descriptors at reduced computational complexity.
Abstract:
This paper describes a novel system for automatic classification of images obtained from Anti-Nuclear Antibody (ANA) pathology tests on Human Epithelial type 2 (HEp-2) cells using the Indirect Immunofluorescence (IIF) protocol. The IIF protocol on HEp-2 cells has been the hallmark method to identify the presence of ANAs, due to its high sensitivity and the large range of antigens that can be detected. However, it suffers from numerous shortcomings, such as being subjective as well as time and labour intensive. Computer Aided Diagnostic (CAD) systems, which automatically classify a HEp-2 cell image into one of its known patterns (e.g. speckled, homogeneous), have been developed to address these problems. Most of the existing CAD systems use handpicked features to represent a HEp-2 cell image, which may only work in limited scenarios. We propose a novel automatic cell image classification method termed Cell Pyramid Matching (CPM), which comprises regional histograms of visual words coupled with the Multiple Kernel Learning framework. We present a study of several variations of generating histograms and show the efficacy of the system on two publicly available datasets: the ICPR HEp-2 cell classification contest dataset and the SNPHEp-2 dataset.
Abstract:
In this paper we propose the hybrid use of illuminant invariant and RGB images to perform image classification of urban scenes despite challenging variation in lighting conditions. Coping with lighting change (and the shadows thereby invoked) is a non-negotiable requirement for long-term autonomy using vision. One aspect of this is the ability to reliably classify scene components in the presence of marked and often sudden changes in lighting. This is the focus of this paper. Posed with the task of classifying all parts in a scene from a full colour image, we propose that lighting invariant transforms can reduce the variability of the scene, resulting in a more reliable classification. We leverage the ideas of "data transfer" for classification, beginning with full colour images for obtaining candidate scene-level matches using global image descriptors. This is commonly followed by superpixel-level matching with local features. However, we show that if the RGB images are subjected to an illuminant invariant transform before computing the superpixel-level features, classification is significantly more robust to scene illumination effects. The approach is evaluated using three datasets. The first is our own dataset and the second is the KITTI dataset, using manually generated ground truth for quantitative analysis. We qualitatively evaluate the method on a third custom dataset over a 750 m trajectory.
Abstract:
With the increasing availability of high quality digital cameras that are easily operated by the non-professional photographer, the utility of using digital images to assess endpoints in clinical research of skin lesions has growing acceptance. However, rigorous protocols and description of experiences for digital image collection and assessment are not readily available, particularly for research conducted in remote settings. We describe the development and evaluation of a protocol for digital image collection by the non-professional photographer in a remote setting research trial, together with a novel methodology for assessment of clinical outcomes by an expert panel blinded to treatment allocation.
Abstract:
A novel shape recognition algorithm was developed to autonomously classify the Northern Pacific Sea Star (Asterias amurensis) from benthic images that were collected by the Starbug AUV during 6 km of transects in the Derwent estuary. Despite the effects of scattering, attenuation, soft focus and motion blur within the underwater images, an optimal joint classification rate of 77.5% and a misclassification rate of 13.5% were achieved. The performance of the algorithm was largely attributed to its ability to recognise locally deformed sea star shapes that were created during the segmentation of the distorted images.
Abstract:
Local spatio-temporal features with a bag-of-visual-words model are a popular approach to human action recognition. Bag-of-features methods face several challenges, such as extracting appropriate appearance and motion features from videos, converting the extracted features into a form appropriate for classification, and designing a suitable classification framework. In this paper we address the problem of efficiently representing the extracted features for classification to improve the overall performance. We introduce two generative supervised topic models, maximum entropy discrimination LDA (MedLDA) and class-specific simplex LDA (css-LDA), to encode the raw features in a form suitable for discriminative SVM-based classification. Unsupervised LDA models disconnect topic discovery from the classification task and hence yield poor results compared to the baseline bag-of-words framework. Supervised LDA techniques, on the other hand, learn the topic structure by considering the class labels and improve the recognition accuracy significantly. MedLDA maximizes likelihood and within-class margins using max-margin techniques and yields a sparse, highly discriminative topic structure, while in css-LDA separate class-specific topics are learned instead of a common set of topics across the entire dataset. In our representation, topics are learned first and then each video is represented as a topic proportion vector, which is comparable to a histogram of topics. Finally, SVM classification is performed on the learned topic proportion vectors. We demonstrate the efficiency of these two representation techniques through experiments carried out on two popular datasets. Experimental results demonstrate significantly improved performance compared to the baseline bag-of-features framework, which uses k-means to construct a histogram of words from the feature vectors.
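For readers unfamiliar with the baseline mentioned at the end of this abstract, the following is a minimal sketch of how a bag-of-features histogram can be built once a codebook exists. It assumes the codebook (for example, k-means cluster centres) has already been learned; the function name, the toy 2-D descriptors and the codebook values are illustrative and do not come from the paper.

```python
from math import dist

def bag_of_features_histogram(descriptors, codebook):
    """Assign each descriptor to its nearest codeword and return a normalised histogram."""
    counts = [0] * len(codebook)
    for d in descriptors:
        # Index of the codeword with the smallest Euclidean distance to d.
        nearest = min(range(len(codebook)), key=lambda k: dist(d, codebook[k]))
        counts[nearest] += 1
    total = sum(counts)
    # An empty video yields an all-zero histogram rather than a division by zero.
    return [c / total for c in counts] if total else [0.0] * len(codebook)

# Toy data (assumed): two codewords and four 2-D descriptors.
codebook = [(0.0, 0.0), (10.0, 10.0)]
descriptors = [(1.0, 0.5), (9.0, 11.0), (0.2, -0.3), (0.0, 1.0)]
hist = bag_of_features_histogram(descriptors, codebook)
```

A topic proportion vector plays the same role as this histogram when it is passed to the SVM, with topics in place of codewords.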
Abstract:
Acoustic recordings of the environment provide an effective means to monitor bird species diversity. To facilitate exploration of acoustic recordings, we describe a content-based birdcall retrieval algorithm. A query birdcall is a region of spectrogram bounded by frequency and time. Retrieval depends on a similarity measure derived from the orientation and distribution of spectral ridges. The spectral ridge detection method caters for a broad range of birdcall structures. In this paper, we extend previous work by incorporating a spectrogram scaling step in order to improve the detection of spectral ridges. Compared to an existing approach based on MFCC features, our feature representation achieves better retrieval performance for multiple bird species in noisy recordings.
Abstract:
Color displays used in image processing systems consist of a refresh memory buffer storing digital image data which are converted into analog signals to display an image by driving the primary color channels (red, green, and blue) of a color television monitor. The color cathode ray tube (CRT) of the monitor is unable to reproduce colors exactly due to phosphor limitations, exponential luminance response of the tube to the applied signal, and limitations imposed by the digital-to-analog conversion. In this paper we describe some computer simulation studies (using the U*V*W* color space) carried out to measure these reproduction errors. Further, a procedure to correct for color reproduction error due to the exponential luminance response (gamma) of the picture tube is proposed, using a video-lookup-table and a higher resolution digital-to-analog converter. It is found, on the basis of computer simulation studies, that the proposed gamma correction scheme is effective and robust with respect to variations in the assumed value of the gamma.
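The lookup-table idea in this abstract can be illustrated with a short sketch. It assumes the tube response is modelled as luminance = signal ** gamma on normalised values, with 8-bit image data, a 10-bit digital-to-analog converter and gamma = 2.2; these parameter values and the function names are illustrative assumptions, not figures from the paper.

```python
def build_gamma_lut(gamma=2.2, in_bits=8, dac_bits=10):
    """Map each input code to a DAC code that pre-compensates a signal**gamma display."""
    in_max = (1 << in_bits) - 1
    out_max = (1 << dac_bits) - 1
    # Pre-distort with the inverse power so that the display's response cancels it.
    return [round(((code / in_max) ** (1.0 / gamma)) * out_max)
            for code in range(in_max + 1)]

def displayed_luminance(dac_code, gamma=2.2, dac_bits=10):
    """Simulated normalised luminance produced by the tube for a given DAC code."""
    out_max = (1 << dac_bits) - 1
    return (dac_code / out_max) ** gamma

lut = build_gamma_lut()
# Largest deviation from a perfectly linear response across all 256 input codes.
worst_error = max(abs(displayed_luminance(lut[c]) - c / 255) for c in range(256))
```

The extra DAC resolution matters because the inverse power curve is steep near black, where an 8-bit output would merge several distinct input codes.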
Abstract:
This paper presents a validation study on the application of a novel interslice interpolation technique for musculoskeletal structure segmentation of articulated joints and muscles on human magnetic resonance imaging data. The interpolation technique is based on morphological shape-based interpolation combined with intensity-based voxel classification. Shape-based interpolation in the absence of the original intensity image has been investigated intensively. However, in some applications of medical image analysis, the intensity image of the slice to be interpolated is available. For example, when manual segmentation is conducted on selected slices, the segmentation on the unselected slices can be obtained by interpolation. We propose a two-step interpolation method to utilize both the shape information in the manual segmentation and the local intensity information in the image. The method was tested on segmentations of knee, hip and shoulder joint bones and hamstring muscles. The results were compared with two existing interpolation methods. Based on the calculated Dice similarity coefficient and normalized error rate, the proposed method outperformed the other two methods.
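The Dice similarity coefficient used for this comparison is 2|A ∩ B| / (|A| + |B|) for two segmentations A and B. A minimal sketch follows, assuming each segmentation is given as a flat binary mask of equal length; the function name and the toy masks are illustrative.

```python
def dice_coefficient(mask_a, mask_b):
    """Dice similarity between two binary masks given as equal-length sequences."""
    if len(mask_a) != len(mask_b):
        raise ValueError("masks must have the same number of voxels")
    size_a = sum(1 for v in mask_a if v)
    size_b = sum(1 for v in mask_b if v)
    overlap = sum(1 for a, b in zip(mask_a, mask_b) if a and b)
    if size_a + size_b == 0:
        # Convention chosen here: two empty masks agree perfectly.
        return 1.0
    return 2.0 * overlap / (size_a + size_b)

# Toy masks (assumed): a reference and an interpolated segmentation of 5 voxels.
reference = [1, 1, 0, 0, 1]
interpolated = [1, 0, 0, 1, 1]
score = dice_coefficient(reference, interpolated)
```

A score of 1 means the two segmentations coincide and 0 means they share no voxels.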
Abstract:
In competitive combat sporting environments like boxing, the statistics on a boxer's performance, including the number and type of punches thrown, provide a valuable source of data and feedback which is routinely used for coaching and performance improvement purposes. This paper presents a robust framework for the automatic classification of a boxer's punches. Overhead depth imagery is employed to alleviate challenges associated with occlusions, and robust body-part tracking is developed for the noisy time-of-flight sensors. Punch recognition is addressed through both multi-class SVM and Random Forest classifiers. A coarse-to-fine hierarchical SVM classifier is presented based on prior knowledge of boxing punches. This framework has been applied to shadow boxing image sequences taken at the Australian Institute of Sport with eight elite boxers. Results demonstrate the effectiveness of the proposed approach, with the hierarchical SVM classifier yielding 96% accuracy, signifying its suitability for analysing athletes' punches in boxing bouts.
Abstract:
Flood extent mapping is a basic tool for flood damage assessment, and it can be done by digital classification techniques using satellite imagery, including data recorded by radar and optical sensors. However, converting the data into the information we need is not a straightforward task. One of the great challenges in data interpretation is to separate permanent water bodies from flooded regions, including both the fully inundated areas and the wet areas where trees and houses are partly covered with water. This paper adopts a decision fusion technique to combine the mapping results from radar data with the NDVI data derived from optical data. An improved capacity to distinguish permanent or semi-permanent water bodies from flood-inundated areas has been achieved. The software tools MultiSpec and MATLAB were used.
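As a rough illustration of the two ingredients named in this abstract, the sketch below computes NDVI with its standard definition, (NIR - Red) / (NIR + Red), and then combines it with a radar-derived water flag. The fusion rule, the NDVI threshold of 0.0 and all function names are hypothetical choices made for this example; the abstract does not specify the actual decision fusion scheme.

```python
def ndvi(nir, red):
    """Normalised Difference Vegetation Index for one pixel's reflectances."""
    denom = nir + red
    # Guard against empty pixels where both bands are zero.
    return (nir - red) / denom if denom != 0 else 0.0

def fuse_decisions(radar_says_water, ndvi_value, ndvi_water_threshold=0.0):
    """Hypothetical fusion rule: compare the radar decision with an NDVI decision."""
    optical_says_water = ndvi_value < ndvi_water_threshold
    if radar_says_water and optical_says_water:
        return "water"
    if radar_says_water or optical_says_water:
        return "uncertain"   # the two sources disagree
    return "dry"

# Toy pixels (assumed values): vegetated land and open water.
vegetation_ndvi = ndvi(nir=0.5, red=0.1)
water_ndvi = ndvi(nir=0.02, red=0.05)
labels = [fuse_decisions(False, vegetation_ndvi),
          fuse_decisions(True, water_ndvi),
          fuse_decisions(True, vegetation_ndvi)]
```

The third pixel, where radar reports water under a vegetated optical signature, is the kind of disagreement that a real fusion scheme would need to resolve, for example for partly submerged trees.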
Abstract:
Digital image forgeries are usually created by copy-pasting a portion of an image onto some other image. While doing so, it is often necessary to resize the pasted portion of the image to suit the sampling grid of the host image. The resampling operation changes certain characteristics of the pasted portion, which, when detected, serve as a clue to tampering. In this paper, we present deterministic techniques to detect resampling and localize the portion of the image that has been tampered with. Two of the techniques operate in the pixel domain and the other two in the frequency domain. We study the efficacy of our techniques against JPEG compression and subsequent resampling of the entire tampered image.
Abstract:
The mode I fracture toughness of concrete can be experimentally determined using a three-point bend beam in conjunction with digital image correlation (DIC). Beams of three geometrically similar sizes are cast for this study. To study the influence of fly ash and silica fume on the fracture toughness of self-compacting concrete (SCC), three SCC mixes are prepared with and without mineral additions. Scanning electron microscope (SEM) images are taken of the fractured surface to add information on the fracture process in SCC. From this study, it is concluded that the fracture toughness of SCC with mineral additions is higher than that of SCC without mineral additions.
Abstract:
Text segmentation and localization algorithms are proposed for the born-digital image dataset. Binarization and edge detection are carried out separately on the three colour planes of the image. Connected components (CCs) obtained from the binarized image are thresholded based on their area and aspect ratio. CCs which contain sufficient edge pixels are retained. A novel approach is presented in which the text components are represented as nodes of a graph. Nodes correspond to the centroids of the individual CCs. Long edges are broken from the minimum spanning tree of the graph. The pairwise height ratio is also used to remove likely non-text components. A new minimum spanning tree is created from the remaining nodes. Horizontal grouping is performed on the CCs to generate bounding boxes of text strings. Overlapping bounding boxes are removed using an overlap area threshold. Non-overlapping and minimally overlapping bounding boxes are used for text segmentation. Vertical splitting is applied to generate bounding boxes at the word level. The proposed method is applied to all the images of the test dataset, and values of precision, recall and H-mean are obtained using different approaches.
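The graph step described in this abstract (centroids as nodes, a minimum spanning tree, long edges broken) can be sketched as follows. This is only an illustration of that one step, assuming Euclidean distances between centroids and a fixed length threshold; the function names, the threshold value and the toy centroids are hypothetical, and the height-ratio filtering and bounding-box stages are not shown.

```python
from math import dist

def minimum_spanning_tree(points):
    """Prim's algorithm on the complete graph of points; returns (u, v, length) edges."""
    n = len(points)
    in_tree = [False] * n
    best = [float("inf")] * n   # cheapest known connection of each node to the tree
    parent = [-1] * n
    edges = []
    if n:
        best[0] = 0.0
    for _ in range(n):
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        if parent[u] >= 0:
            edges.append((parent[u], u, best[u]))
        for v in range(n):
            d = dist(points[u], points[v])
            if not in_tree[v] and d < best[v]:
                best[v], parent[v] = d, u
    return edges

def group_after_cutting(points, max_edge):
    """Break MST edges longer than max_edge and return the resulting node groups."""
    root = list(range(len(points)))

    def find(i):
        while root[i] != i:
            root[i] = root[root[i]]   # path halving
            i = root[i]
        return i

    for u, v, length in minimum_spanning_tree(points):
        if length <= max_edge:
            root[find(u)] = find(v)   # keep short edges: merge the two groups
    groups = {}
    for i in range(len(points)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())

# Toy centroids (assumed): two text strings separated by a wide gap.
centroids = [(0, 0), (10, 0), (20, 1), (200, 0), (210, 2)]
groups = group_after_cutting(centroids, max_edge=30)
```

Each resulting group is a candidate text string whose member components could then be enclosed in a single bounding box.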
Abstract:
This paper discusses an approach to river mapping and flood evaluation that aids multi-temporal time series analysis of satellite images, using pixel spectral information for image classification and region-based segmentation to extract water-covered regions. Analysis of Moderate Resolution Imaging Spectroradiometer (MODIS) satellite images is applied in two stages: before flood and during flood. For these images, the extraction of water regions uses spectral information for image classification and spatial information for image segmentation. Multi-temporal MODIS images from "normal" (non-flood) and flood time periods are processed in two steps. In the first step, image classifiers such as artificial neural networks and gene expression programming are used to separate the image pixels into water and non-water groups based on their spectral features. The classified image is then segmented using spatial features of the water pixels to remove misclassified water regions. From the results obtained, we evaluate the performance of the method and conclude that the use of image classification and region-based segmentation is an accurate and reliable approach for the extraction of water-covered regions.