336 resultados para 280200 Artificial Intelligence and Signal and Image Processing


Relevância:

100.00% 100.00%

Publicador:

Resumo:

Person re-identification involves recognising individuals in different locations across a network of cameras and is a challenging task due to a large number of varying factors such as pose (both subject and camera) and ambient lighting conditions. Existing databases do not adequately capture these variations, making evaluations of proposed techniques difficult. In this paper, we present a new challenging multi-camera surveillance database designed for the task of person re-identification. This database consists of 150 unscripted sequences of subjects travelling in a building environment though up to eight camera views, appearing from various angles and in varying illumination conditions. A flexible XML-based evaluation protocol is provided to allow a highly configurable evaluation setup, enabling a variety of scenarios relating to pose and lighting conditions to be evaluated. A baseline person re-identification system consisting of colour, height and texture models is demonstrated on this database.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Quality based frame selection is a crucial task in video face recognition, to both improve the recognition rate and to reduce the computational cost. In this paper we present a framework that uses a variety of cues (face symmetry, sharpness, contrast, closeness of mouth, brightness and openness of the eye) to select the highest quality facial images available in a video sequence for recognition. Normalized feature scores are fused using a neural network and frames with high quality scores are used in a Local Gabor Binary Pattern Histogram Sequence based face recognition system. Experiments on the Honda/UCSD database shows that the proposed method selects the best quality face images in the video sequence, resulting in improved recognition performance.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Many methods exist at the moment for deformable face fitting. A drawback to nearly all these approaches is that they are (i) noisy in terms of landmark positions, and (ii) the noise is biased across frames (i.e. the misalignment is toward common directions across all frames). In this paper we propose a grouped $\mathcal{L}1$-norm anchored method for simultaneously aligning an ensemble of deformable face images stemming from the same subject, given noisy heterogeneous landmark estimates. Impressive alignment performance improvement and refinement is obtained using very weak initialization as "anchors".

Relevância:

100.00% 100.00%

Publicador:

Resumo:

This paper presents an efficient face detection method suitable for real-time surveillance applications. Improved efficiency is achieved by constraining the search window of an AdaBoost face detector to pre-selected regions. Firstly, the proposed method takes a sparse grid of sample pixels from the image to reduce whole image scan time. A fusion of foreground segmentation and skin colour segmentation is then used to select candidate face regions. Finally, a classifier-based face detector is applied only to selected regions to verify the presence of a face (the Viola-Jones detector is used in this paper). The proposed system is evaluated using 640 x 480 pixels test images and compared with other relevant methods. Experimental results show that the proposed method reduces the detection time to 42 ms, where the Viola-Jones detector alone requires 565 ms (on a desktop processor). This improvement makes the face detector suitable for real-time applications. Furthermore, the proposed method requires 50% of the computation time of the best competing method, while reducing the false positive rate by 3.2% and maintaining the same hit rate.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

In this paper we propose and evaluate a speaker attribution system using a complete-linkage clustering method. Speaker attribution refers to the annotation of a collection of spoken audio based on speaker identities. This can be achieved using diarization and speaker linking. The main challenge associated with attribution is achieving computational efficiency when dealing with large audio archives. Traditional agglomerative clustering methods with model merging and retraining are not feasible for this purpose. This has motivated the use of linkage clustering methods without retraining. We first propose a diarization system using complete-linkage clustering and show that it outperforms traditional agglomerative and single-linkage clustering based diarization systems with a relative improvement of 40% and 68%, respectively. We then propose a complete-linkage speaker linking system to achieve attribution and demonstrate a 26% relative improvement in attribution error rate (AER) over the single-linkage speaker linking approach.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

The work described in this technical report is part of an ongoing project at QUT to build practical tools for the manipulation, analysis and visualisation of recordings of the natural environment. This report describes the algorithm we use to cluster the spectra in a spectrogram. The report begins with a brief description of the signal processing that prepares the spectrograms.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

We propose a computationally efficient image border pixel based watermark embedding scheme for medical images. We considered the border pixels of a medical image as RONI (region of non-interest), since those pixels have no or little interest to doctors and medical professionals irrespective of the image modalities. Although RONI is used for embedding, our proposed scheme still keeps distortion at a minimum level in the embedding region using the optimum number of least significant bit-planes for the border pixels. All these not only ensure that a watermarked image is safe for diagnosis, but also help minimize the legal and ethical concerns of altering all pixels of medical images in any manner (e.g, reversible or irreversible). The proposed scheme avoids the need for RONI segmentation, which incurs capacity and computational overheads. The performance of the proposed scheme has been compared with a relevant scheme in terms of embedding capacity, image perceptual quality (measured by SSIM and PSNR), and computational efficiency. Our experimental results show that the proposed scheme is computationally efficient, offers an image-content-independent embedding capacity, and maintains a good image quality

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Field robots often rely on laser range finders (LRFs) to detect obstacles and navigate autonomously. Despite recent progress in sensing technology and perception algorithms, adverse environmental conditions, such as the presence of smoke, remain a challenging issue for these robots. In this paper, we investigate the possibility to improve laser-based perception applications by anticipating situations when laser data are affected by smoke, using supervised learning and state-of-the-art visual image quality analysis. We propose to train a k-nearest-neighbour (kNN) classifier to recognise situations where a laser scan is likely to be affected by smoke, based on visual data quality features. This method is evaluated experimentally using a mobile robot equipped with LRFs and a visual camera. The strengths and limitations of the technique are identified and discussed, and we show that the method is beneficial if conservative decisions are the most appropriate.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Object classification is plagued by the issue of session variation. Session variation describes any variation that makes one instance of an object look different to another, for instance due to pose or illumination variation. Recent work in the challenging task of face verification has shown that session variability modelling provides a mechanism to overcome some of these limitations. However, for computer vision purposes, it has only been applied in the limited setting of face verification. In this paper we propose a local region based intersession variability (ISV) modelling approach, and apply it to challenging real-world data. We propose a region based session variability modelling approach so that local session variations can be modelled, termed Local ISV. We then demonstrate the efficacy of this technique on a challenging real-world fish image database which includes images taken underwater, providing significant real-world session variations. This Local ISV approach provides a relative performance improvement of, on average, 23% on the challenging MOBIO, Multi-PIE and SCface face databases. It also provides a relative performance improvement of 35% on our challenging fish image dataset.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

In this research, we introduce a new blind steganalysis in detecting grayscale JPEG images. Features-pooling method is employed to extract the steganalytic features and the classification is done by using neural network. Three different steganographic models are tested and classification results are compared to the five state-of-the-art blind steganalysis.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

The proliferation of news reports published in online websites and news information sharing among social media users necessitates effective techniques for analysing the image, text and video data related to news topics. This paper presents the first study to classify affective facial images on emerging news topics. The proposed system dynamically monitors and selects the current hot (of great interest) news topics with strong affective interestingness using textual keywords in news articles and social media discussions. Images from the selected hot topics are extracted and classified into three categorized emotions, positive, neutral and negative, based on facial expressions of subjects in the images. Performance evaluations on two facial image datasets collected from real-world resources demonstrate the applicability and effectiveness of the proposed system in affective classification of facial images in news reports. Facial expression shows high consistency with the affective textual content in news reports for positive emotion, while only low correlation has been observed for neutral and negative. The system can be directly used for applications, such as assisting editors in choosing photos with a proper affective semantic for a certain topic during news report preparation.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

This paper investigates the effect of topic dependent language models (TDLM) on phonetic spoken term detection (STD) using dynamic match lattice spotting (DMLS). Phonetic STD consists of two steps: indexing and search. The accuracy of indexing audio segments into phone sequences using phone recognition methods directly affects the accuracy of the final STD system. If the topic of a document in known, recognizing the spoken words and indexing them to an intermediate representation is an easier task and consequently, detecting a search word in it will be more accurate and robust. In this paper, we propose the use of TDLMs in the indexing stage to improve the accuracy of STD in situations where the topic of the audio document is known in advance. It is shown that using TDLMs instead of the traditional general language model (GLM) improves STD performance according to figure of merit (FOM) criteria.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Speech recognition in car environments has been identified as a valuable means for reducing driver distraction when operating noncritical in-car systems. Under such conditions, however, speech recognition accuracy degrades significantly, and techniques such as speech enhancement are required to improve these accuracies. Likelihood-maximizing (LIMA) frameworks optimize speech enhancement algorithms based on recognized state sequences rather than traditional signal-level criteria such as maximizing signal-to-noise ratio. LIMA frameworks typically require calibration utterances to generate optimized enhancement parameters that are used for all subsequent utterances. Under such a scheme, suboptimal recognition performance occurs in noise conditions that are significantly different from that present during the calibration session – a serious problem in rapidly changing noise environments out on the open road. In this chapter, we propose a dialog-based design that allows regular optimization iterations in order to track the ever-changing noise conditions. Experiments using Mel-filterbank noise subtraction (MFNS) are performed to determine the optimization requirements for vehicular environments and show that minimal optimization is required to improve speech recognition, avoid over-optimization, and ultimately assist with semireal-time operation. It is also shown that the proposed design is able to provide improved recognition performance over frameworks incorporating a calibration session only.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

One main challenge in developing a system for visual surveillance event detection is the annotation of target events in the training data. By making use of the assumption that events with security interest are often rare compared to regular behaviours, this paper presents a novel approach by using Kullback-Leibler (KL) divergence for rare event detection in a weakly supervised learning setting, where only clip-level annotation is available. It will be shown that this approach outperforms state-of-the-art methods on a popular real-world dataset, while preserving real time performance.