936 resultados para Image recognition
Resumo:
Typical flow fields in a stormwater gross pollutant trap (GPT) with blocked retaining screens were experimentally captured and visualised. Particle image velocimetry (PIV) software was used to capture the flow field data by tracking neutrally buoyant particles with a high speed camera. A technique was developed to apply the Image Based Flow Visualization (IBFV) algorithm to the experimental raw dataset generated by the PIV software. The dataset consisted of scattered 2D point velocity vectors and the IBFV visualisation facilitates flow feature characterisation within the GPT. The flow features played a pivotal role in understanding gross pollutant capture and retention within the GPT. It was found that the IBFV animations revealed otherwise unnoticed flow features and experimental artefacts. For example, a circular tracer marker in the IBFV program visually highlighted streamlines to investigate specific areas and identify the flow features within the GPT.
Resumo:
This paper investigates the use of mel-frequency deltaphase (MFDP) features in comparison to, and in fusion with, traditional mel-frequency cepstral coefficient (MFCC) features within joint factor analysis (JFA) speaker verification. MFCC features, commonly used in speaker recognition systems, are derived purely from the magnitude spectrum, with the phase spectrum completely discarded. In this paper, we investigate if features derived from the phase spectrum can provide additional speaker discriminant information to the traditional MFCC approach in a JFA based speaker verification system. Results are presented which provide a comparison of MFCC-only, MFDPonly and score fusion of the two approaches within a JFA speaker verification approach. Based upon the results presented using the NIST 2008 Speaker Recognition Evaluation (SRE) dataset, we believe that, while MFDP features alone cannot compete with MFCC features, MFDP can provide complementary information that result in improved speaker verification performance when both approaches are combined in score fusion, particularly in the case of shorter utterances.
Resumo:
The application of epoxy embedding and microtomy to individual chondritic interplanetary dust particles (lOP's)(Bradley and Brownlee, 1986a) provides not only higher precision in thin-film elemental analyses (Bradley and Brownlee, 19861:1), but also allows a wealth of other important techniques for the micro-characterization of these primitive extraterrestrial materials. For example, individual sections (e.g. 100 nm thick) or a series of sections, can be examined using image analysis techniques which utilize either transmitted or scanned secondary electron images, or alternatively, secondary X-ray spectra collected concurrently from a given region of sample. Individual particles, or groups of particles with similar image characteristics can then be rapidly identified using conventional grey-scale/particle recognition techniques for each microtomed section of lOP. This type of image analysis provides a suitable method for determination of particle size and shape distribution as well as porosity throughout the aggregate.
Resumo:
Sound tagging has been studied for years. Among all sound types, music, speech, and environmental sound are three hottest research areas. This survey aims to provide an overview about the state-of-the-art development in these areas.We discuss about the meaning of tagging in different sound areas at the beginning of the journey. Some examples of sound tagging applications are introduced in order to illustrate the significance of this research. Typical tagging techniques include manual, automatic, and semi-automatic approaches.After reviewing work in music, speech and environmental sound tagging, we compare them and state the research progress to date. Research gaps are identified for each research area and the common features and discriminations between three areas are discovered as well. Published datasets, tools used by researchers, and evaluation measures frequently applied in the analysis are listed. In the end, we summarise the worldwide distribution of countries dedicated to sound tagging research for years.
Resumo:
Our contemporary public sphere has seen the 'emergence of new political rituals, which are concerned with the stains of the past, with self disclosure, and with ways of remembering once taboo and traumatic events' (Misztal, 2005). A recent case of this phenomenon occurred in Australia in 2009 with the apology to the 'Forgotten Australians': a group who suffered abuse and neglect after being removed from their parents – either in Australia or in the UK - and placed in Church and State run institutions in Australia between 1930 and 1970. This campaign for recognition by a profoundly marginalized group coincides with the decade in which the opportunities of Web 2.0 were seen to be diffusing throughout different social groups, and were considered a tool for social inclusion. This paper examines the case of the Forgotten Australians as an opportunity to investigate the role of the internet in cultural trauma and public apology. As such, it adds to recent scholarship on the role of digital web based technologies in commemoration and memorials (Arthur, 2009; Haskins, 2007; Cohen and Willis, 2004), and on digital storytelling in the context of trauma (Klaebe, 2011) by locating their role in a broader and emerging domain of social responsibility and political action (Alexander, 2004).
Resumo:
Many methods exist at the moment for deformable face fitting. A drawback to nearly all these approaches is that they are (i) noisy in terms of landmark positions, and (ii) the noise is biased across frames (i.e. the misalignment is toward common directions across all frames). In this paper we propose a grouped $\mathcal{L}1$-norm anchored method for simultaneously aligning an ensemble of deformable face images stemming from the same subject, given noisy heterogeneous landmark estimates. Impressive alignment performance improvement and refinement is obtained using very weak initialization as "anchors".
Resumo:
Background subtraction is a fundamental low-level processing task in numerous computer vision applications. The vast majority of algorithms process images on a pixel-by-pixel basis, where an independent decision is made for each pixel. A general limitation of such processing is that rich contextual information is not taken into account. We propose a block-based method capable of dealing with noise, illumination variations, and dynamic backgrounds, while still obtaining smooth contours of foreground objects. Specifically, image sequences are analyzed on an overlapping block-by-block basis. A low-dimensional texture descriptor obtained from each block is passed through an adaptive classifier cascade, where each stage handles a distinct problem. A probabilistic foreground mask generation approach then exploits block overlaps to integrate interim block-level decisions into final pixel-level foreground segmentation. Unlike many pixel-based methods, ad-hoc postprocessing of foreground masks is not required. Experiments on the difficult Wallflower and I2R datasets show that the proposed approach obtains on average better results (both qualitatively and quantitatively) than several prominent methods. We furthermore propose the use of tracking performance as an unbiased approach for assessing the practical usefulness of foreground segmentation methods, and show that the proposed approach leads to considerable improvements in tracking accuracy on the CAVIAR dataset.
Resumo:
Speaker diarization determines instances of the same speaker within a recording. Extending this task to a collection of recordings for linking together segments spoken by a unique speaker requires speaker linking. In this paper we propose a speaker linking system using linkage clustering and state-of-the-art speaker recognition techniques. We evaluate our approach against two baseline linking systems using agglomerative cluster merging (AC) and agglomerative clustering with model retraining (ACR). We demonstrate that our linking method, using complete-linkage clustering, provides a relative improvement of 20% and 29% in attribution error rate (AER), over the AC and ACR systems, respectively.
Resumo:
This paper is concerned with the unsupervised learning of object representations by fusing visual and motor information. The problem is posed for a mobile robot that develops its representations as it incrementally gathers data. The scenario is problematic as the robot only has limited information at each time step with which it must generate and update its representations. Object representations are refined as multiple instances of sensory data are presented; however, it is uncertain whether two data instances are synonymous with the same object. This process can easily diverge from stability. The premise of the presented work is that a robot's motor information instigates successful generation of visual representations. An understanding of self-motion enables a prediction to be made before performing an action, resulting in a stronger belief of data association. The system is implemented as a data-driven partially observable semi-Markov decision process. Object representations are formed as the process's hidden states and are coordinated with motor commands through state transitions. Experiments show the prediction process is essential in enabling the unsupervised learning method to converge to a solution - improving precision and recall over using sensory data alone.
Resumo:
Abstraction in its resistance to evident meaning has the capacity to interrupt or at least provide tools with which to question an overly compliant reception of the information to which we are subject. It does so by highlighting a latency or potentiality inherent in materiality that points to the possibility of a critical resistance to this ceaseless flow of sound/image/data. This resistance has been remarked on in differing ways by a number of commentators such as Lyotard, in his exploration of the avant-garde and the sublime for example. This joint paper will initially map the collaborative project by Daniel Mafe and Andrew Brown, Affecting Interference which conjoins painting with digital sound and animations into a single, large scale, immersive exhibition/installation. The work acts as an interstitial point between contrasting approaches to abstraction: the visual and aural, the digital and analogue. The paper will then explore the ramifications of this through the examination of abstraction as ‘noise’, that is as that raw inassimilable materiality, within which lays the creative possibility to forge and embrace the as-yet-unthought and almost-forgotten. It does so by establishing a space for a more poetic and slower paced critical engagement for the viewing and receiving information or data. This slowing of perception through the suspension of easy recognition runs counter to our current ‘high performance’ culture, and it’s requisite demand for speedy assimilation of content, representing instead the poetic encounter with a potentiality or latency inherent in the nameless particularity of that which is.
Resumo:
Our everyday environment is full of text but this rich source of information remains largely inaccessible to mobile robots. In this paper we describe an active text spotting system that uses a small number of wide angle views to locate putative text in the environment and then foveates and zooms onto that text in order to improve the reliability of text recognition. We present extensive experimental results obtained with a pan/tilt/zoom camera and a ROS-based mobile robot operating in an indoor environment.
Resumo:
Image representations derived from simplified models of the primary visual cortex (V1), such as HOG and SIFT, elicit good performance in a myriad of visual classification tasks including object recognition/detection, pedestrian detection and facial expression classification. A central question in the vision, learning and neuroscience communities regards why these architectures perform so well. In this paper, we offer a unique perspective to this question by subsuming the role of V1-inspired features directly within a linear support vector machine (SVM). We demonstrate that a specific class of such features in conjunction with a linear SVM can be reinterpreted as inducing a weighted margin on the Kronecker basis expansion of an image. This new viewpoint on the role of V1-inspired features allows us to answer fundamental questions on the uniqueness and redundancies of these features, and offer substantial improvements in terms of computational and storage efficiency.
Resumo:
Since the first destination image studies were published in the early 1970s, the field has become one of the most popular in the tourism literature. While reviews of the destination image literature show no commonly agreed conceptualisation of the construct, researchers have predominantly used structured questionnaires for measurement. There has been criticism that the way some of these scales have been selected means a greater likelihood of attributes being irrelevant to participants. This opens up the risk of stimulating uninformed responses. The issue of uninformed response was first raised as a source of error 60 years ago. However, there has been little, if any, discussion in relation to destination image measurement, studies of which often require participants to provide opinion-driven rather than fact-based responses. This paper reports the trial of a ‘don’t know’ (DK) non-response option for participants in two destination image questionnaires. It is suggested the use of a DK option provides participants with an alternative to i) skipping the question, ii) using the scale midpoint to denote neutrality, or iii) providing an uninformed response. High levels of DK usage by participants can then alert the marketer of the need to improve awareness of destination performance for potential salient attributes.
Resumo:
The rank and census are two filters based on order statistics which have been applied to the image matching problem for stereo pairs. Advantages of these filters include their robustness to radiometric distortion and small amounts of random noise, and their amenability to hardware implementation. In this paper, a new matching algorithm is presented, which provides an overall framework for matching, and is used to compare the rank and census techniques with standard matching metrics. The algorithm was tested using both real stereo pairs and a synthetic pair with ground truth. The rank and census filters were shown to significantly improve performance in the case of radiometric distortion. In all cases, the results obtained were comparable to, if not better than, those obtained using standard matching metrics. Furthermore, the rank and census have the additional advantage that their computational overhead is less than these metrics. For all techniques tested, the difference between the results obtained for the synthetic stereo pair, and the ground truth results was small.