874 results for speech processing, automatic speech recognition, robust, adverse environments, speech enhancement, phase spectrum, phase estimation, optimisation, likelihood maximisation, automotive
Abstract:
In this work, an adaptive modeling and spectral estimation scheme based on a dual discrete Kalman filter (DKF) is proposed for speech enhancement. Both speech and noise signals are modeled by an autoregressive structure, which provides an underlying time-frame dependency and improves time-frequency resolution. The model parameters are arranged into a combined state-space model and are also used to calculate instantaneous power spectral density estimates. Speech enhancement is performed by a dual discrete Kalman filter that simultaneously estimates the models and the signals. This approach is particularly useful as a pre-processing module for parametric speech recognition systems that rely on time-dependent spectral models. System performance was evaluated both by a panel of human listeners and by spectral distances. In both cases, using this pre-processing module led to improved results.
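The state-space idea can be sketched as follows, assuming the speech AR coefficients and both noise variances are already known and the additive noise is white. This is a single Kalman filter, not the dual DKF described above, which would additionally estimate the speech and noise AR parameters with a second filter. The names `kalman_ar_enhance` and `ar_psd` are hypothetical. The speech sample is embedded in a companion-form state vector, and `ar_psd` shows how AR parameters yield a power spectral density estimate.

```python
import numpy as np


def ar_psd(a, q, n_freq=257):
    """PSD of an AR(p) process s[t] = sum_k a[k] s[t-1-k] + w[t], Var(w) = q.

    Evaluated on n_freq frequencies in [0, pi].
    """
    a = np.asarray(a, dtype=float)
    omega = np.linspace(0.0, np.pi, n_freq)
    k = np.arange(1, len(a) + 1)
    # A(e^{jw}) = 1 - sum_k a_k e^{-jwk}
    denom = 1.0 - np.exp(-1j * np.outer(omega, k)) @ a
    return q / np.abs(denom) ** 2


def kalman_ar_enhance(y, a, q, r):
    """Estimate an AR(p) speech signal s from y[t] = s[t] + v[t], Var(v) = r."""
    a = np.asarray(a, dtype=float)
    p = len(a)
    # Companion-form transition: newest sample from the AR recursion, others shift down.
    F = np.zeros((p, p))
    F[0, :] = a
    F[1:, :-1] = np.eye(p - 1)
    x = np.zeros(p)          # state [s_t, s_{t-1}, ..., s_{t-p+1}]
    P = np.eye(p) * q        # initial state covariance (an assumption)
    out = np.empty(len(y))
    for t, yt in enumerate(y):
        # Predict: only the newest sample receives process noise.
        x = F @ x
        P = F @ P @ F.T
        P[0, 0] += q
        # Update with observation matrix H = [1, 0, ..., 0].
        S = P[0, 0] + r
        K = P[:, 0] / S
        x = x + K * (yt - x[0])
        P = P - np.outer(K, P[0, :])
        out[t] = x[0]
    return out
```

In a dual scheme, `a`, `q` and the noise model would be re-estimated frame by frame by a second filter instead of being fixed inputs as they are here.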
Abstract:
We present a new approach to corpus-based speech enhancement that significantly improves on a method published by Xiao and Nickel in 2010. Corpus-based enhancement systems do not merely filter an incoming noisy signal; they resynthesize its speech content from an inventory of pre-recorded clean signals. The goal of the procedure is to perceptually improve the sound of speech signals in background noise. The proposed method modifies Xiao's method in four significant ways. First, it employs a Gaussian mixture model (GMM) instead of a vector quantizer in the phoneme recognition front-end. Second, the state decoding of the recognition stage is supported by an uncertainty modeling technique. Together, the GMM and the uncertainty modeling eliminate the need for noise-dependent system training. Third, the sinusoidal-modeling post-processing of the original method is replaced with a powerful cepstral smoothing operation. Finally, these improvements make it possible to extend the operational bandwidth of the procedure from 4 kHz to 8 kHz. The performance of the proposed method was evaluated across different noise types and signal-to-noise ratios. The new method significantly outperformed traditional methods, including the one by Xiao and Nickel, in terms of PESQ scores and other objective quality measures. Results of subjective CMOS tests on a smaller set of test samples support our claims.
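The abstract does not specify its cepstral smoothing operation. As a hedged illustration only, the sketch below shows generic cepstral liftering of a magnitude spectrum: the low-quefrency cepstral coefficients (the smooth spectral envelope) are kept and the rest are zeroed. The function name `cepstral_smooth` and its `n_keep` parameter are hypothetical, and the authors' operation may differ (for example, by smoothing across time frames rather than across frequency).

```python
import numpy as np


def cepstral_smooth(mag, n_keep):
    """Smooth a one-sided magnitude spectrum by low-quefrency liftering.

    mag: magnitudes for bins 0..n_fft/2 (length n_fft//2 + 1, n_fft even).
    n_keep: number of low-quefrency cepstral coefficients to retain (>= 1).
    """
    mag = np.asarray(mag, dtype=float)
    n_fft = 2 * (len(mag) - 1)
    log_mag = np.log(np.maximum(mag, 1e-12))   # floor avoids log(0)
    # Real cepstrum: inverse FFT of the (real, even) log spectrum.
    cep = np.fft.irfft(log_mag, n=n_fft)
    # Symmetric lifter keeps quefrencies 0..n_keep-1 and their mirror images.
    lifter = np.zeros(n_fft)
    lifter[:n_keep] = 1.0
    if n_keep > 1:
        lifter[-(n_keep - 1):] = 1.0
    smooth_log = np.fft.rfft(cep * lifter).real
    return np.exp(smooth_log)
```

Smaller `n_keep` values give a smoother envelope; keeping all `n_fft // 2 + 1` coefficients returns the input unchanged.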
Abstract:
Most face recognition systems only work well in quite constrained environments. In particular, illumination conditions, facial expressions and head pose must be tightly controlled for good recognition performance. In 2004, we proposed a new face recognition algorithm, Adaptive Principal Component Analysis (APCA) [4], which performs well under both lighting variation and expression change. But like other eigenface-derived face recognition algorithms, APCA only performs well with frontal face images. The work presented in this paper extends our previous work to also accommodate variations in head pose. Following the approach of Cootes et al., we develop a face model and a rotation model that can be used to interpret facial features and synthesize realistic frontal face images from a single novel face image. We use a Viola-Jones-based face detector to detect the face in real time and thus solve the initialization problem for our Active Appearance Model search. Experiments show that our approach achieves good recognition rates on face images across a wide range of head poses. Indeed, recognition rates improve by up to a factor of 5 compared to standard PCA.
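For context on the "standard PCA" baseline mentioned above, here is a minimal eigenface-style sketch: a PCA basis is learned from vectorized gallery images, and a probe is identified by its nearest neighbour in the projected space. This is not APCA and does not model pose; the function names and the use of Euclidean distance are assumptions for illustration.

```python
import numpy as np


def fit_pca(gallery, k):
    """Learn a k-dimensional PCA basis from row-vector images."""
    gallery = np.asarray(gallery, dtype=float)
    mean = gallery.mean(axis=0)
    # Rows of vt are principal directions ("eigenfaces"), sorted by variance.
    _, _, vt = np.linalg.svd(gallery - mean, full_matrices=False)
    return mean, vt[:k]


def identify(probe, gallery, mean, basis):
    """Return the index of the gallery image nearest to the probe in PCA space."""
    g = (np.asarray(gallery, dtype=float) - mean) @ basis.T
    p = (np.asarray(probe, dtype=float) - mean) @ basis.T
    return int(np.argmin(np.linalg.norm(g - p, axis=1)))
```

Because the basis is fitted to frontal gallery images, a rotated face projects poorly onto it, which is the limitation the pose-normalizing front-end described above is meant to address.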
Abstract:
We present a new method for speech enhancement. The method is designed for scenarios in which targeted speaker enrollment and system training within the typical noise environment are feasible. The proposed procedure is fundamentally different from most conventional and state-of-the-art denoising approaches. Instead of filtering a distorted signal, we resynthesize a new “clean” signal based on its likely characteristics, which are estimated from the distorted signal. A successful implementation of the proposed method is presented. Experiments were performed in a scenario with roughly one hour of clean speech training data. Our results show that the proposed method compares very favorably with other state-of-the-art systems in both objective and subjective speech quality assessments. Potential applications include jet cockpit communication systems and offline restoration of audio recordings.
Abstract:
The purpose of this paper is to survey and assess the state of the art in automatic target recognition for synthetic aperture radar imagery (SAR-ATR). The aim is not to develop an exhaustive survey of the voluminous literature, but rather to capture in one place the various approaches to implementing a SAR-ATR system. This paper is meant to be as self-contained as possible, and it approaches the SAR-ATR problem from a holistic, end-to-end perspective. A brief overview of the breadth of SAR-ATR challenges is given. It is couched in terms of single-channel SAR and is extendable to multi-channel SAR systems. The stages of the basic SAR-ATR system structure are defined, and the motivations for the requirements and constraints on the system constituents are addressed. For each stage in the SAR-ATR processing chain, a taxonomization methodology for surveying the numerous methods published in the open literature is proposed, and carefully selected works from the literature are presented under the proposed taxa. Novel comparisons, discussions, and comments are offered throughout the paper. A two-fold benchmarking scheme for evaluating existing SAR-ATR systems and motivating new system designs is proposed and applied to the works surveyed here. Finally, various interrelated issues, such as standard operating conditions, extended operating conditions, and target-model design, are discussed. This paper is a contribution toward the objective of end-to-end SAR-ATR system design.
Abstract:
Behavior-based navigation of autonomous vehicles requires recognizing the navigable areas and the potential obstacles. In this paper, we describe a model-based object recognition system that is part of an image interpretation system intended to assist the navigation of autonomous vehicles operating in industrial environments. The recognition system integrates color, shape and texture information together with the location of the vanishing point. The recognition process starts from prior scene knowledge, that is, a generic model of the expected scene and the potential objects. In this approach, different low-level vision techniques extract a multitude of image descriptors, which are then analyzed by a rule-based reasoning system to interpret the image content. The system has been implemented as a rule-based cooperative expert system.
Abstract:
Modern warfare is seeing the active development of a new trend: robotic warfare. One of the critical elements of robotic warfare systems is an automatic target recognition system, which recognizes objects based on data received from sensors. This work considers aspects of an optical realization of such a system by means of near-infrared (NIR) target scanning at fixed wavelengths. An algorithm was designed, an experimental setup was built, and samples of various modern gear and apparel materials were tested. For pattern testing, camouflage samples from armies actively engaged in armed conflict were chosen. Tests were performed both in a clear atmosphere and in an artificial, extremely hot and humid atmosphere to simulate field conditions.
Abstract:
Abstract taken from the journal.
Abstract:
Automated virtual camera control has been widely used in animation and interactive virtual environments. We have developed a prototype free-viewpoint video system based on multiple sparse cameras that allows users to control the position and orientation of a virtual camera, enabling observation of a real scene in three dimensions (3D) from any desired viewpoint. Automatic camera control can be activated to follow objects selected by the user. Our method combines a simple geometric model of the scene composed of planes (the virtual environment), augmented with visual information from the cameras and pre-computed tracking information of moving targets, to generate novel perspective-corrected 3D views of the scene and the moving objects from the virtual camera. To achieve real-time rendering performance, view-dependent texture-mapped billboards are used to render the moving objects at their correct locations, and foreground masks are used to remove the moving objects from the projected video streams. The current prototype runs on a PC with a common graphics card and can generate virtual 2D views from three cameras of resolution 768 × 576 with several moving objects at about 11 fps. © 2011 Elsevier Ltd. All rights reserved.
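As a rough illustration of view-dependent billboards, the sketch below builds a cylindrical billboard: a vertical quad standing at a tracked object's ground position and rotated about the vertical axis to face the virtual camera. It also includes a hypothetical rule that takes the texture from the real camera whose viewing direction to the object is closest to the virtual camera's. The function names, the z-up convention and the texture-selection rule are assumptions, not the authors' implementation.

```python
import numpy as np


def _unit(v):
    v = np.asarray(v, dtype=float)
    return v / np.linalg.norm(v)


def billboard_corners(foot, height, width, cam_pos, up=(0.0, 0.0, 1.0)):
    """Corners of a vertical quad at `foot` that turns about `up` to face the camera.

    Returns a (4, 3) array: bottom-left, bottom-right, top-right, top-left
    as seen from the camera.
    """
    foot = np.asarray(foot, dtype=float)
    up = _unit(up)
    to_cam = np.asarray(cam_pos, dtype=float) - foot
    # Drop the vertical component so the quad stays upright (cylindrical billboard).
    normal = _unit(to_cam - (to_cam @ up) * up)
    right = np.cross(up, normal)          # unit length: up and normal are orthonormal
    half = 0.5 * width * right
    top = height * up
    return np.array([foot - half, foot + half, foot + half + top, foot - half + top])


def pick_texture_camera(foot, virtual_cam, real_cams):
    """Index of the real camera whose direction to the object best matches the virtual one."""
    foot = np.asarray(foot, dtype=float)
    v = _unit(np.asarray(virtual_cam, dtype=float) - foot)
    scores = [_unit(np.asarray(c, dtype=float) - foot) @ v for c in real_cams]
    return int(np.argmax(scores))
```

A quad with only four vertices per object is cheap to draw, which is consistent with the real-time goal stated above; the trade-off is that the object appears flat when the virtual camera moves far from the texturing camera.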
Abstract:
In this paper, we shed light on the problem of automatic landslide recognition using supervised classification, and we introduce the optimum-path forest (OPF) classifier in this context. We employed two images acquired by the Geoeye-MS satellite in March 2010, covering the northwest (steep, high-relief areas) and north (pipeline area) sides of the city of Duque de Caxias, Rio de Janeiro State, Brazil. The landslide recognition rate was assessed through cross-validation with 10 runs. We compared OPF against an SVM with a radial basis function (RBF) kernel and against a Bayesian classifier. We conclude that OPF, Bayes and SVM all achieved high recognition rates, with OPF being the fastest approach. © 2012 IEEE.
Abstract:
Automatically recognizing faces captured in uncontrolled environments has been a challenging topic over the past decades. In this work, we investigate cohort score normalization, which has been widely used in biometric verification, as a means to improve the robustness of face recognition in challenging environments. In particular, we introduce cohort score normalization into the undersampled face recognition problem. Further, we develop an effective cohort normalization method specifically for the unconstrained face pair matching problem. Extensive experiments conducted on several well-known face databases demonstrate the effectiveness of cohort normalization in these challenging scenarios. In addition, to give a proper understanding of cohort behavior, we study the impact of the number and quality of cohort samples on normalization performance. The experimental results show that a larger cohort set gives more stable and often better results, up to the point where performance saturates, and that cohort samples of different quality indeed produce different cohort normalization performance. Recognizing faces that have undergone alterations is another challenging problem for current face recognition algorithms. Face image alterations can be roughly classified into two categories: unintentional (e.g., geometric transformations introduced by the acquisition device) and intentional alterations (e.g., plastic surgery). We study the impact of these alterations on face recognition accuracy. Our results show that state-of-the-art algorithms are able to overcome limited digital alterations but are sensitive to more substantial modifications. Further, we develop two useful descriptors for detecting those alterations that can significantly affect recognition performance. Finally, we propose to use the Structural Similarity (SSIM) quality map to detect and model variations due to plastic surgery.
Extensive experiments conducted on a plastic surgery face database demonstrate the potential of SSIM map for matching face images after surgeries.
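Cohort score normalization can take several forms. A common one, test normalization (T-norm), standardizes a raw match score using the mean and standard deviation of the probe's scores against a cohort of non-matching samples. The sketch below assumes faces are already represented as feature vectors compared by cosine similarity; the function names are hypothetical, and the thesis' own method for unconstrained pair matching may differ.

```python
import numpy as np


def cosine(a, b):
    """Cosine similarity between two feature vectors."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


def tnorm(raw_score, cohort_scores, eps=1e-12):
    """Standardize a raw score by the mean and std of the cohort scores."""
    c = np.asarray(cohort_scores, dtype=float)
    return (raw_score - c.mean()) / max(c.std(), eps)


def cohort_normalized_score(probe, template, cohort):
    """Score probe vs. template, normalized against probe-vs-cohort scores."""
    raw = cosine(probe, template)
    cohort_scores = [cosine(probe, x) for x in cohort]
    return tnorm(raw, cohort_scores)
```

A larger cohort makes the mean and standard deviation estimates less noisy, which is one plausible reading of why bigger cohort sets gave more stable results in the experiments described above.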
Abstract:
The wide diffusion of cheap, small, and portable sensors integrated into an unprecedentedly large variety of devices, together with almost ubiquitous Internet connectivity, makes it possible to collect an unprecedented amount of real-time information about the environment we live in. These data streams, if properly and promptly analyzed, can be exploited to build new intelligent and pervasive services that have the potential to improve people's quality of life in a variety of domains such as entertainment, health care, and energy management. The large heterogeneity of application domains, however, calls for a middleware-level infrastructure that can effectively support their different quality requirements. In this thesis, we study the challenges related to the provisioning of differentiated quality of service (QoS) during the processing of data streams produced in pervasive environments. We analyze the trade-offs between guaranteed quality, cost, and scalability in stream distribution and processing by surveying existing state-of-the-art solutions and identifying and exploring their weaknesses. We propose an original model for QoS-centric distributed stream processing in data centers, and we present Quasit, its prototype implementation, a scalable and extensible platform that researchers can use to implement and validate novel QoS-enforcement mechanisms. To support our study, we also explore an original class of weaker quality guarantees that can reduce costs when application semantics do not require strict quality enforcement. We validate the effectiveness of this idea in a practical use-case scenario that investigates partial fault-tolerance policies in stream processing, through a large experimental study on the prototype of our novel LAAR dynamic replication technique.
Our modeling, prototyping, and experimental work demonstrates that, by providing data distribution and processing middleware with application-level knowledge of the different quality requirements associated with different pervasive data flows, it is possible to improve system scalability while reducing costs.
Abstract:
The role of polarisation in late-time, complex-resonance-based target identification is investigated numerically for the case of an L-shaped wire. While repeated extraction of the resonances for varying polarisations improves immunity to noise, it is also found that there are preferred polarisations for each complex resonance. The first few of these polarisations are extracted for the sample target.
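In the usual singularity-expansion model, the late-time response is a sum of damped complex exponentials whose poles are the target's complex resonances. The abstract does not name its extraction algorithm; the matrix pencil method sketched below is one common choice, shown here as an assumption. The function name `matrix_pencil_poles` and the default pencil parameter `L = N // 3` are hypothetical choices for illustration.

```python
import numpy as np


def matrix_pencil_poles(y, dt, n_poles, pencil=None):
    """Estimate complex poles s_i and residues R_i of y[n] ~ sum_i R_i exp(s_i n dt).

    y: uniformly sampled late-time response; n_poles: model order M.
    pencil: pencil parameter L (defaults to N // 3).
    """
    y = np.asarray(y)
    n = len(y)
    L = pencil if pencil is not None else n // 3
    # Hankel data matrix, Y[i, j] = y[i + j], shape (N - L, L + 1).
    Y = np.array([y[i:i + L + 1] for i in range(n - L)])
    # Keep the n_poles dominant right singular vectors (signal subspace).
    _, _, vh = np.linalg.svd(Y, full_matrices=False)
    V = vh[:n_poles]
    V1, V2 = V[:, :-1], V[:, 1:]     # drop last / first column (shift invariance)
    z = np.linalg.eigvals(V2 @ np.linalg.pinv(V1))
    s = np.log(z.astype(complex)) / dt
    # Residues by least squares on the Vandermonde system y[n] = sum_i R_i z_i^n.
    vander = z[np.newaxis, :] ** np.arange(n)[:, np.newaxis]
    residues, *_ = np.linalg.lstsq(vander, y.astype(complex), rcond=None)
    return s, residues
```

Note that `n_poles` counts each member of a complex-conjugate pair, so a real response containing two resonances needs `n_poles=4`. Repeating the extraction for several incident polarisations, as the abstract describes, would amount to calling this function on each response and comparing the pole estimates and residue magnitudes.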