97 results for Noisy corpora.
Abstract:
In this paper we propose a novel scheme for carrying out speaker diarization in an iterative manner. We aim to show that the information obtained through the first pass of speaker diarization can be reused to refine and improve the original diarization results. We call this technique speaker rediarization and demonstrate the practical application of our rediarization algorithm using a large archive of two-speaker telephone conversation recordings. We use the NIST 2008 SRE summed telephone corpora for evaluating our speaker rediarization system. This corpus contains recurring speaker identities across independent recording sessions that need to be linked across the entire corpus. We show that our speaker rediarization scheme can take advantage of inter-session speaker information, linked in the initial diarization pass, to achieve a 30% relative improvement over the original diarization error rate (DER) after only two iterations of rediarization.
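As a small illustration of what a "30% relative improvement" in diarization error rate means arithmetically, the following sketch computes the relative reduction between two DER values. The function name and the example numbers are hypothetical and are not taken from the paper.

```python
def relative_improvement(baseline_der, new_der):
    """Relative reduction in error rate, as a fraction of the baseline.

    Both arguments are error rates in the same units (e.g. percent DER).
    """
    if baseline_der <= 0:
        raise ValueError("baseline_der must be positive")
    return (baseline_der - new_der) / baseline_der


# Hypothetical numbers: a DER falling from 10.0% to 7.0%
# corresponds to a 30% relative improvement.
example = relative_improvement(10.0, 7.0)
```

A relative figure depends on the baseline: the same 3-point absolute drop from a 20% baseline would be only a 15% relative improvement.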
Abstract:
In this paper conditional hidden Markov model (HMM) filters and conditional Kalman filters (KF) are coupled together to improve demodulation of differentially encoded signals in noisy fading channels. We present an indicator matrix representation for differentially encoded signals and the optimal HMM filter for demodulation. The filter requires O(N^3) calculations per time iteration, where N is the number of message symbols. Decision feedback equalisation is investigated via coupling the optimal HMM filter for estimating the message, conditioned on estimates of the channel parameters, and a KF for estimating the channel states, conditioned on soft information message estimates. The particular differential encoding scheme examined in this paper is differential phase shift keying. However, the techniques developed can be extended to other forms of differential modulation. The channel model we use allows for multiplicative channel distortions and additive white Gaussian noise. Simulation studies are also presented.
Abstract:
The aim of this ethnographic study was to understand welding practices in shipyard environments with the purpose of designing an interactive welding robot that can help workers with their daily job. The robot is meant to be deployed for automatic welding on jack-up rig structures. The design of the robot turns out to be a challenging task due to several problematic working conditions on the shipyard, such as dust, irregular floor, high temperature, wind variations, elevated working platforms, narrow spaces, and circular welding paths requiring a robotic arm with more than 6 degrees of freedom. Additionally, the environment is very noisy and the workers – mostly foreigners – have a very basic level of English. These two issues need to be taken into account when designing the interactive user interface for the robot. Ideally, the communication flow between the two parties involved should be as frictionless as possible. The paper presents the results of our field observations and welders’ interviews, as well as our robot design recommendation for the next project stage.
Abstract:
With the overwhelming increase in the amount of data on the web and in databases, many text mining techniques have been proposed for mining useful patterns in text documents. Extracting closed sequential patterns using the Pattern Taxonomy Model (PTM) is one of the pruning methods to remove noisy, inconsistent, and redundant patterns. However, PTM treats each extracted pattern as a whole without considering its included terms, which could affect the quality of extracted patterns. This paper proposes an innovative and effective method that extends the random set to accurately weigh patterns based on their distribution in the documents and their terms' distribution in patterns. Then, the proposed approach finds the specific closed sequential patterns (SCSP) based on the newly calculated weight. The experimental results on the Reuters Corpus Volume 1 (RCV1) data collection and TREC topics show that the proposed method significantly outperforms other state-of-the-art methods on different popular measures.
Abstract:
Corner detection has shown its great importance in many computer vision tasks. However, in real-world applications, noise in the image strongly affects the performance of corner detectors. Few corner detectors have been designed to be robust to heavy noise to date, partly because the noise could be reduced by a denoising procedure. In this paper, we present a corner detector that can find discriminative corners in images contaminated by noise of different levels, without any denoising procedure. Candidate corners (i.e., features) are first detected by a modified SUSAN approach, and then false corners in noise are rejected based on their local characteristics. Features in flat regions are removed based on their intensity centroid, and features on edge structures are removed using the Harris response. The detector is self-adaptive to noise since the image signal-to-noise ratio (SNR) is automatically estimated to choose an appropriate threshold for refining features. Experimental results show that our detector has better performance at locating discriminative corners in images with strong noise than other widely used corner or keypoint detectors.
Abstract:
We present our work on tele-operating a complex humanoid robot with the help of bio-signals collected from the operator. The frameworks (for robot vision, collision avoidance and machine learning) developed in our lab allow for safe interaction with the environment when combined. This works even with noisy control signals, such as the operator’s hand acceleration and their electromyography (EMG) signals. These bio-signals are used to execute equivalent actions (such as reaching and grasping of objects) on the 7 DOF arm.
Abstract:
It is traditional to initialise Kalman filters and extended Kalman filters with estimates of the states calculated directly from the observed (raw) noisy inputs, but unfortunately their performance is extremely sensitive to state initialisation accuracy: good initial state estimates ensure fast convergence whereas poor estimates may give rise to slow convergence or even filter divergence. Divergence is generally due to excessive observation noise and leads to error magnitudes that quickly become unbounded (R.J. Fitzgerald, 1971). When a filter diverges, it must be re-initialised, but because the observations are extremely poor, the re-initialised states will have poor estimates. The paper proposes that if neurofuzzy estimators produce more accurate state estimates than those calculated from the observed noisy inputs (using the known state model), then neurofuzzy estimates can be used to initialise the states of Kalman and extended Kalman filters. Filters whose states have been initialised with neurofuzzy estimates should give improved performance by way of faster convergence, both when the filter is first initialised and when it is restarted after divergence.
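The sensitivity to initialisation described above can be illustrated with a minimal scalar Kalman filter. This is only a sketch under stated assumptions: a constant-state model, hypothetical noise variances q and r, and noise-free measurements for clarity. It does not implement the neurofuzzy estimator, and a linear filter like this one shows slow convergence rather than divergence.

```python
def scalar_kalman(measurements, x0, p0, q=1e-4, r=1.0):
    """Run a scalar Kalman filter for a (nearly) constant state.

    x0, p0: initial state estimate and its variance (the initialisation).
    q, r:   assumed process and observation noise variances (hypothetical).
    Returns the list of posterior state estimates.
    """
    x, p = x0, p0
    estimates = []
    for z in measurements:
        # Predict: constant-state model, so only the uncertainty grows.
        p = p + q
        # Update: blend prediction and measurement using the Kalman gain.
        k = p / (p + r)
        x = x + k * (z - x)
        p = (1.0 - k) * p
        estimates.append(x)
    return estimates


# Hypothetical scenario: the true state is 5.0 and both filters are
# (over)confident in their initial estimate (small p0).
observations = [5.0] * 20
good_start = scalar_kalman(observations, x0=5.0, p0=0.01)
poor_start = scalar_kalman(observations, x0=0.0, p0=0.01)
```

With the same small initial variance, the filter started at the true value stays there, while the filter started far away is still well short of 5.0 after 20 steps, because a confident but wrong initial state keeps the gain low.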
Abstract:
Advances in neural network language models have demonstrated that these models can effectively learn representations of word meaning. In this paper, we explore a variation of neural language models that can learn from concepts taken from structured ontologies and extracted from free-text, rather than directly from terms in free-text. This model is employed for the task of measuring semantic similarity between medical concepts, a task that is central to a number of techniques in medical informatics and information retrieval. The model is built with two medical corpora (journal abstracts and patient records) and empirically validated on two ground-truth datasets of human-judged concept pairs assessed by medical professionals. Empirically, our approach correlates closely with expert human assessors ($\approx$ 0.9) and outperforms a number of state-of-the-art benchmarks for medical semantic similarity. The demonstrated superiority of this model for providing an effective semantic similarity measure is promising in that this may translate into effectiveness gains for techniques in medical information retrieval and medical informatics (e.g., query expansion and literature-based discovery).
Abstract:
Dwellings in multi-storey apartment buildings (MSAB) are predicted to increase dramatically as a proportion of housing stock in subtropical cities over coming decades. The problem of designing comfortable and healthy high-density residential environments and minimising energy consumption must be addressed urgently in subtropical cities globally. This paper explores private residents’ experiences of privacy and comfort and their perceptions of how well their apartment dwelling modulated the external environment in subtropical conditions through analysis of 636 survey responses and 24 interviews with residents of MSAB in inner urban neighbourhoods of Brisbane, Australia. The findings show that the availability of natural ventilation and outdoor private living spaces play important roles in resident perceptions of liveability in the subtropics where the climate is conducive to year round “outdoor living”. Residents valued choice with regard to climate control methods in their apartments. They overwhelmingly preferred natural ventilation to manage thermal comfort, and turned to the air-conditioner for limited periods, particularly when external conditions were too noisy. These findings provide a unique evidence base for reducing the environmental impact of MSAB and increasing the acceptability of apartment living, through incorporating residential attributes positioned around climate-responsive architecture.
Abstract:
We used diffusion tensor magnetic resonance imaging (DTI) to reveal the extent of genetic effects on brain fiber microstructure, based on tensor-derived measures, in 22 pairs of monozygotic (MZ) twins and 23 pairs of dizygotic (DZ) twins (90 scans). After Log-Euclidean denoising to remove rank-deficient tensors, DTI volumes were fluidly registered by high-dimensional mapping of co-registered MP-RAGE scans to a geometrically-centered mean neuroanatomical template. After tensor reorientation using the strain of the 3D fluid transformation, we computed two widely used scalar measures of fiber integrity: fractional anisotropy (FA), and geodesic anisotropy (GA), which measures the geodesic distance between tensors in the symmetric positive-definite tensor manifold. Spatial maps of intraclass correlations (r) between MZ and DZ twins were compared to compute maps of Falconer's heritability statistics, i.e. the proportion of population variance explainable by genetic differences among individuals. Cumulative distribution function (CDF) plots of effect sizes showed that the manifold measure, GA, performed comparably to the Euclidean measure, FA, in detecting genetic correlations. While maps were relatively noisy, the CDFs showed promise for detecting genetic influences on brain fiber integrity as the current sample expands.
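For readers unfamiliar with Falconer's statistic, the standard formula estimates heritability as twice the difference between the MZ and DZ intraclass correlations, h^2 = 2 (r_MZ - r_DZ). The sketch below applies it value by value; the function names and the example correlations are hypothetical, and the paper's actual voxelwise pipeline is not reproduced here.

```python
def falconer_heritability(r_mz, r_dz):
    """Falconer's heritability estimate from twin intraclass correlations.

    r_mz: intraclass correlation among monozygotic twin pairs.
    r_dz: intraclass correlation among dizygotic twin pairs.
    The raw estimate is returned without clipping to [0, 1].
    """
    return 2.0 * (r_mz - r_dz)


def heritability_map(r_mz_values, r_dz_values):
    """Apply Falconer's formula elementwise to two equal-length sequences."""
    if len(r_mz_values) != len(r_dz_values):
        raise ValueError("correlation maps must have the same length")
    return [falconer_heritability(m, d)
            for m, d in zip(r_mz_values, r_dz_values)]


# Hypothetical correlations at three locations.
h2 = heritability_map([0.8, 0.6, 0.4], [0.5, 0.3, 0.4])
```

Because the estimate is a difference of two noisy correlations, it can fall outside [0, 1] at individual locations, which is one reason such maps tend to look noisy.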
Abstract:
Visual information in the form of lip movements of the speaker has been shown to improve the performance of speech recognition and search applications. In our previous work, we proposed cross database training of synchronous hidden Markov models (SHMMs) to make use of external large and publicly available audio databases in addition to the relatively small given audio visual database. In this work, the cross database training approach is improved by performing an additional audio adaptation step, which enables audio visual SHMMs to benefit from audio observations of the external audio models before adding visual modality to them. The proposed approach outperforms the baseline cross database training approach in clean and noisy environments in terms of phone recognition accuracy as well as spoken term detection (STD) accuracy.
Abstract:
Spoken term detection (STD) is the task of looking up a spoken term in a large volume of speech segments. In order to provide fast search, speech segments are first indexed into an intermediate representation using speech recognition engines which provide multiple hypotheses for each speech segment. Approximate matching techniques are usually applied at the search stage to compensate for the poor performance of automatic speech recognition engines during indexing. Recently, using visual information in addition to audio information has been shown to improve phone recognition performance, particularly in noisy environments. In this paper, we make use of visual information in the form of lip movements of the speaker in the indexing stage and investigate its effect on STD performance. In particular, we investigate whether gains in phone recognition accuracy carry through the approximate matching stage to provide similar gains in the final audio-visual STD system over a traditional audio-only approach. We also investigate the effect of using visual information on STD performance in different noise environments.
Abstract:
Speech recognition can be improved by using visual information in the form of lip movements of the speaker in addition to audio information. To date, state-of-the-art techniques for audio-visual speech recognition continue to use audio and visual data of the same database for training their models. In this paper, we present a new approach to make use of one modality of an external dataset in addition to a given audio-visual dataset. By so doing, it is possible to create more powerful models from other extensive audio-only databases and adapt them to our comparatively smaller multi-stream databases. Results show that the presented approach outperforms the widely adopted synchronous hidden Markov models (HMM) trained jointly on audio and visual data of a given audio-visual database for phone recognition by 29% relative. It also outperforms the external audio models trained on extensive external audio datasets and the internal audio models by 5.5% and 46% relative, respectively. We also show that the proposed approach is beneficial in noisy environments where the audio source is affected by environmental noise.