14 resultados para Corrupted Diacritics

em QUB Research Portal - Research Directory and Institutional Repository for Queen's University Belfast


Relevância:

20.00% 20.00%

Publicador:

Resumo:

We present results of a study into the performance of a variety of different image transform-based feature types for speaker-independent visual speech recognition of isolated digits. This includes the first reported use of features extracted using a discrete curvelet transform. The study will show a comparison of some methods for selecting features of each feature type and show the relative benefits of both static and dynamic visual features. The performance of the features will be tested on both clean video data and also video data corrupted in a variety of ways to assess each feature type's robustness to potential real-world conditions. One of the test conditions involves a novel form of video corruption we call jitter which simulates camera and/or head movement during recording.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

This paper provides a summary of our studies on robust speech recognition based on a new statistical approach – the probabilistic union model. We consider speech recognition given that part of the acoustic features may be corrupted by noise. The union model is a method for basing the recognition on the clean part of the features, thereby reducing the effect of the noise on recognition. To this end, the union model is similar to the missing feature method. However, the two methods achieve this end through different routes. The missing feature method usually requires the identity of the noisy data for noise removal, while the union model combines the local features based on the union of random events, to reduce the dependence of the model on information about the noise. We previously investigated the applications of the union model to speech recognition involving unknown partial corruption in frequency band, in time duration, and in feature streams. Additionally, a combination of the union model with conventional noise-reduction techniques was studied, as a means of dealing with a mixture of known or trainable noise and unknown unexpected noise. In this paper, a unified review, in the context of dealing with unknown partial feature corruption, is provided into each of these applications, giving the appropriate theory and implementation algorithms, along with an experimental evaluation.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

This paper investigates the problem of speaker identi-fication and verification in noisy conditions, assuming that speechsignals are corrupted by environmental noise, but knowledgeabout the noise characteristics is not available. This research ismotivated in part by the potential application of speaker recog-nition technologies on handheld devices or the Internet. Whilethe technologies promise an additional biometric layer of securityto protect the user, the practical implementation of such systemsfaces many challenges. One of these is environmental noise. Due tothe mobile nature of such systems, the noise sources can be highlytime-varying and potentially unknown. This raises the require-ment for noise robustness in the absence of information about thenoise. This paper describes a method that combines multicondi-tion model training and missing-feature theory to model noisewith unknown temporal-spectral characteristics. Multiconditiontraining is conducted using simulated noisy data with limitednoise variation, providing a “coarse” compensation for the noise,and missing-feature theory is applied to refine the compensationby ignoring noise variation outside the given training conditions,thereby reducing the training and testing mismatch. This paperis focused on several issues relating to the implementation of thenew model for real-world applications. These include the gener-ation of multicondition training data to model noisy speech, thecombination of different training data to optimize the recognitionperformance, and the reduction of the model’s complexity. Thenew algorithm was tested using two databases with simulated andrealistic noisy speech data. The first database is a redevelopmentof the TIMIT database by rerecording the data in the presence ofvarious noise types, used to test the model for speaker identifica-tion with a focus on the varieties of noise. The second database isa handheld-device database collected in realistic noisy conditions,used to further validate the model for real-world speaker verifica-tion. The new model is compared to baseline systems and is foundto achieve lower error rates.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

This paper proposes a novel image denoising technique based on the normal inverse Gaussian (NIG) density model using an extended non-negative sparse coding (NNSC) algorithm proposed by us. This algorithm can converge to feature basis vectors, which behave in the locality and orientation in spatial and frequency domain. Here, we demonstrate that the NIG density provides a very good fitness to the non-negative sparse data. In the denoising process, by exploiting a NIG-based maximum a posteriori estimator (MAP) of an image corrupted by additive Gaussian noise, the noise can be reduced successfully. This shrinkage technique, also referred to as the NNSC shrinkage technique, is self-adaptive to the statistical properties of image data. This denoising method is evaluated by values of the normalized signal to noise rate (SNR). Experimental results show that the NNSC shrinkage approach is indeed efficient and effective in denoising. Otherwise, we also compare the effectiveness of the NNSC shrinkage method with methods of standard sparse coding shrinkage, wavelet-based shrinkage and the Wiener filter. The simulation results show that our method outperforms the three kinds of denoising approaches mentioned above.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

Approximants that can be considered weaker versions of voiced fricatives (termed here ‘frictionless continuants’) are poorly served by the IPA in terms of symbolization as compared to semi-vowel approximants. In this paper we survey the central approximants and the symbols and diacritics used to transcribe them; we focus on evidence for the use of non-rhotic frictionless continuants in both natural language (by which we mean non-clinical varieties) and disordered speech; and we suggest some possible unitary symbols for those that currently require the use of a hard-to-read lowering diacritic beneath the symbol for the corresponding voiced fricative.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

Artifact removal from physiological signals is an essential component of the biosignal processing pipeline. The need for powerful and robust methods for this process has become particularly acute as healthcare technology deployment undergoes transition from the current hospital-centric setting toward a wearable and ubiquitous monitoring environment. Currently, determining the relative efficacy and performance of the multiple artifact removal techniques available on real world data can be problematic, due to incomplete information on the uncorrupted desired signal. The majority of techniques are presently evaluated using simulated data, and therefore, the quality of the conclusions is contingent on the fidelity of the model used. Consequently, in the biomedical signal processing community, there is considerable focus on the generation and validation of appropriate signal models for use in artifact suppression. Most approaches rely on mathematical models which capture suitable approximations to the signal dynamics or underlying physiology and, therefore, introduce some uncertainty to subsequent predictions of algorithm performance. This paper describes a more empirical approach to the modeling of the desired signal that we demonstrate for functional brain monitoring tasks which allows for the procurement of a ground truth signal which is highly correlated to a true desired signal that has been contaminated with artifacts. The availability of this ground truth, together with the corrupted signal, can then aid in determining the efficacy of selected artifact removal techniques. A number of commonly implemented artifact removal techniques were evaluated using the described methodology to validate the proposed novel test platform. © 2012 IEEE.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

This paper presents the maximum weighted stream posterior (MWSP) model as a robust and efficient stream integration method for audio-visual speech recognition in environments, where the audio or video streams may be subjected to unknown and time-varying corruption. A significant advantage of MWSP is that it does not require any specific measurements of the signal in either stream to calculate appropriate stream weights during recognition, and as such it is modality-independent. This also means that MWSP complements and can be used alongside many of the other approaches that have been proposed in the literature for this problem. For evaluation we used the large XM2VTS database for speaker-independent audio-visual speech recognition. The extensive tests include both clean and corrupted utterances with corruption added in either/both the video and audio streams using a variety of types (e.g., MPEG-4 video compression) and levels of noise. The experiments show that this approach gives excellent performance in comparison to another well-known dynamic stream weighting approach and also compared to any fixed-weighted integration approach in both clean conditions or when noise is added to either stream. Furthermore, our experiments show that the MWSP approach dynamically selects suitable integration weights on a frame-by-frame basis according to the level of noise in the streams and also according to the naturally fluctuating relative reliability of the modalities even in clean conditions. The MWSP approach is shown to maintain robust recognition performance in all tested conditions, while requiring no prior knowledge about the type or level of noise.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

This paper presents a novel method of audio-visual feature-level fusion for person identification where both the speech and facial modalities may be corrupted, and there is a lack of prior knowledge about the corruption. Furthermore, we assume there are limited amount of training data for each modality (e.g., a short training speech segment and a single training facial image for each person). A new multimodal feature representation and a modified cosine similarity are introduced to combine and compare bimodal features with limited training data, as well as vastly differing data rates and feature sizes. Optimal feature selection and multicondition training are used to reduce the mismatch between training and testing, thereby making the system robust to unknown bimodal corruption. Experiments have been carried out on a bimodal dataset created from the SPIDRE speaker recognition database and AR face recognition database with variable noise corruption of speech and occlusion in the face images. The system's speaker identification performance on the SPIDRE database, and facial identification performance on the AR database, is comparable with the literature. Combining both modalities using the new method of multimodal fusion leads to significantly improved accuracy over the unimodal systems, even when both modalities have been corrupted. The new method also shows improved identification accuracy compared with the bimodal systems based on multicondition model training or missing-feature decoding alone.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

While current speech recognisers give acceptable performance in carefully controlled environments, their performance degrades rapidly when they are applied in more realistic situations. Generally, the environmental noise may be classified into two classes: the wide-band noise and narrow band noise. While the multi-band model has been shown to be capable of dealing with speech corrupted by narrow-band noise, it is ineffective for wide-band noise. In this paper, we suggest a combination of the frequency-filtering technique with the probabilistic union model in the multi-band approach. The new system has been tested on the TIDIGITS database, corrupted by white noise, noise collected from a railway station, and narrow-band noise, respectively. The results have shown that this approach is capable of dealing with noise of narrow-band or wide-band characteristics, assuming no knowledge about the noisy environment.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

This paper presents a novel method of audio-visual fusion for person identification where both the speech and facial modalities may be corrupted, and there is a lack of prior knowledge about the corruption. Furthermore, we assume there is a limited amount of training data for each modality (e.g., a short training speech segment and a single training facial image for each person). A new representation and a modified cosine similarity are introduced for combining and comparing bimodal features with limited training data as well as vastly differing data rates and feature sizes. Optimal feature selection and multicondition training are used to reduce the mismatch between training and testing, thereby making the system robust to unknown bimodal corruption. Experiments have been carried out on a bimodal data set created from the SPIDRE and AR databases with variable noise corruption of speech and occlusion in the face images. The new method has demonstrated improved recognition accuracy.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

In this paper, we introduce a statistical data-correction framework that aims at improving the DSP system performance in presence of unreliable memories. The proposed signal processing framework implements best-effort error mitigation for signals that are corrupted by defects in unreliable storage arrays using a statistical correction function extracted from the signal statistics, a data-corruption model, and an application-specific cost function. An application example to communication systems demonstrates the efficacy of the proposed approach.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

In this paper, we introduce a novel approach to face recognition which simultaneously tackles three combined challenges: 1) uneven illumination; 2) partial occlusion; and 3) limited training data. The new approach performs lighting normalization, occlusion de-emphasis and finally face recognition, based on finding the largest matching area (LMA) at each point on the face, as opposed to traditional fixed-size local area-based approaches. Robustness is achieved with novel approaches for feature extraction, LMA-based face image comparison and unseen data modeling. On the extended YaleB and AR face databases for face identification, our method using only a single training image per person, outperforms other methods using a single training image, and matches or exceeds methods which require multiple training images. On the labeled faces in the wild face verification database, our method outperforms comparable unsupervised methods. We also show that the new method performs competitively even when the training images are corrupted.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

We investigate the impact of co-channel interference on the security performance of multiple amplify-and-forward (AF) relaying networks, where N intermediate AF relays assist the data transmission from the source to the destination. The relays are corrupted by multiple co-channel interferers, and the information transmitted from the relays to destination can be overheard by the eavesdropper. In order to deal with the interference and wiretap, the best out of N relays is selected for security enhancement. To this end, we derive a novel lower bound on the secrecy outage probability (SOP), which is then utilized to present two best relay selection criteria, based on the instantaneous and statistical channel information of the interfering links. For these criteria and the conventional maxmin criterion, we quantify the impact of co-channel interference and relay selection by deriving the lower bound on the SOP. Furthermore, we derive the asymptotic SOP for each criterion, to explicitly reveal the impact of transmit power allocation among interferers on the secrecy performance, which offers valuable insights into practical design. We demonstrate that all selection criteria achieve full secrecy diversity order N, while the proposed in this paper two criteria outperform the conventional max-min scheme.