846 resultados para Bit error rate


Relevância:

90.00% 90.00%

Publicador:

Resumo:

Chaque année, le piratage mondial de la musique coûte plusieurs milliards de dollars en pertes économiques, pertes d’emplois et pertes de gains des travailleurs ainsi que la perte de millions de dollars en recettes fiscales. La plupart du piratage de la musique est dû à la croissance rapide et à la facilité des technologies actuelles pour la copie, le partage, la manipulation et la distribution de données musicales [Domingo, 2015], [Siwek, 2007]. Le tatouage des signaux sonores a été proposé pour protéger les droit des auteurs et pour permettre la localisation des instants où le signal sonore a été falsifié. Dans cette thèse, nous proposons d’utiliser la représentation parcimonieuse bio-inspirée par graphe de décharges (spikegramme), pour concevoir une nouvelle méthode permettant la localisation de la falsification dans les signaux sonores. Aussi, une nouvelle méthode de protection du droit d’auteur. Finalement, une nouvelle attaque perceptuelle, en utilisant le spikegramme, pour attaquer des systèmes de tatouage sonore. Nous proposons tout d’abord une technique de localisation des falsifications (‘tampering’) des signaux sonores. Pour cela nous combinons une méthode à spectre étendu modifié (‘modified spread spectrum’, MSS) avec une représentation parcimonieuse. Nous utilisons une technique de poursuite perceptive adaptée (perceptual marching pursuit, PMP [Hossein Najaf-Zadeh, 2008]) pour générer une représentation parcimonieuse (spikegramme) du signal sonore d’entrée qui est invariante au décalage temporel [E. C. Smith, 2006] et qui prend en compte les phénomènes de masquage tels qu’ils sont observés en audition. Un code d’authentification est inséré à l’intérieur des coefficients de la représentation en spikegramme. Puis ceux-ci sont combinés aux seuils de masquage. Le signal tatoué est resynthétisé à partir des coefficients modifiés, et le signal ainsi obtenu est transmis au décodeur. Au décodeur, pour identifier un segment falsifié du signal sonore, les codes d’authentification de tous les segments intacts sont analysés. Si les codes ne peuvent être détectés correctement, on sait qu’alors le segment aura été falsifié. Nous proposons de tatouer selon le principe à spectre étendu (appelé MSS) afin d’obtenir une grande capacité en nombre de bits de tatouage introduits. Dans les situations où il y a désynchronisation entre le codeur et le décodeur, notre méthode permet quand même de détecter des pièces falsifiées. Par rapport à l’état de l’art, notre approche a le taux d’erreur le plus bas pour ce qui est de détecter les pièces falsifiées. Nous avons utilisé le test de l’opinion moyenne (‘MOS’) pour mesurer la qualité des systèmes tatoués. Nous évaluons la méthode de tatouage semi-fragile par le taux d’erreur (nombre de bits erronés divisé par tous les bits soumis) suite à plusieurs attaques. Les résultats confirment la supériorité de notre approche pour la localisation des pièces falsifiées dans les signaux sonores tout en préservant la qualité des signaux. Ensuite nous proposons une nouvelle technique pour la protection des signaux sonores. Cette technique est basée sur la représentation par spikegrammes des signaux sonores et utilise deux dictionnaires (TDA pour Two-Dictionary Approach). Le spikegramme est utilisé pour coder le signal hôte en utilisant un dictionnaire de filtres gammatones. Pour le tatouage, nous utilisons deux dictionnaires différents qui sont sélectionnés en fonction du bit d’entrée à tatouer et du contenu du signal. Notre approche trouve les gammatones appropriés (appelés noyaux de tatouage) sur la base de la valeur du bit à tatouer, et incorpore les bits de tatouage dans la phase des gammatones du tatouage. De plus, il est montré que la TDA est libre d’erreur dans le cas d’aucune situation d’attaque. Il est démontré que la décorrélation des noyaux de tatouage permet la conception d’une méthode de tatouage sonore très robuste. Les expériences ont montré la meilleure robustesse pour la méthode proposée lorsque le signal tatoué est corrompu par une compression MP3 à 32 kbits par seconde avec une charge utile de 56.5 bps par rapport à plusieurs techniques récentes. De plus nous avons étudié la robustesse du tatouage lorsque les nouveaux codec USAC (Unified Audion and Speech Coding) à 24kbps sont utilisés. La charge utile est alors comprise entre 5 et 15 bps. Finalement, nous utilisons les spikegrammes pour proposer trois nouvelles méthodes d’attaques. Nous les comparons aux méthodes récentes d’attaques telles que 32 kbps MP3 et 24 kbps USAC. Ces attaques comprennent l’attaque par PMP, l’attaque par bruit inaudible et l’attaque de remplacement parcimonieuse. Dans le cas de l’attaque par PMP, le signal de tatouage est représenté et resynthétisé avec un spikegramme. Dans le cas de l’attaque par bruit inaudible, celui-ci est généré et ajouté aux coefficients du spikegramme. Dans le cas de l’attaque de remplacement parcimonieuse, dans chaque segment du signal, les caractéristiques spectro-temporelles du signal (les décharges temporelles ;‘time spikes’) se trouvent en utilisant le spikegramme et les spikes temporelles et similaires sont remplacés par une autre. Pour comparer l’efficacité des attaques proposées, nous les comparons au décodeur du tatouage à spectre étendu. Il est démontré que l’attaque par remplacement parcimonieux réduit la corrélation normalisée du décodeur de spectre étendu avec un plus grand facteur par rapport à la situation où le décodeur de spectre étendu est attaqué par la transformation MP3 (32 kbps) et 24 kbps USAC.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

Hybrid face recognition, using image (2D) and structural (3D) information, has explored the fusion of Nearest Neighbour classifiers. This paper examines the effectiveness of feature modelling for each individual modality, 2D and 3D. Furthermore, it is demonstrated that the fusion of feature modelling techniques for the 2D and 3D modalities yields performance improvements over the individual classifiers. By fusing the feature modelling classifiers for each modality with equal weights the average Equal Error Rate improves from 12.60% for the 2D classifier and 12.10% for the 3D classifier to 7.38% for the Hybrid 2D+3D clasiffier.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

The use of the PC and Internet for placing telephone calls will present new opportunities to capture vast amounts of un-transcribed speech for a particular speaker. This paper investigates how to best exploit this data for speaker-dependent speech recognition. Supervised and unsupervised experiments in acoustic model and language model adaptation are presented. Using one hour of automatically transcribed speech per speaker with a word error rate of 36.0%, unsupervised adaptation resulted in an absolute gain of 6.3%, equivalent to 70% of the gain from the supervised case, with additional adaptation data likely to yield further improvements. LM adaptation experiments suggested that although there seems to be a small degree of speaker idiolect, adaptation to the speaker alone, without considering the topic of the conversation, is in itself unlikely to improve transcription accuracy.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

This paper proposes the use of the Bayes Factor to replace the Bayesian Information Criterion (BIC) as a criterion for speaker clustering within a speaker diarization system. The BIC is one of the most popular decision criteria used in speaker diarization systems today. However, it will be shown in this paper that the BIC is only an approximation to the Bayes factor of marginal likelihoods of the data given each hypothesis. This paper uses the Bayes factor directly as a decision criterion for speaker clustering, thus removing the error introduced by the BIC approximation. Results obtained on the 2002 Rich Transcription (RT-02) Evaluation dataset show an improved clustering performance, leading to a 14.7% relative improvement in the overall Diarization Error Rate (DER) compared to the baseline system.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

In this paper, we present a microphone array beamforming approach to blind speech separation. Unlike previous beamforming approaches, our system does not require a-priori knowledge of the microphone placement and speaker location, making the system directly comparable other blind source separation methods which require no prior knowledge of recording conditions. Microphone location is automatically estimated using an assumed noise field model, and speaker locations are estimated using cross correlation based methods. The system is evaluated on the data provided for the PASCAL Speech Separation Challenge 2 (SSC2), achieving a word error rate of 58% on the evaluation set.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

The rapid growth of mobile telephone use, satellite services, and now the wireless Internet and WLANs are generating tremendous changes in telecommunication and networking. As indoor wireless communications become more prevalent, modeling indoor radio wave propagation in populated environments is a topic of significant interest. Wireless MIMO communication exploits phenomena such as multipath propagation to increase data throughput and range, or reduce bit error rates, rather than attempting to eliminate effects of multipath propagation as traditional SISO communication systems seek to do. The MIMO approach can yield significant gains for both link and network capacities, with no additional transmitting power or bandwidth consumption when compared to conventional single-array diversity methods. When MIMO and OFDM systems are combined and deployed in a suitable rich scattering environment such as indoors, a significant capacity gain can be observed due to the assurance of multipath propagation. Channel variations can occur as a result of movement of personnel, industrial machinery, vehicles and other equipment moving within the indoor environment. The time-varying effects on the propagation channel in populated indoor environments depend on the different pedestrian traffic conditions and the particular type of environment considered. A systematic measurement campaign to study pedestrian movement effects in indoor MIMO-OFDM channels has not yet been fully undertaken. Measuring channel variations caused by the relative positioning of pedestrians is essential in the study of indoor MIMO-OFDM broadband wireless networks. Theoretically, due to high multipath scattering, an increase in MIMO-OFDM channel capacity is expected when pedestrians are present. However, measurements indicate that some reductions in channel capacity could be observed as the number of pedestrians approaches 10 due to a reduction in multipath conditions as more human bodies absorb the wireless signals. This dissertation presents a systematic characterization of the effects of pedestrians in indoor MIMO-OFDM channels. Measurement results, using the MIMO-OFDM channel sounder developed at the CSIRO ICT Centre, have been validated by a customized Geometric Optics-based ray tracing simulation. Based on measured and simulated MIMO-OFDM channel capacity and MIMO-OFDM capacity dynamic range, an improved deterministic model for MIMO-OFDM channels in indoor populated environments is presented. The model can be used for the design and analysis of future WLAN to be deployed in indoor environments. The results obtained show that, in both Fixed SNR and Fixed Tx for deterministic condition, the channel capacity dynamic range rose with the number of pedestrians as well as with the number of antenna combinations. In random scenarios with 10 pedestrians, an increment in channel capacity of up to 0.89 bits/sec/Hz in Fixed SNR and up to 1.52 bits/sec/Hz in Fixed Tx has been recorded compared to the one pedestrian scenario. In addition, from the results a maximum increase in average channel capacity of 49% has been measured while 4 antenna elements are used, compared with 2 antenna elements. The highest measured average capacity, 11.75 bits/sec/Hz, corresponds to the 4x4 array with 10 pedestrians moving randomly. Moreover, Additionally, the spread between the highest and lowest value of the the dynamic range is larger for Fixed Tx, predicted 5.5 bits/sec/Hz and measured 1.5 bits/sec/Hz, in comparison with Fixed SNR criteria, predicted 1.5 bits/sec/Hz and measured 0.7 bits/sec/Hz. This has been confirmed by both measurements and simulations ranging from 1 to 5, 7 and 10 pedestrians.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

This paper presents a method of voice activity detection (VAD) for high noise scenarios, using a noise robust voiced speech detection feature. The developed method is based on the fusion of two systems. The first system utilises the maximum peak of the normalised time-domain autocorrelation function (MaxPeak). The second zone system uses a novel combination of cross-correlation and zero-crossing rate of the normalised autocorrelation to approximate a measure of signal pitch and periodicity (CrossCorr) that is hypothesised to be noise robust. The score outputs by the two systems are then merged using weighted sum fusion to create the proposed autocorrelation zero-crossing rate (AZR) VAD. Accuracy of AZR was compared to state of the art and standardised VAD methods and was shown to outperform the best performing system with an average relative improvement of 24.8% in half-total error rate (HTER) on the QUT-NOISE-TIMIT database created using real recordings from high-noise environments.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

This paper proposes the use of the Bayes Factor as a distance metric for speaker segmentation within a speaker diarization system. The proposed approach uses a pair of constant sized, sliding windows to compute the value of the Bayes Factor between the adjacent windows over the entire audio. Results obtained on the 2002 Rich Transcription Evaluation dataset show an improved segmentation performance compared to previous approaches reported in literature using the Generalized Likelihood Ratio. When applied in a speaker diarization system, this approach results in a 5.1% relative improvement in the overall Diarization Error Rate compared to the baseline.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

This paper proposes the use of eigenvoice modeling techniques with the Cross Likelihood Ratio (CLR) as a criterion for speaker clustering within a speaker diarization system. The CLR has previously been shown to be a robust decision criterion for speaker clustering using Gaussian Mixture Models. Recently, eigenvoice modeling techniques have become increasingly popular, due to its ability to adequately represent a speaker based on sparse training data, as well as an improved capture of differences in speaker characteristics. This paper hence proposes that it would be beneficial to capitalize on the advantages of eigenvoice modeling in a CLR framework. Results obtained on the 2002 Rich Transcription (RT-02) Evaluation dataset show an improved clustering performance, resulting in a 35.1% relative improvement in the overall Diarization Error Rate (DER) compared to the baseline system.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

Statistical dependence between classifier decisions is often shown to improve performance over statistically independent decisions. Though the solution for favourable dependence between two classifier decisions has been derived, the theoretical analysis for the general case of 'n' client and impostor decision fusion has not been presented before. This paper presents the expressions developed for favourable dependence of multi-instance and multi-sample fusion schemes that employ 'AND' and 'OR' rules. The expressions are experimentally evaluated by considering the proposed architecture for text-dependent speaker verification using HMM based digit dependent speaker models. The improvement in fusion performance is found to be higher when digit combinations with favourable client and impostor decisions are used for speaker verification. The total error rate of 20% for fusion of independent decisions is reduced to 2.1% for fusion of decisions that are favourable for both client and impostors. The expressions developed here are also applicable to other biometric modalities, such as finger prints and handwriting samples, for reliable identity verification.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

This paper proposes the use of Bayesian approaches with the cross likelihood ratio (CLR) as a criterion for speaker clustering within a speaker diarization system, using eigenvoice modeling techniques. The CLR has previously been shown to be an effective decision criterion for speaker clustering using Gaussian mixture models. Recently, eigenvoice modeling has become an increasingly popular technique, due to its ability to adequately represent a speaker based on sparse training data, as well as to provide an improved capture of differences in speaker characteristics. The integration of eigenvoice modeling into the CLR framework to capitalize on the advantage of both techniques has also been shown to be beneficial for the speaker clustering task. Building on that success, this paper proposes the use of Bayesian methods to compute the conditional probabilities in computing the CLR, thus effectively combining the eigenvoice-CLR framework with the advantages of a Bayesian approach to the diarization problem. Results obtained on the 2002 Rich Transcription (RT-02) Evaluation dataset show an improved clustering performance, resulting in a 33.5% relative improvement in the overall Diarization Error Rate (DER) compared to the baseline system.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

Speaker diarization determines instances of the same speaker within a recording. Extending this task to a collection of recordings for linking together segments spoken by a unique speaker requires speaker linking. In this paper we propose a speaker linking system using linkage clustering and state-of-the-art speaker recognition techniques. We evaluate our approach against two baseline linking systems using agglomerative cluster merging (AC) and agglomerative clustering with model retraining (ACR). We demonstrate that our linking method, using complete-linkage clustering, provides a relative improvement of 20% and 29% in attribution error rate (AER), over the AC and ACR systems, respectively.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

In this paper we propose and evaluate a speaker attribution system using a complete-linkage clustering method. Speaker attribution refers to the annotation of a collection of spoken audio based on speaker identities. This can be achieved using diarization and speaker linking. The main challenge associated with attribution is achieving computational efficiency when dealing with large audio archives. Traditional agglomerative clustering methods with model merging and retraining are not feasible for this purpose. This has motivated the use of linkage clustering methods without retraining. We first propose a diarization system using complete-linkage clustering and show that it outperforms traditional agglomerative and single-linkage clustering based diarization systems with a relative improvement of 40% and 68%, respectively. We then propose a complete-linkage speaker linking system to achieve attribution and demonstrate a 26% relative improvement in attribution error rate (AER) over the single-linkage speaker linking approach.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

Speaker diarization is the process of annotating an input audio with information that attributes temporal regions of the audio signal to their respective sources, which may include both speech and non-speech events. For speech regions, the diarization system also specifies the locations of speaker boundaries and assign relative speaker labels to each homogeneous segment of speech. In short, speaker diarization systems effectively answer the question of ‘who spoke when’. There are several important applications for speaker diarization technology, such as facilitating speaker indexing systems to allow users to directly access the relevant segments of interest within a given audio, and assisting with other downstream processes such as summarizing and parsing. When combined with automatic speech recognition (ASR) systems, the metadata extracted from a speaker diarization system can provide complementary information for ASR transcripts including the location of speaker turns and relative speaker segment labels, making the transcripts more readable. Speaker diarization output can also be used to localize the instances of specific speakers to pool data for model adaptation, which in turn boosts transcription accuracies. Speaker diarization therefore plays an important role as a preliminary step in automatic transcription of audio data. The aim of this work is to improve the usefulness and practicality of speaker diarization technology, through the reduction of diarization error rates. In particular, this research is focused on the segmentation and clustering stages within a diarization system. Although particular emphasis is placed on the broadcast news audio domain and systems developed throughout this work are also trained and tested on broadcast news data, the techniques proposed in this dissertation are also applicable to other domains including telephone conversations and meetings audio. Three main research themes were pursued: heuristic rules for speaker segmentation, modelling uncertainty in speaker model estimates, and modelling uncertainty in eigenvoice speaker modelling. The use of heuristic approaches for the speaker segmentation task was first investigated, with emphasis placed on minimizing missed boundary detections. A set of heuristic rules was proposed, to govern the detection and heuristic selection of candidate speaker segment boundaries. A second pass, using the same heuristic algorithm with a smaller window, was also proposed with the aim of improving detection of boundaries around short speaker segments. Compared to single threshold based methods, the proposed heuristic approach was shown to provide improved segmentation performance, leading to a reduction in the overall diarization error rate. Methods to model the uncertainty in speaker model estimates were developed, to address the difficulties associated with making segmentation and clustering decisions with limited data in the speaker segments. The Bayes factor, derived specifically for multivariate Gaussian speaker modelling, was introduced to account for the uncertainty of the speaker model estimates. The use of the Bayes factor also enabled the incorporation of prior information regarding the audio to aid segmentation and clustering decisions. The idea of modelling uncertainty in speaker model estimates was also extended to the eigenvoice speaker modelling framework for the speaker clustering task. Building on the application of Bayesian approaches to the speaker diarization problem, the proposed approach takes into account the uncertainty associated with the explicit estimation of the speaker factors. The proposed decision criteria, based on Bayesian theory, was shown to generally outperform their non- Bayesian counterparts.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

Neutrophils serve as an intriguing model for the study of innate immune cellular activity induced by physiological stress. We measured changes in the transcriptome of circulating neutrophils following an experimental exercise trial (EXTRI) consisting of 1 h of intense cycling immediately followed by 1 h of intense running. Blood samples were taken at baseline, 3 h, 48 h, and 96 h post-EXTRI from eight healthy, endurance-trained, male subjects. RNA was extracted from isolated neutrophils. Differential gene expression was evaluated using Illumina microarrays and validated with quantitative PCR. Gene set enrichment analysis identified enriched molecular signatures chosen from the Molecular Signatures Database. Blood concentrations of muscle damage indexes, neutrophils, interleukin (IL)-6 and IL-10 were increased (P < 0.05) 3 h post-EXTRI. Upregulated groups of functionally related genes 3 h post-EXTRI included gene sets associated with the recognition of tissue damage, the IL-1 receptor, and Toll-like receptor (TLR) pathways (familywise error rate, P value < 0.05). The core enrichment for these pathways included TLRs, low-affinity immunoglobulin receptors, S100 calcium binding protein A12, and negative regulators of innate immunity, e.g., IL-1 receptor antagonist, and IL-1 receptor associated kinase-3. Plasma myoglobin changes correlated with neutrophil TLR4 gene expression (r = 0.74; P < 0.05). Neutrophils had returned to their nonactivated state 48 h post-EXTRI, indicating that their initial proinflammatory response was transient and rapidly counterregulated. This study provides novel insight into the signaling mechanisms underlying the neutrophil responses to endurance exercise, suggesting that their transcriptional activity was particularly induced by damage-associated molecule patterns, hypothetically originating from the leakage of muscle components into the circulation.