926 results for Advanced signal processing


Relevance:

80.00% 80.00%

Publisher:

Abstract:

This paper proposes the addition of a weighted median Fisher discriminator (WMFD) projection prior to length-normalised Gaussian probabilistic linear discriminant analysis (GPLDA) modelling in order to compensate for additional session variation. In limited microphone data conditions, a linear-weighted approach is introduced to increase the influence of the microphone speech dataset. The linear-weighted WMFD-projected GPLDA system shows improvements in EER and DCF values over the pooled LDA- and WMFD-projected GPLDA systems in the interview-interview condition, as the WMFD projection extracts more speaker-discriminant information with a limited number of sessions/speaker data, and the linear-weighted GPLDA approach estimates reliable model parameters with limited microphone data.

Relevance:

80.00% 80.00%

Publisher:

Abstract:

We present a clustering-only approach to the problem of speaker diarization to eliminate the need for the commonly employed and computationally expensive Viterbi segmentation and realignment stage. We use multiple linear segmentations of a recording and carry out complete-linkage clustering within each segmentation scenario to obtain a set of clustering decisions for each case. We then collect all clustering decisions, across all cases, to compute a pairwise vote between the segments and conduct complete-linkage clustering to cluster them at a resolution equal to the minimum segment length used in the linear segmentations. We use our proposed cluster-voting approach to carry out speaker diarization and linking across the SAIVT-BNEWS corpus of Australian broadcast news data. We compare our technique to an equivalent baseline system with Viterbi realignment and show that our approach can outperform the baseline technique with respect to the diarization error rate (DER) and attribution error rate (AER).
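The pairwise voting step described above can be illustrated with a minimal sketch. It assumes each segmentation scenario has already been clustered and mapped onto the same base segments; the function name `pairwise_votes` and the toy labelings are hypothetical and not taken from the paper. The final complete-linkage clustering over the vote matrix is omitted.

```python
def pairwise_votes(labelings):
    """Fraction of clustering decisions that place segments i and j together.

    `labelings` is a list of label sequences, one per segmentation scenario,
    all defined over the same base segments (an assumption of this sketch).
    """
    n = len(labelings[0])
    votes = [[0.0] * n for _ in range(n)]
    for labels in labelings:
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    votes[i][j] += 1.0
    return [[v / len(labelings) for v in row] for row in votes]


# Three hypothetical clustering decisions over four base segments.
labelings = [[0, 0, 1, 1], [0, 0, 0, 1], [0, 1, 1, 1]]
votes = pairwise_votes(labelings)
```

A high vote between two segments suggests they belong to the same speaker; one minus the vote could serve as the distance passed to a linkage clustering routine.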

Relevance:

80.00% 80.00%

Publisher:

Abstract:

The QUT-NOISE-SRE protocol is designed to mix the large QUT-NOISE database, consisting of over 10 hours of background noise, collected across 10 unique locations covering 5 common noise scenarios, with commonly used speaker recognition datasets such as Switchboard, Mixer and the speaker recognition evaluation (SRE) datasets provided by NIST. By allowing common, clean, speech corpora to be mixed with a wide variety of noise conditions, environmental reverberant responses, and signal-to-noise ratios, this protocol provides a solid basis for the development, evaluation and benchmarking of robust speaker recognition algorithms, and is freely available to download alongside the QUT-NOISE database. In this work, we use the QUT-NOISE-SRE protocol to evaluate a state-of-the-art PLDA i-vector speaker recognition system, demonstrating the importance of designing voice-activity-detection front-ends specifically for speaker recognition, rather than aiming for perfect coherence with the true speech/non-speech boundaries.

Relevance:

80.00% 80.00%

Publisher:

Abstract:

Meta-analyses estimate a statistical effect size for a test or an analysis by combining results from multiple studies without necessarily having access to each individual study's raw data. Multi-site meta-analysis is crucial for imaging genetics, as single sites rarely have a sample size large enough to pick up effects of single genetic variants associated with brain measures. However, if raw data can be shared, combining data in a "mega-analysis" is thought to improve power and precision in estimating global effects. As part of an ENIGMA-DTI investigation, we use fractional anisotropy (FA) maps from 5 studies (total N = 2,203 subjects, aged 9-85) to estimate heritability. We combine the studies through meta- and mega-analyses as well as a mixture of the two: combining some cohorts with mega-analysis and meta-analyzing the results with those of the remaining sites. A combination of mega- and meta-approaches may boost power compared to meta-analysis alone.
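As a rough illustration of how a meta-analysis pools per-site results without raw data, the sketch below uses standard fixed-effect inverse-variance weighting. The abstract does not state which weighting scheme the study used, so this choice is an assumption, and the per-site numbers are invented purely for illustration.

```python
def fixed_effect_meta(estimates, std_errors):
    """Pool per-site estimates with inverse-variance weights (fixed-effect model)."""
    weights = [1.0 / se ** 2 for se in std_errors]
    total = sum(weights)
    pooled = sum(w * e for w, e in zip(weights, estimates)) / total
    pooled_se = (1.0 / total) ** 0.5
    return pooled, pooled_se


# Hypothetical per-site heritability estimates and standard errors (not real data).
site_estimates = [0.5, 0.7]
site_std_errors = [0.1, 0.1]
pooled, pooled_se = fixed_effect_meta(site_estimates, site_std_errors)
```

With equal standard errors the pooled estimate is the plain average, and the pooled standard error shrinks by a factor of the square root of the number of sites, which is the gain in precision that motivates multi-site pooling.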

Relevance:

80.00% 80.00%

Publisher:

Abstract:

Bioacoustic monitoring has become a significant research topic for species diversity conservation. Due to the development of sensing techniques, acoustic sensors are widely deployed in the field to record animal sounds over a large spatial and temporal scale. With large volumes of collected audio data, it is essential to develop semi-automatic or automatic techniques to analyse the data. This can help ecologists make decisions on how to protect and promote species diversity. This paper presents generic features to characterize a range of bird species for vocalisation retrieval. In the implementation, audio recordings are first converted to spectrograms using the short-time Fourier transform, then a ridge detection method is applied to the spectrogram to detect points of interest. Based on the detected points, a new region representation is explored for describing various bird vocalisations, and a local descriptor including temporal entropy, frequency bin entropy and a histogram of counts of four ridge directions is calculated for each sub-region. To speed up the retrieval process, indexing is carried out and the retrieved results are ranked according to similarity scores. The experimental results show that our proposed feature set can achieve 0.71 in terms of retrieval success rate, which outperforms spectral ridge features alone (0.55) and Mel frequency cepstral coefficients (0.36).

Relevance:

80.00% 80.00%

Publisher:

Abstract:

Purpose: To develop a signal processing paradigm for extracting ERG responses to temporal sinusoidal modulation with contrasts ranging from below perceptual threshold to suprathreshold contrasts, and to estimate the magnitude of intrinsic noise in ERG signals at different stimulus contrasts. Methods: Photopic test stimuli were generated using a 4-primary Maxwellian view optical system. The 4-primary lights were sinusoidally temporally modulated in-phase (36 Hz; 2.5 - 50% Michelson). The stimuli were presented in 1 s epochs separated by a 1 ms blank interval and repeated 160 times (160.16 s duration) during the recording of the continuous flicker ERG from the right eye using DTL fiber electrodes. After artefact rejection, the ERG signal was extracted using Fourier methods in each of the 1 s epochs where a stimulus was presented. The signal processing allows for computation of the intrinsic noise distribution in addition to the signal-to-noise ratio (SNR). Results: We provide the initial report that the ERG intrinsic noise distribution is independent of stimulus contrast, whereas SNR decreases linearly with decreasing contrast until the noise limit at ~2.5%. The 1 ms blank intervals between epochs de-correlated the ERG signal at the line frequency (50 Hz) and thus increased the SNR of the averaged response. We confirm that response amplitude increases linearly with stimulus contrast. The phase response shows a shallow positive relationship with stimulus contrast. Conclusions: This new technique will enable recording of intrinsic noise in ERG signals above and below perceptual visual threshold and is suitable for measurement of continuous rod and cone ERGs across a range of temporal frequencies, and post-receptoral processing in the primary retinogeniculate pathways at low stimulus contrasts. The intrinsic noise distribution may have application as a biomarker for detecting changes in disease progression or treatment efficacy.
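The per-epoch Fourier extraction can be sketched as a single-bin discrete Fourier transform evaluated at the 36 Hz stimulus frequency. The sampling rate of 1000 Hz, the synthetic epoch and the function name `amplitude_at` are assumptions made for this illustration; the abstract does not give the recording parameters or the exact estimator.

```python
import cmath
import math


def amplitude_at(epoch, fs, freq):
    """Amplitude of the sinusoidal component at `freq` Hz (single-bin DFT)."""
    n = len(epoch)
    acc = sum(x * cmath.exp(-2j * math.pi * freq * k / fs)
              for k, x in enumerate(epoch))
    return 2.0 * abs(acc) / n


fs = 1000  # assumed sampling rate in Hz (not stated in the abstract)
# Synthetic 1 s epoch: a 36 Hz sinusoid with amplitude 3.0 (arbitrary units).
epoch = [3.0 * math.sin(2 * math.pi * 36 * k / fs) for k in range(fs)]

response_amp = amplitude_at(epoch, fs, 36.0)  # stimulus frequency
off_target = amplitude_at(epoch, fs, 35.0)    # neighbouring frequency
```

Because a 1 s epoch holds a whole number of 36 Hz cycles, the stimulus frequency falls exactly on a DFT bin and none of its energy leaks into the neighbouring 35 Hz bin.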

Relevance:

80.00% 80.00%

Publisher:

Abstract:

This paper presents time dispersion parameters obtained from a set of channel measurements conducted in various environments that are typical of multiuser Infostation application scenarios. The measurement procedure takes into account practical scenarios typical of the positions and movements of the users in the particular Infostation network. To show how much data users can download over a given time and at a given mobile speed, a data transfer analysis for multiband orthogonal frequency division multiplexing (MB-OFDM) is presented. As expected, the rough estimate of simultaneous data transfer in a multiuser Infostation scenario indicates that the percentage of download depends on the data size, the number and speed of the users, and the elapsed time.

Relevance:

80.00% 80.00%

Publisher:

Abstract:

Rolling-element bearing failures are the most frequent problems in rotating machinery; they can be catastrophic and cause major downtime. Hence, providing advance failure warning and precise fault detection in such components is pivotal and cost-effective. The vast majority of past research has focused on signal processing and spectral analysis for fault diagnostics in rotating components. In this study, a data mining approach using a machine learning technique called anomaly detection (AD) is presented. This method employs classification techniques to discriminate between defect examples. Two features, kurtosis and Non-Gaussianity Score (NGS), are extracted to develop anomaly detection algorithms. The performance of the developed algorithms was examined using real data from a bearing test-to-failure experiment. Finally, anomaly detection is compared with a popular method, the Support Vector Machine (SVM), to investigate the sensitivity and accuracy of this approach and its ability to detect anomalies at early stages.
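Of the two features named above, kurtosis is straightforward to sketch: it is the fourth central moment divided by the squared variance, and it rises when a vibration signal contains sharp impulses. The synthetic signals and the function name `kurtosis` below are illustrative assumptions; the Non-Gaussianity Score is not defined in the abstract and is therefore not shown.

```python
import math


def kurtosis(signal):
    """Non-excess kurtosis: fourth central moment over squared variance."""
    n = len(signal)
    mean = sum(signal) / n
    m2 = sum((x - mean) ** 2 for x in signal) / n
    m4 = sum((x - mean) ** 4 for x in signal) / n
    return m4 / m2 ** 2


n = 1000
# Hypothetical "healthy" vibration: a pure sinusoid over whole periods.
healthy = [math.sin(2 * math.pi * 10 * k / n) for k in range(n)]
# Hypothetical "faulty" vibration: the same sinusoid plus periodic impulses.
faulty = [x + (5.0 if k % 100 == 0 else 0.0) for k, x in enumerate(healthy)]

k_healthy = kurtosis(healthy)
k_faulty = kurtosis(faulty)
```

A pure sinusoid has a kurtosis of 1.5, while the impulsive signal scores much higher, which is why kurtosis is a common indicator of early bearing defects.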

Relevance:

80.00% 80.00%

Publisher:

Abstract:

This paper presents a system to analyze long field recordings with low signal-to-noise ratio (SNR) for bio-acoustic monitoring. A method based on spectral peak track, Shannon entropy, harmonic structure and oscillation structure is proposed to automatically detect anuran (frog) calling activity. A Gaussian mixture model (GMM) is introduced to model those features. Four anuran species widespread in Queensland, Australia, are selected to evaluate the proposed system. A visualization method based on extracted indices is employed for detection of anuran calling activity and achieves high accuracy.
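One of the listed features, Shannon entropy, can be sketched for a single spectral frame as follows. Treating normalised bin energies as a probability distribution is a common convention assumed here, not a detail confirmed by the abstract, and the example frames are invented.

```python
import math


def shannon_entropy(energies):
    """Shannon entropy (in bits) of a frame's normalised energy distribution."""
    total = sum(energies)
    probs = [e / total for e in energies if e > 0]
    return -sum(p * math.log2(p) for p in probs)


# Hypothetical frames: flat noise-like spectrum versus a single tonal peak.
flat_frame = [1.0] * 8
tonal_frame = [0.0, 0.0, 4.0, 0.0, 0.0, 0.0, 0.0, 0.0]

h_flat = shannon_entropy(flat_frame)
h_tonal = shannon_entropy(tonal_frame)
```

Noise-like frames spread energy evenly and score high, while frames dominated by a narrow-band call score near zero, which makes the measure useful for flagging calling activity in noisy recordings.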

Relevance:

80.00% 80.00%

Publisher:

Abstract:

Acoustic classification of anurans (frogs) has received increasing attention for its promising application in biological and environmental studies. In this study, a novel feature extraction method for frog call classification is presented based on the analysis of spectrograms. The frog calls are first automatically segmented into syllables. Then, spectral peak tracks are extracted to separate the desired signal (frog calls) from background noise. The spectral peak tracks are used to extract various syllable features, including syllable duration, dominant frequency, oscillation rate, frequency modulation, and energy modulation. Finally, a k-nearest neighbor classifier is used for classifying frog calls based on the results of principal component analysis. The experimental results show that syllable features can achieve an average classification accuracy of 90.5%, which outperforms Mel-frequency cepstral coefficient features (79.0%).

Relevance:

80.00% 80.00%

Publisher:

Abstract:

Over the past few decades, frog species have been experiencing dramatic decline around the world. The reasons for this decline include habitat loss, invasive species and climate change. To better understand the status of frog species, classifying frogs has become increasingly important. In this study, acoustic features are investigated for multi-level classification of Australian frogs (family, genus and species), covering three families, eleven genera and eighty-five species collected from Queensland, Australia. For each frog species, six instances are selected, from which ten acoustic features are calculated. Then, the multicollinearity between the ten features is studied in order to select non-correlated features for subsequent analysis. A decision tree (DT) classifier is used to visually and explicitly determine which acoustic features are relatively important for classifying family, which for genus, and which for species. Finally, a weighted support vector machine (SVM) classifier is used for the multi-level classification, with the three most important acoustic features at each level. Our experimental results indicate that using different acoustic feature sets can successfully classify frogs at different levels, and the average classification accuracy can be up to 85.6%, 86.1% and 56.2% for family, genus and species respectively.

Relevance:

80.00% 80.00%

Publisher:

Abstract:

In this paper, we first recast the generalized symmetric eigenvalue problem, where the underlying matrix pencil consists of symmetric positive definite matrices, into an unconstrained minimization problem by constructing an appropriate cost function. We then extend it to the case of multiple eigenvectors using an inflation technique. Based on this asymptotic formulation, we derive a quasi-Newton-based adaptive algorithm for estimating the required generalized eigenvectors in the data case. The resulting algorithm is modular and parallel, and it is globally convergent with probability one. We also analyze the effect of inexact inflation on the convergence of this algorithm and that of inexact knowledge of one of the matrices (in the pencil) on the resulting eigenstructure. Simulation results demonstrate that the performance of this algorithm is almost identical to that of the rank-one updating algorithm of Karasalo. Further, the performance of the proposed algorithm has been found to remain stable even over 1 million updates without suffering from any error accumulation problems.

Relevance:

80.00% 80.00%

Publisher:

Abstract:

With the availability of a huge amount of video data from various sources, efficient video retrieval tools are increasingly in demand. Because video is multi-modal data, the perception of "relevance" between the user-provided query video (in the case of Query-By-Example video search) and the retrieved video clips is subjective in nature. We present an efficient video retrieval method that takes the user's feedback on the relevance of retrieved videos and iteratively reformulates the input query feature vectors (QFV) for improved video retrieval. The QFV reformulation is done by a simple but powerful feature weight optimization method based on the Simultaneous Perturbation Stochastic Approximation (SPSA) technique. A video retrieval system with video indexing, searching and relevance feedback (RF) phases is built to demonstrate the performance of the proposed method. The query and database videos are indexed using conventional video features such as color and texture. However, we use comprehensive and novel methods of feature representation, and a spatio-temporal distance measure, to retrieve the top M videos that are similar to the query. In the feedback phase, the user's iterative feedback on the previously retrieved videos is used to automatically reformulate the QFV weights (measures of importance) so that they reflect the user's preference. We observe that a few iterations of such feedback are generally sufficient for retrieving the desired video clips. The novel application of SPSA-based RF for user-oriented feature weight optimization distinguishes the proposed method from existing ones. The experimental results show that the proposed RF-based video retrieval exhibits good performance.

Relevance:

80.00% 80.00%

Publisher:

Abstract:

A defect-selective photothermal imaging system for the diagnostics of optical coatings is demonstrated. The instrument has been optimized for pump and probe parameters, detector performance, and signal processing algorithm. The imager is capable of mapping purely optical or thermal defects efficiently in coatings of low damage threshold and low absorbance. Detailed mapping of minor inhomogeneities at low pump power has been achieved through the simultaneous action of a low-noise fiber optic photothermal beam deflection sensor and a common-mode-rejection demodulation (CMRD) technique. The linearity and sensitivity of the sensor have been examined theoretically and experimentally, and the signal-to-noise ratio improvement factor is found to be about 110 compared to a conventional bicell photodiode. The scanner is designed so that mapping of static or shock-sensitive samples is possible. In the case of a sample with absolute absorptance of 3.8 × 10^-4, a change in absorptance of about 0.005 × 10^-4 has been detected without ambiguity, ensuring a contrast parameter of 760. This is about a 1085% improvement over the conventional approach containing a bicell photodiode, at the same pump power. The merits of the system have been demonstrated by mapping two intentionally created damage sites in a MgF2 coating on fused silica at different excitation powers. Amplitude and phase maps were recorded for thermally thin and thick cases, and the results are compared to demonstrate a case which, in conventional imaging, would lead to a deceptive conclusion regarding the type and location of the damage. Also, a residual damage profile created by long-term irradiation with high pump power density has been depicted.

Relevance:

80.00% 80.00%

Publisher:

Abstract:

In this paper, expressions for the convolution-multiplication properties of DCT IV and DST IV are derived starting from equivalent DFT representations. Using these expressions, methods for implementing linear filtering through block convolution in the DCT IV and DST IV domains are proposed. Techniques developed for DCT IV and DST IV are further extended to MDCT and MDST, where the filter implementation is near exact for symmetric filters and approximate for non-symmetric filters. No additional overlapping is required for implementing symmetric filtering in the MDCT domain, and hence the proposed algorithm is computationally competitive with DFT-based systems. Moreover, the inherent 50% overlap between adjacent frames used in the MDCT/MDST domain reduces the blocking artifacts due to block processing or quantization. The techniques are computationally efficient for symmetric filters and provide a new alternative to DFT-based convolution.
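For readers unfamiliar with the transform itself, the sketch below evaluates the orthonormal DCT IV directly from its textbook definition and checks that applying it twice returns the input. It illustrates only the transform, not the convolution-multiplication property or the filtering scheme derived in the paper; the function name `dct_iv` and the test vector are hypothetical.

```python
import math


def dct_iv(x):
    """Orthonormal DCT-IV computed directly from its definition, O(N^2)."""
    n = len(x)
    scale = math.sqrt(2.0 / n)
    return [
        scale * sum(x[j] * math.cos(math.pi / n * (j + 0.5) * (k + 0.5))
                    for j in range(n))
        for k in range(n)
    ]


x = [1.0, 2.0, -1.0, 0.5]
coeffs = dct_iv(x)
# With this scaling the DCT-IV matrix is symmetric and orthogonal,
# so the transform is its own inverse.
roundtrip = dct_iv(coeffs)
```

The self-inverse property means one routine can serve as both the forward and the inverse transform in a block-filtering pipeline.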