960 resultados para voice activity detection


Relevância:

100.00% 100.00%

Publicador:

Resumo:

Acoustic parameters are frequently used to assess the presence of pathologies in human voice. Many of them have demonstrated to be useful but in some cases its results could be optimized by selecting appropriate working margins. In this study two indices, CIL and RALA, obtained from Modulation Spectra are described and tuned using different frame lengths and frequency ranges to maximize AUC in normal to pathological voice detection. After the tuning process, AUC reaches 0.96 and 0.95 values for CIL and RALA respectively representing an improvement of 16 % and 12 % at each case respect to the typical tuning based only on frame length selection.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

The QUT-NOISE-SRE protocol is designed to mix the large QUT-NOISE database, consisting of over 10 hours of back- ground noise, collected across 10 unique locations covering 5 common noise scenarios, with commonly used speaker recognition datasets such as Switchboard, Mixer and the speaker recognition evaluation (SRE) datasets provided by NIST. By allowing common, clean, speech corpora to be mixed with a wide variety of noise conditions, environmental reverberant responses, and signal-to-noise ratios, this protocol provides a solid basis for the development, evaluation and benchmarking of robust speaker recognition algorithms, and is freely available to download alongside the QUT-NOISE database. In this work, we use the QUT-NOISE-SRE protocol to evaluate a state-of-the-art PLDA i-vector speaker recognition system, demonstrating the importance of designing voice-activity-detection front-ends specifically for speaker recognition, rather than aiming for perfect coherence with the true speech/non-speech boundaries.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Spoken dialogue systems provide a convenient way for users to interact with a machine using only speech. However, they often rely on a rigid turn taking regime in which a voice activity detection (VAD) module is used to determine when the user is speaking and decide when is an appropriate time for the system to respond. This paper investigates replacing the VAD and discrete utterance recogniser of a conventional turn-taking system with a continuously operating recogniser that is always listening, and using the recogniser 1-best path to guide turn taking. In this way, a flexible framework for incremental dialogue management is possible. Experimental results show that it is possible to remove the VAD component and successfully use the recogniser best path to identify user speech, with more robustness to noise, potentially smaller latency times, and a reduction in overall recognition error rate compared to using the conventional approach. © 2013 IEEE.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Este trabajo de Tesis ha abordado el objetivo de dar robustez y mejorar la Detección de Actividad de Voz en entornos acústicos adversos con el fin de favorecer el comportamiento de muchas aplicaciones vocales, por ejemplo aplicaciones de telefonía basadas en reconocimiento automático de voz, aplicaciones en sistemas de transcripción automática, aplicaciones en sistemas multicanal, etc. En especial, aunque se han tenido en cuenta todos los tipos de ruido, se muestra especial interés en el estudio de las voces de fondo, principal fuente de error de la mayoría de los Detectores de Actividad en la actualidad. Las tareas llevadas a cabo poseen como punto de partida un Detector de Actividad basado en Modelos Ocultos de Markov, cuyo vector de características contiene dos componentes: la energía normalizada y la variación de la energía. Las aportaciones fundamentales de esta Tesis son las siguientes: 1) ampliación del vector de características de partida dotándole así de información espectral, 2) ajuste de los Modelos Ocultos de Markov al entorno y estudio de diferentes topologías y, finalmente, 3) estudio e inclusión de nuevas características, distintas de las del punto 1, para filtrar los pulsos de pronunciaciones que proceden de las voces de fondo. Los resultados de detección, teniendo en cuenta los tres puntos anteriores, muestran con creces los avances realizados y son significativamente mejores que los resultados obtenidos, bajo las mismas condiciones, con otros detectores de actividad de referencia. This work has been focused on improving the robustness at Voice Activity Detection in adverse acoustic environments in order to enhance the behavior of many vocal applications, for example telephony applications based on automatic speech recognition, automatic transcription applications, multichannel systems applications, and so on. In particular, though all types of noise have taken into account, this research has special interest in the study of pronunciations coming from far-field speakers, the main error source of most activity detectors today. The tasks carried out have, as starting point, a Hidden Markov Models Voice Activity Detector which a feature vector containing two components: normalized energy and delta energy. The key points of this Thesis are the following: 1) feature vector extension providing spectral information, 2) Hidden Markov Models adjustment to environment and study of different Hidden Markov Model topologies and, finally, 3) study and inclusion of new features, different from point 1, to reject the pronunciations coming from far-field speakers. Detection results, taking into account the above three points, show the advantages of using this method and are significantly better than the results obtained under the same conditions by other well-known voice activity detectors.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

This paper presents a system to analyze long field recordings with low signal-to-noise ratio (SNR) for bio-acoustic monitoring. A method based on spectral peak track, Shannon entropy, harmonic structure and oscillation structure is proposed to automatically detect anuran (frog) calling activity. Gaussian mixture model (GMM) is introduced for modelling those features. Four anuran species widespread in Queensland, Australia, are selected to evaluate the proposed system. A visualization method based on extracted indices is employed for detection of anuran calling activity which achieves high accuracy.

Relevância:

90.00% 90.00%

Publicador:

Resumo:

An acetylcholinesterase (AChE) activity detection system was fabricated based on the electrocatalysis of cobalt(II) tetraphenylporphyrin of the electrooxidation of thiocholine chloride, which is the product of the hydrolysis of acetylthiocholine chloride by AChE. A simple modified method was used to form the base electrode. AChE was cross-linked on the base electrode by glutaraldehyde. The optimum working conditions are discussed and the characteristics of the detection system are evaluated.

Relevância:

90.00% 90.00%

Publicador:

Resumo:

The electroencephalogram (EEG) is a medical technology that is used in the monitoring of the brain and in the diagnosis of many neurological illnesses. Although coarse in its precision, the EEG is a non-invasive tool that requires minimal set-up times, and is suitably unobtrusive and mobile to allow continuous monitoring of the patient, either in clinical or domestic environments. Consequently, the EEG is the current tool-of-choice with which to continuously monitor the brain where temporal resolution, ease-of- use and mobility are important. Traditionally, EEG data are examined by a trained clinician who identifies neurological events of interest. However, recent advances in signal processing and machine learning techniques have allowed the automated detection of neurological events for many medical applications. In doing so, the burden of work on the clinician has been significantly reduced, improving the response time to illness, and allowing the relevant medical treatment to be administered within minutes rather than hours. However, as typical EEG signals are of the order of microvolts (μV ), contamination by signals arising from sources other than the brain is frequent. These extra-cerebral sources, known as artefacts, can significantly distort the EEG signal, making its interpretation difficult, and can dramatically disimprove automatic neurological event detection classification performance. This thesis therefore, contributes to the further improvement of auto- mated neurological event detection systems, by identifying some of the major obstacles in deploying these EEG systems in ambulatory and clinical environments so that the EEG technologies can emerge from the laboratory towards real-world settings, where they can have a real-impact on the lives of patients. In this context, the thesis tackles three major problems in EEG monitoring, namely: (i) the problem of head-movement artefacts in ambulatory EEG, (ii) the high numbers of false detections in state-of-the-art, automated, epileptiform activity detection systems and (iii) false detections in state-of-the-art, automated neonatal seizure detection systems. To accomplish this, the thesis employs a wide range of statistical, signal processing and machine learning techniques drawn from mathematics, engineering and computer science. The first body of work outlined in this thesis proposes a system to automatically detect head-movement artefacts in ambulatory EEG and utilises supervised machine learning classifiers to do so. The resulting head-movement artefact detection system is the first of its kind and offers accurate detection of head-movement artefacts in ambulatory EEG. Subsequently, addtional physiological signals, in the form of gyroscopes, are used to detect head-movements and in doing so, bring additional information to the head- movement artefact detection task. A framework for combining EEG and gyroscope signals is then developed, offering improved head-movement arte- fact detection. The artefact detection methods developed for ambulatory EEG are subsequently adapted for use in an automated epileptiform activity detection system. Information from support vector machines classifiers used to detect epileptiform activity is fused with information from artefact-specific detection classifiers in order to significantly reduce the number of false detections in the epileptiform activity detection system. By this means, epileptiform activity detection which compares favourably with other state-of-the-art systems is achieved. Finally, the problem of false detections in automated neonatal seizure detection is approached in an alternative manner; blind source separation techniques, complimented with information from additional physiological signals are used to remove respiration artefact from the EEG. In utilising these methods, some encouraging advances have been made in detecting and removing respiration artefacts from the neonatal EEG, and in doing so, the performance of the underlying diagnostic technology is improved, bringing its deployment in the real-world, clinical domain one step closer.

Relevância:

90.00% 90.00%

Publicador:

Resumo:

This paper addresses the issue of activity understanding from video and its semantics-rich description. A novel approach is presented where activities are characterised and analysed at different resolutions. Semantic information is delivered according to the resolution at which the activity is observed. Furthermore, the multiresolution activity characterisation is exploited to detect abnormal activity. To achieve these system capabilities, the focus is given on context modelling by employing a soft computing-based algorithm which automatically enables the determination of the main activity zones of the observed scene by taking as input the trajectories of detected mobiles. Such areas are learnt at different resolutions (or granularities). In a second stage, learned zones are employed to extract people activities by relating mobile trajectories to the learned zones. In this way, the activity of a person can be summarised as the series of zones that the person has visited. Employing the inherent soft relation properties, the reported activities can be labelled with meaningful semantics. Depending on the granularity at which activity zones and mobile trajectories are considered, the semantic meaning of the activity shifts from broad interpretation to detailed description.Activity information at different resolutions is also employed to perform abnormal activity detection.

Relevância:

90.00% 90.00%

Publicador:

Resumo:

Activity recognition is an active research field nowadays, as it enables the development of highly adaptive applications, e.g. in the field of personal health. In this paper, a light high-level fusion algorithm to detect the activity that an individual is performing is presented. The algorithm relies on data gathered from accelerometers placed on different parts of the body, and on biometric sensors. Inertial sensors allow detecting activity by analyzing signal features such as amplitude or peaks. In addition, there is a relationship between the activity intensity and biometric response, which can be considered together with acceleration data to improve the accuracy of activity detection. The proposed algorithm is designed to work with minimum computational cost, being ready to run in a mobile device as part of a context-aware application. In order to enable different user scenarios, the algorithm offers best-effort activity estimation: its quality of estimation depends on the position and number of the available inertial sensors, and also on the presence of biometric information.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

Deep convolutional network models have dominated recent work in human action recognition as well as image classification. However, these methods are often unduly influenced by the image background, learning and exploiting the presence of cues in typical computer vision datasets. For unbiased robotics applications, the degree of variation and novelty in action backgrounds is far greater than in computer vision datasets. To address this challenge, we propose an “action region proposal” method that, informed by optical flow, extracts image regions likely to contain actions for input into the network both during training and testing. In a range of experiments, we demonstrate that manually segmenting the background is not enough; but through active action region proposals during training and testing, state-of-the-art or better performance can be achieved on individual spatial and temporal video components. Finally, we show by focusing attention through action region proposals, we can further improve upon the existing state-of-the-art in spatio-temporally fused action recognition performance.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

Real-Time services are traditionally supported on circuit switched network. However, there is a need to port these services on packet switched network. Architecture for audio conferencing application over the Internet in the light of ITU-T H.323 recommendations is considered. In a conference, considering packets only from a set of selected clients can reduce speech quality degradation because mixing packets from all clients can lead to lack of speech clarity. A distributed algorithm and architecture for selecting clients for mixing is suggested here based on a new quantifier of the voice activity called “Loudness Number” (LN). The proposed system distributes the computation load and reduces the load on client terminals. The highlights of this architecture are scalability, bandwidth saving and speech quality enhancement. Client selection for playing out tries to mimic a physical conference where the most vocal participants attract more attention. The contributions of the paper are expected to aid H.323 recommendations implementations for Multipoint Processors (MP). A working prototype based on the proposed architecture is already functional.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

A colorimetric and ``turn-on'' fluorescent chemosensor based on 1,9-pyrazoloanthrone specifically for cyanide and fluoride ion detection shows a remarkable solid state reaction when crystals of tetrabutylammonium cyanide and fluoride are brought in physical contact with 1,9-pyrazoloanthrone. X-ray crystal structures of 1,9-pyrazoloanthrone and complexes have been determined, and the ion sensing activity (detection limit of 0.2 and 2 ppb) has been inferred based on spectroscopic and structural features.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

Neste trabalho contemplamos o emprego de detectores de voz como uma etapa de pré- processamento de uma técnica de separação cega de sinais implementada no domínio do tempo, que emprega estatísticas de segunda ordem para a separação de misturas convolutivas e determinadas. Seu algoritmo foi adaptado para realizar a separação tanto em banda cheia quanto em sub-bandas, considerando a presença e a ausência de instantes de silêncio em misturas de sinais de voz. A ideia principal consiste em detectar trechos das misturas que contenham atividade de voz, evitando que o algoritmo de separação seja acionado na ausência de voz, promovendo ganho de desempenho e redução do custo computacional.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

Cloud computing is a technological advancementthat provide resources through internet on pay-as-you-go basis.Cloud computing uses virtualisation technology to enhance theefficiency and effectiveness of its advantages. Virtualisation isthe key to consolidate the computing resources to run multiple instances on each hardware, increasing the utilization rate of every resource, thus reduces the number of resources needed to buy, rack, power, cool, and manage. Cloud computing has very appealing features, however, lots of enterprises and users are still reluctant to move into cloud due to serious security concerns related to virtualisation layer. Thus, it is foremost important to secure the virtual environment.In this paper, we present an elastic framework to secure virtualised environment for trusted cloud computing called Server Virtualisation Security System (SVSS). SVSS provide security solutions located on hyper visor for Virtual Machines by deploying malicious activity detection techniques, network traffic analysis techniques, and system resource utilization analysis techniques.SVSS consists of four modules: Anti-Virus Control Module,Traffic Behavior Monitoring Module, Malicious Activity Detection Module and Virtualisation Security Management Module.A SVSS prototype has been deployed to validate its feasibility,efficiency and accuracy on Xen virtualised environment.