30 resultados para speaker clustering


Relevância:

20.00% 20.00%

Publicador:

Resumo:

A novel algorithm based on bimatrix game theory has been developed to improve the accuracy and reliability of a speaker diarization system. This algorithm fuses the output data of two open-source speaker diarization programs, LIUM and SHoUT, taking advantage of the best properties of each one. The performance of this new system has been tested by means of audio streams from several movies. From preliminary results on fragments of five movies, improvements of 63% in false alarms and missed speech mistakes have been achieved with respect to LIUM and SHoUT systems working alone. Moreover, we also improve in a 20% the number of recognized speakers, getting close to the real number of speakers in the audio stream

Relevância:

20.00% 20.00%

Publicador:

Resumo:

In this paper we propose a novel fast random search clustering (RSC) algorithm for mixing matrix identification in multiple input multiple output (MIMO) linear blind inverse problems with sparse inputs. The proposed approach is based on the clustering of the observations around the directions given by the columns of the mixing matrix that occurs typically for sparse inputs. Exploiting this fact, the RSC algorithm proceeds by parameterizing the mixing matrix using hyperspherical coordinates, randomly selecting candidate basis vectors (i.e. clustering directions) from the observations, and accepting or rejecting them according to a binary hypothesis test based on the Neyman–Pearson criterion. The RSC algorithm is not tailored to any specific distribution for the sources, can deal with an arbitrary number of inputs and outputs (thus solving the difficult under-determined problem), and is applicable to both instantaneous and convolutive mixtures. Extensive simulations for synthetic and real data with different number of inputs and outputs, data size, sparsity factors of the inputs and signal to noise ratios confirm the good performance of the proposed approach under moderate/high signal to noise ratios. RESUMEN. Método de separación ciega de fuentes para señales dispersas basado en la identificación de la matriz de mezcla mediante técnicas de "clustering" aleatorio.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

We present an approach to adapt dynamically the language models (LMs) used by a speech recognizer that is part of a spoken dialogue system. We have developed a grammar generation strategy that automatically adapts the LMs using the semantic information that the user provides (represented as dialogue concepts), together with the information regarding the intentions of the speaker (inferred by the dialogue manager, and represented as dialogue goals). We carry out the adaptation as a linear interpolation between a background LM, and one or more of the LMs associated to the dialogue elements (concepts or goals) addressed by the user. The interpolation weights between those models are automatically estimated on each dialogue turn, using measures such as the posterior probabilities of concepts and goals, estimated as part of the inference procedure to determine the actions to be carried out. We propose two approaches to handle the LMs related to concepts and goals. Whereas in the first one we estimate a LM for each one of them, in the second one we apply several clustering strategies to group together those elements that share some common properties, and estimate a LM for each cluster. Our evaluation shows how the system can estimate a dynamic model adapted to each dialogue turn, which helps to improve the performance of the speech recognition (up to a 14.82% of relative improvement), which leads to an improvement in both the language understanding and the dialogue management tasks.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

The study of the effectiveness of the cognitive rehabilitation processes and the identification of cognitive profiles, in order to define comparable populations, is a controversial area, but concurrently it is strongly needed in order to improve therapies. There is limited evidence about cognitive rehabilitation efficacy. Many of the trials conclude that in spite of an apparent clinical good response, differences do not show statistical significance. The common feature in all these trials is heterogeneity among populations. In this situation, observational studies on very well controlled cohort of studies, together with innovative methods in knowledge extraction, could provide methodological insights for the design of more accurate comparative trials. Some correlation studies between neuropsychological tests and patients capacities have been carried out -1---2- and also correlation between tests and morphological changes in the brain -3-. The procedures efficacy depends on three main factors: the affectation profile, the scheduled tasks and the execution results. The relationship between them makes up the cognitive rehabilitation as a discipline, but its structure is not properly defined. In this work we present a clustering method used in Neuro Personal Trainer (NPT) to group patients into cognitive profiles using data mining techniques. The system uses these clusters to personalize treatments, using the patients assigned cluster to select which tasks are more suitable for its concrete needs, by comparing the results obtained in the past by patients with the same profile.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Many data streaming applications produces massive amounts of data that must be processed in a distributed fashion due to the resource limitation of a single machine. We propose a distributed data stream clustering protocol. Theoretical analysis shows preliminary results about the quality of discovered clustering. In addition, we present results about the ability to reduce the time complexity respect to the centralized approach.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

MFCC coefficients extracted from the power spectral density of speech as a whole, seems to have become the de facto standard in the area of speaker recognition, as demonstrated by its use in almost all systems submitted to the 2013 Speaker Recognition Evaluation (SRE) in Mobile Environment [1], thus relegating to background this component of the recognition systems. However, in this article we will show that selecting the adequate speaker characterization system is as important as the selection of the classifier. To accomplish this we will compare the recognition rates achieved by different recognition systems that relies on the same classifier (GMM-UBM) but connected with different feature extraction systems (based on both classical and biometric parameters). As a result we will show that a gender dependent biometric parameterization with a simple recognition system based on GMM- UBM paradigm provides very competitive or even better recognition rates when compared to more complex classification systems based on classical features

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Desentrañar el funcionamiento del cerebro es uno de los principales desafíos a los que se enfrenta la ciencia actual. Un área de estudio que ha despertado muchas expectativas e interés es el análisis de la estructura cortical desde el punto de vista morfológico, de manera que se cree una simulación del cerebro a nivel molecular. Con ello se espera poder profundizar en el estudio de numerosas enfermedades neurológicas y patológicas. Con el desarrollo de este proyecto se persigue el estudio del soma y de las espinas desde el punto de vista de la neuromorfología teórica. Es común en el estado del arte que en el análisis de las características morfológicas de una neurona en tres dimensiones el soma sea ignorado o, en el mejor de los casos, que sea sustituido por una simple esfera. De hecho, el concepto de soma resulta abstracto porque no se dispone de una dfinición estricta y robusta que especifique exactamente donde finaliza y comienzan las dendritas. En este proyecto se alcanza por primera vez una definición matemática de soma para determinar qué es el soma. Con el fin de simular somas se ahonda en los atributos utilizados en el estado del arte. Estas propiedades, de índole genérica, no especifican una morfología única. Es por ello que se propone un método que agrupe propiedades locales y globales de la morfología. En disposición de las características se procede con la categorización del cuerpo celular en distintas clases a partir de un nuevo subtipo de red bayesiana dinámica adaptada al espacio. Con ello se discute la existencia de distintas clases de somas y se descubren las diferencias entre los somas piramidales de distintas capas del cerebro. A partir del modelo matemático se simulan por primera vez somas virtuales. Algunas morfologías de espinas han sido atribuidas a ciertos comportamientos cognitivos. Por ello resulta de interés dictaminar las clases existentes y relacionarlas con funciones de la actividad cerebral. La clasificación más extendida (Peters y Kaiserman-Abramof, 1970) presenta una definición ambigua y subjetiva dependiente de la interpretación de cada individuo y por tanto discutible. Este estudio se sustenta en un conjunto de descriptores extraídos mediante una técnica de análisis topológico local para representaciones 3D. Sobre estos datos se trata de alcanzar el conjunto de clases más adecuado en el que agrupar las espinas así como de describir cada grupo mediante reglas unívocas. A partir de los resultados, se discute la existencia de un continuo de espinas y las propiedades que caracterizan a cada subtipo de espina. ---ABSTRACT---Unravel how the brain works is one of the main challenges faced by current science. A field of study which has aroused great expectations and interest is the analysis of the cortical structure from a morphological point of view, so that a molecular level simulation of the brain is achieved. This is expected to deepen the study of many neurological and pathological diseases. This project seeks the study of the soma and spines from the theoretical neuromorphology point of view. In the state of the art it is common that when it comes to analyze the morphological characteristics of a three dimension neuron the soma is ignored or, in the best case, it is replaced by a simple sphere. In fact, the concept of soma is abstract because there is not a robust and strict definition on exactly where it ends and dendrites begin. In this project a mathematical definition is reached for the first time to determine what a soma is. With the aim to simulate somas the atributes applied in the state of the art are studied. These properties, generic in nature, do not specify a unique morphology. It is why it was proposed a method to group local and global morphology properties. In arrangement of the characteristics it was proceed with the categorization of the celular body into diferent classes by using a new subtype of dynamic Bayesian network adapted to space. From the result the existance of different classes of somas and diferences among pyramidal somas from distinct brain layers are discovered. From the mathematical model virtual somas were simulated for the first time. Some morphologies of spines have been attributed to certain cognitive behaviours. For this reason it is interesting to rule the existent classes and to relate them with their functions in the brain activity. The most extended classification (Peters y Kaiserman-Abramof, 1970) presents an ambiguous and subjective definition that relies on the interpretation of each individual and consequently it is arguable. This study was based on the set of descriptors extracted from a local topological analysis technique for 3D representations. On these data it was tried to reach the most suitable set of classes to group the spines as well as to describe each cluster by unambiguous rules. From these results, the existance of a continuum of spines and the properties that characterize each spine subtype were discussed .

Relevância:

20.00% 20.00%

Publicador:

Resumo:

La cuestión principal abordada en esta tesis doctoral es la mejora de los sistemas biométricos de reconocimiento de personas a partir de la voz, proponiendo el uso de una nueva parametrización, que hemos denominado parametrización biométrica extendida dependiente de género (GDEBP en sus siglas en inglés). No se propone una ruptura completa respecto a los parámetros clásicos sino una nueva forma de utilizarlos y complementarlos. En concreto, proponemos el uso de parámetros diferentes dependiendo del género del locutor, ya que como es bien sabido, la voz masculina y femenina presentan características diferentes que deberán modelarse, por tanto, de diferente manera. Además complementamos los parámetros clásicos utilizados (MFFC extraídos de la señal de voz), con un nuevo conjunto de parámetros extraídos a partir de la deconstrucción de la señal de voz en sus componentes de fuente glótica (más relacionada con el proceso y órganos de fonación y por tanto con características físicas del locutor) y de tracto vocal (más relacionada con la articulación acústica y por tanto con el mensaje emitido). Para verificar la validez de esta propuesta se plantean diversos escenarios, utilizando diferentes bases de datos, para validar que la GDEBP permite generar una descripción más precisa de los locutores que los parámetros MFCC clásicos independientes del género. En concreto se plantean diferentes escenarios de identificación sobre texto restringido y texto independiente utilizando las bases de datos de HESPERIA y ALBAYZIN. El trabajo también se completa con la participación en dos competiciones internacionales de reconocimiento de locutor, NIST SRE (2010 y 2012) y MOBIO 2013. En el primer caso debido a la naturaleza de las bases de datos utilizadas se obtuvieron resultados cercanos al estado del arte, mientras que en el segundo de los casos el sistema presentado obtuvo la mejor tasa de reconocimiento para locutores femeninos. A pesar de que el objetivo principal de esta tesis no es el estudio de sistemas de clasificación, sí ha sido necesario analizar el rendimiento de diferentes sistemas de clasificación, para ver el rendimiento de la parametrización propuesta. En concreto, se ha abordado el uso de sistemas de reconocimiento basados en el paradigma GMM-UBM, supervectores e i-vectors. Los resultados que se presentan confirman que la utilización de características que permitan describir los locutores de manera más precisa es en cierto modo más importante que la elección del sistema de clasificación utilizado por el sistema. En este sentido la parametrización propuesta supone un paso adelante en la mejora de los sistemas de reconocimiento biométrico de personas por la voz, ya que incluso con sistemas de clasificación relativamente simples se consiguen tasas de reconocimiento realmente competitivas. ABSTRACT The main question addressed in this thesis is the improvement of automatic speaker recognition systems, by the introduction of a new front-end module that we have called Gender Dependent Extended Biometric Parameterisation (GDEBP). This front-end do not constitute a complete break with respect to classical parameterisation techniques used in speaker recognition but a new way to obtain these parameters while introducing some complementary ones. Specifically, we propose a gender-dependent parameterisation, since as it is well known male and female voices have different characteristic, and therefore the use of different parameters to model these distinguishing characteristics should provide a better characterisation of speakers. Additionally, we propose the introduction of a new set of biometric parameters extracted from the components which result from the deconstruction of the voice into its glottal source estimate (close related to the phonation process and the involved organs, and therefore the physical characteristics of the speaker) and vocal tract estimate (close related to acoustic articulation and therefore to the spoken message). These biometric parameters constitute a complement to the classical MFCC extracted from the power spectral density of speech as a whole. In order to check the validity of this proposal we establish different practical scenarios, using different databases, so we can conclude that a GDEBP generates a more accurate description of speakers than classical approaches based on gender-independent MFCC. Specifically, we propose scenarios based on text-constrain and text-independent test using HESPERIA and ALBAYZIN databases. This work is also completed with the participation in two international speaker recognition evaluations: NIST SRE (2010 and 2012) and MOBIO 2013, with diverse results. In the first case, due to the nature of the NIST databases, we obtain results closed to state-of-the-art although confirming our hypothesis, whereas in the MOBIO SRE we obtain the best simple system performance for female speakers. Although the study of classification systems is beyond the scope of this thesis, we found it necessary to analise the performance of different classification systems, in order to verify the effect of them on the propose parameterisation. In particular, we have addressed the use of speaker recognition systems based on the GMM-UBM paradigm, supervectors and i-vectors. The presented results confirm that the selection of a set of parameters that allows for a more accurate description of the speakers is as important as the selection of the classification method used by the biometric system. In this sense, the proposed parameterisation constitutes a step forward in improving speaker recognition systems, since even when using relatively simple classification systems, really competitive recognition rates are achieved.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Objectives: A recently introduced pragmatic scheme promises to be a useful catalog of interneuron names.We sought to automatically classify digitally reconstructed interneuronal morphologies according tothis scheme. Simultaneously, we sought to discover possible subtypes of these types that might emergeduring automatic classification (clustering). We also investigated which morphometric properties weremost relevant for this classification.Materials and methods: A set of 118 digitally reconstructed interneuronal morphologies classified into thecommon basket (CB), horse-tail (HT), large basket (LB), and Martinotti (MA) interneuron types by 42 of theworld?s leading neuroscientists, quantified by five simple morphometric properties of the axon and fourof the dendrites. We labeled each neuron with the type most commonly assigned to it by the experts. Wethen removed this class information for each type separately, and applied semi-supervised clustering tothose cells (keeping the others? cluster membership fixed), to assess separation from other types and lookfor the formation of new groups (subtypes). We performed this same experiment unlabeling the cells oftwo types at a time, and of half the cells of a single type at a time. The clustering model is a finite mixtureof Gaussians which we adapted for the estimation of local (per-cluster) feature relevance. We performedthe described experiments on three different subsets of the data, formed according to how many expertsagreed on type membership: at least 18 experts (the full data set), at least 21 (73 neurons), and at least26 (47 neurons).Results: Interneurons with more reliable type labels were classified more accurately. We classified HTcells with 100% accuracy, MA cells with 73% accuracy, and CB and LB cells with 56% and 58% accuracy,respectively. We identified three subtypes of the MA type, one subtype of CB and LB types each, andno subtypes of HT (it was a single, homogeneous type). We got maximum (adapted) Silhouette widthand ARI values of 1, 0.83, 0.79, and 0.42, when unlabeling the HT, CB, LB, and MA types, respectively,confirming the quality of the formed cluster solutions. The subtypes identified when unlabeling a singletype also emerged when unlabeling two types at a time, confirming their validity. Axonal morphometricproperties were more relevant that dendritic ones, with the axonal polar histogram length in the [pi, 2pi) angle interval being particularly useful.Conclusions: The applied semi-supervised clustering method can accurately discriminate among CB, HT, LB, and MA interneuron types while discovering potential subtypes, and is therefore useful for neuronal classification. The discovery of potential subtypes suggests that some of these types are more heteroge-neous that previously thought. Finally, axonal variables seem to be more relevant than dendritic ones fordistinguishing among the CB, HT, LB, and MA interneuron types.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Recent advances in non-destructive imaging techniques, such as X-ray computed tomography (CT), make it possible to analyse pore space features from the direct visualisation from soil structures. A quantitative characterisation of the three-dimensional solid-pore architecture is important to understand soil mechanics, as they relate to the control of biological, chemical, and physical processes across scales. This analysis technique therefore offers an opportunity to better interpret soil strata, as new and relevant information can be obtained. In this work, we propose an approach to automatically identify the pore structure of a set of 200-2D images that represent slices of an original 3D CT image of a soil sample, which can be accomplished through non-linear enhancement of the pixel grey levels and an image segmentation based on a PFCM (Possibilistic Fuzzy C-Means) algorithm. Once the solids and pore spaces have been identified, the set of 200-2D images is then used to reconstruct an approximation of the soil sample by projecting only the pore spaces. This reconstruction shows the structure of the soil and its pores, which become more bounded, less bounded, or unbounded with changes in depth. If the soil sample image quality is sufficiently favourable in terms of contrast, noise and sharpness, the pore identification is less complicated, and the PFCM clustering algorithm can be used without additional processing; otherwise, images require pre-processing before using this algorithm. Promising results were obtained with four soil samples, the first of which was used to show the algorithm validity and the additional three were used to demonstrate the robustness of our proposal. The methodology we present here can better detect the solid soil and pore spaces on CT images, enabling the generation of better 2D?3D representations of pore structures from segmented 2D images.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Automatic 2D-to-3D conversion is an important application for filling the gap between the increasing number of 3D displays and the still scant 3D content. However, existing approaches have an excessive computational cost that complicates its practical application. In this paper, a fast automatic 2D-to-3D conversion technique is proposed, which uses a machine learning framework to infer the 3D structure of a query color image from a training database with color and depth images. Assuming that photometrically similar images have analogous 3D structures, a depth map is estimated by searching the most similar color images in the database, and fusing the corresponding depth maps. Large databases are desirable to achieve better results, but the computational cost also increases. A clustering-based hierarchical search using compact SURF descriptors to characterize images is proposed to drastically reduce search times. A significant computational time improvement has been obtained regarding other state-of-the-art approaches, maintaining the quality results.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

El uso universal de síntesis de voz en diferentes aplicaciones requeriría un desarrollo sencillo de las nuevas voces con poca intervención manual. Teniendo en cuenta la cantidad de datos multimedia disponibles en Internet y los medios de comunicación, un objetivo interesante es el desarrollo de herramientas y métodos para construir automáticamente las voces de estilo de varios de ellos. En un trabajo anterior se esbozó una metodología para la construcción de este tipo de herramientas, y se presentaron experimentos preliminares con una base de datos multiestilo. En este artículo investigamos más a fondo esta tarea y proponemos varias mejoras basadas en la selección del número apropiado de hablantes iniciales, el uso o no de filtros de reducción de ruido, el uso de la F0 y el uso de un algoritmo de detección de música. Hemos demostrado que el mejor sistema usando un algoritmo de detección de música disminuye el error de precisión 22,36% relativo para el conjunto de desarrollo y 39,64% relativo para el montaje de ensayo en comparación con el sistema base, sin degradar el factor de mérito. La precisión media para el conjunto de prueba es 90.62% desde 76.18% para los reportajes de 99,93% para los informes meteorológicos.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

One of the biggest challenges in speech synthesis is the production of contextually-appropriate naturally sounding synthetic voices. This means that a Text-To-Speech system must be able to analyze a text beyond the sentence limits in order to select, or even modulate, the speaking style according to a broader context. Our current architecture is based on a two-step approach: text genre identification and speaking style synthesis according to the detected discourse genre. For the final implementation, a set of four genres and their corresponding speaking styles were considered: broadcast news, live sport commentaries, interviews and political speeches. In the final TTS evaluation, the four speaking styles were transplanted to the neutral voices of other speakers not included in the training database. When the transplanted styles were compared to the neutral voices, transplantation was significantly preferred and the similarity to the target speaker was as high as 78%.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

On-line partial discharge (PD) measurements have become a common technique for assessing the insulation condition of installed high voltage (HV) insulated cables. When on-line tests are performed in noisy environments, or when more than one source of pulse-shaped signals are present in a cable system, it is difficult to perform accurate diagnoses. In these cases, an adequate selection of the non-conventional measuring technique and the implementation of effective signal processing tools are essential for a correct evaluation of the insulation degradation. Once a specific noise rejection filter is applied, many signals can be identified as potential PD pulses, therefore, a classification tool to discriminate the PD sources involved is required. This paper proposes an efficient method for the classification of PD signals and pulse-type noise interferences measured in power cables with HFCT sensors. By using a signal feature generation algorithm, representative parameters associated to the waveform of each pulse acquired are calculated so that they can be separated in different clusters. The efficiency of the clustering technique proposed is demonstrated through an example with three different PD sources and several pulse-shaped interferences measured simultaneously in a cable system with a high frequency current transformer (HFCT).

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Phonation distortion leaves relevant marks in a speaker's biometric profile. Dysphonic voice production may be used for biometrical speaker characterization. In the present paper phonation features derived from the glottal source (GS) parameterization, after vocal tract inversion, is proposed for dysphonic voice characterization in Speaker Verification tasks. The glottal source derived parameters are matched in a forensic evaluation framework defining a distance-based metric specification. The phonation segments used in the study are derived from fillers, long vowels, and other phonation segments produced in spontaneous telephone conversations. Phonated segments from a telephonic database of 100 male Spanish native speakers are combined in a 10-fold cross-validation task to produce the set of quality measurements outlined in the paper. Shimmer, mucosal wave correlate, vocal fold cover biomechanical parameter unbalance and a subset of the GS cepstral profile produce accuracy rates as high as 99.57 for a wide threshold interval (62.08-75.04%). An Equal Error Rate of 0.64 % can be granted. The proposed metric framework is shown to behave more fairly than classical likelihood ratios in supporting the hypothesis of the defense vs that of the prosecution, thus ofering a more reliable evaluation scoring. Possible applications are Speaker Verification and Dysphonic Voice Grading.