59 resultados para speaker clustering


Relevância:

10.00% 10.00%

Publicador:

Resumo:

Online music databases have increased significantly as a consequence of the rapid growth of the Internet and digital audio, requiring the development of faster and more efficient tools for music content analysis. Musical genres are widely used to organize music collections. In this paper, the problem of automatic single and multi-label music genre classification is addressed by exploring rhythm-based features obtained from a respective complex network representation. A Markov model is built in order to analyse the temporal sequence of rhythmic notation events. Feature analysis is performed by using two multi-variate statistical approaches: principal components analysis (unsupervised) and linear discriminant analysis (supervised). Similarly, two classifiers are applied in order to identify the category of rhythms: parametric Bayesian classifier under the Gaussian hypothesis (supervised) and agglomerative hierarchical clustering (unsupervised). Qualitative results obtained by using the kappa coefficient and the obtained clusters corroborated the effectiveness of the proposed method.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

Biological neuronal networks constitute a special class of dynamical systems, as they are formed by individual geometrical components, namely the neurons. In the existing literature, relatively little attention has been given to the influence of neuron shape on the overall connectivity and dynamics of the emerging networks. The current work addresses this issue by considering simplified neuronal shapes consisting of circular regions (soma/axons) with spokes (dendrites). Networks are grown by placing these patterns randomly in the two-dimensional (2D) plane and establishing connections whenever a piece of dendrite falls inside an axon. Several topological and dynamical properties of the resulting graph are measured, including the degree distribution, clustering coefficients, symmetry of connections, size of the largest connected component, as well as three hierarchical measurements of the local topology. By varying the number of processes of the individual basic patterns, we can quantify relationships between the individual neuronal shape and the topological and dynamical features of the networks. Integrate-and-fire dynamics on these networks is also investigated with respect to transient activation from a source node, indicating that long-range connections play an important role in the propagation of avalanches.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

Efficient automatic protein classification is of central importance in genomic annotation. As an independent way to check the reliability of the classification, we propose a statistical approach to test if two sets of protein domain sequences coming from two families of the Pfam database are significantly different. We model protein sequences as realizations of Variable Length Markov Chains (VLMC) and we use the context trees as a signature of each protein family. Our approach is based on a Kolmogorov-Smirnov-type goodness-of-fit test proposed by Balding et at. [Limit theorems for sequences of random trees (2008), DOI: 10.1007/s11749-008-0092-z]. The test statistic is a supremum over the space of trees of a function of the two samples; its computation grows, in principle, exponentially fast with the maximal number of nodes of the potential trees. We show how to transform this problem into a max-flow over a related graph which can be solved using a Ford-Fulkerson algorithm in polynomial time on that number. We apply the test to 10 randomly chosen protein domain families from the seed of Pfam-A database (high quality, manually curated families). The test shows that the distributions of context trees coming from different families are significantly different. We emphasize that this is a novel mathematical approach to validate the automatic clustering of sequences in any context. We also study the performance of the test via simulations on Galton-Watson related processes.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

The k(0)-method instrumental neutron activation analysis (k(0)-INAA) was employed for determining chemical elements in bird feathers. A collection was obtained taking into account several bird species from wet ecosystems in diverse regions of Brazil. For comparison reason, feathers were actively sampled in a riparian forest from the Marins Stream, Piracicaba, Sao Paulo State, using mist nets specific for capturing birds. Biological certified reference materials were used for assessing the quality of analytical procedure. Quantification of chemical elements was performed using the k(0)-INAA Quantu Software. Sixteen chemical elements, including macro and micronutrients, and trace elements, have been quantified in feathers, in which analytical uncertainties varied from 2% to 40% depending on the chemical element mass fraction. Results indicated high mass fractions of Br (max=7.9 mgkg(-1)), Co (max= 0.47 mg kg(-1)), Cr (max =68 mg kg(-1)), Hg (max =2.79 mg kg(-1)), Sb (max= 0.20 mg kg(-1)), Se (max=1.3 mg kg(-1)) and Zn (max =192 mg kg(-1)) in bird feathers, probably associated with the degree of pollution of the areas evaluated. In order to corroborate the use of k(0)-INAA results in biomonitoring studies using avian community, different factor analysis methods were used to check chemical element source apportionment and locality clustering based on feather chemical composition. (C) 2010 Elsevier B.V. All rights reserved.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

Various molecular systems are available for epidemiological, genetic, evolutionary, taxonomic and systematic studies of innumerable fungal infections, especially those caused by the opportunistic pathogen C. albicans. A total of 75 independent oral isolates were selected in order to compare Multilocus Enzyme Electrophoresis (MLEE), Electrophoretic Karyotyping (EK) and Microsatellite Markers (Simple Sequence Repeats - SSRs), in their abilities to differentiate and group C. albicans isolates (discriminatory power), and also, to evaluate the concordance and similarity of the groups of strains determined by cluster analysis for each fingerprinting method. Isoenzyme typing was performed using eleven enzyme systems: Adh, Sdh, M1p, Mdh, Idh, Gdh, G6pdh, Asd, Cat, Po, and Lap (data previously published). The EK method consisted of chromosomal DNA separation by pulsed-field gel electrophoresis using a CHEF system. The microsatellite markers were investigated by PCR using three polymorphic loci: EF3, CDC3, and HIS3. Dendrograms were generated by the SAHN method and UPGMA algorithm based on similarity matrices (S(SM)). The discriminatory power of the three methods was over 95%, however a paired analysis among them showed a parity of 19.7-22.4% in the identification of strains. Weak correlation was also observed among the genetic similarity matrices (S(SM)(MLEE) x S(SM)(EK) x S(SM)(SSRs)). Clustering analyses showed a mean of 9 +/- 12.4 isolates per cluster (3.8 +/- 8 isolates/taxon) for MLEE, 6.2 +/- 4.9 isolates per cluster (4 +/- 4.5 isolates/taxon) for SSRs, and 4.1 +/- 2.3 isolates per cluster (2.6 +/- 2.3 isolates/taxon) for EK. A total of 45 (13%), 39(11.2%), 5 (1.4%) and 3 (0.9%) clusters pairs from 347 showed similarity (Si) of 0.1-10%, 10.1-20%, 20.1-30% and 30.1-40%, respectively. Clinical and molecular epidemiological correlation involving the opportunistic pathogen C. albicans may be attributed dependently of each method of genotyping (i.e., MLEE, EK, and SSRs) supplemented with similarity and grouping analysis. Therefore, the use of genotyping systems that give results which offer minimum disparity, or the combination of the results of these systems, can provide greater security and consistency in the determination of strains and their genetic relationships. (C) 2010 Elsevier B.V. All rights reserved.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

Today several different unsupervised classification algorithms are commonly used to cluster similar patterns in a data set based only on its statistical properties. Specially in image data applications, self-organizing methods for unsupervised classification have been successfully applied for clustering pixels or group of pixels in order to perform segmentation tasks. The first important contribution of this paper refers to the development of a self-organizing method for data classification, named Enhanced Independent Component Analysis Mixture Model (EICAMM), which was built by proposing some modifications in the Independent Component Analysis Mixture Model (ICAMM). Such improvements were proposed by considering some of the model limitations as well as by analyzing how it should be improved in order to become more efficient. Moreover, a pre-processing methodology was also proposed, which is based on combining the Sparse Code Shrinkage (SCS) for image denoising and the Sobel edge detector. In the experiments of this work, the EICAMM and other self-organizing models were applied for segmenting images in their original and pre-processed versions. A comparative analysis showed satisfactory and competitive image segmentation results obtained by the proposals presented herein. (C) 2008 Published by Elsevier B.V.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

The taxonomy of the N(2)-fixing bacteria belonging to the genus Bradyrhizobium is still poorly refined, mainly due to conflicting results obtained by the analysis of the phenotypic and genotypic properties. This paper presents an application of a method aiming at the identification of possible new clusters within a Brazilian collection of 119 Bradryrhizobium strains showing phenotypic characteristics of B. japonicum and B. elkanii. The stability was studied as a function of the number of restriction enzymes used in the RFLP-PCR analysis of three ribosomal regions with three restriction enzymes per region. The method proposed here uses Clustering algorithms with distances calculated by average-linkage clustering. Introducing perturbations using sub-sampling techniques makes the stability analysis. The method showed efficacy in the grouping of the species B. japonicum and B. elkanii. Furthermore, two new clusters were clearly defined, indicating possible new species, and sub-clusters within each detected cluster. (C) 2008 Elsevier B.V. All rights reserved.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

This paper analyses the presence of financial constraint in the investment decisions of 367 Brazilian firms from 1997 to 2004, using a Bayesian econometric model with group-varying parameters. The motivation for this paper is the use of clustering techniques to group firms in a totally endogenous form. In order to classify the firms we used a hybrid clustering method, that is, hierarchical and non-hierarchical clustering techniques jointly. To estimate the parameters a Bayesian approach was considered. Prior distributions were assumed for the parameters, classifying the model in random or fixed effects. Ordinate predictive density criterion was used to select the model providing a better prediction. We tested thirty models and the better prediction considers the presence of 2 groups in the sample, assuming the fixed effect model with a Student t distribution with 20 degrees of freedom for the error. The results indicate robustness in the identification of financial constraint when the firms are classified by the clustering techniques. (C) 2010 Elsevier B.V. All rights reserved.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

In this paper, a framework for detection of human skin in digital images is proposed. This framework is composed of a training phase and a detection phase. A skin class model is learned during the training phase by processing several training images in a hybrid and incremental fuzzy learning scheme. This scheme combines unsupervised-and supervised-learning: unsupervised, by fuzzy clustering, to obtain clusters of color groups from training images; and supervised to select groups that represent skin color. At the end of the training phase, aggregation operators are used to provide combinations of selected groups into a skin model. In the detection phase, the learned skin model is used to detect human skin in an efficient way. Experimental results show robust and accurate human skin detection performed by the proposed framework.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

The canonical representation of speech constitutes a perfect reconstruction (PR) analysis-synthesis system. Its parameters are the autoregressive (AR) model coefficients, the pitch period and the voiced and unvoiced components of the excitation represented as transform coefficients. Each set of parameters may be operated on independently. A time-frequency unvoiced excitation (TFUNEX) model is proposed that has high time resolution and selective frequency resolution. Improved time-frequency fit is obtained by using for antialiasing cancellation the clustering of pitch-synchronous transform tracks defined in the modulation transform domain. The TFUNEX model delivers high-quality speech while compressing the unvoiced excitation representation about 13 times over its raw transform coefficient representation for wideband speech.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

We study the spreading of contagious diseases in a population of constant size using susceptible-infective-recovered (SIR) models described in terms of ordinary differential equations (ODEs) and probabilistic cellular automata (PCA). In the PCA model, each individual (represented by a cell in the lattice) is mainly locally connected to others. We investigate how the topological properties of the random network representing contacts among individuals influence the transient behavior and the permanent regime of the epidemiological system described by ODE and PCA. Our main conclusions are: (1) the basic reproduction number (commonly called R(0)) related to a disease propagation in a population cannot be uniquely determined from some features of transient behavior of the infective group; (2) R(0) cannot be associated to a unique combination of clustering coefficient and average shortest path length characterizing the contact network. We discuss how these results can embarrass the specification of control strategies for combating disease propagations. (C) 2009 Elsevier B.V. All rights reserved.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

This paper presents the design and implementation of an embedded soft sensor, i. e., a generic and autonomous hardware module, which can be applied to many complex plants, wherein a certain variable cannot be directly measured. It is implemented based on a fuzzy identification algorithm called ""Limited Rules"", employed to model continuous nonlinear processes. The fuzzy model has a Takagi-Sugeno-Kang structure and the premise parameters are defined based on the Fuzzy C-Means (FCM) clustering algorithm. The firmware contains the soft sensor and it runs online, estimating the target variable from other available variables. Tests have been performed using a simulated pH neutralization plant. The results of the embedded soft sensor have been considered satisfactory. A complete embedded inferential control system is also presented, including a soft sensor and a PID controller. (c) 2007, ISA. Published by Elsevier Ltd. All rights reserved.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

Using the network random generation models from Gustedt (2009)[23], we simulate and analyze several characteristics (such as the number of components, the degree distribution and the clustering coefficient) of the generated networks. This is done for a variety of distributions (fixed value, Bernoulli, Poisson, binomial) that are used to control the parameters of the generation process. These parameters are in particular the size of newly appearing sets of objects, the number of contexts in which new elements appear initially, the number of objects that are shared with `parent` contexts, and, the time period inside which a context may serve as a parent context (aging). The results show that these models allow to fine-tune the generation process such that the graphs adopt properties as can be found in real world graphs. (C) 2011 Elsevier B.V. All rights reserved.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

Sound source localization (SSL) is an essential task in many applications involving speech capture and enhancement. As such, speaker localization with microphone arrays has received significant research attention. Nevertheless, existing SSL algorithms for small arrays still have two significant limitations: lack of range resolution, and accuracy degradation with increasing reverberation. The latter is natural and expected, given that strong reflections can have amplitudes similar to that of the direct signal, but different directions of arrival. Therefore, correctly modeling the room and compensating for the reflections should reduce the degradation due to reverberation. In this paper, we show a stronger result. If modeled correctly, early reflections can be used to provide more information about the source location than would have been available in an anechoic scenario. The modeling not only compensates for the reverberation, but also significantly increases resolution for range and elevation. Thus, we show that under certain conditions and limitations, reverberation can be used to improve SSL performance. Prior attempts to compensate for reverberation tried to model the room impulse response (RIR). However, RIRs change quickly with speaker position, and are nearly impossible to track accurately. Instead, we build a 3-D model of the room, which we use to predict early reflections, which are then incorporated into the SSL estimation. Simulation results with real and synthetic data show that even a simplistic room model is sufficient to produce significant improvements in range and elevation estimation, tasks which would be very difficult when relying only on direct path signal components.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

Survival models involving frailties are commonly applied in studies where correlated event time data arise due to natural or artificial clustering. In this paper we present an application of such models in the animal breeding field. Specifically, a mixed survival model with a multivariate correlated frailty term is proposed for the analysis of data from over 3611 Brazilian Nellore cattle. The primary aim is to evaluate parental genetic effects on the trait length in days that their progeny need to gain a commercially specified standard weight gain. This trait is not measured directly but can be estimated from growth data. Results point to the importance of genetic effects and suggest that these models constitute a valuable data analysis tool for beef cattle breeding.