867 results for speaker clustering
Abstract:
Speaker diarization is the process of sorting speech segments by speaker. Diarization helps to search and retrieve what a certain speaker uttered in a meeting. Applications of diarization systems extend to domains other than meetings, for example, lectures, telephone, television, and radio. Besides, diarization enhances the performance of several speech technologies such as speaker recognition, automatic transcription, and speaker tracking. Methodologies previously used in developing diarization systems are discussed. Prior results and techniques are studied and compared. Methods such as Hidden Markov Models and Gaussian Mixture Models that are used in speaker recognition and other speech technologies are also used in speaker diarization. The objective of this thesis is to develop a speaker diarization system in the meeting domain. The experimental part of this work indicates that zero-crossing rate can be used effectively to break the audio stream down into segments, and that adaptive Gaussian models adequately fit short audio segments. Results show that 35 Gaussian models and an average segment length of one second are the optimum values for building a diarization system on the tested data. Segments uttered by the same speaker are merged by bottom-up clustering, using a new approach that categorizes the mixture weights.
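As a rough illustration of the zero-crossing rate mentioned in this abstract, the sketch below computes the rate for consecutive fixed-length frames in plain Python. The function name `zero_crossing_rate`, the frame length, and the toy signal are assumptions made for illustration; the abstract does not describe the thesis's actual segmentation procedure.

```python
def zero_crossing_rate(samples, frame_len):
    """Per frame, return the fraction of adjacent sample pairs that change sign.

    Hypothetical helper: frames are non-overlapping and frame_len must be >= 2.
    """
    rates = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        # Count adjacent pairs whose signs differ (zero is treated as non-negative).
        crossings = sum(
            1 for a, b in zip(frame, frame[1:])
            if (a >= 0) != (b >= 0)
        )
        rates.append(crossings / (frame_len - 1))
    return rates


# Toy signal (assumed): an oscillating frame followed by a flat frame.
signal = [1, -1, 1, -1, 1, 1, 1, 1]
rates = zero_crossing_rate(signal, 4)
```

A sharp change in the per-frame rate, as between the two frames here, is the kind of cue that could mark a candidate segment boundary.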
Abstract:
Two new features have been proposed and used in the Rich Transcription Evaluation 2009 by the Universidad Politécnica de Madrid, which outperform the results of the baseline system. One of the features is the intensity channel contribution, a feature related to the location of the speaker. The second feature is the logarithm of the interpolated fundamental frequency. It is the first time that both features are applied to the clustering stage of multiple distant microphone meetings diarization. It is shown that the inclusion of both features improves the baseline results by 15.36% and 16.71% relative on the development set and the RT09 set, respectively. If we consider speaker errors only, the relative improvement is 23% and 32.83% on the development set and the RT09 set, respectively.
Abstract:
Different types of water bodies, including lakes, streams, and coastal marine waters, are often susceptible to fecal contamination from a range of point and nonpoint sources, and have been evaluated using fecal indicator microorganisms. The most commonly used fecal indicator is Escherichia coli, but traditional cultivation methods do not allow discrimination of the source of pollution. The use of triplex PCR offers an approach that is fast and inexpensive, and here enabled the identification of phylogroups. The phylogenetic distribution of E. coli subgroups isolated from water samples revealed higher frequencies of subgroups A1 and B23 in rivers impacted by human pollution sources, while subgroups D1 and D2 were associated with pristine sites, and subgroup B1 with domesticated animal sources, suggesting their use as a first screening for pollution source identification. A simple classification is also proposed based on phylogenetic subgroup distribution using the w-clique metric, enabling differentiation of polluted and unpolluted sites.
Abstract:
In the southern region of Mato Grosso do Sul state, Brazil, a foot-and-mouth disease (FMD) epidemic started in September 2005. A total of 33 outbreaks were detected and 33,741 FMD-susceptible animals were slaughtered and destroyed. There were no reports of FMD cases in species other than bovines. Based on the data of this epidemic, an analysis was carried out using the K-function, and spatial clustering of outbreaks was observed within a range of 25 km. This observation may be related to the dynamics of foot-and-mouth disease spread and to the measures undertaken to control the disease dissemination. The control measures were effective, since the disease did not spread to farms more than 47 km from the initial outbreaks.
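To make the K-function analysis more concrete, here is a minimal sketch of a naive Ripley's K estimate without edge correction, compared against the value expected under complete spatial randomness, which is pi times the radius squared. The coordinates, the study-area size, and the function name `ripley_k` are invented for illustration and are not data or code from the study.

```python
import math


def ripley_k(points, radius, area):
    """Naive Ripley's K estimate (no edge correction) for 2-D points."""
    n = len(points)
    pairs = 0
    for i, (xi, yi) in enumerate(points):
        for j, (xj, yj) in enumerate(points):
            # Count ordered pairs of distinct points within the radius.
            if i != j and math.hypot(xi - xj, yi - yj) <= radius:
                pairs += 1
    return area * pairs / (n * (n - 1))


# Hypothetical outbreak coordinates in km: three close together, one far away.
outbreaks = [(0.0, 0.0), (3.0, 4.0), (6.0, 8.0), (100.0, 100.0)]
k_hat = ripley_k(outbreaks, radius=25.0, area=10000.0)

# Expected K under complete spatial randomness at the same radius.
k_csr = math.pi * 25.0 ** 2
```

An estimate well above the random-pattern value at a given radius is read as clustering at that scale; a real analysis would add edge correction and a simulation envelope.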
Abstract:
Gene clustering is a useful exploratory technique to group together genes with similar expression levels under distinct cell cycle phases or distinct conditions. It helps the biologist to identify potentially meaningful relationships between genes. In this study, we propose a clustering method based on multivariate normal mixture models, where the number of clusters is predicted via sequential hypothesis tests: at each step, the method considers a mixture model of m components (m = 2 in the first step) and tests if in fact it should be m - 1. If the hypothesis is rejected, m is increased and a new test is carried out. The method continues (increasing m) until the hypothesis is accepted. The theoretical core of the method is the full Bayesian significance test, an intuitive Bayesian approach, which needs no model complexity penalization nor positive probabilities for sharp hypotheses. Numerical experiments were based on a cDNA microarray dataset consisting of expression levels of 205 genes belonging to four functional categories, for 10 distinct strains of Saccharomyces cerevisiae. To analyze the method's sensitivity to data dimension, we performed principal components analysis on the original dataset and predicted the number of classes using 2 to 10 principal components. Compared to Mclust (model-based clustering), our method shows more consistent results.
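The sequential search described in this abstract can be sketched as a simple loop. The sketch below covers only the control flow: the callback `reject_reduction`, the cap `max_m`, and the choice to return m - 1 once the hypothesis is accepted are assumptions made here, and the stub test stands in for the full Bayesian significance test, which is not implemented.

```python
def select_num_components(reject_reduction, max_m=20):
    """Sequentially choose the number of mixture components.

    reject_reduction(m) is a hypothetical callback that returns True when the
    hypothesis "m - 1 components suffice" is rejected for an m-component fit.
    """
    m = 2  # the first step considers a two-component mixture
    while m <= max_m:
        if not reject_reduction(m):
            # Hypothesis accepted: m - 1 components are taken as sufficient.
            return m - 1
        m += 1
    # Safety cap (assumed): every tested reduction was rejected.
    return max_m


# Stub test (assumed): reductions are rejected while m <= 3.
estimate = select_num_components(lambda m: m <= 3)
```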
Abstract:
Ten lectal variables were examined with respect to Norwegian speakers' acceptance of long-distance reflexives (LDR), using a questionnaire to elicit grammaticality judgements on 50 potential LDR sentences. A sample of 180 speakers completed the questionnaire. The data was analysed using a univariate general linear model and Spearman's correlation. In this sample the results showed that dialect and level of education had significant effects on speakers' acceptance of long-distance reflexives, while sex, age, being a native speaker, having both native-speaker parents, living in the city or the country, and the speaker's attitudes to the two Norwegian writing languages had no influence on speakers' acceptance of long-distance reflexives. It is suggested that the influence of Danish on Norwegian writing and on the southern dialects may be the cause of the observed variation with respect to LDR in Norwegian.
Abstract:
Recent efforts in the characterization of air-water flow properties have included some clustering process analysis. A cluster of bubbles is defined as a group of two or more bubbles, with a distinct separation from other bubbles before and after the cluster. The present paper compares the results of clustering processes in two hydraulic structures: a large-size dropshaft and a hydraulic jump in a rectangular horizontal channel. The comparison highlighted some significant differences in clustering production and structures. Both dropshaft and hydraulic jump flows are complex turbulent shear flows, and some clustering index may provide some measure of the bubble-turbulence interactions and associated energy dissipation.
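The cluster definition in this abstract can be sketched for a one-dimensional bubble record. The abstract does not say what counts as a "distinct separation", so the fixed gap threshold `max_gap`, the function name `find_clusters`, and the sample positions below are all assumptions for illustration.

```python
def find_clusters(positions, max_gap):
    """Group 1-D bubble positions into clusters of two or more bubbles.

    Consecutive bubbles belong to the same group when their spacing is at
    most max_gap (an assumed separation criterion).
    """
    clusters, current = [], []
    for p in sorted(positions):
        if current and p - current[-1] > max_gap:
            # Gap too large: close the current group, keep it if it is a cluster.
            if len(current) >= 2:
                clusters.append(current)
            current = []
        current.append(p)
    if len(current) >= 2:
        clusters.append(current)
    return clusters


# Hypothetical bubble positions (arbitrary units) along the probe record.
clusters = find_clusters([0.0, 0.5, 1.0, 5.0, 9.0, 9.5], max_gap=1.0)
```

The isolated bubble at 5.0 is dropped because a cluster needs at least two bubbles.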
Abstract:
The high-affinity receptors for human granulocyte-macrophage colony-stimulating factor (GM-CSF), interleukin-3 (IL-3), and IL-5 are heterodimeric complexes consisting of cytokine-specific alpha subunits and a common signal-transducing beta subunit (h beta c). We have previously demonstrated the oncogenic potential of this group of receptors by identifying constitutively activating point mutations in the extracellular and transmembrane domains of h beta c. We report here a comprehensive screen of the entire h beta c molecule that has led to the identification of additional constitutive point mutations by virtue of their ability to confer factor independence on murine FDC-P1 cells. These mutations were clustered exclusively in a central region of h beta c that encompasses the extracellular membrane-proximal domain, transmembrane domain, and membrane-proximal region of the cytoplasmic domain. Interestingly, most h beta c mutants exhibited cell type-specific constitutive activity, with only two transmembrane domain mutants able to confer factor independence on both murine FDC-P1 and BAF-B03 cells. Examination of the biochemical properties of these mutants in FDC-P1 cells indicated that MAP kinase (ERK1/2), STAT, and JAK2 signaling molecules were constitutively activated. In contrast, only some of the mutant beta subunits were constitutively tyrosine phosphorylated. Taken together, these results highlight key regions involved in h beta c activation, dissociate h beta c tyrosine phosphorylation from MAP kinase and STAT activation, and suggest the involvement of distinct mechanisms by which proliferative signals can be generated by h beta c. (C) 1998 by The American Society of Hematology.
Abstract:
A graph clustering algorithm constructs groups of closely related parts and machines separately. After they are matched for the least intercell moves, a refining process runs on the initial cell formation to decrease the number of intercell moves. A simple modification of this main approach can deal with some practical constraints, such as the popular constraint of bounding the maximum number of machines in a cell. Our approach substantially improves computational time. More importantly, it also reduces the number of intercell moves when the computational results are compared with the best known solutions from the literature. (C) 2009 Elsevier Ltd. All rights reserved.
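As a small illustration of the quantity this abstract sets out to minimize, the sketch below counts intercell moves for a given machine-to-cell assignment. It assumes one common definition, where a move is counted each time two consecutive operations of a part take place in different cells; the paper may define it differently. The machine names, part routings, and the function `count_intercell_moves` are hypothetical.

```python
def count_intercell_moves(routings, cell_of):
    """Count consecutive operations of each part that cross a cell boundary."""
    moves = 0
    for route in routings.values():
        for a, b in zip(route, route[1:]):
            if cell_of[a] != cell_of[b]:
                moves += 1
    return moves


# Hypothetical cell formation: machines M1, M2 in cell 0 and M3, M4 in cell 1.
cell_of = {"M1": 0, "M2": 0, "M3": 1, "M4": 1}

# Hypothetical part routings (ordered lists of machines visited).
routings = {
    "P1": ["M1", "M2"],
    "P2": ["M3", "M4"],
    "P3": ["M1", "M3", "M2"],
}
moves = count_intercell_moves(routings, cell_of)
```

Only part P3 leaves its cell, once on the way to M3 and once on the way back to M2.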
Abstract:
Geospatial clustering must be designed in such a way that it takes into account the special features of geoinformation and the peculiar nature of geographical environments in order to successfully derive geospatially interesting global concentrations and localized excesses. This paper examines families of geospatial clustering recently proposed in the data mining community and identifies several features and issues especially important to geospatial clustering in data-rich environments.