116 results for height partition clustering


Relevance:

20.00% 20.00%

Publisher:

Abstract:

Speaker diarization determines instances of the same speaker within a recording. Extending this task to a collection of recordings for linking together segments spoken by a unique speaker requires speaker linking. In this paper we propose a speaker linking system using linkage clustering and state-of-the-art speaker recognition techniques. We evaluate our approach against two baseline linking systems using agglomerative cluster merging (AC) and agglomerative clustering with model retraining (ACR). We demonstrate that our linking method, using complete-linkage clustering, provides a relative improvement of 20% and 29% in attribution error rate (AER), over the AC and ACR systems, respectively.

Relevance:

20.00% 20.00%

Publisher:

Abstract:

In this paper we propose and evaluate a speaker attribution system using a complete-linkage clustering method. Speaker attribution refers to the annotation of a collection of spoken audio based on speaker identities. This can be achieved using diarization and speaker linking. The main challenge associated with attribution is achieving computational efficiency when dealing with large audio archives. Traditional agglomerative clustering methods with model merging and retraining are not feasible for this purpose. This has motivated the use of linkage clustering methods without retraining. We first propose a diarization system using complete-linkage clustering and show that it outperforms traditional agglomerative and single-linkage clustering based diarization systems with a relative improvement of 40% and 68%, respectively. We then propose a complete-linkage speaker linking system to achieve attribution and demonstrate a 26% relative improvement in attribution error rate (AER) over the single-linkage speaker linking approach.
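As an illustration of linkage clustering without model retraining, here is a minimal sketch of complete-linkage agglomerative clustering over a precomputed distance matrix. It is not the authors' system: the function name `complete_linkage`, the `threshold` stopping rule, and the toy distances are assumptions made for this example. In a speaker-linking setting the distances would come from pairwise speaker-model comparisons.

```python
def complete_linkage(dist, threshold):
    """Agglomerative clustering with complete linkage.

    dist      -- symmetric matrix (list of lists) of pairwise distances
    threshold -- hypothetical stopping height: merging stops once the
                 closest pair of clusters is farther apart than this
    """
    clusters = [[i] for i in range(len(dist))]
    while len(clusters) > 1:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Complete linkage: the distance between two clusters is
                # the largest distance between any pair of their members.
                d = max(dist[i][j] for i in clusters[a] for j in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        if best[0] > threshold:
            break
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return clusters


# Toy distances: items 0 and 1 are close, items 2 and 3 are close.
toy = [
    [0.0, 0.1, 0.9, 0.8],
    [0.1, 0.0, 0.85, 0.9],
    [0.9, 0.85, 0.0, 0.2],
    [0.8, 0.9, 0.2, 0.0],
]
result = complete_linkage(toy, threshold=0.5)
```

Because the pairwise distances are computed once and never updated by retraining merged models, each merge step only needs lookups in the matrix, which is the efficiency argument the abstract makes for large archives.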

Relevance:

20.00% 20.00%

Publisher:

Abstract:

Grouping users in social networks is an important process that improves matching and recommendation activities. Clustering methods from data mining can be used to group these users. However, existing general-purpose clustering algorithms perform poorly on social network data because of the special nature of users' data. One main reason is the constraints that need to be considered when grouping users. Another is the need to capture a large amount of information about users, which increases the computational complexity of an algorithm. In this paper, we propose a scalable and effective constraint-based clustering algorithm based on a global similarity measure that takes into consideration the users' constraints and their importance in social networks. Each constraint's importance is calculated based on the occurrence of this constraint in the dataset. Performance of the algorithm is demonstrated on a dataset obtained from an online dating website using internal and external evaluation measures. Results show that the proposed algorithm is able to increase the accuracy of matching users in social networks by 10% in comparison to other algorithms.

Relevance:

20.00% 20.00%

Publisher:

Abstract:

Standard differential equation-based models of collective cell behaviour, such as the logistic growth model, invoke a mean-field assumption which is equivalent to assuming that individuals within the population interact with each other in proportion to the average population density. Implementing such assumptions implies that the dynamics of the system are unaffected by spatial structure, such as the formation of patches or clusters within the population. Recent theoretical developments have introduced a class of models, known as moment dynamics models, which aim to account for the dynamics of individuals, pairs of individuals, triplets of individuals and so on. Such models enable us to describe the dynamics of populations with clustering; however, little progress has been made with regard to applying moment dynamics models to experimental data. Here, we report new experimental results describing the formation of a monolayer of cells using two different cell types: 3T3 fibroblast cells and MDA MB 231 breast cancer cells. Our analysis indicates that the 3T3 fibroblast cells are relatively motile and we observe that the 3T3 fibroblast monolayer forms without clustering. In contrast, the MDA MB 231 cells are less motile and we observe that the MDA MB 231 monolayer formation is associated with significant clustering. We calibrate a moment dynamics model and a standard mean-field model to both data sets. Our results indicate that the mean-field and moment dynamics models provide similar descriptions of the 3T3 fibroblast monolayer formation whereas these two models give very different predictions for the MDA MB 231 monolayer formation. These outcomes indicate that standard mean-field models of collective cell behaviour are not always appropriate and that care ought to be exercised when implementing such a model.
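For reference, the logistic growth model mentioned above can be sketched as follows. This is the textbook mean-field form dC/dt = lam * C * (1 - C / K) with its closed-form solution, not the calibrated model from the paper. The symbol names (`c0`, `lam`, `K`) and the numerical values are assumptions chosen only for illustration.

```python
import math


def logistic_density(t, c0, lam, K):
    """Closed-form solution of dC/dt = lam * C * (1 - C / K), C(0) = c0.

    c0  -- initial density (assumed 0 < c0 <= K)
    lam -- proliferation rate
    K   -- carrying-capacity density
    """
    return K / (1.0 + ((K - c0) / c0) * math.exp(-lam * t))


# Hypothetical parameters: 5% initial confluence, unit rate and capacity.
early = logistic_density(0.0, c0=0.05, lam=1.0, K=1.0)
late = logistic_density(50.0, c0=0.05, lam=1.0, K=1.0)
```

The growth term depends only on the average density C, so two populations with the same density but different spatial arrangements (uniform versus clustered) evolve identically under this model. That is the mean-field limitation the abstract contrasts with moment dynamics models.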

Relevance:

20.00% 20.00%

Publisher:

Abstract:

We propose a cluster ensemble method to map the corpus documents into the semantic space embedded in Wikipedia and group them using multiple types of feature space. A heterogeneous cluster ensemble is constructed with multiple types of relations, i.e. document-term, document-concept and document-category. A final clustering solution is obtained by exploiting associations between document pairs and hubness of the documents. Empirical analysis with various real data sets reveals that the proposed method outperforms state-of-the-art text clustering approaches.

Relevance:

20.00% 20.00%

Publisher:

Abstract:

Numerical investigation is carried out for natural convection heat transfer in an isosceles triangular enclosure partitioned in the centre by a vertical wall with infinite conductivity. A sudden temperature difference between two zones of the enclosure has been imposed to trigger the natural convection. As a result, heat is transferred between both sides of the enclosure through the conducting vertical wall with natural convection boundary layers forming adjacent to the middle partition and two inclined surfaces. The Finite Volume based software, Ansys 14.5 (Fluent) is used for the numerical simulations. The numerical results are obtained for different values of aspect ratio, A (0.2, 0.5 and 1.0) and Rayleigh number, Ra (10^5 <= Ra <= 10^8) for a fixed Prandtl number, Pr = 0.72 of air. It is anticipated from the numerical simulations that the coupled thermal boundary layers development adjacent to the partition undergoes several distinct stages including an initial stage, a transitional stage and a steady stage. Time dependent features of the coupled thermal boundary layers as well as the overall natural convection flow in the partitioned enclosure have been discussed in this study.

Relevance:

20.00% 20.00%

Publisher:

Abstract:

An Application Specific Instruction-set Processor (ASIP) is a specialized processor tailored to run one or more particular applications efficiently. However, when there are multiple candidate applications in the application’s domain, it is difficult and time-consuming to find the optimum set of applications to be implemented. Existing ASIP design approaches perform this selection manually based on a designer’s knowledge. We help in cutting down the number of candidate applications by devising a classification method to cluster similar applications based on the special-purpose operations they share. This provides a significant reduction in the comparison overhead while resulting in customized ASIP instruction sets which can benefit a whole family of related applications. Our method gives users the ability to quantify the degree of similarity between the sets of shared operations to control the size of clusters. A case study involving twelve algorithms confirms that our approach can successfully cluster similar algorithms together based on the similarity of their component operations.

Relevance:

20.00% 20.00%

Publisher:

Abstract:

The continuous growth of XML data poses a great concern in the area of XML data management. The need for processing large amounts of XML data brings complications to many applications, such as information retrieval, data integration and many others. One way of simplifying this problem is to break the massive amount of data into smaller groups by application of clustering techniques. However, XML clustering is an intricate task that may involve the processing of both the structure and the content of XML data in order to identify similar XML data. This research presents four clustering methods, two methods utilizing the structure of XML documents and the other two utilizing both the structure and the content. The two structural clustering methods have different data models. One is based on a path model and the other is based on a tree model. These methods employ rigid similarity measures which aim to identify corresponding elements between documents with different or similar underlying structure. The two clustering methods that utilize both the structural and content information vary in terms of how the structure and content similarity are combined. One clustering method calculates the document similarity by using a linear weighting combination strategy of structure and content similarities. The content similarity in this clustering method is based on a semantic kernel. The other method calculates the distance between documents by a non-linear combination of the structure and content of XML documents using a semantic kernel. Empirical analysis shows that the structure-only clustering method based on the tree model is more scalable than the structure-only clustering method based on the path model as the tree similarity measure for the tree model does not need to visit the parents of an element many times. Experimental results also show that the clustering methods perform better with the inclusion of the content information on most test document collections.
To further the research, the structural clustering method based on tree model is extended and employed in XML transformation. The results from the experiments show that the proposed transformation process is faster than the traditional transformation system that translates and converts the source XML documents sequentially. Also, the schema matching process of XML transformation produces a better matching result in a shorter time.
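The linear weighting combination of structure and content similarity described in this abstract could look roughly like the sketch below. The abstract does not give the weighting scheme, so the single mixing weight `alpha`, the function name, and the example scores are all assumptions, and the two input similarities are taken as already computed values in [0, 1].

```python
def combined_similarity(structure_sim, content_sim, alpha=0.5):
    """Linear combination of two similarity scores.

    alpha -- hypothetical weight on structure; (1 - alpha) goes to content
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return alpha * structure_sim + (1.0 - alpha) * content_sim


# Two documents with similar structure but less similar content.
score = combined_similarity(0.8, 0.4, alpha=0.5)
```

Setting `alpha` to 1.0 reduces this to a structure-only measure, which gives a simple way to compare against the structure-only methods the abstract describes.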

Relevance:

20.00% 20.00%

Publisher:

Abstract:

Clustering identities in a broadcast video is a useful task to aid in video annotation and retrieval. Quality-based frame selection is a crucial task in video face clustering, to both improve the clustering performance and reduce the computational cost. We present a framework that selects the highest quality frames available in a video to cluster the faces. This frame selection technique is based on low-level and high-level features (face symmetry, sharpness, contrast and brightness) to select the highest quality facial images available in a face sequence for clustering. We also consider the temporal distribution of the faces to ensure that selected faces are taken at times distributed throughout the sequence. Normalized feature scores are fused and frames with high quality scores are used in a Local Gabor Binary Pattern Histogram Sequence based face clustering system. We present a news video database to evaluate the clustering system performance. Experiments on the newly created news database show that the proposed method selects the best quality face images in the video sequence, resulting in improved clustering performance.

Relevance:

20.00% 20.00%

Publisher:

Abstract:

To the Editor: In affluent-urban areas of India, overweight (6 %) and obesity (8 %) are prevalent in children as young as 2–5 y [1]. A potential risk factor for childhood obesity could be parents under-reporting their child’s anthropometry. In Indian culture, a larger body size is typically acceptable, and mothers may consider a chubby baby as healthy [2]. Therefore, it was proposed that Indian mothers may under-report their child’s weight status. The present study examined the validity of maternal reported height and weight of young, urban-affluent Indian children aged 2–5 y. After receiving approval from the QUT Human Research Ethics Committee, Australia, 111 mothers with children aged 2–5 y attending private medical clinics (n = 5) in the affluent areas of Mumbai were recruited. Child’s height and weight were measured by the researcher using standard equipment/protocols. Mothers also reported their child’s height and weight.

Relevance:

20.00% 20.00%

Publisher:

Abstract:

The K-means algorithm is one of the most popular techniques in clustering. Nevertheless, the performance of the K-means algorithm depends highly on initial cluster centers and converges to local minima. This paper proposes a hybrid evolutionary programming based clustering algorithm, called PSO-SA, by combining particle swarm optimization (PSO) and simulated annealing (SA). The basic idea is to search around the global solution by SA and to increase the information exchange among particles using a mutation operator to escape local optima. Three datasets, Iris, Wisconsin Breast Cancer, and Ripley’s Glass, have been considered to show the effectiveness of the proposed clustering algorithm in providing optimal clusters. The simulation results show that the PSO-SA clustering algorithm not only has a better response but also converges more quickly than the K-means, PSO, and SA algorithms.
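The sensitivity to initial centers noted in this abstract can be seen in a small sketch of plain K-means (Lloyd's algorithm). This is not the PSO-SA hybrid proposed in the paper; the function names, the four-point data set, and the two initializations are assumptions chosen so that one start reaches the best partition and the other gets stuck in a poorer local minimum.

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, centers, iters=100):
    """Plain K-means; returns final centers and sum of squared errors."""
    centers = [tuple(c) for c in centers]
    for _ in range(iters):
        # Assignment step: each point joins its nearest center.
        groups = [[] for _ in centers]
        for p in points:
            k = min(range(len(centers)), key=lambda c: sq_dist(p, centers[c]))
            groups[k].append(p)
        # Update step: each center moves to the mean of its group
        # (an empty group keeps its previous center).
        new_centers = [
            tuple(sum(x) / len(g) for x in zip(*g)) if g else centers[i]
            for i, g in enumerate(groups)
        ]
        if new_centers == centers:
            break
        centers = new_centers
    sse = sum(min(sq_dist(p, c) for c in centers) for p in points)
    return centers, sse


# Corners of a wide rectangle: the natural split is left pair vs right pair.
points = [(0, 0), (0, 1), (4, 0), (4, 1)]
_, sse_good = kmeans(points, [(0, 0.5), (4, 0.5)])
_, sse_stuck = kmeans(points, [(2, 0), (2, 1)])
```

Both runs converge immediately, but the second start splits the rectangle into top and bottom rows and never leaves that configuration. Escaping such stalls is the motivation for adding global search heuristics such as PSO or SA on top of the basic update rule.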

Relevance:

20.00% 20.00%

Publisher:

Abstract:

With the growing size and variety of social media files on the web, it’s becoming critical to efficiently organize them into clusters for further processing. This paper presents a novel scalable constrained document clustering method that harnesses the power of search engines capable of dealing with large text data. Instead of calculating distance between the documents and all of the clusters’ centroids, a neighborhood of best cluster candidates is chosen using a document ranking scheme. To make the method faster and less memory dependent, in-memory and in-database processing are combined in a semi-incremental manner. This method has been extensively tested in the social event detection application. Empirical analysis shows that the proposed method is efficient both in computation and memory usage while producing notable accuracy.

Relevance:

20.00% 20.00%

Publisher:

Abstract:

Information on the variation available for different plant attributes has enabled germplasm collections to be effectively utilised in plant breeding. A world-sourced collection of white clover germplasm has been developed at the White Clover Resource Centre at Glen Innes, New South Wales. This collection of 439 accessions was characterised under field conditions as a preliminary study of the genotypic variation for morphological attributes: stolon density, stolon branching, number of nodes, number of rooted nodes, stolon thickness, internode length, leaf length, plant height and plant spread, together with seasonal herbage yield. Characterisation was conducted on different batches of germplasm (subsets of accessions taken from the complete collection) over a period of five years. Inclusion of two check cultivars, Haifa and Huia, in each batch enabled adjustment of the characterisation data for year effects and attribute-by-year interaction effects. The component of variance for seasonal herbage yield among batches was large relative to that for accessions. Accession-by-experiment and accession-by-season interactions for herbage yield were not detected. Accession mean repeatability for herbage yield across seasons was intermediate (0.453). The components of genotypic variance among accessions for all attributes, except plant height, were larger than their respective standard errors. The estimates of accession mean repeatability for the attributes ranged from low (0.277 for plant height) to intermediate (0.544 for internode length). Multivariate techniques of clustering and ordination were used to investigate the diversity present among the accessions in the collection. Both cluster analysis and principal component analysis suggested that seven groups of accessions existed.
It was also proposed from the pattern analysis results that accessions from a group characterised by large leaves, tall plants and thick stolons could be crossed with accessions from a group that had above-average stolon density and stolon branching. This material could produce breeding populations to be used in recurrent selection for the development of white clover cultivars for dryland summer moisture stress environments in Australia. The germplasm collection was also found to be deficient in genotypes with high stolon density, high number of branches, high number of rooted nodes and large leaves. This warrants addition of new germplasm accessions possessing these characteristics to the present germplasm collection.

Relevance:

20.00% 20.00%

Publisher:

Abstract:

This thesis presents new methods for classification and thematic grouping of billions of web pages, at scales previously not achievable. This process is also known as document clustering, where similar documents are automatically associated with clusters that represent various distinct topics. These automatically discovered topics are in turn used to improve search engine performance by only searching the topics that are deemed relevant to particular user queries.