154 results for means clustering
Abstract:
This paper proposes the use of Bayesian approaches with the cross likelihood ratio (CLR) as a criterion for speaker clustering within a speaker diarization system, using eigenvoice modeling techniques. The CLR has previously been shown to be an effective decision criterion for speaker clustering using Gaussian mixture models. Recently, eigenvoice modeling has become an increasingly popular technique, due to its ability to adequately represent a speaker based on sparse training data, as well as to provide an improved capture of differences in speaker characteristics. The integration of eigenvoice modeling into the CLR framework to capitalize on the advantage of both techniques has also been shown to be beneficial for the speaker clustering task. Building on that success, this paper proposes the use of Bayesian methods to compute the conditional probabilities in computing the CLR, thus effectively combining the eigenvoice-CLR framework with the advantages of a Bayesian approach to the diarization problem. Results obtained on the 2002 Rich Transcription (RT-02) Evaluation dataset show an improved clustering performance, resulting in a 33.5% relative improvement in the overall Diarization Error Rate (DER) compared to the baseline system.
Abstract:
“Who are you? How do you define yourself, your identity?” With these words Allan Moore opens his exhaustive new work proposing a more comprehensive approach to the musicological analysis of popular song. The last three decades have seen a huge expansion of the anthology of the sociological and cultural meanings of pop, but Moore’s book is not another exploration of this field, although some of these ideas are incorporated in this work. Rather, he addresses the limitations of conventional musicology when dealing particularly with songs: “I address popular song rather than popular music. The defining feature of popular song lies in the interaction of everyday words and music… it is how they interact that produces significance in the experience of song”.
Abstract:
Capacity probability models of generating units are commonly used in many power system reliability studies at hierarchical level one (HLI). Analytical modelling of a generating system with many units, or of generating units with many derated states, can result in an extensive number of states in the capacity model. Limitations on the available memory and computational time of present computer facilities can make such systems difficult to assess in many studies. A clustering procedure using the nearest centroid sorting method was used for the IEEE-RTS load model. The application proved to be very effective in producing a highly similar model with substantially fewer states. This paper presents an extended application of the clustering method to include capacity probability representation. A series of sensitivity studies is illustrated using the IEEE-RTS generating system and load models. The loss of load and energy expectations (LOLE, LOEE) are used as indicators to evaluate the application.
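As a rough illustration of how nearest centroid sorting can shrink a capacity probability table, the sketch below clusters (capacity, probability) states in one dimension with probability-weighted centroids. It is only a minimal sketch, not the procedure used in the paper: the function name `cluster_capacity_states`, the initialisation rule, and the four-state table are all hypothetical.

```python
def cluster_capacity_states(states, k, iterations=50):
    """Reduce (capacity, probability) states to k clustered states.

    Hypothetical helper: probability-weighted nearest-centroid sorting
    in one dimension. Not taken from the paper.
    """
    capacities = sorted({cap for cap, _ in states})
    if not 1 <= k <= len(capacities):
        raise ValueError("k must be between 1 and the number of distinct capacities")

    # Assumed initialisation: spread centroids evenly over the sorted capacities.
    step = (len(capacities) - 1) / (k - 1) if k > 1 else 0
    centroids = [float(capacities[round(i * step)]) for i in range(k)]

    def assign(current):
        # Sort every state into the group of its nearest centroid.
        groups = [[] for _ in range(k)]
        for cap, prob in states:
            nearest = min(range(k), key=lambda j: abs(cap - current[j]))
            groups[nearest].append((cap, prob))
        return groups

    for _ in range(iterations):
        updated = []
        for j, group in enumerate(assign(centroids)):
            mass = sum(p for _, p in group)
            # Probability-weighted mean; an empty group keeps its old centroid.
            updated.append(sum(c * p for c, p in group) / mass if mass > 0 else centroids[j])
        if updated == centroids:
            break
        centroids = updated

    groups = assign(centroids)
    return [(centroids[j], sum(p for _, p in groups[j])) for j in range(k)]


# Hypothetical four-state capacity table (capacity in MW, probability).
table = [(0, 0.1), (10, 0.2), (100, 0.3), (110, 0.4)]
reduced = cluster_capacity_states(table, k=2)
```

Because each clustered state sits at the probability-weighted mean of its members, the reduced table keeps the total probability and the expected capacity of the original table once the iteration has converged.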
Abstract:
Speaker diarization determines instances of the same speaker within a recording. Extending this task to a collection of recordings for linking together segments spoken by a unique speaker requires speaker linking. In this paper we propose a speaker linking system using linkage clustering and state-of-the-art speaker recognition techniques. We evaluate our approach against two baseline linking systems using agglomerative cluster merging (AC) and agglomerative clustering with model retraining (ACR). We demonstrate that our linking method, using complete-linkage clustering, provides a relative improvement of 20% and 29% in attribution error rate (AER), over the AC and ACR systems, respectively.
Speaker attribution of multiple telephone conversations using a complete-linkage clustering approach
Abstract:
In this paper we propose and evaluate a speaker attribution system using a complete-linkage clustering method. Speaker attribution refers to the annotation of a collection of spoken audio based on speaker identities. This can be achieved using diarization and speaker linking. The main challenge associated with attribution is achieving computational efficiency when dealing with large audio archives. Traditional agglomerative clustering methods with model merging and retraining are not feasible for this purpose. This has motivated the use of linkage clustering methods without retraining. We first propose a diarization system using complete-linkage clustering and show that it outperforms traditional agglomerative and single-linkage clustering based diarization systems with a relative improvement of 40% and 68%, respectively. We then propose a complete-linkage speaker linking system to achieve attribution and demonstrate a 26% relative improvement in attribution error rate (AER) over the single-linkage speaker linking approach.
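For readers unfamiliar with complete-linkage clustering, the following minimal sketch shows the general idea on a precomputed distance matrix: two clusters merge only if their most distant pair of members is still within a threshold, and no model is retrained after a merge. The function name `complete_linkage`, the threshold, and the distance values are hypothetical; the abstract does not describe how the paper scores pairs of segments.

```python
def complete_linkage(dist, threshold):
    """Agglomerative clustering with complete linkage.

    dist is a symmetric matrix of pairwise distances between segments.
    Merging stops once the closest pair of clusters is farther apart
    than threshold. Hypothetical helper, not the paper's system.
    """
    clusters = [[i] for i in range(len(dist))]

    def linkage(a, b):
        # Complete linkage: distance between the two farthest members.
        return max(dist[i][j] for i in a for j in b)

    while len(clusters) > 1:
        d, x, y = min(
            (linkage(clusters[x], clusters[y]), x, y)
            for x in range(len(clusters))
            for y in range(x + 1, len(clusters))
        )
        if d > threshold:
            break
        # Merge by pooling members; the pairwise distances are reused as-is.
        clusters[x] = clusters[x] + clusters[y]
        del clusters[y]
    return clusters


# Hypothetical distances between four speech segments.
distances = [
    [0.0, 0.1, 0.9, 0.9],
    [0.1, 0.0, 0.9, 0.9],
    [0.9, 0.9, 0.0, 0.2],
    [0.9, 0.9, 0.2, 0.0],
]
speakers = complete_linkage(distances, threshold=0.5)
```

Since only the fixed pairwise distances are consulted, each merge costs a lookup rather than a model retraining step, which is the efficiency argument the abstract makes for linkage methods.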
Abstract:
Grouping users in social networks is an important process that improves matching and recommendation activities in social networks. Data mining clustering methods can be used to group the users in social networks. However, existing general-purpose clustering algorithms perform poorly on social network data due to the special nature of users' data in social networks. One main reason is the constraints that need to be considered when grouping users in social networks. Another reason is the need to capture a large amount of information about users, which imposes computational complexity on an algorithm. In this paper, we propose a scalable and effective constraint-based clustering algorithm based on a global similarity measure that takes into consideration the users' constraints and their importance in social networks. Each constraint's importance is calculated based on the occurrence of this constraint in the dataset. Performance of the algorithm is demonstrated on a dataset obtained from an online dating website using internal and external evaluation measures. Results show that the proposed algorithm is able to increase the accuracy of matching users in social networks by 10% in comparison to other algorithms.
Abstract:
The SimCalc Vision and Contributions, Advances in Mathematics Education, 2013, pp 419-436. Modeling as a Means for Making Powerful Ideas Accessible to Children at an Early Age. Richard Lesh, Lyn English, Serife Sevis, Chanda Riggs. In modern societies in the 21st century, significant changes have been occurring in the kinds of “mathematical thinking” that are needed outside of school. Even primary school children (grades K-2) do not only encounter situations where numbers refer to sets of discrete objects that can be counted. Numbers are also used to describe situations that involve continuous quantities (inches, feet, pounds, etc.), signed quantities, quantities that have both magnitude and direction, locations (coordinates, or ordinal quantities), transformations (actions), accumulating quantities, continually changing quantities, and other kinds of mathematical objects. Furthermore, if we ask what kinds of situations children can use numbers to describe, rather than restricting attention to situations where children should be able to calculate correctly, then this study shows that average ability children in grades K-2 are (and need to be) able to productively mathematize situations that involve far more than simple counts. Similarly, whereas nearly the entire K-16 mathematics curriculum is restricted to situations that can be mathematized using a single input-output rule going in one direction, even the lives of primary school children are filled with situations that involve several interacting actions, and which involve feedback loops, second-order effects, and issues such as maximization, minimization, or stabilization (which, many years ago, needed to be postponed until students had been introduced to calculus).
…This brief paper demonstrates that, if children’s stories are used to introduce simulations of “real life” problem solving situations, then average ability primary school children are quite capable of dealing productively with 60-minute problems that involve (a) many kinds of quantities in addition to “counts,” (b) integrated collections of concepts associated with a variety of textbook topic areas, (c) interactions among several different actors, and (d) issues such as maximization, minimization, and stabilization.
Abstract:
Large Igneous Provinces are exceptional intraplate igneous events throughout Earth’s history. Their significance and potential global impact are related to the total volume of magma intruded and released during these geologically brief events (peak eruptions are often within 1-5 Myrs duration), in which millions to tens of millions of cubic kilometers of magma are produced. In some cases, at least 1% of the Earth’s surface has been directly covered in volcanic rock, equivalent to the size of small continents with comparable crustal thicknesses. Large Igneous Provinces are thus important, albeit episodic, events of new crust addition. However, most of the magmatism is basaltic, so contributions to crustal growth will not always be picked up in zircon geochronology studies, which better trace major episodes of extension-related silicic magmatism and the silicic Large Igneous Provinces. Much headway has been made in our understanding of these anomalous igneous events over the last 25 years, driving many new ideas and models.
This includes their: 1) global spatial and temporal distribution, with a long-term average of one event approximately every 20 Myrs, but a clear clustering of events at times of supercontinent break-up – Large Igneous Provinces are thus an integral part of the Wilson cycle and are becoming an increasingly important tool in reconnecting dispersed continental fragments; 2) compositional diversity that in part reflects their crustal setting of ocean basins, and continental interiors and margins where in the latter setting, LIP magmatism can be silicic-dominant; 3) mineral and energy resources with major PGE and precious metal resources being hosted in these provinces, as well as magmatism impacting on the hydrocarbon potential of volcanic basins and rifted margins through enhancing source rock maturation, providing fluid migration pathways, and trap formation; 4) biospheric, hydrospheric and atmospheric impacts, with Large Igneous Provinces now widely regarded as a key trigger mechanism for mass extinctions, although the exact kill mechanism(s) are still being resolved; 5) role in mantle geodynamics and thermal evolution of the Earth, by potentially recording the transport of material from the lower mantle or core-mantle boundary to the Earth's surface and being a fundamental component in whole mantle convection models; and 6) recognition on the inner planets where the lack of plate tectonics and erosional processes and planetary antiquity means that the very earliest record of LIP events during planetary evolution may be better preserved than on Earth.
Abstract:
Standard differential equation-based models of collective cell behaviour, such as the logistic growth model, invoke a mean-field assumption, which is equivalent to assuming that individuals within the population interact with each other in proportion to the average population density. Implementing such assumptions implies that the dynamics of the system are unaffected by spatial structure, such as the formation of patches or clusters within the population. Recent theoretical developments have introduced a class of models, known as moment dynamics models, which aim to account for the dynamics of individuals, pairs of individuals, triplets of individuals and so on. Such models enable us to describe the dynamics of populations with clustering; however, little progress has been made with regard to applying moment dynamics models to experimental data. Here, we report new experimental results describing the formation of a monolayer of cells using two different cell types: 3T3 fibroblast cells and MDA MB 231 breast cancer cells. Our analysis indicates that the 3T3 fibroblast cells are relatively motile and we observe that the 3T3 fibroblast monolayer forms without clustering. In contrast, the MDA MB 231 cells are less motile and we observe that the MDA MB 231 monolayer formation is associated with significant clustering. We calibrate a moment dynamics model and a standard mean-field model to both data sets. Our results indicate that the mean-field and moment dynamics models provide similar descriptions of the 3T3 fibroblast monolayer formation, whereas these two models give very different predictions for the MDA MB 231 monolayer formation. These outcomes indicate that standard mean-field models of collective cell behaviour are not always appropriate and that care ought to be exercised when implementing such a model.
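To make the mean-field assumption concrete, the sketch below evaluates the textbook logistic growth model, dC/dt = r C (1 - C/K), through its closed-form solution. The model depends only on the average density C, so any clustering in the population is invisible to it. The function name `logistic_density` and the parameter values are hypothetical and are not calibrated to the experiments described in the abstract.

```python
import math


def logistic_density(t, c0, rate, capacity):
    """Closed-form solution of dC/dt = rate * C * (1 - C / capacity).

    c0 is the initial density at t = 0. Hypothetical helper for
    illustration only; not the calibrated model from the paper.
    """
    return capacity / (1.0 + (capacity / c0 - 1.0) * math.exp(-rate * t))


# Hypothetical parameters: 10% initial confluence, growth rate 0.5 per day.
curve = [logistic_density(day, c0=0.1, rate=0.5, capacity=1.0) for day in range(0, 21, 5)]
```

A moment dynamics model would additionally track how pairs of individuals are distributed in space, which is why the two approaches can diverge when cells cluster.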
Abstract:
Automated process discovery techniques aim at extracting models from information system logs in order to shed light on the business processes supported by these systems. Existing techniques in this space are effective when applied to relatively small or regular logs, but otherwise generate large and spaghetti-like models. In previous work, trace clustering has been applied in an attempt to reduce the size and complexity of automatically discovered process models. The idea is to split the log into clusters and to discover one model per cluster. The result is a collection of process models -- each one representing a variant of the business process -- as opposed to an all-encompassing model. Still, models produced in this way may exhibit unacceptably high complexity. In this setting, this paper presents a two-way divide-and-conquer process discovery technique, wherein the discovered process models are split on the one hand by variants and on the other hand hierarchically by means of subprocess extraction. The proposed technique allows users to set a desired bound for the complexity of the produced models. Experiments on real-life logs show that the technique produces collections of models that are up to 64% smaller than those extracted under the same complexity bounds by applying existing trace clustering techniques.
Abstract:
The issue of particle emissions from diesel engines is still a matter of concern due to its deleterious effects on both human health and the environment (Ristovski et al., 2012). The International Agency for Research on Cancer (IARC) recently listed diesel engine exhaust particles as carcinogenic to human health, which has added a new dimension to this concern. Apart from the use of after-treatment technology, biodiesel is also considered a potential way to reduce particle emissions alongside other emissions (Xue, Grift, & Hansen, 2011). Global biodiesel production is still reasonably small compared to its counterpart, fossil diesel, but even this small amount comes from a wide variety of feedstocks. Contrary to fossil diesel, the important physicochemical properties of biodiesel vary among different feedstocks (Hoekman, Broch, Robbins, Ceniceros, & Natarajan, 2012).
Abstract:
This paper examines ‘green’ entrepreneurial nascent and young firms in Australia. Findings of interest in this paper include:
• Green entrepreneurs are more likely to be highly educated and have an extended depth of experience within their industry and are more likely to have started a business prior to their current venture.
• Green entrepreneurs exhibit increased levels of innovation, with an increased focus on new & high technology, R&D and the development of proprietary technology.
• Green entrepreneurs are most likely to be based upon a product rather than a service and have a higher emphasis upon growth when compared with non-green entrepreneurs.
• Green entrepreneurial firms tend to have a longer venture creation process and draw financial resources from a larger number of sources and rely more upon equity as a means of financing their venture.
Abstract:
We propose a cluster ensemble method to map the corpus documents into the semantic space embedded in Wikipedia and group them using multiple types of feature space. A heterogeneous cluster ensemble is constructed with multiple types of relations, i.e., document-term, document-concept and document-category. A final clustering solution is obtained by exploiting associations between document pairs and hubness of the documents. Empirical analysis with various real data sets reveals that the proposed method outperforms state-of-the-art text clustering approaches.
Abstract:
This paper examines the use of short video tutorials in a post-graduate accounting subject, as a means of helping students transition from dependent to more independent learners. Five short (three to five minute) video tutorials were introduced in an effort to shift the reliance for learning from the lecturer to the student. Students’ usage of video tutorials, comments by students, and reliance on teaching staff for individual assistance were monitored over three semesters from 2008 to 2009. Interviews with students were then conducted in late 2009 to more comprehensively evaluate the use and benefits of video tutorials. Findings reveal preliminary but positive outcomes in terms of both more efficient teaching and more effective learning.