Biblioteca Digital

This paper proposes the use of eigenvoice modeling techniques with the Cross Likelihood Ratio (CLR) as a criterion for speaker clustering within a speaker diarization system. The CLR has previously been shown to be a robust decision criterion for speaker clustering using Gaussian Mixture Models (GMMs). Recently, eigenvoice modeling techniques have become increasingly popular, due to their ability to adequately represent a speaker from sparse training data and to better capture differences in speaker characteristics. This paper therefore proposes capitalizing on the advantages of eigenvoice modeling within a CLR framework. Results obtained on the 2002 Rich Transcription (RT-02) Evaluation dataset show improved clustering performance, with a 35.1% relative improvement in the overall Diarization Error Rate (DER) compared to the baseline system.
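The commonly used form of the CLR between two clusters a and b can be sketched as follows. Each cluster's frames are scored against the other cluster's model, normalized by a universal background model (UBM) and by the number of frames. This is a minimal sketch that assumes diagonal-covariance GMMs for each cluster and for the UBM have already been trained. The function names are hypothetical, and the eigenvoice-based speaker models proposed in the paper are not shown; in that setting they would take the place of the per-cluster GMMs.

```python
import math

# A diagonal-covariance GMM is given as (weights, means, variances);
# means[k] and variances[k] are the per-dimension vectors of component k.


def log_gauss_diag(x, mean, var):
    """Log density of one diagonal-covariance Gaussian at feature frame x."""
    return -0.5 * sum(math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v
                      for xi, m, v in zip(x, mean, var))


def avg_log_likelihood(frames, gmm):
    """Average per-frame log-likelihood of frames under a GMM (log-sum-exp over components)."""
    weights, means, variances = gmm
    total = 0.0
    for x in frames:
        comps = [math.log(w) + log_gauss_diag(x, m, v)
                 for w, m, v in zip(weights, means, variances)]
        top = max(comps)
        total += top + math.log(sum(math.exp(c - top) for c in comps))
    return total / len(frames)


def clr(frames_a, gmm_a, frames_b, gmm_b, ubm):
    """Cross Likelihood Ratio between clusters a and b.

    Scores a's frames with b's model and b's frames with a's model, each
    relative to the UBM and averaged per frame. Larger values suggest the
    two clusters belong to the same speaker.
    """
    term_a = avg_log_likelihood(frames_a, gmm_b) - avg_log_likelihood(frames_a, ubm)
    term_b = avg_log_likelihood(frames_b, gmm_a) - avg_log_likelihood(frames_b, ubm)
    return term_a + term_b
```

In agglomerative speaker clustering, the pair of clusters with the highest CLR would typically be merged next, with merging stopping once the best CLR falls below a tuned threshold.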

Manufacturing Yield Improvement by Clustering

Abstract:

Dealing with product yield and quality in manufacturing industries is becoming more difficult due to the increasing volume and complexity of data and expectations of a shorter time to market. Data mining offers tools for the quick discovery of relationships, patterns and knowledge in large databases. The growing self-organizing map (GSOM) is established as an efficient unsupervised data-mining algorithm. In this study, some modifications to the original GSOM are proposed for manufacturing yield improvement by clustering. These modifications include the introduction of a clustering quality measure to evaluate the performance of the programme in separating good and faulty products, and a filtering index to reduce noise in the dataset. Results show that the proposed method is able to effectively differentiate good and faulty products. It will help engineers construct a knowledge base to predict product quality automatically from collected data and provide insights for yield improvement.
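The growth mechanism of the original GSOM, on which the proposed modifications build, can be sketched as a single training step. The growth threshold is GT = -D × ln(SF), where D is the input dimensionality and SF is the spread factor. This is a minimal sketch under stated simplifications, and the function names are hypothetical: nodes sit on an integer grid, only the winning node is updated (no neighbourhood update), new nodes copy the winner's weights, and error distribution for non-boundary nodes is omitted. The paper's clustering quality measure and filtering index are not reproduced.

```python
import math


def growth_threshold(dim, spread_factor):
    """GSOM growth threshold GT = -D * ln(SF), with spread factor 0 < SF < 1."""
    if not 0.0 < spread_factor < 1.0:
        raise ValueError("spread_factor must be in (0, 1)")
    return -dim * math.log(spread_factor)


def train_step(nodes, errors, x, lr, gt):
    """Present input x once; return the grid coordinates of any newly grown nodes.

    nodes:  (row, col) -> weight vector
    errors: (row, col) -> accumulated quantisation error
    """
    def sq_dist(coord):
        return sum((w - xi) ** 2 for w, xi in zip(nodes[coord], x))

    bmu = min(nodes, key=sq_dist)      # best-matching unit
    errors[bmu] += sq_dist(bmu)        # accumulate error before moving the weights
    nodes[bmu] = [w + lr * (xi - w) for w, xi in zip(nodes[bmu], x)]

    grown = []
    if errors[bmu] > gt:
        r, c = bmu
        for nb in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if nb not in nodes:        # grow only into free grid positions
                nodes[nb] = list(nodes[bmu])
                errors[nb] = 0.0
                grown.append(nb)
        errors[bmu] = 0.0              # simplification: reset instead of distributing error
    return grown
```

A smaller spread factor gives a larger GT, so the map grows more slowly and yields coarser clusters. This is the knob GSOM offers for controlling the level of cluster detail.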

Using Fiber Bragg Grating (FBG) sensors for vertical displacement measurement of bridges

Abstract:

In many bridges, vertical displacement is one of the most relevant parameters for structural health monitoring in both the short and the long term. Bridge managers around the globe are always looking for a simple way to measure the vertical displacements of bridges; however, such measurements are difficult to carry out. Meanwhile, with recent advances in fiber-optic technologies, fiber Bragg grating (FBG) sensors have become more common in structural health monitoring owing to their outstanding advantages, including multiplexing capability, immunity to electromagnetic interference, and high resolution and accuracy. For these reasons, FBG sensors are proposed as the basis of a simple, inexpensive and practical method for measuring the vertical displacements of bridges. A curvature approach, which derives vertical displacements from curvature measurements, is proposed. In addition, following the successful development of FBG tilt sensors, an inclination approach based on inclination measurements is also proposed. A series of simulation tests on a full-scale bridge was conducted. The results show that both approaches can determine vertical displacements for bridges with various support conditions and varying stiffness (EI) along the spans, without any prior knowledge of the loading. These approaches can thus measure vertical displacements for most slab-on-girder and box-girder bridges. Moreover, owing to the advantages of FBG sensors, they can be used to monitor bridge behavior remotely and in real time. Recommendations for the further development of these approaches are discussed at the end of the paper.
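Both approaches reduce to numerical integration along the span. The inclination approach integrates the measured slope once. The curvature approach integrates the measured curvature twice. In each case the integration constants are fixed by requiring zero displacement at the supports. This is a minimal sketch, and the function names are hypothetical. It assumes a single span with zero vertical displacement at both ends, small deflections (curvature taken as d²w/dx², with w positive upward), and readings sampled at known positions. The paper's treatment of other support conditions is not reproduced.

```python
def cumtrapz(y, x):
    """Cumulative trapezoidal integral of samples y over positions x, starting at 0."""
    out = [0.0]
    for i in range(1, len(x)):
        out.append(out[-1] + 0.5 * (y[i] + y[i - 1]) * (x[i] - x[i - 1]))
    return out


def displacement_from_inclination(theta, x):
    """Inclination approach: integrate slope dw/dx once, then subtract the straight
    line through the end values so that w = 0 at both supports (this also removes
    any constant offset in the measured slope)."""
    w = cumtrapz(theta, x)
    span = x[-1] - x[0]
    return [wi - w[-1] * (xi - x[0]) / span for wi, xi in zip(w, x)]


def displacement_from_curvature(kappa, x):
    """Curvature approach: integrate curvature d2w/dx2 to a slope known only up to
    a constant, then reuse the inclination step, whose end-point correction
    absorbs that constant."""
    slope = cumtrapz(kappa, x)
    return displacement_from_inclination(slope, x)
```

For a constant curvature κ over a span L, the exact result is w(x) = κ·x·(x − L)/2. Because the trapezoidal rule integrates constants and linear functions exactly, the sketch reproduces this parabola at the sample points. Since curvature can be obtained directly from surface strain, this route does not require the stiffness EI to be known.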

Identifying differences in safe roads and crash prone roads using clustering data mining

Abstract:

Road asset managers are overwhelmed by a high volume of raw data that they need to process and use to support their decision making. This paper presents a method that processes road-crash data for a whole road network and exposes hidden value inherent in the data by deploying a clustering data mining method. The goal of the method is to partition the road network into a set of groups (classes) based on common data and to characterise the crash types of each class, producing a crash profile for each cluster. By comparing similar road classes with differing crash types and rates, insight can be gained into how these differences are caused by the particular characteristics of their roads. These differences can be used as evidence in knowledge development and decision support.
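Once the clustering step (not shown) has assigned each road segment to a class, building a crash profile per class is a simple aggregation. This is a minimal sketch, and the function name is hypothetical. The input layout (segment lengths in km, one crash-type label per crash) and the choice of crashes per km as the rate measure are assumptions, not details from the paper.

```python
from collections import Counter, defaultdict


def crash_profiles(segments, crashes):
    """Summarise crashes per road class.

    segments: segment_id -> (cluster_label, length_km)
    crashes:  list of (segment_id, crash_type)
    Returns cluster_label -> {"rate_per_km": float, "profile": {crash_type: share}}.
    """
    length = defaultdict(float)              # total road length per class
    for cluster, km in segments.values():
        length[cluster] += km

    counts = defaultdict(Counter)            # crash-type counts per class
    for seg_id, crash_type in crashes:
        cluster, _ = segments[seg_id]
        counts[cluster][crash_type] += 1

    result = {}
    for cluster, km in length.items():
        total = sum(counts[cluster].values())
        result[cluster] = {
            "rate_per_km": total / km,
            "profile": {t: n / total for t, n in counts[cluster].items()} if total else {},
        }
    return result
```

Comparing two classes with similar road attributes but different `rate_per_km` values or crash-type shares is the kind of contrast the method uses as evidence.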

Utilising semantic tags in XML clustering

Fiber Bragg Grating sensors for railway systems

Abstract:

Fiber Bragg grating (FBG) sensor technology has attracted substantial industrial interest over the last decade. FBG sensors have seen increasing acceptance and widespread use for structural sensing and health monitoring applications in composites, civil engineering, aerospace, marine, oil & gas, and smart structures. One transportation system that has benefited tremendously from this technology is the railway, where it is of the utmost importance to understand the structural and operating conditions of rails, as well as those of freight and passenger service cars, to ensure safe and reliable operation. Fiber-optic sensors, mostly in the form of FBGs, offer several important advantages over conventional electrical sensors under the distinctive operating conditions of railways, such as EMI/RFI immunity, multiplexing capability, and very long-range interrogation (up to 230 km between the FBGs and the measurement unit). FBG sensors differ from other types of fiber-optic sensors in that the measured information is wavelength-encoded, which provides self-referencing and makes their signals less susceptible to intensity fluctuations. In addition, FBGs are reflective sensors that can be interrogated from either end, providing redundancy in FBG sensing networks. These two features are particularly important for the railway industry, where safe and reliable operation is the major concern. Furthermore, FBGs are very versatile, and FBG-based transducers can be designed to measure a wide range of parameters, such as acceleration and inclination. Consequently, a single interrogator can handle a large number of FBG sensors measuring a multitude of parameters at different locations spread over a large area.
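Because the measurand is encoded in wavelength, one interrogator can address many gratings on a single fiber by giving each grating its own wavelength window. The grating's reflected wavelength follows the Bragg condition λ_B = 2·n_eff·Λ. This is a minimal sketch, and the function names and window values are hypothetical. It assumes non-overlapping windows, constant temperature (no thermal term), and a typical effective photo-elastic coefficient p_e ≈ 0.22 for silica fiber.

```python
# Assumed typical effective photo-elastic coefficient of silica fibre.
P_E = 0.22


def bragg_wavelength(n_eff, period_nm):
    """Bragg condition: reflected wavelength = 2 * n_eff * grating period."""
    return 2.0 * n_eff * period_nm


def assign_peaks(peaks_nm, windows):
    """Map each detected reflection peak to the sensor whose wavelength window contains it.

    windows: sensor_id -> (low_nm, high_nm); windows are assumed not to overlap.
    Peaks outside every window are ignored.
    """
    assigned = {}
    for peak in peaks_nm:
        for sensor, (low, high) in windows.items():
            if low <= peak <= high:
                assigned[sensor] = peak
                break
    return assigned


def strain_from_shift(peak_nm, base_nm, p_e=P_E):
    """Isothermal strain from a Bragg shift: (peak - base) / base = (1 - p_e) * strain."""
    return (peak_nm - base_nm) / (base_nm * (1.0 - p_e))
```

Because the strain is computed from a wavelength difference rather than an optical power level, losses along the fiber do not change the result. This is the self-referencing property described above.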

Aggregate distance based clustering using Fibonacci Series -FIBCLUS

Abstract:

This paper proposes an innovative instance-similarity-based evaluation metric that reduces the search map over which clustering is performed. An aggregate global score is calculated for each instance using the novel idea of the Fibonacci series. The use of Fibonacci numbers separates the instances effectively and, hence, intra-cluster similarity is increased and inter-cluster similarity is decreased during clustering. The proposed FIBCLUS algorithm can handle datasets with numerical attributes, categorical attributes, or a mix of both. Results obtained with FIBCLUS are compared with those of existing algorithms such as k-means, x-means, expectation maximization (EM) and hierarchical algorithms, which are widely used to cluster numeric, categorical and mixed data types. Empirical analysis shows that FIBCLUS produces better clustering solutions in terms of entropy, purity and F-score than these existing algorithms.

Clustering of web users using the tensor decomposed models

Abstract:

We propose to use Tensor Space Modeling (TSM) to represent and analyze users' web log data, which covers multiple interests and spans multiple dimensions. We further propose to use the decomposition factors of the tensors to cluster users based on the similarity of their search behaviour. Preliminary results show that the proposed method outperforms traditional Vector Space Model (VSM) based clustering.
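The general idea of clustering users from tensor decomposition factors can be sketched in a few lines. This is a minimal sketch, and the function names and tensor layout are hypothetical. It assumes a third-order user × query-term × URL tensor of counts. It uses a truncated HOSVD-style user factor (an SVD of the user-mode unfolding) followed by plain k-means. The paper's actual decomposition method, tensor dimensions and clustering algorithm may differ.

```python
import numpy as np


def user_factors(tensor, rank):
    """User-mode factor of a 3-order tensor, as in a truncated HOSVD.

    Reshaping to (users, terms * URLs) gives the mode-1 unfolding up to a column
    permutation, which does not change its left singular vectors.
    """
    unfolded = tensor.reshape(tensor.shape[0], -1)
    u, s, _ = np.linalg.svd(unfolded, full_matrices=False)
    return u[:, :rank] * s[:rank]          # scale each factor by its singular value


def kmeans(points, k, iters=20):
    """Plain k-means with deterministic farthest-first initialisation."""
    centroids = [points[0]]
    for _ in range(1, k):
        d = np.min([np.linalg.norm(points - c, axis=1) for c in centroids], axis=0)
        centroids.append(points[np.argmax(d)])
    centroids = np.array(centroids, dtype=float)
    for _ in range(iters):
        dists = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
        labels = np.argmin(dists, axis=1)
        for j in range(k):
            if np.any(labels == j):
                centroids[j] = points[labels == j].mean(axis=0)
    return labels
```

Usage: `labels = kmeans(user_factors(log_tensor, rank=2), k=2)`. Users who issue similar terms against similar URLs end up with similar factor rows and therefore in the same cluster, even when their raw term vectors alone would look different.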

Classifying the user intent of web queries using k-means clustering

Abstract:

Purpose: People frequently use web search engines to locate information on the Internet. However, not all queries have an informational goal. Instead of information, some people may be looking for specific web sites or may wish to conduct transactions with web services. This paper focuses on automatically classifying the different user intents behind web queries. Design/methodology/approach: For the research reported in this paper, 130,000 web search engine queries are categorized as informational, navigational, or transactional using a k-means clustering approach based on a variety of query traits. Findings: The research findings show that more than 75 percent of web queries (clustered into eight classifications) are informational in nature, with about 12 percent each being navigational and transactional. Results also show that web queries fall into eight clusters: six primarily informational, and one each primarily transactional and primarily navigational. Research limitations/implications: This study provides an important contribution to the web search literature because it provides information about the goals of searchers and a method for automatically classifying the intents of user queries. Automatic classification of user intent can lead to improved web search engines by tailoring results to specific user needs. Practical implications: The paper discusses how web search engines can use automatically classified user queries to provide more targeted and relevant results, by implementing a real-time classification method such as the one presented in this research. Originality/value: This research investigates a new application of a method for automatically classifying the intent of user queries. There has been limited research to date on automatically classifying the user intent of web queries, even though the potential pay-off for web search engines is considerable. © Emerald Group Publishing Limited.
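The core of the approach is to turn each query into a vector of traits and cluster those vectors with k-means. This is a minimal sketch, and everything in it is hypothetical. The three traits (length in terms, a URL-like token, a transactional cue word) and the cue-word list are illustrative stand-ins, not the paper's feature set. The k-means uses the first k points as initial centroids so that runs are reproducible.

```python
import math
import re

# Hypothetical traits; the paper's actual query features are not reproduced here.
TRANSACTIONAL = {"download", "buy", "order", "free", "watch"}
URL_PATTERN = re.compile(r"(www\.|\.com|\.org|\.net|http)")


def traits(query):
    """Feature vector for one query."""
    terms = query.lower().split()
    return [
        float(len(terms)),                                   # query length in terms
        1.0 if URL_PATTERN.search(query.lower()) else 0.0,   # looks like a site name or URL
        1.0 if TRANSACTIONAL & set(terms) else 0.0,          # contains a transactional cue word
    ]


def kmeans(points, k, iters=50):
    """Lloyd's algorithm; the first k points serve as initial centroids."""
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: nearest centroid by Euclidean distance.
        labels = [min(range(k), key=lambda j: math.dist(p, centroids[j])) for p in points]
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels
```

Usage: `labels = kmeans([traits(q) for q in queries], k=8)`. The value k = 8 matches the number of clusters reported above. On real logs the features would normally be scaled first, because unscaled query length would otherwise dominate the distance. Each resulting cluster is then labelled informational, navigational or transactional by inspecting its members.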

XML data clustering: An overview

Abstract:

In the last few years we have observed a proliferation of approaches for clustering XML documents and schemas based on their structure and content. The existence of such a large number of approaches is due to the different applications requiring XML data to be clustered. These applications need data in the form of similar contents, tags, paths, structures and semantics. In this paper, we first outline the application contexts in which clustering is useful; we then survey the approaches proposed so far according to the abstract representation of data (instances or schemas), the similarity measure used, and the clustering algorithm. This presentation leads to a taxonomy in which the current approaches can be classified and compared. We aim to introduce an integrated view that is useful when comparing XML data clustering approaches, when developing a new clustering algorithm, and when implementing an XML clustering component. Finally, the paper describes future trends and research issues that still need to be addressed.
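One simple instance of the representation and similarity-measure pairs the survey classifies is a structure-only approach. Each document is represented by its set of root-to-node tag paths, and documents are compared with Jaccard similarity. This is a minimal sketch using Python's standard `xml.etree.ElementTree`, and the function names are hypothetical. It ignores text content, attributes and element order.

```python
import xml.etree.ElementTree as ET


def tag_paths(xml_text):
    """Set of root-to-node tag paths, e.g. 'book/author/name'."""
    root = ET.fromstring(xml_text)
    paths = set()

    def walk(node, prefix):
        path = f"{prefix}/{node.tag}" if prefix else node.tag
        paths.add(path)
        for child in node:
            walk(child, path)

    walk(root, "")
    return paths


def structural_similarity(doc_a, doc_b):
    """Jaccard similarity of the two documents' tag-path sets (1.0 = identical structure)."""
    a, b = tag_paths(doc_a), tag_paths(doc_b)
    union = a | b
    return len(a & b) / len(union) if union else 1.0
```

Any clustering algorithm that accepts a similarity or distance (for example, hierarchical clustering on 1 − similarity) can then group documents with similar structure. Content-aware approaches would add term information to this representation.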

Enriching XML documents clustering by using concise structure and content

Abstract:

With the growing number of XML documents on the Web, it becomes essential to organise these documents effectively in order to retrieve useful information from them. A possible solution is to cluster the XML documents to discover knowledge that promotes effective data management, information retrieval and query processing. However, many issues arise in discovering knowledge from these semi-structured documents due to their heterogeneity and structural irregularity. Most existing research on clustering techniques focuses on only one feature of XML documents, either their structure or their content, because of scalability and complexity problems. The knowledge gained in the form of clusters based on structure or content alone is not suitable for real-life datasets. It therefore becomes essential to include both the structure and the content of XML documents in order to improve the accuracy and meaning of the clustering solution. However, including both kinds of information in the clustering process imposes a huge overhead on the underlying clustering algorithm because of the high dimensionality of the data. The overall objective of this thesis is to address these issues by: (1) proposing methods that utilise frequent pattern mining techniques to reduce the dimensionality; (2) developing models to effectively combine the structure and content of XML documents; and (3) utilising the proposed models in clustering. This research first determines the structural similarity in the form of frequent subtrees and then uses these frequent subtrees to represent the constrained content of the XML documents in order to determine the content similarity. A clustering framework with two types of models, implicit and explicit, is developed. The implicit model uses a Vector Space Model (VSM) to combine the structure and the content information.
The explicit model uses a higher-order model, namely a third-order Tensor Space Model (TSM), to explicitly combine the structure and the content information. This thesis also proposes a novel incremental technique to decompose large-sized tensor models and utilise the decomposed solution for clustering the XML documents. The proposed framework and its components were extensively evaluated on several real-life datasets exhibiting extreme characteristics to understand the usefulness of the proposed framework in real-life situations. Additionally, this research evaluates the outcome of the clustering process on the collection selection problem in information retrieval on the Wikipedia dataset. The experimental results demonstrate that the proposed frequent pattern mining and clustering methods outperform related state-of-the-art approaches. In particular, the proposed framework of utilising frequent structures to constrain the content improves accuracy over content-only and structure-only clustering. The scalability experiments conducted on large-scale datasets clearly show the strengths of the proposed methods over state-of-the-art methods. Overall, this thesis contributes to effectively combining the structure and the content of XML documents for clustering, in order to improve the accuracy of the clustering solution. It also contributes by addressing research gaps in frequent pattern mining, generating efficient and concise frequent subtrees with various node relationships that can be used in clustering.
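The implicit (VSM) model, in which frequent structure constrains which content is kept, can be approximated in a few lines. This is a minimal sketch, and it makes a deliberate simplification: frequent root-to-node paths stand in for the thesis's frequent subtrees. The sketch keeps only text found under paths that occur in at least a given fraction of documents. Each document becomes a bag of (path, term) features compared by cosine similarity. All names and the support threshold are hypothetical.

```python
import math
import xml.etree.ElementTree as ET
from collections import Counter


def nodes(xml_text):
    """List of (tag_path, text) for every element in the document."""
    root = ET.fromstring(xml_text)
    out, stack = [], [(root, root.tag)]
    while stack:
        node, path = stack.pop()
        out.append((path, node.text or ""))
        stack.extend((child, f"{path}/{child.tag}") for child in node)
    return out


def frequent_paths(docs, min_support):
    """Paths occurring in at least a min_support fraction of the documents."""
    doc_freq = Counter()
    for d in docs:
        doc_freq.update({p for p, _ in nodes(d)})
    return {p for p, n in doc_freq.items() if n / len(docs) >= min_support}


def constrained_vector(doc, frequent):
    """Bag of (path, term) features, keeping only content under frequent paths."""
    return Counter(
        (p, t) for p, text in nodes(doc) if p in frequent for t in text.lower().split()
    )


def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[f] * v[f] for f in u.keys() & v.keys())
    norm = math.sqrt(sum(x * x for x in u.values())) * math.sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0
```

Pairing each term with its path means the same word under different elements (a title versus a body, say) counts as a different feature. Pruning infrequent paths is what keeps the dimensionality manageable.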

A high sensitive fiber Bragg grating strain sensor with automatic temperature compensation

Abstract:

A highly sensitive fiber Bragg grating (FBG) strain sensor with automatic temperature compensation is demonstrated. The FBG is axially linked to a stick, and their free ends are fixed to the measured object. When the measured strain changes, the stick does not change in length, but the FBG does. When the temperature changes, the stick changes in length and pulls the FBG, realizing temperature compensation. In experiments, a strain sensitivity 1.45 times that of a bare FBG is achieved, with temperature compensation limiting the Bragg wavelength drift to less than 0.1 nm over a 100 °C temperature change.
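For scale, the standard response of a bare FBG to strain ε and temperature change ΔT is given below with worked numbers. Here p_e is the effective photo-elastic coefficient, α the thermal expansion coefficient and ξ the thermo-optic coefficient. This is illustrative arithmetic only. The abstract does not state the Bragg wavelength, so λ_B = 1550 nm is assumed, together with typical silica values (p_e ≈ 0.22 and a bare-grating thermal sensitivity of roughly 10 pm/°C).

```latex
\begin{align}
\frac{\Delta\lambda_B}{\lambda_B} &= (1 - p_e)\,\varepsilon + (\alpha + \xi)\,\Delta T \\
S_{\varepsilon}^{\text{bare}} &= (1 - p_e)\,\lambda_B
  \approx 0.78 \times 1550\ \text{nm} \times 10^{-6}/\mu\varepsilon
  \approx 1.21\ \text{pm}/\mu\varepsilon \\
S_{\varepsilon}^{\text{sensor}} &\approx 1.45 \times 1.21\ \text{pm}/\mu\varepsilon
  \approx 1.75\ \text{pm}/\mu\varepsilon \\
S_{T}^{\text{sensor}} &< \frac{0.1\ \text{nm}}{100\ ^{\circ}\text{C}} = 1\ \text{pm}/^{\circ}\text{C}
  \quad (\text{vs. roughly } 10\ \text{pm}/^{\circ}\text{C} \text{ for a bare FBG})
\end{align}
```

Under these assumptions, the reported residual drift corresponds to roughly a tenfold reduction in temperature sensitivity compared with an uncompensated grating.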

1000 results for Fiber clustering