995 resultados para document clustering


Relevância:

20.00% 20.00%

Publicador:

Resumo:

This paper presents an overview of the experiments conducted using Hybrid Clustering of XML documents using Constraints (HCXC) method for the clustering task in the INEX 2009 XML Mining track. This technique utilises frequent subtrees generated from the structure to extract the content for clustering the XML documents. It also presents the experimental study using several data representations such as the structure-only, content-only and using both the structure and the content of XML documents for the purpose of clustering them. Unlike previous years, this year the XML documents were marked up using the Wiki tags and contains categories derived by using the YAGO ontology. This paper also presents the results of studying the effect of these tags on XML clustering using the HCXC method.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Background: Waist circumference has been identified as a valuable predictor of cardiovascular risk in children. The development of waist circumference percentiles and cut-offs for various ethnic groups are necessary because of differences in body composition. The purpose of this study was to develop waist circumference percentiles for Chinese children and to explore optimal waist circumference cut-off values for predicting cardiovascular risk factors clustering in this population.----- ----- Methods: Height, weight, and waist circumference were measured in 5529 children (2830 boys and 2699 girls) aged 6-12 years randomly selected from southern and northern China. Blood pressure, fasting triglycerides, low-density lipoprotein cholesterol, high-density lipoprotein cholesterol, and glucose were obtained in a subsample (n = 1845). Smoothed percentile curves were produced using the LMS method. Receiver-operating characteristic analysis was used to derive the optimal age- and gender-specific waist circumference thresholds for predicting the clustering of cardiovascular risk factors.----- ----- Results: Gender-specific waist circumference percentiles were constructed. The waist circumference thresholds were at the 90th and 84th percentiles for Chinese boys and girls respectively, with sensitivity and specificity ranging from 67% to 83%. The odds ratio of a clustering of cardiovascular risk factors among boys and girls with a higher value than cut-off points was 10.349 (95% confidence interval 4.466 to 23.979) and 8.084 (95% confidence interval 3.147 to 20.767) compared with their counterparts.----- ----- Conclusions: Percentile curves for waist circumference of Chinese children are provided. The cut-off point for waist circumference to predict cardiovascular risk factors clustering is at the 90th and 84th percentiles for Chinese boys and girls, respectively.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

It is well known that a statutory requirement of formality is associated with contracts concerning land. In this regard, s 59 of the Property Law Act 1974 (Qld) provides: No action may be brought upon any contract for the sale or other disposition of land or any interest in land unless the contract upon which such action is brought, or some memorandum or note of the contract, is in writing, and signed by the party to be charged, or by some person by the party lawfully authorised. In addition to the possibility of a formal contract, the statutory wording clearly contemplates reliance on an informal note or memorandum. To constitute a sufficient note or memorandum for the purposes of the statute, the signed note or memorandum must contain details of the parties to the contract, an adequate description of the property, the price and any other essential terms. It is also accepted that the doctrine of joinder may be invoked in circumstances where the document signed by the party to be charged contains an express or implied reference to any other document. In this way, a sufficient note or memorandum may be constituted by the joinder of a number of documents.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

This paper proposes the use of eigenvoice modeling techniques with the Cross Likelihood Ratio (CLR) as a criterion for speaker clustering within a speaker diarization system. The CLR has previously been shown to be a robust decision criterion for speaker clustering using Gaussian Mixture Models. Recently, eigenvoice modeling techniques have become increasingly popular, due to its ability to adequately represent a speaker based on sparse training data, as well as an improved capture of differences in speaker characteristics. This paper hence proposes that it would be beneficial to capitalize on the advantages of eigenvoice modeling in a CLR framework. Results obtained on the 2002 Rich Transcription (RT-02) Evaluation dataset show an improved clustering performance, resulting in a 35.1% relative improvement in the overall Diarization Error Rate (DER) compared to the baseline system.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Dealing with product yield and quality in manufacturing industries is getting more difficult due to the increasing volume and complexity of data and quicker time to market expectations. Data mining offers tools for quick discovery of relationships, patterns and knowledge in large databases. Growing self-organizing map (GSOM) is established as an efficient unsupervised datamining algorithm. In this study some modifications to the original GSOM are proposed for manufacturing yield improvement by clustering. These modifications include introduction of a clustering quality measure to evaluate the performance of the programme in separating good and faulty products and a filtering index to reduce noise from the dataset. Results show that the proposed method is able to effectively differentiate good and faulty products. It will help engineers construct the knowledge base to predict product quality automatically from collected data and provide insights for yield improvement.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Road asset managers are overwhelmed with a high volume of raw data which they need to process and utilise in supporting their decision making. This paper presents a method that processes road-crash data of a whole road network and exposes hidden value inherent in the data by deploying the clustering data mining method. The goal of the method is to partition the road network into a set of groups (classes) based on common data and characterise the class crash types to produce a crash profiles for each cluster. By comparing similar road classes with differing crash types and rates, insight can be gained into these differences that are caused by the particular characteristics of their roads. These differences can be used as evidence in knowledge development and decision support.

Relevância:

20.00% 20.00%

Publicador:

Relevância:

20.00% 20.00%

Publicador:

Resumo:

This paper proposes an innovative instance similarity based evaluation metric that reduces the search map for clustering to be performed. An aggregate global score is calculated for each instance using the novel idea of Fibonacci series. The use of Fibonacci numbers is able to separate the instances effectively and, in hence, the intra-cluster similarity is increased and the inter-cluster similarity is decreased during clustering. The proposed FIBCLUS algorithm is able to handle datasets with numerical, categorical and a mix of both types of attributes. Results obtained with FIBCLUS are compared with the results of existing algorithms such as k-means, x-means expected maximization and hierarchical algorithms that are widely used to cluster numeric, categorical and mix data types. Empirical analysis shows that FIBCLUS is able to produce better clustering solutions in terms of entropy, purity and F-score in comparison to the above described existing algorithms.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

We propose to use the Tensor Space Modeling (TSM) to represent and analyze the user’s web log data that consists of multiple interests and spans across multiple dimensions. Further we propose to use the decomposition factors of the Tensors for clustering the users based on similarity of search behaviour. Preliminary results show that the proposed method outperforms the traditional Vector Space Model (VSM) based clustering.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Video surveillance systems using Closed Circuit Television (CCTV) cameras, is one of the fastest growing areas in the field of security technologies. However, the existing video surveillance systems are still not at a stage where they can be used for crime prevention. The systems rely heavily on human observers and are therefore limited by factors such as fatigue and monitoring capabilities over long periods of time. This work attempts to address these problems by proposing an automatic suspicious behaviour detection which utilises contextual information. The utilisation of contextual information is done via three main components: a context space model, a data stream clustering algorithm, and an inference algorithm. The utilisation of contextual information is still limited in the domain of suspicious behaviour detection. Furthermore, it is nearly impossible to correctly understand human behaviour without considering the context where it is observed. This work presents experiments using video feeds taken from CAVIAR dataset and a camera mounted on one of the buildings Z-Block) at the Queensland University of Technology, Australia. From these experiments, it is shown that by exploiting contextual information, the proposed system is able to make more accurate detections, especially of those behaviours which are only suspicious in some contexts while being normal in the others. Moreover, this information gives critical feedback to the system designers to refine the system.