282 results for clustering algorithm
Abstract:
This paper presents a method for calculating the in-bucket payload volume on a dragline in order to estimate the material’s bulk density in real time. Knowledge of the bulk density can provide instant feedback to mine planning and scheduling to improve blasting and, in turn, provide a more uniform bulk density across the excavation site. Furthermore, costs and emissions in dragline operation, maintenance and downstream material processing can be reduced. The main challenge is to determine an accurate position and orientation of the bucket under the constraint of real-time performance. The proposed solution uses a range bearing and tilt sensor to locate and scan the bucket between the lift and dump stages of the dragline cycle. Various scanning strategies are investigated for their benefits in this real-time application. The bucket is segmented from the scene using cluster analysis, while the pose of the bucket is calculated using the iterative closest point (ICP) algorithm. Payload points are segmented from the bucket by a fixed-distance neighbour clustering method to preserve boundary points and exclude low-density clusters introduced by overhead chains and the spreader bar. A height grid is then used to represent the payload, from which the volume can be calculated by summing over the grid cells. We show volume calculated on a scaled system with an accuracy of greater than 95 per cent.
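The height-grid step described in the abstract above can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `height_grid_volume`, the use of the mean height per cell, and the flat `base_height` reference are all assumptions made for illustration.

```python
from collections import defaultdict


def height_grid_volume(points, cell_size, base_height=0.0):
    """Estimate the volume under a point cloud with a height grid.

    points      -- iterable of (x, y, z) tuples (hypothetical payload points)
    cell_size   -- edge length of each square grid cell
    base_height -- assumed reference height of the bucket floor
    """
    # Bin every point into the grid cell that contains its (x, y) position.
    cells = defaultdict(list)
    for x, y, z in points:
        key = (int(x // cell_size), int(y // cell_size))
        cells[key].append(z)

    # Sum (cell height above the base) * (cell area) over occupied cells.
    cell_area = cell_size * cell_size
    volume = 0.0
    for heights in cells.values():
        mean_height = sum(heights) / len(heights)  # assumption: mean per cell
        volume += max(mean_height - base_height, 0.0) * cell_area
    return volume
```

With a known payload mass, the bulk density would then follow as mass divided by this volume. Empty cells contribute nothing here; a real system would likely need to interpolate over unscanned cells.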
Abstract:
This paper presents an overview of the experiments conducted using the Hybrid Clustering of XML documents using Constraints (HCXC) method for the clustering task in the INEX 2009 XML Mining track. This technique utilises frequent subtrees generated from the structure to extract the content for clustering the XML documents. The paper also presents an experimental study using several data representations, namely structure-only, content-only, and both the structure and the content of XML documents, for the purpose of clustering them. Unlike previous years, this year the XML documents were marked up using the Wiki tags and contain categories derived by using the YAGO ontology. This paper also presents the results of studying the effect of these tags on XML clustering using the HCXC method.
Abstract:
Background: Waist circumference has been identified as a valuable predictor of cardiovascular risk in children. The development of waist circumference percentiles and cut-offs for various ethnic groups is necessary because of differences in body composition. The purpose of this study was to develop waist circumference percentiles for Chinese children and to explore optimal waist circumference cut-off values for predicting the clustering of cardiovascular risk factors in this population.
Methods: Height, weight, and waist circumference were measured in 5529 children (2830 boys and 2699 girls) aged 6-12 years randomly selected from southern and northern China. Blood pressure, fasting triglycerides, low-density lipoprotein cholesterol, high-density lipoprotein cholesterol, and glucose were obtained in a subsample (n = 1845). Smoothed percentile curves were produced using the LMS method. Receiver-operating characteristic analysis was used to derive the optimal age- and gender-specific waist circumference thresholds for predicting the clustering of cardiovascular risk factors.
Results: Gender-specific waist circumference percentiles were constructed. The waist circumference thresholds were at the 90th and 84th percentiles for Chinese boys and girls respectively, with sensitivity and specificity ranging from 67% to 83%. The odds ratio of a clustering of cardiovascular risk factors among boys and girls with a value higher than the cut-off points was 10.349 (95% confidence interval 4.466 to 23.979) and 8.084 (95% confidence interval 3.147 to 20.767), respectively, compared with their counterparts.
Conclusions: Percentile curves for waist circumference of Chinese children are provided. The cut-off point for waist circumference to predict the clustering of cardiovascular risk factors is at the 90th and 84th percentiles for Chinese boys and girls, respectively.
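One common way to pick a threshold from a receiver-operating characteristic analysis is to maximise Youden's J (sensitivity + specificity - 1). The abstract above does not state which criterion the study used, so the sketch below is an assumption for illustration only; the function name `best_cutoff` and the toy inputs are hypothetical.

```python
def best_cutoff(values, labels):
    """Return (threshold, J) maximising Youden's J = sensitivity + specificity - 1.

    values -- measurements (e.g. waist circumference), one per subject
    labels -- 1 if the subject has the outcome, 0 otherwise
    A subject is classed as positive when value >= threshold.
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one positive and one negative label")

    best_t, best_j = None, -1.0
    # Every distinct observed value is a candidate threshold.
    for t in sorted(set(values)):
        true_pos = sum(1 for v, y in zip(values, labels) if y == 1 and v >= t)
        true_neg = sum(1 for v, y in zip(values, labels) if y == 0 and v < t)
        j = true_pos / n_pos + true_neg / n_neg - 1.0
        if j > best_j:
            best_t, best_j = t, j
    return best_t, best_j
```

Running this separately for each age and gender stratum would give stratum-specific thresholds of the kind the abstract reports.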
Abstract:
The traditional Vector Space Model (VSM) is not able to represent both the structure and the content of XML documents. This paper introduces a novel method of representing XML documents in a Tensor Space Model (TSM) and then utilizing it for clustering. Empirical analysis shows that the proposed method is scalable for large-sized datasets; in addition, the factorized matrices produced by the proposed method help to improve the quality of clusters through the enriched document representation of both structure and content information.
Abstract:
Focusing on the conditions that an optimization problem may comply with, the so-called convergence conditions are proposed, and subsequently a stochastic optimization algorithm, named the DSZ algorithm, is presented to deal with both unconstrained and constrained optimization. The principle is discussed in the theoretical model of the DSZ algorithm, from which we derive the practical model of the DSZ algorithm. The efficiency of the practical model is demonstrated by comparison with similar algorithms such as Enhanced Simulated Annealing (ESA), Monte Carlo Simulated Annealing (MCS), Sniffer Global Optimization (SGO), Directed Tabu Search (DTS), and Genetic Algorithm (GA), using a set of well-known unconstrained and constrained optimization test cases. Further attention is given to strategies for optimizing high-dimensional unconstrained problems using the DSZ algorithm.
Abstract:
In this paper we present pyktree, an implementation of the K-tree algorithm in the Python programming language. The K-tree algorithm provides highly balanced search trees for vector quantization that scale up to very large data sets. Pyktree is highly modular and well suited for rapid prototyping of novel distance measures and centroid representations. It is easy to install and provides a Python package for library use as well as command line tools.
Abstract:
Fractures of long bones are sometimes treated using various types of fracture fixation devices including internal plate fixators. These are specialised plates which are used to bridge the fracture gap(s) whilst anatomically aligning the bone fragments. The plate is secured in position by screws. The aim of such a device is to support and promote the natural healing of the bone. When using an internal fixation device, it is necessary for the clinician to decide upon many parameters, for example, the type of plate and where to position it, and how many screws to use and where to position them. While a number of experimental and computational studies regarding the configuration of screws have been reported in the literature, there is still inadequate information available concerning the influence of screw configuration on fracture healing. Because screw configuration influences the amount of flexibility at the area of fracture, it has a direct influence on the fracture healing process. Therefore, it is important that the chosen screw configuration does not inhibit the healing process. In addition to the impact on the fracture healing process, screw configuration plays an important role in the distribution of stresses in the plate due to the applied loads. A plate that experiences high stresses is prone to early failure. Hence, the screw configuration used should not encourage the occurrence of high stresses. This project develops a computational program in the Fortran programming language to perform mathematical optimisation to determine the screw configuration of an internal fixation device within constraints of interfragmentary movement by minimising the corresponding stress in the plate. Thus, the optimal solution suggests the positioning and number of screws which satisfy the predefined constraints of interfragmentary movement.
For a set of screw configurations, the interfragmentary displacement and the stress occurring in the plate were calculated by the Finite Element Method. The screw configurations were iteratively changed, and each time the corresponding interfragmentary displacements were compared with predefined constraints. Additionally, the corresponding stress was compared with the previously calculated stress value to determine if there was a reduction. These processes were continued until an optimal solution was achieved. The optimisation program has been shown to successfully predict the optimal screw configuration in two cases. The first case was a simplified bone construct, for which the screw configuration solution was comparable with those recommended in the biomechanical literature. The second case was a femoral construct, for which the resultant screw configuration was shown to be similar to those used in clinical cases. The optimisation method and program developed in this study have shown potential for use in further investigations, given improvements to the optimisation criteria and the efficiency of the program.
Abstract:
In this paper we extend the concept of speaker annotation within a single recording, or speaker diarization, to a collection-wide approach we call speaker attribution. Accordingly, speaker attribution is the task of clustering the expectedly homogeneous intersession clusters obtained using diarization according to common cross-recording identities. The result of attribution is a collection of spoken audio across multiple recordings attributed to speaker identities. In this paper, an attribution system is proposed using mean-only MAP adaptation of a combined-gender UBM to model clusters from a perfect diarization system, as well as a JFA-based system with session variability compensation. The normalized cross-likelihood ratio is calculated for each pair of clusters to construct an attribution matrix, and the complete linkage algorithm is employed to cluster the inter-session clusters. A matched cluster purity and coverage of 87.1% was obtained on the NIST 2008 SRE corpus.
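The complete linkage step mentioned in the abstract above can be sketched as follows. This is a generic illustration, not the paper's system: it assumes the pairwise scores have already been arranged as a symmetric distance matrix in which smaller values mean more similar clusters, and the function name `complete_linkage` and the stopping `threshold` are hypothetical.

```python
def complete_linkage(dist, threshold):
    """Agglomerative clustering with complete linkage.

    dist      -- symmetric matrix (list of lists) of pairwise distances
    threshold -- stop merging once the closest pair of clusters is
                 farther apart than this value (assumed stopping rule)
    Returns a list of clusters, each a list of item indices.
    """
    clusters = [[i] for i in range(len(dist))]

    def linkage(a, b):
        # Complete linkage: cluster distance is the largest pairwise distance.
        return max(dist[i][j] for i in a for j in b)

    while len(clusters) > 1:
        # Find the pair of clusters with the smallest complete-linkage distance.
        best = None
        for p in range(len(clusters)):
            for q in range(p + 1, len(clusters)):
                d = linkage(clusters[p], clusters[q])
                if best is None or d < best[0]:
                    best = (d, p, q)
        d, p, q = best
        if d > threshold:
            break
        # Merge cluster q into cluster p (q > p, so index p stays valid).
        clusters[p] = clusters[p] + clusters[q]
        del clusters[q]
    return clusters
```

Because complete linkage uses the worst-case pair, two groups are merged only when every member of one is close to every member of the other, which tends to keep merged clusters homogeneous.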
Abstract:
Genetic research of complex diseases is a challenging, but exciting, area of research. The early development of the research was limited, however, until the completion of the Human Genome and HapMap projects, along with the reduction in the cost of genotyping, paved the way for understanding the genetic composition of complex diseases. In this thesis, we focus on the statistical methods for two aspects of genetic research: phenotype definition for diseases with complex etiology, and methods for identifying potentially associated Single Nucleotide Polymorphisms (SNPs) and SNP-SNP interactions. With regard to phenotype definition for diseases with complex etiology, we first investigate the effects of different statistical phenotyping approaches on the subsequent analysis. In light of the findings, and the difficulties in validating the estimated phenotype, we propose two different methods for reconciling phenotypes of different models using Bayesian model averaging as a coherent mechanism for accounting for model uncertainty. In the second part of the thesis, the focus turns to methods for identifying associated SNPs and SNP interactions. We review the use of Bayesian logistic regression with variable selection for SNP identification and extend the model to detect interaction effects in population-based case-control studies. In this part of the study, we also develop a machine learning algorithm to cope with large-scale data analysis, namely modified Logic Regression with Genetic Program (MLR-GEP), which is then compared with the Bayesian model, Random Forests and other variants of logic regression.
Abstract:
Circuit breaker restrikes are unwanted occurrences that can ultimately lead to breaker failure. Before 2008, there was little evidence in the literature of monitoring techniques based on restrike measurement and interpretation produced during switching of capacitor banks and shunt reactor banks. In 2008, a non-intrusive radiometric restrike measurement method, as well as a restrike hardware detection algorithm, was developed. The limitations of the radiometric measurement method are a band-limited frequency response as well as limitations in amplitude determination. Current detection methods and algorithms required the use of wide-bandwidth current transformers and voltage dividers. A novel non-intrusive restrike diagnostic algorithm using ATP (Alternative Transient Program) and wavelet transforms is proposed. Wavelet transforms are commonly used in signal processing; here the analysis is divided into two tests, i.e. restrike detection and energy level, based on deteriorated waveforms in different types of restrike. A ‘db5’ wavelet was selected in the tests as it gave a 97% correct diagnostic rate evaluated using a database of diagnostic signatures. This was also tested using restrike waveforms simulated under different network parameters, which gave 92% correct diagnostic responses. The diagnostic technique and methodology developed in this research can be applied to any power monitoring system with slight modification for restrike detection.
Abstract:
We consider the problem of choosing, sequentially, a map which assigns elements of a set A to a few elements of a set B. On each round, the algorithm suffers some cost associated with the chosen assignment, and the goal is to minimize the cumulative loss of these choices relative to the best map on the entire sequence. Even though the offline problem of finding the best map is provably hard, we show that there is an equivalent online approximation algorithm, Randomized Map Prediction (RMP), that is efficient and performs nearly as well. While drawing upon results from the "Online Prediction with Expert Advice" setting, we show how RMP can be utilized as an online approach to several standard batch problems. We apply RMP to online clustering as well as online feature selection and, surprisingly, RMP often outperforms the standard batch algorithms on these problems.