853 resultados para height partition clustering
Resumo:
The Multiple Pheromone Ant Clustering Algorithm (MPACA) models the collective behaviour of ants to find clusters in data and to assign objects to the most appropriate class. It is an ant colony optimisation approach that uses pheromones to mark paths linking objects that are similar and potentially members of the same cluster or class. Its novelty is in the way it uses separate pheromones for each descriptive attribute of the object rather than a single pheromone representing the whole object. Ants that encounter other ants frequently enough can combine the attribute values they are detecting, which enables the MPACA to learn influential variable interactions. This paper applies the model to real-world data from two domains. One is logistics, focusing on resource allocation rather than the more traditional vehicle-routing problem. The other is mental-health risk assessment. The task for the MPACA in each domain was to predict class membership where the classes for the logistics domain were the levels of demand on haulage company resources and the mental-health classes were levels of suicide risk. Results on these noisy real-world data were promising, demonstrating the ability of the MPACA to find patterns in the data with accuracy comparable to more traditional linear regression models. © 2013 Polish Information Processing Society.
Resumo:
Ant colony optimisation algorithms model the way ants use pheromones for marking paths to important locations in their environment. Pheromone traces are picked up, followed, and reinforced by other ants but also evaporate over time. Optimal paths attract more pheromone and less useful paths fade away. The main innovation of the proposed Multiple Pheromone Ant Clustering Algorithm (MPACA) is to mark objects using many pheromones, one for each value of each attribute describing the objects in multidimensional space. Every object has one or more ants assigned to each attribute value and the ants then try to find other objects with matching values, depositing pheromone traces that link them. Encounters between ants are used to determine when ants should combine their features to look for conjunctions and whether they should belong to the same colony. This paper explains the algorithm and explores its potential effectiveness for cluster analysis. © 2014 Springer International Publishing Switzerland.
Resumo:
DUE TO COPYRIGHT RESTRICTIONS ONLY AVAILABLE FOR CONSULTATION AT ASTON UNIVERSITY LIBRARY AND INFORMATION SERVICES WITH PRIOR ARRANGEMENT
Resumo:
Biological experiments often produce enormous amount of data, which are usually analyzed by data clustering. Cluster analysis refers to statistical methods that are used to assign data with similar properties into several smaller, more meaningful groups. Two commonly used clustering techniques are introduced in the following section: principal component analysis (PCA) and hierarchical clustering. PCA calculates the variance between variables and groups them into a few uncorrelated groups or principal components (PCs) that are orthogonal to each other. Hierarchical clustering is carried out by separating data into many clusters and merging similar clusters together. Here, we use an example of human leukocyte antigen (HLA) supertype classification to demonstrate the usage of the two methods. Two programs, Generating Optimal Linear Partial Least Square Estimations (GOLPE) and Sybyl, are used for PCA and hierarchical clustering, respectively. However, the reader should bear in mind that the methods have been incorporated into other software as well, such as SIMCA, statistiXL, and R.
Resumo:
Purpose: To compare graticule and image capture assessment of the lower tear film meniscus height (TMH). Methods: Lower tear film meniscus height measures were taken in the right eyes of 55 healthy subjects at two study visits separated by 6 months. Two images of the TMH were captured in each subject with a digital camera attached to a slit-lamp biomicroscope and stored in a computer for future analysis. Using the best of two images, the TMH was quantified by manually drawing a line across the tear meniscus profile, following which the TMH was measured in pixels and converted into millimetres, where one pixel corresponded to 0.0018 mm. Additionally, graticule measures were carried out by direct observation using a calibrated graticule inserted into the same slit-lamp eyepiece. The graticule was calibrated so that actual readings, in 0.03 mm increments, could be made with a 40× ocular. Results: Smaller values of TMH were found in this study compared to previous studies. TMH, as measured with the image capture technique (0.13 ± 0.04 mm), was significantly greater (by approximately 0.01 ± 0.05 mm, p = 0.03) than that measured with the graticule technique (0.12 ± 0.05 mm). No bias was found across the range sampled. Repeatability of the TMH measurements taken at two study visits showed that graticule measures were significantly different (0.02 ± 0.05 mm, p = 0.01) and highly correlated (r = 0.52, p < 0.0001), whereas image capture measures were similar (0.01 ± 0.03 mm, p = 0.16), and also highly correlated (r = 0.56, p < 0.0001). Conclusions: Although graticule and image analysis techniques showed similar mean values for TMH, the image capture technique was more repeatable than the graticule technique and this can be attributed to the higher measurement resolution of the image capture (i.e. 0.0018 mm) compared to the graticule technique (i.e. 0.03 mm). © 2006 British Contact Lens Association.
Resumo:
Peptides are of great therapeutic potential as vaccines and drugs. Knowledge of physicochemical descriptors, including the partition coefficient P (commonly expressed in logarithm form: logP), is useful for screening out unsuitable molecules and also for the development of predictive Quantitative Structure-Activity Relationships (QSARs). In this paper we develop a new approach to the prediction of LogP values for peptides based on an empirical relationship between global molecular properties and measured physical properties. Our method was successful in terms of peptide prediction (total r2 = 0.641). The final model consisted of 5 physicochemical descriptors (molecular weight, number of single bonds, 2D-VDW volume, 2D-VSA hydrophobic and 2D-VSA polar). The approach is peptide specific and its predictive accuracy was high. Overall, 67% of the peptides were able to be predicted within +/-0.5 log units from the experimental values. Our method thus represents a novel prediction method with proven predictive ability.
Resumo:
Emerging vehicular comfort applications pose a host of completely new set of requirements such as maintaining end-to-end connectivity, packet routing, and reliable communication for internet access while on the move. One of the biggest challenges is to provide good quality of service (QoS) such as low packet delay while coping with the fast topological changes. In this paper, we propose a clustering algorithm based on minimal path loss ratio (MPLR) which should help in spectrum efficiency and reduce data congestion in the network. The vehicular nodes which experience minimal path loss are selected as the cluster heads. The performance of the MPLR clustering algorithm is calculated by rate of change of cluster heads, average number of clusters and average cluster size. Vehicular traffic models derived from the Traffic Wales data are fed as input to the motorway simulator. A mathematical analysis for the rate of change of cluster head is derived which validates the MPLR algorithm and is compared with the simulated results. The mathematical and simulated results are in good agreement indicating the stability of the algorithm and the accuracy of the simulator. The MPLR system is also compared with V2R system with MPLR system performing better. © 2013 IEEE.
Resumo:
IPO underpricing has been attributed to valuation uncertainty, which can be at least partially resolved by the indirect learning associated with IPO clustering [Benveniste, L.M., Ljungqvist, A., Wilhelm, W.J., Yu, X.Y., 2003. Evidence of information spillovers in the production of investment banking services. Journal of Finance 58, 577–608]. We examine why firms might choose not to issue their IPOs contemporaneously with clusters of similar firms, forgoing opportunities to learn from their peers. We find that the willingness to file an IPO without the benefit of indirect learning from peer firm IPOs is directly related to insiders’ needs for portfolio diversification and the firm’s need to raise capital.
Resumo:
Descriptions of vegetation communities are often based on vague semantic terms describing species presence and dominance. For this reason, some researchers advocate the use of fuzzy sets in the statistical classification of plant species data into communities. In this study, spatially referenced vegetation abundance values collected from Greek phrygana were analysed by ordination (DECORANA), and classified on the resulting axes using fuzzy c-means to yield a point data-set representing local memberships in characteristic plant communities. The fuzzy clusters matched vegetation communities noted in the field, which tended to grade into one another, rather than occupying discrete patches. The fuzzy set representation of the community exploited the strengths of detrended correspondence analysis while retaining richer information than a TWINSPAN classification of the same data. Thus, in the absence of phytosociological benchmarks, meaningful and manageable habitat information could be derived from complex, multivariate species data. We also analysed the influence of the reliability of different surveyors' field observations by multiple sampling at a selected sample location. We show that the impact of surveyor error was more severe in the Boolean than the fuzzy classification. © 2007 Springer.
Resumo:
Ant Colony Optimisation algorithms mimic the way ants use pheromones for marking paths to important locations. Pheromone traces are followed and reinforced by other ants, but also evaporate over time. As a consequence, optimal paths attract more pheromone, whilst the less useful paths fade away. In the Multiple Pheromone Ant Clustering Algorithm (MPACA), ants detect features of objects represented as nodes within graph space. Each node has one or more ants assigned to each feature. Ants attempt to locate nodes with matching feature values, depositing pheromone traces on the way. This use of multiple pheromone values is a key innovation. Ants record other ant encounters, keeping a record of the features and colony membership of ants. The recorded values determine when ants should combine their features to look for conjunctions and whether they should merge into colonies. This ability to detect and deposit pheromone representative of feature combinations, and the resulting colony formation, renders the algorithm a powerful clustering tool. The MPACA operates as follows: (i) initially each node has ants assigned to each feature; (ii) ants roam the graph space searching for nodes with matching features; (iii) when departing matching nodes, ants deposit pheromones to inform other ants that the path goes to a node with the associated feature values; (iv) ant feature encounters are counted each time an ant arrives at a node; (v) if the feature encounters exceed a threshold value, feature combination occurs; (vi) a similar mechanism is used for colony merging. The model varies from traditional ACO in that: (i) a modified pheromone-driven movement mechanism is used; (ii) ants learn feature combinations and deposit multiple pheromone scents accordingly; (iii) ants merge into colonies, the basis of cluster formation. The MPACA is evaluated over synthetic and real-world datasets and its performance compares favourably with alternative approaches.
Resumo:
In data mining, efforts have focused on finding methods for efficient and effective cluster analysis in large databases. Active themes of research focus on the scalability of clustering methods, the effectiveness of methods for clustering complex shapes and types of data, high-dimensional clustering techniques, and methods for clustering mixed numerical and categorical data in large databases. One of the most accuracy approach based on dynamic modeling of cluster similarity is called Chameleon. In this paper we present a modified hierarchical clustering algorithm that used the main idea of Chameleon and the effectiveness of suggested approach will be demonstrated by the experimental results.
Resumo:
The purpose of this paper is to explain the notion of clustering and a concrete clustering method- agglomerative hierarchical clustering algorithm. It shows how a data mining method like clustering can be applied to the analysis of stocks, traded on the Bulgarian Stock Exchange in order to identify similar temporal behavior of the traded stocks. This problem is solved with the aid of a data mining tool that is called XLMiner™ for Microsoft Excel Office.
Resumo:
In a paper the method of complex systems and processes clustering based use of genetic algorithm is offered. The aspects of its realization and shaping of fitness-function are considered. The solution of clustering task of Ukraine areas on socio-economic indexes is represented and comparative analysis with outcomes of classical methods is realized.
Resumo:
In recent years, there has been an increas-ing interest in learning a distributed rep-resentation of word sense. Traditional context clustering based models usually require careful tuning of model parame-ters, and typically perform worse on infre-quent word senses. This paper presents a novel approach which addresses these lim-itations by first initializing the word sense embeddings through learning sentence-level embeddings from WordNet glosses using a convolutional neural networks. The initialized word sense embeddings are used by a context clustering based model to generate the distributed representations of word senses. Our learned represen-tations outperform the publicly available embeddings on 2 out of 4 metrics in the word similarity task, and 6 out of 13 sub tasks in the analogical reasoning task.
Resumo:
A partition of a positive integer n is a way of writing it as the sum of positive integers without regard to order; the summands are called parts. The number of partitions of n, usually denoted by p(n), is determined asymptotically by the famous partition formula of Hardy and Ramanujan [5]. We shall introduce the uniform probability measure P on the set of all partitions of n assuming that the probability 1/p(n) is assigned to each n-partition. The symbols E and V ar will be further used to denote the expectation and variance with respect to the measure P . Thus, each conceivable numerical characteristic of the parts in a partition can be regarded as a random variable.