790 results for Agglomerative Hierarchical Clustering
Abstract:
We propose a family of multivariate heavy-tailed distributions that allow variable marginal amounts of tailweight. The originality comes from introducing multidimensional instead of univariate scale variables for the mixture of scaled Gaussian family of distributions. In contrast to most existing approaches, the derived distributions can account for a variety of shapes and have a simple tractable form with a closed-form probability density function whatever the dimension. We examine a number of properties of these distributions and illustrate them in the particular case of Pearson type VII and t tails. For these latter cases, we provide maximum likelihood estimation of the parameters and illustrate their modelling flexibility on simulated and real data clustering examples.
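For orientation, the standard mixture-of-scaled-Gaussians construction that this abstract generalises can be written as follows. The notation ($\mu$, $\Sigma$, $w$, $f_W$, $D$, $A$, $\Delta_w$, $M$) is assumed here and does not come from the abstract. The second formula is only one plausible way to express a multidimensional scale variable, not necessarily the authors' exact parameterisation.

```latex
% Standard Gaussian scale mixture with a single scalar weight w.
% Choosing W ~ Gamma(nu/2, nu/2) yields the multivariate t distribution.
p(y;\mu,\Sigma,\theta)
  = \int_0^\infty \mathcal{N}_M\!\left(y;\ \mu,\ \Sigma / w\right) f_W(w;\theta)\,\mathrm{d}w

% Assumed multidimensional-scale variant: decompose \Sigma = D A D^{\top}
% (D orthogonal, A diagonal) and give each of the M dimensions its own weight.
p(y;\mu,\Sigma,\theta)
  = \int_0^\infty \!\!\cdots\! \int_0^\infty
    \mathcal{N}_M\!\left(y;\ \mu,\ D\,\Delta_w A\,D^{\top}\right)
    f_{w}(w_1,\dots,w_M;\theta)\,\mathrm{d}w_1 \cdots \mathrm{d}w_M,
\qquad \Delta_w = \operatorname{diag}\!\left(w_1^{-1},\dots,w_M^{-1}\right)
```

With a single shared weight every direction has the same tail behaviour; separate weights are what would allow the tailweight to vary by dimension.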
Abstract:
An algorithm is described for developing a hierarchy among a set of elements having certain precedence relations. This algorithm, which is based on tracing a path through the graph, is easily implemented by a computer.
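The abstract does not spell out the algorithm, so the following is only a minimal sketch of one way to derive a hierarchy from precedence relations by tracing paths through the graph. The function name `hierarchy_levels` and the representation of relations as `(before, after)` pairs are assumptions made for illustration. An element's level is taken to be the length of the longest chain of predecessors leading to it.

```python
def hierarchy_levels(precedes):
    """Assign a level to each element of a precedence graph.

    `precedes` is an iterable of (before, after) pairs (an assumed input format).
    Elements with no predecessor get level 0; every other element gets
    1 + the highest level among its direct predecessors.
    """
    # Collect the direct predecessors of every element that appears.
    preds = {}
    for before, after in precedes:
        preds.setdefault(before, set())
        preds.setdefault(after, set()).add(before)

    levels = {}
    in_progress = set()  # elements on the path currently being traced

    def level(node):
        if node in levels:
            return levels[node]
        if node in in_progress:
            raise ValueError(f"precedence cycle through {node!r}")
        in_progress.add(node)
        # Trace back through predecessors; default=-1 makes roots level 0.
        lvl = 1 + max((level(p) for p in preds[node]), default=-1)
        in_progress.discard(node)
        levels[node] = lvl
        return lvl

    for node in preds:
        level(node)
    return levels


relations = [("a", "b"), ("a", "c"), ("b", "d"), ("c", "d"), ("d", "e")]
print(hierarchy_levels(relations))
```

Elements that share a level have no precedence constraint forcing one above the other, so each level can be read as one tier of the hierarchy.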
Abstract:
The concept of feature selection in a nonparametric unsupervised learning environment is practically undeveloped because no true measure for the effectiveness of a feature exists in such an environment. The lack of a feature selection phase preceding the clustering process seriously affects the reliability of such learning. New concepts such as significant features, level of significance of features, and immediate neighborhood are introduced, which implicitly meet the need for feature selection in the context of clustering techniques.
Abstract:
The Minimum Description Length (MDL) principle is a general, well-founded theoretical formalization of statistical modeling. The most important notion of MDL is the stochastic complexity, which can be interpreted as the shortest description length of a given sample of data relative to a model class. The exact definition of the stochastic complexity has gone through several evolutionary steps. The latest instantiation is based on the so-called Normalized Maximum Likelihood (NML) distribution, which has been shown to possess several important theoretical properties. However, applications of this modern version of MDL have been quite rare because of computational complexity problems: for discrete data, the definition of NML involves an exponential sum, and in the case of continuous data, a multi-dimensional integral that is usually infeasible to evaluate or even approximate accurately. In this doctoral dissertation, we present mathematical techniques for computing NML efficiently for some model families involving discrete data. We also show how these techniques can be used to apply MDL in two practical applications: histogram density estimation and clustering of multi-dimensional data.
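As a reminder of where the exponential sum comes from, the standard NML definition for discrete data can be written as below. The symbols ($x^n$ for the observed sample, $\mathcal{M}$ for the model class, $\hat{\theta}$ for the maximum likelihood estimate, $\mathcal{C}_n$ for the normalising sum) are notation assumed here, not taken from the abstract.

```latex
% NML distribution: maximised likelihood of the observed sample, normalised
% over all possible samples y^n of the same length.
P_{\mathrm{NML}}(x^n \mid \mathcal{M})
  = \frac{P\!\left(x^n \mid \hat{\theta}(x^n), \mathcal{M}\right)}{\mathcal{C}_n(\mathcal{M})},
\qquad
\mathcal{C}_n(\mathcal{M}) = \sum_{y^n} P\!\left(y^n \mid \hat{\theta}(y^n), \mathcal{M}\right)

% Stochastic complexity: the code length assigned by the NML distribution.
\mathrm{SC}(x^n \mid \mathcal{M})
  = -\log P\!\left(x^n \mid \hat{\theta}(x^n), \mathcal{M}\right) + \log \mathcal{C}_n(\mathcal{M})
```

The sum in $\mathcal{C}_n$ ranges over every possible sample of length $n$, so the number of terms grows exponentially with $n$. For continuous data the sum becomes an integral over the sample space.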
Abstract:
Online content services can greatly benefit from personalisation features that enable delivery of content that is suited to each user's specific interests. This thesis presents a system that applies text analysis and user modeling techniques in an online news service for the purpose of personalisation and user interest analysis. The system creates a detailed thematic profile for each content item and observes users' actions towards content items to learn their preferences. A handcrafted taxonomy of concepts, or ontology, is used in profile formation to extract relevant concepts from the text. User preference learning is automatic and there is no need for explicit preference settings or ratings from the user. Learned user profiles are segmented into interest groups using clustering techniques with the objective of providing a source of information for the service provider. Some theoretical background for the chosen techniques is presented, while the main focus is on finding practical solutions to some of the current information needs, which are not optimally served with traditional techniques.
Abstract:
A new clustering technique, based on the concept of immediate neighbourhood, with a novel capability to self-learn the number of clusters expected in the unsupervised environment, has been developed. The method compares favourably with other clustering schemes based on distance measures, both in terms of conceptual innovations and computational economy. Test implementation of the scheme using C-l flight line training sample data in a simulated unsupervised mode has brought out the efficacy of the technique. The technique can easily be implemented as a front end to established pattern classification systems with supervised learning capabilities to derive unified learning systems capable of operating in both supervised and unsupervised environments. This makes the technique an attractive proposition in the context of remotely sensed earth resources data analysis, wherein it is essential to have such a unified learning system capability.
Abstract:
The term acclimation has been used with several connotations in the field of acclimatory physiology. An attempt has been made, in this paper, to define precisely the term “acclimation” for effective modelling of acclimatory processes. Acclimation is defined with respect to a specific variable, as cumulative experience gained by the organism when subjected to a step change in the environment. Experimental observations on a large number of variables in animals exposed to sustained stress show that after initial deviation from the basal value (defined as “growth”), the variables tend to return to basal levels (defined as “decay”). This forms the basis for modelling biological responses in terms of their growth and decay. Hierarchical systems theory as presented by Mesarovic, Macko & Takahara (1970) facilitates modelling of complex and partially characterized systems. This theory, in conjunction with “growth-decay” analysis of biological variables, is used to model the temperature-regulating system in animals exposed to cold. This approach appears to be applicable at all levels of biological organization. Regulation of hormonal activity, which forms a part of the temperature-regulating system, and the relationship of the latter with the “energy” system of the animal of which it forms a part, are also effectively modelled by this approach. It is believed that this systematic approach would eliminate much of the current circular thinking in the area of acclimatory physiology.
Abstract:
This paper presents a statistical aircraft trajectory clustering approach aimed at discriminating between typical manned and expected unmanned traffic patterns. First, a resampled version of each trajectory is modelled using a mixture of Von Mises distributions (circular statistics). Second, the remodelled trajectories are globally aligned using tools from bioinformatics. Third, the alignment scores are used to cluster the trajectories using an iterative k-medoids approach and an appropriate distance function. The approach is then evaluated using synthetically generated unmanned aircraft flights combined with real air traffic position reports taken over a sector of Northern Queensland, Australia. Results suggest that the technique is useful in distinguishing between expected unmanned and manned aircraft traffic behaviour, as well as identifying some common conventional air traffic patterns.
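The third step can be illustrated with a generic k-medoids iteration over a precomputed distance matrix. This is a minimal sketch under stated assumptions, not the paper's implementation: the function name `k_medoids`, the deterministic initialisation, and the toy distances are all hypothetical, and in the paper's setting the matrix would be derived from the alignment scores by whatever distance function the authors chose.

```python
def k_medoids(dist, k, max_iter=100):
    """Alternating k-medoids on a precomputed, symmetric distance matrix.

    `dist[i][j]` is the distance between items i and j. Returns the sorted
    medoid indices and, for each item, the index of its assigned medoid.
    """
    n = len(dist)
    medoids = list(range(k))  # assumed initialisation: the first k items
    for _ in range(max_iter):
        # Assignment step: attach every item to its nearest medoid.
        clusters = {m: [] for m in medoids}
        for i in range(n):
            nearest = min(medoids, key=lambda m: dist[i][m])
            clusters[nearest].append(i)
        # Update step: within each cluster, choose the member with the
        # smallest total distance to the other members.
        new_medoids = [
            min(members, key=lambda c: sum(dist[c][j] for j in members))
            if members else m  # keep the old medoid if its cluster is empty
            for m, members in clusters.items()
        ]
        if sorted(new_medoids) == sorted(medoids):
            break  # medoids are stable, so the clustering has converged
        medoids = new_medoids
    labels = [min(medoids, key=lambda m: dist[i][m]) for i in range(n)]
    return sorted(medoids), labels


# Toy example: distances between six points on a line (hypothetical data).
points = [0, 1, 2, 10, 11, 12]
dist = [[abs(a - b) for b in points] for a in points]
print(k_medoids(dist, k=2))
```

Because k-medoids only needs pairwise distances and always picks an actual item as the cluster centre, it suits objects such as aligned trajectories for which a mean is not well defined.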
Abstract:
Partitional clustering algorithms, which partition the dataset into a pre-defined number of clusters, can be broadly classified into two types: algorithms which explicitly take the number of clusters as input and algorithms that take the expected size of a cluster as input. In this paper, we propose a variant of the k-means algorithm and prove that it is more efficient than standard k-means algorithms. An important contribution of this paper is the establishment of a relation between the number of clusters and the size of the clusters in a dataset through the analysis of our algorithm. We also demonstrate that the integration of this algorithm as a pre-processing step in classification algorithms reduces their running-time complexity.
Abstract:
Hierarchical SnO2 hollow spheres self-assembled from nanosheets were prepared with and without carbon coating. The combination of nanosized architecture, hollow structure, and a conductive carbon layer endows the SnO2-based anode with improved specific capacity and cycling stability, making it more promising for use in lithium ion batteries.
Abstract:
Three-dimensional (3D) hierarchical nanoscale architectures comprised of building blocks, with specifically engineered morphologies, are expected to play important roles in the fabrication of 'next generation' microelectronic and optoelectronic devices due to their high surface-to-volume ratio as well as opto-electronic properties. Herein, a series of well-defined 3D hierarchical rutile TiO2 architectures (HRT) were successfully prepared using a facile hydrothermal method without any surfactant or template, simply by changing the concentration of hydrochloric acid used in the synthesis. The production of these materials provides, to the best of our knowledge, the first identified example of a ledgewise growth mechanism in a rutile TiO2 structure. Also for the first time, a Dye-sensitized Solar Cell (DSC) combining an HRT is reported in conjunction with a high-extinction-coefficient metal-free organic sensitizer (D149), achieving a conversion efficiency of 5.5%, which is superior to that of cells employing P25 (4.5%) and comparable to that of state-of-the-art commercial transparent titania anatase paste (5.8%). Further to this, an overall conversion efficiency of 8.6% was achieved when HRT was used as the light scattering layer, a considerable improvement over the commercial transparent/reflector titania anatase paste (7.6%), a significantly smaller gap in performance than has been seen previously.
Abstract:
Superhydrophobic and superhydrophilic surfaces have been extensively investigated due to their importance for industrial applications. It has been reported, however, that superhydrophobic surfaces are very sensitive to heat, ultraviolet (UV) light, and electric potential, which interfere with their long-term durability. In this study, we introduce a novel approach to achieve robust superhydrophobic thin films by designing architecture-defined complex nanostructures. A family of ZnO hollow microspheres with controlled constituent architectures in the morphologies of 1D nanowire networks, 2D nanosheet stacks, and 3D mesoporous nanoball blocks, respectively, was synthesized via a two-step self-assembly approach, where the oligomers or the constituent nanostructures with specially designed structures are first formed from surfactant templates, and then further assembled into complex morphologies by the addition of a second co-surfactant. The thin films composed of two-step synthesized ZnO hollow microspheres with different architectures presented superhydrophobicities with contact angles of 150°-155°, superior to the contact angle of 103° for one-step synthesized ZnO hollow microspheres with smooth and solid surfaces. Moreover, the robust superhydrophobicity was further improved by perfluorinated silane surface modification. The perfluorinated silane treated ZnO hollow microsphere thin films maintained excellent hydrophobicity even after 75 h of UV irradiation. The realization of environmentally durable superhydrophobic surfaces provides a promising solution for their long-term service under UV or strong solar light irradiations.
Abstract:
Network Interfaces (NIs) are used in Multiprocessor System-on-Chips (MPSoCs) to connect CPUs to a packet-switched Network-on-Chip (NoC). In this work we introduce a new NI architecture for our hierarchical CoreVA-MPSoC. The CoreVA-MPSoC targets streaming applications in embedded systems. The main contribution of this paper is a system-level analysis of different NI configurations, considering both software and hardware costs for NoC communication. Different configurations of the NI are compared using a benchmark suite of 10 streaming applications. The best performing NI configuration shows an average speedup of 20 for a CoreVA-MPSoC with 32 CPUs compared to a single CPU. Furthermore, we present physical implementation results using a 28 nm FD-SOI standard cell technology. A hierarchical MPSoC with 8 CPU clusters and 4 CPUs in each cluster running at 800 MHz requires an area of 4.56 mm².