18 results for discriminant analysis and cluster analysis
in the Cambridge University Engineering Department Publications Database
Abstract:
This paper presents an incremental learning solution for Linear Discriminant Analysis (LDA) and its applications to object recognition problems. We apply the sufficient spanning set approximation in three steps, i.e. updates of the total scatter matrix, the between-class scatter matrix and the projected data matrix, which leads to an online solution that closely agrees with the batch solution in accuracy while significantly reducing the computational complexity. The algorithm yields an efficient solution to incremental LDA even when the number of classes as well as the set size is large. The incremental LDA method has also been shown to be useful for semi-supervised online learning, where label propagation is done by integrating the incremental LDA into an EM framework. The method has been demonstrated on the task of merging large datasets collected during MPEG standardization for face image retrieval, on face authentication using the BANCA dataset, and on object categorisation using the Caltech101 dataset. © 2010 Springer Science+Business Media, LLC.
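At the heart of such an update is the fact that scatter statistics of two data sets merge in closed form. Below is a minimal numpy sketch of this merge and of the final discriminant step; it omits the sufficient-spanning-set approximation (which keeps only the leading eigenvectors of each matrix), and all function and variable names here are ours, not the paper's.

    import numpy as np

    def merge_total_scatter(n1, mu1, S1, n2, mu2, S2):
        # Closed-form merge of the sufficient statistics (count, mean,
        # total scatter) of two data sets into those of their union.
        n = n1 + n2
        mu = (n1 * mu1 + n2 * mu2) / n
        d = (mu1 - mu2)[:, None]
        S = S1 + S2 + (n1 * n2 / n) * (d @ d.T)
        return n, mu, S

    def lda_directions(S_T, S_B, k):
        # Discriminant directions from merged statistics: top-k eigenvectors
        # of pinv(S_W) @ S_B, with within-class scatter S_W = S_T - S_B.
        S_W = S_T - S_B
        evals, evecs = np.linalg.eig(np.linalg.pinv(S_W) @ S_B)
        order = np.argsort(evals.real)[::-1][:k]
        return evecs.real[:, order]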
Semantic Discriminant mapping for classification and browsing of remote sensing textures and objects
Abstract:
We present a new approach based on Discriminant Analysis that maps a high-dimensional image feature space onto a subspace with the following advantages: (1) each dimension corresponds to a semantic likelihood, (2) it admits an efficient and simple multiclass classifier, and (3) it is low dimensional. The mapping is learnt from a given set of labeled images with class ground truth. In the new space a classifier is naturally derived which performs as well as a linear SVM. We show that projecting images into this new space provides a database browsing tool which is meaningful to the user. Results are presented on a remote sensing database with eight classes, made available online. The output semantic space is a low-dimensional feature space which opens perspectives for other recognition tasks. © 2005 IEEE.
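As a rough illustration of a discriminant mapping with a simple classifier derived in the projected space, the following toy sketch uses scikit-learn's LinearDiscriminantAnalysis on synthetic data; the paper's semantic-likelihood construction is not reproduced, and the nearest-class-mean rule is our stand-in for the derived classifier.

    import numpy as np
    from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

    rng = np.random.default_rng(0)
    # toy stand-in for image descriptors: three classes in a 50-d feature space
    X = np.vstack([rng.normal(loc=c, scale=1.0, size=(40, 50)) for c in (0.0, 1.5, 3.0)])
    y = np.repeat([0, 1, 2], 40)

    lda = LinearDiscriminantAnalysis(n_components=2)  # at most n_classes - 1 dimensions
    Z = lda.fit_transform(X, y)                       # projection into the discriminant subspace

    # a simple classifier in the projected space: nearest class mean
    means = np.stack([Z[y == c].mean(axis=0) for c in np.unique(y)])
    pred = np.argmin(((Z[:, None, :] - means[None]) ** 2).sum(axis=-1), axis=1)
    print("training accuracy:", (pred == y).mean())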
Abstract:
Standard forms of density-functional theory (DFT) have good predictive power for many materials, but are not yet fully satisfactory for solid, liquid and cluster forms of water. We use a many-body separation of the total energy into its 1-body, 2-body (2B) and beyond-2-body (B2B) components to analyze the deficiencies of two popular DFT approximations. We show how machine-learning methods make this analysis possible for ice structures as well as for water clusters. We find that the crucial energy balance between compact and extended geometries can be distorted by 2B and B2B errors, and that both types of first-principles error are important.
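In our notation (not the paper's), the many-body separation for a set of monomers {m_i} reads

    E_{\mathrm{tot}} = \sum_i E^{(1)}(m_i) + \sum_{i<j} E^{(2)}(m_i, m_j) + E_{\mathrm{B2B}}

where the beyond-2-body term E_B2B collects all 3-body and higher contributions, i.e. it is defined as the remainder of E_tot after the 1-body and 2-body sums are subtracted.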
Abstract:
Linear dimensionality reduction methods are a cornerstone of analyzing high-dimensional data, due to their simple geometric interpretations and typically attractive computational properties. These methods capture many data features of interest, such as covariance, dynamical structure, correlation between data sets, input-output relationships, and margin between data classes. Methods have been developed with a variety of names and motivations in many fields, and perhaps as a result the connections between all these methods have not been highlighted. Here we survey methods from this disparate literature as optimization programs over matrix manifolds. We discuss principal component analysis, factor analysis, linear multidimensional scaling, Fisher's linear discriminant analysis, canonical correlations analysis, maximum autocorrelation factors, slow feature analysis, sufficient dimensionality reduction, undercomplete independent component analysis, linear regression, distance metric learning, and more. This optimization framework gives insight into some rarely discussed shortcomings of well-known methods, such as the suboptimality of certain eigenvector solutions. Modern techniques for optimization over matrix manifolds enable a generic linear dimensionality reduction solver, which accepts as input data and an objective to be optimized, and returns, as output, an optimal low-dimensional projection of the data. This simple optimization framework further allows straightforward generalizations and novel variants of classical methods, which we demonstrate here by creating an orthogonal-projection canonical correlations analysis. More broadly, this survey and generic solver suggest that linear dimensionality reduction can move toward becoming a black-box, objective-agnostic numerical technology. © 2015 John P. Cunningham and Zoubin Ghahramani.
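As one concrete instance of this framework, PCA maximizes tr(M^T Sigma M) over orthonormal projections M^T M = I_k, and is one of the cases where the eigenvector solution is in fact optimal. A short numpy sketch on toy data, with our own variable names:

    import numpy as np

    rng = np.random.default_rng(1)
    X = rng.standard_normal((500, 10))      # toy data, one sample per row
    Xc = X - X.mean(axis=0)                 # center the data
    Sigma = Xc.T @ Xc / len(Xc)             # sample covariance

    # PCA: maximize tr(M^T Sigma M) subject to M^T M = I_k;
    # the top-k eigenvectors of Sigma solve this program exactly.
    k = 2
    evals, evecs = np.linalg.eigh(Sigma)    # eigenvalues in ascending order
    M = evecs[:, np.argsort(evals)[::-1][:k]]
    Z = Xc @ M                              # optimal k-dimensional projection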
Abstract:
This paper describes the development of the 2003 CU-HTK large-vocabulary speech recognition system for Conversational Telephone Speech (CTS). The system was designed around a multi-pass, multi-branch structure in which the outputs of all branches are combined using system combination. A number of advanced modelling techniques, such as Speaker Adaptive Training, Heteroscedastic Linear Discriminant Analysis, Minimum Phone Error estimation and specially constructed Single Pronunciation dictionaries, were employed. The effectiveness of each of these techniques and their potential contribution to the result of system combination were evaluated in the framework of a state-of-the-art LVCSR system with sophisticated adaptation. The final 2003 CU-HTK CTS system constructed from some of these models is described and its performance on the DARPA/NIST 2003 Rich Transcription (RT-03) evaluation test set is discussed.
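One common way to combine the hypotheses of several recognizer branches is ROVER-style word-level voting. The toy sketch below assumes the hypotheses are already word-aligned, which real ROVER achieves with a dynamic-programming alignment into a word transition network; it is a stand-in for the idea, not the CU-HTK implementation.

    from collections import Counter

    def combine(hyps):
        # Majority vote, position by position, over word-aligned hypotheses
        # produced by different system branches.
        return [Counter(words).most_common(1)[0][0] for words in zip(*hyps)]

    print(combine([["the", "cat", "sat"],
                   ["the", "mat", "sat"],
                   ["the", "cat", "sad"]]))  # -> ['the', 'cat', 'sat']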
Abstract:
This paper discusses the Cambridge University HTK (CU-HTK) system for the automatic transcription of conversational telephone speech. A detailed discussion of the most important techniques in front-end processing, acoustic modeling and model training, and language and pronunciation modeling is presented. These include the use of conversation-side-based cepstral normalization, vocal tract length normalization, heteroscedastic linear discriminant analysis for feature projection, minimum phone error training and speaker adaptive training, lattice-based model adaptation, confusion-network-based decoding and confidence score estimation, pronunciation selection, language model interpolation, and class-based language models. The transcription system developed for participation in the 2002 NIST Rich Transcription evaluations of English conversational telephone speech data is presented in detail. In this evaluation the CU-HTK system gave an overall word error rate of 23.9%, which was the best performance by a statistically significant margin. Further details on the derivation of faster systems with moderate performance degradation are discussed in the context of the 2002 CU-HTK 10 × RT conversational speech transcription system. © 2005 IEEE.
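Of the front-end steps named above, conversation-side cepstral normalization is the simplest to illustrate. A minimal sketch, assuming mean-and-variance normalization over all frames of one conversation side (the exact CU-HTK variant may differ):

    import numpy as np

    def side_cepstral_norm(feats):
        # feats: (n_frames, n_coeffs) cepstral features from ONE conversation
        # side. Subtracting the per-side mean cancels channel effects that are
        # constant over the side; dividing by the per-side standard deviation
        # additionally normalises the variance of each coefficient.
        return (feats - feats.mean(axis=0)) / feats.std(axis=0)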
Abstract:
Supersonic cluster beam deposition has been used to produce films with different nanostructures by controlling the deposition parameters such as the film thickness, substrate temperature and cluster mass distribution. The field emission properties of cluster-assembled carbon films have been characterized and correlated to the evolution of the film nanostructure. Threshold fields ranging between 4 and 10 V/μm and saturation currents as high as 0.7 mA have been measured for samples heated during deposition. A series of voltage ramps, i.e., a conditioning process, was found to initiate more stable and reproducible emission. It was found that the presence of graphitic particles (onions, nanotube embryos) in the films substantially enhances the field emission performance. Films patterned on a micrometer scale have been conditioned spot by spot by a ball-tip anode, showing that a relatively high emission site density can be achieved from the cluster-assembled material. © 2002 American Institute of Physics.
Abstract:
A computer can assist the process of design by analogy by recording past designs. The experience these represent can be much wider than that of the designers using the system, who therefore need help identifying potential cases of interest. If the computer assists with this lookup, the designers can concentrate on the more interesting task of extracting and using the ideas that are found. However, as the knowledge base grows it becomes ever harder to find relevant cases using a keyword indexing scheme without knowing precisely what to look for. A more flexible searching system is therefore needed.
If a similarity measure can be defined for the features of the designs, then it is possible to match and cluster them. Using a simple measure like co-occurrence of features within a particular case would allow this to happen without the human intervention that would otherwise be tedious and time-consuming. Any knowledge acquired about how features are related to each other will be very shallow: it is not intended as a cognitive model of how humans understand, learn, or retrieve information, but rather an attempt to make effective, efficient use of the information available. The question remains whether such shallow knowledge is sufficient for the task.
A system to retrieve information from a large database is described. It uses co-occurrences to relate keywords to each other, and then extends search queries with similar words. This seems to make relevant material more accessible, providing hope that this retrieval technique can be applied to a broader knowledge base.
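A minimal sketch of this co-occurrence-based query expansion, with a hypothetical three-case index and our own scoring choice (summed co-occurrence counts):

    from collections import defaultdict

    # hypothetical index: each past design case is tagged with keywords
    cases = {
        "gearbox": {"gear", "shaft", "bearing"},
        "pump":    {"gear", "bearing", "lubricant"},
        "bridge":  {"beam", "truss", "load"},
    }

    # count how often two keywords co-occur within the same case
    co = defaultdict(lambda: defaultdict(int))
    for kws in cases.values():
        for a in kws:
            for b in kws:
                if a != b:
                    co[a][b] += 1

    def expand(query, k=2):
        # Extend the query with the k keywords that co-occur most often
        # with its terms across the whole knowledge base.
        scores = defaultdict(int)
        for term in query:
            for other, n in co[term].items():
                if other not in query:
                    scores[other] += n
        return set(query) | set(sorted(scores, key=scores.get, reverse=True)[:k])

    print(expand({"gear"}))  # e.g. {'gear', 'bearing', 'shaft'}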
Abstract:
In this paper, an analytical tool - cluster analysis - that is commonly used in biology, archaeology, linguistics and psychology is applied to materials and design. Here we use it to cluster materials and the processes that shape them, using their attributes as indicators of relationship. The attributes chosen are those important to design and designers. The resulting clusters, and the classifications that can be developed from them, depend on the selected attributes and, to some extent, on the method of clustering. Alternative classifications are explored for design focused on the technical or aesthetic attributes of materials, and on the materials and shapes allowed by processes.
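A brief sketch of how such a clustering might be run, using agglomerative clustering from scipy on a handful of rough, illustrative attribute values chosen by us (density, stiffness, thermal conductivity):

    import numpy as np
    from scipy.cluster.hierarchy import linkage, fcluster
    from scipy.spatial.distance import pdist

    # rough attribute values: density (g/cm^3), Young's modulus (GPa),
    # thermal conductivity (W/m.K)
    names = ["steel", "aluminium", "oak", "pine", "ABS", "nylon"]
    attrs = np.array([
        [7.80, 210.0, 50.00],
        [2.70,  70.0, 235.00],
        [0.70,  12.0, 0.17],
        [0.50,   9.0, 0.12],
        [1.05,   2.3, 0.25],
        [1.13,   2.7, 0.25],
    ])
    attrs = (attrs - attrs.mean(axis=0)) / attrs.std(axis=0)  # common scale

    Z = linkage(pdist(attrs), method="average")      # agglomerative clustering
    labels = fcluster(Z, t=4, criterion="maxclust")  # cut dendrogram into 4 clusters
    print(dict(zip(names, labels)))  # woods share a cluster, polymers share a
                                     # cluster, and the two metals stand apart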