17 resultados para Fuzzy c-means clustering

em Aston University Research Archive


Relevância:

100.00% 100.00%

Publicador:

Resumo:

Descriptions of vegetation communities are often based on vague semantic terms describing species presence and dominance. For this reason, some researchers advocate the use of fuzzy sets in the statistical classification of plant species data into communities. In this study, spatially referenced vegetation abundance values collected from Greek phrygana were analysed by ordination (DECORANA), and classified on the resulting axes using fuzzy c-means to yield a point data-set representing local memberships in characteristic plant communities. The fuzzy clusters matched vegetation communities noted in the field, which tended to grade into one another, rather than occupying discrete patches. The fuzzy set representation of the community exploited the strengths of detrended correspondence analysis while retaining richer information than a TWINSPAN classification of the same data. Thus, in the absence of phytosociological benchmarks, meaningful and manageable habitat information could be derived from complex, multivariate species data. We also analysed the influence of the reliability of different surveyors' field observations by multiple sampling at a selected sample location. We show that the impact of surveyor error was more severe in the Boolean than the fuzzy classification. © 2007 Springer.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Segmentation is an important step in many medical imaging applications and a variety of image segmentation techniques exist. One group of segmentation algorithms is based on clustering concepts. In this article we investigate several fuzzy c-means based clustering algorithms and their application to medical image segmentation. In particular we evaluate the conventional hard c-means (HCM) and fuzzy c-means (FCM) approaches as well as three computationally more efficient derivatives of fuzzy c-means: fast FCM with random sampling, fast generalised FCM, and a new anisotropic mean shift based FCM. © 2010 by IJTS, ISDER.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

The K-means algorithm is one of the most popular clustering algorithms in current use as it is relatively fast yet simple to understand and deploy in practice. Nevertheless, its use entails certain restrictive assumptions about the data, the negative consequences of which are not always immediately apparent, as we demonstrate. While more flexible algorithms have been developed, their widespread use has been hindered by their computational and technical complexity. Motivated by these considerations, we present a flexible alternative to K-means that relaxes most of the assumptions, whilst remaining almost as fast and simple. This novel algorithm which we call MAP-DP (maximum a-posteriori Dirichlet process mixtures), is statistically rigorous as it is based on nonparametric Bayesian Dirichlet process mixture modeling. This approach allows us to overcome most of the limitations imposed by K-means. The number of clusters K is estimated from the data instead of being fixed a-priori as in K-means. In addition, while K-means is restricted to continuous data, the MAP-DP framework can be applied to many kinds of data, for example, binary, count or ordinal data. Also, it can efficiently separate outliers from the data. This additional flexibility does not incur a significant computational overhead compared to K-means with MAP-DP convergence typically achieved in the order of seconds for many practical problems. Finally, in contrast to K-means, since the algorithm is based on an underlying statistical model, the MAP-DP framework can deal with missing data and enables model testing such as cross validation in a principled way. We demonstrate the simplicity and effectiveness of this algorithm on the health informatics problem of clinical sub-typing in a cluster of diseases known as parkinsonism.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

We propose a hybrid generative/discriminative framework for semantic parsing which combines the hidden vector state (HVS) model and the hidden Markov support vector machines (HM-SVMs). The HVS model is an extension of the basic discrete Markov model in which context is encoded as a stack-oriented state vector. The HM-SVMs combine the advantages of the hidden Markov models and the support vector machines. By employing a modified K-means clustering method, a small set of most representative sentences can be automatically selected from an un-annotated corpus. These sentences together with their abstract annotations are used to train an HVS model which could be subsequently applied on the whole corpus to generate semantic parsing results. The most confident semantic parsing results are selected to generate a fully-annotated corpus which is used to train the HM-SVMs. The proposed framework has been tested on the DARPA Communicator Data. Experimental results show that an improvement over the baseline HVS parser has been observed using the hybrid framework. When compared with the HM-SVMs trained from the fully-annotated corpus, the hybrid framework gave a comparable performance with only a small set of lightly annotated sentences. © 2008. Licensed under the Creative Commons.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

The Dirichlet process mixture model (DPMM) is a ubiquitous, flexible Bayesian nonparametric statistical model. However, full probabilistic inference in this model is analytically intractable, so that computationally intensive techniques such as Gibbs sampling are required. As a result, DPMM-based methods, which have considerable potential, are restricted to applications in which computational resources and time for inference is plentiful. For example, they would not be practical for digital signal processing on embedded hardware, where computational resources are at a serious premium. Here, we develop a simplified yet statistically rigorous approximate maximum a-posteriori (MAP) inference algorithm for DPMMs. This algorithm is as simple as DP-means clustering, solves the MAP problem as well as Gibbs sampling, while requiring only a fraction of the computational effort. (For freely available code that implements the MAP-DP algorithm for Gaussian mixtures see http://www.maxlittle.net/.) Unlike related small variance asymptotics (SVA), our method is non-degenerate and so inherits the “rich get richer” property of the Dirichlet process. It also retains a non-degenerate closed-form likelihood which enables out-of-sample calculations and the use of standard tools such as cross-validation. We illustrate the benefits of our algorithm on a range of examples and contrast it to variational, SVA and sampling approaches from both a computational complexity perspective as well as in terms of clustering performance. We demonstrate the wide applicabiity of our approach by presenting an approximate MAP inference method for the infinite hidden Markov model whose performance contrasts favorably with a recently proposed hybrid SVA approach. Similarly, we show how our algorithm can applied to a semiparametric mixed-effects regression model where the random effects distribution is modelled using an infinite mixture model, as used in longitudinal progression modelling in population health science. Finally, we propose directions for future research on approximate MAP inference in Bayesian nonparametrics.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

The evaluation of industrial policy interventions has attracted increasing policy and academic attention in recent years. Despite the widespread consensus regarding the need for evaluation, the issue of how to evaluate, and the associated methodological considerations, continue to be issues of considerable debate. The authors develop an approach to estimate the net additionality of financial assistance from Enterprise Ireland to indigenously owned firms in Ireland for the period 2000 to 2002. With a sample of Enterprise Ireland assisted firms, an innovative, self-assessment, in-depth, face-to-face, interview methodology was adopted. The authors also explore a way of incorporating the indirect benefits of assistance into derived deadweight estimate issue which is seldom discussed in the context of deadweight estimates. They conclude by reflecting on the key methodological lessons learned from the evaluation process, and highlight some pertinent evaluation issues which should form the focus of much future discussion in this field of research.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

In order to survive in the increasingly customer-oriented marketplace, continuous quality improvement marks the fastest growing quality organization’s success. In recent years, attention has been focused on intelligent systems which have shown great promise in supporting quality control. However, only a small number of the currently used systems are reported to be operating effectively because they are designed to maintain a quality level within the specified process, rather than to focus on cooperation within the production workflow. This paper proposes an intelligent system with a newly designed algorithm and the universal process data exchange standard to overcome the challenges of demanding customers who seek high-quality and low-cost products. The intelligent quality management system is equipped with the ‘‘distributed process mining” feature to provide all levels of employees with the ability to understand the relationships between processes, especially when any aspect of the process is going to degrade or fail. An example of generalized fuzzy association rules are applied in manufacturing sector to demonstrate how the proposed iterative process mining algorithm finds the relationships between distributed process parameters and the presence of quality problems.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

We investigate the sensitivity of a Markov model with states and transition probabilities obtained from clustering a molecular dynamics trajectory. We have examined a 500 ns molecular dynamics trajectory of the peptide valine-proline-alanine-leucine in explicit water. The sensitivity is quantified by varying the boundaries of the clusters and investigating the resulting variation in transition probabilities and the average transition time between states. In this way, we represent the effect of clustering using different clustering algorithms. It is found that in terms of the investigated quantities, the peptide dynamics described by the Markov model is sensitive to the clustering; in particular, the average transition times are found to vary up to 46%. Moreover, inclusion of nonphysical sparsely populated clusters can lead to serious errors of up to 814%. In the investigation, the time step used in the transition matrix is determined by the minimum time scale on which the system behaves approximately Markovian. This time step is found to be about 100 ps. It is concluded that the description of peptide dynamics with transition matrices should be performed with care, and that using standard clustering algorithms to obtain states and transition probabilities may not always produce reliable results.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

This thesis is concerned with various aspects of Air Pollution due to smell, the impact it has on communities exposed to it, the means by which it may be controlled and the manner in which a local authority may investigate the problems it causes. The approach is a practical one drawing on examples occurring within a Local Authority's experience and for that reason the research is anecdotal and is not a comprehensive treatise on the full range of options available. Odour Pollution is not yet a well organised discipline and might be considered esoteric as it is necessary to incorporate elements of science and the humanities. It has been necessary to range widely across a number of aspects of the subject so that discussion is often restricted but many references have been included to enable a reader to pursue a particular point in greater depth. In a `fuzzy' subject there is often a yawning gap separating theory and practice, thus case studies have been used to illustrate the interplay of various disciplines in resolution of a problem. The essence of any science is observation and measurement. Observation has been made of the spread of odour pollution through a community and also of relevant meterological data so that a mathematical model could be constructed and its predictions checked. It has been used to explore the results of some options for odour control. Measurements of odour perception and human behaviour seldom have the precision and accuracy of the physical sciences. However methods of social research enabled individual perception of odour pollution to be quantified and an insight gained into reaction of a community exposed to it. Odours have four attributes that can be measured and together provide a complete description of its perception. No objective techniques of measurement have yet been developed but in this thesis simple, structured procedures of subjective assessment have been improvised and their use enabled the functioning of the components of an odour control system to be assessed. Such data enabled the action of the system to be communicated using terms that are understood by a non specialist audience.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

The region of tenascin-C containing only alternately spliced fibronectin type-III repeat D (fnD) increases neurite outgrowth by itself and also as part of tenascin-C. We previously localized the active site within fnD to an eight amino acid sequence unique to tenascin-C, VFDNFVLK, and showed that the amino acids FD and FV are required for activity. The purpose of this study was to identify the neuronal receptor that interacts with VFDNFVLK and to investigate the hypothesis that FD and FV are important for receptor binding. Function-blocking antibodies against both alpha7 and beta1 integrin subunits were found to abolish VFDNFVLK-mediated process extension from cerebellar granule neurons. VFDNFVLK but not its mutant, VSPNGSLK, induced clustering of neuronal beta1 integrin immunoreactivity. This strongly implicates FD and FV as important structural elements for receptor activation. Moreover, biochemical experiments revealed an association of the alpha7beta1 integrin with tenascin-C peptides containing the VFDNFVLK sequence but not with peptides with alterations in FD and/or FV. These findings are the first to provide evidence that the alpha7beta1 integrin mediates a response to tenascin-C and the first to demonstrate a functional role for the alpha7beta1 integrin receptor in CNS neurons.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Objective: Recently, much research has been proposed using nature inspired algorithms to perform complex machine learning tasks. Ant colony optimization (ACO) is one such algorithm based on swarm intelligence and is derived from a model inspired by the collective foraging behavior of ants. Taking advantage of the ACO in traits such as self-organization and robustness, this paper investigates ant-based algorithms for gene expression data clustering and associative classification. Methods and material: An ant-based clustering (Ant-C) and an ant-based association rule mining (Ant-ARM) algorithms are proposed for gene expression data analysis. The proposed algorithms make use of the natural behavior of ants such as cooperation and adaptation to allow for a flexible robust search for a good candidate solution. Results: Ant-C has been tested on the three datasets selected from the Stanford Genomic Resource Database and achieved relatively high accuracy compared to other classical clustering methods. Ant-ARM has been tested on the acute lymphoblastic leukemia (ALL)/acute myeloid leukemia (AML) dataset and generated about 30 classification rules with high accuracy. Conclusions: Ant-C can generate optimal number of clusters without incorporating any other algorithms such as K-means or agglomerative hierarchical clustering. For associative classification, while a few of the well-known algorithms such as Apriori, FP-growth and Magnum Opus are unable to mine any association rules from the ALL/AML dataset within a reasonable period of time, Ant-ARM is able to extract associative classification rules.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Renewable energy project development is highly complex and success is by no means guaranteed. Decisions are often made with approximate or uncertain information yet the current methods employed by decision-makers do not necessarily accommodate this. Levelised energy costs (LEC) are one such commonly applied measure utilised within the energy industry to assess the viability of potential projects and inform policy. The research proposes a method for achieving this by enhancing the traditional discounting LEC measure with fuzzy set theory. Furthermore, the research develops the fuzzy LEC (F-LEC) methodology to incorporate the cost of financing a project from debt and equity sources. Applied to an example bioenergy project, the research demonstrates the benefit of incorporating fuzziness for project viability, optimal capital structure and key variable sensitivity analysis decision-making. The proposed method contributes by incorporating uncertain and approximate information to the widely utilised LEC measure and by being applicable to a wide range of energy project viability decisions. © 2013 Elsevier Ltd. All rights reserved.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

The hepatitis C virus (HCV) is able to persist as a chronic infection, which can lead to cirrhosis and liver cancer. There is evidence that clearance of HCV is linked to strong responses by CD8 cytotoxic T lymphocytes (CTLs), suggesting that eliciting CTL responses against HCV through an epitope-based vaccine could prove an effective means of immunization. However, HCV genomic plasticity as well as the polymorphisms of HLA I molecules restricting CD8 T-cell responses challenges the selection of epitopes for a widely protective vaccine. Here, we devised an approach to overcome these limitations. From available databases, we first collected a set of 245 HCV-specific CD8 T-cell epitopes, all known to be targeted in the course of a natural infection in humans. After a sequence variability analysis, we next identified 17 highly invariant epitopes. Subsequently, we predicted the epitope HLA I binding profiles that determine their potential presentation and recognition. Finally, using the relevant HLA I-genetic frequencies, we identified various epitope subsets encompassing 6 conserved HCV-specific CTL epitopes each predicted to elicit an effective T-cell response in any individual regardless of their HLA I background. We implemented this epitope selection approach for free public use at the EPISOPT web server. © 2013 Magdalena Molero-Abraham et al.