2 resultados para Query clustering

em Brock University, Canada


Relevância:

20.00% 20.00%

Publicador:

Resumo:

Lattice valued fuzziness is more general than crispness or fuzziness based on the unit interval. In this work, we present a query language for a lattice based fuzzy database. We define a Lattice Fuzzy Structured Query Language (LFSQL) taking its membership values from an arbitrary lattice L. LFSQL can handle, manage and represent crisp values, linear ordered membership degrees and also allows membership degrees from lattices with non-comparable values. This gives richer membership degrees, and hence makes LFSQL more flexible than FSQL or SQL. In order to handle vagueness or imprecise information, every entry into an L-fuzzy database is an L-fuzzy set instead of crisp values. All of this makes LFSQL an ideal query language to handle imprecise data where some factors are non-comparable. After defining the syntax of the language formally, we provide its semantics using L-fuzzy sets and relations. The semantics can be used in future work to investigate concepts such as functional dependencies. Last but not least, we present a parser for LFSQL implemented in Haskell.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

The goal of most clustering algorithms is to find the optimal number of clusters (i.e. fewest number of clusters). However, analysis of molecular conformations of biological macromolecules obtained from computer simulations may benefit from a larger array of clusters. The Self-Organizing Map (SOM) clustering method has the advantage of generating large numbers of clusters, but often gives ambiguous results. In this work, SOMs have been shown to be reproducible when the same conformational dataset is independently clustered multiple times (~100), with the help of the Cramérs V-index (C_v). The ability of C_v to determine which SOMs are reproduced is generalizable across different SOM source codes. The conformational ensembles produced from MD (molecular dynamics) and REMD (replica exchange molecular dynamics) simulations of the penta peptide Met-enkephalin (MET) and the 34 amino acid protein human Parathyroid Hormone (hPTH) were used to evaluate SOM reproducibility. The training length for the SOM has a huge impact on the reproducibility. Analysis of MET conformational data definitively determined that toroidal SOMs cluster data better than bordered maps due to the fact that toroidal maps do not have an edge effect. For the source code from MATLAB, it was determined that the learning rate function should be LINEAR with an initial learning rate factor of 0.05 and the SOM should be trained by a sequential algorithm. The trained SOMs can be used as a supervised classification for another dataset. The toroidal 10×10 hexagonal SOMs produced from the MATLAB program for hPTH conformational data produced three sets of reproducible clusters (27%, 15%, and 13% of 100 independent runs) which find similar partitionings to those of smaller 6×6 SOMs. The χ^2 values produced as part of the C_v calculation were used to locate clusters with identical conformational memberships on independently trained SOMs, even those with different dimensions. The χ^2 values could relate the different SOM partitionings to each other.