4 results for self-organizing map

at Brock University, Canada


Relevance:

100.00%

Abstract:

The goal of most clustering algorithms is to find the optimal number of clusters (i.e., the fewest clusters). However, the analysis of molecular conformations of biological macromolecules obtained from computer simulations may benefit from a larger array of clusters. The Self-Organizing Map (SOM) clustering method has the advantage of generating large numbers of clusters, but often gives ambiguous results. In this work, SOMs have been shown to be reproducible when the same conformational dataset is independently clustered multiple times (~100), with the help of Cramér's V-index (C_v). The ability of C_v to determine which SOMs are reproduced is generalizable across different SOM source codes. The conformational ensembles produced from MD (molecular dynamics) and REMD (replica exchange molecular dynamics) simulations of the pentapeptide Met-enkephalin (MET) and the 34-amino-acid protein human Parathyroid Hormone (hPTH) were used to evaluate SOM reproducibility. The training length for the SOM has a large impact on reproducibility. Analysis of MET conformational data determined that toroidal SOMs cluster data better than bordered maps because toroidal maps have no edge effect. For the source code from MATLAB, it was determined that the learning rate function should be LINEAR with an initial learning rate factor of 0.05 and that the SOM should be trained by a sequential algorithm. The trained SOMs can be used as a supervised classifier for another dataset. The toroidal 10×10 hexagonal SOMs produced from the MATLAB program for hPTH conformational data produced three sets of reproducible clusters (27%, 15%, and 13% of 100 independent runs), which find partitionings similar to those of smaller 6×6 SOMs. The χ^2 values produced as part of the C_v calculation were used to locate clusters with identical conformational memberships on independently trained SOMs, even those with different dimensions.
The χ^2 values could relate the different SOM partitionings to each other.
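
As an illustration of how C_v can compare two independent clusterings of the same conformations, the following minimal sketch builds the contingency table of cluster labels and computes χ^2 and Cramér's V from it. The function name `cramers_v` and the label lists are hypothetical; the abstract does not give the thesis's actual implementation, so this shows only the standard textbook formula.

```python
import math
from collections import Counter


def cramers_v(labels_a, labels_b):
    """Return (chi2, V) for two cluster labelings of the same items.

    labels_a[i] and labels_b[i] are the cluster (map unit) assigned to
    conformation i by two independently trained maps.
    """
    if len(labels_a) != len(labels_b):
        raise ValueError("both labelings must cover the same items")
    n = len(labels_a)
    joint = Counter(zip(labels_a, labels_b))  # contingency table cells
    row_totals = Counter(labels_a)
    col_totals = Counter(labels_b)

    # Pearson chi-squared: sum over all cells of (O - E)^2 / E.
    chi2 = 0.0
    for a, row_count in row_totals.items():
        for b, col_count in col_totals.items():
            expected = row_count * col_count / n
            observed = joint.get((a, b), 0)
            chi2 += (observed - expected) ** 2 / expected

    # Normalise by n * (min(rows, cols) - 1) so that V lies in [0, 1].
    k = min(len(row_totals), len(col_totals)) - 1
    if k == 0:
        return chi2, 0.0  # a single cluster carries no association
    return chi2, math.sqrt(chi2 / (n * k))
```

A value of V near 1 indicates that the two maps partition the data almost identically, even if the cluster labels themselves are permuted; a value near 0 indicates unrelated partitionings.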

Relevance:

100.00%

Abstract:

Euclidean distance matrix analysis (EDMA) methods are used to determine whether a significant difference exists between conformational samples of antibody complementarity determining region (CDR) loops, namely the isolated L1 loop and L1 in a three-loop assembly (L1, L3 and H3), obtained from Monte Carlo simulation. After a significant difference is detected, the specific inter-Cα distance that contributes to the difference is identified using EDMA. The estimated and improved mean forms of the conformational samples of the isolated L1 loop and the L1 loop in the three-loop assembly, CDR loops of the antibody binding site, are described using EDMA and distance geometry (DGEOM). To the best of our knowledge, this is the first time EDMA methods have been used to analyze conformational samples of molecules obtained from Monte Carlo simulations. Therefore, the EDMA methods must be validated using both positive control and negative control tests for the conformational samples of the isolated L1 loop and L1 in the three-loop assembly. The EDMA-I bootstrap null hypothesis tests showed false positive results for the comparison of six samples of the isolated L1 loop and true positive results for the comparison of conformational samples of the isolated L1 loop and L1 in the three-loop assembly. The bootstrap confidence interval tests revealed true negative results for comparisons of six samples of the isolated L1 loop, and false negative results for the conformational comparisons between the isolated L1 loop and L1 in the three-loop assembly. Different conformational sample sizes are further explored by combining the samples of the isolated L1 loop to increase the sample size, or by clustering the sample using a self-organizing map (SOM) to narrow the conformational distribution of the samples being compared. However, neither the bootstrap null hypothesis tests nor the confidence interval tests improved.
These results show that more work is required before EDMA methods can be used reliably as a method for comparison of samples obtained by Monte Carlo simulations.
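
The kind of comparison described above can be sketched roughly as follows. This is a simplified illustration, not the thesis's procedure: the mean form is taken as the plain arithmetic mean of the inter-landmark distances (the published EDMA estimators are more elaborate), the bootstrap resamples from the pooled data (an assumption about the null model), and all function names are hypothetical. Each conformation is a list of 3D coordinates, for example Cα positions.

```python
import math
import random
from itertools import combinations


def form_matrix(coords):
    """Upper triangle of the inter-landmark distance matrix, flattened."""
    return [math.dist(p, q) for p, q in combinations(coords, 2)]


def mean_form(sample):
    """Simplified mean form: arithmetic mean of each distance over the sample."""
    forms = [form_matrix(conformation) for conformation in sample]
    return [sum(column) / len(forms) for column in zip(*forms)]


def t_statistic(sample_a, sample_b):
    """Return (T, ratios): the form difference ratios and their max/min.

    Assumes no two landmarks coincide, so every mean distance is non-zero.
    """
    ratios = [a / b for a, b in zip(mean_form(sample_a), mean_form(sample_b))]
    return max(ratios) / min(ratios), ratios


def bootstrap_p_value(sample_a, sample_b, n_boot=200, seed=0):
    """Fraction of pooled bootstrap replicates whose T is >= the observed T."""
    rng = random.Random(seed)
    observed, _ = t_statistic(sample_a, sample_b)
    pooled = list(sample_a) + list(sample_b)
    hits = 0
    for _ in range(n_boot):
        # Under the assumed null, both groups are draws from the same pool.
        boot_a = rng.choices(pooled, k=len(sample_a))
        boot_b = rng.choices(pooled, k=len(sample_b))
        t, _ = t_statistic(boot_a, boot_b)
        if t >= observed:
            hits += 1
    return hits / n_boot
```

The list of ratios returned by `t_statistic` is what points to individual distances: entries far from the others mark the inter-landmark distances that contribute most to a detected difference. A T of 1 means the two mean forms differ at most by a uniform scale factor.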

Relevance:

80.00%

Abstract:

Remote sensing techniques involving hyperspectral imagery have applications in a number of sciences that study aspects of the surface of the planet. The analysis of hyperspectral images is complex because of the large amount of information involved and the noise within the data. Analyzing images to identify minerals, rocks, vegetation and other materials is one application of hyperspectral remote sensing in the earth sciences. This thesis evaluates the performance of two classification and clustering techniques on hyperspectral images for mineral identification. Support Vector Machines (SVM) and Self-Organizing Maps (SOM) are applied as classification and clustering techniques, respectively. Principal Component Analysis (PCA) is used to prepare the data for analysis. The purpose of using PCA is to reduce the amount of data that needs to be processed by identifying the most important components within the data. A well-studied dataset from Cuprite, Nevada and a dataset of more complex data from Baffin Island were used to assess the performance of these techniques. The main goal of this research study is to evaluate the advantage of training a classifier on a small amount of data compared to using an unsupervised method. Determining the effect of feature extraction on the accuracy of the clustering and classification methods is another goal of this research. This thesis concludes that using PCA increases the learning accuracy, especially in classification. SVM classifies the Cuprite data with high precision, and the SOM challenges SVM on datasets with a high level of noise (such as Baffin Island).
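
To make the data-reduction role of PCA concrete, here is a small dependency-free sketch that finds the first principal component of a set of spectra by power iteration on the covariance matrix and reports the share of variance it explains. It is an illustration only: the function name is hypothetical, the abstract does not say how PCA was implemented or how many components were retained, and a real hyperspectral pipeline would normally rely on a numerical library.

```python
import math
import random


def first_principal_component(rows, n_iter=200, seed=0):
    """Return (direction, explained_ratio, scores) for the leading component.

    rows is a list of equal-length feature vectors (e.g. one spectrum per
    pixel). Assumes at least two rows and non-zero total variance.
    """
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]

    # Sample covariance matrix of the centered data.
    cov = [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
           for i in range(d)]

    # Power iteration: repeated multiplication by cov converges to the
    # eigenvector with the largest eigenvalue (random start, fixed seed).
    rng = random.Random(seed)
    v = [rng.gauss(0.0, 1.0) for _ in range(d)]
    for _ in range(n_iter):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]

    cov_v = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
    eigenvalue = sum(v[i] * cov_v[i] for i in range(d))
    total_variance = sum(cov[i][i] for i in range(d))

    # Project every centered row onto the component: one number per row.
    scores = [sum(c[j] * v[j] for j in range(d)) for c in centered]
    return v, eigenvalue / total_variance, scores
```

The `scores` list is the reduced representation: each input vector is replaced by a single coordinate along the dominant direction, which is the kind of compressed input a classifier or SOM could then be trained on.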

Relevance:

30.00%

Abstract:

Self-dual doubly even linear binary error-correcting codes, often referred to as Type II codes, are closely related to many combinatorial structures such as 5-designs. Extremal codes are codes that have the largest possible minimum distance for a given length and dimension. The existence of an extremal (72,36,16) Type II code is still open. Previous results show that the automorphism group of a putative code C with the aforementioned properties has order 5 or an order dividing 24. In this work, we present a method and the results of an exhaustive search showing that such a code C cannot admit an automorphism group Z6. In addition, we present a so far unpublished construction of the extended Golay code by P. Becker. We generalize the notion and provide an example of another Type II code that can be obtained in this fashion. Consequently, we relate Becker's construction to the construction of binary Type II codes from codes over GF(2^r) via the Gray map.
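
The Type II properties can be checked directly on the extended Golay code. The sketch below does not reproduce Becker's construction, which the abstract does not describe; it uses the standard cyclic construction instead, multiplying each 12-bit message polynomial by the generator polynomial x^11 + x^10 + x^6 + x^5 + x^4 + x^2 + 1 and appending an overall parity bit. Codewords are stored as integers, and all function names are illustrative.

```python
GOLAY_GENERATOR = 0xC75  # x^11 + x^10 + x^6 + x^5 + x^4 + x^2 + 1


def gf2_multiply(a, b):
    """Carry-less product of two GF(2) polynomials stored as bit masks."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        b >>= 1
    return result


def weight(word):
    """Hamming weight: the number of 1 bits in the word."""
    return bin(word).count("1")


def extend(word23):
    """Append an overall parity bit so the 24-bit word has even weight."""
    return (word23 << 1) | (weight(word23) & 1)


def encode(message):
    """Map a 12-bit message to a 24-bit extended Golay codeword."""
    return extend(gf2_multiply(message, GOLAY_GENERATOR))


def weight_distribution(words):
    """Count how many words have each Hamming weight."""
    counts = {}
    for word in words:
        counts[weight(word)] = counts.get(weight(word), 0) + 1
    return dict(sorted(counts.items()))


def is_self_orthogonal(basis):
    """True if every pair of basis words overlaps in an even number of 1s."""
    return all(weight(a & b) % 2 == 0 for a in basis for b in basis)


codewords = [encode(m) for m in range(1 << 12)]
basis = [encode(1 << i) for i in range(12)]

# Dimension 12 = 24 / 2 together with self-orthogonality gives self-duality.
self_dual = len(set(codewords)) == 1 << 12 and is_self_orthogonal(basis)
# Doubly even: every codeword weight is a multiple of 4.
doubly_even = all(weight(c) % 4 == 0 for c in codewords)
# Minimum distance of a linear code is its smallest non-zero weight.
min_distance = min(weight(c) for c in codewords if c)
```

Running this gives `self_dual` and `doubly_even` both true and a minimum distance of 8, so the code is a (24,12,8) Type II code. The same checks scale poorly: a (72,36,16) code has 2^36 codewords, which is part of why the existence question needs structural arguments about automorphism groups rather than brute-force enumeration.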