5 results for Fingerprints.
in Aston University Research Archive
Abstract:
A visualization plot of a data set of molecular data is a useful tool for gaining insight into a set of molecules. In chemoinformatics, most visualization plots are of molecular descriptors, and the statistical model most often used to produce a visualization is principal component analysis (PCA). This paper takes PCA, together with four other statistical models (NeuroScale, GTM, LTM, and LTM-LIN), and evaluates their ability to produce clustering in visualizations not of molecular descriptors but of molecular fingerprints. Two different tasks are addressed: understanding structural information (particularly combinatorial libraries) and relating structure to activity. The quality of the visualizations is compared both subjectively (by visual inspection) and objectively (with global distance comparisons and local k-nearest-neighbor predictors). On the data sets used to evaluate clustering by structure, LTM is found to perform significantly better than the other models. In particular, the clusters in LTM visualization space are consistent with the relationships between the core scaffolds that define the combinatorial sublibraries. On the data sets used to evaluate clustering by activity, LTM again gives the best performance but by a smaller margin. The results of this paper demonstrate the value of using both a nonlinear projection map and a Bernoulli noise model for modeling binary data.
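The abstract scores visualizations objectively with local k-nearest-neighbor predictors. A minimal sketch of one such check is shown here: leave-one-out k-nearest-neighbor accuracy computed on the 2-D coordinates of a projection. It is an illustration under stated assumptions, not the paper's evaluation code. The function name `knn_loo_accuracy`, Euclidean distance, simple majority voting and k = 3 are all choices made here.

```python
import math
from collections import Counter


def knn_loo_accuracy(points, labels, k=3):
    """Leave-one-out k-nearest-neighbor accuracy in a 2-D visualization space.

    points: list of (x, y) coordinates from some projection (e.g. PCA or LTM).
    labels: class labels (e.g. sublibrary, or active/inactive) in the same order.
    """
    n = len(points)
    correct = 0
    for i in range(n):
        # Distances from point i to every other point; point i itself is left out.
        ranked = sorted(
            (math.dist(points[i], points[j]), j) for j in range(n) if j != i
        )
        neighbor_labels = [labels[j] for _, j in ranked[:k]]
        # Majority vote; on a tie, Counter keeps first-seen order, so the tied
        # label that appears first in distance order wins.
        predicted = Counter(neighbor_labels).most_common(1)[0][0]
        correct += predicted == labels[i]
    return correct / n


points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
labels = ["A", "A", "A", "B", "B", "B"]
print(knn_loo_accuracy(points, labels, k=3))  # 1.0
```

With two well-separated groups of three points, each point's three nearest neighbors include both other members of its own group, so every vote is correct. A projection that keeps same-label compounds close together scores higher on this measure.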
Abstract:
Combinatorial libraries continue to play a key role in drug discovery. Several experimental methods have been developed to increase structural diversity. However, little effort has been made so far to quantify the diversity of the widely used diversity-oriented synthesis (DOS) libraries. Herein we report a comprehensive characterization of 15 bis-diazacyclic combinatorial libraries obtained through libraries from libraries, a DOS approach. Using MACCS keys, radial and several pharmacophoric fingerprints, as well as six molecular properties, it was demonstrated that the libraries from libraries have greater structural and property diversity than the individual libraries. Comparison of the libraries with existing drugs, NCI Diversity and the Molecular Libraries Small Molecule Repository revealed the structural uniqueness of the combinatorial libraries (mean similarity < 0.5 for every fingerprint representation). In particular, the bis-cyclic thiourea libraries were the most structurally dissimilar to drugs while retaining drug-like character in property space. This study represents the first comprehensive quantification of the diversity of libraries from libraries, providing a solid quantitative approach to compare and contrast the diversity of DOS libraries with existing drugs or any other compound collection.
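The abstract reports a mean fingerprint similarity between the combinatorial libraries and the reference collections but does not name the similarity metric or how the mean is taken. Assuming the Tanimoto coefficient, the usual choice for binary fingerprints such as MACCS keys, here is a minimal sketch of one plausible reading: the average over all library-reference pairs. The function names and the 4-bit toy fingerprints are hypothetical.

```python
def tanimoto(a: int, b: int) -> float:
    """Tanimoto coefficient of two binary fingerprints stored as Python int bitsets."""
    union = bin(a | b).count("1")
    if union == 0:
        return 0.0  # assumed convention for two all-zero fingerprints
    return bin(a & b).count("1") / union


def mean_cross_similarity(library, reference):
    """Mean Tanimoto similarity over all (library, reference) compound pairs."""
    total = sum(tanimoto(x, y) for x in library for y in reference)
    return total / (len(library) * len(reference))


# Toy 4-bit fingerprints, made up for illustration only.
library = [0b1100, 0b0110]
reference = [0b1010, 0b1100]
print(round(mean_cross_similarity(library, reference), 3))  # 0.5
```

Here three of the four pairs share one bit out of three set bits (similarity 1/3) and one pair is identical (similarity 1), giving a mean of 0.5. Other summaries, such as the mean of each compound's maximum similarity to the reference set, would give different numbers.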
Abstract:
Short rotation willow coppice (SRC) has been investigated for the influence of K, Ca, Mg, Fe and P on its pyrolysis and combustion behaviour. These elements are typical components of biomass. The willow sample was pretreated with hydrochloric acid to remove salts and metals, and this demineralised sample was impregnated with each individual element at the same molar loading per gram of biomass (2.4 × 10 mol g⁻¹ demineralised willow). Characterisation was performed using thermogravimetric analysis (TGA) and, for combustion, differential thermal analysis (DTA). In pyrolysis, volatile fingerprints were measured by pyrolysis-gas chromatography-mass spectrometry (PY-GC-MS). The yields and distribution of pyrolysis products were influenced by the presence of the catalysts. Most notably, both potassium and phosphorus strongly catalysed the pyrolysis, modifying both the yield and the distribution of reaction products. Temperature-programmed combustion TGA indicates that combustion of biomass char is catalysed by all the metals, whereas phosphorus strongly inhibits char combustion. Combustion rates follow the order P > K > Fe > raw > HCl > Mg > Ca for volatile release/combustion, and K > Fe > raw > Ca-Mg > HCl > P for char combustion (HCl denotes the acid-demineralised sample). The samples impregnated with phosphorus and potassium were also studied for combustion under flame conditions, and the same trend was observed: both potassium and phosphorus catalyse volatile release/combustion, while in char combustion potassium is a catalyst and phosphorus a strong inhibitor, i.e. K-impregnated > raw > demineralised ≫ P-impregnated (where > means faster than).
Abstract:
In this paper, we propose a new fingerprint recognition method using long digital straight segments (LDSSs), based on the observation that LDSSs in fingerprints can accurately characterize the global structure of fingerprints. Beyond estimating orientation from the slope of the straight segments, the length of an LDSS provides a measure of the stability of the estimated orientation. In addition, each digital straight segment can be represented by four parameters: x-coordinate, y-coordinate, slope and length. As a result, only about 600 bytes are needed to store all the LDSS parameters of a fingerprint, which is much less than the storage an orientation field needs. Finally, LDSSs capture the structural information of local regions well. Consequently, LDSSs are better suited to the matching process than orientation fields. Experiments on the fingerprint databases FVC2002 DB3a and DB4a show that our method is effective.
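The abstract says that four parameters per segment keep an LDSS template to about 600 bytes, but it gives no storage layout. This sketch packs segments into a hypothetical 6-byte record: 16-bit x and y, the slope stored as an orientation angle quantized to one byte, and the length clamped to one byte. Under that assumption, 100 segments fill 600 bytes. The names `RECORD`, `Segment`, `pack_segments` and `unpack_segments` are illustrative only, not the authors' format.

```python
import math
import struct
from dataclasses import dataclass

# Hypothetical layout: little-endian, no padding, 6 bytes per segment.
# H = unsigned 16-bit (x, y); B = unsigned 8-bit (angle code, length).
RECORD = struct.Struct("<HHBB")


@dataclass
class Segment:
    x: int        # column of the segment's reference point, in pixels
    y: int        # row of the segment's reference point, in pixels
    angle: float  # orientation in radians, taken modulo pi
    length: int   # segment length in pixels


def pack_segments(segments):
    """Serialize segments; angle quantized to 256 steps over [0, pi)."""
    out = bytearray()
    for s in segments:
        code = round((s.angle % math.pi) / math.pi * 256) % 256
        out += RECORD.pack(s.x, s.y, code, min(s.length, 255))
    return bytes(out)


def unpack_segments(blob):
    """Inverse of pack_segments, up to the angle quantization and length clamp."""
    return [
        Segment(x, y, code * math.pi / 256, length)
        for x, y, code, length in RECORD.iter_unpack(blob)
    ]


segments = [Segment(10, 20, 0.5, 30), Segment(150, 220, 2.0, 400)]
blob = pack_segments(segments)
print(len(blob), RECORD.size)  # 12 6
print(unpack_segments(blob)[1].length)  # 255 (clamped)
```

Storing an angle instead of a raw slope avoids an infinite value for vertical segments. The one-byte quantization bounds the angular error at π/512 rad, about 0.35 degrees.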
Abstract:
Classification of MHC molecules into supertypes in terms of peptide-binding specificities is an important issue, with direct implications for the development of epitope-based vaccines with wide population coverage. In view of the extremely high MHC polymorphism (948 class I and 633 class II HLA alleles), an experimental solution to this task is presently impossible. In this study, we describe a bioinformatics strategy for classifying MHC molecules into supertypes using information drawn solely from three-dimensional protein structure. Two chemometric techniques, hierarchical clustering and principal component analysis, were used independently on a set of 783 HLA class I molecules to identify supertypes based on structural similarities and molecular interaction fields calculated for the peptide binding site. Eight supertypes were defined: A2, A3, A24, B7, B27, B44, C1, and C4. The two techniques gave 77% consensus, i.e., 605 HLA class I alleles were classified in the same supertype by both methods. The proposed strategy allowed "supertype fingerprints" to be identified. Thus, the supertype fingerprints are as follows: A2: Tyr9/Phe9, Arg97, and His114 or Tyr116; A3: Tyr9/Phe9/Ser9, Ile97/Met97, and Glu114 or Asp116; A24: Ser9 and Met97; B7: Asn63 and Leu81; B27: Glu63 and Leu81; B44: Ala81; C1: Ser77; and C4: Asn77.
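The 77% consensus figure is the share of alleles that both methods placed in the same supertype: 605 of 783, about 0.773. Here is a minimal sketch of that agreement count. It assumes each method's output has already been mapped to the shared supertype names (A2, A3, ...). The function name and the toy allele assignments are hypothetical and are not data from the study.

```python
def consensus(assign_a, assign_b):
    """Count alleles given the same supertype by two methods.

    assign_a, assign_b: dicts mapping allele name -> supertype label.
    Only alleles present in both dicts are compared.
    Returns (number of agreeing alleles, fraction of shared alleles).
    """
    shared = assign_a.keys() & assign_b.keys()
    if not shared:
        raise ValueError("the two assignments share no alleles")
    agree = sum(1 for allele in shared if assign_a[allele] == assign_b[allele])
    return agree, agree / len(shared)


# Toy assignments for illustration only (not results from the study).
by_clustering = {"A*02:01": "A2", "A*03:01": "A3", "B*07:02": "B7"}
by_pca = {"A*02:01": "A2", "A*03:01": "A24", "B*07:02": "B7"}
print(consensus(by_clustering, by_pca))
print(round(605 / 783, 3))  # 0.773
```

Comparing labels directly only works after both methods' clusters have been named consistently. Raw cluster indices from hierarchical clustering would first need to be matched to supertypes.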