793 results for WHIM DESCRIPTORS
Abstract:
A visualization plot of a data set of molecular data is a useful tool for gaining insight into a set of molecules. In chemoinformatics, most visualization plots are of molecular descriptors, and the statistical model most often used to produce a visualization is principal component analysis (PCA). This paper takes PCA, together with four other statistical models (NeuroScale, GTM, LTM, and LTM-LIN), and evaluates their ability to produce clustering in visualizations not of molecular descriptors but of molecular fingerprints. Two different tasks are addressed: understanding structural information (particularly combinatorial libraries) and relating structure to activity. The quality of the visualizations is compared both subjectively (by visual inspection) and objectively (with global distance comparisons and local k-nearest-neighbor predictors). On the data sets used to evaluate clustering by structure, LTM is found to perform significantly better than the other models. In particular, the clusters in LTM visualization space are consistent with the relationships between the core scaffolds that define the combinatorial sublibraries. On the data sets used to evaluate clustering by activity, LTM again gives the best performance but by a smaller margin. The results of this paper demonstrate the value of using both a nonlinear projection map and a Bernoulli noise model for modeling binary data.
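The abstract evaluates visualizations partly through local k-nearest-neighbour predictors. The sketch below illustrates that idea in a minimal form. It projects binary fingerprints to 2-D with PCA, then scores the projection by leave-one-out kNN classification accuracy in the visualization space. It assumes NumPy, Euclidean distance in the projected space and majority voting. The function names and toy fingerprints are hypothetical. The code does not implement NeuroScale, GTM or LTM.

```python
import numpy as np

def pca_project(X, n_components=2):
    """Project rows of X onto the leading principal components."""
    Xc = X - X.mean(axis=0)                        # centre each bit column
    # Rows of Vt are principal directions, ordered by singular value
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:n_components].T

def loo_knn_accuracy(Z, labels, k=3):
    """Leave-one-out kNN accuracy computed in the projected (visualization) space."""
    labels = np.asarray(labels)
    D = np.linalg.norm(Z[:, None, :] - Z[None, :, :], axis=-1)
    np.fill_diagonal(D, np.inf)                    # a point is never its own neighbour
    correct = 0
    for i in range(len(Z)):
        nn = np.argsort(D[i])[:k]                  # indices of the k nearest neighbours
        values, counts = np.unique(labels[nn], return_counts=True)
        correct += values[np.argmax(counts)] == labels[i]   # majority vote
    return correct / len(Z)

# Toy fingerprints: two hypothetical "scaffold" families with 5% random bit flips
rng = np.random.default_rng(0)
base_a = np.array([1, 1, 1, 1, 0, 0, 0, 0] * 4)
base_b = 1 - base_a
X = np.array([np.where(rng.random(32) < 0.05, 1 - b, b)
              for b in [base_a] * 10 + [base_b] * 10])
y = ["A"] * 10 + ["B"] * 10
Z = pca_project(X.astype(float))
print(loo_knn_accuracy(Z, y, k=3))
```

The same scoring function can be applied to coordinates produced by any other projection model, which is what makes it useful for comparing models.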
Abstract:
Recent and potential changes in technology have raised expectations of more frequent job changes. This has led manpower policy makers to investigate the feasibility of incorporating the employment skills of job groups into the general prediction of future job learning and performance. The aim is to establish "job families" within which transfer might be considered reciprocally high. A structured job analysis instrument (the Position Analysis Questionnaire) is evaluated in terms of two distinct sets of scores: job dimensions and synthetically established attribute/trait profiles. Studies demonstrate that estimates of a job's structure/dimensions and requisite human attributes can be reliably established. Three alternative techniques for statistically assembling profiles of the requisite human attributes for jobs are found to differ in reliability and in the validity with which they estimate the "actual" ability requirements of jobs. A study that simulates a job transfer situation investigates how well these two sets of job descriptors represent the cognitive structure similarity of job groups. The central role of the index of similarity used to assess the relationship between the "target" and "present" job is demonstrated. The relative extents to which job structure similarity and job attribute similarity are associated with positive transfer are investigated. The studies demonstrate that the dimensions of jobs, and more fruitfully their requisite human attributes, can serve as bases for predicting job transfer learning and performance. The nature of the index of similarity needed to formulate optimal predictions of transfer is such that networks of jobs might be established to which current job incumbents could be expected to transfer positively. The derivation of "job families" with anticipated reciprocal transfer consequences is considered less appropriate.
Abstract:
This paper presents a new method for human face recognition that uses Gabor-based region covariance matrices as face descriptors. Both pixel locations and Gabor coefficients are used to form the covariance matrices. Experimental results demonstrate the advantages of the proposed method.
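A region covariance descriptor summarises an image region by the covariance matrix of per-pixel feature vectors. The sketch below shows one way to build it from pixel locations plus Gabor responses, as described above. It assumes NumPy. It also assumes the Gabor magnitude maps have already been computed and are passed in as an array; the filter bank itself is not shown. For comparing descriptors it uses the generalized-eigenvalue distance of Förstner and Moonen, which is commonly paired with covariance descriptors. Whether the paper uses this particular metric is an assumption.

```python
import numpy as np

def region_covariance(gabor_maps):
    """Covariance of per-pixel features [x, y, g_1, ..., g_m] over a region.

    gabor_maps: array of shape (m, H, W) with precomputed Gabor magnitudes
    (assumed input; computing the Gabor filter bank is not shown here).
    """
    m, H, W = gabor_maps.shape
    ys, xs = np.mgrid[0:H, 0:W]                     # pixel locations
    feats = np.vstack([xs.ravel(), ys.ravel(), gabor_maps.reshape(m, -1)])
    return np.cov(feats)                            # rows = features -> (m+2) x (m+2)

def covariance_distance(C1, C2, eps=1e-9):
    """sqrt(sum_i log^2(lambda_i)), lambda_i = generalized eigenvalues of (C1, C2)."""
    d = C1.shape[0]
    A = C1 + eps * np.eye(d)                        # small ridge keeps both matrices SPD
    B = C2 + eps * np.eye(d)
    lam = np.linalg.eigvals(np.linalg.solve(B, A)).real
    return float(np.sqrt(np.sum(np.log(lam) ** 2)))

# Usage with random stand-ins for four Gabor magnitude maps per face region
rng = np.random.default_rng(1)
face_a = rng.random((4, 16, 16))
face_b = rng.random((4, 16, 16))
print(covariance_distance(region_covariance(face_a), region_covariance(face_b)))
```

A useful property of this design is that the descriptor size depends only on the number of features (here m + 2), not on the region size.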
Abstract:
Minimal access procedures in surgery offer the benefits of reduced patient recovery time and less pain. For the surgeon, however, the task is more complex, because both tactile and visual perception of the working site are reduced. This paper presents experimental evidence on the performance of a novel sensing system embedded in an actuated flexible digit element. The digit represents a steerable tip element of devices such as endoscopes and laparoscopes. The system can discriminate types of contact and tissue interaction and feed this information back together with the shape of the flexible digit. As an alternative to this information, force level, force distribution, and other quantifiable descriptors can also be evaluated. These can be used to aid perception in processes such as navigation and the investigation of tissues through palpation. The solution is pragmatic. Its efficient mechanical design and polymer construction make a disposable element possible, one suited to magnetic resonance imaging (MRI) and other scanning environments. With only four photonic sensing elements, tissue contact and the shape of the actuated digit can be fully perceived and fed back. The distributive sensing method relies on the coupled transients of the signals from the four sensing elements to discriminate tissue interaction directly, in near real time.
Abstract:
Peptides are of great therapeutic potential as vaccines and drugs. Knowledge of physicochemical descriptors, including the partition coefficient P (commonly expressed in logarithmic form, logP), is useful for screening out unsuitable molecules and also for developing predictive Quantitative Structure-Activity Relationships (QSARs). In this paper we develop a new approach to predicting logP values for peptides, based on an empirical relationship between global molecular properties and measured physical properties. The method predicted peptide logP with a total r² of 0.641. The final model consisted of 5 physicochemical descriptors: molecular weight, number of single bonds, 2D-VDW volume, 2D-VSA hydrophobic and 2D-VSA polar. The approach is peptide-specific and its predictive accuracy was high. Overall, 67% of the peptides were predicted within ±0.5 log units of their experimental values. Our method thus represents a novel prediction method with proven predictive ability.
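The abstract does not say how the five-descriptor model was fitted. The sketch below assumes an ordinary least-squares linear model. It fits the coefficients, then reports r² (coefficient of determination) and the fraction of peptides predicted within ±0.5 log units. It assumes NumPy. The descriptor column names are only labels, and the data are random placeholders, not the paper's data set. The paper's "total r²" may be defined differently.

```python
import numpy as np

DESCRIPTORS = ["molecular_weight", "n_single_bonds", "vdw_volume_2d",
               "vsa_hydrophobic_2d", "vsa_polar_2d"]   # column order (labels only)

def fit_logp_model(X, logp):
    """Least-squares fit of logP = b0 + sum_j b_j * x_j (assumed linear form)."""
    A = np.column_stack([np.ones(len(X)), X])           # prepend an intercept column
    coef, *_ = np.linalg.lstsq(A, logp, rcond=None)
    return coef

def predict(coef, X):
    return coef[0] + X @ coef[1:]

def r_squared(y_true, y_pred):
    """Coefficient of determination, 1 - SS_res / SS_tot."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

def fraction_within(y_true, y_pred, tol=0.5):
    """Share of molecules whose prediction lies within +/- tol log units."""
    return float(np.mean(np.abs(y_true - y_pred) <= tol))

# Random placeholder data (NOT the paper's data set)
rng = np.random.default_rng(42)
X = rng.normal(size=(50, len(DESCRIPTORS)))
y = X @ np.array([0.3, -0.2, 0.5, 0.8, -0.6]) + rng.normal(scale=0.4, size=50)
coef = fit_logp_model(X, y)
y_hat = predict(coef, X)
print(r_squared(y, y_hat), fraction_within(y, y_hat))
```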
Abstract:
Peptides are of great therapeutic potential as vaccines and drugs. Knowledge of physicochemical descriptors, including the partition coefficient logP, is useful for the development of predictive Quantitative Structure-Activity Relationships (QSARs). We investigated how accurately available programs predict logP values for peptides whose experimental values are known from the literature. Eight prediction programs were tested. Seven were fragment-based methods: XLogP, LogKow, PLogP, ACDLogP, ALogP, Interactive Analysis's LogP (IALogP) and MLogP. The eighth, QikProp, used a whole-molecule approach. The predictive accuracy of the programs was assessed using r² values, with ALogP being the most effective (r² = 0.822) and MLogP the least (r² = 0.090). We also examined three distinct types of peptide structure: blocked, unblocked, and cyclic. For each study, the programs ranked from best to worst as follows. For all peptides: ALogP, QikProp, PLogP, XLogP, IALogP, LogKow, ACDLogP, and MLogP. For blocked peptides: PLogP, XLogP, ACDLogP, IALogP, LogKow, QikProp, ALogP, and MLogP. For unblocked peptides: QikProp, IALogP, ALogP, ACDLogP, MLogP, XLogP, LogKow and PLogP. For cyclic peptides: LogKow, ALogP, XLogP, MLogP, QikProp, ACDLogP, IALogP. In summary, all programs gave better predictions for blocked peptides. In general, logP values for cyclic peptides were under-predicted and those of unblocked peptides were over-predicted.
Abstract:
The simulated classical dynamics of a small molecule that exhibits self-organizing behavior, via a fast transition between two states, is analyzed by calculating the statistical complexity of the system. It is shown that the complexity of molecular descriptors such as atom coordinates and dihedral angles takes different values before and after the transition. This provides a new tool for identifying metastable states during molecular self-organization. The highly concerted collective motion of the molecule is revealed. The dynamics of low-dimensional subspaces is found to be sensitive to processes in the whole, high-dimensional phase space of the system. © 2004 Wiley Periodicals, Inc.
Abstract:
A proteochemometrics approach was applied to a set of 2666 peptides binding to 12 HLA-DRB1 proteins. Sequences of both peptide and protein were described using three z-descriptors. Cross terms for adjacent positions and for every second position in the peptides were included in the models, as were cross terms for peptide/protein interactions. Models were derived from combinations of different blocks of variables. These models had moderate goodness of fit, with r² ranging from 0.685 to 0.732. They had good cross-validated predictive ability, with q² varying from 0.678 to 0.719. The external predictive ability was tested using a set of 356 HLA-DRB1 binders, which gave an r²(pred) in the range 0.364-0.530. Peptide and protein positions involved in the interactions were analyzed in terms of hydrophobicity, steric bulk and polarity.
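The cross terms described above are products of descriptor values, and q² is the cross-validated analogue of r². The standard-library sketch below builds one peptide/protein feature vector with adjacent-position, every-second-position and peptide/protein cross terms. It then defines q² = 1 - PRESS/SS from cross-validated predictions. This is a minimal illustration under several assumptions. The z-values in the table are made-up placeholders, not published z-scales. The "pocket" residues stand in for a hypothetical set of protein positions. The regression method (often PLS in proteochemometrics) is not shown.

```python
from itertools import product

# HYPOTHETICAL placeholder z-values (z1, z2, z3); real z-scales must be taken
# from the literature, these numbers are made up for illustration.
Z = {"A": (0.1, -1.0, 0.2), "K": (2.3, 0.9, -0.5), "L": (-3.0, -0.8, -1.1),
     "Y": (-1.4, 0.7, 1.2), "G": (2.0, -4.1, 0.4)}

def encode(seq):
    """Concatenate the three z-descriptors of every position."""
    return [z for aa in seq for z in Z[aa]]

def position_cross_terms(seq, gap=1):
    """Products z_a(i) * z_b(i + gap) for all descriptor pairs (a, b).

    gap=1 gives adjacent-position terms; gap=2 gives every-second-position terms."""
    terms = []
    for i in range(len(seq) - gap):
        for za, zb in product(Z[seq[i]], Z[seq[i + gap]]):
            terms.append(za * zb)
    return terms

def interaction_terms(peptide_vec, protein_vec):
    """Peptide/protein cross terms: every pairwise product of the two blocks."""
    return [p * q for p, q in product(peptide_vec, protein_vec)]

def q_squared(y_obs, y_cv_pred):
    """q^2 = 1 - PRESS / SS, using cross-validated predictions."""
    mean_y = sum(y_obs) / len(y_obs)
    press = sum((o - p) ** 2 for o, p in zip(y_obs, y_cv_pred))
    ss = sum((o - mean_y) ** 2 for o in y_obs)
    return 1.0 - press / ss

peptide, pocket = "KLAY", "GY"   # hypothetical peptide and protein-position residues
x = (encode(peptide)
     + position_cross_terms(peptide, 1)
     + position_cross_terms(peptide, 2)
     + interaction_terms(encode(peptide), encode(pocket)))
print(len(x))
```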
Abstract:
MOTIVATION: There is much interest in reducing the complexity inherent in representing the 20 standard amino acids within bioinformatics algorithms by developing a so-called reduced alphabet. Although there is no universally applicable residue grouping, there are numerous physicochemical criteria on which groupings can be based. Local descriptors are a form of alignment-free analysis whose efficiency depends on the correct selection of amino acid groupings. RESULTS: Within the context of G-protein coupled receptor (GPCR) classification, an optimization algorithm was developed that could identify the most efficient grouping for generating local descriptors. The algorithm was inspired by the relatively new computational intelligence paradigm of artificial immune systems. A number of amino acid groupings produced by this algorithm were evaluated for their ability to generate local descriptors that yield an accurate GPCR classifier.
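To make "reduced alphabet" and "local descriptors" concrete, the standard-library sketch below does three things. It maps a sequence onto amino acid groups, computes composition and transition features on equal-length segments, and concatenates them into one vector. The three-group hydrophobicity split is one commonly used grouping, included only for illustration. It is not one of the optimized groupings found by the immune-inspired algorithm. Using non-overlapping segments is also a simplifying assumption.

```python
from collections import Counter

# One commonly used 3-group hydrophobicity split (illustrative only; NOT the
# optimized groupings found by the algorithm described above).
GROUPS = {"1": "RKEDQN", "2": "GASTPHY", "3": "CLVIMFW"}
AA_TO_GROUP = {aa: g for g, members in GROUPS.items() for aa in members}

def to_reduced(seq):
    """Map a protein sequence onto the reduced alphabet."""
    return "".join(AA_TO_GROUP[aa] for aa in seq)

def composition(reduced):
    """Fraction of positions belonging to each group."""
    counts = Counter(reduced)
    return {g: counts[g] / len(reduced) for g in GROUPS}

def transition(reduced):
    """Fraction of adjacent pairs that switch between each pair of groups."""
    pairs = Counter(frozenset(p) for p in zip(reduced, reduced[1:]) if p[0] != p[1])
    n = max(len(reduced) - 1, 1)        # guard against one-residue segments
    keys = [("1", "2"), ("1", "3"), ("2", "3")]
    return {a + b: pairs[frozenset((a, b))] / n for a, b in keys}

def local_descriptors(seq, n_segments=2):
    """Composition + transition on equal, non-overlapping segments.

    Trailing residues that do not fill a whole segment are ignored."""
    reduced = to_reduced(seq)
    size = len(reduced) // n_segments
    feats = []
    for s in range(n_segments):
        seg = reduced[s * size:(s + 1) * size]
        feats += list(composition(seg).values()) + list(transition(seg).values())
    return feats

print(local_descriptors("MKTLLVAGDERWQ", n_segments=2))
```

Swapping in a different grouping only requires editing `GROUPS`. An optimizer such as the one described above could therefore search over that dictionary.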
Abstract:
Analysing the molecular polymorphism and interactions of DNA, RNA and proteins is of fundamental importance in biology. Predicting the functions of polymorphic molecules is important for designing more effective medicines. Analysing major histocompatibility complex (MHC) polymorphism is important in contexts such as mate choice, epitope-based vaccine design and transplant rejection. Most existing exploratory approaches cannot analyse these datasets, because they contain a large number of molecules with many descriptors per molecule. This thesis develops novel data projection methods for exploring high-dimensional biological datasets by visualising them in a low-dimensional space. As dimensionality increases, some existing data visualisation methods, such as generative topographic mapping (GTM), become computationally intractable. We propose variants of these methods that apply log-transformations at certain steps of the expectation maximisation (EM) based parameter learning process, making them tractable for high-dimensional datasets. We demonstrate these variants on both synthetic data and an electrostatic potential dataset of MHC class I molecules. We also propose to extend a latent trait model (LTM), which is suitable for visualising high-dimensional discrete data, so that it estimates feature saliency as an integrated part of the visualisation model's parameter learning process. This LTM variant not only gives better visualisation, by modifying the projection map according to feature relevance, but also helps users to assess the significance of each feature. Another problem that has received little attention in the literature is the visualisation of mixed-type data. We propose to combine GTM and LTM in a principled way, using an appropriate noise model for each type of data, so that mixed-type data can be visualised in a single plot. We call this model a generalised GTM (GGTM).
We also propose to extend the GGTM model to estimate feature saliencies while training the visualisation model; we call this GGTM with feature saliency (GGTM-FS). We evaluate visualisation quality using several quality metrics: a distance distortion measure and the rank-based measures trustworthiness, continuity, and mean relative rank errors with respect to data space and latent space. Where the labels are known, we also use KL divergence and nearest-neighbour classification error to measure the separation between classes. We demonstrate the efficacy of the proposed models on both synthetic and real biological datasets, with a main focus on the MHC class I dataset.
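Trustworthiness and continuity are rank-based measures. Trustworthiness penalises points that appear among the k nearest neighbours in the visualisation but not in the original data space. Continuity penalises the reverse. The sketch below implements the standard Venna and Kaski definitions with NumPy and Euclidean distances, valid for k < n/2. Whether the thesis uses exactly this normalisation is an assumption. scikit-learn also provides a ready-made `sklearn.manifold.trustworthiness`.

```python
import numpy as np

def _ranks(D):
    """ranks[i, j] = rank of j among the neighbours of i (1 = nearest); self ranks last."""
    n = D.shape[0]
    D = D.copy()
    np.fill_diagonal(D, np.inf)
    order = np.argsort(D, axis=1)
    ranks = np.empty((n, n), dtype=int)
    ranks[np.arange(n)[:, None], order] = np.arange(1, n + 1)
    return ranks

def trustworthiness(X, Z, k=5):
    """T(k) = 1 - 2/(n k (2n - 3k - 1)) * sum over intruders of (r_X(i, j) - k)."""
    n = X.shape[0]
    dX = np.linalg.norm(X[:, None] - X[None, :], axis=-1)
    dZ = np.linalg.norm(Z[:, None] - Z[None, :], axis=-1)
    rX, rZ = _ranks(dX), _ranks(dZ)
    # Intruders: among the k nearest in the embedding Z but not in the data X
    intruders = (rZ <= k) & (rX > k)
    penalty = np.sum((rX - k)[intruders])
    return 1.0 - 2.0 / (n * k * (2 * n - 3 * k - 1)) * penalty

def continuity(X, Z, k=5):
    """Continuity is trustworthiness with the roles of data and embedding swapped."""
    return trustworthiness(Z, X, k)

rng = np.random.default_rng(3)
X = rng.normal(size=(30, 5))
print(trustworthiness(X, X[:, :2], k=5), continuity(X, X[:, :2], k=5))
```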
Abstract:
This paper reviews the most widely used MPEG-7 descriptors and outlines some considerations for choosing the most appropriate descriptor for a particular image or video data set.
Abstract:
The volume of image data, and the need to use it in various applications, has grown significantly in recent years, so retrieval must be both efficient and effective. Unfortunately, existing indexing methods cannot be applied in a wide range of problem-oriented fields. The reasons are their operating-time limitations and their strong dependence on traditional descriptors extracted from the image. To meet these higher requirements, a novel distance-based indexing method for region-based image retrieval is proposed and investigated. The method allows embedded partitions of images to be considered. The search can then be carried out at different levels of refinement or coarsening, and so target the meaningful content of the image.
Abstract:
Allergy is an overreaction by the immune system to a previously encountered, ordinarily harmless substance (typically proteins). It results in skin rash, swelling of mucous membranes, sneezing or wheezing, or other abnormal conditions. The use of modified proteins is increasingly widespread: they are present in food, in commercial products such as washing powder, and in medical therapeutics and diagnostics. This makes predicting and identifying potential allergens a crucial societal issue. Allergen prediction has been explored widely using bioinformatics, and many tools have been developed in the last decade; many of these are freely available online. Here, we report a set of novel models for allergen prediction. They use amino acid E-descriptors, the auto- and cross-covariance transformation, and several machine learning classification methods: logistic regression (LR), decision tree (DT), naïve Bayes (NB), random forest (RF), multilayer perceptron (MLP) and k-nearest neighbours (kNN). The best-performing method was kNN, with 85.3% accuracy in 5-fold cross-validation. The resulting model has been implemented in a revised version of the AllerTOP server (http://www.ddg-pharmfac.net/AllerTOP). © Springer-Verlag 2014.
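The auto- and cross-covariance (ACC) transformation turns a variable-length sequence of per-residue descriptors into a fixed-length vector, which classifiers such as kNN require. The standard-library sketch below computes ACC_jk(lag) = Σ_i E_j(i) · E_k(i + lag) / (n - lag) for lags 1..L. Here j = k gives the auto-covariance terms and j ≠ k the cross-covariance terms. This form, without mean-centring and assuming already-scaled descriptors, is how the transform is commonly written in this line of work; treat it as an assumption. The descriptor table is a made-up placeholder with 3 values per residue, not the five published E-descriptors.

```python
# HYPOTHETICAL placeholder descriptor table: 3 values per residue instead of the
# five published E-descriptors; substitute the real E1..E5 scales in practice.
E = {"A": (0.1, 0.3, -0.2), "K": (0.9, -0.4, 0.5), "L": (-0.7, 0.2, 0.6),
     "S": (0.4, 0.0, -0.3)}

def acc(seq, max_lag=2):
    """ACC_jk(lag) = sum_i E_j(i) * E_k(i + lag) / (n - lag), for lag = 1..max_lag.

    j == k gives auto-covariance terms, j != k cross-covariance terms.
    Requires len(seq) > max_lag. Output length is m * m * max_lag."""
    n = len(seq)
    m = len(next(iter(E.values())))      # descriptors per residue
    features = []
    for lag in range(1, max_lag + 1):
        for j in range(m):
            for k in range(m):
                s = sum(E[seq[i]][j] * E[seq[i + lag]][k] for i in range(n - lag))
                features.append(s / (n - lag))
    return features

v1, v2 = acc("AKLS"), acc("AKLSKKLA")
print(len(v1), len(v2))   # same length for sequences of different length
```

Because every sequence maps to a vector of the same length, the outputs can be compared directly with a distance metric, as a kNN classifier requires.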
Classification of Paintings by Artist, Movement, and Indoor Setting Using MPEG-7 Descriptor Features
Abstract:
ACM Computing Classification System (1998): I.4.9, I.4.10.