914 results for High-dimensional index structure


Relevance:

100.00% 100.00%

Publisher:

Abstract:

The structure of a novel plant defensin isolated from the flowers of Petunia hybrida has been determined by ¹H NMR spectroscopy. P. hybrida defensin 1 (PhD1) is a basic, cysteine-rich, antifungal protein of 47 residues and is the first example of a new subclass of plant defensins with five disulfide bonds whose structure has been determined. PhD1 has the fold of the cysteine-stabilized αβ motif, consisting of an α-helix and a triple-stranded antiparallel β-sheet, except that it contains a fifth disulfide bond from the first loop to the α-helix. The additional disulfide bond is accommodated in PhD1 without any alteration of its tertiary structure with respect to other plant defensins. Comparison of its structure with those of classic, four-disulfide defensins has allowed us to identify a previously unrecognized hydrogen bond network that is integral to structure stabilization in the family.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

The water characteristics in cooked pressure-heat treated (45 °C for 45 min prior to pressurisation at 150 MPa for 30 min) and non-pressurised, cooked (control) samples of beef Longissimus aged for 1, 3, 8 or 16 days were studied by nuclear magnetic resonance microscopy. A multi-echo sequence was used to obtain T2 images, and independent of ageing period, the T2 values were found to be lower in pressure-heat treated meat, revealing alterations in the water characteristics of pressure-treated, cooked meat compared with cooked meat. With increasing ageing duration, the T2 values in both pressure-treated, cooked and cooked meat decreased, indicating that the water became more tightly trapped in the protein network. In addition, independent of the length of the ageing period, the relationship between cooking loss in the cooked meat and transverse relaxation differed between non-pressurised and pressure-treated meat, which reveals that the mechanisms changing the water properties in beef during ageing are different from those occurring during pressure-heat treatment of meat. (C) 2003 Elsevier Ltd. All rights reserved.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

The complex mixture of biologically active peptides that constitute the venom of Conus species provides a rich source of ion channel neurotoxins. These peptides, commonly known as conotoxins, exhibit a high degree of selectivity and potency for different ion channels and their subtypes making them invaluable tools for unravelling the secrets of the nervous system. Furthermore, several conotoxin molecules have profound applications in drug discovery, with some examples currently undergoing clinical trials. Despite their relatively easy access by chemical synthesis, rapid access to libraries of conotoxin analogues for use in structure-activity relationship studies still poses a significant limitation. This is exacerbated in conotoxins containing multiple disulfide bonds, which often require synthetic strategies utilising several steps. This review will examine the structure and activity of some of the known classes of conotoxins and will highlight their potential as neuropharmacological tools and as drug leads. Some of the classical and more recent approaches to the chemical synthesis of conotoxins, particularly with respect to the controlled formation of disulfide bonds will be discussed in detail. Finally, some examples of structure-activity relationship studies will be discussed, as well as some novel approaches for designing conotoxin analogues.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

Based on a newly established sequencing strategy featured by its efficiency, simplicity, and easy manipulation, the sequences of four novel cyclotides (macrocyclic knotted proteins) isolated from an Australian plant Viola hederaceae were determined. The three-dimensional solution structure of V. hederaceae leaf cyclotide-1 (vhl-1), a leaf-specific expressed 31-residue cyclotide, has been determined using two-dimensional ¹H NMR spectroscopy. vhl-1 adopts a compact and well defined structure including a distorted triple-stranded β-sheet, a short 3₁₀ helical segment and several turns. It is stabilized by three disulfide bonds, which, together with backbone segments, form a cyclic cystine knot motif. The three disulfide bonds are almost completely buried in the protein core, and the six cysteines contribute only 3.8% to the molecular surface. A pH titration experiment revealed that the folding of vhl-1 shows little pH dependence and allowed the pKa of 3.0 for Glu(3) and ∼5.0 for Glu(14) to be determined. Met(7) was found to be oxidized in the native form, consistent with the fact that its side chain protrudes into the solvent, occupying 7.5% of the molecular surface. vhl-1 shows anti-HIV activity with an EC50 value of 0.87 μM.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

Acetohydroxyacid synthase (AHAS; EC 2.2.1.6) catalyzes the first common step in branched-chain amino acid biosynthesis. The enzyme is inhibited by several chemical classes of compounds and this inhibition is the basis of action of the sulfonylurea and imidazolinone herbicides. The commercial sulfonylureas contain a pyrimidine or a triazine ring that is substituted at both meta positions, thus obeying the initial rules proposed by Levitt. Here we assess the activity of 69 monosubstituted sulfonylurea analogs and related compounds as inhibitors of pure recombinant Arabidopsis thaliana AHAS and show that disubstitution is not absolutely essential as exemplified by our novel herbicide, monosulfuron (2-nitro-N-(4'-methyl-pyrimidin-2'-yl) phenyl-sulfonylurea), which has a pyrimidine ring with a single meta substituent. A subset of these compounds was tested for herbicidal activity and it was shown that their effect in vivo correlates well with their potency in vitro as AHAS inhibitors. Three-dimensional quantitative structure-activity relationships were developed using comparative molecular field analysis and comparative molecular similarity indices analysis. For the latter, the best result was obtained when steric, electrostatic, hydrophobic and H-bond acceptor factors were taken into consideration. The resulting fields were mapped on to the published crystal structure of the yeast enzyme and it was shown that the steric and hydrophobic fields are in good agreement with sulfonylurea-AHAS interaction geometry.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

Conotoxins are small conformationally constrained peptides found in the venom of marine snails of the genus Conus. They are usually cysteine rich and frequently contain a high degree of post-translational modifications such as C-terminal amidation, hydroxylation, carboxylation, bromination, epimerisation and glycosylation. Here we review the role of NMR in determining the three-dimensional structures of conotoxins and also provide a compilation and analysis of H-1 and C-13 chemical shifts of post-translationally modified amino acids and compare them with data from common amino acids. This analysis provides a reference source for chemical shifts of post-translationally modified amino acids. Copyright (C) 2006 John Wiley & Sons, Ltd.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

Finite mixture models are being increasingly used to model the distributions of a wide variety of random phenomena. While normal mixture models are often used to cluster data sets of continuous multivariate data, a more robust clustering can be obtained by considering the t mixture model-based approach. Mixtures of factor analyzers enable model-based density estimation to be undertaken for high-dimensional data where the number of observations n is not very large relative to their dimension p. As the approach using the multivariate normal family of distributions is sensitive to outliers, it is more robust to adopt the multivariate t family for the component error and factor distributions. The computational aspects associated with robustness and high dimensionality in these approaches to cluster analysis are discussed and illustrated.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

This thesis is a study of the generation of topographic mappings - dimension reducing transformations of data that preserve some element of geometric structure - with feed-forward neural networks. As an alternative to established methods, a transformational variant of Sammon's method is proposed, where the projection is effected by a radial basis function neural network. This approach is related to the statistical field of multidimensional scaling, and from that the concept of a 'subjective metric' is defined, which permits the exploitation of additional prior knowledge concerning the data in the mapping process. This then enables the generation of more appropriate feature spaces for the purposes of enhanced visualisation or subsequent classification. A comparison with established methods for feature extraction is given for data taken from the 1992 Research Assessment Exercise for higher educational institutions in the United Kingdom. This is a difficult high-dimensional dataset, and illustrates well the benefit of the new topographic technique. A generalisation of the proposed model is considered for implementation of the classical multidimensional scaling (MDS) routine. This is related to Oja's principal subspace neural network, whose learning rule is shown to descend the error surface of the proposed MDS model. Some of the technical issues concerning the design and training of topographic neural networks are investigated. It is shown that neural network models can be less sensitive to entrapment in the sub-optimal local minima that badly affect the standard Sammon algorithm, and tend to exhibit good generalisation as a result of implicit weight decay in the training process. It is further argued that for ideal structure retention, the network transformation should be perfectly smooth for all inter-data directions in input space. Finally, there is a critique of optimisation techniques for topographic mappings, and a new training algorithm is proposed. A convergence proof is given, and the method is shown to produce lower-error mappings more rapidly than previous algorithms.
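
To make the objective behind Sammon's method concrete, the following is a minimal sketch of the standard Sammon stress between a high-dimensional data set and a candidate low-dimensional projection. It is not the thesis's radial basis function network or its training algorithm; the function names `pairwise_distances` and `sammon_stress` are hypothetical, and only the Python standard library is assumed.

```python
import math

def pairwise_distances(points):
    """Euclidean distance between every pair of points (sequences of floats)."""
    n = len(points)
    return [[math.dist(points[i], points[j]) for j in range(n)] for i in range(n)]

def sammon_stress(high_dim, low_dim):
    """Sammon stress: sum over pairs of (d*_ij - d_ij)^2 / d*_ij,
    divided by the sum of all input-space distances d*_ij."""
    d_star = pairwise_distances(high_dim)  # distances in the input space
    d = pairwise_distances(low_dim)        # distances in the projection
    n = len(high_dim)
    total = 0.0
    error = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            if d_star[i][j] == 0.0:
                continue  # skip coincident inputs to avoid dividing by zero
            total += d_star[i][j]
            error += (d_star[i][j] - d[i][j]) ** 2 / d_star[i][j]
    return error / total if total > 0.0 else 0.0

# Four points that lie in the plane z = 0 of a 3-D space (illustrative data).
high = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (1.0, 1.0, 0.0)]
flat = [(x, y) for x, y, _ in high]   # drops only the constant coordinate
line = [(x,) for x, _, _ in high]     # also discards y, so distances change

print(sammon_stress(high, flat))
print(sammon_stress(high, line))
```

A projection that preserves all inter-point distances gives a stress of zero, while one that collapses distinct points gives a positive value. A mapping such as the one proposed in the thesis would be trained to reduce this kind of quantity.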

Relevance:

100.00% 100.00%

Publisher:

Abstract:

Visualization has proven to be a powerful and widely-applicable tool for the analysis and interpretation of data. Most visualization algorithms aim to find a projection from the data space down to a two-dimensional visualization space. However, for complex data sets living in a high-dimensional space it is unlikely that a single two-dimensional projection can reveal all of the interesting structure. We therefore introduce a hierarchical visualization algorithm which allows the complete data set to be visualized at the top level, with clusters and sub-clusters of data points visualized at deeper levels. The algorithm is based on a hierarchical mixture of latent variable models, whose parameters are estimated using the expectation-maximization algorithm. We demonstrate the principle of the approach first on a toy data set, and then apply the algorithm to the visualization of a synthetic data set in 12 dimensions obtained from a simulation of multi-phase flows in oil pipelines and to data in 36 dimensions derived from satellite images.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

The generative topographic mapping (GTM) model was introduced by Bishop et al. (1998, Neural Comput. 10(1), 215-234) as a probabilistic reformulation of the self-organizing map (SOM). It offers a number of advantages compared with the standard SOM, and has already been used in a variety of applications. In this paper we report on several extensions of the GTM, including an incremental version of the EM algorithm for estimating the model parameters, the use of local subspace models, extensions to mixed discrete and continuous data, semi-linear models which permit the use of high-dimensional manifolds whilst avoiding computational intractability, Bayesian inference applied to hyper-parameters, and an alternative framework for the GTM based on Gaussian processes. All of these developments directly exploit the probabilistic structure of the GTM, thereby allowing the underlying modelling assumptions to be made explicit. They also highlight the advantages of adopting a consistent probabilistic framework for the formulation of pattern recognition algorithms.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

This thesis describes the Generative Topographic Mapping (GTM), a non-linear latent variable model intended for modelling continuous, intrinsically low-dimensional probability distributions embedded in high-dimensional spaces. It can be seen as a non-linear form of principal component analysis or factor analysis. It also provides a principled alternative to the self-organizing map, a widely established neural network model for unsupervised learning, resolving many of its associated theoretical problems. An important potential application of the GTM is visualization of high-dimensional data. Since the GTM is non-linear, the relationship between data and its visual representation may be far from trivial, but a better understanding of this relationship can be gained by computing the so-called magnification factor. In essence, the magnification factor relates the distances between data points, as they appear when visualized, to the actual distances between those data points. There are two principal limitations of the basic GTM model. The first is that the computational effort required grows exponentially with the intrinsic dimensionality of the density model. However, if the intended application is visualization, this will typically not be a problem. The other limitation is the inherent structure of the GTM, which makes it most suitable for modelling moderately curved probability distributions of approximately rectangular shape. When the target distribution is very different from that, the aim of maintaining an 'interpretable' structure, suitable for visualizing data, may come into conflict with the aim of providing a good density model. The fact that the GTM is a probabilistic model means that results from probability theory and statistics can be used to address problems such as model complexity. Furthermore, this framework provides solid ground for extending the GTM to wider contexts than that of this thesis.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

Visualising data for exploratory analysis is a major challenge in many applications. Visualisation allows scientists to gain insight into the structure and distribution of the data, for example finding common patterns and relationships between samples as well as variables. Typically, visualisation methods like principal component analysis and multi-dimensional scaling are employed. These methods are favoured because of their simplicity, but they cannot cope with missing data and it is difficult to incorporate prior knowledge about properties of the variable space into the analysis; this is particularly important in the high-dimensional, sparse datasets typical in geochemistry. In this paper we show how to utilise a block-structured correlation matrix using a modification of a well known non-linear probabilistic visualisation model, the Generative Topographic Mapping (GTM), which can cope with missing data. The block structure supports direct modelling of strongly correlated variables. We show that by including prior structural information it is possible to improve both the data visualisation and the model fit. These benefits are demonstrated on artificial data as well as a real geochemical dataset used for oil exploration, where the proposed modifications improved the missing data imputation results by 3 to 13%.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

This thesis applies a hierarchical latent trait model system to a large quantity of data. The motivation for it was the lack of viable approaches to analysing High Throughput Screening datasets, which may include thousands of data points with high dimensions. High Throughput Screening (HTS) is an important tool in the pharmaceutical industry for discovering leads which can be optimised and further developed into candidate drugs. Since the development of new robotic technologies, the ability to test the activities of compounds has considerably increased in recent years. Traditional methods, looking at tables and graphical plots for analysing relationships between measured activities and the structure of compounds, have not been feasible when facing a large HTS dataset. Instead, data visualisation provides a method for analysing such large datasets, especially with high dimensions. So far, a few visualisation techniques for drug design have been developed, but most of them just cope with several properties of compounds at one time. We believe that a latent variable model (LTM) with a non-linear mapping from the latent space to the data space is a preferred choice for visualising a complex high-dimensional data set. As a type of latent variable model, the latent trait model can deal with either continuous data or discrete data, which makes it particularly useful in this domain. In addition, with the aid of differential geometry, we can imagine the distribution of data from magnification factor and curvature plots. Rather than obtaining the useful information just from a single plot, a hierarchical LTM arranges a set of LTMs and their corresponding plots in a tree structure. We model the whole data set with an LTM at the top level, which is broken down into clusters at deeper levels of the hierarchy. In this manner, the refined visualisation plots can be displayed in deeper levels and sub-clusters may be found.
The hierarchy of LTMs is trained using the expectation-maximisation (EM) algorithm to maximise its likelihood with respect to the data sample. Training proceeds interactively in a recursive fashion (top-down). The user subjectively identifies interesting regions on the visualisation plot that they would like to model in greater detail. At each stage of hierarchical LTM construction, the EM algorithm alternates between the E- and M-step. Another problem that can occur when visualising a large data set is that there may be significant overlaps of data clusters. It is very difficult for the user to judge where centres of regions of interest should be put. We address this problem by employing the minimum message length technique, which can help the user to decide the optimal structure of the model. In this thesis we also demonstrate the applicability of the hierarchy of latent trait models in the field of document data mining.
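
The alternation between the E- and M-step can be illustrated with a far simpler model than a latent trait model. The sketch below fits a two-component, one-dimensional Gaussian mixture by EM. It is an assumption-laden stand-in, not the thesis's hierarchical LTM training procedure; the function names, the toy data and the starting means are all hypothetical.

```python
import math

def normal_pdf(x, mean, var):
    """Density of a univariate normal distribution."""
    return math.exp(-(x - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

def em_two_gaussians(data, init_means, n_iter=50):
    """Fit a two-component 1-D Gaussian mixture by alternating E- and M-steps."""
    means = list(init_means)
    variances = [1.0, 1.0]
    weights = [0.5, 0.5]
    for _ in range(n_iter):
        # E-step: posterior probability (responsibility) of each component
        # for each data point under the current parameters.
        resp = []
        for x in data:
            p = [weights[k] * normal_pdf(x, means[k], variances[k]) for k in range(2)]
            s = sum(p)
            resp.append([pk / s for pk in p] if s > 0.0 else [0.5, 0.5])
        # M-step: re-estimate weights, means and variances from the
        # responsibility-weighted data.
        for k in range(2):
            nk = max(sum(r[k] for r in resp), 1e-12)
            weights[k] = nk / len(data)
            means[k] = sum(r[k] * x for r, x in zip(resp, data)) / nk
            var = sum(r[k] * (x - means[k]) ** 2 for r, x in zip(resp, data)) / nk
            variances[k] = max(var, 1e-6)  # floor keeps the density finite
    return weights, means, variances

# Toy data: two well-separated groups centred near -2 and +2.
data = [-2.1, -1.9, -2.0, -2.2, -1.8, 2.0, 2.1, 1.9, 2.2, 1.8]
weights, means, variances = em_two_gaussians(data, init_means=(-1.0, 1.0))
print(weights, means, variances)
```

In the hierarchical setting described above, the same two steps would be applied to each model in the tree, with the regions chosen by the user determining how child models are initialised.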

Relevance:

100.00% 100.00%

Publisher:

Abstract:

Projection of a high-dimensional dataset onto a two-dimensional space is a useful tool to visualise structures and relationships in the dataset. However, a single two-dimensional visualisation may not display all the intrinsic structure. Therefore, hierarchical/multi-level visualisation methods have been used to extract more detailed understanding of the data. Here we propose a multi-level Gaussian process latent variable model (MLGPLVM). MLGPLVM works by segmenting data (with e.g. K-means, Gaussian mixture model or interactive clustering) in the visualisation space and then fitting a visualisation model to each subset. To measure the quality of multi-level visualisation (with respect to parent and child models), metrics such as trustworthiness, continuity, mean relative rank errors, visualisation distance distortion and the negative log-likelihood per point are used. We evaluate the MLGPLVM approach on the ‘Oil Flow’ dataset and a dataset of protein electrostatic potentials for the ‘Major Histocompatibility Complex (MHC) class I’ of humans. In both cases, visual observation and the quantitative quality measures have shown better visualisation at lower levels.
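
Of the quality measures listed, trustworthiness is straightforward to sketch. The code below follows the commonly used rank-based definition, which penalises points that appear among the k nearest neighbours in the visualisation but not in the data space. Whether the paper uses exactly this normalisation is an assumption, and the function names are hypothetical. The formula is only meaningful for k < n/2.

```python
import math

def neighbour_ranks(points):
    """ranks[i][j] = rank of point j among the neighbours of i (1 = nearest)."""
    n = len(points)
    ranks = []
    for i in range(n):
        order = sorted((j for j in range(n) if j != i),
                       key=lambda j: math.dist(points[i], points[j]))
        row = [0] * n
        for rank, j in enumerate(order, start=1):
            row[j] = rank
        ranks.append(row)
    return ranks

def trustworthiness(high_dim, low_dim, k):
    """1 minus the normalised rank penalty for 'false' neighbours in the projection."""
    n = len(high_dim)
    high_ranks = neighbour_ranks(high_dim)
    low_ranks = neighbour_ranks(low_dim)
    penalty = 0
    for i in range(n):
        for j in range(n):
            # j is a k-neighbour of i in the projection but not in the data space
            if j != i and low_ranks[i][j] <= k and high_ranks[i][j] > k:
                penalty += high_ranks[i][j] - k
    return 1.0 - 2.0 * penalty / (n * k * (2 * n - 3 * k - 1))

# Illustrative data: six points on a line embedded in 2-D.
high = [(float(i), 0.0) for i in range(6)]
faithful = [(float(i),) for i in range(6)]                 # preserves the ordering
scrambled = [(0.0,), (3.0,), (1.0,), (4.0,), (2.0,), (5.0,)]  # mixes up neighbours

print(trustworthiness(high, faithful, k=2))
print(trustworthiness(high, scrambled, k=2))
```

A projection that keeps every neighbourhood intact scores 1, and lower values indicate that the visualisation shows neighbours that are not close in the original space. Continuity is the mirror-image measure, obtained by swapping the roles of the two spaces.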

Relevance:

100.00% 100.00%

Publisher:

Abstract:

Failure to detect patients at risk of attempting suicide can result in tragic consequences. Identifying risks earlier and more accurately helps prevent serious incidents occurring and is the objective of the GRiST clinical decision support system (CDSS). One of the problems it faces is high variability in the type and quantity of data submitted for patients, who are assessed in multiple contexts along the care pathway. Although GRiST identifies up to 138 patient cues to collect, only about half of them are relevant for any one patient and their roles may not be for risk evaluation but more for risk management. This paper explores the data collection behaviour of clinicians using GRiST to see whether it can elucidate which variables are important for risk evaluations and when. The GRiST CDSS is based on a cognitive model of human expertise manifested by a sophisticated hierarchical knowledge structure or tree. This structure is used by the GRiST interface to provide top-down controlled access to the patient data. Our research explores relationships between the answers given to these higher-level 'branch' questions to see whether they can help direct assessors to the most important data, depending on the patient profile and assessment context. The outcome is a model for dynamic data collection driven by the knowledge hierarchy. It has potential for improving other clinical decision support systems operating in domains with high dimensional data that are only partially collected and in a variety of combinations.