11 results for Hierarchical Spatial Classification

in Aston University Research Archive


Relevance:

40.00%

Abstract:

G protein-coupled receptors (GPCRs) play important physiological roles transducing extracellular signals into intracellular responses. Approximately 50% of all marketed drugs target a GPCR. There remains considerable interest in effectively predicting the function of a GPCR from its primary sequence.

Relevance:

40.00%

Abstract:

We address the important bioinformatics problem of predicting protein function from a protein's primary sequence. We consider the functional classification of G-Protein-Coupled Receptors (GPCRs), whose functions are specified in a class hierarchy. We tackle this task using a novel top-down hierarchical classification system where, for each node in the class hierarchy, the predictor attributes to be used in that node and the classifier to be applied to the selected attributes are chosen in a data-driven manner. Compared with a previous hierarchical classification system selecting classifiers only, our new system significantly reduced processing time without significantly sacrificing predictive accuracy.
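The top-down scheme described above, in which each node of the class hierarchy gets its own data-driven choice of attributes and classifier, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' system. The two candidate learners (nearest-centroid and 1-nearest-neighbour), the even/odd hold-out split used to score each (learner, attribute subset) pair, and the toy hierarchy and feature vectors are all hypothetical stand-ins.

```python
import math
from statistics import mean


def nearest_centroid(train, features):
    """Fit a nearest-centroid classifier that uses only the given feature indices."""
    groups = {}
    for x, y in train:
        groups.setdefault(y, []).append([x[i] for i in features])
    centroids = {y: [mean(col) for col in zip(*rows)] for y, rows in groups.items()}

    def predict(x):
        v = [x[i] for i in features]
        return min(centroids, key=lambda y: math.dist(v, centroids[y]))

    return predict


def one_nn(train, features):
    """Fit a 1-nearest-neighbour classifier that uses only the given feature indices."""
    points = [([x[i] for i in features], y) for x, y in train]

    def predict(x):
        v = [x[i] for i in features]
        return min(points, key=lambda p: math.dist(v, p[0]))[1]

    return predict


def path_from_root(label, parent):
    """Return the list of classes from the root down to `label`."""
    path = [label]
    while path[-1] in parent:
        path.append(parent[path[-1]])
    return path[::-1]


def holdout_accuracy(candidate, fit, val):
    """Number of hold-out examples a (learner, feature subset) pair gets right."""
    learner, features = candidate
    model = learner(fit, features)
    return sum(model(x) == y for x, y in val)


def train_top_down(data, parent, feature_subsets, learners=(nearest_centroid, one_nn)):
    """Train one classifier per internal node, choosing learner and attributes from the data."""
    models = {}
    for node in set(parent.values()):
        # Relabel each example with the child of `node` that lies on its path.
        local = []
        for x, leaf in data:
            path = path_from_root(leaf, parent)
            if node in path[:-1]:
                local.append((x, path[path.index(node) + 1]))
        fit, val = local[::2], local[1::2]
        # Keep the (learner, attribute subset) pair with the best hold-out score.
        learner, features = max(
            ((lrn, fs) for lrn in learners for fs in feature_subsets),
            key=lambda c: holdout_accuracy(c, fit, val),
        )
        models[node] = learner(local, features)  # refit on all examples at this node
    return models


def predict_top_down(x, models, root):
    """Descend from the root, letting each node's classifier choose a child."""
    node = root
    while node in models:
        node = models[node](x)
    return node


# Hypothetical hierarchy: ROOT -> {A, B}, A -> {A1, A2}.
parent = {"A": "ROOT", "B": "ROOT", "A1": "A", "A2": "A"}
data = [
    ((0.0, 0.0), "A1"), ((0.2, 0.1), "A1"), ((0.1, 0.2), "A1"), ((0.3, 0.0), "A1"),
    ((0.0, 5.0), "A2"), ((0.2, 5.1), "A2"), ((0.1, 4.9), "A2"), ((0.3, 5.2), "A2"),
    ((10.0, 2.0), "B"), ((10.2, 2.5), "B"), ((9.9, 3.0), "B"), ((10.1, 2.0), "B"),
]
models = train_top_down(data, parent, feature_subsets=[(0,), (1,), (0, 1)])
print(predict_top_down((0.2, 5.0), models, "ROOT"))  # A2
```

In this toy data, the root node ends up using only feature 0 (which separates A from B), while node A uses only feature 1 (which separates A1 from A2). This illustrates how per-node attribute selection can shrink the input each classifier sees.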

Relevance:

40.00%

Abstract:

MOTIVATION: G protein-coupled receptors (GPCRs) play an important role in many physiological systems by transducing an extracellular signal into an intracellular response. Over 50% of all marketed drugs are targeted towards a GPCR. There is considerable interest in developing an algorithm that could effectively predict the function of a GPCR from its primary sequence. Such an algorithm is useful not only in identifying novel GPCR sequences but also in characterizing the interrelationships between known GPCRs. RESULTS: An alignment-free approach to GPCR classification has been developed using techniques drawn from data mining and proteochemometrics. A dataset of over 8000 sequences was constructed to train the algorithm. This represents one of the largest GPCR datasets currently available. A predictive algorithm was developed based upon the simplest reasonable numerical representation of the protein's physicochemical properties. A selective top-down approach was developed, which used a hierarchical classifier to assign sequences to subdivisions within the GPCR hierarchy. The predictive performance of the algorithm was assessed against several standard data mining classifiers and further validated against Support Vector Machine-based GPCR prediction servers. The selective top-down approach achieves significantly higher accuracy than standard data mining methods in almost all cases.
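An alignment-free representation maps every sequence, whatever its length, to a vector of the same size, so that standard data mining classifiers can be applied without aligning sequences first. The abstract does not say which physicochemical properties were used. The sketch below assumes one simple possibility, amino acid composition plus mean Kyte–Doolittle hydropathy, purely for illustration.

```python
# Kyte-Doolittle hydropathy index for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}
AMINO_ACIDS = sorted(KYTE_DOOLITTLE)  # fixed feature order: A, C, D, ..., Y


def alignment_free_features(sequence):
    """Map a protein sequence of any length to a fixed 21-dimensional vector.

    Entries 0-19 are amino acid frequencies (in AMINO_ACIDS order);
    entry 20 is the mean Kyte-Doolittle hydropathy.
    Non-standard residue codes (e.g. X, B, Z) are skipped.
    """
    residues = [aa for aa in sequence.upper() if aa in KYTE_DOOLITTLE]
    if not residues:
        raise ValueError("sequence contains no standard amino acids")
    n = len(residues)
    composition = [residues.count(aa) / n for aa in AMINO_ACIDS]
    mean_hydropathy = sum(KYTE_DOOLITTLE[aa] for aa in residues) / n
    return composition + [mean_hydropathy]


short = alignment_free_features("ACDA")
longer = alignment_free_features("MKTAYIAKQR")
print(len(short), len(longer))  # 21 21
```

Because both vectors have the same length, they can be fed directly to any of the classifiers mentioned above.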

Relevance:

40.00%

Abstract:

Biological experiments often produce enormous amounts of data, which are usually analyzed by data clustering. Cluster analysis refers to statistical methods that are used to assign data with similar properties into several smaller, more meaningful groups. Two commonly used clustering techniques are introduced in the following section: principal component analysis (PCA) and hierarchical clustering. PCA calculates the variance between variables and groups them into a few uncorrelated groups or principal components (PCs) that are orthogonal to each other. Hierarchical clustering is carried out by separating data into many clusters and merging similar clusters together. Here, we use an example of human leukocyte antigen (HLA) supertype classification to demonstrate the usage of the two methods. Two programs, Generating Optimal Linear Partial Least Square Estimations (GOLPE) and Sybyl, are used for PCA and hierarchical clustering, respectively. However, the reader should bear in mind that the methods have been incorporated into other software as well, such as SIMCA, statistiXL, and R.
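Both methods described above can be sketched in a few lines of NumPy. This does not reproduce GOLPE or Sybyl. PCA is computed here from the singular value decomposition of mean-centred data, and the hierarchical clustering uses average linkage, which is an assumption since the abstract does not name a linkage rule. The six three-dimensional points are hypothetical toy data standing in for HLA descriptors.

```python
import numpy as np


def pca(X, n_components):
    """PCA via SVD of the mean-centred data matrix.

    Returns the PC scores, the PC loading vectors, and the fraction of
    total variance explained by each retained component.
    """
    Xc = X - X.mean(axis=0)
    _, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = Xc @ Vt[:n_components].T          # projections onto orthogonal PCs
    explained = S**2 / np.sum(S**2)           # variance share of each PC
    return scores, Vt[:n_components], explained[:n_components]


def agglomerative(X, n_clusters):
    """Naive average-linkage agglomerative clustering.

    Starts with every point in its own cluster and repeatedly merges the two
    clusters with the smallest mean pairwise Euclidean distance.
    """
    clusters = [[i] for i in range(len(X))]
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = D[np.ix_(clusters[a], clusters[b])].mean()
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]                        # b > a, so index a stays valid
    return sorted(sorted(c) for c in clusters)


X = np.array([
    [0.0, 0.0, 0.0], [0.1, 0.0, 0.1], [0.0, 0.2, 0.0],
    [5.0, 5.0, 5.0], [5.1, 4.9, 5.0], [4.9, 5.2, 5.1],
])
scores, components, explained = pca(X, n_components=2)
print(explained)
print(agglomerative(X, n_clusters=2))  # [[0, 1, 2], [3, 4, 5]]
```

Stopping the merge loop at `n_clusters` is equivalent to cutting the dendrogram at that number of groups. Recording each merge distance instead would give the full tree.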

Relevance:

30.00%

Abstract:

Analyzing geographical patterns by collocating events, objects or their attributes has a long history in surveillance and monitoring, and is particularly applied in environmental contexts, such as ecology or epidemiology. The identification of patterns or structures at some scales can be addressed using spatial statistics, particularly marked point process methodologies. Classification and regression trees are also related to this goal of finding "patterns" by deducing the hierarchy of influence of variables on a dependent outcome. Such variable selection methods have been applied to spatial data, but often without explicitly acknowledging the spatial dependence. Many methods routinely used in exploratory point pattern analysis are second-order statistics, used in a univariate context, though there is also an extensive literature on modelling methods for multivariate point pattern processes. This paper proposes an exploratory approach for multivariate spatial data using higher-order statistics built from co-occurrences of events or marks given by the point processes. A spatial entropy measure, derived from these multinomial distributions of co-occurrences at a given order, constitutes the basis of the proposed exploratory methods. © 2010 Elsevier Ltd.
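The core idea of an entropy measure over co-occurrences of marks can be illustrated at order 2. This is a simplified sketch, not the paper's exact measure. It assumes that two events "co-occur" when their Euclidean distance is at most a hypothetical radius `r`, counts the unordered pairs of marks among co-occurring events, and takes the Shannon entropy (in bits) of that multinomial distribution. Higher orders would count k-tuples instead of pairs.

```python
import math
from collections import Counter
from itertools import combinations


def cooccurrence_entropy(points, marks, r):
    """Order-2 co-occurrence entropy of a marked point pattern.

    points: list of (x, y) coordinates; marks: one comparable label per point.
    Returns (entropy in bits, Counter of unordered mark pairs within distance r).
    """
    counts = Counter()
    for i, j in combinations(range(len(points)), 2):
        if math.dist(points[i], points[j]) <= r:
            counts[tuple(sorted((marks[i], marks[j])))] += 1
    total = sum(counts.values())
    if total == 0:
        return 0.0, counts  # no co-occurrences at this radius
    entropy = sum((c / total) * math.log2(total / c) for c in counts.values())
    return entropy, counts


h, counts = cooccurrence_entropy(
    [(0, 0), (0, 1), (5, 5), (5, 6)], ["a", "b", "a", "a"], r=1.5
)
print(h, dict(counts))  # 1.0 {('a', 'b'): 1, ('a', 'a'): 1}
```

Low entropy indicates that a few mark combinations dominate the neighbourhoods at that scale. Repeating the calculation over several values of `r` gives an exploratory profile across scales.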

Relevance:

30.00%

Abstract:

The chemical functionality within porous architectures dictates their performance as heterogeneous catalysts; however, synthetic routes to control the spatial distribution of individual functions within porous solids are limited. Here we report the fabrication of spatially orthogonal bifunctional porous catalysts, through the stepwise template removal and chemical functionalization of an interconnected silica framework. Selective removal of polystyrene nanosphere templates from a lyotropic liquid crystal-templated silica sol–gel matrix, followed by extraction of the liquid crystal template, affords a hierarchical macroporous–mesoporous architecture. Decoupling of the individual template extractions allows independent functionalization of macropore and mesopore networks on the basis of chemical and/or size specificity. Spatial compartmentalization of, and directed molecular transport between, chemical functionalities affords control over the reaction sequence in catalytic cascades, as illustrated here by the Pd/Pt-catalysed oxidation of cinnamyl alcohol to cinnamic acid. We anticipate that our methodology will prompt further design of multifunctional materials comprising spatially compartmentalized functions.

Relevance:

30.00%

Abstract:

The G-protein coupled receptors (GPCRs) form both one of the largest and one of the most multifunctional protein families known to modern molecular bioscience. From a drug discovery and pharmaceutical industry perspective, the GPCRs are among the most commercially and economically important groups of proteins known. The GPCRs perform numerous vital metabolic functions and interact with a highly diverse range of small and large ligands. Many different methodologies have been developed to classify the GPCRs efficiently and accurately. These range from motif-based techniques to machine learning, as well as a variety of alignment-free techniques based on the physicochemical properties of sequences. We review here the available methodologies for the classification of GPCRs. Part of this work focuses on how we have tried to build the intrinsically hierarchical nature of sequence relations, implicit within the family, into an adaptive approach to classification. Importantly, we also allude to some of the key innate problems in developing an effective approach to classifying the GPCRs: the lack of sequence similarity between the six classes that comprise the GPCR family, and the low sequence similarity to other family members shown by many newly revealed members of the family.

Relevance:

30.00%

Abstract:

The number of remote sensing platforms and sensors rises almost every year, yet much work on the interpretation of land cover is still carried out using either single images or images from the same source taken at different dates. Two questions can be asked of this proliferation of images: can the information contained in different scenes be used to improve classification accuracy, and what is the best way to combine the different imagery? Two of these multiple image sources, MODIS on the Terra platform and ETM+ on board Landsat 7, are suitably complementary. MODIS provides daily images with 36 spectral bands at 250–1000 m spatial resolution, whereas ETM+ provides seven spectral bands at 30 m spatial resolution with a 16-day revisit interval. In the UK, cloud cover may mean that only a few ETM+ scenes are available for any particular year, and these may not be at the time of year of most interest. The MODIS data may provide information on land cover over the growing season, such as harvest dates, that is not present in the ETM+ data. Therefore, the primary objective of this work is to develop a methodology for integrating medium-spatial-resolution Landsat ETM+ images with multi-temporal, multi-spectral, low-resolution MODIS/Terra images, with the aim of improving the classification of agricultural land. Other data, such as field boundaries from existing maps, may also be incorporated. When classifying agricultural land cover of the type seen in the UK, where crops are largely sown in homogeneous fields with clear and often mapped boundaries, the classification is greatly improved by using the mapped polygons and taking the classification of each polygon as a whole as the a priori probability when classifying each individual pixel with a Bayesian approach.
When dealing with multiple images from different platforms and dates, it is highly unlikely that the pixels will be exactly co-registered, and these pixels will contain a mixture of different real-world land covers. Similarly, the different atmospheric conditions prevailing on the different days mean that the same emission from the ground will give rise to different signals at the sensor. Therefore, a method is presented that models the instantaneous field of view and atmospheric effects, enabling different remotely sensed data sources to be integrated.
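The Bayesian step described above, using the classification of a mapped field polygon as the prior for each of its pixels, can be sketched as follows. Several details are assumptions not given in the abstract: class-conditional likelihoods are taken to be Gaussian with independent bands, the polygon prior is the Laplace-smoothed share of a first-pass labelling of the polygon's pixels, and the class names, band statistics and labels are hypothetical.

```python
import math


def gaussian_loglik(x, mean, var):
    """Log-likelihood of a pixel under independent per-band Gaussians."""
    return sum(
        -0.5 * math.log(2 * math.pi * v) - (xi - m) ** 2 / (2 * v)
        for xi, m, v in zip(x, mean, var)
    )


def polygon_prior(pixel_labels, classes, alpha=1.0):
    """Prior P(class | polygon) from first-pass labels of the polygon's pixels."""
    n = len(pixel_labels)
    return {c: (pixel_labels.count(c) + alpha) / (n + alpha * len(classes)) for c in classes}


def classify_pixel(x, class_stats, prior):
    """MAP class and normalised posteriors: P(c | x) is proportional to p(x | c) P(c)."""
    log_post = {
        c: math.log(prior[c]) + gaussian_loglik(x, *class_stats[c])
        for c in class_stats
        if prior.get(c, 0) > 0
    }
    top = max(log_post.values())  # subtract the max for numerical stability
    unnorm = {c: math.exp(v - top) for c, v in log_post.items()}
    z = sum(unnorm.values())
    post = {c: u / z for c, u in unnorm.items()}
    return max(post, key=post.get), post


# Hypothetical two-band statistics: (band means, band variances).
stats = {"wheat": ([0.3, 0.5], [0.01, 0.01]), "grass": ([0.4, 0.6], [0.01, 0.01])}
prior = polygon_prior(["wheat"] * 8 + ["grass"] * 2, ["wheat", "grass"])
label, post = classify_pixel([0.35, 0.55], stats, prior)
print(label, round(post["wheat"], 3))  # wheat 0.75
```

The example pixel lies midway between the two class means, so its spectral evidence alone is ambiguous. The field-level prior then decides the label, which is the benefit the abstract attributes to using mapped polygons.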

Relevance:

30.00%

Abstract:

Classical studies of area summation measure contrast detection thresholds as a function of grating diameter. Unfortunately, (i) this approach is compromised by retinal inhomogeneity and (ii) it potentially confounds summation of signal with summation of internal noise. The Swiss cheese stimulus of T. S. Meese and R. J. Summers (2007) and the closely related Battenberg stimulus of T. S. Meese (2010) were designed to avoid these problems by keeping target diameter constant and modulating interdigitated checks of first-order carrier contrast within the stimulus region. This approach has revealed a contrast integration process with greater potency than the classical model of spatial probability summation. Here, we used Swiss cheese stimuli to investigate the spatial limits of contrast integration over a range of carrier frequencies (1–16 c/deg) and raised plaid modulator frequencies (0.25–32 cycles/check). Subthreshold summation for interdigitated carrier pairs remained strong (~4 to 6 dB) up to 4 to 8 cycles/check. Our computational analysis of these results implied linear signal combination (following square-law transduction) over either (i) 12 carrier cycles or more or (ii) 1.27 deg or more. Our model has three stages of summation: short-range summation within linear receptive fields, medium-range integration to compute contrast energy for multiple patches of the image, and long-range pooling of the contrast integrators by probability summation. Our analysis legitimizes the inclusion of widespread integration of signal (and noise) within hierarchical image processing models. It also confirms the individual differences in the spatial extent of integration that emerge from our approach.
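How the form of signal combination affects predicted summation can be illustrated with a generic Minkowski-pooling calculation. This is a simplified sketch, not the authors' three-stage model. Internal noise is ignored entirely, and the exponents are illustrative rather than fitted. Each of `n` equally stimulated units transduces contrast c as c**q, and responses are pooled as (sum of r**beta)**(1/beta). With q = 2 and beta = 1 (linear combination after square-law transduction), doubling the stimulated area lowers threshold by 10·log10(2), about 3 dB. Larger beta, often used to approximate probability summation, predicts less summation. These idealised numbers are not meant to reproduce the measured values.

```python
import math


def threshold(n_units, q, beta, criterion=1.0):
    """Contrast at which the Minkowski-pooled response of n equal units hits criterion.

    Pooled response = (n * (c**q)**beta)**(1/beta) = n**(1/beta) * c**q,
    so solving pooled == criterion for c gives the expression below.
    """
    return (criterion / n_units ** (1 / beta)) ** (1 / q)


def summation_db(n_small, n_large, q, beta):
    """Threshold improvement (dB) when the stimulated area grows from n_small to n_large units."""
    return 20 * math.log10(threshold(n_small, q, beta) / threshold(n_large, q, beta))


for beta in (1, 2, 4):
    print(beta, round(summation_db(1, 2, q=2, beta=beta), 2))
```

The loop prints how the predicted benefit of doubling the signal area falls as the pooling exponent rises.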

Relevance:

30.00%

Abstract:

Remote sensing data are routinely used in ecology to investigate the relationship between ecological processes and landscape pattern, as characterised by land use and land cover maps. Multiple factors related to the representation of geographic phenomena have been shown to affect the characterisation of landscape pattern, resulting in spatial uncertainty. This study statistically investigated the effect of the interaction between landscape spatial pattern and geospatial processing methods, unlike most papers, which consider the effect of each factor only in isolation. This is important because data used to calculate landscape metrics typically undergo a series of data abstraction processing tasks, which are rarely performed in isolation. The geospatial processing methods tested were the aggregation method and the choice of pixel size used to aggregate data. These were compared with two components of landscape pattern: spatial heterogeneity and the proportion of land cover class area. The interactions and their effect on the final land cover map were described using landscape metrics (to measure landscape pattern) and classification accuracy, which served as the response variables. All landscape metrics and classification accuracy were shown to be affected by both landscape pattern and processing methods. Large variability in the response of those variables, and interactions between the explanatory variables, were observed. However, even though interactions occurred, they affected only the magnitude of the difference in landscape metric values. Thus, provided that the same processing methods are used, landscapes should retain their ranking when their landscape metrics are compared. For example, highly fragmented landscapes will always have larger values for the landscape metric "number of patches" than less fragmented landscapes. However, the magnitude of the difference between the landscapes may change, and therefore absolute values of landscape metrics may need to be interpreted with caution.
The explanatory variables that had the largest effects were spatial heterogeneity and pixel size. These explanatory variables tended to produce large main effects and large interactions. The high variability in the response variables and the interaction of the explanatory variables indicate that it would be difficult to generalise about the impact of processing on landscape pattern, since only two processing methods were tested and untested processing methods may well result in even greater spatial uncertainty. © 2013 Elsevier B.V.
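The "number of patches" metric and its interaction with aggregation can be illustrated on toy grids. This is a minimal sketch under assumptions not taken from the study. Patches are counted with 4-connectivity. Aggregation to a coarser pixel size uses majority rule over k × k blocks, with ties going to the class encountered first in row-major order. The two 4 × 4 landscapes are hypothetical.

```python
from collections import Counter, deque


def number_of_patches(grid, cls):
    """Count 4-connected patches of class `cls` in a 2-D grid (list of rows)."""
    rows, cols = len(grid), len(grid[0])
    seen = [[False] * cols for _ in range(rows)]
    patches = 0
    for r in range(rows):
        for c in range(cols):
            if grid[r][c] == cls and not seen[r][c]:
                patches += 1  # new patch found; flood-fill it so it is counted once
                seen[r][c] = True
                queue = deque([(r, c)])
                while queue:
                    y, x = queue.popleft()
                    for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < rows and 0 <= nx < cols
                                and not seen[ny][nx] and grid[ny][nx] == cls):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
    return patches


def majority_aggregate(grid, k):
    """Coarsen a grid by taking the most common class in each k x k block."""
    out = []
    for r in range(0, len(grid) - k + 1, k):
        row = []
        for c in range(0, len(grid[0]) - k + 1, k):
            block = [grid[y][x] for y in range(r, r + k) for x in range(c, c + k)]
            row.append(Counter(block).most_common(1)[0][0])  # ties: first encountered
        out.append(row)
    return out


fragmented = [[1, 1, 0, 1], [1, 0, 0, 0], [0, 0, 1, 1], [1, 0, 1, 0]]
compact = [[1, 1, 0, 0], [1, 1, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0]]
print(number_of_patches(fragmented, 1), number_of_patches(compact, 1))  # 4 1
coarse_f, coarse_c = majority_aggregate(fragmented, 2), majority_aggregate(compact, 2)
print(number_of_patches(coarse_f, 1), number_of_patches(coarse_c, 1))  # 2 1
```

In this toy case the fragmented landscape still has more patches after aggregation, so the ranking is preserved. However, the difference shrinks from 3 patches to 1, mirroring the caution above about interpreting absolute metric values.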