18 results for Landmark-based spectral clustering
in University of Queensland eSpace - Australia
Abstract:
We have undertaken two-dimensional gel electrophoresis proteomic profiling on a series of cell lines with different recombinant antibody production rates. Due to the nature of gel-based experiments, not all protein spots are detected across all samples in an experiment, and hence datasets are invariably incomplete. New approaches are therefore required for the analysis of such graduated datasets. We approached this problem in two ways. Firstly, we applied a missing value imputation technique to calculate missing data points. Secondly, we combined a singular value decomposition based hierarchical clustering with the expression variability test to identify protein spots whose expression correlates with increased antibody production. The results have shown that while imputation of missing data was a useful method to improve the statistical analysis of such datasets, this was of limited use in differentiating between the samples investigated, and highlighted a small number of candidate proteins for further investigation. (c) 2006 Elsevier B.V. All rights reserved.
Abstract:
The number of mammalian transcripts identified by full-length cDNA projects and genome sequencing projects is increasing remarkably. Clustering them into a strictly nonredundant and comprehensive set provides a platform for functional analysis of the transcriptome and proteome, but the quality of the clustering and predictive usefulness have previously required manual curation to identify truncated transcripts and inappropriate clustering of closely related sequences. A Representative Transcript and Protein Sets (RTPS) pipeline was previously designed to identify the nonredundant and comprehensive set of mouse transcripts based on clustering of a large mouse full-length cDNA set (FANTOM2). Here we propose an alternative method that is more robust, requires less manual curation, and is applicable to other organisms in addition to mouse. RTPSs of human, mouse, and rat have been produced by this method and used for validation. Their comprehensiveness and quality are discussed by comparison with other clustering approaches. The RTPSs are available at ftp://fantom2.gsc.riken.go.jp/RTPS/. (C) 2004 Elsevier Inc. All rights reserved.
Abstract:
A progressive spatial query retrieves spatial data based on previous queries (e.g., to fetch data in a more restricted area with higher resolution). A direct query, on the other hand, is defined as an isolated window query. A multi-resolution spatial database system should support both progressive queries and traditional direct queries. It is conceptually challenging to support both types of query at the same time, as direct queries favour location-based data clustering, whereas progressive queries require fragmented data clustered by resolution. Two new scaleless data structures are proposed in this paper. Experimental results using both synthetic and real-world datasets demonstrate that the query processing time of the new multiresolution approaches is comparable to, and often better than, that of multi-representation data structures for both types of queries.
Abstract:
We consider the problem of assessing the number of clusters in a limited number of tissue samples containing gene expressions for possibly several thousands of genes. It is proposed to use a normal mixture model-based approach to the clustering of the tissue samples. One advantage of this approach is that the question of the number of clusters in the data can be formulated in terms of a test on the smallest number of components in the mixture model compatible with the data. This test can be carried out on the basis of the likelihood ratio test statistic, using resampling to assess its null distribution. The effectiveness of this approach is demonstrated on simulated data and on some microarray datasets, as considered previously in the bioinformatics literature. (C) 2004 Elsevier Inc. All rights reserved.
Abstract:
Since the landmark contributions of Homer Smith and co-workers in the 1930s there has been a considerable advance in our knowledge regarding the osmoregulatory strategy of elasmobranch fish. Smith recognised that urea was retained in the body fluids as part of the 'osmoregulatory ballast' of elasmobranch fish so that body fluid osmolality is raised to a level that is iso- or slightly hyper-osmotic to that of the surrounding medium. From studies at that time he also postulated that many marine dwelling elasmobranchs were not capable of adaptation to dilute environments. However, more recent investigations have demonstrated that, at least in some species, this may not be the case. Gradual acclimation of marine dwelling elasmobranchs to varying environmental salinities under laboratory conditions has demonstrated that these fish do have the capacity to acclimate to changes in salinity through independent regulation of Na+, Cl- and urea levels. This suggests that many of the presumed stenohaline marine elasmobranchs could in fact be described as partially euryhaline. The contributions of Thomas Thorson in the 1970s demonstrated the osmoregulatory strategy of a fully euryhaline elasmobranch, the bull shark, Carcharhinus leucas, and more recent investigations have examined the mechanisms behind this strategy in the euryhaline elasmobranch, Dasyatis sabina. Both partially euryhaline and fully euryhaline species utilise the same physiological processes to control urea, Na+ and Cl- levels within the body fluids. The role of the gills, kidney, liver, rectal gland and drinking process is discussed in relation to the endocrine control of urea, Na+ and Cl- levels as elasmobranchs acclimate to different environmental salinities. (C) 2003 Elsevier Inc. All rights reserved.
Abstract:
Microspectrophotometric examination of the retina of a procellariiform marine bird, the wedge-tailed shearwater Puffinus pacificus, revealed the presence of five different types of vitamin A1-based visual pigment in seven different types of photoreceptor. A single class of rod contained a medium-wavelength sensitive visual pigment with a wavelength of maximum absorbance (λmax) at 502 nm. Four different types of single cone contained visual pigments maximally sensitive in either the violet (VS, λmax 406 nm), short (SWS, λmax 450 nm), medium (MWS, λmax 503 nm) or long (LWS, λmax 566 nm) spectral ranges. In the peripheral retina, the SWS, MWS and LWS single cones contained pigmented oil droplets in their inner segments with cut-off wavelengths (λcut) at 445 (C-type), 506 (Y-type) and 562 nm (R-type), respectively. The VS visual pigment was paired with a transparent (T-type) oil droplet that displayed no significant absorption above at least 370 nm. Both the principal and accessory members of the double cone pair contained the same 566 nm λmax visual pigment as the LWS single cones but only the principal member contained an oil droplet, which had a λcut at 413 nm. The retina had a horizontal band or 'visual streak' of increased photoreceptor density running across the retina approximately 1.5 mm dorsal to the top of the pecten. Cones in the centre of the horizontal streak were smaller and had oil droplets that were either transparent/colourless or much less pigmented than at the periphery. It is proposed that the reduction in cone oil droplet pigmentation in retinal areas associated with high visual acuity is an adaptation to compensate for the reduced photon capture ability of the narrower photoreceptors found there. Measurements of the spectral transmittance of the ocular media reveal that wavelengths down to at least 300 nm would be transmitted to the retina.
Abstract:
In broader catchment-scale investigations, there is a need to understand and ultimately exploit the spatial variation of agricultural crops for an improved economic return. In many instances, this spatial variation is temporally unstable and may be different for various crop attributes and crop species. In the Australian sugar industry, the opportunity arose to evaluate the performance of 231 farms in the Tully Mill area in far north Queensland using production information on cane yield (t/ha) and CCS (a fresh weight measure of sucrose content in the cane) accumulated over a 12-year period. Such an arrangement of data can be expressed as a 3-way array where a farm x attribute x year matrix can be evaluated and interactions considered. Two multivariate techniques, the 3-way mixture method of clustering and the 3-mode principal component analysis, were employed to identify meaningful relationships between farms that performed similarly for both cane yield and CCS. In this context, farm has a spatial component and the aim of this analysis was to determine if systematic patterns in farm performance expressed by cane yield and CCS persisted over time. There was no spatial relationship between cane yield and CCS. However, the analysis revealed that the relationship between farms was remarkably stable from one year to the next for both attributes and there was some spatial aggregation of farm performance in parts of the mill area. This finding is important, since temporally consistent spatial variation may be exploited to improve regional production. Alternatively, the putative causes of the spatial variation may be explored to enhance the understanding of sugarcane production in the wet tropics of Australia.
Abstract:
Evolutionary algorithms perform optimization using a population of sample solution points. An interesting development has been to view population-based optimization as the process of evolving an explicit, probabilistic model of the search space. This paper investigates a formal basis for continuous, population-based optimization in terms of a stochastic gradient descent on the Kullback-Leibler divergence between the model probability density and the objective function, represented as an unknown density of assumed form. This leads to an update rule that is related to, and compared with, previous theoretical work, a continuous version of the population-based incremental learning algorithm, and the generalized mean shift clustering framework. Experimental results are presented that demonstrate the dynamics of the new algorithm on a set of simple test problems.
Abstract:
The aim was to characterize potential mechanism-based inactivation (MBI) of major human drug-metabolizing cytochromes P450 (CYP) by monoamine oxidase (MAO) inhibitors, including the antitubercular drug isoniazid. Human liver microsomal CYP1A2, CYP2C9, CYP2C19, CYP2D6 and CYP3A activities were investigated following co- and preincubation with MAO inhibitors. Inactivation kinetic constants (K_I and k_inact) were determined where a significant preincubation effect was observed. Spectral studies were conducted to elucidate the mechanisms of inactivation. Hydrazine MAO inhibitors generally exhibited greater inhibition of CYP following preincubation, whereas this was less frequent for the propargylamines, and tranylcypromine and moclobemide. Phenelzine and isoniazid inactivated all CYP but were most potent toward CYP3A and CYP2C19. Respective inactivation kinetic constants (K_I and k_inact) for isoniazid were 48.6 µM and 0.042 min^-1 and 79.3 µM and 0.039 min^-1. Clorgyline was a selective inactivator of CYP1A2 (6.8 µM and 0.15 min^-1). Inactivation of CYP was irreversible, consistent with metabolite-intermediate complexation for isoniazid and clorgyline, and haeme destruction for phenelzine. With the exception of phenelzine-mediated CYP3A inactivation, glutathione and superoxide dismutase failed to protect CYP from inactivation by isoniazid and phenelzine. Glutathione partially slowed (17%) the inactivation of CYP1A2 by clorgyline. Alternate substrates or inhibitors generally protected against CYP inactivation. These data are consistent with mechanism-based inactivation of human drug-metabolizing CYP enzymes and suggest that impaired metabolic clearance may contribute to clinical drug-drug interactions with some MAO inhibitors.
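The two inactivation constants reported above can be read through the hyperbolic relation commonly used for mechanism-based inactivation, k_obs = k_inact * [I] / (K_I + [I]). The abstract does not state which model was fitted, so the sketch below is an assumption. It uses the first pair of isoniazid constants quoted above (48.6 µM and 0.042 min^-1). The function names, inhibitor concentrations and the 30-minute preincubation time are hypothetical and chosen only for illustration.

```python
import math


def k_obs(inhibitor_um, k_i_um, k_inact_per_min):
    """Observed first-order inactivation rate constant (min^-1).

    Assumes the standard hyperbolic model for mechanism-based inactivation:
    k_obs = k_inact * [I] / (K_I + [I]).
    """
    return k_inact_per_min * inhibitor_um / (k_i_um + inhibitor_um)


def fraction_active(t_min, inhibitor_um, k_i_um, k_inact_per_min):
    """Fraction of enzyme activity left after t_min minutes of preincubation,
    assuming first-order loss of activity at rate k_obs."""
    return math.exp(-k_obs(inhibitor_um, k_i_um, k_inact_per_min) * t_min)


# First pair of isoniazid constants quoted in the abstract.
K_I_UM = 48.6
K_INACT_PER_MIN = 0.042

# Hypothetical inhibitor concentrations (uM), for illustration only.
for conc in (10.0, 48.6, 500.0):
    rate = k_obs(conc, K_I_UM, K_INACT_PER_MIN)
    left = fraction_active(30.0, conc, K_I_UM, K_INACT_PER_MIN)
    print(f"[I] = {conc:6.1f} uM  k_obs = {rate:.4f} min^-1  "
          f"active after 30 min = {left:.2f}")
```

Under this model, K_I is the inhibitor concentration at which inactivation proceeds at half its maximal rate, and k_inact is the limiting rate at saturating inhibitor.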
Abstract:
Motivation: The clustering of gene profiles across some experimental conditions of interest contributes significantly to the elucidation of unknown gene function, the validation of gene discoveries and the interpretation of biological processes. However, this clustering problem is not straightforward as the profiles of the genes are not all independently distributed and the expression levels may have been obtained from an experimental design involving replicated arrays. Ignoring the dependence between the gene profiles and the structure of the replicated data can result in important sources of variability in the experiments being overlooked in the analysis, with the consequent possibility of misleading inferences being made. We propose a random-effects model that provides a unified approach to the clustering of genes with correlated expression levels measured in a wide variety of experimental situations. Our model is an extension of the normal mixture model to account for the correlations between the gene profiles and to enable covariate information to be incorporated into the clustering process. Hence the model is applicable to longitudinal studies with or without replication, for example, time-course experiments by using time as a covariate, and to cross-sectional experiments by using categorical covariates to represent the different experimental classes. Results: We show that our random-effects model can be fitted by maximum likelihood via the EM algorithm, for which the E (expectation) and M (maximization) steps can be implemented in closed form. Hence our model can be fitted deterministically without the need for time-consuming Monte Carlo approximations. The effectiveness of our model-based procedure for the clustering of correlated gene profiles is demonstrated on three real datasets, representing typical microarray experimental designs, covering time-course, repeated-measurement and cross-sectional data. In these examples, relevant clusters of the genes are obtained, which are supported by existing gene-function annotation. A synthetic dataset is also considered.
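To illustrate what closed-form E and M steps look like, here is a minimal sketch of EM for a plain two-component, one-dimensional normal mixture. This is the base model that the abstract says is being extended, not the authors' random-effects model: it has no random effects, no covariates and no correlation between profiles. The function names, the initialisation and the iteration count are assumptions made for the example.

```python
import math
import random


def normal_pdf(x, mean, var):
    """Density of a univariate normal distribution."""
    return math.exp(-(x - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def em_two_component(data, n_iter=200):
    """Fit a two-component 1-D normal mixture by EM.

    Returns (weights, means, variances), each a list of length 2.
    """
    n = len(data)
    overall_mean = sum(data) / n
    overall_var = sum((x - overall_mean) ** 2 for x in data) / n
    # Crude initialisation: one component at each extreme of the data.
    means = [min(data), max(data)]
    variances = [overall_var, overall_var]
    weights = [0.5, 0.5]

    for _ in range(n_iter):
        # E-step: posterior probability of each component for each point.
        resp = []
        for x in data:
            p = [weights[k] * normal_pdf(x, means[k], variances[k]) for k in range(2)]
            total = p[0] + p[1]
            resp.append([p[0] / total, p[1] / total])

        # M-step: closed-form weighted updates of the parameters.
        for k in range(2):
            nk = sum(r[k] for r in resp)
            weights[k] = nk / n
            means[k] = sum(r[k] * x for r, x in zip(resp, data)) / nk
            var_k = sum(r[k] * (x - means[k]) ** 2 for r, x in zip(resp, data)) / nk
            variances[k] = max(var_k, 1e-6)  # guard against a collapsing component

    return weights, means, variances


if __name__ == "__main__":
    rng = random.Random(0)
    # Synthetic data: two well-separated groups (illustrative only).
    sample = [rng.gauss(0.0, 1.0) for _ in range(200)]
    sample += [rng.gauss(5.0, 1.0) for _ in range(200)]
    print(em_two_component(sample))
```

Each M-step update is a weighted average, so no numerical optimisation or Monte Carlo sampling is needed inside the loop. That is the property the abstract highlights for its own, richer model.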
Abstract:
In this paper, we present ICICLE (Image ChainNet and Incremental Clustering Engine), a prototype system that we have developed to efficiently and effectively retrieve WWW images based on image semantics. ICICLE has two distinguishing features. First, it employs a novel image representation model called Weight ChainNet to capture the semantics of the image content. A new formula, called list space model, for computing semantic similarities is also introduced. Second, to speed up retrieval, ICICLE employs an incremental clustering mechanism, ICC (Incremental Clustering on ChainNet), to cluster images with similar semantics into the same partition. Each cluster has a summary representative and all clusters' representatives are further summarized into a balanced and full binary tree structure. We conducted an extensive performance study to evaluate ICICLE. Compared with some recently proposed methods, our results show that ICICLE provides better recall and precision. Our clustering technique ICC facilitates speedy retrieval of images without sacrificing recall and precision significantly.
Abstract:
This paper illustrates a method for finding useful visual landmarks for performing simultaneous localization and mapping (SLAM). The method is based loosely on biological principles, using layers of filtering and pooling to create learned templates that correspond to different views of the environment. Rather than using a set of landmarks and reporting range and bearing to the landmark, this system maps views to poses. The challenge is to build a system that produces the same view for small changes in robot pose, but provides different views for larger changes in pose. The method has been developed to interface with the RatSLAM system, a biologically inspired method of SLAM. The paper describes the method of learning and recalling visual landmarks in detail, and shows the performance of the visual system in real robot tests.
Abstract:
This paper presents the implementation of a modified particle filter for vision-based simultaneous localization and mapping of an autonomous robot in a structured indoor environment. Through this method, artificial landmarks such as multi-coloured cylinders can be tracked with a camera mounted on the robot, and the position of the robot can be estimated at the same time. Experimental results in simulation and in real environments show that this approach has advantages over the extended Kalman filter with ambiguous data association and various levels of odometric noise.
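As a rough illustration of the predict, weight and resample cycle behind particle-filter localization, the sketch below runs a basic bootstrap particle filter for a robot moving along a line past a single landmark at a known position. It is not the modified filter described in the abstract and it does not build a map. The one-dimensional world, the signed-offset measurement, the noise levels and all names are assumptions made for the example.

```python
import math
import random


def particle_filter_step(particles, control, measurement, landmark,
                         motion_noise, meas_noise, rng):
    """One predict / weight / resample cycle of a bootstrap particle filter."""
    # Predict: apply the odometry command with sampled motion noise.
    moved = [p + control + rng.gauss(0.0, motion_noise) for p in particles]
    # Weight: Gaussian likelihood of the observed signed offset to the landmark.
    weights = [
        math.exp(-0.5 * ((measurement - (landmark - p)) / meas_noise) ** 2)
        for p in moved
    ]
    if sum(weights) == 0.0:
        # All likelihoods underflowed; fall back to uniform weights.
        weights = [1.0] * len(moved)
    # Resample: draw particles in proportion to their weights.
    return rng.choices(moved, weights=weights, k=len(moved))


def run_demo(seed=0, steps=20, n_particles=500):
    """Simulate a robot moving about 1 unit per step and track it.

    Returns (true_position, estimated_position) after the last step.
    """
    rng = random.Random(seed)
    landmark = 50.0                      # known landmark position (assumed)
    motion_noise, meas_noise = 0.2, 0.5  # illustrative noise levels
    true_pos = 0.0
    # Start with particles spread widely around the unknown position.
    particles = [rng.uniform(-10.0, 10.0) for _ in range(n_particles)]
    for _ in range(steps):
        true_pos += 1.0 + rng.gauss(0.0, motion_noise)
        measurement = (landmark - true_pos) + rng.gauss(0.0, meas_noise)
        particles = particle_filter_step(
            particles, 1.0, measurement, landmark, motion_noise, meas_noise, rng
        )
    estimate = sum(particles) / len(particles)
    return true_pos, estimate


if __name__ == "__main__":
    print(run_demo())
```

Because each particle carries its own hypothesis, the filter can represent several candidate poses at once. This is one reason particle filters are often preferred to a single-Gaussian extended Kalman filter when data association is ambiguous.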
Abstract:
Web transaction data between Web visitors and Web functionalities usually convey users' task-oriented behavior patterns. Mining this type of click-stream data captures usage pattern information. Web usage mining has become one of the most widely used methods for Web recommendation, which customizes Web content to the user's preferred style. Traditional Web usage mining techniques, such as Web user session or Web page clustering, association rule mining and frequent navigational path mining, can only discover explicit usage patterns. They cannot, however, reveal the underlying navigational activities or identify the latent relationships associated with the patterns among Web users and Web pages. In this work, we propose a Web recommendation framework that incorporates a Web usage mining technique based on the Probabilistic Latent Semantic Analysis (PLSA) model. The main advantage of this method is that it not only discovers usage-based access patterns but also reveals the underlying latent factors. With the discovered user access patterns, we then present users with content more likely to interest them via collaborative recommendation. To validate the effectiveness of the proposed approach, we conduct experiments on real-world datasets and make comparisons with some existing traditional techniques. The preliminary experimental results demonstrate the usability of the proposed approach.