24 results for Landmark-based spectral clustering

in University of Queensland eSpace - Australia


Relevance:

100.00%

Publisher:

Abstract:

We have undertaken two-dimensional gel electrophoresis proteomic profiling on a series of cell lines with different recombinant antibody production rates. Due to the nature of gel-based experiments, not all protein spots are detected across all samples in an experiment, and hence datasets are invariably incomplete. New approaches are therefore required for the analysis of such graduated datasets. We approached this problem in two ways. Firstly, we applied a missing value imputation technique to calculate missing data points. Secondly, we combined a singular value decomposition based hierarchical clustering with the expression variability test to identify protein spots whose expression correlates with increased antibody production. The results have shown that while imputation of missing data was a useful method to improve the statistical analysis of such data sets, it was of limited use in differentiating between the samples investigated, and highlighted a small number of candidate proteins for further investigation. (c) 2006 Elsevier B.V. All rights reserved.
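As a rough illustration of the imputation step described in this abstract, here is a minimal sketch; the abstract does not specify the authors' actual imputation technique, so per-spot mean filling is used as a stand-in, and all variable names are hypothetical:

```python
import numpy as np

def impute_missing(X):
    """Fill NaN entries with the per-row (per-spot) mean intensity.

    A simple stand-in for the missing-value imputation step; the
    authors' actual technique may well be more sophisticated.
    """
    X = X.astype(float).copy()
    row_means = np.nanmean(X, axis=1)        # mean over detected samples
    rows, cols = np.where(np.isnan(X))
    X[rows, cols] = row_means[rows]
    return X

# Toy spot-by-sample intensity matrix with gaps from undetected spots.
X = np.array([[1.0, np.nan, 3.0],
              [4.0, 4.0, np.nan]])
X_full = impute_missing(X)
```

A completed matrix like `X_full` can then be passed to downstream clustering (for example an SVD-based hierarchical clustering, as the abstract mentions) without special handling for missing cells.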

Relevance:

100.00%

Publisher:

Abstract:

The number of mammalian transcripts identified by full-length cDNA projects and genome sequencing projects is increasing remarkably. Clustering them into a strictly nonredundant and comprehensive set provides a platform for functional analysis of the transcriptome and proteome, but the quality of the clustering and predictive usefulness have previously required manual curation to identify truncated transcripts and inappropriate clustering of closely related sequences. A Representative Transcript and Protein Sets (RTPS) pipeline was previously designed to identify the nonredundant and comprehensive set of mouse transcripts based on clustering of a large mouse full-length cDNA set (FANTOM2). Here we propose an alternative method that is more robust, requires less manual curation, and is applicable to other organisms in addition to mouse. RTPSs of human, mouse, and rat have been produced by this method and used for validation. Their comprehensiveness and quality are discussed by comparison with other clustering approaches. The RTPSs are available at ftp://fantom2.gsc.riken.go.jp/RTPS/. (C) 2004 Elsevier Inc. All rights reserved.

Relevance:

100.00%

Publisher:

Abstract:

A progressive spatial query retrieves spatial data based on previous queries (e.g., to fetch data in a more restricted area at higher resolution). A direct query, on the other hand, is defined as an isolated window query. A multi-resolution spatial database system should support both progressive queries and traditional direct queries. It is conceptually challenging to support both types of query at the same time, as direct queries favour location-based data clustering, whereas progressive queries require fragmented data clustered by resolution. Two new scaleless data structures are proposed in this paper. Experimental results using both synthetic and real-world datasets demonstrate that the query processing time based on the new multi-resolution approaches is comparable to, and often better than, that of multi-representation data structures for both types of queries.
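One way to picture the direct/progressive distinction in this abstract is with features tagged by resolution level, so a progressive query fetches only the extra detail not already delivered. This is a made-up simplification, not the paper's scaleless data structures:

```python
from dataclasses import dataclass

@dataclass
class Feature:
    x: float
    y: float
    level: int  # 0 = coarsest resolution

def window_query(features, xmin, xmax, ymin, ymax, max_level):
    """Direct query: everything in the window, up to max_level at once."""
    return [f for f in features
            if xmin <= f.x <= xmax and ymin <= f.y <= ymax
            and f.level <= max_level]

def progressive_query(features, xmin, xmax, ymin, ymax, prev_level, new_level):
    """Progressive query: only the levels the client has not yet fetched."""
    return [f for f in features
            if xmin <= f.x <= xmax and ymin <= f.y <= ymax
            and prev_level < f.level <= new_level]

feats = [Feature(1, 1, 0), Feature(1, 1, 1), Feature(5, 5, 0)]
direct = window_query(feats, 0, 2, 0, 2, max_level=1)
extra = progressive_query(feats, 0, 2, 0, 2, prev_level=0, new_level=1)
```

The tension the abstract describes is then visible in storage layout: `window_query` is fastest when features are clustered by location, while `progressive_query` benefits from clustering by `level`.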

Relevance:

40.00%

Publisher:

Abstract:

We develop a test of evolutionary change that incorporates a null hypothesis of homogeneity, which encompasses time invariance in the variance and autocovariance structure of residuals from estimated econometric relationships. The test framework is based on examining whether shifts in spectral decomposition between two frames of data are significant. Rejection of the null hypothesis will point not only to weak nonstationarity but also to shifts in the structure of the second-order moments of the limiting distribution of the random process. This would indicate that the second-order properties of any underlying attractor set have changed in a statistically significant way, pointing to the presence of evolutionary change. A demonstration of the test's applicability to a real-world macroeconomic problem is accomplished by applying the test to the Australian Building Society Deposits (ABSD) model.
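The core idea, comparing second-order structure between two frames of residuals via their spectral decompositions, can be sketched with a toy discrepancy measure. This is an illustrative proxy only, not the authors' formal test statistic or its null distribution:

```python
import numpy as np

def spectral_shift(resid_a, resid_b):
    """Sum of absolute gaps between the eigenvalue spectra of the
    residual covariance matrices of two data frames. A large value
    hints at a change in second-order (variance/autocovariance)
    structure between the frames."""
    ev_a = np.linalg.eigvalsh(np.cov(resid_a, rowvar=False))
    ev_b = np.linalg.eigvalsh(np.cov(resid_b, rowvar=False))
    return float(np.abs(ev_a - ev_b).sum())

rng = np.random.default_rng(0)
frame_a = rng.normal(size=(200, 2))            # residuals, first frame
frame_b = 3.0 * rng.normal(size=(200, 2))      # variance shift, second frame
```

Under homogeneity the two spectra should agree up to sampling noise; a formal version of the test would compare the observed discrepancy against a null distribution rather than eyeballing its size.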

Relevance:

40.00%

Publisher:

Abstract:

Motivation: This paper introduces the software EMMIX-GENE that has been developed for the specific purpose of a model-based approach to the clustering of microarray expression data, in particular, of tissue samples on a very large number of genes. The latter is a nonstandard problem in parametric cluster analysis because the dimension of the feature space (the number of genes) is typically much greater than the number of tissues. A feasible approach is provided by first selecting a subset of the genes relevant for the clustering of the tissue samples by fitting mixtures of t distributions to rank the genes in order of increasing size of the likelihood ratio statistic for the test of one versus two components in the mixture model. The imposition of a threshold on the likelihood ratio statistic used in conjunction with a threshold on the size of a cluster allows the selection of a relevant set of genes. However, even this reduced set of genes will usually be too large for a normal mixture model to be fitted directly to the tissues, and so the use of mixtures of factor analyzers is exploited to effectively reduce the dimension of the feature space of genes. Results: The usefulness of the EMMIX-GENE approach for the clustering of tissue samples is demonstrated on two well-known data sets on colon and leukaemia tissues. For both data sets, relevant subsets of the genes can be selected that reveal interesting clusterings of the tissues that are either consistent with the external classification of the tissues or with background and biological knowledge of these sets.
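The gene-selection step described above can be sketched in miniature. This toy version fits a two-component normal mixture by EM rather than the t mixtures EMMIX-GENE uses, and omits the cluster-size threshold; the data are invented:

```python
import numpy as np

def mix_loglik(x, w, mu, sd):
    """Log-likelihood of a 1-D normal mixture with weights w."""
    d = w * np.exp(-(x[:, None] - mu) ** 2 / (2 * sd ** 2)) \
        / np.sqrt(2 * np.pi * sd ** 2)
    return float(np.log(d.sum(axis=1) + 1e-300).sum())

def lrt_two_vs_one(x, n_iter=100):
    """-2 log likelihood ratio for one vs two normal components,
    fitted by a tiny EM. Larger values suggest the gene splits the
    tissues into two groups."""
    mu1, sd1 = x.mean(), x.std() + 1e-9
    ll1 = mix_loglik(x, np.array([1.0]), np.array([mu1]), np.array([sd1]))
    w = np.array([0.5, 0.5])
    mu = np.array([x.min(), x.max()], dtype=float)
    sd = np.array([sd1, sd1])
    for _ in range(n_iter):                       # E and M steps
        d = w * np.exp(-(x[:, None] - mu) ** 2 / (2 * sd ** 2)) \
            / np.sqrt(2 * np.pi * sd ** 2)
        r = d / (d.sum(axis=1, keepdims=True) + 1e-300)
        nk = r.sum(axis=0) + 1e-9
        w, mu = nk / len(x), (r * x[:, None]).sum(axis=0) / nk
        sd = np.sqrt((r * (x[:, None] - mu) ** 2).sum(axis=0) / nk) + 1e-9
    return 2 * (mix_loglik(x, w, mu, sd) - ll1)

rng = np.random.default_rng(0)
unimodal = rng.normal(0, 1, 60)                                   # "gene" 0
bimodal = np.concatenate([rng.normal(-3, 0.5, 30),
                          rng.normal(3, 0.5, 30)])                # "gene" 1
genes = np.vstack([unimodal, bimodal])        # genes x tissues
ranking = np.argsort([-lrt_two_vs_one(g) for g in genes])
```

Thresholding the resulting statistics, rather than just ranking, is what yields the reduced gene set described in the abstract.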

Relevance:

40.00%

Publisher:

Abstract:

In microarray studies, clustering techniques are often applied to derive meaningful insights from the data. In the past, hierarchical methods have been the primary clustering tool employed to perform this task, and these algorithms have mainly been applied heuristically. Further, a major limitation of these methods is their inability to determine the number of clusters. Thus there is a need for a model-based approach to these clustering problems. To this end, McLachlan et al. [7] developed a mixture model-based algorithm (EMMIX-GENE) for the clustering of tissue samples. To further investigate the EMMIX-GENE procedure as a model-based approach, we present a case study involving the application of EMMIX-GENE to the breast cancer data studied recently in van 't Veer et al. [10]. Our analysis considers the problem of clustering the tissue samples on the basis of the genes, which is a non-standard problem because the number of genes greatly exceeds the number of tissue samples. We demonstrate how EMMIX-GENE can be useful in reducing the initial set of genes down to a more computationally manageable size. The results from this analysis also emphasise the difficulty associated with the task of separating two tissue groups on the basis of a particular subset of genes. These results also shed light on why supervised methods have such a high misallocation error rate for the breast cancer data.

Relevance:

40.00%

Publisher:

Abstract:

We consider the problem of assessing the number of clusters in a limited number of tissue samples containing gene expressions for possibly several thousands of genes. It is proposed to use a normal mixture model-based approach to the clustering of the tissue samples. One advantage of this approach is that the question on the number of clusters in the data can be formulated in terms of a test on the smallest number of components in the mixture model compatible with the data. This test can be carried out on the basis of the likelihood ratio test statistic, using resampling to assess its null distribution. The effectiveness of this approach is demonstrated on simulated data and on some microarray datasets, as considered previously in the bioinformatics literature. (C) 2004 Elsevier Inc. All rights reserved.
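The resampling idea in this abstract, assessing the null distribution of a test statistic for one versus two components by simulating under the null, can be sketched as follows. A crude variance-reduction split statistic stands in for the mixture likelihood ratio statistic, and the data are invented:

```python
import numpy as np

def split_stat(x):
    """Crude 1-D stand-in for the likelihood-ratio statistic:
    within-group variance reduction from the best single split
    of the sorted data into two groups."""
    xs = np.sort(x)
    total = ((xs - xs.mean()) ** 2).sum()
    best = total
    for i in range(2, len(xs) - 1):
        w = ((xs[:i] - xs[:i].mean()) ** 2).sum() \
            + ((xs[i:] - xs[i:].mean()) ** 2).sum()
        best = min(best, w)
    return total - best

def bootstrap_p_value(x, n_boot=200, seed=0):
    """Parametric bootstrap under the one-component null: simulate
    from a fitted normal and compare the observed statistic with
    the resampled null draws."""
    rng = np.random.default_rng(seed)
    obs = split_stat(x)
    null = [split_stat(rng.normal(x.mean(), x.std(), len(x)))
            for _ in range(n_boot)]
    return (1 + sum(s >= obs for s in null)) / (n_boot + 1)

rng = np.random.default_rng(1)
bimodal = np.concatenate([rng.normal(-4, 0.5, 40), rng.normal(4, 0.5, 40)])
unimodal = rng.normal(0, 1, 80)
p_bi = bootstrap_p_value(bimodal)
p_uni = bootstrap_p_value(unimodal)
```

A small p-value argues for at least two components; in the paper's formulation the statistic would be the actual mixture likelihood ratio, refitted on each resampled dataset.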

Relevance:

30.00%

Publisher:

Abstract:

A study of the gamma-radiolysis of the commercial polymers U-polymer, UP (Unitika) and polycarbonate, PC, (Aldrich) has been undertaken using ESR spectroscopy. The G-value of radical formation at 77 K has been found to be 0.31 +/- 0.01 for UP and 0.5 +/- 0.02 for PC. By using thermal annealing and spectral subtraction, the paramagnetic species formed on irradiation has been assigned. The effect of radiation on the chemical structure of UP and PC has been investigated at ambient temperature and at 423 K. The NMR results show that a new phenol type chain end is formed in the polymers on exposure to gamma-radiation. The G-value of formation of the new phenol ends was estimated to be 0.7 for PC (423 K) and 0.4 for UP (300 K). (C) 1998 John Wiley & Sons, Ltd.

Relevance:

30.00%

Publisher:

Abstract:

A new method is presented to determine an accurate eigendecomposition of difficult low temperature unimolecular master equation problems. Based on a generalisation of the Nesbet method, the new method is capable of achieving complete spectral resolution of the master equation matrix with relative accuracy in the eigenvectors. The method is applied to a test case of the decomposition of ethane at 300 K from a microcanonical initial population with energy transfer modelled by both Ergodic Collision Theory and the exponential-down model. It is demonstrated that quadruple precision (16-byte) arithmetic is required irrespective of the eigensolution method used. (C) 2001 Elsevier Science B.V. All rights reserved.

Relevance:

30.00%

Publisher:

Abstract:

The majority of the world's population now resides in urban environments and information on the internal composition and dynamics of these environments is essential to enable preservation of certain standards of living. Remotely sensed data, especially the global coverage of moderate spatial resolution satellites such as Landsat, Indian Resource Satellite and Système Pour l'Observation de la Terre (SPOT), offer a highly useful data source for mapping the composition of these cities and examining their changes over time. The utility and range of applications for remotely sensed data in urban environments could be improved with a more appropriate conceptual model relating urban environments to the sampling resolutions of imaging sensors and processing routines. Hence, the aim of this work was to take the Vegetation-Impervious surface-Soil (VIS) model of urban composition and match it with the most appropriate image processing methodology to deliver information on VIS composition for urban environments. Several approaches were evaluated for mapping the urban composition of Brisbane city (south-east Queensland, Australia) using Landsat 5 Thematic Mapper data and 1:5000 aerial photographs. The methods evaluated were: image classification; interpretation of aerial photographs; and constrained linear mixture analysis. Over 900 reference sample points on four transects were extracted from the aerial photographs and used as a basis to check output of the classification and mixture analysis. Distinctive zonations of VIS related to urban composition were found in the per-pixel classification and aggregated air-photo interpretation; however, significant spectral confusion also resulted between classes. In contrast, the VIS fraction images produced from the mixture analysis enabled distinctive densities of commercial, industrial and residential zones within the city to be clearly defined, based on their relative amount of vegetation cover.
The soil fraction image served as an index for areas being (re)developed. The logical match of a low (L)-resolution, spectral mixture analysis approach with the moderate spatial resolution image data, ensured the processing model matched the spectrally heterogeneous nature of the urban environments at the scale of Landsat Thematic Mapper data.
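The constrained linear mixture analysis can be illustrated with a toy per-pixel unmixing step. The four-band endmember spectra below are invented values, and the sum-to-one constraint is imposed with a weighted extra least-squares row, a common trick that is not necessarily the exact method used in the study:

```python
import numpy as np

def unmix(pixel, endmembers, weight=1e3):
    """Linear unmixing of one pixel into endmember fractions with a
    sum-to-one constraint enforced by an appended, heavily weighted
    row; clipping stands in for full non-negativity handling."""
    A = np.vstack([endmembers.T, weight * np.ones(endmembers.shape[0])])
    b = np.append(pixel, weight)
    f, *_ = np.linalg.lstsq(A, b, rcond=None)
    return np.clip(f, 0.0, 1.0)

# Hypothetical 4-band reflectance spectra
# (rows: vegetation, impervious surface, soil).
E = np.array([[0.05, 0.08, 0.45, 0.30],
              [0.20, 0.22, 0.25, 0.24],
              [0.12, 0.18, 0.30, 0.45]])
pixel = 0.6 * E[0] + 0.3 * E[1] + 0.1 * E[2]   # a 60%-vegetation pixel
fractions = unmix(pixel, E)
```

Applying `unmix` to every pixel yields the V, I and S fraction images the abstract describes, from which density zonations can be mapped.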

Relevance:

30.00%

Publisher:

Abstract:

Since the landmark contributions of Homer Smith and co-workers in the 1930s there has been a considerable advance in our knowledge regarding the osmoregulatory strategy of elasmobranch fish. Smith recognised that urea was retained in the body fluids as part of the 'osmoregulatory ballast' of elasmobranch fish so that body fluid osmolality is raised to a level that is iso- or slightly hyper-osmotic to that of the surrounding medium. From studies at that time he also postulated that many marine dwelling elasmobranchs were not capable of adaptation to dilute environments. However, more recent investigations have demonstrated that, at least in some species, this may not be the case. Gradual acclimation of marine dwelling elasmobranchs to varying environmental salinities under laboratory conditions has demonstrated that these fish do have the capacity to acclimate to changes in salinity through independent regulation of Na+, Cl- and urea levels. This suggests that many of the presumed stenohaline marine elasmobranchs could in fact be described as partially euryhaline. The contributions of Thomas Thorson in the 1970s demonstrated the osmoregulatory strategy of a fully euryhaline elasmobranch, the bull shark, Carcharhinus leucas, and more recent investigations have examined the mechanisms behind this strategy in the euryhaline elasmobranch, Dasyatis sabina. Both partially euryhaline and fully euryhaline species utilise the same physiological processes to control urea, Na+ and Cl- levels within the body fluids. The role of the gills, kidney, liver, rectal gland and drinking process is discussed in relation to the endocrine control of urea, Na+ and Cl- levels as elasmobranchs acclimate to different environmental salinities. (C) 2003 Elsevier Inc. All rights reserved.

Relevance:

30.00%

Publisher:

Abstract:

Microspectrophotometric examination of the retina of a procellariiform marine bird, the wedge-tailed shearwater Puffinus pacificus, revealed the presence of five different types of vitamin A(1)-based visual pigment in seven different types of photoreceptor. A single class of rod contained a medium-wavelength sensitive visual pigment with a wavelength of maximum absorbance (lambda(max)) at 502 nm. Four different types of single cone contained visual pigments maximally sensitive in either the violet (VS, lambda(max) 406 nm), short (SWS, lambda(max) 450 nm), medium (MWS, lambda(max) 503 nm) or long (LWS, lambda(max) 566 nm) spectral ranges. In the peripheral retina, the SWS, MWS and LWS single cones contained pigmented oil droplets in their inner segments with cut-off wavelengths (lambda(cut)) at 445 (C-type), 506 (Y-type) and 562 nm (R-type), respectively. The VS visual pigment was paired with a transparent (T-type) oil droplet that displayed no significant absorption above at least 370 nm. Both the principal and accessory members of the double cone pair contained the same 566 nm lambda(max) visual pigment as the LWS single cones but only the principal member contained an oil droplet, which had a lambda(cut) at 413 nm. The retina had a horizontal band or 'visual streak' of increased photoreceptor density running across the retina approximately 1.5 mm dorsal to the top of the pecten. Cones in the centre of the horizontal streak were smaller and had oil droplets that were either transparent/colourless or much less pigmented than at the periphery. It is proposed that the reduction in cone oil droplet pigmentation in retinal areas associated with high visual acuity is an adaptation to compensate for the reduced photon capture ability of the narrower photoreceptors found there. Measurements of the spectral transmittance of the ocular media reveal that wavelengths down to at least 300 nm would be transmitted to the retina.

Relevance:

30.00%

Publisher:

Abstract:

In broader catchment scale investigations, there is a need to understand and ultimately exploit the spatial variation of agricultural crops for an improved economic return. In many instances, this spatial variation is temporally unstable and may be different for various crop attributes and crop species. In the Australian sugar industry, the opportunity arose to evaluate the performance of 231 farms in the Tully Mill area in far north Queensland using production information on cane yield (t/ha) and CCS (a fresh-weight measure of sucrose content in the cane) accumulated over a 12-year period. Such an arrangement of data can be expressed as a 3-way array where a farm x attribute x year matrix can be evaluated and interactions considered. Two multivariate techniques, the 3-way mixture method of clustering and the 3-mode principal component analysis, were employed to identify meaningful relationships between farms that performed similarly for both cane yield and CCS. In this context, farm has a spatial component and the aim of this analysis was to determine if systematic patterns in farm performance expressed by cane yield and CCS persisted over time. There was no spatial relationship between cane yield and CCS. However, the analysis revealed that the relationship between farms was remarkably stable from one year to the next for both attributes and there was some spatial aggregation of farm performance in parts of the mill area. This finding is important, since temporally consistent spatial variation may be exploited to improve regional production. Alternatively, the putative causes of the spatial variation may be explored to enhance the understanding of sugarcane production in the wet tropics of Australia.

Relevance:

30.00%

Publisher:

Abstract:

Evolutionary algorithms perform optimization using a population of sample solution points. An interesting development has been to view population-based optimization as the process of evolving an explicit, probabilistic model of the search space. This paper investigates a formal basis for continuous, population-based optimization in terms of a stochastic gradient descent on the Kullback-Leibler divergence between the model probability density and the objective function, represented as an unknown density of assumed form. This leads to an update rule that is related and compared with previous theoretical work, a continuous version of the population-based incremental learning algorithm, and the generalized mean shift clustering framework. Experimental results are presented that demonstrate the dynamics of the new algorithm on a set of simple test problems.
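The density-evolution view in this abstract can be sketched with a Gaussian search distribution whose mean is nudged by fitness-weighted samples, in the spirit of a stochastic gradient on a divergence between model and objective. This is a rough sketch of the idea, not the paper's exact update rule; the objective and all parameter values are invented, and the spread is held fixed for simplicity:

```python
import numpy as np

def kl_descent(f, mu=0.0, sigma=1.5, pop=100, iters=150, lr=0.3, seed=1):
    """Evolve a Gaussian search density: sample a population, weight
    each sample by exponentiated fitness (low f = high weight), and
    move the mean a small step toward the weighted centroid."""
    rng = np.random.default_rng(seed)
    for _ in range(iters):
        x = rng.normal(mu, sigma, pop)     # the current "population"
        w = np.exp(-f(x))                  # objective as unnormalised density
        w /= w.sum()
        mu += lr * np.sum(w * (x - mu))    # gradient-like mean update
    return mu

best = kl_descent(lambda x: (x - 3.0) ** 2)   # minimum at x = 3
```

A fuller version would also adapt `sigma` from the weighted samples, which is where the connections to population-based incremental learning and mean shift that the abstract mentions become visible.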