42 results for Data clustering. Fuzzy C-Means. Cluster centers initialization. Validation indices
Abstract:
Nuclear magnetic resonance spectroscopy was used to investigate the conformations of the platypus venom C-type natriuretic peptide A (OvCNPa) in aqueous solutions and in solutions containing sodium dodecyl sulfate (SDS) micelles. The chemically synthesized OvCNPa showed a substantial decrease in flexibility in aqueous solution at 10 °C, allowing the observation of medium- and long-range nuclear Overhauser enhancement (NOE) connectivities. Three-dimensional structures calculated using these data showed flexible and reasonably well-defined regions, the locations of which were similar in the two solvents. In aqueous solution, the linear part that spans residues 3-14 was basically an extended conformation while the cyclic portion, defined by residues 23-39, contained a series of beta-turns. The overall shape of the cyclic portion was similar to that observed for an atrial natriuretic peptide (ANP) variant in aqueous solution. OvCNPa adopted a different conformation in SDS micelles wherein the N-terminal region, defined by residues 2-10, was more compact, characterised by turns and a helix, while the cyclic region had turns and an overall shape that was fundamentally different from those structures observed in aqueous solution. The hydrophobic cluster, situated at the centre of the ring of the structure in aqueous solution, was absent in the structure in the presence of SDS micelles. Thus, OvCNPa interacts with SDS micelles and can possibly form ion-channels in cell membranes. (C) 2002 Elsevier Science Ltd. All rights reserved.
Abstract:
Using data from the H I Parkes All Sky Survey (HIPASS), we have searched for neutral hydrogen in galaxies in a region ~25 × 25 deg² centred on NGC 1399, the nominal centre of the Fornax cluster. Within a velocity search range of 300-3700 km s⁻¹ and to a 3σ lower flux limit of ~40 mJy, 110 galaxies with H I emission were detected, one of which is previously uncatalogued. None of the detections has early-type morphology. Previously unknown velocities for 14 galaxies have been determined, with a further four velocity measurements being significantly dissimilar to published values. Identification of an optical counterpart is relatively unambiguous for more than ~90 per cent of our H I galaxies. The galaxies appear to be embedded in a sheet at the cluster velocity which extends for more than 30° across the search area. At the nominal cluster distance of ~20 Mpc, this corresponds to an elongated structure more than 10 Mpc in extent. A velocity gradient across the structure is detected, with radial velocities increasing by ~500 km s⁻¹ from south-east to north-west. The clustering of galaxies evident in optical surveys is only weakly suggested in the spatial distribution of our H I detections. Of 62 H I detections within a 10° projected radius of the cluster centre, only two are within the core region (projected radius
Abstract:
To identify novel cytokine-related genes, we searched the set of 60,770 annotated RIKEN mouse cDNA clones (FANTOM2 clones), using keywords such as cytokine itself or cytokine names (such as interferon, interleukin, epidermal growth factor, fibroblast growth factor, and transforming growth factor). This search produced 108 known cytokines and cytokine-related products such as cytokine receptors, cytokine-associated genes, or their products (enhancers, accessory proteins, cytokine-induced genes). We found 15 clusters of FANTOM2 clones that are candidates for novel cytokine-related genes. These encoded products with strong sequence similarity to guanylate-binding protein (GBP-5), interleukin-1 receptor-associated kinase 2 (IRAK-2), interleukin 20 receptor alpha isoform 3, a member of the interferon-inducible proteins of the Ifi 200 cluster, four members of the membrane-associated family 1-8 of interferon-inducible proteins, one p27-like protein, and a hypothetical protein containing a Toll/Interleukin receptor domain. All four clones representing novel candidates of gene products from the family contain a novel highly conserved cross-species domain. Clones similar to growth factor-related products included transforming growth factor beta-inducible early growth response protein 2 (TIEG-2), TGFbeta-induced factor 2, integrin beta-like 1, latent TGF-binding protein 4S, and FGF receptor 4B. We performed a detailed sequence analysis of the candidate novel genes to elucidate their likely functional properties.
Abstract:
Geospatial clustering must be designed in such a way that it takes into account the special features of geoinformation and the peculiar nature of geographical environments in order to successfully derive geospatially interesting global concentrations and localized excesses. This paper examines families of geospatial clustering recently proposed in the data mining community and identifies several features and issues especially important to geospatial clustering in data-rich environments.
Abstract:
When the data consist of certain attributes measured on the same set of items in different situations, they would be described as a three-mode three-way array. A mixture likelihood approach can be implemented to cluster the items (i.e., one of the modes) on the basis of both of the other modes simultaneously (i.e., the attributes measured in different situations). In this paper, it is shown that this approach can be extended to handle three-mode three-way arrays where some of the data values are missing at random in the sense of Little and Rubin (1987). The methodology is illustrated by clustering the genotypes in a three-way soybean data set where various attributes were measured on genotypes grown in several environments.
Abstract:
Understanding the ecological role of benthic microalgae, a highly productive component of coral reef ecosystems, requires information on their spatial distribution. The spatial extent of benthic microalgae on Heron Reef (southern Great Barrier Reef, Australia) was mapped using data from the Landsat 5 Thematic Mapper sensor, integrated with field measurements of sediment chlorophyll concentration and reflectance. Field-measured sediment chlorophyll concentrations, ranging from 23-1,153 mg chl a m⁻², were classified into low, medium, and high concentration classes (1-170, 171-290, and > 291 mg chl a m⁻²) using a K-means clustering algorithm. The mapping process assumed that areas in the Thematic Mapper image exhibiting similar reflectance levels in red and blue bands would correspond to areas of similar chlorophyll a levels. Regions of homogeneous reflectance values corresponding to low, medium, and high chlorophyll levels were identified over the reef sediment zone by applying a standard image classification algorithm to the Thematic Mapper image. The resulting distribution map revealed large-scale (> 1 km²) patterns in chlorophyll a levels throughout the sediment zone of Heron Reef. Reef-wide estimates of chlorophyll a distribution indicate that benthic microalgae may constitute up to 20% of the total benthic chlorophyll a at Heron Reef and thus contribute significantly to total primary productivity on the reef.
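The K-means step described in this abstract can be illustrated with a minimal one-dimensional sketch. The abstract does not say which implementation or initialisation the authors used, so the function `kmeans_1d`, its evenly spread initialisation, and the sample concentrations below are all assumptions made for illustration only.

```python
def kmeans_1d(values, k=3, iters=100):
    """Lloyd's algorithm on scalar data; returns (centers, labels).

    Hypothetical helper for illustration, not the authors' code.
    """
    pts = sorted(values)
    # Assumed deterministic initialisation: spread centers over the sorted data.
    centers = [pts[(2 * i + 1) * len(pts) // (2 * k)] for i in range(k)]
    labels = []
    for _ in range(iters):
        # Assignment step: each value goes to its nearest center.
        labels = [min(range(k), key=lambda j: abs(v - centers[j])) for v in values]
        # Update step: each center moves to the mean of its members.
        new_centers = []
        for j in range(k):
            members = [v for v, lab in zip(values, labels) if lab == j]
            new_centers.append(sum(members) / len(members) if members else centers[j])
        if new_centers == centers:
            break  # converged: centers no longer move
        centers = new_centers
    return centers, labels


# Made-up chlorophyll concentrations (mg chl a per square metre), not survey data.
sample = [40, 80, 120, 200, 230, 260, 400, 600, 900]
centers, labels = kmeans_1d(sample)
```

With three clusters, the labels 0, 1, and 2 would play the role of the low, medium, and high classes; the class boundaries reported in the abstract come from the authors' field data and are not reproduced by this toy input.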
Abstract:
Dimethyl sulfide dehydrogenase from the purple phototrophic bacterium Rhodovulum sulfidophilum catalyzes the oxidation of dimethyl sulfide to dimethyl sulfoxide. Recent DNA sequence analysis of the ddh operon, encoding dimethyl sulfide dehydrogenase (ddhABC), and biochemical analysis (1) have revealed that it is a member of the DMSO reductase family of molybdenum enzymes and is closely related to respiratory nitrate reductase (NarGHI). Variable temperature X-band EPR spectra (120-122 K) of purified heterotrimeric dimethyl sulfide dehydrogenase showed resonances arising from multiple redox centers, Mo(V), [3Fe-4S]⁺, [4Fe-4S]⁺, and a b-type heme. A pH-dependent EPR study of the Mo(V) center in ¹H₂O and ²H₂O revealed the presence of three Mo(V) species in equilibrium, Mo(V)-OH₂, Mo(V)-anion, and Mo(V)-OH. Above pH 8.2 the dominant species was Mo(V)-OH. The maximum specific activity occurred at pH 9.27. Comparison of the rhombicity and anisotropy parameters for the Mo(V) species in DMS dehydrogenase with other molybdenum enzymes of the DMSO reductase family showed that it was most similar to the low-pH nitrite spectrum of Escherichia coli nitrate reductase (NarGHI), consistent with previous sequence analysis of DdhA and NarG. A sequence comparison of DdhB and NarH has predicted the presence of four [Fe-S] clusters in DdhB. A [3Fe-4S]⁺ cluster was identified in dimethyl sulfide dehydrogenase whose properties resembled those of center 2 of NarH. A [4Fe-4S]⁺ cluster was also identified with unusual spin Hamiltonian parameters, suggesting that one of the iron atoms may have a fifth non-sulfur ligand. The g matrix for this cluster is very similar to that found for the minor conformation of center 1 in NarH [Guigliarelli, B., Asso, M., More, C., Augher, V., Blasco, F., Pommier, J., Giodano, G., and Bertrand, P. (1992) Eur. J. Biochem. 307, 63-68]. Analysis of a ddhC mutant showed that this gene encodes the b-type cytochrome in dimethyl sulfide dehydrogenase. Magnetic circular dichroism studies revealed that the axial ligands to the iron in this cytochrome are a histidine and methionine, consistent with predictions from protein sequence analysis. Redox potentiometry showed that the b-type cytochrome has a high midpoint redox potential (E-o = +315 mV, pH 8).
Abstract:
Lucerne (Medicago sativa L.) is autotetraploid, and predominantly allogamous. This complex breeding structure maximises the genetic diversity within lucerne populations making it difficult to genetically discriminate between populations. The objective of this study was to evaluate the level of random genetic diversity within and between a selection of Australian-grown lucerne cultivars, with tetraploid M. falcata included as a possible divergent control source. This diversity was evaluated using random amplified polymorphic DNA (RAPDs). Nineteen plants from each of 10 cultivars were analysed. Using 11 RAPD primers, 96 polymorphic bands were scored as present or absent across the 190 individuals. Genetic similarity estimates (GSEs) of all pair-wise comparisons were calculated from these data. Mean GSEs within cultivars ranged from 0.43 to 0.51. Cultivar Venus (0.43) had the highest level of intra-population genetic diversity and cultivar Sequel HR (0.51) had the lowest level of intra-population genetic diversity. Mean GSEs between cultivars ranged from 0.31 to 0.49, which overlapped with values obtained for within-cultivar GSE, thus not allowing separation of the cultivars. The high level of intra- and inter-population diversity that was detected is most likely due to the breeding of synthetic cultivars using parents derived from a number of diverse sources. Cultivar-specific polymorphisms were only identified in the M. falcata source, which like M. sativa, is outcrossing and autotetraploid. From a cluster analysis and a principal components analysis, it was clear that M. falcata was distinct from the other cultivars. The results indicate that the M. falcata accession tested has not been widely used in Australian lucerne breeding programs, and offers a means of introducing new genetic diversity into the lucerne gene pool. This provides a means of maximising heterozygosity, which is essential to maximising productivity in lucerne.
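The within- and between-cultivar comparison in this abstract can be sketched as follows. The abstract does not state which similarity coefficient was used for the genetic similarity estimates, so the Jaccard coefficient here is an assumption, and the helper names and the four-band toy profiles are hypothetical.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity of two band profiles (1 = band present, 0 = absent).

    Assumed coefficient; the abstract does not name the one actually used.
    """
    shared = sum(1 for x, y in zip(a, b) if x and y)
    either = sum(1 for x, y in zip(a, b) if x or y)
    return shared / either if either else 0.0


def mean_similarity(profiles_a, profiles_b=None):
    """Mean pairwise similarity within one group, or between two groups."""
    if profiles_b is None:
        pairs = list(combinations(profiles_a, 2))  # all within-group pairs
    else:
        pairs = [(a, b) for a in profiles_a for b in profiles_b]
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)


# Toy presence/absence profiles, not data from the study.
cultivar_x = [[1, 1, 0, 0], [1, 0, 0, 0]]
cultivar_y = [[0, 0, 1, 1], [0, 1, 1, 1]]
within_x = mean_similarity(cultivar_x)
between_xy = mean_similarity(cultivar_x, cultivar_y)
```

In the study the within-cultivar and between-cultivar ranges overlapped, which is why the cultivars could not be separated; in this toy input the two groups are deliberately made distinct.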
Abstract:
We focus on mixtures of factor analyzers from the perspective of a method for model-based density estimation from high-dimensional data, and hence for the clustering of such data. This approach enables a normal mixture model to be fitted to a sample of n data points of dimension p, where p is large relative to n. The number of free parameters is controlled through the dimension of the latent factor space. By working in this reduced space, it allows a model for each component-covariance matrix with complexity lying between that of the isotropic and full covariance structure models. We shall illustrate the use of mixtures of factor analyzers in a practical example that considers the clustering of cell lines on the basis of gene expressions from microarray experiments. (C) 2002 Elsevier Science B.V. All rights reserved.
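To make the claim about controlling the number of free parameters concrete, the sketch below counts covariance parameters per mixture component. It assumes the usual factor-analytic form, a p × q loading matrix plus a diagonal noise matrix, which the abstract does not spell out; the function names and the values p = 100 and q = 5 are illustrative only.

```python
def isotropic_params(p):
    """A single shared variance, whatever the dimension p."""
    return 1


def full_params(p):
    """Unrestricted symmetric p x p covariance matrix."""
    return p * (p + 1) // 2


def factor_params(p, q):
    """Assumed form: loadings (p*q) plus diagonal noise (p),
    minus q*(q-1)/2 for the rotational indeterminacy of the loadings."""
    return p * q + p - q * (q - 1) // 2


p, q = 100, 5  # illustrative dimensions, not taken from the paper
counts = (isotropic_params(p), factor_params(p, q), full_params(p))
```

Under these assumptions the factor-analytic count sits between the isotropic and full counts, and grows linearly rather than quadratically in p for a fixed q.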