31 resultados para Semi-supervised clustering
em University of Queensland eSpace - Australia
Resumo:
In microarray studies, the application of clustering techniques is often used to derive meaningful insights into the data. In the past, hierarchical methods have been the primary clustering tool employed to perform this task. The hierarchical algorithms have been mainly applied heuristically to these cluster analysis problems. Further, a major limitation of these methods is their inability to determine the number of clusters. Thus there is a need for a model-based approach to these. clustering problems. To this end, McLachlan et al. [7] developed a mixture model-based algorithm (EMMIX-GENE) for the clustering of tissue samples. To further investigate the EMMIX-GENE procedure as a model-based -approach, we present a case study involving the application of EMMIX-GENE to the breast cancer data as studied recently in van 't Veer et al. [10]. Our analysis considers the problem of clustering the tissue samples on the basis of the genes which is a non-standard problem because the number of genes greatly exceed the number of tissue samples. We demonstrate how EMMIX-GENE can be useful in reducing the initial set of genes down to a more computationally manageable size. The results from this analysis also emphasise the difficulty associated with the task of separating two tissue groups on the basis of a particular subset of genes. These results also shed light on why supervised methods have such a high misallocation error rate for the breast cancer data.
Resumo:
In natural estuaries, contaminant transport is driven by the turbulent momentum mixing. The predictions of scalar dispersion can rarely be predicted accurately because of a lack of fundamental understanding of the turbulence structure in estuaries. Herein detailed turbulence field measurements were conducted at high frequency and continuously for up to 50 hours per investigation in a small subtropical estuary with semi-diurnal tides. Acoustic Doppler velocimetry was deemed the most appropriate measurement technique for such small estuarine systems with shallow water depths (less than 0.5 m at low tides), and a thorough post-processing technique was applied. The estuarine flow is always a fluctuating process. The bulk flow parameters fluctuated with periods comparable to tidal cycles and other large-scale processes. But turbulence properties depended upon the instantaneous local flow properties. They were little affected by the flow history, but their structure and temporal variability were influenced by a variety of mechanisms. This resulted in behaviour which deviated from that for equilibrium turbulent boundary layer induced by velocity shear only. A striking feature of the data sets is the large fluctuations in all turbulence characteristics during the tidal cycle. This feature was rarely documented, but an important difference between the data sets used in this study from earlier reported measurements is that the present data were collected continuously at high frequency during relatively long periods. The findings bring new lights in the fluctuating nature of momentum exchange coefficients and integral time and length scales. These turbulent properties should not be assumed constant.
Resumo:
Data mining is the process to identify valid, implicit, previously unknown, potentially useful and understandable information from large databases. It is an important step in the process of knowledge discovery in databases, (Olaru & Wehenkel, 1999). In a data mining process, input data can be structured, seme-structured, or unstructured. Data can be in text, categorical or numerical values. One of the important characteristics of data mining is its ability to deal data with large volume, distributed, time variant, noisy, and high dimensionality. A large number of data mining algorithms have been developed for different applications. For example, association rules mining can be useful for market basket problems, clustering algorithms can be used to discover trends in unsupervised learning problems, classification algorithms can be applied in decision-making problems, and sequential and time series mining algorithms can be used in predicting events, fault detection, and other supervised learning problems (Vapnik, 1999). Classification is among the most important tasks in the data mining, particularly for data mining applications into engineering fields. Together with regression, classification is mainly for predictive modelling. So far, there have been a number of classification algorithms in practice. According to (Sebastiani, 2002), the main classification algorithms can be categorized as: decision tree and rule based approach such as C4.5 (Quinlan, 1996); probability methods such as Bayesian classifier (Lewis, 1998); on-line methods such as Winnow (Littlestone, 1988) and CVFDT (Hulten 2001), neural networks methods (Rumelhart, Hinton & Wiliams, 1986); example-based methods such as k-nearest neighbors (Duda & Hart, 1973), and SVM (Cortes & Vapnik, 1995). Other important techniques for classification tasks include Associative Classification (Liu et al, 1998) and Ensemble Classification (Tumer, 1996).
Resumo:
Recent efforts in the characterization of air-water flows properties have included some clustering process analysis. A cluster of bubbles is defined as a group of two or more bubbles, with a distinct separation from other bubbles before and after the cluster. The present paper compares the results of clustering processes two hydraulic structures. That is, a large-size dropshaft and a hydraulic jump in a rectangular horizontal channel. The comparison highlighted some significant differences in clustering production and structures. Both dropshaft and hydraulic jump flows are complex turbulent shear flows, and some clustering index may provide some measure of the bubble-turbulence interactions and associated energy dissipation.
Resumo:
The high-affinity receptors for human granulocyte-macrophage colony-stimulating factor (GM-CSF), interleukin-1 (IL-3), and IL-5 are heterodimeric complexes consisting of cytokine-specific alpha subunits and a common signal-transducing beta subunit (h beta c). We have previously demonstrated the oncogenic potential of this group of receptors by identifying constitutively activating point mutations in the extracellular and transmembrane domains of h beta c. We report here a comprehensive screen of the entire h beta c molecule that has led to the identification of additional constitutive point mutations by virtue of their ability to confer factor independence on murine FDC-P1 cells. These mutations were clustered exclusively in a central region of h beta c that encompasses the extracellular membrane-proximal domain, transmembrane domain, and membrane-proximal region of the cytoplasmic domain. Interestingly, most h beta c mutants exhibited cell type-specific constitutive activity, with only two transmembrane domain mutants able to confer factor independence on both murine FDC-P1 and BAF-B03 cells. Examination of the biochemical properties of these mutants in FDC-P1 cells indicated that MAP kinase (ERK1/2), STAT, and JAK2 signaling molecules were constitutively activated. In contrast, only some of the mutant beta subunits were constitutively tyrosine phosphorylated. Taken together; these results highlight key regions involved in h beta c activation, dissociate h beta c tyrosine phosphorylation from MAP kinase and STAT activation, and suggest the involvement of distinct mechanisms by which proliferative signals can be generated by h beta c. (C) 1998 by The American Society of Hematology.
Resumo:
An analytical solution is derived for tidal fluctuations in a coupled coastal aquifer system consisting of a semi-confined aquifer, a thin semi-permeable layer and a phreatic aquifer. Based on the solution, we study the interactions (via leakage) between the confined and unconfined aquifers in response to tides. The results show that, under certain conditions, leakage from the confined aquifer can affect considerably the tidal water table fluctuation in the phreatic aquifer and vice versa. Ignoring these effects could lead to errors in estimating aquifer properties based on tidal signals. (C) 2002 Elsevier Science Ltd. All rights reserved.
Resumo:
[1] We attempt to generate new solutions for the moisture content form of the one-dimensional Richards' [1931] equation using the Lisle [1992] equivalence mapping. This mapping is used as no more general set of transformations exists for mapping the one-dimensional Richards' equation into itself. Starting from a given solution, the mapping has the potential to generate an infinite number of new solutions for a series of nonlinear diffusivity and hydraulic conductivity functions. We first seek new analytical solutions satisfying Richards' equation subject to a constant flux surface boundary condition for a semi-infinite dry soil, starting with the Burgers model. The first iteration produces an existing solution, while subsequent iterations are shown to endlessly reproduce this same solution. Next, we briefly consider the problem of redistribution in a finite-length soil. In this case, Lisle's equivalence mapping is generalized to account for arbitrary initial conditions. As was the case for infiltration, however, it is found that new analytical solutions are not generated using the equivalence mapping, although existing solutions are recovered.
Resumo:
Geospatial clustering must be designed in such a way that it takes into account the special features of geoinformation and the peculiar nature of geographical environments in order to successfully derive geospatially interesting global concentrations and localized excesses. This paper examines families of geospaital clustering recently proposed in the data mining community and identifies several features and issues especially important to geospatial clustering in data-rich environments.
Resumo:
A semi-empirical linear equation has been developed to optimise the amount of maltodextrin additive (DE 6) required to successfully spray dry a sugar-rich product on the basis of its composition. Based on spray drying experiments, drying index values for individual sugars (sucrose, glucose, frutose) and citric acid were determined, and us;ng these index values an equation for model mixtures of these components was established. This equation has been tested with two sugar-rich natural products, pineapple juice and honey. The relationship was found to be valid for these products.