938 results for Image processing, Microscopy, Histopathology, Classification, K-means


Relevance:

100.00% 100.00%

Publisher:

Abstract:

For many clustering algorithms, such as K-Means, EM, and CLOPE, there is usually a requirement to set some parameters. Often, these parameters directly or indirectly control the number of clusters, that is, k, to return. In the presence of different data characteristics and analysis contexts, it is often difficult for the user to estimate the number of clusters in the data set. This is especially true in text collections such as Web documents, images, or biological data. In an effort to improve the effectiveness of clustering, we seek the answer to a fundamental question: How can we effectively estimate the number of clusters in a given data set? We propose an efficient method based on spectra analysis of eigenvalues (not eigenvectors) of the data set as the solution to the above. We first present the relationship between a data set and its underlying spectra with theoretical and experimental results. We then show how our method is capable of suggesting a range of k that is well suited to different analysis contexts. Finally, we conclude with further empirical results to show how the answer to this fundamental question enhances the clustering process for large text collections.
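As a rough illustration of reading a cluster count off an eigenvalue spectrum, the sketch below uses the common eigengap heuristic on a normalized similarity matrix. This is a stand-in technique, not the method proposed in the abstract, and the function names `symmetric_eigenvalues` and `estimate_k` are hypothetical. It assumes a symmetric similarity matrix in which every row has a positive sum.

```python
import math


def symmetric_eigenvalues(matrix, sweeps=50, tol=1e-18):
    """Eigenvalues of a symmetric matrix via cyclic Jacobi rotations, largest first."""
    a = [list(row) for row in matrix]
    n = len(a)
    for _ in range(sweeps):
        # Stop once the off-diagonal mass is negligible.
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if abs(a[p][q]) < 1e-15:
                    continue
                # Rotation angle that zeroes the (p, q) entry.
                theta = 0.5 * math.atan2(2.0 * a[p][q], a[q][q] - a[p][p])
                c, s = math.cos(theta), math.sin(theta)
                for k in range(n):  # rotate columns p and q
                    akp, akq = a[k][p], a[k][q]
                    a[k][p], a[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):  # rotate rows p and q
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * apk - s * aqk, s * apk + c * aqk
    return sorted((a[i][i] for i in range(n)), reverse=True)


def estimate_k(similarity):
    """Suggest k as the position of the largest gap in the sorted spectrum."""
    n = len(similarity)
    degree = [sum(row) for row in similarity]
    # Symmetric normalization D^(-1/2) S D^(-1/2); assumes positive row sums.
    normalized = [
        [similarity[i][j] / math.sqrt(degree[i] * degree[j]) for j in range(n)]
        for i in range(n)
    ]
    eig = symmetric_eigenvalues(normalized)
    gaps = [eig[i] - eig[i + 1] for i in range(n - 1)]
    return gaps.index(max(gaps)) + 1
```

For a similarity matrix with a few tight blocks and weak cross-block similarity, the leading eigenvalues stay close to 1 and the rest drop toward 0, so the largest gap falls after the block count. On noisy data several gaps may be comparable, which is one reason a range of k can be more useful than a single value.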

Relevance:

100.00% 100.00%

Publisher:

Abstract:

For many clustering algorithms, such as k-means, EM, and CLOPE, there is usually a requirement to set some parameters. Often, these parameters directly or indirectly control the number of clusters to return. In the presence of different data characteristics and analysis contexts, it is often difficult for the user to estimate the number of clusters in the data set. This is especially true in text collections such as Web documents, images or biological data. The fundamental question this paper addresses is: “How can we effectively estimate the natural number of clusters in a given text collection?” We propose to use spectral analysis, which analyzes the eigenvalues (not eigenvectors) of the collection, as the solution to the above. We first present the relationship between a text collection and its underlying spectra. We then show how the answer to this question enhances the clustering process. Finally, we conclude with empirical results and related work.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

This study examines attitudes of U.S.-based Academy of Marketing Science members toward teaching, research, participation in administration (including service), and academic promotional issues. Individuals were grouped using Ward’s and K-means clustering procedures, which revealed four groups—established academics, research-focused academics, less satisfied midcareer academics, and satisfied teachers. Clusters were further profiled according to the amount of time spent on teaching, research, and administration; research output; and individual demographic and institutional characteristics. Overall, clusters were generally dissatisfied with a range of work-related issues, with workload stress appearing as an issue that needs to be addressed within marketing academia.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

The Australasian tertiary education sector has undergone significant organizational and cultural changes, which have increased pressures on academics to undertake a range of additional activities while at the same time improving research performance. These pressures impact on individuals in different ways, although there may be some groups or clusters of individuals within institutions with common characteristics. Managers may need to develop different sets of management strategies and policies to assist each group of academics to deal better with these pressures and improve their individual performance. The paper examines Australasian marketing academics’ perceptions of their work environments and whether these perceptions result in differing clusters of individuals who might also vary based on their research performance, time allocated to different academic roles, and their professional and demographic characteristics. Sixty-eight members of the Australian and New Zealand Academy of Marketing responded to a survey using a modified version of an instrument developed by Diamantopoulos et al. (1992). A K-means clustering procedure identified four groups of academics – “Traditional Academics,” “Satisfied Professors,” “Newer Academics,” and “Satisfied Researchers.” While only a few significant differences among clusters were identified in relation to time allocated to academic activities and research performance, it appears that clusters differ on several professional and demographic characteristics.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

Network traffic classification is an essential component for network management and security systems. To address the limitations of traditional port-based and payload-based methods, recent studies have been focusing on alternative approaches. One promising direction is applying machine learning techniques to classify traffic flows based on packet and flow level statistics. In particular, previous papers have illustrated that clustering can achieve high accuracy and discover unknown application classes. In this work, we present a novel semi-supervised learning method using constrained clustering algorithms. The motivation is that in the network domain a lot of background information is available in addition to the data instances themselves. For example, we might know that flows ƒ1 and ƒ2 are using the same application protocol because they are visiting the same host address at the same port simultaneously. In this case, ƒ1 and ƒ2 should ideally be grouped into the same cluster. Therefore, we describe these correlations in the form of pair-wise must-link constraints and incorporate them in the process of clustering. We have applied three constrained variants of the K-Means algorithm, which perform hard or soft constraint satisfaction and metric learning from constraints. A number of real-world traffic traces have been used to show the availability of constraints and to test the proposed approach. The experimental results indicate that by incorporating constraints in the course of clustering, the overall accuracy and cluster purity can be significantly improved.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

Due to the limitations of the traditional port-based and payload-based traffic classification approaches, the past decade has seen extensive work on utilizing machine learning techniques to classify network traffic based on packet and flow level features. In particular, previous studies have shown that the unsupervised clustering approach is both accurate and capable of discovering previously unknown application classes. In this paper, we explore the utility of side information in the process of traffic clustering. Specifically, we focus on the flow correlation information that can be efficiently extracted from packet headers and expressed as instance-level constraints, which indicate that particular sets of flows are using the same application and thus should be put into the same cluster. To incorporate the constraints, we propose a modified constrained K-Means algorithm. A variety of real-world traffic traces are used to show that the constraints are widely available. The experimental results indicate that the constrained approach not only improves the quality of the resulting clusters, but also speeds up the convergence of the clustering process.
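One simple way to honor hard must-link constraints in K-Means is to merge linked flows into equivalence sets and assign each set to a centroid as a whole. The sketch below illustrates that idea only; it is not the modified algorithm proposed in the abstract. The function names, the tuple representation of flows, and the caller-supplied initial centroids are all assumptions made for the example.

```python
def must_link_groups(n, must_link):
    """Merge must-link pairs into equivalence sets using union-find."""
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for a, b in must_link:
        parent[find(a)] = find(b)
    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return list(groups.values())


def sq_dist(x, y):
    return sum((xi - yi) ** 2 for xi, yi in zip(x, y))


def constrained_kmeans(points, must_link, centroids, iterations=20):
    """K-Means in which every must-link set is assigned to one cluster."""
    groups = must_link_groups(len(points), must_link)
    centroids = [list(c) for c in centroids]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment: each set goes to the centroid with the lowest total cost.
        for group in groups:
            best = min(
                range(len(centroids)),
                key=lambda c: sum(sq_dist(points[i], centroids[c]) for i in group),
            )
            for i in group:
                labels[i] = best
        # Update: recompute centroids, keeping the old one for an empty cluster.
        new_centroids = []
        for c in range(len(centroids)):
            members = [points[i] for i in range(len(points)) if labels[i] == c]
            if members:
                new_centroids.append([sum(col) / len(members) for col in zip(*members)])
            else:
                new_centroids.append(centroids[c])
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return labels, centroids
```

Because linked flows move together, there are fewer assignment decisions per iteration than in plain K-Means. Soft-constraint and metric-learning variants trade this strictness for tolerance of noisy constraints.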

Relevance:

100.00% 100.00%

Publisher:

Abstract:

Background

Imatinib mesylate is currently the drug of choice to treat chronic myeloid leukemia. However, patient resistance and cytotoxicity make secondary lines of treatment, such as omacetaxine mepesuccinate, a necessity. Given that drug cytotoxicity represents a major problem during treatment, it is essential to understand the biological pathways affected to better predict poor drug response and prioritize a treatment regime.

Methods

We conducted cell viability and gene expression assays to determine heritability and gene expression changes associated with imatinib and omacetaxine treatment of 55 non-cancerous lymphoblastoid cell lines, derived from 17 pedigrees. In total, 48,803 transcripts derived from Illumina Human WG-6 BeadChips were analyzed for each sample using SOLAR, whilst correcting for kinship structure.

Results

Cytotoxicity within cell lines was highly heritable following imatinib treatment (h² = 0.60-0.73), but not omacetaxine treatment. Cell lines treated with an IC20 dose of imatinib or omacetaxine showed differential gene expression for 956 (1.96%) and 3,892 transcripts (7.97%), respectively; 395 of these (0.8%) were significantly influenced by both imatinib and omacetaxine treatment. K-means clustering and DAVID functional annotation showed expression changes in genes related to kinase binding and vacuole-related functions following imatinib treatment, whilst expression changes in genes related to cell division and apoptosis were evident following treatment with omacetaxine. The enrichment scores for these ontologies were very high (mostly >10).

Conclusions

Induction of gene expression changes related to different pathways following imatinib and omacetaxine treatment suggests that the cytotoxicity of such drugs may be differentially tolerated by individuals based on their genetic background.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

The thickness of the retinal nerve fiber layer (RNFL) has become a diagnostic measure for glaucoma assessment. To measure this thickness, accurate segmentation of the RNFL in optical coherence tomography (OCT) images is essential. Identifying a suitable segmentation algorithm will help improve the accuracy of RNFL thickness measurement. This paper investigates the performance of six algorithms in the segmentation of the RNFL in OCT images. The algorithms are normalised cuts, region growing, k-means clustering, active contour, and two level set segmentation methods: the Piecewise Gaussian Method (PGM) and the Kernelized Method (KM). The performance of the six algorithms is determined through a set of experiments on OCT retinal images. An experimental procedure is used to measure the performance of the tested algorithms. The measured segmentation precision-recall results of the six algorithms are compared and discussed.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

Stock price forecasting has long received special attention from investors and financial institutions. As stock prices are changeable over time and increasingly uncertain in modern financial markets, their forecasting becomes more important than ever before. A hybrid approach consisting of two components, a neural network and a fuzzy logic system, is proposed in this paper for stock price prediction. The first component of the hybrid, i.e. a feedforward neural network (FFNN), is used to select inputs that are highly relevant to the dependent variables. An interval type-2 fuzzy logic system (IT2 FLS) is employed as the second component of the hybrid forecasting method. The IT2 FLS’s parameters are initialized through deployment of the k-means clustering method and they are adjusted by the genetic algorithm. Experimental results demonstrate the efficiency of the FFNN input selection approach as it reduces the complexity and increases the accuracy of the forecasting models. In addition, IT2 FLS outperforms the widely used type-1 FLS and FFNN models in stock price forecasting. The combination of the FFNN and the IT2 FLS produces dominant forecasting accuracy compared to employing only the IT2 FLSs without the FFNN input selection.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

Recent years have seen extensive work on statistics-based network traffic classification using machine learning (ML) techniques. In the particular scenario of learning from unlabeled traffic data, some classic unsupervised clustering algorithms (e.g. K-Means and EM) have been applied, but the reported accuracy is unsatisfactorily low. This paper presents a novel approach for the task, which performs clustering based on Random Forest (RF) proximities instead of Euclidean distances. The approach consists of two steps. In the first step, we derive a proximity measure for each pair of data points by performing an RF classification on the original data and a set of synthetic data. In the next step, we perform a K-Medoids clustering to partition the data points into K groups based on the proximity matrix. Evaluations have been conducted on real-world Internet traffic traces and the experimental results indicate that the proposed approach is more accurate than the previous methods.
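The second step, K-Medoids on a precomputed proximity matrix, can be sketched as follows. The example assumes the proximity matrix is already available (the RF step is not reproduced), that proximities lie in [0, 1] and are converted to dissimilarities as 1 - proximity, and that the caller supplies the initial medoid indices. It uses the simple alternating assign/update scheme rather than any specific variant the paper may have used, and the function names are hypothetical.

```python
def proximity_to_dissimilarity(proximity):
    """Turn proximities in [0, 1] into dissimilarities, with a zero diagonal."""
    n = len(proximity)
    return [
        [0.0 if i == j else 1.0 - proximity[i][j] for j in range(n)]
        for i in range(n)
    ]


def k_medoids(dissimilarity, initial_medoids, iterations=100):
    """Alternating K-Medoids on a precomputed dissimilarity matrix."""
    n = len(dissimilarity)
    medoids = list(initial_medoids)
    labels = [0] * n
    for _ in range(iterations):
        # Assignment: each point joins its closest medoid.
        labels = [
            min(range(len(medoids)), key=lambda c: dissimilarity[i][medoids[c]])
            for i in range(n)
        ]
        # Update: the new medoid is the member with the lowest total
        # dissimilarity to the rest of its cluster.
        new_medoids = []
        for c in range(len(medoids)):
            members = [i for i in range(n) if labels[i] == c]
            if not members:
                new_medoids.append(medoids[c])
                continue
            new_medoids.append(
                min(members, key=lambda m: sum(dissimilarity[m][j] for j in members))
            )
        if new_medoids == medoids:
            break
        medoids = new_medoids
    return labels, medoids
```

Since medoids are actual data points, the procedure needs only pairwise dissimilarities, which is what makes it usable with proximities that have no underlying coordinate space. The result depends on the initial medoids, so several restarts are usually advisable.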

Relevance:

100.00% 100.00%

Publisher:

Abstract:

The growing self-organizing map (GSOM) has been introduced as an improvement to the self-organizing map (SOM) algorithm in clustering and knowledge discovery. Unlike the traditional SOM, the GSOM has a dynamic structure which allows nodes to grow, reflecting the knowledge discovered from the input data as learning progresses. The spread factor (SF) parameter in the GSOM can be used to control the spread of the map, giving an analyst the flexibility to examine the clusters at different granularities. Although the GSOM has been applied in various areas and has been proven effective in knowledge discovery tasks, no comprehensive study has been done on the effect of the spread factor value on cluster formation and separation. Therefore, the aim of this paper is to investigate the effect of the spread factor value on cluster separation in the GSOM. We used a simple k-means algorithm to identify clusters in the GSOM. Using the Davies–Bouldin index, clusters formed by different values of the spread factor are obtained and the resulting clusters are analyzed. In this work, we show that clusters can be more separated when the spread factor value is increased. Hierarchical clusters can then be constructed by mapping the GSOM clusters at different spread factor values.
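For reference, the Davies–Bouldin index averages, over clusters, the worst-case ratio of within-cluster scatter to between-centroid distance, so lower values indicate better separation. The sketch below is a minimal version with Euclidean distance and mean distance to the centroid as the scatter measure. The function name is hypothetical, the inputs stand in for GSOM node weight vectors and their k-means labels, and it assumes at least two clusters with distinct centroids.

```python
import math


def davies_bouldin(points, labels):
    """Davies-Bouldin index for a labeled set of points (lower is better)."""
    clusters = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)
    # Centroid and mean distance to centroid (scatter) for each cluster.
    centroids = {
        label: [sum(col) / len(members) for col in zip(*members)]
        for label, members in clusters.items()
    }
    scatter = {
        label: sum(math.dist(p, centroids[label]) for p in members) / len(members)
        for label, members in clusters.items()
    }
    keys = list(clusters)
    total = 0.0
    for i in keys:
        # Worst (largest) similarity ratio against any other cluster.
        total += max(
            (scatter[i] + scatter[j]) / math.dist(centroids[i], centroids[j])
            for j in keys
            if j != i
        )
    return total / len(keys)
```

Comparing the index across maps grown with different spread factor values gives one number per setting, which is the kind of comparison the abstract describes.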

Relevance:

100.00% 100.00%

Publisher:

Abstract:

This paper presents a comparison of different clustering algorithms applied to a point cloud constructed from the depth maps captured by an RGBD camera such as Microsoft Kinect. The depth sensor returns images in which each pixel represents the distance to its corresponding point rather than RGB data. This is considered the real novelty of the RGBD camera in computer vision compared to common video-based and stereo-based products. Depth sensors capture depth data without using markers, 2D-to-3D transition, or feature point determination. The 3D points of the captured depth map are then grouped into clusters to determine the different limbs of the human body. The 3D point clustering is achieved by different clustering techniques. Our experiments show good performance and results in using clustering to determine different human-body limbs.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

The reduction of size of ensemble classifiers is important for various security applications. The majority of known pruning algorithms belong to the following three categories: ranking based, clustering based, and optimization based methods. The present paper introduces and investigates a new pruning technique. It is called a Three-Level Pruning Technique, TLPT, because it simultaneously combines all three approaches in three levels of the process. This paper investigates the TLPT method combining the state-of-the-art ranking of the Ensemble Pruning via Individual Contribution ordering, EPIC, the clustering of the K-Means Pruning, KMP, and the optimisation method of Directed Hill Climbing Ensemble Pruning, DHCEP, for a phishing dataset. Our new experiments presented in this paper show that the TLPT is competitive in comparison to EPIC, KMP and DHCEP, and can achieve better outcomes. These experimental results demonstrate the effectiveness of the TLPT technique in this example of information security application.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

Statistics-based Internet traffic classification using machine learning techniques has attracted extensive research interest lately, because of the increasing ineffectiveness of traditional port-based and payload-based approaches. In particular, unsupervised learning, that is, traffic clustering, is very important in real-life applications, where labeled training data are difficult to obtain and new patterns keep emerging. Although previous studies have applied some classic clustering algorithms such as K-Means and EM for the task, the quality of resultant traffic clusters was far from satisfactory. In order to improve the accuracy of traffic clustering, we propose a constrained clustering scheme that makes decisions with consideration of some background information in addition to the observed traffic statistics. Specifically, we make use of equivalence set constraints indicating that particular sets of flows are using the same application layer protocols, which can be efficiently inferred from packet headers according to the background knowledge of TCP/IP networking. We model the observed data and constraints using Gaussian mixture density and adapt an approximate algorithm for the maximum likelihood estimation of model parameters. Moreover, we study the effects of unsupervised feature discretization on traffic clustering by using a fundamental binning method. A number of real-world Internet traffic traces have been used in our evaluation, and the results show that the proposed approach not only improves the quality of traffic clusters in terms of overall accuracy and per-class metrics, but also speeds up the convergence.
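The abstract does not say which binning method is used for feature discretization. As one plausible reading, the sketch below shows equal-width binning of a single flow feature; the function name and the choice of equal-width bins are assumptions.

```python
def equal_width_bins(values, n_bins):
    """Map each value to a bin index in [0, n_bins - 1] using equal-width bins."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant feature carries no information; put everything in bin 0.
        return [0] * len(values)
    width = (hi - lo) / n_bins
    # The maximum value would land in bin n_bins, so clamp it into the last bin.
    return [min(int((v - lo) / width), n_bins - 1) for v in values]
```

Each feature is discretized independently and without labels, which keeps the step unsupervised. Equal-frequency binning is a common alternative when feature values are heavily skewed, as flow sizes and durations often are.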