38 resultados para Elaborazione d’immagini, Microscopia, Istopatologia, Classificazione, K-means


Relevância:

100.00% 100.00%

Publicador:

Resumo:

Stock price forecast has long been received special attention of investors and financial institutions. As stock prices are changeable over time and increasingly uncertain in modern financial markets, their forecasting becomes more important than ever before. A hybrid approach consisting of two components, a neural network and a fuzzy logic system, is proposed in this paper for stock price prediction. The first component of the hybrid, i.e. a feedforward neural network (FFNN), is used to select inputs that are highly relevant to the dependent variables. An interval type-2 fuzzy logic system (IT2 FLS) is employed as the second component of the hybrid forecasting method. The IT2 FLS’s parameters are initialized through deployment of the k-means clustering method and they are adjusted by the genetic algorithm. Experimental results demonstrate the efficiency of the FFNN input selection approach as it reduces the complexity and increase the accuracy of the forecasting models. In addition, IT2 FLS outperforms the widely used type-1 FLS and FFNN models in stock price forecasting. The combination of the FFNN and the IT2 FLS produces dominant forecasting accuracy compared to employing only the IT2 FLSs without the FFNN input selection.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

The recent years have seen extensive work on statistics-based network traffic classification using machine learning (ML) techniques. In the particular scenario of learning from unlabeled traffic data, some classic unsupervised clustering algorithms (e.g. K-Means and EM) have been applied but the reported results are unsatisfactory in terms of low accuracy. This paper presents a novel approach for the task, which performs clustering based on Random Forest (RF) proximities instead of Euclidean distances. The approach consists of two steps. In the first step, we derive a proximity measure for each pair of data points by performing a RF classification on the original data and a set of synthetic data. In the next step, we perform a K-Medoids clustering to partition the data points into K groups based on the proximity matrix. Evaluations have been conducted on real-world Internet traffic traces and the experimental results indicate that the proposed approach is more accurate than the previous methods.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Growing self-organizing map (GSOM) has been introduced as an improvement to the self-organizing map (SOM) algorithm in clustering and knowledge discovery. Unlike the traditional SOM, GSOM has a dynamic structure which allows nodes to grow reflecting the knowledge discovered from the input data as learning progresses. The spread factor parameter (SF) in GSOM can be utilized to control the spread of the map, thus giving an analyst a flexibility to examine the clusters at different granularities. Although GSOM has been applied in various areas and has been proven effective in knowledge discovery tasks, no comprehensive study has been done on the effect of the spread factor parameter value to the cluster formation and separation. Therefore, the aim of this paper is to investigate the effect of the spread factor value towards cluster separation in the GSOM. We used simple k-means algorithm as a method to identify clusters in the GSOM. By using Davies–Bouldin index, clusters formed by different values of spread factor are obtained and the resulting clusters are analyzed. In this work, we show that clusters can be more separated when the spread factor value is increased. Hierarchical clusters can then be constructed by mapping the GSOM clusters at different spread factor values.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

This paper presents a comparison of applying different clustering algorithms on a point cloud constructed from the depth maps captured by a RGBD camera such as Microsoft Kinect. The depth sensor is capable of returning images, where each pixel represents the distance to its corresponding point not the RGB data. This is considered as the real novelty of the RGBD camera in computer vision compared to the common video-based and stereo-based products. Depth sensors captures depth data without using markers, 2D to 3D-transition or determining feature points. The captured depth map then cluster the 3D depth points into different clusters to determine the different limbs of the human-body. The 3D points clustering is achieved by different clustering techniques. Our Experiments show good performance and results in using clustering to determine different human-body limbs.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

The reduction of size of ensemble classifiers is important for various security applications. The majority of known pruning algorithms belong to the following three categories: ranking based, clustering based, and optimization based methods. The present paper introduces and investigates a new pruning technique. It is called a Three-Level Pruning Technique, TLPT, because it simultaneously combines all three approaches in three levels of the process. This paper investigates the TLPT method combining the state-of-the-art ranking of the Ensemble Pruning via Individual Contribution ordering, EPIC, the clustering of the K-Means Pruning, KMP, and the optimisation method of Directed Hill Climbing Ensemble Pruning, DHCEP, for a phishing dataset. Our new experiments presented in this paper show that the TLPT is competitive in comparison to EPIC, KMP and DHCEP, and can achieve better outcomes. These experimental results demonstrate the effectiveness of the TLPT technique in this example of information security application.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Statistics-based Internet traffic classification using machine learning techniques has attracted extensive research interest lately, because of the increasing ineffectiveness of traditional port-based and payload-based approaches. In particular, unsupervised learning, that is, traffic clustering, is very important in real-life applications, where labeled training data are difficult to obtain and new patterns keep emerging. Although previous studies have applied some classic clustering algorithms such as K-Means and EM for the task, the quality of resultant traffic clusters was far from satisfactory. In order to improve the accuracy of traffic clustering, we propose a constrained clustering scheme that makes decisions with consideration of some background information in addition to the observed traffic statistics. Specifically, we make use of equivalence set constraints indicating that particular sets of flows are using the same application layer protocols, which can be efficiently inferred from packet headers according to the background knowledge of TCP/IP networking. We model the observed data and constraints using Gaussian mixture density and adapt an approximate algorithm for the maximum likelihood estimation of model parameters. Moreover, we study the effects of unsupervised feature discretization on traffic clustering by using a fundamental binning method. A number of real-world Internet traffic traces have been used in our evaluation, and the results show that the proposed approach not only improves the quality of traffic clusters in terms of overall accuracy and per-class metrics, but also speeds up the convergence.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Autism Spectrum Disorder (ASD) is growing at a staggering rate, but, little is known about the cause of this condition. Inferring learning patterns from therapeutic performance data, and subsequently clustering ASD children into subgroups, is important to understand this domain, and more importantly to inform evidence-based intervention. However, this data-driven task was difficult in the past due to insufficiency of data to perform reliable analysis. For the first time, using data from a recent application for early intervention in autism (TOBY Play pad), whose download count is now exceeding 4500, we present in this paper the automatic discovery of learning patterns across 32 skills in sensory, imitation and language. We use unsupervised learning methods for this task, but a notorious problem with existing methods is the correct specification of number of patterns in advance, which in our case is even more difficult due to complexity of the data. To this end, we appeal to recent Bayesian nonparametric methods, in particular the use of Bayesian Nonparametric Factor Analysis. This model uses Indian Buffet Process (IBP) as prior on a binary matrix of infinite columns to allocate groups of intervention skills to children. The optimal number of learning patterns as well as subgroup assignments are inferred automatically from data. Our experimental results follow an exploratory approach, present different newly discovered learning patterns. To provide quantitative results, we also report the clustering evaluation against K-means and Nonnegative matrix factorization (NMF). In addition to the novelty of this new problem, we were able to demonstrate the suitability of Bayesian nonparametric models over parametric rivals.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Courvisanos J., Jain A. and Mardaneh K. Economic resilience of regions under crises: a study of the Australian economy, Regional Studies. Identifying patterns of economic resilience in regions by industry categories is the focus of this paper. Patterns emerge from adaptive capacity in four distinct functional groups of local government regions in Australia, in respect of their resilience from shocks on specific industries. A model of regional adaptive cycles around four sequential phases – reorganization, exploitation, conservation and release – is adopted as the framework for recognizing such patterns. A data-mining method utilizes a k-means algorithm to evaluate the impact of two major shocks – a 13-year drought and the Global Financial Crisis – on four functional groups of regions, using census data from 2001, 2006 and 2011.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

When no prior knowledge is available, clustering is a useful technique for categorizing data into meaningful groups or clusters. In this paper, a modified fuzzy min-max (MFMM) clustering neural network is proposed. Its efficacy for tackling power quality monitoring tasks is demonstrated. A literature review on various clustering techniques is first presented. To evaluate the proposed MFMM model, a performance comparison study using benchmark data sets pertaining to clustering problems is conducted. The results obtained are comparable with those reported in the literature. Then, a real-world case study on power quality monitoring tasks is performed. The results are compared with those from the fuzzy c-means and k-means clustering methods. The experimental outcome positively indicates the potential of MFMM in undertaking data clustering tasks and its applicability to the power systems domain.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

The users often have additional knowledge when Bayesian nonparametric models (BNP) are employed, e.g. for clustering there may be prior knowledge that some of the data instances should be in the same cluster (must-link constraint) or in different clusters (cannot-link constraint), and similarly for topic modeling some words should be grouped together or separately because of an underlying semantic. This can be achieved by imposing appropriate sampling probabilities based on such constraints. However, the traditional inference technique of BNP models via Gibbs sampling is time consuming and is not scalable for large data. Variational approximations are faster but many times they do not offer good solutions. Addressing this we present a small-variance asymptotic analysis of the MAP estimates of BNP models with constraints. We derive the objective function for Dirichlet process mixture model with constraints and devise a simple and efficient K-means type algorithm. We further extend the small-variance analysis to hierarchical BNP models with constraints and devise a similar simple objective function. Experiments on synthetic and real data sets demonstrate the efficiency and effectiveness of our algorithms.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Traffic subarea division is vital for traffic system management and traffic network analysis in intelligent transportation systems (ITSs). Since existing methods may not be suitable for big traffic data processing, this paper presents a MapReduce-based Parallel Three-Phase K -Means (Par3PKM) algorithm for solving traffic subarea division problem on a widely adopted Hadoop distributed computing platform. Specifically, we first modify the distance metric and initialization strategy of K -Means and then employ a MapReduce paradigm to redesign the optimized K -Means algorithm for parallel clustering of large-scale taxi trajectories. Moreover, we propose a boundary identifying method to connect the borders of clustering results for each cluster. Finally, we divide traffic subarea of Beijing based on real-world trajectory data sets generated by 12,000 taxis in a period of one month using the proposed approach. Experimental evaluation results indicate that when compared with K -Means, Par2PK-Means, and ParCLARA, Par3PKM achieves higher efficiency, more accuracy, and better scalability and can effectively divide traffic subarea with big taxi trajectory data.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Anecdotally, the fast pace at which the USA men's basketball team played at the 2008 Olympics was the main reason for their dominance, although there is no way of quantifying what a fast pace is or how it contributed to point differentials. The aim of this study was to examine the game-related statistics that discriminate between fast- and slow-paced games, as well as to identify key performance factors relating to point differentials. We analysed game-related statistics for each quarter of the eight games played by the USA using a k-means cluster analysis to classify game pace using ball possessions per game quarter. We then tested for differences in game statistics between slow- and fast-paced game quarters using analysis of variance and discriminant analysis. How differences in game-related statistics affected point differentials was examined using linear regression. The largest structure coefficient between game paces for the USA was for recovered balls (0.33, < 0.001). The biggest contributors to the point differences in games were recovered balls (16.9, P < 0.001) and field goals (22.2, P < 0.001). We conclude that when the USA play a fast-paced game, they are able to recover more balls from opponents that they then turn into effective field-goal shooting.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

For the operator of a power system, having an accurate forecast of the day-ahead load is imperative in order to guaranty the reliability of supply and also to minimize generation costs and pollution. Furthermore, in a restructured power system, other parties, like utility companies, large consumers and in some cases even ordinary consumers, can benefit from a higher quality demand forecast. In this paper, the application of smart meter data for producing more accurate load forecasts has been discussed. First an ordinary neural network model is used to generate a forecast for the total load of a number of consumers. The results of this step are used as a benchmark for comparison with the forecast results of a more sophisticated method. In this new method, using wavelet decomposition and a clustering technique called interactive k-means, the consumers are divided into a number of clusters. Then for each cluster an individual neural network is trained. Consequently, by adding the outputs of all of the neural networks, a forecast for the total load is generated. A comparison between the forecast using a single model and the forecast generated by the proposed method, proves that smart meter data can be used to significantly improve the quality of load forecast.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

The spectrum nature and heterogeneity within autism spectrum disorders (ASD) pose as a challenge for treatment. Personalisation of syllabus for children with ASD can improve the efficacy of learning by adjusting the number of opportunities and deciding the course of syllabus. We research the data-motivated approach in an attempt to disentangle this heterogeneity for personalisation of syllabus. With the help of technology and a structured syllabus, collecting data while a child with ASD masters the skills is made possible. The performance data collected are, however, growing and contain missing elements based on the pace and the course each child takes while navigating through the syllabus. Bayesian nonparametric methods are known for automatically discovering the number of latent components and their parameters when the model involves higher complexity. We propose a nonparametric Bayesian matrix factorisation model that discovers learning patterns and the way participants associate with them. Our model is built upon the linear Poisson gamma model (LPGM) with an Indian buffet process prior and extended to incorporate data with missing elements. In this paper, for the first time we have presented learning patterns deduced automatically from data mining and machine learning methods using intervention data recorded for over 500 children with ASD. We compare the results with non-negative matrix factorisation and K-means, which being parametric, not only require us to specify the number of learning patterns in advance, but also do not have a principle approach to deal with missing data. The F1 score observed over varying degree of similarity measure (Jaccard Index) suggests that LPGM yields the best outcome. By observing these patterns with additional knowledge regarding the syllabus it may be possible to observe the progress and dynamically modify the syllabus for improved learning.