373 results for "unsupervised"


Relevance:

10.00%

Abstract:

We propose a unified data modeling approach that is equally applicable to supervised regression and classification applications and to unsupervised probability density function estimation. A particle swarm optimization (PSO) aided orthogonal forward regression (OFR) algorithm based on leave-one-out (LOO) criteria is developed to construct parsimonious radial basis function (RBF) networks with tunable nodes. Each stage of the construction process determines the center vector and diagonal covariance matrix of one RBF node by minimizing the LOO statistics. For regression applications, the LOO criterion is the LOO mean square error, while the LOO misclassification rate is adopted for two-class classification applications. By adopting the Parzen window estimate as the desired response, the unsupervised density estimation problem is transformed into a constrained regression problem. This PSO aided OFR algorithm for tunable-node RBF networks is capable of constructing very parsimonious RBF models that generalize well, and our analysis and experimental results demonstrate that the algorithm is computationally even simpler than the efficient regularization assisted orthogonal least squares algorithm based on LOO criteria for selecting fixed-node RBF models. Another significant advantage of the proposed learning procedure is that it has no learning hyperparameters that must be tuned using costly cross-validation. The effectiveness of the proposed PSO aided OFR construction procedure is illustrated using several examples from regression, classification, and density estimation applications.
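
The LOO statistic above can be evaluated without refitting the model N times: for a linear-in-the-weights model, the leave-one-out residual equals the training residual divided by 1 - h_kk, where h_kk is the k-th diagonal entry of the hat matrix, and with orthogonalised regressors that diagonal is a simple running sum. The Python sketch below computes the LOO mean square error for a fixed set of regressor columns (for example, Gaussian RBF node outputs). It is an illustration under stated assumptions, not the authors' code: the function name `loo_mse_orthogonal` is hypothetical, the columns are assumed linearly independent, no regularization is applied, and the PSO search over each node's center and covariance is not shown.

```python
def loo_mse_orthogonal(columns, y):
    """LOO mean square error of the least-squares fit y ~ sum_i theta_i * p_i.

    columns: regressor columns p_i (each a list of length N), e.g. the outputs
    of the RBF nodes chosen so far (assumed linearly independent).
    """
    n = len(y)
    # Modified Gram-Schmidt: orthogonalise each new column against earlier ones.
    ortho = []
    for col in columns:
        w = list(col)
        for q in ortho:
            alpha = sum(a * b for a, b in zip(q, w)) / sum(v * v for v in q)
            w = [a - alpha * b for a, b in zip(w, q)]
        ortho.append(w)

    residual = list(y)  # training residual e_k
    eta = [1.0] * n     # eta_k = 1 - h_kk (one minus the hat-matrix diagonal)
    for w in ortho:
        ww = sum(v * v for v in w)
        g = sum(a * b for a, b in zip(w, y)) / ww  # weight in the orthogonal basis
        residual = [r - g * v for r, v in zip(residual, w)]
        eta = [e - v * v / ww for e, v in zip(eta, w)]

    # LOO residual is e_k / eta_k; average its square over all samples.
    return sum((r / e) ** 2 for r, e in zip(residual, eta)) / n
```

In a tunable-node construction, a search procedure such as PSO would call a function like this once per candidate node and keep the candidate with the smallest LOO error.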

Relevance:

10.00%

Abstract:

This paper derives an efficient algorithm for constructing sparse kernel density (SKD) estimates. The algorithm first selects a very small subset of significant kernels using an orthogonal forward regression (OFR) procedure based on the D-optimality experimental design criterion. The weights of the resulting sparse kernel model are then calculated using a modified multiplicative nonnegative quadratic programming algorithm. Unlike most SKD estimators, the proposed D-optimality regression approach is an unsupervised construction algorithm and does not require an empirical desired response for the kernel selection task. The strength of the D-optimality OFR stems from the fact that the algorithm automatically selects a small subset of the most significant kernels, related to the largest eigenvalues of the kernel design matrix, which account for most of the energy of the kernel training data; this also guarantees the most accurate kernel weight estimate. The proposed method is also computationally attractive in comparison with many existing SKD construction algorithms. Extensive numerical investigation demonstrates the ability of this regression-based approach to efficiently construct a very sparse kernel density estimate with excellent test accuracy, and our results show that the proposed method compares favourably with other existing sparse methods for constructing kernel density estimates, in terms of test accuracy, model sparsity and complexity.
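
The D-optimality kernel selection can be sketched greedily: after Gram-Schmidt orthogonalisation, det(W^T W) is the product of the column energies w_i^T w_i, so each step keeps the candidate kernel with the largest remaining energy. The Python sketch below assumes 1-D data, Gaussian kernels of a fixed, hand-chosen width, and hypothetical helper names (`gaussian_kernel_columns`, `d_optimal_select`). It covers only kernel selection, not the modified multiplicative nonnegative quadratic programming step that computes the weights.

```python
import math


def gaussian_kernel_columns(xs, width):
    """Candidate regressors: column j is a Gaussian kernel centred at xs[j],
    evaluated at every training point."""
    return [[math.exp(-(xk - xj) ** 2 / (2 * width ** 2)) for xk in xs] for xj in xs]


def d_optimal_select(columns, n_select):
    """Greedy D-optimality OFR. After orthogonalisation, det(W^T W) equals the
    product of the energies w_i^T w_i, so each step keeps the candidate whose
    orthogonalised column has the largest energy."""
    candidates = {j: list(c) for j, c in enumerate(columns)}
    selected = []
    for _ in range(n_select):
        best = max(candidates, key=lambda j: sum(v * v for v in candidates[j]))
        w = candidates.pop(best)
        ww = sum(v * v for v in w)
        selected.append(best)
        # Remove the chosen direction from every remaining candidate.
        for j in list(candidates):
            c = candidates[j]
            alpha = sum(a * b for a, b in zip(w, c)) / ww
            candidates[j] = [a - alpha * b for a, b in zip(c, w)]
    return selected
```

Kernels centred in dense regions have the most energy and are picked first; once one of them is chosen, its near-duplicates lose almost all of their residual energy, which keeps the selected subset small.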

Relevance:

10.00%

Abstract:

Light Detection And Ranging (LIDAR) is an important modality in terrain and land surveying for many environmental, engineering and civil applications. This paper presents the framework for a recently developed unsupervised classification algorithm called Skewness Balancing for separating object and ground points in airborne LIDAR data. The main advantages of the algorithm are that it requires no thresholds and is independent of LIDAR data format and resolution, while preserving object and terrain details. In this contribution, the Skewness Balancing framework is built around a prediction model that categorises unknown LIDAR tiles as "hilly" or "moderate" terrain. Cross-validation of the model gives an overall accuracy of 95%. An extension to the algorithm is developed to address over-classification in hilly terrain. For moderate terrain, the results show that in the classified tiles, detached objects (buildings and vegetation) and attached objects (bridges and motorway junctions) are separated from bare earth (ground, roads and yards), which makes Skewness Balancing well suited for integration into geographic information system (GIS) software packages.
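
A minimal sketch of the core loop, assuming the usual description of Skewness Balancing in which the highest remaining point is removed while the skewness of the height distribution is positive (on the premise that bare-earth heights are roughly symmetric and raised objects add a positive skew). The Python code below works on a single tile of heights, uses hypothetical function names, and omits the hilly/moderate prediction model and the hilly-terrain extension described in this abstract.

```python
def skewness(values):
    """Sample skewness m3 / m2**1.5 (0.0 for a constant sample)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return 0.0 if m2 == 0 else m3 / m2 ** 1.5


def skewness_balance(heights):
    """Split point heights into (ground, objects) by repeatedly removing the
    highest point while the height distribution is positively skewed."""
    remaining = sorted(heights)
    objects = []
    while len(remaining) > 2 and skewness(remaining) > 0:
        objects.append(remaining.pop())  # pop() removes the highest point
    return remaining, objects
```

For example, a symmetric set of ground heights plus two tall points returns the tall points as objects and stops once the remaining distribution is no longer positively skewed.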

Relevance:

10.00%

Abstract:

Airborne Light Detection And Ranging (LIDAR) provides accurate height information for objects on the earth, which has made LIDAR increasingly popular in terrain and land surveying. In particular, LIDAR data offer vital features for land-cover classification, which is an important task in many application domains. In this paper, an unsupervised approach based on an improved fuzzy Markov random field (FMRF) model is developed, in which the LIDAR data, its co-registered images acquired by optical sensors (an aerial color image and a near-infrared image), and other derived features are fused effectively to improve the ability of the LIDAR system to perform accurate land-cover classification. In the proposed FMRF model-based approach, spatial contextual information is exploited by modeling the image as a Markov random field (MRF), while fuzzy logic is introduced simultaneously to reduce the errors caused by hard classification. Moreover, a Lagrange-Multiplier (LM) algorithm is employed to calculate a maximum a posteriori (MAP) estimate for the classification. The experimental results show that fusing the height data and optical images is particularly suited to land-cover classification. The proposed approach works very well for classifying airborne LIDAR data fused with its co-registered optical images, and the average accuracy is improved to 88.9%.
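
MAP labelling of a Markov random field can be illustrated with iterated conditional modes (ICM), a standard optimiser that is much simpler than the Lagrange-multiplier algorithm used in this paper. The Python sketch below uses hard labels instead of fuzzy memberships, a Potts smoothness prior over 4-neighbours, and squared distances to known class means as the data term. The per-pixel feature vectors (for example fused height, colour and near-infrared values), the class means, the smoothing weight `beta` and the function name are all illustrative assumptions.

```python
def icm_segment(features, means, beta=1.0, n_iter=5):
    """features: 2D grid (list of rows) of per-pixel feature tuples.
    means: per-class mean feature tuples (assumed known here).
    Greedily minimises sum_p ||f_p - mu_{l_p}||^2 + beta * (#disagreeing 4-neighbours)."""
    rows, cols = len(features), len(features[0])
    classes = range(len(means))

    def unary(f, c):
        return sum((a - b) ** 2 for a, b in zip(f, means[c]))

    # Initialise each pixel with its nearest class mean (no spatial context).
    labels = [[min(classes, key=lambda c: unary(features[r][q], c))
               for q in range(cols)] for r in range(rows)]
    for _ in range(n_iter):
        for r in range(rows):
            for q in range(cols):
                nbrs = [labels[rr][qq]
                        for rr, qq in ((r - 1, q), (r + 1, q), (r, q - 1), (r, q + 1))
                        if 0 <= rr < rows and 0 <= qq < cols]
                # Pick the label with the lowest data cost plus Potts penalty.
                labels[r][q] = min(
                    classes,
                    key=lambda c: unary(features[r][q], c) + beta * sum(n != c for n in nbrs))
    return labels
```

With `beta=0` the result is a per-pixel nearest-mean classification; increasing `beta` lets neighbouring labels override isolated noisy pixels.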

Relevance:

10.00%

Abstract:

The Self-Organizing Map (SOM) is a popular unsupervised neural network that provides effective clustering and data visualization for multidimensional input datasets. In this paper, we apply the simulated annealing procedure to the SOM learning algorithm with the aim of obtaining faster learning and better performance in terms of quantization error. The proposed learning algorithm is called Fast Learning Self-Organized Map, and it preserves the simplicity of the basic learning algorithm of the standard SOM. The proposed learning algorithm also improves the quality of the resulting maps by providing better clustering quality and topology preservation of the multidimensional input data. Several experiments are used to compare the proposed approach with the original algorithm and with some of its modifications and speed-up techniques.
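
The abstract does not spell out the FLSOM annealing rule, so the Python sketch below shows only the standard online SOM update it builds on, with a learning rate and neighbourhood width that decay ("cool") exponentially, together with the quantization error used as the quality measure. The decay schedule, the 1-D map topology and the helper names are assumptions; this is not the FLSOM algorithm and it does not implement simulated-annealing acceptance steps.

```python
import math
import random


def _dist2(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def train_som_1d(data, n_nodes, n_epochs=50, lr0=0.5, seed=0):
    """Online SOM with a 1-D chain of nodes and exponentially decaying
    learning rate and neighbourhood width (assumed schedule)."""
    rng = random.Random(seed)
    dim = len(data[0])
    nodes = [[rng.random() for _ in range(dim)] for _ in range(n_nodes)]
    sigma0 = n_nodes / 2.0
    total = n_epochs * len(data)
    t = 0
    for _ in range(n_epochs):
        for x in rng.sample(data, len(data)):  # shuffled pass over the data
            frac = t / total
            lr = lr0 * math.exp(-3.0 * frac)
            sigma = sigma0 * math.exp(-3.0 * frac)
            bmu = min(range(n_nodes), key=lambda i: _dist2(nodes[i], x))
            for i in range(n_nodes):
                # Gaussian neighbourhood on the map grid around the best-matching unit.
                h = math.exp(-((i - bmu) ** 2) / (2.0 * sigma ** 2))
                nodes[i] = [w + lr * h * (xv - w) for w, xv in zip(nodes[i], x)]
            t += 1
    return nodes


def quantization_error(data, nodes):
    """Mean distance from each sample to its best-matching node."""
    return sum(min(math.sqrt(_dist2(n, x)) for n in nodes) for x in data) / len(data)
```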

Relevance:

10.00%

Abstract:

In recent years, the area of data mining has been experiencing considerable demand for technologies that extract knowledge from large and complex data sources. There has been substantial commercial interest as well as active research in the area aimed at developing new and improved approaches for extracting information, relationships, and patterns from large datasets. Artificial neural networks (NNs) are popular biologically inspired intelligent methodologies whose classification, prediction, and pattern recognition capabilities have been used successfully in many areas, including science, engineering, medicine, business, banking, telecommunication, and many other fields. This paper highlights, from a data mining perspective, the implementation of NNs using supervised and unsupervised learning for pattern recognition, classification, prediction, and cluster analysis, and focuses the discussion on their usage in bioinformatics and financial data analysis tasks. © 2012 Wiley Periodicals, Inc.

Relevance:

10.00%

Abstract:

Spontaneous activity of the brain at rest has frequently been considered a mere backdrop to the salient activity evoked by external stimuli or tasks. However, the resting state of the brain consumes most of its energy budget, which suggests a far more important role. An intriguing hint comes from experimental observations of spontaneous activity patterns that closely resemble those evoked by visual stimulation with oriented gratings, except that the cortex appeared to cycle between different orientation maps. Moreover, patterns similar to those evoked by the behaviorally most relevant horizontal and vertical orientations occurred more often than those corresponding to oblique angles. We hypothesize that this kind of spontaneous activity develops, at least to some degree, autonomously, providing a dynamical reservoir of cortical states that are then associated with visual stimuli through learning. To test this hypothesis, we use a biologically inspired neural mass model to simulate a patch of cat visual cortex. Spontaneous transitions between orientation states were induced by modest modifications of the neural connectivity, establishing a stable heteroclinic channel. Significantly, the experimentally observed greater frequency of states representing the behaviorally important horizontal and vertical orientations emerged spontaneously from these simulations. We then applied bar-shaped inputs to the model cortex and used Hebbian learning rules to modify the corresponding synaptic strengths. After unsupervised learning, different bar inputs reliably and exclusively evoked their associated orientation state, whereas in the absence of input the model cortex resumed its spontaneous cycling. We conclude that the experimentally observed similarities between spontaneous and evoked activity in visual cortex can be explained as the outcome of a learning process that associates external stimuli with a preexisting reservoir of autonomous neural activity states.
Our findings hence demonstrate how cortical connectivity can link the maintenance of spontaneous activity in the brain mechanistically to its core cognitive functions.
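
The association step can be illustrated with a plain Hebbian rule, Δw_ij = η·y_i·x_j, which strengthens connections between co-active inputs and states. In the Python sketch below, the model cortex's orientation states are replaced by fixed one-hot output vectors and the bar stimuli by binary input vectors; the neural mass dynamics and heteroclinic switching are not modelled, the row normalisation that keeps weights bounded is an added assumption, and the function names are hypothetical.

```python
import math


def hebbian_train(pairs, n_in, n_out, lr=0.1, n_epochs=10):
    """pairs: list of (input vector x, output activity vector y).
    Applies dW_ij = lr * y_i * x_j, then normalises each row to unit length."""
    W = [[0.0] * n_in for _ in range(n_out)]
    for _ in range(n_epochs):
        for x, y in pairs:
            for i in range(n_out):
                for j in range(n_in):
                    W[i][j] += lr * y[i] * x[j]
                norm = math.sqrt(sum(w * w for w in W[i]))
                if norm > 0:
                    W[i] = [w / norm for w in W[i]]
    return W


def recall(W, x):
    """Index of the output state most strongly driven by input x."""
    scores = [sum(w * xi for w, xi in zip(row, x)) for row in W]
    return max(range(len(scores)), key=scores.__getitem__)
```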

Relevance:

10.00%

Abstract:

Background: In many experimental pipelines, clustering of multidimensional biological datasets is used to detect hidden structures in unlabelled input data. Taverna is a popular workflow management system that is used to design and execute scientific workflows and to aid in silico experimentation. The availability of fast unsupervised methods for clustering and visualization in the Taverna platform is important to support data-driven scientific discovery in complex and exploratory bioinformatics applications. Results: This work presents a Taverna plugin, the Biological Data Interactive Clustering Explorer (BioDICE), that performs clustering of high-dimensional biological data and provides a nonlinear, topology-preserving projection for visualizing the input data and their similarities. The core algorithm in the BioDICE plugin is the Fast Learning Self Organizing Map (FLSOM), an improved variant of the Self Organizing Map (SOM) algorithm. The plugin generates an interactive 2D map that allows visual exploration of multidimensional data and the identification of groups of similar objects. The effectiveness of the plugin is demonstrated on a case study involving chemical compounds. Conclusions: The number and variety of available tools, together with its extensibility, have made Taverna a popular choice for developing scientific data workflows. This work presents a novel plugin, BioDICE, which adds a data-driven knowledge discovery component to Taverna. BioDICE provides an effective and powerful clustering tool that can be adopted for the exploratory analysis of biological datasets.

Relevance:

10.00%

Abstract:

We present a method for the recognition of complex actions. Our method combines automatic learning of simple actions and manual definition of complex actions in a single grammar. Contrary to the general trend in complex action recognition, which divides recognition into two stages, our method recognizes simple and complex actions in a unified way. This is achieved by encoding simple-action HMMs within the stochastic grammar that models complex actions. This unified approach lets the higher activity layers influence the recognition of simple actions more effectively, which leads to a substantial improvement in the classification of complex actions. We consider the recognition of complex actions based on person transits between areas in the scene. As input, our method receives track crossings along a set of zones, which are derived using unsupervised learning of the movement patterns of the objects in the scene. We evaluate our method on a large dataset showing normal, suspicious and threatening behaviour in a parking lot. Experiments show an improvement of ~30% in the recognition of both high-level scenarios and their constituent simple actions with respect to a two-stage approach. Experiments with synthetic noise simulating the most common tracking failures show that our method experiences only a limited decrease in performance when moderate amounts of noise are added.

Relevance:

10.00%

Abstract:

Clustering quality or validation indices allow the quality of a clustering to be evaluated, in order to support the selection of a specific partition or clustering structure in its natural unsupervised setting, where the real solution is unknown or unavailable. In this paper, we investigate the use of quality indices based mostly on the concepts of cluster compactness and separation for evaluating clustering results (partitions in particular). This work aims to offer a general perspective on the appropriate use of quality indices for clustering evaluation. After presenting some commonly used indices, as well as indices recently proposed in the literature, we address key issues in the practical use of quality indices. A general methodological approach is presented that considers the identification of appropriate index thresholds. This general approach is compared with the simple use of quality indices for evaluating a clustering solution.
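
As one example of an index built from compactness and separation, the Dunn index divides the smallest distance between points in different clusters by the largest cluster diameter, so compact, well-separated partitions score higher. The Python sketch below is illustrative only: the Dunn index is not necessarily one of the indices studied in this paper, the function name is hypothetical, and the threshold-identification methodology is not included.

```python
import math
from itertools import combinations


def dunn_index(clusters):
    """clusters: list of clusters, each a list of point tuples (at least two clusters).
    Returns min inter-cluster point distance / max intra-cluster diameter."""
    min_sep = min(math.dist(p, q)
                  for a, b in combinations(clusters, 2) for p in a for q in b)
    max_diam = max((math.dist(p, q) for c in clusters for p, q in combinations(c, 2)),
                   default=0.0)
    return math.inf if max_diam == 0 else min_sep / max_diam
```

For the two partitions of the same four points below, the natural split into two vertical pairs scores 5.0, whereas the split into two horizontal pairs scores 0.2.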

Relevance:

10.00%

Abstract:

Background: The home management of malaria (HMM) strategy improves early access to antimalarial medicines for high-risk groups in remote areas of sub-Saharan Africa. However, limited data are available on the effectiveness of artemisinin-based combination therapy (ACT) within the HMM strategy. The aim of this study was to assess the effectiveness of artemether-lumefantrine (AL), presently the most favoured ACT in Africa, in under-five children with uncomplicated Plasmodium falciparum malaria in Tanzania, when provided by community health workers (CHWs) and administered unsupervised by parents or guardians at home. Methods: An open-label, single-arm prospective study was conducted in two rural villages with high malaria transmission in Kibaha District, Tanzania. Children presenting to CHWs with uncomplicated fever and a positive rapid malaria diagnostic test (RDT) were provisionally enrolled and provided with AL for unsupervised treatment at home. Patients with microscopy-confirmed P. falciparum parasitaemia were definitively enrolled and reviewed weekly by the CHWs for 42 days. The primary outcome measure was the PCR-corrected parasitological cure rate by day 42, as estimated by Kaplan-Meier survival analysis. This trial is registered with ClinicalTrials.gov, number NCT00454961. Results: A total of 244 febrile children were enrolled between March and August 2007. Two patients were lost to follow-up on day 14, and one patient withdrew consent on day 21. Some 141/241 (58.5%) patients had recurrent infection during follow-up, of whom 14 had recrudescence. The PCR-corrected cure rate by day 42 was 93.0% (95% CI 88.3%-95.9%). The median lumefantrine concentration was statistically significantly lower in patients with recrudescence (97 ng/mL [IQR 0-234]; n = 10) than in those with reinfection (205 ng/mL [114-390]; n = 92) or no parasite reappearance (217 ng/mL [121-374]; n = 70; p <= 0.046).
Conclusions: Provision of AL by CHWs for unsupervised malaria treatment at home was highly effective, which provides an evidence base for scaling up the implementation of HMM with AL in Tanzania.

Relevance:

10.00%

Abstract:

In this paper, artificial neural networks (ANNs) based on supervised and unsupervised algorithms were investigated for studying the rheological parameters of solid pharmaceutical excipients, in order to develop computational tools for manufacturing solid dosage forms. Among the four supervised neural networks investigated, the best learning performance was achieved by a feedforward multilayer perceptron whose architecture consisted of eight neurons in the input layer, sixteen neurons in the hidden layer and one neuron in the output layer. Learning and predictive performance was poor for the angle of repose, whereas the Carr index and Hausner ratio (CI and HR, respectively) showed very good fitting capacity and learning; therefore, HR and CI were considered suitable descriptors for the next stage of development of supervised ANNs. Clustering capacity was evaluated for five unsupervised strategies. Networks based on purely unsupervised competitive strategies, namely classic "Winner-Take-All", "Frequency-Sensitive Competitive Learning" and "Rival-Penalize Competitive Learning" (WTA, FSCL and RPCL, respectively), were able to cluster the database; however, this classification was very poor, with severe errors that grouped data with conflicting properties into the same cluster or even the same neuron. Moreover, the criteria adopted by these networks for their clustering could not be established. Self-Organizing Maps (SOM) and Neural Gas (NG) networks showed better clustering capacity. Both recognized the two major groupings of data, corresponding to lactose (LAC) and cellulose (CEL). However, the SOM made some errors in classifying data from the minority excipients magnesium stearate (EMG), talc (TLC) and attapulgite (ATP). The NG network, in turn, classified the data very consistently and resolved the misclassifications of the SOM, making it the most appropriate network for classifying the data in this study.
The use of NG networks in pharmaceutical technology had not previously been published. NG therefore has great potential for developing software for automated classification systems for pharmaceutical powders and as a new tool for data mining and clustering in drug development.
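
Frequency-Sensitive Competitive Learning differs from plain Winner-Take-All only in how the winner is chosen: each unit's squared distance is multiplied by the number of times that unit has already won, which discourages dead units. The Python sketch below implements that rule with a fixed learning rate; the parameter values and the function name are assumptions, setting `frequency_sensitive=False` reduces it to classic WTA, and RPCL's rival-penalisation step is not shown.

```python
import random


def competitive_learning(data, n_units, lr=0.1, n_epochs=20, seed=0,
                         frequency_sensitive=True):
    """FSCL (or plain WTA when frequency_sensitive=False) on a list of point tuples."""
    rng = random.Random(seed)
    weights = [list(p) for p in rng.sample(data, n_units)]  # start on data points
    wins = [1] * n_units
    for _ in range(n_epochs):
        for x in data:
            def cost(i):
                d = sum((a - b) ** 2 for a, b in zip(x, weights[i]))
                # FSCL scales the distance by the unit's win count; WTA uses it as is.
                return wins[i] * d if frequency_sensitive else d
            k = min(range(n_units), key=cost)
            wins[k] += 1
            # Move only the winning unit towards the sample.
            weights[k] = [w + lr * (a - w) for w, a in zip(weights[k], x)]
    return weights
```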

Relevance:

10.00%

Abstract:

Image segmentation is one of the image processing problems that deserve special attention from the scientific community. This work studies unsupervised methods for clustering and pattern recognition applicable to medical image segmentation. Methods based on Natural Computing have proven very attractive for such tasks and are studied here to verify their applicability to medical image segmentation. The following methods are implemented: GKA (Genetic K-means Algorithm), GFCMA (Genetic FCM Algorithm), PSOKA (PSO and K-means based Clustering Algorithm) and PSOFCM (PSO and FCM based Clustering Algorithm). To evaluate the results produced by the algorithms, clustering validity indexes are used as quantitative measures. Visual and qualitative evaluations are also performed, mainly using data from the BrainWeb brain simulator as ground truth.
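
One common way to combine PSO and K-means for clustering, assumed here because the abstract does not describe the PSOKA details, is to let PSO search over candidate centroid sets using the within-cluster sum of squared errors (SSE) as fitness and then refine the best set with a few K-means (Lloyd) iterations. The Python sketch below works on 1-D intensity values (for example grey levels); the inertia and acceleration constants, swarm size and helper names are illustrative choices.

```python
import random


def sse(centers, data):
    """Within-cluster sum of squared errors, assigning each value to its nearest center."""
    return sum(min((x - c) ** 2 for c in centers) for x in data)


def pso_cluster(data, k, n_particles=10, n_iter=50, w=0.7, c1=1.5, c2=1.5, seed=0):
    """Each particle is a list of k centers; fitness is the SSE (lower is better)."""
    rng = random.Random(seed)
    lo, hi = min(data), max(data)
    pos = [[rng.uniform(lo, hi) for _ in range(k)] for _ in range(n_particles)]
    vel = [[0.0] * k for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_f = [sse(p, data) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_f[i])
    gbest, gbest_f = pbest[g][:], pbest_f[g]
    for _ in range(n_iter):
        for i in range(n_particles):
            for d in range(k):
                r1, r2 = rng.random(), rng.random()
                # Inertia + pull towards personal best + pull towards global best.
                vel[i][d] = (w * vel[i][d] + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                pos[i][d] += vel[i][d]
            f = sse(pos[i], data)
            if f < pbest_f[i]:
                pbest[i], pbest_f[i] = pos[i][:], f
                if f < gbest_f:
                    gbest, gbest_f = pos[i][:], f
    return gbest


def kmeans_refine(centers, data, n_iter=10):
    """Lloyd iterations starting from the given centers; empty clusters keep their center."""
    centers = list(centers)
    for _ in range(n_iter):
        groups = [[] for _ in centers]
        for x in data:
            groups[min(range(len(centers)), key=lambda j: (x - centers[j]) ** 2)].append(x)
        centers = [sum(g) / len(g) if g else c for g, c in zip(groups, centers)]
    return sorted(centers)
```

Because each Lloyd iteration cannot increase the SSE, the refinement step never makes the PSO result worse.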

Relevance:

10.00%

Abstract:

Machine learning techniques are applied to classification tasks to acquire knowledge from a set of data or information. Some learning methods proposed in the literature are based on semi-supervised learning, which combines a small percentage of labeled data (supervised learning) with a quantity of unlabeled examples (unsupervised learning) during the training phase, thereby reducing the need for a large number of labeled instances when only a small labeled dataset is available for training. A common problem in semi-supervised learning is the random selection of instances: most papers use a random selection technique, which can have a negative impact. Most machine learning methods address single-label problems, that is, problems in which each instance is associated with a single class; however, many domains require data to be classified into more than one class, which is called multi-label classification. This work presents an experimental analysis of the results obtained by applying semi-supervised learning to multi-label classification problems, using a reliability parameter to aid in classifying the data. The combination of semi-supervised learning techniques and multi-label classification methods was essential to obtaining these results.

Relevance:

10.00%

Abstract:

In this work, we propose a two-stage algorithm for real-time fault detection and identification in industrial plants. Our proposal is based on the analysis of selected features using recursive density estimation and a new evolving classifier algorithm. More specifically, the proposed approach for the detection stage is based on the concept of density in the data space, which is not the same as a probability density function but is a very useful measure for detecting abnormalities and outliers. This density can be expressed by a Cauchy function and calculated recursively, which makes it efficient in memory and computational power and therefore suitable for on-line applications. The identification/diagnosis stage is based on a self-developing (evolving) fuzzy rule-based classifier system proposed in this work, called AutoClass. An important property of AutoClass is that it can start learning "from scratch". Neither the fuzzy rules nor the number of classes need to be prespecified for AutoClass (the number may grow, with new class labels being added by the on-line learning process), and learning proceeds in a fully unsupervised manner. If an initial rule base exists, AutoClass can evolve it further based on newly arrived faulty-state data. To validate our proposal, we present experimental results from a didactic level-control process, where control and error signals are used as features for the fault detection and identification systems. The approach is nevertheless generic, and the number of features can be large, because the methodology is computationally lean: it requires neither covariance or other more complex calculations nor storage of old data. The obtained results are significantly better than those of the traditional approaches used for comparison.
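
A common form of such a recursive, Cauchy-type density, assumed here, is D(x_k) = 1 / (1 + ||x_k - μ_k||² + Σ_k - ||μ_k||²), where μ_k is the running mean of the samples and Σ_k is the running mean of their squared norms. This equals 1 / (1 + mean squared distance from x_k to all samples seen so far), yet only two running statistics need to be stored. The Python sketch below follows that formulation; the class name is hypothetical, and any alarm threshold on the density would be an application-specific assumption.

```python
class RecursiveDensity:
    """Recursive Cauchy-type density of each new sample w.r.t. all samples so far."""

    def __init__(self):
        self.k = 0
        self.mean = None          # running mean mu_k
        self.mean_sq_norm = 0.0   # running mean of ||x_i||^2 (Sigma_k)

    def update(self, x):
        """Add sample x and return its density D(x_k) in (0, 1]."""
        self.k += 1
        k = self.k
        if self.mean is None:
            self.mean = list(x)
        else:
            self.mean = [((k - 1) * m + xi) / k for m, xi in zip(self.mean, x)]
        self.mean_sq_norm = ((k - 1) * self.mean_sq_norm + sum(v * v for v in x)) / k
        dist2 = sum((xi - m) ** 2 for xi, m in zip(x, self.mean))
        mu2 = sum(m * m for m in self.mean)
        return 1.0 / (1.0 + dist2 + self.mean_sq_norm - mu2)
```

A sample far from the data seen so far receives a density close to 0, which is the kind of drop a detection stage could flag as a possible fault.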