875 resultados para alberi, decisione, apprendimento, ensemble, learning, machine
Resumo:
Due to the increased incidence of skin cancer, computational methods based on intelligent approaches have been developed to aid dermatologists in the diagnosis of skin lesions. This paper proposes a method to classify texture in images, since it is an important feature for the successfully identification of skin lesions. For this is defined a feature vector, with the fractal dimension of images through the box-counting method (BCM), which is used with a SVM to classify the texture of the lesions in to non-irregular or irregular. With the proposed solution, we could obtain an accuracy of 72.84%. © 2012 AISTI.
Resumo:
Semi-supervised learning is applied to classification problems where only a small portion of the data items is labeled. In these cases, the reliability of the labels is a crucial factor, because mislabeled items may propagate wrong labels to a large portion or even the entire data set. This paper aims to address this problem by presenting a graph-based (network-based) semi-supervised learning method, specifically designed to handle data sets with mislabeled samples. The method uses teams of walking particles, with competitive and cooperative behavior, for label propagation in the network constructed from the input data set. The proposed model is nature-inspired and it incorporates some features to make it robust to a considerable amount of mislabeled data items. Computer simulations show the performance of the method in the presence of different percentage of mislabeled data, in networks of different sizes and average node degree. Importantly, these simulations reveals the existence of the critical points of the mislabeled subset size, below which the network is free of wrong label contamination, but above which the mislabeled samples start to propagate their labels to the rest of the network. Moreover, numerical comparisons have been made among the proposed method and other representative graph-based semi-supervised learning methods using both artificial and real-world data sets. Interestingly, the proposed method has increasing better performance than the others as the percentage of mislabeled samples is getting larger. © 2012 IEEE.
Resumo:
This study aim to verify the use of learning strategies in students of the elementary level presenting interdisciplinary diagnosis of attention dei cit hyperactivity disorder (ADHD). Nine students, male gender, attending 3rd to 9th grade level of the elementary level, average age 10 years and 7 months, presenting interdisciplinary diagnosis of attention dei cit hyperactivity disorder (ADHD). h e students were submitted to the application of the Evaluation of Learning Strategies from elementary level – EAVAP-EF – scale, which aimed to evaluate the strategies reported and used by students in situation of study and learning, as follows: cognitive strategies, metacognitive strategies and absence of dysfunctional metacognitive strategies. h e general result at EAVAP-EF scale, showed that students with ADHD reached the percentile 25%, considered as low performance in the use of the learning strategies. For the variable absence of dysfunctional metacognitive strategies, the students presented percentile 30%, percentile 25% for cognitive strategies and 55% for metacognitive strategies. h e results showed that ADHD students do not use ef ectively the learning cognitive and metacognitive strategies and present the use of dysfunctional metacognitive strategies. h ese alterations match with the framework of ADHD because the entry of information, either visual or auditory, showed alterations, derived from inattention, which af ected the learning in classroom situation.
Resumo:
Pós-graduação em Engenharia Mecânica - FEG
Resumo:
Both Semi-Supervised Leaning and Active Learning are techniques used when unlabeled data is abundant, but the process of labeling them is expensive and/or time consuming. In this paper, those two machine learning techniques are combined into a single nature-inspired method. It features particles walking on a network built from the data set, using a unique random-greedy rule to select neighbors to visit. The particles, which have both competitive and cooperative behavior, are created on the network as the result of label queries. They may be created as the algorithm executes and only nodes affected by the new particles have to be updated. Therefore, it saves execution time compared to traditional active learning frameworks, in which the learning algorithm has to be executed several times. The data items to be queried are select based on information extracted from the nodes and particles temporal dynamics. Two different rules for queries are explored in this paper, one of them is based on querying by uncertainty approaches and the other is based on data and labeled nodes distribution. Each of them may perform better than the other according to some data sets peculiarities. Experimental results on some real-world data sets are provided, and the proposed method outperforms the semi-supervised learning method, from which it is derived, in all of them.
Resumo:
Concept drift, which refers to non stationary learning problems over time, has increasing importance in machine learning and data mining. Many concept drift applications require fast response, which means an algorithm must always be (re)trained with the latest available data. But the process of data labeling is usually expensive and/or time consuming when compared to acquisition of unlabeled data, thus usually only a small fraction of the incoming data may be effectively labeled. Semi-supervised learning methods may help in this scenario, as they use both labeled and unlabeled data in the training process. However, most of them are based on assumptions that the data is static. Therefore, semi-supervised learning with concept drifts is still an open challenging task in machine learning. Recently, a particle competition and cooperation approach has been developed to realize graph-based semi-supervised learning from static data. We have extend that approach to handle data streams and concept drift. The result is a passive algorithm which uses a single classifier approach, naturally adapted to concept changes without any explicit drift detection mechanism. It has built-in mechanisms that provide a natural way of learning from new data, gradually "forgetting" older knowledge as older data items are no longer useful for the classification of newer data items. The proposed algorithm is applied to the KDD Cup 1999 Data of network intrusion, showing its effectiveness.
Resumo:
In the pattern recognition research field, Support Vector Machines (SVM) have been an effectiveness tool for classification purposes, being successively employed in many applications. The SVM input data is transformed into a high dimensional space using some kernel functions where linear separation is more likely. However, there are some computational drawbacks associated to SVM. One of them is the computational burden required to find out the more adequate parameters for the kernel mapping considering each non-linearly separable input data space, which reflects the performance of SVM. This paper introduces the Polynomial Powers of Sigmoid for SVM kernel mapping, and it shows their advantages over well-known kernel functions using real and synthetic datasets.
Resumo:
In active learning, a machine learning algorithmis given an unlabeled set of examples U, and is allowed to request labels for a relatively small subset of U to use for training. The goal is then to judiciously choose which examples in U to have labeled in order to optimize some performance criterion, e.g. classification accuracy. We study how active learning affects AUC. We examine two existing algorithms from the literature and present our own active learning algorithms designed to maximize the AUC of the hypothesis. One of our algorithms was consistently the top performer, and Closest Sampling from the literature often came in second behind it. When good posterior probability estimates were available, our heuristics were by far the best.
Resumo:
We explore the problem of budgeted machine learning, in which the learning algorithm has free access to the training examples’ labels but has to pay for each attribute that is specified. This learning model is appropriate in many areas, including medical applications. We present new algorithms for choosing which attributes to purchase of which examples in the budgeted learning model based on algorithms for the multi-armed bandit problem. All of our approaches outperformed the current state of the art. Furthermore, we present a new means for selecting an example to purchase after the attribute is selected, instead of selecting an example uniformly at random, which is typically done. Our new example selection method improved performance of all the algorithms we tested, both ours and those in the literature.
Resumo:
The multiple-instance learning (MIL) model has been successful in areas such as drug discovery and content-based image-retrieval. Recently, this model was generalized and a corresponding kernel was introduced to learn generalized MIL concepts with a support vector machine. While this kernel enjoyed empirical success, it has limitations in its representation. We extend this kernel by enriching its representation and empirically evaluate our new kernel on data from content-based image retrieval, biological sequence analysis, and drug discovery. We found that our new kernel generalized noticeably better than the old one in content-based image retrieval and biological sequence analysis and was slightly better or even with the old kernel in the other applications, showing that an SVM using this kernel does not overfit despite its richer representation.
Resumo:
Semi-supervised learning is one of the important topics in machine learning, concerning with pattern classification where only a small subset of data is labeled. In this paper, a new network-based (or graph-based) semi-supervised classification model is proposed. It employs a combined random-greedy walk of particles, with competition and cooperation mechanisms, to propagate class labels to the whole network. Due to the competition mechanism, the proposed model has a local label spreading fashion, i.e., each particle only visits a portion of nodes potentially belonging to it, while it is not allowed to visit those nodes definitely occupied by particles of other classes. In this way, a "divide-and-conquer" effect is naturally embedded in the model. As a result, the proposed model can achieve a good classification rate while exhibiting low computational complexity order in comparison to other network-based semi-supervised algorithms. Computer simulations carried out for synthetic and real-world data sets provide a numeric quantification of the performance of the method.
Resumo:
Semi-supervised learning techniques have gained increasing attention in the machine learning community, as a result of two main factors: (1) the available data is exponentially increasing; (2) the task of data labeling is cumbersome and expensive, involving human experts in the process. In this paper, we propose a network-based semi-supervised learning method inspired by the modularity greedy algorithm, which was originally applied for unsupervised learning. Changes have been made in the process of modularity maximization in a way to adapt the model to propagate labels throughout the network. Furthermore, a network reduction technique is introduced, as well as an extensive analysis of its impact on the network. Computer simulations are performed for artificial and real-world databases, providing a numerical quantitative basis for the performance of the proposed method.
Resumo:
Competitive learning is an important machine learning approach which is widely employed in artificial neural networks. In this paper, we present a rigorous definition of a new type of competitive learning scheme realized on large-scale networks. The model consists of several particles walking within the network and competing with each other to occupy as many nodes as possible, while attempting to reject intruder particles. The particle's walking rule is composed of a stochastic combination of random and preferential movements. The model has been applied to solve community detection and data clustering problems. Computer simulations reveal that the proposed technique presents high precision of community and cluster detections, as well as low computational complexity. Moreover, we have developed an efficient method for estimating the most likely number of clusters by using an evaluator index that monitors the information generated by the competition process itself. We hope this paper will provide an alternative way to the study of competitive learning.
Resumo:
Semisupervised learning is a machine learning approach that is able to employ both labeled and unlabeled samples in the training process. In this paper, we propose a semisupervised data classification model based on a combined random-preferential walk of particles in a network (graph) constructed from the input dataset. The particles of the same class cooperate among themselves, while the particles of different classes compete with each other to propagate class labels to the whole network. A rigorous model definition is provided via a nonlinear stochastic dynamical system and a mathematical analysis of its behavior is carried out. A numerical validation presented in this paper confirms the theoretical predictions. An interesting feature brought by the competitive-cooperative mechanism is that the proposed model can achieve good classification rates while exhibiting low computational complexity order in comparison to other network-based semisupervised algorithms. Computer simulations conducted on synthetic and real-world datasets reveal the effectiveness of the model.
Resumo:
In the collective imaginaries a robot is a human like machine as any androids in science fiction. However the type of robots that you will encounter most frequently are machinery that do work that is too dangerous, boring or onerous. Most of the robots in the world are of this type. They can be found in auto, medical, manufacturing and space industries. Therefore a robot is a system that contains sensors, control systems, manipulators, power supplies and software all working together to perform a task. The development and use of such a system is an active area of research and one of the main problems is the development of interaction skills with the surrounding environment, which include the ability to grasp objects. To perform this task the robot needs to sense the environment and acquire the object informations, physical attributes that may influence a grasp. Humans can solve this grasping problem easily due to their past experiences, that is why many researchers are approaching it from a machine learning perspective finding grasp of an object using information of already known objects. But humans can select the best grasp amongst a vast repertoire not only considering the physical attributes of the object to grasp but even to obtain a certain effect. This is why in our case the study in the area of robot manipulation is focused on grasping and integrating symbolic tasks with data gained through sensors. The learning model is based on Bayesian Network to encode the statistical dependencies between the data collected by the sensors and the symbolic task. This data representation has several advantages. It allows to take into account the uncertainty of the real world, allowing to deal with sensor noise, encodes notion of causality and provides an unified network for learning. Since the network is actually implemented and based on the human expert knowledge, it is very interesting to implement an automated method to learn the structure as in the future more tasks and object features can be introduced and a complex network design based only on human expert knowledge can become unreliable. Since structure learning algorithms presents some weaknesses, the goal of this thesis is to analyze real data used in the network modeled by the human expert, implement a feasible structure learning approach and compare the results with the network designed by the expert in order to possibly enhance it.