915 resultados para Supervised Classifiers
Resumo:
The growth of social networking platforms has drawn a lot of attentions to the need for social computing. Social computing utilises human insights for computational tasks as well as design of systems that support social behaviours and interactions. One of the key aspects of social computing is the ability to attribute responsibility such as blame or praise to social events. This ability helps an intelligent entity account and understand other intelligent entities’ social behaviours, and enriches both the social functionalities and cognitive aspects of intelligent agents. In this paper, we present an approach with a model for blame and praise detection in text. We build our model based on various theories of blame and include in our model features used by humans determining judgment such as moral agent causality, foreknowledge, intentionality and coercion. An annotated corpus has been created for the task of blame and praise detection from text. The experimental results show that while our model gives similar results compared to supervised classifiers on classifying text as blame, praise or others, it outperforms supervised classifiers on more finer-grained classification of determining the direction of blame and praise, i.e., self-blame, blame-others, self-praise or praise-others, despite not using labelled training data.
Resumo:
Among the types of remote sensing acquisitions, optical images are certainly one of the most widely relied upon data sources for Earth observation. They provide detailed measurements of the electromagnetic radiation reflected or emitted by each pixel in the scene. Through a process termed supervised land-cover classification, this allows to automatically yet accurately distinguish objects at the surface of our planet. In this respect, when producing a land-cover map of the surveyed area, the availability of training examples representative of each thematic class is crucial for the success of the classification procedure. However, in real applications, due to several constraints on the sample collection process, labeled pixels are usually scarce. When analyzing an image for which those key samples are unavailable, a viable solution consists in resorting to the ground truth data of other previously acquired images. This option is attractive but several factors such as atmospheric, ground and acquisition conditions can cause radiometric differences between the images, hindering therefore the transfer of knowledge from one image to another. The goal of this Thesis is to supply remote sensing image analysts with suitable processing techniques to ensure a robust portability of the classification models across different images. The ultimate purpose is to map the land-cover classes over large spatial and temporal extents with minimal ground information. To overcome, or simply quantify, the observed shifts in the statistical distribution of the spectra of the materials, we study four approaches issued from the field of machine learning. First, we propose a strategy to intelligently sample the image of interest to collect the labels only in correspondence of the most useful pixels. This iterative routine is based on a constant evaluation of the pertinence to the new image of the initial training data actually belonging to a different image. Second, an approach to reduce the radiometric differences among the images by projecting the respective pixels in a common new data space is presented. We analyze a kernel-based feature extraction framework suited for such problems, showing that, after this relative normalization, the cross-image generalization abilities of a classifier are highly increased. Third, we test a new data-driven measure of distance between probability distributions to assess the distortions caused by differences in the acquisition geometry affecting series of multi-angle images. Also, we gauge the portability of classification models through the sequences. In both exercises, the efficacy of classic physically- and statistically-based normalization methods is discussed. Finally, we explore a new family of approaches based on sparse representations of the samples to reciprocally convert the data space of two images. The projection function bridging the images allows a synthesis of new pixels with more similar characteristics ultimately facilitating the land-cover mapping across images.
Resumo:
This thesis is about detection of local image features. The research topic belongs to the wider area of object detection, which is a machine vision and pattern recognition problem where an object must be detected (located) in an image. State-of-the-art object detection methods often divide the problem into separate interest point detection and local image description steps, but in this thesis a different technique is used, leading to higher quality image features which enable more precise localization. Instead of using interest point detection the landmark positions are marked manually. Therefore, the quality of the image features is not limited by the interest point detection phase and the learning of image features is simplified. The approach combines both interest point detection and local description into one phase for detection. Computational efficiency of the descriptor is therefore important, leaving out many of the commonly used descriptors as unsuitably heavy. Multiresolution Gabor features has been the main descriptor in this thesis and improving their efficiency is a significant part. Actual image features are formed from descriptors by using a classifierwhich can then recognize similar looking patches in new images. The main classifier is based on Gaussian mixture models. Classifiers are used in one-class classifier configuration where there are only positive training samples without explicit background class. The local image feature detection method has been tested with two freely available face detection databases and a proprietary license plate database. The localization performance was very good in these experiments. Other applications applying the same under-lying techniques are also presented, including object categorization and fault detection.
Resumo:
Mobile malwares are increasing with the growing number of Mobile users. Mobile malwares can perform several operations which lead to cybersecurity threats such as, stealing financial or personal information, installing malicious applications, sending premium SMS, creating backdoors, keylogging and crypto-ransomware attacks. Knowing the fact that there are many illegitimate Applications available on the App stores, most of the mobile users remain careless about the security of their Mobile devices and become the potential victim of these threats. Previous studies have shown that not every antivirus is capable of detecting all the threats; due to the fact that Mobile malwares use advance techniques to avoid detection. A Network-based IDS at the operator side will bring an extra layer of security to the subscribers and can detect many advanced threats by analyzing their traffic patterns. Machine Learning(ML) will provide the ability to these systems to detect unknown threats for which signatures are not yet known. This research is focused on the evaluation of Machine Learning classifiers in Network-based Intrusion detection systems for Mobile Networks. In this study, different techniques of Network-based intrusion detection with their advantages, disadvantages and state of the art in Hybrid solutions are discussed. Finally, a ML based NIDS is proposed which will work as a subsystem, to Network-based IDS deployed by Mobile Operators, that can help in detecting unknown threats and reducing false positives. In this research, several ML classifiers were implemented and evaluated. This study is focused on Android-based malwares, as Android is the most popular OS among users, hence most targeted by cyber criminals. Supervised ML algorithms based classifiers were built using the dataset which contained the labeled instances of relevant features. These features were extracted from the traffic generated by samples of several malware families and benign applications. These classifiers were able to detect malicious traffic patterns with the TPR upto 99.6% during Cross-validation test. Also, several experiments were conducted to detect unknown malware traffic and to detect false positives. These classifiers were able to detect unknown threats with the Accuracy of 97.5%. These classifiers could be integrated with current NIDS', which use signatures, statistical or knowledge-based techniques to detect malicious traffic. Technique to integrate the output from ML classifier with traditional NIDS is discussed and proposed for future work.
Resumo:
We describe a system that learns from examples to recognize people in images taken indoors. Images of people are represented by color-based and shape-based features. Recognition is carried out through combinations of Support Vector Machine classifiers (SVMs). Different types of multiclass strategies based on SVMs are explored and compared to k-Nearest Neighbors classifiers (kNNs). The system works in real time and shows high performance rates for people recognition throughout one day.
Resumo:
This work proposes and discusses an approach for inducing Bayesian classifiers aimed at balancing the tradeoff between the precise probability estimates produced by time consuming unrestricted Bayesian networks and the computational efficiency of Naive Bayes (NB) classifiers. The proposed approach is based on the fundamental principles of the Heuristic Search Bayesian network learning. The Markov Blanket concept, as well as a proposed ""approximate Markov Blanket"" are used to reduce the number of nodes that form the Bayesian network to be induced from data. Consequently, the usually high computational cost of the heuristic search learning algorithms can be lessened, while Bayesian network structures better than NB can be achieved. The resulting algorithms, called DMBC (Dynamic Markov Blanket Classifier) and A-DMBC (Approximate DMBC), are empirically assessed in twelve domains that illustrate scenarios of particular interest. The obtained results are compared with NB and Tree Augmented Network (TAN) classifiers, and confinn that both proposed algorithms can provide good classification accuracies and better probability estimates than NB and TAN, while being more computationally efficient than the widely used K2 Algorithm.
Resumo:
Various popular machine learning techniques, like support vector machines, are originally conceived for the solution of two-class (binary) classification problems. However, a large number of real problems present more than two classes. A common approach to generalize binary learning techniques to solve problems with more than two classes, also known as multiclass classification problems, consists of hierarchically decomposing the multiclass problem into multiple binary sub-problems, whose outputs are combined to define the predicted class. This strategy results in a tree of binary classifiers, where each internal node corresponds to a binary classifier distinguishing two groups of classes and the leaf nodes correspond to the problem classes. This paper investigates how measures of the separability between classes can be employed in the construction of binary-tree-based multiclass classifiers, adapting the decompositions performed to each particular multiclass problem. (C) 2010 Elsevier B.V. All rights reserved.
Resumo:
Several real problems involve the classification of data into categories or classes. Given a data set containing data whose classes are known, Machine Learning algorithms can be employed for the induction of a classifier able to predict the class of new data from the same domain, performing the desired discrimination. Some learning techniques are originally conceived for the solution of problems with only two classes, also named binary classification problems. However, many problems require the discrimination of examples into more than two categories or classes. This paper presents a survey on the main strategies for the generalization of binary classifiers to problems with more than two classes, known as multiclass classification problems. The focus is on strategies that decompose the original multiclass problem into multiple binary subtasks, whose outputs are combined to obtain the final prediction.
Resumo:
Species` potential distribution modelling consists of building a representation of the fundamental ecological requirements of a species from biotic and abiotic conditions where the species is known to occur. Such models can be valuable tools to understand the biogeography of species and to support the prediction of its presence/absence considering a particular environment scenario. This paper investigates the use of different supervised machine learning techniques to model the potential distribution of 35 plant species from Latin America. Each technique was able to extract a different representation of the relations between the environmental conditions and the distribution profile of the species. The experimental results highlight the good performance of random trees classifiers, indicating this particular technique as a promising candidate for modelling species` potential distribution. (C) 2010 Elsevier Ltd. All rights reserved.
Resumo:
Fundação de Amparo à Pesquisa do Estado de São Paulo (FAPESP)
Resumo:
Fundação de Amparo à Pesquisa do Estado de São Paulo (FAPESP)
Resumo:
The main purpose of a gene interaction network is to map the relationships of the genes that are out of sight when a genomic study is tackled. DNA microarrays allow the measure of gene expression of thousands of genes at the same time. These data constitute the numeric seed for the induction of the gene networks. In this paper, we propose a new approach to build gene networks by means of Bayesian classifiers, variable selection and bootstrap resampling. The interactions induced by the Bayesian classifiers are based both on the expression levels and on the phenotype information of the supervised variable. Feature selection and bootstrap resampling add reliability and robustness to the overall process removing the false positive findings. The consensus among all the induced models produces a hierarchy of dependences and, thus, of variables. Biologists can define the depth level of the model hierarchy so the set of interactions and genes involved can vary from a sparse to a dense set. Experimental results show how these networks perform well on classification tasks. The biological validation matches previous biological findings and opens new hypothesis for future studies
Resumo:
INTRODUCTION: Objective assessment of motor skills has become an important challenge in minimally invasive surgery (MIS) training.Currently, there is no gold standard defining and determining the residents' surgical competence.To aid in the decision process, we analyze the validity of a supervised classifier to determine the degree of MIS competence based on assessment of psychomotor skills METHODOLOGY: The ANFIS is trained to classify performance in a box trainer peg transfer task performed by two groups (expert/non expert). There were 42 participants included in the study: the non-expert group consisted of 16 medical students and 8 residents (< 10 MIS procedures performed), whereas the expert group consisted of 14 residents (> 10 MIS procedures performed) and 4 experienced surgeons. Instrument movements were captured by means of the Endoscopic Video Analysis (EVA) tracking system. Nine motion analysis parameters (MAPs) were analyzed, including time, path length, depth, average speed, average acceleration, economy of area, economy of volume, idle time and motion smoothness. Data reduction was performed by means of principal component analysis, and then used to train the ANFIS net. Performance was measured by leave one out cross validation. RESULTS: The ANFIS presented an accuracy of 80.95%, where 13 experts and 21 non-experts were correctly classified. Total root mean square error was 0.88, while the area under the classifiers' ROC curve (AUC) was measured at 0.81. DISCUSSION: We have shown the usefulness of ANFIS for classification of MIS competence in a simple box trainer exercise. The main advantage of using ANFIS resides in its continuous output, which allows fine discrimination of surgical competence. There are, however, challenges that must be taken into account when considering use of ANFIS (e.g. training time, architecture modeling). Despite this, we have shown discriminative power of ANFIS for a low-difficulty box trainer task, regardless of the individual significances between MAPs. Future studies are required to confirm the findings, inclusion of new tasks, conditions and sample population.
Resumo:
Background Objective assessment of psychomotor skills has become an important challenge in the training of minimally invasive surgical (MIS) techniques. Currently, no gold standard defining surgical competence exists for classifying residents according to their surgical skills. Supervised classification has been proposed as a means for objectively establishing competence thresholds in psychomotor skills evaluation. This report presents a study comparing three classification methods for establishing their validity in a set of tasks for basic skills’ assessment. Methods Linear discriminant analysis (LDA), support vector machines (SVM), and adaptive neuro-fuzzy inference systems (ANFIS) were used. A total of 42 participants, divided into an experienced group (4 expert surgeons and 14 residents with >10 laparoscopic surgeries performed) and a nonexperienced group (16 students and 8 residents with <10 laparoscopic surgeries performed), performed three box trainer tasks validated for assessment of MIS psychomotor skills. Instrument movements were captured using the TrEndo tracking system, and nine motion analysis parameters (MAPs) were analyzed. The performance of the classifiers was measured by leave-one-out cross-validation using the scores obtained by the participants. Results The mean accuracy performances of the classifiers were 71 % (LDA), 78.2 % (SVM), and 71.7 % (ANFIS). No statistically significant differences in the performance were identified between the classifiers. Conclusions The three proposed classifiers showed good performance in the discrimination of skills, especially when information from all MAPs and tasks combined were considered. A correlation between the surgeons’ previous experience and their execution of the tasks could be ascertained from results. However, misclassifications across all the classifiers could imply the existence of other factors influencing psychomotor competence.
Resumo:
Social media has become an effective channel for communicating both trends and public opinion on current events. However the automatic topic classification of social media content pose various challenges. Topic classification is a common technique used for automatically capturing themes that emerge from social media streams. However, such techniques are sensitive to the evolution of topics when new event-dependent vocabularies start to emerge (e.g., Crimea becoming relevant to War Conflict during the Ukraine crisis in 2014). Therefore, traditional supervised classification methods which rely on labelled data could rapidly become outdated. In this paper we propose a novel transfer learning approach to address the classification task of new data when the only available labelled data belong to a previous epoch. This approach relies on the incorporation of knowledge from DBpedia graphs. Our findings show promising results in understanding how features age, and how semantic features can support the evolution of topic classifiers.