3 results for multiclass classification problems
at Universidad de Alicante
Abstract:
Feature selection is an important and active issue in clustering and classification problems. Choosing an adequate feature subset reduces the dimensionality of the dataset, which lowers the computational complexity of classification and improves classifier performance by avoiding redundant or irrelevant features. Although feature selection can be formally defined as an optimisation problem with a single objective, namely the classification accuracy obtained with the selected feature subset, some multi-objective approaches to this problem have been proposed in recent years. These either select features that improve not only the classification accuracy but also the generalisation capability, in the case of supervised classifiers, or counterbalance the bias toward lower or higher numbers of features exhibited by some methods used to validate the clustering/classification, in the case of unsupervised classifiers. The main contribution of this paper is a multi-objective approach for feature selection and its application to an unsupervised clustering procedure based on Growing Hierarchical Self-Organising Maps (GHSOMs) that includes a new method for unit labelling and efficient determination of the winning unit. In the network anomaly detection problem considered here, this multi-objective approach makes it possible not only to differentiate between normal and anomalous traffic but also to distinguish among different anomalies. The efficiency of our proposals has been evaluated using the well-known DARPA/NSL-KDD datasets, which contain extracted features and labelled attacks from around 2 million connections. The selected feature sets computed in our experiments provide detection rates of up to 99.8% for normal traffic and up to 99.6% for anomalous traffic, as well as accuracy values of up to 99.12%.
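For readers unfamiliar with the term, the "winning unit" of a self-organising map is conventionally the unit whose weight vector lies closest to the input vector. The sketch below shows only that textbook definition with a plain Euclidean distance; it is not the efficient method proposed in the paper, and the function name `winning_unit` and the sample values are hypothetical.

```python
import math


def winning_unit(weights, x):
    """Return the index of the unit whose weight vector is closest to x.

    Textbook best-matching-unit search using squared Euclidean distance.
    This is an illustrative assumption, not the paper's optimised method.
    """
    best_index, best_dist = -1, math.inf
    for i, w in enumerate(weights):
        # Squared distance is enough to rank units; no square root needed.
        d = sum((wi - xi) ** 2 for wi, xi in zip(w, x))
        if d < best_dist:
            best_index, best_dist = i, d
    return best_index


# Hypothetical map with three units in a two-dimensional feature space.
example_weights = [[0.0, 0.0], [1.0, 1.0], [0.9, 0.1]]
example_winner = winning_unit(example_weights, [1.0, 0.0])
```

In a hierarchical map such as a GHSOM, a search of this kind would presumably be repeated at each level of the hierarchy, but the abstract does not describe how.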
Abstract:
In many classification problems, it is necessary to consider the specific location in an n-dimensional space from which features have been calculated. For example, considering the location of features extracted from specific areas of a two-dimensional space, such as an image, could improve the understanding of a scene for a video surveillance system. In the same way, the same features extracted from different locations could mean different actions for a 3D HCI system. In this paper, we present a self-organizing feature map able to preserve the topology of the locations in an n-dimensional space from which the feature vectors have been extracted. The main contribution is to implicitly preserve the topology of the original space, because considering the locations of the extracted features and their topology could ease the solution of certain problems. Specifically, the paper proposes the n-dimensional constrained self-organizing map preserving the input topology (nD-SOM-PINT). Features in adjacent areas of the n-dimensional space used to extract the feature vectors are explicitly placed in adjacent areas of the nD-SOM-PINT, constraining the neural network structure and learning. As a case study, the neural network has been instantiated to represent and classify features, as trajectories extracted from a sequence of images, into a high level of semantic understanding. Experiments have been thoroughly carried out using the CAVIAR datasets (Corridor, Frontal and Inria), taking into account the global behaviour of an individual, in order to validate that preserving the topology of the two-dimensional space yields high-performance trajectory classification, in contrast to not considering the location of features. Moreover, a brief example has been included to validate the nD-SOM-PINT proposal in a domain other than individual trajectories. Results confirm the high accuracy of the nD-SOM-PINT, which outperforms previous methods aimed at classifying the same datasets.
Abstract:
In the chemical textile domain, experts have to analyse chemical components and substances that might be harmful when used in clothing and textiles. Part of this analysis is performed by searching for opinions and reports that people have expressed about these products on the Social Web. However, this type of information is less frequent on the Internet for this domain than for others, so its detection and classification are difficult and time-consuming. Consequently, problems associated with the use of chemical substances in textiles may not be detected early enough and could lead to health problems such as allergies or burns. In this paper, we propose a framework able to detect, retrieve, and classify subjective sentences related to the chemical textile domain, which could be integrated into a wider health surveillance system. We also describe the creation of several datasets with opinions from this domain, the experiments performed using machine learning techniques and different lexical resources such as WordNet, and the evaluation, which focuses on sentiment classification and complaint detection (i.e., negativity). Despite the challenges involved in this domain, our approach obtains promising results, with an F-score of 65% for polarity classification and 82% for complaint detection.
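The abstract does not state which F-score variant was used. Assuming the common balanced F1 score (the harmonic mean of precision and recall), the figure can be computed from counts of true positives, false positives and false negatives as sketched below. The function name `f1_score` and the counts in the example are hypothetical and are not taken from the paper.

```python
def f1_score(true_pos, false_pos, false_neg):
    """Balanced F-score (F1): harmonic mean of precision and recall.

    Assumes the F1 variant; the abstract does not say which one was used.
    Returns 0.0 when precision or recall is undefined or both are zero.
    """
    predicted = true_pos + false_pos   # items the classifier flagged
    actual = true_pos + false_neg      # items that are truly positive
    if predicted == 0 or actual == 0:
        return 0.0
    precision = true_pos / predicted
    recall = true_pos / actual
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts: 8 complaints found, 2 false alarms, 2 missed.
example_f1 = f1_score(true_pos=8, false_pos=2, false_neg=2)
```

With these made-up counts, precision and recall are both 0.8, so the F1 score is also 0.8.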