32 resultados para Data classification


Relevância:

30.00% 30.00%

Publicador:

Resumo:

Interactive visual representations complement traditional statistical and machine learning techniques for data analysis, allowing users to play a more active role in a knowledge discovery process and making the whole process more understandable. Though visual representations are applicable to several stages of the knowledge discovery process, a common use of visualization is in the initial stages to explore and organize a sometimes unknown and complex data set. In this context, the integrated and coordinated - that is, user actions should be capable of affecting multiple visualizations when desired - use of multiple graphical representations allows data to be observed from several perspectives and offers richer information than isolated representations. In this paper we propose an underlying model for an extensible and adaptable environment that allows independently developed visualization components to be gradually integrated into a user configured knowledge discovery application. Because a major requirement when using multiple visual techniques is the ability to link amongst them, so that user actions executed on a representation propagate to others if desired, the model also allows runtime configuration of coordinated user actions over different visual representations. We illustrate how this environment is being used to assist data exploration and organization in a climate classification problem.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

During the petroleum well drilling operation many mechanical and hydraulic parameters are monitored by an instrumentation system installed in the rig called a mud-logging system. These sensors, distributed in the rig, monitor different operation parameters such as weight on the hook and drillstring rotation. These measurements are known as mud-logging records and allow the online following of all the drilling process with well monitoring purposes. However, in most of the cases, these data are stored without taking advantage of all their potential. On the other hand, to make use of the mud-logging data, an analysis and interpretationt is required. That is not an easy task because of the large volume of information involved. This paper presents a Support Vector Machine (SVM) used to automatically classify the drilling operation stages through the analysis of some mud-logging parameters. In order to validate the results of SVM technique, it was compared to a classification elaborated by a Petroleum Engineering expert. © 2006 IEEE.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

This paper describes an investigation of the hybrid PSO/ACO algorithm to classify automatically the well drilling operation stages. The method feasibility is demonstrated by its application to real mud-logging dataset. The results are compared with bio-inspired methods, and rule induction and decision tree algorithms for data mining. © 2009 Springer Berlin Heidelberg.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

This paper proposes a fuzzy classification system for the risk of infestation by weeds in agricultural zones considering the variability of weeds. The inputs of the system are features of the infestation extracted from estimated maps by kriging for the weed seed production and weed coverage, and from the competitiveness, inferred from narrow and broad-leaved weeds. Furthermore, a Bayesian network classifier is used to extract rules from data which are compared to the fuzzy rule set obtained on the base of specialist knowledge. Results for the risk inference in a maize crop field are presented and evaluated by the estimated yield loss. © 2009 IEEE.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Traditional pattern recognition techniques can not handle the classification of large datasets with both efficiency and effectiveness. In this context, the Optimum-Path Forest (OPF) classifier was recently introduced, trying to achieve high recognition rates and low computational cost. Although OPF was much faster than Support Vector Machines for training, it was slightly slower for classification. In this paper, we present the Efficient OPF (EOPF), which is an enhanced and faster version of the traditional OPF, and validate it for the automatic recognition of white matter and gray matter in magnetic resonance images of the human brain. © 2010 IEEE.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

In this paper we would like to shed light the problem of efficiency and effectiveness of image classification in large datasets. As the amount of data to be processed and further classified has increased in the last years, there is a need for faster and more precise pattern recognition algorithms in order to perform online and offline training and classification procedures. We deal here with the problem of moist area classification in radar image in a fast manner. Experimental results using Optimum-Path Forest and its training set pruning algorithm also provided and discussed. © 2011 IEEE.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Semi-supervised learning is applied to classification problems where only a small portion of the data items is labeled. In these cases, the reliability of the labels is a crucial factor, because mislabeled items may propagate wrong labels to a large portion or even the entire data set. This paper aims to address this problem by presenting a graph-based (network-based) semi-supervised learning method, specifically designed to handle data sets with mislabeled samples. The method uses teams of walking particles, with competitive and cooperative behavior, for label propagation in the network constructed from the input data set. The proposed model is nature-inspired and it incorporates some features to make it robust to a considerable amount of mislabeled data items. Computer simulations show the performance of the method in the presence of different percentage of mislabeled data, in networks of different sizes and average node degree. Importantly, these simulations reveals the existence of the critical points of the mislabeled subset size, below which the network is free of wrong label contamination, but above which the mislabeled samples start to propagate their labels to the rest of the network. Moreover, numerical comparisons have been made among the proposed method and other representative graph-based semi-supervised learning methods using both artificial and real-world data sets. Interestingly, the proposed method has increasing better performance than the others as the percentage of mislabeled samples is getting larger. © 2012 IEEE.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

The efficiency in image classification tasks can be improved using combined information provided by several sources, such as shape, color, and texture visual properties. Although many works proposed to combine different feature vectors, we model the descriptor combination as an optimization problem to be addressed by evolutionary-based techniques, which compute distances between samples that maximize their separability in the feature space. The robustness of the proposed technique is assessed by the Optimum-Path Forest classifier. Experiments showed that the proposed methodology can outperform individual information provided by single descriptors in well-known public datasets. © 2012 IEEE.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Predicting and mapping productivity areas allows crop producers to improve their planning of agricultural activities. The primary aims of this work were the identification and mapping of specific management areas allowing coffee bean quality to be predicted from soil attributes and their relationships to relief. The study area was located in the Southeast of the Minas Gerais state, Brazil. A grid containing a total of 145 uniformly spaced nodes 50 m apart was established over an area of 31. 7 ha from which samples were collected at depths of 0. 00-0. 20 m in order to determine physical and chemical attributes of the soil. These data were analysed in conjunction with plant attributes including production, proportion of beans retained by different sieves and drink quality. The results of principal component analysis (PCA) in combination with geostatistical data showed the attributes clay content and available iron to be the best choices for identifying four crop production environments. Environment A, which exhibited high clay and available iron contents, and low pH and base saturation, was that providing the highest yield (30. 4l ha-1) and best coffee beverage quality (61 sacks ha-1). Based on the results, we believe that multivariate analysis, geostatistics and the soil-relief relationships contained in the digital elevation model (DEM) can be effectively used in combination for the hybrid mapping of areas of varying suitability for coffee production. © 2012 Springer Science+Business Media New York.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

In this paper, we show a local-in-time existence result for the 3D micropolar fluid system in the framework of Besov-Morrey spaces. The initial data class is larger than the previous ones and contains strongly singular functions and measures. © 2013 Springer Basel.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

An important tool for the heart disease diagnosis is the analysis of electrocardiogram (ECG) signals, since the non-invasive nature and simplicity of the ECG exam. According to the application, ECG data analysis consists of steps such as preprocessing, segmentation, feature extraction and classification aiming to detect cardiac arrhythmias (i.e.; cardiac rhythm abnormalities). Aiming to made a fast and accurate cardiac arrhythmia signal classification process, we apply and analyze a recent and robust supervised graph-based pattern recognition technique, the optimum-path forest (OPF) classifier. To the best of our knowledge, it is the first time that OPF classifier is used to the ECG heartbeat signal classification task. We then compare the performance (in terms of training and testing time, accuracy, specificity, and sensitivity) of the OPF classifier to the ones of other three well-known expert system classifiers, i.e.; support vector machine (SVM), Bayesian and multilayer artificial neural network (MLP), using features extracted from six main approaches considered in literature for ECG arrhythmia analysis. In our experiments, we use the MIT-BIH Arrhythmia Database and the evaluation protocol recommended by The Association for the Advancement of Medical Instrumentation. A discussion on the obtained results shows that OPF classifier presents a robust performance, i.e.; there is no need for parameter setup, as well as a high accuracy at an extremely low computational cost. Moreover, in average, the OPF classifier yielded greater performance than the MLP and SVM classifiers in terms of classification time and accuracy, and to produce quite similar performance to the Bayesian classifier, showing to be a promising technique for ECG signal analysis. © 2012 Elsevier Ltd. All rights reserved.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Conselho Nacional de Desenvolvimento Científico e Tecnológico (CNPq)

Relevância:

30.00% 30.00%

Publicador:

Resumo:

The traditional characteristics and challenges for organizing and searching information on the World Wide Web are outlined and reviewed. The classification features of two of these methods, such as Google, in the case of automated search engines, and Yahoo! Directory, in the case of subject directories are analyzed. Recent advances in the Semantic Web, particularly the growing application of ontologies and Linked Data are also reviewed. Finally, some problems and prospects related to the use of classification and indexing on the World Wide Web are discussed, emphasizing the need of rethinking the role of classification in the organization of these resources and outlining the possibilities of applying Ranganathan's facet theories of classification.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Concept drift, which refers to non stationary learning problems over time, has increasing importance in machine learning and data mining. Many concept drift applications require fast response, which means an algorithm must always be (re)trained with the latest available data. But the process of data labeling is usually expensive and/or time consuming when compared to acquisition of unlabeled data, thus usually only a small fraction of the incoming data may be effectively labeled. Semi-supervised learning methods may help in this scenario, as they use both labeled and unlabeled data in the training process. However, most of them are based on assumptions that the data is static. Therefore, semi-supervised learning with concept drifts is still an open challenging task in machine learning. Recently, a particle competition and cooperation approach has been developed to realize graph-based semi-supervised learning from static data. We have extend that approach to handle data streams and concept drift. The result is a passive algorithm which uses a single classifier approach, naturally adapted to concept changes without any explicit drift detection mechanism. It has built-in mechanisms that provide a natural way of learning from new data, gradually "forgetting" older knowledge as older data items are no longer useful for the classification of newer data items. The proposed algorithm is applied to the KDD Cup 1999 Data of network intrusion, showing its effectiveness.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Conselho Nacional de Desenvolvimento Científico e Tecnológico (CNPq)