24 resultados para Optical pattern recognition Data processing


Relevância:

100.00% 100.00%

Publicador:

Resumo:

Data analytic applications are characterized by large data sets that are subject to a series of processing phases. Some of these phases are executed sequentially but others can be executed concurrently or in parallel on clusters, grids or clouds. The MapReduce programming model has been applied to process large data sets in cluster and cloud environments. For developing an application using MapReduce there is a need to install/configure/access specific frameworks such as Apache Hadoop or Elastic MapReduce in Amazon Cloud. It would be desirable to provide more flexibility in adjusting such configurations according to the application characteristics. Furthermore the composition of the multiple phases of a data analytic application requires the specification of all the phases and their orchestration. The original MapReduce model and environment lacks flexible support for such configuration and composition. Recognizing that scientific workflows have been successfully applied to modeling complex applications, this paper describes our experiments on implementing MapReduce as subworkflows in the AWARD framework (Autonomic Workflow Activities Reconfigurable and Dynamic). A text mining data analytic application is modeled as a complex workflow with multiple phases, where individual workflow nodes support MapReduce computations. As in typical MapReduce environments, the end user only needs to define the application algorithms for input data processing and for the map and reduce functions. In the paper we present experimental results when using the AWARD framework to execute MapReduce workflows deployed over multiple Amazon EC2 (Elastic Compute Cloud) instances.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Feature discretization (FD) techniques often yield adequate and compact representations of the data, suitable for machine learning and pattern recognition problems. These representations usually decrease the training time, yielding higher classification accuracy while allowing for humans to better understand and visualize the data, as compared to the use of the original features. This paper proposes two new FD techniques. The first one is based on the well-known Linde-Buzo-Gray quantization algorithm, coupled with a relevance criterion, being able perform unsupervised, supervised, or semi-supervised discretization. The second technique works in supervised mode, being based on the maximization of the mutual information between each discrete feature and the class label. Our experimental results on standard benchmark datasets show that these techniques scale up to high-dimensional data, attaining in many cases better accuracy than existing unsupervised and supervised FD approaches, while using fewer discretization intervals.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

This paper addresses the estimation of surfaces from a set of 3D points using the unified framework described in [1]. This framework proposes the use of competitive learning for curve estimation, i.e., a set of points is defined on a deformable curve and they all compete to represent the available data. This paper extends the use of the unified framework to surface estimation. It o shown that competitive learning performes better than snakes, improving the model performance in the presence of concavities and allowing to desciminate close surfaces. The proposed model is evaluated in this paper using syntheticdata and medical images (MRI and ultrasound images).

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Dimensionality reduction plays a crucial role in many hyperspectral data processing and analysis algorithms. This paper proposes a new mean squared error based approach to determine the signal subspace in hyperspectral imagery. The method first estimates the signal and noise correlations matrices, then it selects the subset of eigenvalues that best represents the signal subspace in the least square sense. The effectiveness of the proposed method is illustrated using simulated and real hyperspectral images.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

The Evidence Accumulation Clustering (EAC) paradigm is a clustering ensemble method which derives a consensus partition from a collection of base clusterings obtained using different algorithms. It collects from the partitions in the ensemble a set of pairwise observations about the co-occurrence of objects in a same cluster and it uses these co-occurrence statistics to derive a similarity matrix, referred to as co-association matrix. The Probabilistic Evidence Accumulation for Clustering Ensembles (PEACE) algorithm is a principled approach for the extraction of a consensus clustering from the observations encoded in the co-association matrix based on a probabilistic model for the co-association matrix parameterized by the unknown assignments of objects to clusters. In this paper we extend the PEACE algorithm by deriving a consensus solution according to a MAP approach with Dirichlet priors defined for the unknown probabilistic cluster assignments. In particular, we study the positive regularization effect of Dirichlet priors on the final consensus solution with both synthetic and real benchmark data.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

In hyperspectral imagery a pixel typically consists mixture of spectral signatures of reference substances, also called endmembers. Linear spectral mixture analysis, or linear unmixing, aims at estimating the number of endmembers, their spectral signatures, and their abundance fractions. This paper proposes a framework for hyperpsectral unmixing. A blind method (SISAL) is used for the estimation of the unknown endmember signature and their abundance fractions. This method solve a non-convex problem by a sequence of augmented Lagrangian optimizations, where the positivity constraints, forcing the spectral vectors to belong to the convex hull of the endmember signatures, are replaced by soft constraints. The proposed framework simultaneously estimates the number of endmembers present in the hyperspectral image by an algorithm based on the minimum description length (MDL) principle. Experimental results on both synthetic and real hyperspectral data demonstrate the effectiveness of the proposed algorithm.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Arguably, the most difficult task in text classification is to choose an appropriate set of features that allows machine learning algorithms to provide accurate classification. Most state-of-the-art techniques for this task involve careful feature engineering and a pre-processing stage, which may be too expensive in the emerging context of massive collections of electronic texts. In this paper, we propose efficient methods for text classification based on information-theoretic dissimilarity measures, which are used to define dissimilarity-based representations. These methods dispense with any feature design or engineering, by mapping texts into a feature space using universal dissimilarity measures; in this space, classical classifiers (e.g. nearest neighbor or support vector machines) can then be used. The reported experimental evaluation of the proposed methods, on sentiment polarity analysis and authorship attribution problems, reveals that it approximates, sometimes even outperforms previous state-of-the-art techniques, despite being much simpler, in the sense that they do not require any text pre-processing or feature engineering.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

This paper addresses the estimation of object boundaries from a set of 3D points. An extension of the constrained clustering algorithm developed by Abrantes and Marques in the context of edge linking is presented. The object surface is approximated using rectangular meshes and simplex nets. Centroid-based forces are used for attracting the model nodes towards the data, using competitive learning methods. It is shown that competitive learning improves the model performance in the presence of concavities and allows to discriminate close surfaces. The proposed model is evaluated using synthetic data and medical images (MRI and ultrasound images).

Relevância:

100.00% 100.00%

Publicador:

Resumo:

This paper is an elaboration of the simplex identification via split augmented Lagrangian (SISAL) algorithm (Bioucas-Dias, 2009) to blindly unmix hyperspectral data. SISAL is a linear hyperspectral unmixing method of the minimum volume class. This method solve a non-convex problem by a sequence of augmented Lagrangian optimizations, where the positivity constraints, forcing the spectral vectors to belong to the convex hull of the endmember signatures, are replaced by soft constraints. With respect to SISAL, we introduce a dimensionality estimation method based on the minimum description length (MDL) principle. The effectiveness of the proposed algorithm is illustrated with simulated and real data.