Biblioteca Digital

24 resultados para Optical pattern recognition Data processing

Data analytics in the cloud with flexible mapreduced workflows

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Data analytic applications are characterized by large data sets that are subject to a series of processing phases. Some of these phases are executed sequentially but others can be executed concurrently or in parallel on clusters, grids or clouds. The MapReduce programming model has been applied to process large data sets in cluster and cloud environments. For developing an application using MapReduce there is a need to install/configure/access specific frameworks such as Apache Hadoop or Elastic MapReduce in Amazon Cloud. It would be desirable to provide more flexibility in adjusting such configurations according to the application characteristics. Furthermore the composition of the multiple phases of a data analytic application requires the specification of all the phases and their orchestration. The original MapReduce model and environment lacks flexible support for such configuration and composition. Recognizing that scientific workflows have been successfully applied to modeling complex applications, this paper describes our experiments on implementing MapReduce as subworkflows in the AWARD framework (Autonomic Workflow Activities Reconfigurable and Dynamic). A text mining data analytic application is modeled as a complex workflow with multiple phases, where individual workflow nodes support MapReduce computations. As in typical MapReduce environments, the end user only needs to define the application algorithms for input data processing and for the map and reduce functions. In the paper we present experimental results when using the AWARD framework to execute MapReduce workflows deployed over multiple Amazon EC2 (Elastic Compute Cloud) instances.

Veja mais

Feature Discretization with Relevance and Mutual Information Criteria

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Feature discretization (FD) techniques often yield adequate and compact representations of the data, suitable for machine learning and pattern recognition problems. These representations usually decrease the training time, yielding higher classification accuracy while allowing for humans to better understand and visualize the data, as compared to the use of the original features. This paper proposes two new FD techniques. The first one is based on the well-known Linde-Buzo-Gray quantization algorithm, coupled with a relevance criterion, being able perform unsupervised, supervised, or semi-supervised discretization. The second technique works in supervised mode, being based on the maximization of the mutual information between each discrete feature and the class label. Our experimental results on standard benchmark datasets show that these techniques scale up to high-dimensional data, attaining in many cases better accuracy than existing unsupervised and supervised FD approaches, while using fewer discretization intervals.

Veja mais

Estimation of 3D shapes using active surface models

Relevância:

100.00% 100.00%

Publicador:

Resumo:

This paper addresses the estimation of surfaces from a set of 3D points using the unified framework described in [1]. This framework proposes the use of competitive learning for curve estimation, i.e., a set of points is defined on a deformable curve and they all compete to represent the available data. This paper extends the use of the unified framework to surface estimation. It o shown that competitive learning performes better than snakes, improving the model performance in the presence of concavities and allowing to desciminate close surfaces. The proposed model is evaluated in this paper using syntheticdata and medical images (MRI and ultrasound images).

Veja mais

Estimation of signal subspace on hyperspectral data

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Dimensionality reduction plays a crucial role in many hyperspectral data processing and analysis algorithms. This paper proposes a new mean squared error based approach to determine the signal subspace in hyperspectral imagery. The method first estimates the signal and noise correlations matrices, then it selects the subset of eigenvalues that best represents the signal subspace in the least square sense. The effectiveness of the proposed method is illustrated using simulated and real hyperspectral images.

Veja mais

A MAP approach to evidence accumulation clustering

Relevância:

100.00% 100.00%

Publicador:

Resumo:

The Evidence Accumulation Clustering (EAC) paradigm is a clustering ensemble method which derives a consensus partition from a collection of base clusterings obtained using different algorithms. It collects from the partitions in the ensemble a set of pairwise observations about the co-occurrence of objects in a same cluster and it uses these co-occurrence statistics to derive a similarity matrix, referred to as co-association matrix. The Probabilistic Evidence Accumulation for Clustering Ensembles (PEACE) algorithm is a principled approach for the extraction of a consensus clustering from the observations encoded in the co-association matrix based on a probabilistic model for the co-association matrix parameterized by the unknown assignments of objects to clusters. In this paper we extend the PEACE algorithm by deriving a consensus solution according to a MAP approach with Dirichlet priors defined for the unknown probabilistic cluster assignments. In particular, we study the positive regularization effect of Dirichlet priors on the final consensus solution with both synthetic and real benchmark data.

Veja mais

Hyperspectral imagery framework for unmixing and dimensionality estimation

Relevância:

100.00% 100.00%

Publicador:

Resumo:

In hyperspectral imagery a pixel typically consists mixture of spectral signatures of reference substances, also called endmembers. Linear spectral mixture analysis, or linear unmixing, aims at estimating the number of endmembers, their spectral signatures, and their abundance fractions. This paper proposes a framework for hyperpsectral unmixing. A blind method (SISAL) is used for the estimation of the unknown endmember signature and their abundance fractions. This method solve a non-convex problem by a sequence of augmented Lagrangian optimizations, where the positivity constraints, forcing the spectral vectors to belong to the convex hull of the endmember signatures, are replaced by soft constraints. The proposed framework simultaneously estimates the number of endmembers present in the hyperspectral image by an algorithm based on the minimum description length (MDL) principle. Experimental results on both synthetic and real hyperspectral data demonstrate the effectiveness of the proposed algorithm.

Veja mais

Text classification using compression-based dissimilarity measures

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Arguably, the most difficult task in text classification is to choose an appropriate set of features that allows machine learning algorithms to provide accurate classification. Most state-of-the-art techniques for this task involve careful feature engineering and a pre-processing stage, which may be too expensive in the emerging context of massive collections of electronic texts. In this paper, we propose efficient methods for text classification based on information-theoretic dissimilarity measures, which are used to define dissimilarity-based representations. These methods dispense with any feature design or engineering, by mapping texts into a feature space using universal dissimilarity measures; in this space, classical classifiers (e.g. nearest neighbor or support vector machines) can then be used. The reported experimental evaluation of the proposed methods, on sentiment polarity analysis and authorship attribution problems, reveals that it approximates, sometimes even outperforms previous state-of-the-art techniques, despite being much simpler, in the sense that they do not require any text pre-processing or feature engineering.

Veja mais

An algoritmh for the estimation of 3D surfaces using competitive learning

Relevância:

100.00% 100.00%

Publicador:

Resumo:

This paper addresses the estimation of object boundaries from a set of 3D points. An extension of the constrained clustering algorithm developed by Abrantes and Marques in the context of edge linking is presented. The object surface is approximated using rectangular meshes and simplex nets. Centroid-based forces are used for attracting the model nodes towards the data, using competitive learning methods. It is shown that competitive learning improves the model performance in the presence of concavities and allows to discriminate close surfaces. The proposed model is evaluated using synthetic data and medical images (MRI and ultrasound images).

Veja mais

Hyperspectral unmixing with simultaneous dimensionality estimation

Relevância:

100.00% 100.00%

Publicador:

Resumo:

This paper is an elaboration of the simplex identification via split augmented Lagrangian (SISAL) algorithm (Bioucas-Dias, 2009) to blindly unmix hyperspectral data. SISAL is a linear hyperspectral unmixing method of the minimum volume class. This method solve a non-convex problem by a sequence of augmented Lagrangian optimizations, where the positivity constraints, forcing the spectral vectors to belong to the convex hull of the endmember signatures, are replaced by soft constraints. With respect to SISAL, we introduce a dimensionality estimation method based on the minimum description length (MDL) principle. The effectiveness of the proposed algorithm is illustrated with simulated and real data.

Veja mais

24 resultados para Optical pattern recognition Data processing

Filtro por publicador