Biblioteca Digital

29 resultados para data gathering algorithm

Feature selection for clustering categorical data with an embedded modelling approach

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Research on the problem of feature selection for clustering continues to develop. This is a challenging task, mainly due to the absence of class labels to guide the search for relevant features. Categorical feature selection for clustering has rarely been addressed in the literature, with most of the proposed approaches having focused on numerical data. In this work, we propose an approach to simultaneously cluster categorical data and select a subset of relevant features. Our approach is based on a modification of a finite mixture model (of multinomial distributions), where a set of latent variables indicate the relevance of each feature. To estimate the model parameters, we implement a variant of the expectation-maximization algorithm that simultaneously selects the subset of relevant features, using a minimum message length criterion. The proposed approach compares favourably with two baseline methods: a filter based on an entropy measure and a wrapper based on mutual information. The results obtained on synthetic data illustrate the ability of the proposed expectation-maximization method to recover ground truth. An application to real data, referred to official statistics, shows its usefulness.

Veja mais

Categorical data clustering using a minimum message length criterion

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Research on cluster analysis for categorical data continues to develop, new clustering algorithms being proposed. However, in this context, the determination of the number of clusters is rarely addressed. We propose a new approach in which clustering and the estimation of the number of clusters is done simultaneously for categorical data. We assume that the data originate from a finite mixture of multinomial distributions and use a minimum message length criterion (MML) to select the number of clusters (Wallace and Bolton, 1986). For this purpose, we implement an EM-type algorithm (Silvestre et al., 2008) based on the (Figueiredo and Jain, 2002) approach. The novelty of the approach rests on the integration of the model estimation and selection of the number of clusters in a single algorithm, rather than selecting this number based on a set of pre-estimated candidate models. The performance of our approach is compared with the use of Bayesian Information Criterion (BIC) (Schwarz, 1978) and Integrated Completed Likelihood (ICL) (Biernacki et al., 2000) using synthetic data. The obtained results illustrate the capacity of the proposed algorithm to attain the true number of cluster while outperforming BIC and ICL since it is faster, which is especially relevant when dealing with large data sets.

Veja mais

Determining the number of clusters in categorical data

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Cluster analysis for categorical data has been an active area of research. A well-known problem in this area is the determination of the number of clusters, which is unknown and must be inferred from the data. In order to estimate the number of clusters, one often resorts to information criteria, such as BIC (Bayesian information criterion), MML (minimum message length, proposed by Wallace and Boulton, 1968), and ICL (integrated classification likelihood). In this work, we adopt the approach developed by Figueiredo and Jain (2002) for clustering continuous data. They use an MML criterion to select the number of clusters and a variant of the EM algorithm to estimate the model parameters. This EM variant seamlessly integrates model estimation and selection in a single algorithm. For clustering categorical data, we assume a finite mixture of multinomial distributions and implement a new EM algorithm, following a previous version (Silvestre et al., 2008). Results obtained with synthetic datasets are encouraging. The main advantage of the proposed approach, when compared to the above referred criteria, is the speed of execution, which is especially relevant when dealing with large data sets.

Veja mais

Coding mode decision algorithm for binary descriptor coding

Relevância:

30.00% 30.00%

Publicador:

Resumo:

In visual sensor networks, local feature descriptors can be computed at the sensing nodes, which work collaboratively on the data obtained to make an efficient visual analysis. In fact, with a minimal amount of computational effort, the detection and extraction of local features, such as binary descriptors, can provide a reliable and compact image representation. In this paper, it is proposed to extract and code binary descriptors to meet the energy and bandwidth constraints at each sensing node. The major contribution is a binary descriptor coding technique that exploits the correlation using two different coding modes: Intra, which exploits the correlation between the elements that compose a descriptor; and Inter, which exploits the correlation between descriptors of the same image. The experimental results show bitrate savings up to 35% without any impact in the performance efficiency of the image retrieval task. © 2014 EURASIP.

Veja mais

The impact of Sinogram-Affirmed Iterative Reconstruction on patient dose and image quality compared to filtered back projection: a narrative review

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Objective: Summarize all relevant findings in published literature regarding the potential dose reduction related to image quality using Sinogram-Affirmed Iterative Reconstruction (SAFIRE) compared to Filtered Back Projection (FBP). Background: Computed Tomography (CT) is one of the most used radiographic modalities in clinical practice providing high spatial and contrast resolution. However it also delivers a relatively high radiation dose to the patient. Reconstructing raw-data using Iterative Reconstruction (IR) algorithms has the potential to iteratively reduce image noise while maintaining or improving image quality of low dose standard FBP reconstructions. Nevertheless, long reconstruction times made IR unpractical for clinical use until recently. Siemens Medical developed a new IR algorithm called SAFIRE, which uses up to 5 different strength levels, and poses an alternative to the conventional IR with a significant reconstruction time reduction. Methods: MEDLINE, ScienceDirect and CINAHL databases were used for gathering literature. Eleven articles were included in this review (from 2012 to July 2014). Discussion: This narrative review summarizes the results of eleven articles (using studies on both patients and phantoms) and describes SAFIRE strengths for noise reduction in low dose acquisitions while providing acceptable image quality. Conclusion: Even though the results differ slightly, the literature gathered for this review suggests that the dose in current CT protocols can be reduced at least 50% while maintaining or improving image quality. There is however a lack of literature concerning paediatric population (with increased radiation sensitivity). Further studies should also assess the impact of SAFIRE on diagnostic accuracy.

Veja mais

Efficient feature selection filters for high-dimensional data

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Feature selection is a central problem in machine learning and pattern recognition. On large datasets (in terms of dimension and/or number of instances), using search-based or wrapper techniques can be cornputationally prohibitive. Moreover, many filter methods based on relevance/redundancy assessment also take a prohibitively long time on high-dimensional. datasets. In this paper, we propose efficient unsupervised and supervised feature selection/ranking filters for high-dimensional datasets. These methods use low-complexity relevance and redundancy criteria, applicable to supervised, semi-supervised, and unsupervised learning, being able to act as pre-processors for computationally intensive methods to focus their attention on smaller subsets of promising features. The experimental results, with up to 10(5) features, show the time efficiency of our methods, with lower generalization error than state-of-the-art techniques, while being dramatically simpler and faster.

Veja mais

GPU implementation of a hyperspectral coded aperture algorithm for compressive sensing

Relevância:

30.00% 30.00%

Publicador:

Resumo:

This paper presents a new parallel implementation of a previously hyperspectral coded aperture (HYCA) algorithm for compressive sensing on graphics processing units (GPUs). HYCA method combines the ideas of spectral unmixing and compressive sensing exploiting the high spatial correlation that can be observed in the data and the generally low number of endmembers needed in order to explain the data. The proposed implementation exploits the GPU architecture at low level, thus taking full advantage of the computational power of GPUs using shared memory and coalesced accesses to memory. The proposed algorithm is evaluated not only in terms of reconstruction error but also in terms of computational performance using two different GPU architectures by NVIDIA: GeForce GTX 590 and GeForce GTX TITAN. Experimental results using real data reveals signficant speedups up with regards to serial implementation.

Veja mais

Velocity estimation of moving ships using C-band SLC SAR data

Relevância:

30.00% 30.00%

Publicador:

Resumo:

A new algorithm for the velocity vector estimation of moving ships using Single Look Complex (SLC) SAR data in strip map acquisition mode is proposed. The algorithm exploits both amplitude and phase information of the Doppler decompressed data spectrum, with the aim to estimate both the azimuth antenna pattern and the backscattering coefficient as function of the look angle. The antenna pattern estimation provides information about the target velocity; the backscattering coefficient can be used for vessel classification. The range velocity is retrieved in the slow time frequency domain by estimating the antenna pattern effects induced by the target motion, while the azimuth velocity is calculated by the estimated range velocity and the ship orientation. Finally, the algorithm is tested on simulated SAR SLC data.

Veja mais

Fast unsupervised technique for extraction of endmembers spectra from hyperspectral data

Relevância:

30.00% 30.00%

Publicador:

Resumo:

One of the most challenging task underlying many hyperspectral imagery applications is the linear unmixing. The key to linear unmixing is to find the set of reference substances, also called endmembers, that are representative of a given scene. This paper presents the vertex component analysis (VCA) a new method to unmix linear mixtures of hyperspectral sources. The algorithm is unsupervised and exploits a simple geometric fact: endmembers are vertices of a simplex. The algorithm complexity, measured in floating points operations, is O (n), where n is the sample size. The effectiveness of the proposed scheme is illustrated using simulated data.

Veja mais

Unmixing hyperspectral data: independent and dependent component analysis

Relevância:

30.00% 30.00%

Publicador:

Resumo:

The development of high spatial resolution airborne and spaceborne sensors has improved the capability of ground-based data collection in the fields of agriculture, geography, geology, mineral identification, detection [2, 3], and classification [4–8]. The signal read by the sensor from a given spatial element of resolution and at a given spectral band is a mixing of components originated by the constituent substances, termed endmembers, located at that element of resolution. This chapter addresses hyperspectral unmixing, which is the decomposition of the pixel spectra into a collection of constituent spectra, or spectral signatures, and their corresponding fractional abundances indicating the proportion of each endmember present in the pixel [9, 10]. Depending on the mixing scales at each pixel, the observed mixture is either linear or nonlinear [11, 12]. The linear mixing model holds when the mixing scale is macroscopic [13]. The nonlinear model holds when the mixing scale is microscopic (i.e., intimate mixtures) [14, 15]. The linear model assumes negligible interaction among distinct endmembers [16, 17]. The nonlinear model assumes that incident solar radiation is scattered by the scene through multiple bounces involving several endmembers [18]. Under the linear mixing model and assuming that the number of endmembers and their spectral signatures are known, hyperspectral unmixing is a linear problem, which can be addressed, for example, under the maximum likelihood setup [19], the constrained least-squares approach [20], the spectral signature matching [21], the spectral angle mapper [22], and the subspace projection methods [20, 23, 24]. Orthogonal subspace projection [23] reduces the data dimensionality, suppresses undesired spectral signatures, and detects the presence of a spectral signature of interest. The basic concept is to project each pixel onto a subspace that is orthogonal to the undesired signatures. As shown in Settle [19], the orthogonal subspace projection technique is equivalent to the maximum likelihood estimator. This projection technique was extended by three unconstrained least-squares approaches [24] (signature space orthogonal projection, oblique subspace projection, target signature space orthogonal projection). Other works using maximum a posteriori probability (MAP) framework [25] and projection pursuit [26, 27] have also been applied to hyperspectral data. In most cases the number of endmembers and their signatures are not known. Independent component analysis (ICA) is an unsupervised source separation process that has been applied with success to blind source separation, to feature extraction, and to unsupervised recognition [28, 29]. ICA consists in finding a linear decomposition of observed data yielding statistically independent components. Given that hyperspectral data are, in given circumstances, linear mixtures, ICA comes to mind as a possible tool to unmix this class of data. In fact, the application of ICA to hyperspectral data has been proposed in reference 30, where endmember signatures are treated as sources and the mixing matrix is composed by the abundance fractions, and in references 9, 25, and 31–38, where sources are the abundance fractions of each endmember. In the first approach, we face two problems: (1) The number of samples are limited to the number of channels and (2) the process of pixel selection, playing the role of mixed sources, is not straightforward. In the second approach, ICA is based on the assumption of mutually independent sources, which is not the case of hyperspectral data, since the sum of the abundance fractions is constant, implying dependence among abundances. This dependence compromises ICA applicability to hyperspectral images. In addition, hyperspectral data are immersed in noise, which degrades the ICA performance. IFA [39] was introduced as a method for recovering independent hidden sources from their observed noisy mixtures. IFA implements two steps. First, source densities and noise covariance are estimated from the observed data by maximum likelihood. Second, sources are reconstructed by an optimal nonlinear estimator. Although IFA is a well-suited technique to unmix independent sources under noisy observations, the dependence among abundance fractions in hyperspectral imagery compromises, as in the ICA case, the IFA performance. Considering the linear mixing model, hyperspectral observations are in a simplex whose vertices correspond to the endmembers. Several approaches [40–43] have exploited this geometric feature of hyperspectral mixtures [42]. Minimum volume transform (MVT) algorithm [43] determines the simplex of minimum volume containing the data. The MVT-type approaches are complex from the computational point of view. Usually, these algorithms first find the convex hull defined by the observed data and then fit a minimum volume simplex to it. Aiming at a lower computational complexity, some algorithms such as the vertex component analysis (VCA) [44], the pixel purity index (PPI) [42], and the N-FINDR [45] still find the minimum volume simplex containing the data cloud, but they assume the presence in the data of at least one pure pixel of each endmember. This is a strong requisite that may not hold in some data sets. In any case, these algorithms find the set of most pure pixels in the data. Hyperspectral sensors collects spatial images over many narrow contiguous bands, yielding large amounts of data. For this reason, very often, the processing of hyperspectral data, included unmixing, is preceded by a dimensionality reduction step to reduce computational complexity and to improve the signal-to-noise ratio (SNR). Principal component analysis (PCA) [46], maximum noise fraction (MNF) [47], and singular value decomposition (SVD) [48] are three well-known projection techniques widely used in remote sensing in general and in unmixing in particular. The newly introduced method [49] exploits the structure of hyperspectral mixtures, namely the fact that spectral vectors are nonnegative. The computational complexity associated with these techniques is an obstacle to real-time implementations. To overcome this problem, band selection [50] and non-statistical [51] algorithms have been introduced. This chapter addresses hyperspectral data source dependence and its impact on ICA and IFA performances. The study consider simulated and real data and is based on mutual information minimization. Hyperspectral observations are described by a generative model. This model takes into account the degradation mechanisms normally found in hyperspectral applications—namely, signature variability [52–54], abundance constraints, topography modulation, and system noise. The computation of mutual information is based on fitting mixtures of Gaussians (MOG) to data. The MOG parameters (number of components, means, covariances, and weights) are inferred using the minimum description length (MDL) based algorithm [55]. We study the behavior of the mutual information as a function of the unmixing matrix. The conclusion is that the unmixing matrix minimizing the mutual information might be very far from the true one. Nevertheless, some abundance fractions might be well separated, mainly in the presence of strong signature variability, a large number of endmembers, and high SNR. We end this chapter by sketching a new methodology to blindly unmix hyperspectral data, where abundance fractions are modeled as a mixture of Dirichlet sources. This model enforces positivity and constant sum sources (full additivity) constraints. The mixing matrix is inferred by an expectation-maximization (EM)-type algorithm. This approach is in the vein of references 39 and 56, replacing independent sources represented by MOG with mixture of Dirichlet sources. Compared with the geometric-based approaches, the advantage of this model is that there is no need to have pure pixels in the observations. The chapter is organized as follows. Section 6.2 presents a spectral radiance model and formulates the spectral unmixing as a linear problem accounting for abundance constraints, signature variability, topography modulation, and system noise. Section 6.3 presents a brief resume of ICA and IFA algorithms. Section 6.4 illustrates the performance of IFA and of some well-known ICA algorithms with experimental data. Section 6.5 studies the ICA and IFA limitations in unmixing hyperspectral data. Section 6.6 presents results of ICA based on real data. Section 6.7 describes the new blind unmixing scheme and some illustrative examples. Section 6.8 concludes with some remarks.

Veja mais

Fast unsupervised extraction of endmembers spectra from hyperspectral data

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Linear unmixing decomposes an hyperspectral image into a collection of re ectance spectra, called endmember signatures, and a set corresponding abundance fractions from the respective spatial coverage. This paper introduces vertex component analysis, an unsupervised algorithm to unmix linear mixtures of hyperpsectral data. VCA exploits the fact that endmembers occupy vertices of a simplex, and assumes the presence of pure pixels in data. VCA performance is illustrated using simulated and real data. VCA competes with state-of-the-art methods with much lower computational complexity.

Veja mais

Hyperspectral unmixing algorithm via dependent component analysis

Relevância:

30.00% 30.00%

Publicador:

Resumo:

This paper introduces a new method to blindly unmix hyperspectral data, termed dependent component analysis (DECA). This method decomposes a hyperspectral images into a collection of reflectance (or radiance) spectra of the materials present in the scene (endmember signatures) and the corresponding abundance fractions at each pixel. DECA assumes that each pixel is a linear mixture of the endmembers signatures weighted by the correspondent abundance fractions. These abudances are modeled as mixtures of Dirichlet densities, thus enforcing the constraints on abundance fractions imposed by the acquisition process, namely non-negativity and constant sum. The mixing matrix is inferred by a generalized expectation-maximization (GEM) type algorithm. This method overcomes the limitations of unmixing methods based on Independent Component Analysis (ICA) and on geometrical based approaches. The effectiveness of the proposed method is illustrated using simulated data based on U.S.G.S. laboratory spectra and real hyperspectral data collected by the AVIRIS sensor over Cuprite, Nevada.

Veja mais

New developments on VCA unmixing algorithm

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Hyperspectral sensors are being developed for remote sensing applications. These sensors produce huge data volumes which require faster processing and analysis tools. Vertex component analysis (VCA) has become a very useful tool to unmix hyperspectral data. It has been successfully used to determine endmembers and unmix large hyperspectral data sets without the use of any a priori knowledge of the constituent spectra. Compared with other geometric-based approaches VCA is an efficient method from the computational point of view. In this paper we introduce new developments for VCA: 1) a new signal subspace identification method (HySime) is applied to infer the signal subspace where the data set live. This step also infers the number of endmembers present in the data set; 2) after the projection of the data set onto the signal subspace, the algorithm iteratively projects the data set onto several directions orthogonal to the subspace spanned by the endmembers already determined. The new endmember signature corresponds to these extreme of the projections. The capability of VCA to unmix large hyperspectral scenes (real or simulated), with low computational complexity, is also illustrated.

Veja mais

A fast parallel hyperspectral coded aperture algorithm for compressive sensing using OpenCL

Relevância:

30.00% 30.00%

Publicador:

Resumo:

In this paper, we develop a fast implementation of an hyperspectral coded aperture (HYCA) algorithm on different platforms using OpenCL, an open standard for parallel programing on heterogeneous systems, which includes a wide variety of devices, from dense multicore systems from major manufactures such as Intel or ARM to new accelerators such as graphics processing units (GPUs), field programmable gate arrays (FPGAs), the Intel Xeon Phi and other custom devices. Our proposed implementation of HYCA significantly reduces its computational cost. Our experiments have been conducted using simulated data and reveal considerable acceleration factors. This kind of implementations with the same descriptive language on different architectures are very important in order to really calibrate the possibility of using heterogeneous platforms for efficient hyperspectral imaging processing in real remote sensing missions.

Veja mais

29 resultados para data gathering algorithm

Filtro por publicador