881 results for Landmark-based spectral clustering
Abstract:
In this paper we present results on the use of a multilayered a-SiC:H heterostructure as a wavelength-division demultiplexing (WDM) device for the visible light spectrum. The WDM device is a glass/ITO/a-SiC:H (p-i-n)/ a-SiC:H(-p) /Si:H(-i)/SiC:H (-n)/ITO heterostructure in which the photocurrent generated at different values of the applied bias can be assigned to the different optical signals. The device was characterized through spectral response measurements under different electrical biases. The device functionality for WDM applications was demonstrated with three input channels covering wavelengths within the visible range. The recovery of the input channels is explained using the dependence of the photocurrent spectrum on the applied voltage. The influence of the optical power density was also analysed. An electrical model, supported by a numerical simulation, explains the device operation. Short-range optical communications constitute the major application field; however, other applications are also foreseen.
Abstract:
Research on the problem of feature selection for clustering continues to develop. This is a challenging task, mainly due to the absence of class labels to guide the search for relevant features. Categorical feature selection for clustering has rarely been addressed in the literature, with most of the proposed approaches having focused on numerical data. In this work, we propose an approach to simultaneously cluster categorical data and select a subset of relevant features. Our approach is based on a modification of a finite mixture model (of multinomial distributions), where a set of latent variables indicates the relevance of each feature. To estimate the model parameters, we implement a variant of the expectation-maximization algorithm that simultaneously selects the subset of relevant features, using a minimum message length criterion. The proposed approach compares favourably with two baseline methods: a filter based on an entropy measure and a wrapper based on mutual information. The results obtained on synthetic data illustrate the ability of the proposed expectation-maximization method to recover ground truth. An application to real data, drawn from official statistics, shows its usefulness.
Abstract:
Research on cluster analysis for categorical data continues to develop, with new clustering algorithms being proposed. However, in this context, the determination of the number of clusters is rarely addressed. We propose a new approach in which clustering and the estimation of the number of clusters are done simultaneously for categorical data. We assume that the data originate from a finite mixture of multinomial distributions and use a minimum message length (MML) criterion to select the number of clusters (Wallace and Bolton, 1986). For this purpose, we implement an EM-type algorithm (Silvestre et al., 2008) based on the approach of Figueiredo and Jain (2002). The novelty of the approach rests on the integration of the model estimation and the selection of the number of clusters in a single algorithm, rather than selecting this number based on a set of pre-estimated candidate models. The performance of our approach is compared with the use of the Bayesian Information Criterion (BIC) (Schwarz, 1978) and the Integrated Completed Likelihood (ICL) (Biernacki et al., 2000) using synthetic data. The obtained results illustrate the capacity of the proposed algorithm to attain the true number of clusters while outperforming BIC and ICL in speed, which is especially relevant when dealing with large data sets.
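As a rough illustration of the mixture model this abstract builds on, the sketch below runs plain EM for a mixture of independent categorical (multinomial) features with a fixed number of clusters. It is a toy under stated assumptions, not the authors' algorithm: it has no MML criterion and does not select the number of clusters. The function name `em_categorical_mixture`, the smoothing constant `eps`, and the toy data are all hypothetical.

```python
import math
import random


def em_categorical_mixture(data, n_clusters, n_levels, n_iter=30, seed=0):
    """Plain EM for a mixture of independent categorical features.

    data: list of rows, each a list of category indices in range(n_levels).
    Returns (weights, theta, resp, trace), where theta[k][j][c] estimates
    P(feature j = c | cluster k), resp holds the responsibilities from the
    final E-step, and trace is the log-likelihood at each iteration.
    """
    rng = random.Random(seed)
    n, d = len(data), len(data[0])
    weights = [1.0 / n_clusters] * n_clusters
    # Random, normalized starting probabilities for every cluster and feature.
    theta = []
    for _ in range(n_clusters):
        component = []
        for _ in range(d):
            raw = [rng.random() + 0.5 for _ in range(n_levels)]
            total = sum(raw)
            component.append([r / total for r in raw])
        theta.append(component)
    eps = 1e-9  # assumed tiny smoothing to avoid exact zeros
    trace = []
    resp = []
    for _ in range(n_iter):
        # E-step: posterior probability of each cluster for each row.
        resp = []
        loglik = 0.0
        for row in data:
            joint = [
                weights[k] * math.prod(theta[k][j][row[j]] for j in range(d))
                for k in range(n_clusters)
            ]
            total = sum(joint)
            loglik += math.log(total)
            resp.append([p / total for p in joint])
        trace.append(loglik)
        # M-step: re-estimate mixing weights and category probabilities.
        for k in range(n_clusters):
            nk = sum(r[k] for r in resp)
            weights[k] = nk / n
            for j in range(d):
                for c in range(n_levels):
                    num = sum(r[k] for r, row in zip(resp, data) if row[j] == c)
                    theta[k][j][c] = (num + eps) / (nk + n_levels * eps)
    return weights, theta, resp, trace


# Hypothetical toy data: two dominant patterns plus two noisy rows.
data = [[0, 0, 1]] * 8 + [[1, 1, 0]] * 8 + [[0, 1, 1], [1, 0, 0]]
weights, theta, resp, trace = em_categorical_mixture(data, n_clusters=2, n_levels=2)
```

A model-selection criterion such as MML, BIC or ICL would be layered on top of a fit like this; the abstract's contribution is doing that selection inside the EM iterations rather than after them.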
Abstract:
Scheduling of constrained deadline sporadic task systems on multiprocessor platforms is an area which has received much attention in the recent past. It is widely believed that finding an optimal scheduler is hard, and therefore most studies have focused on developing algorithms with good processor utilization bounds. These algorithms can be broadly classified into two categories: partitioned scheduling, in which tasks are statically assigned to individual processors, and global scheduling, in which each task is allowed to execute on any processor in the platform. In this paper we consider a third, more general, approach called cluster-based scheduling. In this approach each task is statically assigned to a processor cluster, tasks in each cluster are globally scheduled among themselves, and clusters in turn are scheduled on the multiprocessor platform. We develop techniques to support such cluster-based scheduling algorithms, and also consider properties that minimize total processor utilization of individual clusters. In the last part of this paper, we develop new virtual cluster-based scheduling algorithms. For implicit deadline sporadic task systems, we develop an optimal scheduling algorithm that is neither Pfair nor ERfair. We also show that the processor utilization bound of US-EDF{m/(2m−1)} can be improved by using virtual clustering. Since neither the partitioned nor the global strategy dominates the other, cluster-based scheduling is a natural direction for research towards achieving improved processor utilization bounds.
Abstract:
Biosignal analysis has become widespread, extending beyond its typical use in clinical settings. Electrocardiography (ECG) plays a central role in patient monitoring as a diagnostic tool in today's medicine and as an emerging biometric trait. In this paper we adopt a consensus clustering approach for the unsupervised analysis of ECG-based biometric records. This type of analysis highlights natural groups within the population under investigation, which can be correlated with ground truth information in order to gain more insight into the data. Preliminary results are promising: meaningful clusters are extracted from the population under analysis. © 2014 EURASIP.
Abstract:
Clustering ensemble methods produce a consensus partition of a set of data points by combining the results of a collection of base clustering algorithms. In the evidence accumulation clustering (EAC) paradigm, the clustering ensemble is transformed into a pairwise co-association matrix, thus avoiding the label correspondence problem, which is intrinsic to other clustering ensemble schemes. In this paper, we propose a consensus clustering approach based on the EAC paradigm, which is not limited to crisp partitions and fully exploits the nature of the co-association matrix. Our solution determines probabilistic assignments of data points to clusters by minimizing a Bregman divergence between the observed co-association frequencies and the corresponding co-occurrence probabilities expressed as functions of the unknown assignments. We additionally propose an optimization algorithm to find a solution under any double-convex Bregman divergence. Experiments on both synthetic and real benchmark data show the effectiveness of the proposed approach.
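The co-association matrix mentioned in this abstract can be sketched in a few lines. The example below is only an illustration of the pairwise co-occurrence counting step, assuming crisp base partitions given as label lists; the function name `co_association` and the toy ensemble are hypothetical, and the Bregman-divergence consensus step is not shown.

```python
def co_association(partitions):
    """Return the n x n matrix of co-occurrence frequencies.

    partitions: list of label lists, one per base clustering, all of length n.
    Entry [i][j] is the fraction of partitions that place points i and j
    in the same cluster.
    """
    n = len(partitions[0])
    counts = [[0] * n for _ in range(n)]
    for labels in partitions:
        if len(labels) != n:
            raise ValueError("all partitions must label the same n points")
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    counts[i][j] += 1
    m = len(partitions)
    return [[c / m for c in row] for row in counts]


ensemble = [
    [0, 0, 1, 1],  # base clustering 1
    [1, 1, 0, 0],  # same grouping as clustering 1, with different label names
    [0, 0, 0, 1],  # a different grouping
]
C = co_association(ensemble)
```

Because only "same cluster or not" is counted, the first two partitions contribute identically even though their label names differ, which is how the co-association matrix sidesteps the label correspondence problem.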
Abstract:
Reporter genes are routinely used in every laboratory of molecular and cellular biology for studying heterologous gene expression and general cellular biological mechanisms, such as transfection processes. Although well characterized and broadly implemented, reporter genes present serious limitations, either by involving time-consuming procedures or by presenting possible side effects on the expression of the heterologous gene or even on the general cellular metabolism. Fourier transform mid-infrared (FT-MIR) spectroscopy was evaluated to simultaneously analyze, in a rapid (minutes) and high-throughput mode (using 96-well microplates), the transfection efficiency and the effect of the transfection process on the host cell biochemical composition and metabolism. Semi-adherent HEK and adherent AGS cell lines, transfected with the plasmid pVAX-GFP using Lipofectamine, were used as model systems. Good partial least squares (PLS) models were built to estimate the transfection efficiency, either considering each cell line independently (R² ≥ 0.92; RMSECV ≤ 2 %) or simultaneously considering both cell lines (R² = 0.90; RMSECV = 2 %). Additionally, the effect of the transfection process on the HEK cell biochemical and metabolic features could be evaluated directly from the FT-IR spectra. Due to the high sensitivity of the technique, it was also possible to discriminate the effect of the transfection process from that of the transfection reagent on HEK cells, e.g., by the analysis of spectral biomarkers and biochemical and metabolic features. The present results are far beyond what any reporter gene assay or other specific probe can offer for these purposes.
Abstract:
Human mesenchymal stem/stromal cells (MSCs) have received considerable attention in the field of cell-based therapies due to their high differentiation potential and ability to modulate immune responses. However, since these cells can only be isolated in very low quantities, successful realization of these therapies requires ex-vivo expansion of MSCs to achieve relevant cell doses. Metabolic activity is one of the parameters often monitored during MSC cultivation, using expensive multi-analytical methods, some of them time-consuming. The present work evaluates the use of mid-infrared (MIR) spectroscopy, through rapid and economic high-throughput analyses combined with multivariate data analysis, to monitor three different MSC cultivation runs conducted in spinner flasks, under xeno-free culture conditions, which differ in the type of microcarriers used and the culture feeding strategy applied. After evaluating diverse spectral preprocessing techniques, the optimized partial least squares (PLS) regression models based on the MIR spectra to estimate the glucose, lactate and ammonia concentrations yielded high coefficients of determination (R² ≥ 0.98, ≥ 0.98, and ≥ 0.94, respectively) and low prediction errors (RMSECV ≤ 4.7%, ≤ 4.4% and ≤ 5.7%, respectively). Besides PLS models valid for specific expansion protocols, a robust model simultaneously valid for the three processes was also built for predicting glucose, lactate and ammonia, yielding an R² of 0.95, 0.97 and 0.86, and an RMSECV of 0.33, 0.57, and 0.09 mM, respectively. Therefore, MIR spectroscopy combined with multivariate data analysis represents a promising tool for both optimization and control of MSC expansion processes.
Abstract:
Hyperspectral imaging has become one of the main topics in remote sensing. Hyperspectral images comprise hundreds of spectral bands at different (almost contiguous) wavelength channels over the same area, generating large data volumes of several GBs per flight. This high spectral resolution can be used for object detection and for discriminating between different objects based on their spectral characteristics. One of the main problems involved in hyperspectral analysis is the presence of mixed pixels, which arise when the spatial resolution of the sensor is not able to separate spectrally distinct materials. Spectral unmixing is one of the most important tasks for hyperspectral data exploitation. However, unmixing algorithms can be computationally very expensive and power-hungry, which compromises their use in applications under on-board constraints. In recent years, graphics processing units (GPUs) have evolved into highly parallel and programmable systems. Specifically, several hyperspectral imaging algorithms have been shown to benefit from this hardware, taking advantage of the extremely high floating-point processing performance, compact size, huge memory bandwidth, and relatively low cost of these units, which make them appealing for onboard data processing. In this paper, we propose a parallel implementation of an augmented Lagrangian based method for unsupervised hyperspectral linear unmixing on GPUs using CUDA. The method, called simplex identification via split augmented Lagrangian (SISAL), aims to identify the endmembers of a scene and is able to unmix hyperspectral data sets in which the pure pixel assumption is violated. The efficient implementation of the SISAL method presented in this work exploits the GPU architecture at a low level, using shared memory and coalesced accesses to memory.
Abstract:
The Evidence Accumulation Clustering (EAC) paradigm is a clustering ensemble method which derives a consensus partition from a collection of base clusterings obtained using different algorithms. It collects from the partitions in the ensemble a set of pairwise observations about the co-occurrence of objects in the same cluster, and it uses these co-occurrence statistics to derive a similarity matrix, referred to as the co-association matrix. The Probabilistic Evidence Accumulation for Clustering Ensembles (PEACE) algorithm is a principled approach for the extraction of a consensus clustering from the observations encoded in the co-association matrix, based on a probabilistic model for the co-association matrix parameterized by the unknown assignments of objects to clusters. In this paper we extend the PEACE algorithm by deriving a consensus solution according to a MAP approach with Dirichlet priors defined for the unknown probabilistic cluster assignments. In particular, we study the positive regularization effect of Dirichlet priors on the final consensus solution with both synthetic and real benchmark data.
Abstract:
Hyperspectral remote sensing exploits the electromagnetic scattering patterns of the different materials at specific wavelengths [2, 3]. Hyperspectral sensors have been developed to sample the scattered portion of the electromagnetic spectrum extending from the visible region through the near-infrared and mid-infrared, in hundreds of narrow contiguous bands [4, 5]. The number and variety of potential civilian and military applications of hyperspectral remote sensing is enormous [6, 7]. Very often, the resolution cell corresponding to a single pixel in an image contains several substances (endmembers) [4]. In this situation, the scattered energy is a mixture of the endmember spectra. A challenging task underlying many hyperspectral imagery applications is then decomposing a mixed pixel into a collection of reflectance spectra, called endmember signatures, and the corresponding abundance fractions [8–10]. Depending on the mixing scales at each pixel, the observed mixture is either linear or nonlinear [11, 12]. The linear mixing model holds approximately when the mixing scale is macroscopic [13] and there is negligible interaction among distinct endmembers [3, 14]. If, however, the mixing scale is microscopic (intimate mixtures) [15, 16] and the incident solar radiation is scattered by the scene through multiple bounces involving several endmembers [17], the linear model is no longer accurate. Linear spectral unmixing has been intensively researched in recent years [9, 10, 12, 18–21]. It considers that a mixed pixel is a linear combination of endmember signatures weighted by the corresponding abundance fractions.
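The linear mixing model described above can be written out numerically. The sketch below is a minimal illustration, assuming noise-free mixing and abundance fractions that are nonnegative and sum to one; the function name `mix` and the three-band signatures are hypothetical values chosen only for the example.

```python
def mix(endmembers, abundances):
    """Linear mixing model: pixel[b] = sum_k abundances[k] * endmembers[k][b]."""
    if any(a < 0 for a in abundances) or abs(sum(abundances) - 1.0) > 1e-9:
        raise ValueError("abundances must be nonnegative and sum to 1")
    n_bands = len(endmembers[0])
    return [
        sum(a * signature[b] for a, signature in zip(abundances, endmembers))
        for b in range(n_bands)
    ]


# Two hypothetical endmember signatures sampled at three spectral bands.
endmembers = [
    [0.1, 0.4, 0.8],
    [0.6, 0.5, 0.2],
]
# A pixel covered 25% by the first material and 75% by the second.
pixel = mix(endmembers, [0.25, 0.75])
```

Unmixing is the inverse task: given `pixel` (and, in the blind case, not even `endmembers`), recover the signatures and the abundance fractions.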
Under this model, and assuming that the number of substances and their reflectance spectra are known, hyperspectral unmixing is a linear problem for which many solutions have been proposed (e.g., maximum likelihood estimation [8], spectral signature matching [22], spectral angle mapper [23], subspace projection methods [24, 25], and constrained least squares [26]). In most cases, the number of substances and their reflectances are not known, and hyperspectral unmixing then falls into the class of blind source separation problems [27]. Independent component analysis (ICA) has recently been proposed as a tool to blindly unmix hyperspectral data [28–31]. ICA is based on the assumption of mutually independent sources (abundance fractions), which does not hold for hyperspectral data, since the sum of abundance fractions is constant, implying statistical dependence among them. This dependence compromises ICA applicability to hyperspectral images, as shown in Refs. [21, 32]. In fact, ICA finds the endmember signatures by multiplying the spectral vectors with an unmixing matrix, which minimizes the mutual information among sources. If sources are independent, ICA provides the correct unmixing, since the minimum of the mutual information is obtained only when sources are independent. This is no longer true for dependent abundance fractions. Nevertheless, some endmembers may be approximately unmixed. These aspects are addressed in Ref. [33]. Under the linear mixing model, the observations from a scene are in a simplex whose vertices correspond to the endmembers. Several approaches [34–36] have exploited this geometric feature of hyperspectral mixtures [35]. The minimum volume transform (MVT) algorithm [36] determines the simplex of minimum volume containing the data. The method presented in Ref. [37] is also of MVT type but, by introducing the notion of bundles, it takes into account the endmember variability usually present in hyperspectral mixtures.
MVT-type approaches are computationally complex. Usually, these algorithms first find the convex hull defined by the observed data and then fit a minimum volume simplex to it. For example, the gift wrapping algorithm [38] computes the convex hull of n data points in a d-dimensional space with a computational complexity of $O(n^{\lfloor d/2 \rfloor + 1})$, where $\lfloor x \rfloor$ is the largest integer less than or equal to x and n is the number of samples. The complexity of the method presented in Ref. [37] is even higher, since the temperature of the simulated annealing algorithm used must follow a log(·) law [39] to assure convergence (in probability) to the desired solution. Aiming at a lower computational complexity, some algorithms such as the pixel purity index (PPI) [35] and N-FINDR [40] still find the minimum volume simplex containing the data cloud, but they assume the presence of at least one pure pixel of each endmember in the data. This is a strong requirement that may not hold in some data sets. In any case, these algorithms find the set of purest pixels in the data. The PPI algorithm uses the minimum noise fraction (MNF) [41] as a preprocessing step to reduce dimensionality and to improve the signal-to-noise ratio (SNR). The algorithm then projects every spectral vector onto skewers (a large number of random vectors) [35, 42, 43]. The points corresponding to extremes, for each skewer direction, are stored. A cumulative count records the number of times each pixel (i.e., a given spectral vector) is found to be an extreme. The pixels with the highest scores are the purest ones. The N-FINDR algorithm [40] is based on the fact that in p spectral dimensions, the p-volume defined by a simplex formed by the purest pixels is larger than any other volume defined by any other combination of pixels. This algorithm finds the set of pixels defining the largest volume by inflating a simplex inside the data.
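The skewer-projection scoring used by PPI, as described above, can be sketched as follows. This is a simplified illustration under stated assumptions: it skips the MNF preprocessing step, draws skewers as Gaussian random vectors, and uses a hypothetical function name `pixel_purity_scores` and toy two-band data.

```python
import random


def pixel_purity_scores(pixels, n_skewers=200, seed=0):
    """Count how often each pixel is an extreme of a random projection."""
    rng = random.Random(seed)
    dim = len(pixels[0])
    scores = [0] * len(pixels)
    for _ in range(n_skewers):
        # One skewer: a random direction in spectral space.
        skewer = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        projections = [sum(s * x for s, x in zip(skewer, p)) for p in pixels]
        # Credit the pixels at both ends of the projected data.
        scores[projections.index(min(projections))] += 1
        scores[projections.index(max(projections))] += 1
    return scores


# Hypothetical data: three pure pixels (simplex vertices) and three mixtures.
pixels = [
    [1.0, 0.0],
    [0.0, 1.0],
    [0.0, 0.0],
    [0.3, 0.3],
    [0.2, 0.5],
    [0.25, 0.25],
]
scores = pixel_purity_scores(pixels)
```

In this toy set the mixed pixels lie strictly inside the triangle formed by the pure pixels, so they are never extremes and all of the score mass accumulates on the vertices.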
ORASIS [44, 45] is a hyperspectral framework developed by the U.S. Naval Research Laboratory consisting of several algorithms organized in six modules: exemplar selector, adaptive learner, demixer, knowledge base or spectral library, and spatial postprocessor. The first step consists in flat-fielding the spectra. Next, the exemplar selection module is used to select spectral vectors that best represent the smallest convex cone containing the data. The other pixels are rejected when the spectral angle distance (SAD) is less than a given threshold. The procedure finds the basis for a subspace of a lower dimension using a modified Gram–Schmidt orthogonalization. The selected vectors are then projected onto this subspace and a simplex is found by an MVT process. ORASIS is oriented to real-time target detection from uncrewed air vehicles using hyperspectral data [46]. In this chapter we develop a new algorithm, vertex component analysis (VCA), to unmix linear mixtures of endmember spectra. First, the algorithm determines the number of endmembers and the signal subspace using a newly developed concept [47, 48]. Second, the algorithm extracts the purest pixels present in the data. Unlike other methods, this algorithm is completely automatic and unsupervised. To estimate the number of endmembers and the signal subspace in hyperspectral linear mixtures, the proposed scheme begins by estimating signal and noise correlation matrices. The latter is based on multiple regression theory. The signal subspace is then identified by selecting the set of signal eigenvalues that best represents the data in the least-squares sense [48, 49]; we note, however, that VCA works with projected and with unprojected data. The extraction of the endmembers exploits two facts: (1) the endmembers are the vertices of a simplex and (2) the affine transformation of a simplex is also a simplex. Like the PPI and N-FINDR algorithms, VCA assumes the presence of pure pixels in the data.
The algorithm iteratively projects data onto a direction orthogonal to the subspace spanned by the endmembers already determined. The new endmember signature corresponds to the extreme of the projection. The algorithm iterates until all endmembers are exhausted. VCA performs much better than PPI and better than or comparable to N-FINDR; yet it has a computational complexity between one and two orders of magnitude lower than N-FINDR. The chapter is structured as follows. Section 19.2 describes the fundamentals of the proposed method. Section 19.3 and Section 19.4 evaluate the proposed algorithm using simulated and real data, respectively. Section 19.5 presents some concluding remarks.
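The iterative projection step described above can be illustrated with a short sketch. It covers only the idea of projecting onto a direction orthogonal to the endmembers found so far and taking the extreme; it is not the full VCA algorithm (no subspace estimation, no noise handling). The helper names `dot`, `remove_components` and `extract_extreme_pixels`, the random choice of direction, and the toy signatures are all assumptions made for the example.

```python
import math
import random


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def remove_components(vec, basis):
    """Subtract from vec its projections onto an orthonormal basis."""
    out = list(vec)
    for q in basis:
        c = dot(out, q)
        out = [o - c * qi for o, qi in zip(out, q)]
    return out


def extract_extreme_pixels(pixels, n_endmembers, seed=0):
    """Return indices of pixels found as extremes of successive projections."""
    rng = random.Random(seed)
    dim = len(pixels[0])
    basis = []  # orthonormal basis of the span of the endmembers found so far
    found = []
    for _ in range(n_endmembers):
        # Random direction made orthogonal to the endmembers already found.
        start = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        direction = remove_components(start, basis)
        projections = [abs(dot(direction, x)) for x in pixels]
        idx = projections.index(max(projections))
        found.append(idx)
        # Extend the basis with the part of the new endmember not yet spanned.
        residual = remove_components(pixels[idx], basis)
        norm = math.sqrt(dot(residual, residual))
        if norm < 1e-12:
            break  # nothing new to add; stop early
        basis.append([r / norm for r in residual])
    return found


# Hypothetical three-band signatures; rows 0-2 of `pixels` are pure pixels.
signatures = [[1.0, 0.2, 0.1], [0.1, 1.0, 0.3], [0.2, 0.1, 1.0]]
abundances = [
    [1, 0, 0], [0, 1, 0], [0, 0, 1],
    [0.5, 0.3, 0.2], [0.2, 0.2, 0.6], [0.1, 0.7, 0.2], [0.4, 0.4, 0.2],
]
pixels = [
    [sum(a * s[b] for a, s in zip(ab, signatures)) for b in range(3)]
    for ab in abundances
]
found = extract_extreme_pixels(pixels, 3)
```

Because the absolute value of a linear projection over a simplex is maximized at a vertex, each iteration lands on a pure pixel when one exists, which is why the pure-pixel assumption matters for this family of methods.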
Abstract:
The main result of this work is a new criterion for the formation of good clusters in a graph. This criterion uses a new dynamical invariant, the performance of a clustering, that characterizes the quality of the formation of clusters. We prove that the growth of the dynamical invariant, the network topological entropy, has the effect of worsening the quality of a clustering, in a process of cluster formation by the successive removal of edges. Several examples of clustering on the same network are presented to compare the behavior of other parameters, such as network topological entropy, conductance, clustering coefficient and performance of a clustering, with the number of edges in a process of clustering by successive edge removal.
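One of the parameters named in this abstract, conductance, is easy to track while edges are removed. The sketch below uses the common graph-theoretic definition (cut size divided by the smaller of the two volumes), which is an assumption here since the abstract does not state the definition it uses; the function name `conductance`, the zero-volume convention, and the toy graph are hypothetical.

```python
def conductance(edges, subset):
    """Conductance of a vertex subset in an undirected, unweighted graph.

    Defined here as cut(S, rest) / min(vol(S), vol(rest)), where vol is the
    sum of vertex degrees. Returns 0.0 when either side has zero volume
    (a convention chosen for this sketch).
    """
    subset = set(subset)
    cut = 0
    vol_in = 0
    vol_out = 0
    for u, v in edges:
        for node in (u, v):
            if node in subset:
                vol_in += 1
            else:
                vol_out += 1
        if (u in subset) != (v in subset):
            cut += 1
    denom = min(vol_in, vol_out)
    return cut / denom if denom else 0.0


# Two triangles joined by a single bridge edge (2, 3).
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
before = conductance(edges, {0, 1, 2})
# Remove the bridge, as in clustering by successive edge removal.
after = conductance([e for e in edges if e != (2, 3)], {0, 1, 2})
```

Here `before` is 1/7 and `after` is 0, reflecting that removing the bridge fully separates the two triangles.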
Abstract:
In the present paper we compare clustering solutions using indices of paired agreement. We propose a new method - IADJUST - to correct indices of paired agreement, excluding agreement by chance. This new method overcomes previous limitations known in the literature, as it permits the correction of any index. We illustrate its use in external clustering validation, to measure the accordance between clusters and an a priori known structure. The adjusted indices are intended to provide a realistic measure of clustering performance that excludes agreement by chance with ground truth. We use simulated data sets, under a range of scenarios - considering diverse numbers of clusters, cluster overlaps and balances - to discuss the pertinence and the precision of our proposal. Precision is established based on comparisons with the analytical approach for correction; specific indices that can be corrected in this way are used for this purpose. The pertinence of the proposed correction is discussed through a detailed comparison between the performance of two classical clustering approaches, namely the Expectation-Maximization (EM) and K-Means (KM) algorithms. Eight indices of paired agreement are studied and new corrected indices are obtained.
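To make the idea of excluding agreement by chance concrete, the sketch below computes the Rand index (one index of paired agreement) and a generic simulation-based correction of the form (observed - expected) / (1 - expected). This is not the IADJUST procedure, whose details are not given in the abstract; estimating the expected value by shuffling labels, assuming the index has maximum 1, and the names `rand_index` and `chance_corrected` are all assumptions made for illustration.

```python
import random
from itertools import combinations


def rand_index(a, b):
    """Fraction of point pairs on which partitions a and b agree."""
    pairs = list(combinations(range(len(a)), 2))
    agree = sum((a[i] == a[j]) == (b[i] == b[j]) for i, j in pairs)
    return agree / len(pairs)


def chance_corrected(index, a, b, n_sim=200, seed=0):
    """Correct an agreement index using a simulated chance baseline.

    The expected value under chance is estimated by repeatedly shuffling
    the labels of b; the index is assumed to have a maximum of 1.
    """
    rng = random.Random(seed)
    observed = index(a, b)
    shuffled = list(b)
    total = 0.0
    for _ in range(n_sim):
        rng.shuffle(shuffled)
        total += index(a, shuffled)
    expected = total / n_sim
    if expected == 1.0:
        return 0.0  # no room above chance; convention for this sketch
    return (observed - expected) / (1.0 - expected)


truth = [0, 0, 0, 1, 1, 1]
raw = rand_index([0, 0, 1, 1], [0, 1, 0, 1])
perfect = chance_corrected(rand_index, truth, truth)
```

Because `chance_corrected` takes the index as a function argument, the same wrapper can be applied to any paired-agreement index, which mirrors the abstract's point that a simulation-based correction is not tied to indices with a known analytical expectation.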