881 results for Landmark-based spectral clustering
Abstract:
We present a new framework for large-scale data clustering. The main idea is to modify functional dimensionality reduction techniques to directly optimize over discrete labels using stochastic gradient descent. Compared to methods like spectral clustering, our approach solves a single optimization problem rather than using an ad hoc two-stage optimization approach, does not require a matrix inversion, can easily encode prior knowledge in the set of implementable functions, and does not have an "out-of-sample" problem. Experimental results on both artificial and real-world datasets show the usefulness of our approach.
Abstract:
Magnetic Resonance Imaging (MRI) is a multi-sequence medical imaging technique in which stacks of images are acquired with different tissue contrasts. Simultaneous observation and quantitative analysis of normal brain tissues and small abnormalities from these large numbers of different sequences is a great challenge in clinical applications. Multispectral MRI analysis can simplify the job considerably by combining an unlimited number of available co-registered sequences in a single suite. However, the poor performance of the multispectral system with conventional image classification and segmentation methods makes it inappropriate for clinical analysis. Recent works in multispectral brain MRI analysis have attempted to resolve this issue through improved feature extraction approaches, such as transform-based methods, fuzzy approaches, algebraic techniques and so forth. Transform-based feature extraction methods like Independent Component Analysis (ICA) and its extensions have been effectively used in recent studies to improve the performance of multispectral brain MRI analysis. However, these global transforms were found to be inefficient and inconsistent in identifying less frequently occurring features, like small lesions, from large amounts of MR data. The present thesis focuses on improving ICA-based feature extraction techniques to enhance the performance of multispectral brain MRI analysis. Methods using spectral clustering and wavelet transforms are proposed to resolve the inefficiency of ICA in identifying small abnormalities, and problems due to ICA over-completeness. The effectiveness of the new methods in brain tissue classification and segmentation is confirmed by a detailed quantitative and qualitative analysis with synthetic and clinical, normal and abnormal, data. In comparison to conventional classification techniques, the proposed algorithms provide better performance in the classification of normal brain tissues and significant small abnormalities.
Abstract:
In this paper we present a novel hybrid approach for multimodal medical image registration based on diffeomorphic demons. Diffeomorphic demons have proven to be a robust and efficient method for intensity-based image registration. A very recent extension even allows mutual information (MI) to be used as a similarity measure for registering multimodal images. However, due to the uncertainty in intensity correspondence in some anatomical parts, it is difficult for a purely intensity-based algorithm to solve the registration problem. Therefore, we propose to combine the resulting transformations from both intensity-based and landmark-based methods for multimodal non-rigid registration based on diffeomorphic demons. Several experiments on different types of MR images were conducted, for which we show that a better anatomical correspondence between the images can be obtained using the hybrid approach than using either intensity information or landmarks alone.
Abstract:
We consider the problem of fitting a union of subspaces to a collection of data points drawn from one or more subspaces and corrupted by noise and/or gross errors. We pose this problem as a non-convex optimization problem, where the goal is to decompose the corrupted data matrix as the sum of a clean and self-expressive dictionary plus a matrix of noise and/or gross errors. By self-expressive we mean a dictionary whose atoms can be expressed as linear combinations of themselves with low-rank coefficients. In the case of noisy data, our key contribution is to show that this non-convex matrix decomposition problem can be solved in closed form from the SVD of the noisy data matrix. The solution involves a novel polynomial thresholding operator on the singular values of the data matrix, which requires minimal shrinkage. For one subspace, a particular case of our framework leads to classical PCA, which requires no shrinkage. For multiple subspaces, the low-rank coefficients obtained by our framework can be used to construct a data affinity matrix from which the clustering of the data according to the subspaces can be obtained by spectral clustering. In the case of data corrupted by gross errors, we solve the problem using an alternating minimization approach, which combines our polynomial thresholding operator with the more traditional shrinkage-thresholding operator. Experiments on motion segmentation and face clustering show that our framework performs on par with state-of-the-art techniques at a reduced computational cost.
Abstract:
We have undertaken two-dimensional gel electrophoresis proteomic profiling on a series of cell lines with different recombinant antibody production rates. Due to the nature of gel-based experiments not all protein spots are detected across all samples in an experiment, and hence datasets are invariably incomplete. New approaches are therefore required for the analysis of such graduated datasets. We approached this problem in two ways. Firstly, we applied a missing value imputation technique to calculate missing data points. Secondly, we combined a singular value decomposition based hierarchical clustering with the expression variability test to identify protein spots whose expression correlates with increased antibody production. The results have shown that while imputation of missing data was a useful method to improve the statistical analysis of such data sets, this was of limited use in differentiating between the samples investigated, and highlighted a small number of candidate proteins for further investigation. (c) 2006 Elsevier B.V. All rights reserved.
Abstract:
Inter-subject parcellation of functional Magnetic Resonance Imaging (fMRI) data based on a standard General Linear Model (GLM) and spectral clustering was recently proposed as a means to alleviate the issues associated with spatial normalization in fMRI. However, for all its appeal, a GLM-based parcellation approach introduces its own biases, in the form of a priori knowledge about the shape of the Hemodynamic Response Function (HRF) and task-related signal changes, or about the subject's behaviour during the task. In this paper, we introduce a data-driven version of the spectral clustering parcellation, based on Independent Component Analysis (ICA) and Partial Least Squares (PLS) instead of the GLM. First, a number of independent components are automatically selected. Seed voxels are then obtained from the associated ICA maps and we compute the PLS latent variables between the fMRI signal of the seed voxels (which covers regional variations of the HRF) and the principal components of the signal across all voxels. Finally, we parcellate all subjects' data with a spectral clustering of the PLS latent variables. We present results of the application of the proposed method on both single-subject and multi-subject fMRI datasets. Preliminary experimental results, evaluated with intra-parcel variance of GLM t-values and PLS-derived t-values, indicate that this data-driven approach offers an improvement in parcellation accuracy over GLM-based techniques.
Abstract:
The objective of this thesis is to study the user scheduling problem in a Low Earth Orbit (LEO) Multi-User MIMO system. With the application of cutting-edge digital beamforming algorithms, a LEO satellite with an antenna array and a large number of antenna elements can provide service to many user terminals (UTs) in full frequency reuse (FFR) schemes. Since there are many more UTs on the ground than transmit antennas on the satellite, user scheduling is necessary. Scheduling can be accomplished by grouping users into different clusters: users within the same cluster are multiplexed and served together via Space Division Multiple Access (SDMA), i.e., digital beamforming or Multi-User MIMO techniques; the different clusters of users are then served in different time slots via Time Division Multiple Access (TDMA). The design of an optimal user grouping strategy is known to be an NP-complete problem which can be solved only through exhaustive search. In this thesis, we provide a graph-based user scheduling and feed space beamforming architecture for the downlink with the aim of reducing inter-beam interference among users. The main idea is to cluster users whose pairwise great-circle distance is as large as possible. First, we create a graph in which the users are the vertices and an edge between two users exists if their great-circle distance is above a certain threshold. In the second step, we develop a low-complexity greedy user clustering technique in which we iteratively search for the maximum clique in the graph, i.e., the largest fully connected subgraph. Finally, using three power normalization techniques, a Minimum Mean Square Error (MMSE) beamforming matrix is deployed on a per-cluster basis. The suggested scheduling system is compared with a position-based scheduler, which generates a beam lattice on the ground and randomly selects one user per beam to form a cluster.
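The graph construction and clique-based grouping described in this abstract can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the thesis's implementation: the function names (`great_circle_km`, `build_graph`, `greedy_clique`, `schedule`), the haversine formula with a mean Earth radius, the degree-based greedy rule, and the sample coordinates are all hypothetical choices made for the example. Beamforming and power normalization are not covered.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; an assumption of this sketch


def great_circle_km(a, b):
    """Haversine distance between two (lat, lon) points given in degrees."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def build_graph(users, threshold_km):
    """Connect two users when their great-circle distance exceeds the threshold."""
    adj = {i: set() for i in range(len(users))}
    for i in range(len(users)):
        for j in range(i + 1, len(users)):
            if great_circle_km(users[i], users[j]) > threshold_km:
                adj[i].add(j)
                adj[j].add(i)
    return adj


def greedy_clique(adj, candidates):
    """Grow one clique greedily (a heuristic, not an exact maximum-clique search).

    At each step, pick the candidate with the most neighbours among the
    remaining candidates, then keep only candidates adjacent to it.
    """
    clique = []
    pool = set(candidates)
    while pool:
        v = max(sorted(pool), key=lambda u: len(adj[u] & pool))
        clique.append(v)
        pool &= adj[v]  # v itself is dropped because it is not its own neighbour
    return clique


def schedule(users, threshold_km):
    """Partition users into clusters (one per time slot) by peeling off cliques."""
    adj = build_graph(users, threshold_km)
    remaining = set(range(len(users)))
    clusters = []
    while remaining:
        clique = greedy_clique(adj, remaining)
        clusters.append(sorted(clique))
        remaining -= set(clique)
    return clusters


# Hypothetical user positions (lat, lon) in degrees.
users = [(0.0, 0.0), (0.0, 0.1), (0.0, 10.0), (0.0, 10.1), (10.0, 0.0)]
clusters = schedule(users, threshold_km=500.0)
```

Each resulting cluster contains only users that are pairwise farther apart than the threshold, so users that are close together on the ground end up in different time slots. The greedy rule trades optimality for speed, in line with the abstract's remark that exact grouping is computationally hard.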
Abstract:
Many organisations need to extract useful information from huge amounts of movement data. One example is found in maritime transportation, where the automated identification of a diverse range of traffic routes is a key management issue for improving the maintenance of ports and ocean routes, and accelerating ship traffic. This paper addresses, in a first stage, the research challenge of developing an approach for the automated identification of traffic routes based on clustering motion vectors rather than reconstructed trajectories. The immediate benefit of the proposed approach is that it avoids reconstructing trajectories in terms of the geometric shape of their path, their position in space, their life span, and changes of speed, direction and other attributes over time. For clustering the moving objects, an adapted version of the Shared Nearest Neighbour algorithm is used. The motion vectors, each with a position and a direction, are analysed in order to identify clusters of vectors that are moving in the same direction. These clusters represent traffic routes, and the preliminary results are promising for the automated identification of traffic routes with different shapes and densities, as well as for handling noisy data.
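To make the idea of clustering motion vectors by shared neighbours concrete, here is a minimal sketch of a simplified Jarvis-Patrick style shared-nearest-neighbour grouping. It is not the adapted algorithm used in the paper, whose details the abstract does not give. The distance function (position distance plus weighted heading difference), the parameters `k`, `min_shared` and `angle_weight`, and the sample data are all assumptions made for illustration.

```python
import math


def vector_distance(a, b, angle_weight=1.0):
    """Distance between two motion vectors given as (x, y, heading_deg).

    Adds the Euclidean position distance to the wrapped heading difference
    in degrees; this weighting is an assumption of the sketch.
    """
    dpos = math.hypot(a[0] - b[0], a[1] - b[1])
    dang = abs(a[2] - b[2]) % 360.0
    dang = min(dang, 360.0 - dang)
    return dpos + angle_weight * dang


def knn_lists(vectors, k):
    """Return, for each vector, the set of indices of its k nearest neighbours."""
    neigh = []
    for i, v in enumerate(vectors):
        others = sorted(
            (j for j in range(len(vectors)) if j != i),
            key=lambda j: (vector_distance(v, vectors[j]), j),
        )
        neigh.append(set(others[:k]))
    return neigh


def snn_clusters(vectors, k=3, min_shared=2):
    """Link mutual neighbours that share at least min_shared neighbours,
    then return the connected components as clusters."""
    neigh = knn_lists(vectors, k)
    parent = list(range(len(vectors)))

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(vectors)):
        for j in neigh[i]:
            if i in neigh[j] and len(neigh[i] & neigh[j]) >= min_shared:
                parent[find(i)] = find(j)

    groups = {}
    for i in range(len(vectors)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


# Two hypothetical lanes that are spatially close but head in opposite directions.
vectors = [
    (0.0, 0.0, 90.0), (1.0, 0.0, 90.0), (2.0, 0.0, 90.0), (3.0, 0.0, 90.0),
    (0.0, 1.0, 270.0), (1.0, 1.0, 270.0), (2.0, 1.0, 270.0), (3.0, 1.0, 270.0),
]
routes = snn_clusters(vectors, k=3, min_shared=2)
```

Because heading enters the distance, the two lanes are separated even though they are only one unit apart in space, which mirrors the abstract's point that routes are identified from vectors moving in the same direction rather than from reconstructed trajectories.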
Abstract:
Electrocardiography (ECG) biometrics is emerging as a viable biometric trait. Recent developments at the sensor level have shown the feasibility of performing signal acquisition at the fingers and hand palms, using one-lead sensor technology and dry electrodes. These new locations lead to ECG signals with a lower signal-to-noise ratio that are more prone to noise artifacts; heart rate variability is another major challenge of this biometric trait. In this paper we propose a novel approach to ECG biometrics, with the purpose of reducing the computational complexity and increasing the robustness of the recognition process, enabling the fusion of information across sessions. Our approach is based on clustering, grouping individual heartbeats based on their morphology. We study several methods to perform automatic template selection and account for variations observed in a person's biometric data. This approach allows the identification of different template groupings, taking into account the heart rate variability, and the removal of outliers due to noise artifacts. Experimental evaluation on real-world data demonstrates the advantages of our approach.
Abstract:
Demand response can play a very relevant role in the context of power systems with an intensive use of distributed energy resources, of which intermittent renewable sources are a significant part. More active consumer participation can help improve system reliability and decrease or defer the required investments. Adequate use and management of demand response is even more important in competitive electricity markets. However, experience shows that it is difficult to use demand response adequately in this context, which indicates the need for research work in this area. The most important difficulties seem to be caused by inadequate business models and by inadequate management of demand response programs. This paper contributes to developing methodologies and a computational infrastructure able to provide the involved players with adequate decision support on the design and use of demand response programs and contracts. The presented work uses DemSi, a demand response simulator that has been developed by the authors to simulate demand response actions and programs, which includes realistic power system simulation. It includes an optimization module for the application of demand response programs and contracts using deterministic and metaheuristic approaches. The proposed methodology is an important improvement to the simulator and provides adequate tools for the adoption of demand response programs by the involved players. A machine learning method based on clustering and classification techniques, resulting in a rule base concerning the use of DR programs and contracts, is also used. A case study concerning the use of demand response in an incident situation is presented.
Abstract:
Genetically engineered bioreporters are an excellent complement to traditional methods of chemical analysis. The application of fluorescence flow cytometry to detection of bioreporter response enables rapid and efficient characterization of bacterial bioreporter population response on a single-cell basis. In the present study, intrapopulation response variability was used to obtain higher analytical sensitivity and precision. We have analyzed flow cytometric data for an arsenic-sensitive bacterial bioreporter using an artificial neural network-based adaptive clustering approach (a single-layer perceptron model). Results for this approach are far superior to other methods that we have applied to this fluorescent bioreporter (e.g., the arsenic detection limit is 0.01 µM, substantially lower than for other detection methods/algorithms). The approach is highly efficient computationally and can be implemented on a real-time basis, thus having potential for future development of high-throughput screening applications.
Abstract:
BACKGROUND: Transgressive segregation describes the occurrence of novel phenotypes in hybrids with extreme trait values not observed in either parental species. A previously experimentally untested prediction is that the amount of transgression increases with the genetic distance between hybridizing species. This follows from QTL studies suggesting that transgression is most commonly due to complementary gene action or epistasis, which become more frequent at larger genetic distances. This is because the number of QTLs fixed for alleles with opposing signs in different species should increase with time since speciation, provided that speciation is not driven by disruptive selection. We measured the amount of transgression occurring in hybrids of cichlid fish bred from species pairs with gradually increasing genetic distances and varying phenotypic similarity. Transgression in multi-trait shape phenotypes was quantified using landmark-based geometric morphometric methods. RESULTS: We found that genetic distance explained 52% and 78% of the variation in transgression frequency in F1 and F2 hybrids, respectively. Confirming theoretical predictions, transgression, when measured in F2 hybrids, increased linearly with genetic distance between hybridizing species. Phenotypic similarity of species, on the other hand, was not related to the amount of transgression. CONCLUSION: The commonness and ease with which novel phenotypes are produced in cichlid hybrids between unrelated species has important implications for the interaction of hybridization with adaptation and speciation. Hybridization may generate new genotypes with adaptive potential that did not reside as standing genetic variation in either parental population, potentially enhancing a population's responsiveness to selection. Our results make it conceivable that hybridization contributed to the rapid rates of phenotypic evolution in the large and rapid adaptive radiations of haplochromine cichlids.
Abstract:
The optimization of the pilot overhead in single-user wireless fading channels is investigated, and the dependence of this overhead on various system parameters of interest (e.g., fading rate, signal-to-noise ratio) is quantified. The achievable pilot-based spectral efficiency is expanded with respect to the fading rate about the no-fading point, which leads to an accurate order expansion for the pilot overhead. This expansion identifies that the pilot overhead, as well as the spectral efficiency penalty with respect to a reference system with genie-aided CSI (channel state information) at the receiver, depend on the square root of the normalized Doppler frequency. It is also shown that the widely-used block fading model is a special case of more accurate continuous fading models in terms of the achievable pilot-based spectral efficiency. Furthermore, it is established that the overhead optimization for multiantenna systems is effectively the same as for single-antenna systems with the normalized Doppler frequency multiplied by the number of transmit antennas.
Abstract:
The optimization of the pilot overhead in wireless fading channels is investigated, and the dependence of this overhead on various system parameters of interest (e.g., fading rate, signal-to-noise ratio) is quantified. The achievable pilot-based spectral efficiency is expanded with respect to the fading rate about the no-fading point, which leads to an accurate order expansion for the pilot overhead. This expansion identifies that the pilot overhead, as well as the spectral efficiency penalty with respect to a reference system with genie-aided CSI (channel state information) at the receiver, depend on the square root of the normalized Doppler frequency. It is also shown that the widely-used block fading model is a special case of more accurate continuous fading models in terms of the achievable pilot-based spectral efficiency. Furthermore, it is established that the overhead optimization for multiantenna systems is effectively the same as for single-antenna systems with the normalized Doppler frequency multiplied by the number of transmit antennas.
Abstract:
Colorectal cancer (CRC) is a major cause of cancer mortality. Whereas some patients respond well to therapy, others do not, and thus more precise, individualized treatment strategies are needed. To that end, we analyzed gene expression profiles from 1,290 CRC tumors using consensus-based unsupervised clustering. The resultant clusters were then associated with therapeutic response data to the epidermal growth factor receptor-targeted drug cetuximab in 80 patients. The results of these studies define six clinically relevant CRC subtypes. Each subtype shares similarities to distinct cell types within the normal colon crypt and shows differing degrees of 'stemness' and Wnt signaling. Subtype-specific gene signatures are proposed to identify these subtypes. Three subtypes have markedly better disease-free survival (DFS) after surgical resection, suggesting these patients might be spared from the adverse effects of chemotherapy when they have localized disease. One of these three subtypes, identified by filamin A expression, does not respond to cetuximab but may respond to cMET receptor tyrosine kinase inhibitors in the metastatic setting. Two other subtypes, with poor and intermediate DFS, associate with improved response to the chemotherapy regimen FOLFIRI in adjuvant or metastatic settings. Development of clinically deployable assays for these subtypes and of subtype-specific therapies may contribute to more effective management of this challenging disease.