819 results for Hybrid clustering algorithm
Abstract:
This paper presents a new hierarchical clustering algorithm for crop stage classification using hyperspectral satellite images. Among the many uses of remote sensing, one important application is crop stage classification. Modern commercial imaging satellites produce large volumes of imagery, which creates greater opportunities for automated image analysis. Hence, we propose an unsupervised algorithm, the Hierarchical Artificial Immune System (HAIS), which consists of two steps: splitting the cluster centers and merging them. The high dimensionality of the data is reduced with Principal Component Analysis (PCA). The classification results are compared with those of the K-means and Artificial Immune System algorithms. From the results obtained, we conclude that the proposed hierarchical clustering algorithm is accurate.
Abstract:
In this paper, we present a methodology for identifying the best features from a large feature space. In a high-dimensional feature space, nearest neighbor search loses meaning and suffers from both quality and performance problems. Because many data mining algorithms rely on nearest neighbor search, we select relevant features instead of searching over all of them. We propose feature selection using Non-negative Matrix Factorization (NMF) and apply it to nearest neighbor search. A recent clustering algorithm based on Locally Consistent Concept Factorization (LCCF) achieves better document clustering quality by using the local geometrical and discriminating structure of the data. Using our feature selection method, we show a further improvement in clustering performance.
Abstract:
This paper develops a GIS (geographical information system)-based data mining approach for optimally selecting locations and determining installed capacities for distributed biomass power generation systems in the context of decentralized energy planning for rural regions. The optimal locations within a cluster of villages are obtained by matching the installed capacity needed with the demand for power while minimizing the cost of transporting biomass from dispersed sources to the power generation system and the cost of distributing electricity from the power generation system to demand centers or villages. The methodology was validated by using it to develop an optimal plan for implementing distributed biomass-based power systems to meet the rural electricity needs of Tumkur district in India, which consists of 2700 villages. The approach uses a k-medoid clustering algorithm to divide the total region into clusters of villages and locate biomass power generation systems at the medoids. The optimal value of k is determined iteratively by running the algorithm over the entire search space for different values of k, subject to demand-supply matching constraints. The value of k is chosen such that it minimizes the total cost of system installation, biomass transportation, and transmission and distribution. A smaller region, consisting of 293 villages, was selected to study the sensitivity of the results to varying demand and supply parameters. The results of clustering are represented on a GIS map for the region.
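The selection of k described above can be sketched in a few lines of Python. This is a minimal illustration, not the paper's implementation: the cost model (a fixed `plant_cost` per medoid plus `transport_rate` times the straight-line distance from each village to its medoid), the alternating k-medoids update, the deterministic start from the first k points, and the toy coordinates are all assumptions, and the demand-supply matching constraints are left out. Villages are assumed to have distinct coordinates.

```python
import math

def assign(points, medoids):
    """Map each medoid index to the indices of the points nearest to it."""
    clusters = {m: [] for m in medoids}
    for i, p in enumerate(points):
        nearest = min(medoids, key=lambda m: math.dist(p, points[m]))
        clusters[nearest].append(i)
    return clusters

def k_medoids(points, k, max_iter=100):
    """Alternating k-medoids: assign points, then re-pick each cluster's medoid."""
    medoids = list(range(k))  # assumption: start from the first k points
    for _ in range(max_iter):
        clusters = assign(points, medoids)
        new_medoids = []
        for m, members in clusters.items():
            if not members:  # keep the old medoid if its cluster emptied
                new_medoids.append(m)
                continue
            # The new medoid is the member with the smallest total distance to the rest.
            new_medoids.append(min(
                members,
                key=lambda c: sum(math.dist(points[c], points[j]) for j in members)))
        if set(new_medoids) == set(medoids):
            break
        medoids = new_medoids
    return medoids, assign(points, medoids)

def total_cost(points, clusters, plant_cost, transport_rate):
    """Hypothetical cost: installation per plant plus distance-proportional transport."""
    transport = sum(math.dist(points[m], points[j])
                    for m, members in clusters.items() for j in members)
    return plant_cost * len(clusters) + transport_rate * transport

def choose_k(points, k_max, plant_cost, transport_rate):
    """Try k = 1..k_max and keep the value with the lowest total cost."""
    best = None
    for k in range(1, k_max + 1):
        medoids, clusters = k_medoids(points, k)
        cost = total_cost(points, clusters, plant_cost, transport_rate)
        if best is None or cost < best[2]:
            best = (k, medoids, cost)
    return best

# Toy data: two well-separated groups of three villages each.
villages = [(0, 0), (1, 0), (0, 1), (10, 10), (11, 10), (10, 11)]
best_k, best_medoids, best_cost = choose_k(villages, k_max=4,
                                           plant_cost=5.0, transport_rate=1.0)
print(best_k, sorted(best_medoids), best_cost)
```

On this toy layout the trade-off is visible directly: one plant makes transport expensive, while more than two plants add installation cost without saving much distance.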
Abstract:
The large number of spectral bands in hyperspectral images increases the capability to distinguish between various physical structures. However, these images suffer from the high dimensionality of the data. Hence, the processing of hyperspectral images is carried out in two stages: dimensionality reduction and unsupervised classification. The high dimensionality of the data is reduced with Principal Component Analysis (PCA). The selected dimensions are classified using the Niche Hierarchical Artificial Immune System (NHAIS). NHAIS combines a splitting method, which searches for the optimal cluster centers using a niching procedure, with a merging method, which groups the data points based on majority voting. Results are presented for two hyperspectral images, the EO-1 Hyperion image and the Indian Pines image. A performance comparison of the proposed hierarchical clustering algorithm with three earlier unsupervised algorithms is presented. From the results obtained, we deduce that NHAIS is efficient.
Abstract:
In this paper, we present a novel algorithm for piecewise linear regression which can learn continuous as well as discontinuous piecewise linear functions. The main idea is to repeatedly partition the data and learn a linear model in each partition. The proposed algorithm is similar in spirit to the k-means clustering algorithm. We show that our algorithm can also be viewed as a special case of an EM algorithm for maximum likelihood estimation under a reasonable probability model. We empirically demonstrate the effectiveness of our approach by comparing its performance with that of state-of-the-art algorithms on various datasets. (C) 2014 Elsevier Inc. All rights reserved.
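The partition-and-refit idea can be illustrated with a short sketch for one-dimensional inputs. It is written in the spirit of the abstract and is not the authors' algorithm: the contiguous initial split, the squared-residual assignment rule, and the toy data are assumptions made for illustration.

```python
def fit_line(pts):
    """Ordinary least-squares line y = a*x + b through a list of (x, y) pairs."""
    n = len(pts)
    mx = sum(x for x, _ in pts) / n
    my = sum(y for _, y in pts) / n
    sxx = sum((x - mx) ** 2 for x, _ in pts)
    sxy = sum((x - mx) * (y - my) for x, y in pts)
    a = sxy / sxx if sxx > 0 else 0.0  # flat line if all x values coincide
    return a, my - a * mx

def piecewise_fit(pts, k, max_iter=50):
    """Alternate between refitting k lines and reassigning points, as in k-means."""
    pts = sorted(pts)
    n = len(pts)
    # Assumption: start from k contiguous chunks along the x axis.
    labels = [min(i * k // n, k - 1) for i in range(n)]
    models = [(0.0, 0.0)] * k
    for _ in range(max_iter):
        # Update step: refit each line on the points currently assigned to it.
        for j in range(k):
            members = [p for p, lab in zip(pts, labels) if lab == j]
            if members:
                models[j] = fit_line(members)
        # Assignment step: each point goes to the line with the smallest residual.
        new_labels = [
            min(range(k), key=lambda j: (y - (models[j][0] * x + models[j][1])) ** 2)
            for x, y in pts
        ]
        if new_labels == labels:
            break
        labels = new_labels
    return models, labels

# Toy data: y = x up to x = 6, then y = 18 - 2x.
data = [(x, float(x)) for x in range(7)] + [(x, 18.0 - 2 * x) for x in range(7, 10)]
models, labels = piecewise_fit(data, k=2)
print(models, labels)
```

The initial split places the breakpoint at x = 5, which is wrong for this data; the assignment step moves the boundary until both lines fit their own points.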
Abstract:
Structural information over the entire course of binding interactions, based on analyses of energy landscapes, is described; it provides a framework for understanding the events involved in biomolecular recognition. The conformational dynamics underlying malectin's exquisite selectivity for diglucosylated N-glycan (Dig-N-glycan), a highly flexible oligosaccharide comprising numerous dihedral torsion angles, are described as an example. For this purpose, a novel approach based on hierarchical sampling is proposed for acquiring metastable molecular conformations that constitute low-energy minima, in order to understand the structural features involved in biological recognition. Four variants of principal component analysis were employed recursively in both Cartesian space and dihedral angle space, characterized by free energy landscapes, to select the most stable conformational substates. Subsequently, the k-means clustering algorithm was used for geometric separation of the major native state to acquire a final ensemble of metastable conformers. A comparison of malectin complexes was then performed to characterize their conformational properties. Analyses of stereochemical metrics and other concerted binding events revealed surface complementarity, cooperative and bidentate hydrogen bonds, water-mediated hydrogen bonds, and carbohydrate-aromatic interactions, including CH-pi and stacking interactions, involved in this recognition. Additionally, a striking structural transition from loop to beta-strands in the malectin CRD upon specific binding to Dig-N-glycan is observed. The interplay of the above-mentioned binding events in malectin and Dig-N-glycan supports an extended conformational selection model as the underlying binding mechanism.
Abstract:
Cells in the lateral intraparietal cortex (LIP) of rhesus macaques respond vigorously and in spatially-tuned fashion to briefly memorized visual stimuli. Responses to stimulus presentation, memory maintenance, and task completion are seen, in varying combination from neuron to neuron. To help elucidate this functional segmentation a new system for simultaneous recording from multiple neighboring neurons was developed. The two parts of this dissertation discuss the technical achievements and scientific discoveries, respectively.
Technology. Simultaneous recordings from multiple neighboring neurons were made with four-wire bundle electrodes, or tetrodes, which were adapted to the awake behaving primate preparation. Signals from these electrodes were partitionable into a background process with a 1/f-like spectrum and foreground spiking activity spanning 300-6000 Hz. Continuous voltage recordings were sorted into spike trains using a state-of-the-art clustering algorithm, producing a mean of 3 cells per site. The algorithm classified 96% of spikes correctly when tetrode recordings were confirmed with simultaneous intracellular signals. Recording locations were verified with a new technique that creates electrolytic lesions visible in magnetic resonance imaging, eliminating the need for histological processing. In anticipation of future multi-tetrode work, the chronic chamber microdrive, a device for long-term tetrode delivery, was developed.
Science. Simultaneously recorded neighboring LIP neurons were found to have similar preferred targets in the memory saccade paradigm, but dissimilar peristimulus time histograms (PSTHs). A majority of neighboring cell pairs had a difference in preferred directions of under 45°, while the trial time of maximal response showed a broader distribution, suggesting homogeneity of tuning with heterogeneity of function. A continuum of response characteristics was present, rather than a set of specific response types; however, a mapping experiment suggests this may be because a given cell's PSTH changes shape as well as amplitude through the response field. Spike train autocovariance was tuned over target and changed through trial epoch, suggesting different mechanisms during memory versus background periods. Mean frequency-domain spike-to-spike coherence was concentrated below 50 Hz with a significant maximum of 0.08; mean time-domain coherence had a narrow peak in the range ±10 ms with a significant maximum of 0.03. Time-domain coherence was found to be untuned for short lags (10 ms), but significantly tuned at larger lags (50 ms).
Abstract:
Wide field-of-view (FOV) microscopy is of high importance to biological research and clinical diagnosis where a high-throughput screening of samples is needed. This thesis presents the development of several novel wide FOV imaging technologies and demonstrates their capabilities in longitudinal imaging of living organisms, on the scale of viral plaques to live cells and tissues.
The ePetri Dish is a wide FOV on-chip bright-field microscope. Here we applied an ePetri platform for plaque analysis of murine norovirus 1 (MNV-1). The ePetri offers the ability to dynamically track plaques at the individual cell death event level over a wide FOV of 6 mm × 4 mm at 30 min intervals. A density-based clustering algorithm is used to analyze the spatial-temporal distribution of cell death events to identify plaques at their earliest stages. We also demonstrate the capabilities of the ePetri in viral titer count and dynamically monitoring plaque formation, growth, and the influence of antiviral drugs.
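The abstract does not say which density-based algorithm is used. As a generic illustration, a DBSCAN-style grouping of cell death events in (x, y, t) space might look like the sketch below. The event coordinates, the choice to treat time as a third distance axis without rescaling, and the `eps` and `min_pts` values are hypothetical.

```python
import math

def dbscan(points, eps, min_pts):
    """Return one label per point: a cluster id (0, 1, ...) or -1 for noise."""
    labels = [None] * len(points)

    def neighbors(i):
        # All points within eps of point i, including i itself.
        return [j for j in range(len(points))
                if math.dist(points[i], points[j]) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = -1  # provisionally noise; may become a border point later
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # border point reached from a core point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            reachable = neighbors(j)
            if len(reachable) >= min_pts:  # j is a core point, so keep expanding
                queue.extend(reachable)
    return labels

# Hypothetical cell death events as (x_mm, y_mm, t_hours).
events = [(0.0, 0.0, 0.0), (0.1, 0.0, 0.5), (0.0, 0.1, 1.0),
          (5.0, 5.0, 0.0), (5.1, 5.0, 0.5), (5.0, 5.1, 1.0),
          (2.5, 2.5, 10.0)]
event_labels = dbscan(events, eps=1.5, min_pts=3)
print(event_labels)
```

Events that are close in both space and time end up in the same cluster, which is one way a growing plaque could be separated from isolated cell deaths.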
We developed another wide FOV imaging technique, the Talbot microscope, for the fluorescence imaging of live cells. The Talbot microscope takes advantage of the Talbot effect and can generate a focal spot array to scan the fluorescence samples directly on-chip. It has a resolution of 1.2 μm and a FOV of ~13 mm². We further upgraded the Talbot microscope for the long-term time-lapse fluorescence imaging of live cell cultures, and analyzed the cells’ dynamic response to an anticancer drug.
We present two wide FOV endoscopes for tissue imaging, named the AnCam and the PanCam. The AnCam is based on the contact image sensor (CIS) technology, and can scan the whole anal canal within 10 seconds with a resolution of 89 μm, a maximum FOV of 100 mm × 120 mm, and a depth-of-field (DOF) of 0.65 mm. We also demonstrate the performance of the AnCam in whole anal canal imaging in both animal models and real patients. The PanCam is based on a smartphone platform integrated with a panoramic annular lens (PAL), and can capture a FOV of 18 mm × 120 mm in a single shot with a resolution of 100-140 μm. In this work we demonstrate the PanCam’s performance in imaging a stained tissue sample.
Abstract:
This work proposes a new family of methods for optimizing multimodal problems. In these techniques, initial solutions are first generated to explore the search space. Next, in order to find more than one optimum, these solutions are grouped into subspaces using a fuzzy clustering algorithm. Finally, local searches are carried out with deterministic optimization methods within each subspace generated in the previous phase in order to find the local optimum. The family consists of six variants, combining three schemes for initializing the solutions in the first phase with two local search algorithms in the third. To evaluate this new family of methods, its members are compared with other methodologies on problems from the literature, and the results achieved are promising.
Abstract:
A new method of finding the optimal group membership and number of groupings to partition population genetic distance data is presented. The software program Partitioning Optimization with Restricted Growth Strings (PORGS) visits all possible set partitions and deems acceptable partitions to be those that reduce mean intracluster distance. The optimal number of groups is determined with the gap statistic, which compares PORGS results with a reference distribution. The PORGS method was validated with a simulated data set with a known distribution. For efficiency, where values of n were larger, restricted growth strings (RGS) were used to bipartition populations during a nested search (bi-PORGS). Bi-PORGS was applied to a set of genetic data from 18 Chinook salmon (Oncorhynchus tshawytscha) populations from the west coast of Vancouver Island. The optimal grouping of these populations corresponded to four geographic locations: 1) Quatsino Sound, 2) Nootka Sound, 3) Clayoquot + Barkley sounds, and 4) southwest Vancouver Island. However, assignment of populations to groups did not strictly reflect the geographical divisions; fish of Barkley Sound origin that had strayed into the Gold River and close genetic similarity between transferred and donor populations meant groupings crossed geographic boundaries. Overall, stock structure determined by this partitioning method was similar to that determined by the unweighted pair-group method with arithmetic averages (UPGMA), an agglomerative clustering algorithm.
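A restricted growth string encodes a set partition as a label sequence in which the first label is 0 and each later label exceeds the largest earlier label by at most 1. The sketch below enumerates all such strings and, for a fixed number of groups, keeps the partition with the smallest mean intracluster distance. It is not the PORGS program: the function names and the 4-population distance matrix are made up for illustration, and the gap statistic step is omitted.

```python
def restricted_growth_strings(n):
    """Yield every restricted growth string of length n (one per set partition)."""
    a = [0] * n

    def rec(i, max_label):
        if i == n:
            yield tuple(a)
            return
        # Position i may reuse any existing label or open exactly one new group.
        for label in range(max_label + 2):
            a[i] = label
            yield from rec(i + 1, max(max_label, label))

    if n > 0:
        yield from rec(1, 0)  # a[0] is always 0

def mean_intracluster_distance(rgs, dist):
    """Average distance over all pairs of items that share a group label."""
    pairs = [dist[i][j]
             for i in range(len(rgs)) for j in range(i + 1, len(rgs))
             if rgs[i] == rgs[j]]
    return sum(pairs) / len(pairs) if pairs else 0.0

def best_partition(dist, k):
    """Exhaustively pick the k-group partition with the smallest mean distance."""
    candidates = [r for r in restricted_growth_strings(len(dist))
                  if max(r) + 1 == k]
    return min(candidates, key=lambda r: mean_intracluster_distance(r, dist))

# Hypothetical genetic distances: populations 0,1 are close, as are 2,3.
D = [[0, 1, 5, 5],
     [1, 0, 5, 5],
     [5, 5, 0, 1],
     [5, 5, 1, 0]]
n_partitions = sum(1 for _ in restricted_growth_strings(4))
best = best_partition(D, k=2)
print(n_partitions, best)
```

The number of partitions grows as the Bell numbers (15 for n = 4), which is why exhaustive enumeration is only practical for small n and a nested bipartition search is attractive for larger inputs.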
Abstract:
Optimization methods that use first- and/or second-order optimality conditions are known to be efficient. These iterative methods are commonly developed and analyzed in the light of the mathematical analysis of n-dimensional Euclidean space, which is local in nature. Consequently, they lead to iterative algorithms that perform only local searches. Thus, applying such algorithms to compute global minimizers of a nonlinear function, especially a nonconvex and multimodal one, depends strongly on the location of the starting points. The Topographical Global Optimization method is a clustering algorithm that uses an approach based on elementary concepts of graph theory to generate good starting points for local search methods from points distributed uniformly in the interior of the feasible region. This work has two objectives. The first is to take a new approach to the Topographical Global Optimization method in which, for the first time, its foundations are formally described and its basic properties are mathematically proven. In this context, we propose a semi-empirical formula for computing the key parameter of this clustering algorithm and, using a robust and efficient interior-point feasible directions method, extend the Topographical Global Optimization method to problems with inequality constraints. The second objective is to apply this method to phase stability analysis in thermodynamic mixtures, which consists of determining whether a given mixture exists in one or more phases. Solving this global optimization problem is necessary for phase equilibrium calculations, a problem of great importance in engineering processes such as separation by distillation, extraction processes, and the simulation of tertiary oil recovery, among others.
In addition, to obtain an initial assessment of the potential of this technique, we first solve 70 test problems and then compare the performance of the method proposed here with the MIDACO solver, a powerful software package recently introduced in the field of global optimization.
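The starting-point selection step of topographical global optimization is commonly described as follows: build a k-nearest-neighbor graph over the sample points and keep each point whose objective value is lower than that of all its k nearest neighbors. The sketch below assumes that description. It uses a regular grid in place of uniform random sampling so that the result is deterministic, the test function and the value of k are illustrative choices, and the local search started from each selected point is omitted.

```python
import math

def topographic_minima(points, f, k):
    """Indices of points whose f value beats all of their k nearest neighbors."""
    values = [f(p) for p in points]
    minima = []
    for i, p in enumerate(points):
        # Nearest neighbors of point i, by Euclidean distance.
        others = sorted((j for j in range(len(points)) if j != i),
                        key=lambda j: math.dist(p, points[j]))
        if all(values[i] < values[j] for j in others[:k]):
            minima.append(i)
    return minima

def double_well(p):
    """Illustrative multimodal objective with minima at x = -1 and x = 1."""
    x = p[0]
    return (x * x - 1.0) ** 2

# Assumption: a 21-point grid on [-2, 2] stands in for uniform sampling.
samples = [(-2.0 + i / 5,) for i in range(21)]
starts = [samples[i] for i in topographic_minima(samples, double_well, k=2)]
print(starts)
```

Each returned point would then seed one local search, so the number of local searches follows the number of basins detected instead of the number of samples.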
Abstract:
This dissertation presents results from applying adaptive filters, using the NLMS (Normalized Least Mean Square) and RLS (Recursive Least Square) algorithms, to reduce deviations in climate forecasts. The discrepancies between the real state of the atmosphere and the state predicted by a numerical model tend to grow over the integration period. The Eta atmospheric model is used operationally for numerical forecasting at CPTEC/INPE and, like other atmospheric models, produces imprecise climate forecasts. Some research aims to improve the Eta atmospheric model, while other work evaluates the forecasts and identifies the model's errors so that its products can be used appropriately. Accordingly, this work filters the data from the Eta model and adjusts them so as to minimize the errors between the results provided by the Eta model and the NCEP reanalyses. To this end, we employ digital signal and image processing techniques to reduce the errors of the Eta model's climate forecasts. The adaptive filters in this dissertation adjust the series over the forecast period. To train the filters, region clustering techniques such as the k-means clustering algorithm were used to select climate series that behave similarly to one another. The climate variables studied are the meridional wind and the geopotential height in the region covered by the Eta atmospheric forecast model, with a resolution of 40 km, at a pressure level of 250 hPa. Finally, the results show that the filter with 4 coefficients, adapted by the RLS algorithm together with the region selection criterion based on the k-means algorithm, performs best in reducing the mean error and the error dispersion, for both the meridional wind and the geopotential height.
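To show how an adaptive filter can pull a forecast series toward a reference series, here is a minimal NLMS sketch. It is not the dissertation's code: the synthetic `forecast` and `reference` series, the 2-tap setup, the step size `mu`, and the regularizer `eps` are assumptions. In the dissertation's setting, the input would be an Eta model series and the target the corresponding NCEP reanalysis.

```python
import random

def nlms(x, d, taps, mu=0.5, eps=1e-8):
    """Normalized LMS: adapt weights w so that w . x[n, n-1, ...] tracks d[n]."""
    w = [0.0] * taps
    errors = []
    for n in range(taps - 1, len(x)):
        window = [x[n - i] for i in range(taps)]  # most recent sample first
        y = sum(wi * xi for wi, xi in zip(w, window))
        e = d[n] - y
        # Normalizing by the window energy makes the step size scale-independent.
        norm = eps + sum(xi * xi for xi in window)
        w = [wi + mu * e * xi / norm for wi, xi in zip(w, window)]
        errors.append(e)
    return w, errors

# Synthetic, noise-free example: the reference is a fixed 2-tap combination
# of the forecast, so the filter has an exact solution to converge to.
rng = random.Random(0)
forecast = [rng.uniform(-1, 1) for _ in range(500)]
reference = [0.0] + [0.5 * forecast[n] + 0.25 * forecast[n - 1]
                     for n in range(1, 500)]
weights, errors = nlms(forecast, reference, taps=2)
print(weights, abs(errors[-1]))
```

RLS, which the abstract reports as the better performer, replaces the scalar normalization with a recursively updated inverse correlation matrix; it typically converges faster at a higher cost per step.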
Abstract:
Optimization methods that adopt first- and/or second-order optimality conditions are efficient, and these iterative methods are normally developed and analyzed through the mathematical analysis of n-dimensional Euclidean space, which is local in character. These methods lead to iterative algorithms that are used to compute global minimizers of a nonlinear function, mainly nonconvex and multimodal ones, depending on the position of the starting points. The Topographical Global Optimization method is a clustering algorithm founded on elementary concepts of graph theory, whose purpose is to generate good starting points for local search methods based on points distributed uniformly in the interior of the feasible region. The objective of this work is to apply the Topographical Global Optimization method, together with a robust and effective interior-point feasible directions method, to optimization problems with linear and/or nonlinear equality and/or inequality constraints that form feasible sets with nonempty interiors. For each of these problems, a hyper-rectangle enclosing the feasible set, in which the sample points are generated, is also presented.
Abstract:
There is an increasing demand for optimising complete systems and the devices within them, including capturing the interactions between the various multi-disciplinary (MD) components involved. Furthermore, confidence in robust solutions is essential. As a consequence, the computational cost increases rapidly, and in many cases performing such conceptual designs becomes infeasible. A coherent design methodology is proposed that aims to improve the design process by effectively exploiting the potential of computational synthesis, search and optimisation, and conventional simulation, while reducing the computational cost. The optimisation framework consists of a hybrid optimisation algorithm that handles multi-fidelity simulations. In addition, to handle uncertainty without recasting the model and at affordable computational cost, a stochastic modelling method known as non-intrusive polynomial chaos is introduced. The effectiveness of the design methodology is demonstrated with the optimisation of a submarine propulsion system.
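Abstract:
Based on the characteristics of gene expression data, a high-accuracy density-based clustering algorithm, DENGENE, is proposed. By defining a consistency test and introducing peak points to improve the search direction, DENGENE handles gene expression data better. To evaluate the algorithm's performance, two widely used test data sets, both brewer's yeast (Saccharomyces cerevisiae) gene expression data sets, were selected for testing. Experimental results show that, compared with five model-based algorithms, the CAST algorithm, K-means clustering, and others, DENGENE achieves significant improvements in noise filtering and clustering accuracy.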