938 results for: Image processing, Microscopy, Histopathology, Classification, K-means


Relevance: 100.00%

Abstract:

When a document corpus is very large, we often need to reduce the number of features. But conventional Non-negative Matrix Factorization (NMF) cannot be applied to a billion-by-million matrix, as the matrix may not fit in memory. Here we present a novel online NMF algorithm. Using online NMF, we reduce the original high-dimensional space to a low-dimensional one, and then cluster all the documents in the reduced space using the k-means algorithm. We show experimentally that good performance can be achieved by processing small subsets of documents, and that the proposed method outperforms existing algorithms.
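
The pipeline described above — stream mini-batches through an online NMF to learn a shared topic-term matrix, project documents into the reduced space, then run k-means — can be sketched roughly as below. This is a minimal illustration on a toy corpus using generic multiplicative updates, not the paper's exact update rule; all sizes and parameter choices are invented.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy corpus: 200 "documents" x 50 "terms", nonnegative counts.
X = rng.poisson(1.0, size=(200, 50)).astype(float)

n_topics = 5
H = rng.random((n_topics, 50)) + 0.1   # shared topic-term matrix, learned online
eps = 1e-9

# Online NMF sketch: stream mini-batches of documents; for each batch,
# solve for the batch coefficients Wb, then take one multiplicative
# step on the shared H (illustrative rule, not the paper's).
for start in range(0, X.shape[0], 20):
    Xb = X[start:start + 20]
    Wb = rng.random((Xb.shape[0], n_topics)) + 0.1
    for _ in range(30):                 # inner multiplicative updates for Wb
        Wb *= (Xb @ H.T) / (Wb @ H @ H.T + eps)
    H *= (Wb.T @ Xb) / (Wb.T @ Wb @ H + eps)

# Project all documents into the reduced space with H fixed.
W = rng.random((X.shape[0], n_topics)) + 0.1
for _ in range(50):
    W *= (X @ H.T) / (W @ H @ H.T + eps)

# Cluster the low-dimensional representations with plain k-means (Lloyd).
k = 4
centers = W[rng.choice(len(W), k, replace=False)]
for _ in range(20):
    labels = np.argmin(((W[:, None] - centers) ** 2).sum(-1), axis=1)
    for j in range(k):
        if np.any(labels == j):
            centers[j] = W[labels == j].mean(axis=0)

print(W.shape, labels.shape)
```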

Relevance: 100.00%

Abstract:

Data clustering is a common technique for statistical data analysis, used in many fields including machine learning and data mining. Clustering is the grouping of a data set, or more precisely the partitioning of a data set into subsets (clusters), so that the data in each subset ideally share some common trait according to a defined distance measure. In this paper we present a genetically improved version of the particle swarm optimization (PSO) algorithm, a population-based heuristic search technique derived from the analysis of particle swarm intelligence and the concepts of genetic algorithms (GA). The algorithm combines PSO concepts, such as the velocity and position update rules, with GA concepts such as selection, crossover, and mutation. The performance of the proposed algorithm is evaluated on benchmark datasets from the Machine Learning Repository, where it performs better than the k-means and PSO algorithms.
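
The hybrid idea — PSO velocity/position updates interleaved with GA-style selection, crossover, and mutation — can be sketched on a toy objective as follows. The operator details (tournament selection, arithmetic crossover, Gaussian mutation) and all coefficients are illustrative assumptions, not the paper's exact design.

```python
import numpy as np

rng = np.random.default_rng(1)

def sphere(x):                      # toy fitness: minimize the sum of squares
    return (x ** 2).sum(axis=-1)

n, dim = 30, 5
pos = rng.uniform(-5, 5, (n, dim))
vel = np.zeros((n, dim))
pbest, pbest_f = pos.copy(), sphere(pos)
gbest = pbest[pbest_f.argmin()].copy()
f0 = pbest_f.min()                  # initial best fitness, for reference

w, c1, c2 = 0.7, 1.5, 1.5
for _ in range(200):
    # PSO: velocity and position update rules
    r1, r2 = rng.random((n, dim)), rng.random((n, dim))
    vel = w * vel + c1 * r1 * (pbest - pos) + c2 * r2 * (gbest - pos)
    pos = pos + vel

    # GA: tournament selection + arithmetic crossover with a fitter parent
    f = sphere(pos)
    i, j = rng.integers(0, n, n), rng.integers(0, n, n)
    parents = np.where((f[i] < f[j])[:, None], pos[i], pos[j])
    alpha = rng.random((n, 1))
    pos = alpha * pos + (1 - alpha) * parents

    # GA: Gaussian mutation on a small fraction of particles
    mask = rng.random(n) < 0.1
    pos[mask] += rng.normal(0, 0.1, (mask.sum(), dim))

    f = sphere(pos)
    better = f < pbest_f
    pbest[better], pbest_f[better] = pos[better], f[better]
    gbest = pbest[pbest_f.argmin()].copy()

print(f0, sphere(gbest))
```

Because personal bests are only ever replaced by fitter positions, the final global best can never be worse than the initial one, whatever the random seed.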

Relevance: 100.00%

Abstract:

In this paper, we propose a simple and effective approach to classifying H.264 compressed videos by capturing orientation information from the motion vectors. Our main contribution is computing a Histogram of Oriented Motion Vectors (HOMV) for partially overlapping hierarchical space-time cubes. HOMV proves very effective at describing the motion characteristics of these cubes. We then use a Bag of Features (BoF) approach to represent each video as a histogram of HOMV keywords obtained using k-means clustering. The resulting video feature proves very effective for classification, as we demonstrate with experiments on two large publicly available video databases.
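
The core descriptor — binning motion-vector orientations into a magnitude-weighted histogram per space-time cube — can be sketched as below. The bin count, magnitude weighting, and normalization are illustrative guesses rather than the paper's exact definition; the per-cube histograms would then be quantized into a BoF vocabulary with k-means.

```python
import numpy as np

rng = np.random.default_rng(2)

def homv(mvs, n_bins=8):
    """Histogram of Oriented Motion Vectors for one space-time cube.

    mvs: array (N, 2) of (dx, dy) motion vectors from the compressed stream.
    Bins vector orientations into n_bins sectors, weights by magnitude,
    and L1-normalizes (a simplified reading of the descriptor).
    """
    dx, dy = mvs[:, 0], mvs[:, 1]
    ang = np.mod(np.arctan2(dy, dx), 2 * np.pi)        # orientation in [0, 2*pi)
    mag = np.hypot(dx, dy)
    bins = np.minimum((ang / (2 * np.pi) * n_bins).astype(int), n_bins - 1)
    h = np.bincount(bins, weights=mag, minlength=n_bins)
    return h / (h.sum() + 1e-12)

# Toy cube: motion vectors pointing mostly to the right.
mvs = rng.normal([3.0, 0.0], 0.5, size=(100, 2))
h = homv(mvs)
print(h.round(3))
```

For rightward motion, nearly all the histogram mass lands in the two sectors adjacent to orientation zero.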

Relevance: 100.00%

Abstract:

Real-world biological systems such as the human brain are inherently nonlinear and difficult to model, yet most previous studies have employed either linear models or parametric nonlinear models to investigate brain function. In this paper, a novel application of a nonlinear, recurrence-based measure of phase synchronization, the correlation between probabilities of recurrence (CPR), is proposed for studying connectivity in the brain. Being non-parametric, the method makes very few assumptions, making it suitable for investigating brain function in a data-driven way. Its utility is demonstrated on multichannel electroencephalographic (EEG) signals. Brain connectivity obtained from the thresholded CPR matrix of multichannel EEG showed clear differences in the number and pattern of connections between (a) epileptic-seizure and pre-seizure states and (b) eyes-open and eyes-closed states, and the corresponding brain headmaps provide meaningful insights about synchronization in the brain in those states. K-means clustering of connectivity parameters from CPR and from linear correlation, for global epileptic seizure and pre-seizure, showed significantly larger cluster-centroid distances for CPR than for linear correlation, demonstrating the superior ability of CPR to discriminate seizure from pre-seizure. In the case of focal epilepsy, the headmap clearly enables us to identify the epileptic focus, which has diagnostic value. (C) 2013 Elsevier Ltd. All rights reserved.
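
The CPR measure itself is simple to sketch: estimate each signal's probability of recurrence p(τ) (the fraction of points that return to within ε of themselves after a lag of τ), then correlate the two normalized p(τ) curves. The sketch below uses a 1-D embedding and an arbitrary ε of 10% of each signal's spread — simplifying assumptions, not the paper's settings.

```python
import numpy as np

def rec_prob(x, eps, max_lag):
    """Probability of recurrence p(tau) for a scalar series (1-D embedding)."""
    n = len(x)
    p = np.empty(max_lag)
    for tau in range(1, max_lag + 1):
        p[tau - 1] = np.mean(np.abs(x[:n - tau] - x[tau:]) < eps)
    return p

def cpr(x, y, eps_frac=0.1, max_lag=100):
    """Correlation between probabilities of recurrence of two signals."""
    px = rec_prob(x, eps_frac * np.std(x), max_lag)
    py = rec_prob(y, eps_frac * np.std(y), max_lag)
    px = (px - px.mean()) / px.std()
    py = (py - py.mean()) / py.std()
    return np.mean(px * py)

t = np.linspace(0, 20 * np.pi, 2000)
x = np.sin(t)
y = np.sin(t + 0.3)                              # phase-shifted copy: synchronized
z = np.random.default_rng(3).normal(size=2000)   # unrelated noise

print(cpr(x, y), cpr(x, z))
```

Phase-synchronized signals share the same recurrence profile and give CPR near 1, while an unrelated noise signal gives CPR near 0.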

Relevance: 100.00%

Abstract:

Structural information over the entire course of binding interactions, based on analyses of energy landscapes, is described; this provides a framework for understanding the events involved in biomolecular recognition. As an example, the conformational dynamics of malectin's exquisite selectivity for diglucosylated N-glycan (Dig-N-glycan), a highly flexible oligosaccharide comprising numerous dihedral torsion angles, are described. A novel approach based on hierarchical sampling is proposed for acquiring metastable molecular conformations that constitute low-energy minima, in order to understand the structural features involved in this recognition. To this end, four variants of principal component analysis were employed recursively, in both Cartesian space and dihedral-angle space, characterized by free-energy landscapes, to select the most stable conformational substates. Subsequently, the k-means clustering algorithm was applied for geometric separation of the major native state to acquire a final ensemble of metastable conformers. A comparison of malectin complexes was then performed to characterize their conformational properties. Analyses of stereochemical metrics and other concerted binding events revealed surface complementarity, cooperative and bidentate hydrogen bonds, water-mediated hydrogen bonds, and carbohydrate-aromatic interactions, including CH-pi and stacking interactions, involved in this recognition. Additionally, a striking structural transition from loop to beta-strands in the malectin CRD upon specific binding to Dig-N-glycan is observed. The interplay of these binding events in malectin and Dig-N-glycan supports an extended conformational selection model as the underlying binding mechanism.

Relevance: 100.00%

Abstract:

This paper studies a pilot-assisted physical-layer data fusion technique known as Distributed Co-Phasing (DCP). In this two-phase scheme, the sensors first estimate the channel to the fusion center (FC) using pilots sent by the latter, and then simultaneously transmit their common data after pre-rotating it by the estimated channel phase, thereby achieving physical-layer data fusion. First, by analyzing the symmetric mutual information of the system, it is shown that higher-order constellations (HOC) can improve the throughput of DCP compared to the binary signaling considered heretofore. Using an HOC in the DCP setting requires estimating the composite DCP channel at the FC for data decoding. To this end, two blind algorithms are proposed: 1) a power method, and 2) a modified K-means algorithm. The latter is computationally efficient and converges significantly faster than the conventional K-means algorithm. Analytical expressions for the probability of error are derived, and it is found that even at moderate to low SNRs, the modified K-means algorithm achieves a probability of error comparable to that achievable with a perfect channel estimate at the FC, while requiring no pilot symbols to be transmitted from the sensor nodes. The problem of signal corruption due to imperfect DCP is also investigated, and constellation shaping to minimize the probability of signal corruption is proposed and analyzed. The analysis is validated, and the promising performance of DCP for energy-efficient physical-layer data fusion is illustrated, using Monte Carlo simulations.
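
The blind-decoding idea can be illustrated with a toy model: the FC observes amplitude symbols scaled by an unknown composite DCP gain, and a k-means pass on the received amplitudes recovers the decision levels without pilots. This is a generic k-means sketch seeded at the constellation shape, a stand-in for — not a reproduction of — the paper's modified K-means; the 4-ASK levels, gain, and noise level are invented.

```python
import numpy as np

rng = np.random.default_rng(4)

# DCP toy model: many sensors transmit the same 4-ASK symbol after
# pre-rotating by their channel phase; the FC sees the symbol scaled
# by an unknown composite real gain, plus noise.
levels = np.array([1.0, 2.0, 3.0, 4.0])        # assumed 4-ASK amplitudes
gain = 7.3                                      # unknown composite DCP gain
syms = rng.integers(0, 4, 1000)
r = gain * levels[syms] + rng.normal(0, 0.3, 1000)

# Blind 1-D k-means on received amplitudes, initialized at the known
# constellation shape scaled by a crude gain guess.
centers = levels * (r.mean() / levels.mean())
for _ in range(20):
    labels = np.argmin(np.abs(r[:, None] - centers), axis=1)
    for j in range(len(centers)):
        if np.any(labels == j):
            centers[j] = r[labels == j].mean()

ser = np.mean(labels != syms)                   # symbol error rate
print(centers.round(2), ser)
```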

Relevance: 100.00%

Abstract:

Among the many advantages and applications of remote sensing, one of the most important is crop classification, i.e., differentiating between various crop types. Satellite images are a reliable source for investigating temporal changes in crop-cultivated areas. In this letter, we propose a novel bat algorithm (BA)-based clustering approach for solving crop-type classification problems using a multispectral satellite image. The proposed partitional clustering algorithm extracts information in the form of optimal cluster centers from training samples; the extracted cluster centers are then validated on test samples. A real-time multispectral satellite image and one benchmark data set from the University of California, Irvine (UCI) repository are used to demonstrate the robustness of the proposed algorithm. The performance of the BA is compared with that of two other nature-inspired metaheuristics, namely the genetic algorithm and particle swarm optimization, as well as with an existing hybrid approach, the BA combined with K-means. From the results obtained, it can be concluded that the BA can be successfully applied to crop-type classification problems.
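
A minimal bat-algorithm sketch of partitional clustering: each "bat" encodes a full set of cluster centers, fitness is the within-cluster sum of squared errors, and the standard BA frequency/velocity/loudness machinery drives the search. The parameter choices and the simplified loudness and pulse-rate handling are illustrative, not the letter's configuration.

```python
import numpy as np

rng = np.random.default_rng(5)

# Toy "pixels": two well-separated spectral clusters.
data = np.vstack([rng.normal(0, 0.3, (100, 2)), rng.normal(3, 0.3, (100, 2))])
k, dim = 2, 2

def sse(flat):
    """Fitness: within-cluster sum of squared errors for packed centers."""
    c = flat.reshape(k, dim)
    return ((data[:, None] - c) ** 2).sum(-1).min(axis=1).sum()

n_bats = 20
x = rng.uniform(data.min(), data.max(), (n_bats, k * dim))
v = np.zeros_like(x)
fit = np.array([sse(b) for b in x])
best = x[fit.argmin()].copy()
f0 = fit.min()                        # initial best fitness, for reference
A, r = 0.9, 0.3                       # loudness, pulse-emission rate

for _ in range(200):
    freq = rng.random((n_bats, 1))            # frequencies in [0, 1]
    v += (x - best) * freq                    # standard BA velocity update
    cand = x + v
    walk = rng.random(n_bats) > r             # local walk around the best bat
    cand[walk] = best + 0.3 * A * rng.normal(size=(walk.sum(), k * dim))
    cf = np.array([sse(b) for b in cand])
    accept = (cf < fit) & (rng.random(n_bats) < A)   # greedy, loudness-gated
    x[accept], fit[accept] = cand[accept], cf[accept]
    best = x[fit.argmin()].copy()
    A *= 0.99                                 # loudness decays as bats home in

labels = ((data[:, None] - best.reshape(k, dim)) ** 2).sum(-1).argmin(1)
print(sse(best), np.bincount(labels))
```

Since candidates are accepted only when they improve fitness, the best SSE is monotonically non-increasing over iterations.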

Relevance: 100.00%

Abstract:

The objective is to determine, using acoustic measurements, which information is most relevant to the listener when categorizing the overall grade of dysphonia. Eight voices were chosen (4 female and 4 male). Each utterance was evaluated auditory-perceptually, via the G item of the GRBAS scale, by 10 experienced listeners, and acoustically by means of aperiodicity, noise, and chaos measures. A statistical discriminant analysis indicates the importance of GNE, Jit, Jitter_cc, and Lyapunov as predictors of the overall grade of dysphonia. Applying the k-means method shows that the acoustic parameters used contain features that allow the studied voices to be grouped objectively, with 100% accuracy for class 0, 96% for class 2, and 79% for class 3. A larger number and greater variability of cases are needed to verify these preliminary results.
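
As a small illustration of the acoustic side, one common definition of local jitter (assumed here — the abstract's Jit and Jitter_cc may be defined differently) measures cycle-to-cycle pitch-period perturbation; vectors of such parameters (jitter, GNE, Lyapunov exponents, ...) are what the k-means step would group.

```python
import numpy as np

def jitter_percent(periods):
    """Local jitter: mean absolute difference between consecutive pitch
    periods, relative to the mean period (one common definition)."""
    periods = np.asarray(periods, float)
    return 100 * np.mean(np.abs(np.diff(periods))) / periods.mean()

# A steady voice vs. one with stronger cycle-to-cycle perturbation
# (synthetic period tracks around 200 Hz, i.e. 5 ms periods).
rng = np.random.default_rng(6)
steady = 0.005 + rng.normal(0, 1e-6, 200)
rough = 0.005 + rng.normal(0, 1e-4, 200)

j_steady = jitter_percent(steady)
j_rough = jitter_percent(rough)
print(j_steady, j_rough)
```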

Relevance: 100.00%

Abstract:

Recent player-tracking technology provides new information about basketball game performance. The aim of this study was to (i) compare the game performances of all-star and non-all-star basketball players from the National Basketball Association (NBA), and (ii) describe different basketball game-performance profiles based on different game roles. Archival data were obtained from all 2013-2014 regular-season games (n = 1230). The variables analyzed included points per game, minutes played, and the game actions recorded by the player-tracking system. For the first aim, performance per minute of play was analyzed using a descriptive discriminant analysis to identify which variables best predict the all-star and non-all-star playing categories. The all-star players showed slower velocities in defense and performed better in elbow touches, defensive rebounds, close touches, close points, and pull-up points, possibly due to optimized attention processes that are key for perceiving the appropriate environmental information. The second aim was addressed using a k-means cluster analysis to create maximally different performance-profile groupings, after which a descriptive discriminant analysis identified which variables best predict the different playing clusters. The results identified different playing profiles, particularly related to the game roles of scoring, passing, defensive, and all-round play. Coaching staffs may apply this information to different players, while accounting for individual differences and functional variability, to optimize practice planning and, consequently, the game performances of individuals and teams.

Relevance: 100.00%

Abstract:

This dissertation presents results from the application of adaptive filters, using the NLMS (Normalized Least Mean Square) and RLS (Recursive Least Squares) algorithms, to reduce biases in climate forecasts. The discrepancies between the actual state of the atmosphere and that predicted by a numerical model tend to grow over the integration period. The Eta atmospheric model is used operationally for numerical forecasting at CPTEC/INPE and, like other atmospheric models, shows inaccuracies in its climate forecasts. Some research aims to improve the Eta model itself, while other work evaluates its forecasts and identifies model errors so that its products can be used appropriately. Accordingly, this work filters the Eta model output and adjusts it so as to minimize the errors between the Eta results and the NCEP reanalyses, employing digital signal and image processing techniques to reduce the errors of the Eta model's climate forecasts. The adaptive filters in this dissertation adjust the series along the forecast lead time. To train the filters, region-grouping techniques such as the k-means clustering algorithm were used to select climate series with similar behavior. The climate variables studied are the meridional wind and the geopotential height over the region covered by the Eta forecast model at 40 km resolution, at the 250 hPa pressure level. Finally, the results show that a 4-coefficient filter adapted by the RLS algorithm, combined with region selection via the k-means algorithm, performs best at reducing both the mean error and the error spread, for the meridional wind as well as the geopotential height.
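
The NLMS half of the scheme is easy to sketch: a short FIR filter is adapted so that its output tracks a desired signal, with the step size normalized by the energy of the input window. The toy system below (a known 4-tap filter plus noise) stands in for the Eta-versus-reanalysis series; it illustrates NLMS itself, not the dissertation's setup.

```python
import numpy as np

rng = np.random.default_rng(7)

def nlms(x, d, n_taps=4, mu=0.5, eps=1e-8):
    """Normalized LMS: adapt FIR weights w so that w . [x[n], x[n-1], ...]
    tracks the desired signal d[n]; the step is scaled by window energy."""
    w = np.zeros(n_taps)
    err = np.zeros(len(x))
    for n in range(n_taps - 1, len(x)):
        u = x[n - n_taps + 1:n + 1][::-1]       # newest sample first
        err[n] = d[n] - w @ u
        w += mu * err[n] * u / (u @ u + eps)    # normalized update
    return w, err

# Identify an unknown 4-tap system from noisy observations.
h_true = np.array([0.6, -0.3, 0.2, 0.1])
x = rng.normal(size=5000)
d = np.convolve(x, h_true)[:len(x)] + rng.normal(0, 0.01, len(x))
w, err = nlms(x, d)
print(w.round(3), np.mean(err[-500:] ** 2))
```

After adaptation the weights approximate the unknown system and the residual error settles near the observation-noise floor.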

Relevance: 100.00%

Abstract:

We have applied a number of objective statistical techniques to define homogeneous climatic regions for the Pacific Ocean, using COADS (Woodruff et al 1987) monthly sea surface temperature (SST) for 1950-1989 as the key variable. The basic data comprised all global 4°x4° latitude/longitude boxes with enough data to yield reliable long-term means of monthly mean SST. An R-mode principal components analysis of these data, following a technique first used by Stidd (1967), yields information about the harmonics of the annual cycle of SST. We used the spatial coefficients (one per 4-degree box and eigenvector) as input to a K-means cluster analysis to classify the gridbox SST data into 34 global regions, of which 20 cover the Pacific and Indian Oceans. Seasonal time series were then produced for each of these regions, and for comparison the variance spectrum of each regional anomaly time series was calculated. Most of the significant spectral peaks occur near the biennial (2.1-2.2 years) and ENSO (~3-6 years) time scales in the tropical regions, while decadal-scale fluctuations are important in the mid-latitude ocean regions.
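
The reduce-then-cluster idea can be sketched with one simplification: instead of an R-mode PCA of the annual cycle, fit the mean and first annual harmonic to each gridbox by least squares and run k-means on the fitted coefficients. The two synthetic "regimes" and all sizes below are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(8)

months = np.arange(12)
# Design matrix: mean plus the first annual harmonic.
A = np.column_stack([np.ones(12),
                     np.cos(2 * np.pi * months / 12),
                     np.sin(2 * np.pi * months / 12)])

def boxes(amp, phase, n):
    """Synthetic gridbox SST climatologies with a given annual cycle."""
    cyc = 20 + amp * np.cos(2 * np.pi * months / 12 - phase)
    return cyc + rng.normal(0, 0.2, (n, 12))

# Two invented regimes (stand-ins for tropical vs. mid-latitude boxes).
sst = np.vstack([boxes(1.0, 0.0, 40), boxes(6.0, np.pi, 40)])

# Harmonic coefficients per box, standing in for the PCA loadings.
coef = np.linalg.lstsq(A, sst.T, rcond=None)[0].T        # shape (80, 3)

# K-means on the coefficients, seeded with a farthest-point pair.
c0 = coef[0]
c1 = coef[((coef - c0) ** 2).sum(1).argmax()]
centers = np.vstack([c0, c1])
for _ in range(20):
    labels = ((coef[:, None] - centers) ** 2).sum(-1).argmin(1)
    for j in range(2):
        if np.any(labels == j):
            centers[j] = coef[labels == j].mean(axis=0)

print(np.bincount(labels))
```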

Relevance: 100.00%

Abstract:

Most research on technology roadmapping has focused on its practical applications and on methods to enhance its operational process. Despite a demand for well-supported, systematic information, little attention has been paid to how, and which, information can be utilized in technology roadmapping. This paper therefore proposes a methodology for structuring technological information in order to facilitate the process. To this end, eight methods are suggested to provide useful information for technology roadmapping: summary, information extraction, clustering, mapping, navigation, linking, indicators, and comparison. This research identifies the characteristics of significant data that can potentially be used in roadmapping, and presents an approach to extracting important information from such raw data through various data mining techniques, including text mining, multi-dimensional scaling, and K-means clustering. In addition, this paper explains how the approach can be applied at each step of roadmapping. The proposed approach is applied to develop a roadmap for radio-frequency identification (RFID) technology, illustrating the process in practice. © 2013 Taylor & Francis.
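
The clustering step can be sketched on toy data: represent documents as L2-normalized term-frequency vectors and run k-means on them. The keywords below are invented, and the full approach in the paper also involves text mining and multi-dimensional scaling, which this sketch omits.

```python
import numpy as np

# Toy document set spanning two technology areas (hypothetical keywords).
docs = [
    "rfid tag antenna reader", "rfid reader protocol tag",
    "battery cell anode cathode", "battery anode energy cell",
]
vocab = sorted({w for d in docs for w in d.split()})
tf = np.array([[d.split().count(w) for w in vocab] for d in docs], float)
tf /= np.linalg.norm(tf, axis=1, keepdims=True)   # L2-normalize rows

# K-means on the normalized term vectors (Euclidean distance on unit
# vectors is monotone in cosine distance).
k = 2
centers = tf[[0, 2]]                              # seed one doc per topic
for _ in range(10):
    labels = ((tf[:, None] - centers) ** 2).sum(-1).argmin(1)
    for j in range(k):
        if np.any(labels == j):
            centers[j] = tf[labels == j].mean(axis=0)

print(labels)
```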

Relevance: 100.00%

Abstract:

The information processing of language and script is an important achievement of the modernization of human society, and the level of information processing technology has become an important marker of the development of a country or a people. Since the reform and opening up, alongside Chinese-language information processing, information processing for China's minority languages has also made great progress. From the 1980s and 1990s onward, Tibetan-language information processing achieved breakthroughs, yet online handwriting recognition for Tibetan is still in its infancy. Building on results in Chinese character recognition and existing Tibetan online recognition work, this thesis improves the preprocessing, feature extraction, and clustering stages. In preprocessing, mathematical morphology operations from image processing (dilation, erosion, and thinning) are used for denoising. In feature extraction, after analyzing the deficiencies of the raw feature values in subsequent computation, the raw features are transformed nonlinearly, and grid weights are added during grid-direction feature extraction, strengthening the discriminative power of the feature vectors. Clustering uses the k-means method; after comparing various distance measures, a modified Euclidean distance is proposed, and hard clustering is replaced with fuzzy clustering, improving the robustness of the algorithm. To improve the system's ability to distinguish similar characters, a two-level classifier inspired by signature-verification methods is implemented, enhancing the discrimination of fine differences. Experimental results show that the proposed improvements raise the recognition rate to some extent, indicating that the method is feasible, effective, and suitable for Tibetan online handwriting recognition.
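
The clustering change described — replacing hard k-means assignments with fuzzy memberships — can be illustrated with standard fuzzy c-means (Bezdek's formulation). This is a generic sketch on invented 2-D "feature vectors"; the thesis's modified Euclidean distance is not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(9)

def fuzzy_cmeans(X, c, m=2.0, n_iter=50, eps=1e-9):
    """Standard fuzzy c-means: soft membership matrix U instead of the
    hard assignments of k-means; m is the fuzzifier exponent."""
    n = len(X)
    U = rng.random((n, c))
    U /= U.sum(axis=1, keepdims=True)
    for _ in range(n_iter):
        Um = U ** m
        centers = (Um.T @ X) / (Um.sum(axis=0)[:, None] + eps)
        d2 = ((X[:, None] - centers) ** 2).sum(-1) + eps
        inv = d2 ** (-1.0 / (m - 1))            # membership update rule
        U = inv / inv.sum(axis=1, keepdims=True)
    return centers, U

# Toy "feature vectors" from two glyph classes.
X = np.vstack([rng.normal(0, 0.4, (60, 2)), rng.normal(4, 0.4, (60, 2))])
centers, U = fuzzy_cmeans(X, c=2)
labels = U.argmax(axis=1)
print(centers.round(1), np.bincount(labels))
```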

Relevance: 100.00%

Abstract:

Tracking deformable hand gestures is an important topic in vision-based human-computer interaction research. For the single-camera setting, a novel real-time method for tracking deformable gestures is proposed, which uses a set of 2D hand models in place of a high-dimensional 3D hand model. First, static gestures are recognized with a Bayesian classifier, and fingers and fingertips are located in the image; by matching image features against the recognition result, the tracking process is initialized automatically. Combining the K-means clustering algorithm with particle filtering is proposed to handle the mutual interference between fingers in multi-finger tracking. Tracking-state detection during tracking enables automatic recovery of tracking and updating of the hand model. Experimental results show that the method tracks deformable gestures quickly, accurately, and continuously, meeting the requirements of real-time vision-based human-computer interaction.

Relevance: 100.00%

Abstract:

Banks are data-intensive enterprises that generate massive volumes of business data every day, containing much useful information, and how to exploit these data has become an important factor in bank competitiveness. As banks transform their business philosophy and information technology develops rapidly, banking informatization has entered a new period of development, and emerging technologies such as data warehousing have become indispensable tools for bank data analysis and decision-making. At present, however, many data warehouse applications are not practical enough and their results fall short of expectations; studying the mechanics of data warehouse use in close connection with the business, and exploiting its mining and analysis functions, has therefore become an urgent task for banks. This thesis first analyzes banks' actual business requirements and describes in detail the current state of data warehouse technology in the banking industry. Based on an in-depth study of data warehousing and data mining, and on the specific characteristics of banking business, a distributed data warehouse system for a commercial bank is designed and implemented, with a detailed description of its construction principles and implementation steps. The ETL system, a key part of the data warehouse, is redesigned and implemented using a component-based approach, and a thread-pool buffer mechanism is added to the ETL server to optimize system performance. On top of the data warehouse, online analytical processing (OLAP) and an improved K-means clustering algorithm, X-means, are used to implement the customer-management analysis functions of the bank's data warehouse system, with good practical results, providing a useful reference for developing and applying bank data warehouse systems.
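
The model-selection idea behind X-means can be sketched as a set of k-means runs scored by a BIC under an identical-spherical-Gaussian mixture model, picking the k that maximizes the score. The real X-means splits centroids recursively rather than sweeping k, and the "customer" data and sizes here are invented.

```python
import numpy as np

rng = np.random.default_rng(10)

def kmeans(X, k, n_iter=30, n_restarts=5):
    """Plain k-means with random restarts; returns the lowest-SSE solution."""
    best = None
    for _ in range(n_restarts):
        centers = X[rng.choice(len(X), k, replace=False)]
        for _ in range(n_iter):
            labels = ((X[:, None] - centers) ** 2).sum(-1).argmin(1)
            for j in range(k):
                if np.any(labels == j):
                    centers[j] = X[labels == j].mean(axis=0)
        rss = ((X - centers[labels]) ** 2).sum()
        if best is None or rss < best[0]:
            best = (rss, centers, labels)
    return best[1], best[2]

def bic(X, centers, labels):
    """BIC of an identical-spherical-Gaussian mixture fit (X-means style)."""
    n, d = X.shape
    k = len(centers)
    counts = np.bincount(labels, minlength=k)
    rss = ((X - centers[labels]) ** 2).sum()
    var = rss / (d * max(n - k, 1))
    nz = counts[counts > 0]
    loglik = ((nz * np.log(nz / n)).sum()          # mixing proportions
              - 0.5 * n * d * np.log(2 * np.pi * var) - 0.5 * rss / var)
    n_params = k * d + k            # center coordinates + mixing weights
    return loglik - 0.5 * n_params * np.log(n)

# Toy "customer segments": three well-separated groups.
X = np.vstack([rng.normal(c, 0.3, (80, 2)) for c in ([0, 0], [4, 0], [0, 4])])
scores = {k: bic(X, *kmeans(X, k)) for k in range(1, 7)}
best_k = max(scores, key=scores.get)
print(best_k)
```

Fewer clusters underfit the data and more clusters pay the extra parameter and mixing-weight cost, so the BIC peaks at the true segment count.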