22 results for Data Mining, Clustering, PSA, Pavement Deflection
Abstract:
Expressed sequence tags (ESTs) are a source for microsatellite development. In the present study, EST-derived microsatellites (EST-SSRs) were generated and characterized in the common carp (Cyprinus carpio) by data mining of updated public EST databases and subsequent testing for polymorphism. About 5.5% (555) of 10,088 ESTs contain repeat motifs of various types and lengths, with CA being the most abundant dinucleotide motif. Of the 60 EST-SSRs for which PCR primers were designed, 25 loci were polymorphic in a common carp population, with 3 to 17 alleles per locus (mean 7). The observed (HO) and expected (HE) heterozygosities of these EST-SSRs ranged from 0.13 to 1.00 and from 0.12 to 0.91, respectively. Six EST-SSR loci deviated significantly from Hardy-Weinberg equilibrium (HWE) expectations, and the remaining 19 loci were in HWE. Of the 60 primer sets, the proportion of polymorphic EST-SSRs was 42% in common carp, 17% in crucian carp (Carassius auratus), and 5% in silver carp (Hypophthalmichthys molitrix). These new EST-SSR markers should provide sufficient polymorphism for population genetic studies and genome mapping of the common carp and closely related fishes. (c) 2007 Published by Elsevier B.V.
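Two steps in this study can be illustrated in Python: scanning EST sequences for tandem repeat motifs, and computing the two heterozygosity statistics. This is a minimal sketch under stated assumptions. The minimum repeat counts in `MIN_REPEATS` are hypothetical, because the abstract does not give the actual search criteria. Only perfect 2-4 bp repeats are detected. HE uses the simple estimator 1 - sum(p_i^2) rather than a sample-size-corrected estimator.

```python
import re
from collections import Counter

# Hypothetical search criteria (motif length in bp -> minimum tandem copies).
# The abstract does not state the thresholds actually used.
MIN_REPEATS = {2: 6, 3: 5, 4: 4}


def _is_compound(motif):
    """True if motif is a repeat of a shorter unit, e.g. 'AA' or 'CACA'."""
    k = len(motif)
    return any(k % d == 0 and motif == motif[:d] * (k // d) for d in range(1, k))


def find_ssrs(seq):
    """Return (motif, copies, start) for perfect tandem repeats of 2-4 bp motifs."""
    seq = seq.upper()
    hits = []
    for k, min_copies in MIN_REPEATS.items():
        # A k-base motif followed by at least (min_copies - 1) exact copies of itself.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (k, min_copies - 1))
        for m in pattern.finditer(seq):
            motif = m.group(1)
            if not _is_compound(motif):  # skip e.g. 'AA' runs or 'CACA' (really a CA repeat)
                hits.append((motif, len(m.group(0)) // k, m.start()))
    return hits


def heterozygosity(genotypes):
    """Observed and expected heterozygosity at one locus.

    genotypes: list of (allele1, allele2) pairs, one per diploid individual.
    HE uses the simple estimator 1 - sum(p_i^2) over allele frequencies p_i.
    """
    n = len(genotypes)
    h_obs = sum(a != b for a, b in genotypes) / n
    counts = Counter(allele for pair in genotypes for allele in pair)
    h_exp = 1.0 - sum((c / (2 * n)) ** 2 for c in counts.values())
    return h_obs, h_exp
```

For example, `find_ssrs("GGT" + "CA" * 7 + "TTG")` reports one CA repeat with 7 copies starting at position 3. `heterozygosity([(150, 152), (150, 150), (152, 154), (154, 154)])` gives HO = 0.5 and HE = 0.65625.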
Abstract:
Self-organizing maps (SOMs) have been recognized as a powerful tool for data exploration, especially for clustering high-dimensional data. However, clustering categorical data is still a challenge for SOMs. This paper extends the standard SOM to handle categorical feature values. We present a batch SOM algorithm, NCSOM, whose dissimilarity measure and map-update method handle numeric and categorical features simultaneously.
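A small pure-Python sketch can illustrate a batch SOM that uses one dissimilarity and one update rule for both feature types. It is not NCSOM, because the abstract does not give NCSOM's formulas. The sketch assumes a k-prototypes-style dissimilarity: squared Euclidean distance on numeric features plus `gamma` times the number of categorical mismatches. It also assumes a batch update in which numeric prototype components become neighborhood-weighted means and categorical components become neighborhood-weighted modes. The names `batch_som`, `gamma` and `sigma0` are hypothetical.

```python
import math
import random
from collections import defaultdict


def dissimilarity(rec, proto, gamma=1.0):
    """Assumed mixed dissimilarity: squared Euclidean (numeric) + gamma * mismatches (categorical)."""
    (xn, xc), (pn, pc) = rec, proto
    numeric = sum((a - b) ** 2 for a, b in zip(xn, pn))
    categorical = sum(a != b for a, b in zip(xc, pc))
    return numeric + gamma * categorical


def best_matching_units(data, protos, gamma):
    """Index of the closest prototype for every record."""
    return [min(range(len(protos)), key=lambda j: dissimilarity(rec, protos[j], gamma))
            for rec in data]


def batch_som(data, rows, cols, epochs=10, sigma0=1.5, gamma=1.0, seed=0):
    """data: list of (numeric tuple, categorical tuple); needs len(data) >= rows * cols."""
    rng = random.Random(seed)
    grid = [(r, c) for r in range(rows) for c in range(cols)]
    # Initialise prototypes from distinct randomly chosen records.
    protos = [(list(xn), list(xc)) for xn, xc in rng.sample(data, len(grid))]
    n_num, n_cat = len(data[0][0]), len(data[0][1])
    for epoch in range(epochs):
        # Gaussian neighbourhood whose width shrinks linearly over the epochs.
        sigma = max(sigma0 * (1 - epoch / epochs), 0.1)
        bmus = best_matching_units(data, protos, gamma)
        new_protos = []
        for j, (gr, gc) in enumerate(grid):
            h = [math.exp(-((gr - grid[b][0]) ** 2 + (gc - grid[b][1]) ** 2) / (2 * sigma ** 2))
                 for b in bmus]
            total = sum(h)
            if total == 0:  # no record influences this unit; keep its prototype
                new_protos.append(protos[j])
                continue
            # Numeric part: neighbourhood-weighted mean.
            numeric = [sum(hi * rec[0][f] for hi, rec in zip(h, data)) / total for f in range(n_num)]
            # Categorical part: neighbourhood-weighted mode (value with the largest total weight).
            categorical = []
            for f in range(n_cat):
                votes = defaultdict(float)
                for hi, rec in zip(h, data):
                    votes[rec[1][f]] += hi
                categorical.append(max(votes, key=votes.get))
            new_protos.append((numeric, categorical))
        protos = new_protos
    return grid, protos, best_matching_units(data, protos, gamma)
```

The weighted mode keeps every categorical prototype component inside the feature's original value set. An arithmetic mean could not do that for categorical values.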
Abstract:
IEEE
Abstract:
National Key Basic Research and Development Program of China [2006CB701305]; State Key Laboratory of Resource and Environment Information System [088RA400SA]; Chinese Academy of Sciences
Abstract:
We investigate the use of independent component analysis (ICA) for speech feature extraction in spoken-digit recognition systems. We observe that this may also hold for recognition tasks based on geometrical learning with little training data. Unlike in image processing, phase information is not essential for spoken-digit recognition. We therefore propose a new scheme that removes phase sensitivity by using an analytical description of the ICA-adapted basis functions via the Hilbert transform. Furthermore, because the basis functions are not shift-invariant, we extend the method with a frequency-based ICA stage that removes redundant time-shift information. The digit recognition results show promising accuracy, and experiments show that the method based on ICA and geometrical learning outperforms an HMM across different numbers of training samples.
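The two invariance ideas in this scheme can be checked numerically. The sketch assumes that the Hilbert-transform step amounts to taking the magnitude of the analytic signal of a filter output. The analytic signal is built with the usual frequency-domain construction, the same one used by SciPy's `scipy.signal.hilbert`. The sketch then uses the magnitude of the discrete Fourier transform to show why a frequency-domain representation discards time-shift information. It does not implement ICA basis learning or the frequency-based ICA stage. A naive O(n^2) DFT keeps it dependency-free.

```python
import cmath
import math


def dft(x):
    """Naive discrete Fourier transform, O(n^2)."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)) for k in range(n)]


def idft(X):
    """Inverse of dft()."""
    n = len(X)
    return [sum(X[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n for t in range(n)]


def analytic_signal(x):
    """x + j*H{x}: keep DC (and Nyquist), double positive frequencies, zero negative ones."""
    n = len(x)
    h = [0.0] * n
    h[0] = 1.0
    if n % 2 == 0:
        h[n // 2] = 1.0
        positive = range(1, n // 2)
    else:
        positive = range(1, (n + 1) // 2)
    for k in positive:
        h[k] = 2.0
    return idft([Xk * hk for Xk, hk in zip(dft(x), h)])


def envelope(x):
    """Magnitude of the analytic signal; it does not depend on the phase of a pure tone."""
    return [abs(z) for z in analytic_signal(x)]


def magnitude_spectrum(x):
    """|DFT(x)|; identical for every circular time shift of x."""
    return [abs(v) for v in dft(x)]
```

For a tone `cos(2*pi*4*t/64 + phi)`, `envelope` returns values equal to 1 (up to rounding) for any `phi`. `magnitude_spectrum` gives the same result for a signal and for any circular shift of it.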
Abstract:
In this paper, we construct an iris recognition algorithm based on point covering of high-dimensional space and on multi-weighted neurons for such point coverings. We also propose a new iris recognition method grounded in the point-covering theory of high-dimensional space. In this method, irises are learned ("cognized") one class at a time, so adding samples of a new class does not affect the recognition knowledge already acquired. In experiments, the rejection rate is 98.9%, and the correct cognition rate and the error rate are 95.71% and 3.5%, respectively. These results show that test samples from classes not included in training are rejected at a very high rate, indicating that the proposed iris recognition method is effective.
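Point covering can be pictured as covering each class's training samples with a union of simple geometric units. Any input outside every class's covering is rejected. The sketch below is a simplified assumption, not the paper's algorithm:

- Each unit is a "hyper-sausage": all points within `radius` of the segment joining two consecutive training samples.
- Samples are assumed to be supplied in a sensible chaining order.
- `PointCoveringRecognizer` is a hypothetical name.

Each class's covering is built only from its own samples, so learning a new class leaves existing coverings unchanged. This mirrors the class-by-class training described above.

```python
import math


def dist_to_segment(p, a, b):
    """Euclidean distance from point p to the line segment from a to b."""
    ab = [bc - ac for ac, bc in zip(a, b)]
    ap = [pc - ac for ac, pc in zip(a, p)]
    denom = sum(v * v for v in ab)
    # Parameter of the closest point on the segment, clamped to [0, 1].
    t = 0.0 if denom == 0 else max(0.0, min(1.0, sum(u * v for u, v in zip(ap, ab)) / denom))
    closest = [ac + t * v for ac, v in zip(a, ab)]
    return math.dist(p, closest)


class PointCoveringRecognizer:
    """Hypothetical class-by-class covering classifier with a reject option."""

    def __init__(self, radius):
        self.radius = radius
        self.coverings = {}  # class label -> list of (start, end) segments

    def learn_class(self, label, samples):
        """Cover one class by segments between consecutive samples; other classes are untouched."""
        if len(samples) == 1:
            segments = [(samples[0], samples[0])]
        else:
            segments = list(zip(samples, samples[1:]))
        self.coverings[label] = segments

    def recognize(self, x):
        """Return the label of the closest covering that contains x, or None to reject."""
        best_label, best_d = None, self.radius
        for label, segments in self.coverings.items():
            d = min(dist_to_segment(x, a, b) for a, b in segments)
            if d <= best_d:
                best_label, best_d = label, d
        return best_label
```

Rejection here is simply "outside every covering". Samples from untrained classes are therefore refused rather than forced into the nearest known class.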
Abstract:
A new model of pattern recognition, Biomimetic Pattern Recognition (BPR), has been proposed. It is based on "matter cognition" instead of "matter classification". As an important means of realizing BPR, the mathematical modeling and analysis of artificial neural networks (ANNs) have achieved a breakthrough. A novel general-purpose mathematical model has been proposed that can simulate all kinds of neuron architectures, including RBF and BP models. At the same time, this model has been implemented in hardware, and high-dimensional space geometry has been studied as a new means of analyzing ANNs.
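One parametric neuron can reproduce both a BP unit and an RBF unit. The form used below is an illustrative assumption, not necessarily the exact model proposed in this work: y = f(sum_i sgn(a_i)^s * |a_i|^p - theta), with a_i = w_i * (x_i - c_i). With s = 1, p = 1 and c = 0, the sum reduces to the weighted sum of a BP neuron. With s = 0, p = 2, w_i = 1/sigma and f(z) = exp(-z), it gives a Gaussian RBF unit. The name `unified_neuron` is hypothetical.

```python
import math


def unified_neuron(x, w, c, s, p, theta, f):
    """Hypothetical unified neuron model.

    net = sum_i sgn(a_i)**s * |a_i|**p - theta,  with a_i = w_i * (x_i - c_i)
    y   = f(net)
    """
    net = 0.0
    for xi, wi, ci in zip(x, w, c):
        a = wi * (xi - ci)
        sign = 0.0 if a == 0 else math.copysign(1.0, a)
        # For s = 0, sign ** 0 == 1.0 (Python defines 0.0 ** 0 as 1.0), so the sign is ignored.
        net += sign ** s * abs(a) ** p
    return f(net - theta)


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))
```

Setting `s=1, p=1, c=[0, ...]` with `f=sigmoid` yields `sigmoid(sum(w_i * x_i) - theta)`. Setting `s=0, p=2, w=[1/sigma, ...]` with `f=lambda z: math.exp(-z)` yields `exp(-||x - c||^2 / sigma^2)`.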
Abstract:
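Data with discrete attributes are an important class of data in data mining, and unsupervised learning is a key class of methods used in data mining. This thesis studies several new unsupervised data mining algorithms for multiple types of discrete data, including text data, temporal discrete sequence data, and multidimensional discrete data. The main contributions are as follows.
Topic modeling of multi-corpus text: We apply LDA to multi-corpus data and propose C-LDA, a topic modeling method for text drawn from multiple corpora. In C-LDA, topic information can be transferred between corpora, so it can also be viewed as a transfer-learning method. Owing to this inter-corpus information transfer, C-LDA further overcomes the overfitting that the LDA model exhibits on single-corpus text data. In addition, the proposed model can also be used as a supervised topic model. We validated its effectiveness on a large multi-corpus dataset.
Topic modeling of temporal discrete sequences: We apply LDA to temporal discrete sequence data and propose the T-BiLDA model, in which we introduce a new concept, the global transition probability. Based on this concept, T-BiLDA integrates the global, local, and temporal information used in existing work into a single model. It achieves better results on real intrusion detection data.
Cluster analysis of multidimensional discrete data: We propose R-map, a framework that maps multidimensional discrete data to spatial data, so that existing spatial clustering algorithms can be applied directly to the mapped data. We prove theoretically that the mapping preserves the clustering properties of the data and verify the effectiveness of R-map experimentally.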
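C-LDA and T-BiLDA both build on LDA. For reference, a standard collapsed Gibbs sampler for plain LDA is sketched below. It includes neither the cross-corpus transfer of C-LDA nor the global transition probability of T-BiLDA, whose details are not given here. The parameter names and default values (`alpha`, `beta`, `iters`) are illustrative assumptions.

```python
import random


def lda_gibbs(docs, n_topics, vocab_size, alpha=0.1, beta=0.01, iters=200, seed=0):
    """Collapsed Gibbs sampling for plain LDA.

    docs: list of documents, each a list of integer word ids in [0, vocab_size).
    Returns (z, ndk, nkw): per-token topic assignments, document-topic counts,
    and topic-word counts.
    """
    rng = random.Random(seed)
    topics = range(n_topics)
    ndk = [[0] * n_topics for _ in docs]      # tokens per (document, topic)
    nkw = [[0] * vocab_size for _ in topics]  # tokens per (topic, word)
    nk = [0] * n_topics                       # tokens per topic
    z = []
    # Random initial topic for every token.
    for d, doc in enumerate(docs):
        z.append([])
        for w in doc:
            k = rng.randrange(n_topics)
            z[d].append(k)
            ndk[d][k] += 1
            nkw[k][w] += 1
            nk[k] += 1
    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the token's current assignment from all counts.
                k = z[d][i]
                ndk[d][k] -= 1
                nkw[k][w] -= 1
                nk[k] -= 1
                # Unnormalised full conditional p(z_i = t | all other assignments).
                weights = [(ndk[d][t] + alpha) * (nkw[t][w] + beta) / (nk[t] + vocab_size * beta)
                           for t in topics]
                k = rng.choices(topics, weights=weights)[0]
                # Record the new assignment.
                z[d][i] = k
                ndk[d][k] += 1
                nkw[k][w] += 1
                nk[k] += 1
    return z, ndk, nkw
```

After sampling, each topic's word distribution can be estimated as `(nkw[k][w] + beta) / (sum(nkw[k]) + vocab_size * beta)`.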
Abstract:
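Based on sequential frequent pattern mining, we propose and implement a method for detecting anomalies in macroscopic network traffic. We define a new type of frequent pattern and a corresponding notion of anomaly degree. Experiments on nationwide traffic data provided by the 863-917 network security monitoring platform led to the conclusion that traffic was severely anomalous in early August 2006, corresponding to the "Orange August" period. We compared the method with related traditional algorithms, such as one that uses absolute traffic volume and one that simply ranks traffic across different hours. This comparison further demonstrates the practicality of sequential frequent patterns for network traffic analysis.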
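One way to picture frequent-pattern-based anomaly scoring is sketched below. All definitions here are hypothetical stand-ins, because the abstract does not give the paper's pattern or anomaly-degree definitions:

- Each day is a sequence of discretized hourly traffic levels (for example `L`, `M`, `H`).
- Frequent patterns are contiguous length-`n` windows that occur in at least a fraction `min_support` of the baseline days.
- A day's anomaly degree is the fraction of its windows that are not frequent.

```python
from collections import Counter


def windows(seq, n):
    """All contiguous length-n subsequences of seq, as tuples."""
    return [tuple(seq[i:i + n]) for i in range(len(seq) - n + 1)]


def frequent_patterns(history, n, min_support):
    """Length-n patterns present in at least min_support (a fraction) of the baseline sequences."""
    support = Counter()
    for seq in history:
        support.update(set(windows(seq, n)))  # count each pattern at most once per sequence
    return {p for p, c in support.items() if c / len(history) >= min_support}


def anomaly_degree(seq, frequent, n):
    """Fraction of seq's windows that are not frequent (0 = typical, 1 = entirely unusual)."""
    ws = windows(seq, n)
    return sum(w not in frequent for w in ws) / len(ws)
```

Scoring the order of hourly levels, rather than absolute volume, is what separates this idea from the absolute-traffic baseline mentioned above.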
Abstract:
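The macroscopic network security data mining system studied here aims to protect the availability, confidentiality, and integrity of critical network infrastructure in large-scale networks. To this end, we first propose a system framework for macroscopic network data mining. We then analyze the macroscopic network mining subsystem and the situation analysis subsystem. Finally, we implement the platform using grid computing technology and describe its operating environment. The system is scalable and can effectively perform macroscopic network data mining and real-time situational awareness.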