15 results for Vector Space IR, Search Engines, Document Clustering, Document in Chinese Academy of Sciences Institutional Repositories Grid Portal


Relevance:

100.00%

Publisher:

Abstract:

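Word sense disambiguation (WSD) has long been a key problem in natural language understanding, and how well it is solved directly affects the performance of many natural language processing applications. Because natural language knowledge is hard to represent, and hand-written rules have not achieved satisfactory WSD results, a variety of supervised machine learning methods have been applied to the task. Building on earlier work, we borrow the document term-weighting technique of the vector space model from information retrieval to represent the knowledge associated with each sense of a polysemous word, propose a method for computing context position weights, and present a supervised machine learning method for WSD based on the vector space model. The method maps the senses of a polysemous word and its contexts into a vector space, computes the distance between a context vector and each sense vector, and uses k-NN (k=1) to assign the context vector to a sense. In open and closed tests on 9 high-frequency polysemous Chinese words, the method achieved outstanding results (average accuracy of 96.31% in the closed test and 92.98% in the open test), confirming its effectiveness.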
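The WSD method in this abstract treats each sense of a polysemous word like a document in vector space IR: context words are weighted, each sense becomes a term vector, and a new context is assigned to its nearest sense vector (k-NN with k=1). Below is a minimal sketch of that pipeline under stated assumptions. The abstract does not give the paper's context position weight formula, so `position_weight` uses a hypothetical 1/(1+|offset|) decay; IDF is computed by treating each sense as one document; cosine similarity stands in for the distance measure; and the English "bank" training data is a toy example. All names are illustrative.

```python
import math
from collections import Counter, defaultdict

def position_weight(offset):
    # Hypothetical context position weight: nearer words count more.
    # The paper proposes its own formula, which the abstract does not give.
    return 1.0 / (1 + abs(offset))

def context_vector(tokens, target, window=3):
    """Position-weighted bag of words around tokens[target]."""
    vec = Counter()
    for i in range(max(0, target - window), min(len(tokens), target + window + 1)):
        if i != target:
            vec[tokens[i]] += position_weight(i - target)
    return vec

def build_sense_vectors(training):
    """training: (tokens, target_index, sense) triples.
    Each sense is treated as one 'document'; returns (TF-IDF sense vectors, idf)."""
    raw = defaultdict(Counter)
    for tokens, target, sense in training:
        raw[sense].update(context_vector(tokens, target))
    n = len(raw)
    df = Counter(w for vec in raw.values() for w in vec)  # senses containing each word
    # Smoothed IDF: a word seen with every sense gets the minimum weight 1.0.
    idf = {w: math.log((1 + n) / (1 + df[w])) + 1.0 for w in df}
    senses = {s: {w: tf * idf[w] for w, tf in vec.items()} for s, vec in raw.items()}
    return senses, idf

def cosine(a, b):
    dot = sum(v * b.get(w, 0.0) for w, v in a.items())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def disambiguate(tokens, target, senses, idf):
    """1-NN: return the sense whose vector is closest (highest cosine) to the context."""
    ctx = {w: v * idf.get(w, 0.0) for w, v in context_vector(tokens, target).items()}
    return max(senses, key=lambda s: cosine(ctx, senses[s]))

# Toy training data (hypothetical): (tokens, index of "bank", sense label).
TOY_TRAINING = [
    ("deposit money in the bank account".split(), 4, "finance"),
    ("the bank approved the loan".split(), 1, "finance"),
    ("fish near the river bank water".split(), 4, "river"),
    ("grass on the muddy bank of the river".split(), 4, "river"),
]

if __name__ == "__main__":
    senses, idf = build_sense_vectors(TOY_TRAINING)
    print(disambiguate("they walked along the river bank".split(), 5, senses, idf))  # → river
```

Words never seen in training get IDF 0 here, so they cannot pull a context toward any sense; a real system would also need Chinese word segmentation before this step.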

Relevance:

100.00%

Publisher:

Abstract:

Thymidylate synthase (TS), an important target of many anticancer drugs, has been cloned from different species, but the properties of its cDNA and its function in zebrafish are not well documented. To use zebrafish as an animal model for screening novel anticancer agents, we isolated TS cDNA from zebrafish and compared its sequence with those from other species. The open reading frame (ORF) of the zebrafish TS cDNA was 954 nucleotides long, encoding a 318-amino-acid protein with a calculated molecular mass of 36.15 kDa. The deduced amino acid sequence of zebrafish TS was similar to those of other organisms, including rat, mouse and human. The zebrafish TS protein was expressed in Escherichia coli and purified to homogeneity. The purified zebrafish TS showed maximal activity at 28 °C, with a Km value similar to that of human TS. Western immunoblot assays confirmed that TS was expressed at all developmental stages of zebrafish, with a high level of expression at the 1-4 cell stages. To study the function of TS in zebrafish embryo development, a short hairpin RNA (shRNA) expression vector, pSilencer 4.1-CMV/TS, was constructed to target the protein-coding region of zebrafish TS mRNA. Significant changes in tail development and epiboly were observed in zebrafish embryos microinjected with the pSilencer 4.1-CMV/TS siRNA expression vector.

Relevance:

100.00%

Publisher:

Abstract:

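This paper combines adaptive wavelet transform filtering with wavelet threshold denoising and proposes a two-layer filtering algorithm for denoising gearbox fault vibration signals. The filtering process has two layers: the first layer uses an adaptive wavelet transform filtering algorithm, and the second layer applies the classical wavelet threshold denoising algorithm to denoise the signal a second time. Finally, the denoised fault signal is decomposed with wavelet packets, and the wavelet packet band energies are extracted as the fault feature vector.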
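The second layer and the feature-extraction step in this abstract are standard techniques, so they can be sketched directly; the first, adaptive layer is not described in enough detail to reproduce and is omitted. The sketch below assumes an orthonormal Haar wavelet, Donoho's universal threshold σ√(2 ln N) with soft thresholding, and a noise level σ estimated from the finest detail coefficients by median absolute deviation. The wavelet packet step splits both approximation and detail nodes at every level and returns the relative energy of each terminal band (in natural rather than frequency order). Function names are illustrative, and signal lengths must be divisible by 2**levels.

```python
import math
import statistics

SQRT2 = math.sqrt(2.0)

def haar_step(x):
    """One level of the orthonormal Haar DWT; returns (approximation, detail)."""
    a = [(x[i] + x[i + 1]) / SQRT2 for i in range(0, len(x), 2)]
    d = [(x[i] - x[i + 1]) / SQRT2 for i in range(0, len(x), 2)]
    return a, d

def inverse_haar_step(a, d):
    """Exact inverse of haar_step."""
    x = []
    for ai, di in zip(a, d):
        x.extend([(ai + di) / SQRT2, (ai - di) / SQRT2])
    return x

def soft_threshold(c, t):
    """Shrink a coefficient toward zero by t."""
    return math.copysign(max(abs(c) - t, 0.0), c)

def wavelet_threshold_denoise(x, levels=3):
    """Second-layer denoising: universal threshold + soft rule on all details."""
    a, details = list(x), []
    for _ in range(levels):
        a, d = haar_step(a)
        details.append(d)
    # Noise sigma from the finest-scale details via the median absolute deviation.
    sigma = statistics.median([abs(c) for c in details[0]]) / 0.6745
    t = sigma * math.sqrt(2.0 * math.log(len(x)))
    for d in reversed(details):  # coarsest level first
        a = inverse_haar_step(a, [soft_threshold(c, t) for c in d])
    return a

def wavelet_packet_energy(x, levels=3):
    """Relative energy of the 2**levels terminal wavelet-packet bands."""
    nodes = [list(x)]
    for _ in range(levels):
        nodes = [half for node in nodes for half in haar_step(node)]
    energies = [sum(c * c for c in node) for node in nodes]
    total = sum(energies)
    return [e / total for e in energies] if total else energies

def fault_features(signal, levels=3):
    """Denoise, then extract band energies as the feature vector.
    The paper's first, adaptive filtering layer is not reproduced here."""
    return wavelet_packet_energy(wavelet_threshold_denoise(signal, levels), levels)
```

Because the Haar transform is orthonormal, the band energies partition the signal energy exactly, which is why normalizing them gives a feature vector that sums to 1.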

Relevance:

60.00%

Publisher:

Abstract:

In graphical user interfaces, documents are commonly represented by document icons. Document icons help users retrieve documents, but it is difficult to pick out a particular document from the collection of documents a user has accessed. This paper presents a document icon to which users can add subjective values and marks. We then describe a system, ex-explorer, with which users can browse and search these extended document icons. We found that documents to which users had added their own annotations or marks were easy to re-find.
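The abstract does not describe how ex-explorer stores or searches the added marks. As a hypothetical illustration only, the sketch below models an extended document icon as a record holding a set of user marks and a dictionary of subjective ratings, and re-finds documents by filtering on them. The class, its fields, and the `refind` helper are assumptions, not the system's actual design.

```python
from dataclasses import dataclass, field

@dataclass
class AnnotatedIcon:
    """Hypothetical extended document icon: a document plus user annotations."""
    path: str
    marks: set = field(default_factory=set)      # user-chosen marks, e.g. {"star"}
    ratings: dict = field(default_factory=dict)  # subjective values, e.g. {"importance": 4}

def refind(icons, mark=None, min_rating=None, rating_key="importance"):
    """Return the paths of icons that carry `mark` (if given) and whose
    `rating_key` value is at least `min_rating` (if given)."""
    hits = []
    for icon in icons:
        if mark is not None and mark not in icon.marks:
            continue
        if min_rating is not None and icon.ratings.get(rating_key, 0) < min_rating:
            continue
        hits.append(icon.path)
    return hits
```

For example, `refind(icons, mark="star", min_rating=4)` returns only the starred documents the user also rated as important.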

Relevance:

50.00%

Publisher:

Abstract:

We propose a highly efficient content-lossless compression scheme for Chinese document images. The scheme combines morphological analysis with pattern matching to cluster patterns. To obtain error maps with as few errors as possible, morphological analysis is used to decompose and recompose Chinese character patterns. In pattern matching, the matching criteria are adapted to the characteristics of Chinese characters. Because small components can sometimes be inserted into the blank spaces of large components, the pattern library images can be kept small. Arithmetic coding is applied in the final compression stage. Our method achieves much better compression performance than most alternative methods and ensures content-lossless reconstruction. (c) 2006 Society of Photo-Optical Instrumentation Engineers.
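A common form of pattern matching in symbol-based document image compression compares a component's bitmap with library prototypes through an error map (the pixelwise XOR) and reuses a prototype when the error is small; presumably this substitution is what "content-lossless" refers to, since pixels may change but the character does not. The sketch below shows only that generic greedy step. The morphological decomposition, the Chinese-specific matching criteria, the library packing, and the arithmetic coder described in the abstract are not reproduced, and `max_error_ratio` is a hypothetical threshold.

```python
def error_map(a, b):
    """Pixelwise XOR of two equal-size binary bitmaps (lists of rows of 0/1)."""
    return [[p ^ q for p, q in zip(ra, rb)] for ra, rb in zip(a, b)]

def error_count(a, b):
    """Number of mismatched pixels in the error map."""
    return sum(sum(row) for row in error_map(a, b))

def cluster_patterns(symbols, max_error_ratio=0.1):
    """Greedy pattern matching: reuse the first same-size prototype whose
    error map has at most max_error_ratio * h * w mismatches, otherwise
    add the symbol as a new prototype. Returns (library, indices)."""
    library, indices = [], []
    for sym in symbols:
        h, w = len(sym), len(sym[0])
        for k, proto in enumerate(library):
            same_size = (len(proto), len(proto[0])) == (h, w)
            if same_size and error_count(sym, proto) <= max_error_ratio * h * w:
                indices.append(k)
                break
        else:  # no prototype matched
            library.append(sym)
            indices.append(len(library) - 1)
    return library, indices
```

Only the library and the index sequence (plus symbol positions) then need to be entropy-coded, which is where an arithmetic coder would come in.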

Relevance:

50.00%

Publisher:

Relevance:

50.00%

Publisher:

Abstract:

ACM SIGIR; ACM SIGWEB

Relevance:

40.00%

Publisher:

Abstract:

A new model of pattern recognition, Biomimetic Pattern Recognition, which is based on "matter cognition" rather than "matter classification", has been proposed. As an important means of realizing Biomimetic Pattern Recognition, the mathematical modeling and analysis of ANNs have achieved a breakthrough: a novel general-purpose mathematical model has been proposed that can simulate all kinds of neuron architectures, including RBF and BP models. At the same time, this model has been implemented in hardware, and a high-dimensional space geometry method, a new means of analyzing ANNs, has been studied.

Relevance:

40.00%

Publisher:

Abstract:

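Text clustering has useful applications in information filtering and web page classification, but it must cope with large data volumes and high feature dimensionality. Because the K-means algorithm is easy to implement and has low dependence on the data, it has been applied to text clustering. However, traditional K-means and its variants produce clustering results that fluctuate considerably. We therefore improved the K-means algorithm by optimizing the selection of the initial cluster centers, obtaining an algorithm suited to clustering text data. Extensive experiments show that the algorithm produces higher-quality results whose clustering quality fluctuates less.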
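The abstract says only that the initial cluster centers are chosen more carefully, not how. The sketch below therefore swaps in a different, well-known deterministic seeding, max-min (farthest-first) initialization, to show why fixing the seeds removes run-to-run fluctuation: K-means with random seeds can return different partitions on different runs, while this version always returns the same one. It uses Euclidean distance on dense vectors; for text one would normally pass in length-normalized TF-IDF vectors. This is not the paper's algorithm, and the function names are illustrative.

```python
def dist2(a, b):
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def mean(points):
    return [sum(col) / len(points) for col in zip(*points)]

def maxmin_init(points, k):
    """Deterministic max-min seeding: start from the point closest to the
    global mean, then repeatedly add the point farthest from all chosen centers."""
    g = mean(points)
    centers = [min(points, key=lambda p: dist2(p, g))]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(dist2(p, c) for c in centers)))
    return [list(c) for c in centers]

def kmeans(points, k, iters=100):
    """Standard Lloyd iterations starting from max-min seeds."""
    centers = maxmin_init(points, k)
    labels = None
    for _ in range(iters):
        new = [min(range(k), key=lambda j: dist2(p, centers[j])) for p in points]
        if new == labels:  # assignments stable: converged
            break
        labels = new
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old center if a cluster empties
                centers[j] = mean(members)
    return labels, centers
```

Max-min seeding is sensitive to outliers, since a far-away point is likely to be picked as a seed; that trade-off is one reason papers in this area propose their own selection rules.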

Relevance:

40.00%

Publisher:

Abstract:

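Generally speaking, outliers are data points that lie far from the other data points, yet they may contain extremely important information. A new outlier fuzzy kernel clustering algorithm is proposed to find the outliers in a sample set. The original data space is mapped into a feature space through a Mercer kernel, and each vector in the feature space is assigned a dynamic weight. Building on the classical FCM fuzzy clustering algorithm, a new clustering objective function in the feature space is derived; optimizing this objective function yields the weight of each data point, and the outliers in the sample set are identified by the magnitude of their weights. Simulation results demonstrate the feasibility and effectiveness of the algorithm.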
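The key step in this abstract is measuring distances in the feature space induced by a Mercer kernel without computing the mapping φ explicitly: ‖φ(x) − φ(y)‖² = K(x,x) − 2K(x,y) + K(y,y). The paper's dynamic weights and objective function are not given in the abstract, so the sketch below substitutes standard kernel fuzzy c-means (Gaussian kernel, prototypes kept in input space) and scores each point by its feature-space distance to the nearest prototype. That post-hoc score, the default initialization from the first c points, and all names are assumptions, not the paper's method.

```python
import math

def gaussian_kernel(x, y, sigma=1.0):
    """Gaussian (RBF) kernel, a Mercer kernel with K(x, x) = 1."""
    return math.exp(-sum((a - b) ** 2 for a, b in zip(x, y)) / (2 * sigma ** 2))

def kernel_dist2(x, y, sigma=1.0):
    """||phi(x) - phi(y)||^2 = K(x,x) - 2K(x,y) + K(y,y), computed without phi.
    With the Gaussian kernel this reduces to 2 * (1 - K(x, y))."""
    return 2.0 * (1.0 - gaussian_kernel(x, y, sigma))

def kernel_fcm(points, c, m=2.0, sigma=1.0, init=None, iters=100, tol=1e-6, eps=1e-12):
    """Kernel fuzzy c-means with prototypes kept in input space.
    init defaults to the first c points, a hypothetical choice."""
    protos = [list(p) for p in (init or points[:c])]
    dim = len(points[0])
    u = []
    for _ in range(iters):
        # Membership update: inverse kernel distance, normalized over clusters.
        u = []
        for x in points:
            inv = [(1.0 / (kernel_dist2(x, v, sigma) + eps)) ** (1.0 / (m - 1)) for v in protos]
            total = sum(inv)
            u.append([val / total for val in inv])
        # Prototype update: kernel-weighted mean; far-away points get K close to 0.
        new = []
        for i in range(c):
            w = [row[i] ** m * gaussian_kernel(x, protos[i], sigma) for x, row in zip(points, u)]
            sw = sum(w)
            new.append([sum(wk * x[d] for wk, x in zip(w, points)) / sw for d in range(dim)])
        shift = max(math.dist(p, q) for p, q in zip(protos, new))
        protos = new
        if shift < tol:
            break
    return protos, u

def outlier_scores(points, protos, sigma=1.0):
    """Hypothetical post-hoc score: feature-space distance to the nearest prototype."""
    return [min(kernel_dist2(x, v, sigma) for v in protos) for x in points]
```

With a Gaussian kernel the feature-space distance saturates at 2, and a distant point contributes almost nothing to the prototype updates, which is what makes kernel FCM comparatively robust to the outliers it is meant to expose.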