998 results for stream mining


Relevance:

20.00% 20.00%

Publisher:

Abstract:

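Data streams are a new type of application that has emerged in recent years, characterized by continuous, unbounded, high-speed arrival. Typical data streams include monitoring data returned by sensors in wireless sensor network environments, stock price information from stock exchanges, monitoring data from network monitoring systems and road traffic monitoring systems, call records from telecommunications providers, and website logs. The emergence of data streams poses a major challenge to traditional data management and mining techniques. Traditional data mining techniques usually make multiple passes over a static data set and have high time and space complexity, which makes them difficult to apply directly in a data stream environment. This work studies the problem of frequent itemset mining over data streams in depth; its main research content and contributions are summarized as follows. First, this work gives a comprehensive survey of the frequent itemset mining problem. The survey introduces the concepts, classification, and classic algorithms of frequent itemset mining on static data sets, and then analyzes the problems and challenges of frequent itemset mining over data streams, together with the current state of research. For the problem of mining frequent elements over data streams, this work proposes a simple and efficient algorithm that mines frequent elements over a sliding window of the stream. The algorithm can periodically return frequent elements that satisfy the ε-approximation requirement, and it can also respond to a request submitted by a user at any time, returning results that satisfy the error requirement. For the problem of mining frequent itemsets over data streams, this work proposes the BFI-Stream algorithm, which mines all frequent itemsets over a sliding window of the stream and returns exact results in real time. The algorithm uses a prefix tree data structure and prunes some infrequent nodes during creation and update, so both its space and time complexity are low. Next, to address the problem that existing algorithms for mining frequent itemsets over data streams maintain too many infrequent itemsets and therefore use too much space, this work proposes an optimistic pruning method that greatly reduces space complexity. It first analyzes the frequency distribution of itemsets in real data sets and then proposes the optimistic pruning method, which prunes most infrequent itemsets; experimental results show that optimistic pruning not only greatly reduces memory usage but also improves update efficiency. Furthermore, for approximate queries in which the user specifies a minimum support and an allowable error, this work proposes AFI-Stream, an approximate algorithm for mining frequent itemsets over a sliding window of the stream, which returns results that satisfy the error requirement. AFI-Stream maintains only frequent itemsets and no infrequent itemsets, so it greatly reduces the memory used by the algorithm. Finally, to support research on frequent itemset mining over data streams, this work designs and develops StreamMiner, a prototype system for frequent itemset mining over data streams, which is used to evaluate and study the related algorithms.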
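
To make the sliding-window setting in the abstract above concrete, here is a minimal Python sketch of exact frequent-element counting over the most recent items of a stream. It is not the abstract's ε-approximate algorithm, BFI-Stream, or AFI-Stream, whose details the abstract does not give; the class name `SlidingWindowFrequentItems` and the parameters `window_size` and `min_support` are assumptions made for illustration.

```python
from collections import Counter, deque


class SlidingWindowFrequentItems:
    """Exact frequent-element counts over the last `window_size` stream items.

    Illustrative baseline only (hypothetical name and parameters); it stores
    the whole window, which the approximate algorithms are designed to avoid.
    """

    def __init__(self, window_size, min_support):
        if window_size <= 0:
            raise ValueError("window_size must be positive")
        if not 0 < min_support <= 1:
            raise ValueError("min_support must be in (0, 1]")
        self.window_size = window_size
        self.min_support = min_support
        self.window = deque()    # items currently inside the window, oldest first
        self.counts = Counter()  # occurrences of each item inside the window

    def add(self, item):
        """Append a new stream item and expire the oldest one if the window is full."""
        self.window.append(item)
        self.counts[item] += 1
        if len(self.window) > self.window_size:
            expired = self.window.popleft()
            self.counts[expired] -= 1
            if self.counts[expired] == 0:
                del self.counts[expired]

    def frequent_items(self):
        """Return items whose count is at least min_support * current window length."""
        threshold = self.min_support * len(self.window)
        return {item: c for item, c in self.counts.items() if c >= threshold}


miner = SlidingWindowFrequentItems(window_size=4, min_support=0.5)
for element in "abacaad":
    miner.add(element)
print(miner.frequent_items())
```

Because this baseline keeps every item of the window in memory, its space use grows linearly with the window size; reducing that cost is the motivation the abstract gives for pruning infrequent entries and for accepting a bounded error.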

Relevance:

20.00% 20.00%

Publisher:

Abstract:

A fine-grid model (1/6°) covering the South China Sea (SCS), the East China Sea and the Japan/East Sea, embedded in a coarse-grid (3°) global model, was established to study the SCS circulation. In the present paper, we report the model-produced monthly and annual mean transport stream functions and sea surface heights (SSH), together with their anomalies, for the SCS. Comparison with the TOPEX/Poseidon data shows that the model-produced monthly sea surface height anomalies (SSHA) are in good agreement with the altimeter measurements. Based on these results, the circulation of the SCS, especially the upper-layer circulation, is discussed. In the surface layer, western Philippine Sea water intrudes into the SCS through the Luzon Strait in autumn, winter and spring, but not in summer. However, when the whole water column is considered, the water intrudes into the SCS through the Luzon Strait year-round. This indicates that in summer the water still intrudes into the SCS in the subsurface and intermediate layers. The area near the northern continental slope of the SCS is dominated by a cyclonic circulation year-round. The SCS Southern Anticyclonic Gyre and the SE Vietnam Off-Shore Current in summer, and the SCS Southern Cyclonic Gyre in winter, are reproduced reasonably well. The difference between the monthly averaged SSH and SSHA is significant, indicating the importance of the mean SSH in the SCS circulation.

Relevance:

20.00% 20.00%

Publisher:

Abstract:

Although single nucleotide polymorphisms (SNPs) are important resources for population genetics, pedigree analysis and genomic mapping, such loci have not been reported in Pacific abalone so far. In this study, a bioinformatics strategy was adopted to discover SNPs within the expressed sequence tags (ESTs) of Pacific abalone, Haliotis discus hannai; in addition, polymerase chain reaction direct sequencing (PCR-DS) and allele-specific PCR (AS-PCR) were used for SNP detection and genotype scoring, respectively. A total of 5893 ESTs were assembled and 302 putative SNPs were identified. The average density of SNPs in ESTs was 1%. Fifty-two sets of sequencing primers were designed from SNP-flanking ESTs to amplify the genomic DNA, and 13 could generate products of the expected size. Polymerase chain reaction direct sequencing of the amplification products from pooled DNA samples revealed 40 polymorphic SNP loci. Using a modified tetra-primer AS-PCR, seven mitochondrial and six nuclear SNPs were typed and characterized among 37 wild abalones. In conclusion, it is feasible to discover SNPs from a limited number of ESTs, and AS-PCR, as a simple, robust and reliable assay, could be a primary method for small- and medium-scale SNP detection in abalone as well as in other non-model organisms.