10 results for Data stream mining
in Chinese Academy of Sciences Institutional Repositories Grid Portal
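Abstract:
Skyline queries are an important research direction in data mining and play an important role in applications such as data-based decision support. Real-world applications contain large numbers of incomplete data streams, yet most existing skyline query algorithms rely on the following assumptions: 1) all dimension values of every data point are known; 2) the data set is stable, bounded, and randomly accessible. In addition, as dimensionality grows, the number of skyline points becomes too large, which motivated the concept of the k-dominant skyline. However, the k-dominance relation over incomplete data is not transitive, so existing skyline query algorithms cannot be applied. To address these problems, and considering that data streams are high-dimensional, unbounded, and sequential and may have missing values in some dimensions, a new sliding-window-based k-dominant skyline query algorithm for incomplete data streams is proposed. Experimental results show that the algorithm supports k-dominant skyline computation over incomplete data streams while maintaining efficiency and performance.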
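The k-dominance relation described in the abstract above can be illustrated with a minimal Python sketch. This is not the algorithm proposed in the paper; it is a naive baseline under stated assumptions: smaller values are better, a missing value is represented by `None`, and two points are compared only on dimensions observed in both. The names `k_dominates`, `k_dominant_skyline`, and `SlidingWindowSkyline` are hypothetical.

```python
from collections import deque
from typing import List, Optional, Tuple

Point = Tuple[Optional[float], ...]  # None marks a missing value (assumption)


def k_dominates(p: Point, q: Point, k: int) -> bool:
    """True if p is no worse than q in at least k commonly observed
    dimensions and strictly better in at least one of them.
    Assumes smaller values are better."""
    no_worse = 0
    strictly_better = 0
    for a, b in zip(p, q):
        if a is None or b is None:
            continue  # skip dimensions missing in either point
        if a <= b:
            no_worse += 1
            if a < b:
                strictly_better += 1
    return no_worse >= k and strictly_better >= 1


def k_dominant_skyline(points: List[Point], k: int) -> List[Point]:
    """Naive O(n^2) scan. Because k-dominance over incomplete data is
    not transitive, every point is checked against every other point;
    a dominated point cannot be discarded early."""
    return [
        p for i, p in enumerate(points)
        if not any(k_dominates(q, p, k)
                   for j, q in enumerate(points) if j != i)
    ]


class SlidingWindowSkyline:
    """Count-based sliding window that recomputes the skyline on each arrival."""

    def __init__(self, size: int, k: int) -> None:
        self.window = deque(maxlen=size)  # oldest point is evicted automatically
        self.k = k

    def push(self, point: Point) -> List[Point]:
        self.window.append(tuple(point))
        return k_dominant_skyline(list(self.window), self.k)
```

Recomputing from scratch on every arrival is only practical for small windows; an efficient method would maintain the result incrementally, which this sketch does not attempt.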
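Abstract:
With the development of information technology, a large number of data stream applications have emerged, such as sensor data processing, network monitoring, and financial data analysis. In these applications, data is a continuous, time-varying, ordered, unbounded sequence, and most queries are continuous queries. This continuity of data and queries places heavy demands on the resources of the management system. When system resources cannot meet query processing requirements, that is, when the query processing workload exceeds the system's maximum processing capacity, user queries cannot be processed promptly and correctly. Moreover, if query processing time exceeds the query deadline, the query result becomes meaningless and may even have catastrophic consequences. At present, much research focuses on load shedding in data stream systems, while load shedding in real-time data stream systems that support real-time query processing has received comparatively little attention. This thesis studies load shedding methods in real-time data stream management systems that support real-time query processing, covering three aspects: random load shedding, semantic load shedding, and load shedding for shared sliding-window join operations. Finally, a real-time data stream management system test platform is used to verify that the proposed algorithms perform well in improving system throughput and reducing the deadline miss ratio. To meet the needs of real-time data stream applications, a data stream processing architecture suited to real-time queries, RT-DSPA, and a corresponding multi-layer overload handling strategy, MLOHS, are proposed, providing a framework for the study of load shedding methods. RT-DSPA is divided into multiple functional modules across the user layer, the DSMS layer, and the data source layer, and is multi-layered, extensible, robust, and configurable. For random load shedding, a load estimation algorithm based on data stream arrival rates is proposed; building on the real-time data stream processing framework and the load estimation algorithm, a deadline-sensitive random load shedding algorithm, RLS-EDA, is proposed. Because system load often fluctuates widely, the algorithm exploits the properties of deadlines and temporarily stores dropped tuples to make full use of idle CPU resources, which raises system throughput after load shedding and thereby lowers the query deadline miss ratio as far as possible. Finally, queue maintenance strategies during load shedding, the placement of load shedding in query networks containing shared operators, and an algorithm for inserting load shedding operators into the query network are discussed. Experimental results show that RLS-EDA performs well when system load fluctuates widely. When the characteristics of the data streams and queries are well understood, semantic load shedding achieves better results. To make explicit the semantics used in semantic load shedding, the concepts of tuple value and value level are introduced, and a method for resolving conflicts that arise when assigning value levels is given. A value level–execution cost priority table and a deadline–value density priority table suited to real-time data stream management systems are designed; both can take multiple factors into account when determining priority. Based on these two priority tables, the corresponding semantic load shedding algorithms SLS-PT-VD&EC and SLS-PT-D&TVD are proposed. Priority-table-based semantic load shedding can flexibly meet different user requirements while improving system performance during load shedding. Finally, for overload of shared sliding-window join operators, a load shedding algorithm for shared sliding-window joins, LS-SJRT, is proposed; it exploits query deadlines and is based on temporarily storing dropped tuples. To reduce the shedding overhead of LS-SJRT, an improved algorithm based on adjusting the sliding window width, LS-SJRT-CW, is proposed. Experimental results show that both algorithms perform well when shared join operators are overloaded.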
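The general idea of rate-based load estimation followed by random load shedding can be sketched in Python as follows. This is not RLS-EDA: it ignores deadlines entirely and only shows the textbook pattern of estimating load as arrival rate times per-tuple cost, then dropping tuples with a probability that brings the load back to capacity. Dropped tuples are returned rather than discarded, loosely echoing the abstract's idea of temporarily storing them; what is done with them afterwards is left open. All function names and parameters are hypothetical.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def estimate_load(rates: Sequence[float], costs: Sequence[float]) -> float:
    """Estimated processing time demanded per second:
    sum over streams of (tuples per second) x (seconds per tuple)."""
    return sum(r * c for r, c in zip(rates, costs))


def drop_probability(load: float, capacity: float) -> float:
    """Fraction of tuples to drop so that the remaining load fits
    the capacity; zero when the system is not overloaded."""
    if load <= capacity:
        return 0.0
    return 1.0 - capacity / load


def shed(tuples: Sequence[T], p: float,
         rng: random.Random) -> Tuple[List[T], List[T]]:
    """Randomly split tuples into (kept, dropped) with drop probability p.
    The dropped list is returned so a caller could buffer it (assumption)."""
    kept: List[T] = []
    dropped: List[T] = []
    for t in tuples:
        (dropped if rng.random() < p else kept).append(t)
    return kept, dropped
```

For example, two streams at 100 and 50 tuples per second with per-tuple costs of 4 ms and 12 ms demand about 1.0 s of processing per second; with a capacity of 0.5 the drop probability is about 0.5.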
Abstract:
Expressed sequence tags (ESTs) are a source for microsatellite development. In the present study, EST-derived microsatellites (EST-SSRs) were generated and characterized in the common carp (Cyprinus carpio) by data mining from updated public EST databases and by subsequent testing for polymorphism. About 5.5% (555) of 10,088 ESTs contain repeat motifs of various types and lengths, with CA being the most abundant dinucleotide motif. Out of the 60 EST-SSRs for which PCR primers were designed, 25 loci showed polymorphism in a common carp population, with the number of alleles per locus ranging from 3 to 17 (mean 7). The observed (H_O) and expected (H_E) heterozygosities of these EST-SSRs were 0.13-1.00 and 0.12-0.91, respectively. Six EST-SSR loci deviated significantly from the Hardy-Weinberg equilibrium (HWE) expectation, and the remaining 19 loci were in HWE. Of the 60 primer sets, the rates of polymorphic EST-SSRs were 42% in common carp, 17% in crucian carp (Carassius auratus), and 5% in silver carp (Hypophthalmichthys molitrix). These new EST-SSR markers would provide sufficient polymorphism for population genetic studies and genome mapping of the common carp and its closely related fishes. (c) 2007 Published by Elsevier B.V.
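For readers unfamiliar with the two heterozygosity measures reported above, the following Python sketch shows how they are commonly computed for a single locus from diploid genotypes. The abstract does not state which estimator the authors used; the expected heterozygosity here is the simple form 1 minus the sum of squared allele frequencies, without a small-sample correction. The function names and the example genotypes are hypothetical.

```python
from collections import Counter
from typing import Hashable, Sequence, Tuple

Genotype = Tuple[Hashable, Hashable]  # the two alleles of one individual


def observed_heterozygosity(genotypes: Sequence[Genotype]) -> float:
    """Fraction of individuals whose two alleles differ."""
    heterozygotes = sum(1 for a, b in genotypes if a != b)
    return heterozygotes / len(genotypes)


def expected_heterozygosity(genotypes: Sequence[Genotype]) -> float:
    """1 - sum(p_i^2) over allele frequencies p_i, i.e. the
    heterozygosity expected under Hardy-Weinberg proportions
    (uncorrected estimator; an assumption, see lead-in)."""
    counts = Counter(allele for g in genotypes for allele in g)
    total = sum(counts.values())
    return 1.0 - sum((c / total) ** 2 for c in counts.values())


# Made-up allele sizes for four individuals at one locus.
sample = [(100, 100), (100, 104), (104, 108), (108, 108)]
h_o = observed_heterozygosity(sample)
h_e = expected_heterozygosity(sample)
```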
Abstract:
Plecoptera constitute a numerically and ecologically significant component in mountain streams all over the world, but little is known of their life cycles in Asia. The life cycle of Nemoura sichuanensis and its relationship to water temperature was investigated during a 4-year study in a headwater stream (known as the Jiuchong torrent) of the Xiangxi River in Central China. Size structure histograms suggest that the life cycle was univoltine, and the relationships between the growth of Nemoura sichuanensis, physiological time, and effective accumulated water temperature were described using logistic regressions. The growth pattern was generally similar within year classes, but growth rates did vary between year classes. Our field data suggest a critical thermal threshold for emergence in Nemoura sichuanensis that was close to 9 °C. The total number of physiological days required for completing larval development was 250 days. The effective accumulated water temperature was 2500 degree-days in the field. Development during the life cycle increased somewhat linearly with the physiological time and the effective accumulated water temperature, but some non-linear relationships were best described by logistic equations.
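The two quantities this abstract relies on, accumulated degree-days and a logistic growth curve, can be sketched in Python as follows. The abstract does not give the base temperature used for accumulation or the fitted logistic parameters, so `base_temp`, `k_max`, `rate`, and `midpoint` are hypothetical inputs, and the degree-day rule (sum of daily mean temperature above a base) is the standard textbook form rather than necessarily the authors' procedure.

```python
import math
from typing import Sequence


def accumulated_degree_days(daily_mean_temps: Sequence[float],
                            base_temp: float) -> float:
    """Sum of (daily mean temperature - base) over days on which the
    mean exceeds the base; colder days contribute nothing."""
    return sum(max(0.0, t - base_temp) for t in daily_mean_temps)


def logistic_size(x: float, k_max: float, rate: float,
                  midpoint: float) -> float:
    """Logistic growth curve: body size as a function of x, where x is
    physiological time or accumulated degree-days.
    k_max is the asymptotic size; midpoint is where size = k_max / 2."""
    return k_max / (1.0 + math.exp(-rate * (x - midpoint)))
```

As a usage example, `accumulated_degree_days([5.0, 10.0, 12.5], 4.0)` sums 1.0, 6.0, and 8.5 degree-days over three days.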
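Abstract:
Data streams are a new type of application that has emerged in recent years, characterized by being continuous, unbounded, and high-speed. Typical data streams include monitoring data returned by sensors in wireless sensor network environments, stock price information from stock exchanges, monitoring data from network monitoring and road traffic monitoring systems, call records from telecommunications operators, and website log information. The emergence of data streams poses great challenges to traditional data management and mining techniques. Traditional data mining techniques typically make multiple passes over a static data set and have high time and space complexity, which makes them difficult to apply directly in a data stream setting. This thesis studies frequent itemset mining over data streams in depth; the main research content and contributions are summarized as follows. The thesis first gives a comprehensive survey of the frequent itemset mining problem. The survey introduces the concepts, taxonomy, and classical algorithms of frequent itemset mining over static data sets, and then analyzes the problems and challenges of frequent itemset mining over data streams and the state of research. For frequent element mining over data streams, the thesis proposes a simple and efficient algorithm that mines frequent elements over a sliding window of a data stream. The algorithm can periodically return frequent elements that meet an ε-approximation requirement, and can also respond to requests submitted by users at any time, returning results that satisfy the error requirement. For frequent itemset mining over data streams, the thesis proposes the BFI-Stream algorithm, which mines all frequent itemsets over a sliding window of a data stream and returns exact results in real time. The algorithm uses a prefix tree data structure and prunes some infrequent nodes during creation and update, so its space and time complexity are both low. Next, to address the problem that existing algorithms for mining frequent itemsets over data streams maintain too many infrequent itemsets and therefore use too much space, the thesis proposes an optimistic pruning method that greatly reduces space complexity. The thesis first analyzes the frequency distribution of itemsets on real data sets and then proposes the optimistic pruning method, which prunes most infrequent itemsets; experimental results show that optimistic pruning not only greatly reduces memory usage but also improves update efficiency. Furthermore, for approximate queries in which the user specifies a minimum support and an allowed error, the thesis proposes AFI-Stream, an approximate algorithm for mining frequent itemsets over a sliding window of a data stream that returns results satisfying the error requirement. AFI-Stream maintains only frequent itemsets, not infrequent ones, and can therefore greatly reduce memory usage. To support research on frequent itemset mining over data streams, the thesis designs and develops a prototype system for frequent itemset mining over data streams, StreamMiner, for evaluating and studying the related algorithms.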
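The sliding-window setting described in this abstract can be illustrated with a small exact baseline in Python. This is neither BFI-Stream nor AFI-Stream: it tracks only single items (frequent elements), not itemsets, keeps an exact count for every item in the window, and uses no prefix tree or pruning. The class name `SlidingWindowFrequentItems` and its parameters are hypothetical.

```python
from collections import Counter, deque
from typing import Deque, FrozenSet, Hashable, Iterable, Set


class SlidingWindowFrequentItems:
    """Exact frequent single items over the last `window_size` transactions."""

    def __init__(self, window_size: int, min_support: float) -> None:
        self.window_size = window_size
        self.min_support = min_support  # fraction of window transactions
        self.window: Deque[FrozenSet[Hashable]] = deque()
        self.counts: Counter = Counter()

    def add(self, transaction: Iterable[Hashable]) -> None:
        items = frozenset(transaction)  # each item counted once per transaction
        self.window.append(items)
        self.counts.update(items)
        if len(self.window) > self.window_size:
            expired = self.window.popleft()
            self.counts.subtract(expired)
            for item in expired:
                if self.counts[item] == 0:
                    del self.counts[item]  # keep the counter compact

    def frequent(self) -> Set[Hashable]:
        """Items whose support in the current window meets the threshold."""
        threshold = self.min_support * len(self.window)
        return {item for item, c in self.counts.items() if c >= threshold}
```

Memory here grows with the number of distinct items in the window, which is the cost that approximate and pruning-based methods aim to reduce.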
Abstract:
IEEE
Abstract:
National Key Basic Research and Development Program of China [2006CB701305]; State Key Laboratory of Resource and Environment Information System [088RA400SA]; Chinese Academy of Sciences
Abstract:
A fine-grid model (1/6°) covering the South China Sea (SCS), East China Sea and Japan/East Sea, which is embedded into a coarse-grid (3°) global model, was established to study the SCS circulation. In the present paper, we report the model-produced monthly and annual mean transport stream functions and sea surface heights (SSH) and their anomalies for the SCS. Comparison with the TOPEX/Poseidon data shows that the model-produced monthly sea surface height anomalies (SSHA) are in good agreement with altimeter measurements. Based on the results, the circulation of the SCS, especially the upper layer circulation, is discussed. In the surface layer, the western Philippine Sea water intrudes into the SCS through the Luzon Strait in autumn, winter and spring, but not in summer. However, as far as the whole water column is concerned, the water intrudes into the SCS through the Luzon Strait all year round. This indicates that in summer the water still intrudes into the SCS in the subsurface and intermediate layers. The area near the northern continental slope of the SCS is dominated by a cyclonic circulation all year round. The SCS Southern Anticyclonic Gyre, SE Vietnam Off-Shore Current in summertime and SCS Southern Cyclonic Gyre in wintertime are reproduced reasonably well. The difference between the monthly averaged SSH and SSHA is significant, indicating the importance of the mean SSH in the SCS circulation.