949 results for Data stream mining


Relevance:

80.00%

Abstract:

Skyline queries are an important research direction in data mining and play an important role in applications such as data-based decision support. Real-world applications produce large volumes of incomplete data streams, yet most existing skyline query algorithms rely on two assumptions: 1) all dimension values of every data point are known; 2) the data set is stable, bounded, and randomly accessible. Moreover, as dimensionality increases, the number of skyline points grows excessive, which motivated the notion of the k-dominant skyline; the k-dominance relation over incomplete data, however, is not transitive, so no existing skyline query algorithm applies. To address these problems, and taking into account that data streams are high-dimensional, unbounded, and ordered, with values possibly missing in some dimensions, we propose a new sliding-window-based k-dominant skyline query algorithm for incomplete data streams. Experimental results show that the algorithm supports k-dominant skyline computation over incomplete data streams while guaranteeing efficiency and performance.
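
To make the central notion concrete, here is a minimal Python sketch of a k-dominance test restricted to the dimensions observed in both tuples — one common way to define dominance under missing values, offered as an assumption rather than the paper's exact definition:

```python
from typing import Optional, Sequence

def k_dominates(p: Sequence[Optional[float]],
                q: Sequence[Optional[float]],
                k: int) -> bool:
    """p k-dominates q if, among the dimensions where both values are
    known, p is no worse than q in at least k of them and strictly
    better in at least one (smaller = better in this sketch)."""
    no_worse = strictly_better = 0
    for a, b in zip(p, q):
        if a is None or b is None:      # skip missing dimensions
            continue
        if a <= b:
            no_worse += 1
            if a < b:
                strictly_better += 1
    return no_worse >= k and strictly_better >= 1
```

Because the set of comparable dimensions changes from pair to pair, this relation can be cyclic (p k-dominates q, q k-dominates r, r k-dominates p), which is exactly the loss of transitivity the abstract points out.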

Relevance:

80.00%

Abstract:

With the development of information technology, many data stream applications have emerged, such as sensor data processing, network monitoring, and financial data analysis. In these applications, the data form a continuous, time-varying, ordered, and unbounded sequence, and most queries are continuous queries. This continuity of both data and queries places heavy demands on system resources. When system resources cannot satisfy the query processing requirements, i.e., the query processing load exceeds the system's maximum capacity, user queries cannot be processed promptly and correctly. Moreover, if the processing time of a query exceeds its deadline, the query result becomes meaningless and may even lead to disastrous consequences. Most existing research focuses on load shedding in general data stream systems; load shedding in real-time data stream systems that support real-time query processing has received little attention.

This thesis studies load-shedding methods for real-time data stream management systems that support real-time query processing, covering three aspects: random load shedding, semantic load shedding, and load shedding for shared sliding-window join operators. Finally, a real-time data stream management system test platform verifies that the proposed algorithms perform well in improving system throughput and reducing the deadline miss ratio.

To meet the needs of real-time data stream applications, a data stream processing framework suited to real-time queries, RT-DSPA, and a corresponding multi-layer overload handling strategy, MLOHS, are proposed, providing a framework for the study of load-shedding methods. RT-DSPA comprises functional modules at the user layer, the DSMS layer, and the data source layer, and is multi-layered, extensible, robust, and configurable.

For random load shedding, a load estimation algorithm based on stream arrival rates is proposed; on top of the real-time processing framework and the load estimation algorithm, a deadline-aware random load-shedding algorithm, RLS-EDA, is proposed. Because the system load often fluctuates widely, the algorithm exploits deadline characteristics and temporarily buffers dropped tuples to make full use of idle CPU resources, improving system throughput after shedding and thereby reducing the deadline miss ratio as far as possible. Queue maintenance during shedding, shedding locations in query networks containing shared operators, and the algorithm for inserting shedding operators into the query network are also discussed. Experimental results show that RLS-EDA performs well when the system load fluctuates strongly.

When the characteristics of the streams and queries are well understood, semantic load shedding achieves better shedding results. To make the semantics used in semantic load shedding explicit, the concepts of tuple value and value level are introduced, together with a method for resolving conflicts when partitioning value levels. Two priority tables suited to real-time data stream management systems are designed, a value-level/execution-cost table and a deadline/value-density table, which take multiple dimensions into account when assigning priorities. Based on these two tables, the corresponding semantic load-shedding algorithms SLS-PT-VD&EC and SLS-PT-D&TVD are proposed. Priority-table-based semantic load shedding can flexibly satisfy different user requirements while improving system performance during shedding.

Finally, for overloaded shared sliding-window join operators, a load-shedding algorithm for shared sliding-window joins based on buffering dropped tuples, LS-SJRT, is proposed by exploiting query deadlines; to reduce its shedding overhead, an improved algorithm based on adjusting sliding-window widths, LS-SJRT-CW, is proposed. Experimental results show that both algorithms perform well when shared join operators are overloaded.
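
As an illustration of the deadline-aware idea behind buffering dropped tuples, here is a hedged Python sketch (all names hypothetical; this is not the thesis's RLS-EDA algorithm, only the general mechanism it describes: shed under overload, then revisit parked tuples whose deadlines are still alive when the CPU goes idle):

```python
import collections
import time

class DeadlineAwareShedder:
    """Drop tuples under overload, but park them in a side buffer and
    revisit them during idle time as long as their deadlines have not
    expired. Illustrative sketch only."""

    def __init__(self, capacity_per_sec: float):
        self.capacity = capacity_per_sec   # max tuples/s the system handles
        self.parked = collections.deque()  # (deadline, tuple) pairs

    def on_arrival(self, tup, deadline: float, arrival_rate: float):
        if arrival_rate <= self.capacity:
            self.process(tup)
        else:
            self.parked.append((deadline, tup))  # shed, but keep for idle time

    def on_idle(self):
        """Called when the CPU has spare cycles: process one parked
        tuple whose deadline is still in the future."""
        now = time.monotonic()
        while self.parked:
            deadline, tup = self.parked.popleft()
            if deadline > now:          # still useful: process it late
                self.process(tup)
                return
        # tuples with expired deadlines are silently discarded

    def process(self, tup):
        ...  # hand off to the downstream query network
```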

Relevance:

80.00%

Abstract:

We give a one-pass, Õ(m^{1-2/k})-space algorithm for estimating the k-th frequency moment of a data stream, for any real k > 2. Together with known lower bounds, this resolves the main problem left open by Alon, Matias, and Szegedy (STOC '96). Our algorithm handles deletions as well as insertions of stream elements.
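
For context, the quantity being estimated is F_k = Σ_i f_i^k, where f_i is the number of occurrences of item i in the stream. The abstract's Õ(m^{1-2/k})-space algorithm is not reproduced here; as a baseline, the following Python sketch implements the classic AMS estimator from the STOC '96 paper it improves on (insertion-only, unbiased, but with higher space and variance than the result above):

```python
import random

def ams_estimate_fk(stream, k, trials=1000):
    """Classic AMS estimator for F_k = sum_i f_i^k on an
    insertion-only stream. Each trial picks a uniformly random
    position j, counts the occurrences r of that item from j onward,
    and returns m * (r^k - (r-1)^k), which is unbiased for F_k.
    Averaging over many trials reduces variance. The stream is
    buffered here for clarity; a true one-pass version would sample
    the position with reservoir sampling instead."""
    items = list(stream)
    m = len(items)
    estimates = []
    for _ in range(trials):
        j = random.randrange(m)                   # random position
        a = items[j]
        r = sum(1 for x in items[j:] if x == a)   # occurrences from j on
        estimates.append(m * (r**k - (r - 1)**k))
    return sum(estimates) / trials
```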

Relevance:

80.00%

Abstract:

We use an information-theoretic method developed by Neifeld and Lee [J. Opt. Soc. Am. A 25, C31 (2008)] to analyze the performance of a slow-light system. Slow light is realized in this system via stimulated Brillouin scattering (SBS) in a 2-km-long, room-temperature, highly nonlinear fiber pumped by a laser whose spectrum is tailored and broadened to 5 GHz. For a range of experimental parameters, we compute the information throughput (IT), which quantifies the fraction of information transferred from the source to the receiver, and the information delay (ID), which quantifies the delay of a data stream at which the information transfer is largest. We also measure the eye opening (EO) and signal-to-noise ratio (SNR) of the transmitted data stream and find that they scale in a similar fashion to the information-theoretic metrics. Our experimental findings are compared to a model of the slow-light system that accounts for all pertinent noise sources as well as data-pulse distortion due to the filtering effect of the SBS process; the agreement between our observations and the predictions of the model is very good. Furthermore, we compare measurements of the IT for an optimal flat-top gain profile and for a Gaussian-shaped gain profile. For a given pump-beam power, we find that the optimal profile gives a 36% larger ID and a somewhat higher IT than the Gaussian profile. Specifically, the optimal (Gaussian) profile produces a fractional slow-light ID of 0.94 (0.69) and an IT of 0.86 (0.86) at a pump-beam power of 450 mW and a data rate of 2.5 Gbps. Thus, the optimal profile better utilizes the available pump-beam power, which is often a valuable resource in a system design.
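
The abstract does not restate Neifeld and Lee's definitions; one plausible formalization consistent with the wording above (an assumption, not necessarily their exact expressions) is:

```latex
% X: transmitted bit sequence; Y(\tau): received decision variable
% sampled at relative delay \tau. Consistent with the prose,
\[
  \mathrm{ID} \;=\; \arg\max_{\tau}\; I\bigl(X;\,Y(\tau)\bigr),
  \qquad
  \mathrm{IT} \;=\; \frac{I\bigl(X;\,Y(\mathrm{ID})\bigr)}{H(X)},
\]
% i.e., ID is the delay at which the mutual information between source
% and received stream peaks, and IT is the fraction of the source
% entropy recovered at that delay.
```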

Relevance:

80.00%

Abstract:

An online artwork that streams webcam images live from the internet and re-mixes them into disjointed narrative sequences, thereby producing cinema as a 'found object' made entirely of live material streamed from the internet. 'Short Films about Flying' is an online film that explores how a cinematic work can be generated using live material from the internet. The work is driven by software that takes surveillance video from a live camera feed at Logan Airport, Boston, and combines it with randomly grabbed audio from the web and texts taken from websites, chat rooms, message boards, etc. The result is an endless open edition of unique cinematic works in real time. By combining the language of cinema with global real-time data technologies, this work is one of the first new-media artworks to re-imagine the internet in a different sensory form as a cinematic space. 'Short Films about Flying' was developed over the course of a year in collaboration with Jon Thomson (Slade) to explore how the concept of the found object can be re-conceptualised as the found data stream. It has informed other research by Craighead and Thomson, such as the web project http://www.templatecinema.com, and began an examination of the relationships between montage and live virtual data, an early example of which is 'Flat Earth', an animated work developed for Channel 4 in 2007 with the production company Animate. The piece has been cited in discussions of new-media art as a significant example of artworks using a database as their determining structure. It was acquired for the Arts Council Collection and has toured significant international venues continuously over the last four years. Citations include: 'Time and Technology' by Charlie Gere (2006); 'The Wrong Categories' by Kris Cohen (2006); 'Networked Art - Practices and Positions', edited by Tom Corby (Routledge, 2005); and Grayson Perry in The Times (9.8.06).

Relevance:

80.00%

Abstract:

The Internet of Things is an umbrella term for the development by which devices of many kinds are equipped with sensors and data chips connected to the internet. Growing volumes of data mean growing demand for solutions that can store, track, analyze, and process data. One way to meet this demand is to use cloud-based real-time analytics services. Multi-tenant and single-tenant are two architectures for such services that can be used to handle the increased data volumes, and they differ in development complexity. In this work, Azure Stream Analytics represents a multi-tenant architecture and HDInsight/Storm a single-tenant architecture. To compare cloud-based real-time analytics services with different architectures, we used the usability criteria efficiency, effectiveness, and user satisfaction, and set out to answer the following questions related to those three criteria: • What similarities and differences do we see in development times? • Can we identify differences in functionality? • How do developers experience the two analytics services? We used a design-and-creation strategy to develop two proof-of-concept prototypes and collected data with several data-collection methods. The prototypes comprised two artifacts, one for Azure Stream Analytics and one for HDInsight/Storm. We evaluated them by carrying out five scenarios, each with two to five subgoals. We simulated streaming data by letting an application continuously generate random data, which we analyzed with the two real-time analytics services. We used observations to document how we worked during development, to measure development times, and to identify differences in functionality, and questionnaires to find out what users thought of the services. We concluded that Azure Stream Analytics was initially more usable than HDInsight/Storm, but that the differences diminished over time. Azure Stream Analytics was easier to work with for simpler analyses, while HDInsight/Storm offered a broader choice of functionality.
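
The thesis simulates its input by generating random data continuously; a minimal Python sketch of such a driver (field names and rates are hypothetical, not taken from the thesis) could look like this:

```python
import itertools
import json
import random
import time

def simulated_sensor_stream(rate_hz: float = 10.0):
    """Emit an endless stream of random JSON sensor events, the kind
    of input the study describes feeding into Azure Stream Analytics
    and HDInsight/Storm. Illustrative sketch only."""
    for seq in itertools.count():
        yield json.dumps({
            "deviceId": f"sensor-{random.randrange(8)}",
            "seq": seq,
            "temperature": round(random.gauss(21.0, 2.5), 2),
            "ts": time.time(),
        })
        time.sleep(1.0 / rate_hz)   # pace the stream

# Example: print the first five simulated events.
for event in itertools.islice(simulated_sensor_stream(rate_hz=50), 5):
    print(event)
```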

Relevance:

80.00%

Abstract:

Clustering is a difficult problem, especially when we consider the task in the context of a data stream of categorical attributes. In this paper, we propose σ-SCLOPE, a novel algorithm based on SCLOPE's intuitive observation about cluster histograms. Unlike SCLOPE, however, our algorithm consumes less memory per window and has a better clustering runtime for the same data stream in a given window. This positions σ-SCLOPE as a more attractive option than SCLOPE whenever a minor loss of clustering accuracy is acceptable to the application.
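
The "cluster histogram" intuition inherited from CLOPE/SCLOPE is that a cluster of categorical records is summarized by a frequency histogram over attribute values, and a record fits a cluster if adding it keeps the histogram tall and narrow. A hedged Python sketch of that criterion (a simplification, not the σ-SCLOPE data structure itself):

```python
from collections import Counter

class ClusterHistogram:
    """CLOPE/SCLOPE-style cluster summary for categorical records:
    a frequency histogram over the attribute values seen so far."""

    def __init__(self):
        self.hist = Counter()   # value -> occurrence count
        self.size = 0           # total occurrences (histogram area)
        self.n = 0              # number of records in the cluster

    def add(self, record):
        self.hist.update(record)
        self.size += len(record)
        self.n += 1

    def gain(self, record, r=2.0):
        """CLOPE-style profit criterion: adding a record is good if it
        keeps size * n / width**r high, i.e., the histogram stays tall
        and narrow. The record joins the cluster with the best gain."""
        new_width = len(self.hist | Counter(record))
        new_size = self.size + len(record)
        old = self.size * self.n / (len(self.hist) ** r) if self.n else 0.0
        return new_size * (self.n + 1) / (new_width ** r) - old
```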

Relevance:

80.00%

Abstract:

Based on the knowledge-sharing model of Nonaka (1994), this study examines the relative efficacy of various Information and Communication Technology (ICT) applications in facilitating the sharing of explicit and tacit knowledge among professional accountants in Malaysia. The results indicate that ICTs generally facilitate all modes of knowledge sharing. Best-practice repositories are effective for sharing both explicit and tacit knowledge, while internet/e-mail facilities are effective for tacit knowledge sharing. Data warehousing/mining, on the other hand, is effective in facilitating self-learning through the tacit-to-tacit and explicit-to-explicit modes. ICT facilities used mainly for office administration are ineffective for knowledge-sharing purposes. The implications of the findings are discussed.

Relevance:

80.00%

Abstract:

This brief deals with the problem of minor component analysis (MCA). Artificial neural networks can be exploited to perform MCA. Recent research shows that the convergence of neural-network-based MCA algorithms can be guaranteed if the learning rates are below certain thresholds. However, computing these thresholds requires knowledge of the eigenvalues of the autocorrelation matrix of the data set, which is unavailable when extracting the minor component online from an input data stream. In this correspondence, we introduce an adaptive learning rate into the OJAn MCA algorithm, so that its convergence condition no longer depends on any unobtainable information and can easily be satisfied in practical applications.
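
To illustrate the setting (not the exact OJAn update or the adaptive rule of the brief; everything below is a sketch under stated assumptions), an Oja-type anti-Hebbian iteration with a learning rate scaled by a running estimate of the input power might look like:

```python
import numpy as np

def mca_online(stream, dim, eta0=0.1, eps=1e-8):
    """Schematic online minor component extraction: an anti-Hebbian
    Oja-type update whose learning rate shrinks as the estimated
    input power grows, plus explicit renormalization for stability.
    Illustrative sketch only."""
    rng = np.random.default_rng(0)
    w = rng.standard_normal(dim)
    w /= np.linalg.norm(w)
    energy = eps                      # running estimate of E[||x||^2]
    for t, x in enumerate(stream, 1):
        energy += (x @ x - energy) / t
        eta = eta0 / energy           # data-driven learning rate
        y = w @ x
        w = w - eta * (y * x - (y * y) * w)   # anti-Hebbian (minor) step
        w /= np.linalg.norm(w)        # renormalize
    return w
```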

Relevance:

80.00%

Abstract:

In this research, we propose a facial expression recognition system with a layered encoding cascade optimization model. Since generating an effective facial representation is a vital step toward successful facial emotion recognition, a modified Local Gabor Binary Pattern operator is first employed to derive a refined initial face representation, and we then propose two evolutionary algorithms for feature optimization under the layered cascade model: (i) direct similarity and (ii) Pareto-based feature selection. Direct similarity feature selection considers characteristics within the same emotion category that give the minimum within-class variation, while Pareto-based feature optimization focuses on features that best represent each expression category and at the same time provide the greatest distinction from other expressions. Both a neural network and an ensemble classifier with weighted majority vote are implemented for the recognition of seven expressions based on the selected optimized features. The ensemble model also automatically updates itself with the most recent concepts in the data. Evaluated on the Cohn-Kanade database, our system achieves its best accuracies when the ensemble classifier is applied, and outperforms other research reported in the literature with 96.8% for direct-similarity-based optimization and 97.4% for Pareto-based feature selection. Cross-database evaluation with frontal images from the MMI database has also been conducted to further prove system efficiency; there the system achieves 97.5% for the Pareto-based approach and 90.7% for direct-similarity-based feature selection, outperforming related research on MMI. When evaluated with 90° side-view images extracted from the videos of the MMI database, the system achieves superior performance, with accuracies above 80% for both optimization algorithms. Experiments with other weighting and meta-learning combination methods for the construction of ensembles are also explored, with our proposed ensemble showing great adaptivity to new test data streams in cross-database evaluation. In future work, we aim to incorporate other filtering techniques and evolutionary algorithms into the optimization models to further enhance recognition performance.
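
The core of Pareto-based selection is a dominance filter over per-feature scores on competing objectives. The following Python sketch shows such a filter in its generic form (an illustration of the principle, not the paper's layered-cascade optimizer; the two objectives named in the comments are assumptions drawn from the prose):

```python
import numpy as np

def pareto_front(scores):
    """Keep the features whose score pairs are not dominated by any
    other feature (larger = better on both objectives). 'scores' is an
    (n_features, 2) array of, e.g., between-class separation and
    negative within-class variation."""
    n = scores.shape[0]
    keep = np.ones(n, dtype=bool)
    for i in range(n):
        if not keep[i]:
            continue
        # feature i is dominated if some feature is >= on every
        # objective and strictly > on at least one
        dominated = np.all(scores >= scores[i], axis=1) & \
                    np.any(scores > scores[i], axis=1)
        if dominated.any():
            keep[i] = False
    return np.flatnonzero(keep)
```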

Relevance:

80.00%

Abstract:

Classification of an electrocardiogram (ECG) data stream is essential to the diagnosis of critical heart conditions. It is vital to accurately detect abnormality in the ECG in order to anticipate the possible onset of life-threatening cardiac symptoms. In this paper, we focus on identifying premature ventricular contraction (PVC), one of the most common heart rhythm abnormalities. We use a "Replacing" strategy to check the effect of each individual heartbeat on the variation of the principal directions. Based on this idea, an online PVC detection method is proposed to classify newly arriving PVC beats in a real-time, online manner. The proposed approach is tested on the MIT-BIH arrhythmia database (MIT-BIH-AR). The PVC detection accuracy was 98.77%, with a sensitivity and positive predictivity of 96.12% and 86.48%, respectively. These results improve on previously reported results for PVC detection. In addition, our proposed method is efficient in terms of computation time: its average execution time was 3.83 s for a 30-minute ECG recording, which shows the classifier's capability to detect abnormal PVCs in an online manner.
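
One way to read the "Replacing" strategy is: substitute the incoming beat into a window of reference beats and measure how far the first principal direction rotates. The sketch below (Python/NumPy) implements that reading; the window size, which beat gets replaced, and the decision threshold are all assumptions, not the paper's specification:

```python
import numpy as np

def principal_direction(beats):
    """First principal direction of an (n_beats, n_samples) matrix."""
    centered = beats - beats.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return vt[0]

def replacing_score(window, new_beat):
    """Replace one beat in a reference window with the incoming beat
    and return the rotation (in degrees) of the principal direction.
    Thresholding this angle flags PVC candidates. Illustrative only."""
    base = principal_direction(window)
    replaced = window.copy()
    replaced[-1] = new_beat                    # replace the newest beat
    moved = principal_direction(replaced)
    cos = abs(base @ moved) / (np.linalg.norm(base) * np.linalg.norm(moved))
    return np.degrees(np.arccos(np.clip(cos, -1.0, 1.0)))
```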

Relevance:

80.00%

Abstract:

The use of middleware technology in various types of systems, in order to abstract low-level details related to the distribution of application logic, is increasingly common. Among the systems that can benefit from such components, we highlight distributed systems, where communication between software components located on different physical machines must be supported. An important issue in the communication between distributed components is the provision of mechanisms for managing quality of service. This work presents a metamodel for modeling component-based middleware that provides an application with an abstraction of the communication between the components involved in a data stream, regardless of their location. The metamodel also supports self-adaptation of the communication mechanism, either by updating the values of its configuration parameters or by replacing it with another mechanism when the specified quality-of-service restrictions are not being met. To this end, the communication state is monitored (applying techniques such as a feedback control loop) and the related performance metrics are analyzed. The Model-Driven Development paradigm was used to generate the implementation of a middleware that serves as a proof of concept of the metamodel, together with the configuration and reconfiguration policies for the dynamic adaptation processes. In this context, the metamodel associated with the process of configuring a communication was defined, and the MDD approach also defines the following transformations: from the architectural model of the middleware to Java code, and from the configuration model to XML.
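
As an illustration of the feedback-control-loop adaptation the abstract describes, here is a hedged Python sketch (class and method names hypothetical): a wrapper monitors per-message latency, and when the specified bound is violated it first retunes a parameter of the active mechanism and otherwise swaps in an alternative mechanism:

```python
import statistics
import time

class QosAdaptiveChannel:
    """Feedback loop around a communication mechanism: measure send
    latency, and on QoS violation either reconfigure the mechanism or
    replace it with a fallback. Illustrative sketch only."""

    def __init__(self, mechanism, fallback, max_latency_ms):
        self.mechanism = mechanism        # active transport
        self.fallback = fallback          # alternative transport
        self.max_latency_ms = max_latency_ms
        self.samples = []

    def send(self, payload):
        start = time.monotonic()
        self.mechanism.send(payload)
        self.samples.append((time.monotonic() - start) * 1000.0)
        if len(self.samples) >= 50:       # one control-loop iteration
            self._control_step()

    def _control_step(self):
        p95 = statistics.quantiles(self.samples, n=20)[-1]  # 95th pct
        self.samples.clear()
        if p95 <= self.max_latency_ms:
            return                        # QoS restriction satisfied
        if hasattr(self.mechanism, "shrink_buffer"):
            self.mechanism.shrink_buffer()       # first: retune parameter
        else:
            # second: replace the mechanism entirely
            self.mechanism, self.fallback = self.fallback, self.mechanism
```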

Relevance:

80.00%

Abstract:

Graduate Program in Digital Television: Information and Knowledge - FAAC

Relevance:

80.00%

Abstract:

Synchronous telecommunication networks, distributed control systems, and integrated circuits depend for their operational accuracy on a reliable time-base signal extracted from the line data stream and available to every node. In this sense, a sub-network (inside the main network) dedicated to the distribution of the clock signals is crucially important. There are different solutions for the architecture of the time-distribution sub-network, and choosing among them depends on cost, precision, reliability, and operational security. In this work we present: (i) the possible time-distribution networks and their usual topologies and arrangements; (ii) how parameters of the network nodes affect the reachability and stability of the synchronous state of a network; (iii) optimization methods for synchronous networks that can provide low-cost architectures with operational precision, reliability, and security.
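
Point (ii) can be made concrete with a toy model: nodes that nudge their phase toward their neighbors' average reach the synchronous state only for suitable loop gains. The Python sketch below is didactic, not the paper's network model:

```python
import numpy as np

def synchronizes(adjacency, gain, steps=200, tol=1e-6):
    """Toy discrete-time model of mutually synchronized clock nodes:
    each node moves its phase toward the average of its neighbors,
    scaled by a loop gain. Shows how a single node parameter (the
    gain) decides whether the synchronous state is reached."""
    n = adjacency.shape[0]
    rng = np.random.default_rng(1)
    phase = rng.uniform(0, 1, n)
    deg = adjacency.sum(axis=1)
    for _ in range(steps):
        neighbor_avg = (adjacency @ phase) / deg
        phase += gain * (neighbor_avg - phase)
        if np.ptp(phase) < tol:
            return True                 # synchronous state reached
    return False

# Example: a 4-node ring locks for moderate gain, diverges for high gain.
ring = np.array([[0, 1, 0, 1], [1, 0, 1, 0],
                 [0, 1, 0, 1], [1, 0, 1, 0]], float)
print(synchronizes(ring, gain=0.5), synchronizes(ring, gain=2.5))
```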

Relevance:

80.00%

Abstract:

This thesis addresses the problem of scaling reinforcement learning to high-dimensional and complex tasks. Reinforcement learning here denotes a class of learning methods based on approximate dynamic programming, used particularly in artificial intelligence for the autonomous control of simulated agents or real hardware robots in dynamic and uncertain environments. From samples, regression is used to determine a function that solves an "optimality equation" (Bellman) and from which approximately optimal decisions can be derived. A major hurdle is the dimensionality of the state space, which is often high and therefore poorly suited to traditional grid-based approximation schemes. The goal of this thesis is to make reinforcement learning applicable to, in principle arbitrarily, high-dimensional problems through non-parametric function approximation (specifically, regularization networks). Regularization networks are a generalization of ordinary basis-function networks that parameterize the sought solution by the data, removing the need to choose nodes/basis functions explicitly and thereby circumventing the "curse of dimensionality" for high-dimensional inputs. At the same time, regularization networks are linear approximators, which are technically easy to handle and for which the existing convergence results for reinforcement learning remain valid (unlike, say, feed-forward neural networks). All these theoretical advantages face one very practical problem: the computational cost of regularization networks naturally scales as O(n^3), where n is the number of data points. This is especially problematic because in reinforcement learning the learning process is online — the samples are generated by an agent/robot while it interacts with the environment — so the solution must be updated immediately and with little computational effort. The contribution of this thesis therefore falls into two parts. In the first part, we formulate an efficient learning algorithm for regularization networks for solving general regression tasks, tailored specifically to the requirements of online learning. Our approach builds on recursive least squares but can incorporate not only new data points but also new basis functions into the existing model in constant time. This is made possible by the "subset of regressors" approximation, in which the kernel is approximated by a strongly reduced selection of training data, together with a greedy selection procedure that picks these basis elements directly from the data stream at run time. In the second part, we carry this algorithm over to approximate policy evaluation with least-squares-based temporal-difference learning and integrate this building block into an overall system for autonomously learning optimal behavior. Altogether, we develop a highly data-efficient method that is particularly suited to learning problems from robotics with continuous, high-dimensional state spaces and stochastic state transitions.
We do not rely on a model of the environment, operate largely independently of the dimension of the state space, achieve convergence with comparatively few agent-environment interactions, and, thanks to the efficient online algorithm, can also operate in time-critical real-time applications. We demonstrate the capability of our approach on two realistic and complex applications: the RoboCup keepaway problem and the control of a (simulated) octopus tentacle.
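
The online building block in the first part is recursive least squares over a set of basis functions. The following Python sketch shows the standard RLS core with per-sample O(b^2) updates; the thesis's distinctive elements — the subset-of-regressors kernel approximation and the greedy run-time basis selection — are deliberately omitted, so this is only the scaffold they extend:

```python
import numpy as np

class OnlineRLS:
    """Recursive least squares over a fixed feature map: update the
    solution per sample via the Sherman-Morrison identity instead of
    refitting from scratch. Illustrative sketch of the online
    regression core, not the thesis's full algorithm."""

    def __init__(self, n_features, reg=1.0):
        self.P = np.eye(n_features) / reg   # inverse regularized Gram matrix
        self.w = np.zeros(n_features)       # current weight vector

    def update(self, phi, y):
        """Incorporate one sample (features phi, target y) in O(b^2)."""
        Pphi = self.P @ phi
        gain = Pphi / (1.0 + phi @ Pphi)     # Kalman-style gain vector
        self.w += gain * (y - phi @ self.w)  # correct toward the residual
        self.P -= np.outer(gain, Pphi)       # Sherman-Morrison downdate

    def predict(self, phi):
        return phi @ self.w
```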