3 resultados para Real-time data processing.
em Aston University Research Archive
Resumo:
Recent advances in technology have produced a significant increase in the availability of free sensor data over the Internet. With affordable weather monitoring stations now available to individual meteorology enthusiasts a reservoir of real time data such as temperature, rainfall and wind speed can now be obtained for most of the United States and Europe. Despite the abundance of available data, obtaining useable information about the weather in your local neighbourhood requires complex processing that poses several challenges. This paper discusses a collection of technologies and applications that harvest, refine and process this data, culminating in information that has been tailored toward the user. In this case we are particularly interested in allowing a user to make direct queries about the weather at any location, even when this is not directly instrumented, using interpolation methods. We also consider how the uncertainty that the interpolation introduces can then be communicated to the user of the system, using UncertML, a developing standard for uncertainty representation.
Resumo:
Recent advances in technology have produced a significant increase in the availability of free sensor data over the Internet. With affordable weather monitoring stations now available to individual meteorology enthusiasts a reservoir of real time data such as temperature, rainfall and wind speed can now be obtained for most of the United States and Europe. Despite the abundance of available data, obtaining useable information about the weather in your local neighbourhood requires complex processing that poses several challenges. This paper discusses a collection of technologies and applications that harvest, refine and process this data, culminating in information that has been tailored toward the user. In this case we are particularly interested in allowing a user to make direct queries about the weather at any location, even when this is not directly instrumented, using interpolation methods. We also consider how the uncertainty that the interpolation introduces can then be communicated to the user of the system, using UncertML, a developing standard for uncertainty representation.
Resumo:
This thesis makes a contribution to the Change Data Capture (CDC) field by providing an empirical evaluation on the performance of CDC architectures in the context of realtime data warehousing. CDC is a mechanism for providing data warehouse architectures with fresh data from Online Transaction Processing (OLTP) databases. There are two types of CDC architectures, pull architectures and push architectures. There is exiguous data on the performance of CDC architectures in a real-time environment. Performance data is required to determine the real-time viability of the two architectures. We propose that push CDC architectures are optimal for real-time CDC. However, push CDC architectures are seldom implemented because they are highly intrusive towards existing systems and arduous to maintain. As part of our contribution, we pragmatically develop a service based push CDC solution, which addresses the issues of intrusiveness and maintainability. Our solution uses Data Access Services (DAS) to decouple CDC logic from the applications. A requirement for the DAS is to place minimal overhead on a transaction in an OLTP environment. We synthesize DAS literature and pragmatically develop DAS that eciently execute transactions in an OLTP environment. Essentially we develop effeicient RESTful DAS, which expose Transactions As A Resource (TAAR). We evaluate the TAAR solution and three pull CDC mechanisms in a real-time environment, using the industry recognised TPC-C benchmark. The optimal CDC mechanism in a real-time environment, will capture change data with minimal latency and will have a negligible affect on the database's transactional throughput. Capture latency is the time it takes a CDC mechanism to capture a data change that has been applied to an OLTP database. A standard definition for capture latency and how to measure it does not exist in the field. We create this definition and extend the TPC-C benchmark to make the capture latency measurement. The results from our evaluation show that pull CDC is capable of real-time CDC at low levels of user concurrency. However, as the level of user concurrency scales upwards, pull CDC has a significant impact on the database's transaction rate, which affirms the theory that pull CDC architectures are not viable in a real-time architecture. TAAR CDC on the other hand is capable of real-time CDC, and places a minimal overhead on the transaction rate, although this performance is at the expense of CPU resources.