937 resultados para Distributed data


Relevância:

80.00% 80.00%

Publicador:

Resumo:

This research presents several components encompassing the scope of the objective of Data Partitioning and Replication Management in Distributed GIS Database. Modern Geographic Information Systems (GIS) databases are often large and complicated. Therefore data partitioning and replication management problems need to be addresses in development of an efficient and scalable solution. ^ Part of the research is to study the patterns of geographical raster data processing and to propose the algorithms to improve availability of such data. These algorithms and approaches are targeting granularity of geographic data objects as well as data partitioning in geographic databases to achieve high data availability and Quality of Service(QoS) considering distributed data delivery and processing. To achieve this goal a dynamic, real-time approach for mosaicking digital images of different temporal and spatial characteristics into tiles is proposed. This dynamic approach reuses digital images upon demand and generates mosaicked tiles only for the required region according to user's requirements such as resolution, temporal range, and target bands to reduce redundancy in storage and to utilize available computing and storage resources more efficiently. ^ Another part of the research pursued methods for efficient acquiring of GIS data from external heterogeneous databases and Web services as well as end-user GIS data delivery enhancements, automation and 3D virtual reality presentation. ^ There are vast numbers of computing, network, and storage resources idling or not fully utilized available on the Internet. Proposed "Crawling Distributed Operating System "(CDOS) approach employs such resources and creates benefits for the hosts that lend their CPU, network, and storage resources to be used in GIS database context. ^ The results of this dissertation demonstrate effective ways to develop a highly scalable GIS database. The approach developed in this dissertation has resulted in creation of TerraFly GIS database that is used by US government, researchers, and general public to facilitate Web access to remotely-sensed imagery and GIS vector information. ^

Relevância:

70.00% 70.00%

Publicador:

Resumo:

Large scale distributed data stores rely on optimistic replication to scale and remain highly available in the face of net work partitions. Managing data without coordination results in eventually consistent data stores that allow for concurrent data updates. These systems often use anti-entropy mechanisms (like Merkle Trees) to detect and repair divergent data versions across nodes. However, in practice hash-based data structures are too expensive for large amounts of data and create too many false conflicts. Another aspect of eventual consistency is detecting write conflicts. Logical clocks are often used to track data causality, necessary to detect causally concurrent writes on the same key. However, there is a nonnegligible metadata overhead per key, which also keeps growing with time, proportional with the node churn rate. Another challenge is deleting keys while respecting causality: while the values can be deleted, perkey metadata cannot be permanently removed without coordination. Weintroduceanewcausalitymanagementframeworkforeventuallyconsistentdatastores,thatleveragesnodelogicalclocks(BitmappedVersion Vectors) and a new key logical clock (Dotted Causal Container) to provides advantages on multiple fronts: 1) a new efficient and lightweight anti-entropy mechanism; 2) greatly reduced per-key causality metadata size; 3) accurate key deletes without permanent metadata.

Relevância:

70.00% 70.00%

Publicador:

Resumo:

In recent years there has been an explosive growth in the development of adaptive and data driven methods. One of the efficient and data-driven approaches is based on statistical learning theory (Vapnik 1998). The theory is based on Structural Risk Minimisation (SRM) principle and has a solid statistical background. When applying SRM we are trying not only to reduce training error ? to fit the available data with a model, but also to reduce the complexity of the model and to reduce generalisation error. Many nonlinear learning procedures recently developed in neural networks and statistics can be understood and interpreted in terms of the structural risk minimisation inductive principle. A recent methodology based on SRM is called Support Vector Machines (SVM). At present SLT is still under intensive development and SVM find new areas of application (www.kernel-machines.org). SVM develop robust and non linear data models with excellent generalisation abilities that is very important both for monitoring and forecasting. SVM are extremely good when input space is high dimensional and training data set i not big enough to develop corresponding nonlinear model. Moreover, SVM use only support vectors to derive decision boundaries. It opens a way to sampling optimization, estimation of noise in data, quantification of data redundancy etc. Presentation of SVM for spatially distributed data is given in (Kanevski and Maignan 2004).

Relevância:

70.00% 70.00%

Publicador:

Resumo:

In real world applications sequential algorithms of data mining and data exploration are often unsuitable for datasets with enormous size, high-dimensionality and complex data structure. Grid computing promises unprecedented opportunities for unlimited computing and storage resources. In this context there is the necessity to develop high performance distributed data mining algorithms. However, the computational complexity of the problem and the large amount of data to be explored often make the design of large scale applications particularly challenging. In this paper we present the first distributed formulation of a frequent subgraph mining algorithm for discriminative fragments of molecular compounds. Two distributed approaches have been developed and compared on the well known National Cancer Institute’s HIV-screening dataset. We present experimental results on a small-scale computing environment.

Relevância:

70.00% 70.00%

Publicador:

Resumo:

Existing research on synchronous remote working in CSCW has highlighted the troubles that can arise because actions at one site are (partially) unavailable to remote colleagues. Such ‘local action’ is routinely characterised as a nuisance, a distraction, subordinate and the like. This paper explores interconnections between ‘local action’ and ‘distributed work’ in the case of a research team virtually collocated through ‘MiMeG’. MiMeG is an e-Social Science tool that facilitates ‘distributed data sessions’ in which social scientists are able to remotely collaborate on the real-time analysis of video data. The data are visible and controllable in a shared workspace and participants are additionally connected via audio conferencing. The findings reveal that whilst the (partial) unavailability of local action is at times problematic, it is also used as a resource for coordinating work. The paper considers how local action is interactionally managed in distributed data sessions and concludes by outlining implications of the analysis for the design and study of technologies to support group-to-group collaboration.

Relevância:

70.00% 70.00%

Publicador:

Resumo:

The P-found protein folding and unfolding simulation repository is designed to allow scientists to perform analyses across large, distributed simulation data sets. There are two storage components in P-found: a primary repository of simulation data and a data warehouse. Here we demonstrate how grid technologies can support multiple, distributed P-found installations. In particular we look at two aspects, first how grid data management technologies can be used to access the distributed data warehouses; and secondly, how the grid can be used to transfer analysis programs to the primary repositories --- this is an important and challenging aspect of P-found because the data volumes involved are too large to be centralised. The grid technologies we are developing with the P-found system will allow new large data sets of protein folding simulations to be accessed and analysed in novel ways, with significant potential for enabling new scientific discoveries.

Relevância:

70.00% 70.00%

Publicador:

Resumo:

In this article, we review the state-of-the-art techniques in mining data streams for mobile and ubiquitous environments. We start the review with a concise background of data stream processing, presenting the building blocks for mining data streams. In a wide range of applications, data streams are required to be processed on small ubiquitous devices like smartphones and sensor devices. Mobile and ubiquitous data mining target these applications with tailored techniques and approaches addressing scarcity of resources and mobility issues. Two categories can be identified for mobile and ubiquitous mining of streaming data: single-node and distributed. This survey will cover both categories. Mining mobile and ubiquitous data require algorithms with the ability to monitor and adapt the working conditions to the available computational resources. We identify the key characteristics of these algorithms and present illustrative applications. Distributed data stream mining in the mobile environment is then discussed, presenting the Pocket Data Mining framework. Mobility of users stimulates the adoption of context-awareness in this area of research. Context-awareness and collaboration are discussed in the Collaborative Data Stream Mining, where agents share knowledge to learn adaptive accurate models.

Relevância:

70.00% 70.00%

Publicador:

Resumo:

Owing to continuous advances in the computational power of handheld devices like smartphones and tablet computers, it has become possible to perform Big Data operations including modern data mining processes onboard these small devices. A decade of research has proved the feasibility of what has been termed as Mobile Data Mining, with a focus on one mobile device running data mining processes. However, it is not before 2010 until the authors of this book initiated the Pocket Data Mining (PDM) project exploiting the seamless communication among handheld devices performing data analysis tasks that were infeasible until recently. PDM is the process of collaboratively extracting knowledge from distributed data streams in a mobile computing environment. This book provides the reader with an in-depth treatment on this emerging area of research. Details of techniques used and thorough experimental studies are given. More importantly and exclusive to this book, the authors provide detailed practical guide on the deployment of PDM in the mobile environment. An important extension to the basic implementation of PDM dealing with concept drift is also reported. In the era of Big Data, potential applications of paramount importance offered by PDM in a variety of domains including security, business and telemedicine are discussed.

Relevância:

70.00% 70.00%

Publicador:

Resumo:

Despite the abundant availability,of protocols and application for peer-to-peer file sharing, several drawbacks are still present in the field. Among most notable drawbacks is the lack of a simple and interoperable way to share information among independent peer-to-peer networks. Another drawback is the requirement that the shared content can be accessed only by a limited number of compatible applications, making impossible their access to others applications and system. In this work we present a new approach for peer-to-peer data indexing, focused on organization and retrieval of metadata which describes the shared content. This approach results in a common and interoperable infrastructure, which provides a transparent access to data shared on multiple data sharing networks via a simple API. The proposed approach is evaluated using a case study, implemented as a cross-platform extension to Mozilla Fir fox browser; and demonstrates the advantages of such interoperability over conventional distributed data access strategies.

Relevância:

70.00% 70.00%

Publicador:

Resumo:

Despite the abundant availability of protocols and application for peer-to-peer file sharing, several drawbacks are still present in the field. Among most notable drawbacks is the lack of a simple and interoperable way to share information among independent peer-to-peer networks. Another drawback is the requirement that the shared content can be accessed only by a limited number of compatible applications, making impossible their access to others applications and system. In this work we present a new approach for peer-to-peer data indexing, focused on organization and retrieval of metadata which describes the shared content. This approach results in a common and interoperable infrastructure, which provides a transparent access to data shared on multiple data sharing networks via a simple API. The proposed approach is evaluated using a case study, implemented as a cross-platform extension to Mozilla Firefox browser, and demonstrates the advantages of such interoperability over conventional distributed data access strategies. © 2009 IEEE.

Relevância:

70.00% 70.00%

Publicador:

Resumo:

The miniaturization race in the hardware industry aiming at continuous increasing of transistor density on a die does not bring respective application performance improvements any more. One of the most promising alternatives is to exploit a heterogeneous nature of common applications in hardware. Supported by reconfigurable computation, which has already proved its efficiency in accelerating data intensive applications, this concept promises a breakthrough in contemporary technology development. Memory organization in such heterogeneous reconfigurable architectures becomes very critical. Two primary aspects introduce a sophisticated trade-off. On the one hand, a memory subsystem should provide well organized distributed data structure and guarantee the required data bandwidth. On the other hand, it should hide the heterogeneous hardware structure from the end-user, in order to support feasible high-level programmability of the system. This thesis work explores the heterogeneous reconfigurable hardware architectures and presents possible solutions to cope the problem of memory organization and data structure. By the example of the MORPHEUS heterogeneous platform, the discussion follows the complete design cycle, starting from decision making and justification, until hardware realization. Particular emphasis is made on the methods to support high system performance, meet application requirements, and provide a user-friendly programmer interface. As a result, the research introduces a complete heterogeneous platform enhanced with a hierarchical memory organization, which copes with its task by means of separating computation from communication, providing reconfigurable engines with computation and configuration data, and unification of heterogeneous computational devices using local storage buffers. It is distinguished from the related solutions by distributed data-flow organization, specifically engineered mechanisms to operate with data on local domains, particular communication infrastructure based on Network-on-Chip, and thorough methods to prevent computation and communication stalls. In addition, a novel advanced technique to accelerate memory access was developed and implemented.

Relevância:

70.00% 70.00%

Publicador:

Resumo:

In this position paper, we claim that the need for time consuming data preparation and result interpretation tasks in knowledge discovery, as well as for costly expert consultation and consensus building activities required for ontology building can be reduced through exploiting the interplay of data mining and ontology engineering. The aim is to obtain in a semi-automatic way new knowledge from distributed data sources that can be used for inference and reasoning, as well as to guide the extraction of further knowledge from these data sources. The proposed approach is based on the creation of a novel knowledge discovery method relying on the combination, through an iterative ?feedbackloop?, of (a) data mining techniques to make emerge implicit models from data and (b) pattern-based ontology engineering to capture these models in reusable, conceptual and inferable artefacts.

Relevância:

70.00% 70.00%

Publicador:

Resumo:

Many data streaming applications produces massive amounts of data that must be processed in a distributed fashion due to the resource limitation of a single machine. We propose a distributed data stream clustering protocol. Theoretical analysis shows preliminary results about the quality of discovered clustering. In addition, we present results about the ability to reduce the time complexity respect to the centralized approach.

Relevância:

70.00% 70.00%

Publicador:

Resumo:

EPICS (Experimental Physics and Industrial Control System) lies in a set of software tools and applications which provide a software infrastructure for building distributed data acquisition and control systems. Currently there is an increase in use of such systems in large Physics experiments like ITER, ESS, and FREIA. In these experiments, advanced data acquisition systems using FPGA-based technology like FlexRIO are more frequently been used. The particular case of ITER (International Thermonuclear Experimental Reactor), the instrumentation and control system is supported by CCS (CODAC Core System), based on RHEL (Red Hat Enterprise Linux) operating system, and by the plant design specifications in which every CCS element is defined either hardware, firmware or software. In this degree final project the methodology proposed in Implementation of Intelligent Data Acquisition Systems for Fusion Experiments using EPICS and FlexRIO Technology Sanz et al. [1] is used. The final objective is to provide a document describing the fulfilled process and the source code of the data acquisition system accomplished. The use of the proposed methodology leads to have two diferent stages. The first one consists of the hardware modelling with graphic design tools like LabVIEWFPGA which later will be implemented in the FlexRIO device. In the next stage the design cycle is completed creating an EPICS controller that manages the device using a generic device support layer named NDS (Nominal Device Support). This layer integrates the data acquisition system developed into CCS (Control, data access and communication Core System) as an EPICS interface to the system. The use of FlexRIO technology drives the use of LabVIEW and LabVIEW FPGA respectively. RESUMEN. EPICS (Experimental Physics and Industrial Control System) es un conjunto de herramientas software utilizadas para el desarrollo e implementación de sistemas de adquisición de datos y control distribuidos. Cada vez es más utilizado para entornos de experimentación física a gran escala como ITER, ESS y FREIA entre otros. En estos experimentos se están empezando a utilizar sistemas de adquisición de datos avanzados que usan tecnología basada en FPGA como FlexRIO. En el caso particular de ITER, el sistema de instrumentación y control adoptado se basa en el uso de la herramienta CCS (CODAC Core System) basado en el sistema operativo RHEL (Red Hat) y en las especificaciones del diseño del sistema de planta, en la cual define todos los elementos integrantes del CCS, tanto software como firmware y hardware. En este proyecto utiliza la metodología propuesta para la implementación de sistemas de adquisición de datos inteligente basada en EPICS y FlexRIO. Se desea generar una serie de ejemplos que cubran dicho ciclo de diseño completo y que serían propuestos como casos de uso de dichas tecnologías. Se proporcionará un documento en el que se describa el trabajo realizado así como el código fuente del sistema de adquisición. La metodología adoptada consta de dos etapas diferenciadas. En la primera de ellas se modela el hardware y se sintetiza en el dispositivo FlexRIO utilizando LabVIEW FPGA. Posteriormente se completa el ciclo de diseño creando un controlador EPICS que maneja cada dispositivo creado utilizando una capa software genérica de manejo de dispositivos que se denomina NDS (Nominal Device Support). Esta capa integra la solución en CCS realizando la interfaz con la capa EPICS del sistema. El uso de la tecnología FlexRIO conlleva el uso del lenguaje de programación y descripción hardware LabVIEW y LabVIEW FPGA respectivamente.

Relevância:

70.00% 70.00%

Publicador:

Resumo:

The Fibre Distributed Data Interface (FDDI) represents the new generation of local area networks (LANs). These high speed LANs are capable of supporting up to 500 users over a 100 km distance. User traffic is expected to be as diverse as file transfers, packet voice and video. As the proliferation of FDDI LANs continues, the need to interconnect these LANs arises. FDDI LAN interconnection can be achieved in a variety of different ways. Some of the most commonly used today are public data networks, dial up lines and private circuits. For applications that can potentially generate large quantities of traffic, such as an FDDI LAN, it is cost effective to use a private circuit leased from the public carrier. In order to send traffic from one LAN to another across the leased line, a routing algorithm is required. Much research has been done on the Bellman-Ford algorithm and many implementations of it exist in computer networks. However, due to its instability and problems with routing table loops it is an unsatisfactory algorithm for interconnected FDDI LANs. A new algorithm, termed ISIS which is being standardized by the ISO provides a far better solution. ISIS will be implemented in many manufacturers routing devices. In order to make the work as practical as possible, this algorithm will be used as the basis for all the new algorithms presented. The ISIS algorithm can be improved by exploiting information that is dropped by that algorithm during the calculation process. A new algorithm, called Down Stream Path Splits (DSPS), uses this information and requires only minor modification to some of the ISIS routing procedures. DSPS provides a higher network performance, with very little additional processing and storage requirements. A second algorithm, also based on the ISIS algorithm, generates a massive increase in network performance. This is achieved by selecting alternative paths through the network in times of heavy congestion. This algorithm may select the alternative path at either the originating node, or any node along the path. It requires more processing and memory storage than DSPS, but generates a higher network power. The final algorithm combines the DSPS algorithm with the alternative path algorithm. This is the most flexible and powerful of the algorithms developed. However, it is somewhat complex and requires a fairly large storage area at each node. The performance of the new routing algorithms is tested in a comprehensive model of interconnected LANs. This model incorporates the transport through physical layers and generates random topologies for routing algorithm performance comparisons. Using this model it is possible to determine which algorithm provides the best performance without introducing significant complexity and storage requirements.