987 results for data distribution


Relevance:

100.00% 100.00%

Publisher:

Abstract:

As network capacity has increased over the past decade, individuals and organisations have found it increasingly appealing to make use of remote services in the form of service-oriented architectures and cloud computing services. Data processed by remote services, however, is no longer under the direct control of the individual or organisation that provided the data, leaving data owners at risk of data theft or misuse. This paper describes a model by which data owners can control the distribution and use of their data throughout a dynamic coalition of service providers using digital rights management technology. Our model allows a data owner to establish the trustworthiness of every member of a coalition employed to process data, and to communicate a machine-enforceable usage policy to every such member.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

Distributed systems are widely used for solving large-scale and data-intensive computing problems, including all-to-all comparison (ATAC) problems. However, when used for ATAC problems, existing computational frameworks such as Hadoop focus on load balancing for allocating comparison tasks, without careful consideration of data distribution and storage usage. While Hadoop-based solutions provide users with simplicity of implementation, their inherent MapReduce computing pattern does not match the ATAC pattern. This leads to load imbalances and poor data locality when Hadoop's data distribution strategy is used for ATAC problems. Here we present a data distribution strategy which considers data locality, load balancing and storage savings for ATAC computing problems in homogeneous distributed systems. A simulated annealing algorithm is developed for data distribution and task scheduling. Experimental results show a significant performance improvement for our approach over Hadoop-based solutions.
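
As a rough illustration of how simulated annealing can drive data distribution for all-to-all comparison, the sketch below assigns every comparison task (a pair of data items) to a node and searches for an assignment with low storage use and balanced load. It is not the algorithm from the abstract: the cost function, the `balance_weight` parameter, the cooling schedule and all function names are assumptions made for this example.

```python
import itertools
import math
import random


def cost(assign, n_nodes, balance_weight=1.0):
    """Hypothetical cost: largest per-node storage plus a load-imbalance penalty."""
    items = [set() for _ in range(n_nodes)]
    tasks = [0] * n_nodes
    for (i, j), node in assign.items():
        # A node must store both items of every pair it compares (data locality).
        items[node].update((i, j))
        tasks[node] += 1
    storage = max(len(s) for s in items)
    imbalance = max(tasks) - min(tasks)
    return storage + balance_weight * imbalance


def anneal(n_items, n_nodes, steps=5000, t0=2.0, cooling=0.999, seed=0):
    """Search for a low-cost assignment of comparison pairs to nodes."""
    rng = random.Random(seed)
    pairs = list(itertools.combinations(range(n_items), 2))
    assign = {p: rng.randrange(n_nodes) for p in pairs}
    current = cost(assign, n_nodes)
    best, best_cost = dict(assign), current
    t = t0
    for _ in range(steps):
        # Move: reassign one randomly chosen comparison task to a random node.
        p = rng.choice(pairs)
        old = assign[p]
        assign[p] = rng.randrange(n_nodes)
        new = cost(assign, n_nodes)
        # Always accept improvements; accept worse moves with Boltzmann probability.
        if new <= current or rng.random() < math.exp((current - new) / t):
            current = new
            if current < best_cost:
                best, best_cost = dict(assign), current
        else:
            assign[p] = old  # reject the move
        t *= cooling
    return best, best_cost
```

Calling `anneal(6, 3)` returns a mapping from each of the 15 item pairs to one of three nodes, together with its cost. Recomputing the full cost at every step keeps the sketch short; a practical version would update it incrementally.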

Relevance:

100.00% 100.00%

Publisher:

Abstract:

This research studied distributed computing of all-to-all comparison problems with big data sets. The thesis formalised the problem and developed a high-performance, scalable computing framework with a programming model, data distribution strategies and task scheduling policies to solve it. The study considered storage usage, data locality and load balancing to improve performance. The research outcomes can be applied in bioinformatics, biometrics, data mining and other domains in which all-to-all comparisons are a typical computing pattern.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

Global communication requirements and load imbalance of some parallel data mining algorithms are the major obstacles to exploiting the computational power of large-scale systems. This work investigates how non-uniform data distributions can be exploited to remove the global communication requirement and to reduce the communication cost in parallel data mining algorithms and, in particular, in the k-means algorithm for cluster analysis. In the straightforward parallel formulation of the k-means algorithm, data and computation loads are uniformly distributed over the processing nodes. This approach has excellent load balancing characteristics that may suggest it could scale up to large and extreme-scale parallel computing systems. However, at each iteration step the algorithm requires a global reduction operation which hinders the scalability of the approach. This work studies a different parallel formulation of the algorithm where the requirement of global communication is removed, while maintaining the same deterministic nature of the centralised algorithm. The proposed approach exploits a non-uniform data distribution which can be either found in real-world distributed applications or can be induced by means of multi-dimensional binary search trees. The approach can also be extended to accommodate an approximation error which allows a further reduction of the communication costs. The effectiveness of the exact and approximate methods has been tested in a parallel computing system with 64 processors and in simulations with 1024 processing element
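
To make the global reduction step concrete, the following sketch shows one iteration of the straightforward parallel k-means formulation described above, not the communication-free variant the work proposes. Each partition stands in for the data held by one node, and `global_reduce` is a single-process stand-in for an all-reduce; the function names and the use of plain Python lists are assumptions made for this example.

```python
def local_stats(points, centroids):
    """Per-node step: assign local points to the nearest centroid and
    accumulate partial coordinate sums and counts per cluster."""
    dim = len(centroids[0])
    sums = [[0.0] * dim for _ in centroids]
    counts = [0] * len(centroids)
    for p in points:
        k = min(
            range(len(centroids)),
            key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])),
        )
        counts[k] += 1
        for d in range(dim):
            sums[k][d] += p[d]
    return sums, counts


def global_reduce(partials):
    """Stand-in for the global reduction: element-wise sum over all nodes."""
    sums_list, counts_list = zip(*partials)
    total_counts = [sum(c) for c in zip(*counts_list)]
    total_sums = [[sum(vals) for vals in zip(*rows)] for rows in zip(*sums_list)]
    return total_sums, total_counts


def kmeans_step(partitions, centroids):
    """One iteration: local statistics on every node, then one global reduction."""
    partials = [local_stats(part, centroids) for part in partitions]
    sums, counts = global_reduce(partials)
    # Empty clusters keep their previous centroid.
    return [
        tuple(s / n for s in row) if n else tuple(old)
        for row, n, old in zip(sums, counts, centroids)
    ]
```

Every node needs the reduced sums and counts before the next iteration can start, which is the synchronisation point the abstract identifies as the scalability bottleneck.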

Relevance:

100.00% 100.00%

Publisher:

Abstract:

Data Distribution Management (DDM) is a core part of the High Level Architecture (HLA) standard; its goal is to optimize the resources used by simulation environments to exchange data. It has to filter and match the information generated during a simulation so that each federate, that is, each simulation entity, receives only the information it needs. This must be done quickly and accurately to improve performance and avoid transmitting irrelevant data; otherwise network resources may saturate quickly. The main topic of this thesis is the implementation of a super partes DDM testbed that evaluates the quality of DDM approaches of all kinds. It supports both region-based and grid-based approaches, and it may also support other methods not yet known. It ranks approaches by three factors: execution time, memory, and distance from the optimal solution. A prearranged set of instances is already available, and instances can also be created with user-provided parameters. The thesis is structured as follows. We start by introducing what DDM and HLA are and what they do in detail. Then, in the first chapter, we describe the state of the art, providing an overview of the best-known resolution approaches and the pseudocode of the most interesting ones. The third chapter describes how the testbed we implemented is structured. In the fourth chapter we present and compare the results obtained from running the four approaches we implemented. The result of the work described in this thesis can be downloaded from SourceForge at the following link: https://sourceforge.net/projects/ddmtestbed/. It is licensed under the GNU General Public License version 3.0 (GPLv3).

Relevance:

100.00% 100.00%

Publisher:

Abstract:

Data Distribution Management (DDM) is a component of the High Level Architecture standard. Its task is to detect overlaps between update and subscription extents efficiently. This thesis discusses the need for a framework and the reasons it was implemented. Testing algorithms for a fair comparison, libraries that ease the implementation of algorithms, and automation of the compilation phase were key motivations for building the framework. The main motivation was that, in surveying scientific articles on DDM and its various algorithms, we noticed that each article created its own ad hoc data for testing. The framework therefore also aims to make it possible to compare algorithms on a consistent data set. We decided to test the framework on the cloud to obtain a more reliable comparison between runs by different users. Two of the most widely used services were considered: Amazon AWS EC2 and Google App Engine. The advantages and disadvantages of each are presented, along with the reason Google App Engine was chosen. Four algorithms were developed: Brute Force, Binary Partition, Improved Sort, and Interval Tree Matching. Tests measured execution time and peak memory usage. The results show that Interval Tree Matching and Improved Sort are the most efficient. All tests were run on the sequential versions of the algorithms, so there may be room for a reduction in execution time for the Interval Tree Matching algorithm.
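
The matching task itself can be sketched as follows, in the spirit of the brute-force baseline: compare every update extent against every subscription extent and report the overlapping pairs. The representation is an assumption made for this example (an extent is a tuple of closed `(low, high)` intervals, one per dimension, and extents that only touch at an endpoint count as overlapping); the function names are hypothetical and are not taken from the thesis code.

```python
def overlaps(a, b):
    """True if two extents intersect in every dimension (closed intervals)."""
    return all(
        lo1 <= hi2 and lo2 <= hi1
        for (lo1, hi1), (lo2, hi2) in zip(a, b)
    )


def brute_force_match(updates, subscriptions):
    """Return (update index, subscription index) pairs whose extents overlap.

    Runs in O(n * m * d) time for n updates, m subscriptions and d dimensions.
    """
    return [
        (u, s)
        for u, update in enumerate(updates)
        for s, subscription in enumerate(subscriptions)
        if overlaps(update, subscription)
    ]
```

The quadratic number of comparisons is what sort-based and tree-based approaches try to avoid, which is why a brute-force version is mainly useful as a correctness reference.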

Relevance:

100.00% 100.00%

Publisher:

Abstract:

This work investigated the possibility of using the DDS (Data Distribution Service) standard developed by the OMG (Object Management Group) for real-time monitoring of glucose levels in diabetic patients. The standard follows the publish/subscribe pattern, so that, in the proof of concept developed, the point-of-care sensors publish patients' glucose values and different supervisors subscribe to that information. These supervisors react in the most appropriate way to the values and to the evolution of the patient's glucose level, for example by recording the sample value or raising an alarm. The middleware that supports data communication follows the DDS standard. This facilitates, on the one hand, the scalability and interoperability of the solution developed and, on the other, the monitoring of glucose levels and the activation of predefined protocols in real time. The research is part of the PERSONA intramural project of CIBER-BBN, whose objective is to design decision-support tools for continuous patient monitoring that are personalised and integrated into a technology platform for diabetes.
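
The publish/subscribe arrangement can be pictured with a small in-process sketch in which a sensor publishes samples to a topic and two supervisors subscribe to it. This is plain Python, not the DDS API: the `Bus` class, the topic name, the sample format and the alarm thresholds are all hypothetical, and the thresholds are arbitrary illustration values with no clinical meaning.

```python
from collections import defaultdict


class Bus:
    """Minimal in-process topic bus (a stand-in for real middleware)."""

    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, topic, callback):
        self._subscribers[topic].append(callback)

    def publish(self, topic, sample):
        # Deliver the sample to every subscriber of the topic, in order.
        for callback in self._subscribers[topic]:
            callback(sample)


def make_recorder(log):
    """Supervisor that records every sample it receives."""
    def record(sample):
        log.append(sample)
    return record


def make_alarm(alarms, low, high):
    """Supervisor that flags samples outside the [low, high] range."""
    def check(sample):
        patient, value = sample
        if not low <= value <= high:
            alarms.append((patient, value))
    return check
```

A sensor would call `bus.publish("glucose", ("patient-1", 110))` for each reading. The publisher never references the supervisors directly, so new supervisors can be added without changing the sensor side, which is the decoupling the abstract credits for scalability.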

Relevance:

80.00% 80.00%

Publisher:

Abstract:

To generate realistic predictions, species distribution models require the accurate coregistration of occurrence data with environmental variables. There is a common assumption that species occurrence data are accurately georeferenced; however, this is often not the case. This study investigates whether locational uncertainty and sample size affect the performance and interpretation of fine-scale species distribution models. This study evaluated the effects of locational uncertainty across multiple sample sizes by subsampling and spatially degrading occurrence data. Distribution models were constructed for kelp (Ecklonia radiata), across a large study site (680 km²) off the coast of southeastern Australia. Generalized additive models were used to predict distributions based on fine-resolution (2·5 m cell size) seafloor variables, generated from multibeam echosounder data sets, and occurrence data from underwater towed video. The effects of different levels of locational uncertainty in combination with sample size were evaluated by comparing model performance and predicted distributions. While locational uncertainty was observed to influence some measures of model performance, in general this was small and varied based on the accuracy metric used. However, simulated locational uncertainty caused changes in variable importance and predicted distributions at fine scales, potentially influencing model interpretation. This was most evident with small sample sizes. Results suggested that seemingly high-performing, fine-scale models can be generated from data containing locational uncertainty, although interpreting their predictions can be misleading if the predictions are interpreted at scales similar to the spatial errors. This study demonstrated the need to consider predictions across geographic space rather than performance alone.
The findings are important for conservation managers as they highlight the inherent variation in predictions between equally performing distribution models, and the subsequent restrictions on ecological interpretations.