946 results for Vertical Data Distribution
Abstract:
Studies by Enfield and Allen (1980), McLain et al. (1985), and others have shown that anomalously warm years in the northern coastal California Current correspond to El Niño conditions in the equatorial Pacific Ocean. Ocean model studies suggest a mechanical link between the northern coastal California Current and the equatorial ocean through long waves that propagate cyclonically along the ocean boundary (McCreary 1976; Clarke 1983; Shriver et al. 1991). However, direct observational evidence of such an oceanic connection remains limited. Much of the supposed El Niño variation in temperature and sea level data from the coastal California Current region can be attributed to anomalously intense North Pacific atmospheric cyclogenesis, which is frequently enhanced during El Niño years (Wallace and Gutzler 1981; Simpson 1983; Emery and Hamilton 1984). This study uses time series of ocean temperature data to distinguish between locally forced effects, initiated by North Pacific atmospheric changes, and remotely forced effects, initiated by equatorial Pacific atmospheric changes related to El Niño events.
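As a minimal illustration of the kind of analysis described, the sketch below correlates a coastal temperature anomaly series against a local and a remote driver at several lags. The index names and the synthetic data are assumptions for illustration only, not the study's data or method.

```python
# Illustrative sketch (not the study's method or data): separating local from remote
# forcing via lagged correlation of coastal temperature anomalies against two drivers.
import numpy as np

rng = np.random.default_rng(0)
n_months = 240
local_wind_index = rng.standard_normal(n_months)   # stand-in for a North Pacific forcing proxy
equatorial_index = rng.standard_normal(n_months)   # stand-in for an equatorial Pacific index
coastal_temp_anom = (0.6 * local_wind_index
                     + 0.3 * np.roll(equatorial_index, 3)    # remote signal arrives with a lag
                     + 0.5 * rng.standard_normal(n_months))

def lagged_corr(x, y, lag):
    """Correlation of x(t) with y(t - lag); positive lag means y leads x."""
    if lag > 0:
        x, y = x[lag:], y[:-lag]
    elif lag < 0:
        x, y = x[:lag], y[-lag:]
    return np.corrcoef(x, y)[0, 1]

for lag in range(0, 7):
    print(lag, round(lagged_corr(coastal_temp_anom, equatorial_index, lag), 3))
```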
Abstract:
As network capacity has increased over the past decade, individuals and organisations have found it increasingly appealing to make use of remote services in the form of service-oriented architectures and cloud computing services. Data processed by remote services, however, is no longer under the direct control of the individual or organisation that provided the data, leaving data owners at risk of data theft or misuse. This paper describes a model by which data owners can control the distribution and use of their data throughout a dynamic coalition of service providers using digital rights management technology. Our model allows a data owner to establish the trustworthiness of every member of a coalition employed to process data, and to communicate a machine-enforceable usage policy to every such member.
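A minimal sketch of the idea of a machine-enforceable usage policy follows; the class and field names are assumptions for illustration and do not reflect the paper's actual model or policy language.

```python
# Minimal sketch (assumed structures, not the paper's model): a data owner attaches a
# machine-enforceable usage policy to its data and checks each coalition member
# against that policy before releasing the data to it.
from dataclasses import dataclass

@dataclass
class UsagePolicy:
    allowed_operations: set       # e.g. {"aggregate", "anonymised-report"}
    allowed_providers: set        # identities of coalition members the owner trusts
    expiry_unix: int              # after this time the data must not be used

@dataclass
class ServiceProvider:
    identity: str
    attested: bool                # e.g. outcome of a trust/attestation check

def may_release(policy: UsagePolicy, provider: ServiceProvider, now: int) -> bool:
    """Release data only to a trusted, listed provider while the policy is current."""
    return (provider.attested
            and provider.identity in policy.allowed_providers
            and now < policy.expiry_unix)

policy = UsagePolicy({"aggregate"}, {"provider-a", "provider-b"}, expiry_unix=2_000_000_000)
print(may_release(policy, ServiceProvider("provider-a", attested=True), now=1_900_000_000))
```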
Abstract:
Distributed systems are widely used for solving large-scale and data-intensive computing problems, including all-to-all comparison (ATAC) problems. However, when used for ATAC problems, existing computational frameworks such as Hadoop focus on load balancing for allocating comparison tasks, without careful consideration of data distribution and storage usage. While Hadoop-based solutions provide users with simplicity of implementation, their inherent MapReduce computing pattern does not match the ATAC pattern. This leads to load imbalances and poor data locality when Hadoop's data distribution strategy is used for ATAC problems. Here we present a data distribution strategy which considers data locality, load balancing and storage savings for ATAC computing problems in homogeneous distributed systems. A simulated annealing algorithm is developed for data distribution and task scheduling. Experimental results show a significant performance improvement for our approach over Hadoop-based solutions.
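The sketch below shows the general shape of a simulated-annealing placement of data items onto nodes; the cost terms and weights are assumptions for illustration, not the authors' algorithm.

```python
# Minimal simulated-annealing sketch (assumed cost model, not the authors' algorithm):
# place data items on nodes so that pairwise comparison tasks find both inputs locally
# while keeping per-node storage roughly balanced.
import math, random
from itertools import combinations

random.seed(1)
n_items, n_nodes = 20, 4
tasks = list(combinations(range(n_items), 2))      # all-to-all comparison tasks

def cost(placement):
    remote = sum(1 for i, j in tasks if placement[i] != placement[j])   # poor data locality
    loads = [placement.count(n) for n in range(n_nodes)]
    imbalance = max(loads) - min(loads)                                 # storage imbalance
    return remote + 5 * imbalance

placement = [random.randrange(n_nodes) for _ in range(n_items)]
temperature = 10.0
for _ in range(5000):
    candidate = placement[:]
    candidate[random.randrange(n_items)] = random.randrange(n_nodes)    # move one item
    delta = cost(candidate) - cost(placement)
    if delta < 0 or random.random() < math.exp(-delta / temperature):
        placement = candidate
    temperature *= 0.999                                                # cooling schedule

print(cost(placement), placement)
```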
Abstract:
This research studied the distributed computing of all-to-all comparison problems with big data sets. The thesis formalised the problem and developed a high-performance, scalable computing framework with a programming model, data distribution strategies and task scheduling policies to solve it. The study considered storage usage, data locality and load balancing to improve performance in solving the problem. The research outcomes can be applied in bioinformatics, biometrics, data mining and other domains in which all-to-all comparisons are a typical computing pattern.
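For context, the all-to-all comparison pattern itself is sketched below with a placeholder comparison function; the function and data are illustrative assumptions, but they show why the number of tasks grows quadratically with the data set.

```python
# Minimal sketch of the all-to-all comparison pattern (illustrative only): every item
# is compared with every other item, so the task count grows quadratically.
from itertools import combinations

def compare(a: str, b: str) -> int:
    """Placeholder comparison; a real application might use e.g. sequence alignment."""
    return sum(x == y for x, y in zip(a, b))

items = ["GATTACA", "GATTAGA", "CATTACA", "GACTACA"]
scores = {(i, j): compare(items[i], items[j]) for i, j in combinations(range(len(items)), 2)}
for (i, j), s in scores.items():
    print(items[i], items[j], s)
```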
Abstract:
So far, various calculation models for the vertical distribution of suspended sediment concentration have been derived by several investigators from different theories. The limitations of all these models suggest that it is possible to find a more reasonable model in which each previous model is included as a special case. The formulation of such a general model is the purpose of this paper.
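As one example of the family of models that such a general formulation would need to recover as a special case, the classical Rouse profile for suspended sediment concentration above a reference level is shown below (standard textbook form, not the paper's model).

```latex
% Classical Rouse profile (illustrative special case, not the paper's general model):
\[
  \frac{C(z)}{C_a} \;=\;
  \left[ \frac{h - z}{z}\cdot\frac{a}{h - a} \right]^{Z},
  \qquad Z = \frac{w_s}{\kappa\, u_*},
\]
% where C_a is the concentration at the reference level z = a, h is the flow depth,
% w_s the particle settling velocity, \kappa the von Karman constant and u_* the
% shear velocity.
```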
The Trade-Off Between Implicit and Explicit Data Distribution in Shared-Memory Programming Paradigms
Abstract:
Global communication requirements and load imbalance of some parallel data mining algorithms are the major obstacles to exploiting the computational power of large-scale systems. This work investigates how non-uniform data distributions can be exploited to remove the global communication requirement and to reduce the communication cost in parallel data mining algorithms and, in particular, in the k-means algorithm for cluster analysis. In the straightforward parallel formulation of the k-means algorithm, data and computation loads are uniformly distributed over the processing nodes. This approach has excellent load balancing characteristics that may suggest it could scale up to large and extreme-scale parallel computing systems. However, at each iteration step the algorithm requires a global reduction operation, which hinders the scalability of the approach. This work studies a different parallel formulation of the algorithm in which the requirement of global communication is removed, while maintaining the same deterministic nature of the centralised algorithm. The proposed approach exploits a non-uniform data distribution which can either be found in real-world distributed applications or be induced by means of multi-dimensional binary search trees. The approach can also be extended to accommodate an approximation error, which allows a further reduction of the communication costs. The effectiveness of the exact and approximate methods has been tested in a parallel computing system with 64 processors and in simulations with 1024 processing elements.
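The sketch below simulates the straightforward parallel k-means formulation described above: each node computes partial sums over its local data and a global reduction combines them into new centroids at every iteration. The simulation, sizes and names are assumptions for illustration, not the paper's implementation.

```python
# Minimal sketch (single-process simulation, not the paper's implementation) of the
# straightforward parallel k-means: per-node partial sums followed by a global reduction.
import numpy as np

rng = np.random.default_rng(0)
n_nodes, k, dim = 4, 3, 2
local_data = [rng.standard_normal((100, dim)) + node for node in range(n_nodes)]
centroids = rng.standard_normal((k, dim))

for _ in range(10):
    partial_sums = np.zeros((n_nodes, k, dim))
    partial_counts = np.zeros((n_nodes, k))
    for node, X in enumerate(local_data):             # work done independently on each node
        labels = np.argmin(((X[:, None, :] - centroids) ** 2).sum(axis=2), axis=1)
        for c in range(k):
            partial_sums[node, c] = X[labels == c].sum(axis=0)
            partial_counts[node, c] = (labels == c).sum()
    total_sums = partial_sums.sum(axis=0)             # the global reduction step
    total_counts = partial_counts.sum(axis=0)
    centroids = total_sums / np.maximum(total_counts, 1)[:, None]

print(centroids)
```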
Abstract:
In systems that combine the outputs of classification methods (combination systems), such as ensembles and multi-agent systems, one of the main constraints is that the base components (classifiers or agents) should be diverse among themselves. In other words, there is clearly no accuracy gain in a system composed of a set of identical base components. One way of increasing diversity is through the use of feature selection or data distribution methods in combination systems. This work investigates the impact of using data distribution methods among the components of combination systems. In this investigation, different data distribution methods are used and the combination systems are analysed under several different configurations. The aim of this analysis is to identify which combination systems are better suited to the use of feature distribution among their components.
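A minimal sketch of one such data distribution method, feature distribution among base components of a voting ensemble, is shown below; the data set, feature subsets and classifier choice are assumptions for illustration, not the paper's experimental setup.

```python
# Minimal sketch (assumed setup, not the paper's experiments): distributing disjoint
# feature subsets among the base classifiers of a simple majority-vote ensemble.
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Each base component sees a different subset of the features (feature distribution).
feature_subsets = [[0, 1], [2, 3], [0, 3]]
members = [(cols, DecisionTreeClassifier(random_state=0).fit(X_tr[:, cols], y_tr))
           for cols in feature_subsets]

# Combine by majority vote over the members' predictions.
votes = np.array([clf.predict(X_te[:, cols]) for cols, clf in members])
majority = np.apply_along_axis(lambda v: np.bincount(v).argmax(), 0, votes)
print("ensemble accuracy:", (majority == y_te).mean())
```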
Abstract:
Vertical distributions of turbulent energy dissipation rates and fluorescence were measured simultaneously with a high-resolution micro-profiler in four different oceanographic regions, from temperate to polar and from coastal to open-water settings. High fluorescence values, forming a deep chlorophyll maximum (DCM), were often located in weakly stratified portions of the upper water column, just below layers with maximum levels of turbulent energy dissipation rate. In the vicinity of the DCM, a significant negative relationship between fluorescence and turbulent energy dissipation rate was found. We discuss the mechanisms that may explain the observed patterns of planktonic biomass distribution within the ocean mixed layer, including a vertically variable diffusion coefficient and the alteration of the cells' sinking velocity by turbulent motion. These findings provide further insight into the processes controlling the vertical distribution of the pelagic community and the position of the DCM.
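To make one of the mechanisms discussed above concrete, the sketch below steps a 1-D biomass profile forward under a depth-dependent turbulent diffusivity and a constant sinking velocity; the diffusivity profile, sinking speed and numerical scheme are assumptions for illustration, not the authors' model.

```python
# Minimal sketch (illustrative, not the authors' model): a 1-D biomass profile shaped by
# depth-dependent turbulent diffusivity and a constant sinking velocity (explicit scheme).
import numpy as np

nz, dz, dt = 100, 1.0, 50.0                   # 100 m column, 1 m grid, 50 s time step
z = np.arange(nz) * dz
kappa = 1e-3 * np.exp(-z / 20.0) + 1e-5       # turbulent diffusivity decays with depth (m^2/s)
w_sink = 1e-5                                 # sinking velocity (m/s)
c = np.ones(nz)                               # initial biomass proxy (arbitrary units)

for _ in range(20000):
    flux = np.zeros(nz + 1)
    # diffusive flux at interior cell faces plus a downward sinking flux
    flux[1:-1] = (-0.5 * (kappa[1:] + kappa[:-1]) * (c[1:] - c[:-1]) / dz
                  + w_sink * c[:-1])
    c -= dt * (flux[1:] - flux[:-1]) / dz     # zero-flux (closed) top and bottom boundaries

print(c[:10])                                 # near-surface values after spin-up
```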