851 resultados para large-scale systems


Relevância:

100.00% 100.00%

Publicador:

Resumo:

Exascale systems of the future are predicted to have mean time between failures (MTBF) of less than one hour. Malleable applications, where the number of processors on which the applications execute can be changed during executions, can make use of their malleability to better tolerate high failure rates. We present AdFT, an adaptive fault tolerance framework for long running malleable applications to maximize application performance in the presence of failures. AdFT framework includes cost models for evaluating the benefits of various fault tolerance actions including checkpointing, live-migration and rescheduling, and runtime decisions for dynamically selecting the fault tolerance actions at different points of application execution to maximize performance. Simulations with real and synthetic failure traces show that our approach outperforms existing fault tolerance mechanisms for malleable applications yielding up to 23% improvement in application performance, and is effective even for petascale systems and beyond.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Exascale systems of the future are predicted to have mean time between failures (MTBF) of less than one hour. At such low MTBFs, employing periodic checkpointing alone will result in low efficiency because of the high number of application failures resulting in large amount of lost work due to rollbacks. In such scenarios, it is highly necessary to have proactive fault tolerance mechanisms that can help avoid significant number of failures. In this work, we have developed a mechanism for proactive fault tolerance using partial replication of a set of application processes. Our fault tolerance framework adaptively changes the set of replicated processes periodically based on failure predictions to avoid failures. We have developed an MPI prototype implementation, PAREP-MPI that allows changing the replica set. We have shown that our strategy involving adaptive process replication significantly outperforms existing mechanisms providing up to 20 percent improvement in application efficiency even for exascale systems.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Collective behaviours can be observed in both natural and man-made systems composed of a large number of elemental subsystems. Typically, each elemental subsystem has its own dynamics but, whenever interaction between individuals occurs, the individual behaviours tend to be relaxed, and collective behaviours emerge. In this paper, the collective behaviour of a large-scale system composed of several coupled elemental particles is analysed. The dynamics of the particles are governed by the same type of equations but having different parameter values and initial conditions. Coupling between particles is based on statistical feedback, which means that each particle is affected by the average behaviour of its neighbours. It is shown that the global system may unveil several types of collective behaviours, corresponding to partial synchronisation, characterised by the existence of several clusters of synchronised subsystems, and global synchronisation between particles, where all the elemental particles synchronise completely.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Mode of access: Internet.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

In this paper various techniques in relation to large-scale systems are presented. At first, explanation of large-scale systems and differences from traditional systems are given. Next, possible specifications and requirements on hardware and software are listed. Finally, examples of large-scale systems are presented.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

The K-Means algorithm for cluster analysis is one of the most influential and popular data mining methods. Its straightforward parallel formulation is well suited for distributed memory systems with reliable interconnection networks, such as massively parallel processors and clusters of workstations. However, in large-scale geographically distributed systems the straightforward parallel algorithm can be rendered useless by a single communication failure or high latency in communication paths. The lack of scalable and fault tolerant global communication and synchronisation methods in large-scale systems has hindered the adoption of the K-Means algorithm for applications in large networked systems such as wireless sensor networks, peer-to-peer systems and mobile ad hoc networks. This work proposes a fully distributed K-Means algorithm (EpidemicK-Means) which does not require global communication and is intrinsically fault tolerant. The proposed distributed K-Means algorithm provides a clustering solution which can approximate the solution of an ideal centralised algorithm over the aggregated data as closely as desired. A comparative performance analysis is carried out against the state of the art sampling methods and shows that the proposed method overcomes the limitations of the sampling-based approaches for skewed clusters distributions. The experimental analysis confirms that the proposed algorithm is very accurate and fault tolerant under unreliable network conditions (message loss and node failures) and is suitable for asynchronous networks of very large and extreme scale.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Global communication requirements and load imbalance of some parallel data mining algorithms are the major obstacles to exploit the computational power of large-scale systems. This work investigates how non-uniform data distributions can be exploited to remove the global communication requirement and to reduce the communication cost in iterative parallel data mining algorithms. In particular, the analysis focuses on one of the most influential and popular data mining methods, the k-means algorithm for cluster analysis. The straightforward parallel formulation of the k-means algorithm requires a global reduction operation at each iteration step, which hinders its scalability. This work studies a different parallel formulation of the algorithm where the requirement of global communication can be relaxed while still providing the exact solution of the centralised k-means algorithm. The proposed approach exploits a non-uniform data distribution which can be either found in real world distributed applications or can be induced by means of multi-dimensional binary search trees. The approach can also be extended to accommodate an approximation error which allows a further reduction of the communication costs.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

This study proceeds from a central interest in the importance of systematically evaluating operational large-scale integrated information systems (IS) in organisations. The study is conducted within the IS-Impact Research Track at Queensland University of Technology (QUT). The goal of the IS-Impact Track is, "to develop the most widely employed model for benchmarking information systems in organizations for the joint benefit of both research and practice" (Gable et al, 2009). The track espouses programmatic research having the principles of incrementalism, tenacity, holism and generalisability through replication and extension research strategies. Track efforts have yielded the bicameral IS-Impact measurement model; the ‘impact’ half includes Organisational-Impact and Individual-Impact dimensions; the ‘quality’ half includes System-Quality and Information-Quality dimensions. Akin to Gregor’s (2006) analytic theory, the ISImpact model is conceptualised as a formative, multidimensional index and is defined as "a measure at a point in time, of the stream of net benefits from the IS, to date and anticipated, as perceived by all key-user-groups" (Gable et al., 2008, p: 381). The study adopts the IS-Impact model (Gable, et al., 2008) as its core theory base. Prior work within the IS-Impact track has been consciously constrained to Financial IS for their homogeneity. This study adopts a context-extension strategy (Berthon et al., 2002) with the aim "to further validate and extend the IS-Impact measurement model in a new context - i.e. a different IS - Human Resources (HR)". The overarching research question is: "How can the impacts of large-scale integrated HR applications be effectively and efficiently benchmarked?" This managerial question (Cooper & Emory, 1995) decomposes into two more specific research questions – In the new HR context: (RQ1): "Is the IS-Impact model complete?" (RQ2): "Is the ISImpact model valid as a 1st-order formative, 2nd-order formative multidimensional construct?" The study adhered to the two-phase approach of Gable et al. (2008) to hypothesise and validate a measurement model. The initial ‘exploratory phase’ employed a zero base qualitative approach to re-instantiating the IS-Impact model in the HR context. The subsequent ‘confirmatory phase’ sought to validate the resultant hypothesised measurement model against newly gathered quantitative data. The unit of analysis for the study is the application, ‘ALESCO’, an integrated large-scale HR application implemented at Queensland University of Technology (QUT), a large Australian university (with approximately 40,000 students and 5000 staff). Target respondents of both study phases were ALESCO key-user-groups: strategic users, management users, operational users and technical users, who directly use ALESCO or its outputs. An open-ended, qualitative survey was employed in the exploratory phase, with the objective of exploring the completeness and applicability of the IS-Impact model’s dimensions and measures in the new context, and to conceptualise any resultant model changes to be operationalised in the confirmatory phase. Responses from 134 ALESCO users to the main survey question, "What do you consider have been the impacts of the ALESCO (HR) system in your division/department since its implementation?" were decomposed into 425 ‘impact citations.’ Citation mapping using a deductive (top-down) content analysis approach instantiated all dimensions and measures of the IS-Impact model, evidencing its content validity in the new context. Seeking to probe additional (perhaps negative) impacts; the survey included the additional open question "In your opinion, what can be done better to improve the ALESCO (HR) system?" Responses to this question decomposed into a further 107 citations which in the main did not map to IS-Impact, but rather coalesced around the concept of IS-Support. Deductively drawing from relevant literature, and working inductively from the unmapped citations, the new ‘IS-Support’ construct, including the four formative dimensions (i) training, (ii) documentation, (iii) assistance, and (iv) authorisation (each having reflective measures), was defined as: "a measure at a point in time, of the support, the [HR] information system key-user groups receive to increase their capabilities in utilising the system." Thus, a further goal of the study became validation of the IS-Support construct, suggesting the research question (RQ3): "Is IS-Support valid as a 1st-order reflective, 2nd-order formative multidimensional construct?" With the aim of validating IS-Impact within its nomological net (identification through structural relations), as in prior work, Satisfaction was hypothesised as its immediate consequence. The IS-Support construct having derived from a question intended to probe IS-Impacts, too was hypothesised as antecedent to Satisfaction, thereby suggesting the research question (RQ4): "What is the relative contribution of IS-Impact and IS-Support to Satisfaction?" With the goal of testing the above research questions, IS-Impact, IS-Support and Satisfaction were operationalised in a quantitative survey instrument. Partial least squares (PLS) structural equation modelling employing 221 valid responses largely evidenced the validity of the commencing IS-Impact model in the HR context. ISSupport too was validated as operationalised (including 11 reflective measures of its 4 formative dimensions). IS-Support alone explained 36% of Satisfaction; IS-Impact alone 70%; in combination both explaining 71% with virtually all influence of ISSupport subsumed by IS-Impact. Key study contributions to research include: (1) validation of IS-Impact in the HR context, (2) validation of a newly conceptualised IS-Support construct as important antecedent of Satisfaction, and (3) validation of the redundancy of IS-Support when gauging IS-Impact. The study also makes valuable contributions to practice, the research track and the sponsoring organisation.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

In this paper, we propose low-complexity algorithms based on Monte Carlo sampling for signal detection and channel estimation on the uplink in large-scale multiuser multiple-input-multiple-output (MIMO) systems with tens to hundreds of antennas at the base station (BS) and a similar number of uplink users. A BS receiver that employs a novel mixed sampling technique (which makes a probabilistic choice between Gibbs sampling and random uniform sampling in each coordinate update) for detection and a Gibbs-sampling-based method for channel estimation is proposed. The algorithm proposed for detection alleviates the stalling problem encountered at high signal-to-noise ratios (SNRs) in conventional Gibbs-sampling-based detection and achieves near-optimal performance in large systems with M-ary quadrature amplitude modulation (M-QAM). A novel ingredient in the detection algorithm that is responsible for achieving near-optimal performance at low complexity is the joint use of a mixed Gibbs sampling (MGS) strategy coupled with a multiple restart (MR) strategy with an efficient restart criterion. Near-optimal detection performance is demonstrated for a large number of BS antennas and users (e. g., 64 and 128 BS antennas and users). The proposed Gibbs-sampling-based channel estimation algorithm refines an initial estimate of the channel obtained during the pilot phase through iterations with the proposed MGS-based detection during the data phase. In time-division duplex systems where channel reciprocity holds, these channel estimates can be used for multiuser MIMO precoding on the downlink. The proposed receiver is shown to achieve good performance and scale well for large dimensions.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

In this paper, we propose a low-complexity algorithm based on Markov chain Monte Carlo (MCMC) technique for signal detection on the uplink in large scale multiuser multiple input multiple output (MIMO) systems with tens to hundreds of antennas at the base station (BS) and similar number of uplink users. The algorithm employs a randomized sampling method (which makes a probabilistic choice between Gibbs sampling and random sampling in each iteration) for detection. The proposed algorithm alleviates the stalling problem encountered at high SNRs in conventional MCMC algorithm and achieves near-optimal performance in large systems with M-QAM. A novel ingredient in the algorithm that is responsible for achieving near-optimal performance at low complexities is the joint use of a randomized MCMC (R-MCMC) strategy coupled with a multiple restart strategy with an efficient restart criterion. Near-optimal detection performance is demonstrated for large number of BS antennas and users (e.g., 64, 128, 256 BS antennas/users).