80 results for large-scale systems
in CentAUR: Central Archive University of Reading - UK
Abstract:
The K-Means algorithm for cluster analysis is one of the most influential and popular data mining methods. Its straightforward parallel formulation is well suited for distributed memory systems with reliable interconnection networks, such as massively parallel processors and clusters of workstations. However, in large-scale geographically distributed systems the straightforward parallel algorithm can be rendered useless by a single communication failure or high latency in communication paths. The lack of scalable and fault-tolerant global communication and synchronisation methods in large-scale systems has hindered the adoption of the K-Means algorithm for applications in large networked systems such as wireless sensor networks, peer-to-peer systems and mobile ad hoc networks. This work proposes a fully distributed K-Means algorithm (Epidemic K-Means) which does not require global communication and is intrinsically fault tolerant. The proposed distributed K-Means algorithm provides a clustering solution which can approximate the solution of an ideal centralised algorithm over the aggregated data as closely as desired. A comparative performance analysis is carried out against state-of-the-art sampling methods and shows that the proposed method overcomes the limitations of the sampling-based approaches for skewed cluster distributions. The experimental analysis confirms that the proposed algorithm is very accurate and fault tolerant under unreliable network conditions (message loss and node failures) and is suitable for asynchronous networks of very large and extreme scale.
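To illustrate how a global quantity can be estimated without global communication, the following is a minimal sketch of pairwise gossip (epidemic) averaging. It is not the protocol from the paper: the function name `gossip_average`, its parameters, and the modelling of message loss as an exchange that is dropped as a whole are all assumptions made for illustration. In a distributed K-Means setting, the averaged values would plausibly be per-cluster sums and counts rather than single scalars.

```python
import random


def gossip_average(values, rounds=50, loss_prob=0.0, seed=0):
    """Estimate the global mean at every node using pairwise averaging.

    Hypothetical sketch: each node repeatedly picks a random peer and
    both replace their estimates with the pair's mean. A lost exchange
    is assumed to be dropped as a whole, so the sum of all estimates
    (and hence the global mean) is preserved.
    """
    rng = random.Random(seed)
    est = list(values)
    n = len(est)
    for _ in range(rounds):
        for i in range(n):
            # Pick a peer j different from i, uniformly at random.
            j = rng.randrange(n - 1)
            if j >= i:
                j += 1
            if rng.random() < loss_prob:
                continue  # exchange lost: both nodes keep their state
            mean = (est[i] + est[j]) / 2.0
            est[i] = est[j] = mean
    return est
```

Each node only ever talks to one peer at a time, yet all local estimates converge towards the mean of the initial values; losing some exchanges slows convergence but, under the assumption above, does not bias the result.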
Abstract:
Global communication requirements and load imbalance of some parallel data mining algorithms are the major obstacles to exploiting the computational power of large-scale systems. This work investigates how non-uniform data distributions can be exploited to remove the global communication requirement and to reduce the communication cost in iterative parallel data mining algorithms. In particular, the analysis focuses on one of the most influential and popular data mining methods, the k-means algorithm for cluster analysis. The straightforward parallel formulation of the k-means algorithm requires a global reduction operation at each iteration step, which hinders its scalability. This work studies a different parallel formulation of the algorithm where the requirement of global communication can be relaxed while still providing the exact solution of the centralised k-means algorithm. The proposed approach exploits a non-uniform data distribution which can be either found in real-world distributed applications or induced by means of multi-dimensional binary search trees. The approach can also be extended to accommodate an approximation error which allows a further reduction of the communication costs.
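The global reduction mentioned in this abstract can be sketched as follows. This shows the straightforward parallel formulation only (the one the abstract identifies as the scalability bottleneck), not the relaxed formulation the work proposes. The function names and the in-process list of partitions standing in for separate processes are assumptions for illustration.

```python
def local_stats(points, centroids):
    """Per-partition step: sums and counts of points nearest each centroid."""
    k, dim = len(centroids), len(centroids[0])
    sums = [[0.0] * dim for _ in range(k)]
    counts = [0] * k
    for p in points:
        # Index of the closest centroid by squared Euclidean distance.
        nearest = min(
            range(k),
            key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])),
        )
        counts[nearest] += 1
        for d in range(dim):
            sums[nearest][d] += p[d]
    return sums, counts


def reduce_and_update(stats, centroids):
    """Global reduction: combine all partial sums/counts, recompute centroids."""
    k, dim = len(centroids), len(centroids[0])
    updated = []
    for c in range(k):
        n = sum(counts[c] for _, counts in stats)
        if n == 0:
            updated.append(list(centroids[c]))  # empty cluster: keep centroid
            continue
        updated.append(
            [sum(sums[c][d] for sums, _ in stats) / n for d in range(dim)]
        )
    return updated


def parallel_kmeans_step(partitions, centroids):
    """One iteration: local statistics per partition, then one reduction."""
    stats = [local_stats(part, centroids) for part in partitions]
    return reduce_and_update(stats, centroids)
```

Because only sums and counts cross partition boundaries, the updated centroids are identical to those of a centralised iteration over the pooled data; the cost is that every partition takes part in the reduction at every iteration.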
Abstract:
Reanalysis data obtained from data assimilation are increasingly used for diagnostic studies of the general circulation of the atmosphere, for the validation of modelling experiments and for estimating energy and water fluxes between the Earth's surface and the atmosphere. Because fluxes are not specifically observed, but determined by the data assimilation system, they are influenced not only by the observations used but also by model physics and dynamics and by the assimilation method. In order to better understand the relative importance of humidity observations for the determination of the hydrological cycle, in this paper we describe an assimilation experiment using the ERA40 reanalysis system where all humidity data have been excluded from the observational database. The surprising result is that the model, driven by the time evolution of wind, temperature and surface pressure, is able to almost completely reconstitute the large-scale hydrological cycle of the control assimilation without the use of any humidity data. In addition, analysis of the individual weather systems in the extratropics and tropics using an objective feature tracking analysis indicates that the humidity data have very little impact on these systems. We include a discussion of these results and possible consequences for the way moisture information is assimilated, as well as the potential consequences for the design of observing systems for climate monitoring. It is further suggested, with support from a simple assimilation study with another model, that model physics and dynamics play a decisive role for the hydrological cycle, stressing the need to better understand these aspects of model parametrization.
Abstract:
This paper shows how the rainfall distribution over the UK, in the three major events on 13-15 June, 25 June and 20 July 2007, was related to troughs in the upper-level flow, and investigates the relationship of these features to a persistent large-scale flow pattern which extended around the northern hemisphere and its possible origins. Remote influences can be mediated by the propagation of large-scale atmospheric waves across the northern hemisphere and also by the origins of the air-masses that are wrapped into the developing weather systems delivering the rain to the UK. These dynamical influences are examined using analyses and forecasts produced by a range of atmospheric models.
Abstract:
Recent numerical experiments have demonstrated that the state of the stratosphere has a dynamical impact on the state of the troposphere. To account for such an effect, a number of mechanisms have been proposed in the literature, all of which amount to a large-scale adjustment of the troposphere to potential vorticity (PV) anomalies in the stratosphere. This paper analyses whether a simple PV adjustment suffices to explain the actual dynamical response of the troposphere to the state of the stratosphere, the actual response being determined by ensembles of numerical experiments run with an atmospheric general-circulation model. For this purpose, a new PV inverter is developed. It is shown that the simple PV adjustment hypothesis is inadequate. PV anomalies in the stratosphere induce, by inversion, flow anomalies in the troposphere that do not coincide spatially with the tropospheric changes determined by the numerical experiments. Moreover, the tropospheric anomalies induced by PV inversion are on a larger scale than the changes found in the numerical experiments, which are linked to the Atlantic and Pacific storm-tracks. These findings imply that the impact of the stratospheric state on the troposphere is manifested through the impact on individual synoptic-scale systems and their self-organization in the storm-tracks. Changes in these weather systems in the troposphere are not merely synoptic-scale noise on a larger scale tropospheric response, but an integral part of the mechanism by which the state of the stratosphere impacts that of the troposphere.
Abstract:
Midlatitude cyclones are important contributors to boundary layer ventilation. However, it is uncertain how efficient such systems are at transporting pollutants out of the boundary layer, and variations between cyclones are unexplained. In this study 15 idealized baroclinic life cycles, with a passive tracer included, are simulated to identify the relative importance of two transport processes: horizontal divergence and convergence within the boundary layer and large-scale advection by the warm conveyor belt. Results show that the amount of ventilation is insensitive to surface drag over a realistic range of values. This indicates that although boundary layer processes are necessary for ventilation they do not control the magnitude of ventilation. A diagnostic for the mass flux out of the boundary layer has been developed to identify the synoptic-scale variables controlling the strength of ascent in the warm conveyor belt. A very high level of correlation (R² values exceeding 0.98) is found between the diagnostic and the actual mass flux computed from the simulations. This demonstrates that the large-scale dynamics control the amount of ventilation, and that the efficiency with which midlatitude cyclones ventilate the boundary layer can be estimated using the new mass flux diagnostic. We conclude that meteorological analyses, such as ERA-40, are sufficient to quantify boundary layer ventilation by the large-scale dynamics.
Abstract:
Background: The large-scale production of G-protein coupled receptors (GPCRs) for functional and structural studies remains a challenge. Recent successes have been made in the expression of a range of GPCRs using Pichia pastoris as an expression host. P. pastoris has a number of advantages over other expression systems, including the ability to post-translationally modify expressed proteins, relatively low cost of production and the ability to grow to very high cell densities. Several previous studies have described the expression of GPCRs in P. pastoris using shaker flasks, which allow culturing of small volumes (500 ml) with moderate cell densities (OD600 ≈ 15). The use of bioreactors, which allow straightforward culturing of large volumes together with optimal control of growth parameters, including pH and dissolved oxygen, to maximise cell densities and expression of the target receptors, is an attractive alternative. The aim of this study was to compare the levels of expression of the human adenosine 2A receptor (A2AR) in P. pastoris under control of a methanol-inducible promoter in both flask and bioreactor cultures. Results: Bioreactor cultures yielded an approximately fivefold increase in cell density (OD600 ≈ 75) compared to flask cultures prior to induction and a doubling in functional expression level per mg of membrane protein, representing a significant optimisation. Furthermore, analysis of a C-terminally truncated A2AR, terminating at residue V334, yielded the highest levels (200 pmol/mg) so far reported for expression of this receptor in P. pastoris. This truncated form of the receptor was also revealed to be resistant to C-terminal degradation, in contrast to the WT A2AR, and is therefore more suitable for further functional and structural studies. Conclusion: Large-scale expression of the A2AR in P. pastoris bioreactor cultures results in significant increases in functional expression compared to traditional flask cultures.
Abstract:
Where users are interacting in a distributed virtual environment, the actions of each user must be observed by peers with sufficient consistency and within a limited delay so as not to be detrimental to the interaction. The consistency control issue may be split into three parts: update control; consistent enactment and evolution of events; and causal consistency. The delay in the presentation of events, termed latency, is primarily dependent on the network propagation delay and the consistency control algorithms. The latency induced by the consistency control algorithm, in particular causal ordering, is proportional to the number of participants. This paper describes how the effect of network delays may be reduced and introduces a scalable solution that provides sufficient consistency control while minimising its effect on latency. The principles described have been developed at Reading over the past five years. Similar principles are now emerging in the simulation community through the HLA standard. This paper attempts to validate the suggested principles within the schema of distributed simulation and virtual environments and to compare and contrast with those described by the HLA definition documents.
Abstract:
We present a descriptive overview of the meteorology in the southeastern subtropical Pacific (SEP) during the VOCALS-REx intensive observations campaign, which was carried out between October and November 2008. Mainly based on data from operational analyses, forecasts, reanalysis, and satellite observations, we focus on spatio-temporal scales from synoptic to planetary. A climatological context is given within which the specific conditions observed during the campaign are placed, with particular reference to the relationships between the large-scale and the regional circulations. The mean circulations associated with the diurnal breeze systems are also discussed. We then provide a summary of the day-to-day synoptic-scale circulation, air-parcel trajectories, and cloud cover in the SEP during VOCALS-REx. Three meteorologically distinct periods of time are identified and the large-scale causes for their different character are discussed. The first period was characterised by significant variability associated with synoptic-scale systems affecting the SEP, while the two subsequent phases were affected by planetary-scale disturbances with a slower evolution. The changes between initial and later periods can be partly explained by the regular march of the annual cycle, but contributions from subseasonal variability and its teleconnections were important. Across the whole of the two months under consideration we find a significant correlation between the depth of the inversion-capped marine boundary layer (MBL) and the amount of low cloud in the area of study. We discuss this correlation and argue that, at least as a crude approximation, a typical scaling may be applied relating MBL and cloud properties with the large-scale parameters of SSTs and tropospheric temperatures. These results are consistent with previously found empirical relationships involving lower-tropospheric stability.
Large-scale atmospheric dynamics of the wet winter 2009–2010 and its impact on hydrology in Portugal
Abstract:
The anomalously wet winter of 2010 had a very important impact on the Portuguese hydrological system. Owing to the detrimental effects of reduced precipitation in Portugal on the environmental and socio-economic systems, the 2010 winter was predominantly beneficial by reversing the accumulated precipitation deficits during the previous hydrological years. The recorded anomalously high precipitation amounts have contributed to an overall increase in river runoffs and dam recharges in the 4 major river basins. In synoptic terms, the winter of 2010 was characterised by an anomalously strong westerly flow component over the North Atlantic that triggered high precipitation amounts. A dynamically coherent enhancement in the frequencies of mid-latitude cyclones close to Portugal, accompanied by significant increases in the occurrence of cyclonic, south and south-westerly circulation weather types, is noteworthy. Furthermore, the prevalence of the strong negative phase of the North Atlantic Oscillation (NAO) also emphasises the main dynamical features of the 2010 winter. A comparison of the hydrological and atmospheric conditions between the 2010 winter and the previous 2 anomalously wet winters (1996 and 2001) was also carried out to isolate not only their similarities, but also their contrasting conditions, highlighting the limitations of estimating winter precipitation amounts in Portugal using solely the NAO phase as a predictor.
Abstract:
Exascale systems are the next frontier in high-performance computing and are expected to deliver a performance of the order of 10^18 operations per second using massive multicore processors. Very large- and extreme-scale parallel systems pose critical algorithmic challenges, especially related to concurrency, locality and the need to avoid global communication patterns. This work investigates a novel protocol for dynamic group communication that can be used to remove the global communication requirement and to reduce the communication cost in parallel formulations of iterative data mining algorithms. The protocol is used to provide a communication-efficient parallel formulation of the k-means algorithm for cluster analysis. The approach is based on a collective communication operation for dynamic groups of processes and exploits non-uniform data distributions. Non-uniform data distributions can be either found in real-world distributed applications or induced by means of multidimensional binary search trees. The analysis of the proposed dynamic group communication protocol has shown that it does not introduce significant communication overhead. The parallel clustering algorithm has also been extended to accommodate an approximation error, which allows a further reduction of the communication costs. The effectiveness of the exact and approximate methods has been tested in a parallel computing system with 64 processors and in simulations with 1024 processing elements.
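As a rough illustration of how a multidimensional binary search tree (k-d tree) can induce a spatially localised, non-uniform data distribution across processes, the sketch below recursively splits a point set at the median along alternating axes. The median split rule, the function name `kd_partition` and its parameters are assumptions for illustration; the abstract does not state which splitting rule the authors use.

```python
def kd_partition(points, depth, axis=0):
    """Split points into up to 2**depth spatially compact partitions.

    Hypothetical sketch: at each level the points are sorted along one
    axis and cut at the median; the axis cycles through the dimensions.
    """
    if depth == 0 or len(points) <= 1:
        return [list(points)]
    dim = len(points[0])
    ordered = sorted(points, key=lambda p: p[axis])
    mid = len(ordered) // 2
    next_axis = (axis + 1) % dim
    return (
        kd_partition(ordered[:mid], depth - 1, next_axis)
        + kd_partition(ordered[mid:], depth - 1, next_axis)
    )
```

With such a partitioning, each process holds points from one region of the data space, so in a k-means iteration a given centroid is typically relevant to only a few processes. That locality is what makes group communication, as opposed to a global operation, plausible.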
Abstract:
As part of an international intercomparison project, a set of single column models (SCMs) and cloud-resolving models (CRMs) are run under the weak temperature gradient (WTG) method and the damped gravity wave (DGW) method. For each model, the implementation of the WTG or DGW method involves a simulated column which is coupled to a reference state defined with profiles obtained from the same model in radiative-convective equilibrium. The simulated column has the same surface conditions as the reference state and is initialized with profiles from the reference state. We performed a systematic comparison of the behavior of different models under a consistent implementation of the WTG method and the DGW method, and a systematic comparison of the WTG and DGW methods in models with different physics and numerics. CRMs and SCMs produce a variety of behaviors under both WTG and DGW methods. Some of the models reproduce the reference state while others sustain a large-scale circulation which results in either substantially lower or higher precipitation compared to the value of the reference state. CRMs show a fairly linear relationship between precipitation and circulation strength. SCMs display a wider range of behaviors than CRMs. Some SCMs under the WTG method produce zero precipitation. Within an individual SCM, a DGW simulation and a corresponding WTG simulation can produce circulations of opposite sign. When initialized with a dry troposphere, DGW simulations always result in a precipitating equilibrium state. The greatest sensitivities to the initial moisture conditions occur for multiple stable equilibria in some WTG simulations, corresponding to either a dry equilibrium state when initialized as dry or a precipitating equilibrium state when initialized as moist. Multiple equilibria are seen in more WTG simulations for higher SST. In some models, the existence of multiple equilibria is sensitive to some parameters in the WTG calculations.
Abstract:
As part of an international intercomparison project, the weak temperature gradient (WTG) and damped gravity wave (DGW) methods are used to parameterize large-scale dynamics in a set of cloud-resolving models (CRMs) and single column models (SCMs). The WTG or DGW method is implemented using a configuration that couples a model to a reference state defined with profiles obtained from the same model in radiative-convective equilibrium. We investigated the sensitivity of each model to changes in SST, given a fixed reference state. We performed a systematic comparison of the WTG and DGW methods in different models, and a systematic comparison of the behavior of those models using the WTG method and the DGW method. The sensitivity to the SST depends on both the large-scale parameterization method and the choice of the cloud model. In general, SCMs display a wider range of behaviors than CRMs. All CRMs using either the WTG or DGW method show an increase of precipitation with SST, while SCMs show sensitivities which are not always monotonic. CRMs using either the WTG or DGW method show a similar relationship between mean precipitation rate and column-relative humidity, while SCMs exhibit a much wider range of behaviors. DGW simulations produce large-scale velocity profiles which are smoother and less top-heavy compared to those produced by the WTG simulations. These large-scale parameterization methods provide a useful tool to identify the impact of parameterization differences on model behavior in the presence of two-way feedback between convection and the large-scale circulation.
Abstract:
Preferred structures in the surface pressure variability are investigated in and compared between two 100-year simulations of the Hadley Centre climate model HadCM3. In the first (control) simulation, the model is forced with pre-industrial carbon dioxide concentration (1×CO2) and in the second simulation the model is forced with doubled CO2 concentration (2×CO2). Daily winter (December-January-February) surface pressures over the Northern Hemisphere are analysed. The identification of preferred patterns is addressed using multivariate mixture models. For the control simulation, two significant flow regimes are obtained at 5% and 2.5% significance levels within the state space spanned by the leading two principal components. They show a high pressure centre over the North Pacific/Aleutian Islands associated with a low pressure centre over the North Atlantic, and its reverse. For the 2×CO2 simulation, no such behaviour is obtained. In a higher-dimensional state space, flow patterns are obtained from both simulations. They are found to be significant at the 1% level for the control simulation and at the 2.5% level for the 2×CO2 simulation. Hence under CO2 doubling, regime behaviour in the large-scale wave dynamics weakens. Doubling greenhouse gas concentration affects both the frequency of occurrence of regimes and the pattern structures. The less frequent regime becomes amplified and the more frequent regime weakens. The largest change is observed over the Pacific, where a significant deepening of the Aleutian low is obtained under CO2 doubling.