33 results for Non-uniform distribution
Abstract:
Global communication requirements and load imbalance of some parallel data mining algorithms are the major obstacles to exploit the computational power of large-scale systems. This work investigates how non-uniform data distributions can be exploited to remove the global communication requirement and to reduce the communication cost in iterative parallel data mining algorithms. In particular, the analysis focuses on one of the most influential and popular data mining methods, the k-means algorithm for cluster analysis. The straightforward parallel formulation of the k-means algorithm requires a global reduction operation at each iteration step, which hinders its scalability. This work studies a different parallel formulation of the algorithm where the requirement of global communication can be relaxed while still providing the exact solution of the centralised k-means algorithm. The proposed approach exploits a non-uniform data distribution which can be either found in real world distributed applications or can be induced by means of multi-dimensional binary search trees. The approach can also be extended to accommodate an approximation error which allows a further reduction of the communication costs.
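The global reduction mentioned in this abstract can be illustrated with a minimal sketch of the straightforward parallel formulation. It is not the relaxed-communication method the work proposes. The function names `local_stats` and `kmeans_step` are hypothetical, and the partitions are processed in a plain loop that stands in for separate processes.

```python
def local_stats(points, centroids):
    """Per-partition work: assign each point to its nearest centroid and
    return per-cluster coordinate sums and point counts."""
    k, dim = len(centroids), len(centroids[0])
    sums = [[0.0] * dim for _ in range(k)]
    counts = [0] * k
    for p in points:
        # Index of the nearest centroid by squared Euclidean distance
        j = min(range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])))
        counts[j] += 1
        for i, x in enumerate(p):
            sums[j][i] += x
    return sums, counts


def kmeans_step(partitions, centroids):
    """One k-means iteration over partitioned data. Accumulating over all
    partitions stands in for the global reduction needed at every step."""
    k, dim = len(centroids), len(centroids[0])
    total_sums = [[0.0] * dim for _ in range(k)]
    total_counts = [0] * k
    for part in partitions:
        sums, counts = local_stats(part, centroids)
        for j in range(k):
            total_counts[j] += counts[j]
            for i in range(dim):
                total_sums[j][i] += sums[j][i]
    # Assumption of this sketch: an empty cluster keeps its previous centroid
    return [
        [s / total_counts[j] for s in total_sums[j]] if total_counts[j]
        else list(centroids[j])
        for j in range(k)
    ]
```

Because only sums and counts are exchanged, splitting the data across partitions gives the same centroids as the centralised update. The cost is that every partition takes part in the reduction at every iteration, which is the scalability obstacle the abstract describes.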
Abstract:
Satellite-based rainfall monitoring is widely used for climatological studies because of its full global coverage, but it is also of great importance for operational purposes, especially in areas such as Africa where there is a lack of ground-based rainfall data. Satellite rainfall estimates have enormous potential benefits as input to hydrological and agricultural models because of their real-time availability, low cost and full spatial coverage. One issue that needs to be addressed is the uncertainty on these estimates. This is particularly important in assessing the likely errors on the output from non-linear models (rainfall-runoff or crop yield) which make use of the rainfall estimates, aggregated over an area, as input. Correct assessment of the uncertainty on the rainfall is non-trivial as it must take account of:
• the difference in spatial support of the satellite information and independent data used for calibration
• uncertainties on the independent calibration data
• the non-Gaussian distribution of rainfall amount
• the spatial intermittency of rainfall
• the spatial correlation of the rainfall field
This paper describes a method for estimating the uncertainty on satellite-based rainfall values taking account of these factors. The method involves firstly a stochastic calibration which completely describes the probability of rainfall occurrence and the pdf of rainfall amount for a given satellite value, and secondly the generation of an ensemble of rainfall fields based on the stochastic calibration but with the correct spatial correlation structure within each ensemble member. This is achieved by the use of geostatistical sequential simulation. The ensemble generated in this way may be used to estimate uncertainty at larger spatial scales. A case study of daily rainfall monitoring in The Gambia, West Africa, for the purpose of crop yield forecasting is presented to illustrate the method.
Abstract:
A wind-tunnel study was conducted to investigate ventilation of scalars from urban-like geometries at neighbourhood scale by exploring two different geometries: a uniform height roughness and a non-uniform height roughness, both with an equal plan and frontal density of λ_p = λ_f = 25%. In both configurations a sub-unit of the idealized urban surface was coated with a thin layer of naphthalene to represent area sources. The naphthalene sublimation method was used to measure directly total area-averaged transport of scalars out of the complex geometries. At the same time, naphthalene vapour concentrations controlled by the turbulent fluxes were detected using a fast Flame Ionisation Detection (FID) technique. This paper describes the novel use of a naphthalene coated surface as an area source in dispersion studies. Particular emphasis was also given to testing whether the concentration measurements were independent of Reynolds number. For low wind speeds, transfer from the naphthalene surface is determined by a combination of forced and natural convection. Compared with a propane point source release, a 25% higher free stream velocity was needed for the naphthalene area source to yield Reynolds-number-independent concentration fields. Ventilation transfer coefficients w_T/U derived from the naphthalene sublimation method showed that, whilst there was enhanced vertical momentum exchange due to obstacle height variability, advection was reduced and dispersion from the source area was not enhanced. Thus, the height variability of a canopy is an important parameter when generalising urban dispersion. Fine resolution concentration measurements in the canopy showed the effect of height variability on dispersion at street scale. Rapid vertical transport in the wake of individual high-rise obstacles was found to generate elevated point-like sources. A Gaussian plume model was used to analyse differences in the downstream plumes.
Intensified lateral and vertical plume spread and plume dilution with height were found for the non-uniform height roughness.
Abstract:
Ionotropic gamma-amino butyric acid (GABA) receptors composed of heterogeneous molecular subunits are major mediators of inhibitory responses in the adult CNS. Here, we describe a novel ionotropic GABA receptor in mouse cerebellar Purkinje cells (PCs) using agents reported to have increased affinity for rho subunit-containing GABA(C) over other GABA receptors. Exogenous application of the GABA(C)-preferring agonist cis-4-aminocrotonic acid (CACA) evoked whole-cell currents in PCs, whilst equimolar concentrations of GABA evoked larger currents. CACA-evoked currents had a greater sensitivity to the selective GABA(C) antagonist (1,2,5,6-tetrahydropyridin-4-yl)methylphosphinic acid (TPMPA) than GABA-evoked currents. Focal application of agonists produced a differential response profile; CACA-evoked currents displayed a much more pronounced attenuation with increasing distance from the PC soma, displayed a slower time-to-peak and exhibited less desensitization than GABA-evoked currents. However, CACA-evoked currents were also completely blocked by bicuculline, a selective agent for GABA(A) receptors. Thus, we describe a population of ionotropic GABA receptors with a mixed GABA(A)/GABA(C) pharmacology. TPMPA reduced inhibitory synaptic transmission at interneurone-Purkinje cell (IN-PC) synapses, causing clear reductions in miniature inhibitory postsynaptic current (mIPSC) amplitude and frequency. Combined application of NO-711 (a selective GABA transporter subtype 1 (GAT-1) antagonist) and SNAP-5114 (a GAT-(2)/3/4 antagonist) induced a tonic GABA conductance in PCs; however, TPMPA had no effect on this current. Immunohistochemical studies suggest that rho subunits are expressed predominantly in PC soma and proximal dendritic compartments with a lower level of expression in more distal dendrites; this selective immunoreactivity contrasted with a more uniform distribution of GABA(A) alpha 1 subunits in PCs. 
Finally, co-immunoprecipitation studies suggest that rho subunits can form complexes with GABA(A) receptor alpha 1 subunits in the cerebellar cortex. Overall, these data suggest that rho subunits contribute to functional ionotropic receptors that mediate a component of phasic inhibitory GABAergic transmission at IN-PC synapses in the cerebellum.
Abstract:
There are several advantages of using metabolic labeling in quantitative proteomics. The early pooling of samples compared to post-labeling methods eliminates errors from different sample processing, protein extraction and enzymatic digestion. Metabolic labeling is also highly efficient and relatively inexpensive compared to commercial labeling reagents. However, methods for multiplexed quantitation in the MS-domain (or ‘non-isobaric’ methods) suffer from signal dilution at higher degrees of multiplexing, as the MS/MS signal for peptide identification is lower given the same amount of peptide loaded onto the column or injected into the mass spectrometer. This may partly be overcome by mixing the samples at non-uniform ratios, for instance by increasing the fraction of unlabeled proteins. We have developed an algorithm for arbitrary degrees of non-isobaric multiplexing for relative protein abundance measurements. We have used metabolic labeling with different levels of 15N, but the algorithm is in principle applicable to any isotope or combination of isotopes. Ion trap mass spectrometers are fast and suitable for LC-MS/MS and peptide identification. However, they cannot resolve overlapping isotopic envelopes from different peptides, which makes them less suitable for MS-based quantitation. Fourier-transform ion cyclotron resonance (FTICR) mass spectrometry is less suitable for LC-MS/MS, but provides the resolving power required to resolve overlapping isotopic envelopes. We therefore combined ion trap LC-MS/MS for peptide identification with FTICR LC-MS for quantitation using chromatographic alignment. We applied the method in a heat shock study in a plant model system (A. thaliana) and compared the results with gene expression data from similar experiments in the literature.
Abstract:
Sea-level rise is an important aspect of climate change because of its impact on society and ecosystems. Here we present an intercomparison of results from ten coupled atmosphere-ocean general circulation models (AOGCMs) for sea-level changes simulated for the twentieth century and projected to occur during the twenty-first century in experiments following scenario IS92a for greenhouse gases and sulphate aerosols. The model results suggest that the rate of sea-level rise due to thermal expansion of sea water has increased during the twentieth century, but the small set of tide gauges with long records might not be adequate to detect this acceleration. The rate of sea-level rise due to thermal expansion continues to increase throughout the twenty-first century, and the projected total is consequently larger than in the twentieth century; for 1990-2090 it amounts to 0.20-0.37 m. This wide range results from systematic uncertainty in modelling of climate change and of heat uptake by the ocean. The AOGCMs agree that sea-level rise is expected to be geographically non-uniform, with some regions experiencing as much as twice the global average, and others practically zero, but they do not agree about the geographical pattern. The lack of agreement indicates that we cannot currently have confidence in projections of local sea-level changes, and reveals a need for detailed analysis and intercomparison in order to understand and reduce the disagreements.
Abstract:
The problem of calculating the probability of error in a DS/SSMA system has been extensively studied for more than two decades. When random sequences are employed, some conditioning must be done before the application of the central limit theorem is attempted, leading to a Gaussian distribution. The authors seek to characterise the multiple access interference as a random walk with a random number of steps, for random and deterministic sequences. Using results from random-walk theory, they model the interference as a K-distributed random variable and use it to calculate the probability of error in the form of a series, for a DS/SSMA system with a coherent correlation receiver and BPSK modulation under Gaussian noise. The asymptotic properties of the proposed distribution agree with other analyses. This is, to the best of the authors' knowledge, the first attempt to propose a non-Gaussian distribution for the interference. The modelling can be extended to consider multipath fading and general modulation.
Abstract:
A system identification algorithm is introduced for Hammerstein systems that are modelled using a non-uniform rational B-spline (NURB) neural network. The proposed algorithm consists of two successive stages. First the shaping parameters in NURB network are estimated using a particle swarm optimization (PSO) procedure. Then the remaining parameters are estimated by the method of the singular value decomposition (SVD). Numerical examples are utilized to demonstrate the efficacy of the proposed approach.
Abstract:
We propose a new algorithm for summarizing properties of large-scale time-evolving networks. This type of data, recording connections that come and go over time, is being generated in many modern applications, including telecommunications and on-line human social behavior. The algorithm computes a dynamic measure of how well pairs of nodes can communicate by taking account of routes through the network that respect the arrow of time. We take the conventional approach of downweighting for length (messages become corrupted as they are passed along) and add the novel feature of downweighting for age (messages go out of date). This allows us to generalize widely used Katz-style centrality measures that have proved popular in network science to the case of dynamic networks sampled at non-uniform points in time. We illustrate the new approach on synthetic and real data.
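The abstract gives no formulas, so the sketch below shows only one plausible way to combine downweighting for length with downweighting for age over unevenly spaced snapshots. The recurrence, the function name `dynamic_katz`, and the parameters `a` (length penalty) and `b` (age decay rate) are assumptions made for illustration, not the authors' stated algorithm.

```python
import numpy as np


def dynamic_katz(adjacency, times, a, b):
    """Running summary S of time-respecting communicability.

    adjacency: list of (n, n) arrays, one snapshot per observation time.
    times: increasing observation times; spacing may be non-uniform.
    a: penalty per edge traversed (assumed smaller than the reciprocal of
       the spectral radius of every snapshot, so the inverse exists).
    b: decay rate applied to walks as they age between observations.
    """
    n = adjacency[0].shape[0]
    eye = np.eye(n)
    S = np.zeros((n, n))
    prev_t = times[0]
    for A, t in zip(adjacency, times):
        decay = np.exp(-b * (t - prev_t))        # older walks count for less
        resolvent = np.linalg.inv(eye - a * A)   # Katz resolvent of this snapshot
        # Extend existing walks with walks inside the current snapshot
        S = (eye + decay * S) @ resolvent - eye
        prev_t = t
    return S
```

Under these assumptions, entry `S[i, j]` accumulates weighted walks from node `i` to node `j` that use edges in time order. An edge 0→1 followed later by 1→2 contributes to `S[0, 2]`, while the same edges in the opposite order do not. Row and column sums of `S` could then serve as broadcast-style and receive-style centralities.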
Abstract:
We address the problem of automatically identifying and restoring damaged and contaminated images. We suggest a novel approach based on a semi-parametric model. This has two components, a parametric component describing known physical characteristics and a more flexible non-parametric component. The latter avoids the need for a detailed model for the sensor, which is often costly to produce and lacking in robustness. We assess our approach using an analysis of electroencephalographic images contaminated by eye-blink artefacts and highly damaged photographs contaminated by non-uniform lighting. These experiments show that our approach provides an effective solution to problems of this type.
Abstract:
Exascale systems are the next frontier in high-performance computing and are expected to deliver a performance of the order of 10^18 operations per second using massive multicore processors. Very large- and extreme-scale parallel systems pose critical algorithmic challenges, especially related to concurrency, locality and the need to avoid global communication patterns. This work investigates a novel protocol for dynamic group communication that can be used to remove the global communication requirement and to reduce the communication cost in parallel formulations of iterative data mining algorithms. The protocol is used to provide a communication-efficient parallel formulation of the k-means algorithm for cluster analysis. The approach is based on a collective communication operation for dynamic groups of processes and exploits non-uniform data distributions. Non-uniform data distributions can be either found in real-world distributed applications or induced by means of multidimensional binary search trees. The analysis of the proposed dynamic group communication protocol has shown that it does not introduce significant communication overhead. The parallel clustering algorithm has also been extended to accommodate an approximation error, which allows a further reduction of the communication costs. The effectiveness of the exact and approximate methods has been tested in a parallel computing system with 64 processors and in simulations with 1024 processing elements.
Abstract:
We consider a three dimensional system consisting of a large number of small spherical particles, distributed in a range of sizes and heights (with uniform distribution in the horizontal direction). Particles move vertically at a size-dependent terminal velocity. They are either allowed to merge whenever they cross or there is a size ratio criterion enforced to account for collision efficiency. Such a system may be described, in mean field approximation, by the Smoluchowski kinetic equation with a differential sedimentation kernel. We obtain self-similar steady-state and time-dependent solutions to the kinetic equation, using methods borrowed from weak turbulence theory. Analytical results are compared with direct numerical simulations (DNS) of moving and merging particles, and a good agreement is found.
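For orientation, the kinetic equation named in this abstract can be written in its standard spatially homogeneous form. The notation is an assumption, since the abstract states no formulas: n(m, t) is the number density of particles of mass m, K is the collision kernel, r ∝ m^{1/3} is the particle radius and u(r) is the terminal velocity. The Stokes scaling u ∝ r^2 is one common choice and is not confirmed by the abstract.

```latex
% Smoluchowski coagulation equation (standard form; notation assumed)
\[
\frac{\partial n(m,t)}{\partial t}
  = \frac{1}{2}\int_{0}^{m} K(m', m-m')\, n(m',t)\, n(m-m',t)\, \mathrm{d}m'
  - n(m,t)\int_{0}^{\infty} K(m,m')\, n(m',t)\, \mathrm{d}m'
\]
% Differential sedimentation kernel: geometric cross-section times relative speed
\[
K(m_1,m_2) \propto (r_1 + r_2)^{2}\,\bigl|u(r_1) - u(r_2)\bigr|,
\qquad r_i \propto m_i^{1/3},
\qquad u(r) \propto r^{2} \ \text{(Stokes regime, assumed)}
\]
```

The first integral counts mergers that create a particle of mass m, and the second counts mergers that remove one. The kernel vanishes for equal-sized particles, which fall at the same speed and never cross.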
Abstract:
Observational evidence is scarce concerning the distribution of plant pathogen population sizes or densities as a function of time-scale or spatial scale. For wild pathosystems we can only get indirect evidence from evolutionary patterns and the consequences of biological invasions. We have little or no evidence bearing on extermination of hosts by pathogens, or successful escape of a host from a pathogen. Evidence over the last couple of centuries from crops suggests that the abundance of particular pathogens in the spectrum affecting a given host can vary hugely on decadal timescales. However, this may be an artefact of domestication and intensive cultivation. Host-pathogen dynamics can be formulated mathematically fairly easily, for example as SIR-type differential equation or difference equation models, and this has been the (successful) focus of recent work in crops. “Long-term” is then discussed in terms of the time taken to relax from a perturbation to the asymptotic state. However, both host and pathogen dynamics are driven by environmental factors as well as their mutual interactions, and both host and pathogen co-evolve, and evolve in response to external factors. We have virtually no information about the importance and natural role of higher trophic levels (hyperpathogens) and competitors, but they could also induce long-scale fluctuations in the abundance of pathogens on particular hosts. In wild pathosystems the host distribution cannot be modelled as either a uniform density or even a uniform distribution of fields (which could then be treated as individuals). Patterns of short-term density-dependence and the detail of host distribution are therefore critical to long-term dynamics. Host density distributions are not usually scale-free, but are rarely uniform or clearly structured on a single scale.
In a (multiply structured) metapopulation with coevolution and external disturbances it could well be the case that the time required to attain equilibrium (if it exists) based on conditions stable over a specified time-scale is longer than that time-scale. Alternatively, local equilibria may be reached fairly rapidly following perturbations but the metapopulation equilibrium may be attained very slowly. In either case, meta-stability on various time-scales is more relevant than equilibrium concepts in explaining observed patterns.
Abstract:
Numerical simulations are performed to assess the influence of the large-scale circulation on the transition from suppressed to active convection. As a model tool, we used a coupled-column model. It consists of two cloud-resolving models which are fully coupled via a large-scale circulation which is derived from the requirement that the instantaneous domain-mean potential temperature profiles of the two columns remain close to each other. This is known as the weak-temperature gradient approach. The simulations of the transition are initialized from coupled-column simulations over non-uniform surface forcing and the transition is forced within the dry column by changing the local and/or remote surface forcings to uniform surface forcing across the columns. As the strength of the circulation is reduced to zero, moisture is recharged into the dry column and a transition to active convection occurs once the column is sufficiently moistened to sustain deep convection. Direct effects of changing surface forcing occur over the first few days only. Afterward, it is the evolution of the large-scale circulation which systematically modulates the transition. Its contributions are approximately equally divided between the heating and moistening effects. A transition time is defined to summarize the evolution from suppressed to active convection. It is the time when the rain rate within the dry column is halfway to the mean value obtained at equilibrium over uniform surface forcing. The transition time is around twice as long for a transition that is forced remotely compared to a transition that is forced locally. Simulations in which both local and remote surface forcings are changed produce intermediate transition times.
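The transition-time definition in this abstract can be sketched as a small function. The abstract does not say what "halfway" is measured from, so this sketch assumes the initial rain rate of the dry column. The function name `transition_time` and the linear interpolation between samples are also assumptions.

```python
def transition_time(times, rain_rate, equilibrium_mean):
    """Return the first time at which the dry-column rain rate reaches the
    point halfway between its initial value (assumed reference) and the
    equilibrium mean over uniform surface forcing.

    Interpolates linearly between samples; returns None if the halfway
    value is never reached within the series.
    """
    target = rain_rate[0] + 0.5 * (equilibrium_mean - rain_rate[0])
    for i in range(1, len(times)):
        r0, r1 = rain_rate[i - 1], rain_rate[i]
        if r0 < target <= r1:
            # Fraction of the sampling interval at which the target is crossed
            frac = (target - r0) / (r1 - r0)
            return times[i - 1] + frac * (times[i] - times[i - 1])
    return None
```

With daily samples `[0, 1, 3, 8]` at days `[0, 1, 2, 3]` and an equilibrium mean of 8, the halfway value is 4, crossed between day 2 and day 3. Comparing this scalar across experiments is how locally and remotely forced transitions could be ranked.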