73 results for height partition clustering


Relevance:

20.00%

Abstract:

Ensemble clustering (EC) can arise in data assimilation with ensemble square root filters (EnSRFs) using non-linear models: an M-member ensemble splits into a single outlier and a cluster of M−1 members. The stochastic Ensemble Kalman Filter does not present this problem. Modifications to the EnSRFs by a periodic resampling of the ensemble through random rotations have been proposed to address it. We introduce a metric to quantify the presence of EC and present evidence to dispel the notion that EC leads to filter failure. Starting from a univariate model, we show that EC is not a permanent but a transient phenomenon; it occurs intermittently in non-linear models. We perform a series of data assimilation experiments using a standard EnSRF and an EnSRF modified by resampling through random rotations. The modified EnSRF alleviates issues associated with EC at the cost of traceability of individual ensemble trajectories, and it cannot use some of the algorithms that enhance the performance of the standard EnSRF. In the non-linear regimes of low-dimensional models, the analysis root mean square error of the standard EnSRF slowly grows with ensemble size if the size is larger than the dimension of the model state. However, we do not observe this problem in a more complex model that uses an ensemble size much smaller than the dimension of the model state, along with inflation and localisation. Overall, we find that transient EC does not handicap the performance of the standard EnSRF.

Relevance:

20.00%

Abstract:

The K-Means algorithm for cluster analysis is one of the most influential and popular data mining methods. Its straightforward parallel formulation is well suited for distributed memory systems with reliable interconnection networks, such as massively parallel processors and clusters of workstations. However, in large-scale geographically distributed systems the straightforward parallel algorithm can be rendered useless by a single communication failure or high latency in communication paths. The lack of scalable and fault tolerant global communication and synchronisation methods in large-scale systems has hindered the adoption of the K-Means algorithm for applications in large networked systems such as wireless sensor networks, peer-to-peer systems and mobile ad hoc networks. This work proposes a fully distributed K-Means algorithm (Epidemic K-Means) which does not require global communication and is intrinsically fault tolerant. The proposed distributed K-Means algorithm provides a clustering solution which can approximate the solution of an ideal centralised algorithm over the aggregated data as closely as desired. A comparative performance analysis is carried out against the state of the art sampling methods and shows that the proposed method overcomes the limitations of the sampling-based approaches for skewed cluster distributions. The experimental analysis confirms that the proposed algorithm is very accurate and fault tolerant under unreliable network conditions (message loss and node failures) and is suitable for asynchronous networks of very large and extreme scale.

Relevance:

20.00%

Abstract:

Global communication requirements and load imbalance of some parallel data mining algorithms are the major obstacles to exploit the computational power of large-scale systems. This work investigates how non-uniform data distributions can be exploited to remove the global communication requirement and to reduce the communication cost in parallel data mining algorithms and, in particular, in the k-means algorithm for cluster analysis. In the straightforward parallel formulation of the k-means algorithm, data and computation loads are uniformly distributed over the processing nodes. This approach has excellent load balancing characteristics that may suggest it could scale up to large and extreme-scale parallel computing systems. However, at each iteration step the algorithm requires a global reduction operation which hinders the scalability of the approach. This work studies a different parallel formulation of the algorithm where the requirement of global communication is removed, while maintaining the same deterministic nature of the centralised algorithm. The proposed approach exploits a non-uniform data distribution which can be either found in real-world distributed applications or can be induced by means of multi-dimensional binary search trees. The approach can also be extended to accommodate an approximation error which allows a further reduction of the communication costs. The effectiveness of the exact and approximate methods has been tested in a parallel computing system with 64 processors and in simulations with 1024 processing elements.

Relevance:

20.00%

Abstract:

Nest site selection in arboreal, domatia-dwelling ants, particularly those coexisting on a single host plant, is little understood. To examine this phenomenon we studied the African savannah tree Vachellia erioloba, which hosts ants in swollen-thorn domatia. We found four ant species from different genera (Cataulacus intrudens, Tapinoma subtile, Tetraponera ambigua and an unidentified Crematogaster species). In contrast to other African ant plants, many V. erioloba trees (41 % in our survey) were simultaneously co-occupied by more than one ant species. Our study provides quantitative field data describing: (1) aspects of tree and domatia morphology relevant to supporting a community of mutualist ants, (2) how ant species occupancy varies with domatia morphology and (3) how ant colony size varies with domatia size and species. We found that Crematogaster sp. occupy the largest thorns, followed by C. intrudens, with T. subtile in the smallest thorns. Thorn age, as well as nest entrance hole size correlated closely with ant species occupant. These differing occupancy patterns may help to explain the unusual coexistence of three ant species on individual myrmecophytic trees. In all three common ant species, colony size, as measured by total number of ants, increased with domatia size. Additionally, domatia volume and species identity interact to predict ant numbers, suggesting differing responses between species to increased availability of nesting space. The proportion of total ants in nests that were immatures varied with thorn volume and species, highlighting the importance of domatia morphology in influencing colony structure.

Relevance:

20.00%

Abstract:

A perceived limitation of z-coordinate models associated with spurious diapycnal mixing in eddying, frontal flow, can be readily addressed through appropriate attention to the tracer advection schemes employed. It is demonstrated that tracer advection schemes developed by Prather and collaborators for application in the stratosphere greatly improve the fidelity of eddying flows, reducing levels of spurious diapycnal mixing to below those directly measured in field experiments, ∼1 × 10⁻⁵ m² s⁻¹. This approach yields a model in which geostrophic eddies are quasi-adiabatic in the ocean interior, so that the residual-mean overturning circulation aligns almost perfectly with density contours. A reentrant channel configuration of the MIT General Circulation Model that approximates the Antarctic Circumpolar Current is used to examine these issues. Virtual analogs of ocean deliberate tracer release field experiments reinforce our conclusion, producing passive tracer solutions that parallel field experiments remarkably well.

Relevance:

20.00%

Abstract:

In this paper, the concept of available potential energy (APE) density is extended to a multicomponent Boussinesq fluid with a nonlinear equation of state. As shown by previous studies, the APE density is naturally interpreted as the work against buoyancy forces that a parcel needs to perform to move from a notional reference position at which its buoyancy vanishes to its actual position; because buoyancy can be defined relative to an arbitrary reference state, so can APE density. The concept of APE density is therefore best viewed as defining a class of locally defined energy quantities, each tied to a different reference state, rather than as a single energy variable. An important result, for which a new proof is given, is that the volume integrated APE density always exceeds Lorenz’s globally defined APE, except when the reference state coincides with Lorenz’s adiabatically re-arranged reference state of minimum potential energy. A parcel reference position is systematically defined as a level of neutral buoyancy (LNB): depending on the nature of the fluid and on how the reference state is defined, a parcel may have one, none, or multiple LNB within the fluid. Multiple LNB are only possible for a multicomponent fluid whose density depends on pressure. When no LNB exists within the fluid, a parcel reference position is assigned at the minimum or maximum geopotential height. The class of APE densities thus defined admits local and global balance equations, which all exhibit a conversion with kinetic energy, a production term by boundary buoyancy fluxes, and a dissipation term by internal diffusive effects. Different reference states alter the partition between APE production and dissipation, but affect neither the net conversion between kinetic energy and APE nor the difference between APE production and dissipation.
We argue that the possibility of constructing APE-like budgets based on reference states other than Lorenz’s reference state is more important than has been previously assumed, and we illustrate the feasibility of doing so in the context of an idealised and realistic oceanic example, using as reference states one with constant density and another one defined as the horizontal mean density field; in the latter case, the resulting APE density is found to be a reasonable approximation of the APE density constructed from Lorenz’s reference state, while being computationally cheaper.

Relevance:

20.00%

Abstract:

Under particular large-scale atmospheric conditions, several windstorms may affect Europe within a short time period. The occurrence of such cyclone families leads to large socioeconomic impacts and cumulative losses. The serial clustering of windstorms is analyzed for the North Atlantic/western Europe. Clustering is quantified as the dispersion (ratio variance/mean) of cyclone passages over a certain area. Dispersion statistics are derived for three reanalysis data sets and a 20-run European Centre Hamburg Version 5 /Max Planck Institute Version–Ocean Model Version 1 global climate model (ECHAM5/MPI-OM1 GCM) ensemble. The dependence of the seriality on cyclone intensity is analyzed. Confirming previous studies, serial clustering is identified in reanalysis data sets primarily on both flanks and downstream regions of the North Atlantic storm track. This pattern is a robust feature in the reanalysis data sets. For the whole area, extreme cyclones cluster more than nonextreme cyclones. The ECHAM5/MPI-OM1 GCM is generally able to reproduce the spatial patterns of clustering under recent climate conditions, but some biases are identified. Under future climate conditions (A1B scenario), the GCM ensemble indicates that serial clustering may decrease over the North Atlantic storm track area and parts of western Europe. This decrease is associated with an extension of the polar jet toward Europe, which implies a tendency to a more regular occurrence of cyclones over parts of the North Atlantic Basin poleward of 50°N and western Europe. An increase of clustering of cyclones is projected south of Newfoundland. The detected shifts imply a change in the risk of occurrence of cumulative events over Europe under future climate conditions.
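The dispersion measure named in this abstract (the ratio of variance to mean of cyclone passage counts) can be illustrated with a minimal Python sketch. The function name `dispersion_ratio` and the two count series are hypothetical and are not taken from the paper; the sketch also assumes the plain ratio as stated in the abstract, without any offset or significance testing that the authors may apply.

```python
from statistics import mean, pvariance


def dispersion_ratio(counts):
    """Variance-to-mean ratio of event counts per period (e.g. per winter)."""
    m = mean(counts)
    if m == 0:
        raise ValueError("mean count is zero; the ratio is undefined")
    return pvariance(counts) / m


# Hypothetical cyclone counts per period for one grid area (illustrative only).
regular = [2, 2, 2, 2, 2, 2]      # same count every period
clustered = [0, 0, 6, 0, 0, 6]    # same mean, but events arrive in bursts

print(dispersion_ratio(regular))
print(dispersion_ratio(clustered))
```

For a Poisson process the variance equals the mean, so the ratio is 1; values above 1 indicate serial clustering (overdispersion), and values below 1 indicate a more regular occurrence than random.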

Relevance:

20.00%

Abstract:

Global communication requirements and load imbalance of some parallel data mining algorithms are the major obstacles to exploit the computational power of large-scale systems. This work investigates how non-uniform data distributions can be exploited to remove the global communication requirement and to reduce the communication cost in iterative parallel data mining algorithms. In particular, the analysis focuses on one of the most influential and popular data mining methods, the k-means algorithm for cluster analysis. The straightforward parallel formulation of the k-means algorithm requires a global reduction operation at each iteration step, which hinders its scalability. This work studies a different parallel formulation of the algorithm where the requirement of global communication can be relaxed while still providing the exact solution of the centralised k-means algorithm. The proposed approach exploits a non-uniform data distribution which can be either found in real world distributed applications or can be induced by means of multi-dimensional binary search trees. The approach can also be extended to accommodate an approximation error which allows a further reduction of the communication costs.
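To make the global reduction step concrete, the following Python sketch shows one iteration of the straightforward parallel k-means formulation that the abstract describes as the baseline, not the proposed method. The nodes are simulated as plain lists, and the helper names (`local_stats`, `global_reduce`, `kmeans_step`) are hypothetical; in a real system the reduction would be a collective operation across processes.

```python
def local_stats(points, centroids):
    """Per-node work: assign each local point to its nearest centroid and
    accumulate per-cluster coordinate sums and counts."""
    k, dim = len(centroids), len(centroids[0])
    sums = [[0.0] * dim for _ in range(k)]
    counts = [0] * k
    for p in points:
        nearest = min(
            range(k),
            key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])),
        )
        counts[nearest] += 1
        for d in range(dim):
            sums[nearest][d] += p[d]
    return sums, counts


def global_reduce(stats):
    """Element-wise sum over all nodes; stands in for an all-reduce."""
    k, dim = len(stats[0][1]), len(stats[0][0][0])
    total_sums = [[0.0] * dim for _ in range(k)]
    total_counts = [0] * k
    for sums, counts in stats:
        for j in range(k):
            total_counts[j] += counts[j]
            for d in range(dim):
                total_sums[j][d] += sums[j][d]
    return total_sums, total_counts


def kmeans_step(partitions, centroids):
    """One iteration: local statistics on every node, one global reduction,
    then the centroid update. Empty clusters keep their old centroid."""
    sums, counts = global_reduce([local_stats(p, centroids) for p in partitions])
    return [
        [s / n for s in sums[j]] if n else list(centroids[j])
        for j, n in enumerate(counts)
    ]


# Two simulated nodes, each holding part of the data set (illustrative values).
partitions = [[(0.0, 0.0), (1.0, 0.0)], [(10.0, 0.0), (11.0, 0.0)]]
centroids = [(0.0, 0.0), (10.0, 0.0)]
print(kmeans_step(partitions, centroids))
```

Because every node needs the reduced sums before the next iteration can start, this step is the synchronisation point that the abstract identifies as the scalability bottleneck.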

Relevance:

20.00%

Abstract:

This paper presents a hierarchical clustering method for semantic Web service discovery. This method aims to improve the accuracy and efficiency of the traditional service discovery using vector space model. The Web service is converted into a standard vector format through the Web service description document. With the help of WordNet, a semantic analysis is conducted to reduce the dimension of the term vector and to make semantic expansion to meet the user’s service request. The process and algorithm of hierarchical clustering based semantic Web service discovery is discussed. Validation is carried out on the dataset.
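As a rough illustration of hierarchical clustering over service term vectors, the Python sketch below merges vectors bottom-up until a requested number of clusters remains. The abstract does not state the distance measure or linkage rule, so cosine distance and average linkage are assumptions here, and the function names and the toy vectors are hypothetical. The sketch also assumes that no vector is all zeros.

```python
import math


def cosine_distance(u, v):
    """1 - cosine similarity; assumes neither vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def agglomerate(vectors, num_clusters):
    """Bottom-up clustering with average linkage (an assumed choice).
    Returns clusters as lists of indices into `vectors`."""
    clusters = [[i] for i in range(len(vectors))]

    def linkage(a, b):
        total = sum(cosine_distance(vectors[i], vectors[j]) for i in a for j in b)
        return total / (len(a) * len(b))

    while len(clusters) > num_clusters:
        # Find the closest pair of clusters and merge them.
        i, j = min(
            ((i, j) for i in range(len(clusters)) for j in range(i + 1, len(clusters))),
            key=lambda ij: linkage(clusters[ij[0]], clusters[ij[1]]),
        )
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters


# Toy term vectors for four service descriptions (illustrative weights only).
vectors = [[1.0, 0.0, 0.0], [0.9, 0.1, 0.0], [0.0, 1.0, 0.0], [0.0, 0.9, 0.1]]
print(agglomerate(vectors, 2))
```

A service request could then be compared against cluster representatives first, and only against members of the closest cluster afterwards, which is one way clustering can reduce the number of comparisons during discovery.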

Relevance:

20.00%

Abstract:

Cognitive experiments involving motor execution (ME) and motor imagery (MI) have been intensively studied using functional magnetic resonance imaging (fMRI). However, the functional networks of a multitask paradigm which includes ME and MI have not been widely explored. In this article, we aimed to investigate the functional networks involved in MI and ME using a method combining hierarchical clustering analysis (HCA) and independent component analysis (ICA). Ten right-handed subjects were recruited to participate in a multitask experiment with conditions such as visual cue, MI, ME and rest. The results showed that four activation clusters were found, including parts of the visual network, the ME network, the MI network and parts of the resting state network. Furthermore, the integration among these functional networks was also revealed. The findings further demonstrated that the combined HCA and ICA approach was an effective method to analyze the fMRI data of multitask experiments.

Relevance:

20.00%

Abstract:

During the last decades, several windstorm series hit Europe leading to large aggregated losses. Such storm series are examples of serial clustering of extreme cyclones, presenting a considerable risk for the insurance industry. Clustering of events and return periods of storm series for Germany are quantified based on potential losses using empirical models. Two reanalysis data sets and observations from German weather stations are considered for 30 winters. Histograms of events exceeding selected return levels (1-, 2- and 5-year) are derived. Return periods of historical storm series are estimated based on the Poisson and the negative binomial distributions. Over 4000 years of general circulation model (GCM) simulations forced with current climate conditions are analysed to provide a better assessment of historical return periods. Estimations differ between distributions, for example 40 to 65 years for the 1990 series. For such less frequent series, estimates obtained with the Poisson distribution clearly deviate from empirical data. The negative binomial distribution provides better estimates, even though a sensitivity to return level and data set is identified. The consideration of GCM data permits a strong reduction of uncertainties. The present results support the importance of considering explicitly clustering of losses for an adequate risk assessment for economical applications.
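The contrast between the two distributions can be sketched in Python as follows. This is only an illustration of why an overdispersed count model shortens the estimated return period of a multi-event winter; it is not the loss-based procedure of the paper. The parameterisation by mean and variance, the function names, and the numbers (mean 1 event per winter, variance 2) are all assumptions chosen for the example.

```python
import math


def poisson_tail(n, mean):
    """P(N >= n) for a Poisson count with the given mean."""
    below = sum(math.exp(-mean) * mean ** k / math.factorial(k) for k in range(n))
    return 1.0 - below


def negbin_tail(n, mean, variance):
    """P(N >= n) for a negative binomial count with the given mean and
    variance; requires variance > mean (overdispersion)."""
    if variance <= mean:
        raise ValueError("negative binomial needs variance > mean")
    r = mean ** 2 / (variance - mean)   # shape parameter
    p = r / (r + mean)                  # success probability

    def pmf(k):
        log_pmf = (
            math.lgamma(k + r) - math.lgamma(k + 1) - math.lgamma(r)
            + r * math.log(p) + k * math.log(1.0 - p)
        )
        return math.exp(log_pmf)

    return 1.0 - sum(pmf(k) for k in range(n))


def return_period(tail_probability):
    """Average number of winters between winters that reach the threshold."""
    return 1.0 / tail_probability


# Illustrative values only: on average 1 event per winter, threshold of 3 events.
print(return_period(poisson_tail(3, 1.0)))
print(return_period(negbin_tail(3, 1.0, 2.0)))
```

With the same mean, the negative binomial puts more probability on winters with many events, so the estimated return period of a three-event winter is shorter than under the Poisson assumption.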

Relevance:

20.00%

Abstract:

From Milsom's equations, which describe the geometry of ray-path hops reflected from the ionospheric F-layer, algorithms for the simplified estimation of mirror-reflection height are developed. These allow for hop length and the effects of variations in underlying ionisation (via the ratio of the F2- and E-layer critical frequencies) and F2-layer peak height (via the M(3000)F2-factor). Separate algorithms are presented which are applicable to a range of signal frequencies about the FOT and to propagation at the MUF. The accuracies and complexities of the algorithms are compared with those inherent in the use of a procedure based on an equation developed by Shimazaki.

Relevance:

20.00%

Abstract:

Some recent winters in Western Europe have been characterized by the occurrence of multiple extratropical cyclones following a similar path. The occurrence of such cyclone clusters leads to large socio-economic impacts due to damaging winds, storm surges, and floods. Recent studies have statistically characterized the clustering of extratropical cyclones over the North Atlantic and Europe and hypothesized potential physical mechanisms responsible for their formation. Here we analyze 4 months characterized by multiple cyclones over Western Europe (February 1990, January 1993, December 1999, and January 2007). The evolution of the eddy driven jet stream, Rossby wave-breaking, and upstream/downstream cyclone development are investigated to infer the role of the large-scale flow and to determine if clustered cyclones are related to each other. Results suggest that optimal conditions for the occurrence of cyclone clusters are provided by a recurrent extension of an intensified eddy driven jet toward Western Europe lasting at least 1 week. Multiple Rossby wave-breaking occurrences on both the poleward and equatorward flanks of the jet contribute to the development of these anomalous large-scale conditions. The analysis of the daily weather charts reveals that upstream cyclone development (secondary cyclogenesis, where new cyclones are generated on the trailing fronts of mature cyclones) is strongly related to cyclone clustering, with multiple cyclones developing on a single jet streak. The present analysis permits a deeper understanding of the physical reasons leading to the occurrence of cyclone families over the North Atlantic, enabling a better estimation of the associated cumulative risk over Europe.

Relevance:

20.00%

Abstract:

A Canopy Height Profile (CHP) procedure presented in Harding et al. (2001) for large footprint LiDAR data was tested in a closed canopy environment as a way of extracting vertical foliage profiles from LiDAR raw-waveform. In this study, an adaptation of this method to small-footprint data has been shown, tested and validated in an Australian sparse canopy forest at plot- and site-level. Further, the methodology itself has been enhanced by implementing a dataset-adjusted reflectance ratio calculation according to Armston et al. (2013) in the processing chain, and tested against a fixed ratio of 0.5 estimated for the laser wavelength of 1550 nm. As a by-product of the methodology, effective leaf area index (LAIe) estimates were derived and compared to hemispherical photography-derived values. To assess the influence of LiDAR aggregation area size on the estimates in a sparse canopy environment, LiDAR CHPs and LAIes were generated by aggregating waveforms to plot- and site-level footprints (plot/site-aggregated) as well as in 5 m grids (grid-processed). LiDAR profiles were then compared to leaf biomass field profiles generated based on field tree measurements. The correlation between field and LiDAR profiles was very high, with a mean R² of 0.75 at plot-level and 0.86 at site-level for 55 plots and the corresponding 11 sites. Gridding had almost no impact on the correlation between LiDAR and field profiles (only marginal improvement), nor did the dataset-adjusted reflectance ratio. However, gridding and the dataset-adjusted reflectance ratio were found to improve the correlation between raw-waveform LiDAR and hemispherical photography LAIe estimates, yielding the highest correlations of 0.61 at plot-level and of 0.83 at site-level. This proved the validity of the approach and the superiority of the dataset-adjusted reflectance ratio of Armston et al. (2013) over a fixed ratio of 0.5 for LAIe estimation, and showed the adequacy of small-footprint LiDAR data for LAIe estimation in discontinuous canopy forests.