155 results for cluster algorithms


Relevance: 20.00%

Publisher:

Abstract:

Data assimilation algorithms are a crucial part of operational systems in numerical weather prediction, hydrology and climate science, but are also important for dynamical reconstruction in medical applications and quality control for manufacturing processes. Usually, a variety of diverse measurement data are employed to determine the state of the atmosphere or of a wider system including land and oceans. Modern data assimilation systems use more and more remote sensing data, in particular radiances measured by satellites, radar data and integrated water vapor measurements via GPS/GNSS signals. The inversion of some of these measurements is ill-posed in the classical sense, i.e. the inverse of the operator H which maps the state onto the data is unbounded. In this case, the use of such data can lead to significant instabilities of data assimilation algorithms. The goal of this work is to provide a rigorous mathematical analysis of the instability of well-known data assimilation methods. Here, we restrict our attention to particular linear systems in which the instability can be explicitly analyzed. We investigate three-dimensional and four-dimensional variational assimilation. A theory for the instability is developed using the classical theory of ill-posed problems in a Banach space framework. Further, we demonstrate by numerical examples that instabilities can and will occur, including an example from dynamic magnetic tomography.
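
A minimal numerical sketch of the instability the abstract describes, assuming a toy linear system: the observation operator H below is a row-normalised Gaussian smoothing matrix (an assumption standing in for an ill-posed operator), and the 3D-Var analysis is computed for progressively smaller assumed observation-error variances. All sizes and values are placeholders, not the paper's examples.

```python
import numpy as np

# Toy illustration: a smoothing observation operator H is nearly singular, so
# trusting the data too much (small assumed observation error R) amplifies
# noise in the 3D-Var analysis.
rng = np.random.default_rng(0)
n = 100
x_true = np.sin(np.linspace(0, 2 * np.pi, n))          # "true" state
i, j = np.meshgrid(np.arange(n), np.arange(n), indexing="ij")
H = np.exp(-0.5 * ((i - j) / 3.0) ** 2)                 # smoothing operator
H /= H.sum(axis=1, keepdims=True)                       # ill-conditioned

y = H @ x_true + 0.01 * rng.standard_normal(n)          # noisy observations
x_b = np.zeros(n)                                       # background state
B = np.eye(n)                                           # background covariance

for r in (1e-1, 1e-3, 1e-6):                            # assumed obs-error variance
    R = r * np.eye(n)
    K = B @ H.T @ np.linalg.inv(H @ B @ H.T + R)        # gain matrix
    x_a = x_b + K @ (y - H @ x_b)                       # 3D-Var analysis
    print(f"R={r:.0e}  analysis RMSE={np.sqrt(np.mean((x_a - x_true)**2)):.3f}")
```

As the assumed observation-error variance shrinks, the analysis error grows sharply, which is the instability mechanism analysed in the paper.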

Relevance: 20.00%

Publisher:

Abstract:

Generally, classifiers tend to overfit if there is noise in the training data or there are missing values. Ensemble learning methods are often used to improve a classifier's classification accuracy. Most ensemble learning approaches aim to improve the classification accuracy of decision trees. However, alternative classifiers to decision trees exist. The recently developed Random Prism ensemble learner for classification aims to improve an alternative classification rule induction approach, the Prism family of algorithms, which addresses some of the limitations of decision trees. However, Random Prism, like any ensemble learner, suffers from a high computational overhead due to replication of the data and the induction of multiple base classifiers. Hence even modest-sized datasets may pose a computational challenge to ensemble learners such as Random Prism. Parallelism is often used to scale up algorithms to deal with large datasets. This paper investigates parallelisation for Random Prism, implements a prototype and evaluates it empirically using a Hadoop computing cluster.
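
A hedged sketch of the parallelisation pattern the paper investigates, not its Hadoop implementation: base classifiers are induced independently on bootstrap samples by a pool of workers and combined by majority vote. The function induce_base_classifier and its toy one-rule learner are hypothetical stand-ins for a PrismTCS-style base classifier.

```python
from multiprocessing import Pool
import numpy as np

def induce_base_classifier(args):
    X, y, seed = args
    rng = np.random.default_rng(seed)
    idx = rng.integers(0, len(y), len(y))          # bagging: sample with replacement
    Xb, yb = X[idx], y[idx]
    f = rng.integers(0, X.shape[1])                # random feature, Random Prism style
    t = float(np.median(Xb[:, f]))                 # toy rule: threshold at the median
    below, above = yb[Xb[:, f] <= t], yb[Xb[:, f] > t]
    return (f, t,
            int(np.bincount(below).argmax()) if below.size else 0,
            int(np.bincount(above).argmax()) if above.size else 0)

def ensemble_predict(models, x):
    # majority vote over the base classifiers
    votes = [lo if x[f] <= t else hi for (f, t, lo, hi) in models]
    return int(np.bincount(votes).argmax())

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.random((1000, 5))
    y = (X[:, 0] > 0.5).astype(int)                # synthetic labels
    with Pool(4) as pool:                          # 4 parallel workers ("mappers")
        models = pool.map(induce_base_classifier, [(X, y, s) for s in range(16)])
    print("prediction for X[0]:", ensemble_predict(models, X[0]), "true:", y[0])
```

The point of the sketch is the structure: inducing the 16 base classifiers is embarrassingly parallel, which is what makes a MapReduce-style deployment on a Hadoop cluster attractive.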

Relevance: 20.00%

Publisher:

Abstract:

High spatial resolution environmental data give us a better understanding of the environmental factors affecting plant distributions at fine spatial scales. However, large environmental datasets dramatically increase compute times and the size of output species models, stimulating the need for an alternative computing solution. Cluster computing offers such a solution by allowing both multiple plant species Environmental Niche Models (ENMs) and individual tiles of high spatial resolution models to be computed concurrently on the same compute cluster. We apply our methodology to a case study of 4,209 species of Mediterranean flora (around 17% of the species believed present in the biome). We demonstrate a 16-times speed-up of ENM computation time when 16 CPUs were used on the compute cluster. Our custom Java ‘Merge’ and ‘Downsize’ programs reduce ENM output file sizes by 94%. A median test AUC score of 0.98 across species ENMs is achieved, aided by various species occurrence data filtering techniques. Finally, by calculating the percentage change of individual grid cell values, we map the projected percentages of plant species vulnerable to climate change in the Mediterranean region between 1950–2000 and 2020.
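
The final step of the abstract, mapping the percentage change of individual grid cell values, can be illustrated with a few lines of array arithmetic. The array shapes, the random data and the 30% vulnerability threshold below are assumptions for illustration only, not the study's data or criteria.

```python
import numpy as np

# Per-grid-cell percentage change in predicted habitat suitability between a
# baseline and a future projection (synthetic placeholder rasters).
baseline = np.random.rand(100, 100)            # e.g. 1950-2000 ENM suitability
future = np.random.rand(100, 100)              # e.g. 2020 projection

with np.errstate(divide="ignore", invalid="ignore"):
    pct_change = 100.0 * (future - baseline) / baseline
pct_change = np.where(baseline > 0, pct_change, np.nan)   # guard empty cells

# cells losing more than an assumed 30% of suitability are flagged as vulnerable
vulnerable = pct_change < -30.0
print(f"{np.nanmean(vulnerable) * 100:.1f}% of cells flagged vulnerable")
```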

Relevance: 20.00%

Publisher:

Abstract:

To assist in comparing the computational techniques used in different models, the authors propose a standardized set of one-dimensional numerical experiments that could be completed for each model. The results of these experiments, together with a simplified description of the computational representation of advection, diffusion, the pressure gradient term, the Coriolis term, and the filter used in the models, should be reported in the peer-reviewed literature. Specific recommendations are described in this paper.
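
As a concrete example of the kind of one-dimensional experiment being proposed, the sketch below advects a Gaussian tracer pulse with a first-order upwind scheme and reports peak damping and mass conservation. The scheme and all parameter values are illustrative choices, not the paper's specific recommendations.

```python
import numpy as np

nx, dt, dx, u = 200, 0.4, 1.0, 1.0                # grid size, time step, spacing, wind
c = u * dt / dx                                    # Courant number (stable for c <= 1)
x = np.arange(nx)
q0 = np.exp(-0.5 * ((x - 50) / 5.0) ** 2)          # initial Gaussian tracer pulse
q = q0.copy()

for _ in range(250):                               # advect the pulse 100 grid cells
    q = q - c * (q - np.roll(q, 1))                # periodic first-order upwind update

print(f"peak: initial {q0.max():.3f}, final {q.max():.3f} (damping = numerical diffusion)")
print(f"total mass change: {q.sum() - q0.sum():.2e} (upwind conserves mass)")
```

Reporting simple diagnostics such as peak damping and mass conservation for a standard test like this is exactly the sort of comparable result the proposal envisages.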

Relevance: 20.00%

Publisher:

Abstract:

We discuss the modeling of dielectric responses for an electromagnetically excited network of capacitors and resistors using a systems identification framework. Standard models that assume integer-order dynamics are augmented to incorporate fractional-order dynamics. This enables us to relate the modeled responses more faithfully to those reported in the dielectrics literature.
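
A small sketch of what augmenting an integer-order model with fractional-order dynamics can look like in the frequency domain, assuming a single resistor in series with a constant phase element; the component values and the exponent alpha are illustrative assumptions, not fitted values from the paper.

```python
import numpy as np

# Fractional-order "constant phase element" in series with a resistor,
# compared with an ordinary (integer-order) capacitor.
def impedance(freq_hz, R=100.0, C=1e-6, alpha=0.8):
    s = 1j * 2 * np.pi * freq_hz
    return R + 1.0 / (C * s ** alpha)              # alpha = 1 recovers integer order

for f in np.logspace(1, 5, 5):
    z_frac = impedance(f, alpha=0.8)
    z_int = impedance(f, alpha=1.0)
    print(f"{f:8.0f} Hz  |Z| fractional = {abs(z_frac):10.1f}  integer = {abs(z_int):10.1f}")
```

The shallower fall-off of the fractional-order impedance with frequency is the kind of behaviour that motivates such augmented models.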

Relevance: 20.00%

Publisher:

Abstract:

With the fast development of the Internet, wireless communications and semiconductor devices, home networking has received significant attention. Consumer products can collect and transmit various types of data in the home environment. Typical consumer sensors are often equipped with tiny, irreplaceable batteries, and it is therefore of the utmost importance to design energy-efficient algorithms to prolong the home network lifetime and reduce the number of devices going to landfill. Sink mobility is an important technique for improving home network performance, including energy consumption, lifetime and end-to-end delay; it can also largely mitigate the hot spots near the sink node. The selection of an optimal moving trajectory for the sink node(s) is an NP-hard problem, and jointly optimizing routing algorithms with the mobile sink's moving strategy is a significant and challenging research issue. The influence of multiple static sink nodes on energy consumption in networks of different scales is first studied, and an Energy-efficient Multi-sink Clustering Algorithm (EMCA) is proposed and tested. Then, the influence of mobile sink velocity, position and number on network performance is studied and a Mobile-sink based Energy-efficient Clustering Algorithm (MECA) is proposed. Simulation results validate the performance of the two proposed algorithms, which can be deployed in a consumer home network environment.
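
For orientation, the sketch below shows a generic energy-aware cluster-head selection step of the kind such clustering algorithms build on. It is a simplified, LEACH-style heuristic with assumed parameters; it is not the EMCA or MECA algorithm itself, whose details are not given in the abstract.

```python
import numpy as np

rng = np.random.default_rng(1)
n_nodes, p_head = 50, 0.1
positions = rng.uniform(0, 100, (n_nodes, 2))     # sensor coordinates in a 100 m x 100 m home
energy = rng.uniform(0.5, 1.0, n_nodes)           # residual battery energy (assumed, in J)

# nodes with more residual energy are proportionally more likely to become heads
probs = p_head * energy / energy.mean()
heads = np.flatnonzero(rng.random(n_nodes) < probs)
if heads.size == 0:                               # fallback: highest-energy node
    heads = np.array([int(np.argmax(energy))])

# every other node joins its nearest cluster head (single-hop clustering)
members = {int(h): [] for h in heads}
for i in range(n_nodes):
    if i in heads:
        continue
    nearest = heads[np.argmin(np.linalg.norm(positions[heads] - positions[i], axis=1))]
    members[int(nearest)].append(i)

print("cluster heads:", heads.tolist())
print("cluster sizes:", {h: len(m) for h, m in members.items()})
```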

Relevance: 20.00%

Publisher:

Abstract:

Global communication requirements and load imbalance of some parallel data mining algorithms are the major obstacles to exploiting the computational power of large-scale systems. This work investigates how non-uniform data distributions can be exploited to remove the global communication requirement and to reduce the communication cost in parallel data mining algorithms and, in particular, in the k-means algorithm for cluster analysis. In the straightforward parallel formulation of the k-means algorithm, data and computation loads are uniformly distributed over the processing nodes. This approach has excellent load balancing characteristics that may suggest it could scale up to large and extreme-scale parallel computing systems. However, at each iteration step the algorithm requires a global reduction operation, which hinders the scalability of the approach. This work studies a different parallel formulation of the algorithm in which the requirement of global communication is removed, while maintaining the same deterministic nature as the centralised algorithm. The proposed approach exploits a non-uniform data distribution which can either be found in real-world distributed applications or be induced by means of multi-dimensional binary search trees. The approach can also be extended to accommodate an approximation error which allows a further reduction of the communication costs. The effectiveness of the exact and approximate methods has been tested in a parallel computing system with 64 processors and in simulations with 1024 processing elements.
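
The "straightforward parallel formulation" referred to above can be made concrete with a short simulation: each (virtual) processing node computes local cluster sums and counts, and a global reduction combines them at every iteration. The data sizes and node count below are arbitrary placeholders.

```python
import numpy as np

rng = np.random.default_rng(0)
data = rng.standard_normal((10_000, 2))
k, n_nodes = 4, 8
parts = np.array_split(data, n_nodes)              # uniform data distribution
centroids = data[rng.choice(len(data), k, replace=False)]

for _ in range(10):
    sums = np.zeros((k, 2))
    counts = np.zeros(k)
    for local in parts:                            # work done independently per node
        labels = np.argmin(((local[:, None] - centroids) ** 2).sum(-1), axis=1)
        for c in range(k):
            sums[c] += local[labels == c].sum(axis=0)
            counts[c] += (labels == c).sum()
    # global reduction: combine all local sums/counts to update the centroids;
    # this per-iteration step is what the proposed formulation removes
    centroids = sums / np.maximum(counts, 1)[:, None]

print(centroids)
```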

Relevance: 20.00%

Publisher:

Abstract:

Objective: To undertake a process evaluation of pharmacists' recommendations arising in the context of a complex, IT-enabled, pharmacist-delivered randomised controlled trial (PINCER trial) to reduce the risk of hazardous medicines management in general practices. Methods: PINCER pharmacists manually recorded patients’ demographics, details of interventions recommended, actions undertaken by practice staff and time taken to manage individual cases of hazardous medicines management. Data were coded and double-entered into SPSS v15, and then summarised using percentages for categorical data (with 95% CI) and, as appropriate, means (SD) or medians (IQR) for continuous data. Key findings: Pharmacists spent a median of 20 minutes (IQR 10, 30) reviewing medical records, recommending interventions and completing actions in each case of hazardous medicines management. Pharmacists judged 72% (95% CI 70, 74) (1463/2026) of cases of hazardous medicines management to be clinically relevant. Pharmacists recommended 2105 interventions in 74% (95% CI 73, 76) (1516/2038) of cases, and 1685 actions were taken in 61% (95% CI 59, 63) (1246/2038) of cases; 66% (95% CI 64, 68) (1383/2105) of the interventions recommended by pharmacists were completed and 5% (95% CI 4, 6) (104/2105) of recommendations were accepted by general practitioners (GPs) but not completed at the end of the pharmacists’ placement; the remaining recommendations were rejected or considered not relevant by GPs. Conclusions: The outcome measures were used to target pharmacist activity in general practice towards patients at risk from hazardous medicines management. Recommendations from trained PINCER pharmacists were found to be broadly acceptable to GPs and led to ameliorative action in the majority of cases. It seems likely that the approach used by the PINCER pharmacists could be employed by other practice pharmacists following appropriate training.
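
For readers reproducing the reported figures, the sketch below shows how a percentage with a 95% confidence interval of the form "72% (95% CI 70, 74) (1463/2026)" can be computed with a normal-approximation interval for a proportion; the abstract does not state which interval method was actually used, so this choice is an assumption.

```python
import math

def proportion_ci(successes, n, z=1.96):
    # normal-approximation (Wald) confidence interval for a proportion
    p = successes / n
    half_width = z * math.sqrt(p * (1 - p) / n)
    return p, p - half_width, p + half_width

p, lo, hi = proportion_ci(1463, 2026)
print(f"{p:.0%} (95% CI {lo:.0%}, {hi:.0%})")   # -> 72% (95% CI 70%, 74%)
```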

Relevance: 20.00%

Publisher:

Abstract:

The variability of results from different automated methods of detection and tracking of extratropical cyclones is assessed in order to identify uncertainties related to the choice of method. Fifteen international teams applied their own algorithms to the same dataset: the 1989–2009 period of the European Centre for Medium-Range Weather Forecasts (ECMWF) interim reanalysis (ERA-Interim). This experiment is part of the community project Intercomparison of Mid Latitude Storm Diagnostics (IMILAST; see www.proclim.ch/imilast/index.html). The spread of results for cyclone frequency, intensity, life cycle, and track location is presented to illustrate the impact of using different methods. Globally, the methods agree well on the geographical distribution in large oceanic regions, the interannual variability of cyclone numbers, the geographical patterns of strong trends, and the distribution shape of many life cycle characteristics. In contrast, the largest disparities exist for the total numbers of cyclones, the detection of weak cyclones, and the distribution in some densely populated regions. Consistency between methods is better for strong cyclones than for shallow ones. Two case studies of relatively large, intense cyclones reveal that the identification of the most intense part of the life cycle of these events is robust between methods, but considerable differences exist during the development and dissolution phases.
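
To fix ideas about what an automated detection method does, the sketch below applies a deliberately minimal criterion (local minima of mean sea-level pressure below a threshold) to a synthetic field. None of the fifteen IMILAST methods is this simple; the neighbourhood size and threshold are arbitrary assumptions.

```python
import numpy as np
from scipy.ndimage import minimum_filter

rng = np.random.default_rng(0)
mslp = 101300.0 + 800.0 * rng.standard_normal((90, 180))   # synthetic MSLP field (Pa)

local_min = mslp == minimum_filter(mslp, size=5)           # 5x5 neighbourhood minima
candidates = np.argwhere(local_min & (mslp < 100000.0))    # deeper than 1000 hPa
print(f"{len(candidates)} candidate cyclone centres detected")
```

Differences in exactly such choices (field, neighbourhood, thresholds, tracking rules) are what drive the spread between methods quantified in the intercomparison.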

Relevance: 20.00%

Publisher:

Abstract:

Boreal winter wind storm situations over Central Europe are investigated by means of an objective cluster analysis. Surface data from the NCEP-Reanalysis and from an ECHAM4/OPYC3 climate change GHG simulation (IS92a) are considered. To achieve an optimum separation of clusters of extreme storm conditions, 55 clusters of weather patterns are differentiated. To reduce the computational effort, a PCA is initially performed, leading to a data reduction of about 98%. The clustering itself was computed on 3-day periods constructed from the first six PCs using the k-means clustering algorithm. The applied method enables an evaluation of the time evolution of the synoptic developments. The climate change signal is constructed by a projection of the GCM simulation onto the EOFs obtained from the NCEP-Reanalysis. Consequently, the same clusters are obtained and frequency distributions can be compared. For Central Europe, four primary storm clusters are identified. These clusters account for almost 72% of the historical extreme storm events while contributing only 5% of the total relative frequency. Moreover, they show a statistically significant signature in the associated wind fields over Europe. An increased frequency of the Central European storm clusters is detected under enhanced GHG conditions, associated with an enhancement of the pressure gradient over Central Europe. Consequently, more intense wind events over Central Europe are expected. The presented algorithm will be highly valuable for the analysis of large data volumes, as required for, e.g., multi-model ensemble analysis, particularly because of the enormous data reduction.
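
The PCA-plus-k-means pipeline described above can be sketched in a few lines; the synthetic array below stands in for the gridded surface data, and the construction of 3-day feature vectors is simplified to clustering the PC scores of each sample directly.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
fields = rng.standard_normal((5000, 2000))        # samples x grid points (placeholder)

pca = PCA(n_components=6)                         # drastic data reduction, as in the study
scores = pca.fit_transform(fields)                # leading six principal components

labels = KMeans(n_clusters=55, n_init=10, random_state=0).fit_predict(scores)
freq = np.bincount(labels) / len(labels)          # relative frequency of each cluster
print("largest cluster relative frequency:", round(float(freq.max()), 4))
```

In this sketch, projecting the GCM simulation onto the same EOFs would correspond to passing the GCM fields through pca.transform, so that identical clusters apply to both datasets and their frequency distributions can be compared.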

Relevance: 20.00%

Publisher:

Abstract:

We present an efficient graph-based algorithm for quantifying the similarity of household-level energy use profiles, using a notion of similarity that allows for small time-shifts when comparing profiles. Experimental results on a real smart meter data set demonstrate that in cases of practical interest our technique is far faster than the existing method for computing the same similarity measure. Having a fast algorithm for measuring profile similarity improves the efficiency of tasks such as clustering of customers and cross-validation of forecasting methods using historical data. Furthermore, we apply a generalisation of our algorithm to produce substantially better household-level energy use forecasts from historical smart meter data.
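
The notion of shift-tolerant similarity can be illustrated in its naive form: compare two profiles under small circular time-shifts and keep the best alignment. This brute-force version is only a stand-in to fix ideas; the paper's graph-based algorithm computes its measure far more efficiently and may differ in detail. The half-hourly resolution and shift window below are assumptions.

```python
import numpy as np

def shift_tolerant_distance(a, b, max_shift=2):
    # half-hourly profiles, so max_shift=2 allows alignment within +/- 1 hour
    return min(np.abs(a - np.roll(b, s)).sum() for s in range(-max_shift, max_shift + 1))

day1 = np.random.rand(48)                              # 48 half-hourly readings (kWh)
day2 = np.roll(day1, 1) + 0.01 * np.random.rand(48)    # same shape, shifted by 30 minutes
print("plain L1 distance:      ", np.abs(day1 - day2).sum())
print("shift-tolerant distance:", shift_tolerant_distance(day1, day2))
```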

Relevance: 20.00%

Publisher:

Abstract:

Exascale systems are the next frontier in high-performance computing and are expected to deliver a performance of the order of 10^18 operations per second using massive multicore processors. Very large- and extreme-scale parallel systems pose critical algorithmic challenges, especially related to concurrency, locality and the need to avoid global communication patterns. This work investigates a novel protocol for dynamic group communication that can be used to remove the global communication requirement and to reduce the communication cost in parallel formulations of iterative data mining algorithms. The protocol is used to provide a communication-efficient parallel formulation of the k-means algorithm for cluster analysis. The approach is based on a collective communication operation for dynamic groups of processes and exploits non-uniform data distributions. Non-uniform data distributions can be either found in real-world distributed applications or induced by means of multidimensional binary search trees. The analysis of the proposed dynamic group communication protocol has shown that it does not introduce significant communication overhead. The parallel clustering algorithm has also been extended to accommodate an approximation error, which allows a further reduction of the communication costs. The effectiveness of the exact and approximate methods has been tested in a parallel computing system with 64 processors and in simulations with 1024 processing elements.
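
One ingredient mentioned above, inducing a non-uniform data distribution with multidimensional binary search trees, can be sketched directly: median splits assign each process a compact region of the data space, which is what allows most centroid updates to involve only a small, dynamic group of neighbouring processes. The partitioning below illustrates that idea only; it is not the communication protocol itself.

```python
import numpy as np

def kd_partition(points, n_parts, depth=0):
    # recursive median splits (n_parts must be a power of two in this sketch)
    if n_parts == 1:
        return [points]
    axis = depth % points.shape[1]                     # cycle through dimensions
    order = np.argsort(points[:, axis])
    half = len(points) // 2
    left, right = points[order[:half]], points[order[half:]]
    return (kd_partition(left, n_parts // 2, depth + 1) +
            kd_partition(right, n_parts // 2, depth + 1))

rng = np.random.default_rng(0)
data = rng.standard_normal((8192, 2))
partitions = kd_partition(data, 8)                     # data for 8 processes
print([len(p) for p in partitions])                    # balanced, spatially compact blocks
```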