791 resultados para cluster algorithms
Resumo:
This study examines the numerical accuracy, computational cost, and memory requirements of self-consistent field theory (SCFT) calculations when the diffusion equations are solved with various pseudo-spectral methods and the mean field equations are iterated with Anderson mixing. The different methods are tested on the triply-periodic gyroid and spherical phases of a diblock-copolymer melt over a range of intermediate segregations. Anderson mixing is found to be somewhat less effective than when combined with the full-spectral method, but it nevertheless functions admirably well provided that a large number of histories is used. Of the different pseudo-spectral algorithms, the 4th-order one of Ranjan, Qin and Morse performs best, although not quite as efficiently as the full-spectral method.
Resumo:
Background: Medication errors are common in primary care and are associated with considerable risk of patient harm. We tested whether a pharmacist-led, information technology-based intervention was more effective than simple feedback in reducing the number of patients at risk of measures related to hazardous prescribing and inadequate blood-test monitoring of medicines 6 months after the intervention. Methods: In this pragmatic, cluster randomised trial general practices in the UK were stratified by research site and list size, and randomly assigned by a web-based randomisation service in block sizes of two or four to one of two groups. The practices were allocated to either computer-generated simple feedback for at-risk patients (control) or a pharmacist-led information technology intervention (PINCER), composed of feedback, educational outreach, and dedicated support. The allocation was masked to general practices, patients, pharmacists, researchers, and statisticians. Primary outcomes were the proportions of patients at 6 months after the intervention who had had any of three clinically important errors: non-selective non-steroidal anti-inflammatory drugs (NSAIDs) prescribed to those with a history of peptic ulcer without co-prescription of a proton-pump inhibitor; β blockers prescribed to those with a history of asthma; long-term prescription of angiotensin converting enzyme (ACE) inhibitor or loop diuretics to those 75 years or older without assessment of urea and electrolytes in the preceding 15 months. The cost per error avoided was estimated by incremental cost-eff ectiveness analysis. This study is registered with Controlled-Trials.com, number ISRCTN21785299. Findings: 72 general practices with a combined list size of 480 942 patients were randomised. At 6 months’ follow-up, patients in the PINCER group were significantly less likely to have been prescribed a non-selective NSAID if they had a history of peptic ulcer without gastroprotection (OR 0∙58, 95% CI 0∙38–0∙89); a β blocker if they had asthma (0∙73, 0∙58–0∙91); or an ACE inhibitor or loop diuretic without appropriate monitoring (0∙51, 0∙34–0∙78). PINCER has a 95% probability of being cost eff ective if the decision-maker’s ceiling willingness to pay reaches £75 per error avoided at 6 months. Interpretation: The PINCER intervention is an effective method for reducing a range of medication errors in general practices with computerised clinical records. Funding: Patient Safety Research Portfolio, Department of Health, England.
Resumo:
The K-Means algorithm for cluster analysis is one of the most influential and popular data mining methods. Its straightforward parallel formulation is well suited for distributed memory systems with reliable interconnection networks. However, in large-scale geographically distributed systems the straightforward parallel algorithm can be rendered useless by a single communication failure or high latency in communication paths. This work proposes a fully decentralised algorithm (Epidemic K-Means) which does not require global communication and is intrinsically fault tolerant. The proposed distributed K-Means algorithm provides a clustering solution which can approximate the solution of an ideal centralised algorithm over the aggregated data as closely as desired. A comparative performance analysis is carried out against the state of the art distributed K-Means algorithms based on sampling methods. The experimental analysis confirms that the proposed algorithm is a practical and accurate distributed K-Means implementation for networked systems of very large and extreme scale.
Resumo:
Controllers for feedback substitution schemes demonstrate a trade-off between noise power gain and normalized response time. Using as an example the design of a controller for a radiometric transduction process subjected to arbitrary noise power gain and robustness constraints, a Pareto-front of optimal controller solutions fulfilling a range of time-domain design objectives can be derived. In this work, we consider designs using a loop shaping design procedure (LSDP). The approach uses linear matrix inequalities to specify a range of objectives and a genetic algorithm (GA) to perform a multi-objective optimization for the controller weights (MOGA). A clonal selection algorithm is used to further provide a directed search of the GA towards the Pareto front. We demonstrate that with the proposed methodology, it is possible to design higher order controllers with superior performance in terms of response time, noise power gain and robustness.
Resumo:
Some points of the paper by N.K. Nichols (see ibid., vol.AC-31, p.643-5, 1986), concerning the robust pole assignment of linear multiinput systems, are clarified. It is stressed that the minimization of the condition number of the closed-loop eigenvector matrix does not necessarily lead to robustness of the pole assignment. It is shown why the computational method, which Nichols claims is robust, is in fact numerically unstable with respect to the determination of the gain matrix. In replying, Nichols presents arguments to support the choice of the conditioning of the closed-loop poles as a measure of robustness and to show that the methods of J Kautsky, N. K. Nichols and P. VanDooren (1985) are stable in the sense that they produce accurate solutions to well-conditioned problems.
Resumo:
A number of computationally reliable direct methods for pole assignment by feedback have recently been developed. These direct procedures do not necessarily produce robust solutions to the problem, however, in the sense that the assigned poles are insensitive to perturbalions in the closed-loop system. This difficulty is illustrated here with results from a recent algorithm presented in this TRANSACTIONS and its causes are examined. A measure of robustness is described, and techniques for testing and improving robustness are indicated.
Resumo:
The solution of the pole assignment problem by feedback in singular systems is parameterized and conditions are given which guarantee the regularity and maximal degree of the closed loop pencil. A robustness measure is defined, and numerical procedures are described for selecting the free parameters in the feedback to give optimal robustness.
Resumo:
In this paper we explore classification techniques for ill-posed problems. Two classes are linearly separable in some Hilbert space X if they can be separated by a hyperplane. We investigate stable separability, i.e. the case where we have a positive distance between two separating hyperplanes. When the data in the space Y is generated by a compact operator A applied to the system states ∈ X, we will show that in general we do not obtain stable separability in Y even if the problem in X is stably separable. In particular, we show this for the case where a nonlinear classification is generated from a non-convergent family of linear classes in X. We apply our results to the problem of quality control of fuel cells where we classify fuel cells according to their efficiency. We can potentially classify a fuel cell using either some external measured magnetic field or some internal current. However we cannot measure the current directly since we cannot access the fuel cell in operation. The first possibility is to apply discrimination techniques directly to the measured magnetic fields. The second approach first reconstructs currents and then carries out the classification on the current distributions. We show that both approaches need regularization and that the regularized classifications are not equivalent in general. Finally, we investigate a widely used linear classification algorithm Fisher's linear discriminant with respect to its ill-posedness when applied to data generated via a compact integral operator. We show that the method cannot stay stable when the number of measurement points becomes large.
Resumo:
We describe a model-data fusion (MDF) inter-comparison project (REFLEX), which compared various algorithms for estimating carbon (C) model parameters consistent with both measured carbon fluxes and states and a simple C model. Participants were provided with the model and with both synthetic net ecosystem exchange (NEE) of CO2 and leaf area index (LAI) data, generated from the model with added noise, and observed NEE and LAI data from two eddy covariance sites. Participants endeavoured to estimate model parameters and states consistent with the model for all cases over the two years for which data were provided, and generate predictions for one additional year without observations. Nine participants contributed results using Metropolis algorithms, Kalman filters and a genetic algorithm. For the synthetic data case, parameter estimates compared well with the true values. The results of the analyses indicated that parameters linked directly to gross primary production (GPP) and ecosystem respiration, such as those related to foliage allocation and turnover, or temperature sensitivity of heterotrophic respiration, were best constrained and characterised. Poorly estimated parameters were those related to the allocation to and turnover of fine root/wood pools. Estimates of confidence intervals varied among algorithms, but several algorithms successfully located the true values of annual fluxes from synthetic experiments within relatively narrow 90% confidence intervals, achieving >80% success rate and mean NEE confidence intervals <110 gC m−2 year−1 for the synthetic case. Annual C flux estimates generated by participants generally agreed with gap-filling approaches using half-hourly data. The estimation of ecosystem respiration and GPP through MDF agreed well with outputs from partitioning studies using half-hourly data. Confidence limits on annual NEE increased by an average of 88% in the prediction year compared to the previous year, when data were available. Confidence intervals on annual NEE increased by 30% when observed data were used instead of synthetic data, reflecting and quantifying the addition of model error. Finally, our analyses indicated that incorporating additional constraints, using data on C pools (wood, soil and fine roots) would help to reduce uncertainties for model parameters poorly served by eddy covariance data.
Resumo:
Ensemble clustering (EC) can arise in data assimilation with ensemble square root filters (EnSRFs) using non-linear models: an M-member ensemble splits into a single outlier and a cluster of M−1 members. The stochastic Ensemble Kalman Filter does not present this problem. Modifications to the EnSRFs by a periodic resampling of the ensemble through random rotations have been proposed to address it. We introduce a metric to quantify the presence of EC and present evidence to dispel the notion that EC leads to filter failure. Starting from a univariate model, we show that EC is not a permanent but transient phenomenon; it occurs intermittently in non-linear models. We perform a series of data assimilation experiments using a standard EnSRF and a modified EnSRF by a resampling though random rotations. The modified EnSRF thus alleviates issues associated with EC at the cost of traceability of individual ensemble trajectories and cannot use some of algorithms that enhance performance of standard EnSRF. In the non-linear regimes of low-dimensional models, the analysis root mean square error of the standard EnSRF slowly grows with ensemble size if the size is larger than the dimension of the model state. However, we do not observe this problem in a more complex model that uses an ensemble size much smaller than the dimension of the model state, along with inflation and localisation. Overall, we find that transient EC does not handicap the performance of the standard EnSRF.
Resumo:
Data assimilation algorithms are a crucial part of operational systems in numerical weather prediction, hydrology and climate science, but are also important for dynamical reconstruction in medical applications and quality control for manufacturing processes. Usually, a variety of diverse measurement data are employed to determine the state of the atmosphere or to a wider system including land and oceans. Modern data assimilation systems use more and more remote sensing data, in particular radiances measured by satellites, radar data and integrated water vapor measurements via GPS/GNSS signals. The inversion of some of these measurements are ill-posed in the classical sense, i.e. the inverse of the operator H which maps the state onto the data is unbounded. In this case, the use of such data can lead to significant instabilities of data assimilation algorithms. The goal of this work is to provide a rigorous mathematical analysis of the instability of well-known data assimilation methods. Here, we will restrict our attention to particular linear systems, in which the instability can be explicitly analyzed. We investigate the three-dimensional variational assimilation and four-dimensional variational assimilation. A theory for the instability is developed using the classical theory of ill-posed problems in a Banach space framework. Further, we demonstrate by numerical examples that instabilities can and will occur, including an example from dynamic magnetic tomography.
Resumo:
Generally classifiers tend to overfit if there is noise in the training data or there are missing values. Ensemble learning methods are often used to improve a classifier's classification accuracy. Most ensemble learning approaches aim to improve the classification accuracy of decision trees. However, alternative classifiers to decision trees exist. The recently developed Random Prism ensemble learner for classification aims to improve an alternative classification rule induction approach, the Prism family of algorithms, which addresses some of the limitations of decision trees. However, Random Prism suffers like any ensemble learner from a high computational overhead due to replication of the data and the induction of multiple base classifiers. Hence even modest sized datasets may impose a computational challenge to ensemble learners such as Random Prism. Parallelism is often used to scale up algorithms to deal with large datasets. This paper investigates parallelisation for Random Prism, implements a prototype and evaluates it empirically using a Hadoop computing cluster.
Resumo:
High spatial resolution environmental data gives us a better understanding of the environmental factors affecting plant distributions at fine spatial scales. However, large environmental datasets dramatically increase compute times and output species model size stimulating the need for an alternative computing solution. Cluster computing offers such a solution, by allowing both multiple plant species Environmental Niche Models (ENMs) and individual tiles of high spatial resolution models to be computed concurrently on the same compute cluster. We apply our methodology to a case study of 4,209 species of Mediterranean flora (around 17% of species believed present in the biome). We demonstrate a 16 times speed-up of ENM computation time when 16 CPUs were used on the compute cluster. Our custom Java ‘Merge’ and ‘Downsize’ programs reduce ENM output files sizes by 94%. The median 0.98 test AUC score of species ENMs is aided by various species occurrence data filtering techniques. Finally, by calculating the percentage change of individual grid cell values, we map the projected percentages of plant species vulnerable to climate change in the Mediterranean region between 1950–2000 and 2020.