827 results for clustering accuracy


Relevance:

20.00% 20.00%

Publisher:

Abstract:

Ensemble clustering (EC) can arise in data assimilation with ensemble square root filters (EnSRFs) using non-linear models: an M-member ensemble splits into a single outlier and a cluster of M−1 members. The stochastic Ensemble Kalman Filter does not present this problem. Modifications to the EnSRFs by a periodic resampling of the ensemble through random rotations have been proposed to address it. We introduce a metric to quantify the presence of EC and present evidence to dispel the notion that EC leads to filter failure. Starting from a univariate model, we show that EC is not a permanent but a transient phenomenon; it occurs intermittently in non-linear models. We perform a series of data assimilation experiments using a standard EnSRF and an EnSRF modified by resampling through random rotations. The modified EnSRF alleviates issues associated with EC at the cost of traceability of individual ensemble trajectories, and it cannot use some of the algorithms that enhance the performance of the standard EnSRF. In the non-linear regimes of low-dimensional models, the analysis root mean square error of the standard EnSRF slowly grows with ensemble size if the size is larger than the dimension of the model state. However, we do not observe this problem in a more complex model that uses an ensemble size much smaller than the dimension of the model state, along with inflation and localisation. Overall, we find that transient EC does not handicap the performance of the standard EnSRF.

Relevance:

20.00% 20.00%

Publisher:

Abstract:

The K-Means algorithm for cluster analysis is one of the most influential and popular data mining methods. Its straightforward parallel formulation is well suited for distributed memory systems with reliable interconnection networks, such as massively parallel processors and clusters of workstations. However, in large-scale geographically distributed systems the straightforward parallel algorithm can be rendered useless by a single communication failure or high latency in communication paths. The lack of scalable and fault-tolerant global communication and synchronisation methods in large-scale systems has hindered the adoption of the K-Means algorithm for applications in large networked systems such as wireless sensor networks, peer-to-peer systems and mobile ad hoc networks. This work proposes a fully distributed K-Means algorithm (EpidemicK-Means) which does not require global communication and is intrinsically fault tolerant. The proposed distributed K-Means algorithm provides a clustering solution which can approximate the solution of an ideal centralised algorithm over the aggregated data as closely as desired. A comparative performance analysis is carried out against state-of-the-art sampling methods and shows that the proposed method overcomes the limitations of the sampling-based approaches for skewed cluster distributions. The experimental analysis confirms that the proposed algorithm is very accurate and fault tolerant under unreliable network conditions (message loss and node failures) and is suitable for asynchronous networks of very large and extreme scale.
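
To illustrate how an aggregate can be computed without any global communication step, here is a minimal sketch of pairwise gossip averaging. It is not the EpidemicK-Means protocol itself, which the abstract does not specify; the function name `gossip_average`, the random pairing scheme, and the parameters `exchanges` and `seed` are assumptions made for illustration only.

```python
import random


def gossip_average(values, exchanges=500, seed=0):
    """Approximate the global mean using only pairwise exchanges.

    Hypothetical sketch: each exchange picks two distinct nodes at random
    and replaces both local estimates with their average. The sum of all
    estimates is preserved, so every node drifts toward the global mean.
    """
    rng = random.Random(seed)
    estimates = list(values)
    n = len(estimates)
    if n < 2:
        return estimates
    for _ in range(exchanges):
        i, j = rng.sample(range(n), 2)  # two distinct nodes
        avg = (estimates[i] + estimates[j]) / 2.0
        estimates[i] = avg
        estimates[j] = avg
    return estimates


if __name__ == "__main__":
    local_values = [1.0, 5.0, 9.0, 3.0, 7.0, 2.0, 8.0, 5.0]
    print(gossip_average(local_values))
```

In a distributed k-means setting, the same idea could in principle be applied to per-cluster sums and counts rather than to a single number, so that each node estimates the centroids locally.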

Relevance:

20.00% 20.00%

Publisher:

Abstract:

Meteorological (met) station data is used as the basis for a number of influential studies into the impacts of the variability of renewable resources. Real turbine output data is not often easy to acquire, whereas meteorological wind data, supplied at a standardised height of 10 m, is widely available. This data can be extrapolated to a standard turbine height using the wind profile power law and used to simulate the hypothetical power output of a turbine. Utilising a number of met sites in such a manner can develop a model of future wind generation output. However, the accuracy of this extrapolation is strongly dependent on the choice of the wind shear exponent alpha. This paper investigates the accuracy of the simulated generation output compared to reality using a wind farm in North Rhins, Scotland and a nearby met station in West Freugh. The results show that while a single annual average value for alpha may be selected to accurately represent the long term energy generation from a simulated wind farm, there are significant differences between simulation and reality on an hourly power generation basis, with implications for understanding the impact of variability of renewables on short timescales, particularly system balancing and the way that conventional generation may be asked to respond to a high level of variable renewable generation on the grid in the future.
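
The extrapolation step described above can be sketched as follows. The wind profile power law is v_hub = v_ref * (h_hub / h_ref) ** alpha; the 10 m reference height comes from the abstract, while the 80 m hub height, the default alpha of 0.14, and the function name are assumptions chosen only for illustration.

```python
def extrapolate_wind_speed(v_ref, h_ref=10.0, h_hub=80.0, alpha=0.14):
    """Wind profile power law: v_hub = v_ref * (h_hub / h_ref) ** alpha.

    v_ref  -- wind speed measured at the reference height (m/s)
    h_ref  -- reference height in metres (10 m for met station data)
    h_hub  -- turbine hub height in metres (80 m is an assumed value)
    alpha  -- wind shear exponent (0.14 is an assumed default)
    """
    if v_ref < 0 or h_ref <= 0 or h_hub <= 0:
        raise ValueError("speed must be non-negative and heights positive")
    return v_ref * (h_hub / h_ref) ** alpha


if __name__ == "__main__":
    # Show how strongly the extrapolated speed depends on alpha.
    for alpha in (0.10, 0.14, 0.20, 0.30):
        print(alpha, round(extrapolate_wind_speed(6.0, alpha=alpha), 2))
```

Because turbine power output depends non-linearly on wind speed, small changes in alpha can translate into larger changes in simulated generation, which is consistent with the sensitivity the abstract reports.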

Relevance:

20.00% 20.00%

Publisher:

Abstract:

The paper reports a study that investigated the relationship between students’ self-predicted and actual General Certificate of Secondary Education results in order to establish the extent of over- and under-prediction and whether this varies by subject and across genders and socio-economic groupings. It also considered the relationship between actual and predicted attainment and attitudes towards going to university. The sample consisted of 109 young people in two schools being followed up from an earlier study. Just over 50% of predictions were accurate and students were much more likely to over-predict than to under-predict. Most errors of prediction were only one grade out and may reflect examination unreliability as well as student misperceptions. Girls were slightly less likely than boys to over-predict but there were no differences associated with social background. Higher levels of attainment, both actual and predicted, were strongly associated with positive attitudes to university. Differences between predictions and results are likely to reflect examination errors as well as pupil errors. There is no evidence that students from more advantaged social backgrounds over-estimate themselves compared with other students, although boys over-estimate themselves compared with girls.

Relevance:

20.00% 20.00%

Publisher:

Abstract:

Based on surveys undertaken with local authorities and valuers who provide the valuations on which purchase prices for local authority houses under the Right to Buy are based, this paper reports on research which aims to establish the reasons for the differences between the initial valuations provided by the local authority valuers and those provided by the District Valuer on appeal. The paper reports on the reasons why tenants appeal the initial valuation and discusses issues of valuation accuracy, uncertainty and the different and imperfect data available to valuers employed by the organisations involved, as well as the factors within the valuation process, including the absence of any requirement to agree a value, which contribute to the different outcomes.

Relevance:

20.00% 20.00%

Publisher:

Abstract:

Global communication requirements and load imbalance of some parallel data mining algorithms are the major obstacles to exploiting the computational power of large-scale systems. This work investigates how non-uniform data distributions can be exploited to remove the global communication requirement and to reduce the communication cost in parallel data mining algorithms and, in particular, in the k-means algorithm for cluster analysis. In the straightforward parallel formulation of the k-means algorithm, data and computation loads are uniformly distributed over the processing nodes. This approach has excellent load balancing characteristics that may suggest it could scale up to large and extreme-scale parallel computing systems. However, at each iteration step the algorithm requires a global reduction operation which hinders the scalability of the approach. This work studies a different parallel formulation of the algorithm where the requirement of global communication is removed, while maintaining the same deterministic nature of the centralised algorithm. The proposed approach exploits a non-uniform data distribution which can be either found in real-world distributed applications or can be induced by means of multi-dimensional binary search trees. The approach can also be extended to accommodate an approximation error which allows a further reduction of the communication costs. The effectiveness of the exact and approximate methods has been tested in a parallel computing system with 64 processors and in simulations with 1024 processing element

Relevance:

20.00% 20.00%

Publisher:

Abstract:

The Advanced Along-Track Scanning Radiometer (AATSR) was launched on Envisat in March 2002. The AATSR instrument is designed to retrieve precise and accurate global sea surface temperature (SST) that, combined with the large data set collected from its predecessors, ATSR and ATSR-2, will provide a long term record of SST data that is greater than 15 years. This record can be used for independent monitoring and detection of climate change. The AATSR validation programme has successfully completed its initial phase. The programme involves validation of the AATSR derived SST values using in situ radiometers, in situ buoys and global SST fields from other data sets. The results of the initial programme presented here will demonstrate that the AATSR instrument is currently close to meeting its scientific objectives of determining global SST to an accuracy of 0.3 K (one sigma). For night time data, the analysis gives a warm bias of between +0.04 K (0.28 K) for buoys and +0.06 K (0.20 K) for radiometers, with slightly higher errors observed for day time data, showing warm biases of between +0.02 K (0.39 K) for buoys and +0.11 K (0.33 K) for radiometers. The results show that the ATSR series of instruments continues to be the world leader in delivering accurate space-based observations of SST, which is a key climate parameter.

Relevance:

20.00% 20.00%

Publisher:

Abstract:

This article addresses the question of how far working memory may affect second language (L2) learners' improvement in spoken language during a period of immersion. Research is presented testing the hypothesis that individual differences in working memory (WM) capacity are associated with individual variation in improvements in oral production of questions in English. Thirty-two Chinese adult speakers of English were tested, before and after a year's postgraduate study in the United Kingdom, to measure grammatical accuracy and fluency using a question elicitation task, and to measure WM using a battery of first language (L1) and L2 WM tests. Story recall in L1 (Mandarin) was significantly associated with individuals' improvement in oral grammatical measures (p < .05). However, there was no significant mean improvement across the cohort in grammatical accuracy, although there was for fluency. The findings suggest that WM may aid certain aspects of individuals' L2 oral proficiency during academic immersion through postgraduate study. They also indicate that academic immersion in itself can lead to improvements in oral proficiency, independent of WM capacity, but there is no general guarantee of significant grammatical change. Further research to clarify the opportunities for input and interaction available in academic immersion settings is called for.

Relevance:

20.00% 20.00%

Publisher:

Abstract:

Under particular large-scale atmospheric conditions, several windstorms may affect Europe within a short time period. The occurrence of such cyclone families leads to large socioeconomic impacts and cumulative losses. The serial clustering of windstorms is analyzed for the North Atlantic/western Europe. Clustering is quantified as the dispersion (ratio variance/mean) of cyclone passages over a certain area. Dispersion statistics are derived for three reanalysis data sets and a 20-run European Centre Hamburg Version 5/Max Planck Institute Version–Ocean Model Version 1 global climate model (ECHAM5/MPI-OM1 GCM) ensemble. The dependence of the seriality on cyclone intensity is analyzed. Confirming previous studies, serial clustering is identified in reanalysis data sets primarily on both flanks and downstream regions of the North Atlantic storm track. This pattern is a robust feature in the reanalysis data sets. For the whole area, extreme cyclones cluster more than nonextreme cyclones. The ECHAM5/MPI-OM1 GCM is generally able to reproduce the spatial patterns of clustering under recent climate conditions, but some biases are identified. Under future climate conditions (A1B scenario), the GCM ensemble indicates that serial clustering may decrease over the North Atlantic storm track area and parts of western Europe. This decrease is associated with an extension of the polar jet toward Europe, which implies a tendency to a more regular occurrence of cyclones over parts of the North Atlantic Basin poleward of 50°N and western Europe. An increase of clustering of cyclones is projected south of Newfoundland. The detected shifts imply a change in the risk of occurrence of cumulative events over Europe under future climate conditions.
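
The dispersion measure named above (ratio variance/mean) can be sketched for a series of cyclone counts per time window. The function name, the use of the population variance, and the sample counts are assumptions for illustration; the abstract does not say which variance estimator or counting window the study uses.

```python
from statistics import mean, pvariance


def dispersion(counts):
    """Variance-to-mean ratio of event counts per time window.

    For a Poisson process the variance equals the mean, so the ratio is
    about 1. Values above 1 suggest serial clustering; values below 1
    suggest a more regular occurrence of events.
    """
    m = mean(counts)
    if m == 0:
        raise ValueError("mean count is zero; dispersion is undefined")
    return pvariance(counts) / m


if __name__ == "__main__":
    # Hypothetical cyclone passages per winter over one grid area.
    clustered = [0, 4, 0, 4, 0, 4]
    regular = [2, 2, 2, 2, 2, 2]
    print(dispersion(clustered), dispersion(regular))
```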

Relevance:

20.00% 20.00%

Publisher:

Abstract:

This paper reviews nine software packages with particular reference to their GARCH model estimation accuracy when judged against a respected benchmark. We consider the numerical consistency of GARCH and EGARCH estimation and forecasting. Our results have a number of implications for published research and future software development. Finally, we argue that the establishment of benchmarks for other standard non-linear models is long overdue.

Relevance:

20.00% 20.00%

Publisher:

Abstract:

Radar refractivity retrievals have the potential to accurately capture near-surface humidity fields from the phase change of ground clutter returns. In practice, phase changes are very noisy and the required smoothing will diminish large radial phase change gradients, leading to severe underestimates of large refractivity changes (ΔN). To mitigate this, the mean refractivity change over the field (ΔNfield) must be subtracted prior to smoothing. However, both observations and simulations indicate that highly correlated returns (e.g., when single targets straddle neighboring gates) result in underestimates of ΔNfield when pulse-pair processing is used. This may contribute to reported differences of up to 30 N units between surface observations and retrievals. This effect can be avoided if ΔNfield is estimated using a linear least squares fit to azimuthally averaged phase changes. Nevertheless, subsequent smoothing of the phase changes will still tend to diminish the all-important spatial perturbations in retrieved refractivity relative to ΔNfield; an iterative estimation approach may be required. The uncertainty in the target location within the range gate leads to additional phase noise proportional to ΔN, pulse length, and radar frequency. The use of short pulse lengths is recommended, not only to reduce this noise but to increase both the maximum detectable refractivity change and the number of suitable targets. Retrievals of refractivity fields must allow for large ΔN relative to an earlier reference field. This should be achievable for short pulses at S band, but phase noise due to target motion may prevent this at C band, while at X band even the retrieval of ΔN over shorter periods may at times be impossible.

Relevance:

20.00% 20.00%

Publisher:

Abstract:

In order to examine metacognitive accuracy (i.e., the relationship between metacognitive judgment and memory performance), researchers often rely on by-participant analysis, where metacognitive accuracy (e.g., resolution, as measured by the gamma coefficient or signal detection measures) is computed for each participant and the computed values are entered into group-level statistical tests such as the t-test. In the current work, we argue that the by-participant analysis, regardless of the accuracy measurements used, would produce a substantial inflation of Type-1 error rates, when a random item effect is present. A mixed-effects model is proposed as a way to effectively address the issue, and our simulation studies examining Type-1 error rates indeed showed superior performance of mixed-effects model analysis as compared to the conventional by-participant analysis. We also present real data applications to illustrate further strengths of mixed-effects model analysis. Our findings imply that caution is needed when using the by-participant analysis, and recommend the mixed-effects model analysis.
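
As a concrete picture of the by-participant step, the sketch below computes a gamma coefficient for one participant, assuming the abstract means the Goodman-Kruskal gamma (concordant minus discordant pairs, divided by their sum). The function name and the example data are hypothetical.

```python
def goodman_kruskal_gamma(judgments, outcomes):
    """Goodman-Kruskal gamma between judgments and memory outcomes.

    Counts item pairs ordered the same way on both variables (concordant)
    and pairs ordered in opposite ways (discordant); tied pairs are
    ignored. Returns None when every pair is tied.
    """
    if len(judgments) != len(outcomes):
        raise ValueError("judgments and outcomes must have equal length")
    concordant = 0
    discordant = 0
    n = len(judgments)
    for i in range(n):
        for j in range(i + 1, n):
            product = (judgments[i] - judgments[j]) * (outcomes[i] - outcomes[j])
            if product > 0:
                concordant += 1
            elif product < 0:
                discordant += 1
    if concordant + discordant == 0:
        return None
    return (concordant - discordant) / (concordant + discordant)


if __name__ == "__main__":
    # One participant: confidence ratings and recall success (1 = recalled).
    ratings = [20, 40, 60, 80, 50]
    recalled = [0, 0, 1, 1, 0]
    print(goodman_kruskal_gamma(ratings, recalled))
```

In a by-participant analysis, one such value per participant would then be entered into a group-level test. That second step treats items as fixed, which is the practice the abstract cautions against when item effects are random.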

Relevance:

20.00% 20.00%

Publisher:

Abstract:

Global communication requirements and load imbalance of some parallel data mining algorithms are the major obstacles to exploiting the computational power of large-scale systems. This work investigates how non-uniform data distributions can be exploited to remove the global communication requirement and to reduce the communication cost in iterative parallel data mining algorithms. In particular, the analysis focuses on one of the most influential and popular data mining methods, the k-means algorithm for cluster analysis. The straightforward parallel formulation of the k-means algorithm requires a global reduction operation at each iteration step, which hinders its scalability. This work studies a different parallel formulation of the algorithm where the requirement of global communication can be relaxed while still providing the exact solution of the centralised k-means algorithm. The proposed approach exploits a non-uniform data distribution which can be either found in real world distributed applications or can be induced by means of multi-dimensional binary search trees. The approach can also be extended to accommodate an approximation error which allows a further reduction of the communication costs.
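
The global reduction in the straightforward parallel formulation can be sketched as follows. This is a single-process simulation under stated assumptions, not the paper's method: each list in `partitions` stands for one node's local data, and the helper names `local_stats`, `global_reduce` and `kmeans_step` are hypothetical.

```python
def local_stats(points, centroids):
    """One node's contribution: per-cluster coordinate sums and counts."""
    dim = len(centroids[0])
    sums = [[0.0] * dim for _ in centroids]
    counts = [0] * len(centroids)
    for p in points:
        # Assign the point to its nearest centroid (squared Euclidean distance).
        k = min(
            range(len(centroids)),
            key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])),
        )
        counts[k] += 1
        for d in range(dim):
            sums[k][d] += p[d]
    return sums, counts


def global_reduce(stats):
    """Stand-in for the global reduction: add up all nodes' sums and counts."""
    first_sums, first_counts = stats[0]
    total_sums = [[0.0] * len(row) for row in first_sums]
    total_counts = [0] * len(first_counts)
    for sums, counts in stats:
        for k, row in enumerate(sums):
            total_counts[k] += counts[k]
            for d, value in enumerate(row):
                total_sums[k][d] += value
    return total_sums, total_counts


def kmeans_step(partitions, centroids):
    """One iteration: local statistics on every node, then one reduction."""
    sums, counts = global_reduce([local_stats(p, centroids) for p in partitions])
    return [
        [s / counts[k] for s in sums[k]] if counts[k] else list(centroids[k])
        for k in range(len(centroids))
    ]


if __name__ == "__main__":
    partitions = [[(0.0, 0.0), (10.0, 10.0)], [(2.0, 0.0), (10.0, 12.0)]]
    print(kmeans_step(partitions, [(0.0, 0.0), (10.0, 10.0)]))
```

Every node must take part in `global_reduce` at every iteration, which is the communication step the abstract identifies as the scalability bottleneck.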