66 resultados para Most Productive Scale Size


Relevância:

30.00% 30.00%

Publicador:

Resumo:

The K-Means algorithm for cluster analysis is one of the most influential and popular data mining methods. Its straightforward parallel formulation is well suited for distributed memory systems with reliable interconnection networks, such as massively parallel processors and clusters of workstations. However, in large-scale geographically distributed systems the straightforward parallel algorithm can be rendered useless by a single communication failure or high latency in communication paths. The lack of scalable and fault tolerant global communication and synchronisation methods in large-scale systems has hindered the adoption of the K-Means algorithm for applications in large networked systems such as wireless sensor networks, peer-to-peer systems and mobile ad hoc networks. This work proposes a fully distributed K-Means algorithm (EpidemicK-Means) which does not require global communication and is intrinsically fault tolerant. The proposed distributed K-Means algorithm provides a clustering solution which can approximate the solution of an ideal centralised algorithm over the aggregated data as closely as desired. A comparative performance analysis is carried out against the state of the art sampling methods and shows that the proposed method overcomes the limitations of the sampling-based approaches for skewed clusters distributions. The experimental analysis confirms that the proposed algorithm is very accurate and fault tolerant under unreliable network conditions (message loss and node failures) and is suitable for asynchronous networks of very large and extreme scale.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

The time evolution of the circulation change at the end of the Baiu season is investigated using ERA40 data. An end-day is defined for each of the 23 years based on the 850 hPa θe value at 40˚Nin the 130-140˚E sector exceeding 330 K. Daily time series of variables are composited with respect to this day. These composite time-series exhibit a clearer and more rapid change in the precipitation and the large-scale circulation over the whole East Asia region than those performed using calendar days. The precipitation change includes the abrupt end of the Baiu rain, the northward shift of tropical convection perhaps starting a few days before this, and the start of the heavier rain at higher latitudes. The northward migration of lower tropospheric warm, moist tropical air, a general feature of the seasonal march in the region, is fast over the continent and slow over the ocean. By mid to late July the cooler air over the Sea of Japan is surrounded on 3 sides by the tropical air. It is suggestive that the large-scale stage has been set for a jump to the post-Baiu state, i.e., for the end of the Baiu season. Two likely triggers for the actual change emerge from the analysis. The first is the northward movement of tropical convection into the Philippine region. The second is an equivalent barotropic Rossby wave-train, that over a 10-day period develops downstream across Eurasia. It appears likely that in most years one or both mechanisms can be important in triggering the actual end of the Baiu season.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

The fast increase in the size and number of databases demands data mining approaches that are scalable to large amounts of data. This has led to the exploration of parallel computing technologies in order to perform data mining tasks concurrently using several processors. Parallelization seems to be a natural and cost-effective way to scale up data mining technologies. One of the most important of these data mining technologies is the classification of newly recorded data. This paper surveys advances in parallelization in the field of classification rule induction.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Advances in hardware and software in the past decade allow to capture, record and process fast data streams at a large scale. The research area of data stream mining has emerged as a consequence from these advances in order to cope with the real time analysis of potentially large and changing data streams. Examples of data streams include Google searches, credit card transactions, telemetric data and data of continuous chemical production processes. In some cases the data can be processed in batches by traditional data mining approaches. However, in some applications it is required to analyse the data in real time as soon as it is being captured. Such cases are for example if the data stream is infinite, fast changing, or simply too large in size to be stored. One of the most important data mining techniques on data streams is classification. This involves training the classifier on the data stream in real time and adapting it to concept drifts. Most data stream classifiers are based on decision trees. However, it is well known in the data mining community that there is no single optimal algorithm. An algorithm may work well on one or several datasets but badly on others. This paper introduces eRules, a new rule based adaptive classifier for data streams, based on an evolving set of Rules. eRules induces a set of rules that is constantly evaluated and adapted to changes in the data stream by adding new and removing old rules. It is different from the more popular decision tree based classifiers as it tends to leave data instances rather unclassified than forcing a classification that could be wrong. The ongoing development of eRules aims to improve its accuracy further through dynamic parameter setting which will also address the problem of changing feature domain values.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

We perform a multimodel detection and attribution study with climate model simulation output and satellite-based measurements of tropospheric and stratospheric temperature change. We use simulation output from 20 climate models participating in phase 5 of the Coupled Model Intercomparison Project. This multimodel archive provides estimates of the signal pattern in response to combined anthropogenic and natural external forcing (the finger-print) and the noise of internally generated variability. Using these estimates, we calculate signal-to-noise (S/N) ratios to quantify the strength of the fingerprint in the observations relative to fingerprint strength in natural climate noise. For changes in lower stratospheric temperature between 1979 and 2011, S/N ratios vary from 26 to 36, depending on the choice of observational dataset. In the lower troposphere, the fingerprint strength in observations is smaller, but S/N ratios are still significant at the 1% level or better, and range from three to eight. We find no evidence that these ratios are spuriously inflated by model variability errors. After removing all global mean signals, model fingerprints remain identifiable in 70% of the tests involving tropospheric temperature changes. Despite such agreement in the large-scale features of model and observed geographical patterns of atmospheric temperature change, most models do not replicate the size of the observed changes. On average, the models analyzed underestimate the observed cooling of the lower stratosphere and overestimate the warming of the troposphere. Although the precise causes of such differences are unclear, model biases in lower stratospheric temperature trends are likely to be reduced by more realistic treatment of stratospheric ozone depletion and volcanic aerosol forcing.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Particulate matter generated during the cooking process has been identified as one of the major problems of indoor air quality and indoor environmental health. Reliable assessment of exposure to cooking-generated particles requires accurate information of emission characteristics especially the size distribution. This study characterizes the volume/mass-based size distribution of the fume particles at the oil-heating stage for the typical Chinese-style cooking in a laboratory kitchen. A laser-diffraction size analyzer is applied to measure the volume frequency of fume particles ranged from 0.1 to 10 μm, which contribute to most mass proportion in PM2.5 and PM10. Measurements show that particle emissions have little dependence on the types of vegetable oil used but have a close relationship with the heating temperature. It is found that volume frequency of fume particles in the range of 1.0–4.0 μm accounts for nearly 100% of PM0.1–10 with the mode diameter 2.7 μm, median diameter 2.6 μm, Sauter mean diameter 3.0 μm, DeBroukere mean diameter 3.2 μm, and distribution span 0.48. Such information on emission characteristics obtained in this study can be possibly used to improve the assessment of indoor air quality due to PM0.1–10 in the kitchen and residential flat.