24 resultados para scaling
Resumo:
The fast increase in the size and number of databases demands data mining approaches that are scalable to large amounts of data. This has led to the exploration of parallel computing technologies in order to perform data mining tasks concurrently using several processors. Parallelization seems to be a natural and cost-effective way to scale up data mining technologies. One of the most important of these data mining technologies is the classification of newly recorded data. This paper surveys advances in parallelization in the field of classification rule induction.
Resumo:
Advances in hardware and software technology enable us to collect, store and distribute large quantities of data on a very large scale. Automatically discovering and extracting hidden knowledge in the form of patterns from these large data volumes is known as data mining. Data mining technology is not only a part of business intelligence, but is also used in many other application areas such as research, marketing and financial analytics. For example medical scientists can use patterns extracted from historic patient data in order to determine if a new patient is likely to respond positively to a particular treatment or not; marketing analysts can use extracted patterns from customer data for future advertisement campaigns; finance experts have an interest in patterns that forecast the development of certain stock market shares for investment recommendations. However, extracting knowledge in the form of patterns from massive data volumes imposes a number of computational challenges in terms of processing time, memory, bandwidth and power consumption. These challenges have led to the development of parallel and distributed data analysis approaches and the utilisation of Grid and Cloud computing. This chapter gives an overview of parallel and distributed computing approaches and how they can be used to scale up data mining to large datasets.
Resumo:
We investigate the scaling between precipitation and temperature changes in warm and cold climates using six models that have simulated the response to both increased CO2 and Last Glacial Maximum (LGM) boundary conditions. Globally, precipitation increases in warm climates and decreases in cold climates by between 1.5%/°C and 3%/°C. Precipitation sensitivity to temperature changes is lower over the land than over the ocean and lower over the tropical land than over the extratropical land, reflecting the constraint of water availability. The wet tropics get wetter in warm climates and drier in cold climates, but the changes in dry areas differ among models. Seasonal changes of tropical precipitation in a warmer world also reflect this “rich get richer” syndrome. Precipitation seasonality is decreased in the cold-climate state. The simulated changes in precipitation per degree temperature change are comparable to the observed changes in both the historical period and the LGM.
Resumo:
The leaf carbon isotope ratio (δ13C) of C3 plants is inversely related to the drawdown of CO2 concentration during photosynthesis, which increases towards drier environments. We aimed to discriminate between the hypothesis of universal scaling, which predicts between-species responses of δ13C to aridity similar to within-species responses, and biotic homoeostasis, which predicts offsets in the δ13C of species occupying adjacent ranges. The Northeast China Transect spans 130–900 mm annual precipitation within a narrow latitude and temperature range. Leaves of 171 species were sampled at 33 sites along the transect (18 at ≥ 5 sites) for dry matter, carbon (C) and nitrogen (N) content, specific leaf area (SLA) and δ13C. The δ13C of species generally followed a common relationship with the climatic moisture index (MI). Offsets between adjacent species were not observed. Trees and forbs diverged slightly at high MI. In C3 plants, δ13C predicted N per unit leaf area (Narea) better than MI. The δ13C of C4 plants was invariant with MI. SLA declined and Narea increased towards low MI in both C3 and C4 plants. The data are consistent with optimal stomatal regulation with respect to atmospheric dryness. They provide evidence for universal scaling of CO2 drawdown with aridity in C3 plants.
Resumo:
Advances in hardware technologies allow to capture and process data in real-time and the resulting high throughput data streams require novel data mining approaches. The research area of Data Stream Mining (DSM) is developing data mining algorithms that allow us to analyse these continuous streams of data in real-time. The creation and real-time adaption of classification models from data streams is one of the most challenging DSM tasks. Current classifiers for streaming data address this problem by using incremental learning algorithms. However, even so these algorithms are fast, they are challenged by high velocity data streams, where data instances are incoming at a fast rate. This is problematic if the applications desire that there is no or only a very little delay between changes in the patterns of the stream and absorption of these patterns by the classifier. Problems of scalability to Big Data of traditional data mining algorithms for static (non streaming) datasets have been addressed through the development of parallel classifiers. However, there is very little work on the parallelisation of data stream classification techniques. In this paper we investigate K-Nearest Neighbours (KNN) as the basis for a real-time adaptive and parallel methodology for scalable data stream classification tasks.
Resumo:
The turbulent structure of a stratocumulus-topped marine boundary layer over a 2-day period is observed with a Doppler lidar at Mace Head in Ireland. Using profiles of vertical velocity statistics, the bulk of the mixing is identified as cloud driven. This is supported by the pertinent feature of negative vertical velocity skewness in the sub-cloud layer which extends, on occasion, almost to the surface. Both coupled and decoupled turbulence characteristics are observed. The length and timescales related to the cloud-driven mixing are investigated and shown to provide additional information about the structure and the source of the mixing inside the boundary layer. They are also shown to place constraints on the length of the sampling periods used to derive products, such as the turbulent dissipation rate, from lidar measurements. For this, the maximum wavelengths that belong to the inertial subrange are studied through spectral analysis of the vertical velocity. The maximum wavelength of the inertial subrange in the cloud-driven layer scales relatively well with the corresponding layer depth during pronounced decoupled structure identified from the vertical velocity skewness. However, on many occasions, combining the analysis of the inertial subrange and vertical velocity statistics suggests higher decoupling height than expected from the skewness profiles. Our results show that investigation of the length scales related to the inertial subrange significantly complements the analysis of the vertical velocity statistics and enables a more confident interpretation of complex boundary layer structures using measurements from a Doppler lidar.