800 results for Distributed database


Relevance:

20.00%

Publisher:

Abstract:

Distributed and collaborative data stream mining in a mobile computing environment is referred to as Pocket Data Mining (PDM). The large number of data streams to which smart phones can subscribe, or which they can sense directly, coupled with the increasing computational power of handheld devices, motivates the development of PDM as a decision-making system. An earlier study showed this emerging area to be feasible using the technological enablers of mobile software agents and stream mining techniques [1]. A typical PDM process starts with mobile agents roaming the network to discover relevant data streams and resources. Other (mobile) agents encapsulating stream mining techniques then visit the relevant nodes in the network in order to build evolving data mining models. Finally, a third type of mobile agent roams the network consulting the mining agents for a final collaborative decision when required by one or more users. In this paper, we propose the use of distributed Hoeffding trees and Naive Bayes classifiers in the PDM framework over vertically partitioned data streams. Mobile policing, health monitoring and stock market analysis are among the possible applications of PDM. An extensive experimental study is reported showing the effectiveness of collaborative data mining with the two classifiers.
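
As a rough illustration of the collaborative decision step, the sketch below trains one incremental Naive Bayes learner per "mining agent" on its own slice of a vertically partitioned stream and combines their votes. It is a hypothetical toy (the agent and network layers are elided, and all names are invented), not the authors' implementation:

```python
# Minimal sketch of PDM-style collaborative classification over a
# vertically partitioned stream; the collaborative decision is a
# simple majority vote across agents.
import numpy as np
from collections import Counter
from sklearn.naive_bayes import GaussianNB

rng = np.random.default_rng(0)
CLASSES = [0, 1]
FEATURE_SUBSETS = [(0, 1), (2, 3), (4, 5)]  # one subset per agent

agents = [GaussianNB() for _ in FEATURE_SUBSETS]

def stream_batches(n_batches=50, batch_size=20, n_features=6):
    """Simulate an incoming labelled data stream."""
    for _ in range(n_batches):
        X = rng.normal(size=(batch_size, n_features))
        y = (X[:, 0] + X[:, 3] > 0).astype(int)  # toy concept
        yield X, y

for X, y in stream_batches():
    # Each agent incrementally updates its model on its own features.
    for agent, cols in zip(agents, FEATURE_SUBSETS):
        agent.partial_fit(X[:, cols], y, classes=CLASSES)

def collaborative_predict(x):
    """Majority vote over the agents' local predictions."""
    votes = [int(agent.predict(x[list(cols)].reshape(1, -1))[0])
             for agent, cols in zip(agents, FEATURE_SUBSETS)]
    return Counter(votes).most_common(1)[0][0]

x_new = rng.normal(size=6)
print("collaborative decision:", collaborative_predict(x_new))
```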

Relevance:

20.00%

Publisher:

Abstract:

Pocket Data Mining (PDM) describes the full process of analysing data streams in mobile ad hoc distributed environments. Advances in mobile devices such as smart phones and tablet computers have made it possible for a wide range of applications to run in such an environment. In this paper, we propose the adoption of data stream classification techniques for PDM. A thorough experimental study provides evidence that running heterogeneous (different) or homogeneous (similar) data stream classification techniques over vertically partitioned data (data partitioned according to the feature space) yields performance comparable to batch and centralised learning techniques.
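
A minimal sketch of the kind of comparison reported, assuming scikit-learn: an ensemble of heterogeneous learners, each trained on one vertical partition of the feature space, evaluated against a single centralised learner on the full feature set. The dataset, learner choices and majority-vote rule are illustrative assumptions:

```python
# Partitioned heterogeneous ensemble vs. centralised batch learner.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=2000, n_features=12, random_state=1)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=1)

partitions = np.array_split(np.arange(X.shape[1]), 3)  # feature subsets
# Heterogeneous ensemble: a different learner type per partition.
local_learners = [GaussianNB(), DecisionTreeClassifier(), GaussianNB()]
for clf, cols in zip(local_learners, partitions):
    clf.fit(X_tr[:, cols], y_tr)

votes = np.stack([clf.predict(X_te[:, cols])
                  for clf, cols in zip(local_learners, partitions)])
ensemble_pred = (votes.mean(axis=0) > 0.5).astype(int)  # majority vote

central = DecisionTreeClassifier().fit(X_tr, y_tr)
print("partitioned ensemble:", (ensemble_pred == y_te).mean())
print("centralised learner: ", (central.predict(X_te) == y_te).mean())
```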

Relevance:

20.00%

Publisher:

Abstract:

The P-found protein folding and unfolding simulation repository is designed to allow scientists to perform data mining and other analyses across large, distributed simulation data sets. There are two storage components in P-found: a primary repository of simulation data that is used to populate the second component, and a data warehouse that contains important molecular properties. These properties may be used for data mining studies. Here we demonstrate how grid technologies can support multiple, distributed P-found installations. In particular, we look at two aspects: firstly, how grid data management technologies can be used to access the distributed data warehouses; and secondly, how the grid can be used to transfer analysis programs to the primary repositories — this is an important and challenging aspect of P-found, due to the large data volumes involved and the desire of scientists to maintain control of their own data. The grid technologies we are developing with the P-found system will allow new large data sets of protein folding simulations to be accessed and analysed in novel ways, with significant potential for enabling scientific discovery.
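
The "move the analysis to the data" pattern described above can be sketched as follows; real P-found installations rely on grid middleware, so the classes and the example analysis here are purely hypothetical stand-ins:

```python
# Illustrative sketch: the analysis program travels to each repository,
# executes where the data lives, and only the small result travels back,
# so raw simulation data never leaves the owning site.
from dataclasses import dataclass, field

@dataclass
class Repository:
    """A primary repository holding (stand-in) simulation trajectories."""
    name: str
    trajectories: list = field(default_factory=list)

    def run_analysis(self, analysis):
        return {self.name: analysis(self.trajectories)}

def mean_folding_time(trajectories):
    times = [t["folding_time"] for t in trajectories]
    return sum(times) / len(times) if times else None

sites = [
    Repository("site-A", [{"folding_time": 4.2}, {"folding_time": 5.1}]),
    Repository("site-B", [{"folding_time": 3.7}]),
]
results = {}
for site in sites:
    results.update(site.run_analysis(mean_folding_time))
print(results)
```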

Relevance:

20.00%

Publisher:

Abstract:

Reduced flexibility of low carbon generation could pose new challenges for future energy systems. Both demand response and distributed storage may have a role to play in supporting future system balancing. This paper reviews how these technically different, but functionally similar, approaches compare and compete with one another. Household survey data is used to test the effectiveness of price signals in delivering demand responses for appliances with a high degree of agency. The underlying unit of storage for different demand response options is discussed, with particular focus on the ability to enhance demand-side flexibility in the residential sector. We conclude that a broad range of options, with different modes of storage, may need to be considered if residential demand flexibility is to be maximised.
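
As a toy illustration of why a price-responsive appliance behaves like storage, the sketch below shifts a fixed, deferrable load to the cheapest contiguous price window; the tariff and appliance parameters are invented:

```python
# Deferring a fixed load to the cheapest window is functionally
# equivalent to charging storage at cheap times and avoiding peaks.
prices = [0.30, 0.28, 0.12, 0.10, 0.11, 0.25, 0.35, 0.33]  # price per slot
RUN_SLOTS = 2      # appliance must run for two consecutive slots
LOAD_KW = 1.5      # constant draw while running

def cheapest_window(prices, width):
    costs = [sum(prices[i:i + width]) for i in range(len(prices) - width + 1)]
    return min(range(len(costs)), key=costs.__getitem__)

start = cheapest_window(prices, RUN_SLOTS)
baseline = sum(prices[:RUN_SLOTS]) * LOAD_KW        # run immediately
shifted = sum(prices[start:start + RUN_SLOTS]) * LOAD_KW
print(f"shift to slots {start}-{start + RUN_SLOTS - 1}, "
      f"saving {baseline - shifted:.2f} per cycle")
```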

Relevance:

20.00%

Publisher:

Abstract:

This article clarifies what was done with the sub-7-man positions in data-mining Harold van der Heijden's 'HHdbIV' database of chess studies prior to its publication. It emphasises that only positions in the main lines of studies were examined and that the information about uniqueness of move was not incorporated in HHdbIV. There is some reflection on the separate technical and artistic dimensions of study evaluation.
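
The kind of main-line filtering described can be sketched with the python-chess library, assuming a PGN export of the studies (the file name is invented): walk only the main line of each study and collect positions with fewer than seven men:

```python
# Collect sub-7-man positions from study main lines only, ignoring
# side lines, mirroring the restriction described above.
import chess.pgn

sub7 = []
with open("studies.pgn") as handle:
    while True:
        game = chess.pgn.read_game(handle)
        if game is None:
            break
        board = game.board()
        for move in game.mainline_moves():   # main line, no variations
            board.push(move)
            if len(board.piece_map()) <= 6:  # sub-7-man position
                sub7.append(board.fen())
print(f"{len(sub7)} sub-7-man main-line positions found")
```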

Relevance:

20.00%

Publisher:

Abstract:

Advances in hardware and software technology enable us to collect, store and distribute large quantities of data on a very large scale. Automatically discovering and extracting hidden knowledge in the form of patterns from these large data volumes is known as data mining. Data mining technology is not only a part of business intelligence, but is also used in many other application areas such as research, marketing and financial analytics. For example, medical scientists can use patterns extracted from historic patient data to determine whether a new patient is likely to respond positively to a particular treatment; marketing analysts can use patterns extracted from customer data for future advertisement campaigns; and finance experts have an interest in patterns that forecast the development of certain stock market shares for investment recommendations. However, extracting knowledge in the form of patterns from massive data volumes imposes a number of computational challenges in terms of processing time, memory, bandwidth and power consumption. These challenges have led to the development of parallel and distributed data analysis approaches and the utilisation of Grid and Cloud computing. This chapter gives an overview of parallel and distributed computing approaches and how they can be used to scale up data mining to large datasets.
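
A minimal sketch of the data-parallel pattern this chapter surveys: partition the transactions, count candidate patterns on each partition in parallel, then merge the partial counts. A Grid or Cloud deployment would distribute the partitions across machines rather than local processes, but the structure is the same:

```python
# Data-parallel pattern mining: map (count per partition), then reduce
# (merge the partial counts).
from collections import Counter
from itertools import combinations
from multiprocessing import Pool

TRANSACTIONS = [
    {"bread", "milk"}, {"bread", "beer"}, {"milk", "beer"},
    {"bread", "milk", "beer"}, {"bread", "milk"},
] * 1000

def count_pairs(chunk):
    """Count co-occurring item pairs in one data partition."""
    counts = Counter()
    for basket in chunk:
        counts.update(combinations(sorted(basket), 2))
    return counts

def chunks(data, n):
    step = (len(data) + n - 1) // n
    return [data[i:i + step] for i in range(0, len(data), step)]

if __name__ == "__main__":
    with Pool(4) as pool:
        partials = pool.map(count_pairs, chunks(TRANSACTIONS, 4))
    total = sum(partials, Counter())          # merge phase
    print(total.most_common(3))
```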

Relevance:

20.00%

Publisher:

Abstract:

We study a two-way relay network (TWRN), where distributed space-time codes are constructed across multiple relay terminals in an amplify-and-forward mode. Each relay transmits a scaled linear combination of its received symbols and their conjugates, with the scaling factor chosen based on automatic gain control. We consider equal power allocation (EPA) across the relays, as well as the optimal power allocation (OPA) strategy given access to instantaneous channel state information (CSI). For EPA, we derive an upper bound on the pairwise error probability (PEP), from which we prove that full diversity is achieved in TWRNs. This result is in contrast to one-way relay networks, in which a maximum diversity order of only unity can be obtained. When instantaneous CSI is available at the relays, we show that the OPA which minimizes the conditional PEP of the worse link can be cast as a generalized linear fractional program, which can be solved efficiently using the Dinkelbach-type procedure. We also prove that, if the sum-power of the relay terminals is constrained, then the OPA will activate at most two relays.
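
For readers unfamiliar with the Dinkelbach-type procedure, the standard iteration for a fractional program is sketched below; the paper's OPA problem is a generalized linear variant, so this is only the generic template, not the authors' exact formulation:

```latex
% Generic Dinkelbach iteration for max_{x in X} f(x)/g(x), g(x) > 0.
\begin{align*}
  &\text{Initialise } x_0 \in \mathcal{X},\quad \lambda_0 = f(x_0)/g(x_0).\\
  &\text{Repeat:}\quad
    x_{k+1} = \arg\max_{x \in \mathcal{X}} \bigl[ f(x) - \lambda_k\, g(x) \bigr],
    \qquad \lambda_{k+1} = \frac{f(x_{k+1})}{g(x_{k+1})},\\
  &\text{until}\quad F(\lambda_k) = \max_{x \in \mathcal{X}} \bigl[ f(x) - \lambda_k\, g(x) \bigr] = 0,
    \ \text{at which point } x_k \text{ is optimal.}
\end{align*}
```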

Relevance:

20.00%

Publisher:

Abstract:

A continuous tropospheric and stratospheric vertically resolved ozone time series, from 1850 to 2099, has been generated to be used as forcing in global climate models that do not include interactive chemistry. A multiple linear regression analysis of SAGE I+II satellite observations and polar ozonesonde measurements is used for the stratospheric zonal mean dataset during the well-observed period from 1979 to 2009. In addition to terms describing the mean annual cycle, the regression includes terms representing equivalent effective stratospheric chlorine (EESC) and the 11-yr solar cycle variability. The EESC regression fit coefficients, together with pre-1979 EESC values, are used to extrapolate the stratospheric ozone time series backward to 1850. While a similar procedure could be used to extrapolate into the future, coupled chemistry climate model (CCM) simulations indicate that future stratospheric ozone abundances are likely to be significantly affected by climate change, and capturing such effects through a regression model approach is not feasible. Therefore, the stratospheric ozone dataset is extended into the future (merged in 2009) with multimodel mean projections from 13 CCMs that performed a simulation until 2099 under the SRES (Special Report on Emissions Scenarios) A1B greenhouse gas scenario and the A1 adjusted halogen scenario in the second round of the Chemistry-Climate Model Validation (CCMVal-2) Activity. The stratospheric zonal mean ozone time series is merged with a three-dimensional tropospheric dataset extracted from simulations of the past by two CCMs (CAM3.5 and GISS-PUCCINI) and of the future by one CCM (CAM3.5). The future tropospheric ozone time series continues the historical CAM3.5 simulation until 2099 following the four different Representative Concentration Pathways (RCPs). Generally good agreement is found between the historical segment of the ozone database and satellite observations, although it should be noted that total column ozone is overestimated in the southern polar latitudes during spring and tropospheric column ozone is slightly underestimated. Vertical profiles of tropospheric ozone are broadly consistent with ozonesondes and in situ measurements, with some deviations in regions of biomass burning. The tropospheric ozone radiative forcing (RF) from the 1850s to the 2000s is 0.23 W m−2, lower than previous results. The lower value is mainly due to (i) a smaller increase in biomass burning emissions; (ii) a larger influence of stratospheric ozone depletion on upper tropospheric ozone at high southern latitudes; and possibly (iii) a larger influence of clouds (which act to reduce the net forcing) compared to previous radiative forcing calculations. Over the same period, decreases in stratospheric ozone, mainly at high latitudes, produce a RF of −0.08 W m−2, which is more negative than the central Intergovernmental Panel on Climate Change (IPCC) Fourth Assessment Report (AR4) value of −0.05 W m−2, but which is within the stated range of −0.15 to +0.05 W m−2. The more negative value is explained by the fact that the regression model simulates significant ozone depletion prior to 1979, in line with the increase in EESC and as confirmed by CCMs, while the AR4 assumed no change in stratospheric RF prior to 1979. A negative RF of similar magnitude persists into the future, although its location shifts from high latitudes to the tropics. This shift is due to increases in polar stratospheric ozone, but decreases in tropical lower stratospheric ozone, related to a strengthening of the Brewer-Dobson circulation, particularly through the latter half of the 21st century. Differences in trends in tropospheric ozone among the four RCPs are mainly driven by different methane concentrations, resulting in a range of tropospheric ozone RFs between 0.4 and 0.1 W m−2 by 2100. The ozone dataset described here has been released for the Coupled Model Intercomparison Project (CMIP5) model simulations in netCDF Climate and Forecast (CF) Metadata Convention at the PCMDI website (http://cmip-pcmdi.llnl.gov/).
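
Schematically, the stratospheric regression described above has the following form (notation mine; the exact basis functions of the published dataset may differ):

```latex
% Schematic regression for zonal-mean ozone at a given latitude and
% pressure level (notation mine):
\begin{equation*}
  O_3(t) = \sum_{k=1}^{K} \bigl[ a_k \cos(2\pi k t) + b_k \sin(2\pi k t) \bigr]
         + c\,\mathrm{EESC}(t) + d\,S_{11}(t) + \varepsilon(t)
\end{equation*}
% The harmonic terms capture the mean annual cycle, EESC(t) is the
% equivalent effective stratospheric chlorine, and S_11(t) is an 11-yr
% solar-cycle proxy. Backward extrapolation to 1850 reuses the fitted
% coefficient c with pre-1979 EESC values.
```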

Relevance:

20.00%

Publisher:

Abstract:

Bayesian analysis is given of an instrumental variable model that allows for heteroscedasticity in both the structural equation and the instrument equation. Specifically, the approach for dealing with heteroscedastic errors in Geweke (1993) is extended to the Bayesian instrumental variable estimator outlined in Rossi et al. (2005). Heteroscedasticity is treated by modelling the variance for each error using a hierarchical prior that is Gamma distributed. The computation is carried out by using a Markov chain Monte Carlo sampling algorithm with an augmented draw for the heteroscedastic case. An example using real data illustrates the approach and shows that ignoring heteroscedasticity in the instrument equation when it exists may lead to biased estimates.
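
Schematically, the model structure is of the following form (notation mine; the cross-equation error correlation that defines an instrumental variable model is suppressed for brevity):

```latex
% Structural equation, instrument equation, and per-observation
% variance scalings (schematic only):
\begin{align*}
  x_i &= z_i'\delta + \varepsilon_{1i}, &
      \varepsilon_{1i} &\sim N(0, \sigma_1^2 \lambda_{1i}), \\
  y_i &= \beta x_i + w_i'\gamma + \varepsilon_{2i}, &
      \varepsilon_{2i} &\sim N(0, \sigma_2^2 \lambda_{2i}),
\end{align*}
% with a Geweke (1993)-style hierarchical prior on each variance
% scaling, e.g. \nu / \lambda_{ji} \sim \chi^2_{\nu}, so that each
% \lambda_{ji}^{-1} is Gamma distributed; the MCMC sampler is augmented
% with a draw of the \lambda's given the current residuals.
```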

Relevance:

20.00%

Publisher:

Abstract:

A new model has been developed for assessing multiple sources of nitrogen in catchments. The model (INCA) is process based and uses reaction kinetic equations to simulate the principal mechanisms operating. The model allows for plant uptake, surface and sub-surface pathways, and can simulate up to six land uses simultaneously. The model can be applied to a catchment as a semi-distributed simulation and has an inbuilt multi-reach structure for river systems. Sources of nitrogen can be from atmospheric deposition, from the terrestrial environment (e.g. agriculture, leakage from forest systems, etc.), from urban areas, or from direct discharges via sewage or intensive farm units. The model runs on a daily time step and can provide information as time series at key sites, as profiles down river systems, or as statistical distributions. The process model is described here, and in a companion paper the model is applied to the River Tywi catchment in South Wales and the Great Ouse in Bedfordshire.
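
An illustrative first-order kinetic mass balance of the kind INCA employs for a single nitrogen store is sketched below; this is schematic only, not the model's full equation set:

```latex
% Illustrative mass balance for one nitrogen store (schematic only):
\begin{equation*}
  \frac{dN}{dt} = I_{\mathrm{atm}} + I_{\mathrm{land}}
                - k_u f(T)\, N - k_d f(T)\, N - q\,N
\end{equation*}
% I_atm and I_land are atmospheric-deposition and terrestrial inputs,
% k_u and k_d are plant-uptake and denitrification rate coefficients
% modulated by a temperature factor f(T), and qN is the hydrological
% flushing of nitrogen into surface and sub-surface pathways.
```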

Relevance:

20.00%

Publisher:

Abstract:

Version 1 of the Global Charcoal Database is now available for regional fire history reconstructions, data exploration, hypothesis testing, and evaluation of coupled climate–vegetation–fire model simulations. The charcoal database contains over 400 radiocarbon-dated records that document changes in charcoal abundance during the Late Quaternary. The aim of this public database is to stimulate cross-disciplinary research in fire sciences targeted at an increased understanding of the controls and impacts of natural and anthropogenic fire regimes on centennial-to-orbital timescales. We describe here the data standardization techniques for comparing multiple types of sedimentary charcoal records. Version 1 of the Global Charcoal Database has been used to characterize global and regional patterns in fire activity since the last glacial maximum. Recent studies using the charcoal database have explored the relation between climate and fire during periods of rapid climate change, including evidence of fire activity during the Younger Dryas Chronozone, and during the past two millennia.
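
The usual three-step standardization for comparing heterogeneous charcoal records (rescaling, a variance-stabilising transform, then Z-scores against a base period) can be sketched as follows; the parameter choices are illustrative, not the database's published protocol:

```python
# Standardize one charcoal record so it can be compared with records
# measured in different units and on different scales.
import numpy as np

def standardize(values, ages, base_period=(200, 2000)):
    v = np.asarray(values, dtype=float)
    a = np.asarray(ages, dtype=float)
    v = (v - v.min()) / (v.max() - v.min())       # minimax rescale to [0, 1]
    v = np.log(v + 0.01)                          # variance-stabilising transform
    base = v[(a >= base_period[0]) & (a <= base_period[1])]
    return (v - base.mean()) / base.std()         # Z-score vs base period

charcoal = [0.2, 0.5, 1.8, 0.9, 0.3, 0.1]
ages_bp = [100, 400, 800, 1200, 1600, 2000]       # calibrated years BP
print(standardize(charcoal, ages_bp).round(2))
```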

Relevance:

20.00%

Publisher:

Abstract:

Unorganized traffic is a generalized form of travel wherein vehicles do not adhere to any predefined lanes and can travel in between lanes. Such travel is visible in a number of countries, e.g. India, where it enables a higher traffic bandwidth, more overtaking and more efficient travel. These advantages are visible when the vehicles vary considerably in size and speed; in the absence of such variation, the predefined lanes are near-optimal. Motion planning for multiple autonomous vehicles in unorganized traffic deals with deciding the manner in which every vehicle travels, ensuring no collision either with each other or with static obstacles. In this paper the notion of predefined lanes is generalized to model unorganized travel for the purpose of planning vehicles' travel. A uniform cost search is used for finding the optimal motion strategy of a vehicle amidst the known travel plans of the other vehicles. The aim is to maximize the separation between the vehicles and static obstacles. The search is responsible for defining an optimal lane distribution among vehicles in the planning scenario. Clothoid curves are used for maintaining or changing lanes. Experiments are performed by simulation over a set of challenging scenarios with a complex grid of obstacles. Additionally, behaviours of overtaking, waiting for a vehicle to cross, and following another vehicle are exhibited.
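
A uniform cost search over a toy weighted graph is sketched below; the paper plans in a continuum using clothoid curves, so this only illustrates the search component, with edge weights standing in for separation cost:

```python
# Uniform cost search: expand nodes in order of accumulated path cost.
import heapq

def uniform_cost_search(graph, start, goal):
    frontier = [(0.0, start, [start])]
    best = {start: 0.0}
    while frontier:
        cost, node, path = heapq.heappop(frontier)
        if node == goal:
            return cost, path
        for nbr, step in graph.get(node, []):
            new_cost = cost + step
            if new_cost < best.get(nbr, float("inf")):
                best[nbr] = new_cost
                heapq.heappush(frontier, (new_cost, nbr, path + [nbr]))
    return float("inf"), []

# Edge weights could encode (inverse) separation from other vehicles
# and static obstacles, as in the planner described above.
graph = {
    "A": [("B", 1.0), ("C", 4.0)],
    "B": [("C", 2.0), ("D", 5.0)],
    "C": [("D", 1.0)],
}
print(uniform_cost_search(graph, "A", "D"))  # (4.0, ['A', 'B', 'C', 'D'])
```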

Relevance:

20.00%

Publisher:

Abstract:

Flash floods pose a significant danger to life and property. Unfortunately, in arid and semiarid environments runoff generation shows complex non-linear behavior with strong spatial and temporal non-uniformity. As a result, the predictions made by physically based simulations in semiarid areas are subject to great uncertainty, and failures in the predictive behavior of existing models are common. Better descriptions of physical processes at the watershed scale therefore need to be incorporated into hydrological model structures. For example, terrain relief has been systematically treated as static in flood modelling at the watershed scale. Here, we show that the integrated effect of small distributed relief variations, originating through concurrent hydrological processes within a storm event, was significant on the watershed-scale hydrograph. We model these observations by introducing dynamic formulations of two relief-related parameters at diverse scales: maximum depression storage, and the roughness coefficient in channels. In the final (a posteriori) model structure these parameters are allowed to be either time-constant or time-varying. The case under study is a convective storm in a semiarid Mediterranean watershed with ephemeral channels and high agricultural pressure (the Rambla del Albujón watershed; 556 km²), which showed a complex multi-peak response. First, to obtain quasi-sensible simulations in the (a priori) model with time-constant relief-related parameters, a spatially distributed parameterization was strictly required. Second, a generalized likelihood uncertainty estimation (GLUE) inference applied to the improved model structure, and conditioned on observed nested hydrographs, showed that accounting for dynamic relief-related parameters led to improved simulations. The discussion is finally broadened by considering the use of the calibrated model both to analyze the sensitivity of the watershed to storm motion and to attempt flood forecasting of a stratiform event with very different behavior.
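
A compact sketch of a GLUE analysis of the kind described: Monte Carlo parameter sampling, an informal likelihood (here Nash-Sutcliffe efficiency), and retention of "behavioural" parameter sets above a threshold. The toy linear-store model and all numbers are invented stand-ins for the distributed hydrological model:

```python
# GLUE: sample parameters, score each simulation against observations,
# keep the behavioural sets as the basis for uncertainty estimates.
import numpy as np

rng = np.random.default_rng(42)
obs = np.array([0.0, 1.2, 3.4, 2.1, 0.8, 0.3])    # observed hydrograph
rain = np.array([0.0, 2.0, 5.0, 1.0, 0.0, 0.0])

def toy_model(k, s_max):
    """Linear store with depression storage s_max and recession k."""
    store, q = 0.0, []
    for p in rain:
        store = max(store + p - s_max * 0.1, 0.0)  # depression losses
        out = k * store
        store -= out
        q.append(out)
    return np.array(q)

def nse(sim, obs):
    """Nash-Sutcliffe efficiency as the informal likelihood measure."""
    return 1 - np.sum((sim - obs) ** 2) / np.sum((obs - obs.mean()) ** 2)

behavioural = []
for _ in range(5000):
    k, s_max = rng.uniform(0.05, 0.95), rng.uniform(0.0, 5.0)
    score = nse(toy_model(k, s_max), obs)
    if score > 0.5:                                # behavioural threshold
        behavioural.append((score, k, s_max))

print(f"{len(behavioural)} behavioural sets of 5000")
```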