85 results for Data replication processes
in CentAUR: Central Archive University of Reading - UK
Abstract:
This article reflects on key methodological issues emerging from children and young people's involvement in data analysis processes. We outline a pragmatic framework illustrating different approaches to engaging children, using two case studies of children's experiences of participating in data analysis. The article highlights methods of engagement and important issues such as the balance of power between adults and children, training, support, ethical considerations, time and resources. We argue that involving children in data analysis processes can have several benefits, including enabling a greater understanding of children's perspectives and helping to prioritise children's agendas in policy and practice. (C) 2007 The Author(s). Journal compilation (C) 2007 National Children's Bureau.
Abstract:
Owing to continuous advances in the computational power of handheld devices such as smartphones and tablet computers, it has become possible to perform Big Data operations, including modern data mining processes, on board these small devices. A decade of research has proved the feasibility of what has been termed Mobile Data Mining, with a focus on a single mobile device running data mining processes. However, it was not until 2010 that the authors of this book initiated the Pocket Data Mining (PDM) project, exploiting seamless communication among handheld devices to perform data analysis tasks that had previously been infeasible. PDM is the process of collaboratively extracting knowledge from distributed data streams in a mobile computing environment. This book provides the reader with an in-depth treatment of this emerging area of research. Details of the techniques used and thorough experimental studies are given. More importantly, and exclusive to this book, the authors provide a detailed practical guide to the deployment of PDM in the mobile environment. An important extension to the basic implementation of PDM that deals with concept drift is also reported. In the era of Big Data, potentially important applications offered by PDM in a variety of domains, including security, business and telemedicine, are discussed.
Abstract:
Social networks have gained remarkable attention in the last decade. Accessing social network sites such as Twitter, Facebook, LinkedIn and Google+ through the internet and Web 2.0 technologies has become more affordable. People are becoming more interested in, and reliant on, social networks for information, news and the opinions of other users on diverse subject matters. This heavy reliance on social network sites causes them to generate massive data characterised by three computational issues, namely size, noise and dynamism. These issues often make social network data very complex to analyse manually, making computational means of analysis pertinent. Data mining provides a wide range of techniques for detecting useful knowledge, such as trends, patterns and rules, from massive datasets [44]. Data mining techniques are used for information retrieval, statistical modelling and machine learning. These techniques employ data pre-processing, data analysis and data interpretation processes in the course of data analysis. This survey discusses the different data mining techniques used over the past decades in mining diverse aspects of social networks, going from historical techniques to up-to-date models, including our novel technique named TRCM. All the techniques covered in this survey are listed in Table 1, together with the tools employed and the names of their authors.
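As a purely illustrative sketch of the pre-processing and analysis stages mentioned above (not the TRCM technique itself), the following Python snippet vectorises a few hypothetical social-media posts with TF-IDF and groups them with k-means; the example posts and parameter choices are assumptions.

# Minimal illustration of a pre-processing / analysis pipeline for short posts.
# Not the TRCM technique from the survey; posts and parameters are made up.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans

posts = [
    "new phone battery lasts all day",
    "battery drains too fast on this phone",
    "great match last night, what a goal",
    "the referee ruined the match",
]

# Pre-processing: lowercase, remove English stop words, build a TF-IDF matrix.
vectoriser = TfidfVectorizer(lowercase=True, stop_words="english")
X = vectoriser.fit_transform(posts)

# Analysis: cluster the posts into two groups (here, roughly phones vs. football).
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)

# Interpretation: inspect which posts fall into which cluster.
for post, label in zip(posts, labels):
    print(label, post)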
Abstract:
We consider the finite sample properties of model selection by information criteria in conditionally heteroscedastic models. Recent theoretical results show that certain popular criteria are consistent in that they will select the true model asymptotically with probability 1. To examine the empirical relevance of this property, Monte Carlo simulations are conducted for a set of non-nested data generating processes (DGPs), with the set of candidate models consisting of all types of model used as DGPs. In addition, not only is the best model considered but also those with similar values of the information criterion, called close competitors, thus forming a portfolio of eligible models. To supplement the simulations, the criteria are applied to a set of economic and financial series. In the simulations, the criteria are largely ineffective at identifying the correct model, either as best or as a close competitor, the parsimonious GARCH(1,1) model being preferred for most DGPs. In contrast, asymmetric models are generally selected to represent actual data. This leads to the conjecture that the properties of parameterizations of processes commonly used to model heteroscedastic data are more similar than may be imagined, and that more attention needs to be paid to the behaviour of the standardized disturbances of such models, both in simulation exercises and in empirical modelling.
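For readers unfamiliar with the selection rule, the information criteria compared in studies of this kind take the standard generic form (a textbook definition, not a result specific to this paper):

\[
\mathrm{IC}(k) = -2\ln L(\hat{\theta}_k) + c_T\, k, \qquad c_T = 2 \ \text{(AIC)}, \qquad c_T = \ln T \ \text{(SIC/BIC)},
\]

where \(L(\hat{\theta}_k)\) is the maximised likelihood of a candidate model with \(k\) parameters estimated on \(T\) observations; the model with the smallest criterion value is selected, and "close competitors" are models whose criterion values lie within a small margin of that minimum. The most parsimonious candidate, GARCH(1,1), specifies the conditional variance as \(\sigma_t^2 = \omega + \alpha\,\varepsilon_{t-1}^2 + \beta\,\sigma_{t-1}^2\).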
Abstract:
We consider different methods for combining probability forecasts. In empirical exercises, the data generating process of the forecasts and the event being forecast is not known, and therefore the optimal form of combination will also be unknown. We consider the properties of various combination schemes for a number of plausible data generating processes, and indicate which types of combinations are likely to be useful. We also show that whether forecast encompassing is found to hold between two rival sets of forecasts or not may depend on the type of combination adopted. The relative performances of the different combination methods are illustrated, with an application to predicting recession probabilities using leading indicators.
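Two standard combination schemes of the kind compared in this literature are the linear and logarithmic opinion pools (generic definitions; the paper's exact set of schemes may differ):

\[
p_c = \sum_{i=1}^{n} w_i\, p_i \quad \text{(linear pool)}, \qquad p_c \propto \prod_{i=1}^{n} p_i^{\,w_i} \quad \text{(logarithmic pool)},
\]

with non-negative weights \(w_i\) summing to one; equal weights \(w_i = 1/n\) give the simple average, while weights may instead be estimated from past forecast performance.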
Abstract:
A common problem in many data-based modelling algorithms, such as associative memory networks, is the curse of dimensionality. In this paper, a new two-stage neurofuzzy system design and construction algorithm (NeuDeC) for nonlinear dynamical processes is introduced to tackle this problem effectively. A new, simple preprocessing method is initially derived and applied to reduce the rule base, followed by a fine model detection process based on the reduced rule set using forward orthogonal least squares model structure detection. In both stages, new A-optimality experimental design-based criteria were used. In the preprocessing stage, a lower bound of the A-optimality design criterion is derived and applied as a subset selection metric, while in the later stage the A-optimality design criterion is incorporated into a new composite cost function that minimises the model prediction error as well as penalising the model parameter variance. The utilisation of NeuDeC leads to unbiased model parameters with low parameter variance and the additional benefit of a parsimonious model structure. Numerical examples are included to demonstrate the effectiveness of this new modelling approach for high-dimensional inputs.
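The A-optimality criterion referred to above has the following standard definition for a linear-in-the-parameters model (a textbook form, independent of the NeuDeC-specific bounds derived in the paper):

\[
y = X\theta + e, \qquad \operatorname{cov}(\hat{\theta}) = \sigma^2 (X^{\top}X)^{-1}, \qquad J_A = \operatorname{trace}\!\left[(X^{\top}X)^{-1}\right],
\]

so that minimising \(J_A\) over candidate regressor subsets, or adding it as a penalty in a composite cost function, directly targets a small average variance of the estimated parameters.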
Abstract:
Flood modelling of urban areas is still at an early stage, partly because until recently topographic data of sufficiently high resolution and accuracy have been lacking in urban areas. However, Digital Surface Models (DSMs) generated from airborne scanning laser altimetry (LiDAR) having sub-metre spatial resolution have now become available, and these are able to represent the complexities of urban topography. The paper describes the development of a LiDAR post-processor for urban flood modelling based on the fusion of LiDAR and digital map data. The map data are used in conjunction with LiDAR data to identify different object types in urban areas, though pattern recognition techniques are also employed. Post-processing produces a Digital Terrain Model (DTM) for use as model bathymetry, and also a friction parameter map for use in estimating spatially-distributed friction coefficients. In vegetated areas, friction is estimated from LiDAR-derived vegetation height, and (unlike most vegetation removal software) the method copes with short vegetation less than ~1m high, which may occupy a substantial fraction of even an urban floodplain. The DTM and friction parameter map may also be used to help to generate an unstructured mesh of a vegetated urban floodplain for use by a 2D finite element model. The mesh is decomposed to reflect floodplain features having different frictional properties to their surroundings, including urban features such as buildings and roads as well as taller vegetation features such as trees and hedges. This allows a more accurate estimation of local friction. The method produces a substantial node density due to the small dimensions of many urban features.
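As an illustration of how a friction parameter map of this kind can be derived from LiDAR vegetation heights, the Python sketch below assigns Manning's n values per cell using simple height classes; the class boundaries and n values are illustrative assumptions, not the calibrated values of the post-processor described above.

# Illustrative only: assign a Manning's n friction coefficient per cell from a
# LiDAR-derived vegetation-height raster. Thresholds and n values are assumed.
import numpy as np

def friction_from_vegetation_height(veg_height_m: np.ndarray) -> np.ndarray:
    n = np.empty_like(veg_height_m, dtype=float)
    n[veg_height_m < 0.1] = 0.03                               # bare ground / short grass
    n[(veg_height_m >= 0.1) & (veg_height_m < 1.0)] = 0.06     # short vegetation (< ~1 m)
    n[veg_height_m >= 1.0] = 0.10                              # shrubs, hedges, trees
    return n

veg = np.array([[0.05, 0.4], [1.8, 0.0]])  # toy 2x2 raster of heights in metres
print(friction_from_vegetation_height(veg))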
Abstract:
A case of long-range transport of a biomass burning plume from Alaska to Europe is analyzed using a Lagrangian approach. This plume was sampled several times in the free troposphere over North America, the North Atlantic and Europe by three different aircraft during the IGAC Lagrangian 2K4 experiment, which was part of the ICARTT/ITOP measurement intensive in summer 2004. Measurements in the plume showed enhanced values of CO, VOCs and NOy, mainly in the form of PAN. Observed O3 levels increased by 17 ppbv over 5 days. A photochemical trajectory model, CiTTyCAT, was used to examine processes responsible for the chemical evolution of the plume. The model was initialized with upwind data and compared with downwind measurements. The influence of high aerosol loading on photolysis rates in the plume was investigated using in situ aerosol measurements in the plume and lidar retrievals of optical depth as input into a photolysis code (Fast-J), run in the model. Significant impacts on photochemistry are found, with a decrease of 18% in O3 production and 24% in O3 destruction over 5 days when aerosols are included. The plume is found to be chemically active, with large O3 increases attributed primarily to PAN decomposition during descent of the plume toward Europe. The predicted O3 changes are very dependent on temperature changes during transport and also on water vapor levels in the lower troposphere, which can lead to O3 destruction. Simulation of mixing/dilution was necessary to reproduce observed pollutant levels in the plume. Mixing was simulated using background concentrations from measurements in air masses in close proximity to the plume, and mixing timescales (averaging 6.25 days) were derived from CO changes. Observed and simulated O3/CO correlations in the plume were also compared in order to evaluate the photochemistry in the model. Observed slopes change from negative to positive over 5 days. This change, which can be attributed largely to photochemistry, is well reproduced by multiple model runs even if slope values are slightly underestimated, suggesting a small underestimation in modeled photochemical O3 production. The possible impact of this biomass burning plume on O3 levels in the European boundary layer was also examined by running the model for a further 5 days and comparing with data collected at surface sites, such as Jungfraujoch, which showed small O3 increases and elevated CO levels. The model predicts significant changes in O3 over the entire 10-day period due to photochemistry, but the signal is largely lost because of the effects of dilution. However, measurements in several other biomass burning plumes over Europe show that the O3 impact of Alaskan fires can potentially be significant over Europe.
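The mixing/dilution term used in Lagrangian box models of this kind is commonly written as a relaxation of each species toward a background concentration (a generic formulation, shown here only to clarify how the CO-derived timescales enter; the exact CiTTyCAT implementation may differ):

\[
\frac{dC}{dt} = P_{\text{chem}} - L_{\text{chem}} - \frac{C - C_{\text{bg}}}{\tau_{\text{mix}}},
\]

where \(C\) is the in-plume concentration, \(C_{\text{bg}}\) the background concentration measured in nearby air masses, and \(\tau_{\text{mix}}\) the mixing timescale (averaging 6.25 days here), estimated from the observed decay of the CO enhancement.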
Abstract:
Many modelling studies examine the impacts of climate change on crop yield, but few explore either the underlying bio-physical processes or the uncertainty inherent in the parameterisation of crop growth and development. We used a perturbed-parameter crop modelling method together with a regional climate model (PRECIS) driven by the 2071-2100 SRES A2 emissions scenario in order to examine processes and uncertainties in yield simulation. Crop simulations used the groundnut (i.e. peanut; Arachis hypogaea L.) version of the General Large-Area Model for annual crops (GLAM). Two sets of GLAM simulations were carried out: control simulations and fixed-duration simulations, where the impact of mean temperature on crop development rate was removed. Model results were compared to sensitivity tests using two other crop models of differing levels of complexity: CROPGRO, and the groundnut model of Hammer et al. [Hammer, G.L., Sinclair, T.R., Boote, K.J., Wright, G.C., Meinke, H., and Bell, M.J., 1995, A peanut simulation model: I. Model development and testing. Agron. J. 87, 1085-1093]. GLAM simulations were particularly sensitive to two processes. First, elevated vapour pressure deficit (VPD) consistently reduced yield. The same result was seen in some simulations using both other crop models. Second, GLAM crop duration was longer, and yield greater, when the optimal temperature for the rate of development was exceeded. Yield increases were also seen in one other crop model. Overall, the models differed in their response to super-optimal temperatures, and that difference increased with mean temperature; percentage changes in yield between current and future climates were as diverse as -50% and over +30% for the same input data. The first process has been observed in many crop experiments, whilst the second has not. Thus, we conclude that there is a need for: (i) more process-based modelling studies of the impact of VPD on assimilation, and (ii) more experimental studies at super-optimal temperatures. Using the GLAM results, central values and uncertainty ranges were projected for mean 2071-2100 crop yields in India. In the fixed-duration simulations, ensemble mean yields mostly rose by 10-30%. The full ensemble range was greater than this mean change (20-60% over most of India). In the control simulations, yield stimulation by elevated CO2 was more than offset by other processes, principally accelerated crop development rates at elevated, but sub-optimal, mean temperatures. Hence, the quantification of uncertainty can facilitate relatively robust indications of the likely sign of crop yield changes in future climates. (C) 2007 Elsevier B.V. All rights reserved.
Abstract:
We briefly review recent progress in understanding the role of surface waves in the marine atmospheric boundary layer and the ocean mixed layer, and give a global perspective on these processes by analysing ERA-40 data. Ocean surface waves interact with the marine atmospheric boundary layer in two broad regimes: (i) the conventional wind-driven wave regime, when fast winds blow over slower moving waves, and (ii) a wave-driven wind regime, when long-wavelength swell propagates under low winds and generates a wave-driven jet in the lower part of the marine boundary layer. Analysis of ERA-40 data indicates that the wave-driven wind regime is as prevalent as the conventional wind-driven regime. Ocean surface waves also profoundly change mixing in the ocean mixed layer through the generation of Langmuir circulation. Results from large-eddy simulation are used here to develop a scaling for the resulting Langmuir turbulence, which is a necessary step in developing a parametrization of the process. ERA-40 data are then used to show that the Langmuir regime is the predominant regime over much of the global ocean, providing compelling motivation for parametrizing this process in ocean general circulation models.
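The non-dimensional parameter conventionally used to characterise this regime is the turbulent Langmuir number (a standard definition; the specific scaling developed in the paper is not reproduced here):

\[
\mathrm{La}_t = \sqrt{\frac{u_*}{u_s}},
\]

where \(u_*\) is the water-side friction velocity and \(u_s\) the surface Stokes drift of the wave field; small values of \(\mathrm{La}_t\) indicate that wave-driven (Langmuir) turbulence dominates over shear-driven turbulence in the mixed layer.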
Abstract:
This paper maps the carbonate geochemistry of the Makgadikgadi Pans region of northern Botswana from moderate-resolution (500 m pixels) remotely sensed data, to assess the impact of various geomorphological processes on surficial carbonate distribution. Previous palaeo-environmental studies have demonstrated that the pans have experienced several highstands during the Quaternary, forming calcretes around shoreline embayments. The pans are also a significant regional source of dust, and some workers have suggested that surficial carbonate distributions may be controlled, in part, by wind regime. Field studies of carbonate deposits in the region have also highlighted the importance of fluvial and groundwater processes in calcrete formation. However, due to the large area involved and problems of accessibility, the carbonate distribution across the entire Makgadikgadi basin remains poorly understood. The MODIS instrument permits mapping of carbonate distribution over large areas; comparison with estimates from Landsat Thematic Mapper data shows reasonable agreement, and there is good agreement with estimates from laboratory analysis of field samples. The results suggest that palaeo-lake highstands, reconstructed here using the SRTM 3 arc-second digital elevation model, have left behind surficial carbonate deposits, which can be mapped by the MODIS instrument. Copyright (c) 2006 John Wiley & Sons, Ltd.
Abstract:
In this paper, modern techniques such as satellite image analysis and tools provided by geographic information systems (GIS) are exploited in order to extend and improve existing techniques for mapping the spatial distribution of sediment transport processes. The processes of interest comprise mass movements such as solifluction, slope wash, dirty avalanches, and rock and boulder falls. They differ considerably in nature, and therefore different approaches for deriving their spatial extent are required. A major challenge is addressing the difference between the comparatively coarse resolution of the available satellite data (Landsat TM/ETM+, 30 m x 30 m) and the actual scale of sediment transport in this environment. A three-step approach has been developed which is based on the concept of Geomorphic Process Units (GPUs): parameterization, process area delineation and combination. Parameters include land cover from satellite data and digital elevation model derivatives. Process areas are identified using a hierarchical classification scheme utilizing thresholds and the definition of topology. The approach was developed for the Karkevagge in Sweden and could be successfully transferred to the Rabotsbekken catchment at Okstindan, Norway, using similar input data. Copyright (C) 2008 John Wiley & Sons, Ltd.
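As a rough illustration of how a hierarchical, threshold-based delineation of Geomorphic Process Units might look in code, the Python sketch below assigns a process class from a land-cover label and a DEM-derived slope; the class names and thresholds are assumptions for illustration, not the rules used in the paper.

# Illustrative GPU classification from land cover and slope; thresholds assumed.
def classify_gpu(land_cover: str, slope_deg: float) -> str:
    if land_cover == "bare_rock" and slope_deg > 40:
        return "rock_and_boulder_fall"
    if land_cover == "vegetated" and 2 <= slope_deg <= 15:
        return "solifluction"
    if slope_deg > 15:
        return "slope_wash"
    return "no_dominant_process"

print(classify_gpu("bare_rock", 55))   # -> rock_and_boulder_fall
print(classify_gpu("vegetated", 8))    # -> solifluction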