903 results for Data-driven Methods
Abstract:
This chapter presents techniques used to generate 3D digital elevation models (DEMs) from remotely sensed data. Three methods are explored and discussed: optical stereoscopic imagery, Interferometric Synthetic Aperture Radar (InSAR), and Light Detection and Ranging (LIDAR). For each approach, the state of the art in the literature is reviewed, the techniques involved in DEM generation are presented together with an evaluation of their accuracy, and DEMs reconstructed from remotely sensed data are illustrated. DEM generation from satellite stereoscopic imagery is a good example of the passive, multi-view imaging technology discussed in Chap. 2 of this book, whereas InSAR and LIDAR use different principles to acquire 3D information; both are therefore discussed in detail to convey their fundamentals.
Abstract:
Global communication requirements and load imbalance in some parallel data mining algorithms are the major obstacles to exploiting the computational power of large-scale systems. This work investigates how non-uniform data distributions can be exploited to remove the global communication requirement and to reduce the communication cost in iterative parallel data mining algorithms. In particular, the analysis focuses on one of the most influential and popular data mining methods, the k-means algorithm for cluster analysis. The straightforward parallel formulation of k-means requires a global reduction operation at each iteration step, which limits its scalability. This work studies a different parallel formulation in which the global communication requirement is relaxed while the algorithm still produces exactly the same solution as the centralised k-means algorithm. The proposed approach exploits a non-uniform data distribution that either occurs in real-world distributed applications or can be induced by means of multi-dimensional binary search trees. The approach can also be extended to tolerate an approximation error, which allows a further reduction in communication costs.
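The straightforward formulation, with one global reduction per iteration, can be sketched as follows. This is a minimal single-process simulation that assumes the data are already split into a list of NumPy arrays, one per hypothetical worker; the function names are illustrative, and the sketch shows only the baseline scheme, not the relaxed formulation based on non-uniform data distributions that the abstract proposes.

```python
import numpy as np

def local_stats(points, centroids):
    # each worker assigns its own points to the nearest centroid...
    d2 = ((points[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    labels = d2.argmin(axis=1)
    k, dim = centroids.shape
    sums = np.zeros((k, dim))
    counts = np.zeros(k)
    # ...and returns only per-centroid partial sums and counts
    np.add.at(sums, labels, points)
    np.add.at(counts, labels, 1)
    return sums, counts

def parallel_kmeans(partitions, centroids, iters=10):
    centroids = np.array(centroids, dtype=float)
    for _ in range(iters):
        stats = [local_stats(p, centroids) for p in partitions]
        # global reduction: partial results from every worker are combined
        total_sums = sum(s for s, _ in stats)
        total_counts = sum(c for _, c in stats)
        nonempty = total_counts > 0
        centroids[nonempty] = total_sums[nonempty] / total_counts[nonempty, None]
    return centroids

rng = np.random.default_rng(0)
data = np.vstack([rng.normal(0.0, 1.0, (50, 2)), rng.normal(10.0, 1.0, (50, 2))])
parts = [data[:30], data[30:70], data[70:]]   # uneven split across 3 "workers"
init = np.array([[1.0, 1.0], [9.0, 9.0]])
print(parallel_kmeans(parts, init))
```

Because each worker contributes only sums and counts, the reduced update equals the one computed on the concatenated data, which is the exactness property the relaxed scheme is meant to preserve while communicating less.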
Abstract:
Residential electricity demand accounts for a major proportion of overall electricity consumption in most European countries. The timing of residential electricity demand has significant impacts on carbon emissions and system costs. This paper reviews the data and methods used in time use studies in the context of residential electricity demand modelling, and highlights key issues that are likely to become more topical for research on the timing of electricity demand following the roll-out of smart meters.
Abstract:
Current methods for initialising coupled atmosphere-ocean forecasts often rely on separate atmosphere and ocean analyses, whose combination can leave the coupled system imbalanced at the start of the forecast, potentially accelerating the growth of errors. Using a series of experiments with the European Centre for Medium-Range Weather Forecasts (ECMWF) coupled system, the magnitude and extent of these so-called initialisation shocks are quantified and their impact on forecast skill is measured. Forecasts initialised from separate ocean and atmospheric analyses are found to exhibit initialisation shocks in lower atmospheric temperature when compared with forecasts initialised using a coupled data assimilation method. These shocks cause as much as a doubling of root-mean-square error on the first day of the forecast in some regions, and increases that persist for the full duration of the 10-day forecasts performed here. However, the impact of this choice of initialisation on forecast skill, assessed against independent datasets, was found to be negligible, at least over the limited period studied. Larger initialisation shocks follow a change in either the atmospheric or the ocean model component between the analysis and forecast phases: changes in the ocean component can lead to sea surface temperature shocks of more than 0.5 K in some equatorial regions during the first day of the forecast. Implications for the development of coupled forecast systems, particularly with respect to coupled data assimilation methods, are discussed.
Abstract:
Effective public policy to mitigate climate change footprints should build on data-driven analysis of firm-level strategies. This article's conceptual approach augments the resource-based view (RBV) of the firm and identifies investments in four firm-level resource domains (Governance, Information management, Systems, and Technology [GISTe]) that develop capabilities in climate change impact mitigation. The authors call the resulting framework the GISTe model and use it to frame their analysis and public policy recommendations. The research uses the 2008 Carbon Disclosure Project (CDP) database, which contains high-quality information on the climate change strategies of 552 companies from North America and Europe. Contrary to the widely accepted myth that European firms perform better than North American ones, the authors find a more differentiated result. Many firms, whether European or North American, do not just “talk” about climate change impact mitigation but actually “walk the talk.” European firms appear to outperform their North American counterparts in “walk I,” which denotes attention to governance, information management, and systems. But for “walk II,” meaning actual technology-related investments, North American firms perform as well as or better than European companies. The authors formulate public policy recommendations to accelerate firm-level, sector-level, and cluster-level implementation of climate change strategies.
Abstract:
Empirical mode decomposition (EMD) is a data-driven method used to decompose data into oscillatory components. Two key issues with EMD are its stability and computational speed. This paper examines the extent to which the EMD algorithm is sensitive to the floating-point data format, and shows that, for a given signal, there is no significant difference between results obtained with single (binary32) and double (binary64) floating-point precision. This implies that there is no benefit in increasing floating-point precision when performing EMD on devices optimised for the single-precision format, such as graphics processing units (GPUs).
Abstract:
Empirical Mode Decomposition (EMD) is a data-driven technique for extracting oscillatory components from data. Although it was introduced over 15 years ago, it still lacks a rigorous mathematical foundation, which also means that there are no objective metrics for evaluating the decomposed set. The most common way to assess EMD results is visual inspection, which is highly subjective. This article provides objective measures for assessing EMD results based on the original definition of oscillatory components.
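The original definition of an oscillatory component in EMD is Huang's intrinsic mode function (IMF), one condition of which is that the numbers of extrema and zero crossings are equal or differ by at most one. A minimal check of that single condition is sketched below as an illustration of the definition, not of the article's proposed measures; the function names are hypothetical, and the second IMF condition (a locally zero mean of the upper and lower envelopes) is not checked.

```python
import numpy as np

def count_extrema(x):
    # a local extremum is where the first difference changes sign
    d = np.diff(x)
    return int(np.sum(d[:-1] * d[1:] < 0))

def count_zero_crossings(x):
    # ignore exact zeros, then count sign changes between consecutive samples
    s = np.sign(x)
    s = s[s != 0]
    return int(np.sum(s[:-1] != s[1:]))

def satisfies_extrema_condition(x):
    # IMF condition: #extrema and #zero crossings differ by at most one
    return abs(count_extrema(x) - count_zero_crossings(x)) <= 1

t = np.linspace(0.0, 4 * np.pi, 1000, endpoint=False)
oscillation = np.sin(t + 0.3)       # two full cycles around zero
offset = oscillation + 2.0          # same shape, but never crosses zero
print(satisfies_extrema_condition(oscillation))  # True
print(satisfies_extrema_condition(offset))       # False
```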
Abstract:
Estimating trajectories and parameters of dynamical systems from observations is a problem frequently encountered in many branches of science; geophysicists, for example, refer to this problem as data assimilation. Unlike estimation problems with exchangeable observations, in data assimilation the observations cannot easily be divided into separate sets for estimation and validation. This creates serious problems, since simply using the same observations for both estimation and validation may result in overly optimistic performance assessments. To circumvent this problem, a result is presented that allows this optimism to be estimated, thus enabling a more realistic performance assessment in data assimilation. The approach becomes particularly simple for data assimilation methods that employ linear error feedback (such as synchronization schemes, nudging, incremental 3D-Var and 4D-Var, and various Kalman filter approaches). Numerical examples using a high-gain observer confirm the theory.
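Nudging, one of the linear-error-feedback schemes listed above, can be illustrated on a toy system. The sketch assumes a hypothetical 2-D rotation model in which only the first state component is observed, and picks the feedback gain K by pole placement so that the error dynamics A - KH have a repeated eigenvalue p; the model, the gain rule and the parameter values are all assumptions for illustration, and the sketch does not implement the abstract's optimism estimate.

```python
import numpy as np

def rotation(theta):
    # hypothetical toy model: rotate the 2-D state by theta each step
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s], [s, c]])

def gain_for_double_pole(theta, p):
    # pick K so that A - K H has trace 2p and determinant p**2,
    # i.e. a repeated eigenvalue p (H observes the first component only)
    c, s = np.cos(theta), np.sin(theta)
    k1 = 2 * c - 2 * p
    k2 = (1 - k1 * c - p ** 2) / s
    return np.array([k1, k2])

def nudge(x0, z0, theta=0.3, p=0.5, steps=30):
    A = rotation(theta)
    H = np.array([1.0, 0.0])
    K = gain_for_double_pole(theta, p)
    x = np.asarray(x0, dtype=float)   # true state
    z = np.asarray(z0, dtype=float)   # nudged estimate
    errors = []
    for _ in range(steps):
        y = H @ x                          # scalar observation of the truth
        z = A @ z + K * (y - H @ z)        # model step plus linear error feedback
        x = A @ x                          # advance the truth
        errors.append(float(np.linalg.norm(x - z)))
    return errors

errors = nudge([1.0, 0.0], [0.0, 0.0])
print(errors[0], errors[-1])
```

The estimation error evolves as e_{k+1} = (A - KH) e_k, so with |p| < 1 it decays geometrically even though the second component is never observed directly.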
Abstract:
We present a data-driven mathematical model of a key initiating step in platelet activation, a central process in the prevention of bleeding following injury. In vascular disease, this process is activated inappropriately and causes thrombosis, heart attacks and stroke. The collagen receptor GPVI is the primary trigger for platelet activation at sites of injury, and understanding the complex molecular mechanisms initiated by this receptor is important for developing more effective antithrombotic medicines. In this work we developed a series of nonlinear ordinary differential equation models that directly represent biological hypotheses about the initial steps in GPVI-stimulated signal transduction. At each stage, model simulations were compared with our own quantitative, high-temporal-resolution experimental data, which guided further experimental design, data collection and model refinement. Much is known about the linear forward reactions within platelet signalling pathways, but the roles of putative reverse reactions are poorly understood. An initial model that included a simple constitutively active phosphatase was unable to explain the experimental data. Model revisions incorporating a complex pathway of interactions (specifically involving the phosphatase TULA-2) described the experimental data well, both for phosphorylation observed in samples from a single donor and for samples from a wider population. The model was used to investigate the levels of the proteins that regulate the pathway and the effect of the low GPVI levels that have been associated with disease. The results indicate a clear separation between healthy and GPVI-deficient states with respect to the signalling cascade dynamics associated with Syk tyrosine phosphorylation and activation.
Our approach reveals that this negative feedback pathway, which results in the temporal regulation of a specific class of protein tyrosine phosphatases, is of central importance in controlling the rate, and therefore the extent, of GPVI-stimulated platelet activation.
Abstract:
Nonlinear data assimilation is high on the agenda in all fields of the geosciences: with ever-increasing model resolution, the inclusion of more physical (biological, etc.) processes, and more complex observation operators, the data assimilation problem becomes increasingly nonlinear. The suitability of particle filters for solving the nonlinear data assimilation problem in high-dimensional geophysical systems is discussed. Several existing and new schemes are presented, and it is shown that at least one of them, the Equivalent-Weights Particle Filter, does indeed beat the curse of dimensionality and provides a way forward for solving the nonlinear data assimilation problem in high-dimensional systems.
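To make the dimensionality problem concrete, the sketch below implements one step of the standard bootstrap (sequential importance resampling) particle filter, not the Equivalent-Weights Particle Filter named in the abstract, on a hypothetical random-walk model with Gaussian observation error and a fully observed state. The effective sample size it reports typically collapses toward one as the state dimension grows; that weight degeneracy is what schemes such as the Equivalent-Weights Particle Filter are designed to avoid. Noise levels and particle counts are assumed values.

```python
import numpy as np

def bootstrap_step(particles, y, rng, model_noise=0.1, obs_noise=0.5):
    # forecast: propagate each particle with a toy random-walk model (assumption)
    particles = particles + rng.normal(0.0, model_noise, particles.shape)
    # analysis: Gaussian likelihood of the observation, full state observed
    log_w = -0.5 * np.sum((y - particles) ** 2, axis=1) / obs_noise ** 2
    w = np.exp(log_w - log_w.max())   # subtract max for numerical stability
    w /= w.sum()
    ess = 1.0 / np.sum(w ** 2)        # effective sample size
    # multinomial resampling
    idx = rng.choice(len(particles), size=len(particles), p=w)
    return particles[idx], ess

rng = np.random.default_rng(1)
ess_by_dim = {}
for dim in (1, 100):
    particles = rng.normal(0.0, 1.0, (200, dim))
    _, ess_by_dim[dim] = bootstrap_step(particles, np.zeros(dim), rng)
print(ess_by_dim)
```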
Abstract:
Accurate knowledge of the location and magnitude of ocean heat content (OHC) variability and change is essential for understanding the processes that govern decadal variations in surface temperature, quantifying changes in the planetary energy budget, and developing constraints on the transient climate response to external forcings. We present an overview of the temporal and spatial characteristics of OHC variability and change as represented by an ensemble of dynamical and statistical ocean reanalyses (ORAs). Spatial maps of the 0–300 m layer show large regions of the Pacific and Indian Oceans where the interannual variability of the ensemble mean exceeds the ensemble spread, indicating that OHC variations are well constrained by the available observations over the period 1993–2009. At deeper levels, the ORAs are less well constrained by observations, with the largest differences across the ensemble mostly associated with areas of high eddy kinetic energy, such as the Southern Ocean and boundary current regions. Spatial patterns of OHC change for the period 1997–2009 show good agreement in the upper 300 m and are characterized by a strong dipole pattern in the Pacific Ocean. There is less agreement in the patterns of change at deeper levels, potentially linked to differences in the representation of ocean dynamics, such as water mass formation processes. However, the Atlantic and Southern Oceans are regions in which many ORAs show widespread warming below 700 m over the period 1997–2009. Annual time series of global and hemispheric OHC change for 0–700 m show the largest spread for the data-sparse Southern Hemisphere, and a number of ORAs appear to be subject to large initialization shocks over the first few years. In agreement with previous studies, a number of ORAs exhibit enhanced ocean heat uptake below 300 and 700 m during the mid-1990s or early 2000s.
The ORA ensemble mean (±1 standard deviation) of rolling 5-year trends in full-depth OHC shows a relatively steady heat uptake of approximately 0.9 ± 0.8 W m−2 (expressed relative to Earth’s surface area) between 1995 and 2002, which reduces to about 0.2 ± 0.6 W m−2 between 2004 and 2006, in qualitative agreement with recent analysis of Earth’s energy imbalance. There is a marked reduction in the ensemble spread of OHC trends below 300 m as the Argo profiling float observations become available in the early 2000s. In general, we suggest that ORAs should be treated with caution when employed to understand past ocean warming trends—especially when considering the deeper ocean where there is little in the way of observational constraints. The current work emphasizes the need to better observe the deep ocean, both for providing observational constraints for future ocean state estimation efforts and also to develop improved models and data assimilation methods.
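The rolling 5-year trends and their conversion to W m−2 relative to Earth's surface area can be sketched as follows. This assumes annual OHC values in joules with one row per reanalysis, fits an ordinary least-squares slope in each window, and takes Earth's surface area as roughly 5.1 × 10^14 m²; the spread uses NumPy's default population standard deviation (ddof=0), which may differ from the study's choice. Function names and the synthetic test data are hypothetical.

```python
import numpy as np

EARTH_SURFACE_M2 = 5.1e14            # approximate surface area of Earth
SECONDS_PER_YEAR = 365.25 * 86400.0

def rolling_trends(years, ohc, window=5):
    # least-squares slope of OHC (J) against time (yr) in each sliding window -> J/yr
    return np.array([
        np.polyfit(years[i:i + window], ohc[i:i + window], 1)[0]
        for i in range(len(years) - window + 1)
    ])

def ensemble_trend_wm2(years, members, window=5):
    # members: array of shape (n_reanalyses, n_years)
    trends = np.array([rolling_trends(years, m, window) for m in members])
    # J/yr -> W, then normalise by Earth's surface area -> W m^-2
    trends_wm2 = trends / SECONDS_PER_YEAR / EARTH_SURFACE_M2
    return trends_wm2.mean(axis=0), trends_wm2.std(axis=0)

# synthetic members: identical 0.9 W m^-2 uptake, different offsets
years = np.arange(1993, 2010, dtype=float)
rate = 0.9 * SECONDS_PER_YEAR * EARTH_SURFACE_M2   # J per year
members = np.array([off + rate * (years - years[0]) for off in (0.0, 1e22, -2e22)])
mean, spread = ensemble_trend_wm2(years, members)
print(mean[:3], spread[:3])
```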
Abstract:
TIGGE (the THORPEX Interactive Grand Global Ensemble) was a major component of the THORPEX (The Observing System Research and Predictability Experiment) research program, whose aim was to accelerate improvements in forecasting high-impact weather. By providing ensemble prediction data from leading operational forecast centers, TIGGE has enhanced collaboration between the research and operational meteorological communities and enabled research on a wide range of topics. The paper covers the objective evaluation of the TIGGE data. For a range of forecast parameters, combining ensembles from several data providers in a Multi-model Grand Ensemble is shown to be beneficial. Alternative methods of correcting systematic errors, including the use of reforecast data, are also discussed. TIGGE data have been used in a range of research studies on predictability and dynamical processes. Tropical cyclones are the most destructive weather systems in the world and are a focus of multi-model ensemble research; their extra-tropical transition also has a major impact on the skill of mid-latitude forecasts. We also review how TIGGE has added to our understanding of the dynamics of extra-tropical cyclones and storm tracks. Although TIGGE is a research project, it has proved invaluable for developing products for future operational forecasting, for example forecasts of tropical cyclone tracks, heavy rainfall and strong winds, and flood prediction through coupling hydrological models to ensembles. Finally, the paper considers the legacy of TIGGE. We discuss priorities and key issues in predictability and ensemble forecasting, including the new opportunities offered by convective-scale ensembles, links with ensemble data assimilation methods, and extension of the range of useful forecast skill.
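In its simplest form, a Multi-model Grand Ensemble pools the members supplied by several providers. The sketch below assumes equal weight per member and no prior bias correction; operational combinations may weight centres differently or calibrate first, and the member values are hypothetical.

```python
import numpy as np

def grand_ensemble(ensembles):
    # pool the members of every provider's ensemble, each member weighted equally
    pooled = np.concatenate([np.asarray(e, dtype=float) for e in ensembles])
    return pooled.mean(), pooled.std(ddof=1), pooled

def exceedance_probability(pooled, threshold):
    # fraction of pooled members above the threshold
    return float(np.mean(pooled > threshold))

centre_a = [1.0, 2.0, 3.0]   # hypothetical members from one provider
centre_b = [4.0, 5.0]        # hypothetical members from another provider
mean, spread, pooled = grand_ensemble([centre_a, centre_b])
print(mean, exceedance_probability(pooled, 2.5))  # 3.0 0.6
```

Pooling widens the sample of possible outcomes; whether equal weighting is appropriate depends on how the individual ensembles are calibrated.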
Abstract:
The dynamical processes that lead to open cluster disruption cause the cluster's mass to decrease. To investigate such processes observationally, it is important to identify open cluster remnants (OCRs), which are intrinsically poorly populated. Because of their nature, distinguishing them from field star fluctuations is still an unresolved issue. In this work, we developed a statistical diagnostic tool to distinguish poorly populated star concentrations from background field fluctuations. We use 2MASS photometry to explore one of the conditions required for a stellar group to be a physical group: producing distinct sequences in a colour-magnitude diagram (CMD). We use automated tools to (i) derive the limiting radius; (ii) decontaminate the field and assign membership probabilities; (iii) fit isochrones; and (iv) compare object and field CMDs, taking the isochrone solution into account, to verify their similarity. If the object cannot be statistically considered a field fluctuation, we derive its probable age, distance modulus, reddening and uncertainties in a self-consistent way. As a test, we apply the tool to open clusters and comparison fields. Finally, we study the OCR candidates DoDz 6, NGC 272, ESO 435 SC48 and ESO 325 SC15. The tool is optimized for these low-statistics objects and for selecting the best OCR candidates for studies of kinematics and chemical composition. The study of possible OCRs will provide a deeper understanding of OCR properties and constraints for theoretical models, including insights into the evolution of open clusters and their dissolution rates.
Abstract:
Evidence of jet precession in many galactic and extragalactic sources has been reported in the literature. Much of this evidence is based on studies of the kinematics of the jet knots, which depend on the correct identification of the components to determine their respective proper motions and position angles on the plane of the sky. Identification problems related to fitting procedures, as well as observations that are poorly sampled in time, may affect the tracking of the components over time, which in turn may contribute to a misinterpretation of the data. To deal with these limitations, we introduce a powerful statistical tool for analysing jet precession: the cross-entropy method for continuous multi-extremal optimization. Based only on the raw data of the jet components (right ascension and declination offsets from the core), the cross-entropy method searches for the precession model parameters that best represent the data. In this work we present a large number of tests to validate this technique, using synthetic precessing jets built from a given set of precession parameters. With the aim of recovering these parameters, we applied the cross-entropy method to our precession model, exhaustively varying the quantities associated with the method. Our results show that even in the most challenging tests, the cross-entropy method was able to find the correct parameters to within 1 per cent. Even for a non-precessing jet, our optimization method correctly indicated the lack of precession.
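The cross-entropy method for continuous optimization repeatedly samples candidate parameter vectors from a Gaussian, keeps the best-scoring ("elite") fraction, and refits the Gaussian to them. The sketch below is a generic version with a smoothing factor, applied to a hypothetical quadratic misfit that stands in for the precession model; the sample size, elite fraction, iteration count and smoothing are assumed values, not those used in the study.

```python
import numpy as np

def cross_entropy_minimize(f, mean, std, n_samples=200, elite_frac=0.1,
                           iters=60, smoothing=0.7, rng=None):
    rng = np.random.default_rng() if rng is None else rng
    mean = np.asarray(mean, dtype=float)
    std = np.asarray(std, dtype=float)
    n_elite = max(1, int(n_samples * elite_frac))
    for _ in range(iters):
        # sample candidates from the current Gaussian
        samples = rng.normal(mean, std, size=(n_samples, mean.size))
        scores = np.array([f(s) for s in samples])
        # keep the lowest-scoring (elite) candidates
        elite = samples[np.argsort(scores)[:n_elite]]
        # smoothed refit of the sampling distribution to the elite set
        mean = smoothing * elite.mean(axis=0) + (1 - smoothing) * mean
        std = smoothing * elite.std(axis=0) + (1 - smoothing) * std
    return mean

# hypothetical misfit with its minimum at (3, -1)
misfit = lambda p: (p[0] - 3.0) ** 2 + (p[1] + 1.0) ** 2
best = cross_entropy_minimize(misfit, mean=[0.0, 0.0], std=[5.0, 5.0],
                              rng=np.random.default_rng(0))
print(best)
```

Starting with a wide sampling distribution lets the method explore several basins before the distribution contracts, which is why it suits multi-extremal problems.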
Abstract:
We present a new technique for fitting models to very long baseline interferometry (VLBI) images of astrophysical jets. The method minimizes a performance function proportional to the sum of the squared differences between the model and observed images. The model image is constructed by summing N_s elliptical Gaussian sources, each characterized by six parameters: two-dimensional peak position, peak intensity, eccentricity, amplitude, and orientation angle of the major axis. We present fitting results for two benchmark jets: the first constructed from three individual Gaussian sources, the second from five. Both jets were analyzed with our cross-entropy technique in finite and infinite signal-to-noise regimes, with the background noise chosen to mimic that found in interferometric radio maps. These images were constructed to simulate most of the conditions encountered in interferometric images of active galactic nuclei. We show that the cross-entropy technique recovers the source parameters with an accuracy similar to that of the well-established IMFIT task of the Astronomical Image Processing System (AIPS) package when the image is relatively simple (e.g., few components). For more complex interferometric maps, our method performs better at recovering the parameters of the jet components. Our methodology can also determine quantitatively the number of individual components present in an image. An additional application of the cross-entropy technique to a real image of a BL Lac object is shown and discussed. Our results indicate that our cross-entropy model-fitting technique must be used for the analysis of complex emission regions having more than three sources, even though it is substantially slower than current model-fitting tasks (at least 10,000 times slower on a single processor, depending on the number of sources to be optimized).
As in the case of any model fitting performed in the image plane, caution is required in analyzing images constructed from a poorly sampled (u, v) plane.
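The model image and performance function described above can be sketched as follows. The abstract lists six parameters per source but does not define "amplitude" further; this sketch assumes it sets the width (standard deviation) along the major axis, with the eccentricity fixing the minor-axis width, and works in pixel coordinates without a noise model. All names are hypothetical.

```python
import numpy as np

def elliptical_gaussian(xx, yy, x0, y0, peak, sigma_major, ecc, theta):
    # minor-axis width follows from the eccentricity: e = sqrt(1 - (b/a)^2)
    sigma_minor = sigma_major * np.sqrt(1.0 - ecc ** 2)
    dx, dy = xx - x0, yy - y0
    # rotate pixel offsets so that u runs along the major axis
    u = dx * np.cos(theta) + dy * np.sin(theta)
    v = -dx * np.sin(theta) + dy * np.cos(theta)
    return peak * np.exp(-0.5 * ((u / sigma_major) ** 2 + (v / sigma_minor) ** 2))

def model_image(params, shape):
    # params: flat sequence of N_s blocks of
    # (x0, y0, peak, sigma_major, ecc, theta)
    yy, xx = np.mgrid[0:shape[0], 0:shape[1]]
    image = np.zeros(shape)
    for p in np.reshape(params, (-1, 6)):
        image += elliptical_gaussian(xx, yy, *p)
    return image

def performance(params, observed):
    # sum of squared differences between model and observed images
    return float(np.sum((model_image(params, observed.shape) - observed) ** 2))

true_params = [10, 12, 5.0, 3.0, 0.6, 0.4,    # source 1
               22, 20, 2.0, 2.0, 0.3, 1.2]    # source 2
observed = model_image(true_params, (32, 32))
print(performance(true_params, observed))     # 0.0
```

A function like `performance` is what an optimizer such as the cross-entropy method would minimize over the 6 × N_s parameters.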