The problem of detecting spatially coherent groups of data that exhibit anomalous behavior has started to attract attention due to applications in areas such as epidemic analysis and weather forecasting. Earlier efforts from the data mining community have largely focused on finding outliers, that is, individual data objects that display deviant behavior. Such point-based methods are not easy to extend to find groups of data that exhibit anomalous behavior. Scan statistics are methods from the statistics community that address the problem of identifying regions where data objects exhibit behavior that is atypical of the general dataset. The spatial scan statistic and the methods that build upon it mostly adopt a common framework: define a shape for candidate regions (e.g., circular or elliptical), repeatedly sample regions of that shape, and apply a statistical test for anomaly detection to each. In the past decade, the statistics community has worked to improve the efficiency of scan statistics and to enable the discovery of arbitrarily shaped anomalous regions. Meanwhile, the data mining community has started to look at determining anomalous regions whose behavior diverges from that of their neighborhood. In this chapter, we survey the space of techniques for detecting anomalous regions in spatial data from across the data mining and statistics communities, while outlining connections to well-studied problems in clustering and image segmentation. We analyze the techniques systematically by categorizing them to provide a structured bird's-eye view of the work on anomalous region detection; we hope that this will encourage better cross-pollination of ideas across communities and help advance the frontier in anomaly detection.
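
To make the scan-statistic framework concrete, the following is a minimal sketch of a circular scan over point sites, scored with a Kulldorff-style Poisson log-likelihood ratio. The data layout (`x, y, cases, population` tuples), the function names, and the choice of candidate radii are assumptions made for illustration; the Monte Carlo significance test that normally follows the scan is omitted for brevity.

```python
import math

def poisson_llr(c, e, total):
    """Log-likelihood ratio for a region with c observed and e expected cases.

    Only regions with an excess of cases (c > e) score above zero.
    """
    if e <= 0 or c <= e:
        return 0.0
    llr = c * math.log(c / e)
    if total > c:
        llr += (total - c) * math.log((total - c) / (total - e))
    return llr

def scan_circles(sites, radii):
    """Scan circles centred on every site and return the best-scoring one.

    sites: list of (x, y, cases, population) tuples (hypothetical layout).
    Returns (best_llr, centre_index, radius, member_indices).
    """
    total_cases = sum(s[2] for s in sites)
    total_pop = sum(s[3] for s in sites)
    best = (0.0, None, None, [])
    for i, (cx, cy, _, _) in enumerate(sites):
        for r in radii:
            members = [j for j, (x, y, _, _) in enumerate(sites)
                       if math.hypot(x - cx, y - cy) <= r]
            c = sum(sites[j][2] for j in members)
            pop = sum(sites[j][3] for j in members)
            # Expected cases under the null: proportional to population.
            e = total_cases * pop / total_pop
            llr = poisson_llr(c, e, total_cases)
            if llr > best[0]:
                best = (llr, i, r, members)
    return best

if __name__ == "__main__":
    # Synthetic 5 x 5 grid with an elevated case count in one corner.
    hot = {(0, 0), (0, 1), (1, 0), (1, 1)}
    sites = [(x, y, 10 if (x, y) in hot else 2, 100)
             for x in range(5) for y in range(5)]
    llr, centre, radius, members = scan_circles(sites, [1.0, 1.5])
    print(llr, sites[centre][:2], radius,
          sorted(sites[j][:2] for j in members))
```

In practice the maximum score would be compared against scores obtained from many datasets simulated under the null hypothesis to obtain a p-value.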


Using the method of Lorenz (1982), we have estimated the predictability of a recent version of the European Centre for Medium-Range Weather Forecasts (ECMWF) model using two different estimates of the initial error corresponding to 6- and 24-hr forecast errors, respectively. For a 6-hr forecast error of the extratropical 500-hPa geopotential height field, a potential increase in forecast skill by more than 3 d is suggested, indicating a further increase in predictability by another 1.5 d compared to the use of a 24-hr forecast error. This is due to a smaller initial error and to an initial error reduction resulting in a smaller averaged growth rate for the whole 7-d forecast. A similar assessment for the tropics using the wind vector fields at 850 and 250 hPa suggests a huge potential improvement, with a 7-d forecast providing the same skill as a 1-d forecast now. A contributing factor to the increase in the estimate of predictability is the apparent slow increase of error during the early part of the forecast.


Data assimilation – the set of techniques whereby information from observing systems and models is combined optimally – is rapidly becoming prominent in endeavours to exploit Earth Observation for Earth sciences, including climate prediction. This paper explains the broad principles of data assimilation, outlining different approaches (optimal interpolation, three-dimensional and four-dimensional variational methods, the Kalman Filter), together with the approximations that are often necessary to make them practicable. After pointing out a variety of benefits of data assimilation, the paper then outlines some practical applications of the exploitation of Earth Observation by data assimilation in the areas of operational oceanography, chemical weather forecasting and carbon cycle modelling. Finally, some challenges for the future are noted.
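
As a minimal illustration of the principle shared by optimal interpolation and the Kalman filter, the sketch below combines one background (model) value with one observation, weighting each by its error variance. The scalar setting, the function name `analysis_update`, and the numbers in the usage example are assumptions for illustration and are not taken from the paper.

```python
def analysis_update(background, background_var, observation, observation_var):
    """Combine a background value and an observation of the same quantity.

    Returns (analysis, analysis_var, gain), where gain is the weight given
    to the observation-minus-background difference.
    """
    gain = background_var / (background_var + observation_var)
    analysis = background + gain * (observation - background)
    analysis_var = (1.0 - gain) * background_var
    return analysis, analysis_var, gain

if __name__ == "__main__":
    # Equal error variances: the analysis lies halfway between the inputs
    # and its error variance is half that of either input.
    print(analysis_update(10.0, 4.0, 12.0, 4.0))
```

The analysis variance is never larger than either input variance, which is one way of seeing the benefit of combining the two sources of information.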


The Convective Storm Initiation Project (CSIP) is an international project to understand precisely where, when, and how convective clouds form and develop into showers in the mainly maritime environment of southern England. A major aim of CSIP is to compare the results of the very high resolution Met Office weather forecasting model with detailed observations of the early stages of convective clouds and to use the newly gained understanding to improve the predictions of the model. A large array of ground-based instruments plus two instrumented aircraft, from the U.K. National Centre for Atmospheric Science (NCAS) and the German Institute for Meteorology and Climate Research (IMK), Karlsruhe, were deployed in southern England, over an area centered on the meteorological radars at Chilbolton, during the summers of 2004 and 2005. In addition to a variety of ground-based remote-sensing instruments, numerous rawin-sondes were released at one- to two-hourly intervals from six closely spaced sites. The Met Office weather radar network and Meteosat satellite imagery were used to provide context for the observations made by the instruments deployed during CSIP. This article presents an overview of the CSIP field campaign and examples from CSIP of the types of convective initiation phenomena that are typical in the United Kingdom. It shows the way in which certain kinds of observational data are able to reveal these phenomena and gives an explanation of how the analyses of data from the field campaign will be used in the development of an improved very high resolution NWP model for operational use.


Observations from the High Resolution Dynamics Limb Sounder (HIRDLS) instrument on NASA's Aura satellite are used to quantify gravity wave momentum fluxes in the middle atmosphere. The period around the 2006 Arctic sudden stratospheric warming (SSW) is investigated, during which a substantial elevation of the stratopause occurred. Analysis of the HIRDLS results, together with analysis of European Centre for Medium-Range Weather Forecasts zonal winds, provides direct evidence of wind filtering of the gravity wave spectrum during this period. This confirms previous hypotheses from model studies and further contributes to our understanding of the effects of gravity wave driving on the winter polar stratopause.


The “butterfly effect” is a popularly known paradigm; it is commonly said that when a butterfly flaps its wings in Brazil, it may cause a tornado in Texas. This describes how weather forecasts can be extremely sensitive to small changes in the atmospheric data, or initial conditions, used in computer model simulations. In 1961 Edward Lorenz found, when running a weather model, that small changes in the initial conditions given to the model can, over time, lead to entirely different forecasts (Lorenz, 1963). This discovery highlights one of the major challenges in modern weather forecasting: providing the computer model with the most accurately specified initial conditions possible. A process known as data assimilation seeks to minimize the errors in the given initial conditions and was, in 1911, described by Bjerknes as “the ultimate problem in meteorology” (Bjerknes, 1911).
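
The sensitivity to initial conditions can be reproduced with a small twin experiment. The sketch below uses the three-variable system from Lorenz (1963) with its standard parameter values as a stand-in for a weather model; the integrator (fourth-order Runge-Kutta), the time step, the starting point, and the size of the perturbation are all assumptions chosen for illustration.

```python
import math

def lorenz63(state, sigma=10.0, rho=28.0, beta=8.0 / 3.0):
    """Time derivative of the three-variable Lorenz (1963) system."""
    x, y, z = state
    return (sigma * (y - x), x * (rho - z) - y, x * y - beta * z)

def rk4_step(state, dt):
    """Advance the state by one classical fourth-order Runge-Kutta step."""
    def shifted(base, slope, h):
        return tuple(b + h * s for b, s in zip(base, slope))
    k1 = lorenz63(state)
    k2 = lorenz63(shifted(state, k1, dt / 2))
    k3 = lorenz63(shifted(state, k2, dt / 2))
    k4 = lorenz63(shifted(state, k3, dt))
    return tuple(s + dt / 6 * (a + 2 * b + 2 * c + d)
                 for s, a, b, c, d in zip(state, k1, k2, k3, k4))

def twin_separation(perturbation=1e-8, dt=0.01, steps=2500):
    """Run two forecasts from nearly identical states.

    Returns the list of Euclidean distances between the two runs after
    each time step.
    """
    control = (1.0, 1.0, 1.0)
    perturbed = (1.0 + perturbation, 1.0, 1.0)
    separations = []
    for _ in range(steps):
        control = rk4_step(control, dt)
        perturbed = rk4_step(perturbed, dt)
        separations.append(math.dist(control, perturbed))
    return separations

if __name__ == "__main__":
    seps = twin_separation()
    for step in (0, 500, 1000, 1500, 2000, 2499):
        print(f"step {step:5d}  separation {seps[step]:.3e}")
```

The two runs are indistinguishable at first, but their separation grows by many orders of magnitude over the integration, which is the behavior the paragraph above describes.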


We unfold a profound relationship between the dynamics of finite-size perturbations in spatially extended chaotic systems and the universality class of Kardar-Parisi-Zhang (KPZ). We show how this relationship can be exploited to obtain a complete theoretical description of bred-vector dynamics. The existence of characteristic length/time scales, the spatial extent of spatial correlations and how to time it, and the role of the breeding amplitude are all analyzed in the light of our theory. Implications for weather forecasting based on ensembles of initial conditions are also discussed.


Ice clouds are an important yet largely unvalidated component of weather forecasting and climate models, but radar offers the potential to provide the necessary data to evaluate them. First in this paper, coordinated aircraft in situ measurements and scans by a 3-GHz radar are presented, demonstrating that, for stratiform midlatitude ice clouds, radar reflectivity in the Rayleigh-scattering regime may be reliably calculated from aircraft size spectra if the "Brown and Francis" mass-size relationship is used. The comparisons spanned radar reflectivity values from -15 to +20 dBZ, ice water contents (IWCs) from 0.01 to 0.4 g m⁻³, and median volumetric diameters between 0.2 and 3 mm. In mixed-phase conditions the agreement is much poorer because of the higher-density ice particles present. A large midlatitude aircraft dataset is then used to derive expressions that relate radar reflectivity and temperature to ice water content and visible extinction coefficient. The analysis is an advance over previous work in several ways: the retrievals vary smoothly with both input parameters, different relationships are derived for the common radar frequencies of 3, 35, and 94 GHz, and the problem of retrieving the long-term mean and the horizontal variance of ice cloud parameters is considered separately. It is shown that the dependence on temperature arises because of the temperature dependence of the number concentration "intercept parameter" rather than mean particle size. A comparison is presented of ice water content derived from scanning 3-GHz radar with the values held in the Met Office mesoscale forecast model, for eight precipitating cases spanning 39 h over southern England. It is found that the model predicted mean IWC to within 10% of the observations at temperatures between -30° and -10°C but tended to underestimate it by around a factor of 2 at colder temperatures.


Prediction of the solar wind conditions in near-Earth space, arising from both quasi-steady and transient structures, is essential for space weather forecasting. To achieve forecast lead times of a day or more, such predictions must be made on the basis of remote solar observations. A number of empirical prediction schemes have been proposed to forecast the transit time and speed of coronal mass ejections (CMEs) at 1 AU. However, the current lack of magnetic field measurements in the corona severely limits our ability to forecast the 1 AU magnetic field strengths resulting from interplanetary CMEs (ICMEs). In this study we investigate the relation between the characteristic magnetic field strengths and speeds of both magnetic cloud and noncloud ICMEs at 1 AU. Correlation between field and speed is found to be significant only in the sheath region ahead of magnetic clouds, not within the clouds themselves. The lack of such a relation in the sheaths ahead of noncloud ICMEs is consistent with such ICMEs being skimming encounters of magnetic clouds, though other explanations are also put forward. Linear fits to the radial speed profiles of ejecta reveal that faster-traveling ICMEs are also expanding more at 1 AU. We combine these empirical relations to form a prediction scheme for the magnetic field strength in the sheaths ahead of magnetic clouds and also suggest a method for predicting the radial speed profile through an ICME on the basis of upstream measurements.


We present stereoscopic images of an Earth-impacting Coronal Mass Ejection (CME). The CME was imaged by the Heliospheric Imagers onboard the twin STEREO spacecraft during December 2008. The apparent acceleration of the CME is used to provide independent estimates of its speed and direction from the two spacecraft. Three distinct signatures within the CME were all found to be closely Earth-directed. At the time that the CME was predicted to pass the ACE spacecraft, in-situ observations contained a typical CME signature. At Earth, ground-based magnetometer observations showed a small but widespread sudden response to the compression of the geomagnetic cavity at CME impact. In this case, STEREO could have given warning of CME impact at least 24 hours in advance. These stereoscopic observations represent a significant milestone for the STEREO mission and have significant potential for improving operational space weather forecasting.


One of the fundamental questions in dynamical meteorology, and one of the basic objectives of GARP, is to determine the predictability of the atmosphere. In the early planning stage and preparation for GARP a number of theoretical and numerical studies were undertaken, indicating that there existed an inherent unpredictability in the atmosphere which even with the most ideal observing system would limit useful weather forecasting to 2-3 weeks.


The very first numerical models, developed more than 20 years ago, were drastic simplifications of the real atmosphere and were mostly restricted to describing adiabatic processes. For predictions of the mid-tropospheric flow a day or two ahead, these models often gave reasonable results, but the results deteriorated quickly when the prediction was extended further in time. The prediction of the surface flow was unsatisfactory even at short range. It was evident that both the energy-generating processes and the dissipative processes have to be included in numerical models in order to predict the weather patterns in the lower part of the atmosphere and to predict the atmosphere in general beyond a day or two. Present-day computers make it possible to attack the weather forecasting problem in a more comprehensive way, and substantial efforts have been made, during the last decade in particular, to incorporate the non-adiabatic processes in numerical prediction models. Radiative transfer, condensation of moisture, turbulent transfer of heat, momentum and moisture, and the dissipation of kinetic energy are the most important processes associated with the formation of energy sources and sinks in the atmosphere, and these have to be incorporated in numerical prediction models extended over more than a few days. The mechanisms of these processes are mainly related to small-scale disturbances in space and time, or even to molecular processes. It is one of the basic characteristics of numerical models that these small-scale disturbances cannot be included in an explicit way. The first reason for this is the discretization of the model's atmosphere by a finite-difference grid or the use of a Galerkin or spectral function representation.
The second reason why we cannot explicitly introduce these processes into a numerical model is that some physical processes necessary to describe them (such as the local buoyancy) are eliminated a priori by the constraint of hydrostatic adjustment. Even if this physical constraint can be relaxed by making the models non-hydrostatic, the scale problem is virtually impossible to solve, and for the foreseeable future we have to try to incorporate the ensemble or gross effect of these physical processes on the large-scale synoptic flow. The formulation of the ensemble effect in terms of grid-scale variables (the parameters of the large-scale flow) is called 'parameterization'. For short-range prediction of the synoptic flow at middle and high latitudes, very simple parameterization has proven to be rather successful.


A programmable data acquisition system to allow novel use of meteorological radiosondes for atmospheric science measurements is described. In its basic form it supports four analogue inputs at 16-bit resolution, and up to two further inputs at lower resolution configurable instead for digital instruments. It also provides multiple instrument power supplies (+8 V, +16 V, +5 V and -8 V) from the 9 V radiosonde battery. During a balloon flight encountering air temperatures from +17°C to -66°C, the worst-case voltage drift in the 5 V unipolar digitisation circuitry was 20 mV. The system liberates a new range of low-cost atmospheric research measurements, by utilising radiosondes routinely launched internationally for weather forecasting purposes. No additional receiving equipment is required. Comparisons between the specially instrumented and standard meteorological radiosondes show negligible effect of the additional instrumentation on the standard meteorological data.
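
To put the quoted drift in context, the short calculation below expresses it in converter steps. It assumes, purely for illustration, that the full 5 V unipolar range maps onto all 2^16 codes of a 16-bit converter; the abstract does not state the actual mapping, and the function names are hypothetical.

```python
def lsb_volts(full_scale_volts, bits):
    """Voltage represented by one converter step (least significant bit)."""
    return full_scale_volts / (2 ** bits)

def drift_in_lsb(drift_volts, full_scale_volts, bits):
    """Express a voltage drift as a number of converter steps."""
    return drift_volts / lsb_volts(full_scale_volts, bits)

if __name__ == "__main__":
    step = lsb_volts(5.0, 16)
    drift = drift_in_lsb(0.020, 5.0, 16)
    print(f"one step = {step * 1e6:.1f} microvolts")
    print(f"20 mV drift = {drift:.0f} steps "
          f"({0.020 / 5.0:.1%} of full scale)")
```

Under that assumption the worst-case drift corresponds to a few hundred converter steps, or 0.4% of full scale, across a temperature change of 83°C.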


Many operational weather forecasting centres use semi-implicit time-stepping schemes because of their good efficiency. However, as computers become ever more parallel, horizontally explicit solutions of the equations of atmospheric motion might become an attractive alternative due to the additional inter-processor communication of implicit methods. Implicit and explicit (IMEX) time-stepping schemes have long been combined in models of the atmosphere using semi-implicit, split-explicit or HEVI splitting. However, most studies of the accuracy and stability of IMEX schemes have been limited to the parabolic case of advection–diffusion equations. We demonstrate how a number of Runge–Kutta IMEX schemes can be used to solve hyperbolic wave equations either semi-implicitly or HEVI. A new form of HEVI splitting is proposed, UfPreb, which dramatically improves accuracy and stability of simulations of gravity waves in stratified flow. As a consequence it is found that there are HEVI schemes that do not lose accuracy in comparison to semi-implicit ones. The stability limits of a number of variations of trapezoidal implicit and some Runge–Kutta IMEX schemes are found and the schemes are tested on two vertical slice cases using the compressible Boussinesq equations split into various combinations of implicit and explicit terms. Some of the Runge–Kutta schemes are found to be beneficial over trapezoidal, especially since they damp high frequencies without dropping to first-order accuracy. We test schemes that are not formally accurate for stiff systems but in stiff limits (nearly incompressible) and find that they can perform well. The scheme ARK2(2,3,2) performs the best in the tests.
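
The trade-off between damping and order of accuracy mentioned above can be seen on the simplest wave problem, the oscillation equation dy/dt = iωy. The sketch below evaluates the textbook one-step amplification factors of forward Euler, backward Euler, and the implicit trapezoidal rule; it is an illustration only and does not implement the Runge–Kutta IMEX schemes or the UfPreb splitting studied in the paper. The function names are assumptions.

```python
def amp_forward_euler(omega_dt):
    """Explicit, first order: A = 1 + i*omega*dt."""
    return 1 + 1j * omega_dt

def amp_backward_euler(omega_dt):
    """Implicit, first order: A = 1 / (1 - i*omega*dt)."""
    return 1 / (1 - 1j * omega_dt)

def amp_trapezoidal(omega_dt):
    """Implicit, second order: A = (1 + i*omega*dt/2) / (1 - i*omega*dt/2)."""
    return (1 + 0.5j * omega_dt) / (1 - 0.5j * omega_dt)

if __name__ == "__main__":
    # |A| > 1 means the wave amplitude grows each step (unstable),
    # |A| < 1 means it is damped, |A| = 1 means it is preserved.
    for omega_dt in (0.1, 1.0, 5.0):
        print(f"omega*dt = {omega_dt:3.1f}  "
              f"forward Euler |A| = {abs(amp_forward_euler(omega_dt)):.4f}  "
              f"backward Euler |A| = {abs(amp_backward_euler(omega_dt)):.4f}  "
              f"trapezoidal |A| = {abs(amp_trapezoidal(omega_dt)):.4f}")
```

Forward Euler amplifies the wave for any time step, backward Euler damps all frequencies but is only first-order accurate, and the trapezoidal rule is second-order accurate but applies no damping at all, even to poorly resolved high frequencies.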