341 results for Topic Modelling
in CentAUR: Central Archive, University of Reading - UK
Abstract:
Using the novel technique of topic modelling, this paper examines thematic patterns and their changes over time in a large corpus of corporate social responsibility (CSR) reports produced in the oil sector. Whereas previous research on corporate communications has been small-scale or interested in selected lexical aspects and thematic categories identified ex ante, our approach allows thematic patterns to emerge from the data. The analysis reveals a number of major trends and topic shifts pointing to changing practices of CSR. Nowadays ‘people’, ‘communities’ and ‘rights’ seem to be given more prominence, whereas ‘environmental protection’ appears to be less relevant. Using more established corpus-based methods, we subsequently explore two top phrases, ‘human rights’ and ‘climate change’, that were identified as representative of the shifting thematic patterns. Our approach strikes a balance between purely quantitative and purely qualitative methodologies and offers applied linguists new ways of exploring discourse in large collections of texts.
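The abstract does not name a specific topic-modelling algorithm, but the standard choice for letting themes emerge from a corpus is latent Dirichlet allocation (LDA). The following is a minimal collapsed Gibbs sampler sketch on a toy corpus echoing the themes above; the corpus, topic count, and hyperparameters are all illustrative assumptions, not the paper's setup.

```python
import random

def lda_gibbs(docs, n_topics, n_iter=200, alpha=0.1, beta=0.01, seed=0):
    """Collapsed Gibbs sampler for latent Dirichlet allocation (toy scale)."""
    rng = random.Random(seed)
    vocab = sorted({w for d in docs for w in d})
    V = len(vocab)
    widx = {w: i for i, w in enumerate(vocab)}
    ndk = [[0] * n_topics for _ in docs]       # doc-topic counts
    nkw = [[0] * V for _ in range(n_topics)]   # topic-word counts
    nk = [0] * n_topics                        # total words per topic
    z = []                                     # topic assignment per token
    for d, doc in enumerate(docs):
        zs = []
        for w in doc:
            k = rng.randrange(n_topics)
            zs.append(k)
            ndk[d][k] += 1; nkw[k][widx[w]] += 1; nk[k] += 1
        z.append(zs)
    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k, v = z[d][i], widx[w]
                ndk[d][k] -= 1; nkw[k][v] -= 1; nk[k] -= 1
                # full conditional p(z = t | all other assignments)
                wts = [(ndk[d][t] + alpha) * (nkw[t][v] + beta) / (nk[t] + V * beta)
                       for t in range(n_topics)]
                r = rng.random() * sum(wts)
                for t, wt in enumerate(wts):
                    r -= wt
                    if r <= 0:
                        k = t
                        break
                z[d][i] = k
                ndk[d][k] += 1; nkw[k][v] += 1; nk[k] += 1
    top = [[vocab[v] for v in sorted(range(V), key=lambda v: -nkw[t][v])[:3]]
           for t in range(n_topics)]
    return top, ndk

# Invented toy documents mirroring the thematic shift described in the abstract.
docs = [
    "human rights communities people rights".split(),
    "communities people human rights people".split(),
    "climate change emissions environment climate".split(),
    "environment emissions climate change emissions".split(),
]
top_words, doc_topics = lda_gibbs(docs, n_topics=2)
```

On a real CSR corpus the documents would number in the hundreds and the sampler would typically be replaced by an optimised library implementation; the counts `ndk` give each report's topic mixture, which is what supports the trend analysis over time.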
Abstract:
Finite computing resources limit the spatial resolution of state-of-the-art global climate simulations to hundreds of kilometres. In neither the atmosphere nor the ocean are small-scale processes such as convection, clouds and ocean eddies properly represented. Climate simulations are known to depend, sometimes quite strongly, on the resulting bulk-formula representation of unresolved processes. Stochastic physics schemes within weather and climate models have the potential to represent the dynamical effects of unresolved scales in ways that conventional bulk-formula representations cannot. The application of stochastic physics to climate modelling is a rapidly advancing, important and innovative topic. The latest research findings are gathered together in the Theme Issue for which this paper serves as the introduction.
Abstract:
The global monsoon system is so varied and complex that understanding and predicting its diverse behaviour remains a challenge that will occupy modellers for many years to come. Despite the difficult task ahead, an improved monsoon modelling capability has been realized through the inclusion of more detailed physics of the climate system and higher resolution in our numerical models. Perhaps the most crucial improvement to date has been the development of coupled ocean-atmosphere models. From subseasonal to interdecadal time scales, only through the inclusion of air-sea interaction can the proper phasing and teleconnections of convection be attained with respect to sea surface temperature variations. Even then, the response to slow variations in remote forcings (e.g., El Niño–Southern Oscillation) does not result in a robust solution, as there are a host of competing modes of variability that must be represented, including those that appear to be chaotic. Our understanding of the links between monsoons and land surface processes is less mature than our understanding of air-sea interactions. A land surface forcing signal appears to dominate the onset of wet season rainfall over the North American monsoon region, though the relative role of ocean versus land forcing remains a topic of investigation in all the monsoon systems. Also, improved forecasts have been made during periods in which additional sounding observations are available for data assimilation. Thus, there is untapped predictability that can only be attained through the development of a more comprehensive observing system for all monsoon regions. Additionally, improved parameterizations - for example, of convection, cloud, radiation, and boundary layer schemes as well as land surface processes - are essential to realize the full potential of monsoon predictability.
A more comprehensive assessment is needed of the impact of black carbon aerosols, which may modulate that of other anthropogenic greenhouse gases. Dynamical considerations require ever-increasing horizontal resolution (probably 0.5 degrees or finer) in order to resolve many monsoon features including, but not limited to, the Mei-Yu/Baiu sudden onset and withdrawal, low-level jet orientation and variability, and orographically forced rainfall. Under anthropogenic climate change, many competing factors complicate robust projections of monsoon changes. In the absence of aerosol effects, the increased land-sea temperature contrast suggests a strengthened monsoon circulation under climate change. However, increased aerosol emissions will reflect more solar radiation back to space, which may temper or even reduce the strength of monsoon circulations compared to the present day. Precipitation may behave independently of the circulation: under warming conditions, increased atmospheric moisture loading, based purely on thermodynamic considerations, could result in increased monsoon rainfall. The challenge to improve model parameterizations and include more complex processes and feedbacks pushes computing resources to their limit, thus requiring continuous upgrades of computational infrastructure to ensure progress in understanding and predicting current and future behaviour of monsoons.
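The "purely thermodynamic" moisture-loading argument above is usually quantified through the Clausius-Clapeyron relation: saturation vapour pressure, and hence the atmosphere's water-holding capacity, rises by roughly 6-7% per kelvin of warming. A quick sketch using Bolton's standard empirical formula (the chosen reference temperature is illustrative):

```python
import math

def sat_vapour_pressure(t_c):
    """Saturation vapour pressure in hPa (Bolton 1980 formula), t_c in Celsius."""
    return 6.112 * math.exp(17.67 * t_c / (t_c + 243.5))

# Fractional increase in saturation vapour pressure per 1 K of warming near 25 C,
# i.e. the thermodynamic scaling behind expected monsoon rainfall increases.
t = 25.0
rate = sat_vapour_pressure(t + 1.0) / sat_vapour_pressure(t) - 1.0
print(f"~{100 * rate:.1f}% more saturation vapour pressure per kelvin")
```

Actual monsoon rainfall changes deviate from this scaling because, as the abstract notes, circulation strength may weaken even as moisture loading increases.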
Abstract:
In a world of almost permanent and rapidly increasing electronic data availability, techniques for filtering, compressing, and interpreting these data to transform them into valuable and easily comprehensible information are of utmost importance. One key topic in this area is the capability to deduce future system behaviour from a given data input. This book brings together for the first time the complete theory of data-based neurofuzzy modelling and the linguistic attributes of fuzzy logic in a single cohesive mathematical framework. After introducing the basic theory of data-based modelling, new concepts including extended additive and multiplicative submodels are developed and their extensions to state estimation and data fusion are derived. All these algorithms are illustrated with benchmark and real-life examples to demonstrate their efficiency. Chris Harris and his group have carried out pioneering work which has tied together the fields of neural networks and linguistic rule-based algorithms. This book is aimed at researchers and scientists in time series modelling, empirical data modelling, knowledge discovery, data mining, and data fusion.
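To give a flavour of how linguistic rules and data-based models combine in the neurofuzzy setting, here is a minimal zeroth-order Takagi-Sugeno-style sketch with triangular membership functions; the rule base and all parameters are invented for illustration and are not taken from the book.

```python
def tri(x, a, b, c):
    """Triangular membership function: rises from a, peaks at b, falls to c."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x <= b else (c - x) / (c - b)

def fuzzy_predict(x, rules):
    """Weighted average of rule outputs (zeroth-order Takagi-Sugeno)."""
    num = sum(tri(x, a, b, c) * out for (a, b, c), out in rules)
    den = sum(tri(x, a, b, c) for (a, b, c), _ in rules)
    return num / den if den else 0.0

# Hypothetical linguistic rule base: "low" -> 1.0, "medium" -> 5.0, "high" -> 9.0
rules = [((-5.0, 0.0, 5.0), 1.0),
         ((0.0, 5.0, 10.0), 5.0),
         ((5.0, 10.0, 15.0), 9.0)]
y = fuzzy_predict(2.5, rules)   # input halfway between "low" and "medium"
```

In data-based neurofuzzy modelling the rule consequents (here the constants 1.0, 5.0, 9.0) would be fitted to training data rather than hand-specified, which is what ties the linguistic representation to empirical modelling.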
Abstract:
Flood modelling of urban areas is still at an early stage, partly because until recently topographic data of sufficiently high resolution and accuracy have been lacking in urban areas. However, Digital Surface Models (DSMs) generated from airborne scanning laser altimetry (LiDAR) having sub-metre spatial resolution have now become available, and these are able to represent the complexities of urban topography. The paper describes the development of a LiDAR post-processor for urban flood modelling based on the fusion of LiDAR and digital map data. The map data are used in conjunction with LiDAR data to identify different object types in urban areas, though pattern recognition techniques are also employed. Post-processing produces a Digital Terrain Model (DTM) for use as model bathymetry, and also a friction parameter map for use in estimating spatially-distributed friction coefficients. In vegetated areas, friction is estimated from LiDAR-derived vegetation height, and (unlike most vegetation removal software) the method copes with short vegetation less than ~1m high, which may occupy a substantial fraction of even an urban floodplain. The DTM and friction parameter map may also be used to help to generate an unstructured mesh of a vegetated urban floodplain for use by a 2D finite element model. The mesh is decomposed to reflect floodplain features having different frictional properties to their surroundings, including urban features such as buildings and roads as well as taller vegetation features such as trees and hedges. This allows a more accurate estimation of local friction. The method produces a substantial node density due to the small dimensions of many urban features.
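The friction parameter map described above assigns spatially distributed coefficients based on object class and LiDAR-derived vegetation height. The paper's fitted relationships are not given in the abstract, so the mapping below (the Manning's n values and the height rule) is purely illustrative:

```python
def manning_n(cell_class, veg_height_m=0.0):
    """Illustrative Manning's n lookup for a classified terrain cell.

    cell_class: 'road', 'building', or 'vegetation'
    veg_height_m: LiDAR-derived vegetation height; used for vegetation cells,
    including short vegetation under ~1 m that most removal software discards.
    All numeric values are invented for illustration.
    """
    if cell_class == "road":
        return 0.02
    if cell_class == "building":
        return 0.1          # buildings are often treated as blocked cells instead
    # Simple monotone rule: taller vegetation -> rougher surface, capped.
    return min(0.03 + 0.05 * veg_height_m, 0.15)

# Friction values for vegetation cells of increasing height (m).
friction_map = [manning_n("vegetation", h) for h in (0.1, 0.5, 1.0, 3.0)]
```

In the workflow the abstract describes, such a per-cell friction map would be sampled onto the decomposed finite-element mesh, so that buildings, roads, trees, and hedges each carry locally appropriate roughness.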
Abstract:
Satellite-based rainfall monitoring is widely used for climatological studies because of its full global coverage, but it is also of great importance for operational purposes, especially in areas such as Africa where there is a lack of ground-based rainfall data. Satellite rainfall estimates have enormous potential benefits as input to hydrological and agricultural models because of their real-time availability, low cost and full spatial coverage. One issue that needs to be addressed is the uncertainty on these estimates. This is particularly important in assessing the likely errors on the output from non-linear models (rainfall-runoff or crop yield) which make use of the rainfall estimates, aggregated over an area, as input. Correct assessment of the uncertainty on the rainfall is non-trivial, as it must take account of:
• the difference in spatial support of the satellite information and the independent data used for calibration;
• uncertainties on the independent calibration data;
• the non-Gaussian distribution of rainfall amount;
• the spatial intermittency of rainfall;
• the spatial correlation of the rainfall field.
This paper describes a method for estimating the uncertainty on satellite-based rainfall values taking account of these factors. The method involves, firstly, a stochastic calibration which completely describes the probability of rainfall occurrence and the pdf of rainfall amount for a given satellite value, and, secondly, the generation of an ensemble of rainfall fields based on the stochastic calibration but with the correct spatial correlation structure within each ensemble member. This is achieved by the use of geostatistical sequential simulation. The ensemble generated in this way may be used to estimate uncertainty at larger spatial scales. A case study of daily rainfall monitoring in The Gambia, West Africa for the purpose of crop yield forecasting is presented to illustrate the method.
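The first stage of the procedure, the per-pixel stochastic calibration, can be caricatured as follows. An occurrence probability and a skewed (gamma) amount distribution are conditioned on the satellite value, and repeated sampling yields an ensemble whose spread quantifies uncertainty on area-aggregated rainfall. The calibration functions and parameters below are invented for illustration, and the sketch deliberately omits the second stage: a full implementation would use geostatistical sequential simulation to impose the correct spatial correlation within each ensemble member.

```python
import random

def calibrated_sample(rng, satellite_value):
    """Draw one rainfall value (mm) for a pixel given its satellite estimate.

    Illustrative calibration: the occurrence probability and gamma parameters
    as functions of the satellite value are invented, not from the paper.
    """
    p_rain = min(0.1 + 0.08 * satellite_value, 0.95)
    if rng.random() >= p_rain:
        return 0.0                      # spatial intermittency: dry pixel
    # Skewed, non-Gaussian amount distribution; mean scales with the estimate.
    return rng.gammavariate(2.0, max(satellite_value, 0.5))

rng = random.Random(42)
satellite_field = [0.0, 1.5, 3.0, 6.0]          # one row of pixel estimates (mm)
ensemble = [[calibrated_sample(rng, s) for s in satellite_field]
            for _ in range(500)]

# Spread of the area-average rainfall across ensemble members (rough 90% interval).
area_means = sorted(sum(member) / len(member) for member in ensemble)
lo, hi = area_means[25], area_means[475]
```

Because the pixels here are sampled independently, this version understates the uncertainty of area averages; correlated fields from sequential simulation widen the interval, which is precisely why the paper's second stage matters.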
Abstract:
A wind-tunnel study was conducted to investigate ventilation of scalars from urban-like geometries at neighbourhood scale by exploring two different geometries: a uniform-height roughness and a non-uniform-height roughness, both with equal plan and frontal densities of λp = λf = 25%. In both configurations a sub-unit of the idealized urban surface was coated with a thin layer of naphthalene to represent area sources. The naphthalene sublimation method was used to measure directly the total area-averaged transport of scalars out of the complex geometries. At the same time, naphthalene vapour concentrations controlled by the turbulent fluxes were detected using a fast Flame Ionisation Detection (FID) technique. This paper describes the novel use of a naphthalene-coated surface as an area source in dispersion studies. Particular emphasis was also given to testing whether the concentration measurements were independent of Reynolds number. For low wind speeds, transfer from the naphthalene surface is determined by a combination of forced and natural convection. Compared with a propane point source release, a 25% higher free stream velocity was needed for the naphthalene area source to yield Reynolds-number-independent concentration fields. Ventilation transfer coefficients wT/U derived from the naphthalene sublimation method showed that, whilst there was enhanced vertical momentum exchange due to obstacle height variability, advection was reduced and dispersion from the source area was not enhanced. Thus, the height variability of a canopy is an important parameter when generalising urban dispersion. Fine-resolution concentration measurements in the canopy showed the effect of height variability on dispersion at street scale. Rapid vertical transport in the wake of individual high-rise obstacles was found to generate elevated point-like sources. A Gaussian plume model was used to analyse differences in the downstream plumes. Intensified lateral and vertical plume spread and plume dilution with height were found for the non-uniform-height roughness.
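The Gaussian plume analysis mentioned above follows the standard ground-reflected plume formula. The sketch below uses textbook power-law dispersion coefficients; the constants and the source parameters are illustrative rather than the study's fitted values.

```python
import math

def plume_concentration(x, y, z, Q=1.0, u=5.0, H=10.0):
    """Ground-reflected Gaussian plume: concentration at (x, y, z), x > 0 downwind.

    Q: source strength, u: wind speed, H: effective source height.
    The power-law sigma_y, sigma_z below are illustrative open-country values.
    """
    sig_y = 0.22 * x ** 0.9
    sig_z = 0.20 * x ** 0.85
    lateral = math.exp(-y * y / (2 * sig_y ** 2))
    vertical = (math.exp(-(z - H) ** 2 / (2 * sig_z ** 2))
                + math.exp(-(z + H) ** 2 / (2 * sig_z ** 2)))  # ground reflection
    return Q / (2 * math.pi * u * sig_y * sig_z) * lateral * vertical

# Concentration on the plume axis at source height, 100 m downwind.
c_axis = plume_concentration(100.0, 0.0, 10.0)
```

Fitting such a model to measured plumes lets lateral and vertical spread be compared between the two roughness configurations via the inferred sigma values, which is how the enhanced spread over the non-uniform-height canopy shows up.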
Abstract:
Compute grids are used widely in many areas of environmental science, but there has been limited uptake of grid computing by the climate modelling community, partly because the characteristics of many climate models make them difficult to use with popular grid middleware systems. In particular, climate models usually produce large volumes of output data, and running them usually involves complicated workflows implemented as shell scripts. For example, NEMO (Smith et al. 2008) is a state-of-the-art ocean model that is currently used for operational ocean forecasting in France, and will soon be used in the UK for both ocean forecasting and climate modelling. On a typical modern cluster, a one-year global ocean simulation at 1-degree resolution takes about three hours when running on 40 processors, and produces roughly 20 GB of output as 50000 separate files. 50-year simulations are common, during which the model is resubmitted as a new job after each year. Running NEMO relies on a set of complicated shell scripts and command utilities for data pre-processing and post-processing prior to job resubmission. Grid Remote Execution (G-Rex) is a pure Java grid middleware system that allows scientific applications to be deployed as Web services on remote computer systems, and then launched and controlled as if they were running on the user's own computer. Although G-Rex is general-purpose middleware, it has two key features that make it particularly suitable for remote execution of climate models: (1) Output from the model is transferred back to the user while the run is in progress, to prevent it from accumulating on the remote system and to allow the user to monitor the model; (2) The client component is a command-line program that can easily be incorporated into existing model workflow scripts.
G-Rex has a REST (Fielding, 2000) architectural style, which allows client programs to be very simple and lightweight, and allows users to interact with model runs using only a basic HTTP client (such as a Web browser or the curl utility) if they wish. This design also allows new client interfaces to be developed in other programming languages with relatively little effort. The G-Rex server is a standard Web application that runs inside a servlet container such as Apache Tomcat and is therefore easy to install and maintain by system administrators. G-Rex is employed as the middleware for the NERC Cluster Grid, a small grid of HPC clusters belonging to collaborating NERC research institutes. Currently the NEMO (Smith et al. 2008) and POLCOMS (Holt et al., 2008) ocean models are installed, and there are plans to install the Hadley Centre's HadCM3 model for use in the decadal climate prediction project GCEP (Haines et al., 2008). The science projects involving NEMO on the Grid have a particular focus on data assimilation (Smith et al. 2008), a technique that involves constraining model simulations with observations. The POLCOMS model will play an important part in the GCOMS project (Holt et al., 2008), which aims to simulate the world's coastal oceans. A typical use of G-Rex by a scientist to run a climate model on the NERC Cluster Grid proceeds as follows: (1) The scientist prepares input files on his or her local machine. (2) Using information provided by the Grid's Ganglia monitoring system, the scientist selects an appropriate compute resource. (3) The scientist runs the relevant workflow script on his or her local machine. This is unmodified except that calls to run the model (e.g. with "mpirun") are simply replaced with calls to "GRexRun". (4) The G-Rex middleware automatically handles the uploading of input files to the remote resource, and the downloading of output files back to the user, including their deletion from the remote system, during the run.
(5) The scientist monitors the output files, using familiar analysis and visualization tools on his or her local machine. G-Rex is well suited to climate modelling because it addresses many of the middleware usability issues that have led to limited uptake of grid computing by climate scientists. It is a lightweight, low-impact and easy-to-install solution that is currently designed for use in relatively small grids such as the NERC Cluster Grid. A current topic of research is the use of G-Rex as an easy-to-use front-end to larger-scale Grid resources such as the UK National Grid Service.
Abstract:
The decadal predictability of three-dimensional Atlantic Ocean anomalies is examined in a coupled global climate model (HadCM3) using a Linear Inverse Modelling (LIM) approach. It is found that the evolution of temperature and salinity in the Atlantic, and the strength of the meridional overturning circulation (MOC), can be effectively described by a linear dynamical system forced by white noise. The forecasts produced using this linear model are more skilful than other reference forecasts for several decades. Furthermore, significant non-normal amplification is found under several different norms. The regions from which this growth occurs are found to be fairly shallow and located in the far North Atlantic. Initially, anomalies in the Nordic Seas impact the MOC, and the anomalies then grow to fill the entire Atlantic basin, especially at depth, over one to three decades. It is found that the structure of the optimal initial condition for amplification is sensitive to the norm employed, but the initial growth seems to be dominated by MOC-related basin-scale changes, irrespective of the choice of norm. The consistent identification of the far North Atlantic as the most sensitive region for small perturbations suggests that additional observations in this region would be optimal for constraining decadal climate predictions.
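The core LIM step can be sketched in a few lines: from a state time series one estimates the lag-tau propagator G(tau) = C(tau) C(0)^-1, where C are lagged covariance matrices, and forecasts x(t+tau) = G(tau) x(t). The paper applies this to high-dimensional Atlantic temperature and salinity fields; the two-variable synthetic system below, with its matrix A and noise level, is purely illustrative.

```python
import random

def lim_propagator(series, lag):
    """Estimate G(lag) = C(lag) C(0)^-1 for a 2-variable state time series."""
    n = len(series) - lag
    def cov(off):
        # 2x2 lagged covariance: c[i][j] = <x_i(t + off) * x_j(t)>
        c = [[0.0] * 2 for _ in range(2)]
        for t in range(n):
            for i in range(2):
                for j in range(2):
                    c[i][j] += series[t + off][i] * series[t][j] / n
        return c
    c0, ct = cov(0), cov(lag)
    det = c0[0][0] * c0[1][1] - c0[0][1] * c0[1][0]
    inv = [[c0[1][1] / det, -c0[0][1] / det],
           [-c0[1][0] / det, c0[0][0] / det]]
    return [[sum(ct[i][k] * inv[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

# Synthetic linear system driven by white noise: x_{t+1} = A x_t + noise
rng = random.Random(1)
A = [[0.9, 0.1], [0.0, 0.8]]
x, series = [1.0, -1.0], []
for _ in range(5000):
    series.append(x)
    x = [A[0][0] * x[0] + A[0][1] * x[1] + rng.gauss(0, 0.1),
         A[1][0] * x[0] + A[1][1] * x[1] + rng.gauss(0, 0.1)]

G = lim_propagator(series, lag=1)   # should approximately recover A
```

The non-normal amplification analysis in the paper then asks which initial condition maximises the growth of the norm of G(tau) x relative to the norm of x; because A here (and L in general) is non-symmetric, transient growth is possible even when all eigenvalues are damped.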