39 resultados para Random coefficient logit models
Resumo:
Asymmetry in a distribution can arise from a long tail of values in the underlying process or from outliers that belong to another population that contaminate the primary process. The first paper of this series examined the effects of the former on the variogram and this paper examines the effects of asymmetry arising from outliers. Simulated annealing was used to create normally distributed random fields of different size that are realizations of known processes described by variograms with different nugget:sill ratios. These primary data sets were then contaminated with randomly located and spatially aggregated outliers from a secondary process to produce different degrees of asymmetry. Experimental variograms were computed from these data by Matheron's estimator and by three robust estimators. The effects of standard data transformations on the coefficient of skewness and on the variogram were also investigated. Cross-validation was used to assess the performance of models fitted to experimental variograms computed from a range of data contaminated by outliers for kriging. The results showed that where skewness was caused by outliers the variograms retained their general shape, but showed an increase in the nugget and sill variances and nugget:sill ratios. This effect was only slightly more for the smallest data set than for the two larger data sets and there was little difference between the results for the latter. Overall, the effect of size of data set was small for all analyses. The nugget:sill ratio showed a consistent decrease after transformation to both square roots and logarithms; the decrease was generally larger for the latter, however. Aggregated outliers had different effects on the variogram shape from those that were randomly located, and this also depended on whether they were aggregated near to the edge or the centre of the field. The results of cross-validation showed that the robust estimators and the removal of outliers were the most effective ways of dealing with outliers for variogram estimation and kriging. (C) 2007 Elsevier Ltd. All rights reserved.
Resumo:
Airborne scanning laser altimetry (LiDAR) is an important new data source for river flood modelling. LiDAR can give dense and accurate DTMs of floodplains for use as model bathymetry. Spatial resolutions of 0.5m or less are possible, with a height accuracy of 0.15m. LiDAR gives a Digital Surface Model (DSM), so vegetation removal software (e.g. TERRASCAN) must be used to obtain a DTM. An example used to illustrate the current state of the art will be the LiDAR data provided by the EA, which has been processed by their in-house software to convert the raw data to a ground DTM and separate vegetation height map. Their method distinguishes trees from buildings on the basis of object size. EA data products include the DTM with or without buildings removed, a vegetation height map, a DTM with bridges removed, etc. Most vegetation removal software ignores short vegetation less than say 1m high. We have attempted to extend vegetation height measurement to short vegetation using local height texture. Typically most of a floodplain may be covered in such vegetation. The idea is to assign friction coefficients depending on local vegetation height, so that friction is spatially varying. This obviates the need to calibrate a global floodplain friction coefficient. It’s not clear at present if the method is useful, but it’s worth testing further. The LiDAR DTM is usually determined by looking for local minima in the raw data, then interpolating between these to form a space-filling height surface. This is a low pass filtering operation, in which objects of high spatial frequency such as buildings, river embankments and walls may be incorrectly classed as vegetation. The problem is particularly acute in urban areas. A solution may be to apply pattern recognition techniques to LiDAR height data fused with other data types such as LiDAR intensity or multispectral CASI data. We are attempting to use digital map data (Mastermap structured topography data) to help to distinguish buildings from trees, and roads from areas of short vegetation. The problems involved in doing this will be discussed. A related problem of how best to merge historic river cross-section data with a LiDAR DTM will also be considered. LiDAR data may also be used to help generate a finite element mesh. In rural area we have decomposed a floodplain mesh according to taller vegetation features such as hedges and trees, so that e.g. hedge elements can be assigned higher friction coefficients than those in adjacent fields. We are attempting to extend this approach to urban area, so that the mesh is decomposed in the vicinity of buildings, roads, etc as well as trees and hedges. A dominant points algorithm is used to identify points of high curvature on a building or road, which act as initial nodes in the meshing process. A difficulty is that the resulting mesh may contain a very large number of nodes. However, the mesh generated may be useful to allow a high resolution FE model to act as a benchmark for a more practical lower resolution model. A further problem discussed will be how best to exploit data redundancy due to the high resolution of the LiDAR compared to that of a typical flood model. Problems occur if features have dimensions smaller than the model cell size e.g. for a 5m-wide embankment within a raster grid model with 15m cell size, the maximum height of the embankment locally could be assigned to each cell covering the embankment. But how could a 5m-wide ditch be represented? Again, this redundancy has been exploited to improve wetting/drying algorithms using the sub-grid-scale LiDAR heights within finite elements at the waterline.
Resumo:
Grass-based diets are of increasing social-economic importance in dairy cattle farming, but their low supply of glucogenic nutrients may limit the production of milk. Current evaluation systems that assess the energy supply and requirements are based on metabolisable energy (ME) or net energy (NE). These systems do not consider the characteristics of the energy delivering nutrients. In contrast, mechanistic models take into account the site of digestion, the type of nutrient absorbed and the type of nutrient required for production of milk constituents, and may therefore give a better prediction of supply and requirement of nutrients. The objective of the present study is to compare the ability of three energy evaluation systems, viz. the Dutch NE system, the agricultural and food research council (AFRC) ME system, and the feed into milk (FIM) ME system, and of a mechanistic model based on Dijkstra et al. [Simulation of digestion in cattle fed sugar cane: prediction of nutrient supply for milk production with locally available supplements. J. Agric. Sci., Cambridge 127, 247-60] and Mills et al. [A mechanistic model of whole-tract digestion and methanogenesis in the lactating dairy cow: model development, evaluation and application. J. Anim. Sci. 79, 1584-97] to predict the feed value of grass-based diets for milk production. The dataset for evaluation consists of 41 treatments of grass-based diets (at least 0.75 g ryegrass/g diet on DM basis). For each model, the predicted energy or nutrient supply, based on observed intake, was compared with predicted requirement based on observed performance. Assessment of the error of energy or nutrient supply relative to requirement is made by calculation of mean square prediction error (MSPE) and by concordance correlation coefficient (CCC). All energy evaluation systems predicted energy requirement to be lower (6-11%) than energy supply. The root MSPE (expressed as a proportion of the supply) was lowest for the mechanistic model (0.061), followed by the Dutch NE system (0.082), FIM ME system (0.097) and AFRCME system(0.118). For the energy evaluation systems, the error due to overall bias of prediction dominated the MSPE, whereas for the mechanistic model, proportionally 0.76 of MSPE was due to random variation. CCC analysis confirmed the higher accuracy and precision of the mechanistic model compared with energy evaluation systems. The error of prediction was positively related to grass protein content for the Dutch NE system, and was also positively related to grass DMI level for all models. In conclusion, current energy evaluation systems overestimate energy supply relative to energy requirement on grass-based diets for dairy cattle. The mechanistic model predicted glucogenic nutrients to limit performance of dairy cattle on grass-based diets, and proved to be more accurate and precise than the energy systems. The mechanistic model could be improved by allowing glucose maintenance and utilization requirements parameters to be variable. (C) 2007 Elsevier B.V. All rights reserved.
Resumo:
Objectives: To assess the potential source of variation that surgeon may add to patient outcome in a clinical trial of surgical procedures. Methods: Two large (n = 1380) parallel multicentre randomized surgical trials were undertaken to compare laparoscopically assisted hysterectomy with conventional methods of abdominal and vaginal hysterectomy; involving 43 surgeons. The primary end point of the trial was the occurrence of at least one major complication. Patients were nested within surgeons giving the data set a hierarchical structure. A total of 10% of patients had at least one major complication, that is, a sparse binary outcome variable. A linear mixed logistic regression model (with logit link function) was used to model the probability of a major complication, with surgeon fitted as a random effect. Models were fitted using the method of maximum likelihood in SAS((R)). Results: There were many convergence problems. These were resolved using a variety of approaches including; treating all effects as fixed for the initial model building; modelling the variance of a parameter on a logarithmic scale and centring of continuous covariates. The initial model building process indicated no significant 'type of operation' across surgeon interaction effect in either trial, the 'type of operation' term was highly significant in the abdominal trial, and the 'surgeon' term was not significant in either trial. Conclusions: The analysis did not find a surgeon effect but it is difficult to conclude that there was not a difference between surgeons. The statistical test may have lacked sufficient power, the variance estimates were small with large standard errors, indicating that the precision of the variance estimates may be questionable.
Resumo:
Nonlinear system identification is considered using a generalized kernel regression model. Unlike the standard kernel model, which employs a fixed common variance for all the kernel regressors, each kernel regressor in the generalized kernel model has an individually tuned diagonal covariance matrix that is determined by maximizing the correlation between the training data and the regressor using a repeated guided random search based on boosting optimization. An efficient construction algorithm based on orthogonal forward regression with leave-one-out (LOO) test statistic and local regularization (LR) is then used to select a parsimonious generalized kernel regression model from the resulting full regression matrix. The proposed modeling algorithm is fully automatic and the user is not required to specify any criterion to terminate the construction procedure. Experimental results involving two real data sets demonstrate the effectiveness of the proposed nonlinear system identification approach.
Resumo:
The performance of various statistical models and commonly used financial indicators for forecasting securitised real estate returns are examined for five European countries: the UK, Belgium, the Netherlands, France and Italy. Within a VAR framework, it is demonstrated that the gilt-equity yield ratio is in most cases a better predictor of securitized returns than the term structure or the dividend yield. In particular, investors should consider in their real estate return models the predictability of the gilt-equity yield ratio in Belgium, the Netherlands and France, and the term structure of interest rates in France. Predictions obtained from the VAR and univariate time-series models are compared with the predictions of an artificial neural network model. It is found that, whilst no single model is universally superior across all series, accuracy measures and horizons considered, the neural network model is generally able to offer the most accurate predictions for 1-month horizons. For quarterly and half-yearly forecasts, the random walk with a drift is the most successful for the UK, Belgian and Dutch returns and the neural network for French and Italian returns. Although this study underscores market context and forecast horizon as parameters relevant to the choice of the forecast model, it strongly indicates that analysts should exploit the potential of neural networks and assess more fully their forecast performance against more traditional models.
Resumo:
The performance of flood inundation models is often assessed using satellite observed data; however these data have inherent uncertainty. In this study we assess the impact of this uncertainty when calibrating a flood inundation model (LISFLOOD-FP) for a flood event in December 2006 on the River Dee, North Wales, UK. The flood extent is delineated from an ERS-2 SAR image of the event using an active contour model (snake), and water levels at the flood margin calculated through intersection of the shoreline vector with LiDAR topographic data. Gauged water levels are used to create a reference water surface slope for comparison with the satellite-derived water levels. Residuals between the satellite observed data points and those from the reference line are spatially clustered into groups of similar values. We show that model calibration achieved using pattern matching of observed and predicted flood extent is negatively influenced by this spatial dependency in the data. By contrast, model calibration using water elevations produces realistic calibrated optimum friction parameters even when spatial dependency is present. To test the impact of removing spatial dependency a new method of evaluating flood inundation model performance is developed by using multiple random subsamples of the water surface elevation data points. By testing for spatial dependency using Moran’s I, multiple subsamples of water elevations that have no significant spatial dependency are selected. The model is then calibrated against these data and the results averaged. This gives a near identical result to calibration using spatially dependent data, but has the advantage of being a statistically robust assessment of model performance in which we can have more confidence. Moreover, by using the variations found in the subsamples of the observed data it is possible to assess the effects of observational uncertainty on the assessment of flooding risk.
Resumo:
The Asian monsoon system, including the western North Pacific (WNP), East Asian, and Indian monsoons, dominates the climate of the Asia-Indian Ocean-Pacific region, and plays a significant role in the global hydrological and energy cycles. The prediction of monsoons and associated climate features is a major challenge in seasonal time scale climate forecast. In this study, a comprehensive assessment of the interannual predictability of the WNP summer climate has been performed using the 1-month lead retrospective forecasts (hindcasts) of five state-of-the-art coupled models from ENSEMBLES for the period of 1960–2005. Spatial distribution of the temporal correlation coefficients shows that the interannual variation of precipitation is well predicted around the Maritime Continent and east of the Philippines. The high skills for the lower-tropospheric circulation and sea surface temperature (SST) spread over almost the whole WNP. These results indicate that the models in general successfully predict the interannual variation of the WNP summer climate. Two typical indices, the WNP summer precipitation index and the WNP lower-tropospheric circulation index (WNPMI), have been used to quantify the forecast skill. The correlation coefficient between five models’ multi-model ensemble (MME) mean prediction and observations for the WNP summer precipitation index reaches 0.66 during 1979–2005 while it is 0.68 for the WNPMI during 1960–2005. The WNPMI-regressed anomalies of lower-tropospheric winds, SSTs and precipitation are similar between observations and MME. Further analysis suggests that prediction reliability of the WNP summer climate mainly arises from the atmosphere–ocean interaction over the tropical Indian and the tropical Pacific Ocean, implying that continuing improvement in the representation of the air–sea interaction over these regions in CGCMs is a key for long-lead seasonal forecast over the WNP and East Asia. On the other hand, the prediction of the WNP summer climate anomalies exhibits a remarkable spread resulted from uncertainty in initial conditions. The summer anomalies related to the prediction spread, including the lower-tropospheric circulation, SST and precipitation anomalies, show a Pacific-Japan or East Asia-Pacific pattern in the meridional direction over the WNP. Our further investigations suggest that the WNPMI prediction spread arises mainly from the internal dynamics in air–sea interaction over the WNP and Indian Ocean, since the local relationships among the anomalous SST, circulation, and precipitation associated with the spread are similar to those associated with the interannual variation of the WNPMI in both observations and MME. However, the magnitudes of these anomalies related to the spread are weaker, ranging from one third to a half of those anomalies associated with the interannual variation of the WNPMI in MME over the tropical Indian Ocean and subtropical WNP. These results further support that the improvement in the representation of the air–sea interaction over the tropical Indian Ocean and subtropical WNP in CGCMs is a key for reducing the prediction spread and for improving the long-lead seasonal forecast over the WNP and East Asia.
Resumo:
What does the saving–investment (SI) relation really measure and how should the SI relation be measured? These are two of the most discussed issues triggered by the so-called Feldstein–Horioka puzzle. Based on panel data we introduce a new variant of functional coefficient models that allows to separate long and short to medium run parameter dependence. The new modeling framework is applied to uncover the determinants of the SI relation. Macroeconomic state variables such as openness, the age dependency ratio, government current and consumption expenditures are found to affect the SI relation significantly in the long run.
Resumo:
The estimation of the long-term wind resource at a prospective site based on a relatively short on-site measurement campaign is an indispensable task in the development of a commercial wind farm. The typical industry approach is based on the measure-correlate-predict �MCP� method where a relational model between the site wind velocity data and the data obtained from a suitable reference site is built from concurrent records. In a subsequent step, a long-term prediction for the prospective site is obtained from a combination of the relational model and the historic reference data. In the present paper, a systematic study is presented where three new MCP models, together with two published reference models �a simple linear regression and the variance ratio method�, have been evaluated based on concurrent synthetic wind speed time series for two sites, simulating the prospective and the reference site. The synthetic method has the advantage of generating time series with the desired statistical properties, including Weibull scale and shape factors, required to evaluate the five methods under all plausible conditions. In this work, first a systematic discussion of the statistical fundamentals behind MCP methods is provided and three new models, one based on a nonlinear regression and two �termed kernel methods� derived from the use of conditional probability density functions, are proposed. All models are evaluated by using five metrics under a wide range of values of the correlation coefficient, the Weibull scale, and the Weibull shape factor. Only one of all models, a kernel method based on bivariate Weibull probability functions, is capable of accurately predicting all performance metrics studied.
Resumo:
We review and structure some of the mathematical and statistical models that have been developed over the past half century to grapple with theoretical and experimental questions about the stochastic development of aging over the life course. We suggest that the mathematical models are in large part addressing the problem of partitioning the randomness in aging: How does aging vary between individuals, and within an individual over the lifecourse? How much of the variation is inherently related to some qualities of the individual, and how much is entirely random? How much of the randomness is cumulative, and how much is merely short-term flutter? We propose that recent lines of statistical inquiry in survival analysis could usefully grapple with these questions, all the more so if they were more explicitly linked to the relevant mathematical and biological models of aging. To this end, we describe points of contact among the various lines of mathematical and statistical research. We suggest some directions for future work, including the exploration of information-theoretic measures for evaluating components of stochastic models as the basis for analyzing experiments and anchoring theoretical discussions of aging.
Resumo:
Undirected graphical models are widely used in statistics, physics and machine vision. However Bayesian parameter estimation for undirected models is extremely challenging, since evaluation of the posterior typically involves the calculation of an intractable normalising constant. This problem has received much attention, but very little of this has focussed on the important practical case where the data consists of noisy or incomplete observations of the underlying hidden structure. This paper specifically addresses this problem, comparing two alternative methodologies. In the first of these approaches particle Markov chain Monte Carlo (Andrieu et al., 2010) is used to efficiently explore the parameter space, combined with the exchange algorithm (Murray et al., 2006) for avoiding the calculation of the intractable normalising constant (a proof showing that this combination targets the correct distribution in found in a supplementary appendix online). This approach is compared with approximate Bayesian computation (Pritchard et al., 1999). Applications to estimating the parameters of Ising models and exponential random graphs from noisy data are presented. Each algorithm used in the paper targets an approximation to the true posterior due to the use of MCMC to simulate from the latent graphical model, in lieu of being able to do this exactly in general. The supplementary appendix also describes the nature of the resulting approximation.
Resumo:
A manageable, relatively inexpensive model was constructed to predict the loss of nitrogen and phosphorus from a complex catchment to its drainage system. The model used an export coefficient approach, calculating the total nitrogen (N) and total phosphorus (P) load delivered annually to a water body as the sum of the individual loads exported from each nutrient source in its catchment. The export coefficient modelling approach permits scaling up from plot-scale experiments to the catchment scale, allowing application of findings from field experimental studies at a suitable scale for catchment management. The catchment of the River Windrush, a tributary of the River Thames, UK, was selected as the initial study site. The Windrush model predicted nitrogen and phosphorus loading within 2% of observed total nitrogen load and 0.5% of observed total phosphorus load in 1989. The export coefficient modelling approach was then validated by application in a second research basin, the catchment of Slapton Ley, south Devon, which has markedly different catchment hydrology and land use. The Slapton model was calibrated within 2% of observed total nitrogen load and 2.5% of observed total phosphorus load in 1986. Both models proved sensitive to the impact of temporal changes in land use and management on water quality in both catchments, and were therefore used to evaluate the potential impact of proposed pollution control strategies on the nutrient loading delivered to the River Windrush and Slapton Ley
Resumo:
We present a benchmark system for global vegetation models. This system provides a quantitative evaluation of multiple simulated vegetation properties, including primary production; seasonal net ecosystem production; vegetation cover, composition and 5 height; fire regime; and runoff. The benchmarks are derived from remotely sensed gridded datasets and site-based observations. The datasets allow comparisons of annual average conditions and seasonal and inter-annual variability, and they allow the impact of spatial and temporal biases in means and variability to be assessed separately. Specifically designed metrics quantify model performance for each process, 10 and are compared to scores based on the temporal or spatial mean value of the observations and a “random” model produced by bootstrap resampling of the observations. The benchmark system is applied to three models: a simple light-use efficiency and water-balance model (the Simple Diagnostic Biosphere Model: SDBM), and the Lund-Potsdam-Jena (LPJ) and Land Processes and eXchanges (LPX) dynamic global 15 vegetation models (DGVMs). SDBM reproduces observed CO2 seasonal cycles, but its simulation of independent measurements of net primary production (NPP) is too high. The two DGVMs show little difference for most benchmarks (including the interannual variability in the growth rate and seasonal cycle of atmospheric CO2), but LPX represents burnt fraction demonstrably more accurately. Benchmarking also identified 20 several weaknesses common to both DGVMs. The benchmarking system provides a quantitative approach for evaluating how adequately processes are represented in a model, identifying errors and biases, tracking improvements in performance through model development, and discriminating among models. Adoption of such a system would do much to improve confidence in terrestrial model predictions of climate change 25 impacts and feedbacks.
Resumo:
Middle-atmosphere models commonly employ a sponge layer in the upper portion of their domain. It is shown that the relaxational nature of the sponge allows it to couple to the dynamics at lower levels in an artificial manner. In particular, the long-term zonally symmetric response to an imposed extratropical local force or diabatic heating is shown to induce a drag force in the sponge that modifies the response expected from the “downward control” arguments of Haynes et al. [1991]. In the case of an imposed local force the sponge acts to divert a fraction of the mean meridional mass flux upward, which for realistic parameter values is approximately equal to exp(−Δz/H), where Δz is the distance between the forcing region and the sponge layer and H is the density scale height. This sponge-induced upper cell causes temperature changes that, just below the sponge layer, are of comparable magnitude to those just below the forcing region. In the case of an imposed local diabatic heating, the sponge induces a meridional circulation extending through the entire depth of the atmosphere. This circulation causes temperature changes that, just below the sponge layer, are of opposite sign and comparable in magnitude to those at the heating region. In both cases, the sponge-induced temperature changes are essentially independent of the height of the imposed force or diabatic heating, provided the latter is located outside the sponge, but decrease exponentially as one moves down from the sponge. Thus the effect of the sponge can be made arbitrarily small at a given altitude by placing the sponge sufficiently high; e.g., its effect on temperatures two scale heights below is roughly at the 10% level, provided the imposed force or diabatic heating is located outside the sponge. When, however, an imposed force is applied within the sponge layer (a highly plausible situation for parameterized mesospheric gravity-wave drag), its effect is almost entirely nullified by the sponge-layer feedback and its expected impact on temperatures below largely fails to materialize. Simulations using a middle-atmosphere general circulation model are described, which demonstrate that this sponge-layer feedback can be a significant effect in parameter regimes of physical interest. Zonally symmetric (two dimensional) middle-atmosphere models commonly employ a Rayleigh drag throughout the model domain. It is shown that the long-term zonally symmetric response to an imposed extratropical local force or diabatic heating, in this case, is noticeably modified from that expected from downward control, even for a very weak drag coefficient