48 resultados para Accelerated failure time Model. Correlated data. Imputation. Residuals analysis
em CentAUR: Central Archive University of Reading - UK
Resumo:
Accelerated failure time models with a shared random component are described, and are used to evaluate the effect of explanatory factors and different transplant centres on survival times following kidney transplantation. Different combinations of the distribution of the random effects and baseline hazard function are considered and the fit of such models to the transplant data is critically assessed. A mixture model that combines short- and long-term components of a hazard function is then developed, which provides a more flexible model for the hazard function. The model can incorporate different explanatory variables and random effects in each component. The model is straightforward to fit using standard statistical software, and is shown to be a good fit to the transplant data. Copyright (C) 2004 John Wiley Sons, Ltd.
Resumo:
In this paper,the Prony's method is applied to the time-domain waveform data modelling in the presence of noise.The following three problems encountered in this work are studied:(1)determination of the order of waveform;(2)de-termination of numbers of multiple roots;(3)determination of the residues.The methods of solving these problems are given and simulated on the computer.Finally,an output pulse of model PG-10N signal generator and the distorted waveform obtained by transmitting the pulse above mentioned through a piece of coaxial cable are modelled,and satisfactory results are obtained.So the effectiveness of Prony's method in waveform data modelling in the presence of noise is confirmed.
Resumo:
Assimilation of temperature observations into an ocean model near the equator often results in a dynamically unbalanced state with unrealistic overturning circulations. The way in which these circulations arise from systematic errors in the model or its forcing is discussed. A scheme is proposed, based on the theory of state augmentation, which uses the departures of the model state from the observations to update slowly evolving bias fields. Results are summarized from an experiment applying this bias correction scheme to an ocean general circulation model. They show that the method produces more balanced analyses and a better fit to the temperature observations.
Resumo:
A system for continuous data assimilation is presented and discussed. To simulate the dynamical development a channel version of a balanced barotropic model is used and geopotential (height) data are assimilated into the models computations as data become available. In the first experiment the updating is performed every 24th, 12th and 6th hours with a given network. The stations are distributed at random in 4 groups in order to simulate 4 areas with different density of stations. Optimum interpolation is performed for the difference between the forecast and the valid observations. The RMS-error of the analyses is reduced in time, and the error being smaller the more frequent the updating is performed. The updating every 6th hour yields an error in the analysis less than the RMS-error of the observation. In a second experiment the updating is performed by data from a moving satellite with a side-scan capability of about 15°. If the satellite data are analysed at every time step before they are introduced into the system the error of the analysis is reduced to a value below the RMS-error of the observation already after 24 hours and yields as a whole a better result than updating from a fixed network. If the satellite data are introduced without any modification the error of the analysis is reduced much slower and it takes about 4 days to reach a comparable result to the one where the data have been analysed.
Resumo:
An important application of Big Data Analytics is the real-time analysis of streaming data. Streaming data imposes unique challenges to data mining algorithms, such as concept drifts, the need to analyse the data on the fly due to unbounded data streams and scalable algorithms due to potentially high throughput of data. Real-time classification algorithms that are adaptive to concept drifts and fast exist, however, most approaches are not naturally parallel and are thus limited in their scalability. This paper presents work on the Micro-Cluster Nearest Neighbour (MC-NN) classifier. MC-NN is based on an adaptive statistical data summary based on Micro-Clusters. MC-NN is very fast and adaptive to concept drift whilst maintaining the parallel properties of the base KNN classifier. Also MC-NN is competitive compared with existing data stream classifiers in terms of accuracy and speed.
Resumo:
Global hydrological models (GHMs) model the land surface hydrologic dynamics of continental-scale river basins. Here we describe one such GHM, the Macro-scale - Probability-Distributed Moisture model.09 (Mac-PDM.09). The model has undergone a number of revisions since it was last applied in the hydrological literature. This paper serves to provide a detailed description of the latest version of the model. The main revisions include the following: (1) the ability for the model to be run for n repetitions, which provides more robust estimates of extreme hydrological behaviour, (2) the ability of the model to use a gridded field of coefficient of variation (CV) of daily rainfall for the stochastic disaggregation of monthly precipitation to daily precipitation, and (3) the model can now be forced with daily input climate data as well as monthly input climate data. We demonstrate the effects that each of these three revisions has on simulated runoff relative to before the revisions were applied. Importantly, we show that when Mac-PDM.09 is forced with monthly input data, it results in a negative runoff bias relative to when daily forcings are applied, for regions of the globe where the day-to-day variability in relative humidity is high. The runoff bias can be up to - 80% for a small selection of catchments but the absolute magnitude of the bias may be small. As such, we recommend future applications of Mac-PDM.09 that use monthly climate forcings acknowledge the bias as a limitation of the model. The performance of Mac-PDM.09 is evaluated by validating simulated runoff against observed runoff for 50 catchments. We also present a sensitivity analysis that demonstrates that simulated runoff is considerably more sensitive to method of PE calculation than to perturbations in soil moisture and field capacity parameters.
Resumo:
Motivated by a matched case-control study to investigate potential risk factors for meningococcal disease amongst adolescents, we consider the analysis of matched case-control studies where disease incidence, and possibly other risk factors, vary with time of year. For the cases, the time of infection may be recorded. For controls, however, the recorded time is simply the time of data collection, which is shortly after the time of infection for the matched case, and so depends on the latter. We show that the effect of risk factors and interactions may be adjusted for the time of year effect in a standard conditional logistic regression analysis without introducing any bias. We also show that, if the time delay between data collection for cases and controls is constant, provided this delay is not very short, estimates of the time of year effect are approximately unbiased. In the case that the length of the delay varies over time, the estimate of the time of year effect is biased. We obtain an approximate expression for the degree of bias in this case. Copyright © 2004 John Wiley & Sons, Ltd.
Resumo:
Bloom-forming and toxin-producing cyanobacteria remain a persistent nuisance across the world. Modelling of cyanobacteria in freshwaters is an important tool for understanding their population dynamics and predicting the location and timing of the bloom events in lakes and rivers. A new deterministic-mathematical model was developed, which simulates the growth and movement of cyanobacterial blooms in river systems. The model focuses on the mathematical description of the bloom formation, vertical migration and lateral transport of colonies within river environments by taking into account the major factors that affect the cyanobacterial bloom formation in rivers including, light, nutrients and temperature. A technique called generalised sensitivity analysis was applied to the model to identify the critical parameter uncertainties in the model and investigates the interaction between the chosen parameters of the model. The result of the analysis suggested that 8 out of 12 parameters were significant in obtaining the observed cyanobacterial behaviour in a simulation. It was found that there was a high degree of correlation between the half-saturation rate constants used in the model.
Resumo:
[1] Decadal hindcast simulations of Arctic Ocean sea ice thickness made by a modern dynamic-thermodynamic sea ice model and forced independently by both the ERA-40 and NCEP/NCAR reanalysis data sets are compared for the first time. Using comprehensive data sets of observations made between 1979 and 2001 of sea ice thickness, draft, extent, and speeds, we find that it is possible to tune model parameters to give satisfactory agreement with observed data, thereby highlighting the skill of modern sea ice models, though the parameter values chosen differ according to the model forcing used. We find a consistent decreasing trend in Arctic Ocean sea ice thickness since 1979, and a steady decline in the Eastern Arctic Ocean over the full 40-year period of comparison that accelerated after 1980, but the predictions of Western Arctic Ocean sea ice thickness between 1962 and 1980 differ substantially. The origins of differing thickness trends and variability were isolated not to parameter differences but to differences in the forcing fields applied, and in how they are applied. It is argued that uncertainty, differences and errors in sea ice model forcing sets complicate the use of models to determine the exact causes of the recently reported decline in Arctic sea ice thickness, but help in the determination of robust features if the models are tuned appropriately against observations.
Resumo:
Model-based estimates of future uncertainty are generally based on the in-sample fit of the model, as when Box-Jenkins prediction intervals are calculated. However, this approach will generate biased uncertainty estimates in real time when there are data revisions. A simple remedy is suggested, and used to generate more accurate prediction intervals for 25 macroeconomic variables, in line with the theory. A simulation study based on an empirically-estimated model of data revisions for US output growth is used to investigate small-sample properties.
Resumo:
We explore the potential for making statistical decadal predictions of sea surface temperatures (SSTs) in a perfect model analysis, with a focus on the Atlantic basin. Various statistical methods (Lagged correlations, Linear Inverse Modelling and Constructed Analogue) are found to have significant skill in predicting the internal variability of Atlantic SSTs for up to a decade ahead in control integrations of two different global climate models (GCMs), namely HadCM3 and HadGEM1. Statistical methods which consider non-local information tend to perform best, but which is the most successful statistical method depends on the region considered, GCM data used and prediction lead time. However, the Constructed Analogue method tends to have the highest skill at longer lead times. Importantly, the regions of greatest prediction skill can be very different to regions identified as potentially predictable from variance explained arguments. This finding suggests that significant local decadal variability is not necessarily a prerequisite for skillful decadal predictions, and that the statistical methods are capturing some of the dynamics of low-frequency SST evolution. In particular, using data from HadGEM1, significant skill at lead times of 6–10 years is found in the tropical North Atlantic, a region with relatively little decadal variability compared to interannual variability. This skill appears to come from reconstructing the SSTs in the far north Atlantic, suggesting that the more northern latitudes are optimal for SST observations to improve predictions. We additionally explore whether adding sub-surface temperature data improves these decadal statistical predictions, and find that, again, it depends on the region, prediction lead time and GCM data used. Overall, we argue that the estimated prediction skill motivates the further development of statistical decadal predictions of SSTs as a benchmark for current and future GCM-based decadal climate predictions.
Resumo:
Decision theory is the study of models of judgement involved in, and leading to, deliberate and (usually) rational choice. In real estate investment there are normative models for the allocation of assets. These asset allocation models suggest an optimum allocation between the respective asset classes based on the investors’ judgements of performance and risk. Real estate is selected, as other assets, on the basis of some criteria, e.g. commonly its marginal contribution to the production of a mean variance efficient multi asset portfolio, subject to the investor’s objectives and capital rationing constraints. However, decisions are made relative to current expectations and current business constraints. Whilst a decision maker may believe in the required optimum exposure levels as dictated by an asset allocation model, the final decision may/will be influenced by factors outside the parameters of the mathematical model. This paper discusses investors' perceptions and attitudes toward real estate and highlights the important difference between theoretical exposure levels and pragmatic business considerations. It develops a model to identify “soft” parameters in decision making which will influence the optimal allocation for that asset class. This “soft” information may relate to behavioural issues such as the tendency to mirror competitors; a desire to meet weight of money objectives; a desire to retain the status quo and many other non-financial considerations. The paper aims to establish the place of property in multi asset portfolios in the UK and examine the asset allocation process in practice, with a view to understanding the decision making process and to look at investors’ perceptions based on an historic analysis of market expectation; a comparison with historic data and an analysis of actual performance.
Resumo:
This paper derives exact discrete time representations for data generated by a continuous time autoregressive moving average (ARMA) system with mixed stock and flow data. The representations for systems comprised entirely of stocks or of flows are also given. In each case the discrete time representations are shown to be of ARMA form, the orders depending on those of the continuous time system. Three examples and applications are also provided, two of which concern the stationary ARMA(2, 1) model with stock variables (with applications to sunspot data and a short-term interest rate) and one concerning the nonstationary ARMA(2, 1) model with a flow variable (with an application to U.S. nondurable consumers’ expenditure). In all three examples the presence of an MA(1) component in the continuous time system has a dramatic impact on eradicating unaccounted-for serial correlation that is present in the discrete time version of the ARMA(2, 0) specification, even though the form of the discrete time model is ARMA(2, 1) for both models.
Resumo:
The collection of wind speed time series by means of digital data loggers occurs in many domains, including civil engineering, environmental sciences and wind turbine technology. Since averaging intervals are often significantly larger than typical system time scales, the information lost has to be recovered in order to reconstruct the true dynamics of the system. In the present work we present a simple algorithm capable of generating a real-time wind speed time series from data logger records containing the average, maximum, and minimum values of the wind speed in a fixed interval, as well as the standard deviation. The signal is generated from a generalized random Fourier series. The spectrum can be matched to any desired theoretical or measured frequency distribution. Extreme values are specified through a postprocessing step based on the concept of constrained simulation. Applications of the algorithm to 10-min wind speed records logged at a test site at 60 m height above the ground show that the recorded 10-min values can be reproduced by the simulated time series to a high degree of accuracy.