972 results for Models, statistical
Abstract:
Observations in daily practice are sometimes registered as positive values larger than a given threshold α. The sample space is in this case the interval (α,+∞), α > 0, which can be structured as a real Euclidean space in different ways. This fact opens the door to alternative statistical models depending not only on the assumed distribution function, but also on the metric which is considered appropriate, i.e. the way differences are measured, and thus variability.
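As a rough illustration of how the choice of metric changes what "difference" means on (α,+∞), the sketch below takes ln(x − α) as the coordinate of x. This particular coordinate is an assumption made here for illustration; the abstract does not say which structure the authors use, and the function names are hypothetical.

```python
import math

def log_coordinate(x, alpha):
    """Map x in (alpha, +inf) to the real line via ln(x - alpha) (assumed coordinate)."""
    if x <= alpha:
        raise ValueError("x must be strictly greater than alpha")
    return math.log(x - alpha)

def log_distance(x, y, alpha):
    """Distance between x and y measured in the assumed log coordinate."""
    return abs(log_coordinate(x, alpha) - log_coordinate(y, alpha))

def log_center(values, alpha):
    """Average the log coordinates, then map the result back to (alpha, +inf)."""
    coords = [log_coordinate(v, alpha) for v in values]
    return alpha + math.exp(sum(coords) / len(coords))
```

Under this assumed metric the pairs (2, 3) and (11, 21) with α = 1 are equally far apart, because both double the excess over the threshold, whereas the ordinary absolute difference rates the second pair ten times farther apart.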
Abstract:
This paper is a first draft of the principle of statistical modelling on coordinates. Several causes, which would take long to detail, have led to this situation close to the deadline for submitting papers to CODAWORK’03. The main one is the fast development of the approach over the last months, which has made previous drafts appear obsolete. The present paper contains the essential parts of the state of the art of this approach from my point of view. I would like to acknowledge many clarifying discussions with the group of people working in this field in Girona, Barcelona, Carrick Castle, Firenze, Berlin, Göttingen, and Freiberg. They have given a lot of suggestions and ideas. Nevertheless, there might still be errors or unclear aspects, which are exclusively my fault. I hope this contribution serves as a basis for further discussions and new developments.
Abstract:
Quantitative or algorithmic trading is the automation of investment decisions obeying a fixed or dynamic set of rules to determine trading orders. It has increasingly made its way up to 70% of the trading volume of one of the biggest financial markets, the New York Stock Exchange (NYSE). However, there is not a significant amount of academic literature devoted to it, due to the private nature of investment banks and hedge funds. This project aims to review the literature and discuss the models available in a subject where publications are scarce and infrequent. We review the basic and fundamental mathematical concepts needed for modeling financial markets, such as stochastic processes, stochastic integration and basic models for price and spread dynamics, which are necessary for building quantitative strategies. We also contrast these models with real market data sampled at one-minute frequency from the Dow Jones Industrial Average (DJIA). Quantitative strategies try to exploit two types of behavior: trend following or mean reversion. The former is grouped in the so-called technical models and the latter in the so-called pairs trading. Technical models have been discarded by financial theoreticians, but we show that they can be properly cast into a well-defined scientific predictor if the signal generated by them passes the test of being a Markov time. That is, we can tell if the signal has occurred or not by examining the information up to the current time; or, more technically, if the event is F_t-measurable. On the other hand, the concept of pairs trading or market-neutral strategy is fairly simple. However, it can be cast in a variety of mathematical models, ranging from a method based on a simple Euclidean distance, to a co-integration framework, to stochastic differential equations such as the well-known Ornstein-Uhlenbeck mean-reverting SDE and its variations.
A model for forecasting any economic or financial magnitude could be properly defined with scientific rigor but could also lack any economic value and be considered useless from a practical point of view. This is why this project would not be complete without a backtest of the mentioned strategies. Conducting a useful and realistic backtest is by no means a trivial exercise, since the "laws" that govern financial markets are constantly evolving in time. This is why we emphasize the calibration of the strategies' parameters to adapt to the given market conditions. We find that the parameters of technical models are more volatile than their counterparts from market-neutral strategies, and that calibration must be done at high sampling frequency to constantly track the current market situation. As a whole, the goal of this project is to provide an overview of a quantitative approach to investment, reviewing basic strategies and illustrating them by means of a backtest with real financial market data. The sources of the data used in this project are Bloomberg for intraday time series and Yahoo! for daily prices. All numeric computations and graphics used and shown in this project were implemented in MATLAB® from scratch as a part of this thesis. No other mathematical or statistical software was used.
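To give a feel for the mean-reverting dynamics mentioned for pairs trading, the following sketch simulates an Ornstein-Uhlenbeck process dX = θ(μ − X)dt + σ dW using its exact one-step discretization. It is written in Python rather than the thesis's MATLAB and is not the thesis's implementation. The function name, parameter names and default seed are assumptions chosen for illustration.

```python
import math
import random

def simulate_ou(x0, theta, mu, sigma, dt, n_steps, seed=0):
    """Simulate an Ornstein-Uhlenbeck path with the exact discretization.

    x0: starting value, theta: mean-reversion speed (must be > 0),
    mu: long-run mean, sigma: volatility, dt: time step.
    """
    rng = random.Random(seed)
    decay = math.exp(-theta * dt)
    # Standard deviation of the Gaussian increment over one step of length dt.
    noise_sd = sigma * math.sqrt((1.0 - decay ** 2) / (2.0 * theta))
    path = [x0]
    for _ in range(n_steps):
        shock = noise_sd * rng.gauss(0.0, 1.0)
        path.append(mu + (path[-1] - mu) * decay + shock)
    return path
```

With sigma set to zero the path decays deterministically toward mu at rate theta, which is a convenient sanity check before fitting the parameters to a real spread series.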
Abstract:
This paper investigates the role of learning by private agents and the central bank (two-sided learning) in a New Keynesian framework in which both sides of the economy have asymmetric and imperfect knowledge about the true data generating process. We assume that all agents employ the data that they observe (which may be distinct for different sets of agents) to form beliefs about unknown aspects of the true model of the economy, use their beliefs to decide on actions, and revise these beliefs through a statistical learning algorithm as new information becomes available. We study the short-run dynamics of our model and derive its policy recommendations, particularly with respect to central bank communications. We demonstrate that two-sided learning can generate substantial increases in volatility and persistence, and alter the behavior of the variables in the model in a significant way. Our simulations do not converge to a symmetric rational expectations equilibrium and we highlight one source that invalidates the convergence results of Marcet and Sargent (1989). Finally, we identify a novel aspect of central bank communication in models of learning: communication can be harmful if the central bank's model is substantially mis-specified.
Abstract:
The paper discusses maintenance challenges of organisations with a huge number of devices and proposes the use of probabilistic models to assist monitoring and maintenance planning. The proposal assumes connectivity of instruments to report relevant features for monitoring. Also, the existence of enough historical records with diagnosed breakdowns is required to make probabilistic models reliable and useful for predictive maintenance strategies based on them. Regular Markov models based on estimated failure and repair rates are proposed to calculate the availability of the instruments, and Dynamic Bayesian Networks are proposed to model cause-effect relationships to trigger predictive maintenance services based on the influence between observed features and previously documented diagnostics.
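As a minimal sketch of how availability follows from failure and repair rates, consider the simplest case of a two-state (up/down) continuous-time Markov model with constant rates. This two-state setup is an assumption made here for illustration; the paper's actual models may have more states, and the function names are hypothetical.

```python
import math

def steady_state_availability(failure_rate, repair_rate):
    """Long-run fraction of time the instrument is up in a two-state model."""
    return repair_rate / (failure_rate + repair_rate)

def availability_at(t, failure_rate, repair_rate):
    """Probability the instrument is up at time t, given it was up at t = 0."""
    total = failure_rate + repair_rate
    return repair_rate / total + (failure_rate / total) * math.exp(-total * t)
```

For example, a failure rate of 0.01 per hour with a repair rate of 0.09 per hour gives a long-run availability of 0.9, and the time-dependent value decays from 1 toward that level.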
Abstract:
Background: Recent advances in high-throughput technologies have produced a vast amount of protein sequences, while the number of high-resolution structures has seen a limited increase. This has impelled the production of many strategies to build protein structures from their sequences, generating a considerable amount of alternative models. The selection of the closest model to the native conformation has thus become crucial for structure prediction. Several methods have been developed to score protein models by energies, knowledge-based potentials and combinations of both. Results: Here, we present and demonstrate a theory to split the knowledge-based potentials into biologically meaningful scoring terms and to combine them in new scores to predict near-native structures. Our strategy allows circumventing the problem of defining the reference state. In this approach we give the proof for a simple and linear application that can be further improved by optimizing the combination of Zscores. Using the simplest composite score () we obtained predictions similar to state-of-the-art methods. Besides, our approach has the advantage of identifying the most relevant terms involved in the stability of the protein structure. Finally, we also use the composite Zscores to assess the conformation of models and to detect local errors. Conclusion: We have introduced a method to split knowledge-based potentials and to solve the problem of defining a reference state. The new scores have detected near-native structures as accurately as state-of-the-art methods and have been successful in identifying wrongly modeled regions of many near-native conformations.
Abstract:
In this work we describe the use of bilinear statistical models as a means of factoring the shape variability into two components, attributed to inter-subject variation and to the intrinsic dynamics of the human heart. We show that it is feasible to reconstruct the shape of the heart at discrete points in the cardiac cycle. Provided we are given a small number of shape instances representing the same heart at different points in the same cycle, we can use the bilinear model to establish this. Using a temporal and a spatial alignment step in the preprocessing of the shapes, around half of the reconstruction errors were on the order of the axial image resolution of 2 mm, and over 90% were within 3.5 mm. From this, we conclude that the dynamics were indeed separated from the inter-subject variability in our dataset.
Abstract:
An important statistical development of the last 30 years has been the advance in regression analysis provided by generalized linear models (GLMs) and generalized additive models (GAMs). Here we introduce a series of papers prepared within the framework of an international workshop entitled "Advances in GLMs/GAMs modeling: from species distribution to environmental management", held in Riederalp, Switzerland, 6-11 August 2001. We first discuss some general uses of statistical models in ecology, and provide a short review of several key examples of the use of GLMs and GAMs in ecological modeling efforts. We next present an overview of GLMs and GAMs, and discuss some of their related statistics used for predictor selection, model diagnostics, and evaluation. Included is a discussion of several new approaches applicable to GLMs and GAMs, such as ridge regression, an alternative to stepwise selection of predictors, and methods for the identification of interactions by a combined use of regression trees and several other approaches. We close with an overview of the papers and how we feel they advance our understanding of their application to ecological modeling.
Abstract:
OBJECTIVE: To better understand the structure of the Patient Assessment of Chronic Illness Care (PACIC) instrument. More specifically, to test all published validation models, using one single data set and appropriate statistical tools. DESIGN: Validation study using data from a cross-sectional survey. PARTICIPANTS: A population-based sample of non-institutionalized adults with diabetes residing in Switzerland (canton of Vaud). MAIN OUTCOME MEASURE: French version of the 20-item PACIC instrument (5-point response scale). We conducted validation analyses using confirmatory factor analysis (CFA). The original five-dimension model and other published models were tested with three types of CFA: based on (i) a Pearson estimator of the variance-covariance matrix, (ii) a polychoric correlation matrix and (iii) a likelihood estimation with a multinomial distribution for the manifest variables. All models were assessed using loadings and goodness-of-fit measures. RESULTS: The analytical sample included 406 patients. Mean age was 64.4 years and 59% were men. The median of item responses varied between 1 and 4 (range 1-5), and the proportion of missing values ranged between 5.7 and 12.3%. Strong floor and ceiling effects were present. Even though loadings of the tested models were relatively high, the only model showing acceptable fit was the 11-item single-dimension model. PACIC was associated with the expected variables of the field. CONCLUSIONS: Our results showed that the model considering 11 items in a single dimension exhibited the best fit for our data. A single score, in complement to the consideration of single-item results, might be used instead of the five dimensions usually described.
Abstract:
This paper investigates the role of learning by private agents and the central bank (two-sided learning) in a New Keynesian framework in which both sides of the economy have asymmetric and imperfect knowledge about the true data generating process. We assume that all agents employ the data that they observe (which may be distinct for different sets of agents) to form beliefs about unknown aspects of the true model of the economy, use their beliefs to decide on actions, and revise these beliefs through a statistical learning algorithm as new information becomes available. We study the short-run dynamics of our model and derive its policy recommendations, particularly with respect to central bank communications. We demonstrate that two-sided learning can generate substantial increases in volatility and persistence, and alter the behavior of the variables in the model in a significant way. Our simulations do not converge to a symmetric rational expectations equilibrium and we highlight one source that invalidates the convergence results of Marcet and Sargent (1989). Finally, we identify a novel aspect of central bank communication in models of learning: communication can be harmful if the central bank's model is substantially mis-specified.
Abstract:
The well-known lack of power of unit root tests has often been attributed to the short length of macroeconomic variables and also to DGPs that depart from the I(1)-I(0) alternatives. This paper shows that by using long spans of annual real GNP and GNP per capita (133 years) high power can be achieved, leading to the rejection of both the unit root and the trend-stationary hypothesis. This suggests that possibly neither model provides a good characterization of these data. Next, more flexible representations are considered, namely, processes containing structural breaks (SB) and fractional orders of integration (FI). Economic justification for the presence of these features in GNP is provided. It is shown that the latter models (FI and SB) are in general preferred to the ARIMA (I(1) or I(0)) ones. As a novelty in this literature, new techniques are applied to discriminate between FI and SB models. It turns out that the FI specification is preferred, implying that GNP and GNP per capita are non-stationary, highly persistent but mean-reverting series. Finally, it is shown that the results are robust when breaks in the deterministic component are allowed for in the FI model. Some macroeconomic implications of these findings are also discussed.
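To make "fractional orders of integration" a little more tangible, the sketch below computes the weights of the fractional difference operator (1 − L)^d from its binomial expansion and applies them to a series. It only illustrates the operator; it is not the estimation or testing procedure used in the paper, and the function names are hypothetical.

```python
def frac_diff_weights(d, n_weights):
    """First n_weights coefficients of (1 - L)^d, using w_k = w_{k-1} * (k - 1 - d) / k."""
    weights = [1.0]
    for k in range(1, n_weights):
        weights.append(weights[-1] * (k - 1 - d) / k)
    return weights

def frac_diff(series, d):
    """Apply (1 - L)^d to a series, truncating the expansion at the sample start."""
    w = frac_diff_weights(d, len(series))
    return [
        sum(w[k] * series[t - k] for k in range(t + 1))
        for t in range(len(series))
    ]
```

With d = 1 the weights reduce to the ordinary first difference (1, −1, 0, ...), while for non-integer d they decay slowly, which is the source of the long memory that distinguishes FI processes from the I(1) and I(0) cases.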
Abstract:
Species' geographic ranges are usually considered as basic units in macroecology and biogeography, yet it is still difficult to measure them accurately for many reasons. About 20 years ago, researchers started using local data on species' occurrences to estimate broad scale ranges, thereby establishing the niche modeling approach. However, there are still many problems in model evaluation and application, and one of the solutions is to find a consensus solution among models derived from different mathematical and statistical models for niche modeling, climatic projections and variable combination, all of which are sources of uncertainty during niche modeling. In this paper, we discuss this approach of ensemble forecasting and propose that it can be divided into three phases with increasing levels of complexity. Phase I is the simple combination of maps to achieve a consensual and hopefully conservative solution. In Phase II, differences among the maps used are described by multivariate analyses, and Phase III consists of the quantitative evaluation of the relative magnitude of uncertainties from different sources and their mapping. To illustrate these developments, we analyzed the occurrence data of the tiger moth, Utetheisa ornatrix (Lepidoptera, Arctiidae), a Neotropical moth species, and modeled its geographic range in current and future climates.
Abstract:
We study the statistical properties of three estimation methods for a model of learning that is often fitted to experimental data: quadratic deviation measures without unobserved heterogeneity, and maximum likelihood with and without unobserved heterogeneity. After discussing identification issues, we show that the estimators are consistent and provide their asymptotic distribution. Using Monte Carlo simulations, we show that ignoring unobserved heterogeneity can lead to seriously biased estimates in samples which have the typical length of actual experiments. Better small-sample properties are obtained if unobserved heterogeneity is introduced. That is, rather than estimating the parameters for each individual, the individual parameters are considered random variables, and the distribution of those random variables is estimated.
Abstract:
In recent years there has been an explosive growth in the development of adaptive and data-driven methods. One of the efficient data-driven approaches is based on statistical learning theory (SLT) (Vapnik 1998). The theory is based on the Structural Risk Minimisation (SRM) principle and has a solid statistical background. When applying SRM we try not only to reduce the training error, i.e. to fit the available data with a model, but also to reduce the complexity of the model and the generalisation error. Many nonlinear learning procedures recently developed in neural networks and statistics can be understood and interpreted in terms of the structural risk minimisation inductive principle. A recent methodology based on SRM is called Support Vector Machines (SVM). At present SLT is still under intensive development and SVM find new areas of application (www.kernel-machines.org). SVM develop robust and nonlinear data models with excellent generalisation abilities, which is very important both for monitoring and forecasting. SVM are extremely good when the input space is high dimensional and the training data set is not big enough to develop a corresponding nonlinear model. Moreover, SVM use only support vectors to derive decision boundaries. This opens a way to sampling optimization, estimation of noise in data, quantification of data redundancy, etc. A presentation of SVM for spatially distributed data is given in (Kanevski and Maignan 2004).
Abstract:
Uncertainty quantification of petroleum reservoir models is one of the present challenges, which is usually approached with a wide range of geostatistical tools linked with statistical optimisation and/or inference algorithms. The paper considers a data-driven approach to modelling uncertainty in spatial predictions. The proposed semi-supervised Support Vector Regression (SVR) model has demonstrated its capability to represent realistic features and describe stochastic variability and non-uniqueness of spatial properties. It is able to capture and preserve key spatial dependencies such as connectivity, which is often difficult to achieve with two-point geostatistical models. Semi-supervised SVR is designed to integrate various kinds of conditioning data and learn dependences from them. A stochastic semi-supervised SVR model is integrated into a Bayesian framework to quantify uncertainty with multiple models fitted to dynamic observations. The developed approach is illustrated with a reservoir case study. The resulting probabilistic production forecasts are described by uncertainty envelopes.