794 results for Non Parametric Methodology
Abstract:
We consider the general problem of constructing nonparametric Bayesian models on infinite-dimensional random objects, such as functions, infinite graphs or infinite permutations. The problem has generated much interest in machine learning, where it is treated heuristically, but has not been studied in full generality in non-parametric Bayesian statistics, which tends to focus on models over probability distributions. Our approach applies a standard tool of stochastic process theory, the construction of stochastic processes from their finite-dimensional marginal distributions. The main contribution of the paper is a generalization of the classic Kolmogorov extension theorem to conditional probabilities. This extension allows a rigorous construction of nonparametric Bayesian models from systems of finite-dimensional, parametric Bayes equations. Using this approach, we show (i) how existence of a conjugate posterior for the nonparametric model can be guaranteed by choosing conjugate finite-dimensional models in the construction, (ii) how the mapping to the posterior parameters of the nonparametric model can be explicitly determined, and (iii) that the construction of conjugate models in essence requires the finite-dimensional models to be in the exponential family. As an application of our constructive framework, we derive a model on infinite permutations, the nonparametric Bayesian analogue of a model recently proposed for the analysis of rank data.
Abstract:
State-space inference and learning with Gaussian processes (GPs) is an unsolved problem. We propose a new, general methodology for inference and learning in nonlinear state-space models that are described probabilistically by non-parametric GP models. We apply the expectation maximization algorithm to iterate between inference in the latent state-space and learning the parameters of the underlying GP dynamics model. Copyright 2010 by the authors.
Abstract:
This paper is concerned with the development of efficient algorithms for propagating parametric uncertainty within the context of the hybrid Finite Element/Statistical Energy Analysis (FE/SEA) approach to the analysis of complex vibro-acoustic systems. This approach models the system as a combination of SEA subsystems and FE components; it is assumed that the FE components have fully deterministic properties, while the SEA subsystems have a high degree of randomness. The method has been recently generalised by allowing the FE components to possess parametric uncertainty, leading to two ensembles of uncertainty: a non-parametric one (SEA subsystems) and a parametric one (FE components). The SEA subsystems ensemble is dealt with analytically, while the effect of the additional FE components ensemble can be dealt with by Monte Carlo Simulations. However, this approach can be computationally intensive when applied to complex engineering systems having many uncertain parameters. Two different strategies are proposed: (i) the combination of the hybrid FE/SEA method with the First Order Reliability Method which allows the probability of the non-parametric ensemble average of a response variable exceeding a barrier to be calculated and (ii) the combination of the hybrid FE/SEA method with Laplace's method which allows the evaluation of the probability of a response variable exceeding a limit value. The proposed approaches are illustrated using two built-up plate systems with uncertain properties and the results are validated against direct integration, Monte Carlo simulations of the FE and of the hybrid FE/SEA models. © 2013 Elsevier Ltd.
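The idea behind strategy (i) can be illustrated on a toy problem. The sketch below is not the hybrid FE/SEA formulation: it assumes a hypothetical linear response a·u of independent standard normal parameters u and a barrier b, a case in which the First Order Reliability Method is exact, and compares the result with a brute-force Monte Carlo estimate. The names `form_linear` and `monte_carlo`, the coefficients and the barrier are all illustrative assumptions.

```python
import math
import random


def std_normal_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * math.erfc(-x / math.sqrt(2.0))


def form_linear(a, b):
    """FORM for the limit state g(u) = b - sum(a_i * u_i), u_i ~ N(0, 1).

    Failure means the response a.u exceeds the barrier b. For a linear
    limit state the reliability index is beta = b / ||a|| and the
    exceedance probability is Phi(-beta).
    """
    norm_a = math.sqrt(sum(ai * ai for ai in a))
    beta = b / norm_a
    return beta, std_normal_cdf(-beta)


def monte_carlo(a, b, n_samples, seed=0):
    """Brute-force estimate of P(a.u > b) for comparison."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_samples):
        response = sum(ai * rng.gauss(0.0, 1.0) for ai in a)
        if response > b:
            hits += 1
    return hits / n_samples


# Hypothetical sensitivities and barrier, chosen only for illustration.
a = [0.6, 0.8]
b = 2.0
beta, p_form = form_linear(a, b)
p_mc = monte_carlo(a, b, 200_000)
print(f"beta={beta:.3f}  FORM={p_form:.5f}  Monte Carlo={p_mc:.5f}")
```

The FORM estimate needs one closed-form evaluation here, whereas the Monte Carlo estimate needs many samples to resolve a small probability, which is the cost argument made in the abstract.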
Abstract:
BACKGROUND: In a time-course microarray experiment, the expression level for each gene is observed across a number of time-points in order to characterize the temporal trajectories of the gene-expression profiles. For many of these experiments, the scientific aim is the identification of genes for which the trajectories depend on an experimental or phenotypic factor. There is an extensive recent body of literature on statistical methodology for addressing this analytical problem. Most of the existing methods are based on estimating the time-course trajectories using parametric or non-parametric mean regression methods. The sensitivity of these regression methods to outliers, an issue that is well documented in the statistical literature, should be of concern when analyzing microarray data. RESULTS: In this paper, we propose a robust testing method for identifying genes whose expression time profiles depend on a factor. Furthermore, we propose a multiple testing procedure to adjust for multiplicity. CONCLUSIONS: Through an extensive simulation study, we illustrate the performance of our method. Finally, we report the results from applying our method to a case study and discuss potential extensions.
Abstract:
Master's dissertation submitted to the Instituto de Contabilidade e Administração do Porto for the degree of Master in Accounting and Finance, supervised by Doutor Carlos Quelhas Martins
Abstract:
Background: Physical activity is a central component of children's physical, psychological and social development, particularly in a society where the impact of sedentary behaviour and obesity is becoming increasingly important. However, trajectories of out-of-school physical activity and their determinants are seldom studied, and knowledge on the subject is limited. It is also well known that types of physical activity are rarely taken into account. Objective: This thesis aims (a) to determine the trajectories of physical activity participation over the course of children's development, (b) to validate the association between supervised and unsupervised physical activity, and (c) to identify the neighbourhood-level, family-level and individual-level determinants associated with the trajectories of supervised and unsupervised physical activity. Participants: 1,814 children (51% boys) born in 1998 who took part in the Étude Longitudinale du Développement des Enfants du Québec (ELDEQ). The data were collected from their mothers only. Measures: The frequency of supervised and unsupervised physical activity was measured four times while the children were between 5 and 8 years old. The determinants and the control variables were measured when the children were 4 or 5 years old. Results: Three trajectories of supervised and three of unsupervised physical activity were identified. The results suggest that the supervised physical activity trajectories, representing 10%, 55.3% and 34.7% of the population respectively, are relatively stable even though they increase slightly over time. Of the three unsupervised physical activity trajectories, representing 14.1%, 28.1% and 57.8% of the population respectively, one increases considerably over time while the other two are stable.
These two sets of trajectories are not significantly associated with each other. Maternal education, mutual aid in the neighbourhood of residence and the children's prosociality are determinants of both types of physical activity. Income sufficiency and the mother's sports participation are associated only with the supervised physical activity trajectories. Living in an intact family discriminates membership in the unsupervised physical activity trajectories. Conclusion: First, physical activity participation is relatively stable between 5 and 8 years of age. Second, supervised and unsupervised physical activity are two practices that develop differently and have their own determinants. Third, an ecological approach provides a better grasp of the complexity of these two processes.
Productivity growth in electric energy retail in Colombia. A bootstrapped Malmquist indices approach
Abstract:
This paper offers a productivity growth estimate for electric energy commercialization firms in Colombia, using a non-parametric Malmquist bootstrap methodology. The estimation and methodology serve two main purposes. First, in Colombia, commercialization firms are subject to a price-cap regulation scheme, an arrangement that is uncommon internationally for this part of the industry. The paper's results therefore suggest an estimate of the productivity factor to be used by the regulator, not only in Colombia but also in other countries where commercialization is a growing part of the industry (renewable energy, for instance). Second, because of poor data collection by regulators and by the firms themselves, regulation based on a single productivity estimate seems inappropriate and error-prone. The nonparametric Malmquist bootstrap estimation allows the reliability of the result to be assessed, in contrast to a single point estimate. This would give the regulator an opportunity to adopt a narrower and more accurate productivity estimate, or to override an implausible result and impose a productivity factor in the price cap to foster the development of the industry.
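The contrast between a single estimate and a bootstrap assessment can be sketched in a few lines. The example below is a deliberately simplified stand-in: it applies a plain percentile bootstrap to a handful of already-computed, hypothetical firm-level productivity indices, and it does not reproduce the DEA-based Malmquist bootstrap that the paper uses. The function names and the data are assumptions made for illustration.

```python
import math
import random


def geometric_mean(values):
    """Geometric mean, the usual way to average index numbers."""
    return math.exp(sum(math.log(v) for v in values) / len(values))


def bootstrap_ci(values, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the geometric mean.

    Firms are resampled with replacement; this ignores the estimation
    error in each index, which a DEA-based bootstrap would account for.
    """
    rng = random.Random(seed)
    n = len(values)
    stats = sorted(
        geometric_mean([values[rng.randrange(n)] for _ in range(n)])
        for _ in range(n_boot)
    )
    lo = stats[int((alpha / 2.0) * n_boot)]
    hi = stats[int((1.0 - alpha / 2.0) * n_boot) - 1]
    return lo, hi


# Hypothetical productivity indices (values above 1 mean productivity growth).
indices = [1.02, 0.97, 1.05, 1.10, 0.99, 1.03, 0.95, 1.08]
point = geometric_mean(indices)
lo, hi = bootstrap_ci(indices)
print(f"point estimate={point:.4f}  95% interval=({lo:.4f}, {hi:.4f})")
```

If the interval contains 1, the data do not clearly separate productivity growth from decline, which is the kind of information a single point estimate hides from the regulator.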
Abstract:
The problem of estimating the individual probabilities of a discrete distribution is considered. The true distribution of the independent observations is a mixture of a family of power series distributions. First, we ensure identifiability of the mixing distribution assuming mild conditions. Next, the mixing distribution is estimated by non-parametric maximum likelihood and an estimator for individual probabilities is obtained from the corresponding marginal mixture density. We establish asymptotic normality for the estimator of individual probabilities by showing that, under certain conditions, the difference between this estimator and the empirical proportions is asymptotically negligible. Our framework includes Poisson, negative binomial and logarithmic series as well as binomial mixture models. Simulations highlight the benefit in achieving normality when using the proposed marginal mixture density approach instead of the empirical one, especially for small sample sizes and/or when interest is in the tail areas. A real data example is given to illustrate the use of the methodology.
Abstract:
1. Closed Ecological Systems (CES) are small manmade ecosystems which do not have any material exchange with the surrounding environment. Recent ecological and technological advances enable successful establishment and maintenance of CES, making them a suitable tool for detecting and measuring subtle feedbacks and mechanisms. 2. As a part of an analogue (physical) C cycle modelling experiment, we developed a non-intrusive methodology to control the internal environment and to monitor atmospheric CO2 concentration inside 16 replicated CES. Whilst maintaining an air-tight seal of all CES, this approach allowed for access to the CO2 measuring equipment for periodic re-calibration and repairs. 3. To ensure reliable cross-comparison of CO2 observations between individual CES units and to minimise the cost of the system, only one CO2 sampling unit was used. An ADC BioScientific OP-2 (open-path) analyser mounted on a swinging arm passed over a set of 16 measuring cells. Each cell was connected to an individual CES with air continuously circulating between them. 4. Using this setup, we were able to continuously measure several environmental variables and CO2 concentration within each closed system, allowing us to study minute effects of changing temperature on C fluxes within each CES. The CES and the measuring cells showed minimal air leakage during an experimental run lasting, on average, 3 months. The CO2 analyser assembly performed reliably for over 2 years; however, an early iteration of the present design proved to be sensitive to positioning errors. 5. We indicate how the methodology can be further improved and suggest possible avenues where future CES based research could be applied.
Abstract:
The use of Bayesian inference for time-frequency representations has, thus far, been limited to offline analysis of signals, using a smoothing spline based model of the time-frequency plane. In this paper we introduce a new framework that allows the routine use of Bayesian inference for online estimation of the time-varying spectral density of a locally stationary Gaussian process. The core of our approach is the use of a likelihood inspired by a local Whittle approximation. This choice, along with the use of a recursive algorithm for non-parametric estimation of the local spectral density, permits the use of a particle filter for estimating the time-varying spectral density online. We provide demonstrations of the algorithm through tracking chirps and the analysis of musical data.
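For readers unfamiliar with the Whittle approximation, the sketch below shows its standard global form for a stationary series: the log-likelihood is approximated by minus the sum over Fourier frequencies of log f(w) + I(w)/f(w), where I is the periodogram and f a candidate spectral density. This is not the paper's local, online version or its particle filter. The AR(1) spectral density, the simulated data and all function names are assumptions chosen for illustration.

```python
import cmath
import math
import random


def periodogram(x):
    """Periodogram I(w_j) = |DFT|^2 / (2*pi*n) at w_j = 2*pi*j/n, j >= 1."""
    n = len(x)
    mean = sum(x) / n
    centred = [v - mean for v in x]
    freqs, values = [], []
    for j in range(1, (n - 1) // 2 + 1):
        w = 2.0 * math.pi * j / n
        dft = sum(v * cmath.exp(-1j * w * t) for t, v in enumerate(centred))
        freqs.append(w)
        values.append(abs(dft) ** 2 / (2.0 * math.pi * n))
    return freqs, values


def ar1_spectral_density(w, phi, sigma2):
    """Spectral density of an AR(1) process with coefficient phi."""
    return sigma2 / (2.0 * math.pi * (1.0 - 2.0 * phi * math.cos(w) + phi * phi))


def whittle_loglik(freqs, pgram, phi, sigma2):
    """Whittle approximation: -sum_j [log f(w_j) + I(w_j) / f(w_j)]."""
    total = 0.0
    for w, i_w in zip(freqs, pgram):
        f = ar1_spectral_density(w, phi, sigma2)
        total -= math.log(f) + i_w / f
    return total


def simulate_ar1(n, phi, sigma=1.0, seed=0):
    """Simulate x_t = phi * x_{t-1} + e_t with Gaussian noise."""
    rng = random.Random(seed)
    x, prev = [], 0.0
    for _ in range(n):
        prev = phi * prev + rng.gauss(0.0, sigma)
        x.append(prev)
    return x


series = simulate_ar1(512, phi=0.7)
freqs, pgram = periodogram(series)
# Grid search over phi with the innovation variance fixed at its true value.
grid = [k / 20.0 for k in range(-18, 19)]
phi_hat = max(grid, key=lambda p: whittle_loglik(freqs, pgram, p, 1.0))
print("Whittle estimate of phi:", phi_hat)
```

Because the likelihood depends on the data only through the periodogram, it can be evaluated without inverting a covariance matrix, which is what makes Whittle-type likelihoods attractive for repeated evaluation.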
Abstract:
This paper models the transmission of shocks between the US, Japanese and Australian equity markets. Tests for the existence of linear and non-linear transmission of volatility across the markets are performed using parametric and non-parametric techniques. In particular the size and sign of return innovations are important factors in determining the degree of spillovers in volatility. It is found that a multivariate asymmetric GARCH formulation can explain almost all of the non-linear causality between markets. These results have important implications for the construction of models and forecasts of international equity returns.
Abstract:
In this paper, we study the role of the volatility risk premium for the forecasting performance of implied volatility. We introduce a non-parametric and parsimonious approach to adjust the model-free implied volatility for the volatility risk premium and implement this methodology using more than 20 years of options and futures data on three major energy markets. Using regression models and statistical loss functions, we find compelling evidence to suggest that the risk premium adjusted implied volatility significantly outperforms other models, including its unadjusted counterpart. Our main finding holds for different choices of volatility estimators and competing time-series models, underlining the robustness of our results.
Abstract:
This work is an assessment of the frequency of extreme values (EVs) of daily rainfall in the city of Sao Paulo, Brazil, over the period 1933-2005, based on the peaks-over-threshold (POT) and Generalized Pareto Distribution (GPD) approach. Usually, a GPD model is fitted to a sample of POT values selected with a constant threshold. However, in this work we use time-dependent thresholds, composed of relatively large p quantiles (for example p of 0.97) of daily rainfall amounts computed from all available data. Samples of POT values were extracted with several values of p. Four different GPD models (GPD-1, GPD-2, GPD-3, and GPD-4) were fitted to each one of these samples by the maximum likelihood (ML) method. The shape parameter was assumed constant for the four models, but time-varying covariates were incorporated into the scale parameter of GPD-2, GPD-3, and GPD-4, describing the annual cycle in GPD-2, a linear trend in GPD-3, and both annual cycle and linear trend in GPD-4. GPD-1, with constant scale and shape parameters, is the simplest model. To identify the best of the four models we used the rescaled Akaike Information Criterion (AIC) with second-order bias correction. This criterion isolates GPD-3 as the best model, i.e. the one with a positive linear trend in the scale parameter. The slope of this trend is significant against the null hypothesis of no trend at about the 98% confidence level. The non-parametric Mann-Kendall test also showed the presence of a positive trend in the annual frequency of excesses over high thresholds, with the p-value being virtually zero. Therefore, there is strong evidence that high quantiles of daily rainfall in the city of Sao Paulo have been increasing in magnitude and frequency over time. For example, the 0.99 quantiles of daily rainfall amount have increased by about 40 mm between 1933 and 2005. Copyright (C) 2008 Royal Meteorological Society
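The Mann-Kendall test mentioned above is simple enough to sketch directly. The version below is a minimal illustration using the normal approximation with a continuity correction and no correction for ties; it is not the authors' code, and the annual counts are made-up numbers rather than the Sao Paulo data.

```python
import math


def mann_kendall(x):
    """Mann-Kendall trend test (normal approximation, no tie correction).

    Returns the statistic S, the standardised score z and the two-sided
    p-value for the null hypothesis of no monotonic trend.
    """
    n = len(x)
    s = 0
    for i in range(n - 1):
        for j in range(i + 1, n):
            diff = x[j] - x[i]
            s += (diff > 0) - (diff < 0)  # sign of the pairwise difference
    var_s = n * (n - 1) * (2 * n + 5) / 18.0
    if s > 0:
        z = (s - 1) / math.sqrt(var_s)
    elif s < 0:
        z = (s + 1) / math.sqrt(var_s)
    else:
        z = 0.0
    p_value = math.erfc(abs(z) / math.sqrt(2.0))  # equals 2 * (1 - Phi(|z|))
    return s, z, p_value


# Hypothetical annual counts of threshold exceedances (no ties).
counts = [3, 5, 4, 6, 8, 7, 9, 11, 10, 12]
s, z, p_value = mann_kendall(counts)
print(f"S={s}  z={z:.2f}  p={p_value:.5f}")
```

With real count data, ties are common, so a tie-corrected variance would normally be used in place of the simple formula here.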
Abstract:
This study aims to estimate the impact of the flow of Brazilian corporate US dollar issuances on the cupom cambial (the onshore US dollar interest rate). Under Covered Interest Rate Parity, the cupom cambial can be understood as the result of two components: the foreign interest rate (Libor) and country risk. Additional deviations from parity can be explained by several factors such as transaction costs, liquidity, flows from arbitrage transactions by financial or non-financial companies, etc. In this context, arbitrage flows occur when a Brazilian company is able to raise funds in the external market and bring them into Brazil at a final funding rate in reais that is lower than that of its local funding (via debentures, financial notes, loans, CDBs, etc.), including all costs. When the conditions necessary for this type of operation are present, the effect can be seen in the BM&F market for FRAs on the cupom cambial, through an abnormal flow of interest-rate givers (doadores de juros). Non-parametric tests (Wilcoxon-Mann-Whitney, Kruskal-Wallis and Van der Waerden) and the event study methodology detected abnormal behaviour in the cupom cambial FRA market around the events considered here, namely eurobond issuances by Brazilian companies, excluding the effect of sovereign risk, measured by the Brazil CDS, and taking convertibility risk to be null in the period, following an analysis of the differential between onshore and offshore NDFs. To estimate the impact of the issuances on the cupom cambial FRA, two models were used, AR-GARCH and OLS with Newey-West correction, and the results showed that the issuances cause a decline of 2 to 5 bps in the cupom cambial FRA, depending on the maturity of the issuance and on the model evaluated. Under the same methodology, we conclude that each USD 100 million of issuances is responsible, on average, for a 1 bp decline in the cupom cambial FRA, all else constant.
Abstract:
Combinatorial optimization problems have drawn a large number of researchers into the search for approximate solutions, since it is generally accepted that these problems cannot be solved in polynomial time. Initially, these solutions were focused on heuristics. Currently, metaheuristics are used more for this task, especially those based on evolutionary algorithms. The two main contributions of this work are: the creation of what is called an "Operon" heuristic, for the construction of the information chains necessary for the implementation of transgenetic (evolutionary) algorithms, mainly using statistical methodology (Cluster Analysis and Principal Component Analysis); and the use of statistical analyses that are adequate for evaluating the performance of the algorithms developed to solve these problems. The aim of the Operon is to construct good quality dynamic information chains to promote an "intelligent" search in the space of solutions. The intended application is the Traveling Salesman Problem (TSP), using a transgenetic algorithm known as ProtoG. A strategy is also proposed for renewing part of the chromosome population, triggered by a minimum threshold on the coefficient of variation of the individuals' fitness function, computed over the population. Statistical methodology is used to evaluate the performance of four algorithms, as follows: the proposed ProtoG, two memetic algorithms and a Simulated Annealing algorithm. Three performance analyses of these algorithms are proposed. The first is accomplished through Logistic Regression, based on the probability that the algorithm being tested finds an optimal solution for a TSP instance. The second is accomplished through Survival Analysis, based on the probability of the execution time observed until an optimal solution is reached. The third is accomplished by means of a non-parametric Analysis of Variance, considering the Percent Error of the Solution (PES), defined as the percentage by which the solution found exceeds the best solution available in the literature. Six experiments were conducted on sixty-one instances of Euclidean TSP with sizes of up to 1,655 cities. The first two experiments deal with the adjustment of four parameters used in the ProtoG algorithm in an attempt to improve its performance. The last four were undertaken to evaluate the performance of ProtoG in comparison with the three other algorithms adopted. For these sixty-one instances, statistical tests provide evidence that ProtoG performs better than these three algorithms in fifty instances. In addition, for the thirty-six instances considered in the last three trials, in which the performance of the algorithms was evaluated through PES, the PES average obtained with ProtoG was less than 1% in almost half of these instances, reaching its greatest average for one instance of 1,173 cities, with a PES average equal to 3.52%. Therefore, ProtoG can be considered a competitive algorithm for solving the TSP, since it is not rare in the literature to find PES averages greater than 10% reported for instances of this size.
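The PES measure described in the abstract reduces to a one-line formula. The sketch below computes it for a set of runs on one instance; the function names and the tour lengths are hypothetical and only meant to make the definition concrete.

```python
def percent_error(found_length, best_known_length):
    """Percent Error of the Solution: excess over the best known tour, in %."""
    if best_known_length <= 0:
        raise ValueError("best known tour length must be positive")
    return 100.0 * (found_length - best_known_length) / best_known_length


def mean_percent_error(found_lengths, best_known_length):
    """Average PES over several independent runs on the same instance."""
    errors = [percent_error(f, best_known_length) for f in found_lengths]
    return sum(errors) / len(errors)


# Hypothetical tour lengths from five runs on an instance whose best
# known tour has length 1000.
runs = [1000.0, 1012.0, 1005.0, 1020.0, 1000.0]
print(f"mean PES = {mean_percent_error(runs, 1000.0):.2f}%")
```

A PES of 0 means the run matched the best known solution, so the mean PES also indicates how often and by how much an algorithm falls short of it.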