132 results for Akaike
Abstract:
This paper addresses the problem of learning Bayesian network structures from data based on score functions that are decomposable. It describes properties that strongly reduce the time and memory costs of many known methods without losing global optimality guarantees. These properties are derived for different score criteria such as Minimum Description Length (or Bayesian Information Criterion), Akaike Information Criterion and Bayesian Dirichlet Criterion. Then a branch-and-bound algorithm is presented that integrates structural constraints with data in a way that guarantees global optimality. As an example, structural constraints are used to map the problem of structure learning in Dynamic Bayesian networks into a corresponding augmented Bayesian network. Finally, we show empirically the benefits of using the properties with state-of-the-art methods and with the new algorithm, which is able to handle larger data sets than before.
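To make the notion of a decomposable score concrete, the sketch below computes a BIC/MDL-style local score for one node given a candidate parent set from a discrete data set. It is a minimal illustration of the kind of local term such methods sum over nodes, not the paper's implementation; all names are ours.

```python
# Minimal sketch (not the paper's code) of a decomposable BIC/MDL local score
# for a discrete node given a candidate parent set, using pandas counts.
import numpy as np
import pandas as pd

def local_bic(data: pd.DataFrame, node: str, parents) -> float:
    """Maximized log-likelihood of `node` given `parents` minus the BIC penalty."""
    n = len(data)
    r = data[node].nunique()                      # number of states of the child
    if parents:
        grouped = data.groupby(list(parents))[node]
        q = grouped.ngroups                       # observed parent configurations
        groups = grouped
    else:
        q = 1
        groups = [(None, data[node])]
    loglik = 0.0
    for _, child_values in groups:
        counts = child_values.value_counts().to_numpy(dtype=float)
        loglik += float(np.sum(counts * np.log(counts / counts.sum())))
    penalty = 0.5 * np.log(n) * q * (r - 1)       # BIC/MDL complexity term
    return loglik - penalty

# The score of a whole structure is the sum of local_bic(data, X, parents_of_X)
# over all nodes, which is the decomposability that branch-and-bound exploits.
```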
Abstract:
We present a novel method for the light-curve classification of Pan-STARRS1 Medium Deep Survey (PS1 MDS) extragalactic sources into stochastic variables (SVs) and burst-like (BL) transients, using multi-band image-differencing time-series data. We select detections in difference images associated with galaxy hosts using a star/galaxy catalog extracted from the deep PS1 MDS stacked images, and adopt a maximum a posteriori formulation to model their difference-flux time series in four Pan-STARRS1 photometric bands: gP1, rP1, iP1, and zP1. We use three deterministic light-curve models to fit BL transients: a Gaussian, a Gamma distribution, and an analytic supernova (SN) model; and one stochastic light-curve model, the Ornstein-Uhlenbeck process, to fit variability that is characteristic of active galactic nuclei (AGNs). We assess the quality of fit of the models band-wise and source-wise, using their estimated leave-one-out cross-validation likelihoods and corrected Akaike information criteria. We then apply a K-means clustering algorithm to these statistics to determine the source classification in each band. The final source classification is derived as a combination of the individual filter classifications, resulting in two measures of classification quality, from the averages across the photometric filters of (1) the classifications determined from the closest K-means cluster centers, and (2) the square distances from the clustering centers in the K-means clustering spaces. For a verification set of AGNs and SNe, we show that SVs and BL transients occupy distinct regions in the plane constituted by these measures. We use our clustering method to classify 4361 extragalactic sources detected in difference images in the first 2.5 yr of the PS1 MDS into 1529 BL and 2262 SV, with a purity of 95.00% for AGNs and 90.97% for SNe based on our verification sets. We combine our light-curve classifications with their nuclear or off-nuclear host galaxy offsets to define a robust photometric sample of 1233 AGNs and 812 SNe. With these two samples, we characterize their variability and host galaxy properties, and identify simple photometric priors that would enable their real-time identification in future wide-field synoptic surveys.
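As a rough illustration of the band-wise statistics and clustering step described above (with made-up inputs; the light-curve fits and PS1 photometry are not reproduced here), one could compute the corrected AIC for each fitted model and cluster the resulting statistics with K-means:

```python
# Hedged sketch, not the PS1 MDS pipeline: compare per-band light-curve fits
# via the corrected Akaike information criterion (AICc), then cluster the
# resulting fit statistics with K-means.  `fit_stats` is a placeholder.
import numpy as np
from sklearn.cluster import KMeans

def aicc(loglike: float, k_params: int, n_obs: int) -> float:
    """Corrected AIC: AIC plus the small-sample correction term."""
    aic = 2 * k_params - 2 * loglike
    return aic + 2 * k_params * (k_params + 1) / (n_obs - k_params - 1)

# e.g. fit_stats[i] = (best BL-model AICc, OU-model AICc) for source i in one band
rng = np.random.default_rng(0)
fit_stats = rng.normal(size=(200, 2))            # placeholder statistics
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(fit_stats)
# Each source's band-wise label (burst-like vs stochastic) would then be
# combined across filters into the two classification-quality measures.
```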
Abstract:
Active radio-frequency identification systems that are used for the localisation and tracking of people will be subject to the same body-centric processes that affect other forms of wearable communications. To achieve the goal of creating body-worn tags with multi-year life spans, it will be necessary to gain an understanding of the channel conditions that are likely to impact the reader-tag interrogation process. In this paper we present the preliminary results of an indoor channel measurement campaign conducted at 868 MHz, aimed at understanding and modelling signal characteristics for a wrist-worn tag. Using a model selection process based on the Akaike Information Criterion, the lognormal distribution was selected most often to describe the received signal amplitude. Parameter estimates are provided so that the channels investigated in this study may be readily simulated.
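The kind of AIC-based distribution selection described here can be sketched as follows; the candidate set and the synthetic amplitudes are illustrative assumptions, not the measured 868 MHz data:

```python
# Hedged sketch of AIC-based selection among candidate amplitude
# distributions: fit each by maximum likelihood and keep the lowest AIC.
import numpy as np
from scipy import stats

amplitudes = stats.lognorm.rvs(s=0.5, scale=1.0, size=2000, random_state=1)

candidates = {
    "lognormal": stats.lognorm,
    "rayleigh": stats.rayleigh,
    "nakagami": stats.nakagami,
    "rice": stats.rice,
}

results = {}
for name, dist in candidates.items():
    params = dist.fit(amplitudes)                     # ML parameter estimates
    loglike = np.sum(dist.logpdf(amplitudes, *params))
    results[name] = 2 * len(params) - 2 * loglike     # AIC = 2k - 2 ln L

best = min(results, key=results.get)
print(best, results[best])
```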
Abstract:
Dissertation presented to the Instituto Politécnico do Porto in fulfilment of the requirements for the Master's degree in Logistics.
Abstract:
Forecasting future sales is one of the most important issues underlying all strategic and planning decisions in effective retail operations. For profitable retail businesses, accurate demand forecasting is crucial for organizing and planning production, purchasing, transportation and the labor force. Retail sales series belong to a special type of time series that typically contains trend and seasonal patterns, presenting challenges for developing effective forecasting models. This work compares the forecasting performance of state space models and ARIMA models. The forecasting performance is demonstrated through a case study of retail sales of five different categories of women's footwear: Boots, Booties, Flats, Sandals and Shoes. For both methodologies, the model with the minimum value of Akaike's Information Criterion in the in-sample period was selected from all admissible models for further evaluation in the out-of-sample period. Both one-step and multiple-step forecasts were produced. The results show that, when an automatic model-selection algorithm is used, the overall out-of-sample forecasting performance of the state space and ARIMA models, evaluated via RMSE, MAE and MAPE, is quite similar for both one-step and multi-step forecasts. We also conclude that the state space and ARIMA models produce coverage probabilities that are close to the nominal rates for both one-step and multi-step forecasts.
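A minimal sketch of the selection step, assuming a monthly sales series and illustrative model orders (not those chosen in the study), using statsmodels for both the ETS state space model and the seasonal ARIMA model:

```python
# Hedged sketch: fit an ETS (state space) model and a seasonal ARIMA model,
# compare in-sample AIC, and forecast with the winner.  Data are synthetic.
import numpy as np
import pandas as pd
from statsmodels.tsa.exponential_smoothing.ets import ETSModel
from statsmodels.tsa.arima.model import ARIMA

rng = np.random.default_rng(0)
idx = pd.date_range("2007-01", periods=96, freq="MS")
sales = pd.Series(100 + 10 * np.sin(2 * np.pi * idx.month / 12)
                  + rng.normal(scale=5, size=96), index=idx)

ets_fit = ETSModel(sales, error="add", trend="add",
                   seasonal="add", seasonal_periods=12).fit(disp=False)
arima_fit = ARIMA(sales, order=(1, 1, 1),
                  seasonal_order=(0, 1, 1, 12)).fit()

# Select the candidate with the smaller in-sample AIC, then evaluate it
# out of sample with RMSE/MAE/MAPE as described in the abstract.
best = min([("ETS", ets_fit.aic), ("ARIMA", arima_fit.aic)], key=lambda t: t[1])
print(best)
forecast = (ets_fit if best[0] == "ETS" else arima_fit).forecast(steps=12)
```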
Abstract:
With advances in information technology, economic and financial time-series data are increasingly available. However, if standard time-series techniques are used, this large amount of information comes with a dimensionality problem. Since most series of interest are highly correlated, their dimension can be reduced using factor analysis. This technique has become increasingly popular in economics since the 1990s. Given the availability of data and computational advances, several new questions arise. What are the effects and the transmission of structural shocks in a data-rich environment? Can the information contained in a large set of economic indicators help to better identify monetary policy shocks, given the problems encountered in applications using standard models? Can financial shocks be identified and their effects on the real economy measured? Can the existing factor method be improved by incorporating another dimension-reduction technique such as VARMA analysis? Does this produce better forecasts of the main macroeconomic aggregates and help with impulse-response analysis? Finally, can factor analysis be applied to random parameters? For example, are there only a small number of sources of the time instability of coefficients in empirical macroeconomic models? Using structural factor analysis and VARMA modelling, my thesis answers these questions in five articles. The first two chapters study the effects of monetary and financial shocks in a data-rich environment. The third article proposes a new method that combines factor and VARMA models. This approach is applied in the fourth article to measure the effects of credit shocks in Canada. The contribution of the last chapter is to impose a factor structure on time-varying parameters and to show that there is a small number of sources of this instability. The first article analyses the transmission of monetary policy in Canada using a factor-augmented vector autoregressive (FAVAR) model. Previous studies based on VAR models have found several empirical anomalies following a monetary policy shock. We estimate the FAVAR model using a large number of monthly and quarterly macroeconomic series. We find that the information contained in the factors is important for properly identifying the transmission of monetary policy and that it helps to correct the standard empirical anomalies. Finally, the FAVAR framework yields impulse-response functions for every indicator in the data set, producing the most comprehensive analysis to date of the effects of monetary policy in Canada. Motivated by the recent economic crisis, research on the role of the financial sector has regained importance. In the second article we examine the effects and propagation of credit shocks on the real economy using a large set of economic and financial indicators within a structural factor model.
We find that a credit shock immediately increases credit spreads, lowers the value of Treasury bills and causes a recession. These shocks have an important effect on measures of real activity, price indices, leading indicators and financial indicators. Unlike other studies, our identification procedure for the structural shock does not require timing restrictions between the financial and macroeconomic factors. Moreover, it provides an interpretation of the factors without restricting their estimation. In the third article we study the relationship between the VARMA and factor representations of vector stochastic processes and propose a new class of factor-augmented VARMA (FAVARMA) models. Our starting point is the observation that, in general, a multivariate series and its associated factors cannot both follow a finite-order VAR process. We show that the dynamic process of the factors, extracted as linear combinations of the observed variables, is in general a VARMA and not a VAR, as is assumed elsewhere in the literature. Second, we show that even if the factors follow a finite-order VAR, this implies a VARMA representation for the observed series. We therefore propose the FAVARMA framework, which combines these two methods of reducing the number of parameters. The model is applied in two forecasting exercises using the US and Canadian data of Boivin, Giannoni and Stevanovic (2010, 2009), respectively. The results show that the VARMA component improves the forecasts of the main macroeconomic aggregates relative to standard models. Finally, we estimate the effects of a monetary shock using the data and identification scheme of Bernanke, Boivin and Eliasz (2005). Our FAVARMA(2,1) model with six factors yields coherent and precise results for the effects and transmission of monetary policy in the United States. Whereas the FAVAR model employed in that study required 510 VAR coefficients to be estimated, we obtain similar results with only 84 parameters in the dynamic process of the factors. The objective of the fourth article is to identify and measure the effects of credit shocks in Canada in a data-rich environment using the structural FAVARMA model. Within the financial-accelerator framework developed by Bernanke, Gertler and Gilchrist (1999), we approximate the external finance premium by credit spreads. On the one hand, we find that an unanticipated increase in the US external finance premium generates a significant and persistent recession in Canada, accompanied by an immediate rise in Canadian credit spreads and interest rates. The common component appears to capture the important dimensions of the cyclical fluctuations of the Canadian economy. Variance-decomposition analysis reveals that this credit shock has an important effect on various sectors of real activity, price indices, leading indicators and credit spreads. On the other hand, an unexpected increase in the Canadian external finance premium has no significant effect in Canada. We show that the effects of credit shocks in Canada are essentially driven by global conditions, approximated here by the US market.
Finally, given the identification procedure for the structural shocks, we obtain economically interpretable factors. The behaviour of economic agents and of the economic environment can change over time (e.g. changes in monetary policy strategy, shock volatility), inducing parameter instability in reduced-form models. Standard time-varying-parameter (TVP) models traditionally assume independent stochastic processes for all TVPs. In this article we show that the number of sources of time variation in the coefficients is probably very small, and we provide the first known empirical evidence of this in empirical macroeconomic models. The Factor-TVP approach proposed in Stevanovic (2010) is applied within a standard VAR model with random coefficients (TVP-VAR). We find that a single factor explains most of the variability of the VAR coefficients, while the shock-volatility parameters vary independently. The common factor is positively correlated with the unemployment rate. The same analysis is carried out with data that include the recent financial crisis. The procedure now suggests two factors, and the behaviour of the coefficients shows an important change since 2007. Finally, the method is applied to a TVP-FAVAR model. We find that only 5 dynamic factors govern the time instability in almost 700 coefficients.
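As a schematic illustration of the two-step FAVAR idea described above (not the thesis's estimation code; the panel, the number of factors and the lag length are placeholders), one could extract principal-component factors from a large macroeconomic panel and run a VAR on the factors together with the policy instrument:

```python
# Hedged sketch of a two-step FAVAR: PCA factors from a large panel,
# then a VAR on [factors, policy rate] with the lag order chosen by AIC.
import numpy as np
import pandas as pd
from sklearn.decomposition import PCA
from statsmodels.tsa.api import VAR

rng = np.random.default_rng(0)
panel = pd.DataFrame(rng.normal(size=(240, 120)))        # 240 months x 120 series
policy_rate = pd.Series(rng.normal(size=240), name="r")  # hypothetical instrument

standardized = (panel - panel.mean()) / panel.std()
factors = PCA(n_components=3).fit_transform(standardized)

favar_data = pd.concat(
    [pd.DataFrame(factors, columns=["F1", "F2", "F3"]), policy_rate], axis=1)
favar_fit = VAR(favar_data).fit(maxlags=12, ic="aic")     # lag order by AIC
irf = favar_fit.irf(48)                                   # impulse responses
```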
Abstract:
Introduction: The evaluation of small-intestinal submucosa vascular grafts for blood-vessel regeneration has yielded variable patency (0-100%), concurrent with variability in fabrication techniques. Methods: We investigated the effects of fabrication on patency and regeneration in a 2x2 factorial experimental design that combined: 1) preservation (P) or removal (R) of the stratum compactum layer of the intestine, and 2) dehydration (D) or hydration (H), giving four study groups (PD, RD, PH, RH). The grafts were implanted in the carotid arteries of pigs (ID 4.5 mm, N=4, 7 d). Patency, thrombogenicity, inflammatory reaction, vascularization, fibroblast infiltration, macrophage polarization profile and biaxial tensile strength were evaluated. Results: All PD grafts remained patent (4/4) but showed scarce vascularization and fibroblast infiltration. The RD group remained patent (4/4), showed extensive vascularization and fibroblast infiltration, and had the largest number of the macrophage phenotype (M2) associated with regeneration. The RH group showed lower patency (3/4), extensive vascularization and fibroblast infiltration, and a dominant M2 profile. The PH group showed the lowest patency and, despite greater cellular infiltration than PD, exhibited an adverse dominant macrophage phenotype. The elasticity of the R grafts evolved in a manner similar to that of the native carotids (particularly RD), whereas the P grafts retained their initial stiffness. Discussion: We conclude that the fabrication parameters drastically affect the outcomes, with the RD grafts giving the best results.
Abstract:
The rate at which a given site in a gene sequence alignment evolves over time may vary. This phenomenon, known as heterotachy, can bias or distort phylogenetic trees inferred from models of sequence evolution that assume rates of evolution are constant. Here, we describe a phylogenetic mixture model designed to accommodate heterotachy. The method sums the likelihood of the data at each site over more than one set of branch lengths on the same tree topology. A branch-length set that is best for one site may differ from the branch-length set that is best for some other site, thereby allowing different sites to have different rates of change throughout the tree. Because rate variation may not be present in all branches, we use a reversible-jump Markov chain Monte Carlo algorithm to identify those branches in which reliable amounts of heterotachy occur. We implement the method in combination with our 'pattern-heterogeneity' mixture model, applying it to simulated data and five published datasets. We find that complex evolutionary signals of heterotachy are routinely present over and above variation in the rate or pattern of evolution across sites, and that the reversible-jump method requires far fewer parameters than conventional mixture models to describe it, while serving to identify the regions of the tree in which heterotachy is most pronounced. The reversible-jump procedure also removes the need for a posteriori tests of 'significance', such as the Akaike or Bayesian information criterion tests, or Bayes factors. Heterotachy has important consequences for the correct reconstruction of phylogenies as well as for tests of hypotheses that rely on accurate branch-length information. These include molecular clocks, analyses of tempo and mode of evolution, comparative studies and ancestral state reconstruction. The model is available from the authors' website, and can be used for the analysis of both nucleotide and morphological data.
Abstract:
Approximate Bayesian computation (ABC) methods make use of comparisons between simulated and observed summary statistics to overcome the problem of computationally intractable likelihood functions. As the practical implementation of ABC requires computations based on vectors of summary statistics rather than full data sets, a central question is how to derive low-dimensional summary statistics from the observed data with minimal loss of information. In this article we provide a comprehensive review and comparison of the performance of the principal methods of dimension reduction proposed in the ABC literature. The methods are split into three classes, which are not mutually exclusive: best subset selection methods, projection techniques and regularization. In addition, we introduce two new methods of dimension reduction. The first is a best subset selection method based on the Akaike and Bayesian information criteria, and the second uses ridge regression as a regularization procedure. We illustrate the performance of these dimension reduction techniques through the analysis of three challenging models and data sets.
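The ridge-based projection idea can be sketched as follows; the toy simulator, the penalty value and the variable names are illustrative assumptions, not the article's implementation:

```python
# Hedged sketch of regression-based dimension reduction in ABC: regress the
# parameter on the full vector of summary statistics from pilot simulations
# (here with ridge regularization) and use the fitted linear predictor as a
# single low-dimensional summary.
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)
n_sims, n_stats = 5000, 30
theta = rng.uniform(0, 10, size=n_sims)                          # pilot parameters
summaries = theta[:, None] + rng.normal(size=(n_sims, n_stats))  # toy summaries

projection = Ridge(alpha=1.0).fit(summaries, theta)
observed_stats = rng.normal(size=n_stats) + 5.0                  # stand-in observed data
observed_summary = projection.predict(observed_stats[None, :])

# In the ABC step, simulated parameter draws are accepted when
# projection.predict(simulated_stats) is close to observed_summary.
```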
Abstract:
This work is an assessment of the frequency of extreme values (EVs) of daily rainfall in the city of Sao Paulo, Brazil, over the period 1933-2005, based on the peaks-over-threshold (POT) and Generalized Pareto Distribution (GPD) approach. Usually, a GPD model is fitted to a sample of POT values selected with a constant threshold. However, in this work we use time-dependent thresholds, composed of relatively large p-quantiles (for example, p of 0.97) of daily rainfall amounts computed from all available data. Samples of POT values were extracted with several values of p. Four different GPD models (GPD-1, GPD-2, GPD-3, and GPD-4) were fitted to each of these samples by the maximum likelihood (ML) method. The shape parameter was assumed constant for the four models, but time-varying covariates were incorporated into the scale parameter of GPD-2, GPD-3, and GPD-4, describing an annual cycle in GPD-2, a linear trend in GPD-3, and both an annual cycle and a linear trend in GPD-4. GPD-1, with constant scale and shape parameters, is the simplest model. To identify the best of the four models we used the rescaled Akaike Information Criterion (AIC) with second-order bias correction. This criterion isolates GPD-3 as the best model, i.e. the one with a positive linear trend in the scale parameter. The slope of this trend is significant compared to the null hypothesis of no trend, at about the 98% confidence level. The non-parametric Mann-Kendall test also showed the presence of a positive trend in the annual frequency of excesses over high thresholds, with the p-value being virtually zero. Therefore, there is strong evidence that high quantiles of daily rainfall in the city of Sao Paulo have been increasing in magnitude and frequency over time. For example, the 0.99 quantile of daily rainfall amount has increased by about 40 mm between 1933 and 2005.
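A minimal sketch of the stationary baseline (GPD-1) and the second-order bias-corrected AIC, on synthetic excesses rather than the Sao Paulo rainfall data; the time-varying scale models (GPD-2 to GPD-4) require a custom likelihood and are not shown:

```python
# Hedged sketch: fit a generalized Pareto distribution to threshold excesses
# and compute the small-sample corrected AIC (AICc).
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
excesses = stats.genpareto.rvs(c=0.1, scale=10.0, size=300, random_state=rng)

shape, loc, scale = stats.genpareto.fit(excesses, floc=0.0)  # excesses: loc fixed at 0
loglike = np.sum(stats.genpareto.logpdf(excesses, shape, loc=loc, scale=scale))

k, n = 2, len(excesses)                       # shape and scale are estimated
aic = 2 * k - 2 * loglike
aicc = aic + 2 * k * (k + 1) / (n - k - 1)    # second-order bias correction
print(aicc)
```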
Abstract:
São Paulo is the most developed state in Brazil and contains few fragments of native ecosystems, generally surrounded by intensive agricultural land. Despite this, some areas still shelter large native animals. We aimed at understanding how medium and large carnivores use a mosaic landscape of forest/savanna and agroecosystems, and how the species respond to different landscape parameters (percentage of land cover and edge density), in a multi-scale perspective. The response variables were: species richness, carnivore frequency and the frequency of the three most recorded species (Puma concolor, Chrysocyon brachyurus and Leopardus pardalis). We compared 11 competing models using Akaike's information criterion (AIC) and assessed model support using AIC weights. The competing models were combinations of land-cover type (native vegetation, "cerrado" formations, "cerrado" and eucalypt plantation), landscape feature (percentage of land cover and edge density) and spatial scale. Herein, spatial scale refers to the radius around a sampling point defining a circular landscape. The scales analyzed were 250 (fine), 1,000 (medium) and 2,000 m (coarse). The shape of the curves for the response variables (linear, exponential and power) was also assessed. Our results indicate that the species with high mobility, P. concolor and C. brachyurus, were best explained by the edge density of native vegetation at the coarse scale (2,000 m). The frequencies of P. concolor and C. brachyurus had a negative power-shaped response to the explanatory variables. This general trend was also observed for species richness and carnivore frequency. Species richness and P. concolor frequency were also well explained by a second competing model: edge density of cerrado at the fine (250 m) scale. A different response was recorded for L. pardalis, whose frequency was best explained by the amount of cerrado at the fine (250 m) scale. The response curve was linearly positive. The contrasting results (P. concolor and C. brachyurus vs L. pardalis) may be due to the much higher mobility of the first two species in comparison with the third. In addition, L. pardalis requires higher-quality habitat than the other two species. This study highlights the importance of considering multiple spatial scales when evaluating species responses to different habitats. An important and new finding was the prevalence of edge density over habitat extent in explaining overall carnivore distribution, key information for the planning and management of protected areas.
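The model-support calculation referred to above (AIC weights) can be sketched as follows, with placeholder AIC values rather than those of the 11 candidate models:

```python
# Minimal sketch of delta-AIC and Akaike weights for a set of candidate models.
import numpy as np

aic = np.array([412.3, 415.9, 410.1, 420.7, 411.0])   # one value per candidate model
delta = aic - aic.min()
weights = np.exp(-0.5 * delta) / np.sum(np.exp(-0.5 * delta))
print(dict(zip(range(1, len(aic) + 1), np.round(weights, 3))))
```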
Abstract:
1. Analyses of species association have major implications for selecting indicators for freshwater biomonitoring and conservation, because they allow for the elimination of redundant information and focus attention on taxa that can be easily handled and identified. These analyses are particularly relevant in the debate about using speciose groups (such as the Chironomidae) as indicators in the tropics, because such groups require difficult and time-consuming analysis and their responses to environmental gradients, including anthropogenic stressors, are poorly known. 2. Our objective was to determine whether chironomid assemblages in Neotropical streams include clear associations of taxa and, if so, how well these associations could be explained by a set of models containing information from different spatial scales. For this, we formulated a priori models that allowed for the influence of local, landscape and spatial factors on chironomid taxon associations (CTA). These models represented biological hypotheses capable of explaining associations between chironomid taxa. For instance, CTA could be best explained by local variables (e.g. pH, conductivity and water temperature) or by processes acting at wider landscape scales (e.g. percentage of forest cover). 3. Biological data were taken from 61 streams in Southeastern Brazil, 47 of which were in well-preserved regions and 14 of which drained areas severely affected by anthropogenic activities. We adopted a model selection procedure using Akaike's information criterion to determine the most parsimonious models for explaining CTA. 4. Applying Kendall's coefficient of concordance, seven genera (Tanytarsus/Caladomyia, Ablabesmyia, Parametriocnemus, Pentaneura, Nanocladius, Polypedilum and Rheotanytarsus) were identified as associated taxa. The best-supported model explained 42.6% of the total variance in the abundance of the associated taxa. This model combined local and landscape environmental filters and spatial variables (which were derived from eigenfunction analysis). However, the model with local filters and spatial variables also had a good chance of being selected as the best model. 5. Standardised partial regression coefficients of the local and landscape filters, including the spatial variables, derived from model averaging allowed an estimation of which variables were best correlated with the abundance of the associated taxa. In general, the abundance of the associated genera tended to be lower in streams characterised by a high percentage of forest cover (landscape scale), a lower proportion of muddy substrata and high values of pH and conductivity (local scale). 6. Overall, our main result adds to the increasing number of studies that have indicated the importance of local and landscape variables, as well as the spatial relationships among sampling sites, for explaining aquatic insect community patterns in streams. Furthermore, our findings open new possibilities for the elimination of redundant data in the assessment of anthropogenic impacts on tropical streams.
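A hedged sketch of Kendall's coefficient of concordance (W), the statistic used above to identify the associated genera; the abundance matrix is synthetic and the calculation ignores the tie correction:

```python
# W measures how consistently a set of taxa (treated as "judges") rank the
# sampled streams; values near 1 indicate strongly associated taxa.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
abundance = rng.poisson(lam=5, size=(7, 61))          # 7 taxa x 61 streams

ranks = np.vstack([stats.rankdata(row) for row in abundance])
m, n = ranks.shape                                    # m taxa, n streams
rank_sums = ranks.sum(axis=0)
s = np.sum((rank_sums - rank_sums.mean()) ** 2)
w = 12 * s / (m ** 2 * (n ** 3 - n))                  # no tie correction
print(round(w, 3))
```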
Abstract:
In this article, we present the EM algorithm for performing maximum likelihood estimation of an asymmetric linear calibration model under the assumption of skew-normally distributed errors. A simulation study is conducted to evaluate the performance of the calibration estimator in interpolation and extrapolation situations. As an application to a real data set, we fitted the model to a dimensional measurement method in which testicular volume is calculated with a caliper and calibrated against ultrasonography as the standard method. By applying this methodology, we do not need to transform the variables to obtain symmetrical errors. Another interesting aspect of the approach is that the transformation developed to make the information matrix nonsingular when the skewness parameter is near zero leaves the parameter of interest unchanged. Model fitting is implemented, and the choice between the usual calibration model and the model proposed in this article was evaluated using the Akaike information criterion, Schwarz's Bayesian information criterion and the Hannan-Quinn criterion.
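For reference, the three criteria used in the comparison can be computed from each fitted model's maximized log-likelihood, parameter count and sample size; the numbers in the usage lines below are placeholders:

```python
# Minimal sketch of the AIC, BIC (Schwarz) and Hannan-Quinn criteria.
import numpy as np

def information_criteria(loglike: float, k: int, n: int) -> dict:
    return {
        "AIC": 2 * k - 2 * loglike,
        "BIC": k * np.log(n) - 2 * loglike,              # Schwarz's criterion
        "HQC": 2 * k * np.log(np.log(n)) - 2 * loglike,  # Hannan-Quinn
    }

# e.g. compare two fitted calibration models (values are placeholders):
print(information_criteria(loglike=-152.4, k=4, n=80))
print(information_criteria(loglike=-149.8, k=5, n=80))
```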
Abstract:
The present work aims to study the influence of macroeconomic factors on credit risk in installment auto loan operations. The study is based on 4,887 credit operations recorded in the Credit Risk Information System (SCR) held by the Brazilian Central Bank. Using survival analysis applied to interval-censored data, we obtained a model that estimates the hazard function, and we propose a method for calculating the probability of default over a twelve-month period. Our results indicate a strong time dependence of the hazard function, captured by a polynomial approximation, in all estimated models. The model with the best Akaike Information Criterion estimates a positive effect of 0.07% for males over the baseline hazard function and of 0.011% for an increase of ten basis points in the operation's annual interest rate; in addition, for each R$ 1,000.00 added to the installment, the hazard function decreases by 0.28%, while the same amount added to the contracted value of the operation raises it by an estimated 0.0069%. For the macroeconomic factors, we find statistically significant effects for the unemployment rate (-0.12%), its first lag (0.12%), the first difference of the industrial production index (-0.008%), one lag of the inflation rate (-0.13%) and the exchange rate (-0.23%). We did not find statistically significant results for the other variables tested.
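A hedged sketch of the link between a fitted hazard function and a twelve-month probability of default; the polynomial hazard below is a hypothetical stand-in, not the estimated model:

```python
# Integrate the hazard over the first twelve months and convert the
# cumulative hazard to a default probability: PD = 1 - exp(-H(12)).
import numpy as np
from scipy import integrate

def hazard(t: float) -> float:
    """Hypothetical polynomial baseline hazard in months."""
    return 0.004 + 0.0015 * t - 0.00005 * t ** 2

cumulative_hazard, _ = integrate.quad(hazard, 0.0, 12.0)
pd_12m = 1.0 - np.exp(-cumulative_hazard)
print(round(pd_12m, 4))
```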
Abstract:
This work uses data made available by BM&FBovespa to analyse Petrobrás shares for the months of July and August 2010 and October 2008. First, we present a detailed discussion of the handling of these data, in which we explain why the mid-price quote cannot be used, owing to the large number of buy/sell offers with very high/low prices. We verify some of the empirical stylized facts pointed out by Cont (2001), among others established in the microstructure literature. In general, the data reproduced the stylized facts. We apply the filter proposed by Brownlees and Gallo (2006) to the Petrobrás shares and analyse how sensitive the number of possible outliers found by the filter is to variations in the filter's parameters. We propose using the Akaike criterion to rank and select conditional duration models whose duration samples have different sizes. The selected models are not always those in which the data were filtered. For the ACD(1,1) fit, when only well-fitted models (with non-autocorrelated residuals) are considered, the Akaike criterion indicates as the best model the one in which the data were not filtered.
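One simple way to make AIC comparable across duration samples of different sizes (filtered versus unfiltered data) is to normalize it by the number of observations; this is an illustrative assumption on our part, not necessarily the procedure adopted in the text:

```python
# Heavily hedged sketch: per-observation AIC for comparing conditional-duration
# (ACD) fits estimated on samples of different lengths.
def aic_per_observation(loglike: float, k_params: int, n_obs: int) -> float:
    return (2 * k_params - 2 * loglike) / n_obs

# e.g. an ACD(1,1) fit on filtered data (fewer durations) vs raw data
# (log-likelihoods and sample sizes below are placeholders):
print(aic_per_observation(loglike=-10450.0, k_params=3, n_obs=9800))
print(aic_per_observation(loglike=-10920.0, k_params=3, n_obs=10250))
```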