Biblioteca Digital

65 resultados para Yield curve data sets

Bayesian tools for zero counts in compositional data

Relevância:

100.00% 100.00%

Publicador:

Resumo:

The log-ratio methodology makes available powerful tools for analyzing compositionaldata. Nevertheless, the use of this methodology is only possible for those data setswithout null values. Consequently, in those data sets where the zeros are present, aprevious treatment becomes necessary. Last advances in the treatment of compositionalzeros have been centered especially in the zeros of structural nature and in the roundedzeros. These tools do not contemplate the particular case of count compositional datasets with null values. In this work we deal with \count zeros" and we introduce atreatment based on a mixed Bayesian-multiplicative estimation. We use the Dirichletprobability distribution as a prior and we estimate the posterior probabilities. Then weapply a multiplicative modi¯cation for the non-zero values. We present a case studywhere this new methodology is applied.Key words: count data, multiplicative replacement, composition, log-ratio analysis

Possible solution of some essential zero problems in compositional data analysis

Relevância:

100.00% 100.00%

Publicador:

Resumo:

One of the tantalising remaining problems in compositional data analysis lies in how to deal with data sets in which there are components which are essential zeros. By anessential zero we mean a component which is truly zero, not something recorded as zero simply because the experimental design or the measuring instrument has not been sufficiently sensitive to detect a trace of the part. Such essential zeros occur inmany compositional situations, such as household budget patterns, time budgets,palaeontological zonation studies, ecological abundance studies. Devices such as nonzero replacement and amalgamation are almost invariably ad hoc and unsuccessful insuch situations. From consideration of such examples it seems sensible to build up amodel in two stages, the first determining where the zeros will occur and the secondhow the unit available is distributed among the non-zero parts. In this paper we suggest two such models, an independent binomial conditional logistic normal model and a hierarchical dependent binomial conditional logistic normal model. The compositional data in such modelling consist of an incidence matrix and a conditional compositional matrix. Interesting statistical problems arise, such as the question of estimability of parameters, the nature of the computational process for the estimation of both the incidence and compositional parameters caused by the complexity of the subcompositional structure, the formation of meaningful hypotheses, and the devising of suitable testing methodology within a lattice of such essential zero-compositional hypotheses. The methodology is illustrated by application to both simulated and real compositional data

Biplots of fuzzy coded data

Relevância:

100.00% 100.00%

Publicador:

Resumo:

A biplot, which is the multivariate generalization of the two-variable scatterplot, can be used to visualize the results of many multivariate techniques, especially those that are based on the singular value decomposition. We consider data sets consisting of continuous-scale measurements, their fuzzy coding and the biplots that visualize them, using a fuzzy version of multiple correspondence analysis. Of special interest is the way quality of fit of the biplot is measured, since it is well-known that regular (i.e., crisp) multiple correspondence analysis seriously under-estimates this measure. We show how the results of fuzzy multiple correspondence analysis can be defuzzified to obtain estimated values of the original data, and prove that this implies an orthogonal decomposition of variance. This permits a measure of fit to be calculated in the familiar form of a percentage of explained variance, which is directly comparable to the corresponding fit measure used in principal component analysis of the original data. The approach is motivated initially by its application to a simulated data set, showing how the fuzzy approach can lead to diagnosing nonlinear relationships, and finally it is applied to a real set of meteorological data.

GSVA: gene set variation analysis for microarray and RNA-seq data

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Gene set enrichment (GSE) analysis is a popular framework for condensing information from gene expression proﬁles into a pathway or signature summary. The strengths of this approach over single gene analysis include noise and dimension reduction, as well as greater biological interpretability. As molecular proﬁling experiments move beyond simple case-control studies, robust and ﬂexible GSE methodologies are needed that can model pathway activity within highly heterogeneous data sets. To address this challenge, we introduce Gene Set Variation Analysis (GSVA), a GSE method that estimates variation of pathway activity over a sample population in an unsupervised manner. We demonstrate the robustness of GSVA in a comparison with current state of the art sample-wise enrichment methods. Further, we provide examples of its utility in diﬀerential pathway activity and survival analysis. Lastly, we show how GSVA works analogously with data from both microarray and RNA-seq experiments. GSVA provides increased power to detect subtle pathway activity changes over a sample population in comparison to corresponding methods. While GSE methods are generally regarded as end points of a bioinformatic analysis, GSVA constitutes a starting point to build pathway-centric models of biology. Moreover, GSVA contributes to the current need of GSE methods for RNA-seq data. GSVA is an open source software package for R which forms part of the Bioconductor project and can be downloaded at http://www.bioconductor.org.

Lattices for 3-Dimensional Fuzzy Data generated by Fuzzy Galois Connections

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Vagueness and high dimensional space data are usual features of current data. The paper is an approach to identify conceptual structures among fuzzy three dimensional data sets in order to get conceptual hierarchy. We propose a fuzzy extension of the Galois connections that allows to demonstrate an isomorphism theorem between fuzzy sets closures which is the basis for generating lattices ordered-sets

Cross-country data on the quantity of schooling: a selective survey and some quality measures

Relevância:

100.00% 100.00%

Publicador:

Resumo:

We survey a number of papers that have focused on the construction of cross-country data sets on average years of schooling. We discuss the construction of the different series, compare their profiles and construct indicators of their information content. The discussion focuses on a sample of OECD countries but we also provide some results for a large non-OECD sample.

In Search of a theory of debt management

Relevância:

100.00% 100.00%

Publicador:

Resumo:

A growing literature integrates theories of debt management into models of optimal fiscal policy. One promising theory argues that the composition of government debt should be chosen so that fluctuations in the market value of debt offset changes in expected future deficits. This complete market approach to debt management is valid even when the government only issues non-contingent bonds. A number of authors conclude from this approach that governments should issue long term debt and invest in short term assets. We argue that the conclusions of this approach are too fragile to serve as a basis for policy recommendations. This is because bonds at different maturities have highly correlated returns, causing the determination of the optimal portfolio to be ill-conditioned. To make this point concrete we examine the implications of this approach to debt management in various models, both analytically and using numerical methods calibrated to the US economy. We find the complete market approach recommends asset positions which are huge multiples of GDP. Introducing persistent shocks or capital accumulation only worsens this problem. Increasing the volatility of interest rates through habits partly reduces the size of these simulations we find no presumption that governments should issue long term debt ? policy recommendations can be easily reversed through small perturbations in the specification of shocks or small variations in the maturity of bonds issued. We further extend the literature by removing the assumption that governments every period costlessly repurchase all outstanding debt. This exacerbates the size of the required positions, worsens their volatility and in some cases produces instability in debt holdings. We conclude that it is very difficult to insulate fiscal policy from shocks by using the complete markets approach to debt management. Given the limited variability of the yield curve using maturities is a poor way to substitute for state contingent debt. The result is the positions recommended by this approach conflict with a number of features that we believe are important in making bond markets incomplete e.g allowing for transaction costs, liquidity effects, etc.. Until these features are all fully incorporated we remain in search of a theory of debt management capable of providing robust policy insights.

The Timing of labor demand

Relevância:

100.00% 100.00%

Publicador:

Resumo:

We examine the timing of firms' operations in a formal model of labor demand. Merging a variety of data sets from Portugal from 1995-2004, we describe temporal patterns of firms' demand for labor and estimate production-functions and relative labor-demand equations. The results demonstrate the existence of substitution of employment across times of the day/week and show that legislated penalties for work at irregular hours induce firms to alter their operating schedules. The results suggest a role for such penalties in an unregulated labor market, such as the United States, in which unusually large fractions of work are performed at night and on weekends.

Development of a tool for the construction of "ground truth" for complex color images with text

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Aquesta memoria resumeix el treball de final de carrera d’Enginyeria Superior d’Informàtica. Explicarà les principals raons que han motivat el projecte així com exemples que il·lustren l’aplicació resultant. En aquest cas el software intentarà resoldre la actual necessitat que hi ha de tenir dades de Ground Truth per als algoritmes de segmentació de text per imatges de color complexes. Tots els procesos seran explicats en els diferents capítols partint de la definició del problema, la planificació, els requeriments i el disseny fins a completar la il·lustració dels resultats del programa i les dades de Ground Truth resultants.

Interoperability issues between learning object repositories and metadata harvesters

Relevância:

100.00% 100.00%

Publicador:

Resumo:

In this paper we describe an open learning object repository on Statistics based on DSpace which contains true learning objects, that is, exercises, equations, data sets, etc. This repository is part of a large project intended to promote the use of learning object repositories as part of the learning process in virtual learning environments. This involves the creation of a new user interface that provides users with additional services such as resource rating, commenting and so. Both aspects make traditional metadata schemes such as Dublin Core to be inadequate, as there are resources with no title or author, for instance, as those fields are not used by learners to browse and search for learning resources in the repository. Therefore, exporting OAI-PMH compliant records using OAI-DC is not possible, thus limiting the visibility of the learning objects in the repository outside the institution. We propose an architecture based on ontologies and the use of extended metadata records for both storing and refactoring such descriptions.

Trade, firm selection, and innovation: the competition channel

Relevância:

100.00% 100.00%

Publicador:

Resumo:

The availability of rich firm-level data sets has recently led researchers to uncover new evidence on the effects of trade liberalization. First, trade openness forces the least productive firms to exit the market. Secondly, it induces surviving firms to increase their innovation efforts and thirdly, it increases the degree of product market competition. In this paper we propose a model aimed at providing a coherent interpretation of these findings. We introducing firm heterogeneity into an innovation-driven growth model, where incumbent firms operating in oligopolistic industries perform cost-reducing innovations. In this framework, trade liberalization leads to higher product market competition, lower markups and higher quantity produced. These changes in markups and quantities, in turn, promote innovation and productivity growth through a direct competition effect, based on the increase in the size of the market, and a selection effect, produced by the reallocation of resources towards more productive firms. Calibrated to match US aggregate and firm-level statistics, the model predicts that a 10 percent reduction in variable trade costs reduces markups by 1:15 percent, firm surviving probabilities by 1 percent, and induces an increase in productivity growth of about 13 percent. More than 90 percent of the trade-induced growth increase can be attributed to the selection effect.

Conditional compositional biplots: theory and application

Relevância:

100.00% 100.00%

Publicador:

Resumo:

The biplot has proved to be a powerful descriptive and analytical tool in many areasof applications of statistics. For compositional data the necessary theoreticaladaptation has been provided, with illustrative applications, by Aitchison (1990) andAitchison and Greenacre (2002). These papers were restricted to the interpretation ofsimple compositional data sets. In many situations the problem has to be described insome form of conditional modelling. For example, in a clinical trial where interest isin how patients’ steroid metabolite compositions may change as a result of differenttreatment regimes, interest is in relating the compositions after treatment to thecompositions before treatment and the nature of the treatments applied. To study thisthrough a biplot technique requires the development of some form of conditionalcompositional biplot. This is the purpose of this paper. We choose as a motivatingapplication an analysis of the 1992 US President ial Election, where interest may be inhow the three-part composition, the percentage division among the three candidates -Bush, Clinton and Perot - of the presidential vote in each state, depends on the ethniccomposition and on the urban-rural composition of the state. The methodology ofconditional compositional biplots is first developed and a detailed interpretation of the1992 US Presidential Election provided. We use a second application involving theconditional variability of tektite mineral compositions with respect to major oxidecompositions to demonstrate some hazards of simplistic interpretation of biplots.Finally we conjecture on further possible applications of conditional compositionalbiplots

Scene Classification Using a Hybrid Generative/Discriminative Approach

Relevância:

100.00% 100.00%

Publicador:

Resumo:

We investigate whether dimensionality reduction using a latent generative model is beneficial for the task of weakly supervised scene classification. In detail, we are given a set of labeled images of scenes (for example, coast, forest, city, river, etc.), and our objective is to classify a new image into one of these categories. Our approach consists of first discovering latent ";topics"; using probabilistic Latent Semantic Analysis (pLSA), a generative model from the statistical text literature here applied to a bag of visual words representation for each image, and subsequently, training a multiway classifier on the topic distribution vector for each image. We compare this approach to that of representing each image by a bag of visual words vector directly and training a multiway classifier on these vectors. To this end, we introduce a novel vocabulary using dense color SIFT descriptors and then investigate the classification performance under changes in the size of the visual vocabulary, the number of latent topics learned, and the type of discriminative classifier used (k-nearest neighbor or SVM). We achieve superior classification performance to recent publications that have used a bag of visual word representation, in all cases, using the authors' own data sets and testing protocols. We also investigate the gain in adding spatial information. We show applications to image retrieval with relevance feedback and to scene classification in videos

News from compositions, the R package

Relevância:

100.00% 100.00%

Publicador:

Resumo:

The R-package “compositions”is a tool for advanced compositional analysis. Its basicfunctionality has seen some conceptual improvement, containing now some facilitiesto work with and represent ilr bases built from balances, and an elaborated subsys-tem for dealing with several kinds of irregular data: (rounded or structural) zeroes,incomplete observations and outliers. The general approach to these irregularities isbased on subcompositions: for an irregular datum, one can distinguish a “regular” sub-composition (where all parts are actually observed and the datum behaves typically)and a “problematic” subcomposition (with those unobserved, zero or rounded parts, orelse where the datum shows an erratic or atypical behaviour). Systematic classificationschemes are proposed for both outliers and missing values (including zeros) focusing onthe nature of irregularities in the datum subcomposition(s).To compute statistics with values missing at random and structural zeros, a projectionapproach is implemented: a given datum contributes to the estimation of the desiredparameters only on the subcompositon where it was observed. For data sets withvalues below the detection limit, two different approaches are provided: the well-knownimputation technique, and also the projection approach.To compute statistics in the presence of outliers, robust statistics are adapted to thecharacteristics of compositional data, based on the minimum covariance determinantapproach. The outlier classification is based on four different models of outlier occur-rence and Monte-Carlo-based tests for their characterization. Furthermore the packageprovides special plots helping to understand the nature of outliers in the dataset.Keywords: coda-dendrogram, lost values, MAR, missing data, MCD estimator,robustness, rounded zeros

Nonparametric estimation of Value-at-Risk

Relevância:

100.00% 100.00%

Publicador:

Resumo:

A method to estimate an extreme quantile that requires no distributional assumptions is presented. The approach is based on transformed kernel estimation of the cumulative distribution function (cdf). The proposed method consists of a double transformation kernel estimation. We derive optimal bandwidth selection methods that have a direct expression for the smoothing parameter. The bandwidth can accommodate to the given quantile level. The procedure is useful for large data sets and improves quantile estimation compared to other methods in heavy tailed distributions. Implementation is straightforward and R programs are available.

«
1
2
3
4
5
»