13 results for R-Statistical computing
in Consorci de Serveis Universitaris de Catalunya (CSUC), Spain
Abstract:
This paper presents an analysis of motor vehicle insurance claims relating to vehicle damage and to associated medical expenses. We use univariate severity distributions estimated with parametric and non-parametric methods. The methods are implemented using the statistical package R. Parametric analysis is limited to estimation of normal and lognormal distributions for each of the two claim types. The non-parametric analysis presented involves kernel density estimation. We illustrate the benefits of applying transformations to data prior to employing kernel-based methods. We use a log-transformation and an optimal transformation amongst a class of transformations that produces symmetry in the data. The central aim of this paper is to provide educators with material that can be used in the classroom to teach statistical estimation methods, goodness-of-fit analysis and, importantly, statistical computing in the context of insurance and risk management. To this end, we have included in the Appendix of this paper all the R code that has been used in the analysis so that readers, both students and educators, can fully explore the techniques described.
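The log-transformation step described above can be illustrated with a small sketch. It is not the paper's R code (which is in the paper's Appendix and not reproduced here); it is written in Python, uses simulated lognormal "claims" rather than real data, and assumes a Gaussian kernel with a normal-reference rule-of-thumb bandwidth. The density of the original variable is recovered from the density of the logs by the change of variables f_X(x) = f_Y(log x) / x.

```python
import math
import random
import statistics


def gaussian_kde(sample, bandwidth):
    """Return a Gaussian kernel density estimate built from `sample`."""
    norm = 1.0 / (len(sample) * bandwidth * math.sqrt(2.0 * math.pi))

    def density(y):
        return norm * sum(
            math.exp(-0.5 * ((y - s) / bandwidth) ** 2) for s in sample
        )

    return density


def rule_of_thumb_bandwidth(sample):
    """Normal-reference bandwidth: 1.06 * sd * n^(-1/5) (an assumed choice)."""
    return 1.06 * statistics.stdev(sample) * len(sample) ** (-0.2)


def log_transformed_kde(claims):
    """Estimate the claim density by smoothing log(claims), then back-transforming."""
    logs = [math.log(c) for c in claims]
    kde_y = gaussian_kde(logs, rule_of_thumb_bandwidth(logs))

    def density(x):
        if x <= 0:
            return 0.0  # claim sizes are strictly positive
        # Change of variables: f_X(x) = f_Y(log x) / x
        return kde_y(math.log(x)) / x

    return density


# Simulated, hypothetical claim amounts (not data from the paper).
random.seed(1)
claims = [random.lognormvariate(7.0, 1.0) for _ in range(500)]
f_hat = log_transformed_kde(claims)
```

Smoothing on the log scale avoids placing probability mass on negative claim sizes and adapts the effective bandwidth to the long right tail, which is the benefit the abstract attributes to transforming before applying kernel methods.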
Abstract:
Examples of compositional data. The simplex, a suitable sample space for compositional data and Aitchison's geometry. R, a free language and environment for statistical computing and graphics
Abstract:
Statistical computing when input/output is driven by a Graphical User Interface is considered. A proposal is made for automatic control of computational flow to ensure that only strictly required computations are actually carried out. The computational flow is modeled by a directed graph for implementation in any object-oriented programming language with symbolic manipulation capabilities. A complete implementation example is presented to compute and display frequency-based piecewise linear density estimators such as histograms or frequency polygons.
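One way to picture the proposal is a dependency graph in which each node caches its value and is recomputed only when one of its inputs has changed. The sketch below is an illustration under that assumption, not the authors' implementation; the class `Node`, the helper functions and the toy data are all hypothetical. It shows that changing a display option (the colour) does not trigger a recount of the histogram bins.

```python
class Node:
    """A node in a directed computation graph with cached, lazily updated values."""

    def __init__(self, name, compute=None, parents=(), value=None):
        self.name = name
        self.compute = compute          # None for input nodes
        self.parents = list(parents)
        self.children = []
        self.value = value
        self.dirty = compute is not None
        self.evaluations = 0            # counts how often compute() actually ran
        for p in self.parents:
            p.children.append(self)

    def set(self, value):
        """Change an input node and mark everything downstream as stale."""
        self.value = value
        for c in self.children:
            c.invalidate()

    def invalidate(self):
        if not self.dirty:
            self.dirty = True
            for c in self.children:
                c.invalidate()

    def get(self):
        """Return the value, recomputing only if an upstream input changed."""
        if self.dirty:
            self.value = self.compute(*(p.get() for p in self.parents))
            self.evaluations += 1
            self.dirty = False
        return self.value


def bin_counts(data, edges):
    """Histogram counts; the last bin is closed on the right."""
    counts = [0] * (len(edges) - 1)
    for x in data:
        for i in range(len(counts)):
            last = i == len(counts) - 1
            if edges[i] <= x < edges[i + 1] or (last and x == edges[-1]):
                counts[i] += 1
                break
    return counts


def frequency_polygon(counts, edges):
    """Vertices of the frequency polygon: (bin midpoint, count)."""
    return [((edges[i] + edges[i + 1]) / 2, c) for i, c in enumerate(counts)]


data = Node("data", value=[0.5, 1.2, 1.7, 2.4, 2.9, 3.0])
edges = Node("edges", value=[0, 1, 2, 3])
colour = Node("colour", value="black")
counts = Node("counts", bin_counts, [data, edges])
poly = Node("polygon", frequency_polygon, [counts, edges])
plot = Node("plot", lambda pts, col: {"points": pts, "colour": col}, [poly, colour])

first = plot.get()       # computes counts, polygon and plot once
colour.set("red")        # a GUI event that touches only the display
second = plot.get()      # recomputes the plot node only
```

After these calls `counts.evaluations` and `poly.evaluations` are still 1 while `plot.evaluations` is 2; changing `edges` instead would invalidate all three computed nodes.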
Abstract:
Research project carried out during a stay at the University of Bonn, Germany, between August and December 2008. Recently, following the creation of the Catalan Cancer Registry (Registre de Càncer de Catalunya), a new "state of the art" report on cancer in Catalonia has been produced. It gives a complete picture of cancer incidence, mortality and survival in Catalonia, based on data from the population-based cancer registries of Girona and Tarragona for incidence, and from the Catalan Mortality Registry for cancer mortality. The project had two main objectives. The first was to develop an integrated set of functions for the automated calculation of incidence, mortality and survival, and for fitting the statistical models used to assess trends and obtain cancer projections for future years. The second was to apply these functions to the available data and obtain results for Catalonia, including projections of cancer incidence and mortality in Catalonia up to the year 2020. Both objectives have been substantially achieved. Regarding the first, an R source file containing the macros and functions used has been developed. Regarding the second, the analyses carried out have been used to prepare a monograph on cancer in Catalonia, which has now been accepted for publication. The results show that cancer incidence has increased and is expected to continue to do so, although the increase is expected to slow down for men. As for mortality, a recent decrease is observed that is expected to continue in the future, and it is larger for men than for women.
Abstract:
The main objective of the project was to develop conceptual and methodological improvements allowing better prediction of changes in species distributions (at the landscape scale) arising from environmental change in a context dominated by disturbances. In a first study, we compared the performance of different dynamic models for predicting the distribution of the ortolan bunting (Emberiza hortulana). Our results indicate that a hybrid model combining changes in habitat quality, derived from changes in the landscape, with a spatially explicit population model is a suitable approach for addressing changes in species distributions in contexts of high environmental dynamics and limited dispersal capacity of the target species. In a second study, we addressed the calibration, using monitoring data, of dynamic distribution models for 12 species with a preference for open habitats. Among the conclusions drawn we highlight: (1) the need for monitoring data to cover the areas where changes in habitat quality occur; (2) the bias that arises in the estimation of the occupancy model parameters when the landscape change hypothesis or the habitat quality model is incorrect. In the final study, we examined the possible impact on 67 bird species of different fire regimes, defined from combinations of levels of climate change (leading to an expected increase in the size and frequency of forest fires) and of the efficiency of fire suppression by firefighters. According to our model results, the combination of anthropogenic factors of the fire regime, such as rural abandonment and fire suppression, may be more decisive for distribution changes than the effects of climate change. The outputs generated include three scientific publications, a web page with project results and a library for the R statistical environment.
Abstract:
R, from http://www.r-project.org/, is 'GNU S', a language and environment for statistical computing and graphics. Many classical and modern statistical techniques have been implemented in this environment, many of them supplied as packages. There are 8 standard packages and many more are available through the CRAN family of Internet sites, http://cran.r-project.org. We started to develop a library of functions in R to support the analysis of mixtures, and our goal is a MixeR package for compositional data analysis that provides support for: operations on compositions (perturbation and power multiplication, subcomposition with or without residuals, centering of the data, computing Aitchison's, Euclidean and Bhattacharyya distances, compositional Kullback-Leibler divergence, etc.); graphical presentation of compositions in ternary diagrams and tetrahedrons with additional features (barycenter, geometric mean of the data set, percentile lines, marking and coloring of subsets of the data set and their geometric means, notation of individual data in the set, ...); dealing with zeros and missing values in compositional data sets, with R procedures for the simple and multiplicative replacement strategies; and the time series analysis of compositional data. We will present the current status of MixeR development and illustrate its use on selected data sets.
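The operations on compositions named above have standard definitions in Aitchison's geometry, which a few lines of code can make concrete. The sketch below is in Python rather than R and is not the MixeR package; the function names are hypothetical. It assumes strictly positive parts, so zeros would first need one of the replacement strategies the abstract mentions.

```python
import math


def closure(x, total=1.0):
    """Rescale positive parts so that they sum to `total`."""
    s = sum(x)
    return [total * v / s for v in x]


def perturb(x, y):
    """Perturbation: component-wise product followed by closure."""
    return closure([a * b for a, b in zip(x, y)])


def power(x, alpha):
    """Power multiplication: component-wise power followed by closure."""
    return closure([a ** alpha for a in x])


def clr(x):
    """Centred log-ratio transform: logs minus their mean."""
    logs = [math.log(v) for v in x]
    mean_log = sum(logs) / len(logs)
    return [l - mean_log for l in logs]


def aitchison_distance(x, y):
    """Aitchison distance: Euclidean distance between clr coordinates."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(clr(x), clr(y))))


# Hypothetical three-part compositions.
x = closure([10.0, 30.0, 60.0])
y = closure([20.0, 20.0, 60.0])
p = closure([1.0, 2.0, 3.0])

shifted_distance = aitchison_distance(perturb(x, p), perturb(y, p))
```

Perturbation plays the role of translation in this geometry, so `shifted_distance` equals `aitchison_distance(x, y)` up to rounding, and perturbing by the uniform composition leaves a composition unchanged.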
Abstract:
Compositional data naturally arises from the scientific analysis of the chemical composition of archaeological material such as ceramic and glass artefacts. Data of this type can be explored using a variety of techniques, from standard multivariate methods such as principal components analysis and cluster analysis, to methods based upon the use of log-ratios. The general aim is to identify groups of chemically similar artefacts that could potentially be used to answer questions of provenance. This paper will demonstrate work in progress on the development of a documented library of methods, implemented using the statistical package R, for the analysis of compositional data. R is an open source package that makes available very powerful statistical facilities at no cost. We aim to show how, with the aid of statistical software such as R, traditional exploratory multivariate analysis can easily be used alongside, or in combination with, specialist techniques of compositional data analysis. The library has been developed from a core of basic R functionality, together with purpose-written routines arising from our own research (for example that reported at CoDaWork'03). In addition, we have included other appropriate publicly available techniques and libraries that have been implemented in R by other authors. Available functions range from standard multivariate techniques through to various approaches to log-ratio analysis and zero replacement. We also discuss and demonstrate a small selection of relatively new techniques that have hitherto been little-used in archaeometric applications involving compositional data. The application of the library to the analysis of data arising in archaeometry will be demonstrated; results from different analyses will be compared; and the utility of the various methods discussed.
Abstract:
"compositions" is a new R package for the analysis of compositional and positive data. It contains four classes corresponding to the four different types of compositional and positive geometry (including the Aitchison geometry). It provides means for computation, plotting and high-level multivariate statistical analysis in all four geometries. These geometries are treated in a fully analogous way, based on the principle of working in coordinates and on the object-oriented programming paradigm of R. In this way, called functions automatically select the most appropriate type of analysis as a function of the geometry. The graphical capabilities include ternary diagrams and tetrahedrons, various compositional plots (boxplots, barplots, piecharts) and extensive graphical tools for principal components. Afterwards, portion and proportion lines, straight lines and ellipses in all geometries can be added to plots. The package is accompanied by a hands-on introduction, documentation for every function, demos of the graphical capabilities and plenty of usage examples. It allows direct and parallel computation in all four vector spaces and provides the beginner with a copy-and-paste style of data analysis, while letting advanced users keep the functionality and customizability they demand of R, as well as all necessary tools to add their own analysis routines. A complete example is included in the appendix.
Abstract:
First: A continuous-time version of Kyle's model (Kyle 1985) of asset pricing with asymmetric information, known as Back's model (Back 1992), is studied. A larger class of price processes and of noise traders' processes is considered. The price process, as in Kyle's model, is allowed to depend on the path of the market order. The noise traders' process is an inhomogeneous Lévy process. Solutions are found by means of the Hamilton-Jacobi-Bellman equations. When the insider is risk-neutral, the price pressure is constant and there is no equilibrium in the presence of jumps. If the insider is risk-averse, there is no equilibrium in the presence of either jumps or drifts. The case in which the release time is unknown is also analysed. A general relation is established between the problem of finding an equilibrium and that of enlargement of filtrations. A random announcement time is also considered. In such a case the market is not fully efficient, and an equilibrium exists if the sensitivity of prices with respect to the global demand decreases in time in accordance with the distribution of the random time. Second: Power variations. The asymptotic behavior of the power variation of processes of the form $\int_0^t u(s-)\,dS(s)$ is considered, where $S$ is an $\alpha$-stable process with index of stability $0 < \alpha < 2$ and the integral is an Itô integral. Stable convergence of the corresponding fluctuations is established. These results provide statistical tools to infer the process $u$ from discrete observations. Third: A bond market is studied where short rates $r(t)$ evolve as an integral of $g(t-s)\sigma(s)$ with respect to $W(ds)$, where $g$ and $\sigma$ are deterministic and $W$ is the stochastic Wiener measure. Processes of this type are particular cases of ambit processes. These processes are in general not semimartingales.
Abstract:
Within the European project Hydroptimet, part of the INTERREG IIIB-MEDOCC programme, a limited area model (LAM) intercomparison is performed for intense events that caused extensive damage to people and territory. As the comparison is limited to single case studies, the work is not meant to measure the skill of the different models, but to identify the key model factors that help to give a good forecast of this kind of meteorological phenomenon. This work focuses on the Spanish flash-flood event also known as the "Montserrat-2000" event. The study is performed using forecast data from seven operational LAMs, placed at the partners' disposal via the Hydroptimet ftp site, and observed data from the Catalonia rain gauge network. To improve the event analysis, satellite rainfall estimates have also been considered. For statistical evaluation of quantitative precipitation forecasts (QPFs), several non-parametric skill scores based on contingency tables have been used. Furthermore, for each model run it has been possible to identify the regions of Catalonia affected by misses and false alarms using contingency table elements. Moreover, the standard "eyeball" analysis of forecast and observed precipitation fields has been supported by a state-of-the-art diagnostic method, the contiguous rain area (CRA) analysis. This method makes it possible to quantify the spatial shift forecast error and to identify the error sources that affected each model's forecasts. High-resolution modelling and domain size seem to play a key role in providing a skillful forecast. Further work is needed to support this statement, including verification using a wider observational data set.
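Contingency-table skill scores of the kind mentioned above are computed by thresholding forecast and observed rainfall and counting hits, false alarms, misses and correct negatives. The abstract does not say which scores were used, so the sketch below shows four widely used ones (probability of detection, false alarm ratio, frequency bias and threat score) as an assumption. It is written in Python; the rainfall values and the 10 mm threshold are made up for illustration.

```python
def contingency_table(forecast, observed, threshold):
    """Count hits, false alarms, misses and correct negatives for one threshold."""
    hits = false_alarms = misses = correct_negatives = 0
    for f, o in zip(forecast, observed):
        forecast_yes = f >= threshold
        observed_yes = o >= threshold
        if forecast_yes and observed_yes:
            hits += 1
        elif forecast_yes:
            false_alarms += 1
        elif observed_yes:
            misses += 1
        else:
            correct_negatives += 1
    return hits, false_alarms, misses, correct_negatives


def ratio(numerator, denominator):
    """Safe division: a score is undefined (NaN) when its denominator is zero."""
    return numerator / denominator if denominator else float("nan")


def skill_scores(forecast, observed, threshold):
    a, b, c, _ = contingency_table(forecast, observed, threshold)
    return {
        "POD": ratio(a, a + c),        # probability of detection
        "FAR": ratio(b, a + b),        # false alarm ratio
        "BIAS": ratio(a + b, a + c),   # frequency bias
        "TS": ratio(a, a + b + c),     # threat score (critical success index)
    }


# Hypothetical 24 h rainfall totals (mm) at eight rain gauges.
forecast = [12, 0, 30, 5, 0, 22, 1, 0]
observed = [15, 0, 8, 20, 0, 25, 0, 11]
scores = skill_scores(forecast, observed, threshold=10)
```

With these toy values there are 2 hits, 1 false alarm and 2 misses, so POD is 0.5 and the frequency bias is 0.75, which indicates under-forecasting of events above the threshold. Mapping where the misses and false alarms occur, as the study does, only requires keeping the station index alongside each count.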
Abstract:
Interdependence is the main feature of dyadic relationships and, in recent years, various statistical procedures have been proposed for quantifying and testing this social attribute in different dyadic designs. The purpose of this paper is to develop several functions for this kind of statistical test in an R package, known as nonindependence, for use by applied social researchers. A Graphical User Interface (GUI) is also developed to facilitate the use of the functions included in this package. Examples drawn from psychological research and simulated data are used to illustrate how the software works.
Abstract:
Background. Although peer review is widely considered to be the most credible way of selecting manuscripts and improving the quality of accepted papers in scientific journals, there is little evidence to support its use. Our aim was to estimate the effects on manuscript quality of either adding a statistical peer reviewer or suggesting the use of checklists such as CONSORT or STARD to clinical reviewers, or both. Methodology and Principal Findings. Interventions were defined as 1) the addition of a statistical reviewer to the clinical peer review process, and 2) suggesting reporting guidelines to reviewers, with "no statistical expert" and "no checklist" as controls. The two interventions were crossed in a 2×2 balanced factorial design including original research articles consecutively selected, between May 2004 and March 2005, by the Medicina Clinica (Barc) editorial committee. We randomized manuscripts to minimize differences in terms of baseline quality and type of study (intervention, longitudinal, cross-sectional, others). Sample-size calculations indicated that 100 papers provide 80% power to test a 55% standardized difference. We specified the main outcome as the increment in quality of papers as measured on the Goodman Scale. Two blinded evaluators rated the quality of manuscripts at initial submission and in the final post-peer-review version. Of the 327 manuscripts submitted to the journal, 131 were accepted for further review, and 129 were randomized. Of those, 14 were lost to follow-up; they showed no differences in initial quality from the followed-up papers. Hence, 115 were included in the main analysis, with 16 rejected for publication after peer review. Of the 115 included papers, 21 (18.3%) were interventions, 46 (40.0%) were longitudinal designs, 28 (24.3%) were cross-sectional and 20 (17.4%) were others. The 16 (13.9%) rejected papers had a significantly lower initial score on the overall Goodman scale than accepted papers (difference 15.0, 95% CI: 4.6 to 24.4).
Suggesting a guideline to the reviewers had no effect on the change in overall quality as measured by the Goodman scale (0.9, 95% CI: -0.3 to +2.1). The estimated effect of adding a statistical reviewer was 5.5 (95% CI: 4.3 to 6.7), showing a significant improvement in quality. Conclusions and Significance. This prospective randomized study shows the positive effect of adding a statistical reviewer to the field-expert peers in improving manuscript quality. We did not find a statistically significant positive effect of suggesting that reviewers use reporting guidelines.
Abstract:
Motivated by experiments on activity in neuronal cultures [J. Soriano, M. Rodríguez Martínez, T. Tlusty, and E. Moses, Proc. Natl. Acad. Sci. 105, 13758 (2008)], we investigate the percolation transition and critical exponents of spatially embedded Erdős-Rényi networks with degree correlations. In our model networks, nodes are randomly distributed in a two-dimensional spatial domain, and the connection probability depends on Euclidean link length by a power law as well as on the degrees of linked nodes. Generally, spatial constraints lead to higher percolation thresholds in the sense that more links are needed to achieve global connectivity. However, degree correlations favor or do not favor percolation depending on the connectivity rules. We employ two construction methods to introduce degree correlations. In the first one, nodes stay homogeneously distributed and are connected via a distance- and degree-dependent probability. We observe that assortativity in the resulting network leads to a decrease of the percolation threshold. In the second construction method, nodes are first spatially segregated depending on their degree and afterwards connected with a distance-dependent probability. In this segregated model, we find a threshold increase that accompanies the rising assortativity. Additionally, when the network is constructed in a disassortative way, we observe that this property has little effect on the percolation transition.
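The basic ingredient of such a model, a spatially embedded random network whose connection probability decays as a power law of Euclidean distance, can be sketched in a few lines. The code below is a simplified Python illustration and not the authors' model: it omits the degree correlations entirely, and the functional form p = min(1, c · d^(-alpha)), the unit-square domain and the parameter names `c` and `alpha` are assumptions made here. It returns the fraction of nodes in the largest connected component, the quantity whose growth signals percolation.

```python
import math
import random


def largest_component_fraction(n, c, alpha, seed=0):
    """Fraction of nodes in the largest component of a spatial random graph.

    Nodes are placed uniformly in the unit square; nodes i and j are linked
    with probability min(1, c * d_ij ** -alpha), d_ij being their distance.
    """
    rng = random.Random(seed)
    points = [(rng.random(), rng.random()) for _ in range(n)]
    parent = list(range(n))  # union-find forest over the nodes

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            # Guard against a zero distance before taking a negative power.
            d = max(math.dist(points[i], points[j]), 1e-12)
            p = min(1.0, c * d ** (-alpha))
            if rng.random() < p:
                parent[find(i)] = find(j)

    sizes = {}
    for i in range(n):
        root = find(i)
        sizes[root] = sizes.get(root, 0) + 1
    return max(sizes.values()) / n


# Sweep the prefactor c to watch the largest component grow.
sweep = [(c, largest_component_fraction(200, c, alpha=2.0)) for c in (0.0, 1e-4, 1e-3, 1e-2)]
```

Locating a percolation threshold would require averaging such sweeps over many seeds and system sizes; a single run like this one only shows the qualitative transition from isolated nodes to a spanning component.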