991 results for mathematical application
Abstract:
Recent work shows that a low correlation between the instruments and the included variables leads to serious inference problems. We extend the local-to-zero analysis of models with weak instruments to models with estimated instruments and regressors and with higher-order dependence between instruments and disturbances. This makes this framework applicable to linear models with expectation variables that are estimated non-parametrically. Two examples of such models are the risk-return trade-off in finance and the impact of inflation uncertainty on real economic activity. Results show that inference based on Lagrange Multiplier (LM) tests is more robust to weak instruments than Wald-based inference. Using LM confidence intervals leads us to conclude that no statistically significant risk premium is present in returns on the S&P 500 index, excess holding yields between 6-month and 3-month Treasury bills, or in yen-dollar spot returns.
Abstract:
We study the problem of testing the error distribution in a multivariate linear regression (MLR) model. The tests are functions of appropriately standardized multivariate least squares residuals whose distribution is invariant to the unknown cross-equation error covariance matrix. Empirical multivariate skewness and kurtosis criteria are then compared to simulation-based estimates of their expected value under the hypothesized distribution. Special cases considered include testing multivariate normal, Student t, normal mixture and stable error models. In the Gaussian case, finite-sample versions of the standard multivariate skewness and kurtosis tests are derived. To do this, we exploit simple, double and multi-stage Monte Carlo test methods. For non-Gaussian distribution families involving nuisance parameters, confidence sets are derived for the nuisance parameters and the error distribution. The procedures considered are evaluated in a small simulation experiment. Finally, the tests are applied to an asset pricing model with observable risk-free rates, using monthly returns on New York Stock Exchange (NYSE) portfolios over five-year subperiods from 1926-1995.
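As a rough illustration of the simple Monte Carlo test idea mentioned in this abstract, the sketch below computes a simulated p-value for a skewness statistic under a Gaussian null. It is a univariate simplification and not the authors' procedure: the function names, the choice of statistic and the use of 99 replications are assumptions made for the example. It relies on the fact that sample skewness is invariant to location and scale, so the null distribution can be simulated from standard normal draws.

```python
import random


def sample_skewness(xs):
    """Moment-based sample skewness m3 / m2**1.5 (location and scale invariant)."""
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m3 = sum((x - mean) ** 3 for x in xs) / n
    return m3 / m2 ** 1.5


def monte_carlo_pvalue(observed, simulated):
    """Monte Carlo p-value: (number of simulated stats >= observed, plus 1) / (N + 1)."""
    exceed = sum(1 for s in simulated if s >= observed)
    return (exceed + 1) / (len(simulated) + 1)


def skewness_mc_test(data, n_rep=99, seed=0):
    """Test normality via |skewness|, simulating the null with N(0, 1) samples.

    Hypothetical helper for illustration only.
    """
    rng = random.Random(seed)
    observed = abs(sample_skewness(data))
    simulated = [
        abs(sample_skewness([rng.gauss(0.0, 1.0) for _ in data]))
        for _ in range(n_rep)
    ]
    return monte_carlo_pvalue(observed, simulated)
```

With 99 replications the p-value is a multiple of 0.01, so rejecting when it is at most 0.05 gives a test whose level is exact in finite samples whenever the statistic is pivotal under the null.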
Abstract:
From a public health risk analysis perspective, exposure estimation is of paramount importance. Among existing approaches to exposure estimation, the use of tools such as dietary questionnaires, toxicokinetic modelling or dose reconstruction, as a complement to biological monitoring, makes it possible to refine the estimates and thus to better characterize health risks. These tools and approaches were developed and applied to two substances of interest, methylmercury and selenium, because of the well-known toxic effects of methylmercury, the interaction between methylmercury and selenium that potentially reduces these toxic effects, and the existence of common sources through fish consumption. The general objective of this thesis was therefore to produce missing kinetic and comparative data for the validation and interpretation of approaches and tools for assessing exposure to methylmercury and selenium. To this end, the influence of the choice of methylmercury exposure assessment method was determined by comparing the daily intakes and health risks estimated by different approaches (direct exposure assessment through biological monitoring combined with toxicokinetic modelling, or indirect assessment by dietary questionnaire). Large differences between these two approaches were observed: daily methylmercury intakes estimated by questionnaire are on average six times higher than those estimated using biological monitoring and modelling.
These two methods lead to divergent assessments of health risk: with the indirect approach, the estimated daily doses of methylmercury exceed Health Canada guidelines for 21 of the 23 volunteers, whereas with the direct approach only 2 of the 23 volunteers are likely to exceed them. These differences could be due, among other things, to recall and social desirability biases when the questionnaires were completed. In addition, the study of the distribution of selenium in different biological matrices following non-dietary exposure (a shampoo with a high selenium content) aimed, on the one hand, to study the kinetics of selenium from this exposure source and, on the other hand, to assess the contribution of this source to the total body burden. Monitoring of biological concentrations (blood, urine, hair and nails) over an 18-month period in volunteers exposed to a non-dietary source of selenium helped clarify the mechanisms by which selenium is transferred from the absorption site to the blood (concomitance of regulated and unregulated pathways). This showed that, unlike for methylmercury, using hair as a biomarker can lead to a substantial overestimation of the actual selenium body burden when confounding factors such as the use of selenium-containing shampoo are not controlled. Finally, a comprehensive analysis of selenium biomonitoring data from 75 studies published in the literature provided a better understanding of the overall kinetics of selenium in the human body. In particular, it enabled the development of a tool relating daily intakes to biological concentrations of selenium in the different matrices by means of mathematical algorithms.
Consequently, with these kinetic data expressed as a system of logarithmic equations and their graphical representation, it is possible to estimate an individual's daily intake from various biological samples, and thus to facilitate the comparison of selenium biomonitoring studies that use different biomarkers. Taken together, these research results show that the method chosen to assess exposure has a major impact on the associated risk estimates. Moreover, the research showed that non-dietary selenium does not contribute significantly to the total body burden but is a confounding factor in estimating the actual selenium body burden. Finally, determining the equations and coefficients relating selenium concentrations across different biological matrices, using a large kinetic database, helps to better interpret biomonitoring results.
Abstract:
The study of variable stars is an important topic in modern astrophysics. Since the advent of powerful telescopes and CCDs with high resolving power, variable star data have been accumulating on the order of petabytes. This huge amount of data calls for many automated methods as well as human experts. This thesis is devoted to the analysis of astronomical time series data on variable stars and hence belongs to the interdisciplinary field of astrostatistics. For an observer on Earth, stars whose apparent brightness changes over time are called variable stars. The variation in brightness may be regular (periodic), quasi-periodic (semi-periodic) or irregular (aperiodic), and has various causes. In some cases the variation is due to internal thermonuclear processes, and such stars are generally known as intrinsic variables; in other cases it is due to external processes, such as eclipses or rotation, and such stars are known as extrinsic variables. Intrinsic variables can be further grouped into pulsating variables, eruptive variables and flare stars. Extrinsic variables are grouped into eclipsing binary stars and chromospherical stars. Pulsating variables can in turn be classified into Cepheid, RR Lyrae, RV Tauri, Delta Scuti, Mira, etc. The eruptive or cataclysmic variables are novae, supernovae, etc., which occur rarely and are not periodic phenomena. Most of the other variations are periodic in nature. Variable stars can be observed in many ways, such as photometry, spectrophotometry and spectroscopy. A sequence of photometric observations of a variable star produces time series data containing time, magnitude and error. The plot of a variable star's apparent magnitude against time is known as a light curve. If the time series data are folded on a period, the plot of apparent magnitude against phase is known as a phased light curve. The unique shape of the phased light curve is characteristic of each type of variable star.
One way to identify and classify the type of a variable star is for an expert to inspect the phased light curve visually. For the last several years, automated algorithms have been used to classify groups of variable stars with the help of computers. Research on variable stars can be divided into stages such as observation, data reduction, data analysis, modeling and classification. Modeling of variable stars helps to determine their short-term and long-term behaviour, to construct theoretical models (e.g., the Wilson-Devinney model for eclipsing binaries) and to derive stellar properties such as mass, radius, luminosity, temperature, internal and external structure, chemical composition and evolution. Classification requires the determination of basic parameters such as period, amplitude and phase, as well as some other derived parameters. Of these, period is the most important, since wrong periods can lead to sparse light curves and misleading information. Time series analysis is a method of applying mathematical and statistical tests to data in order to quantify the variation, understand the nature of time-varying phenomena, gain physical understanding of the system and predict its future behaviour. Astronomical time series usually suffer from unevenly spaced time instants, varying error conditions and the possibility of large gaps. For ground-based observations this is due to the daily cycle of daylight and to weather conditions, while observations from space may suffer from the impact of cosmic ray particles. Many large-scale astronomical surveys, such as MACHO, OGLE, EROS, ROTSE, PLANET, Hipparcos, MISAO, NSVS, ASAS, Pan-STARRS, Kepler, ESA, Gaia, LSST and CRTS, provide time series data on variable stars, even though variable star observation is not their primary purpose.
The Center for Astrostatistics at Pennsylvania State University was established to help the astronomical community with statistical tools for harvesting and analysing archival data. Most of these surveys release their data to the public for further analysis. There exist many period search algorithms for astronomical time series analysis, which can be classified into parametric methods (which assume some underlying distribution for the data) and non-parametric methods (which do not assume any statistical model, such as a Gaussian). Many of the parametric methods are based on variations of the discrete Fourier transform, such as the Generalised Lomb-Scargle periodogram (GLSP) by Zechmeister (2009) and Significant Spectrum (SigSpec) by Reegen (2007). Non-parametric methods include Phase Dispersion Minimisation (PDM) by Stellingwerf (1978) and the cubic spline method by Akerlof (1994). Even though most of these methods can be automated, none of them can fully recover the true periods. Wrong period detection can have several causes, such as power leakage to other frequencies, which is due to the finite total interval, finite sampling interval and finite amount of data. Another problem is aliasing, which is due to the influence of regular sampling. Spurious periods also appear because of long gaps, and power flow to harmonic frequencies is an inherent problem of Fourier methods. Hence obtaining the exact period of a variable star from its time series data is still a difficult problem when huge databases are subjected to automation. As Matthew Templeton of the AAVSO states, “Variable star data analysis is not always straightforward; large-scale, automated analysis design is non-trivial”. Derekas et al. (2007) and Deb et al. (2010) state, “The processing of huge amount of data in these databases is quite challenging, even when looking at seemingly small issues such as period determination and classification”.
It would benefit the variable star community if basic parameters such as period, amplitude and phase were obtained more accurately when huge time series databases are subjected to automation. In the present thesis, the theory of four popular period search methods is studied, the strengths and weaknesses of these methods are evaluated by applying them to two survey databases, and finally a modified form of the cubic spline method is introduced to confirm the exact period of a variable star. For newly discovered variable stars to be classified and entered in the “General Catalogue of Variable Stars” or other databases such as the “Variable Star Index”, the characteristics of the variability have to be quantified in terms of variable star parameters.
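To make the notions of phase folding and phase dispersion minimisation in this abstract concrete, here is a minimal sketch. It is a simplified stand-in with fixed, non-overlapping phase bins, not the thesis's modified cubic spline method nor the full Stellingwerf algorithm. The function names, the bin count, the synthetic light curve and the list of trial periods are all assumptions made for the example.

```python
import math


def fold(times, period, epoch=0.0):
    """Phase of each observation in [0, 1) when folded on the given period."""
    return [((t - epoch) / period) % 1.0 for t in times]


def pdm_theta(times, mags, period, n_bins=10):
    """Ratio of pooled within-bin variance to total variance (small = good period)."""
    n = len(mags)
    mean = sum(mags) / n
    total_var = sum((m - mean) ** 2 for m in mags) / (n - 1)
    bins = [[] for _ in range(n_bins)]
    for phase, mag in zip(fold(times, period), mags):
        bins[min(int(phase * n_bins), n_bins - 1)].append(mag)
    pooled_ss, dof = 0.0, 0
    for members in bins:
        if len(members) > 1:
            bin_mean = sum(members) / len(members)
            pooled_ss += sum((m - bin_mean) ** 2 for m in members)
            dof += len(members) - 1
    return (pooled_ss / dof) / total_var


def best_period(times, mags, trial_periods, n_bins=10):
    """Trial period with the smallest phase dispersion."""
    return min(trial_periods, key=lambda p: pdm_theta(times, mags, p, n_bins))


# Synthetic, unevenly sampled light curve with a known period of 2.5 time units.
times = [0.37 * i + 0.05 * ((7 * i) % 11) for i in range(200)]
mags = [12.0 + 0.5 * math.sin(2 * math.pi * t / 2.5) for t in times]
print(best_period(times, mags, [1.7, 2.0, 2.5, 3.1, 4.0]))
```

The trial list deliberately omits multiples of the true period. Folding on twice the period also yields a clean phased light curve, which is one form of the spurious-period problem the abstract describes.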
Abstract:
The paper will consist of three parts. In part I we shall present some background considerations which are necessary as a basis for what follows. We shall try to clarify some basic concepts and notions, and we shall collect the most important arguments (and related goals) in favour of problem solving, modelling and applications to other subjects in mathematics instruction. In the main part II we shall review the present state, recent trends, and prospective lines of development, both in empirical or theoretical research and in the practice of mathematics instruction and mathematics education, concerning problem solving, modelling, applications and relations to other subjects. In particular, we shall identify and discuss four major trends: a widened spectrum of arguments, an increased globality, an increased unification, and an extended use of computers. In the final part III we shall comment upon some important issues and problems related to our topic.
Abstract:
This paper aims at giving a concise survey of the present state-of-the-art of mathematical modelling in mathematics education and instruction. It will consist of four parts. In part 1, some basic concepts relevant to the topic will be clarified and, in particular, mathematical modelling will be defined in a broad, comprehensive sense. Part 2 will review arguments for the inclusion of modelling in mathematics teaching at schools and universities, and identify certain schools of thought within mathematics education. Part 3 will describe the role of modelling in present mathematics curricula and in everyday teaching practice. Some obstacles for mathematical modelling in the classroom will be analysed, as well as the opportunities and risks of computer usage. In part 4, selected materials and resources for teaching mathematical modelling, developed in the last few years in America, Australia and Europe, will be presented. The examples will demonstrate many promising directions of development.
Abstract:
The paper will consist of three parts. In part I we shall present some background considerations which are necessary as a basis for what follows. We shall try to clarify some basic concepts and notions, and we shall collect the most important arguments (and related goals) in favour of problem solving, modelling and applications to other subjects in mathematics instruction. In the main part II we shall review the present state, recent trends, and prospective lines of development, both in empirical or theoretical research and in the practice of mathematics instruction and mathematics education, concerning (applied) problem solving, modelling, applications and relations to other subjects. In particular, we shall identify and discuss four major trends: a widened spectrum of arguments, an increased globality, an increased unification, and an extended use of computers. In the final part III we shall comment upon some important issues and problems related to our topic.
Abstract:
In connection with the (revived) demand for considering applications in the teaching of mathematics, various schemata or lists of criteria have been developed since the end of the sixties, which set up requirements about closeness to the real world or about the type of mathematics being used, and which have made it possible to analyze the available applications in their light. After having stated the problem (in section 1), we present (in section 2) a sketch of some of the best known of these and of some earlier schemata, although we are not aiming for a complete picture. Then (in section 3) we distinguish among different dimensions in the analysis of applications. With this as a basis, we develop (in section 4) our own suggestion for categorizing types of applications and conceptions for an application-oriented mathematics instruction. Then (in section 5) we illustrate our schemata by some examples of performed evaluations. Finally (in section 6), we present some preliminary results of the analysis of teaching conceptions.
Abstract:
The Aitchison vector space structure for the simplex is generalized to a Hilbert space structure A2(P) for distributions and likelihoods on arbitrary spaces. Central notions of statistics, such as Information or Likelihood, can be identified in the algebraic structure of A2(P), together with their corresponding notions in compositional data analysis, such as Aitchison distance or centered log ratio transform. In this way very elaborate aspects of mathematical statistics can be understood easily in the light of a simple vector space structure and of compositional data analysis. For example, combinations of statistical information such as Bayesian updating, combination of likelihoods and robust M-estimation functions are simple additions/perturbations in A2(Pprior). Weighting observations corresponds to a weighted addition of the corresponding evidence. Likelihood-based statistics for general exponential families turn out to have a particularly easy interpretation in terms of A2(P). Regular exponential families form finite dimensional linear subspaces of A2(P) and they correspond to finite dimensional subspaces formed by their posterior in the dual information space A2(Pprior). The Aitchison norm can be identified with mean Fisher information. The closing constant itself is identified with a generalization of the cumulant function and shown to be Kullback-Leibler's directed information. Fisher information is the local geometry of the manifold induced by the A2(P) derivative of the Kullback-Leibler information, and the space A2(P) can therefore be seen as the tangential geometry of statistical inference at the distribution P. The discussion of A2(P)-valued random variables, such as estimation functions or likelihoods, gives a further interpretation of Fisher information as the expected squared norm of evidence and a scale-free understanding of unbiased reasoning.
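The finite-dimensional case that this abstract generalizes can be sketched in a few lines. The example below covers only the ordinary simplex, not the Hilbert space A2(P) itself, and all function names and numbers are illustrative assumptions. It shows Bayesian updating as an Aitchison perturbation, which becomes plain vector addition after the centered log ratio (clr) transform.

```python
import math


def closure(x):
    """Rescale positive parts so that they sum to 1."""
    total = sum(x)
    return [v / total for v in x]


def perturb(x, y):
    """Aitchison perturbation: componentwise product followed by closure."""
    return closure([a * b for a, b in zip(x, y)])


def clr(x):
    """Centered log ratio transform: log parts minus their mean log."""
    logs = [math.log(v) for v in x]
    mean_log = sum(logs) / len(logs)
    return [value - mean_log for value in logs]


def aitchison_distance(x, y):
    """Euclidean distance between the clr images of two compositions."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(clr(x), clr(y))))


# Discrete Bayes rule as a perturbation: posterior = prior (+) likelihood.
prior = [0.5, 0.3, 0.2]
likelihood = [0.1, 0.4, 0.4]
posterior = perturb(prior, likelihood)
print(posterior)
print(aitchison_distance(prior, posterior))
```

In clr coordinates the posterior is the sum of the prior and likelihood vectors, and the Aitchison distance is unchanged when either argument is rescaled by a positive constant.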
Abstract:
The biplot has proved to be a powerful descriptive and analytical tool in many areas of applications of statistics. For compositional data the necessary theoretical adaptation has been provided, with illustrative applications, by Aitchison (1990) and Aitchison and Greenacre (2002). These papers were restricted to the interpretation of simple compositional data sets. In many situations the problem has to be described in some form of conditional modelling. For example, in a clinical trial where interest is in how patients’ steroid metabolite compositions may change as a result of different treatment regimes, interest is in relating the compositions after treatment to the compositions before treatment and the nature of the treatments applied. To study this through a biplot technique requires the development of some form of conditional compositional biplot. This is the purpose of this paper. We choose as a motivating application an analysis of the 1992 US Presidential Election, where interest may be in how the three-part composition, the percentage division among the three candidates - Bush, Clinton and Perot - of the presidential vote in each state, depends on the ethnic composition and on the urban-rural composition of the state. The methodology of conditional compositional biplots is first developed and a detailed interpretation of the 1992 US Presidential Election provided. We use a second application involving the conditional variability of tektite mineral compositions with respect to major oxide compositions to demonstrate some hazards of simplistic interpretation of biplots. Finally we conjecture on further possible applications of conditional compositional biplots.
Abstract:
Compositional random vectors are fundamental tools in the Bayesian analysis of categorical data. Many of the issues that are discussed with reference to the statistical analysis of compositional data have a natural counterpart in the construction of a Bayesian statistical model for categorical data. This note builds on the idea of cross-fertilization of the two areas recommended by Aitchison (1986) in his seminal book on compositional data. Particular emphasis is put on the problem of what parameterization to use.
Abstract:
The statistical analysis of literary style is the part of stylometry that compares measurable characteristics in a text that are rarely controlled by the author, with those in other texts. When the goal is to settle authorship questions, these characteristics should relate to the author’s style and not to the genre, epoch or editor, and they should be such that their variation between authors is larger than the variation within comparable texts from the same author. For an overview of the literature on stylometry and some of the techniques involved, see for example Mosteller and Wallace (1964, 82), Herdan (1964), Morton (1978), Holmes (1985), Oakes (1998) or Lebart, Salem and Berry (1998). Tirant lo Blanc, a chivalry book, is the main work in Catalan literature and it was hailed as “the best book of its kind in the world” by Cervantes in Don Quixote. Considered by writers like Vargas Llosa or Damaso Alonso to be the first modern novel in Europe, it has been translated several times into Spanish, Italian and French, with modern English translations by Rosenthal (1996) and La Fontaine (1993). The main body of this book was written between 1460 and 1465, but it was not printed until 1490. There is an intense and long-lasting debate around its authorship stemming from its first edition, where the introduction states that the whole book is the work of Martorell (1413?-1468), while at the end it is stated that the last quarter of the book is by Galba (?-1490), after the death of Martorell. Some of the authors that support the theory of single authorship are Riquer (1990), Chiner (1993) and Badia (1993), while some of those supporting the double authorship are Riquer (1947), Coromines (1956) and Ferrando (1995). For an overview of this debate, see Riquer (1990). Neither of the two candidate authors left any text comparable to the one under study, and therefore discriminant analysis cannot be used to help classify chapters by author.
By using sample texts encompassing about ten percent of the book, and looking at word length and at the use of 44 conjunctions, prepositions and articles, Ginebra and Cabos (1998) detect heterogeneities that might indicate the existence of two authors. By analyzing the diversity of the vocabulary, Riba and Ginebra (2000) estimate that stylistic boundary to be near chapter 383. Following the lead of the extensive literature, this paper looks into word length, the use of the most frequent words and the use of vowels in each chapter of the book. Given that the features selected are categorical, this leads to three contingency tables of ordered rows and therefore to three sequences of multinomial observations. Section 2 explores these sequences graphically, observing a clear shift in their distribution. Section 3 describes the problem of estimating a sudden change-point in those sequences. In the following sections we propose various ways to estimate change-points in multinomial sequences: the method in Section 4 involves fitting models for polytomous data; the one in Section 5 fits gamma models onto the sequence of chi-square distances between each row profile and the average profile; the one in Section 6 fits models onto the sequence of values taken by the first component of the correspondence analysis as well as onto sequences of other summary measures like the average word length. In Section 7 we fit models onto the marginal binomial sequences to identify the features that distinguish the chapters before and after that boundary. Most methods rely heavily on the use of generalized linear models.
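The sequence of chi-square distances mentioned for Section 5 can be illustrated with a small sketch. The distance computation follows the usual correspondence analysis definition. The change-point step is a plain least-squares split, swapped in here as a simple stand-in for the gamma models the paper fits. The toy table and all function names are assumptions for the example.

```python
import math


def chi_square_distances(table):
    """Chi-square distance between each row profile and the average profile."""
    grand_total = sum(sum(row) for row in table)
    n_cols = len(table[0])
    average = [sum(row[j] for row in table) / grand_total for j in range(n_cols)]
    distances = []
    for row in table:
        row_total = sum(row)
        profile = [count / row_total for count in row]
        squared = sum((p - c) ** 2 / c for p, c in zip(profile, average))
        distances.append(math.sqrt(squared))
    return distances


def least_squares_change_point(sequence):
    """Index of the first observation after the split minimising within-segment SSE."""
    def sse(values):
        mean = sum(values) / len(values)
        return sum((v - mean) ** 2 for v in values)

    return min(
        range(1, len(sequence)),
        key=lambda k: sse(sequence[:k]) + sse(sequence[k:]),
    )


# Toy table: rows are chapters, columns are counts of three word-length classes.
chapters = [[30, 50, 20]] * 4 + [[50, 30, 20]] * 3
distances = chi_square_distances(chapters)
print(least_squares_change_point(distances))
```

A shift in the row profiles shows up as a level shift in the distance sequence, which is what makes a one-dimensional change-point model applicable to multinomial rows.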
Abstract:
Precision of released figures is not only an important quality feature of official statistics, it is also essential for a good understanding of the data. In this paper we show a case study of how precision could be conveyed if the multivariate nature of data has to be taken into account. In the official release of the Swiss earnings structure survey, the total salary is broken down into several wage components. We follow Aitchison's approach for the analysis of compositional data, which is based on logratios of components. We first present different multivariate analyses of the compositional data whereby the wage components are broken down by economic activity classes. Then we propose a number of ways to assess precision.
Abstract:
The application of discriminant function analysis (DFA) is not a new idea in the study of tephrochronology. In this paper, DFA is applied to compositional datasets of two different types of tephras from Mount Ruapehu in New Zealand and Mount Rainier in the USA. The canonical variables from the analysis are further investigated with a statistical methodology of change-point problems in order to gain a better understanding of the change in compositional pattern over time. Finally, a special case of segmented regression has been proposed to model both the time of change and the change in pattern. This model can be used to estimate the age for the unknown tephras using Bayesian statistical calibration.
Abstract:
The composition of the labour force is an important economic factor for a country. Often the changes in proportions of different groups are of interest. In this paper we study a monthly compositional time series from the Swedish Labour Force Survey from 1994 to 2005. Three models are studied: the ILR-transformed series, the ILR-transformation of the compositional differenced series of order 1, and the ILR-transformation of the compositional differenced series of order 12. For each of the three models a VAR-model is fitted based on the data 1994-2003. We predict the time series 15 steps ahead and calculate 95% prediction regions. The predictions of the three models are compared with actual values using MAD and MSE and the prediction regions are compared graphically in a ternary time series plot. We conclude that the first, and simplest, model possesses the best predictive power of the three models.
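The two preprocessing steps named in this abstract, the ILR transformation and compositional differencing, can be sketched as follows. The particular orthonormal basis, the function names and the three-part example values are assumptions. The paper's own basis and its VAR fitting are not reproduced here.

```python
import math


def ilr(x):
    """Isometric log ratio coordinates of a composition in one standard basis.

    Coordinate i compares the geometric mean of the first i parts with part i + 1.
    """
    logs = [math.log(v) for v in x]
    coords = []
    for i in range(1, len(x)):
        mean_log_first = sum(logs[:i]) / i
        coords.append(math.sqrt(i / (i + 1)) * (mean_log_first - logs[i]))
    return coords


def comp_diff(current, previous):
    """Compositional difference: componentwise ratio followed by closure."""
    ratios = [a / b for a, b in zip(current, previous)]
    total = sum(ratios)
    return [r / total for r in ratios]


# Hypothetical three-part compositions for two consecutive months.
january = [0.60, 0.30, 0.10]
february = [0.58, 0.31, 0.11]
print(ilr(february))
print(ilr(comp_diff(february, january)))
```

Because the ILR map is linear with respect to perturbation, differencing the compositions and then transforming gives the same result as differencing the ILR coordinates, so standard VAR software can be applied to the transformed series.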