972 results for Change-point
Abstract:
The application of discriminant function analysis (DFA) is not a new idea in the study of tephrochronology. In this paper, DFA is applied to compositional datasets of two different types of tephras, from Mount Ruapehu in New Zealand and Mount Rainier in the USA. The canonical variables from the analysis are further investigated with statistical change-point methodology in order to gain a better understanding of the change in compositional pattern over time. Finally, a special case of segmented regression is proposed to model both the time of change and the change in pattern. This model can be used to estimate the age of unknown tephras using Bayesian statistical calibration.
Abstract:
In many applications of lifetime data analysis, it is important to perform inference about the change-point of the hazard function. The change-point could be a maximum for unimodal hazard functions or a minimum for bathtub-shaped hazard functions, and is usually of great interest in medical or industrial applications. For lifetime distributions where this change-point of the hazard function can be calculated analytically, its maximum likelihood estimator is easily obtained from the invariance property of maximum likelihood estimators. From the asymptotic normality of the maximum likelihood estimators, confidence intervals can also be obtained. Under the exponentiated Weibull distribution for the lifetime data, the hazard function can take different forms: constant, increasing, unimodal, decreasing or bathtub-shaped. This model gives great flexibility of fit, but there is no analytic expression for the change-point of the hazard function. We therefore consider the use of Markov chain Monte Carlo methods to obtain posterior summaries for the change-point of the hazard function under the exponentiated Weibull distribution.
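As an illustration of the point that the turning point has no closed form, the following minimal sketch locates it numerically by a grid search. It assumes the common parameterisation F(t) = (1 - exp(-(t/sigma)^alpha))^theta for the exponentiated Weibull; the function names, the grid bounds and the parameter values are hypothetical choices for this example and are not taken from the paper. Applying the same search to each posterior draw of the parameters would give draws of the change-point.

```python
import math

def ew_hazard(t, alpha, theta, sigma=1.0):
    """Hazard of the exponentiated Weibull, assuming
    F(t) = (1 - exp(-(t/sigma)**alpha))**theta."""
    z = (t / sigma) ** alpha
    base = -math.expm1(-z)  # 1 - exp(-z), computed stably
    pdf = (alpha * theta / sigma) * (t / sigma) ** (alpha - 1) \
        * math.exp(-z) * base ** (theta - 1)
    survival = 1.0 - base ** theta
    return pdf / survival

def hazard_turning_point(alpha, theta, sigma=1.0,
                         lo=0.01, hi=10.0, steps=5000, kind="max"):
    """Grid search for the maximum (unimodal hazard) or minimum (bathtub
    hazard) on [lo, hi]; the grid bounds are illustrative assumptions."""
    grid = [lo + i * (hi - lo) / steps for i in range(steps + 1)]
    pick = max if kind == "max" else min
    return pick(grid, key=lambda t: ew_hazard(t, alpha, theta, sigma))

# Hypothetical parameter values giving a unimodal hazard
# (alpha < 1 and alpha * theta > 1).
t_hat = hazard_turning_point(alpha=0.5, theta=4.0)
print(t_hat, ew_hazard(t_hat, 0.5, 4.0))
```

The grid search is only meaningful when the hazard actually has an interior turning point within the chosen bounds; for monotone hazards it simply returns an endpoint of the grid.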
Abstract:
In this thesis, we consider Bayesian inference for variance change-point models with scale mixtures of normal (SMN) distributions. This class of distributions is symmetric and thick-tailed and includes as special cases the Gaussian, Student-t, contaminated normal, and slash distributions. The proposed models provide greater flexibility for analyzing practical data, which often show heavy tails and may not satisfy the normality assumption. For the Bayesian analysis, we specify prior distributions for the unknown parameters in the variance change-point models with SMN distributions. Due to the complexity of the joint posterior distribution, we propose an efficient Gibbs-type sampling algorithm with Metropolis-Hastings steps for posterior inference. Thereafter, following the idea of [1], we consider the problems of single and multiple change-point detection. The performance of the proposed procedures is illustrated and analyzed by simulation studies. A real application to closing price data from the U.S. stock market is analyzed for illustrative purposes.
Abstract:
This paper studies the change-point problem for a general parametric, univariate or multivariate family of distributions. An information-theoretic procedure based on general divergence measures is developed for testing the hypothesis of the existence of a change. To compare the exact sizes of the new test statistics using the criterion proposed in Dale (J R Stat Soc B 48–59, 1986), a simulation study is performed for the special case of exponentially distributed random variables. A complete study of the powers of the test statistics and their corresponding relative local efficiencies is also presented.
Abstract:
Peer reviewed
Abstract:
In recent papers, Wied and his coauthors have introduced change-point procedures to detect and estimate structural breaks in the correlation between time series. To prove the asymptotic distribution of the test statistic and stopping time as well as the change-point estimation rate, they use an extended functional Delta method and assume nearly constant expectations and variances of the time series. In this thesis, we allow asymptotically infinitely many structural breaks in the means and variances of the time series. For this setting, we present test statistics and stopping times which are used to determine whether or not the correlation between two time series is and stays constant, respectively. Additionally, we consider estimates for change-points in the correlations. The employed nonparametric statistics depend on the means and variances. These (nuisance) parameters are replaced by estimates in the course of this thesis. Rather than assuming a fixed form for these estimates, we use "blackbox" estimates, i.e. we derive results under assumptions that these estimates fulfill. These results are supplemented with examples. This thesis is organized in seven sections. In Section 1, we motivate the issue and present the mathematical model. In Section 2, we consider a posteriori and sequential testing procedures, and investigate convergence rates for change-point estimation, always assuming that the means and the variances of the time series are known. In the following sections, the assumptions of known means and variances are relaxed. In Section 3, we present the assumptions for the mean and variance estimates that we will use for the mean in Section 4, for the variance in Section 5, and for both parameters in Section 6. Finally, in Section 7, a simulation study illustrates the finite-sample behavior of some testing procedures and estimates.
Abstract:
In this study, the Schwarz Information Criterion (SIC) is applied in order to detect change-points in time series of surface water quality variables. The change-point analysis allowed change-points to be detected in both the mean and the variance of the series under study. Time variations in environmental data are complex and can hinder the identification of change-points when traditional models are applied to this type of problem. The assumptions of normality and absence of correlation do not hold for some time series, so a simulation study is carried out to evaluate the methodology's performance when applied to non-normal and/or time-correlated data.
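The SIC idea can be sketched for the simplest case: a single shift in the mean of independent Gaussian observations with a common variance. This is a minimal illustration under those stated assumptions, not the study's own procedure (which also covers changes in variance); the function name and the example data are hypothetical. A change is reported only when the best split has a lower SIC than the no-change model, which has one parameter fewer.

```python
import math

def _rss(seg):
    """Residual sum of squares of a segment around its own mean."""
    m = sum(seg) / len(seg)
    return sum((v - m) ** 2 for v in seg)

def sic_mean_change(x):
    """Return (k_hat, sic_no_change, sic_best_split).

    k_hat is the number of observations before the estimated change,
    or None when the no-change model has the lower SIC.
    Assumes a non-constant series so that the variances are positive.
    """
    n = len(x)
    const = n * math.log(2 * math.pi) + n
    # No change: two parameters (mean, variance).
    sic_null = const + n * math.log(_rss(x) / n) + 2 * math.log(n)
    best_k, best_sic = None, math.inf
    # Keep at least two observations on each side of the split.
    for k in range(2, n - 1):
        s2 = (_rss(x[:k]) + _rss(x[k:])) / n
        # One change: three parameters (two means, common variance).
        sic_k = const + n * math.log(s2) + 3 * math.log(n)
        if sic_k < best_sic:
            best_k, best_sic = k, sic_k
    if best_sic < sic_null:
        return best_k, sic_null, best_sic
    return None, sic_null, best_sic

# Hypothetical series with a level shift after the sixth observation.
series = [0.1, -0.2, 0.05, 0.0, -0.1, 0.15, 5.1, 4.9, 5.2, 5.0, 4.8, 5.05]
print(sic_mean_change(series))
```

Because the penalty grows with log n, small fluctuations in the segment means are not enough to trigger a detection; the reduction in the fitted variance has to outweigh the cost of the extra parameter.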
Abstract:
The extension of traditional data mining methods to time series has been effectively applied to a wide range of domains such as finance, econometrics, biology, security, and medicine. Many existing mining methods deal with the task of change-point detection, but very few provide a flexible approach. Querying specific change points with linguistic variables is particularly useful in crime analysis, where intuitive, understandable, and appropriate detection of changes can significantly improve the allocation of resources for timely and concise operations. In this paper, we propose an on-line method for detecting and querying change points in crime-related time series with the use of a meaningful representation and a fuzzy inference system. Change-point detection is based on a shape space representation, and linguistic terms describing geometric properties of the change points are used to express queries, offering the advantage of intuitiveness and flexibility. An empirical evaluation is first conducted on a crime data set to confirm the validity of the proposed method and then on a financial data set to test its general applicability. A comparison to a similar change-point detection algorithm and a sensitivity analysis are also conducted. Results show that the method is able to accurately detect change points at very low computational cost. More broadly, the detection of specific change points within time series of virtually any domain is made more intuitive and more understandable, even for experts outside data mining.
Abstract:
The statistical analysis of literary style is the part of stylometry that compares measurable characteristics in a text that are rarely controlled by the author with those in other texts. When the goal is to settle authorship questions, these characteristics should relate to the author's style and not to the genre, epoch or editor, and they should be such that their variation between authors is larger than the variation within comparable texts from the same author. For an overview of the literature on stylometry and some of the techniques involved, see for example Mosteller and Wallace (1964, 82), Herdan (1964), Morton (1978), Holmes (1985), Oakes (1998) or Lebart, Salem and Berry (1998). Tirant lo Blanc, a chivalry book, is the main work in Catalan literature and it was hailed as "the best book of its kind in the world" by Cervantes in Don Quixote. Considered by writers like Vargas Llosa or Damaso Alonso to be the first modern novel in Europe, it has been translated several times into Spanish, Italian and French, with modern English translations by Rosenthal (1996) and La Fontaine (1993). The main body of this book was written between 1460 and 1465, but it was not printed until 1490. There has been an intense and long-lasting debate around its authorship ever since its first edition, whose introduction states that the whole book is the work of Martorell (1413?-1468), while at the end it is stated that the last quarter of the book is by Galba (?-1490), written after the death of Martorell. Some of the authors that support the theory of single authorship are Riquer (1990), Chiner (1993) and Badia (1993), while some of those supporting double authorship are Riquer (1947), Coromines (1956) and Ferrando (1995). For an overview of this debate, see Riquer (1990). Neither of the two candidate authors left any text comparable to the one under study, and therefore discriminant analysis cannot be used to help classify chapters by author.
By using sample texts encompassing about ten percent of the book, and looking at word length and at the use of 44 conjunctions, prepositions and articles, Ginebra and Cabos (1998) detect heterogeneities that might indicate the existence of two authors. By analyzing the diversity of the vocabulary, Riba and Ginebra (2000) estimate the stylistic boundary to be near chapter 383. Following the lead of the extensive literature, this paper looks into word length, the use of the most frequent words and the use of vowels in each chapter of the book. Given that the features selected are categorical, this leads to three contingency tables of ordered rows and therefore to three sequences of multinomial observations. Section 2 explores these sequences graphically, observing a clear shift in their distribution. Section 3 describes the problem of estimating a sudden change-point in those sequences. In the following sections we propose various ways to estimate change-points in multinomial sequences: the method in Section 4 involves fitting models for polytomous data; the one in Section 5 fits gamma models to the sequence of chi-square distances between each row profile and the average profile; the one in Section 6 fits models to the sequence of values taken by the first component of the correspondence analysis, as well as to sequences of other summary measures such as the average word length. In Section 7 we fit models to the marginal binomial sequences to identify the features that distinguish the chapters before and after that boundary. Most methods rely heavily on the use of generalized linear models.
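To make the idea of a sudden change-point in a sequence of multinomial observations concrete, here is a minimal sketch of a plain profile-likelihood estimator: for each candidate boundary, the rows before and after it are pooled and the split with the highest multinomial log-likelihood is kept. This is a simplified stand-in, not one of the GLM-based methods the paper proposes; the function name and the toy counts are hypothetical.

```python
import math

def _loglik(counts):
    """Multinomial log-likelihood kernel at the fitted proportions
    (terms with zero counts contribute nothing)."""
    total = sum(counts)
    return sum(c * math.log(c / total) for c in counts if c > 0)

def multinomial_change_point(rows):
    """rows: ordered list of count vectors (one per chapter, say).

    Returns (k_hat, loglik), where k_hat is the number of rows before
    the estimated boundary. Assumes exactly one change and at least
    one non-empty row on each side of every candidate split.
    """
    n, m = len(rows), len(rows[0])
    best_k, best_ll = None, -math.inf
    for k in range(1, n):
        before = [sum(r[j] for r in rows[:k]) for j in range(m)]
        after = [sum(r[j] for r in rows[k:]) for j in range(m)]
        ll = _loglik(before) + _loglik(after)
        if ll > best_ll:
            best_k, best_ll = k, ll
    return best_k, best_ll

# Toy table: category proportions shift after the fourth row.
table = [[30, 10, 5]] * 4 + [[10, 30, 5]] * 3
print(multinomial_change_point(table))
```

The estimator always returns some boundary, so on real data it would need to be paired with a test or model comparison to decide whether a change is present at all.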
Abstract:
This work assessed the homogeneity of the climate series from the weather station of the Institute of Astronomy, Geophysics and Atmospheric Sciences (IAG), using various statistical techniques. The record from this target station is one of the longest in Brazil, having commenced in 1933 with observations of precipitation, followed by temperatures and other variables in 1936. Thus, it is one of the few stations in Brazil with enough data for long-term climate variability and climate change studies. There is, however, a possibility that its data may have been contaminated by artifacts over time. There was a known intervention in the observations in 1958, with the replacement of instruments, whose impact has not yet been evaluated. The station's surroundings changed over time from rural to urban, which may also have affected the homogeneity of the observations and makes the station less representative for climate studies over larger spatial scales. Homogeneity of the target station was assessed by applying both absolute (single-station) tests and tests relative to the regional climate, on an annual scale, to daily precipitation, relative humidity, and maximum (TMax), minimum (TMin), and wet bulb temperatures. Among these quantities, only precipitation does not exhibit any inhomogeneity. A clear signal of the change of instruments in 1958 was detected in the TMax and relative humidity data, the latter certainly because of its strong dependence on temperature. This signal is not very clear in TMin, which instead presents non-climatic discontinuities around 1953 and around 1970. A significant homogeneity break is found around 1990 for TMax and wet bulb temperature. The discontinuities detected after 1958 may have been caused by urbanization, as the observed warming trend at the station is considerably greater than that of the regional climate.
Abstract:
Farmers in Africa are facing climate change and challenging rural livelihoods while maintaining agricultural systems that are not resilient. By 2050 the mean estimates of production of key staple crops in Africa such as maize, sorghum, millet, groundnut, and cassava are expected to decrease by between 8 and 22 percent (Schlenker and Lobell 2010). In Kenya, although projections of rainfall do not show dramatic decreases, the distribution of impacts is clearly negative for most crops. As increases in temperature will lead to increases in evapotranspiration, a potential increase in rainfall in Kenya may not offset the expected increases in agricultural water needs (Herrero et al. 2010). In order to respond to these present and future challenges, potential mitigation and adaptation options have been developed. However, their implementation is not evident. In addition to their benefits in either mitigating climate change or reducing vulnerability to its effects, many of these options do not have economic costs and even provide economic benefits (e.g. savings in the consumption of energy or natural resources). Nevertheless, it is demonstrated that even when there are no biophysical, technological or economic constraints, and despite their potential benefits from either the economic or environmental climate change point of view, not all farmers are willing to adopt these measures. This reflects the key role that behavioural barriers can play in the uptake of mitigation and adaptation measures.
Abstract:
We demonstrate that the process of generating smooth transitions can be viewed as a natural result of the filtering operations implied in the generation of discrete-time series observations from the sampling of data from an underlying continuous-time process that has undergone structural change. In order to focus discussion, we utilize the problem of estimating the location of abrupt shifts in some simple time series models. This approach will permit us to address salient issues relating to distortions induced by the inherent aggregation associated with discrete-time sampling of continuous-time processes experiencing structural change. We also address the issue of how time-irreversible structures may be generated within the smooth transition processes. (c) 2005 Elsevier Inc. All rights reserved.