941 resultados para Change points detection


Relevância:

100.00% 100.00%

Publicador:

Resumo:

The extension of traditional data mining methods to time series has been effectively applied to a wide range of domains such as finance, econometrics, biology, security, and medicine. Many existing mining methods deal with the task of change points detection, but very few provide a flexible approach. Querying specific change points with linguistic variables is particularly useful in crime analysis, where intuitive, understandable, and appropriate detection of changes can significantly improve the allocation of resources for timely and concise operations. In this paper, we propose an on-line method for detecting and querying change points in crime-related time series with the use of a meaningful representation and a fuzzy inference system. Change points detection is based on a shape space representation, and linguistic terms describing geometric properties of the change points are used to express queries, offering the advantage of intuitiveness and flexibility. An empirical evaluation is first conducted on a crime data set to confirm the validity of the proposed method and then on a financial data set to test its general applicability. A comparison to a similar change-point detection algorithm and a sensitivity analysis are also conducted. Results show that the method is able to accurately detect change points at very low computational costs. More broadly, the detection of specific change points within time series of virtually any domain is made more intuitive and more understandable, even for experts not related to data mining.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

In this study, the Schwarz Information Criterion (SIC) is applied in order to detect change-points in the time series of surface water quality variables. The application of change-point analysis allowed detecting change-points in both the mean and the variance in series under study. Time variations in environmental data are complex and they can hinder the identification of the so-called change-points when traditional models are applied to this type of problems. The assumptions of normality and uncorrelation are not present in some time series, and so, a simulation study is carried out in order to evaluate the methodology’s performance when applied to non-normal data and/or with time correlation.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

The statistical analysis of literary style is the part of stylometry that compares measurable characteristicsin a text that are rarely controlled by the author, with those in other texts. When thegoal is to settle authorship questions, these characteristics should relate to the author’s style andnot to the genre, epoch or editor, and they should be such that their variation between authors islarger than the variation within comparable texts from the same author.For an overview of the literature on stylometry and some of the techniques involved, see for exampleMosteller and Wallace (1964, 82), Herdan (1964), Morton (1978), Holmes (1985), Oakes (1998) orLebart, Salem and Berry (1998).Tirant lo Blanc, a chivalry book, is the main work in catalan literature and it was hailed to be“the best book of its kind in the world” by Cervantes in Don Quixote. Considered by writterslike Vargas Llosa or Damaso Alonso to be the first modern novel in Europe, it has been translatedseveral times into Spanish, Italian and French, with modern English translations by Rosenthal(1996) and La Fontaine (1993). The main body of this book was written between 1460 and 1465,but it was not printed until 1490.There is an intense and long lasting debate around its authorship sprouting from its first edition,where its introduction states that the whole book is the work of Martorell (1413?-1468), while atthe end it is stated that the last one fourth of the book is by Galba (?-1490), after the death ofMartorell. Some of the authors that support the theory of single authorship are Riquer (1990),Chiner (1993) and Badia (1993), while some of those supporting the double authorship are Riquer(1947), Coromines (1956) and Ferrando (1995). For an overview of this debate, see Riquer (1990).Neither of the two candidate authors left any text comparable to the one under study, and thereforediscriminant analysis can not be used to help classify chapters by author. By using sample textsencompassing about ten percent of the book, and looking at word length and at the use of 44conjunctions, prepositions and articles, Ginebra and Cabos (1998) detect heterogeneities that mightindicate the existence of two authors. By analyzing the diversity of the vocabulary, Riba andGinebra (2000) estimates that stylistic boundary to be near chapter 383.Following the lead of the extensive literature, this paper looks into word length, the use of the mostfrequent words and into the use of vowels in each chapter of the book. Given that the featuresselected are categorical, that leads to three contingency tables of ordered rows and therefore tothree sequences of multinomial observations.Section 2 explores these sequences graphically, observing a clear shift in their distribution. Section 3describes the problem of the estimation of a suden change-point in those sequences, in the followingsections we propose various ways to estimate change-points in multinomial sequences; the methodin section 4 involves fitting models for polytomous data, the one in Section 5 fits gamma modelsonto the sequence of Chi-square distances between each row profiles and the average profile, theone in Section 6 fits models onto the sequence of values taken by the first component of thecorrespondence analysis as well as onto sequences of other summary measures like the averageword length. In Section 7 we fit models onto the marginal binomial sequences to identify thefeatures that distinguish the chapters before and after that boundary. Most methods rely heavilyon the use of generalized linear models

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Scientific evidence on climate changes at global level has gained increasing interest in the scientific community in general. The impacts of climate change as well as anthropogenic actions may cause errors in hydro-agricultural projects existent in the watershed under study. This study aimed to identify the presence or absence of trend in total annual precipitation series of the watershed of the Mirim Lagoon, state of Rio Grande do Sul-RS / Brazil / Uruguay (Brazilian side) as well as to detect the period in which they occurred. For that, it was analyzed the precipitation data belonging to 14 weather stations. To detect the existence of monotonic trend and change points, it was used the nonparametric tests of Mann-Kendall and Mann-Whitney, the "t" test of Student for two samples of unpaired data (parametric), as well as the technique of progressive mean. The Weather Station 3152014 (Pelotas) presented changes in the trend in the series of annual precipitation in the period from 1953 to 2007. The methodologies that use subdivided series were more efficient in detecting change in trend when compared with the Mann-Kendall test, which uses the complete series (from 1921 to 2007).

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Tesis (Maestría en Ciencias con Orientación en Matemáticas) UANL, 2013.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

The statistical analysis of literary style is the part of stylometry that compares measurable characteristics in a text that are rarely controlled by the author, with those in other texts. When the goal is to settle authorship questions, these characteristics should relate to the author’s style and not to the genre, epoch or editor, and they should be such that their variation between authors is larger than the variation within comparable texts from the same author. For an overview of the literature on stylometry and some of the techniques involved, see for example Mosteller and Wallace (1964, 82), Herdan (1964), Morton (1978), Holmes (1985), Oakes (1998) or Lebart, Salem and Berry (1998). Tirant lo Blanc, a chivalry book, is the main work in catalan literature and it was hailed to be “the best book of its kind in the world” by Cervantes in Don Quixote. Considered by writters like Vargas Llosa or Damaso Alonso to be the first modern novel in Europe, it has been translated several times into Spanish, Italian and French, with modern English translations by Rosenthal (1996) and La Fontaine (1993). The main body of this book was written between 1460 and 1465, but it was not printed until 1490. There is an intense and long lasting debate around its authorship sprouting from its first edition, where its introduction states that the whole book is the work of Martorell (1413?-1468), while at the end it is stated that the last one fourth of the book is by Galba (?-1490), after the death of Martorell. Some of the authors that support the theory of single authorship are Riquer (1990), Chiner (1993) and Badia (1993), while some of those supporting the double authorship are Riquer (1947), Coromines (1956) and Ferrando (1995). For an overview of this debate, see Riquer (1990). Neither of the two candidate authors left any text comparable to the one under study, and therefore discriminant analysis can not be used to help classify chapters by author. By using sample texts encompassing about ten percent of the book, and looking at word length and at the use of 44 conjunctions, prepositions and articles, Ginebra and Cabos (1998) detect heterogeneities that might indicate the existence of two authors. By analyzing the diversity of the vocabulary, Riba and Ginebra (2000) estimates that stylistic boundary to be near chapter 383. Following the lead of the extensive literature, this paper looks into word length, the use of the most frequent words and into the use of vowels in each chapter of the book. Given that the features selected are categorical, that leads to three contingency tables of ordered rows and therefore to three sequences of multinomial observations. Section 2 explores these sequences graphically, observing a clear shift in their distribution. Section 3 describes the problem of the estimation of a suden change-point in those sequences, in the following sections we propose various ways to estimate change-points in multinomial sequences; the method in section 4 involves fitting models for polytomous data, the one in Section 5 fits gamma models onto the sequence of Chi-square distances between each row profiles and the average profile, the one in Section 6 fits models onto the sequence of values taken by the first component of the correspondence analysis as well as onto sequences of other summary measures like the average word length. In Section 7 we fit models onto the marginal binomial sequences to identify the features that distinguish the chapters before and after that boundary. Most methods rely heavily on the use of generalized linear models

Relevância:

100.00% 100.00%

Publicador:

Resumo:

In this paper, we consider some non-homogeneous Poisson models to estimate the probability that an air quality standard is exceeded a given number of times in a time interval of interest. We assume that the number of exceedances occurs according to a non-homogeneous Poisson process (NHPP). This Poisson process has rate function lambda(t), t >= 0, which depends on some parameters that must be estimated. We take into account two cases of rate functions: the Weibull and the Goel-Okumoto. We consider models with and without change-points. When the presence of change-points is assumed, we may have the presence of either one, two or three change-points, depending of the data set. The parameters of the rate functions are estimated using a Gibbs sampling algorithm. Results are applied to ozone data provided by the Mexico City monitoring network. In a first instance, we assume that there are no change-points present. Depending on the adjustment of the model, we assume the presence of either one, two or three change-points. Copyright (C) 2009 John Wiley & Sons, Ltd.

Relevância:

90.00% 90.00%

Publicador:

Resumo:

Prepulse inhibition and facilitation of the blink reflex are said to reflect different responses elicited by the lead stimulus, transient detection and orienting response respectively. Two experiments investigated the effects of trial repetition and lead stimulus change on blink modification. It was hypothesized that these manipulations will affect orienting and thus blink facilitation to a greater extent than they will affect transient detection and thus blink inhibition. In Experiment 1 (N = 64), subjects were trained with a sequence of 12 lead stimulus and 12 blink stimulus alone presentations, and 24 lead stimulus-blink stimulus pairings. Lead interval was 120 ms for 12 of the trials and 2000 ms for the other 12. For half the subjects this sequence was followed by a change in pitch of the lead stimulus. In Experiment 2 (N = 64), subjects were trained with a sequence of 36 blink alone stimuli and 36 lead stimulus-blink stimulus pairings. The lead interval was 120 ms for half the subjects and 2000 ms for the other half. The pitch of the lead stimulus on prestimulus trials 31-33 was changed for half the subjects in each group. In both experiments, the amount of blink inhibition decreased during training whereas the amount of blink facilitation remained unchanged. Lead stimulus change had no effect on blink modification in either experiment although it resulted in enhanced skin conductance responses and greater heart rate deceleration in Experiment 2. The present results are not consistent with the notion that blink facilitation is linked to orienting whereas blink inhibition reflects a transient detection mechanism. (C) 1998 Elsevier Science B.V.

Relevância:

90.00% 90.00%

Publicador:

Resumo:

In this thesis, we consider Bayesian inference on the detection of variance change-point models with scale mixtures of normal (for short SMN) distributions. This class of distributions is symmetric and thick-tailed and includes as special cases: Gaussian, Student-t, contaminated normal, and slash distributions. The proposed models provide greater flexibility to analyze a lot of practical data, which often show heavy-tail and may not satisfy the normal assumption. As to the Bayesian analysis, we specify some prior distributions for the unknown parameters in the variance change-point models with the SMN distributions. Due to the complexity of the joint posterior distribution, we propose an efficient Gibbs-type with Metropolis- Hastings sampling algorithm for posterior Bayesian inference. Thereafter, following the idea of [1], we consider the problems of the single and multiple change-point detections. The performance of the proposed procedures is illustrated and analyzed by simulation studies. A real application to the closing price data of U.S. stock market has been analyzed for illustrative purposes.

Relevância:

90.00% 90.00%

Publicador:

Resumo:

The electroencephalogram (EEG) is a physiological time series that measures electrical activity at different locations in the brain, and plays an important role in epilepsy research. Exploring the variance and/or volatility may yield insights for seizure prediction, seizure detection and seizure propagation/dynamics.^ Maximal Overlap Discrete Wavelet Transforms (MODWTs) and ARMA-GARCH models were used to determine variance and volatility characteristics of 66 channels for different states of an epileptic EEG – sleep, awake, sleep-to-awake and seizure. The wavelet variances, changes in wavelet variances and volatility half-lives for the four states were compared for possible differences between seizure and non-seizure channels.^ The half-lives of two of the three seizure channels were found to be shorter than all of the non-seizure channels, based on 95% CIs for the pre-seizure and awake signals. No discernible patterns were found the wavelet variances of the change points for the different signals. ^

Relevância:

90.00% 90.00%

Publicador:

Resumo:

In recent papers, Wied and his coauthors have introduced change-point procedures to detect and estimate structural breaks in the correlation between time series. To prove the asymptotic distribution of the test statistic and stopping time as well as the change-point estimation rate, they use an extended functional Delta method and assume nearly constant expectations and variances of the time series. In this thesis, we allow asymptotically infinitely many structural breaks in the means and variances of the time series. For this setting, we present test statistics and stopping times which are used to determine whether or not the correlation between two time series is and stays constant, respectively. Additionally, we consider estimates for change-points in the correlations. The employed nonparametric statistics depend on the means and variances. These (nuisance) parameters are replaced by estimates in the course of this thesis. We avoid assuming a fixed form of these estimates but rather we use "blackbox" estimates, i.e. we derive results under assumptions that these estimates fulfill. These results are supplement with examples. This thesis is organized in seven sections. In Section 1, we motivate the issue and present the mathematical model. In Section 2, we consider a posteriori and sequential testing procedures, and investigate convergence rates for change-point estimation, always assuming that the means and the variances of the time series are known. In the following sections, the assumptions of known means and variances are relaxed. In Section 3, we present the assumptions for the mean and variance estimates that we will use for the mean in Section 4, for the variance in Section 5, and for both parameters in Section 6. Finally, in Section 7, a simulation study illustrates the finite sample behaviors of some testing procedures and estimates.

Relevância:

90.00% 90.00%

Publicador:

Resumo:

Current procedures for flood risk estimation assume flood distributions are stationary over time, meaning annual maximum flood (AMF) series are not affected by climatic variation, land use/land cover (LULC) change, or management practices. Thus, changes in LULC and climate are generally not accounted for in policy and design related to flood risk/control, and historical flood events are deemed representative of future flood risk. These assumptions need to be re-evaluated, however, as climate change and anthropogenic activities have been observed to have large impacts on flood risk in many areas. In particular, understanding the effects of LULC change is essential to the study and understanding of global environmental change and the consequent hydrologic responses. The research presented herein provides possible causation for observed nonstationarity in AMF series with respect to changes in LULC, as well as a means to assess the degree to which future LULC change will impact flood risk. Four watersheds in the Midwest, Northeastern, and Central United States were studied to determine flood risk associated with historical and future projected LULC change. Historical single framed aerial images dating back to the mid-1950s were used along with Geographic Information Systems (GIS) and remote sensing models (SPRING and ERDAS) to create historical land use maps. The Forecasting Scenarios of Future Land Use Change (FORE-SCE) model was applied to generate future LULC maps annually from 2006 to 2100 for the conterminous U.S. based on the four IPCC-SRES future emission scenario conditions. These land use maps were input into previously calibrated Soil and Water Assessment Tool (SWAT) models for two case study watersheds. In order to isolate effects of LULC change, the only variable parameter was the Runoff Curve Number associated with the land use layer. All simulations were run with daily climate data from 1978-1999, consistent with the 'base' model which employed the 1992 NLCD to represent 'current' conditions. Output daily maximum flows were converted to instantaneous AMF series and were subsequently modeled using a Log-Pearson Type 3 (LP3) distribution to evaluate flood risk. Analysis of the progression of LULC change over the historic period and associated SWAT outputs revealed that AMF magnitudes tend to increase over time in response to increasing degrees of urbanization. This is consistent with positive trends in the AMF series identified in previous studies, although there are difficulties identifying correlations between LULC change and identified change points due to large time gaps in the generated historical LULC maps, mainly caused by unavailability of sufficient quality historic aerial imagery. Similarly, increases in the mean and median AMF magnitude were observed in response to future LULC change projections, with the tails of the distributions remaining reasonably constant. FORE-SCE scenario A2 was found to have the most dramatic impact on AMF series, consistent with more extreme projections of population growth, demands for growing energy sources, agricultural land, and urban expansion, while AMF outputs based on scenario B2 showed little changes for the future as the focus is on environmental conservation and regional solutions to environmental issues.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

This paper compares the forecasting performance of different models which have been proposed for forecasting in the presence of structural breaks. These models differ in their treatment of the break process, the parameters defining the model which applies in each regime and the out-of-sample probability of a break occurring. In an extensive empirical evaluation involving many important macroeconomic time series, we demonstrate the presence of structural breaks and their importance for forecasting in the vast majority of cases. However, we find no single forecasting model consistently works best in the presence of structural breaks. In many cases, the formal modeling of the break process is important in achieving good forecast performance. However, there are also many cases where simple, rolling OLS forecasts perform well.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

This paper compares the forecasting performance of different models which have been proposed for forecasting in the presence of structural breaks. These models differ in their treatment of the break process, the parameters defining the model which applies in each regime and the out-of-sample probability of a break occurring. In an extensive empirical evaluation involving many important macroeconomic time series, we demonstrate the presence of structural breaks and their importance for forecasting in the vast majority of cases. However, we find no single forecasting model consistently works best in the presence of structural breaks. In many cases, the formal modeling of the break process is important in achieving good forecast performance. However, there are also many cases where simple, rolling OLS forecasts perform well.