970 results for Change points
Abstract:
Thesis (Master of Science with an Orientation in Mathematics), UANL, 2013.
Abstract:
The statistical analysis of literary style is the part of stylometry that compares measurable characteristics in a text that are rarely controlled by the author with those in other texts. When the goal is to settle authorship questions, these characteristics should relate to the author’s style and not to the genre, epoch or editor, and they should be such that their variation between authors is larger than the variation within comparable texts by the same author. For an overview of the literature on stylometry and some of the techniques involved, see for example Mosteller and Wallace (1964, 82), Herdan (1964), Morton (1978), Holmes (1985), Oakes (1998) or Lebart, Salem and Berry (1998). Tirant lo Blanc, a chivalry book, is the main work in Catalan literature; Cervantes hailed it in Don Quixote as “the best book of its kind in the world”. Considered by writers like Vargas Llosa or Dámaso Alonso to be the first modern novel in Europe, it has been translated several times into Spanish, Italian and French, with modern English translations by Rosenthal (1996) and La Fontaine (1993). The main body of the book was written between 1460 and 1465, but it was not printed until 1490. An intense and long-lasting debate about its authorship has sprouted from its first edition, whose introduction states that the whole book is the work of Martorell (1413?-1468), while at the end it is stated that the last one fourth of the book is by Galba (?-1490), written after the death of Martorell. Some of the authors supporting single authorship are Riquer (1990), Chiner (1993) and Badia (1993), while some of those supporting double authorship are Riquer (1947), Coromines (1956) and Ferrando (1995). For an overview of this debate, see Riquer (1990). Neither of the two candidate authors left any text comparable to the one under study, so discriminant analysis cannot be used to help classify chapters by author. Using sample texts encompassing about ten percent of the book, and looking at word length and at the use of 44 conjunctions, prepositions and articles, Ginebra and Cabos (1998) detect heterogeneities that might indicate the existence of two authors. By analyzing the diversity of the vocabulary, Riba and Ginebra (2000) estimate that stylistic boundary to be near chapter 383. Following the lead of the extensive literature, this paper looks into word length, the use of the most frequent words, and the use of vowels in each chapter of the book. Because the features selected are categorical, this leads to three contingency tables with ordered rows, and therefore to three sequences of multinomial observations. Section 2 explores these sequences graphically, observing a clear shift in their distribution. Section 3 describes the problem of estimating a sudden change-point in those sequences, and the following sections propose various ways to estimate change-points in multinomial sequences. The method in Section 4 involves fitting models for polytomous data; the one in Section 5 fits gamma models to the sequence of chi-square distances between each row profile and the average profile; the one in Section 6 fits models to the sequence of values taken by the first component of the correspondence analysis, as well as to sequences of other summary measures such as the average word length. In Section 7 we fit models to the marginal binomial sequences to identify the features that distinguish the chapters before and after that boundary. Most of these methods rely heavily on the use of generalized linear models.
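As an illustration of the distance sequence used in Section 5, here is a minimal Python sketch with simulated counts rather than the Tirant lo Blanc tables: it computes the chi-square distance between each chapter's profile and the average profile and estimates a single change point by least squares, where the paper instead fits gamma models.

```python
import numpy as np

# Hypothetical chapters-by-categories contingency table: 60 rows from one
# multinomial distribution followed by 40 rows from a shifted one.
rng = np.random.default_rng(0)
counts = np.vstack([rng.multinomial(500, [0.5, 0.3, 0.2], size=60),
                    rng.multinomial(500, [0.4, 0.3, 0.3], size=40)])

profiles = counts / counts.sum(axis=1, keepdims=True)  # row profiles
avg = counts.sum(axis=0) / counts.sum()                # average profile
d = ((profiles - avg) ** 2 / avg).sum(axis=1)          # chi-square distances

def sse(x):
    """Residual sum of squares around the segment mean."""
    return ((x - x.mean()) ** 2).sum()

# Best single split: minimise total within-segment SSE (the paper fits
# gamma models to the distance sequence instead).
k = min(range(5, len(d) - 5), key=lambda k: sse(d[:k]) + sse(d[k:]))
print("estimated change point near row", k)
```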
Abstract:
In this paper, we consider some non-homogeneous Poisson models to estimate the probability that an air quality standard is exceeded a given number of times in a time interval of interest. We assume that the exceedances occur according to a non-homogeneous Poisson process (NHPP) with rate function λ(t), t ≥ 0, which depends on some parameters that must be estimated. We consider two rate functions: the Weibull and the Goel-Okumoto. We consider models with and without change-points; when change-points are assumed present, there may be one, two or three of them, depending on the data set. The parameters of the rate functions are estimated using a Gibbs sampling algorithm. Results are applied to ozone data provided by the Mexico City monitoring network. We first assume that no change-points are present; depending on the fit of the model, we then allow for one, two or three change-points. Copyright (C) 2009 John Wiley & Sons, Ltd.
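As a sketch of the quantity being estimated, the code below computes exceedance probabilities for an NHPP with a Weibull-type rate; the shape and scale values are hypothetical placeholders, not estimates from the ozone data, and no Gibbs sampling is shown.

```python
import numpy as np
from scipy.stats import poisson

beta, sigma = 1.3, 50.0   # hypothetical Weibull shape and scale

def Lambda(t):
    """Mean function of the NHPP: integral of lambda(s) from 0 to t,
    with lambda(t) = (beta/sigma) * (t/sigma)**(beta - 1)."""
    return (t / sigma) ** beta

def prob_exceedances(k, s, t):
    """P(exactly k exceedances in the interval (s, t])."""
    return poisson.pmf(k, Lambda(t) - Lambda(s))

# probability of at least 5 exceedances in the first 365 days
print(1 - poisson.cdf(4, Lambda(365.0)))
```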
Abstract:
Monitoring unused or dark IP addresses offers opportunities to extract useful information about both ongoing and new attack patterns. In recent years, different techniques have been used to analyze such traffic, including sequential analysis, where a change in traffic behavior, for example a change in mean, is used as an indication of malicious activity. Change points themselves say little about the detected change; further data processing is necessary to extract useful information and to identify the exact cause of the detected change, which is difficult given the size and nature of the observed traffic. In this paper, we address the problem of analyzing a large volume of such traffic by correlating change points identified in different traffic parameters. The significance of the proposed technique is twofold. Firstly, it automatically extracts information related to a change point by correlating change points detected across multiple traffic parameters. Secondly, it validates a detected change point through the simultaneous presence of another change point in a different parameter. Using a real network trace collected from unused IP addresses, we demonstrate that the proposed technique enables us not only to validate change points but also to extract useful information about their causes.
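A minimal sketch of the correlation step under stated assumptions: change points are given as per-parameter lists of detection times (hypothetical output of any upstream detector), and a change point is validated when another parameter also changes within a small time window.

```python
def correlate_change_points(cps_by_param, window=5):
    """Keep a change point only if another parameter has one within +/- window."""
    validated = []
    for param, cps in cps_by_param.items():
        others = [t for p, ts in cps_by_param.items() if p != param for t in ts]
        for t in cps:
            matches = [u for u in others if abs(u - t) <= window]
            if matches:  # simultaneous change in a different parameter
                validated.append((param, t, matches))
    return validated

# hypothetical detection times for three traffic parameters
cps = {"packets": [120, 340, 560], "unique_sources": [118, 410], "ports": [338]}
print(correlate_change_points(cps))
```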
Abstract:
We combine Bayesian online change point detection with Gaussian processes to create a nonparametric time series model which can handle change points. The model can be used to locate change points in an online manner and, unlike other Bayesian online change point detection algorithms, is applicable when temporal correlations within a regime are expected. We show three variations on how to apply Gaussian processes in the change point context, each with its own advantages. We present methods to reduce the computational burden of these models and demonstrate them on several real-world data sets. Copyright 2010 by the author(s)/owner(s).
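For concreteness, here is a sketch of the underlying Bayesian online change point recursion (Adams and MacKay's run-length filter) with a simple iid Gaussian predictive in place of the paper's Gaussian-process models; the hazard and prior values are illustrative.

```python
import numpy as np
from scipy.stats import norm

def bocpd(x, hazard=1 / 100, mu0=0.0, tau0=1.0, sigma=1.0):
    """Run-length posterior for a Gaussian sequence with unknown mean."""
    T = len(x)
    R = np.zeros((T + 1, T + 1))
    R[0, 0] = 1.0
    mu, tau = np.array([mu0]), np.array([tau0])  # posteriors per run length
    for t, xt in enumerate(x):
        pred = norm.pdf(xt, mu, np.sqrt(tau ** 2 + sigma ** 2))
        R[t + 1, 1:t + 2] = R[t, :t + 1] * pred * (1 - hazard)  # run grows
        R[t + 1, 0] = (R[t, :t + 1] * pred).sum() * hazard      # change point
        R[t + 1] /= R[t + 1].sum()
        # conjugate normal update of each run length's mean posterior
        tau_new = 1.0 / np.sqrt(1.0 / tau ** 2 + 1.0 / sigma ** 2)
        mu_new = tau_new ** 2 * (mu / tau ** 2 + xt / sigma ** 2)
        mu = np.concatenate(([mu0], mu_new))
        tau = np.concatenate(([tau0], tau_new))
    return R

rng = np.random.default_rng(1)
x = np.concatenate([rng.normal(0, 1, 100), rng.normal(3, 1, 100)])
print("most probable run length at the end:", bocpd(x)[-1].argmax())
```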
Abstract:
In recent papers, Wied and his coauthors have introduced change-point procedures to detect and estimate structural breaks in the correlation between time series. To derive the asymptotic distribution of the test statistic and stopping time, as well as the change-point estimation rate, they use an extended functional delta method and assume nearly constant means and variances of the time series. In this thesis, we allow asymptotically infinitely many structural breaks in the means and variances of the time series. For this setting, we present test statistics and stopping times which are used to determine whether the correlation between two time series is constant and stays constant, respectively. Additionally, we consider estimators for change-points in the correlations. The employed nonparametric statistics depend on the means and variances. These (nuisance) parameters are replaced by estimates in the course of this thesis. We avoid assuming a fixed form for these estimates; rather, we use "black box" estimates, i.e. we derive results under assumptions that these estimates fulfill. These results are supplemented with examples. The thesis is organized in seven sections. In Section 1, we motivate the problem and present the mathematical model. In Section 2, we consider a posteriori and sequential testing procedures, and investigate convergence rates for change-point estimation, always assuming that the means and variances of the time series are known. In the following sections, the assumptions of known means and variances are relaxed. In Section 3, we present the assumptions on the mean and variance estimates that we will use for the mean in Section 4, for the variance in Section 5, and for both parameters in Section 6. Finally, in Section 7, a simulation study illustrates the finite-sample behavior of some of the testing procedures and estimators.
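A minimal sketch of the kind of CUSUM statistic involved, under simplifying assumptions (known, constant means and variances, and with the variance-normalizing constant of the actual test omitted):

```python
import numpy as np

def corr_cusum(x, y, k_min=10):
    """Max over k of (k / sqrt(n)) * |corr(first k obs) - corr(all obs)|."""
    n = len(x)
    rho_n = np.corrcoef(x, y)[0, 1]
    stats = np.array([(k / np.sqrt(n))
                      * abs(np.corrcoef(x[:k], y[:k])[0, 1] - rho_n)
                      for k in range(k_min, n + 1)])
    return stats.max(), k_min + int(stats.argmax())  # statistic, break estimate

rng = np.random.default_rng(2)
e1, e2 = rng.normal(size=300), rng.normal(size=300)
x = e1
# correlation jumps from 0 to 0.8 at observation 150
y = np.where(np.arange(300) < 150, e2, 0.8 * e1 + 0.6 * e2)
print(corr_cusum(x, y))
```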
Abstract:
In this study, the Schwarz Information Criterion (SIC) is applied to detect change-points in time series of surface water quality variables. The change-point analysis allowed detecting change-points in both the mean and the variance of the series under study. Time variations in environmental data are complex, and they can hinder the identification of change-points when traditional models are applied to this type of problem. The assumptions of normality and uncorrelatedness do not hold for some of the time series, and so a simulation study is carried out to evaluate the methodology's performance when applied to non-normal and/or time-correlated data.
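A minimal sketch of SIC-based detection of a single change in mean and variance for a Gaussian sequence (in the spirit of Chen and Gupta's formulation); additive constants common to all candidate models are dropped, and the data are simulated:

```python
import numpy as np

def sic_no_change(x):
    """SIC of the no-change model (common constants dropped)."""
    n = len(x)
    return n * np.log(x.var()) + 2 * np.log(n)

def sic_change_at(x, k):
    """SIC of a model with a change in mean and variance at position k."""
    n = len(x)
    return (k * np.log(x[:k].var()) + (n - k) * np.log(x[k:].var())
            + 4 * np.log(n))

def detect(x, k_min=5):
    n = len(x)
    k_best = min(range(k_min, n - k_min), key=lambda k: sic_change_at(x, k))
    if sic_change_at(x, k_best) < sic_no_change(x):
        return k_best          # change point detected
    return None                # no change under SIC

rng = np.random.default_rng(3)
x = np.concatenate([rng.normal(0, 1, 80), rng.normal(1.5, 2, 70)])
print(detect(x))
```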
Abstract:
Current procedures for flood risk estimation assume flood distributions are stationary over time, meaning annual maximum flood (AMF) series are not affected by climatic variation, land use/land cover (LULC) change, or management practices. Thus, changes in LULC and climate are generally not accounted for in policy and design related to flood risk and flood control, and historical flood events are deemed representative of future flood risk. These assumptions need to be re-evaluated, however, as climate change and anthropogenic activities have been observed to have large impacts on flood risk in many areas. In particular, understanding the effects of LULC change is essential to the study of global environmental change and the consequent hydrologic responses. The research presented herein provides possible causes for observed nonstationarity in AMF series with respect to changes in LULC, as well as a means to assess the degree to which future LULC change will impact flood risk. Four watersheds in the Midwestern, Northeastern, and Central United States were studied to determine flood risk associated with historical and projected future LULC change. Historical single-frame aerial images dating back to the mid-1950s were used along with Geographic Information Systems (GIS) and remote sensing models (SPRING and ERDAS) to create historical land use maps. The Forecasting Scenarios of Future Land Use Change (FORE-SCE) model was applied to generate future LULC maps annually from 2006 to 2100 for the conterminous U.S. based on the four IPCC-SRES future emission scenarios. These land use maps were input into previously calibrated Soil and Water Assessment Tool (SWAT) models for two case study watersheds. To isolate the effects of LULC change, the only variable parameter was the runoff curve number associated with the land use layer. All simulations were run with daily climate data from 1978 to 1999, consistent with the 'base' model, which employed the 1992 NLCD to represent 'current' conditions. Output daily maximum flows were converted to instantaneous AMF series and subsequently modeled using a Log-Pearson Type 3 (LP3) distribution to evaluate flood risk. Analysis of the progression of LULC change over the historical period and the associated SWAT outputs revealed that AMF magnitudes tend to increase over time in response to increasing degrees of urbanization. This is consistent with positive trends in the AMF series identified in previous studies, although it is difficult to identify correlations between LULC change and identified change points due to large time gaps in the generated historical LULC maps, mainly caused by the unavailability of historic aerial imagery of sufficient quality. Similarly, increases in the mean and median AMF magnitude were observed in response to future LULC change projections, with the tails of the distributions remaining reasonably constant. FORE-SCE scenario A2 was found to have the most dramatic impact on AMF series, consistent with its more extreme projections of population growth, energy demand, agricultural land, and urban expansion, while AMF outputs based on scenario B2 showed little change for the future, as that scenario focuses on environmental conservation and regional solutions to environmental issues.
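As a sketch of the flood-frequency step, the code below fits an LP3 distribution to a hypothetical AMF series by the method of moments on log10 flows and reads off the 100-year quantile; it is not the study's calibrated workflow.

```python
import numpy as np
from scipy.stats import pearson3, skew

# hypothetical annual maximum flood series (arbitrary discharge units)
amf = np.array([410., 530., 290., 780., 615., 350., 960., 505., 445., 700.,
                830., 560., 390., 1020., 640.])
logq = np.log10(amf)
g = skew(logq, bias=False)                        # station skew of log flows
dist = pearson3(g, loc=logq.mean(), scale=logq.std(ddof=1))
q100 = 10 ** dist.ppf(1 - 1 / 100)                # 100-year flood estimate
print(f"100-year flood ~ {q100:.0f} (same units as amf)")
```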
Abstract:
Clinical trials have shown that weight reduction through lifestyle changes can delay or prevent diabetes and reduce blood pressure. An appropriate definition of obesity using anthropometric measures is useful in predicting diabetes and hypertension at the population level. However, there is debate on which measure of obesity is best or most strongly associated with diabetes and hypertension, and on what the optimal cut-off values for body mass index (BMI) and waist circumference (WC) are in this regard. The aims of the study were 1) to compare the strength of the association for undiagnosed or newly diagnosed diabetes (or hypertension) with anthropometric measures of obesity in people of Asian origin, 2) to detect ethnic differences in the association of undiagnosed diabetes with obesity, 3) to identify ethnic- and sex-specific change point values of BMI and WC for changes in the prevalence of diabetes and 4) to evaluate the ethnic-specific WC cut-off values proposed by the International Diabetes Federation (IDF) in 2005 for central obesity. The study population comprised 28 435 men and 35 198 women, ≥ 25 years of age, from 39 cohorts participating in the DECODA and DECODE studies, including 5 Asian Indian (n = 13 537), 3 Mauritian Indian (n = 4505) and Mauritian Creole (n = 1075), 8 Chinese (n = 10 801), 1 Filipino (n = 3841), 7 Japanese (n = 7934), 1 Mongolian (n = 1991), and 14 European (n = 20 979) studies. The prevalence of diabetes, hypertension and central obesity was estimated using descriptive statistics, and the differences were determined with the χ2 test. The odds ratios (ORs) or coefficients (from the logistic model) and hazard ratios (HRs, from the Cox model for interval-censored data) for BMI, WC, waist-to-hip ratio (WHR), and waist-to-stature ratio (WSR) were estimated for diabetes and hypertension. The differences between BMI and WC, WHR or WSR were compared, applying paired homogeneity tests (Wald statistics with 1 df). Hierarchical three-level Bayesian change point analysis, adjusting for age, was applied to identify the most likely cut-off/change point values for BMI and WC in association with previously undiagnosed diabetes. The ORs for diabetes in men (women) with BMI, WC, WHR and WSR were 1.52 (1.59), 1.54 (1.70), 1.53 (1.50) and 1.62 (1.70), respectively, and the corresponding ORs for hypertension were 1.68 (1.55), 1.66 (1.51), 1.45 (1.28) and 1.63 (1.50). For diabetes, the OR for BMI did not differ from that for WC or WHR but was lower than that for WSR (p = 0.001) in men, while in women the ORs were higher for WC and WSR than for BMI (both p < 0.05). Hypertension was more strongly associated with BMI than with WHR in men (p < 0.001), and more strongly with BMI than with WHR (p < 0.001), WSR (p < 0.01) and WC (p < 0.05) in women. The HRs for incidence of diabetes and hypertension did not differ between BMI and the other three central obesity measures in Mauritian Indians and Mauritian Creoles during follow-ups of 5, 6 and 11 years. The prevalence of diabetes was highest in Asian Indians, lowest in Europeans and intermediate in others, given the same BMI or WC category. The coefficients for diabetes in BMI (kg/m2) were (men/women): 0.34/0.28, 0.41/0.43, 0.42/0.61, 0.36/0.59 and 0.33/0.49 for Asian Indian, Chinese, Japanese, Mauritian Indian and European (overall homogeneity test: p > 0.05 in men and p < 0.001 in women). Similar results were obtained for WC (cm). Asian Indian women had lower coefficients than women of other ethnicities.
The change points for BMI were 29.5, 25.6, 24.0, 24.0 and 21.5 kg/m2 in men and 29.4, 25.2, 24.9, 25.3 and 22.5 kg/m2 in women of European, Chinese, Mauritian Indian, Japanese, and Asian Indian descent. The change points for WC were 100, 85, 79 and 82 cm in men and 91, 82, 82 and 76 cm in women of European, Chinese, Mauritian Indian, and Asian Indian descent. The prevalence of central obesity using the 2005 IDF definition was higher in Japanese men but lower in Japanese women than in their Asian counterparts. The prevalence of central obesity was 52 times higher in Japanese men but 0.8 times lower in Japanese women compared with the National Cholesterol Education Program definition. The findings suggest that both BMI and WC predict diabetes and hypertension equally well in all ethnic groups. At the same BMI or WC level, the prevalence of diabetes was highest in Asian Indians, lowest in Europeans and intermediate in others. Ethnic- and sex-specific change points of BMI and WC should be considered in setting diagnostic criteria for obesity to detect undiagnosed or newly diagnosed diabetes.
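A minimal sketch of a change point search of this flavor, using a profile-likelihood scan over candidate BMI thresholds in an age-adjusted logistic model rather than the paper's hierarchical three-level Bayesian analysis; the data are simulated, not DECODA/DECODE:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(5)
n = 5000
age = rng.uniform(25, 75, n)
bmi = rng.normal(26, 4, n)
true_cut = 27.0  # simulated BMI change point
logit = -6 + 0.04 * age + 1.2 * (bmi > true_cut)
y = rng.binomial(1, 1 / (1 + np.exp(-logit)))

def loglik(cut):
    """Log-likelihood of an age-adjusted logistic model with a BMI threshold."""
    X = sm.add_constant(np.column_stack([age, (bmi > cut).astype(float)]))
    return sm.Logit(y, X).fit(disp=0).llf

cuts = np.arange(22, 32, 0.25)
best = cuts[np.argmax([loglik(c) for c in cuts])]
print("estimated BMI change point:", best)
```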
Analysis of deformation behavior and workability of advanced 9Cr-Nb-V ferritic heat resistant steels
Abstract:
Hot compression tests were carried out on 9Cr–Nb–V heat resistant steels in the temperature range of 600–1200 °C and the strain rate range of 10⁻²–10⁰ s⁻¹ to study their deformation characteristics. The full recrystallization temperature and the carbon-free bainite phase transformation temperature were determined from the slope-change points in the curve of mean flow stress versus the inverse of temperature. The parameters of the constitutive equation for the experimental steels were calculated, including the stress exponent and the activation energy. A lower carbon content in the steel increases the fraction of precipitates by increasing the volume of dynamic strain-induced transformation (DSIT) ferrite during deformation. The ln(εc) versus ln(Z) and ln(σc) versus ln(Z) plots for both steels show similar trends. The power dissipation efficiency maps, merged with the instability maps, show excellent workability over the strain range 0.05–0.6. The microstructure of the experimental steels was fully recrystallized upon deformation at low Z values owing to dynamic recrystallization (DRX), and exhibited a necklace structure under the 1050 °C/0.1 s⁻¹ condition due to the suppression of the secondary flow of DRX. However, there were barely any DRX grains, only elongated pancake grains, under the 1000 °C/1 s⁻¹ condition, because of the suppression of metadynamic recrystallization (MDRX).
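For reference, a short sketch of the Zener-Hollomon parameter and the sinh-type constitutive law that such stress-exponent/activation-energy analyses are based on; all constants below are illustrative placeholders, not the fitted values for these steels:

```python
import numpy as np

R = 8.314                      # gas constant, J/(mol*K)
Q = 400e3                      # hypothetical activation energy, J/mol
A, alpha, n = 1e15, 0.01, 5.0  # hypothetical constitutive constants (alpha in 1/MPa)

def zener_hollomon(strain_rate, T_celsius):
    """Z = strain_rate * exp(Q / (R * T))."""
    T = T_celsius + 273.15
    return strain_rate * np.exp(Q / (R * T))

def flow_stress(strain_rate, T_celsius):
    """Invert Z = A * sinh(alpha * sigma)**n for the flow stress (MPa)."""
    Z = zener_hollomon(strain_rate, T_celsius)
    return np.arcsinh((Z / A) ** (1.0 / n)) / alpha

print(flow_stress(0.1, 1050))  # e.g. the 1050 degC / 0.1 s^-1 condition
```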
Abstract:
Seasonal variations in the stable isotopic composition of snow and meltwater were investigated in a sub-arctic, mountainous, but non-glacial, catchment at Okstindan in northern Norway, based on analyses of δ¹⁸O and δD. Samples were collected during four field periods (August 1998, April 1999, June 1999 and August 1999) at three sites lying on an altitudinal transect (740-970 m a.s.l.). Snowpack data display an increase in the mean values of δ¹⁸O (from −13.51 to −11.49‰ between April and August), as well as a decrease in variability through the melt period. Comparison with a regional meteoric water line indicates that the slope of the δ¹⁸O-δD line for the snowpacks decreases over the same period, dropping from 7.49 to approximately 6.2. This change points to the role of evaporation in snowpack ablation and is confirmed by the vertical profile of deuterium excess. Snowpack seepage data, although limited, also suggest reduced values of δD, as might be associated with local evaporation during meltwater generation. In general, meltwaters were depleted in δ¹⁸O relative to the source snowpack at the peak of the melt (June), but later in the year (August) the difference between the two was not statistically significant. The diurnal pattern of isotopic composition indicates that the most depleted meltwaters coincide with the peak in temperature and, hence, meltwater production.
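A small sketch of the two derived quantities used above, the slope of the local δ¹⁸O-δD line and the deuterium excess d = δD − 8·δ¹⁸O; the sample values are illustrative only:

```python
import numpy as np

d18O = np.array([-14.2, -13.6, -12.9, -12.1, -11.5])   # per mil
dD = np.array([-103.0, -99.5, -95.0, -90.0, -86.5])    # per mil

slope, intercept = np.polyfit(d18O, dD, 1)  # local snowpack water line
d_excess = dD - 8.0 * d18O                  # deuterium excess per sample
print(f"slope = {slope:.2f}", d_excess)
```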
Abstract:
Nonlinear time series analysis has in recent years proven its basic usefulness in laboratory experiments. As a rule, however, these involved selected or specially constructed nonlinear systems. Apart from the monitoring of processes and products, applications to concrete, given dynamical problems in industry have remained largely unknown. The aim of this work was to investigate, using two problems from engineering practice, whether applying the canonical scheme of nonlinear time series analysis also leads to useful results there, or whether modifications (simplifications or extensions) become necessary. Using the production of optical surfaces by high-precision turning as an example, it was shown that active disturbance compensation in real time is possible with a specially developed nonlinear prediction algorithm. Standard methods of nonlinear time series analysis take the general but very costly route via a phase-space reconstruction that is as complete as possible. The new method dispenses with many of the canonical intermediate steps. This leads to considerable savings in computing time and, in addition, to substantially higher robustness against additive measurement noise. Using the computed predictions of the undesired machine vibrations, a disturbance compensation was implemented that improved the surface quality of the machined workpiece by 20-30%. The second example concerns the classification of structure-borne sound signals measured for the monitoring of machining processes. Like many other processes in production, these signals exhibit highly nonstationary behavior. Here the standard methods of nonlinear data analysis, which use FT or AAFT surrogates, fail. A new class of surrogate data was therefore developed for testing the null hypothesis of nonstationary linear stochastic processes, capable of distinguishing between deterministic nonlinear chaotic time series and stochastic linear nonstationary time series with change points. It could thus be shown that the structure-borne sound signals under investigation can, with statistical significance, be attributed to a nonstationary stochastic sequence of simple linear processes, and that an interpretation as a nonlinear chaotic time series is not required.
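As a sketch of the stationary baseline that the thesis generalizes, the code below generates a classical FT (phase-randomized) surrogate, which preserves a series' power spectrum while destroying nonlinear deterministic structure; the nonstationary surrogate class developed in the thesis is not reproduced here.

```python
import numpy as np

def ft_surrogate(x, rng=None):
    """Phase-randomised surrogate: same power spectrum, random phases."""
    rng = np.random.default_rng() if rng is None else rng
    X = np.fft.rfft(x)
    phases = rng.uniform(0.0, 2.0 * np.pi, len(X))
    phases[0] = 0.0                      # keep the mean (DC component)
    if len(x) % 2 == 0:
        phases[-1] = 0.0                 # keep the Nyquist component real
    return np.fft.irfft(np.abs(X) * np.exp(1j * phases), n=len(x))

x = np.sin(np.linspace(0, 40, 1000)) \
    + np.random.default_rng(6).normal(0, 0.3, 1000)
surrogate = ft_surrogate(x)              # linear null: spectrum preserved
```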
Abstract:
Changepoint analysis is a well-established area of statistical research, but in the context of spatio-temporal point processes it is as yet relatively unexplored. Some substantial differences from standard changepoint analysis have to be taken into account: firstly, at every time point the datum is an irregular pattern of points; secondly, in real situations issues of spatial dependence between points and temporal dependence within time segments arise. Our motivating example consists of data concerning the monitoring and recovery of radioactive particles from Sandside beach, in the north of Scotland; there have been two major changes in the equipment used to detect the particles, representing known potential changepoints in the number of retrieved particles. In addition, offshore particle retrieval campaigns are believed to reduce the particle intensity onshore with an unknown temporal lag; in this latter case, the problem concerns multiple unknown changepoints. We therefore propose a Bayesian approach for detecting multiple changepoints in the intensity function of a spatio-temporal point process, allowing for spatial and temporal dependence within segments. We use log-Gaussian Cox processes, a very flexible class of models suitable for environmental applications, which can be implemented using the integrated nested Laplace approximation (INLA), a computationally efficient alternative to Markov chain Monte Carlo methods for approximating the posterior distribution of the parameters. Once the posterior curve is obtained, we propose a few methods for detecting significant changepoints. We present a simulation study, which consists of generating spatio-temporal point pattern series under several scenarios; the performance of the methods is assessed in terms of type I and type II errors, detected changepoint locations, and accuracy of the segment intensity estimates. We finally apply the above methods to the motivating dataset and find good and sensible results about the presence and quality of changes in the process.
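A much simplified, temporal-only sketch of the changepoint idea (a discrete posterior over a single break in the rate of daily particle counts, with conjugate Gamma priors and spatial dependence ignored, so not the LGCP/INLA model of the paper):

```python
import numpy as np
from scipy.special import gammaln

def log_marginal(counts, a=1.0, b=1.0):
    """Log marginal likelihood of iid Poisson counts with a Gamma(a, b) rate."""
    n, s = len(counts), counts.sum()
    return (a * np.log(b) - gammaln(a) + gammaln(a + s)
            - (a + s) * np.log(b + n) - gammaln(counts + 1).sum())

def changepoint_posterior(counts):
    """Posterior over the location of a single break in the Poisson rate."""
    n = len(counts)
    logp = np.array([log_marginal(counts[:k]) + log_marginal(counts[k:])
                     for k in range(1, n)])
    p = np.exp(logp - logp.max())
    return p / p.sum()

rng = np.random.default_rng(7)
counts = np.concatenate([rng.poisson(4.0, 50), rng.poisson(1.5, 50)])
post = changepoint_posterior(counts)
print("MAP change point at day", post.argmax() + 1)
```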
Abstract:
Motivation: Array CGH technologies enable the simultaneous measurement of DNA copy number for thousands of sites on a genome. We developed the circular binary segmentation (CBS) algorithm to divide the genome into regions of equal copy number (Olshen et al., 2004). The algorithm tests for change-points using a maximal t-statistic with a permutation reference distribution to obtain the corresponding p-value. The number of computations required for the maximal test statistic is O(N^2), where N is the number of markers. This makes the full permutation approach computationally prohibitive for the newer arrays that contain tens of thousands of markers, and highlights the need for a faster algorithm. Results: We present a hybrid approach to obtain the p-value of the test statistic in linear time. We also introduce a rule for stopping early when there is strong evidence for the presence of a change. We show through simulations that the hybrid approach provides a substantial gain in speed with only a negligible loss in accuracy, and that the stopping rule further increases speed. We also present the analysis of array CGH data from a breast cancer cell line to show the impact of the new approaches on the analysis of real data. Availability: An R (R Development Core Team, 2006) version of the CBS algorithm has been implemented in the "DNAcopy" package of the Bioconductor project (Gentleman et al., 2004). The proposed hybrid method for the p-value is available in version 1.2.1 or higher, and the stopping rule for declaring a change early is available in version 1.5.1 or higher.
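A minimal Python sketch of the core quantity, the maximal t-statistic over segments with a full-permutation p-value, i.e. the O(N^2)-per-permutation baseline that the paper's hybrid method accelerates; it is not the DNAcopy implementation:

```python
import numpy as np

def max_t_stat(x):
    """Maximal |t| comparing each arc segment with the rest of the sequence."""
    n, best = len(x), 0.0
    for i in range(n - 1):
        for j in range(i + 1, n):
            seg = x[i:j + 1]
            rest = np.concatenate([x[:i], x[j + 1:]])
            if len(seg) < 2 or len(rest) < 2:
                continue
            se = np.sqrt(seg.var(ddof=1) / len(seg)
                         + rest.var(ddof=1) / len(rest))
            if se > 0:
                best = max(best, abs(seg.mean() - rest.mean()) / se)
    return best

def permutation_pvalue(x, n_perm=200, seed=4):
    """Full-permutation reference distribution for the maximal t-statistic."""
    rng = np.random.default_rng(seed)
    obs = max_t_stat(x)
    null = [max_t_stat(rng.permutation(x)) for _ in range(n_perm)]
    return (1 + sum(t >= obs for t in null)) / (n_perm + 1)

rng = np.random.default_rng(8)
x = rng.normal(0, 1, 60)
x[25:35] += 1.5                     # a gained region of 10 markers
print("permutation p-value:", permutation_pvalue(x))
```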