984 results for Change-points


Relevance:

100.00% 100.00%

Publisher:

Abstract:

The statistical analysis of literary style is the part of stylometry that compares measurable characteristics in a text that are rarely controlled by the author with those in other texts. When the goal is to settle authorship questions, these characteristics should relate to the author’s style and not to the genre, epoch or editor, and their variation between authors should be larger than their variation within comparable texts by the same author. For an overview of the literature on stylometry and some of the techniques involved, see for example Mosteller and Wallace (1964, 82), Herdan (1964), Morton (1978), Holmes (1985), Oakes (1998) or Lebart, Salem and Berry (1998). Tirant lo Blanc, a chivalry book, is the main work in Catalan literature, and it was hailed as “the best book of its kind in the world” by Cervantes in Don Quixote. Considered by writers like Vargas Llosa or Damaso Alonso to be the first modern novel in Europe, it has been translated several times into Spanish, Italian and French, with modern English translations by Rosenthal (1996) and La Fontaine (1993). The main body of the book was written between 1460 and 1465, but it was not printed until 1490. There has been an intense and long-lasting debate around its authorship ever since its first edition, whose introduction states that the whole book is the work of Martorell (1413?-1468), while its ending states that the last quarter of the book is by Galba (?-1490), written after the death of Martorell. Some of the authors who support the theory of single authorship are Riquer (1990), Chiner (1993) and Badia (1993), while some of those supporting double authorship are Riquer (1947), Coromines (1956) and Ferrando (1995). For an overview of this debate, see Riquer (1990). Neither of the two candidate authors left any text comparable to the one under study, and therefore discriminant analysis cannot be used to help classify chapters by author.
By using sample texts encompassing about ten percent of the book, and looking at word length and at the use of 44 conjunctions, prepositions and articles, Ginebra and Cabos (1998) detect heterogeneities that might indicate the existence of two authors. By analyzing the diversity of the vocabulary, Riba and Ginebra (2000) estimate the stylistic boundary to be near chapter 383. Following the lead of the extensive literature, this paper looks into word length, the use of the most frequent words and the use of vowels in each chapter of the book. Given that the features selected are categorical, this leads to three contingency tables of ordered rows and therefore to three sequences of multinomial observations. Section 2 explores these sequences graphically, observing a clear shift in their distribution. Section 3 describes the problem of estimating a sudden change-point in those sequences. The following sections propose various ways to estimate change-points in multinomial sequences: the method in Section 4 involves fitting models for polytomous data; the one in Section 5 fits gamma models onto the sequence of chi-square distances between each row profile and the average profile; and the one in Section 6 fits models onto the sequence of values taken by the first component of the correspondence analysis, as well as onto sequences of other summary measures such as the average word length. In Section 7 we fit models onto the marginal binomial sequences to identify the features that distinguish the chapters before and after that boundary. Most methods rely heavily on the use of generalized linear models.
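To make the idea of a sudden change-point in a sequence of multinomial observations concrete, here is a minimal sketch that is not one of the paper's GLM-based methods. It assumes a contingency table with one row of category counts per chapter, and it picks the split that maximizes the multinomial log-likelihood when each side of the split has its own category probabilities. The function names and the toy table are hypothetical.

```python
import math

def segment_loglik(rows):
    """Multinomial log-likelihood of pooled rows at their own MLE
    (multinomial coefficients are omitted; they do not depend on the split)."""
    totals = [sum(col) for col in zip(*rows)]
    n = sum(totals)
    return sum(c * math.log(c / n) for c in totals if c > 0)

def estimate_change_point(table):
    """Return k such that rows 0..k-1 and rows k..end are best described
    by two different multinomial distributions."""
    best_k, best_ll = None, -math.inf
    for k in range(1, len(table)):
        ll = segment_loglik(table[:k]) + segment_loglik(table[k:])
        if ll > best_ll:
            best_k, best_ll = k, ll
    return best_k

# Toy table: 4 "chapters" with one profile, then 3 with another.
toy_table = [[30, 10, 10]] * 4 + [[10, 10, 30]] * 3
change_at = estimate_change_point(toy_table)
```

On real chapter counts the likelihood surface is noisier, so the estimate would normally be reported together with some measure of uncertainty.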

Relevance:

100.00% 100.00%

Publisher:

Abstract:

The extension of traditional data mining methods to time series has been effectively applied to a wide range of domains such as finance, econometrics, biology, security, and medicine. Many existing mining methods address change-point detection, but very few provide a flexible approach. Querying specific change points with linguistic variables is particularly useful in crime analysis, where intuitive, understandable, and appropriate detection of changes can significantly improve the allocation of resources for timely and concise operations. In this paper, we propose an on-line method for detecting and querying change points in crime-related time series with the use of a meaningful representation and a fuzzy inference system. Change-point detection is based on a shape space representation, and linguistic terms describing geometric properties of the change points are used to express queries, offering the advantage of intuitiveness and flexibility. An empirical evaluation is first conducted on a crime data set to confirm the validity of the proposed method and then on a financial data set to test its general applicability. A comparison to a similar change-point detection algorithm and a sensitivity analysis are also conducted. Results show that the method is able to accurately detect change points at very low computational cost. More broadly, the detection of specific change points within time series of virtually any domain is made more intuitive and more understandable, even for experts outside data mining.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

Scientific evidence of global climate change has attracted increasing interest in the scientific community. The impacts of climate change, as well as anthropogenic actions, may cause errors in existing hydro-agricultural projects in the watershed under study. This study aimed to identify the presence or absence of trends in the total annual precipitation series of the Mirim Lagoon watershed, state of Rio Grande do Sul (RS), Brazil/Uruguay (Brazilian side), and to detect the period in which they occurred. To that end, precipitation data from 14 weather stations were analyzed. To detect monotonic trends and change points, the nonparametric Mann-Kendall and Mann-Whitney tests, Student's t-test for two unpaired samples (parametric), and the progressive mean technique were used. Weather Station 3152014 (Pelotas) showed changes in trend in the annual precipitation series in the period from 1953 to 2007. The methodologies that use subdivided series were more efficient at detecting changes in trend than the Mann-Kendall test, which uses the complete series (from 1921 to 2007).
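As an illustration of the Mann-Kendall test mentioned in this abstract, the sketch below computes the S statistic, its normal approximation Z with continuity correction, and a two-sided p-value. It assumes a series without tied values (no tie correction of the variance is applied), and the function name is hypothetical; it is not the authors' code.

```python
import math
from statistics import NormalDist

def mann_kendall(series):
    """Mann-Kendall trend test without tie correction.

    Returns (S, Z, two-sided p-value)."""
    n = len(series)
    s = 0
    for i in range(n - 1):
        for j in range(i + 1, n):
            diff = series[j] - series[i]
            s += (diff > 0) - (diff < 0)  # sign of the pairwise difference
    var_s = n * (n - 1) * (2 * n + 5) / 18.0
    if s > 0:
        z = (s - 1) / math.sqrt(var_s)
    elif s < 0:
        z = (s + 1) / math.sqrt(var_s)
    else:
        z = 0.0
    p_value = 2.0 * (1.0 - NormalDist().cdf(abs(z)))
    return s, z, p_value

# A strictly increasing toy series: every pair contributes +1 to S.
s_stat, z_stat, p_val = mann_kendall(list(range(1, 11)))
```

Applying the same function to sub-periods of a series, as the abstract describes for the subdivided-series approaches, only requires slicing the input list.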

Relevance:

100.00% 100.00%

Publisher:

Abstract:

Thesis (Master of Science with Orientation in Mathematics), UANL, 2013.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

In this paper, we consider some non-homogeneous Poisson models to estimate the probability that an air quality standard is exceeded a given number of times in a time interval of interest. We assume that the number of exceedances occurs according to a non-homogeneous Poisson process (NHPP). This Poisson process has rate function lambda(t), t >= 0, which depends on some parameters that must be estimated. We take into account two cases of rate functions: the Weibull and the Goel-Okumoto. We consider models with and without change-points. When the presence of change-points is assumed, we may have either one, two or three change-points, depending on the data set. The parameters of the rate functions are estimated using a Gibbs sampling algorithm. Results are applied to ozone data provided by the Mexico City monitoring network. In a first instance, we assume that there are no change-points present. Depending on the fit of the model, we then assume the presence of either one, two or three change-points. Copyright (C) 2009 John Wiley & Sons, Ltd.
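The probability described here follows from a basic NHPP property: the number of exceedances in (t, t + s] is Poisson with mean m(t + s) - m(t), where m is the integral of lambda. The sketch below assumes common parameterizations of the two mean functions, m(t) = (t / sigma)^alpha for the Weibull case and m(t) = alpha * (1 - exp(-beta * t)) for Goel-Okumoto; the parameter names and values are illustrative assumptions, and no change-points or Gibbs sampling are included.

```python
import math

def weibull_mean(t, alpha, sigma):
    """Assumed Weibull-type mean function m(t) = (t / sigma) ** alpha."""
    return (t / sigma) ** alpha

def goel_okumoto_mean(t, alpha, beta):
    """Assumed Goel-Okumoto mean function m(t) = alpha * (1 - exp(-beta * t))."""
    return alpha * (1.0 - math.exp(-beta * t))

def prob_exceedances(k, t, s, mean_fn, **params):
    """P(exactly k exceedances in (t, t + s]) under an NHPP with mean function mean_fn."""
    mu = mean_fn(t + s, **params) - mean_fn(t, **params)
    return math.exp(-mu) * mu ** k / math.factorial(k)

# With alpha = 1 the Weibull case reduces to a homogeneous process of rate 1 / sigma.
p_none = prob_exceedances(0, 0.0, 4.0, weibull_mean, alpha=1.0, sigma=2.0)
# Probabilities over k should sum to (almost) one for any valid mean function.
total = sum(prob_exceedances(k, 0.0, 10.0, goel_okumoto_mean, alpha=5.0, beta=0.1)
            for k in range(60))
```

In the paper's setting the parameters would be replaced by posterior estimates, with a separate set of parameters on each side of a change-point.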

Relevance:

70.00% 70.00%

Publisher:

Abstract:

In recent papers, Wied and his coauthors have introduced change-point procedures to detect and estimate structural breaks in the correlation between time series. To prove the asymptotic distribution of the test statistic and stopping time as well as the change-point estimation rate, they use an extended functional Delta method and assume nearly constant expectations and variances of the time series. In this thesis, we allow asymptotically infinitely many structural breaks in the means and variances of the time series. For this setting, we present test statistics and stopping times which are used to determine whether or not the correlation between two time series is and stays constant, respectively. Additionally, we consider estimates for change-points in the correlations. The employed nonparametric statistics depend on the means and variances. These (nuisance) parameters are replaced by estimates in the course of this thesis. Rather than assuming a fixed form for these estimates, we use "blackbox" estimates, i.e. we derive results under assumptions that these estimates fulfill. These results are supplemented with examples. This thesis is organized in seven sections. In Section 1, we motivate the issue and present the mathematical model. In Section 2, we consider a posteriori and sequential testing procedures, and investigate convergence rates for change-point estimation, always assuming that the means and the variances of the time series are known. In the following sections, the assumptions of known means and variances are relaxed. In Section 3, we present the assumptions for the mean and variance estimates that we will use for the mean in Section 4, for the variance in Section 5, and for both parameters in Section 6. Finally, in Section 7, a simulation study illustrates the finite-sample behavior of some testing procedures and estimates.

Relevance:

70.00% 70.00%

Publisher:

Abstract:

In this study, the Schwarz Information Criterion (SIC) is applied to detect change-points in time series of surface water quality variables. The change-point analysis detected changes in both the mean and the variance of the series under study. Time variations in environmental data are complex and can hinder the identification of change-points when traditional models are applied to this type of problem. Some time series do not satisfy the assumptions of normality and absence of correlation, so a simulation study is carried out to evaluate the methodology's performance when it is applied to non-normal and/or time-correlated data.
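A minimal sketch of SIC-based detection of a single change in mean and variance follows. It assumes independent normal observations and uses SIC = -2 log L + p log n, with p = 2 parameters under no change and p = 4 under one change (an assumed parameter count; the change location is not counted). The function names and the toy series are hypothetical, and every segment is assumed to have non-zero variance.

```python
import math

def _log_var(values):
    """Log of the maximum-likelihood variance of a segment."""
    n = len(values)
    mean = sum(values) / n
    return math.log(sum((v - mean) ** 2 for v in values) / n)

def sic_change_point(series):
    """Return (k, detected): the split minimizing SIC(k), and whether
    that minimum is below the no-change SIC."""
    n = len(series)
    const = n * math.log(2 * math.pi) + n
    sic_no_change = const + n * _log_var(series) + 2 * math.log(n)
    best_k, best_sic = None, math.inf
    for k in range(2, n - 1):  # each segment keeps at least two points
        sic_k = (const + k * _log_var(series[:k])
                 + (n - k) * _log_var(series[k:]) + 4 * math.log(n))
        if sic_k < best_sic:
            best_k, best_sic = k, sic_k
    return best_k, best_sic < sic_no_change

# Toy series: the level jumps from about 0 to about 5 after six observations.
toy = [0.1, -0.2, 0.05, -0.1, 0.2, -0.05, 5.1, 4.9, 5.2, 4.8, 5.05, 4.95]
result = sic_change_point(toy)
```

Multiple change-points are often handled by applying the same comparison recursively to each resulting segment; that step is left out here.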

Relevance:

70.00% 70.00%

Publisher:

Abstract:

Current procedures for flood risk estimation assume flood distributions are stationary over time, meaning annual maximum flood (AMF) series are not affected by climatic variation, land use/land cover (LULC) change, or management practices. Thus, changes in LULC and climate are generally not accounted for in policy and design related to flood risk/control, and historical flood events are deemed representative of future flood risk. These assumptions need to be re-evaluated, however, as climate change and anthropogenic activities have been observed to have large impacts on flood risk in many areas. In particular, understanding the effects of LULC change is essential to the study and understanding of global environmental change and the consequent hydrologic responses. The research presented herein provides possible causation for observed nonstationarity in AMF series with respect to changes in LULC, as well as a means to assess the degree to which future LULC change will impact flood risk. Four watersheds in the Midwest, Northeastern, and Central United States were studied to determine flood risk associated with historical and future projected LULC change. Historical single framed aerial images dating back to the mid-1950s were used along with Geographic Information Systems (GIS) and remote sensing models (SPRING and ERDAS) to create historical land use maps. The Forecasting Scenarios of Future Land Use Change (FORE-SCE) model was applied to generate future LULC maps annually from 2006 to 2100 for the conterminous U.S. based on the four IPCC-SRES future emission scenario conditions. These land use maps were input into previously calibrated Soil and Water Assessment Tool (SWAT) models for two case study watersheds. In order to isolate effects of LULC change, the only variable parameter was the Runoff Curve Number associated with the land use layer. 
All simulations were run with daily climate data from 1978-1999, consistent with the 'base' model which employed the 1992 NLCD to represent 'current' conditions. Output daily maximum flows were converted to instantaneous AMF series and were subsequently modeled using a Log-Pearson Type 3 (LP3) distribution to evaluate flood risk. Analysis of the progression of LULC change over the historic period and associated SWAT outputs revealed that AMF magnitudes tend to increase over time in response to increasing degrees of urbanization. This is consistent with positive trends in the AMF series identified in previous studies, although there are difficulties identifying correlations between LULC change and identified change points due to large time gaps in the generated historical LULC maps, mainly caused by unavailability of sufficient quality historic aerial imagery. Similarly, increases in the mean and median AMF magnitude were observed in response to future LULC change projections, with the tails of the distributions remaining reasonably constant. FORE-SCE scenario A2 was found to have the most dramatic impact on AMF series, consistent with more extreme projections of population growth, demands for growing energy sources, agricultural land, and urban expansion, while AMF outputs based on scenario B2 showed little changes for the future as the focus is on environmental conservation and regional solutions to environmental issues.
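The Log-Pearson Type 3 step can be sketched as follows: fit the mean, standard deviation and skew of the base-10 logarithms of the AMF series, then compute the flow for a given return period from a frequency factor. This sketch uses the Wilson-Hilferty approximation of the frequency factor, which is an assumption made here for a dependency-free example and not necessarily what the study used; the function name and the toy flows are hypothetical.

```python
import math
from statistics import NormalDist, mean, stdev

def lp3_quantile(annual_max_flows, return_period):
    """Approximate LP3 flood quantile for a return period in years."""
    logs = [math.log10(q) for q in annual_max_flows]
    n = len(logs)
    m, s = mean(logs), stdev(logs)
    # Sample skew coefficient of the log-flows.
    g = n * sum((x - m) ** 3 for x in logs) / ((n - 1) * (n - 2) * s ** 3)
    z = NormalDist().inv_cdf(1.0 - 1.0 / return_period)
    if abs(g) < 1e-6:
        k = z  # zero skew: LP3 reduces to a log-normal distribution
    else:
        # Wilson-Hilferty approximation of the Pearson III frequency factor.
        k = (2.0 / g) * ((1.0 + g * z / 6.0 - g * g / 36.0) ** 3 - 1.0)
    return 10 ** (m + k * s)

# Toy AMF series whose log-flows are symmetric, so the skew is zero.
toy_flows = [10 ** x for x in (2.0, 2.5, 3.0, 3.5, 4.0)]
q100 = lp3_quantile(toy_flows, 100)
```

Comparing such quantiles between series simulated under different land use maps is one way to express a change in flood risk.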

Relevance:

60.00% 60.00%

Publisher:

Abstract:

This paper compares the forecasting performance of different models which have been proposed for forecasting in the presence of structural breaks. These models differ in their treatment of the break process, the parameters defining the model which applies in each regime and the out-of-sample probability of a break occurring. In an extensive empirical evaluation involving many important macroeconomic time series, we demonstrate the presence of structural breaks and their importance for forecasting in the vast majority of cases. However, we find no single forecasting model consistently works best in the presence of structural breaks. In many cases, the formal modeling of the break process is important in achieving good forecast performance. However, there are also many cases where simple, rolling OLS forecasts perform well.
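The "simple, rolling OLS forecasts" mentioned at the end of this abstract can be illustrated with a first-order autoregression re-estimated on a moving window, so that observations from before a break eventually drop out of the estimation sample. The AR(1) specification, the window length and the function names are assumptions for this sketch; the paper's actual model set is not reproduced.

```python
def ols_fit(x, y):
    """Intercept and slope of a simple least-squares regression of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope

def rolling_ar1_forecasts(series, window):
    """One-step-ahead forecasts of series[t], each fitted on the last
    `window` (lagged value, current value) pairs observed before t."""
    forecasts = {}
    for t in range(window + 1, len(series) + 1):
        lagged = series[t - window - 1 : t - 1]
        current = series[t - window : t]
        intercept, slope = ols_fit(lagged, current)
        forecasts[t] = intercept + slope * series[t - 1]
    return forecasts

# Noise-free toy series y[t] = 1 + 0.5 * y[t-1]; OLS recovers it exactly.
toy_series = [0.0]
for _ in range(7):
    toy_series.append(1.0 + 0.5 * toy_series[-1])
toy_forecasts = rolling_ar1_forecasts(toy_series, window=4)
```

The key `len(series)` in the returned dictionary holds the out-of-sample forecast for the next, not yet observed, period.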

Relevance:

60.00% 60.00%

Publisher:

Abstract:

Background: Cardiovascular diseases rank among the leading causes of death in industrialized countries. This study aims to ascertain the mortality trends for ischemic heart disease (IHD) and cerebrovascular diseases (CVD) in Andalusia over the 1975-2004 period. Method: Based on the official IHD and CVD death statistics and the related populations, the gross rates (GR), the age-adjusted rates (TS) and the Potential Years of Life Lost (PYLL) were calculated. To quantify the trends and their change points, a joinpoint regression analysis was performed. Results: The number of IHD deaths for females rose from 2,086 in 1975 to 3,336 in 2004, while the TS dropped from 74.29 to 50.94 deaths/100,000 females and the PYLL dropped from 173.65 to 90.56 years/100,000 females. The number of IHD deaths for males rose from 2,854 in 1975 to 4,085 in 2004, while the TS dropped from 147.67 to 104.96 deaths/100,000 males. The PYLL behaved similarly between the first and the last year of the series, with values of 716.46 and 460.04 years/100,000 males. For CVD in females, the absolute number of deaths dropped from 4,712 to 4,221, the TS from 166.00 to 62.08 deaths/100,000 females, and the PYLL from 338.08 to 87.63 years/100,000 females. For males, the number of deaths dropped from 3,714 to 2,951, the TS from 206.88 deaths/100,000 males in 1975 to 76.12 deaths/100,000 males in 2004, and the PYLL from 533.12 to 182.38 years/100,000 males. Conclusions: The trend in mortality due to IHD was not constant among either females or males, although it has always been a downward trend, the drop being statistically significant. The drop in CVD has been so large that both the absolute numbers and the gross rates are lower for the most recent years than for the first years in the series studied, despite the aging of Andalusia’s population.

Relevance:

60.00% 60.00%

Publisher:

Abstract:

This project is based on applying process simulation models to an example production company and on adapting its size and resources to the market in an environment of low investment and financing capacity, so that the digital simulation environment adds value to decision-making framed within the company's strategy in each market scenario in which it finds itself. The work is carried out on the case of a company, INNOVANAUTIC, dedicated to the innovation, development and production of boat propulsion systems. Simulation is a technique that makes it possible to optimize processes by representing them and checking how they work, in physical and production environments as well as in the associated services or the subcontracting of different processes, together with their impact on the availability of resources, space and delivery times, without having to resort to trial-and-error procedures on real systems, which entail costs at every level of the company. These methodologies are commonly used in other countries, and also in our country but only in large companies. This work, set in a turbulent socioeconomic environment with severe financial and resource limitations for companies, demonstrates that simulation tools are useful for SMEs in this environment and allow processes to be sized and modeled so as to find the optimal points at which the company should take a step of growth in some of its parameters. The methodology used in this work is to set up a complete simulation of the process and to define several market scenarios for the manufactured products, seeking the optimal points for changing the size of the company with regard to physical space, subcontracting of processes, personnel and resources.

Relevance:

60.00% 60.00%

Publisher:

Abstract:

Seasonal variations in the stable isotopic composition of snow and meltwater were investigated in a sub-arctic, mountainous, but non-glacial, catchment at Okstindan in northern Norway based on analyses of δ18O and δD. Samples were collected during four field periods (August 1998; April 1999; June 1999 and August 1999) at three sites lying on an altitudinal transect (740-970 m a.s.l.). Snowpack data display an increase in the mean values of δ18O (from a mean value of -13.51‰ to -11.49‰ between April and August), as well as a decrease in variability through the melt period. Comparison with a regional meteoric water line indicates that the slope of the δ18O-δD line for the snowpacks decreases over the same period, dropping from 7.49 to approximately 6.2. This change points to the role of evaporation in snowpack ablation and is confirmed by the vertical profile of deuterium excess. Snowpack seepage data, although limited, also suggest reduced values of δD, as might be associated with local evaporation during meltwater generation. In general, meltwaters were depleted in δ18O relative to the source snowpack at the peak of the melt (June), but later in the year (August) the difference between the two was not statistically significant. The diurnal pattern of isotopic composition indicates that the most depleted meltwaters coincide with the peak in temperature and, hence, meltwater production.