967 results for Integer-Valued Time Series
Abstract:
In this paper, we indicate how integer-valued autoregressive time series Ginar(d) of order d, d ≥ 1, are simple functionals of multitype branching processes with immigration. This allows the derivation of a simple criterion for the existence of a stationary distribution of the time series, thus proving and extending some results by Al-Osh and Alzaid [1], Du and Li [9] and Gauthier and Latour [11]. One can then transfer results on estimation in subcritical multitype branching processes to stationary Ginar(d) and get consistency and asymptotic normality for the corresponding estimators. The technique covers autoregressive moving average time series as well.
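To make the branching-with-immigration view concrete, the following minimal sketch simulates the simplest case, d = 1. It rests on assumptions the abstract does not state: binomial thinning as the offspring mechanism and Poisson immigration. The names `simulate_inar1`, `alpha`, and `lam` are illustrative only. Each individual counted at time t-1 survives with probability `alpha`, and new immigrants arrive independently; under these assumptions the process is subcritical when `alpha` < 1, with stationary mean `lam / (1 - alpha)`.

```python
import math
import random


def poisson(rng, lam):
    """Draw a Poisson(lam) variate with Knuth's product method (suitable for small lam)."""
    limit = math.exp(-lam)
    k, p = 0, rng.random()
    while p > limit:
        k += 1
        p *= rng.random()
    return k


def simulate_inar1(n, alpha, lam, seed=0):
    """Simulate n steps of X_t = (alpha thinned X_{t-1}) + immigration_t.

    Assumed mechanism (not taken from the abstract): binomial thinning
    with survival probability alpha and Poisson(lam) immigration.
    """
    rng = random.Random(seed)
    x = poisson(rng, lam / (1.0 - alpha))  # start near the stationary mean
    path = []
    for _ in range(n):
        # Branching part: each of the x individuals survives independently.
        survivors = sum(1 for _ in range(x) if rng.random() < alpha)
        # Immigration part: independent new arrivals.
        x = survivors + poisson(rng, lam)
        path.append(x)
    return path


path = simulate_inar1(20000, alpha=0.5, lam=2.0, seed=1)
print(sum(path) / len(path))  # sample mean, to compare with lam / (1 - alpha)
```

The sample mean can be compared with `lam / (1 - alpha)` as a quick sanity check on stationarity.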
Abstract:
Time series are ubiquitous. The acquisition and processing of continuously measured data is present in all areas of the natural sciences, medicine, and finance. The enormous growth of recorded data volumes, whether from automated monitoring systems or integrated sensors, calls for exceptionally fast algorithms in theory and practice. Consequently, this thesis deals with the efficient computation of subsequence alignments. Complex algorithms such as anomaly detection, motif queries, or the unsupervised extraction of prototypical building blocks in time series make extensive use of these alignments. This motivates the need for fast implementations. The thesis is organized into three approaches that address this challenge. These comprise four alignment algorithms and their parallelization on CUDA-enabled hardware, an algorithm for the segmentation of data streams, and a unified treatment of Lie-group-valued time series.
The first contribution is a complete CUDA port of the UCR Suite, the world-leading implementation of subsequence alignment. It includes a new computation scheme for determining local alignment scores using the z-normalized Euclidean distance, which can be used on any parallel hardware with support for the fast Fourier transform. Furthermore, we give a SIMT-compatible implementation of the UCR Suite's lower-bound cascade for the efficient computation of local alignment scores under Dynamic Time Warping. Both CUDA implementations are one to two orders of magnitude faster than established methods.
Second, we investigate two linear-time approximations for the elastic alignment of subsequences. On the one hand, we treat a SIMT-compatible relaxation scheme for Greedy DTW and its efficient CUDA parallelization. On the other hand, we introduce a new local distance measure, the Gliding Elastic Match (GEM), which can be computed with the same asymptotic time complexity as Greedy DTW but offers a complete relaxation of the penalty matrix. Further improvements include invariance to trends on the measurement axis and to uniform scaling on the time axis. In addition, an extension of GEM to multi-shape segmentation is discussed and evaluated on motion data. Both CUDA parallelizations achieve runtime improvements of up to two orders of magnitude.
The treatment of time series in the literature is usually restricted to real-valued measurement data. The third contribution comprises a unified method for the treatment of Lie-group-valued time series. Building on this, distance measures on the rotation group SO(3) and on the Euclidean group SE(3) are treated. Furthermore, memory-efficient representations and group-compatible extensions of elastic measures are discussed.
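For orientation, the sketch below shows what a subsequence alignment under the z-normalized Euclidean distance computes. It is a deliberately naive reference version in plain Python, not the FFT-based or CUDA scheme of the thesis; the function names `znorm` and `best_match` are illustrative assumptions.

```python
import math


def znorm(seq):
    """Return seq shifted to zero mean and scaled to unit standard deviation."""
    n = len(seq)
    mean = sum(seq) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in seq) / n)
    if std == 0.0:
        return [0.0] * n  # constant window: no shape information
    return [(v - mean) / std for v in seq]


def best_match(query, series):
    """Slide the query over the series and return (position, distance) of the
    window whose z-normalized shape is closest to the z-normalized query."""
    q = znorm(query)
    m = len(q)
    best_pos, best_dist = -1, math.inf
    for start in range(len(series) - m + 1):
        window = znorm(series[start:start + m])
        dist = math.sqrt(sum((a - b) ** 2 for a, b in zip(q, window)))
        if dist < best_dist:
            best_pos, best_dist = start, dist
    return best_pos, best_dist


# The query shape (low, high, middle) reappears, shifted and scaled, at index 4.
pos, dist = best_match([1, 3, 2], [5, 5, 5, 5, 12, 16, 14, 5, 5])
print(pos, dist)
```

This direct version costs O(nm) operations for a series of length n and a query of length m, which is the kind of cost that fast parallel implementations aim to reduce.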
Abstract:
We propose a method to measure real-valued time series irreversibility which combines two different tools: the horizontal visibility algorithm and the Kullback-Leibler divergence. This method maps a time series to a directed network according to a geometric criterion. The degree of irreversibility of the series is then estimated by the Kullback-Leibler divergence (i.e. the distinguishability) between the in- and out-degree distributions of the associated graph. The method is computationally efficient and does not require any ad hoc symbolization process. We find that the method correctly distinguishes between reversible and irreversible stationary time series, and we present analytical and numerical studies of its performance for: (i) reversible stochastic processes (uncorrelated and Gaussian linearly correlated), (ii) irreversible stochastic processes (a discrete flashing ratchet in an asymmetric potential), (iii) reversible (conservative) and irreversible (dissipative) chaotic maps, and (iv) dissipative chaotic maps in the presence of noise. Two alternative graph functionals, the degree and the degree-degree distributions, can be used as the Kullback-Leibler divergence argument. The former is simpler and more intuitive and can be used as a benchmark, but in the case of an irreversible process with null net current, the degree-degree distribution has to be considered to identify the irreversible nature of the series.
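The procedure can be sketched as follows. The sketch assumes the usual horizontal visibility criterion (points i < j are linked when every value strictly between them lies below both) and takes the divergence of the out-degree distribution from the in-degree distribution. How degrees that occur in only one of the two distributions are handled is not stated in the abstract; here such terms are simply skipped, which is an assumption. The function names are illustrative.

```python
import math
from collections import Counter


def hvg_degrees(x):
    """In- and out-degrees of the directed horizontal visibility graph of x.

    A link i -> j (i < j) exists when all values strictly between
    positions i and j are smaller than both x[i] and x[j].
    """
    n = len(x)
    k_in, k_out = [0] * n, [0] * n
    for i in range(n - 1):
        highest_between = -math.inf
        for j in range(i + 1, n):
            if x[j] > highest_between:  # x[i] > highest_between holds here
                k_out[i] += 1
                k_in[j] += 1
            highest_between = max(highest_between, x[j])
            if highest_between >= x[i]:
                break  # nothing further is visible from i
    return k_in, k_out


def irreversibility(x):
    """Kullback-Leibler divergence of the out-degree from the in-degree distribution."""
    k_in, k_out = hvg_degrees(x)
    n = len(x)
    count_in, count_out = Counter(k_in), Counter(k_out)
    total = 0.0
    for k, c_out in count_out.items():
        c_in = count_in[k]
        if c_in > 0:  # assumption: skip degrees absent from the in-distribution
            total += (c_out / n) * math.log(c_out / c_in)
    return total


print(irreversibility([1, 3, 2, 5, 2, 3, 1]))  # time-symmetric series
print(irreversibility([1, 2, 3, 4] * 50))      # sawtooth: slow rise, sudden drop
```

A time-symmetric series has identical in- and out-degree distributions, so its estimate is zero, while the sawtooth yields a clearly positive value.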
Abstract:
In recent decades the study of integer-valued time series has gained prominence due to its broad applicability (modeling the number of car accidents on a given highway, or the number of people infected by a virus, are two examples). One of the main interests of this area of study is to make forecasts, and for this reason it is very important to propose forecasting methods that produce nonnegative integer values, in keeping with the discrete nature of the data. In this work, we focus on the study and proposal of forecasts one, two and h steps ahead for integer-valued second-order autoregressive conditional heteroskedasticity processes [INARCH(2)], and on determining some theoretical properties of this model, such as the ordinary moments of its marginal distribution and the asymptotic distribution of its conditional least squares estimators. In addition, we study, via Monte Carlo simulation, the behavior of the estimators for the parameters of INARCH(2) processes obtained using three different methods (Yule-Walker, conditional least squares, and conditional maximum likelihood), in terms of mean squared error, mean absolute error and bias. We present some forecast proposals for INARCH(2) processes, which are again compared via Monte Carlo simulation. As an application of the proposed theory, we model a dataset on the number of live male births to mothers living in the city of Riachuelo, in the state of Rio Grande do Norte, Brazil.
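As an illustration of h-step-ahead forecasting, the sketch below assumes the common Poisson INARCH(2) specification, in which X_t given the past is Poisson with mean a0 + a1·X_{t-1} + a2·X_{t-2}. Under that assumption the conditional mean h steps ahead follows a linear recursion. Rounding the mean to obtain an integer forecast is only one possible choice and is not necessarily among the proposals studied in the work; the names `forecast_means`, `a0`, `a1`, and `a2` are illustrative.

```python
def forecast_means(history, a0, a1, a2, h):
    """Conditional means E[X_{t+k} | history] for k = 1..h under an assumed
    Poisson INARCH(2) model with intensity a0 + a1*X_{t-1} + a2*X_{t-2}."""
    prev2, prev1 = history[-2], history[-1]
    means = []
    for _ in range(h):
        mu = a0 + a1 * prev1 + a2 * prev2
        means.append(mu)
        # Beyond one step, unknown future counts are replaced by their means.
        prev2, prev1 = prev1, mu
    return means


means = forecast_means([3, 5], a0=1.0, a1=0.4, a2=0.2, h=3)
integer_forecasts = [round(mu) for mu in means]  # one simple integer-valued choice
print(means, integer_forecasts)
```

When a1 + a2 < 1, the recursion converges to a0 / (1 - a1 - a2) as the horizon grows, which is the stationary mean under this specification.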
Abstract:
We present simple procedures for the prediction of a real-valued sequence. The algorithms are based on a combination of several simple predictors. We show that if the sequence is a realization of a bounded stationary and ergodic random process, then the average of squared errors converges, almost surely, to that of the optimum, given by the Bayes predictor. We offer an analogous result for the prediction of stationary Gaussian processes.
Abstract:
The surface of the Earth is subjected to vertical deformations caused by geophysical and geological processes, which can be monitored by Global Positioning System (GPS) observations. The purpose of this work is to investigate GPS height time series to identify interannual signals affecting the Earth’s surface over the European and Mediterranean area during the period 2001-2019. Thirty-six homogeneously distributed GPS stations were selected from the online dataset made available by the Nevada Geodetic Laboratory (NGL) on the basis of the length and quality of the data series. Principal Component Analysis (PCA) is applied to extract the main patterns of the space and time variability of the GPS Up coordinate. The time series were studied by means of a frequency analysis using a periodogram and the real-valued Morlet wavelet. The periodogram is used to identify the dominant frequencies and the spectral density of the investigated signals; the wavelet is applied to identify the signals in the time domain and the relevant periodicities. This study has identified, over the European and Mediterranean area, the presence of interannual non-linear signals with a period of 2 to 4 years, possibly related to atmospheric and hydrological loading displacements and to climate phenomena such as the El Niño Southern Oscillation (ENSO). A clear signal with a period of about six years is present in the vertical component of the GPS time series, likely explainable by the gravitational coupling between the Earth’s mantle and the inner core. Moreover, signals with a period on the order of 8-9 years, which might be explained by mantle-inner core gravity coupling and the cycle of the lunar perigee, and a signal of 18.6 years, likely associated with the lunar nodal cycle, were identified through the wavelet spectrum. However, these last two signals need further confirmation because the present length of the GPS time series is still too short compared to the periods involved.
Abstract:
The objective of the study is to evaluate the effect of the daily variation in concentrations of fine particulate matter (diameter less than 2.5 µm, PM2.5) resulting from the burning of biomass on the daily number of hospitalizations of children and elderly people for respiratory diseases in Alta Floresta and Tangará da Serra, in the Brazilian Amazon, in 2005. This is an ecological time series study that uses data on the daily number of hospitalizations of children and the elderly for respiratory diseases and the estimated concentration of PM2.5. In Alta Floresta, the percentage increases in the relative risk (%RR) of hospitalization for respiratory diseases in children were significant for the whole year and for the dry season with 3-4 day lags. In the dry season these measurements reach 6% (95%CI: 1.4-10.8). The associations were significant for moving averages of 3-5 days. The %RR for the elderly was significant for the current day in the dry season, with a 6.8% increase (95%CI: 0.5-13.5) for each additional 10 µg/m³ of PM2.5. No associations were verified for Tangará da Serra. The PM2.5 from the burning of biomass increased hospitalizations for respiratory diseases in children and the elderly.
Abstract:
A susceptible-infective-recovered (SIR) epidemiological model based on a probabilistic cellular automaton (PCA) is employed for simulating the temporal evolution of the registered cases of chickenpox in Arizona, USA, between 1994 and 2004. At each time step, every individual is in one of the states S, I, or R. The parameters of this model are the probabilities of each individual (each cell forming the PCA lattice) passing from one state to another. Here, the values of these probabilities are identified by using a genetic algorithm. If nonrealistic values are allowed for the parameters, the predictions agree better with the historical series than if the parameters are forced to take realistic values. A discussion of how the size of the PCA lattice affects the quality of the model predictions is presented. Copyright (C) 2009 L. H. A. Monteiro et al.
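A generic lattice update of this kind might look like the sketch below. The abstract does not give the transition rules, so everything specific here is an assumption made for illustration: a square lattice with periodic boundaries, the four nearest neighbours, an infection probability of 1 - (1 - p_inf)^k for a susceptible cell with k infective neighbours, and the parameter names `p_inf`, `p_rec`, and `p_loss`. The genetic-algorithm fit of the probabilities is not shown.

```python
import random

NEIGHBOURS = ((1, 0), (-1, 0), (0, 1), (0, -1))  # assumed von Neumann neighbourhood


def step(grid, p_inf, p_rec, p_loss, rng):
    """One synchronous update of an n x n lattice of 'S', 'I', 'R' cells."""
    n = len(grid)
    new = [row[:] for row in grid]
    for r in range(n):
        for c in range(n):
            state = grid[r][c]
            if state == "S":
                # Count infective neighbours with periodic boundaries.
                k = sum(grid[(r + dr) % n][(c + dc) % n] == "I"
                        for dr, dc in NEIGHBOURS)
                if rng.random() < 1.0 - (1.0 - p_inf) ** k:
                    new[r][c] = "I"
            elif state == "I":
                if rng.random() < p_rec:
                    new[r][c] = "R"
            elif rng.random() < p_loss:  # state == "R": return to susceptible
                new[r][c] = "S"
    return new


rng = random.Random(1)
grid = [["S"] * 10 for _ in range(10)]
grid[5][5] = "I"
infected_per_step = []
for _ in range(20):
    grid = step(grid, p_inf=0.3, p_rec=0.2, p_loss=0.05, rng=rng)
    infected_per_step.append(sum(row.count("I") for row in grid))
print(infected_per_step)
```

The series of infected counts per step is the quantity one would compare with registered case counts when tuning the probabilities.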
Abstract:
Background: Microarray techniques have become an important tool for the investigation of genetic relationships and the assignment of different phenotypes. Since microarrays are still very expensive, most experiments are performed with small samples. This paper introduces a method to quantify dependency between data series composed of few sample points. The method is used to construct gene co-expression subnetworks of highly significant edges. Results: The results shown here are for an adapted subset of a Saccharomyces cerevisiae gene expression data set with low temporal resolution and poor statistics. The method reveals common transcription factors with a high confidence level and allows the construction of subnetworks with high biological relevance that reveal characteristic features of the processes driving the organism's adaptations to specific environmental conditions. Conclusion: Our method allows a reliable and sophisticated analysis of microarray data even under severe constraints. The use of systems biology improves the biologists' ability to elucidate the mechanisms underlying cellular processes and to formulate new hypotheses.
Abstract:
Measurements of polar organic marker compounds were performed on aerosols that were collected at a pasture site in the Amazon basin (Rondonia, Brazil) using a high-volume dichotomous sampler (HVDS) and a Micro-Orifice Uniform Deposit Impactor (MOUDI) within the framework of the 2002 LBA-SMOCC (Large-Scale Biosphere Atmosphere Experiment in Amazonia - Smoke Aerosols, Clouds, Rainfall, and Climate: Aerosols From Biomass Burning Perturb Global and Regional Climate) campaign. The campaign spanned the late dry season (biomass burning), a transition period, and the onset of the wet season (clean conditions). In the present study a more detailed discussion is presented compared to previous reports on the behavior of selected polar marker compounds, including levoglucosan, malic acid, isoprene secondary organic aerosol (SOA) tracers and tracers for fungal spores. The tracer data are discussed taking into account new insights that recently became available into their stability and/or aerosol formation processes. During all three periods, levoglucosan was the dominant identified organic species in the PM2.5 size fraction of the HVDS samples. In the dry period levoglucosan reached concentrations of up to 7.5 µg m⁻³ and exhibited diel variations with a nighttime prevalence. It was closely associated with the PM mass in the size-segregated samples and was mainly present in the fine mode, except during the wet period, when it peaked in the coarse mode. Isoprene SOA tracers showed an average concentration of 250 ng m⁻³ during the dry period versus 157 ng m⁻³ during the transition period and 52 ng m⁻³ during the wet period. Malic acid and the 2-methyltetrols exhibited a different size distribution pattern, which is consistent with different aerosol formation processes (i.e., gas-to-particle partitioning in the case of malic acid and heterogeneous formation from gas-phase precursors in the case of the 2-methyltetrols). The 2-methyltetrols were mainly associated with the fine mode during all periods, while malic acid was prevalent in the fine mode only during the dry and transition periods, and dominant in the coarse mode during the wet period. The sum of the fungal spore tracers arabitol, mannitol, and erythritol in the PM2.5 fraction of the HVDS samples during the dry, transition, and wet periods was, on average, 54 ng m⁻³, 34 ng m⁻³, and 27 ng m⁻³, respectively, and revealed minor day/night variation. The mass size distributions of arabitol and mannitol during all periods showed similar patterns and an association with the coarse mode, consistent with their primary origin. The results show that even under the heavy smoke conditions of the dry period a natural background with contributions from bioaerosols and isoprene SOA can be revealed. The enhancement in isoprene SOA in the dry season is mainly attributed to an increased acidity of the aerosols, increased NOx concentrations and a decreased wet deposition.
Abstract:
Background: The inference of gene regulatory networks (GRNs) from large-scale expression profiles is one of the most challenging problems in Systems Biology today. Many techniques and models have been proposed for this task. However, it is generally not possible to recover the original topology with great accuracy, mainly because the time series are short relative to the high complexity of the networks and because of the intrinsic noise of the expression measurements. In order to improve the accuracy of GRN inference methods based on entropy (mutual information), a new criterion function is proposed here. Results: In this paper we introduce the use of the generalized entropy proposed by Tsallis for the inference of GRNs from time series expression profiles. The inference process is based on a feature selection approach, and the conditional entropy is applied as the criterion function. In order to assess the proposed methodology, the algorithm is applied to recover the network topology from temporal expressions generated by an artificial gene network (AGN) model as well as from the DREAM challenge. The adopted AGN is based on theoretical models of complex networks, and its gene transfer function is obtained by random drawing from the set of possible Boolean functions, thus creating its dynamics. The DREAM time series data, on the other hand, vary in network size, and their topologies are based on real networks. The dynamics are generated by continuous differential equations with noise and perturbation. By adopting both data sources, it is possible to estimate the average quality of the inference with respect to different network topologies, transfer functions and network sizes. Conclusions: A remarkable improvement in accuracy was observed in the experimental results: the non-Shannon entropy reduced the number of false connections in the inferred topology. The best value obtained for the free parameter of the Tsallis entropy was on average in the range 2.5 ≤ q ≤ 3.5 (hence, subextensive entropy), which opens new perspectives for GRN inference methods based on information theory and for investigation of the nonextensivity of such networks. The inference algorithm and criterion function proposed here were implemented and included in the DimReduction software, which is freely available at http://sourceforge.net/projects/dimreduction and http://code.google.com/p/dimreduction/.
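For reference, the Tsallis entropy of a discrete distribution with entropic index q is S_q = (1 - Σ p_i^q) / (q - 1), which reduces to the Shannon entropy (in nats) as q approaches 1. The sketch below computes only this quantity; it is not the DimReduction implementation, the conditional-entropy criterion is not reproduced, and the function name is an illustrative assumption.

```python
import math


def tsallis_entropy(probs, q):
    """Tsallis entropy S_q of a discrete probability distribution.

    For q close to 1 the Shannon entropy (natural logarithm) is returned,
    which is the limit of S_q as q -> 1.
    """
    if abs(q - 1.0) < 1e-12:
        return -sum(p * math.log(p) for p in probs if p > 0)
    return (1.0 - sum(p ** q for p in probs if p > 0)) / (q - 1.0)


uniform = [0.25, 0.25, 0.25, 0.25]
print(tsallis_entropy(uniform, 1.0))  # Shannon limit
print(tsallis_entropy(uniform, 2.0))
print(tsallis_entropy(uniform, 3.0))  # inside the range 2.5 <= q <= 3.5
```

Larger q gives more weight to high-probability outcomes, so the same distribution receives a smaller entropy value as q increases.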
Abstract:
Since 2000, the southwestern Brazilian Amazon has undergone a rapid transformation from natural vegetation and pastures to row-crop agriculture, with the potential to affect regional biogeochemistry. The goal of this research is to assess wavelet algorithms applied to MODIS time series to determine the expansion of row crops and the intensification of the number of crops grown. MODIS provides data from February 2000 to the present, a period of agricultural expansion and intensification in the southwestern Brazilian Amazon. We selected a study area near Comodoro, Mato Grosso, because of the rapid growth of row-crop agriculture and the availability of ground truth data on agricultural land-use history. We used a 90% power wavelet transform to create a wavelet-smoothed time series for five years of MODIS EVI data. From this wavelet-smoothed time series we determine the characteristic phenology of single and double crops. We estimate that over 3200 km² were converted from native vegetation and pasture to row-crop agriculture from 2000 to 2005 in our study area encompassing 40,000 km². We observe an increase of 2000 km² of agricultural intensification, where areas of single crops were converted to double crops during the study period. (C) 2007 Elsevier Inc. All rights reserved.
Abstract:
The Random Parameter model was proposed to explain the structure of the covariance matrix in problems where most, but not all, of the eigenvalues of the covariance matrix can be explained by Random Matrix Theory. In this article, we explore the scaling properties of the model, as observed in the multifractal structure of the simulated time series. We use the Wavelet Transform Modulus Maxima technique to obtain the dependence of the multifractal spectrum on the parameters of the model. The model shows a scaling structure compatible with the stylized facts for a reasonable choice of the parameter values. (C) 2009 Elsevier B.V. All rights reserved.
Abstract:
Because of the many kinds of services that use the Internet and data network infrastructures, present-day networks are characterized by a diversity of traffic types with statistical properties such as complex temporal correlation and non-Gaussian distribution. The complex temporal correlation of network traffic may be characterized by Short Range Dependence (SRD) and Long Range Dependence (LRD). Models such as fGN (Fractional Gaussian Noise) may capture the LRD but not the SRD. This work presents two methods for traffic generation that synthesize approximate realizations of the self-similar fGN with SRD random process. The first employs the IDWT (Inverse Discrete Wavelet Transform) and the second the IDWPT (Inverse Discrete Wavelet Packet Transform). A variance map concept has been developed that allows the LRD and SRD behaviors to be associated directly with the wavelet transform coefficients. The developed methods are extremely flexible and allow the generation of Gaussian time series with complex statistical behaviors.