950 results for Longitudinal Data Analysis and Time Series
Abstract:
We present a method to integrate environmental time series into stock assessment models and to test the significance of correlations between population processes and the environmental time series. Parameters that relate the environmental time series to population processes are included in the stock assessment model, and likelihood ratio tests are used to determine whether these parameters significantly improve the fit to the data. Two approaches to integrating the environmental relationship are considered. In the environmental model, the population dynamics process (e.g. recruitment) is proportional to the environmental variable; in the environmental model with process error, it is likewise proportional to the environmental variable, but the model allows additional temporal variation (process error) constrained by a log-normal distribution. The methods are tested by simulation analysis and compared to the traditional method of correlating model estimates with environmental variables outside the estimation procedure. In the traditional method, the estimates of recruitment were provided by a model that allowed recruitment only a temporal variation constrained by a log-normal distribution. We illustrate the methods by applying them to test the statistical significance of the correlation between sea-surface temperature (SST) and recruitment to the snapper (Pagrus auratus) stock in the Hauraki Gulf–Bay of Plenty, New Zealand. Simulation analyses indicated that the integrated approach with additional process error is superior to the traditional method. The results suggest that, for this snapper stock, recruitment is positively correlated with SST at the time of spawning.
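The core of the testing procedure can be sketched outside any stock assessment model. Below is a minimal, hypothetical Python illustration of the likelihood ratio test for an environmental link: log-recruitment deviations are modelled as Gaussian around beta * E_t, the null model fixes beta = 0, and twice the log-likelihood gain is referred to a chi-square distribution with one degree of freedom. The series and the simple linear link are stand-ins for the paper's integrated model.

```python
# Hypothetical sketch of the likelihood ratio test for an environmental
# link: this is not the paper's assessment model, only the test logic.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
T = 40
E = rng.normal(size=T)                        # invented SST anomaly series
r = 0.5 * E + rng.normal(scale=0.4, size=T)   # invented log-recruitment deviations
Ec = E - E.mean()

def neg_log_lik(resid):
    # Gaussian likelihood with the variance profiled out (its MLE).
    sigma2 = np.mean(resid ** 2)
    return 0.5 * resid.size * (np.log(2 * np.pi * sigma2) + 1)

# Null model: no environmental link (mean log-recruitment only).
ll0 = -neg_log_lik(r - r.mean())
# Alternative: linear link to the environmental series (Gaussian MLE = OLS).
beta = np.sum(Ec * (r - r.mean())) / np.sum(Ec ** 2)
ll1 = -neg_log_lik(r - r.mean() - beta * Ec)

lrt = 2 * (ll1 - ll0)                 # one extra parameter -> chi-square(1)
print(f"beta = {beta:.3f}, LRT = {lrt:.2f}, p = {stats.chi2.sf(lrt, df=1):.4f}")
```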
Abstract:
We live in an era of abundant data. This has necessitated the development of new and innovative statistical algorithms to get the most from experimental data. For example, faster algorithms make practical the analysis of larger genomic data sets, allowing us to extend the utility of cutting-edge statistical methods. We present a randomised algorithm that accelerates the clustering of time series data using the Bayesian Hierarchical Clustering (BHC) statistical method. BHC is a general method for clustering any discretely sampled time series data; in this paper we focus on a particular application to microarray gene expression data. We define and analyse the randomised algorithm, before presenting results on both synthetic and real biological data sets. We show that the randomised algorithm leads to substantial gains in speed with minimal loss in clustering quality. The randomised time series BHC algorithm is available as part of the R package BHC, which can be downloaded from Bioconductor (version 2.10 and above) via http://bioconductor.org/packages/2.10/bioc/html/BHC.html. We have also made available a set of R scripts that can be used to reproduce the analyses carried out in this paper; these are available from https://sites.google.com/site/randomisedbhc/.
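The BHC package itself implements the method in R; as a language-neutral illustration of the randomisation idea (cluster a random subsample, then filter the remaining series into the resulting groups), the hypothetical Python sketch below uses plain hierarchical clustering. It shows the speed-up pattern only, not the BHC algorithm.

```python
# Generic illustration of randomised acceleration: cluster m << n series,
# then assign the rest to the nearest subsample cluster, avoiding a full
# O(n^2) hierarchical pass.  Not the BHC package's code.
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import cdist

rng = np.random.default_rng(0)
n, tlen, m = 500, 20, 100              # n series of length tlen; subsample of m
X = rng.normal(size=(n, tlen))         # synthetic expression time series

idx = rng.choice(n, size=m, replace=False)
Z = linkage(X[idx], method="average")  # full clustering only on the subsample
labels_sub = fcluster(Z, t=4, criterion="maxclust")

# Assign every remaining series to the closest subsample-cluster centroid.
centroids = np.vstack([X[idx][labels_sub == k].mean(axis=0)
                       for k in np.unique(labels_sub)])
labels_all = cdist(X, centroids).argmin(axis=1) + 1
print(np.bincount(labels_all)[1:])     # cluster sizes over all n series
```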
Abstract:
Nonlinear multivariate statistical techniques on fast computers offer the potential to capture more of the dynamics of the high-dimensional, noisy systems underlying financial markets than traditional models, while making fewer restrictive assumptions. This thesis presents a collection of practical techniques to address important estimation and confidence issues for Radial Basis Function networks arising from such a data-driven approach, including efficient methods for parameter estimation and pruning, a pointwise prediction error estimator, and a methodology for controlling the "data mining" problem. Novel applications in the finance area are described, including customized, adaptive option pricing and stock price prediction.
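As a concrete, entirely illustrative example of the model class, the sketch below fits a small Gaussian RBF network by ridge-regularised least squares; centres, widths, and the penalty are the kinds of quantities the thesis's estimation and pruning techniques address. None of this is the thesis's code.

```python
# Minimal Gaussian RBF network: centres taken from the data, output
# weights by ridge-regularised least squares.  Settings are invented.
import numpy as np

rng = np.random.default_rng(2)
x = np.sort(rng.uniform(-3, 3, size=200))
y = np.sin(2 * x) + 0.1 * rng.normal(size=x.size)   # toy target series

centres = x[:: x.size // 20][:20]           # 20 centres spread over the inputs
width = 0.5

def design(x, centres, width):
    # Gaussian RBF features: phi_ij = exp(-(x_i - c_j)^2 / (2 * width^2))
    return np.exp(-((x[:, None] - centres[None, :]) ** 2) / (2 * width ** 2))

Phi = design(x, centres, width)
lam = 1e-3                                  # ridge penalty controls overfitting
w = np.linalg.solve(Phi.T @ Phi + lam * np.eye(Phi.shape[1]), Phi.T @ y)
y_hat = Phi @ w
print("training RMSE:", np.sqrt(np.mean((y - y_hat) ** 2)))
```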
Abstract:
BACKGROUND: The inherent complexity of statistical methods and clinical phenomena compels researchers with diverse domains of expertise to work in interdisciplinary teams, where none of them has complete knowledge of their counterpart's field. As a result, knowledge exchange may often be characterized by miscommunication leading to misinterpretation, ultimately resulting in errors in research and even clinical practice. Although communication has a central role in interdisciplinary collaboration, and miscommunication can have a negative impact on research processes, to the best of our knowledge no study has yet explored how data analysis specialists and clinical researchers communicate over time. METHODS/PRINCIPAL FINDINGS: We conducted qualitative analysis of encounters between clinical researchers and data analysis specialists (epidemiologist, clinical epidemiologist, and data mining specialist). These encounters were recorded and systematically analyzed using a grounded theory methodology for extraction of emerging themes, followed by data triangulation and analysis of negative cases for validation. A policy analysis was then performed using a system dynamics methodology, looking for potential interventions to improve this process. Four major emerging themes were found. Definitions using lay language were frequently employed as a way to bridge the language gap between the specialties. Thought experiments presented a series of "what if" situations that helped clarify how the method or information from the other field would behave if exposed to alternative situations, ultimately aiding in explaining their main objective. Metaphors and analogies were used to translate concepts across fields, from the unfamiliar to the familiar. Prolepsis was used to anticipate study outcomes, thus helping specialists understand the current context based on an understanding of their final goal. CONCLUSION/SIGNIFICANCE: The communication between clinical researchers and data analysis specialists presents multiple challenges that can lead to errors.
Abstract:
Time-series and sequences are important patterns in data mining. Based on an ontology of time-elements, this paper presents a formal characterization of time-series and state-sequences, where a state denotes a collection of data whose validity is dependent on time. While a time-series is formalized as a vector of time-elements temporally ordered one after another, a state-sequence is denoted as a list of states correspondingly ordered by a time-series. In general, a time-series and a state-sequence can be incomplete in various ways. This leads to the distinction between complete and incomplete time-series, and between complete and incomplete state-sequences, which allows the expression of both absolute and relative temporal knowledge in data mining.
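A small, hypothetical sketch of these definitions as data structures may make the distinction concrete; the names below are illustrative, not the paper's formal notation.

```python
# Illustrative data structures: a time-series as an ordered vector of
# time-elements, a state-sequence as a list of states indexed by it.
from dataclasses import dataclass
from typing import Optional, List, Dict

@dataclass
class TimeElement:
    # A point has start == end; an interval has start < end.  Either
    # bound may be None, which lets the series be incomplete: only the
    # relative order (its position in the vector) is then known.
    start: Optional[float]
    end: Optional[float]

@dataclass
class State:
    # A collection of data whose validity depends on time.
    data: Dict[str, object]

@dataclass
class StateSequence:
    times: List[TimeElement]     # temporally ordered, one after another
    states: List[State]          # states[i] holds over times[i]

    def is_complete(self) -> bool:
        # Complete: every time-element carries absolute bounds.
        return all(t.start is not None and t.end is not None
                   for t in self.times)
```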
Abstract:
Time-series analysis and prediction play an important role in state-based systems that involve dealing with varying situations in terms of states of the world evolving with time. Generally speaking, the world under discussion persists in a given state until something occurs that moves it into another state. This paper introduces a framework for prediction and analysis based on time-series of states. Its temporal basis is a time theory that treats both points and intervals as primitive time elements. A state of the world under consideration is defined as a set of time-varying propositions with Boolean truth-values that are dependent on time, including properties, facts, actions, events, processes, etc. A time-series of states is then formalized as a list of states that are temporally ordered one after another. The framework supports explicit expression of both absolute and relative temporal knowledge. A formal schema is provided for expressing general time-series of states that are incomplete in various ways, and the concept of a complete time-series of states is also formally defined. As applications of the formalism in time-series analysis and prediction, we present two illustrative examples.
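The persistence assumption lends itself to a toy prediction sketch: hold the latest known state fixed and apply subsequent occurrences. The representation below (propositions as Boolean truth-values, events as timed flips) is a hypothetical simplification of the formalism, not the paper's schema.

```python
# Toy persistence-based prediction over a time-series of states.
from typing import Dict, List, Tuple

State = Dict[str, bool]                      # proposition -> truth-value

def predict(series: List[Tuple[float, State]],
            events: List[Tuple[float, str, bool]],
            t_query: float) -> State:
    """Start from the latest observed state not after t_query, then
    apply any known (time, proposition, value) events up to t_query."""
    state: State = {}
    last_t = float("-inf")
    for t, s in series:
        if t <= t_query:
            state, last_t = dict(s), t
    for t, prop, value in events:
        if last_t < t <= t_query:
            state[prop] = value              # an occurrence flips a proposition
    return state

obs = [(0.0, {"raining": False, "wet": False}),
       (2.0, {"raining": True, "wet": True})]
ev = [(3.0, "raining", False)]               # rain stops at t = 3
print(predict(obs, ev, 4.0))                 # {'raining': False, 'wet': True}
```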
Abstract:
Interannual and seasonal trends of zooplankton abundance and species composition were compared between the Bongo net and Continuous Plankton Recorder (CPR) time series in the Gulf of Maine. Data from 5799 Bongo and 3118 CPR samples were compared for the years 1978–2006. The two programs use different sampling methods, with the Bongo time series composed of bimonthly vertically integrated samples from locations throughout the region, while the CPR was towed monthly at 10 m depth on a transect that bisects the region. It was found that there was a significant correlation between the interannual (r = 0.67, P < 0.01) and seasonal (r = 0.95, P < 0.01) variability of total zooplankton counts. Abundance rankings of individual taxa were highly correlated and temporal trends of dominant copepods were similar between samplers. Multivariate analysis also showed that both time series equally detected major shifts in community structure through time. However, absolute abundance levels were higher in the Bongo and temporal patterns for many of the less abundant taxa groups were not similar between the two devices. The different mesh sizes of the samplers probably caused some of the discrepancies; but diel migration patterns, damage to soft-bodied animals and avoidance of the small CPR aperture by some taxa likely contributed to the catch differences between the two devices. Nonetheless, the Bongo data presented here confirm the previously published patterns found in the CPR data set, and both show that the abundance increase of the 1990s has been followed by average to below-average levels from 2002 to 2006.
Abstract:
The validity of load estimates from intermittent, instantaneous grab sampling is dependent on adequate spatial coverage by monitoring networks and a sampling frequency that reflects the variability in the system under study. Catchments with a flashy hydrology due to surface runoff pose a particular challenge, as intense short-duration rainfall events may account for a significant portion of the total diffuse transfer of pollution from soil to water in any hydrological year. This can also be exacerbated by the presence of strong background pollution signals from point sources during low flows. In this paper, a range of sampling methodologies and load estimation techniques are applied to phosphorus data from such a surface-water-dominated river system, instrumented at three sub-catchments (ranging from 3 to 5 km2 in area) with near-continuous monitoring stations. Systematic and Monte Carlo approaches were applied to simulate grab sampling using multiple strategies and to calculate an estimated load, Le, based on established load estimation methods. Comparison with the actual load, Lt, revealed significant average underestimation, of up to 60%, and high variability for all feasible sampling approaches. Further analysis of the time series provides an insight into these observations, revealing peak frequencies and power-law scaling in the distributions of P concentration, discharge and load associated with surface runoff and background transfers. Results indicate that only near-continuous monitoring that reflects the rapid temporal changes in these river systems is adequate for comparative monitoring and evaluation purposes. While the implications of this analysis may be most pertinent to small-scale flashy systems, this represents an appropriate scale for evaluating catchment mitigation strategies such as agri-environmental policies for managing diffuse P transfers in complex landscapes.
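The simulation logic can be sketched generically. The hypothetical Python below builds a flashy synthetic record, takes the integrated concentration-discharge product as the true load Lt, and Monte Carlo-simulates weekly grab sampling with a flow-weighted mean concentration estimator for Le; storm-dominated transfer then produces the kind of underestimation reported above. The synthetic series and the single estimator are stand-ins for the instrumented data and the paper's suite of established methods.

```python
# Hypothetical Monte Carlo grab-sampling experiment on a synthetic
# flashy record; not the paper's data or full set of estimators.
import numpy as np

rng = np.random.default_rng(3)
dt_min = 10
n = 365 * 24 * 6                         # one year at 10-minute resolution
base_q = 0.05                            # baseline discharge, m3/s

# Sparse storm events drive both discharge and concentration peaks.
q = np.full(n, base_q)
storms = rng.choice(n - 200, size=60, replace=False)
pulse = np.exp(-np.arange(200) / 30.0)   # quick rise-and-recession shape
for s in storms:
    q[s:s + 200] += rng.uniform(0.5, 3.0) * pulse
c = 0.02 + 0.05 * (q - base_q)           # P concentration rises with storm flow

true_load = np.sum(c * q) * dt_min * 60  # Lt

step = 7 * 24 * 6                        # one grab per week
ratios = []
for _ in range(1000):                    # Monte Carlo over the sampling phase
    start = rng.integers(0, step)
    i = np.arange(start, n, step)
    cw = np.sum(c[i] * q[i]) / np.sum(q[i])    # flow-weighted mean conc.
    est_load = cw * np.sum(q) * dt_min * 60    # Le
    ratios.append(est_load / true_load)

ratios = np.asarray(ratios)
print(f"median Le/Lt = {np.median(ratios):.2f}, "
      f"5th-95th pct = {np.percentile(ratios, [5, 95]).round(2)}")
```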
Abstract:
The problem of model selection for a univariate long memory time series is investigated once a semiparametric estimator for the long memory parameter has been used. Standard information criteria are not consistent in this case. A Modified Information Criterion (MIC) that overcomes these difficulties is introduced, and proofs that show its asymptotic validity are provided. The results are general and cover a wide range of short memory processes. Simulation evidence compares the new and existing methodologies, and empirical applications to monthly inflation and daily realized volatility are presented.
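For orientation, the sketch below implements one standard semiparametric first-stage estimator of the long memory parameter d, the log-periodogram (GPH) regression, on a synthetic white-noise series (true d = 0). The bandwidth m = sqrt(n) is a common convention rather than the paper's choice, and the MIC itself is not reproduced here.

```python
# Log-periodogram (GPH) estimation of the long memory parameter d.
import numpy as np

rng = np.random.default_rng(4)
n = 2048
x = rng.normal(size=n)                   # white noise, so the true d is 0

# Periodogram at Fourier frequencies lambda_j = 2*pi*j/n, j = 1..m.
m = int(np.sqrt(n))
j = np.arange(1, m + 1)
lam = 2 * np.pi * j / n
fx = np.fft.fft(x - x.mean())
I = (np.abs(fx[1:m + 1]) ** 2) / (2 * np.pi * n)

# GPH regression: log I_j = const - d * log(4 sin^2(lam_j / 2)) + error.
reg = np.log(4 * np.sin(lam / 2) ** 2)
d_hat = -np.polyfit(reg, np.log(I), 1)[0]
print(f"estimated d = {d_hat:.3f} (true 0)")
```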
Abstract:
Objective
To investigate the effect of fast food consumption on mean population body mass index (BMI) and explore the possible influence of market deregulation on fast food consumption and BMI.
Methods
The within-country association between fast food consumption and BMI in 25 high-income member countries of the Organisation for Economic Co-operation and Development between 1999 and 2008 was explored through multivariate panel regression models, after adjustment for per capita gross domestic product, urbanization, trade openness, lifestyle indicators and other covariates. The possible mediating effect of annual per capita intake of soft drinks, animal fats and total calories on the association between fast food consumption and BMI was also analysed. Two-stage least squares regression models were fitted, using economic freedom as an instrumental variable, to study the causal effect of fast food consumption on BMI.
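The instrumental-variable step can be illustrated in miniature. The hypothetical sketch below simulates a confounded system and recovers the causal coefficient by two-stage least squares, the same identification logic the study applies with economic freedom as the instrument; data and magnitudes are invented.

```python
# Two-stage least squares on synthetic data: Z instruments X, which
# affects Y; a shared confounder u biases naive OLS.  Not the study's data.
import numpy as np

rng = np.random.default_rng(5)
n = 250
z = rng.normal(size=n)                   # instrument: economic freedom
u = rng.normal(size=n)                   # confounder hitting both X and Y
x = 0.8 * z + u + rng.normal(size=n)     # fast food consumption (endogenous)
y = 0.3 * x + u + rng.normal(size=n)     # BMI; true causal effect is 0.3

def ols(design, target):
    return np.linalg.lstsq(design, target, rcond=None)[0]

ones = np.ones(n)
# Naive OLS is biased upward by the confounder u.
b_ols = ols(np.column_stack([ones, x]), y)[1]
# Stage 1: project X on the instrument; Stage 2: regress Y on fitted X.
x_hat = np.column_stack([ones, z]) @ ols(np.column_stack([ones, z]), x)
b_2sls = ols(np.column_stack([ones, x_hat]), y)[1]
print(f"OLS: {b_ols:.2f}  2SLS: {b_2sls:.2f}  (true effect 0.3)")
```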
Findings
After adjustment for covariates, each 1-unit increase in annual fast food transactions per capita was associated with an increase of 0.033 kg/m2 in age-standardized BMI (95% confidence interval, CI: 0.013–0.052). Only the intake of soft drinks – not animal fat or total calories – mediated the observed association (β: 0.030; 95% CI: 0.010–0.050). Economic freedom was an independent predictor of fast food consumption (β: 0.27; 95% CI: 0.16–0.37). When economic freedom was used as an instrumental variable, the association between fast food and BMI weakened but remained significant (β: 0.023; 95% CI: 0.001–0.045).
Conclusion
Fast food consumption is an independent predictor of mean BMI in high-income countries. Market deregulation policies may contribute to the obesity epidemic by facilitating the spread of fast food.
Abstract:
This paper presents a framework for a telecommunications interface which allows data from sensors embedded in Smart Grid applications to be reliably archived in an appropriate time-series database. The challenge in doing so is two-fold: first, the various formats in which sensor data are represented; second, the problems of telecoms reliability. A prototype of the authors' framework is detailed which showcases the main features of the framework in a case study featuring Phasor Measurement Units (PMUs) as the application. Useful analysis of PMU data is achieved whenever data from multiple locations can be compared on a common time axis. The prototype highlights the framework's reliability, extensibility and adoptability, features which industry standards for data representation largely defer to proprietary database solutions. The open source framework presented provides link reliability for any type of Smart Grid sensor and is interoperable with both existing proprietary and open database systems. The features of the authors' framework allow researchers and developers to focus on the core of their real-time or historical analysis applications, rather than having to spend time interfacing with complex protocols.
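The framework's two concerns, format heterogeneity and link reliability, can be sketched schematically. In the hypothetical Python below, differently formatted PMU payloads are normalised into one timestamped record, and a local queue keeps records through a telecoms outage until the time-series store accepts them; the parser names and the store's write() interface are invented for illustration, not the authors' API.

```python
# Schematic sketch of payload normalisation plus a retry buffer; the
# parsers and the store.write() interface are hypothetical.
import json
import time
from collections import deque

def parse_csv_pmu(line: str) -> dict:
    # e.g. "1700000000.0,50.02,230.1" -> timestamp, frequency, voltage
    ts, freq, volt = line.split(",")
    return {"ts": float(ts), "freq": float(freq), "volt": float(volt)}

def parse_json_pmu(payload: str) -> dict:
    d = json.loads(payload)
    return {"ts": d["time"], "freq": d["f"], "volt": d["v"]}

buffer: deque = deque()                  # survives short link failures

def archive(record: dict, store) -> None:
    """Queue the record, then flush as much of the queue as the link
    allows; failed writes stay queued for the next attempt."""
    buffer.append(record)
    while buffer:
        try:
            store.write(buffer[0])       # store: any DB client with write()
            buffer.popleft()
        except ConnectionError:
            time.sleep(1)                # back off and retry later
            break
```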
Abstract:
This case study deals with the role of time series analysis in sociology, and its relationship with the wider literature and methodology of comparative case study research. Time series analysis is now well represented in top-ranked sociology journals, often in the form of ‘pooled time series’ research designs. These studies typically pool multiple countries into a single time series cross-section panel, in order to provide a larger sample for more robust and comprehensive analysis. This approach is well suited to exploring trans-national phenomena, and to elaborating useful macro-level theories specific to social structures, national policies, and long-term historical processes. It is less suited, however, to understanding how these global social processes work in different countries. As such, the complexities of individual countries, which often display very different or even contradictory dynamics from those suggested in pooled studies, are subsumed. Meanwhile, a robust literature on comparative case-based methods exists in the social sciences, where researchers focus on differences between cases, and the complex ways in which they co-evolve or diverge over time. A good example is the inequality literature: although panel studies suggest a general trend of rising inequality driven by the weakening power of labour, marketisation of welfare, and the rising power of capital, some countries have still managed to remain resilient. This case study takes a closer look at what can be learned by applying the insights of case-based comparative research to the method of time series analysis. Taking international income inequality as its point of departure, it argues that we have much to learn about the viability of different combinations of policy options by examining how they work in different countries over time. By taking representative cases from different welfare systems (liberal, social democratic, corporatist, or antipodean), we can better sharpen our theories of how policies can be more specifically engineered to offset rising inequality. This involves a fundamental realignment of the strategy of time series analysis, grounding it instead in a qualitative appreciation of the historical context of cases, as a basis for comparing effects between different countries.
Abstract:
This thesis focuses on the application of optimal alarm systems to nonlinear time series models. The most common classes of models in the analysis of real-valued and integer-valued time series are described. The construction of optimal alarm systems is covered and its applications explored. Considering models with conditional heteroscedasticity, particular attention is given to the Fractionally Integrated Asymmetric Power ARCH, FIAPARCH(p, d, q), model, and an optimal alarm system is implemented following both classical and Bayesian methodologies. Taking into consideration the particular characteristics of the APARCH(p, q) representation for financial time series, a possible counterpart for modelling time series of counts is proposed: the INteger-valued Asymmetric Power ARCH, INAPARCH(p, q). The probabilistic properties of the INAPARCH(1, 1) model are comprehensively studied, the conditional maximum likelihood (ML) estimation method is applied, and the asymptotic properties of the conditional ML estimator are obtained. The final part of the work consists of the implementation of an optimal alarm system for the INAPARCH(1, 1) model. An application to a real data series is presented.
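For reference, the asymmetric power recursion at the heart of these models can be written down directly: sigma_t^delta = omega + alpha * (|e_{t-1}| - gamma * e_{t-1})^delta + beta * sigma_{t-1}^delta, where gamma captures the leverage asymmetry. The sketch below simulates an APARCH(1,1) process with invented parameter values; it is a plain illustration of the recursion, not the thesis's FIAPARCH or INAPARCH estimation code.

```python
# Simulate an APARCH(1,1) process; parameter values are illustrative.
import numpy as np

rng = np.random.default_rng(6)
n = 1000
omega, alpha, gamma, beta, delta = 0.05, 0.08, 0.4, 0.85, 1.5

e = np.zeros(n)
sig_d = np.full(n, omega / (1 - beta))   # sigma^delta, rough start value
for t in range(1, n):
    # sigma_t^delta responds asymmetrically to the sign of e_{t-1}.
    sig_d[t] = (omega
                + alpha * (abs(e[t - 1]) - gamma * e[t - 1]) ** delta
                + beta * sig_d[t - 1])
    e[t] = sig_d[t] ** (1 / delta) * rng.normal()

print("sample std of simulated returns:", e.std().round(3))
```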
Abstract:
In this paper we analyze the behavior of tornado time series in the U.S. from the perspective of dynamical systems. A tornado is a violently rotating column of air extending from a cumulonimbus cloud down to the ground. Such phenomena reveal features that are well described by power law functions and unveil characteristics found in systems with long-range memory effects. Tornado time series are viewed as the output of a complex system and are interpreted as a manifestation of its dynamics. Tornadoes are modeled as sequences of Dirac impulses with amplitude proportional to the event size. First, a collection of time series spanning 64 years is analyzed in the frequency domain by means of the Fourier transform. The amplitude spectra are approximated by power law functions and their parameters are read as an underlying signature of the system dynamics. Second, the concept of circular time is adopted and the collective behavior of tornadoes is analyzed. Clustering techniques are then adopted to identify and visualize the emerging patterns.
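The first stage of this pipeline can be sketched on synthetic data: impulses with heavy-tailed amplitudes, an amplitude spectrum via the Fourier transform, and a power-law fit in log-log coordinates whose exponent is read as a signature of the dynamics. The event record below is invented, so the fitted exponent carries no meteorological meaning.

```python
# Events as Dirac impulses -> amplitude spectrum -> power-law fit.
import numpy as np

rng = np.random.default_rng(7)
n_days = 64 * 365                        # 64 years at daily resolution
signal = np.zeros(n_days)
events = rng.choice(n_days, size=2000, replace=False)
signal[events] = rng.pareto(2.0, size=events.size) + 1   # heavy-tailed sizes

spectrum = np.abs(np.fft.rfft(signal))
freqs = np.fft.rfftfreq(n_days, d=1.0)   # cycles per day

# Fit |X(f)| ~ C * f^(-b) by linear regression in log-log space.
mask = freqs > 0                         # drop the DC component
slope, intercept = np.polyfit(np.log(freqs[mask]), np.log(spectrum[mask]), 1)
print(f"power-law exponent b = {-slope:.3f}")
```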