Biblioteca Digital

975 resultados para MULTIVARIATE MODELS

Annual forecasting of the Australian macadamia crop - integrating tree census data with statistical climate-adjustment models

Relevância:

30.00% 30.00%

Publicador:

Resumo:

To facilitate marketing and export, the Australian macadamia industry requires accurate crop forecasts. Each year, two levels of crop predictions are produced for this industry. The first is an overall longer-term forecast based on tree census data of growers in the Australian Macadamia Society (AMS). This data set currently accounts for around 70% of total production, and is supplemented by our best estimates of non-AMS orchards. Given these total tree numbers, average yields per tree are needed to complete the long-term forecasts. Yields from regional variety trials were initially used, but were found to be consistently higher than the average yields that growers were obtaining. Hence, a statistical model was developed using growers' historical yields, also taken from the AMS database. This model accounted for the effects of tree age, variety, year, region and tree spacing, and explained 65% of the total variation in the yield per tree data. The second level of crop prediction is an annual climate adjustment of these overall long-term estimates, taking into account the expected effects on production of the previous year's climate. This adjustment is based on relative historical yields, measured as the percentage deviance between expected and actual production. The dominant climatic variables are observed temperature, evaporation, solar radiation and modelled water stress. Initially, a number of alternate statistical models showed good agreement within the historical data, with jack-knife cross-validation R2 values of 96% or better. However, forecasts varied quite widely between these alternate models. Exploratory multivariate analyses and nearest-neighbour methods were used to investigate these differences. For 2001-2003, the overall forecasts were in the right direction (when compared with the long-term expected values), but were over-estimates. In 2004 the forecast was well under the observed production, and in 2005 the revised models produced a forecast within 5.1% of the actual production. Over the first five years of forecasting, the absolute deviance for the climate-adjustment models averaged 10.1%, just outside the targeted objective of 10%.

High-throughput prediction of eucalypt lignin syringyl/guaiacyl content using multivariate analysis: a comparison between mid-infrared, near-infrared, and Raman spectroscopies for model development

Relevância:

30.00% 30.00%

Publicador:

Resumo:

BACKGROUND: In order to rapidly and efficiently screen potential biofuel feedstock candidates for quintessential traits, robust high-throughput analytical techniques must be developed and honed. The traditional methods of measuring lignin syringyl/guaiacyl (S/G) ratio can be laborious, involve hazardous reagents, and/or be destructive. Vibrational spectroscopy can furnish high-throughput instrumentation without the limitations of the traditional techniques. Spectral data from mid-infrared, near-infrared, and Raman spectroscopies was combined with S/G ratios, obtained using pyrolysis molecular beam mass spectrometry, from 245 different eucalypt and Acacia trees across 17 species. Iterations of spectral processing allowed the assembly of robust predictive models using partial least squares (PLS). RESULTS: The PLS models were rigorously evaluated using three different randomly generated calibration and validation sets for each spectral processing approach. Root mean standard errors of prediction for validation sets were lowest for models comprised of Raman (0.13 to 0.16) and mid-infrared (0.13 to 0.15) spectral data, while near-infrared spectroscopy led to more erroneous predictions (0.18 to 0.21). Correlation coefficients (r) for the validation sets followed a similar pattern: Raman (0.89 to 0.91), mid-infrared (0.87 to 0.91), and near-infrared (0.79 to 0.82). These statistics signify that Raman and mid-infrared spectroscopy led to the most accurate predictions of S/G ratio in a diverse consortium of feedstocks. CONCLUSION: Eucalypts present an attractive option for biofuel and biochemical production. Given the assortment of over 900 different species of Eucalyptus and Corymbia, in addition to various species of Acacia, it is necessary to isolate those possessing ideal biofuel traits. This research has demonstrated the validity of vibrational spectroscopy to efficiently partition different potential biofuel feedstocks according to lignin S/G ratio, significantly reducing experiment and analysis time and expense while providing non-destructive, accurate, global, predictive models encompassing a diverse array of feedstocks.

Modeling Asymmetries in Financial Data with Multiplicative Error Models

Relevância:

30.00% 30.00%

Publicador:

Resumo:

This thesis addresses modeling of financial time series, especially stock market returns and daily price ranges. Modeling data of this kind can be approached with so-called multiplicative error models (MEM). These models nest several well known time series models such as GARCH, ACD and CARR models. They are able to capture many well established features of financial time series including volatility clustering and leptokurtosis. In contrast to these phenomena, different kinds of asymmetries have received relatively little attention in the existing literature. In this thesis asymmetries arise from various sources. They are observed in both conditional and unconditional distributions, for variables with non-negative values and for variables that have values on the real line. In the multivariate context asymmetries can be observed in the marginal distributions as well as in the relationships of the variables modeled. New methods for all these cases are proposed. Chapter 2 considers GARCH models and modeling of returns of two stock market indices. The chapter introduces the so-called generalized hyperbolic (GH) GARCH model to account for asymmetries in both conditional and unconditional distribution. In particular, two special cases of the GARCH-GH model which describe the data most accurately are proposed. They are found to improve the fit of the model when compared to symmetric GARCH models. The advantages of accounting for asymmetries are also observed through Value-at-Risk applications. Both theoretical and empirical contributions are provided in Chapter 3 of the thesis. In this chapter the so-called mixture conditional autoregressive range (MCARR) model is introduced, examined and applied to daily price ranges of the Hang Seng Index. The conditions for the strict and weak stationarity of the model as well as an expression for the autocorrelation function are obtained by writing the MCARR model as a first order autoregressive process with random coefficients. The chapter also introduces inverse gamma (IG) distribution to CARR models. The advantages of CARR-IG and MCARR-IG specifications over conventional CARR models are found in the empirical application both in- and out-of-sample. Chapter 4 discusses the simultaneous modeling of absolute returns and daily price ranges. In this part of the thesis a vector multiplicative error model (VMEM) with asymmetric Gumbel copula is found to provide substantial benefits over the existing VMEM models based on elliptical copulas. The proposed specification is able to capture the highly asymmetric dependence of the modeled variables thereby improving the performance of the model considerably. The economic significance of the results obtained is established when the information content of the volatility forecasts derived is examined.

Assessing resilience and state-transition models with historical records of cheatgrass Bromus tectorum invasion in North American sagebrush-steppe

Relevância:

30.00% 30.00%

Publicador:

Resumo:

1. Resilience-based approaches are increasingly being called upon to inform ecosystem management, particularly in arid and semi-arid regions. This requires management frameworks that can assess ecosystem dynamics, both within and between alternative states, at relevant time scales. 2. We analysed long-term vegetation records from two representative sites in the North American sagebrush-steppe ecosystem, spanning nine decades, to determine if empirical patterns were consistent with resilience theory, and to determine if cheatgrass Bromus tectorum invasion led to thresholds as currently envisioned by expert-based state-and-transition models (STM). These data span the entire history of cheatgrass invasion at these sites and provide a unique opportunity to assess the impacts of biotic invasion on ecosystem resilience. 3. We used univariate and multivariate statistical tools to identify unique plant communities and document the magnitude, frequency and directionality of community transitions through time. Community transitions were characterized by 37-47% dissimilarity in species composition, they were not evenly distributed through time, their frequency was not correlated with precipitation, and they could not be readily attributed to fire or grazing. Instead, at both sites, the majority of community transitions occurred within an 8-10year period of increasing cheatgrass density, became infrequent after cheatgrass density peaked, and thereafter transition frequency declined. 4. Greater cheatgrass density, replacement of native species and indication of asymmetry in community transitions suggest that thresholds may have been exceeded in response to cheatgrass invasion at one site (more arid), but not at the other site (less arid). Asymmetry in the direction of community transitions also identified communities that were at-risk' of cheatgrass invasion, as well as potential restoration pathways for recovery of pre-invasion states. 5. Synthesis and applications. These results illustrate the complexities associated with threshold identification, and indicate that criteria describing the frequency, magnitude, directionality and temporal scale of community transitions may provide greater insight into resilience theory and its application for ecosystem management. These criteria are likely to vary across biogeographic regions that are susceptible to cheatgrass invasion, and necessitate more in-depth assessments of thresholds and alternative states, than currently available.

Multivariate PLS Modeling of Apicomplexan FabD-Ligand Interaction Space for Mapping Target-Specific Chemical Space and Pharmacophore Fingerprints

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Biomolecular recognition underlying drug-target interactions is determined by both binding affinity and specificity. Whilst, quantification of binding efficacy is possible, determining specificity remains a challenge, as it requires affinity data for multiple targets with the same ligand dataset. Thus, understanding the interaction space by mapping the target space to model its complementary chemical space through computational techniques are desirable. In this study, active site architecture of FabD drug target in two apicomplexan parasites viz. Plasmodium falciparum (PfFabD) and Toxoplasma gondii (TgFabD) is explored, followed by consensus docking calculations and identification of fifteen best hit compounds, most of which are found to be derivatives of natural products. Subsequently, machine learning techniques were applied on molecular descriptors of six FabD homologs and sixty ligands to induce distinct multivariate partial-least square models. The biological space of FabD mapped by the various chemical entities explain their interaction space in general. It also highlights the selective variations in FabD of apicomplexan parasites with that of the host. Furthermore, chemometric models revealed the principal chemical scaffolds in PfFabD and TgFabD as pyrrolidines and imidazoles, respectively, which render target specificity and improve binding affinity in combination with other functional descriptors conducive for the design and optimization of the leads.

Proposal and validation of methodologies for the categorisation of continuous variables in the development of prediction models

Relevância:

30.00% 30.00%

Publicador:

Resumo:

158 p.

Use of SARIMA models to assess data-poor fisheries: a case study with a sciaenid fishery off Portugal

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Research on assessment and monitoring methods has primarily focused on fisheries with long multivariate data sets. Less research exists on methods applicable to data-poor fisheries with univariate data sets with a small sample size. In this study, we examine the capabilities of seasonal autoregressive integrated moving average (SARIMA) models to fit, forecast, and monitor the landings of such data-poor fisheries. We use a European fishery on meagre (Sciaenidae: Argyrosomus regius), where only a short time series of landings was available to model (n=60 months), as our case-study. We show that despite the limited sample size, a SARIMA model could be found that adequately fitted and forecasted the time series of meagre landings (12-month forecasts; mean error: 3.5 tons (t); annual absolute percentage error: 15.4%). We derive model-based prediction intervals and show how they can be used to detect problematic situations in the fishery. Our results indicate that over the course of one year the meagre landings remained within the prediction limits of the model and therefore indicated no need for urgent management intervention. We discuss the information that SARIMA model structure conveys on the meagre lifecycle and fishery, the methodological requirements of SARIMA forecasting of data-poor fisheries landings, and the capabilities SARIMA models present within current efforts to monitor the world’s data-poorest resources.

Analysis of multivariate stochastic signals sampled by on-line particle analyzers: Application to the quantitative assessment of occupational exposure to NOAA in multisource industrial scenarios (MSIS)

Relevância:

30.00% 30.00%

Publicador:

Resumo:

In multisource industrial scenarios (MSIS) coexist NOAA generating activities with other productive sources of airborne particles, such as parallel processes of manufacturing or electrical and diesel machinery. A distinctive characteristic of MSIS is the spatially complex distribution of aerosol sources, as well as their potential differences in dynamics, due to the feasibility of multi-task configuration at a given time. Thus, the background signal is expected to challenge the aerosol analyzers at a probably wide range of concentrations and size distributions, depending of the multisource configuration at a given time. Monitoring and prediction by using statistical analysis of time series captured by on-line particle analyzers in industrial scenarios, have been proven to be feasible in predicting PNC evolution provided a given quality of net signals (difference between signal at source and background). However the analysis and modelling of non-consistent time series, influenced by low levels of SNR (Signal-Noise Ratio) could build a misleading basis for decision making. In this context, this work explores the use of stochastic models based on ARIMA methodology to monitor and predict exposure values (PNC). The study was carried out in a MSIS where an case study focused on the manufacture of perforated tablets of nano-TiO2 by cold pressing was performed

Bayesian Analysis of Latent Threshold Dynamic Models

Relevância:

30.00% 30.00%

Publicador:

Resumo:

We discuss a general approach to dynamic sparsity modeling in multivariate time series analysis. Time-varying parameters are linked to latent processes that are thresholded to induce zero values adaptively, providing natural mechanisms for dynamic variable inclusion/selection. We discuss Bayesian model specification, analysis and prediction in dynamic regressions, time-varying vector autoregressions, and multivariate volatility models using latent thresholding. Application to a topical macroeconomic time series problem illustrates some of the benefits of the approach in terms of statistical and economic interpretations as well as improved predictions. Supplementary materials for this article are available online. © 2013 Copyright Taylor and Francis Group, LLC.

Bayesian Gaussian Copula Factor Models for Mixed Data.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Gaussian factor models have proven widely useful for parsimoniously characterizing dependence in multivariate data. There is a rich literature on their extension to mixed categorical and continuous variables, using latent Gaussian variables or through generalized latent trait models acommodating measurements in the exponential family. However, when generalizing to non-Gaussian measured variables the latent variables typically influence both the dependence structure and the form of the marginal distributions, complicating interpretation and introducing artifacts. To address this problem we propose a novel class of Bayesian Gaussian copula factor models which decouple the latent factors from the marginal distributions. A semiparametric specification for the marginals based on the extended rank likelihood yields straightforward implementation and substantial computational gains. We provide new theoretical and empirical justifications for using this likelihood in Bayesian inference. We propose new default priors for the factor loadings and develop efficient parameter-expanded Gibbs sampling for posterior computation. The methods are evaluated through simulations and applied to a dataset in political science. The models in this paper are implemented in the R package bfa.

Event models for tumor classification with SAGE gene expression data

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Serial Analysis of Gene Expression (SAGE) is a relatively new method for monitoring gene expression levels and is expected to contribute significantly to the progress in cancer treatment by enabling a precise and early diagnosis. A promising application of SAGE gene expression data is classification of tumors. In this paper, we build three event models (the multivariate Bernoulli model, the multinomial model and the normalized multinomial model) for SAGE data classification. Both binary classification and multicategory classification are investigated. Experiments on two SAGE datasets show that the multivariate Bernoulli model performs well with small feature sizes, but the multinomial performs better at large feature sizes, while the normalized multinomial performs well with medium feature sizes. The multinomial achieves the highest overall accuracy.

Comparison of new and primary production models using SeaWiFS data in contrasting hydrographic zones of the northern North Atlantic.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

The accuracy of two satellite models of marine primary (PP) and new production (NP) were assessed against 14C and 15N uptake measurements taken during six research cruises in the northern North Atlantic. The wavelength resolving model (WRM) was more accurate than the Vertical General Production Model (VGPM) for computation of both PP and NP. Mean monthly satellite maps of PP and NP for both models were generated from 1997 to 2010 using SeaWiFS data for the Irminger basin and North Atlantic. Intra- and inter-annual variability of the two models was compared in six hydrographic zones. Both models exhibited similar spatio-temporal patterns: PP and NP increased from April to June and decreased by August. Higher values were associated with the East Greenland Current (EGC), Iceland Basin (ICB) and the Reykjanes Ridge (RKR) and lower values occurred in the Central Irminger Current (CIC), North Irminger Current (NIC) and Southern Irminger Current (SIC). The annual PP and NP over the SeaWiFS record was 258 and 82 gC m-2 yr-1 respectively for the VGPM and 190 and 41 gC m-2 yr-1 for the WRM. Average annual cumulative sum in the anomalies of NP for the VGPM were positively correlated with the North Atlantic Oscillation (NAO) in the EGC, CIC and SIC and negatively correlated with the multivariate ENSO index (MEI) in the ICB. By contrast, cumulative sum of the anomalies of NP for the WRM were significantly correlated with NAO only in the EGC and CIC. NP from both VGPM and WRM exhibited significant negative correlations with Arctic Oscillation (AO) in all hydrographic zones. The differences in estimates of PP and NP in these hydrographic zones arise principally from the parameterisation of the euphotic depth and the SST dependence of photo-physiological term in the VGPM, which has a greater sensitivity to variations in temperature than the WRM. In waters of 0 to 5C PP using the VGPM was 43% higher than WRM, from 5 to 10C the VGPM was 29% higher and from 10 to 15C the VGPM was 27% higher.

Multivariate Statistical Analysis Applied to an IL6 Signal Transduction Model in Hepatocytes

Relevância:

30.00% 30.00%

Publicador:

Resumo:

This paper introduces the application of linear multivariate statistical techniques, including partial least squares (PLS), canonical correlation analysis (CCA) and reduced rank regression (RRR), into the area of Systems Biology. This new approach aims to extract the important proteins embedded in complex signal transduction pathway models.The analysis is performed on a model of intracellular signalling along the janus-associated kinases/signal transducers and transcription factors (JAK/STAT) and mitogen activated protein kinases (MAPK) signal transduction pathways in interleukin-6 (IL6) stimulated hepatocytes, which produce signal transducer and activator of transcription factor 3 (STAT3).A region of redundancy within the MAPK pathway that does not affect the STAT3 transcription was identified using CCA. This is the core finding of this analysis and cannot be obtained by inspecting the model by eye. In addition, RRR was found to isolate terms that do not significantly contribute to changes in protein concentrations, while the application of PLS does not provide such a detailed picture by virtue of its construction.This analysis has a similar objective to conventional model reduction techniques with the advantage of maintaining the meaning of the states prior to and after the reduction process. A significant model reduction is performed, with a marginal loss in accuracy, offering a more concise model while maintaining the main influencing factors on the STAT3 transcription.The findings offer a deeper understanding of the reaction terms involved, confirm the relevance of several proteins to the production of Acute Phase Proteins and complement existing findings regarding cross-talk between the two signalling pathways.

Dynamic Density Forecasts for Multivariate Asset Returns

Relevância:

30.00% 30.00%

Publicador:

Resumo:

We propose a simple and flexible framework for forecasting the joint density of asset returns. The multinormal distribution is augmented with a polynomial in (time-varying) non-central co-moments of assets. We estimate the coefficients of the polynomial via the Method of Moments for a carefully selected set of co-moments. In an extensive empirical study, we compare the proposed model with a range of other models widely used in the literature. Employing a recently proposed as well as standard techniques to evaluate multivariate forecasts, we conclude that the augmented joint density provides highly accurate forecasts of the “negative tail” of the joint distribution.

An Examination of Transformation Techniques to Investigate and Interpret Multivariate Geochemical Data Analysis: Tellus Case Study

Relevância:

30.00% 30.00%

Publicador:

Resumo:

This research aims to use the multivariate geochemical dataset, generated by the Tellus project, to investigate the appropriate use of transformation methods to maintain the integrity of geochemical data and inherent constrained behaviour in multivariate relationships. The widely used normal score transform is compared with the use of a stepwise conditional transform technique. The Tellus Project, managed by GSNI and funded by the Department of Enterprise Trade and Development and the EU’s Building Sustainable Prosperity Fund, involves the most comprehensive geological mapping project ever undertaken in Northern Ireland. Previous study has demonstrated spatial variability in the Tellus data but geostatistical analysis and interpretation of the datasets requires use of an appropriate methodology that reproduces the inherently complex multivariate relations. Previous investigation of the Tellus geochemical data has included use of Gaussian-based techniques. However, earth science variables are rarely Gaussian, hence transformation of data is integral to the approach. The multivariate geochemical dataset generated by the Tellus project provides an opportunity to investigate the appropriate use of transformation methods, as required for Gaussian-based geostatistical analysis. In particular, the stepwise conditional transform is investigated and developed for the geochemical datasets obtained as part of the Tellus project. The transform is applied to four variables in a bivariate nested fashion due to the limited availability of data. Simulation of these transformed variables is then carried out, along with a corresponding back transformation to original units. Results show that the stepwise transform is successful in reproducing both univariate statistics and the complex bivariate relations exhibited by the data. Greater fidelity to multivariate relationships will improve uncertainty models, which are required for consequent geological, environmental and economic inferences.

«
1
2
...
7
8
9
10
11
12
13
...
64
65
»