983 results for Statistical Prediction
Abstract:
In this contribution we aim at anchoring Agent-Based Modeling (ABM) simulations in actual models of human psychology. More specifically, we apply unidirectional ABM to social psychological models using low-level (i.e., intra-individual) agents to examine whether they generate better predictions than standard statistical approaches of the intention to perform a behavior and of the behavior itself. Moreover, this contribution tests to what extent the predictive validity of models of attitude such as the Theory of Planned Behavior (TPB) or the Model of Goal-directed Behavior (MGB) depends on the assumption that people's decisions and actions are purely rational. Simulations were therefore run with different degrees of deviation from rationality in the agents, using a trembling hand method. Two data sets were used, concerning the consumption of soft drinks and physical activity, respectively. Three key findings emerged from the simulations. First, compared to the standard statistical approach, the agent-based simulation generally improves the prediction of behavior from intention. Second, the improvement in prediction is inversely proportional to the complexity of the underlying theoretical model. Finally, the introduction of varying degrees of deviation from rationality in agents’ behavior can lead to an improvement in the goodness of fit of the simulations. By demonstrating the potential of ABM as a complementary perspective for evaluating social psychological models, this contribution underlines the necessity of better defining agents in terms of psychological processes before examining higher levels such as the interactions between individuals.
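The trembling hand idea mentioned in this abstract can be illustrated with a minimal sketch. It is not the authors' simulation: the function names, the 0.5 decision threshold, and the `epsilon` parameter are assumptions made for illustration. An agent takes the action implied by its intention, except that with probability `epsilon` its hand "trembles" and it takes the opposite action.

```python
import random

def act(intention, epsilon, rng, threshold=0.5):
    """Return 1 if the agent performs the behavior, else 0.

    intention: hypothetical intention score in [0, 1].
    epsilon:   probability of deviating from the rational choice.
    threshold: assumed cut-off above which acting is the rational choice.
    """
    rational_choice = 1 if intention > threshold else 0
    if rng.random() < epsilon:
        return 1 - rational_choice  # the hand "trembles"
    return rational_choice

def simulate(intentions, epsilon, seed=0):
    """Simulate one decision per agent with a seeded random generator."""
    rng = random.Random(seed)
    return [act(i, epsilon, rng) for i in intentions]

intentions = [0.9, 0.2, 0.7, 0.4]
print(simulate(intentions, epsilon=0.0))  # purely rational agents
print(simulate(intentions, epsilon=0.1))  # occasional deviations
```

Sweeping `epsilon` over a range and comparing simulated behavior with observed behavior is one way such a deviation parameter could be tuned for goodness of fit.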
Abstract:
Accurate decadal climate predictions could be used to inform adaptation actions to a changing climate. The skill of such predictions from initialised dynamical global climate models (GCMs) may be assessed by comparing with predictions from statistical models which are based solely on historical observations. This paper presents two benchmark statistical models for predicting both the radiatively forced trend and internal variability of annual mean sea surface temperatures (SSTs) on a decadal timescale based on the gridded observation data set HadISST. For both statistical models, the trend related to radiative forcing is modelled using a linear regression of SST time series at each grid box on the time series of equivalent global mean atmospheric CO2 concentration. The residual internal variability is then modelled by (1) a first-order autoregressive model (AR1) and (2) a constructed analogue model (CA). From the verification of 46 retrospective forecasts with start years from 1960 to 2005, the correlation coefficient for anomaly forecasts using trend with AR1 is greater than 0.7 over parts of extra-tropical North Atlantic, the Indian Ocean and western Pacific. This is primarily related to the prediction of the forced trend. More importantly, both CA and AR1 give skillful predictions of the internal variability of SSTs in the subpolar gyre region over the far North Atlantic for lead time of 2 to 5 years, with correlation coefficients greater than 0.5. For the subpolar gyre and parts of the South Atlantic, CA is superior to AR1 for lead time of 6 to 9 years. These statistical forecasts are also compared with ensemble mean retrospective forecasts by DePreSys, an initialised GCM. 
DePreSys is found to outperform the statistical models over large parts of the North Atlantic for lead times of 2 to 5 years and 6 to 9 years. However, trend with AR1 is generally superior to DePreSys in the North Atlantic Current region, while trend with CA is superior to DePreSys in parts of the South Atlantic for a lead time of 6 to 9 years. These findings encourage further development of benchmark statistical decadal prediction models, and of methods to combine different predictions.
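The "trend with AR1" benchmark described in this abstract can be sketched for a single grid box as follows. This is a simplified illustration under stated assumptions, not the paper's code: the function names and the toy numbers are hypothetical, the regression is ordinary least squares on the CO2 series, and the AR(1) coefficient is estimated without an intercept from the regression residuals.

```python
def fit_line(x, y):
    """Ordinary least squares fit y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def fit_ar1(residuals):
    """Lag-1 autoregressive coefficient (no intercept); 0 if undefined."""
    num = sum(a * b for a, b in zip(residuals[:-1], residuals[1:]))
    den = sum(a * a for a in residuals[:-1])
    return num / den if den != 0 else 0.0

def forecast(co2, sst, co2_future, lead):
    """Forced trend at co2_future plus the AR(1)-damped last residual."""
    slope, intercept = fit_line(co2, sst)
    residuals = [y - (intercept + slope * x) for x, y in zip(co2, sst)]
    phi = fit_ar1(residuals)
    return intercept + slope * co2_future + (phi ** lead) * residuals[-1]

# Toy numbers for illustration only (not observations).
co2 = [300.0, 310.0, 320.0, 330.0]
sst = [10.0, 10.6, 10.9, 11.6]
print(forecast(co2, sst, co2_future=340.0, lead=2))
```

Because the residual term is multiplied by `phi ** lead`, its contribution decays with lead time, so at long leads the forecast tends toward the forced trend alone.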
Abstract:
A statistical model is derived relating the diurnal variation of sea surface temperature (SST) to the net surface heat flux and surface wind speed from a numerical weather prediction (NWP) model. The model is derived using fluxes and winds from the European Centre for Medium-Range Weather Forecasts (ECMWF) NWP model and SSTs from the Spinning Enhanced Visible and Infrared Imager (SEVIRI). In the model, diurnal warming has a linear dependence on the net surface heat flux integrated since (approximately) dawn and an inverse quadratic dependence on the maximum of the surface wind speed in the same period. The model coefficients are found by matching, for a given integrated heat flux, the frequency distributions of the maximum wind speed and the observed warming. Diurnal cooling, where it occurs, is modelled as proportional to the integrated heat flux divided by the heat capacity of the seasonal mixed layer. The model reproduces the statistics (mean, standard deviation, and 95th percentile) of the diurnal variation of SST seen by SEVIRI and reproduces the geographical pattern of mean warming seen by the Advanced Microwave Scanning Radiometer (AMSR-E). We use the functional dependencies in the statistical model to test the behaviour of two physical models of diurnal warming that display contrasting systematic errors.
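The functional form described in this abstract can be written down as a short sketch. The coefficients `a` and `b` below are placeholders, not the fitted values from the paper, and treating a negative integrated flux as the cooling case is an assumption made for illustration.

```python
def diurnal_sst_change(q_int, u_max, heat_capacity, a=1.0, b=1.0):
    """Sketch of the diurnal SST change implied by the abstract.

    q_int:         net surface heat flux integrated since (approximately) dawn.
    u_max:         maximum surface wind speed over the same period.
    heat_capacity: heat capacity of the seasonal mixed layer.
    a, b:          hypothetical coefficients (placeholders, not fitted values).
    """
    if q_int >= 0:
        # Warming: linear in integrated flux, inverse quadratic in wind speed.
        return a * q_int / u_max ** 2
    # Cooling (assumed to correspond to negative integrated flux).
    return b * q_int / heat_capacity

print(diurnal_sst_change(8.0, 2.0, 4.0))  # warming case
print(diurnal_sst_change(8.0, 4.0, 4.0))  # doubling the wind quarters the warming
```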
Abstract:
A statistical–dynamical downscaling (SDD) approach for the regionalization of wind energy output (Eout) over Europe with special focus on Germany is proposed. SDD uses an extended circulation weather type (CWT) analysis on global daily mean sea level pressure fields with the central point being located over Germany. Seventy-seven weather classes based on the associated CWT and the intensity of the geostrophic flow are identified. Representatives of these classes are dynamically downscaled with the regional climate model COSMO-CLM. By using weather class frequencies of different data sets, the simulated representatives are recombined to probability density functions (PDFs) of near-surface wind speed and finally to Eout of a sample wind turbine for present and future climate. This is performed for reanalysis, decadal hindcasts and long-term future projections. For evaluation purposes, results of SDD are compared to wind observations and to simulated Eout of purely dynamical downscaling (DD) methods. For the present climate, SDD is able to simulate realistic PDFs of 10-m wind speed for most stations in Germany. The resulting spatial Eout patterns are similar to DD-simulated Eout. In terms of decadal hindcasts, results of SDD are similar to DD-simulated Eout over Germany, Poland, the Czech Republic, and Benelux, for which high correlations between annual Eout time series of SDD and DD are detected for selected hindcasts. Lower correlation is found for other European countries. It is demonstrated that SDD can be used to downscale the full ensemble of the Earth System Model of the Max Planck Institute (MPI-ESM) decadal prediction system. Long-term climate change projections in Special Report on Emission Scenarios of ECHAM5/MPI-OM as obtained by SDD agree well with the results of other studies using DD methods, with increasing Eout over northern Europe and a negative trend over southern Europe. 
Despite some biases, it is concluded that SDD is an adequate tool to assess regional wind energy changes in large model ensembles.
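The recombination step of SDD, in which per-class wind-speed PDFs are weighted by weather class frequencies and then converted to energy output, can be sketched as below. This is an illustration only: the class names, the three-bin histograms, and the power values per bin are made up, and the real method uses seventy-seven classes and a turbine power curve.

```python
def recombine(class_pdfs, class_freqs):
    """Frequency-weighted mixture of per-class wind-speed histograms."""
    total = sum(class_freqs.values())
    n_bins = len(next(iter(class_pdfs.values())))
    mixed = [0.0] * n_bins
    for name, pdf in class_pdfs.items():
        weight = class_freqs.get(name, 0) / total
        for i, p in enumerate(pdf):
            mixed[i] += weight * p
    return mixed

def expected_output(pdf, power_per_bin):
    """Expected turbine output: sum over bins of probability times power."""
    return sum(p * e for p, e in zip(pdf, power_per_bin))

# Hypothetical inputs: two weather classes, three wind-speed bins.
class_pdfs = {"W": [0.0, 0.5, 0.5], "E": [1.0, 0.0, 0.0]}
class_freqs = {"W": 3, "E": 1}       # days per class in some data set
power_per_bin = [0.0, 100.0, 400.0]  # assumed power for each wind-speed bin

mixed = recombine(class_pdfs, class_freqs)
print(mixed, expected_output(mixed, power_per_bin))
```

Only `class_freqs` changes between reanalysis, hindcasts, and projections in this sketch, which reflects why the downscaled class representatives need to be simulated only once.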
Abstract:
Preparing for episodes with risks of anomalous weather a month to a year ahead is an important challenge for governments, non-governmental organisations, and private companies and is dependent on the availability of reliable forecasts. The majority of operational seasonal forecasts are made using process-based dynamical models, which are complex, computationally challenging and prone to biases. Empirical forecast approaches built on statistical models to represent physical processes offer an alternative to dynamical systems and can provide either a benchmark for comparison or independent supplementary forecasts. Here, we present a simple empirical system based on multiple linear regression for producing probabilistic forecasts of seasonal surface air temperature and precipitation across the globe. The global CO2-equivalent concentration is taken as the primary predictor; subsequent predictors, including large-scale modes of variability in the climate system and local-scale information, are selected on the basis of their physical relationship with the predictand. The focus given to the climate change signal as a source of skill and the probabilistic nature of the forecasts produced constitute a novel approach to global empirical prediction. Hindcasts for the period 1961–2013 are validated against observations using deterministic (correlation of seasonal means) and probabilistic (continuous ranked probability skill scores) metrics. Good skill is found in many regions, particularly for surface air temperature and most notably in much of Europe during the spring and summer seasons. For precipitation, skill is generally limited to regions with known El Niño–Southern Oscillation (ENSO) teleconnections. The system is used in a quasi-operational framework to generate empirical seasonal forecasts on a monthly basis.
Abstract:
This thesis develops and evaluates statistical methods for different types of genetic analyses, including quantitative trait loci (QTL) analysis, genome-wide association study (GWAS), and genomic evaluation. The main contribution of the thesis is to provide novel insights in modeling genetic variance, especially via random effects models. In variance component QTL analysis, a full likelihood model accounting for uncertainty in the identity-by-descent (IBD) matrix was developed. It was found to be able to correctly adjust the bias in genetic variance component estimation and gain power in QTL mapping in terms of precision. Double hierarchical generalized linear models, and a non-iterative simplified version, were implemented and applied to fit data of an entire genome. These whole genome models were shown to have good performance in both QTL mapping and genomic prediction. A re-analysis of a publicly available GWAS data set identified significant loci in Arabidopsis that control phenotypic variance instead of mean, which validated the idea of variance-controlling genes. The works in the thesis are accompanied by R packages available online, including a general statistical tool for fitting random effects models (hglm), an efficient generalized ridge regression for high-dimensional data (bigRR), a double-layer mixed model for genomic data analysis (iQTL), a stochastic IBD matrix calculator (MCIBD), a computational interface for QTL mapping (qtl.outbred), and a GWAS analysis tool for mapping variance-controlling loci (vGWAS).
Abstract:
Fundação de Amparo à Pesquisa do Estado de São Paulo (FAPESP)
Abstract:
The identification of genes essential for survival is important for the understanding of the minimal requirements for cellular life and for drug design. As experimental studies with the purpose of building a catalog of essential genes for a given organism are time-consuming and laborious, a computational approach which could predict gene essentiality with high accuracy would be of great value. We present here a novel computational approach, called NTPGE (Network Topology-based Prediction of Gene Essentiality), that relies on the network topology features of a gene to estimate its essentiality. The first step of NTPGE is to construct the integrated molecular network for a given organism comprising protein physical, metabolic and transcriptional regulation interactions. The second step consists in training a decision-tree-based machine-learning algorithm on known essential and non-essential genes of the organism of interest, considering as learning attributes the network topology information for each of these genes. Finally, the decision-tree classifier generated is applied to the set of genes of this organism to estimate essentiality for each gene. We applied the NTPGE approach for discovering the essential genes in Escherichia coli and then assessed its performance. (C) 2007 Elsevier B.V. All rights reserved.
Abstract:
The code STATFLUX, implementing a new and simple statistical procedure for the calculation of transfer coefficients in radionuclide transport to animals and plants, is proposed. The method is based on the general multiple-compartment model, which uses a system of linear equations involving geometrical volume considerations. Flow parameters were estimated by employing two different least-squares procedures, the Derivative and Gauss-Marquardt methods, with the available experimental data of radionuclide concentrations as the input functions of time. The solution of the inverse problem, which relates a given set of flow parameters to the time evolution of concentration functions, is achieved via a Monte Carlo simulation procedure.
Program summary
Title of program: STATFLUX
Catalogue identifier: ADYS_v1_0
Program summary URL: http://cpc.cs.qub.ac.uk/summaries/ADYS_v1_0
Program obtainable from: CPC Program Library, Queen's University of Belfast, N. Ireland
Licensing provisions: none
Computer for which the program is designed and others on which it has been tested: Micro-computer with Intel Pentium III, 3.0 GHz
Installation: Laboratory of Linear Accelerator, Department of Experimental Physics, University of São Paulo, Brazil
Operating system: Windows 2000 and Windows XP
Programming language used: Fortran-77 as implemented in Microsoft Fortran 4.0. NOTE: Microsoft Fortran includes non-standard features which are used in this program. Standard Fortran compilers such as g77, f77, ifort and NAG95 are not able to compile the code, and therefore it has not been possible for the CPC Program Library to test the program.
Memory required to execute with typical data: 8 Mbytes of RAM and 100 MB of hard disk space
No. of bits in a word: 16
No. of lines in distributed program, including test data, etc.: 6912
No. of bytes in distributed program, including test data, etc.: 229 541
Distribution format: tar.gz
Nature of the physical problem: The investigation of transport mechanisms for radioactive substances through environmental pathways is very important for the radiological protection of populations. One such pathway, associated with the food chain, is the grass-animal-man sequence. The distribution of trace elements in humans and laboratory animals has been intensively studied over the past 60 years [R.C. Pendlenton, C.W. Mays, R.D. Lloyd, A.L. Brooks, Differential accumulation of iodine-131 from local fallout in people and milk, Health Phys. 9 (1963) 1253-1262]. In addition, investigations on the incidence of cancer in humans, and a possible causal relationship to radioactive fallout, have been undertaken [E.S. Weiss, M.L. Rallison, W.T. London, W.T. Carlyle Thompson, Thyroid nodularity in southwestern Utah school children exposed to fallout radiation, Amer. J. Public Health 61 (1971) 241-249; M.L. Rallison, B.M. Dobyns, F.R. Keating, J.E. Rall, F.H. Tyler, Thyroid diseases in children, Amer. J. Med. 56 (1974) 457-463; J.L. Lyon, M.R. Klauber, J.W. Gardner, K.S. Udall, Childhood leukemia associated with fallout from nuclear testing, N. Engl. J. Med. 300 (1979) 397-402]. Of the pathways of entry of radionuclides into the human (or animal) body, ingestion is the most important because it is closely related to life-long alimentary (or dietary) habits. Those radionuclides which are able to enter living cells by either metabolic or other processes give rise to localized doses which can be very high. The evaluation of these internally localized doses is of paramount importance for the assessment of radiobiological risks and radiological protection. The time behavior of trace concentration in organs is the principal input for prediction of internal doses after acute or chronic exposure. The General Multiple-Compartment Model (GMCM) is a powerful and widely accepted method for biokinetic studies, which allows the calculation of the concentration of trace elements in organs as a function of time when the flow parameters of the model are known. However, few biokinetic data exist in the literature, and the determination of flow and transfer parameters by statistical fitting for each system is an open problem.
Restriction on the complexity of the problem: This version of the code works with the constant volume approximation, which is valid for many situations where the biological half-life of a trace is lower than the volume rise time. Another restriction is related to the central flux model. The model considered in the code assumes that there is one central compartment (e.g., blood) that connects the flow with all other compartments; flow between the other compartments is not included.
Typical running time: Depends on the choice of calculation. Using the Derivative Method the time is very short (a few minutes) for any number of compartments considered. When the Gauss-Marquardt iterative method is used, the calculation time can be about 5-6 hours when approximately 15 compartments are considered.
(C) 2006 Elsevier B.V. All rights reserved.
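The central flux restriction described above, with one central compartment exchanging with every other compartment and no direct flow between the others, can be illustrated with a small forward simulation. This is not STATFLUX, which is written in Fortran and solves the inverse problem of estimating the flow parameters. The sketch below only integrates the forward model with an explicit Euler step, and the function name, rate constants, and step size are all hypothetical.

```python
def simulate_central_model(initial, k_out, k_in, dt, steps):
    """Forward-simulate tracer amounts in a central-compartment model.

    initial: amounts per compartment; index 0 is the central one (e.g., blood).
    k_out:   hypothetical rate constants, central -> peripheral i.
    k_in:    hypothetical rate constants, peripheral i -> central.
    Peripheral compartments exchange only with the central compartment.
    """
    amounts = list(initial)
    for _ in range(steps):
        central = amounts[0]
        updated = list(amounts)
        for i in range(1, len(amounts)):
            # Net transfer from central to compartment i during one step.
            flow = (k_out[i - 1] * central - k_in[i - 1] * amounts[i]) * dt
            updated[0] -= flow
            updated[i] += flow
        amounts = updated
    return amounts

# One unit of tracer injected into the central compartment at time zero.
final = simulate_central_model([1.0, 0.0, 0.0], k_out=[0.3, 0.1],
                               k_in=[0.05, 0.02], dt=0.01, steps=1000)
print(final, sum(final))
```

A fitting procedure would wrap such a forward model and adjust `k_out` and `k_in` until the simulated curves match the measured concentrations.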
Abstract:
Predictability is related to the uncertainty in the outcome of future events during the evolution of the state of a system. Cluster weighted modeling (CWM) is interpreted as a tool to detect such uncertainty and is used in spatially distributed systems. As such, the simple prediction algorithm in conjunction with CWM forms a powerful set of methods to relate predictability and dimension.
Abstract:
The present paper deals with estimation of variance components, prediction of breeding values and selection in a population of rubber tree [Hevea brasiliensis (Willd. ex Adr. de Juss.) Müell.-Arg.] from Rio Branco, State of Acre, Brazil. The REML/BLUP (restricted maximum likelihood/best linear unbiased prediction) procedure was applied. For this purpose, 37 rubber tree families were obtained and assessed in a randomized complete block design, with three unbalanced replications. The field trial was carried out at the Experimental Station of UNESP, located in Selvíria, State of Mato Grosso do Sul, Brazil. The quantitative traits evaluated were: girth (G), bark thickness (BT), number of latex vessel rings (NR), and plant height (PH). Given the unbalanced condition of the progeny test, the REML/BLUP procedure was used for estimation. The narrow-sense individual heritability estimates were 0.43 for G, 0.18 for BT, 0.01 for NR, and 0.51 for PH. Two selection strategies were adopted: one short-term (ST - selection intensity of 8.85%) and the other long-term (LT - selection intensity of 26.56%). For G, the estimated genetic gains in relation to the population average were 26.80% and 17.94%, respectively, according to the ST and LT strategies. The effective population sizes were 22.35 and 46.03, respectively. The LT and ST strategies maintained 45.80% and 28.24%, respectively, of the original genetic diversity represented in the progeny test. So, it can be inferred that this population has potential for both breeding and ex situ genetic conservation as a supplier of genetic material for advanced rubber tree breeding programs. Copyright by the Brazilian Society of Genetics.
Abstract:
Structural health monitoring (SHM) is related to the ability to monitor the state and decide the level of damage or deterioration within aerospace, civil and mechanical systems. In this sense, this paper deals with the application of a two-step auto-regressive and auto-regressive with exogenous inputs (AR-ARX) model for linear prediction in damage diagnosis of structural systems. This damage detection algorithm is based on monitoring the residual error, obtained through vibration response measurements, as a damage-sensitive index. In complex structures there are many positions under observation and a large amount of data to be handled, which makes the signals difficult to visualize. This paper also investigates data compression by using principal component analysis. In order to establish a threshold value, fuzzy c-means clustering is used to quantify the damage-sensitive index in an unsupervised learning mode. Tests are performed on the benchmark problem proposed by IASC-ASCE, with different damage patterns. The diagnosis obtained showed high correlation with the actual integrity state of the structure. Copyright © 2007 by ABCM.
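The idea of using a prediction residual as a damage-sensitive index can be sketched in a much reduced form. The paper uses a two-step AR-ARX model; the sketch below swaps in a plain AR(1) model fitted to a baseline (undamaged) signal, and the function names and toy signals are assumptions made for illustration.

```python
import math

def fit_ar1_coefficient(signal):
    """Least-squares AR(1) coefficient (no intercept) of a baseline signal."""
    num = sum(a * b for a, b in zip(signal[:-1], signal[1:]))
    den = sum(a * a for a in signal[:-1])
    return num / den

def residual_rms(signal, phi):
    """RMS one-step prediction error of `signal` under the baseline model."""
    residuals = [b - phi * a for a, b in zip(signal[:-1], signal[1:])]
    return math.sqrt(sum(r * r for r in residuals) / len(residuals))

# Toy signals: the baseline decays by 0.5 per step, the other by 0.9.
baseline = [1.0, 0.5, 0.25, 0.125, 0.0625]
changed = [1.0, 0.9, 0.81, 0.729]

phi = fit_ar1_coefficient(baseline)
print(residual_rms(baseline, phi))  # small: the model fits its own data
print(residual_rms(changed, phi))   # larger: the dynamics have changed
```

A threshold on this index then separates "unchanged" from "changed" conditions; the abstract sets that threshold with fuzzy c-means clustering, which is not shown here.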
Abstract:
This paper introduces a methodology for predicting the surface roughness of advanced ceramics using an Adaptive Neuro-Fuzzy Inference System (ANFIS). To this end, a grinding machine was used, equipped with an acoustic emission sensor and a power transducer connected to the electric motor rotating the diamond grinding wheel. The alumina workpieces used in this work were pressed and sintered into rectangular bars. Acoustic emission and cutting power signals were collected during the tests and digitally processed to calculate the mean, the standard deviation, and two other statistics. These statistics, as well as the root mean square of the acoustic emission and cutting power signals, were used as input data for ANFIS. The surface roughness values measured during the tests were used as outputs for training and validation of the model. The results indicated that an ANFIS network is an excellent tool for predicting the surface roughness of ceramic workpieces in the grinding process.
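The signal statistics named in this abstract (mean, standard deviation, and root mean square) can be computed as in the sketch below. The function name and sample values are hypothetical, the population standard deviation is an assumed choice, and the two other statistics that the abstract does not name are left out.

```python
import math
import statistics

def signal_features(samples):
    """Mean, population standard deviation, and RMS of a sampled signal."""
    mean = statistics.fmean(samples)
    std = statistics.pstdev(samples)
    rms = math.sqrt(statistics.fmean(s * s for s in samples))
    return {"mean": mean, "std": std, "rms": rms}

# Hypothetical acoustic emission samples (arbitrary units).
print(signal_features([3.0, -3.0, 3.0, -3.0]))
```

One such feature set per signal and per grinding pass would form the input vector of the inference model, with the measured roughness as the target.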
Abstract:
The purpose of this study was to compare, using cephalometric analysis (McNamara, and Legan and Burstone), prediction tracings produced by three different methods (manual tracing and the Dentofacial Planner Plus and Dolphin Image computer programs) with postoperative outcomes. Pre- and postoperative (6 months after surgery) lateral cephalometric radiographs were selected from 25 long-faced patients treated with combined surgery. Prediction tracings were made with each method and compared cephalometrically with the postoperative results. This protocol was repeated once more for method error evaluation. Statistical analysis was performed using ANOVA and the Tukey test. The results showed superior predictability when the manual method was applied (50% similarity to postoperative results), followed by Dentofacial Planner Plus (31.2%) and Dolphin Image (18.8%). The experimental condition suggests that the manual method provides greater accuracy, although the predictability of the digital methods proved quite satisfactory. © 2013 World Federation of Orthodontists.
Abstract:
This study aimed to investigate the potential use of magnetic susceptibility (MS) as a pedotransfer function to predict soil attributes under two sugarcane harvesting management systems. For each area of 1 ha (one under mechanized green sugarcane harvesting and the other under manual burnt sugarcane harvesting), 126 soil samples were collected and subjected to laboratory analysis to determine soil physical, chemical and mineralogical attributes and to measure MS. The data were summarized with descriptive statistics (mean and coefficient of variation). Means for the two harvesting management systems were compared with the Tukey test at a significance level of 5%. Correlation analysis was used to investigate the relationship between MS and other soil properties, and multiple linear regression was used to assess how MS contributes to the prediction of complex soil attributes. The results demonstrate that, in both sugarcane harvesting management systems, MS showed statistical correlation with chemical, physical and mineralogical soil attributes, and that it has potential to be used as a pedotransfer function to predict attributes of the studied oxisol.
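The correlation step in this abstract can be sketched with a plain Pearson coefficient between MS and one soil attribute. The function name and the numbers are made up for illustration and are not data from the study, which also applies multiple linear regression (not shown here).

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

# Toy values only: MS readings and a hypothetical soil attribute.
ms = [1.0, 2.0, 3.0, 4.0]
attribute = [10.0, 14.0, 15.0, 21.0]
print(pearson(ms, attribute))
```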