30 results for Statistical Theory
Abstract:
Markov chain Monte Carlo (MCMC) is a methodology that is gaining widespread use in the phylogenetics community and is central to phylogenetic software packages such as MrBayes. An important issue for users of MCMC methods is how to select appropriate values for adjustable parameters such as the length of the Markov chain or chains, the sampling density, the proposal mechanism, and, if Metropolis-coupled MCMC is being used, the number of heated chains and their temperatures. Although some parameter settings have been examined in detail in the literature, others are frequently chosen with more regard to computational time or personal experience with other data sets. Such choices may lead to inadequate sampling of tree space or an inefficient use of computational resources. We performed a detailed study of convergence and mixing for 70 randomly selected, putatively orthologous protein sets with different sizes and taxonomic compositions. Replicated runs from multiple random starting points permit a more rigorous assessment of convergence, and we developed two novel statistics, delta and epsilon, for this purpose. Although likelihood values invariably stabilized quickly, adequate sampling of the posterior distribution of tree topologies took considerably longer. Our results suggest that multimodality is common for data sets with 30 or more taxa and that this results in slow convergence and mixing. However, we also found that the pragmatic approach of combining data from several short, replicated runs into a metachain to estimate bipartition posterior probabilities provided good approximations, and that such estimates were no worse in approximating a reference posterior distribution than those obtained using a single long run of the same length as the metachain. Precision appears to be best when heated Markov chains have low temperatures, whereas chains with high temperatures appear to sample trees with high posterior probabilities only rarely. [Bayesian phylogenetic inference; heating parameter; Markov chain Monte Carlo; replicated chains.]
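In practice, the metachain estimate described above amounts to pooling the post-burn-in tree samples from the replicated runs and tabulating bipartition frequencies. A minimal sketch of that bookkeeping is given below; it is not the authors' implementation, and the tree representation (a tree encoded as a set of taxon bipartitions) and the burn-in fraction are illustrative assumptions.

```python
from collections import Counter
from itertools import chain

def bipartition_posteriors(runs, burnin_frac=0.25):
    """Estimate bipartition posterior probabilities from replicated MCMC runs.

    runs: list of runs, each a list of sampled trees; here a tree is encoded
          as a frozenset of bipartitions, and each bipartition is a frozenset
          of the taxon labels on one side of an internal branch.
    Returns a dict mapping each bipartition to its estimated posterior probability.
    """
    # Discard the burn-in portion of every run, then pool the remainder
    # into a single "metachain" of tree samples.
    pooled = list(chain.from_iterable(
        run[int(len(run) * burnin_frac):] for run in runs))
    counts = Counter(bp for tree in pooled for bp in tree)
    n_samples = len(pooled)
    return {bp: c / n_samples for bp, c in counts.items()}
```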
Abstract:
Pharmacodynamics (PD) is the study of the biochemical and physiological effects of drugs. This paper considers the construction of optimal designs for dose-ranging trials with multiple periods, where the outcome of the trial (the effect of the drug) is a binary response: the success or failure of a drug to bring about a particular change in the subject after a given amount of time. The carryover effect of each dose from one period to the next is assumed to be proportional to the direct effect. It is shown for a logistic regression model that the efficiency of an optimal parallel (single-period) or crossover (two-period) design is substantially greater than that of a balanced design. The optimal designs are also shown to be robust to misspecification of the parameter values. Finally, the parallel and crossover designs are combined to provide the experimenter with greater flexibility.
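One plausible form of the model described above (the exact parameterization is not given in the abstract) is a logistic regression in which the carryover of the previous period's dose enters as a fixed fraction of its direct effect:

$$ \operatorname{logit} \Pr(Y_{ij} = 1) \;=\; \alpha + \beta\, d_{ij} + \lambda \beta\, d_{i,j-1}, \qquad d_{i,0} = 0, $$

where d_{ij} is the dose given to subject i in period j, beta is the direct dose effect, and lambda is the proportionality constant linking the carryover effect to the direct effect.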
Abstract:
In early generation variety trials, large numbers of new breeders' lines need to be compared, and usually there is little seed available for each new line. A so-called unreplicated trial has each new line on just one plot at a site, but includes several (often around five) replicated check or control (or standard) varieties. The total proportion of check plots is usually between 10% and 20%. The aim of the trial is to choose some good-performing lines (usually around 1/3 of those tested) to go on for further testing, rather than to estimate their mean yields precisely. Now that spatial analyses of data from field experiments are becoming more common, there is interest in an efficient layout of an experiment given a proposed spatial analysis. Some possible design criteria are discussed, and efficient layouts under spatial dependence are considered.
Abstract:
In early generation variety trials, large numbers of new breeders' lines (varieties) may be compared, with each having little seed available. A so-called unreplicated trial has each new variety on just one plot at a site, but includes several replicated control varieties, making up between 10% and 20% of the trial. The aim of the trial is to choose some (usually around one third) good-performing new varieties to go on for further testing, rather than to estimate their mean yields precisely. Now that spatial analyses of data from field experiments are becoming more common, there is interest in an efficient layout of an experiment given a proposed spatial analysis and an efficiency criterion. The values of common optimal design criteria depend on the usual C-matrix, which is very large, so calculating its inverse is time-consuming. Since most varieties are unreplicated, the variety incidence matrix has a simple form, and some matrix manipulations can dramatically reduce the computation needed. However, there are many designs to compare, and numerical optimisation on its own gives little insight into good design features. Some possible design criteria are discussed, and approximations to their values considered. These allow the features of efficient layouts under spatial dependence to be given and compared. (c) 2006 Elsevier Inc. All rights reserved.
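For orientation (the formulas are not reproduced in the abstract), such criteria are usually built from the generalized least squares information matrix for variety effects. Under a spatial model of the form

$$ y \;=\; X\beta + \Delta\tau + \xi, \qquad \operatorname{var}(\xi) = \sigma^{2}\,\Sigma(\theta), $$

where Delta is the variety incidence matrix and Sigma(theta) is the assumed spatial correlation matrix of the plot errors, the C-matrix is the information matrix for the variety effects tau adjusted for the nuisance effects beta, and A-type criteria average the variances of pairwise variety contrasts obtained from a generalized inverse of C. Because most new varieties occur on exactly one plot, the corresponding block of Delta is, after reordering, an identity matrix, which is presumably the simple form that the matrix manipulations mentioned above exploit.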
Abstract:
We consider the problems of computing the power and exponential moments E[X^s] and E[e^{tX}] of square Gaussian random matrices X = A + BWC for positive integer s and real t, where W is a standard normal random vector and A, B, C are appropriately dimensioned constant matrices. We solve the problems by a matrix product scalarization technique and interpret the solutions in system-theoretic terms. The results of the paper are applicable to Bayesian prediction in multivariate autoregressive time series and mean-reverting diffusion processes.
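Results of this kind lend themselves to a quick Monte Carlo plausibility check (this is not the paper's scalarization technique; the dimensions, matrices, and parameter values below are arbitrary illustrative choices):

```python
import numpy as np
from scipy.linalg import expm

rng = np.random.default_rng(0)
n, m = 3, 2
A = rng.standard_normal((n, n))
B = rng.standard_normal((n, m))
C = rng.standard_normal((1, n))           # chosen so that B @ W @ C is n x n
s, t, n_samples = 2, 0.5, 50_000

power_sum = np.zeros((n, n))
exp_sum = np.zeros((n, n))
for _ in range(n_samples):
    W = rng.standard_normal((m, 1))       # standard normal random vector
    X = A + B @ W @ C                     # square Gaussian random matrix
    power_sum += np.linalg.matrix_power(X, s)   # accumulate X^s
    exp_sum += expm(t * X)                      # accumulate e^{tX}

print("Monte Carlo estimate of E[X^s]:\n", power_sum / n_samples)
print("Monte Carlo estimate of E[e^{tX}]:\n", exp_sum / n_samples)
```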
Abstract:
Finite mixture models are being increasingly used to model the distributions of a wide variety of random phenomena. While normal mixture models are often used to cluster data sets of continuous multivariate data, a more robust clustering can be obtained by considering the t mixture model-based approach. Mixtures of factor analyzers enable model-based density estimation to be undertaken for high-dimensional data where the number of observations n is not very large relative to their dimension p. As the approach using the multivariate normal family of distributions is sensitive to outliers, it is more robust to adopt the multivariate t family for the component error and factor distributions. The computational aspects associated with robustness and high dimensionality in these approaches to cluster analysis are discussed and illustrated.
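Schematically (the notation here is assumed rather than quoted from the paper), the mixture of factor analyzers restricts each component covariance matrix to a factor-analytic form, and the robust variant replaces the normal component densities with multivariate t densities:

$$ f(y) \;=\; \sum_{i=1}^{g} \pi_i\, f_p\!\bigl(y;\, \mu_i,\; B_i B_i^{\top} + D_i,\; \nu_i\bigr), $$

where B_i is a p x q matrix of factor loadings with q much smaller than p, D_i is a diagonal matrix, the pi_i are mixing proportions, and f_p is the multivariate normal density (with nu_i omitted) or, in the robust version, the multivariate t density with nu_i degrees of freedom.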
Abstract:
Statistical tests of Load-Unload Response Ratio (LURR) signals are carried out in order to verify the statistical robustness of previous studies using the Lattice Solid Model (MORA et al., 2002b). In each case 24 groups of samples with the same macroscopic parameters (tidal perturbation amplitude A, period T and tectonic loading rate k) but different particle arrangements are employed. Results of uni-axial compression experiments show that before the normalized time of catastrophic failure, the ensemble average LURR value rises significantly, in agreement with the observations of high LURR prior to large earthquakes. In shearing tests, two parameters are found to control the correlation between earthquake occurrence and tidal stress. One, A/(kT), controls the phase shift between the peak seismicity rate and the peak amplitude of the perturbation stress; as this parameter increases, the phase shift decreases. The other parameter, AT/k, controls the height of the probability density function (Pdf) of modeled seismicity; as this parameter increases, the Pdf becomes sharper and narrower, indicating strong triggering. Statistical studies of LURR signals in shearing tests also suggest that, except in strong triggering cases where LURR cannot be calculated due to poor data in unloading cycles, larger events are more likely than smaller ones to occur in periods of high LURR, supporting the LURR hypothesis.
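The LURR statistic itself is not defined in the abstract; in its usual form it is the ratio of a seismic response measure accumulated during loading periods of the perturbed stress cycle to the same measure accumulated during unloading periods, for example

$$ Y_m \;=\; \frac{X^{+}}{X^{-}}, \qquad X^{\pm} \;=\; \sum_{\text{loading}/\text{unloading}} E_k^{\,m}, $$

where E_k is the energy released by event k and m is an exponent (m = 1/2 gives Benioff strain); values of Y_m well above 1 indicate that the system responds more strongly to loading than to unloading.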
Abstract:
The Accelerating Moment Release (AMR) preceding earthquakes with magnitude above 5 in Australia that occurred during the last 20 years was analyzed to test the Critical Point Hypothesis. Twelve earthquakes in the catalog were chosen based on a criterion for the number of nearby events. Results show that seven sequences with numerous events recorded leading up to the main earthquake exhibited accelerating moment release. Two occurred near in time and space to other earthquakes preceded by AMR. The remaining three sequences had very few events in the catalog, so the lack of AMR detected in the analysis may be related to catalog incompleteness. Spatio-temporal scanning of AMR parameters shows that 80% of the areas in which AMR occurred experienced large events. In areas of similar background seismicity with no large events, 10 out of 12 cases exhibit no AMR, and two others are false alarms where AMR was observed but no large event followed. The relationship between AMR and the Load-Unload Response Ratio (LURR) was studied. Both methods predict similar critical region sizes; however, the critical point time using AMR is slightly earlier than the time of the critical point LURR anomaly.
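For reference (the functional form below is the standard one in the AMR literature rather than a quotation from this abstract), accelerating moment release is usually diagnosed by fitting the cumulative Benioff strain with a power law of the time to failure,

$$ \varepsilon(t) \;=\; A + B\,(t_f - t)^{m}, \qquad 0 < m < 1, $$

where t_f is the mainshock time, and by declaring acceleration when this power-law fit clearly outperforms a linear fit to the same data (for instance, when the ratio of their residuals is well below 1).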
Abstract:
In this work, we propose an improvement of the classical Derjaguin-Broekhoff-de Boer (DBdB) theory for capillary condensation/evaporation in mesoporous systems. The primary idea of this improvement is to employ the Gibbs-Tolman-Koenig-Buff equation to predict the surface tension changes in mesopores. In addition, the statistical film thickness (so-called t-curve) evaluated accurately on the basis of the adsorption isotherms measured for the MCM-41 materials is used instead of the originally proposed t-curve (to take into account the excess of the chemical potential due to the surface forces). It is shown that the aforementioned modifications of the original DBdB theory have significant implications for the pore size analysis of mesoporous solids. To verify our improvement of the DBdB pore size analysis method (IDBdB), a series of the calcined MCM-41 samples, which are well-defined materials with hexagonally ordered cylindrical mesopores, were used for the evaluation of the pore size distributions. The correlation of the IDBdB method with the empirically calibrated Kruk-Jaroniec-Sayari (KJS) relationship is very good in the range of small mesopores. So, a major advantage of the IDBdB method is its applicability for small mesopores as well as for the mesopore range beyond that established by the KJS calibration, i.e., for mesopore radii greater than ~4.5 nm. The comparison of the IDBdB results with experimental data reported by Kruk and Jaroniec for capillary condensation/evaporation as well as with the results from nonlocal density functional theory developed by Neimark et al. clearly justifies our approach. Note that the proposed improvement of the classical DBdB method preserves its original simplicity and simultaneously ensures a significant improvement of the pore size analysis, which is confirmed by the independent estimation of the mean pore size by the powder X-ray diffraction method.
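The key ingredient of the improvement, the curvature dependence of the surface tension, is not written out in the abstract; a commonly used first-order solution of the Gibbs-Tolman-Koenig-Buff equation is the Tolman-type expression

$$ \gamma(r) \;=\; \frac{\gamma_{\infty}}{1 + 2\delta/r}, $$

where gamma_infinity is the flat-interface surface tension, r the meniscus radius, and delta the Tolman length; using gamma(r) rather than gamma_infinity in the Kelvin-type condensation condition shifts the predicted pore sizes, most noticeably for narrow mesopores.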
Abstract:
The recent deregulation in electricity markets worldwide has heightened the importance of risk management in energy markets. Assessing Value-at-Risk (VaR) in electricity markets is arguably more difficult than in traditional financial markets because the distinctive features of the former result in a highly unusual distribution of returns: electricity returns are highly volatile, display seasonalities in both their mean and volatility, exhibit leverage effects and clustering in volatility, and feature extreme levels of skewness and kurtosis. With electricity applications in mind, this paper proposes a model that accommodates autoregression and weekly seasonals in both the conditional mean and conditional volatility of returns, as well as leverage effects via an EGARCH specification. In addition, extreme value theory (EVT) is adopted to explicitly model the tails of the return distribution. Compared to a number of other parametric models and simple historical simulation based approaches, the proposed EVT-based model performs well in forecasting out-of-sample VaR. In addition, statistical tests show that the proposed model provides appropriate interval coverage in both unconditional and, more importantly, conditional contexts. Overall, the results are encouraging in suggesting that the proposed EVT-based model is a useful technique in forecasting VaR in electricity markets. (c) 2005 International Institute of Forecasters. Published by Elsevier B.V. All rights reserved.
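For concreteness (the parameterization below is assumed, not quoted from the paper), the volatility component of such a model is typically an EGARCH recursion of the form

$$ \ln \sigma_t^{2} \;=\; \omega + \beta \ln \sigma_{t-1}^{2} + \alpha\bigl(|z_{t-1}| - \mathrm{E}|z_{t-1}|\bigr) + \gamma\, z_{t-1}, \qquad z_t = \varepsilon_t / \sigma_t, $$

and the EVT step fits a generalized Pareto distribution to the standardized residuals beyond a high threshold u, giving the tail quantile and conditional VaR

$$ \hat z_q \;=\; u + \frac{\hat\beta_u}{\hat\xi}\left[\left(\frac{1-q}{N_u/n}\right)^{-\hat\xi} - 1\right], \qquad \widehat{\mathrm{VaR}}_{q,\,t+1} \;=\; \hat\mu_{t+1} + \hat\sigma_{t+1}\, \hat z_q, $$

where N_u/n is the fraction of residuals exceeding u and the hatted xi and beta_u are the fitted GPD shape and scale parameters.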
Abstract:
In empirical studies of Evolutionary Algorithms, it is usually desirable to evaluate and compare algorithms using as many different parameter settings and test problems as possible, in order to have a clear and detailed picture of their performance. Unfortunately, the total number of experiments required may be very large, which often makes such research work computationally prohibitive. In this paper, the application of a statistical method called racing is proposed as a general-purpose tool to reduce the computational requirements of large-scale experimental studies in evolutionary algorithms. Experimental results are presented that show that racing typically requires only a small fraction of the cost of an exhaustive experimental study.
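A racing procedure of this kind evaluates all candidate parameter settings one test instance (or block of instances) at a time and drops any candidate that is already statistically distinguishable from the current best, so that the remaining budget is concentrated on the promising settings. The sketch below is illustrative only; the paired t-test elimination rule and the function names are assumptions, not the paper's exact procedure.

```python
import numpy as np
from scipy.stats import ttest_rel

def race(candidates, instances, evaluate, alpha=0.05, min_instances=5):
    """Race candidate configurations over a sequence of test instances.

    candidates: list of configurations (hashable objects).
    instances:  list of test problems or random seeds.
    evaluate:   function(candidate, instance) -> cost, lower is better.
    Returns the surviving candidates and the per-candidate cost histories.
    """
    alive = list(candidates)
    costs = {c: [] for c in candidates}
    for k, inst in enumerate(instances, start=1):
        for c in alive:
            costs[c].append(evaluate(c, inst))
        if k < min_instances or len(alive) < 2:
            continue  # not enough evidence yet to eliminate anyone
        best = min(alive, key=lambda c: np.mean(costs[c]))
        survivors = [best]
        for c in alive:
            if c == best:
                continue
            # Paired t-test on per-instance costs: eliminate candidates that
            # are significantly worse than the current best.
            _, p = ttest_rel(costs[c], costs[best])
            if p < alpha and np.mean(costs[c]) > np.mean(costs[best]):
                continue  # eliminated from the race
            survivors.append(c)
        alive = survivors
    return alive, costs
```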