849 results for large sample distributions
We consider the comparison of two formulations in terms of average bioequivalence using the 2 × 2 cross-over design. In a bioequivalence study, the primary outcome is a pharmacokinetic measure, such as the area under the plasma concentration by time curve, which is usually assumed to have a lognormal distribution. The criterion typically used for claiming bioequivalence is that the 90% confidence interval for the ratio of the means should lie within the interval (0.80, 1.25), or equivalently the 90% confidence interval for the differences in the means on the natural log scale should be within the interval (-0.2231, 0.2231). We compare the gold standard method for calculation of the sample size based on the non-central t distribution with those based on the central t and normal distributions. In practice, the differences between the various approaches are likely to be small. Further approximations to the power function are sometimes used to simplify the calculations. These approximations should be used with caution, because the sample size required for a desirable level of power might be under- or overestimated compared to the gold standard method. However, in some situations the approximate methods produce very similar sample sizes to the gold standard method. Copyright © 2005 John Wiley & Sons, Ltd.
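As a rough illustration of the normal-approximation approach mentioned above (not the gold standard non-central t calculation), the following Python sketch estimates power and total sample size for the two one-sided tests procedure. It assumes a 2 × 2 cross-over with equal sequence sizes, a within-subject standard deviation sigma_w on the log scale, and a standard error of sigma_w * sqrt(2 / n) for the difference in log means. The function names, default values and the search cap are illustrative assumptions, not taken from the paper.

```python
from math import log, sqrt
from statistics import NormalDist

_STD_NORMAL = NormalDist()


def tost_power_normal(n_total, sigma_w, delta=0.0, alpha=0.05, limit=log(1.25)):
    """Approximate power of the two one-sided tests using the normal distribution.

    n_total : total number of subjects across both sequences (assumed balanced)
    sigma_w : within-subject standard deviation on the natural log scale
    delta   : assumed true difference in means on the log scale
    limit   : symmetric equivalence limit on the log scale, ln(1.25) by default
    """
    se = sigma_w * sqrt(2.0 / n_total)      # assumed standard error of the difference
    z = _STD_NORMAL.inv_cdf(1.0 - alpha)    # one-sided critical value
    power = (_STD_NORMAL.cdf((limit - delta) / se - z)
             + _STD_NORMAL.cdf((limit + delta) / se - z) - 1.0)
    return max(0.0, power)


def sample_size_normal(sigma_w, target=0.9, delta=0.0, alpha=0.05, n_max=10000):
    """Smallest even total sample size whose approximate power reaches the target."""
    n = 4
    while n <= n_max:
        if tost_power_normal(n, sigma_w, delta, alpha) >= target:
            return n
        n += 2                              # keep the two sequences balanced
    raise ValueError("target power not reached within n_max subjects")
```

Because this uses the normal rather than the non-central t distribution, the result can differ from the gold standard figure, which is the caution the abstract raises.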
The problem of estimating the individual probabilities of a discrete distribution is considered. The true distribution of the independent observations is a mixture of a family of power series distributions. First, we ensure identifiability of the mixing distribution assuming mild conditions. Next, the mixing distribution is estimated by non-parametric maximum likelihood and an estimator for individual probabilities is obtained from the corresponding marginal mixture density. We establish asymptotic normality for the estimator of individual probabilities by showing that, under certain conditions, the difference between this estimator and the empirical proportions is asymptotically negligible. Our framework includes Poisson, negative binomial and logarithmic series as well as binomial mixture models. Simulations highlight the benefit in achieving normality when using the proposed marginal mixture density approach instead of the empirical one, especially for small sample sizes and/or when interest is in the tail areas. A real data example is given to illustrate the use of the methodology.
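To make the contrast between the two estimators concrete, the sketch below computes individual probabilities from a marginal Poisson mixture density and from empirical proportions. It assumes the mixing distribution has already been estimated and is discrete, given as support points and weights; the non-parametric maximum likelihood step itself is not shown, and all names are hypothetical.

```python
from math import exp, lgamma, log


def poisson_pmf(k, lam):
    """Poisson probability of count k for rate lam > 0, computed on the log scale."""
    return exp(k * log(lam) - lam - lgamma(k + 1))


def mixture_pmf(k, support, weights):
    """Marginal probability of count k under a discrete mixing distribution.

    support : rate parameters carrying mass (assumed already estimated)
    weights : matching mixing weights, assumed to sum to 1
    """
    return sum(w * poisson_pmf(k, lam) for lam, w in zip(support, weights))


def empirical_proportions(observations, k_max):
    """Empirical proportion of each count 0..k_max in the observed sample."""
    n = len(observations)
    return [sum(1 for x in observations if x == k) / n for k in range(k_max + 1)]
```

The mixture estimate assigns positive probability to counts that never appear in the sample, whereas the empirical proportion of an unobserved count is zero, which is one reason the two can differ in the tails.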
We describe a Bayesian method for investigating correlated evolution of discrete binary traits on phylogenetic trees. The method fits a continuous-time Markov model to a pair of traits, seeking the best fitting models that describe their joint evolution on a phylogeny. We employ the methodology of reversible-jump (RJ) Markov chain Monte Carlo to search among the large number of possible models, some of which conform to independent evolution of the two traits, others to correlated evolution. The RJ Markov chain visits these models in proportion to their posterior probabilities, thereby directly estimating the support for the hypothesis of correlated evolution. In addition, the RJ Markov chain simultaneously estimates the posterior distributions of the rate parameters of the model of trait evolution. These posterior distributions can be used to test among alternative evolutionary scenarios to explain the observed data. All results are integrated over a sample of phylogenetic trees to account for phylogenetic uncertainty. We implement the method in a program called RJ Discrete and illustrate it by analyzing the question of whether mating system and advertisement of estrus by females have coevolved in the Old World monkeys and great apes.
Resolving the relationships between Metazoa and other eukaryotic groups as well as between metazoan phyla is central to the understanding of the origin and evolution of animals. The current view is based on limited data sets, either a single gene with many species (e.g., ribosomal RNA) or many genes but with only a few species. Because a reliable phylogenetic inference simultaneously requires numerous genes and numerous species, we assembled a very large data set containing 129 orthologous proteins (~30,000 aligned amino acid positions) for 36 eukaryotic species. Included in the alignments are data from the choanoflagellate Monosiga ovata, obtained through the sequencing of about 1,000 cDNAs. We provide conclusive support for choanoflagellates as the closest relative of animals and for fungi as the second closest. The monophyly of Plantae and chromalveolates was recovered but without strong statistical support. Within animals, in contrast to the monophyly of Coelomata observed in several recent large-scale analyses, we recovered a paraphyletic Coelomata, with nematodes and platyhelminths nested within. To include a diverse sample of organisms, data from EST projects were used for several species, resulting in a large amount of missing data in our alignment (about 25%). By using different approaches, we verify that the inferred phylogeny is not sensitive to these missing data. Therefore, this large data set provides a reliable phylogenetic framework for studying eukaryotic and animal evolution and will be easily extendable when large amounts of sequence information become available from a broader taxonomic range.
A revised Bayesian algorithm for estimating surface rain rate, convective rain proportion, and latent heating profiles from satellite-borne passive microwave radiometer observations over ocean backgrounds is described. The algorithm searches a large database of cloud-radiative model simulations to find cloud profiles that are radiatively consistent with a given set of microwave radiance measurements. The properties of these radiatively consistent profiles are then composited to obtain best estimates of the observed properties. The revised algorithm is supported by an expanded and more physically consistent database of cloud-radiative model simulations. The algorithm also features a better quantification of the convective and nonconvective contributions to total rainfall, a new geographic database, and an improved representation of background radiances in rain-free regions. Bias and random error estimates are derived from applications of the algorithm to synthetic radiance data, based upon a subset of cloud-resolving model simulations, and from the Bayesian formulation itself. Synthetic rain-rate and latent heating estimates exhibit a trend of high (low) bias for low (high) retrieved values. The Bayesian estimates of random error are propagated to represent errors at coarser time and space resolutions, based upon applications of the algorithm to TRMM Microwave Imager (TMI) data. Errors in TMI instantaneous rain-rate estimates at 0.5°-resolution range from approximately 50% at 1 mm h−1 to 20% at 14 mm h−1. Errors in collocated spaceborne radar rain-rate estimates are roughly 50%–80% of the TMI errors at this resolution. The estimated algorithm random error in TMI rain rates at monthly, 2.5° resolution is relatively small (less than 6% at 5 mm day−1) in comparison with the random error resulting from infrequent satellite temporal sampling (8%–35% at the same rain rate). 
Percentage errors resulting from sampling decrease with increasing rain rate, and sampling errors in latent heating rates follow the same trend. Averaging over 3 months reduces sampling errors in rain rates to 6%–15% at 5 mm day−1, with proportionate reductions in latent heating sampling errors.
Over recent years there has been an increasing deployment of renewable energy generation technologies, particularly large-scale wind farms. As wind farm deployment increases, it is vital to gain a good understanding of how the energy produced is affected by climate variations, over a wide range of time-scales, from short (hours to weeks) to long (months to decades) periods. By relating wind speed at specific sites in the UK to a large-scale climate pattern (the North Atlantic Oscillation or "NAO"), the power generated by a modelled wind turbine under three different NAO states is calculated. It was found that the wind conditions under these NAO states may yield a difference in the mean wind power output of up to 10%. A simple model is used to demonstrate that forecasts of future NAO states can potentially be used to improve month-ahead statistical forecasts of monthly-mean wind power generation. The results confirm that the NAO has a significant impact on the hourly-, daily- and monthly-mean power output distributions from the turbine with important implications for (a) the use of meteorological data (e.g. their relationship to large scale climate patterns) in wind farm site assessment and, (b) the utilisation of seasonal-to-decadal climate forecasts to estimate future wind farm power output. This suggests that further research into the links between large-scale climate variability and wind power generation is both necessary and valuable.
We compare rain event size distributions derived from measurements in climatically different regions, which we find to be well approximated by power laws of similar exponents over broad ranges. Differences can be seen in the large-scale cutoffs of the distributions. Event duration distributions suggest that the scale-free aspects are related to the absence of characteristic scales in the meteorological mesoscale.
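A power-law exponent of the kind compared above can be estimated in several ways; the abstract does not say which was used. The sketch below shows one common choice, the maximum likelihood estimator for a continuous power law above a lower cutoff. The function name and the cutoff parameter x_min are assumptions made for illustration.

```python
from math import log


def power_law_exponent(sizes, x_min):
    """Maximum likelihood exponent alpha for p(x) proportional to x**(-alpha), x >= x_min.

    Only events with size >= x_min enter the estimate; smaller events are ignored.
    """
    tail = [x for x in sizes if x >= x_min]
    log_sum = sum(log(x / x_min) for x in tail)
    if log_sum <= 0.0:
        raise ValueError("need at least one event strictly larger than x_min")
    return 1.0 + len(tail) / log_sum
```

This estimator ignores the large-scale cutoff of the distribution, so in practice the fitted range would need an upper bound as well as x_min.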
Magmas in volcanic conduits commonly contain microlites in association with preexisting phenocrysts, as often indicated by volcanic rock textures. In this study, we present two different experiments that investigate the flow behavior of these bidisperse systems. In the first experiments, rotational rheometric methods are used to determine the rheology of monodisperse and polydisperse suspensions consisting of smaller, prolate particles (microlites) and larger, equant particles (phenocrysts) in a bubble‐free Newtonian liquid (silicate melt). Our data show that increasing the relative proportion of prolate microlites to equant phenocrysts in a magma at constant total particle content can increase the relative viscosity by up to three orders of magnitude. Consequently, the rheological effect of particles in magmas cannot be modeled by assuming a monodisperse population of particles. We propose a new model that uses interpolated parameters based on the relative proportions of small and large particles and produces a considerably better fit to the data than earlier models. In a second series of experiments we investigate the textures produced by shearing bimodal suspensions in gradually solidifying epoxy resin in a concentric cylinder setup. The resulting textures show the prolate particles are aligned with the flow lines and spherical particles are found in well‐organized strings, with sphere‐depleted shear bands in high‐shear regions. These observations may explain the measured variation in the shear thinning and yield stress behavior with increasing solid fraction and particle aspect ratio. The implications for magma flow are discussed, and rheological results and textural observations are compared with observations on natural samples.
An account is given of a number of recent studies with idealised models whose aim is to further understanding of the large-scale tropical atmospheric circulation. Initial-value integrations with a model with imposed heating are used to discuss aspects of the Asian summer monsoon, including constraints on cross-equatorial flow into the monsoon. The summer descent in the Mediterranean region and on the eastern sides of the summer subtropical anticyclones is seen to be associated with the monsoons to their east. An aqua-planet GCM is used to investigate the relationship between simple SST distributions and tropical convection and circulation. The existence of strong equatorial convection and Hadley cells is found to depend sensitively on the curvature of the meridional profile in SST. Zonally confined SST maxima produce convective maxima centred to the west and suppression of convection elsewhere. Strong equatorial zonal flow changes are found in some experiments and three mechanisms for producing these are investigated in a model with imposed heating.
This paper reviews the literature on the distribution of commercial real estate returns. There is growing evidence that the assumption of normality in returns is not safe. Distributions are found to be peaked, fat-tailed and, tentatively, skewed. There is some evidence of compound distributions and non-linearity. Publicly traded real estate assets (such as property company or REIT shares) behave in a fashion more similar to other common stocks. However, as in equity markets, it would be unwise to assume normality uncritically. Empirical evidence for UK real estate markets is obtained by applying distribution fitting routines to IPD Monthly Index data for the aggregate index and selected sub-sectors. It is clear that normality is rejected in most cases. It is often argued that observed differences in real estate returns are a measurement issue resulting from appraiser behaviour. However, unsmoothing the series does not assist in modelling returns. A large proportion of returns are close to zero. This would be characteristic of a thinly-traded market where new information arrives infrequently. Analysis of quarterly data suggests that, over longer trading periods, return distributions may conform more closely to those found in other asset markets. These results have implications for the formulation and implementation of a multi-asset portfolio allocation strategy.
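The abstract does not name the distribution fitting routines that were applied. As a simple stand-in, the sketch below computes sample skewness, excess kurtosis and the Jarque-Bera statistic, which are standard ways to quantify how peaked, fat-tailed or skewed a return series is relative to the normal distribution. The function name is hypothetical and the input is assumed to be a plain list of period returns.

```python
def normality_summary(returns):
    """Return (skewness, excess kurtosis, Jarque-Bera statistic) for a return series.

    Uses population (divide by n) central moments. Under normality both the
    skewness and the excess kurtosis are close to zero, and the Jarque-Bera
    statistic is approximately chi-squared with 2 degrees of freedom.
    """
    n = len(returns)
    mean = sum(returns) / n
    m2 = sum((r - mean) ** 2 for r in returns) / n
    m3 = sum((r - mean) ** 3 for r in returns) / n
    m4 = sum((r - mean) ** 4 for r in returns) / n
    if m2 == 0.0:
        raise ValueError("returns have zero variance")
    skewness = m3 / m2 ** 1.5
    excess_kurtosis = m4 / m2 ** 2 - 3.0
    jarque_bera = n / 6.0 * (skewness ** 2 + excess_kurtosis ** 2 / 4.0)
    return skewness, excess_kurtosis, jarque_bera
```

A series with many returns close to zero and occasional large moves would show a large positive excess kurtosis here, consistent with the peaked, fat-tailed shape described above.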
Reliable techniques for screening large numbers of plants for root traits are still being developed, but include aeroponic, hydroponic and agar plate systems. Coupled with digital cameras and image analysis software, these systems permit the rapid measurement of root numbers, length and diameter in moderate (typically <1000) numbers of plants. Usually such systems are employed with relatively small seedlings, and information is recorded in 2D. Recent developments in X-ray microtomography have facilitated 3D non-invasive measurement of small root systems grown in solid media, allowing angular distributions to be obtained in addition to numbers and length. However, because of the time taken to scan samples, only a small number can be screened (typically <10 per day, not including analysis time of the large spatial datasets generated) and, depending on sample size, limited resolution may mean that fine roots remain unresolved. Although agar plates allow differences between lines and genotypes to be discerned in young seedlings, the rank order may not be the same when the same materials are grown in solid media. For example, root length of dwarfing wheat (Triticum aestivum L.) lines grown on agar plates was increased by ~40% relative to wild-type and semi-dwarfing lines, but in a sandy loam soil under well watered conditions it was decreased by 24-33%. Such differences in ranking suggest that significant soil environment-genotype interactions are occurring. Developments in instruments and software mean that a combination of high-throughput simple screens and more in-depth examination of root-soil interactions is becoming viable.
The K-Means algorithm for cluster analysis is one of the most influential and popular data mining methods. Its straightforward parallel formulation is well suited for distributed memory systems with reliable interconnection networks, such as massively parallel processors and clusters of workstations. However, in large-scale geographically distributed systems the straightforward parallel algorithm can be rendered useless by a single communication failure or high latency in communication paths. The lack of scalable and fault tolerant global communication and synchronisation methods in large-scale systems has hindered the adoption of the K-Means algorithm for applications in large networked systems such as wireless sensor networks, peer-to-peer systems and mobile ad hoc networks. This work proposes a fully distributed K-Means algorithm (EpidemicK-Means) which does not require global communication and is intrinsically fault tolerant. The proposed distributed K-Means algorithm provides a clustering solution which can approximate the solution of an ideal centralised algorithm over the aggregated data as closely as desired. A comparative performance analysis is carried out against the state-of-the-art sampling methods and shows that the proposed method overcomes the limitations of the sampling-based approaches for skewed cluster distributions. The experimental analysis confirms that the proposed algorithm is very accurate and fault tolerant under unreliable network conditions (message loss and node failures) and is suitable for asynchronous networks of very large and extreme scale.
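The abstract does not describe the protocol behind EpidemicK-Means. To illustrate the general idea of computing a centroid without global communication, the sketch below uses generic pairwise gossip averaging: each node repeatedly averages its state with one random peer, and every node's estimate drifts toward the global mean. This is a stand-in technique, not the paper's algorithm; all names are hypothetical, one dimension and one cluster are assumed, and message loss and node failures are not modelled.

```python
import random


def gossip_average(values, rounds=2000, seed=0):
    """Pairwise gossip: each round, two random nodes replace their values by the mean.

    The sum over all nodes is preserved, so every entry converges toward the
    global average. Requires at least two nodes.
    """
    rng = random.Random(seed)
    state = [float(v) for v in values]
    for _ in range(rounds):
        i, j = rng.sample(range(len(state)), 2)
        state[i] = state[j] = (state[i] + state[j]) / 2.0
    return state


def gossip_centroid(local_sums, local_counts, rounds=2000, seed=0):
    """Per-node estimate of one cluster centroid from local sums and point counts.

    Each node gossips its local coordinate sum and its local point count; the
    ratio of the two averaged quantities approximates the centralised centroid.
    Nodes whose averaged count is still zero report None.
    """
    avg_sums = gossip_average(local_sums, rounds, seed)
    avg_counts = gossip_average(local_counts, rounds, seed)
    return [s / c if c > 0 else None for s, c in zip(avg_sums, avg_counts)]
```

Averaging sums and counts separately, rather than averaging local centroids, keeps the estimate correct when nodes hold very different numbers of points.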
We compare the characteristics of synthetic European droughts generated by the HiGEM [1] coupled climate model run with present day atmospheric composition with observed drought events extracted from the CRU TS3 data set. The results demonstrate consistency in both the rate of drought occurrence and the spatiotemporal structure of the events. Estimates of the probability density functions for event area, duration and severity are shown to be similar with confidence > 90%. Encouragingly, HiGEM is shown to replicate the extreme tails of the observed distributions and thus the most damaging European drought events. The soil moisture state is shown to play an important role in drought development. Once a large-scale drought has been initiated it is found to be 50% more likely to continue if the local soil moisture is below the 40th percentile. In response to increased concentrations of atmospheric CO2, the modelled droughts are found to increase in duration, area and severity. The drought response can be largely attributed to temperature driven changes in relative humidity. [1] HiGEM is based on the latest climate configuration of the Met Office Hadley Centre Unified Model (HadGEM1) with the horizontal resolution increased to 1.25 x 0.83 degrees in longitude and latitude in the atmosphere and 1/3 x 1/3 degrees in the ocean.
By modelling the average activity of large neuronal populations, continuum mean field models (MFMs) have become an increasingly important theoretical tool for understanding the emergent activity of cortical tissue. In order to be computationally tractable, long-range propagation of activity in MFMs is often approximated with partial differential equations (PDEs). However, PDE approximations in current use correspond to underlying axonal velocity distributions incompatible with experimental measurements. In order to rectify this deficiency, we here introduce novel propagation PDEs that give rise to smooth unimodal distributions of axonal conduction velocities. We also argue that velocities estimated from fibre diameters in slice and from latency measurements, respectively, relate quite differently to such distributions, a significant point for any phenomenological description. Our PDEs are then successfully fit to fibre diameter data from human corpus callosum and rat subcortical white matter. This allows, for the first time, the simulation of long-range conduction in the mammalian brain with realistic, convenient PDEs. Furthermore, the obtained results suggest that the propagation of activity in rat and human differs significantly beyond mere scaling. The dynamical consequences of our new formulation are investigated in the context of a well known neural field model. On the basis of Turing instability analyses, we conclude that pattern formation is more easily initiated using our more realistic propagator. By increasing characteristic conduction velocities, a smooth transition can occur from self-sustaining bulk oscillations to travelling waves of various wavelengths, which may influence axonal growth during development. Our analytic results are also corroborated numerically using simulations on a large spatial grid.
Thus we provide here a comprehensive analysis of empirically constrained activity propagation in the context of MFMs, which will allow more realistic studies of mammalian brain activity in the future.
Mean field models (MFMs) of cortical tissue incorporate salient, average features of neural masses in order to model activity at the population level, thereby linking microscopic physiology to macroscopic observations, e.g., with the electroencephalogram (EEG). One of the common aspects of MFM descriptions is the presence of a high-dimensional parameter space capturing neurobiological attributes deemed relevant to the brain dynamics of interest. We study the physiological parameter space of a MFM of electrocortical activity and discover robust correlations between physiological attributes of the model cortex and its dynamical features. These correlations are revealed by the study of bifurcation plots, which show that the model responses to changes in inhibition belong to two archetypal categories or “families”. After investigating and characterizing them in depth, we discuss their essential differences in terms of four important aspects: power responses with respect to the modeled action of anesthetics, reaction to exogenous stimuli such as thalamic input, and distributions of model parameters and oscillatory repertoires when inhibition is enhanced. Furthermore, while the complexity of sustained periodic orbits differs significantly between families, we are able to show how metamorphoses between the families can be brought about by exogenous stimuli. We here unveil links between measurable physiological attributes of the brain and dynamical patterns that are not accessible by linear methods. They instead emerge when the nonlinear structure of parameter space is partitioned according to bifurcation responses. We call this general method “metabifurcation analysis”. The partitioning cannot be achieved by the investigation of only a small number of parameter sets and is instead the result of an automated bifurcation analysis of a representative sample of 73,454 physiologically admissible parameter sets. 
Our approach generalizes straightforwardly and is well suited to probing the dynamics of other models with large and complex parameter spaces.