972 resultados para Models, statistical


Relevância:

40.00% 40.00%

Publicador:

Resumo:

Modeling and predicting co-occurrences of events is a fundamental problem of unsupervised learning. In this contribution we develop a statistical framework for analyzing co-occurrence data in a general setting where elementary observations are joint occurrences of pairs of abstract objects from two finite sets. The main challenge for statistical models in this context is to overcome the inherent data sparseness and to estimate the probabilities for pairs which were rarely observed or even unobserved in a given sample set. Moreover, it is often of considerable interest to extract grouping structure or to find a hierarchical data organization. A novel family of mixture models is proposed which explain the observed data by a finite number of shared aspects or clusters. This provides a common framework for statistical inference and structure discovery and also includes several recently proposed models as special cases. Adopting the maximum likelihood principle, EM algorithms are derived to fit the model parameters. We develop improved versions of EM which largely avoid overfitting problems and overcome the inherent locality of EM--based optimization. Among the broad variety of possible applications, e.g., in information retrieval, natural language processing, data mining, and computer vision, we have chosen document retrieval, the statistical analysis of noun/adjective co-occurrence and the unsupervised segmentation of textured images to test and evaluate the proposed algorithms.

Relevância:

40.00% 40.00%

Publicador:

Resumo:

A new technique is described for the analysis of cloud-resolving model simulations, which allows one to investigate the statistics of the lifecycles of cumulus clouds. Clouds are tracked from timestep-to-timestep within the model run. This allows for a very simple method of tracking, but one which is both comprehensive and robust. An approach for handling cloud splits and mergers is described which allows clouds with simple and complicated time histories to be compared within a single framework. This is found to be important for the analysis of an idealized simulation of radiative-convective equilibrium, in which the moist, buoyant, updrafts (i.e., the convective cores) were tracked. Around half of all such cores were subject to splits and mergers during their lifecycles. For cores without any such events, the average lifetime is 30min, but events can lengthen the typical lifetime considerably.

Relevância:

40.00% 40.00%

Publicador:

Resumo:

An appropriate model of recent human evolution is not only important to understand our own history, but it is necessary to disentangle the effects of demography and selection on genome diversity. Although most genetic data support the view that our species originated recently in Africa, it is still unclear if it completely replaced former members of the Homo genus, or if some interbreeding occurred during its range expansion. Several scenarios of modern human evolution have been proposed on the basis of molecular and paleontological data, but their likelihood has never been statistically assessed. Using DNA data from 50 nuclear loci sequenced in African, Asian and Native American samples, we show here by extensive simulations that a simple African replacement model with exponential growth has a higher probability (78%) as compared with alternative multiregional evolution or assimilation scenarios. A Bayesian analysis of the data under this best supported model points to an origin of our species approximate to 141 thousand years ago (Kya), an exit out-of-Africa approximate to 51 Kya, and a recent colonization of the Americas approximate to 10.5 Kya. We also find that the African replacement model explains not only the shallow ancestry of mtDNA or Y-chromosomes but also the occurrence of deep lineages at some autosomal loci, which has been formerly interpreted as a sign of interbreeding with Homo erectus.

Relevância:

40.00% 40.00%

Publicador:

Resumo:

We assessed the vulnerability of blanket peat to climate change in Great Britain using an ensemble of 8 bioclimatic envelope models. We used 4 published models that ranged from simple threshold models, based on total annual precipitation, to Generalised Linear Models (GLMs, based on mean annual temperature). In addition, 4 new models were developed which included measures of water deficit as threshold, classification tree, GLM and generalised additive models (GAM). Models that included measures of both hydrological conditions and maximum temperature provided a better fit to the mapped peat area than models based on hydrological variables alone. Under UKCIP02 projections for high (A1F1) and low (B1) greenhouse gas emission scenarios, 7 out of the 8 models showed a decline in the bioclimatic space associated with blanket peat. Eastern regions (Northumbria, North York Moors, Orkney) were shown to be more vulnerable than higher-altitude, western areas (Highlands, Western Isles and Argyle, Bute and The Trossachs). These results suggest a long-term decline in the distribution of actively growing blanket peat, especially under the high emissions scenario, although it is emphasised that existing peatlands may well persist for decades under a changing climate. Observational data from long-term monitoring and manipulation experiments in combination with process-based models are required to explore the nature and magnitude of climate change impacts on these vulnerable areas more fully.

Relevância:

40.00% 40.00%

Publicador:

Relevância:

40.00% 40.00%

Publicador:

Relevância:

40.00% 40.00%

Publicador:

Resumo:

As in any field of scientific inquiry, advancements in the field of second language acquisition (SLA) rely in part on the interpretation and generalizability of study findings using quantitative data analysis and inferential statistics. While statistical techniques such as ANOVA and t-tests are widely used in second language research, this review article provides a review of a class of newer statistical models that have not yet been widely adopted in the field, but have garnered interest in other fields of language research. The class of statistical models called mixed-effects models are introduced, and the potential benefits of these models for the second language researcher are discussed. A simple example of mixed-effects data analysis using the statistical software package R (R Development Core Team, 2011) is provided as an introduction to the use of these statistical techniques, and to exemplify how such analyses can be reported in research articles. It is concluded that mixed-effects models provide the second language researcher with a powerful tool for the analysis of a variety of types of second language acquisition data.

Relevância:

40.00% 40.00%

Publicador:

Resumo:

This article examines the ability of several models to generate optimal hedge ratios. Statistical models employed include univariate and multivariate generalized autoregressive conditionally heteroscedastic (GARCH) models, and exponentially weighted and simple moving averages. The variances of the hedged portfolios derived using these hedge ratios are compared with those based on market expectations implied by the prices of traded options. One-month and three-month hedging horizons are considered for four currency pairs. Overall, it has been found that an exponentially weighted moving-average model leads to lower portfolio variances than any of the GARCH-based, implied or time-invariant approaches.

Relevância:

40.00% 40.00%

Publicador:

Resumo:

A new frontier in weather forecasting is emerging by operational forecast models now being run at convection-permitting resolutions at many national weather services. However, this is not a panacea; significant systematic errors remain in the character of convective storms and rainfall distributions. The DYMECS project (Dynamical and Microphysical Evolution of Convective Storms) is taking a fundamentally new approach to evaluate and improve such models: rather than relying on a limited number of cases, which may not be representative, we have gathered a large database of 3D storm structures on 40 convective days using the Chilbolton radar in southern England. We have related these structures to storm life-cycles derived by tracking features in the rainfall from the UK radar network, and compared them statistically to storm structures in the Met Office model, which we ran at horizontal grid length between 1.5 km and 100 m, including simulations with different subgrid mixing length. We also evaluated the scale and intensity of convective updrafts using a new radar technique. We find that the horizontal size of simulated convective storms and the updrafts within them is much too large at 1.5-km resolution, such that the convective mass flux of individual updrafts can be too large by an order of magnitude. The scale of precipitation cores and updrafts decreases steadily with decreasing grid lengths, as does the typical storm lifetime. The 200-m grid-length simulation with standard mixing length performs best over all diagnostics, although a greater mixing length improves the representation of deep convective storms.

Relevância:

40.00% 40.00%

Publicador:

Resumo:

A dynamical wind-wave climate simulation covering the North Atlantic Ocean and spanning the whole 21st century under the A1B scenario has been compared with a set of statistical projections using atmospheric variables or large scale climate indices as predictors. As a first step, the performance of all statistical models has been evaluated for the present-day climate; namely they have been compared with a dynamical wind-wave hindcast in terms of winter Significant Wave Height (SWH) trends and variance as well as with altimetry data. For the projections, it has been found that statistical models that use wind speed as independent variable predictor are able to capture a larger fraction of the winter SWH inter-annual variability (68% on average) and of the long term changes projected by the dynamical simulation. Conversely, regression models using climate indices, sea level pressure and/or pressure gradient as predictors, account for a smaller SWH variance (from 2.8% to 33%) and do not reproduce the dynamically projected long term trends over the North Atlantic. Investigating the wind-sea and swell components separately, we have found that the combination of two regression models, one for wind-sea waves and another one for the swell component, can improve significantly the wave field projections obtained from single regression models over the North Atlantic.

Relevância:

40.00% 40.00%

Publicador:

Resumo:

Generalized linear mixed models are flexible tools for modeling non-normal data and are useful for accommodating overdispersion in Poisson regression models with random effects. Their main difficulty resides in the parameter estimation because there is no analytic solution for the maximization of the marginal likelihood. Many methods have been proposed for this purpose and many of them are implemented in software packages. The purpose of this study is to compare the performance of three different statistical principles - marginal likelihood, extended likelihood, Bayesian analysis-via simulation studies. Real data on contact wrestling are used for illustration.

Relevância:

40.00% 40.00%

Publicador:

Resumo:

This paper uses an output oriented Data Envelopment Analysis (DEA) measure of technical efficiency to assess the technical efficiencies of the Brazilian banking system. Four approaches to estimation are compared in order to assess the significance of factors affecting inefficiency. These are nonparametric Analysis of Covariance, maximum likelihood using a family of exponential distributions, maximum likelihood using a family of truncated normal distributions, and the normal Tobit model. The sole focus of the paper is on a combined measure of output and the data analyzed refers to the year 2001. The factors of interest in the analysis and likely to affect efficiency are bank nature (multiple and commercial), bank type (credit, business, bursary and retail), bank size (large, medium, small and micro), bank control (private and public), bank origin (domestic and foreign), and non-performing loans. The latter is a measure of bank risk. All quantitative variables, including non-performing loans, are measured on a per employee basis. The best fits to the data are provided by the exponential family and the nonparametric Analysis of Covariance. The significance of a factor however varies according to the model fit although it can be said that there is some agreements between the best models. A highly significant association in all models fitted is observed only for nonperforming loans. The nonparametric Analysis of Covariance is more consistent with the inefficiency median responses observed for the qualitative factors. The findings of the analysis reinforce the significant association of the level of bank inefficiency, measured by DEA residuals, with the risk of bank failure.