939 resultados para Statistical Language Model
Resumo:
The proportional odds model provides a powerful tool for analysing ordered categorical data and setting sample size, although for many clinical trials its validity is questionable. The purpose of this paper is to present a new class of constrained odds models which includes the proportional odds model. The efficient score and Fisher's information are derived from the profile likelihood for the constrained odds model. These results are new even for the special case of proportional odds where the resulting statistics define the Mann-Whitney test. A strategy is described involving selecting one of these models in advance, requiring assumptions as strong as those underlying proportional odds, but allowing a choice of such models. The accuracy of the new procedure and its power are evaluated.
Resumo:
A physically motivated statistical model is used to diagnose variability and trends in wintertime ( October - March) Global Precipitation Climatology Project (GPCP) pentad (5-day mean) precipitation. Quasi-geostrophic theory suggests that extratropical precipitation amounts should depend multiplicatively on the pressure gradient, saturation specific humidity, and the meridional temperature gradient. This physical insight has been used to guide the development of a suitable statistical model for precipitation using a mixture of generalized linear models: a logistic model for the binary occurrence of precipitation and a Gamma distribution model for the wet day precipitation amount. The statistical model allows for the investigation of the role of each factor in determining variations and long-term trends. Saturation specific humidity q(s) has a generally negative effect on global precipitation occurrence and with the tropical wet pentad precipitation amount, but has a positive relationship with the pentad precipitation amount at mid- and high latitudes. The North Atlantic Oscillation, a proxy for the meridional temperature gradient, is also found to have a statistically significant positive effect on precipitation over much of the Atlantic region. Residual time trends in wet pentad precipitation are extremely sensitive to the choice of the wet pentad threshold because of increasing trends in low-amplitude precipitation pentads; too low a choice of threshold can lead to a spurious decreasing trend in wet pentad precipitation amounts. However, for not too small thresholds, it is found that the meridional temperature gradient is an important factor for explaining part of the long-term trend in Atlantic precipitation.
Resumo:
We compared output from 3 dynamic process-based models (DMs: ECOSSE, MILLENNIA and the Durham Carbon Model) and 9 bioclimatic envelope models (BCEMs; including BBOG ensemble and PEATSTASH) ranging from simple threshold to semi-process-based models. Model simulations were run at 4 British peatland sites using historical climate data and climate projections under a medium (A1B) emissions scenario from the 11-RCM (regional climate model) ensemble underpinning UKCP09. The models showed that blanket peatlands are vulnerable to projected climate change; however, predictions varied between models as well as between sites. All BCEMs predicted a shift from presence to absence of a climate associated with blanket peat, where the sites with the lowest total annual precipitation were closest to the presence/absence threshold. DMs showed a more variable response. ECOSSE predicted a decline in net C sink and shift to net C source by the end of this century. The Durham Carbon Model predicted a smaller decline in the net C sink strength, but no shift to net C source. MILLENNIA predicted a slight overall increase in the net C sink. In contrast to the BCEM projections, the DMs predicted that the sites with coolest temperatures and greatest total annual precipitation showed the largest change in carbon sinks. In this model inter-comparison, the greatest variation in model output in response to climate change projections was not between the BCEMs and DMs but between the DMs themselves, because of different approaches to modelling soil organic matter pools and decomposition amongst other processes. The difference in the sign of the response has major implications for future climate feedbacks, climate policy and peatland management. Enhanced data collection, in particular monitoring peatland response to current change, would significantly improve model development and projections of future change.
Resumo:
We explore the potential for making statistical decadal predictions of sea surface temperatures (SSTs) in a perfect model analysis, with a focus on the Atlantic basin. Various statistical methods (Lagged correlations, Linear Inverse Modelling and Constructed Analogue) are found to have significant skill in predicting the internal variability of Atlantic SSTs for up to a decade ahead in control integrations of two different global climate models (GCMs), namely HadCM3 and HadGEM1. Statistical methods which consider non-local information tend to perform best, but which is the most successful statistical method depends on the region considered, GCM data used and prediction lead time. However, the Constructed Analogue method tends to have the highest skill at longer lead times. Importantly, the regions of greatest prediction skill can be very different to regions identified as potentially predictable from variance explained arguments. This finding suggests that significant local decadal variability is not necessarily a prerequisite for skillful decadal predictions, and that the statistical methods are capturing some of the dynamics of low-frequency SST evolution. In particular, using data from HadGEM1, significant skill at lead times of 6–10 years is found in the tropical North Atlantic, a region with relatively little decadal variability compared to interannual variability. This skill appears to come from reconstructing the SSTs in the far north Atlantic, suggesting that the more northern latitudes are optimal for SST observations to improve predictions. We additionally explore whether adding sub-surface temperature data improves these decadal statistical predictions, and find that, again, it depends on the region, prediction lead time and GCM data used. Overall, we argue that the estimated prediction skill motivates the further development of statistical decadal predictions of SSTs as a benchmark for current and future GCM-based decadal climate predictions.
Resumo:
The development of a combined engineering and statistical Artificial Neural Network model of UK domestic appliance load profiles is presented. The model uses diary-style appliance use data and a survey questionnaire collected from 51 suburban households and 46 rural households during the summer of 2010 and2011 respectively. It also incorporates measured energy data and is sensitive to socioeconomic, physical dwelling and temperature variables. A prototype model is constructed in MATLAB using a two layer feed forward network with back propagation training which has a 12:10:24 architecture. Model outputs include appliance load profiles which can be applied to the fields of energy planning (microrenewables and smart grids), building simulation tools and energy policy.
Resumo:
The search for ever deeper relationships among the World’s languages is bedeviled by the fact that most words evolve too rapidly to preserve evidence of their ancestry beyond 5,000 to 9,000 y. On the other hand, quantitative modeling indicates that some “ultraconserved” words exist that might be used to find evidence for deep linguistic relationships beyond that time barrier. Here we use a statistical model, which takes into account the frequency with which words are used in common everyday speech, to predict the existence of a set of such highly conserved words among seven language families of Eurasia postulated to form a linguistic superfamily that evolved from a common ancestor around 15,000 y ago. We derive a dated phylogenetic tree of this proposed superfamily with a time-depth of ∼14,450 y, implying that some frequently used words have been retained in related forms since the end of the last ice age. Words used more than once per 1,000 in everyday speech were 7- to 10-times more likely to show deep ancestry on this tree. Our results suggest a remarkable fidelity in the transmission of some words and give theoretical justification to the search for features of language that might be preserved across wide spans of time and geography.
Resumo:
A statistical model is derived relating the diurnal variation of sea surface temperature (SST) to the net surface heat flux and surface wind speed from a numerical weather prediction (NWP) model. The model is derived using fluxes and winds from the European Centre for Medium-Range Weather Forecasting (ECMWF) NWP model and SSTs from the Spinning Enhanced Visible and Infrared Imager (SEVIRI). In the model, diurnal warming has a linear dependence on the net surface heat flux integrated since (approximately) dawn and an inverse quadratic dependence on the maximum of the surface wind speed in the same period. The model coefficients are found by matching, for a given integrated heat flux, the frequency distributions of the maximum wind speed and the observed warming. Diurnal cooling, where it occurs, is modelled as proportional to the integrated heat flux divided by the heat capacity of the seasonal mixed layer. The model reproduces the statistics (mean, standard deviation, and 95-percentile) of the diurnal variation of SST seen by SEVIRI and reproduces the geographical pattern of mean warming seen by the Advanced Microwave Scanning Radiometer (AMSR-E). We use the functional dependencies in the statistical model to test the behaviour of two physical model of diurnal warming that display contrasting systematic errors.
Resumo:
As in any field of scientific inquiry, advancements in the field of second language acquisition (SLA) rely in part on the interpretation and generalizability of study findings using quantitative data analysis and inferential statistics. While statistical techniques such as ANOVA and t-tests are widely used in second language research, this review article provides a review of a class of newer statistical models that have not yet been widely adopted in the field, but have garnered interest in other fields of language research. The class of statistical models called mixed-effects models are introduced, and the potential benefits of these models for the second language researcher are discussed. A simple example of mixed-effects data analysis using the statistical software package R (R Development Core Team, 2011) is provided as an introduction to the use of these statistical techniques, and to exemplify how such analyses can be reported in research articles. It is concluded that mixed-effects models provide the second language researcher with a powerful tool for the analysis of a variety of types of second language acquisition data.
Resumo:
This article elucidates the Typological Primacy Model (TPM; Rothman, 2010, 2011, 2013) for the initial stages of adult third language (L3) morphosyntactic transfer, addressing questions that stem from the model and its application. The TPM maintains that structural proximity between the L3 and the L1 and/or the L2 determines L3 transfer. In addition to demonstrating empirical support for the TPM, this article articulates a proposal for how the mind unconsciously determines typological (structural) proximity based on linguistic cues from the L3 input stream used by the parser early on to determine holistic transfer of one previous (the L1 or the L2) system. This articulated version of the TPM is motivated by argumentation appealing to cognitive and linguistic factors. Finally, in line with the general tenets of the TPM, I ponder if and why L3 transfer might obtain differently depending on the type of bilingual (e.g. early vs. late) and proficiency level of bilingualism involved in the L3 process.
Resumo:
We investigate the initialization of Northern-hemisphere sea ice in the global climate model ECHAM5/MPI-OM by assimilating sea-ice concentration data. The analysis updates for concentration are given by Newtonian relaxation, and we discuss different ways of specifying the analysis updates for mean thickness. Because the conservation of mean ice thickness or actual ice thickness in the analysis updates leads to poor assimilation performance, we introduce a proportional dependence between concentration and mean thickness analysis updates. Assimilation with these proportional mean-thickness analysis updates significantly reduces assimilation error both in identical-twin experiments and when assimilating sea-ice observations, reducing the concentration error by a factor of four to six, and the thickness error by a factor of two. To understand the physical aspects of assimilation errors, we construct a simple prognostic model of the sea-ice thermodynamics, and analyse its response to the assimilation. We find that the strong dependence of thermodynamic ice growth on ice concentration necessitates an adjustment of mean ice thickness in the analysis update. To understand the statistical aspects of assimilation errors, we study the model background error covariance between ice concentration and ice thickness. We find that the spatial structure of covariances is best represented by the proportional mean-thickness analysis updates. Both physical and statistical evidence supports the experimental finding that proportional mean-thickness updates are superior to the other two methods considered and enable us to assimilate sea ice in a global climate model using simple Newtonian relaxation.
Resumo:
We investigate the initialisation of Northern Hemisphere sea ice in the global climate model ECHAM5/MPI-OM by assimilating sea-ice concentration data. The analysis updates for concentration are given by Newtonian relaxation, and we discuss different ways of specifying the analysis updates for mean thickness. Because the conservation of mean ice thickness or actual ice thickness in the analysis updates leads to poor assimilation performance, we introduce a proportional dependence between concentration and mean thickness analysis updates. Assimilation with these proportional mean-thickness analysis updates leads to good assimilation performance for sea-ice concentration and thickness, both in identical-twin experiments and when assimilating sea-ice observations. The simulation of other Arctic surface fields in the coupled model is, however, not significantly improved by the assimilation. To understand the physical aspects of assimilation errors, we construct a simple prognostic model of the sea-ice thermodynamics, and analyse its response to the assimilation. We find that an adjustment of mean ice thickness in the analysis update is essential to arrive at plausible state estimates. To understand the statistical aspects of assimilation errors, we study the model background error covariance between ice concentration and ice thickness. We find that the spatial structure of covariances is best represented by the proportional mean-thickness analysis updates. Both physical and statistical evidence supports the experimental finding that assimilation with proportional mean-thickness updates outperforms the other two methods considered. The method described here is very simple to implement, and gives results that are sufficiently good to be used for initialising sea ice in a global climate model for seasonal to decadal predictions.
Resumo:
Traditionally, the cusp has been described in terms of a time-stationary feature of the magnetosphere which allows access of magnetosheath-like plasma to low altitudes. Statistical surveys of data from low-altitude spacecraft have shown the average characteristics and position of the cusp. Recently, however, it has been suggested that the ionospheric footprint of flux transfer events (FTEs) may be identified as variations of the “cusp” on timescales of a few minutes. In this model, the cusp can vary in form between a steady-state feature in one limit and a series of discrete ionospheric FTE signatures in the other limit. If this time-dependent cusp scenario is correct, then the signatures of the transient reconnection events must be able, on average, to reproduce the statistical cusp occurrence previously determined from the satellite observations. In this paper, we predict the precipitation signatures which are associated with transient magnetopause reconnection, following recent observations of the dependence of dayside ionospheric convection on the orientation of the IMF. We then employ a simple model of the longitudinal motion of FTE signatures to show how such events can easily reproduce the local time distribution of cusp occurrence probabilities, as observed by low-altitude satellites. This is true even in the limit where the cusp is a series of discrete events. Furthermore, we investigate the existence of double cusp patches predicted by the simple model and show how these events may be identified in the data.
Resumo:
Learning to talk about motion in a second language is very difficult because it involves restructuring deeply entrenched patterns from the first language (Slobin 1996). In this paper we argue that statistical learning (Saffran et al. 1997) can explain why L2 learners are only partially successful in restructuring their second language grammars. We explore to what extent L2 learners make use of two mechanisms of statistical learning, entrenchment and pre-emption (Boyd and Goldberg 2011) to acquire target-like expressions of motion and retreat from overgeneralisation in this domain. Paying attention to the frequency of existing patterns in the input can help learners to adjust the frequency with which they use path and manner verbs in French but is insufficient to acquire the boundary crossing constraint (Slobin and Hoiting 1994) and learn what not to say. We also look at the role of language proficiency and exposure to French in explaining the findings.