988 results for statistical framework


Relevance: 60.00%

Publisher:

Abstract:

We apply a new parameterisation of the Greenland ice sheet (GrIS) feedback between surface mass balance (SMB: the sum of surface accumulation and surface ablation) and surface elevation in the MAR regional climate model (Edwards et al., 2014) to projections of future climate change using five ice sheet models (ISMs). The MAR (Modèle Atmosphérique Régional: Fettweis, 2007) climate projections are for 2000–2199, forced by the ECHAM5 and HadCM3 global climate models (GCMs) under the SRES A1B emissions scenario. The additional sea level contribution due to the SMB–elevation feedback averaged over five ISM projections for ECHAM5 and three for HadCM3 is 4.3% (best estimate; 95% credibility interval 1.8–6.9%) at 2100, and 9.6% (best estimate; 95% credibility interval 3.6–16.0%) at 2200. In all results the elevation feedback is significantly positive, amplifying the GrIS sea level contribution relative to the MAR projections in which the ice sheet topography is fixed: the lower bounds of our 95% credibility intervals (CIs) for sea level contributions are larger than the "no feedback" case for all ISMs and GCMs. Our method is novel in sea level projections because we propagate three types of modelling uncertainty – GCM and ISM structural uncertainties, and elevation feedback parameterisation uncertainty – along the causal chain, from SRES scenario to sea level, within a coherent experimental design and statistical framework. The relative contributions to uncertainty depend on the timescale of interest. At 2100, the GCM uncertainty is largest, but by 2200 both the ISM and parameterisation uncertainties are larger. We also perform a perturbed parameter ensemble with one ISM to estimate the shape of the projected sea level probability distribution; our results indicate that the probability density is slightly skewed towards higher sea level contributions.
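The idea of propagating several sources of uncertainty along a causal chain can be sketched with a toy Monte Carlo simulation that samples a GCM choice, an ISM scaling and a feedback fraction before combining them. All numbers below (baseline contributions, ISM factors, and the feedback distribution) are hypothetical illustrations, not values from the study.

```python
import random

random.seed(0)

# Hypothetical "no feedback" sea level contributions (cm) per GCM (illustrative only)
no_feedback = {"ECHAM5": 10.0, "HadCM3": 12.0}
# Hypothetical ISM structural scaling factors (illustrative only)
ism_factors = [0.9, 1.0, 1.1, 1.05, 0.95]

def sample_projection():
    # Propagate three uncertainty sources along the causal chain:
    gcm = random.choice(list(no_feedback))   # GCM structural uncertainty
    ism = random.choice(ism_factors)         # ISM structural uncertainty
    feedback = random.gauss(0.043, 0.013)    # parameterisation uncertainty (feedback fraction)
    return no_feedback[gcm] * ism * (1.0 + feedback)

samples = sorted(sample_projection() for _ in range(10000))
best = samples[len(samples) // 2]                         # median as best estimate
lo = samples[int(0.025 * len(samples))]                   # 95% interval lower bound
hi = samples[int(0.975 * len(samples))]                   # 95% interval upper bound
```

Summarising the sampled distribution by its median and 2.5/97.5 percentiles mirrors the best-estimate-plus-credibility-interval reporting used in the abstract.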

Relevance: 60.00%

Publisher:

Abstract:

This paper investigates the feasibility of using approximate Bayesian computation (ABC) to calibrate and evaluate complex individual-based models (IBMs). As ABC evolves, various versions are emerging, but here we only explore the most accessible version, rejection-ABC. Rejection-ABC involves running models a large number of times, with parameters drawn randomly from their prior distributions, and then retaining the simulations closest to the observations. Although ABC is well established in some fields, whether it will work with ecological IBMs is still uncertain. Rejection-ABC was applied to an existing 14-parameter earthworm energy budget IBM for which the available data consist of body mass growth and cocoon production in four experiments. ABC was able to narrow the posterior distributions of seven parameters, estimating credible intervals for each. ABC's accepted values produced slightly better fits than literature values did. The accuracy of the analysis was assessed using cross-validation and coverage, currently the best available tests. Of the seven unnarrowed parameters, ABC revealed that three were correlated with other parameters, while the remaining four were not estimable given the data available. It is often desirable to compare models to see whether all component modules are necessary. Here we used ABC model selection to compare the full model with a simplified version which removed the earthworm's movement and much of the energy budget. We are able to show that inclusion of the energy budget is necessary for a good fit to the data. We show how our methodology can inform future modelling cycles, and briefly discuss how more advanced versions of ABC may be applicable to IBMs.
We conclude that ABC has the potential to represent uncertainty in model structure, parameters and predictions, and to embed the often complex process of optimizing an IBM’s structure and parameters within an established statistical framework, thereby making the process more transparent and objective.
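The rejection-ABC loop described above (draw parameters from the prior, simulate, keep the draws whose simulations lie closest to the observations) can be sketched in a few lines. The toy linear "model", the uniform prior and all numbers are invented for illustration; this is not the earthworm IBM.

```python
import random

random.seed(1)

# Toy observations at times 1..4, synthetic stand-ins for real data
observed = [2.1, 3.9, 6.2, 8.0]

def simulate(rate):
    # A deliberately simple stochastic "model": growth = rate * time + noise
    return [rate * t + random.gauss(0, 0.3) for t in (1, 2, 3, 4)]

def distance(sim, obs):
    # Euclidean distance between simulated and observed summaries
    return sum((s - o) ** 2 for s, o in zip(sim, obs)) ** 0.5

# Rejection-ABC: sample the prior many times, retain the closest simulations
n_draws, keep = 5000, 100
draws = []
for _ in range(n_draws):
    rate = random.uniform(0.0, 5.0)          # prior on the unknown parameter
    draws.append((distance(simulate(rate), observed), rate))
posterior = [rate for _, rate in sorted(draws)[:keep]]
post_mean = sum(posterior) / len(posterior)
```

The retained draws approximate the posterior; their spread gives a credible interval for the parameter, as in the earthworm application.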

Relevance: 60.00%

Publisher:

Abstract:

We review the basic hypotheses which motivate the statistical framework used to analyze the cosmic microwave background, and how that framework can be enlarged as we relax those hypotheses. In particular, we try to separate as much as possible the questions of gaussianity, homogeneity, and isotropy from each other. We focus both on isotropic estimators of nongaussianity and on statistically anisotropic estimators of gaussianity, placing particular emphasis on their signatures and the enhanced cosmic variances that become increasingly important as our putative Universe becomes less symmetric. After reviewing the formalism behind some simple model-independent tests, we discuss how these tests can be applied to CMB data when searching for large-scale anomalies. Copyright © 2010 L. Raul Abramo and Thiago S. Pereira.
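One of the simplest isotropic non-gaussianity statistics is the sample skewness of a map's pixel values; a Gaussian field should give a value near zero. The sketch below applies it to a synthetic Gaussian field and to a manifestly non-Gaussian (squared) one. This is a generic moment test for illustration, not one of the specific estimators reviewed in the paper.

```python
import random

random.seed(2)

def skewness(x):
    """Sample skewness: third central moment over variance^(3/2)."""
    n = len(x)
    m = sum(x) / n
    s2 = sum((v - m) ** 2 for v in x) / n
    return sum((v - m) ** 3 for v in x) / (n * s2 ** 1.5)

# A Gaussian "map" should give skewness near 0; a squared (chi-square-like)
# field is strongly right-skewed and fails the test.
gaussian_map = [random.gauss(0, 1) for _ in range(20000)]
nongauss_map = [v * v for v in gaussian_map]

g_skew = skewness(gaussian_map)
ng_skew = skewness(nongauss_map)
```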

Relevance: 60.00%

Publisher:

Abstract:

Structured abstract. Objectives: To investigate the 3D morphological variations in 169 temporomandibular joint (TMJ) condyles, using novel imaging statistical modeling approaches. Setting and sample population: The Department of Orthodontics and Pediatric Dentistry at the University of Michigan. Cone beam CT scans were acquired from 69 subjects with long-term TMJ osteoarthritis (OA, mean age 39.1 ± 15.7 years), 15 subjects at initial consult diagnosis of OA (mean age 44.9 ± 14.8 years), and seven healthy controls (mean age 43 ± 12.4 years). Materials and methods: 3D surface models of the condyles were constructed, and homologous correspondent points on each model were established. The statistical framework included Direction-Projection-Permutation (DiProPerm) for testing the statistical significance of the differences between healthy controls and the OA groups determined by clinical and radiographic diagnoses. Results: Condylar morphology in OA and healthy subjects varied widely, with categorization from mild to severe bone degeneration or overgrowth. DiProPerm statistics supported a significant difference between the healthy control group and the initial-diagnosis OA group (t = 6.6, empirical p-value = 0.006) and between the healthy and long-term-diagnosis OA groups (t = 7.2, empirical p-value = 0). Compared with healthy controls, the average condyle in OA subjects was significantly smaller in all dimensions, except its anterior surface, even in subjects with an initial diagnosis of OA. Conclusion: This new statistical modeling of condylar morphology allows the development of more targeted classifications of this condition than previously possible.
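The DiProPerm idea (choose a separating direction, project both groups onto it, then permute group labels to calibrate the statistic) can be sketched as follows. For simplicity this uses the mean-difference direction and the difference of projected group means as the statistic, and synthetic 5-dimensional "shape features" standing in for the condylar surface points; the real method typically uses a more refined direction and a t-statistic.

```python
import random

random.seed(3)

def mean(v):
    return sum(v) / len(v)

def diproperm_stat(a, b):
    # Direction: mean-difference direction between the groups;
    # Projection: dot product of each sample with that direction;
    # statistic: difference of the projected group means.
    d = [mean(ca) - mean(cb) for ca, cb in zip(zip(*a), zip(*b))]
    proj = lambda x: sum(xi * di for xi, di in zip(x, d))
    return mean([proj(x) for x in a]) - mean([proj(x) for x in b])

# Synthetic groups: "controls" vs an "OA-like" group shifted in every dimension
controls = [[random.gauss(0, 1) for _ in range(5)] for _ in range(30)]
oa_group = [[random.gauss(1, 1) for _ in range(5)] for _ in range(30)]

t_obs = diproperm_stat(controls, oa_group)

# Permutation: reshuffle the group labels and recompute the statistic
pooled = controls + oa_group
perm_stats = []
for _ in range(500):
    random.shuffle(pooled)
    perm_stats.append(diproperm_stat(pooled[:30], pooled[30:]))
p_emp = sum(abs(s) >= abs(t_obs) for s in perm_stats) / len(perm_stats)
```

An empirical p-value of 0, as reported for the long-term OA comparison, simply means no permutation produced a statistic as extreme as the observed one.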

Relevance: 60.00%

Publisher:

Abstract:

Comparing latent constructs (loaded by reflective and congeneric measures) cross-culturally means studying how these unobserved variables vary, and/or covary with each other, after controlling for potentially disturbing cultural forces. This leads to the so-called 'measurement invariance' issue, which refers to the extent to which data collected with the same multi-item measurement instrument (i.e., a self-reported questionnaire of items underlying common latent constructs) are comparable across different cultural environments. Indeed, it would be unthinkable to explore latent-variable heterogeneity (e.g., latent means; latent variances, i.e., latent deviations from the means; latent covariances, i.e., latent shared variation around the respective means; the magnitude of structural path coefficients for causal relations among latent variables) across different populations without controlling for cultural bias in the underlying measures. Furthermore, it would be unrealistic to attempt this correction without a framework able to take all these potential cultural biases across populations into account simultaneously, since the real world 'acts' simultaneously as well. As a consequence, a researcher may want to control for cultural forces by hypothesizing that they all act at the same time across the comparison groups, and then examine whether they inflate or suppress the new estimates by imposing hierarchical nested constraints on the originally estimated parameters. Multi-sample Structural Equation Modeling-based Confirmatory Factor Analysis (MS-SEM-based CFA) remains a dominant and flexible statistical framework for working out this potential cultural bias simultaneously.
With this dissertation I attempt to introduce new viewpoints on measurement invariance, handled under a covariance-based SEM framework, by means of a consumer-behavior modeling application on functional food choices.

Relevance: 60.00%

Publisher:

Abstract:

Common goals in epidemiologic studies of infectious diseases include identification of the infectious agent, description of the modes of transmission and characterization of factors that influence the probability of transmission from infected to uninfected individuals. In the case of AIDS, the agent has been identified as the Human Immunodeficiency Virus (HIV), and transmission is known to occur through a variety of contact mechanisms including unprotected sexual intercourse, transfusion of infected blood products and sharing of needles in intravenous drug use. Relatively little is known about the probability of HIV transmission associated with the various modes of contact, or the role that other cofactors play in promoting or suppressing transmission. Here, transmission probability refers to the probability that the virus is transmitted to a susceptible individual following exposure consisting of a series of potentially infectious contacts. The infectivity of HIV for a given route of transmission is defined to be the per contact probability of infection. Knowledge of infectivity and its relationship to other factors is important in understanding the dynamics of the AIDS epidemic and in suggesting appropriate measures to control its spread. The primary source of empirical data about infectivity comes from sexual partners of infected individuals. Partner studies consist of a series of such partnerships, usually heterosexual and monogamous, each composed of an initially infected "index case" and a partner who may or may not be infected by the time of data collection. However, because the infection times of both partners may be unknown and the history of contacts uncertain, any quantitative characterization of infectivity is extremely difficult. Thus, most statistical analyses of partner study data involve the simplifying assumption that infectivity is a constant common to all partnerships.
The major objectives of this work are to describe and discuss the design and analysis of partner studies, providing a general statistical framework for investigations of infectivity and risk factors for HIV transmission. The development is largely based on three papers: Jewell and Shiboski (1990), Kim and Lagakos (1990), and Shiboski and Jewell (1992).
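Under the simplifying constant-infectivity assumption mentioned above, the probability that a partner is infected after k potentially infectious contacts is 1 − (1 − λ)^k, where λ is the per-contact infectivity. A grid-search maximum-likelihood sketch on hypothetical partner-study data:

```python
import math

# Hypothetical partner-study records: (number of contacts, partner infected?)
data = [(10, 0), (50, 1), (100, 1), (20, 0), (200, 1), (30, 0), (80, 1), (60, 0)]

def log_likelihood(lam):
    ll = 0.0
    for k, infected in data:
        # Escape probability after k contacts is (1 - lam)^k under the
        # constant-infectivity model, so P(infected) = 1 - (1 - lam)^k
        if infected:
            ll += math.log(1.0 - (1.0 - lam) ** k)
        else:
            ll += k * math.log(1.0 - lam)
    return ll

# Grid-search maximum-likelihood estimate of the per-contact infectivity
grid = [i / 10000 for i in range(1, 1000)]   # lambda in (0.0001, 0.0999)
lam_hat = max(grid, key=log_likelihood)
```

With real partner data the same likelihood can be profiled to obtain confidence intervals, or extended with covariates as in the framework the dissertation develops.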

Relevance: 60.00%

Publisher:

Abstract:

In a network of competing species, a competitive intransitivity occurs when the ranking of competitive abilities does not follow a linear hierarchy (A > B > C but C > A). A variety of mathematical models suggests that intransitive networks can prevent or slow down competitive exclusion and maintain biodiversity by enhancing species coexistence. However, it has been difficult to assess empirically the relative importance of intransitive competition because a large number of pairwise species competition experiments are needed to construct a competition matrix that is used to parameterize existing models. Here we introduce a statistical framework for evaluating the contribution of intransitivity to community structure using species abundance matrices that are commonly generated from replicated sampling of species assemblages. We provide metrics and analytical methods for using abundance matrices to estimate species competition and patch transition matrices by using reverse-engineering and a colonization-competition model. These matrices provide complementary metrics to estimate the degree of intransitivity in the competition network of the sampled communities. Benchmark tests reveal that the proposed methods could successfully detect intransitive competition networks, even in the absence of direct measures of pairwise competitive strength. To illustrate the approach, we analyzed patterns of abundance and biomass of five species of necrophagous Diptera and eight species of their hymenopteran parasitoids that co-occur in beech forests in Germany. We found evidence for a strong competitive hierarchy within communities of flies and parasitoids. However, for parasitoids, there was a tendency towards increasing intransitivity in higher weight classes, which represented larger resource patches. These tests provide novel methods for empirically estimating the degree of intransitivity in competitive networks from observational datasets. 
They can be applied to experimental measures of pairwise species interactions, as well as to spatio-temporal samples of assemblages in homogeneous environments or along environmental gradients.
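Given a pairwise competition matrix (which the paper estimates from abundance data by reverse-engineering), intransitivity can be detected by searching for 3-cycles of the kind described above (A > B > C but C > A). A minimal sketch with a hypothetical rock-paper-scissors triple plus one fully subordinate species:

```python
from itertools import combinations

# Hypothetical pairwise competition matrix: wins[i][j] = 1 if species i outcompetes j.
# Species 0 beats 1, 1 beats 2, but 2 beats 0: an intransitive triple.
# Species 3 loses to everyone (purely hierarchical).
wins = [
    [0, 1, 0, 1],
    [0, 0, 1, 1],
    [1, 0, 0, 1],
    [0, 0, 0, 0],
]

def intransitive_triples(w):
    """Return all 3-cycles (i beats j, j beats k, k beats i) in the network."""
    n = len(w)
    triples = []
    for i, j, k in combinations(range(n), 3):
        # Check both cyclic orientations of the triple
        for a, b, c in ((i, j, k), (i, k, j)):
            if w[a][b] and w[b][c] and w[c][a]:
                triples.append((a, b, c))
    return triples

cycles = intransitive_triples(wins)
```

The fraction of triples that are cyclic is one simple index of how intransitive the sampled competition network is.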

Relevance: 60.00%

Publisher:

Abstract:

In this work we propose the adoption of a statistical framework used in the evaluation of forensic evidence as a tool for evaluating and presenting circumstantial "evidence" of a disease outbreak from syndromic surveillance. The basic idea is to exploit the predicted distributions of reported cases to calculate the ratio of the likelihood of observing n cases given an ongoing outbreak over the likelihood of observing n cases given no outbreak. The likelihood ratio defines the Value of Evidence (V). Using Bayes' rule, the prior odds for an ongoing outbreak are multiplied by V to obtain the posterior odds. This approach was applied to time series on the number of horses showing clinical respiratory symptoms or neurological symptoms. The separation between prior beliefs about the probability of an outbreak and the strength of evidence from syndromic surveillance offers a transparent reasoning process suitable for supporting decision makers. The value of evidence can be translated into a verbal statement, as often done in forensics or used for the production of risk maps. Furthermore, a Bayesian approach offers seamless integration of data from syndromic surveillance with results from predictive modeling and with information from other sources such as disease introduction risk assessments.
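The value-of-evidence calculation can be sketched with Poisson predictive distributions for the daily case count; the rates and prior odds below are hypothetical illustrations, not values from the horse surveillance data.

```python
import math

def poisson_pmf(n, lam):
    return math.exp(-lam) * lam ** n / math.factorial(n)

# Predicted distributions of daily reported cases (hypothetical rates):
lam_baseline = 2.0   # expected cases given no outbreak
lam_outbreak = 8.0   # expected cases given an ongoing outbreak

def value_of_evidence(n):
    # V = P(n cases | outbreak) / P(n cases | no outbreak)
    return poisson_pmf(n, lam_outbreak) / poisson_pmf(n, lam_baseline)

# Bayes' rule on the odds scale: posterior odds = prior odds * V
prior_odds = 0.01 / 0.99        # prior belief: outbreak probability 1%
n_observed = 7                  # today's syndromic count
V = value_of_evidence(n_observed)
posterior_odds = prior_odds * V
posterior_prob = posterior_odds / (1.0 + posterior_odds)
```

Note how the prior odds (the decision maker's belief) and V (the strength of the surveillance evidence) stay cleanly separated, which is the transparency argument made above.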

Relevance: 60.00%

Publisher:

Abstract:

1. We analysed time-series data from populations of red kangaroos (Macropus rufus, Desmarest) inhabiting four areas in the pastoral zone of South Australia. We formulated a set of a priori models to disentangle the relative effects of the covariates: rainfall, harvesting, intraspecific competition, and domestic herbivores, on kangaroo population-growth rate. 2. The statistical framework allowed for spatial variation in the growth-rate parameters, response to covariates, and environmental variability, as well as spatially correlated error terms due to shared environment. 3. The most parsimonious model included all covariates but no area-specific parameter values, suggesting that kangaroo densities respond in the same way to the covariates across the areas. 4. The temporal dynamics were spatially correlated, even after taking into account the potentially synchronizing effect of rainfall, harvesting and domestic herbivores. 5. Counter-intuitively, we found a positive rather than negative effect of domestic herbivore density on the population-growth rate of kangaroos. We hypothesize that this effect is caused by sheep and cattle acting as a surrogate for resource availability beyond rainfall. 6. Even though our system is well studied, we must conclude that approximating resources by surrogates such as rainfall is more difficult than previously thought. This is an important message for studies of consumer-resource systems and highlights the need to be explicit about population processes when analysing population patterns.

Relevance: 60.00%

Publisher:

Abstract:

This work uses a generalized similarity measure called correntropy to develop a new method for estimating a linear relation between variables given their samples. Towards this goal, the concept of correntropy is extended from two variables to any two vectors (even with different dimensions) using a statistical framework. With this multidimensional extension of correntropy, the regression problem can be formulated differently, by seeking the hyperplane that has maximum probability density with respect to the target data. Experiments show that the new algorithm has a convenient fixed-point update for the parameters and performs robustly in the presence of outlier noise.
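Maximum-correntropy regression with a Gaussian kernel reduces to an iteratively reweighted least-squares fixed-point update: each point's weight shrinks exponentially with its squared residual, so gross outliers are effectively ignored. A one-dimensional sketch (the kernel width and data are illustrative, not from the paper):

```python
import math
import random

random.seed(5)

# Synthetic line y = 2x + 1 with small noise, then gross outliers injected
xs = [i / 10 for i in range(100)]
ys = [2 * x + 1 + random.gauss(0, 0.1) for x in xs]
for i in range(0, 100, 10):
    ys[i] += 20.0                            # every tenth point is an outlier

def mcc_fit(xs, ys, sigma=1.0, iters=50):
    """Maximum-correntropy linear fit via a fixed-point (weighted LS) update."""
    w, b = 0.0, 0.0
    for _ in range(iters):
        # Gaussian-kernel weights downweight points with large errors
        g = [math.exp(-((y - w * x - b) ** 2) / (2 * sigma ** 2))
             for x, y in zip(xs, ys)]
        sw = sum(g)
        mx = sum(gi * x for gi, x in zip(g, xs)) / sw
        my = sum(gi * y for gi, y in zip(g, ys)) / sw
        var = sum(gi * (x - mx) ** 2 for gi, x in zip(g, xs))
        cov = sum(gi * (x - mx) * (y - my) for gi, x, y in zip(g, xs, ys))
        w = cov / var                         # weighted least-squares slope
        b = my - w * mx                       # weighted least-squares intercept
    return w, b

w_hat, b_hat = mcc_fit(xs, ys)
```

An ordinary least-squares fit on the same data would be pulled badly off the true line by the injected outliers; the correntropy weights make the fixed-point iteration converge near (2, 1).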

Relevance: 60.00%

Publisher:

Abstract:

Sequences of timestamped events are currently being generated across nearly every domain of data analytics, from e-commerce web logging to electronic health records used by doctors and medical researchers. Every day, this data type is reviewed by humans who apply statistical tests, hoping to learn everything they can about how these processes work, why they break, and how they can be improved upon. To further uncover how these processes work the way they do, researchers often compare two groups, or cohorts, of event sequences to find the differences and similarities between outcomes and processes. With temporal event sequence data, this task is complex because of the variety of ways single events and sequences of events can differ between the two cohorts of records: the structure of the event sequences (e.g., event order, co-occurring events, or frequencies of events), the attributes about the events and records (e.g., gender of a patient), or metrics about the timestamps themselves (e.g., duration of an event). Running statistical tests to cover all these cases and determining which results are significant becomes cumbersome. Current visual analytics tools for comparing groups of event sequences emphasize a purely statistical or purely visual approach for comparison. Visual analytics tools leverage humans' ability to easily see patterns and anomalies that they were not expecting, but are limited by uncertainty in their findings. Statistical tools emphasize finding significant differences in the data, but often require researchers to have a concrete question in mind and do not facilitate more general exploration of the data. Combining visual analytics tools with statistical methods leverages the benefits of both approaches for quicker and easier insight discovery.
Integrating statistics into a visualization tool presents many challenges on the frontend (e.g., displaying the results of many different metrics concisely) and in the backend (e.g., scalability challenges with running various metrics on multi-dimensional data at once). I begin by exploring the problem of comparing cohorts of event sequences and understanding the questions that analysts commonly ask in this task. From there, I demonstrate that combining automated statistics with an interactive user interface amplifies the benefits of both types of tools, thereby enabling analysts to conduct quicker and easier data exploration, hypothesis generation, and insight discovery. The direct contributions of this dissertation are: (1) a taxonomy of metrics for comparing cohorts of temporal event sequences, (2) a statistical framework for exploratory data analysis with a method I refer to as high-volume hypothesis testing (HVHT), (3) a family of visualizations and guidelines for interaction techniques that are useful for understanding and parsing the results, and (4) a user study, five long-term case studies, and five short-term case studies which demonstrate the utility and impact of these methods in various domains: four in the medical domain, one in web log analysis, two in education, and one each in social networks, sports analytics, and security. My dissertation contributes an understanding of how cohorts of temporal event sequences are commonly compared and the difficulties associated with applying and parsing the results of these metrics. It also contributes a set of visualizations, algorithms, and design guidelines for balancing automated statistics with user-driven analysis to guide users to significant, distinguishing features between cohorts. 
This work opens avenues for future research in comparing two or more groups of temporal event sequences, opening traditional machine learning and data mining techniques to user interaction, and extending the principles found in this dissertation to data types beyond temporal event sequences.
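Running many metrics at once, as in high-volume hypothesis testing, requires a multiple-comparisons correction before declaring results significant. A common choice (illustrative here, not necessarily the exact procedure used in the dissertation) is the Benjamini-Hochberg false-discovery-rate step-up procedure:

```python
def benjamini_hochberg(pvals, q=0.05):
    """Return indices of hypotheses rejected at FDR level q (BH step-up)."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    k_max = 0
    for rank, i in enumerate(order, start=1):
        # Largest rank whose p-value falls under the BH line q * rank / m
        if pvals[i] <= q * rank / m:
            k_max = rank
    return sorted(order[:k_max])

# Many cohort-comparison metrics tested at once -> many p-values (hypothetical)
pvals = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.07, 0.2, 0.5, 0.9]
significant = benjamini_hochberg(pvals, q=0.05)
```

Here only the two smallest p-values survive the correction, even though five are below the naive 0.05 cutoff, which is exactly the kind of filtering an interface for HVHT results would need to surface to the analyst.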

Relevance: 60.00%

Publisher:

Abstract:

For climate risk management, cumulative distribution functions (CDFs) are an important source of information. They are ideally suited to compare probabilistic forecasts of primary (e.g. rainfall) or secondary data (e.g. crop yields). Summarised as CDFs, such forecasts allow an easy quantitative assessment of possible, alternative actions. Although the degree of uncertainty associated with CDF estimation could influence decisions, such information is rarely provided. Hence, we propose Cox-type regression models (CRMs) as a statistical framework for making inferences on CDFs in climate science. CRMs were designed for modelling probability distributions rather than just mean or median values. This makes the approach appealing for risk assessments where probabilities of extremes are often more informative than central tendency measures. CRMs are semi-parametric approaches originally designed for modelling risks arising from time-to-event data. Here we extend this original concept beyond time-dependent measures to other variables of interest. We also provide tools for estimating CDFs and surrounding uncertainty envelopes from empirical data. These statistical techniques intrinsically account for non-stationarities in time series that might be the result of climate change. This feature makes CRMs attractive candidates to investigate the feasibility of developing rigorous global circulation model (GCM)-CRM interfaces for provision of user-relevant forecasts. To demonstrate the applicability of CRMs, we present two examples for El Niño/Southern Oscillation (ENSO)-based forecasts: the onset date of the wet season (Cairns, Australia) and total wet season rainfall (Quixeramobim, Brazil). This study emphasises the methodological aspects of CRMs rather than discussing merits or limitations of the ENSO-based predictors.
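While the paper's CRMs are semi-parametric regression models, the basic object, a CDF with a surrounding uncertainty envelope, can be illustrated with an empirical CDF and a Dvoretzky-Kiefer-Wolfowitz confidence band. The rainfall totals below are invented for illustration:

```python
import math

def ecdf_with_dkw(sample, alpha=0.05):
    """Empirical CDF plus a (1 - alpha) Dvoretzky-Kiefer-Wolfowitz envelope."""
    n = len(sample)
    # DKW half-width: sqrt(ln(2/alpha) / (2n)), uniform over the whole CDF
    eps = math.sqrt(math.log(2.0 / alpha) / (2.0 * n))
    cdf = [(x, (i + 1) / n) for i, x in enumerate(sorted(sample))]
    band = [(x, max(f - eps, 0.0), min(f + eps, 1.0)) for x, f in cdf]
    return cdf, band

# Hypothetical total wet-season rainfall observations (mm); illustrative only
rainfall = [480, 520, 610, 350, 700, 560, 430, 590, 640, 510,
            470, 530, 600, 380, 660, 550, 440, 580, 620, 500]
cdf, band = ecdf_with_dkw(rainfall)

# Risk question a decision maker might ask: P(season total <= 550 mm)
prob_below_550 = sum(x <= 550 for x in rainfall) / len(rainfall)
```

The band makes the estimation uncertainty visible alongside the forecast itself, which is the kind of information the paper argues is rarely provided with CDFs.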

Relevance: 40.00%

Publisher:

Abstract:

This paper proposes a template for modelling complex datasets that integrates traditional statistical modelling approaches with more recent advances in statistics and modelling through an exploratory framework. Our approach builds on the well-known and long-standing traditional idea of 'good practice in statistics' by establishing a comprehensive framework for modelling that focuses on exploration, prediction, interpretation and reliability assessment, a relatively new idea that allows individual assessment of predictions. The integrated framework we present comprises two stages. The first involves the use of exploratory methods to help visually understand the data and identify a parsimonious set of explanatory variables. The second encompasses a two-step modelling process, where the use of non-parametric methods such as decision trees and generalized additive models is promoted to identify important variables and their modelling relationship with the response before a final predictive model is considered. We focus on fitting the predictive model using parametric, non-parametric and Bayesian approaches. This paper is motivated by a medical problem where interest focuses on developing a risk stratification system for morbidity of 1,710 cardiac patients given a suite of demographic, clinical and preoperative variables. Although the methods we use are applied specifically to this case study, these methods can be applied across any field, irrespective of the type of response.

Relevance: 40.00%

Publisher:

Abstract:

In this paper we describe the methodology and structural design of a system that translates English into Malayalam using statistical models. A monolingual Malayalam corpus and a bilingual English/Malayalam corpus are the main resources in building this statistical machine translator. The training strategy adopted has been enhanced by PoS tagging, which helps to get rid of insignificant alignments. Moreover, incorporating units such as a suffix separator and a stop-word eliminator has proven effective in bringing about better training results. In the decoder, order-conversion rules are applied to reduce the structural difference between the language pair. The quality of the decoder's statistical output is further improved by applying mending rules. Experiments conducted on a sample corpus have generated reasonably good Malayalam translations, and the results are verified with the F-measure, BLEU and WER evaluation metrics.
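Of the evaluation metrics mentioned, WER is the simplest to state: the word-level edit distance between the hypothesis and the reference translation, divided by the reference length. A minimal sketch:

```python
def wer(reference, hypothesis):
    """Word Error Rate: word-level Levenshtein distance / reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    # Standard dynamic programme over words
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution / match
    return d[len(ref)][len(hyp)] / len(ref)

# Toy English example (the same arithmetic applies to Malayalam word sequences)
score = wer("the cat sat on the mat", "the cat sat mat")
```

Here the hypothesis drops two of the six reference words, giving a WER of 2/6; lower is better, and unlike BLEU the score can exceed 1 when the hypothesis is much longer than the reference.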