8 results for SCALE MODELS

at Duke University


Relevance: 30.00%

Publisher:

Abstract:

The population structure of an organism reflects its evolutionary history and influences its evolutionary trajectory. It constrains the combination of genetic diversity and reveals patterns of past gene flow. Understanding it is a prerequisite for detecting genomic regions under selection, predicting the effect of population disturbances, or modeling gene flow. This paper examines the detailed global population structure of Arabidopsis thaliana. Using a set of 5,707 plants collected from around the globe and genotyped at 149 SNPs, we show that while A. thaliana as a species self-fertilizes 97% of the time, there is considerable variation among local groups. This level of outcrossing greatly limits observed heterozygosity but is sufficient to generate considerable local haplotypic diversity. We also find that in its native Eurasian range A. thaliana exhibits continuous isolation by distance at every geographic scale without natural breaks corresponding to classical notions of populations. By contrast, in North America, where it exists as an exotic species, A. thaliana exhibits little or no population structure at a continental scale but local isolation by distance that extends hundreds of km. This suggests a pattern for the development of isolation by distance that can establish itself shortly after an organism fills a new habitat range. It also raises questions about the general applicability of many standard population genetics models. Any model based on discrete clusters of interchangeable individuals will be an uneasy fit to organisms like A. thaliana which exhibit continuous isolation by distance on many scales.
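The abstract's point that 97% selfing "greatly limits observed heterozygosity" follows from a standard population-genetics identity not spelled out in the text: at equilibrium under partial selfing at rate s, the inbreeding coefficient is F = s/(2 - s), and observed heterozygosity is the random-mating expectation scaled by 1 - F. A minimal sketch of that calculation (the formulas are textbook results, not taken from the paper):

```python
def equilibrium_inbreeding(s):
    """Wright's equilibrium inbreeding coefficient F = s / (2 - s)
    for a population that self-fertilizes a fraction s of the time."""
    return s / (2.0 - s)

def observed_heterozygosity(h_exp, s):
    """Expected observed heterozygosity under partial selfing:
    H_obs = H_exp * (1 - F)."""
    return h_exp * (1.0 - equilibrium_inbreeding(s))
```

With s = 0.97 this gives F ≈ 0.94, so observed heterozygosity is only about 6% of its random-mating expectation, while the residual 3% outcrossing still shuffles haplotypes locally.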

Relevance: 30.00%

Publisher:

Abstract:

INTRODUCTION: We previously reported models that characterized the synergistic interaction between remifentanil and sevoflurane in blunting responses to verbal and painful stimuli. This preliminary study evaluated the ability of these models to predict a return of responsiveness during emergence from anesthesia and a response to tibial pressure when patients required analgesics in the recovery room. We hypothesized that model predictions would be consistent with observed responses. We also hypothesized that under non-steady-state conditions, accounting for the lag time between sevoflurane effect-site concentration (Ce) and end-tidal (ET) concentration would improve predictions. METHODS: Twenty patients received a sevoflurane, remifentanil, and fentanyl anesthetic. Two model predictions of responsiveness were recorded at emergence: an ET-based and a Ce-based prediction. Similarly, 2 predictions of a response to noxious stimuli were recorded when patients first required analgesics in the recovery room. Model predictions were compared with observations with graphical and temporal analyses. RESULTS: While patients were anesthetized, model predictions indicated a high likelihood that patients would be unresponsive (≥99%). However, after termination of the anesthetic, models exhibited a wide range of predictions at emergence (1%-97%). Although wide, the Ce-based predictions of responsiveness were better distributed over a percentage ranking of observations than the ET-based predictions. For the ET-based model, 45% of the patients awoke within 2 min of the 50% model predicted probability of unresponsiveness and 65% awoke within 4 min. For the Ce-based model, 45% of the patients awoke within 1 min of the 50% model predicted probability of unresponsiveness and 85% awoke within 3.2 min. Predictions of a response to a painful stimulus in the recovery room were similar for the Ce- and ET-based models.
DISCUSSION: Results confirmed, in part, our study hypothesis; accounting for the lag time between Ce and ET sevoflurane concentrations improved model predictions of responsiveness but had no effect on predicting a response to a noxious stimulus in the recovery room. These models may be useful in predicting events of clinical interest but large-scale evaluations with numerous patients are needed to better characterize model performance.
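The lag between end-tidal and effect-site concentration discussed above is conventionally modeled as a first-order process, dCe/dt = ke0 (ET - Ce). The sketch below integrates that equation by forward Euler; the rate constant ke0 and the step size are illustrative values, not the study's fitted parameters:

```python
def effect_site_trajectory(et_series, ke0, dt):
    """Forward-Euler integration of the first-order effect-site model
    dCe/dt = ke0 * (ET - Ce), which produces the lag between end-tidal (ET)
    and effect-site (Ce) concentrations. Starts from Ce = 0."""
    ce, out = 0.0, []
    for et in et_series:
        ce += ke0 * (et - ce) * dt
        out.append(ce)
    return out
```

For a constant end-tidal concentration, Ce rises exponentially toward ET with time constant 1/ke0, which is exactly why an ET-based prediction leads a Ce-based prediction during non-steady-state emergence.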

Relevance: 30.00%

Publisher:

Abstract:

Many modern applications fall into the category of "large-scale" statistical problems, in which both the number of observations n and the number of features or parameters p may be large. Many existing methods focus on point estimation, despite the continued relevance of uncertainty quantification in the sciences, where the number of parameters to estimate often exceeds the sample size even after the huge increases in n typically seen in many fields. The tendency in some areas of industry to dispense with traditional statistical analysis on the basis that "n = all" is thus of little relevance outside of certain narrow applications. The main result of the Big Data revolution in most fields has instead been to make computation much harder without reducing the importance of uncertainty quantification. Bayesian methods excel at uncertainty quantification, but often scale poorly relative to alternatives. This conflict between the statistical advantages of Bayesian procedures and their substantial computational disadvantages is perhaps the greatest challenge facing modern Bayesian statistics, and is the primary motivation for the work presented here.

Two general strategies for scaling Bayesian inference are considered. The first is the development of methods that lend themselves to faster computation, and the second is design and characterization of computational algorithms that scale better in n or p. In the first instance, the focus is on joint inference outside of the standard problem of multivariate continuous data that has been a major focus of previous theoretical work in this area. In the second area, we pursue strategies for improving the speed of Markov chain Monte Carlo algorithms, and characterizing their performance in large-scale settings. Throughout, the focus is on rigorous theoretical evaluation combined with empirical demonstrations of performance and concordance with the theory.

One topic we consider is modeling the joint distribution of multivariate categorical data, often summarized in a contingency table. Contingency table analysis routinely relies on log-linear models, with latent structure analysis providing a common alternative. Latent structure models lead to a reduced rank tensor factorization of the probability mass function for multivariate categorical data, while log-linear models achieve dimensionality reduction through sparsity. Little is known about the relationship between these notions of dimensionality reduction in the two paradigms. In Chapter 2, we derive several results relating the support of a log-linear model to nonnegative ranks of the associated probability tensor. Motivated by these findings, we propose a new collapsed Tucker class of tensor decompositions, which bridge existing PARAFAC and Tucker decompositions, providing a more flexible framework for parsimoniously characterizing multivariate categorical data. Taking a Bayesian approach to inference, we illustrate empirical advantages of the new decompositions.
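The reduced-rank tensor factorization referred to above can be made concrete with a small sketch: under a latent class (PARAFAC) model, the joint pmf of p categorical variables is a weighted sum of k outer products of per-variable conditional pmfs. This shows only the basic PARAFAC form, not the collapsed Tucker class proposed in Chapter 2:

```python
import numpy as np

def parafac_joint_pmf(nu, psi):
    """Joint pmf of multivariate categorical data under a latent class
    (PARAFAC) model: p(y) = sum_h nu[h] * prod_j psi[j][h, y_j].
    nu: (k,) class weights summing to 1; psi: list of (k, d_j) arrays whose
    rows are conditional pmfs for each variable given the latent class."""
    joint = 0.0
    for h, weight in enumerate(nu):
        outer = np.array(weight)
        for table in psi:
            outer = np.multiply.outer(outer, table[h])  # build the rank-1 term
        joint = joint + outer
    return joint
```

The rank k controls the dimensionality reduction: a k-class model stores k(1 + sum_j (d_j - 1)) free parameters instead of the full prod_j d_j table.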

Latent class models for the joint distribution of multivariate categorical data, such as the PARAFAC decomposition, play an important role in the analysis of population structure. In this context, the number of latent classes is interpreted as the number of genetically distinct subpopulations of an organism, an important factor in the analysis of evolutionary processes and conservation status. Existing methods focus on point estimates of the number of subpopulations, and lack robust uncertainty quantification. Moreover, whether the number of latent classes in these models is even an identified parameter is an open question. In Chapter 3, we show that when the model is properly specified, the correct number of subpopulations can be recovered almost surely. We then propose an alternative method for estimating the number of latent subpopulations that provides good quantification of uncertainty, and provide a simple procedure for verifying that the proposed method is consistent for the number of subpopulations. The performance of the model in estimating the number of subpopulations and other common population structure inference problems is assessed in simulations and a real data application.

In contingency table analysis, sparse data is frequently encountered for even modest numbers of variables, resulting in non-existence of maximum likelihood estimates. A common solution is to obtain regularized estimates of the parameters of a log-linear model. Bayesian methods provide a coherent approach to regularization, but are often computationally intensive. Conjugate priors ease computational demands, but the conjugate Diaconis-Ylvisaker priors for the parameters of log-linear models do not give rise to closed form credible regions, complicating posterior inference. In Chapter 4 we derive the optimal Gaussian approximation to the posterior for log-linear models with Diaconis-Ylvisaker priors, and provide convergence rate and finite-sample bounds for the Kullback-Leibler divergence between the exact posterior and the optimal Gaussian approximation. We demonstrate empirically in simulations and a real data application that the approximation is highly accurate, even in relatively small samples. The proposed approximation provides a computationally scalable and principled approach to regularized estimation and approximate Bayesian inference for log-linear models.
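The chapter's optimal Gaussian approximation under Diaconis-Ylvisaker priors is not reproduced in the abstract, but the generic idea is that of a Laplace-style approximation: a Gaussian centered at the posterior mode with variance given by the inverse negative Hessian of the log posterior. A toy sketch for a Poisson log-rate with a flat prior (the model and prior here are illustrative assumptions, not the chapter's setting):

```python
import math

def laplace_poisson_lograte(counts):
    """Gaussian (Laplace) approximation to the posterior of theta = log(rate)
    for i.i.d. Poisson counts under a flat prior on theta. The log posterior
    is s*theta - n*exp(theta) + const, with s = sum(counts)."""
    n, s = len(counts), sum(counts)
    theta_hat = math.log(s / n)   # mode: log of the sample mean
    var = 1.0 / s                 # minus the Hessian at the mode is s
    return theta_hat, var
```

The appeal is the same as in the chapter: the approximation gives closed-form credible intervals where the exact posterior does not.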

Another challenging and somewhat non-standard joint modeling problem is inference on tail dependence in stochastic processes. In applications where extreme dependence is of interest, data are almost always time-indexed. Existing methods for inference and modeling in this setting often cluster extreme events or choose window sizes with the goal of preserving temporal information. In Chapter 5, we propose an alternative paradigm for inference on tail dependence in stochastic processes with arbitrary temporal dependence structure in the extremes, based on the idea that the information on strength of tail dependence and the temporal structure in this dependence are both encoded in waiting times between exceedances of high thresholds. We construct a class of time-indexed stochastic processes with tail dependence obtained by endowing the support points in de Haan's spectral representation of max-stable processes with velocities and lifetimes. We extend Smith's model to these max-stable velocity processes and obtain the distribution of waiting times between extreme events at multiple locations. Motivated by this result, a new definition of tail dependence is proposed that is a function of the distribution of waiting times between threshold exceedances, and an inferential framework is constructed for estimating the strength of extremal dependence and quantifying uncertainty in this paradigm. The method is applied to climatological, financial, and electrophysiology data.

The remainder of this thesis focuses on posterior computation by Markov chain Monte Carlo (MCMC), the dominant paradigm for posterior computation in Bayesian analysis. It has long been common to control computation time by making approximations to the Markov transition kernel, but comparatively little attention has been paid to convergence and estimation error in the resulting approximating Markov chains. In Chapter 6, we propose a framework for assessing when to use approximations in MCMC algorithms, and how much error in the transition kernel should be tolerated to obtain optimal estimation performance with respect to a specified loss function and computational budget. The results require only ergodicity of the exact kernel and control of the kernel approximation accuracy. The theoretical framework is applied to approximations based on random subsets of data, low-rank approximations of Gaussian processes, and a novel approximating Markov chain for discrete mixture models.
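One concrete instance of an approximating transition kernel of the kind discussed above is a Metropolis sampler whose acceptance ratio is evaluated on a random subset of the data. The sketch below does this for the mean of Gaussian data; the model, tuning values, and rescaling rule are illustrative assumptions, not the framework of Chapter 6:

```python
import math
import random

def subsampled_mh(data, subset_size, iters=2000, step=0.3, seed=0):
    """Random-walk Metropolis for the mean of N(theta, 1) data with a flat
    prior, where each acceptance ratio uses a random subset of the data,
    rescaled to full size -- an approximate transition kernel."""
    rng = random.Random(seed)
    scale = len(data) / subset_size  # rescale subset log-likelihood

    def loglik(m, sub):
        return -0.5 * scale * sum((x - m) ** 2 for x in sub)

    theta, chain = 0.0, []
    for _ in range(iters):
        proposal = theta + step * rng.gauss(0.0, 1.0)
        sub = rng.sample(data, subset_size)
        u = max(rng.random(), 1e-300)  # guard against log(0)
        if math.log(u) < loglik(proposal, sub) - loglik(theta, sub):
            theta = proposal
        chain.append(theta)
    return chain
```

The subsampling noise perturbs the invariant distribution; quantifying when that error is worth the computational savings is exactly the trade-off the chapter's framework addresses.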

Data augmentation Gibbs samplers are arguably the most popular class of algorithms for approximately sampling from the posterior distribution of the parameters of generalized linear models. The truncated normal and Polya-Gamma data augmentation samplers are standard examples for probit and logit links, respectively. Motivated by an important problem in quantitative advertising, in Chapter 7 we consider the application of these algorithms to modeling rare events. We show that when the sample size is large but the observed number of successes is small, these data augmentation samplers mix very slowly, with a spectral gap that converges to zero at a rate at least proportional to the reciprocal of the square root of the sample size, up to a log factor. In simulation studies, moderate sample sizes result in high autocorrelations and small effective sample sizes. Similar empirical results are observed for related data augmentation samplers for multinomial logit and probit models. When applied to a real quantitative advertising dataset, the data augmentation samplers mix very poorly. Conversely, Hamiltonian Monte Carlo and a type of independence chain Metropolis algorithm show good mixing on the same dataset.
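A minimal version of the truncated-normal (Albert-Chib) data augmentation sampler mentioned above, for an intercept-only probit model with a flat prior, illustrates the mechanics; the rare-event data in the test mirror the regime in which the chapter shows such chains mix slowly. This is a pedagogical sketch, not the chapter's experimental setup:

```python
import random
from statistics import NormalDist, mean

def probit_gibbs_intercept(y, iters=300, seed=1):
    """Albert-Chib data augmentation for an intercept-only probit model with
    a flat prior on the intercept beta. y: list of 0/1 outcomes.
    Returns the chain of sampled intercepts, started from beta = 0."""
    rng = random.Random(seed)
    n = len(y)
    beta, draws = 0.0, []
    for _ in range(iters):
        nd = NormalDist(beta, 1.0)
        p0 = nd.cdf(0.0)                 # P(z < 0) at the current beta
        z = []
        for yi in y:
            # inverse-CDF draw of z_i ~ N(beta, 1) truncated by the sign of y_i
            u = rng.random()
            u = p0 + u * (1.0 - p0) if yi == 1 else u * p0
            u = min(max(u, 1e-12), 1.0 - 1e-12)
            z.append(nd.inv_cdf(u))
        # beta | z ~ N(mean(z), 1/n) under the flat prior
        u2 = min(max(rng.random(), 1e-12), 1.0 - 1e-12)
        beta = NormalDist(mean(z), (1.0 / n) ** 0.5).inv_cdf(u2)
        draws.append(beta)
    return draws
```

With 2 successes in 100 trials the chain creeps in small, highly autocorrelated steps toward the heavily negative posterior mode, which is the slow-mixing behavior the chapter quantifies via the spectral gap.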

Relevance: 30.00%

Publisher:

Abstract:

RNA viruses are an important cause of global morbidity and mortality. The rapid evolutionary rates of RNA virus pathogens, caused by high replication rates and error-prone polymerases, can make the pathogens difficult to control. RNA viruses can undergo immune escape within their hosts and develop resistance to the treatment and vaccines we design to fight them. Understanding the spread and evolution of RNA pathogens is essential for reducing human suffering. In this dissertation, I make use of the rapid evolutionary rate of viral pathogens to answer several questions about how RNA viruses spread and evolve. To address each of the questions, I link mathematical techniques for modeling viral population dynamics with phylogenetic and coalescent techniques for analyzing and modeling viral genetic sequences and evolution. The first project uses multi-scale mechanistic modeling to show that decreases in viral substitution rates over the course of an acute infection, combined with the timing of infectious hosts transmitting new infections to susceptible individuals, can account for discrepancies in viral substitution rates in different host populations. The second project combines coalescent models with within-host mathematical models to identify driving evolutionary forces in chronic hepatitis C virus infection. The third project compares the effects of intrinsic and extrinsic viral transmission rate variation on viral phylogenies.
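The within-host side of the multi-scale models described above is commonly built on the target-cell-limited system of ODEs for target cells T, infected cells I, and free virus V. A forward-Euler sketch with illustrative parameters (not the dissertation's fitted values) shows the characteristic rise, peak, and decline of viral load as target cells deplete:

```python
def within_host_dynamics(t_end=10.0, dt=0.001,
                         beta=1e-5, delta=1.0, p=100.0, c=5.0,
                         T0=1e5, V0=1.0):
    """Forward-Euler integration of the target-cell-limited model:
    dT/dt = -beta*T*V,  dI/dt = beta*T*V - delta*I,  dV/dt = p*I - c*V.
    Returns final (T, I, V) and the peak viral load."""
    T, I, V = T0, 0.0, V0
    peak = V
    for _ in range(int(t_end / dt)):
        dT = -beta * T * V
        dI = beta * T * V - delta * I
        dV = p * I - c * V
        T, I, V = T + dT * dt, I + dI * dt, V + dV * dt
        peak = max(peak, V)
    return T, I, V, peak
```

The basic reproductive number here is R0 = p*beta*T0/(delta*c) = 20, so the infection takes off and depletes most target cells, the kind of acute-infection dynamics whose timing shapes the substitution-rate patterns studied in the first project.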

Relevance: 30.00%

Publisher:

Abstract:

Light rainfall is the baseline input to the annual water budget in mountainous landscapes throughout the tropics and at mid-latitudes. In the Southern Appalachians, the contribution from light rainfall ranges from 50-60% during wet years to 80-90% during dry years, with convective activity and tropical cyclone input providing most of the interannual variability. The Southern Appalachians is a region characterized by rich biodiversity that is vulnerable to land use/land cover changes due to its proximity to a rapidly growing population. Persistent near-surface moisture and associated microclimates observed in this region have been well documented since the colonization of the area in terms of species health, fire frequency, and overall biodiversity. The overarching objective of this research is to elucidate the microphysics of light rainfall and the dynamics of low-level moisture in the inner region of the Southern Appalachians during the warm season, with a focus on orographically mediated processes. The overarching research hypothesis is that the physical processes leading to and governing the life cycle of orographic fog, low-level clouds, and precipitation, and their interactions, are strongly tied to landform, land cover, and the diurnal cycles of flow patterns, radiative forcing, and surface fluxes at the ridge-valley scale. The following science questions are addressed specifically: 1) How do orographic clouds and fog affect the hydrometeorological regime from event to annual scale and as a function of terrain characteristics and land cover?; 2) What are the source areas, governing processes, and relevant time-scales of near-surface moisture convergence patterns in the region?; and 3) What are the four-dimensional microphysical and dynamical characteristics, including variability and controlling factors and processes, of fog and light rainfall?
The research was conducted with two major components: 1) ground-based high-quality observations using multi-sensor platforms and 2) interpretive numerical modeling guided by the analysis of the in situ data collection. Findings illuminate a high level of spatial (down to the ridge scale) and temporal (from event to annual scale) heterogeneity in the observations, and a significant impact on the hydrological regime as a result of seeder-feeder interactions among fog, low-level clouds, and stratiform rainfall that enhance coalescence efficiency and lead to significantly higher rainfall rates at the land surface. Specifically, results show that concurrent fog presence can enhance an event's short-term accumulation by up to one order of magnitude. Results also show that events are modulated strongly by terrain characteristics including elevation, slope, geometry, and land cover. These factors produce interactions between highly localized flows and gradients of temperature and moisture with larger-scale circulations. The resulting observations of drop size distribution (DSD) and rainfall patterns are stratified by region and altitude and exhibit clear diurnal and seasonal cycles.

Relevance: 30.00%

Publisher:

Abstract:

The full-scale base-isolated structure studied in this dissertation is the only base-isolated building in the South Island of New Zealand. It sustained hundreds of earthquake ground motions from September 2010 until well into 2012. Several large earthquake responses were recorded in December 2011 by NEES@UCLA and by a GeoNet recording station near Christchurch Women's Hospital. The primary focus of this dissertation is to advance the state of the art of methods for evaluating the performance of seismically isolated structures and the effects of soil-structure interaction, by developing new data processing methodologies to overcome current limitations and by implementing advanced numerical modeling in OpenSees for direct analysis of soil-structure interaction.

This dissertation presents a novel method for recovering force-displacement relations within the isolators of building structures with unknown nonlinearities from sparse seismic-response measurements of floor accelerations. The method requires only direct matrix calculations (factorizations and multiplications); no iterative trial-and-error methods are required. The method requires a mass matrix, or at least an estimate of the floor masses. A stiffness matrix may be used, but is not necessary. Essentially, the method operates on a matrix of incomplete measurements of floor accelerations. In the special case of complete floor measurements of systems with linear dynamics, real modes, and equal floor masses, the principal components of this matrix are the modal responses. In the more general case of partial measurements and nonlinear dynamics, the method extracts a number of linearly-dependent components from Hankel matrices of measured horizontal response accelerations, assembles these components row-wise and extracts principal components from the singular value decomposition of this large matrix of linearly-dependent components. These principal components are then interpolated between floors in a way that minimizes the curvature energy of the interpolation. This interpolation step can make use of a reduced-order stiffness matrix, a backward difference matrix or a central difference matrix. The measured and interpolated floor acceleration components at all floors are then assembled and multiplied by a mass matrix. The recovered in-service force-displacement relations are then incorporated into the OpenSees soil structure interaction model.
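The core decomposition step described above (assembling Hankel matrices of measured response accelerations and extracting principal components by singular value decomposition) can be sketched for a single measurement channel; the multi-channel assembly, interpolation, and mass-matrix steps of the full method are omitted here:

```python
import numpy as np

def hankel_principal_components(signal, rows, n_components):
    """Build a Hankel matrix from one channel of measured response
    accelerations and extract its leading principal components via SVD --
    a sketch of only the decomposition step of the method described above."""
    cols = len(signal) - rows + 1
    # each row is the signal shifted by one sample
    H = np.array([signal[i:i + cols] for i in range(rows)])
    U, s, Vt = np.linalg.svd(H, full_matrices=False)
    return U[:, :n_components], s[:n_components], Vt[:n_components]
```

For a single noiseless sinusoid the Hankel matrix has rank two (a pair of complex exponentials), which the singular values reveal immediately; measured nonlinear responses occupy more components, motivating the row-wise assembly over many channels.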

Numerical simulations of soil-structure interaction involving non-uniform soil behavior are conducted following the development of the complete soil-structure interaction model of Christchurch Women's Hospital in OpenSees. In these 2D OpenSees models, the superstructure is modeled as two-dimensional frames in the short-span and long-span directions, respectively. The lead rubber bearings are modeled as elastomeric bearing (Bouc-Wen) elements. The soil underlying the concrete raft foundation is modeled with linear elastic plane-strain quadrilateral elements. The non-uniformity of the soil profile is incorporated by extracting and interpolating shear wave velocity profiles from the Canterbury Geotechnical Database. The validity of the complete two-dimensional soil-structure interaction OpenSees model for the hospital is checked by comparing the peak floor responses and force-displacement relations within the isolation system obtained from OpenSees simulations to the recorded measurements. General explanations and implications, supported by displacement drifts, floor acceleration and displacement responses, and force-displacement relations, are described to address the effects of soil-structure interaction.

Relevance: 30.00%

Publisher:

Abstract:

Terrestrial ecosystems, occupying more than 25% of the Earth's surface, can serve as `biological valves' in regulating the anthropogenic emissions of atmospheric aerosol particles and greenhouse gases (GHGs) as responses to their surrounding environments. While the significance of quantifying the exchange rates of GHGs and atmospheric aerosol particles between the terrestrial biosphere and the atmosphere is hardly questioned in many scientific fields, the progress in improving model predictability, data interpretation or the combination of the two remains impeded by the lack of a precise framework elucidating their dynamic transport processes over a wide range of spatiotemporal scales. The difficulty in developing prognostic modeling tools to quantify the source or sink strength of these atmospheric substances can be further magnified by the fact that the climate system is also sensitive to the feedback from terrestrial ecosystems, forming the so-called `feedback cycle'. Hence, the emergent need is to reduce uncertainties when assessing this complex and dynamic feedback cycle, which is necessary to support the decisions of mitigation and adaptation policies associated with human activities (e.g., anthropogenic emission controls and land use management) under current and future climate regimes.

With the goal to improve the predictions for the biosphere-atmosphere exchange of biologically active gases and atmospheric aerosol particles, the main focus of this dissertation is on revising and up-scaling the biotic and abiotic transport processes from leaf to canopy scales. The validity of previous modeling studies in determining the exchange rate of gases and particles is evaluated with detailed descriptions of their limitations. Mechanistic-based modeling approaches along with empirical studies across different scales are employed to refine the mathematical descriptions of surface conductance responsible for gas and particle exchanges as commonly adopted by all operational models. Specifically, how variation in horizontal leaf area density within the vegetated medium, leaf size, and leaf microroughness impact the aerodynamic attributes and thereby the ultrafine particle collection efficiency at the leaf/branch scale is explored using wind tunnel experiments, with interpretations by a porous media model and a scaling analysis. A multi-layered and size-resolved second-order closure model combined with particle fluxes and concentration measurements within and above a forest is used to explore the particle transport processes within the canopy sub-layer and the partitioning of particle deposition onto the canopy medium and the forest floor. For gases, a modeling framework accounting for the leaf-level boundary layer effects on the stomatal pathway for gas exchange is proposed and combined with sap flux measurements in a wind tunnel to assess how leaf-level transpiration varies with increasing wind speed. How exogenous environmental conditions and endogenous soil-root-stem-leaf hydraulic and eco-physiological properties impact the above- and below-ground water dynamics in the soil-plant system and shape plant responses to droughts is assessed by a porous media model that accommodates the transient water flow within the plant vascular system and is coupled with the aforementioned leaf-level gas exchange model and soil-root interaction model. It should be noted that tackling all aspects of potential issues causing uncertainties in forecasting the feedback cycle between terrestrial ecosystems and the climate is unrealistic in a single dissertation, but further research questions and opportunities based on the foundation derived from this dissertation are also briefly discussed.

Relevance: 30.00%

Publisher:

Abstract:

Uncertainty quantification (UQ) is both an old and a new concept. The current novelty lies in the interactions and synthesis of mathematical models, computer experiments, statistics, field/real experiments, and probability theory, with a particular emphasis on large-scale simulations by computer models. The challenges come not only from the complexity of the scientific questions, but also from the size of the information. The focus of this thesis is to provide statistical models that are scalable to the massive data produced in computer experiments and real experiments, through fast and robust statistical inference.

Chapter 2 provides a practical approach for simultaneously emulating/approximating a massive number of functions, with an application to hazard quantification for the Soufrière Hills volcano on the island of Montserrat. Chapter 3 discusses another problem with massive data, in which the number of observations of a function is large. An exact algorithm that is linear in time is developed for the problem of interpolating methylation levels. Chapters 4 and 5 both concern robust inference for the models. Chapter 4 provides a new robustness criterion for parameter estimation, and several methods of inference are shown to satisfy it. Chapter 5 develops a new prior that satisfies additional criteria and is thus proposed for use in practice.
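The abstract does not give the linear-time interpolation algorithm of Chapter 3, but a generic ingredient behind such algorithms is a Markovian covariance: for an Ornstein-Uhlenbeck (exponential) kernel, each interpolant depends only on its two flanking observations, so interpolating a whole grid costs O(n). A sketch under that assumption (not the chapter's specific method):

```python
import math

def ou_interpolate(t, t1, x1, t2, x2, phi):
    """Conditional mean at time t of a stationary, unit-variance, zero-mean
    Ornstein-Uhlenbeck process given flanking observations (t1, x1) and
    (t2, x2), with correlation exp(-phi * |dt|). The Markov property means
    observations outside [t1, t2] carry no extra information."""
    r1 = math.exp(-phi * (t - t1))
    r2 = math.exp(-phi * (t2 - t))
    denom = 1.0 - (r1 * r2) ** 2
    w1 = r1 * (1.0 - r2 ** 2) / denom    # kriging weight on the left point
    w2 = r2 * (1.0 - r1 ** 2) / denom    # kriging weight on the right point
    return w1 * x1 + w2 * x2
```

The weights come from the standard Gaussian conditioning formula applied to the 3-point covariance matrix; because the conditional mean between neighbors never needs the full covariance matrix, the cost per prediction is constant.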