870 results for concurrent multi-scale modeling
Abstract:
For the past several decades, real-time embedded systems have grown tremendously in both scale and scope, thanks largely to advances in IC technology. However, the traditional approach of boosting performance by increasing CPU frequency has become a thing of the past. Researchers from both industry and academia are turning their focus to multi-core architectures for continued improvement of computing performance. In our research, we seek to develop efficient scheduling algorithms and analysis methods for the design of real-time embedded systems on multi-core platforms. Real-time systems are those in which response time is as critical as the logical correctness of computational results. In addition, a variety of stringent constraints such as power/energy consumption, peak temperature, and reliability are also imposed on these systems. Therefore, real-time scheduling plays a critical role in the system-level design of such computing systems. We started our research by addressing timing constraints for real-time applications on multi-core platforms, and developed both partitioned and semi-partitioned scheduling algorithms to schedule fixed-priority, periodic, hard real-time tasks on multi-core platforms. Then we extended our research by taking temperature constraints into consideration. We developed a closed-form solution to capture temperature dynamics for a given periodic voltage schedule on multi-core platforms, and also developed three methods to check the feasibility of a periodic real-time schedule under a peak temperature constraint. We further extended our research by incorporating the power/energy constraint with thermal awareness into our research problem. We investigated the energy estimation problem on multi-core platforms, and developed a computationally efficient method to calculate the energy consumption for a given voltage schedule on a multi-core platform.
In this dissertation, we present our research in detail and demonstrate the effectiveness and efficiency of our approaches with extensive experimental results.
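To make the idea of partitioned fixed-priority scheduling concrete, here is a minimal sketch. It is not the dissertation's algorithm: it assumes rate-monotonic priorities, deadlines equal to periods, the classical response-time test on each core, and a first-fit heuristic. The names `response_time_ok`, `first_fit_partition` and the task set `TASKS` are hypothetical.

```python
from math import ceil

# Hypothetical task set: (worst-case execution time, period), deadline == period.
TASKS = [(1, 4), (2, 5), (3, 10), (2, 8)]


def response_time_ok(tasks):
    """Response-time test for fixed-priority periodic tasks on one core.

    Assumes rate-monotonic priorities (shorter period = higher priority).
    Returns True if every task's worst-case response time fits in its period.
    """
    ordered = sorted(tasks, key=lambda t: t[1])
    for i, (wcet, period) in enumerate(ordered):
        response = wcet
        while True:
            # Interference from all higher-priority tasks released in [0, response).
            interference = sum(ceil(response / p) * c for c, p in ordered[:i])
            updated = wcet + interference
            if updated > period:
                return False          # deadline miss
            if updated == response:
                break                 # fixed point reached
            response = updated
    return True


def first_fit_partition(tasks, n_cores):
    """Assign each task to the first core on which the core stays schedulable.

    Tasks are considered in order of decreasing utilization. Returns a list of
    per-core task lists, or None if some task fits on no core.
    """
    cores = [[] for _ in range(n_cores)]
    for task in sorted(tasks, key=lambda t: t[0] / t[1], reverse=True):
        for core in cores:
            if response_time_ok(core + [task]):
                core.append(task)
                break
        else:
            return None
    return cores


if __name__ == "__main__":
    print(first_fit_partition(TASKS, 2))
```

A semi-partitioned scheme would additionally allow a task that fits on no single core to be split across cores; that step is not shown here.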
Abstract:
Petri nets are a formal, graphical, and executable modeling technique for the specification and analysis of concurrent and distributed systems, and have been widely applied in computer science and many other engineering disciplines. Low-level Petri nets are simple and useful for modeling control flows but not powerful enough to define data and system functionality. High-level Petri nets (HLPNs) have been developed to support data and functionality definitions, such as using complex structured data as tokens and algebraic expressions as transition formulas. Compared to low-level Petri nets, HLPNs result in compact system models that are easier to understand. Therefore, HLPNs are more useful in modeling complex systems. There are two issues in using HLPNs: modeling and analysis. Modeling concerns abstracting and representing the systems under consideration using HLPNs, and analysis deals with effective ways to study the behaviors and properties of the resulting HLPN models. In this dissertation, several modeling and analysis techniques for HLPNs are studied, which are integrated into a framework that is supported by a tool. For modeling, this framework integrates two formal languages: a type of HLPN called the Predicate Transition Net (PrT Net) is used to model a system's behavior, and a first-order linear time temporal logic (FOLTL) is used to specify the system's properties. The main contribution of this dissertation with regard to modeling is the development of a software tool to support the formal modeling capabilities in this framework. For analysis, this framework combines three complementary techniques: simulation, explicit state model checking, and bounded model checking (BMC). Simulation is a straightforward and speedy method, but only covers some execution paths in an HLPN model. Explicit state model checking covers all the execution paths but suffers from the state explosion problem.
BMC is a tradeoff, as it provides a certain level of coverage while being more efficient than explicit state model checking. The main contribution of this dissertation with regard to analysis is adapting BMC to analyze HLPN models and integrating the three complementary analysis techniques in a software tool to support the formal analysis capabilities in this framework. The SAMTools suite developed for this framework in this dissertation integrates three tools: PIPE+ for HLPN behavioral modeling and simulation, SAMAT for hierarchical structural modeling and property specification, and PIPE+Verifier for behavioral verification.
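The difference between firing a single transition (simulation) and exploring all reachable markings (explicit state checking) can be sketched on a small low-level place/transition net. This is an illustration under stated assumptions, not PIPE+ or the dissertation's PrT-net semantics: tokens are plain counts, the mutual-exclusion net `NET` and all function names are hypothetical, and the depth limit in `explore` only mimics the idea of bounding the search. BMC proper encodes bounded executions as a satisfiability problem, which is not shown.

```python
from collections import deque

# Hypothetical mutual-exclusion net: transition name -> (pre, post) token counts.
NET = {
    "enter1": ({"idle1": 1, "lock": 1}, {"crit1": 1}),
    "exit1": ({"crit1": 1}, {"idle1": 1, "lock": 1}),
    "enter2": ({"idle2": 1, "lock": 1}, {"crit2": 1}),
    "exit2": ({"crit2": 1}, {"idle2": 1, "lock": 1}),
}
INITIAL = {"idle1": 1, "idle2": 1, "lock": 1}


def enabled(marking, pre):
    """A transition is enabled when every input place holds enough tokens."""
    return all(marking.get(place, 0) >= need for place, need in pre.items())


def fire(marking, pre, post):
    """Return the marking reached by firing one enabled transition."""
    if not enabled(marking, pre):
        raise ValueError("transition is not enabled")
    result = dict(marking)
    for place, need in pre.items():
        result[place] = result.get(place, 0) - need
    for place, gain in post.items():
        result[place] = result.get(place, 0) + gain
    return result


def freeze(marking):
    """Hashable form of a marking; places with zero tokens are dropped."""
    return tuple(sorted((p, n) for p, n in marking.items() if n))


def explore(initial, net, max_depth):
    """Breadth-first exploration of all markings reachable within max_depth firings."""
    seen = {freeze(initial)}
    frontier = deque([(initial, 0)])
    while frontier:
        marking, depth = frontier.popleft()
        if depth == max_depth:
            continue
        for pre, post in net.values():
            if enabled(marking, pre):
                successor = fire(marking, pre, post)
                key = freeze(successor)
                if key not in seen:
                    seen.add(key)
                    frontier.append((successor, depth + 1))
    return seen


if __name__ == "__main__":
    states = explore(INITIAL, NET, max_depth=10)
    # Safety property: both processes are never in the critical section together.
    print(all(not ({"crit1", "crit2"} <= {p for p, _ in s}) for s in states))
```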
Abstract:
The performance of building envelopes and roofing systems depends significantly on accurate knowledge of wind loads and the response of envelope components under realistic wind conditions. Wind tunnel testing is a well-established practice for determining wind loads on structures. For small structures, much larger model scales are needed than for large structures, to maintain modeling accuracy and minimize Reynolds number effects. In these circumstances, the ability to obtain a large enough turbulence integral scale is usually compromised by the limited dimensions of the wind tunnel, meaning that it is not possible to simulate the low-frequency end of the turbulence spectrum. Such flows are called flows with Partial Turbulence Simulation. In this dissertation, the test procedure and scaling requirements for tests in partial turbulence simulation are discussed. A theoretical method is proposed for including the effects of low-frequency turbulence in the post-test analysis. In this theory, the turbulence spectrum is divided into two distinct statistical processes: one at high frequencies, which can be simulated in the wind tunnel, and one at low frequencies, which can be treated in a quasi-steady manner. The joint probability of load resulting from the two processes is derived, from which full-scale equivalent peak pressure coefficients can be obtained. The efficacy of the method is demonstrated by comparing predicted data, derived from tests on large-scale models of the Silsoe Cube and Texas Tech University buildings in the Wall of Wind facility at Florida International University, with the available full-scale data. For multi-layer building envelopes such as rain-screen walls, roof pavers, and vented energy-efficient walls, not only peak wind loads but also their spatial gradients are important. Wind-permeable roof claddings like roof pavers are not well dealt with in many existing building codes and standards.
Large-scale experiments were carried out to investigate the wind loading on concrete pavers including wind blow-off tests and pressure measurements. Simplified guidelines were developed for design of loose-laid roof pavers against wind uplift. The guidelines are formatted so that use can be made of the existing information in codes and standards such as ASCE 7-10 on pressure coefficients on components and cladding.
Abstract:
The purpose of this study was to create a scale that could measure compartmentalization. In the first of two studies, 311 working undergraduates were asked to indicate agreement with 119 items that measured compartmentalization. The resulting scale's reliability and validity were evaluated by having a second sample of 312 working students complete the items that comprise a sphere overlap scale (SOS), two measures of spillover, and measures of personality, coping, and demoralization. Although the study's original goal was not realized, its procedures were successful in developing a short (10-item) measure of work-to-home spillover whose items loaded on a single factor. Structural equation modeling indicated that SOS items were correlated with existing measures of spillover and could be discriminated from related concepts of personality and coping. The SOS was also more highly correlated with demoralization than existing measures of spillover in hierarchical analyses that controlled for demographic factors, personality characteristics, and coping style. It is concluded that the SOS shows enough promise to warrant the cost of its appraisal as an alternative measure of spillover in a longitudinal study.
Abstract:
Virtual machines (VMs) are powerful platforms for building agile datacenters and emerging cloud systems. However, resource management for a VM-based system is still a challenging task. First, the complexity of application workloads, as well as the interference among competing workloads, makes it difficult to understand their VMs’ resource demands for meeting their Quality of Service (QoS) targets; second, the dynamics in the applications and system also make it difficult to maintain the desired QoS target while the environment changes; third, the transparency of virtualization presents a hurdle for the guest-layer application and the host-layer VM scheduler to cooperate and improve application QoS and system efficiency. This dissertation proposes to address the above challenges through fuzzy modeling and control theory based VM resource management. First, a fuzzy-logic-based nonlinear modeling approach is proposed to accurately capture a VM’s complex demands for multiple types of resources, automatically and online, based on the observed workload and resource usage. Second, to enable fast adaptation for resource management, the fuzzy modeling approach is integrated with a predictive-control-based controller to form a new Fuzzy Modeling Predictive Control (FMPC) approach, which can quickly track the applications’ QoS targets and optimize the resource allocations under dynamic changes in the system. Finally, to address the limitations of black-box-based resource management solutions, a cross-layer optimization approach is proposed to enable cooperation between a VM’s host and guest layers and further improve the application QoS and resource usage efficiency. The proposed approaches are prototyped on a Xen-based virtualized system and evaluated with representative benchmarks including TPC-H, RUBiS, and TerraFly.
The results demonstrate that the fuzzy-modeling-based approach improves the accuracy in resource prediction by up to 31.4% compared to conventional regression approaches. The FMPC approach substantially outperforms the traditional linear-model-based predictive control approach in meeting application QoS targets for an oversubscribed system. It is able to manage dynamic VM resource allocations and migrations for over 100 concurrent VMs across multiple hosts with good efficiency. Finally, the cross-layer optimization approach further improves the performance of a virtualized application by up to 40% when the resources are contended by dynamic workloads.
Abstract:
A novel modeling approach is applied to karst hydrology. Long-standing problems in karst hydrology and solute transport are addressed using Lattice Boltzmann methods (LBMs). These methods contrast with other modeling approaches that have been applied to karst hydrology. The motivation of this dissertation is to develop new computational models for solving ground water hydraulics and transport problems in karst aquifers, which are widespread around the globe. This research tests the viability of the LBM as a robust alternative numerical technique for solving large-scale hydrological problems. The LB models applied in this research are briefly reviewed and implementation issues are discussed. The dissertation focuses on testing the LB models. The LBM is tested for two different types of inlet boundary conditions for solute transport in finite and effectively semi-infinite domains. The LBM solutions are verified against analytical solutions. Zero-diffusion transport and Taylor dispersion in slits are also simulated and compared against analytical solutions. These results demonstrate the LBM’s flexibility as a solute transport solver. The LBM is applied to simulate solute transport and fluid flow in porous media traversed by larger conduits. An LBM-based macroscopic flow solver (Darcy’s law-based) is linked with an anisotropic dispersion solver. Spatial breakthrough curves in one and two dimensions are fitted against the available analytical solutions. This provides a steady flow model with capabilities routinely found in ground water flow and transport models (e.g., the combination of MODFLOW and MT3D). However, the new LBM-based model retains the ability to solve inertial flows that are characteristic of karst aquifer conduits. Transient flows in a confined aquifer are solved using two different LBM approaches.
The analogy between Fick’s second law (diffusion equation) and the transient ground water flow equation is used to solve the transient head distribution. An altered-velocity flow solver with source/sink term is applied to simulate a drawdown curve. Hydraulic parameters like transmissivity and storage coefficient are linked with LB parameters. These capabilities complete the LBM’s effective treatment of the types of processes that are simulated by standard ground water models. The LB model is verified against field data for drawdown in a confined aquifer.
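As a rough illustration of how a lattice Boltzmann scheme solves a diffusion-type equation, the sketch below runs a one-dimensional, three-velocity (D1Q3) BGK model with periodic boundaries. It is not one of the dissertation's models: the lattice, the weights, the function name `lbm_diffusion`, and the test problem are assumptions chosen for brevity. In lattice units the standard BGK relation gives a diffusion coefficient D = (tau - 1/2)/3 for this lattice.

```python
# D1Q3 lattice: rest particle plus one step right and one step left.
WEIGHTS = (2.0 / 3.0, 1.0 / 6.0, 1.0 / 6.0)
VELOCITIES = (0, 1, -1)


def lbm_diffusion(rho0, tau, steps):
    """Evolve a 1D concentration field with a BGK lattice Boltzmann scheme.

    rho0  : initial concentration per lattice node
    tau   : relaxation time in lattice units (D = (tau - 0.5) / 3)
    steps : number of collide-and-stream iterations
    Boundaries are periodic (an assumption made to keep the sketch short).
    """
    n = len(rho0)
    # Start every population at its equilibrium value w_i * rho.
    f = [[w * r for r in rho0] for w in WEIGHTS]
    for _ in range(steps):
        rho = [f[0][x] + f[1][x] + f[2][x] for x in range(n)]
        # Collision: relax each population toward equilibrium.
        post = [
            [f[i][x] + (WEIGHTS[i] * rho[x] - f[i][x]) / tau for x in range(n)]
            for i in range(3)
        ]
        # Streaming: population i moves by its lattice velocity.
        f = [
            [post[i][(x - VELOCITIES[i]) % n] for x in range(n)]
            for i in range(3)
        ]
    return [f[0][x] + f[1][x] + f[2][x] for x in range(n)]


if __name__ == "__main__":
    n, centre, steps = 201, 100, 100
    start = [0.0] * n
    start[centre] = 1.0                      # unit pulse of solute
    rho = lbm_diffusion(start, tau=1.0, steps=steps)
    variance = sum((x - centre) ** 2 * r for x, r in enumerate(rho))
    print(sum(rho), variance)                # mass is conserved; variance grows as 2*D*t
```

For a unit pulse, the analytical solution of the diffusion equation has variance 2Dt, which gives a simple check in the spirit of the verification against analytical solutions described above.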
Abstract:
A pilot-scale multi-media filtration system was used to evaluate the effectiveness of filtration in removing petroleum hydrocarbons from a source water contaminated with diesel fuel. Source water was artificially prepared by mixing bentonite clay and tap water to produce a turbidity range of 10-15 NTU. Diesel fuel concentrations of 150 ppm or 750 ppm were used to contaminate the source water. The coagulants used included Cat Floc K-10 and Cat Floc T-2. The experimental phase was conducted under direct filtration conditions at constant head and constant rate filtration at 8.0 gpm. Filtration experiments were run until the filter reached its clogging point, as noted by a measured peak pressure loss of 10 psi. The experimental variables included type of coagulant, oil concentration, and source water. Filtration results were evaluated based on turbidity removal and petroleum hydrocarbon (PHC) removal efficiency as measured by gas chromatography. Experiments indicated that clogging was controlled by the clay loading on the filter and that inadequate destabilization of the contaminated water by the coagulant limited the PHC removal.
Abstract:
Research has identified a number of putative risk factors that place adolescents at incrementally higher risk for involvement in alcohol and other drug (AOD) use and sexual risk behaviors (SRBs). Such factors include personality characteristics such as sensation-seeking, cognitive factors such as positive expectancies and inhibition conflict, and peer norm processes. The current study was guided by a conceptual perspective that supports the notion that an integrative framework including multi-level factors has significant explanatory value for understanding processes associated with the co-occurrence of AOD use and sexual risk behavior outcomes. This study simultaneously evaluated the mediating role of AOD-sex related expectancies and inhibition conflict on antecedents of AOD use and SRBs, including sexual sensation-seeking and peer norms for condom use. The sample was drawn from the Enhancing My Personal Options While Evaluating Risk (EMPOWER; Jonathan Tubman, PI) data set (N = 396; aged 12-18 years). Measures used in the study included the Sexual Sensation-Seeking Scale, Inhibition Conflict for Condom Use, and the Risky Sex Scale. All relevant measures had well-documented psychometric properties. A global assessment of alcohol, drug use, and sexual risk behaviors was used. Results demonstrated that AOD-sex related expectancies mediated the influence of sexual sensation-seeking on the co-occurrence of alcohol and other drug use and sexual risk behaviors. The evaluation of the integrative model also revealed that sexual sensation-seeking was positively associated with peer norms for condom use. Also, peer norms predicted inhibition conflict among this sample of multi-problem youth. This dissertation research identified mechanisms of risk and protection associated with the co-occurrence of AOD use and SRBs among a multi-problem sample of adolescents receiving treatment for alcohol or drug use and related problems.
This study is informative for adolescent-serving programs that address those individual and contextual characteristics that enhance treatment efficacy and effectiveness among adolescents receiving substance use and related problems services.
Abstract:
With the flow of the Mara River becoming increasingly erratic, especially in the upper reaches, attention has been directed to land use change as the major cause of this problem. The semi-distributed hydrological model Soil and Water Assessment Tool (SWAT) and Landsat imagery were utilized in the upper Mara River Basin in order to 1) map existing field-scale land use practices in order to determine their impact; 2) determine the impacts of land use change on water flux; and 3) determine the impacts of rainfall (0%, ±10% and ±20%) and air temperature variations (0% and +5%), based on the Intergovernmental Panel on Climate Change projections, on the water flux of the upper Mara River. This study found that the different scenarios affected the water balance components differently. Land use changes resulted in a slightly more erratic discharge, while rainfall and air temperature changes had a more predictable impact on the discharge and water balance components. The model results show that the flow was more sensitive to rainfall changes than to land use changes. It was also shown that land use changes can reduce dry season flow, which is the most important problem in the basin. The model also shows that deforestation in the Mau Forest increased the peak flows, which can also lead to high sediment loading in the Mara River. Assessing the effect of the land use and climate change scenarios on the sediment and water quality of the river requires a thorough understanding of the sediment transport processes, in addition to observed sediment and water quality data for validation of modeling results.
Abstract:
Multi-cloud applications are composed of services offered by multiple cloud platforms, where the user/developer has full knowledge of the use of such platforms. The use of multiple cloud platforms avoids the following problems: (i) vendor lock-in, which is the dependency of the application on a certain cloud platform and which is harmful in the case of degradation or failure of platform services, or even of a price increase for service usage; (ii) degradation or failure of the application due to fluctuations in the quality of service (QoS) provided by some cloud platform, or even due to the failure of a service. In a multi-cloud scenario, it is possible to replace a service that has failed or has QoS problems with an equivalent service from another cloud platform. For an application to adopt the multi-cloud perspective, it is necessary to create mechanisms that can select which cloud services/platforms should be used in accordance with the requirements determined by the programmer/user. In this context, the major challenges in developing such applications include: (i) the choice of which underlying services and cloud computing platforms should be used, based on the user-defined requirements in terms of functionality and quality; (ii) the need to continually monitor dynamic information (such as response time, availability, and price) related to cloud services, in addition to the wide variety of services; and (iii) the need to adapt the application if QoS violations affect user-defined requirements. This PhD thesis proposes an approach for dynamic adaptation of multi-cloud applications, to be applied when a service is unavailable or when the requirements set by the user/developer indicate that another available multi-cloud configuration meets them more efficiently. Thus, this work proposes a strategy composed of two phases.
The first phase consists of the application modeling, exploiting the capacity to represent commonalities and variability offered by the Software Product Lines (SPL) paradigm. In this phase, an extended feature model is used to specify the cloud service configuration to be used by the application (commonalities) and the different possible providers for each service (variability). Furthermore, the non-functional requirements associated with cloud services are specified by properties in this model that describe dynamic information about these services. The second phase consists of an autonomic process based on the MAPE-K control loop, which is responsible for optimally selecting a multi-cloud configuration that meets the established requirements and for performing the adaptation. The proposed adaptation strategy is independent of the programming technique used to perform the adaptation. In this work, we implement the adaptation strategy using various programming techniques such as aspect-oriented programming, context-oriented programming, and component- and service-oriented programming. Based on the proposed steps, we sought to assess the following: (i) whether the modeling process and the specification of non-functional requirements can ensure effective monitoring of user satisfaction; (ii) whether the optimal selection process presents significant gains compared to a sequential approach; and (iii) which techniques offer the best trade-off between development effort/modularity and performance.
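The selection step of such a loop can be sketched as follows. This is a toy illustration, not the thesis's implementation: the catalog layout, the provider names, the attribute keys, and the exhaustive search in `select_configuration` are all assumptions. Each service lists its candidate providers (the variability), each provider carries monitored attributes, and the cheapest combination that satisfies the user requirements is chosen; re-running the function after monitoring data changes plays the role of the plan step in a MAPE-K iteration.

```python
from itertools import product

# Hypothetical catalog: service -> provider -> monitored attributes.
CATALOG = {
    "storage": {
        "providerA": {"price": 0.10, "response_ms": 120, "availability": 0.999},
        "providerB": {"price": 0.12, "response_ms": 90, "availability": 0.9995},
    },
    "queue": {
        "providerA": {"price": 0.05, "response_ms": 40, "availability": 0.999},
        "providerC": {"price": 0.04, "response_ms": 300, "availability": 0.99},
    },
}


def select_configuration(catalog, max_response_ms, min_availability):
    """Pick one provider per service, minimizing total price.

    A provider is admissible only if it meets both requirements.
    Returns {service: provider} or None when no admissible combination exists.
    """
    services = sorted(catalog)
    best, best_price = None, float("inf")
    # Enumerate every combination of one provider per service.
    for choice in product(*(catalog[s].items() for s in services)):
        admissible = all(
            attrs["response_ms"] <= max_response_ms
            and attrs["availability"] >= min_availability
            for _, attrs in choice
        )
        if not admissible:
            continue
        price = sum(attrs["price"] for _, attrs in choice)
        if price < best_price:
            best = {s: name for s, (name, _) in zip(services, choice)}
            best_price = price
    return best


if __name__ == "__main__":
    print(select_configuration(CATALOG, max_response_ms=200, min_availability=0.995))
    # Monitoring reports a degraded provider; selecting again yields a new configuration.
    CATALOG["storage"]["providerA"]["response_ms"] = 500
    print(select_configuration(CATALOG, max_response_ms=200, min_availability=0.995))
```

Exhaustive enumeration grows exponentially with the number of services, so a realistic selector would need a smarter search; the sketch only shows the shape of the decision.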
Abstract:
The successful performance of a hydrological model is usually challenged by the quality of the sensitivity analysis, calibration, and uncertainty analysis carried out in the modeling exercise and subsequent simulation results. This is especially important under changing climatic conditions, where there are more uncertainties associated with climate models and downscaling processes that increase the complexities of the hydrological modeling system. In response to these challenges and to improve the performance of hydrological models under changing climatic conditions, this research proposed five new methods for supporting hydrological modeling. First, a design of experiment aided sensitivity analysis and parameterization (DOE-SAP) method was proposed to investigate the significant parameters and provide more reliable sensitivity analysis for improving parameterization during hydrological modeling. Better calibration results, along with advanced sensitivity analysis for significant parameters and their interactions, were achieved in the case study. Second, a comprehensive uncertainty evaluation scheme was developed to evaluate three uncertainty analysis methods: the sequential uncertainty fitting version 2 (SUFI-2), generalized likelihood uncertainty estimation (GLUE), and Parameter solution (ParaSol) methods. The results showed that SUFI-2 performed better than the other two methods based on calibration and uncertainty analysis results. The proposed evaluation scheme demonstrated that it is capable of selecting the most suitable uncertainty method for case studies. Third, a novel sequential multi-criteria based calibration and uncertainty analysis (SMC-CUA) method was proposed to improve the efficiency of calibration and uncertainty analysis and control the phenomenon of equifinality.
The results showed that the SMC-CUA method was able to provide better uncertainty analysis results with high computational efficiency compared to the SUFI-2 and GLUE methods and control parameter uncertainty and the equifinality effect without sacrificing simulation performance. Fourth, an innovative response based statistical evaluation method (RESEM) was proposed for estimating the uncertainty propagated effects and providing long-term prediction for hydrological responses under changing climatic conditions. By using RESEM, the uncertainty propagated from statistical downscaling to hydrological modeling can be evaluated. Fifth, an integrated simulation-based evaluation system for uncertainty propagation analysis (ISES-UPA) was proposed for investigating the effects and contributions of different uncertainty components to the total propagated uncertainty from statistical downscaling. Using ISES-UPA, the uncertainty from statistical downscaling, uncertainty from hydrological modeling, and the total uncertainty from two uncertainty sources can be compared and quantified. The feasibility of all the methods has been tested using hypothetical and real-world case studies. The proposed methods can also be integrated as a hydrological modeling system to better support hydrological studies under changing climatic conditions. The results from the proposed integrated hydrological modeling system can be used as scientific references for decision makers to reduce the potential risk of damages caused by extreme events for long-term water resource management and planning.
Abstract:
We present new d13C measurements of atmospheric CO2 covering the last glacial/interglacial cycle, complementing previous records covering Terminations I and II. Most prominent in the new record is a significant depletion in d13C(atm) of 0.5 permil occurring during marine isotope stage (MIS) 4, followed by an enrichment of the same magnitude at the beginning of MIS 3. Such a significant excursion in the record is otherwise only observed at glacial terminations, suggesting that similar processes were at play, such as changing sea surface temperatures, changes in marine biological export in the Southern Ocean (SO) due to variations in aeolian iron fluxes, changes in the Atlantic meridional overturning circulation, upwelling of deep water in the SO, and long-term trends in terrestrial carbon storage. Based on previous modeling studies, we propose constraints on some of these processes during specific time intervals. The decrease in d13C(atm) at the end of MIS 4 starting approximately 64 kyr B.P. was accompanied by increasing [CO2]. This period is also marked by a decrease in aeolian iron flux to the SO, followed by an increase in SO upwelling during Heinrich event 6, indicating that it is likely that a large amount of d13C-depleted carbon was transferred to the deep oceans previously, i.e., at the onset of MIS 4. Apart from the upwelling event at the end of MIS 4 (and potentially smaller events during Heinrich events in MIS 3), upwelling of deep water in the SO remained reduced until the last glacial termination, whereupon a second pulse of isotopically light carbon was released into the atmosphere.
Abstract:
This paper presents a theoretical model for the vibration analysis of micro-scale fluid-loaded rectangular isotropic plates, based on Lamb's assumption of fluid-structure interaction and the Rayleigh-Ritz energy method. An analytical solution for this model is proposed, which can be applied to most cases of boundary conditions. The dynamic experimental data of a series of microfabricated silicon plates are obtained using a base-excitation dynamic testing facility. The natural frequencies and mode shapes in the experimental results are in good agreement with the theoretical simulations for the lower-order modes. The presented theoretical and experimental investigations of the vibration characteristics of the micro-scale plates are of particular interest in the design of microplate-based biosensing devices. Copyright © 2009 by ASME.
Abstract:
This paper presents a summary of the key objectives, instrumentation and logistic details, goals, and initial scientific findings of the European Marie Curie Action SAPUSS project carried out in the western Mediterranean Basin (WMB) in autumn (September-October) 2010. The key SAPUSS objective is to deduce aerosol source characteristics and to understand the atmospheric processes responsible for their generation and transformation, both horizontally and vertically, in the Mediterranean urban environment. To achieve this, the unique approach of SAPUSS is the concurrent measurement of aerosols with multiple techniques at six monitoring sites around the city of Barcelona (NE Spain): a main road traffic site, two urban background sites, a regional background site, and two urban tower sites (150 m and 545 m above sea level, 150 m and 80 m above ground, respectively). SAPUSS allows us to advance considerably our knowledge of the atmospheric chemistry and physics of the urban Mediterranean environment. This is possible only because of both the three-dimensional spatial scale and the high sampling time resolution used. During SAPUSS, different meteorological regimes were encountered, including warm Saharan, cold Atlantic, wet European, and stagnant regional ones. The different meteorology of such regimes is herein described. Additionally, we report the trends of the parameters regulated for air quality purposes (both gaseous and aerosol mass concentrations), and we also compare the six monitoring sites. High levels of traffic-related gaseous pollutants were measured at the urban ground level monitoring sites, whereas layers of tropospheric ozone were recorded at tower levels. In particular, tower level night-time average ozone concentrations (80 ± 25 µg m⁻³) were up to double those at ground level.
The examination of the vertical profiles clearly shows the predominant influence of NOx on ozone concentrations, and a source of ozone aloft. Analysis of the particulate matter (PM) mass concentrations shows an enhancement of coarse particles (PM2.5-10) at the urban ground level (+64 %, average 11.7 µg m⁻³) but of fine ones (PM1) at urban tower level (+28 %, average 14.4 µg m⁻³). These results show complex dynamics of the size-resolved PM mass at both horizontal and vertical levels of the study area. Preliminary modelling findings reveal an underestimation of the fine accumulation aerosols. In summary, this paper lays the foundation of SAPUSS, an integrated study of relevance to many other similar urban Mediterranean coastal environment sites.
Abstract:
Many modern applications fall into the category of "large-scale" statistical problems, in which both the number of observations n and the number of features or parameters p may be large. Many existing methods focus on point estimation, despite the continued relevance of uncertainty quantification in the sciences, where the number of parameters to estimate often exceeds the sample size, despite huge increases in the value of n typically seen in many fields. Thus, the tendency in some areas of industry to dispense with traditional statistical analysis on the basis that "n=all" is of little relevance outside of certain narrow applications. The main result of the Big Data revolution in most fields has instead been to make computation much harder without reducing the importance of uncertainty quantification. Bayesian methods excel at uncertainty quantification, but often scale poorly relative to alternatives. This conflict between the statistical advantages of Bayesian procedures and their substantial computational disadvantages is perhaps the greatest challenge facing modern Bayesian statistics, and is the primary motivation for the work presented here.
Two general strategies for scaling Bayesian inference are considered. The first is the development of methods that lend themselves to faster computation, and the second is design and characterization of computational algorithms that scale better in n or p. In the first instance, the focus is on joint inference outside of the standard problem of multivariate continuous data that has been a major focus of previous theoretical work in this area. In the second area, we pursue strategies for improving the speed of Markov chain Monte Carlo algorithms, and characterizing their performance in large-scale settings. Throughout, the focus is on rigorous theoretical evaluation combined with empirical demonstrations of performance and concordance with the theory.
One topic we consider is modeling the joint distribution of multivariate categorical data, often summarized in a contingency table. Contingency table analysis routinely relies on log-linear models, with latent structure analysis providing a common alternative. Latent structure models lead to a reduced rank tensor factorization of the probability mass function for multivariate categorical data, while log-linear models achieve dimensionality reduction through sparsity. Little is known about the relationship between these notions of dimensionality reduction in the two paradigms. In Chapter 2, we derive several results relating the support of a log-linear model to nonnegative ranks of the associated probability tensor. Motivated by these findings, we propose a new collapsed Tucker class of tensor decompositions, which bridge existing PARAFAC and Tucker decompositions, providing a more flexible framework for parsimoniously characterizing multivariate categorical data. Taking a Bayesian approach to inference, we illustrate empirical advantages of the new decompositions.
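As a rough illustration of the reduced-rank factorization mentioned above, the sketch below builds the probability mass function of a small latent class (PARAFAC-type) model: each cell probability is a weighted sum, over latent classes, of products of per-variable probabilities. The function name `latent_class_pmf` and the toy numbers are assumptions made for illustration and are not taken from the thesis.

```python
from itertools import product


def latent_class_pmf(weights, class_probs):
    """Cell probabilities of a latent class model.

    weights[h]           : probability of latent class h
    class_probs[h][j][c] : P(y_j = c | class h)
    Returns a dict mapping each cell (c_1, ..., c_p) to its probability.
    """
    n_vars = len(class_probs[0])
    levels = [range(len(class_probs[0][j])) for j in range(n_vars)]
    pmf = {}
    for cell in product(*levels):
        total = 0.0
        for w, probs in zip(weights, class_probs):
            term = w
            for j, c in enumerate(cell):
                term *= probs[j][c]  # variables are independent within a class
            total += term
        pmf[cell] = total
    return pmf


# Hypothetical example: three binary variables, two latent classes.
weights = [0.6, 0.4]
class_probs = [
    [[0.9, 0.1], [0.8, 0.2], [0.7, 0.3]],  # class 0
    [[0.2, 0.8], [0.3, 0.7], [0.4, 0.6]],  # class 1
]
pmf = latent_class_pmf(weights, class_probs)
```

Here the 2 x 2 x 2 probability tensor is a sum of two rank-one nonnegative tensors, so its nonnegative rank is at most two, the number of latent classes.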
Latent class models for the joint distribution of multivariate categorical data, such as the PARAFAC decomposition, play an important role in the analysis of population structure. In this context, the number of latent classes is interpreted as the number of genetically distinct subpopulations of an organism, an important factor in the analysis of evolutionary processes and conservation status. Existing methods focus on point estimates of the number of subpopulations, and lack robust uncertainty quantification. Moreover, whether the number of latent classes in these models is even an identified parameter is an open question. In Chapter 3, we show that when the model is properly specified, the correct number of subpopulations can be recovered almost surely. We then propose an alternative method for estimating the number of latent subpopulations that provides good quantification of uncertainty, and provide a simple procedure for verifying that the proposed method is consistent for the number of subpopulations. The performance of the model in estimating the number of subpopulations and other common population structure inference problems is assessed in simulations and a real data application.
In contingency table analysis, sparse data is frequently encountered for even modest numbers of variables, resulting in non-existence of maximum likelihood estimates. A common solution is to obtain regularized estimates of the parameters of a log-linear model. Bayesian methods provide a coherent approach to regularization, but are often computationally intensive. Conjugate priors ease computational demands, but the conjugate Diaconis-Ylvisaker priors for the parameters of log-linear models do not give rise to closed form credible regions, complicating posterior inference. In Chapter 4, we derive the optimal Gaussian approximation to the posterior for log-linear models with Diaconis-Ylvisaker priors, and provide convergence rate and finite-sample bounds for the Kullback-Leibler divergence between the exact posterior and the optimal Gaussian approximation. We demonstrate empirically in simulations and a real data application that the approximation is highly accurate, even in relatively small samples. The proposed approximation provides a computationally scalable and principled approach to regularized estimation and approximate Bayesian inference for log-linear models.
Another challenging and somewhat non-standard joint modeling problem is inference on tail dependence in stochastic processes. In applications where extreme dependence is of interest, data are almost always time-indexed. Existing methods for inference and modeling in this setting often cluster extreme events or choose window sizes with the goal of preserving temporal information. In Chapter 5, we propose an alternative paradigm for inference on tail dependence in stochastic processes with arbitrary temporal dependence structure in the extremes, based on the idea that the information on strength of tail dependence and the temporal structure in this dependence are both encoded in waiting times between exceedances of high thresholds. We construct a class of time-indexed stochastic processes with tail dependence obtained by endowing the support points in de Haan's spectral representation of max-stable processes with velocities and lifetimes. We extend Smith's model to these max-stable velocity processes and obtain the distribution of waiting times between extreme events at multiple locations. Motivated by this result, a new definition of tail dependence is proposed that is a function of the distribution of waiting times between threshold exceedances, and an inferential framework is constructed for estimating the strength of extremal dependence and quantifying uncertainty in this paradigm. The method is applied to climatological, financial, and electrophysiology data.
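The basic quantity this paradigm builds on, the waiting time between successive exceedances of a high threshold, can be extracted from a time-indexed series in a few lines. The sketch below is only that extraction step, with a hypothetical function name and toy data; it does not implement the max-stable velocity processes or the inferential framework described above.

```python
def exceedance_waiting_times(series, threshold):
    """Gaps (in time steps) between consecutive exceedances of `threshold`."""
    times = [t for t, x in enumerate(series) if x > threshold]
    return [later - earlier for earlier, later in zip(times, times[1:])]


# Toy series: exceedances of 2.0 occur at time indices 1, 4, 5 and 7.
series = [0.1, 2.5, 0.3, 0.2, 3.1, 2.9, 0.0, 4.0]
waits = exceedance_waiting_times(series, threshold=2.0)
```

Short waiting times indicate extremes that cluster in time, while long ones indicate isolated extreme events.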
The remainder of this thesis focuses on posterior computation by Markov chain Monte Carlo (MCMC), the dominant paradigm for posterior computation in Bayesian analysis. It has long been common to control computation time by making approximations to the Markov transition kernel. Comparatively little attention has been paid to convergence and estimation error in these approximating Markov chains. In Chapter 6, we propose a framework for assessing when to use approximations in MCMC algorithms, and how much error in the transition kernel should be tolerated to obtain optimal estimation performance with respect to a specified loss function and computational budget. The results require only ergodicity of the exact kernel and control of the kernel approximation accuracy. The theoretical framework is applied to approximations based on random subsets of data, low-rank approximations of Gaussian processes, and a novel approximating Markov chain for discrete mixture models.
Data augmentation Gibbs samplers are arguably the most popular class of algorithms for approximately sampling from the posterior distribution for the parameters of generalized linear models. The truncated Normal and Pólya-Gamma data augmentation samplers are standard examples for probit and logit links, respectively. Motivated by an important problem in quantitative advertising, in Chapter 7 we consider the application of these algorithms to modeling rare events. We show that when the sample size is large but the observed number of successes is small, these data augmentation samplers mix very slowly, with a spectral gap that converges to zero at a rate at least proportional to the reciprocal of the square root of the sample size up to a log factor. In simulation studies, moderate sample sizes result in high autocorrelations and small effective sample sizes. Similar empirical results are observed for related data augmentation samplers for multinomial logit and probit models. When applied to a real quantitative advertising dataset, the data augmentation samplers mix very poorly. Conversely, Hamiltonian Monte Carlo and a type of independence chain Metropolis algorithm show good mixing on the same dataset.
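The slow mixing described above can be seen in a very small experiment. The sketch below runs a truncated Normal data augmentation Gibbs sampler for a probit model that, as a simplifying assumption, has only an intercept and a flat prior, and compares the lag-1 autocorrelation of the chain on rare-event data with that on balanced data. The function names, sample size, chain length, and seed are illustrative choices, not the settings used in the thesis.

```python
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()


def draw_truncated_normal(mean, positive, rng):
    """Draw from N(mean, 1) restricted to (0, inf) if positive, else (-inf, 0).

    Uses the inverse-CDF method; adequate for the moderate means seen here.
    """
    p0 = STD_NORMAL.cdf(-mean)  # P(z < 0) when z ~ N(mean, 1)
    u = rng.random()
    p = p0 + u * (1.0 - p0) if positive else u * p0
    p = min(max(p, 1e-12), 1.0 - 1e-12)  # keep inv_cdf inside (0, 1)
    return mean + STD_NORMAL.inv_cdf(p)


def probit_da_chain(y, n_iter, seed=0):
    """Data augmentation Gibbs sampler, intercept-only probit, flat prior."""
    rng = random.Random(seed)
    n = len(y)
    beta = 0.0
    draws = []
    for _ in range(n_iter):
        # Step 1: latent z_i | beta, y_i is N(beta, 1) truncated by the sign of y_i.
        z_bar = sum(draw_truncated_normal(beta, yi == 1, rng) for yi in y) / n
        # Step 2: beta | z is N(mean(z), 1 / n) under the flat prior.
        beta = rng.gauss(z_bar, (1.0 / n) ** 0.5)
        draws.append(beta)
    return draws


def lag1_autocorrelation(x):
    """Sample lag-1 autocorrelation of a sequence."""
    m = sum(x) / len(x)
    num = sum((a - m) * (b - m) for a, b in zip(x, x[1:]))
    den = sum((a - m) ** 2 for a in x)
    return num / den


rare = [1] + [0] * 199            # one success in 200 trials
balanced = [1] * 100 + [0] * 100  # equal numbers of successes and failures
acf_rare = lag1_autocorrelation(probit_da_chain(rare, 400, seed=1))
acf_balanced = lag1_autocorrelation(probit_da_chain(balanced, 400, seed=1))
```

In this toy setting, `acf_rare` should come out well above `acf_balanced`, in line with the high autocorrelations reported for rare-event data.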