948 resultados para Model Output Statistics
Resumo:
Uncertainty quantification (UQ) is both an old and new concept. The current novelty lies in the interactions and synthesis of mathematical models, computer experiments, statistics, field/real experiments, and probability theory, with a particular emphasize on the large-scale simulations by computer models. The challenges not only come from the complication of scientific questions, but also from the size of the information. It is the focus in this thesis to provide statistical models that are scalable to massive data produced in computer experiments and real experiments, through fast and robust statistical inference.
Chapter 2 provides a practical approach for simultaneously emulating/approximating massive number of functions, with the application on hazard quantification of Soufri\`{e}re Hills volcano in Montserrate island. Chapter 3 discusses another problem with massive data, in which the number of observations of a function is large. An exact algorithm that is linear in time is developed for the problem of interpolation of Methylation levels. Chapter 4 and Chapter 5 are both about the robust inference of the models. Chapter 4 provides a new criteria robustness parameter estimation criteria and several ways of inference have been shown to satisfy such criteria. Chapter 5 develops a new prior that satisfies some more criteria and is thus proposed to use in practice.
Resumo:
Fitting statistical models is computationally challenging when the sample size or the dimension of the dataset is huge. An attractive approach for down-scaling the problem size is to first partition the dataset into subsets and then fit using distributed algorithms. The dataset can be partitioned either horizontally (in the sample space) or vertically (in the feature space), and the challenge arise in defining an algorithm with low communication, theoretical guarantees and excellent practical performance in general settings. For sample space partitioning, I propose a MEdian Selection Subset AGgregation Estimator ({\em message}) algorithm for solving these issues. The algorithm applies feature selection in parallel for each subset using regularized regression or Bayesian variable selection method, calculates the `median' feature inclusion index, estimates coefficients for the selected features in parallel for each subset, and then averages these estimates. The algorithm is simple, involves very minimal communication, scales efficiently in sample size, and has theoretical guarantees. I provide extensive experiments to show excellent performance in feature selection, estimation, prediction, and computation time relative to usual competitors.
While sample space partitioning is useful in handling datasets with large sample size, feature space partitioning is more effective when the data dimension is high. Existing methods for partitioning features, however, are either vulnerable to high correlations or inefficient in reducing the model dimension. In the thesis, I propose a new embarrassingly parallel framework named {\em DECO} for distributed variable selection and parameter estimation. In {\em DECO}, variables are first partitioned and allocated to m distributed workers. The decorrelated subset data within each worker are then fitted via any algorithm designed for high-dimensional problems. We show that by incorporating the decorrelation step, DECO can achieve consistent variable selection and parameter estimation on each subset with (almost) no assumptions. In addition, the convergence rate is nearly minimax optimal for both sparse and weakly sparse models and does NOT depend on the partition number m. Extensive numerical experiments are provided to illustrate the performance of the new framework.
For datasets with both large sample sizes and high dimensionality, I propose a new "divided-and-conquer" framework {\em DEME} (DECO-message) by leveraging both the {\em DECO} and the {\em message} algorithm. The new framework first partitions the dataset in the sample space into row cubes using {\em message} and then partition the feature space of the cubes using {\em DECO}. This procedure is equivalent to partitioning the original data matrix into multiple small blocks, each with a feasible size that can be stored and fitted in a computer in parallel. The results are then synthezied via the {\em DECO} and {\em message} algorithm in a reverse order to produce the final output. The whole framework is extremely scalable.
Resumo:
English has been taught as a core and compulsory subject in China for decades. Recently, the demand for English in China has increased dramatically. China now has the world’s largest English-learning population. The traditional English-teaching method cannot continue to be the only approach because it merely focuses on reading, grammar and translation, which cannot meet English learners and users’ needs (i.e., communicative competence and skills in speaking and writing). This study was conducted to investigate if the Picture-Word Inductive Model (PWIM), a new pedagogical method using pictures and inductive thinking, would benefit English learners in China in terms of potential higher output in speaking and writing. With the gauge of Cognitive Load Theory (CLT), specifically, its redundancy effect, I investigated whether processing words and a picture concurrently would present a cognitive overload for English learners in China. I conducted a mixed methods research study. A quasi-experiment (pretest, intervention for seven weeks, and posttest) was conducted using 234 students in four groups in Lianyungang, China (58 fourth graders and 57 seventh graders as an experimental group with PWIM and 59 fourth graders and 60 seventh graders as a control group with the traditional method). No significant difference in the effects of PWIM was found on vocabulary acquisition based on grade levels. Observations, questionnaires with open-ended questions, and interviews were deployed to answer the three remaining research questions. A few students felt cognitively overloaded when they encountered too many writing samples, too many new words at one time, repeated words, mismatches between words and pictures, and so on. Many students listed and exemplified numerous strengths of PWIM, but a few mentioned weaknesses of PWIM. The students expressed the idea that PWIM had a positive effect on their English teaching. As integrated inferences, qualitative findings were used to explain the quantitative results that there were no significant differences of the effects of the PWIM between the experimental and control groups in both grade levels, from four contextual aspects: time constraints on PWIM implementation, teachers’ resistance, how to use PWIM and PWIM implemented in a classroom over 55 students.
Resumo:
Many dynamical processes are subject to abrupt changes in state. Often these perturbations can be periodic and of short duration relative to the evolving process. These types of phenomena are described well by what are referred to as impulsive differential equations, systems of differential equations coupled with discrete mappings in state space. In this thesis we employ impulsive differential equations to model disease transmission within an industrial livestock barn. In particular we focus on the poultry industry and a viral disease of poultry called Marek's disease. This system lends itself well to impulsive differential equations. Entire cohorts of poultry are introduced and removed from a barn concurrently. Additionally, Marek's disease is transmitted indirectly and the viral particles can survive outside the host for weeks. Therefore, depopulating, cleaning, and restocking of the barn are integral factors in modelling disease transmission and can be completely captured by the impulsive component of the model. Our model allows us to investigate how modern broiler farm practices can make disease elimination difficult or impossible to achieve. It also enables us to investigate factors that may contribute to virulence evolution. Our model suggests that by decrease the cohort duration or by decreasing the flock density, Marek's disease can be eliminated from a barn with no increase in cleaning effort. Unfortunately our model also suggests that these practices will lead to disease evolution towards greater virulence. Additionally, our model suggests that if intensive cleaning between cohorts does not rid the barn of disease, it may drive evolution and cause the disease to become more virulent.
Resumo:
Software assets are key output of the RAGE project and they can be used by applied game developers to enhance the pedagogical and educational value of their games. These software assets cover a broad spectrum of functionalities – from player analytics including emotion detection to intelligent adaptation and social gamification. In order to facilitate integration and interoperability, all of these assets adhere to a common model, which describes their properties through a set of metadata. In this paper the RAGE asset model and asset metadata model is presented, capturing the detail of assets and their potential usage within three distinct dimensions – technological, gaming and pedagogical. The paper highlights key issues and challenges in constructing the RAGE asset and asset metadata model and details the process and design of a flexible metadata editor that facilitates both adaptation and improvement of the asset metadata model.
Resumo:
The application of custom classification techniques and posterior probability modeling (PPM) using Worldview-2 multispectral imagery to archaeological field survey is presented in this paper. Research is focused on the identification of Neolithic felsite stone tool workshops in the North Mavine region of the Shetland Islands in Northern Scotland. Sample data from known workshops surveyed using differential GPS are used alongside known non-sites to train a linear discriminant analysis (LDA) classifier based on a combination of datasets including Worldview-2 bands, band difference ratios (BDR) and topographical derivatives. Principal components analysis is further used to test and reduce dimensionality caused by redundant datasets. Probability models were generated by LDA using principal components and tested with sites identified through geological field survey. Testing shows the prospective ability of this technique and significance between 0.05 and 0.01, and gain statistics between 0.90 and 0.94, higher than those obtained using maximum likelihood and random forest classifiers. Results suggest that this approach is best suited to relatively homogenous site types, and performs better with correlated data sources. Finally, by combining posterior probability models and least-cost analysis, a survey least-cost efficacy model is generated showing the utility of such approaches to archaeological field survey.
Resumo:
In this work we explore optimising parameters of a physical circuit model relative to input/output measurements, using the Dallas Rangemaster Treble Booster as a case study. A hybrid metaheuristic/gradient descent algorithm is implemented, where the initial parameter sets for the optimisation are informed by nominal values from schematics and datasheets. Sensitivity analysis is used to screen parameters, which informs a study of the optimisation algorithm against model complexity by fixing parameters. The results of the optimisation show a significant increase in the accuracy of model behaviour, but also highlight several key issues regarding the recovery of parameters.
Resumo:
An RVE–based stochastic numerical model is used to calculate the permeability of randomly generated porous media at different values of the fiber volume fraction for the case of transverse flow in a unidirectional ply. Analysis of the numerical results shows that the permeability is not normally distributed. With the aim of proposing a new understanding on this particular topic, permeability data are fitted using both a mixture model and a unimodal distribution. Our findings suggest that permeability can be fitted well using a mixture model based on the lognormal and power law distributions. In case of a unimodal distribution, it is found, using the maximum-likelihood estimation method (MLE), that the generalized extreme value (GEV) distribution represents the best fit. Finally, an expression of the permeability as a function of the fiber volume fraction based on the GEV distribution is discussed in light of the previous results.
Resumo:
Graphs are powerful tools to describe social, technological and biological networks, with nodes representing agents (people, websites, gene, etc.) and edges (or links) representing relations (or interactions) between agents. Examples of real-world networks include social networks, the World Wide Web, collaboration networks, protein networks, etc. Researchers often model these networks as random graphs. In this dissertation, we study a recently introduced social network model, named the Multiplicative Attribute Graph model (MAG), which takes into account the randomness of nodal attributes in the process of link formation (i.e., the probability of a link existing between two nodes depends on their attributes). Kim and Lesckovec, who defined the model, have claimed that this model exhibit some of the properties a real world social network is expected to have. Focusing on a homogeneous version of this model, we investigate the existence of zero-one laws for graph properties, e.g., the absence of isolated nodes, graph connectivity and the emergence of triangles. We obtain conditions on the parameters of the model, so that these properties occur with high or vanishingly probability as the number of nodes becomes unboundedly large. In that regime, we also investigate the property of triadic closure and the nodal degree distribution.
Resumo:
A self-organising model of macadamia, expressed using L-Systems, was used to explore aspects of canopy management. A small set of parameters control the basic architecture of the model, with a high degree of self-organisation occurring to determine the fate and growth of buds. Light was sensed at the leaf level and used to represent vigour and accumulated basipetally. Buds also sensed light so as to provide demand in the subsequent redistribution of the vigour. Empirical relationships were derived from a set of 24 completely digitised trees after conversion to multiscale tree graphs (MTG) and analysis with the OpenAlea software library. The ability to write MTG files was embedded within the model so that various tree statistics could be exported for each run of the model. To explore the parameter space a series of runs was completed using a high-throughput computing platform. When combined with MTG generation and analysis with OpenAlea it provided a convenient way in which thousands of simulations could be explored. We allowed the model trees to develop using self-organisation and simulated cultural practices such as hedging, topping, removal of the leader and limb removal within a small representation of an orchard. The model provides insight into the impact of these practices on potential for growth and the light distribution within the canopy and to the orchard floor by coupling the model with a path-tracing program to simulate the light environment. The lessons learnt from this will be applied to other evergreen, tropical fruit and nut trees.
Resumo:
In this paper we investigate a novel model of concatenation of a pair of two-dimensional (2D) convolutional codes. We consider finite-support 2D convolutional codes and choose the so-called Fornasini-Marchesini input-state-output (ISO) model to represent these codes. More concretely, we interconnect in series two ISO representations of two 2D convolutional codes and derive the ISO representation of the ob- tained 2D convolutional code. We provide necessary condition for this representation to be minimal. Moreover, structural properties of modal reachability and modal observability of the resulting 2D convolutional codes are investigated.
Resumo:
As usage metrics continue to attain an increasingly central role in library system assessment and analysis, librarians tasked with system selection, implementation, and support are driven to identify metric approaches that simultaneously require less technical complexity and greater levels of data granularity. Such approaches allow systems librarians to present evidence-based claims of platform usage behaviors while reducing the resources necessary to collect such information, thereby representing a novel approach to real-time user analysis as well as dual benefit in active and preventative cost reduction. As part of the DSpace implementation for the MD SOAR initiative, the Consortial Library Application Support (CLAS) division has begun test implementation of the Google Tag Manager analytic system in an attempt to collect custom analytical dimensions to track author- and university-specific download behaviors. Building on the work of Conrad , CLAS seeks to demonstrate that the GTM approach to custom analytics provides both granular metadata-based usage statistics in an approach that will prove extensible for additional statistical gathering in the future. This poster will discuss the methodology used to develop these custom tag approaches, the benefits of using the GTM model, and the risks and benefits associated with further implementation.