926 results for Bayesian Mixture Model, Cavalieri Method, Trapezoidal Rule
Abstract:
When something unfamiliar emerges, or when something familiar does something unexpected, people need to make sense of what is emerging or going on in order to act. Social representations theory suggests how individuals and society make sense of the unfamiliar and hence how the resultant social representations (SRs) cognitively, emotionally, and actively orient people and enable communication. SRs are social constructions that emerge through individual and collective engagement with media and with everyday conversations among people. Recent developments in text analysis techniques, and in particular topic modeling, provide a potentially powerful analytical method to examine the structure and content of SRs using large samples of narrative or text. In this paper I describe the methods and results of applying topic modeling to 660 micronarratives collected from Australian academics/researchers, government employees, and members of the public in 2010-2011. The narrative fragments focused on adaptation to climate change (CC) and hence provide an example of Australian society making sense of an emerging and conflict-ridden phenomenon. The results of the topic modeling reflect elements of SRs of adaptation to CC that are consistent with findings in the literature, as well as being reasonably robust predictors of classes of action in response to CC. Bayesian Network (BN) modeling was used to identify relationships among the topics (SR elements), and in particular to identify relationships among topics, sentiment, and action. Finally, the resulting model and topic modeling results are used to highlight differences in the salience of SR elements among social groups. The approach of linking topic modeling and BN modeling offers a new and encouraging approach to analysis for ongoing research on SRs.
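A minimal sketch of a topic-modeling step of the kind described in this abstract, using scikit-learn's LDA implementation on placeholder micronarratives; the corpus, preprocessing and topic count used in the paper are not reproduced here.

# Illustrative sketch only: fit an LDA topic model to short narratives and
# inspect the top words per topic.  The narratives below are placeholders.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

narratives = [
    "we prepared our farm for longer droughts and water restrictions",
    "the council discussed sea level rise and new planning rules",
    "my family installed tanks and solar panels after the bushfires",
]   # placeholder micronarratives

vectorizer = CountVectorizer(stop_words="english")
X = vectorizer.fit_transform(narratives)
lda = LatentDirichletAllocation(n_components=2, random_state=0).fit(X)

terms = vectorizer.get_feature_names_out()
for k, topic in enumerate(lda.components_):
    top = [terms[i] for i in topic.argsort()[-5:][::-1]]
    print(f"topic {k}: {top}")

The per-document topic proportions returned by lda.transform(X) are the kind of quantities that could then be fed, together with sentiment and action codes, into a Bayesian network analysis.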
Abstract:
Thesis (Ph.D.)--University of Washington, 2016-08
Abstract:
A Bayesian optimisation algorithm for a nurse scheduling problem is presented, which involves choosing a suitable scheduling rule from a set for each nurse's assignment. When a human scheduler works, he normally builds a schedule systematically following a set of rules. After much practice, the scheduler gradually masters the knowledge of which solution parts go well with others. He can identify good parts and is aware of the solution quality even if the scheduling process is not yet completed, thus having the ability to finish a schedule by using flexible, rather than fixed, rules. In this paper, we design a more human-like scheduling algorithm, by using a Bayesian optimisation algorithm to implement explicit learning from past solutions. A nurse scheduling problem from a UK hospital is used for testing. Unlike our previous work that used Genetic Algorithms to implement implicit learning [1], the learning in the proposed algorithm is explicit, i.e. we identify and mix building blocks directly. The Bayesian optimisation algorithm is applied to implement such explicit learning by building a Bayesian network of the joint distribution of solutions. The conditional probability of each variable in the network is computed according to an initial set of promising solutions. Subsequently, each new instance for each variable is generated by using the corresponding conditional probabilities, until all variables have been generated, i.e. in our case, new rule strings have been obtained. Sets of rule strings are generated in this way, some of which will replace previous strings based on fitness. If stopping conditions are not met, the conditional probabilities for all nodes in the Bayesian network are updated again using the current set of promising rule strings. For clarity, consider the following toy example of scheduling five nurses with two rules (1: random allocation, 2: allocate nurse to low-cost shifts). At the beginning of the search, the probabilities of choosing rule 1 or 2 for each nurse are equal, i.e. 50%. After a few iterations, due to the selection pressure and reinforcement learning, we observe two solution pathways: because pure low-cost or random allocation produces low quality solutions, either rule 1 is used for the first 2-3 nurses and rule 2 on the remainder, or vice versa. In essence, the Bayesian network learns 'use rule 2 after using rule 1 two or three times', or vice versa. It should be noted that for our and most other scheduling problems, the structure of the network model is known and all variables are fully observed. In this case, the goal of learning is to find the rule values that maximize the likelihood of the training data. Thus, learning can amount to 'counting' in the case of multinomial distributions. For our problem, we use four rules: Random, Cheapest Cost, Best Cover and Balance of Cost and Cover. In more detail, the steps of our Bayesian optimisation algorithm for nurse scheduling are: 1. Set t = 0, and generate an initial population P(0) at random; 2. Use roulette-wheel selection to choose a set of promising rule strings S(t) from P(t); 3. Compute conditional probabilities of each node according to this set of promising solutions; 4. Assign each nurse using roulette-wheel selection based on the rules' conditional probabilities. A set of new rule strings O(t) will be generated in this way; 5. Create a new population P(t+1) by replacing some rule strings from P(t) with O(t), and set t = t+1; 6. If the termination conditions are not met (we use 2000 generations), go to step 2.
Computational results from 52 real data instances demonstrate the success of this approach. They also suggest that the learning mechanism in the proposed approach might be suitable for other scheduling problems. Another direction for further research is to see if there is a good constructing sequence for individual data instances, given a fixed nurse scheduling order. If so, the good patterns could be recognized and then extracted as new domain knowledge. Thus, by using this extracted knowledge, we can assign specific rules to the corresponding nurses beforehand, and only schedule the remaining nurses with all available rules, making it possible to reduce the solution space. Acknowledgements: The work was funded by the UK Government's major funding agency, the Engineering and Physical Sciences Research Council (EPSRC), under grant GR/R92899/01. References: [1] Aickelin U, "An Indirect Genetic Algorithm for Set Covering Problems", Journal of the Operational Research Society, 53(10): 1118-1126,
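A minimal sketch of the learning loop summarised in steps 1-6 of this abstract, assuming a chain-structured network over the nurses' rule choices and a placeholder fitness function; this is an illustrative reconstruction, not the authors' implementation.

# Illustrative sketch only: P(rule_i | rule_{i-1}) is learned by counting over
# the promising rule strings (step 3) and new strings are sampled from these
# conditionals by roulette-wheel selection (step 4).
import random
from collections import defaultdict

N_NURSES = 5
RULES = [0, 1, 2, 3]          # e.g. Random, Cheapest Cost, Best Cover, Balance
POP, N_PROMISING, GENERATIONS = 40, 20, 200

def fitness(rule_string):
    # Placeholder: a real implementation would build the schedule produced by
    # these rules and score its cost and coverage.
    return len(set(rule_string)) - 0.2 * rule_string.count(0)

def count_conditionals(promising):
    # Multinomial learning amounts to counting rule transitions.
    counts = defaultdict(lambda: defaultdict(int))
    for s in promising:
        prev = None
        for r in s:
            counts[prev][r] += 1
            prev = r
    return counts

def sample_string(counts):
    # Roulette-wheel sampling from the learned conditionals, with +1 Laplace
    # smoothing so unseen rules keep some probability.
    string, prev = [], None
    for _ in range(N_NURSES):
        weights = [counts[prev][r] + 1 for r in RULES]
        pick, acc = random.uniform(0, sum(weights)), 0.0
        for r, w in zip(RULES, weights):
            acc += w
            if pick <= acc:
                string.append(r)
                prev = r
                break
    return string

population = [[random.choice(RULES) for _ in range(N_NURSES)] for _ in range(POP)]
for t in range(GENERATIONS):
    population.sort(key=fitness, reverse=True)
    counts = count_conditionals(population[:N_PROMISING])   # promising set S(t)
    offspring = [sample_string(counts) for _ in range(POP // 2)]
    population = population[:POP // 2] + offspring          # build P(t+1)
print(max(population, key=fitness))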
Abstract:
Understanding how aquatic species grow is fundamental in fisheries, because stock assessment often relies on growth-dependent statistical models. Length-frequency-based methods become important when more suitable data for growth model estimation are either unavailable or very expensive to collect. In this article, we develop a new framework for growth estimation from length-frequency data using a generalized von Bertalanffy growth model (VBGM) framework that allows time-dependent covariates to be incorporated. A finite mixture of normal distributions is used to model the length-frequency cohorts of each month, with the means constrained to follow a VBGM. The variances of the finite mixture components are constrained to be a function of mean length, reducing the number of parameters and allowing for an estimate of the variance at any length. To optimize the likelihood, we use a minorization–maximization (MM) algorithm with a Nelder–Mead sub-step. This work was motivated by the decline in catches of the blue swimmer crab (BSC) (Portunus armatus) off the east coast of Queensland, Australia. We test the method with a simulation study and then apply it to the BSC fishery data.
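A simplified sketch of the constrained mixture likelihood described in this abstract, fitted here by direct Nelder-Mead rather than the authors' MM algorithm; the cohort ages, equal mixing weights and synthetic data are illustrative assumptions.

# Illustrative sketch: a normal mixture over length-frequency data whose
# component means follow a von Bertalanffy growth curve and whose standard
# deviations are tied to the means through a constant coefficient of variation.
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

ages = np.array([0.5, 1.5, 2.5, 3.5])          # assumed cohort ages (years)

def vbgm(age, linf, k, t0):
    return linf * (1.0 - np.exp(-k * (age - t0)))

def neg_log_lik(theta, lengths):
    linf, k, t0, cv = theta
    if linf <= 0 or k <= 0 or cv <= 0:
        return np.inf
    mu = vbgm(ages, linf, k, t0)                # VBGM-constrained means
    sd = cv * mu                                # variance as a function of mean length
    # Equal-weight normal mixture density evaluated at each observed length.
    dens = np.mean(norm.pdf(lengths[:, None], mu[None, :], sd[None, :]), axis=1)
    return -np.sum(np.log(dens + 1e-300))

# Synthetic example data (illustrative only).
rng = np.random.default_rng(1)
true_mu = vbgm(ages, 18.0, 0.8, -0.1)
lengths = np.concatenate([rng.normal(m, 0.1 * m, 200) for m in true_mu])

fit = minimize(neg_log_lik, x0=[15.0, 0.5, 0.0, 0.15], args=(lengths,),
               method="Nelder-Mead")
print(fit.x)      # estimated (L_inf, K, t0, CV)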
Abstract:
A new type of space debris was recently discovered by Schildknecht in near-geosynchronous orbit (GEO). These objects were later identified as exhibiting properties associated with High Area-to-Mass Ratio (HAMR) objects. According to their brightness magnitudes (light curve), high rotation rates and composition properties (albedo, amount of specular and diffuse reflection, colour, etc.), it is thought that these objects are multilayer insulation (MLI). Observations have shown that this debris type is very sensitive to environmental disturbances, particularly solar radiation pressure, because their shapes are easily deformed, leading to changes in the Area-to-Mass Ratio (AMR) over time. This thesis proposes a simple, effective flexible model of the thin, deformable membrane using two different methods. Firstly, the debris is modelled with Finite Element Analysis (FEA) using Bernoulli-Euler beam theory, called the “Bernoulli model”. The Bernoulli model is constructed with beam elements consisting of two nodes, each with six degrees of freedom (DoF). The mass of the membrane is distributed over the beam elements. Secondly, the debris is modelled using multibody dynamics theory, called the “Multibody model”, as a series of lumped masses connected through flexible joints, representing the flexibility of the membrane itself. The mass of the membrane, albeit low, is taken into account through lumped masses at the joints. The dynamic equations for the masses, including the constraints defined by the connecting rigid rod, are derived using fundamental Newtonian mechanics. The physical properties required by both flexible models (membrane density, reflectivity, composition, etc.) are assumed to be those of multilayer insulation. Both flexible membrane models are then propagated together with classical orbital and attitude equations of motion near the GEO region to predict the orbital evolution under the perturbations of solar radiation pressure, the Earth’s gravity field, luni-solar gravitational fields and the self-shadowing effect. These results are then compared to two rigid body models (cannonball and flat rigid plate). In this investigation, when compared with a rigid model, the evolution of the orbital elements of the flexible models shows differences in inclination and secular eccentricity evolution, rapid irregular attitude motion and an unstable cross-sectional area due to deformation over time. Monte Carlo simulations, varying the initial attitude dynamics and deformation angle, are then investigated and compared with the rigid models over 100 days. The simulations show that different initial conditions produce distinct orbital motions, which differ significantly from the orbital motions of both rigid models. Furthermore, this thesis presents a methodology to determine the dynamic material properties of thin membranes and validates the deformation of the multibody model with real MLI materials. Experiments are performed in a high vacuum chamber ($10^{-4}$ mbar) replicating the space environment. A thin membrane is hinged at one end but free at the other. The free motion experiment, the first experiment, is a free vibration test to determine the damping coefficient and natural frequency of the thin membrane. In this test, the membrane is allowed to fall freely in the chamber with the motion tracked and captured through high-speed video frames. A Kalman filter technique is implemented in the tracking algorithm to reduce noise and increase the tracking accuracy of the oscillating motion.
The forced motion experiment, the last test, is performed to determine the deformation characteristics of the object. A high-power spotlight (500-2000 W) is used to illuminate the MLI, and the displacements are measured by means of a high-resolution laser sensor. Finite Element Analysis (FEA) and multibody dynamics models of the experimental setups are used to validate the flexible models by comparison with the experimental displacements and natural frequencies.
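A minimal sketch of the kind of Kalman filtering used to smooth a tracked edge position, assuming a 1-D constant-velocity model with illustrative noise levels and frame rate; this is not the thesis's tracking code.

# Illustrative sketch: a 1-D constant-velocity Kalman filter smoothing noisy
# position measurements of an oscillating membrane edge extracted from video.
import numpy as np

dt = 1.0 / 240.0                         # assumed frame interval (high-speed video)
F = np.array([[1.0, dt], [0.0, 1.0]])    # state transition (position, velocity)
H = np.array([[1.0, 0.0]])               # only position is measured
Q = 1e-4 * np.eye(2)                     # process noise (illustrative)
R = np.array([[1e-2]])                   # measurement noise (illustrative)

def kalman_track(measurements):
    x = np.array([[measurements[0]], [0.0]])
    P = np.eye(2)
    smoothed = []
    for z in measurements:
        # Predict
        x = F @ x
        P = F @ P @ F.T + Q
        # Update
        y = np.array([[z]]) - H @ x
        S = H @ P @ H.T + R
        K = P @ H.T @ np.linalg.inv(S)
        x = x + K @ y
        P = (np.eye(2) - K @ H) @ P
        smoothed.append(float(x[0, 0]))
    return smoothed

# Example: a damped oscillation corrupted by measurement noise.
t = np.arange(0, 2, dt)
truth = np.exp(-0.8 * t) * np.cos(2 * np.pi * 5 * t)
noisy = truth + 0.1 * np.random.default_rng(0).normal(size=t.size)
estimate = kalman_track(noisy)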
Abstract:
The long-term adverse effects on health associated with air pollution exposure can be estimated using either cohort or spatio-temporal ecological designs. In a cohort study, the health status of a cohort of people is assessed periodically over a number of years, and then related to estimated ambient pollution concentrations in the cities in which they live. However, such cohort studies are expensive and time consuming to implement, due to the long-term follow-up required for the cohort. Therefore, spatio-temporal ecological studies are also being used to estimate the long-term health effects of air pollution, as they are easy to implement due to the routine availability of the required data. Spatio-temporal ecological studies estimate the health impact of air pollution by utilising geographical and temporal contrasts in air pollution and disease risk across $n$ contiguous small areas, such as census tracts or electoral wards, for multiple time periods. The disease data are counts of the numbers of disease cases occurring in each areal unit and time period, and thus Poisson log-linear models are typically used for the analysis. The linear predictor includes pollutant concentrations and known confounders such as socio-economic deprivation. However, as the disease data typically contain residual spatial or spatio-temporal autocorrelation after the covariate effects have been accounted for, these known covariates are augmented by a set of random effects. One key problem in these studies is estimating spatially representative pollution concentrations in each areal unit, which are typically estimated by applying kriging to data from a sparse monitoring network, or by computing averages over modelled concentrations (at grid level) from an atmospheric dispersion model. The aim of this thesis is to investigate the health effects of long-term exposure to nitrogen dioxide (NO2) and particulate matter (PM10) in mainland Scotland, UK. To provide an initial impression of the air pollution health effects in mainland Scotland, chapter 3 presents a standard epidemiological study using a benchmark method. The remaining main chapters (4, 5 and 6) cover the methodological focus of this thesis, which is threefold: (i) how to better estimate pollution by developing a multivariate spatio-temporal fusion model that relates monitored and modelled pollution data over space, time and pollutant; (ii) how to simultaneously estimate the joint effects of multiple pollutants; and (iii) how to allow for the uncertainty in the estimated pollution concentrations when estimating their health effects. Specifically, chapters 4 and 5 are developed to achieve (i), while chapter 6 focuses on (ii) and (iii). In chapter 4, I propose an integrated model for estimating the long-term health effects of NO2 that fuses modelled and measured pollution data to provide improved predictions of areal-level pollution concentrations and hence health effects. The proposed air pollution fusion model is a Bayesian space-time linear regression model relating the measured concentrations to the modelled concentrations for a single pollutant, whilst allowing for additional covariate information such as site type (e.g. roadside, rural) and temperature. However, it is known that some pollutants might be correlated because they may be generated by common processes or driven by similar factors such as meteorology. The correlation between pollutants can help to predict one pollutant by borrowing strength from the others.
Therefore, in chapter 5, I propose a multi-pollutant model: a multivariate spatio-temporal fusion model that extends the single-pollutant model in chapter 4 and relates monitored and modelled pollution data over space, time and pollutant to predict pollution across mainland Scotland. Considering that we are exposed to multiple pollutants simultaneously, because the air we breathe contains a complex mixture of particle- and gas-phase pollutants, the health effects of exposure to multiple pollutants are investigated in chapter 6; this is a natural extension of the single-pollutant health analysis in chapter 4. Given that NO2 and PM10 are highly correlated in my data (a multicollinearity issue), I first propose a temporally-varying linear model to regress one pollutant (e.g. NO2) against another (e.g. PM10) and then use the residuals in the disease model as well as PM10, thus investigating the health effects of exposure to both pollutants simultaneously. Another issue considered in chapter 6 is allowing for the uncertainty in the estimated pollution concentrations when estimating their health effects. In total, four approaches are developed to adjust for exposure uncertainty. Finally, chapter 7 summarises the work contained within this thesis and discusses the implications for future research.
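The areal disease model described in this abstract has the following generic form, written here in standard disease-mapping notation; the exact covariates and random-effects structure used in the thesis may differ.

% Generic spatio-temporal areal Poisson log-linear model (illustrative notation)
\begin{align*}
  Y_{kt} &\sim \mathrm{Poisson}(E_{kt} R_{kt}), \qquad k = 1,\dots,n, \; t = 1,\dots,T,\\
  \log(R_{kt}) &= \beta_0 + \beta_x x_{kt} + \mathbf{z}_{kt}^{\top}\boldsymbol{\gamma} + \phi_{kt},
\end{align*}

where $Y_{kt}$ and $E_{kt}$ are the observed and expected disease counts in areal unit $k$ and time period $t$, $x_{kt}$ is the estimated pollutant concentration, $\mathbf{z}_{kt}$ contains known confounders such as socio-economic deprivation, and $\phi_{kt}$ is a set of random effects with a spatially or spatio-temporally autocorrelated prior (e.g. of conditional autoregressive type) that absorbs the residual autocorrelation.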
Abstract:
International audience
Abstract:
Recently, the interest of the automotive market in hybrid vehicles has increased due to more restrictive pollutant emissions legislation and to the necessity of decreasing fossil fuel consumption, since such a solution allows a considerable improvement of the vehicle's global efficiency. The term hybridization refers to the energy flow in the powertrain of a vehicle: a standard vehicle usually has only one energy source and one energy tank, whereas a hybrid vehicle has at least two energy sources. In most cases, the prime mover is an internal combustion engine (ICE), while the auxiliary energy source can be mechanical, electrical, pneumatic or hydraulic. The control unit of a hybrid vehicle is expected to use the ICE in high-efficiency working zones and to shut it down when more convenient, while using the electric motor-generator (EMG) at partial loads and as a fast torque response during transients. However, the battery state of charge may represent a limitation for such a strategy. That is why, in most cases, energy management strategies are based on control of the State Of Charge (SOC). Several studies have been conducted on this topic and many different approaches have been illustrated. The purpose of this dissertation is to develop an online (usable on-board) control strategy in which the operating modes are defined using an instantaneous optimization method that minimizes the equivalent fuel consumption of a hybrid electric vehicle. The equivalent fuel consumption is calculated by taking into account the total energy used by the hybrid powertrain during the propulsion phases. The first chapter presents the characteristics of hybrid vehicles. The second chapter describes the global model, with a particular focus on the energy management strategies usable for the supervisory control of such a powertrain. The third chapter shows the performance of the implemented controller on a NEDC cycle, compared with that obtained with the original control strategy.
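An instantaneous equivalent-fuel-consumption minimisation of the kind described here is commonly formalised as follows; this is a generic ECMS-style statement, not necessarily the exact cost function developed in the dissertation.

% Generic equivalent consumption minimisation (illustrative notation)
\begin{equation*}
  \dot{m}_{\mathrm{eq}}(t,u) = \dot{m}_{\mathrm{fuel}}(t,u) + s(t)\,\frac{P_{\mathrm{batt}}(t,u)}{Q_{\mathrm{lhv}}},
  \qquad
  u^{*}(t) = \arg\min_{u}\, \dot{m}_{\mathrm{eq}}(t,u)
  \quad \text{s.t.} \quad \mathrm{SOC}_{\min} \le \mathrm{SOC}(t) \le \mathrm{SOC}_{\max},
\end{equation*}

where $u$ is the instantaneous power split between the ICE and the EMG, $P_{\mathrm{batt}}$ is the battery electrical power, $Q_{\mathrm{lhv}}$ is the fuel lower heating value, and $s(t)$ is an equivalence factor that penalises battery use as the SOC approaches its limits.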
Abstract:
In the presented thesis work, the meshfree method with distance fields was coupled with the lattice Boltzmann method to obtain solutions of fluid-structure interaction problems. The thesis work involved the development and implementation of numerical algorithms, data structures, and software. The numerical and computational properties of the coupling algorithm combining the meshfree method with distance fields and the lattice Boltzmann method were investigated. The convergence and accuracy of the methodology were validated against analytical solutions. The research focused on fluid-structure interaction solutions in complex, mesh-resistant domains, as both the lattice Boltzmann method and the meshfree method with distance fields are particularly adept in these situations. Furthermore, the fluid solution provided by the lattice Boltzmann method is massively scalable, allowing extensive use of cutting-edge parallel computing resources to accelerate this phase of the solution process. The meshfree method with distance fields allows for exact satisfaction of boundary conditions, making it possible to exactly capture the effects of the fluid field on the solid structure.
Abstract:
Dust attenuation affects nearly all observational aspects of galaxy evolution, yet very little is known about the form of the dust-attenuation law in the distant universe. Here, we model the spectral energy distributions of galaxies at z ~ 1.5–3 from CANDELS with rest-frame UV to near-IR imaging under different assumptions about the dust law, and compare the amount of inferred attenuated light with the observed infrared (IR) luminosities. Some individual galaxies show strong Bayesian evidence in preference of one dust law over another, and this preference agrees with their observed location on the plane of infrared excess (IRX, L_TIR/L_UV) and UV slope (β). We generalize the shape of the dust law with an empirical model, A_{λ,δ} = E(B-V) k_λ (λ/λ_V)^δ, where k_λ is the dust law of Calzetti et al., and show that there exists a correlation between the color excess E(B-V) and tilt δ, with δ = (0.62±0.05) log(E(B-V)) + (0.26±0.02). Galaxies with high color excess have a shallower, starburst-like law, and those with low color excess have a steeper, SMC-like law. Surprisingly, the galaxies in our sample show no correlation between the shape of the dust law and stellar mass, star formation rate, or β. The change in the dust law with color excess is consistent with a model where attenuation is caused by scattering, a mixed star–dust geometry, and/or trends with stellar population age, metallicity, and dust grain size. This rest-frame UV-to-near-IR method shows potential to constrain the dust law at even higher redshifts (z > 3).
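A brief sketch of how the generalized attenuation law quoted above can be evaluated, assuming the commonly tabulated Calzetti et al. (2000) starburst curve for k_λ (R_V = 4.05); the input values are purely illustrative.

# Illustrative sketch: A_lambda = E(B-V) * k_lambda * (lambda / lambda_V)**delta,
# with k_lambda taken from the Calzetti et al. (2000) starburst curve.
import numpy as np

R_V, LAM_V = 4.05, 0.55        # lambda_V in microns

def k_calzetti(lam_um):
    lam = np.asarray(lam_um, dtype=float)
    return np.where(
        lam < 0.63,
        2.659 * (-2.156 + 1.509 / lam - 0.198 / lam**2 + 0.011 / lam**3) + R_V,
        2.659 * (-1.857 + 1.040 / lam) + R_V,
    )

def a_lambda(lam_um, ebv, delta):
    """Attenuation in magnitudes for colour excess E(B-V) and tilt delta."""
    return ebv * k_calzetti(lam_um) * (np.asarray(lam_um) / LAM_V) ** delta

# Tilt-colour excess relation reported above: delta = 0.62 log(E(B-V)) + 0.26.
ebv = 0.3
delta = 0.62 * np.log10(ebv) + 0.26
print(a_lambda([0.15, 0.25, 0.55], ebv, delta))   # FUV, NUV, V-band attenuation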
Abstract:
Understanding how virus strains offer protection against closely related emerging strains is vital for creating effective vaccines. For many viruses, including Foot-and-Mouth Disease Virus (FMDV) and the Influenza virus, where multiple serotypes often co-circulate, in vitro testing of large numbers of vaccines can be infeasible. Therefore the development of an in silico predictor of cross-protection between strains is important to help optimise vaccine choice. Vaccines will offer cross-protection against closely related strains, but not against those that are antigenically distinct. To be able to predict cross-protection, we must understand the antigenic variability within a virus serotype and across distinct lineages of a virus, and identify the antigenic residues and evolutionary changes that cause this variability. In this thesis we present a family of sparse hierarchical Bayesian models for detecting relevant antigenic sites in virus evolution (SABRE), as well as an extended version of the method, the extended SABRE (eSABRE) method, which better takes into account the data collection process. The SABRE methods are a family of sparse Bayesian hierarchical models that use spike and slab priors to identify sites in the viral protein which are important for the neutralisation of the virus. We demonstrate how the SABRE methods can be used to identify antigenic residues within different serotypes and show how the SABRE method outperforms established methods (mixed-effects models based on forward variable selection or L1 regularisation) on both synthetic and viral datasets. In addition, we test a number of different versions of the SABRE method, comparing conjugate and semi-conjugate prior specifications as well as an alternative to the spike and slab prior: the binary mask model. We also propose novel proposal mechanisms for the Markov chain Monte Carlo (MCMC) simulations, which improve mixing and convergence over those of the established component-wise Gibbs sampler. The SABRE method is then applied to datasets from FMDV and the Influenza virus in order to identify a number of known antigenic residues and to provide hypotheses about other potentially antigenic residues. We also demonstrate how the SABRE methods can be used to create accurate predictions of the important evolutionary changes of the FMDV serotypes. We then provide an extended version of the SABRE method, the eSABRE method, based on a latent variable model. The eSABRE method further takes into account the structure of the FMDV and Influenza datasets through the latent variable model and improves the modelling of the error. We show how the eSABRE method outperforms the SABRE methods in simulation studies and propose a new information criterion for selecting the random effects factors that should be included in the eSABRE method: the block integrated Widely Applicable Information Criterion (biWAIC). We demonstrate how biWAIC performs comparably to two other methods for selecting the random effects factors and combine it with the eSABRE method to apply it to two large Influenza datasets. Inference in these large datasets is computationally infeasible with the SABRE methods, but as a result of the improved structure of the likelihood, the eSABRE method offers a computational improvement, allowing it to be used on these datasets.
The results of the eSABRE method show that we can use the method in a fully automatic manner to identify a large number of antigenic residues on a variety of the antigenic sites of two Influenza serotypes, as well as making predictions of a number of nearby sites that may also be antigenic and are worthy of further experimental investigation.
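The spike and slab variable-selection prior at the core of the SABRE models can be written generically as follows; this is illustrative notation, and the full hierarchical structure, random effects and eSABRE latent-variable layer described in the abstract are more elaborate.

% Generic spike and slab variable-selection prior (illustrative notation)
\begin{align*}
  y_i &= \mathbf{x}_i^{\top}\boldsymbol{\beta} + \mathbf{z}_i^{\top}\mathbf{b} + \varepsilon_i,
      \qquad \varepsilon_i \sim \mathcal{N}(0, \sigma^2),\\
  \beta_j \mid \gamma_j &\sim (1-\gamma_j)\,\delta_0 + \gamma_j\,\mathcal{N}(0, \sigma_\beta^2),
      \qquad \gamma_j \sim \mathrm{Bernoulli}(\pi),
\end{align*}

where $y_i$ is a measured neutralisation response, $\mathbf{x}_i$ encodes the residue differences between the protective and challenge strains, $\mathbf{b}$ collects the random effects, $\delta_0$ is a point mass at zero (the "spike"), and the posterior inclusion probability $P(\gamma_j = 1 \mid \text{data})$, estimated by MCMC, indicates whether residue $j$ is antigenically relevant.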
Abstract:
2016
Abstract:
2016
Abstract:
In recent years, radars have been used in many applications such as precision agriculture and advanced driver assistance systems. Optimal techniques for the estimation of the number of targets and of their coordinates require solving multidimensional optimization problems entailing huge computational effort. This has motivated the development of sub-optimal estimation techniques able to achieve good accuracy at a manageable computational cost. Another technical issue in advanced driver assistance systems is the tracking of multiple targets. Although various filtering techniques have been developed, new efficient and robust algorithms for target tracking can be devised by exploiting a probabilistic approach based on the use of factor graphs and the sum-product algorithm. The two contributions provided by this dissertation are the investigation of the filtering and smoothing problems from a factor graph perspective and the development of efficient algorithms for two- and three-dimensional radar imaging. Concerning the first contribution, a new factor graph for filtering is derived and the sum-product rule is applied to this graphical model; this makes it possible to interpret known algorithms and to develop new filtering techniques. Then, a general method, based on graphical modelling, is proposed to derive filtering algorithms that involve a network of interconnected Bayesian filters. Finally, the proposed graphical approach is exploited to devise a new smoothing algorithm. Numerical results for dynamic systems show that our algorithms can achieve a better complexity-accuracy trade-off and better tracking capability than other techniques in the literature. Regarding radar imaging, various algorithms are developed for frequency-modulated continuous-wave radars; these algorithms rely on novel and efficient methods for the detection and estimation of multiple superimposed tones in noise. The accuracy achieved in the presence of multiple closely spaced targets is assessed on the basis of both synthetically generated data and measurements acquired with two commercial multiple-input multiple-output radars.
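A generic baseline for detecting and estimating superimposed tones in noise, using successive periodogram peak-picking and cancellation; this is not one of the novel estimators developed in the dissertation, and the signal parameters are illustrative.

# Illustrative baseline: iteratively pick the periodogram peak, estimate the
# complex amplitude by least squares, and cancel the estimated tone from the
# residual until the requested number of tones has been extracted.
import numpy as np

def estimate_tones(x, n_tones, fs, zero_pad=8):
    n = x.size
    residual = x.astype(complex).copy()
    tones = []
    for _ in range(n_tones):
        nfft = zero_pad * n
        spec = np.fft.fft(residual, nfft)
        k = np.argmax(np.abs(spec))
        f_hat = k * fs / nfft                         # coarse frequency estimate
        s = np.exp(2j * np.pi * f_hat * np.arange(n) / fs)
        a_hat = (s.conj() @ residual) / n             # least-squares amplitude
        residual -= a_hat * s                         # successive cancellation
        tones.append((f_hat, a_hat))
    return tones

# Example: two closely spaced complex tones in white noise.
fs, n = 1000.0, 512
t = np.arange(n) / fs
rng = np.random.default_rng(0)
x = (1.0 * np.exp(2j * np.pi * 100.0 * t)
     + 0.7 * np.exp(2j * np.pi * 112.0 * t)
     + 0.1 * (rng.normal(size=n) + 1j * rng.normal(size=n)))
print(estimate_tones(x, 2, fs))

In an FMCW radar the estimated beat-tone frequencies map directly to target ranges, which is why accurate multi-tone estimation underpins the imaging algorithms mentioned in the abstract.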