882 results for Bayesian model selection


Relevance:

40.00%

Publisher:

Abstract:

The dynamics of a population undergoing selection is a central topic in evolutionary biology. This question is particularly intriguing in the case where selective forces act in opposing directions at two population scales. For example, a fast-replicating virus strain outcompetes slower-replicating strains at the within-host scale. However, if the fast-replicating strain causes host morbidity and is less frequently transmitted, it can be outcompeted by slower-replicating strains at the between-host scale. Here we consider a stochastic ball-and-urn process which models this type of phenomenon. We prove the weak convergence of this process under two natural scalings. The first scaling leads to a deterministic nonlinear integro-partial differential equation on the interval $[0,1]$ with dependence on a single parameter, $\lambda$. We show that the fixed points of this differential equation are Beta distributions and that their stability depends on $\lambda$ and the behavior of the initial data around $1$. The second scaling leads to a measure-valued Fleming-Viot process, an infinite-dimensional stochastic process that is frequently associated with population genetics.

Relevance:

40.00%

Publisher:

Abstract:

Selecting suppliers/partners is a crucial part of decision making for companies that intend to perform competitively in their area of activity. It is a time- and resource-consuming task that involves data collection and a careful analysis of the factors that can positively or negatively influence the choice. Nevertheless, it is a critical process that significantly affects the operational performance of each company. In this work, through a literature review, five broad supplier selection criteria were identified: Quality, Financial, Synergies, Cost, and Production System. Within these criteria, five sub-criteria were also included. Thereafter, a survey was prepared and companies were asked which factors have the most relevance in their decisions when choosing suppliers. After the results were interpreted and the data processed, a linear weighting model was adopted to reflect the importance of each factor. The model has a hierarchical structure and can be applied with the Analytic Hierarchy Process (AHP) method or the Simple Multi-Attribute Rating Technique (SMART). The result of the research undertaken by the authors is a reference model that provides decision-making support for the supplier/partner selection process.
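
A linear weighting model of the kind this abstract describes can be sketched roughly as follows. The five criterion names come from the abstract; the weights, the supplier names, and the 0-10 ratings are hypothetical values chosen for illustration (they are not the survey results), and the sub-criteria level of the hierarchy is left out for brevity.

```python
# Criterion names are from the abstract; the weights are hypothetical.
WEIGHTS = {
    "Quality": 0.30,
    "Financial": 0.15,
    "Synergies": 0.15,
    "Cost": 0.25,
    "Production System": 0.15,
}


def weighted_score(ratings, weights=WEIGHTS):
    """Return the weight-normalised sum of the criterion ratings."""
    total_weight = sum(weights.values())
    return sum(weights[c] * ratings[c] for c in weights) / total_weight


def rank_suppliers(suppliers):
    """Sort supplier names from highest to lowest weighted score."""
    return sorted(suppliers, key=lambda name: weighted_score(suppliers[name]),
                  reverse=True)


# Hypothetical ratings on a 0-10 scale (higher is better).
suppliers = {
    "Supplier A": {"Quality": 8, "Financial": 6, "Synergies": 5,
                   "Cost": 7, "Production System": 6},
    "Supplier B": {"Quality": 6, "Financial": 8, "Synergies": 7,
                   "Cost": 9, "Production System": 5},
}
ranking = rank_suppliers(suppliers)
```

With these made-up numbers, Supplier B ranks first because its advantage on Cost and Financial outweighs Supplier A's advantage on Quality.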

Relevance:

40.00%

Publisher:

Abstract:

This document briefly summarizes the pavement management activities under the existing Iowa Department of Transportation (DOT) Pavement Management System. The second part of the document provides the projected increase in use due to the implementation of the Iowa DOT Pavement Management Optimization System. All figures for existing time devoted to the Pavement Management System and for projected increases in time requirements are estimates made by the appropriate Iowa DOT office director or function manager. Included is the new Pavement Management Optimization Structure for the three main offices which will work most closely with the Pavement Management Optimization System (Materials, Design, and Program Management).

Relevance:

40.00%

Publisher:

Abstract:

The missions of the research are to assist the Iowa Department of Transportation (Iowa DOT) to: Define pavement management (PM) optimization; Identify the characteristics of PM optimization systems being developed or implemented; Identify specific and achievable objectives for the Iowa DOT pavement management optimization; Evaluate different PM optimization methodologies; Identify a methodology to perform PM optimization that best satisfies the Iowa DOT's objectives; Develop a plan for the implementation of the PM optimization selected. The project is divided into three (3) phases. The first phase has been completed and accomplished the first three missions (identified above). The second phase has been completed and accomplished the next two missions. Phase three will accomplish the last mission.

Relevance:

40.00%

Publisher:

Abstract:

A Bayesian optimisation algorithm for a nurse scheduling problem is presented, which involves choosing a suitable scheduling rule from a set for each nurse's assignment. When a human scheduler works, he normally builds a schedule systematically following a set of rules. After much practice, the scheduler gradually masters the knowledge of which solution parts go well with others. He can identify good parts and is aware of the solution quality even if the scheduling process is not yet completed, thus having the ability to finish a schedule by using flexible, rather than fixed, rules. In this paper, we design a more human-like scheduling algorithm, by using a Bayesian optimisation algorithm to implement explicit learning from past solutions. A nurse scheduling problem from a UK hospital is used for testing. Unlike our previous work that used Genetic Algorithms to implement implicit learning [1], the learning in the proposed algorithm is explicit, i.e. we identify and mix building blocks directly. The Bayesian optimisation algorithm is applied to implement such explicit learning by building a Bayesian network of the joint distribution of solutions. The conditional probability of each variable in the network is computed according to an initial set of promising solutions. Subsequently, each new instance for each variable is generated by using the corresponding conditional probabilities, until all variables have been generated, i.e. in our case, new rule strings have been obtained. Sets of rule strings are generated in this way, some of which will replace previous strings based on fitness. If stopping conditions are not met, the conditional probabilities for all nodes in the Bayesian network are updated again using the current set of promising rule strings. For clarity, consider the following toy example of scheduling five nurses with two rules (1: random allocation, 2: allocate nurse to low-cost shifts). 
At the beginning of the search, the probabilities of choosing rule 1 or 2 for each nurse are equal, i.e. 50%. After a few iterations, due to the selection pressure and reinforcement learning, we experience two solution pathways: because pure low-cost or random allocation produces low-quality solutions, either rule 1 is used for the first 2-3 nurses and rule 2 for the remainder, or vice versa. In essence, the Bayesian network learns 'use rule 2 after 2-3x using rule 1' or vice versa. It should be noted that for our and most other scheduling problems, the structure of the network model is known and all variables are fully observed. In this case, the goal of learning is to find the rule values that maximize the likelihood of the training data. Thus, learning can amount to 'counting' in the case of multinomial distributions. For our problem, we use four rules: Random, Cheapest Cost, Best Cover and Balance of Cost and Cover. In more detail, the steps of our Bayesian optimisation algorithm for nurse scheduling are: 1. Set t = 0, and generate an initial population P(0) at random; 2. Use roulette-wheel selection to choose a set of promising rule strings S(t) from P(t); 3. Compute conditional probabilities of each node according to this set of promising solutions; 4. Assign each nurse using roulette-wheel selection based on the rules' conditional probabilities. A set of new rule strings O(t) will be generated in this way; 5. Create a new population P(t+1) by replacing some rule strings from P(t) with O(t), and set t = t+1; 6. If the termination conditions are not met (we use 2000 generations), go to step 2. Computational results from 52 real data instances demonstrate the success of this approach. They also suggest that the learning mechanism in the proposed approach might be suitable for other scheduling problems. Another direction for further research is to see if there is a good constructing sequence for individual data instances, given a fixed nurse scheduling order. 
If so, the good patterns could be recognized and then extracted as new domain knowledge. Thus, by using this extracted knowledge, we can assign specific rules to the corresponding nurses beforehand, and only schedule the remaining nurses with all available rules, making it possible to reduce the solution space. Acknowledgements The work was funded by the UK Government's major funding agency, Engineering and Physical Sciences Research Council (EPSRC), under grant GR/R92899/01. References [1] Aickelin U, "An Indirect Genetic Algorithm for Set Covering Problems", Journal of the Operational Research Society, 53(10): 1118-1126,
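
The six steps listed in this abstract can be sketched as below. This is a minimal illustration under stated assumptions, not the authors' implementation: the network is simplified to a chain in which each nurse's rule depends only on the previous nurse's rule, counts are smoothed by adding 1, step 5 replaces the less fit half of the population, and `toy_fitness` with its `TARGET` string is a hypothetical stand-in for a real schedule evaluation. The sketch also runs 50 generations rather than the 2000 mentioned in the abstract.

```python
import random

# Rule indices: 0 Random, 1 Cheapest Cost, 2 Best Cover, 3 Balance of Cost and Cover
RULES = [0, 1, 2, 3]


def roulette(weights, rng):
    """Pick an index with probability proportional to its (positive) weight."""
    r = rng.random() * sum(weights)
    acc = 0.0
    for i, w in enumerate(weights):
        acc += w
        if r < acc:
            return i
    return len(weights) - 1


def learn_tables(promising, n_nurses):
    """Learning by counting: P(rule of nurse 0) and P(rule_i | rule_{i-1})."""
    first = [1.0] * len(RULES)  # +1 smoothing (an assumption of this sketch)
    cond = [[[1.0] * len(RULES) for _ in RULES] for _ in range(n_nurses)]
    for s in promising:
        first[s[0]] += 1
        for i in range(1, n_nurses):
            cond[i][s[i - 1]][s[i]] += 1
    return first, cond


def sample_string(first, cond, n_nurses, rng):
    """Step 4: assign a rule to each nurse in turn from the learned tables."""
    s = [roulette(first, rng)]
    for i in range(1, n_nurses):
        s.append(roulette(cond[i][s[-1]], rng))
    return s


def boa(fitness, n_nurses, pop_size=40, generations=50, seed=0):
    rng = random.Random(seed)
    # Step 1: random initial population P(0)
    pop = [[rng.choice(RULES) for _ in range(n_nurses)] for _ in range(pop_size)]
    for _ in range(generations):  # Step 6: fixed number of generations
        fits = [fitness(s) for s in pop]
        # Step 2: roulette-wheel selection of promising rule strings S(t)
        promising = [pop[roulette(fits, rng)] for _ in range(pop_size // 2)]
        # Step 3: conditional probabilities from the promising set
        first, cond = learn_tables(promising, n_nurses)
        # Step 4: generate new rule strings O(t)
        offspring = [sample_string(first, cond, n_nurses, rng)
                     for _ in range(pop_size // 2)]
        # Step 5: keep the fitter part of P(t) and fill the rest with O(t)
        pop.sort(key=fitness, reverse=True)
        pop = pop[:pop_size - len(offspring)] + offspring
    return max(pop, key=fitness)


TARGET = [0, 0, 1, 1, 1]  # hypothetical "good" rule string for five nurses


def toy_fitness(s):
    """Hypothetical fitness: 1 plus the number of positions matching TARGET."""
    return 1 + sum(a == b for a, b in zip(s, TARGET))


best = boa(toy_fitness, n_nurses=5)
```

Because all variables are observed and the structure is fixed, `learn_tables` is literally the 'counting' the abstract refers to; a real application would replace `toy_fitness` with the cost of the schedule built from a rule string.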

Relevance:

40.00%

Publisher:

Abstract:

A decision-maker, when faced with a limited and fixed budget to collect data in support of a multiple attribute selection decision, must decide how many samples to observe from each alternative and attribute. This allocation decision is of particular importance when the information gained leads to uncertain estimates of the attribute values as with sample data collected from observations such as measurements, experimental evaluations, or simulation runs. For example, when the U.S. Department of Homeland Security must decide upon a radiation detection system to acquire, a number of performance attributes are of interest and must be measured in order to characterize each of the considered systems. We identified and evaluated several approaches to incorporate the uncertainty in the attribute value estimates into a normative model for a multiple attribute selection decision. Assuming an additive multiple attribute value model, we demonstrated the idea of propagating the attribute value uncertainty and describing the decision values for each alternative as probability distributions. These distributions were used to select an alternative. With the goal of maximizing the probability of correct selection we developed and evaluated, under several different sets of assumptions, procedures to allocate the fixed experimental budget across the multiple attributes and alternatives. Through a series of simulation studies, we compared the performance of these allocation procedures to the simple, but common, allocation procedure that distributed the sample budget equally across the alternatives and attributes. We found the allocation procedures that were developed based on the inclusion of decision-maker knowledge, such as knowledge of the decision model, outperformed those that neglected such information. 
Beginning with general knowledge of the attribute values provided by Bayesian prior distributions, and updating this knowledge with each observed sample, the sequential allocation procedure performed particularly well. These observations demonstrate that managing projects focused on a selection decision so that the decision modeling and the experimental planning are done jointly, rather than in isolation, can improve the overall selection results.
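
The idea of propagating attribute-value uncertainty through an additive value model can be illustrated with a small Monte Carlo sketch. Everything numeric here is an assumption made for illustration: the attribute estimates are treated as independent normal distributions given by a mean and a standard error, and the weights and estimates are hypothetical. The function returns, for each alternative, the estimated probability that it has the highest decision value.

```python
import random


def prob_best(estimates, weights, n_draws=20000, seed=1):
    """Estimate P(alternative has the highest additive value).

    estimates[a][k] is a (mean, std_error) pair for attribute k of
    alternative a; weights[k] is the weight of attribute k.
    """
    rng = random.Random(seed)
    wins = [0] * len(estimates)
    for _ in range(n_draws):
        # Draw one plausible value per attribute and combine additively.
        values = [
            sum(w * rng.gauss(mean, se) for w, (mean, se) in zip(weights, alt))
            for alt in estimates
        ]
        wins[values.index(max(values))] += 1
    return [w / n_draws for w in wins]


# Hypothetical example: two alternatives, two equally weighted attributes.
weights = [0.5, 0.5]
estimates = [
    [(0.8, 0.05), (0.6, 0.05)],  # alternative 0
    [(0.5, 0.05), (0.5, 0.05)],  # alternative 1
]
probs = prob_best(estimates, weights)
```

Collecting more samples for an attribute shrinks its standard error, so an allocation procedure can be compared with equal allocation by checking how quickly the probability of the best alternative approaches 1 as the budget is spent.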

Relevance:

40.00%

Publisher:

Abstract:

Understanding how virus strains offer protection against closely related emerging strains is vital for creating effective vaccines. For many viruses, including Foot-and-Mouth Disease Virus (FMDV) and the Influenza virus where multiple serotypes often co-circulate, in vitro testing of large numbers of vaccines can be infeasible. Therefore the development of an in silico predictor of cross-protection between strains is important to help optimise vaccine choice. Vaccines will offer cross-protection against closely related strains, but not against those that are antigenically distinct. To be able to predict cross-protection we must understand the antigenic variability within a virus serotype and among distinct lineages of a virus, and identify the antigenic residues and evolutionary changes that cause the variability. In this thesis we present a family of sparse hierarchical Bayesian models for detecting relevant antigenic sites in virus evolution (SABRE), as well as an extended version of the method, the extended SABRE (eSABRE) method, which better takes into account the data collection process. The SABRE methods are a family of sparse Bayesian hierarchical models that use spike and slab priors to identify sites in the viral protein which are important for the neutralisation of the virus. In this thesis we demonstrate how the SABRE methods can be used to identify antigenic residues within different serotypes and show how the SABRE method outperforms established methods, mixed-effects models based on forward variable selection or l1 regularisation, on both synthetic and viral datasets. In addition, we test a number of different versions of the SABRE method and compare conjugate and semi-conjugate prior specifications, as well as an alternative to the spike and slab prior: the binary mask model. We also propose novel proposal mechanisms for the Markov chain Monte Carlo (MCMC) simulations, which improve mixing and convergence over that of the established component-wise Gibbs sampler. 
The SABRE method is then applied to datasets from FMDV and the Influenza virus in order to identify a number of known antigenic residues and to provide hypotheses of other potentially antigenic residues. We also demonstrate how the SABRE methods can be used to create accurate predictions of the important evolutionary changes of the FMDV serotypes. In this thesis we provide an extended version of the SABRE method, the eSABRE method, based on a latent variable model. The eSABRE method takes further into account the structure of the datasets for FMDV and the Influenza virus through the latent variable model and gives an improvement in the modelling of the error. We show how the eSABRE method outperforms the SABRE methods in simulation studies and propose a new information criterion for selecting the random effects factors that should be included in the eSABRE method: the block integrated Widely Applicable Information Criterion (biWAIC). We demonstrate how biWAIC performs equally to two other methods for selecting the random effects factors and combine it with the eSABRE method to apply it to two large Influenza datasets. Inference in these large datasets is computationally infeasible with the SABRE methods, but as a result of the improved structure of the likelihood, we are able to show how the eSABRE method offers a computational improvement, allowing it to be used on these datasets. The results of the eSABRE method show that we can use the method in a fully automatic manner to identify a large number of antigenic residues on a variety of the antigenic sites of two Influenza serotypes, as well as making predictions of a number of nearby sites that may also be antigenic and are worthy of further experimental investigation.

Relevance:

40.00%

Publisher:

Abstract:

Correlations between genetic parameters and factors such as backfat thickness (BFT), rib eye area (REA), and body weight (BW) were estimated for Canchim beef cattle raised in natural pastures of Brazil. Data from 1648 animals were analyzed using multi-trait (BFT, REA, and BW) animal models by the Bayesian approach. This model included the effects of contemporary group, age, and individual heterozygosity as covariates. In addition, direct additive genetic and random residual effects were also analyzed. Heritability estimates for BFT (0.16), REA (0.50), and BW (0.44) indicated their potential for genetic improvement and response to selection processes. Furthermore, genetic correlations between BW and the remaining traits were high (P > 0.50), suggesting that selection for BW could improve REA and BFT. On the other hand, the genetic correlation between BFT and REA was low (P = 0.39 ± 0.17) and showed considerable variation, suggesting that these traits can be jointly included as selection criteria without influencing each other. We found that REA and BFT responded to the selection processes, as measured by ultrasound. Therefore, selection for yearling weight results in changes in REA and BFT.

Relevance:

30.00%

Publisher:

Abstract:

Gene clustering is a useful exploratory technique to group together genes with similar expression levels under distinct cell cycle phases or distinct conditions. It helps the biologist to identify potentially meaningful relationships between genes. In this study, we propose a clustering method based on multivariate normal mixture models, where the number of clusters is predicted via sequential hypothesis tests: at each step, the method considers a mixture model of m components (m = 2 in the first step) and tests if in fact it should be m - 1. If the hypothesis is rejected, m is increased and a new test is carried out. The method continues (increasing m) until the hypothesis is accepted. The theoretical core of the method is the full Bayesian significance test, an intuitive Bayesian approach, which needs no model complexity penalization nor positive probabilities for sharp hypotheses. Numerical experiments were based on a cDNA microarray dataset consisting of expression levels of 205 genes belonging to four functional categories, for 10 distinct strains of Saccharomyces cerevisiae. To analyze the method's sensitivity to data dimension, we performed principal components analysis on the original dataset and predicted the number of classes using 2 to 10 principal components. Compared to Mclust (model-based clustering), our method shows more consistent results.
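
The sequential procedure for choosing the number of clusters can be written as a short loop. This sketch covers only the control flow described in the abstract: the significance test itself is passed in as a callable, and both the parameter name `rejects_m_minus_1` and the cap `m_max` are hypothetical names introduced here. The `toy_test` stub stands in for a real test (such as the full Bayesian significance test on a fitted mixture) and simply pretends the data have three clusters.

```python
def select_num_clusters(data, rejects_m_minus_1, m_max=20):
    """Return the number of mixture components chosen sequentially.

    rejects_m_minus_1(data, m) must return True when the hypothesis
    "the m-component mixture is really an (m - 1)-component one" is rejected.
    """
    m = 2  # the first step considers a two-component mixture
    while m <= m_max:
        if not rejects_m_minus_1(data, m):
            return m - 1  # hypothesis accepted: m - 1 components suffice
        m += 1  # hypothesis rejected: try a larger mixture
    return m_max  # safety cap (an assumption of this sketch)


def toy_test(data, m):
    """Hypothetical stub: reject while m - 1 is below a pretend truth of 3."""
    return (m - 1) < 3


n_clusters = select_num_clusters([], toy_test)
```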

Relevance:

30.00%

Publisher:

Abstract:

The implementation of confidential contracts between a container liner carrier and its customers, as a result of the Ocean Shipping Reform Act (OSRA) of 1998, demands a revision of the methodology applied in the carrier's planning of marketing and sales. The marketing and sales planning process should be more scientific and make better use of operational research tools, considering that the selection of the customers under contract, the duration of the contracts, the freight, and the container imbalances of these contracts are basic factors for the carrier's yield. This work aims to develop a decision support system based on a linear programming model to generate the business plan for a container liner carrier, maximizing the contribution margin of its freight.

Relevance:

30.00%

Publisher:

Abstract:

Hardy-Weinberg Equilibrium (HWE) is an important genetic property that populations should have whenever they are not subject to adverse situations such as a complete lack of panmixia, excess of mutations, excess of selection pressure, etc. HWE has been evaluated for decades; both frequentist and Bayesian methods are in use today. While historically the HWE formula was developed to examine the transmission of alleles in a population from one generation to the next, use of HWE concepts has expanded in human disease studies to detect genotyping error and disease susceptibility (association); see Ryckman and Williams (2008). Most analyses focus on trying to answer the question of whether a population is in HWE. They do not try to quantify how far from the equilibrium the population is. In this paper, we propose the use of a simple disequilibrium coefficient for a locus with two alleles. Based on the posterior density of this disequilibrium coefficient, we show how one can conduct a Bayesian analysis to verify how far from HWE a population is. There are other coefficients introduced in the literature, and the advantage of the one introduced in this paper is that, just like the standard correlation coefficients, its range is bounded and it is symmetric around zero (equilibrium) when comparing the positive and the negative values. To test the hypothesis of equilibrium, we use a simple Bayesian significance test, the Full Bayesian Significance Test (FBST); see Pereira, Stern and Wechsler (2008) for a complete review. The disequilibrium coefficient proposed provides an easy and efficient way to make the analyses, especially if one uses Bayesian statistics. A routine in R (R Development Core Team, 2009) that implements the calculations is provided for the readers.
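
To make the idea of a posterior for a disequilibrium coefficient concrete, here is a small sketch. It does not reproduce the coefficient proposed in the paper (the abstract does not define it) and it is not the FBST; instead it uses the classical coefficient D = P(AA) - p_A^2, which is zero under HWE, with a uniform Dirichlet prior on the three genotype probabilities. The genotype counts are hypothetical, and the paper's own routine is in R whereas this illustration is in Python.

```python
import random


def posterior_disequilibrium(counts, n_draws=10000, seed=2):
    """Posterior mean and 95% credible interval of D = P(AA) - p_A**2.

    counts = (n_AA, n_Aa, n_aa). A uniform Dirichlet(1, 1, 1) prior is
    assumed, so the posterior is Dirichlet(n_AA + 1, n_Aa + 1, n_aa + 1).
    """
    rng = random.Random(seed)
    alphas = [c + 1 for c in counts]
    draws = []
    for _ in range(n_draws):
        # A Dirichlet draw is a vector of gamma draws divided by their sum.
        g = [rng.gammavariate(a, 1.0) for a in alphas]
        total = sum(g)
        p_hom, p_het = g[0] / total, g[1] / total
        p_a = p_hom + p_het / 2  # frequency of allele A
        draws.append(p_hom - p_a ** 2)
    draws.sort()
    mean = sum(draws) / n_draws
    interval = (draws[int(0.025 * n_draws)], draws[int(0.975 * n_draws)])
    return mean, interval


# Hypothetical genotype counts that match HWE proportions exactly.
mean_d, (low_d, high_d) = posterior_disequilibrium((25, 50, 25))
```

An interval that sits well away from zero indicates departure from equilibrium, and its position shows how far and in which direction (excess or deficit of homozygotes).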

Relevance:

30.00%

Publisher:

Abstract:

Background: Considering the broad variation in the expression of housekeeping genes among tissues and experimental situations, studies using quantitative RT-PCR require strict definition of adequate endogenous controls. For glioblastoma, the most common type of tumor in the central nervous system, there was no previous report regarding this issue. Results: Here we show that, amongst seven frequently used housekeeping genes, TBP and HPRT1 are adequate references for glioblastoma gene expression analysis. Evaluation of the expression levels of 12 target genes utilizing different endogenous controls revealed that the normalization method applied might introduce errors in the estimation of relative quantities. Genes presenting expression levels which do not significantly differ between tumor and normal tissues can be considered either increased or decreased if unsuitable reference genes are applied. Most importantly, genes showing significant differences in expression levels between tumor and normal tissues can be missed. We also demonstrated that the Holliday Junction Recognizing Protein, a novel DNA repair protein overexpressed in lung cancer, is extremely overexpressed in glioblastoma, with a median change of about 134-fold. Conclusion: Altogether, our data show the relevance of previous validation of candidate control genes for each experimental model and indicate TBP plus HPRT1 as suitable references for studies on glioblastoma gene expression.

Relevance:

30.00%

Publisher:

Abstract:

Survival or longevity is an economically important trait in beef cattle. The main inconvenience for its inclusion in selection criteria is delayed recording of phenotypic data and the high computational demand for including survival in proportional hazard models. Thus, identification of a longevity-correlated trait that could be recorded early in life would be very useful for selection purposes. We estimated the genetic relationship of survival with productive and reproductive traits in Nellore cattle, including weaning weight (WW), post-weaning growth (PWG), muscularity (MUSC), scrotal circumference at 18 months (SC18), and heifer pregnancy (HP). Survival was measured in discrete time intervals and modeled through a sequential threshold model. Five independent bivariate Bayesian analyses were performed, accounting for cow survival and the five productive and reproductive traits. Posterior mean estimates for heritability (standard deviation in parentheses) were 0.55 (0.01) for WW, 0.25 (0.01) for PWG, 0.23 (0.01) for MUSC, and 0.48 (0.01) for SC18. The posterior mean estimates (95% confidence interval in parentheses) for the genetic correlation with survival were 0.16 (0.13-0.19), 0.30 (0.25-0.34), 0.31 (0.25-0.36), 0.07 (0.02-0.12), and 0.82 (0.78-0.86) for WW, PWG, MUSC, SC18, and HP, respectively. Based on the high genetic correlation and heritability (0.54) posterior mean estimates for HP, the expected progeny difference for HP can be used to select bulls for longevity, as well as for post-weaning gain and muscle score.

Relevance:

30.00%

Publisher:

Abstract:

Morphological and molecular analyses have proven to be complementary tools of taxonomic information for the redescription of the ctenostome bryozoans Amathia brasiliensis Busk, 1886 and Amathia distans Busk, 1886. The two species, originally described from material collected by the 'Challenger' expedition but synonymized by later authors, now have their status fixed by means of the selection of lectotypes, morphological observations and analyses of DNA sequences described here. The morphological characters allowing the identification of living and/or preserved specimens are (1) A. brasiliensis: whitish-pale pigment spots in the frontal surface of stolons and zooids, and a wide stolon with biserial zooid clusters growing in clockwise and anti-clockwise spirals along it, the spirality direction being maintained from maternal to daughter stolons; and (2) A. distans: bright yellow pigment spots in stolonal and zooidal surfaces including lophophores, and a slender stolon, thickly cuticularized, with biserial zooid clusters growing in clockwise and anti-clockwise spirals along it and the spirality direction not maintained from maternal to daughter stolons. Pairwise comparisons of DNA sequences of the mitochondrial genes cytochrome c oxidase subunit I and large ribosomal RNA subunit revealed deep genetic divergence between A. brasiliensis and A. distans. Finally, analyses of those sequences within a Bayesian phylogenetic context recovered their genealogical species status.