946 results for Discrete Variables
Abstract:
The performance of the Hosmer-Lemeshow global goodness-of-fit statistic for logistic regression models was explored in a wide variety of conditions not previously fully investigated. Computer simulations, each consisting of 500 regression models, were run to assess the statistic in 23 different situations. The items which varied among the situations included the number of observations used in each regression, the number of covariates, the degree of dependence among the covariates, the combinations of continuous and discrete variables, and the generation of the values of the dependent variable for model fit or lack of fit.
The study found that the $C_g^*$ statistic was adequate in tests of significance for most situations. However, when testing data which deviate from a logistic model, the statistic has low power to detect such deviation. Although grouping of the estimated probabilities into quantiles from 8 to 30 was studied, the deciles of risk approach was generally sufficient. Subdividing the estimated probabilities into more than 10 quantiles when there are many covariates in the model is not necessary, despite theoretical reasons which suggest otherwise. Because it does not follow a $\chi^2$ distribution, the statistic is not recommended for use in models containing only categorical variables with a limited number of covariate patterns.
The statistic performed adequately when there were at least 10 observations per quantile. Large numbers of observations per quantile did not lead to incorrect conclusions that the model did not fit the data when it actually did. However, the statistic failed to detect lack of fit when it existed and should be supplemented with further tests for the influence of individual observations. Careful examination of the parameter estimates is also essential since the statistic did not perform as desired when there was moderate to severe collinearity among covariates.
Two methods studied for handling tied values of the estimated probabilities made only a slight difference in conclusions about model fit. Neither method split observations with identical probabilities into different quantiles. Approaches which create equal size groups by separating ties should be avoided.
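The grouping of estimated probabilities into quantiles without splitting ties, as described in this abstract, can be illustrated with a short Python sketch. It is not the dissertation's code: the function name `hosmer_lemeshow` and the greedy rule for closing a group are assumptions made for illustration, and the statistic is the usual sum over groups of (observed − expected)² / (n · p̄ · (1 − p̄)), compared against a chi-square distribution with g − 2 degrees of freedom.

```python
def hosmer_lemeshow(y, p, groups=10):
    """Return (statistic, degrees_of_freedom) for a Hosmer-Lemeshow-type test.

    y: observed 0/1 outcomes; p: fitted probabilities from a logistic model.
    Observations with identical fitted probabilities always share a group,
    so fewer than `groups` groups may be formed when there are many ties.
    """
    pairs = sorted(zip(p, y))
    n = len(pairs)
    bins, current = [], []
    i = 0
    while i < n:
        # Take the whole run of tied probabilities at once (never split a tie).
        j = i
        while j < n and pairs[j][0] == pairs[i][0]:
            j += 1
        current.extend(pairs[i:j])
        i = j
        # Close the group once the cumulative count reaches the next quantile cut.
        if i >= (len(bins) + 1) * n / groups:
            bins.append(current)
            current = []
    if current:
        bins.append(current)

    stat = 0.0
    for group in bins:
        n_k = len(group)
        observed = sum(obs for _, obs in group)
        expected = sum(prob for prob, _ in group)
        mean_p = expected / n_k
        # Assumes 0 < mean_p < 1; a group of all-0 or all-1 fitted values
        # would make the denominator zero.
        stat += (observed - expected) ** 2 / (n_k * mean_p * (1.0 - mean_p))
    return stat, len(bins) - 2
```

With only three distinct fitted probabilities, asking for four groups yields three, which mirrors the caution in the abstract about models with few covariate patterns.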
Abstract:
Mixtures of polynomials (MoPs) are a non-parametric density estimation technique especially designed for hybrid Bayesian networks with continuous and discrete variables. Algorithms to learn one- and multi-dimensional (marginal) MoPs from data have recently been proposed. In this paper we introduce two methods for learning MoP approximations of conditional densities from data. Both approaches are based on learning MoP approximations of the joint density and the marginal density of the conditioning variables, but they differ as to how the MoP approximation of the quotient of the two densities is found. We illustrate and study the methods using data sampled from known parametric distributions, and we demonstrate their applicability by learning models based on real neuroscience data. Finally, we compare the performance of the proposed methods with an approach for learning mixtures of truncated basis functions (MoTBFs). The empirical results show that the proposed methods generally yield models that are comparable to or significantly better than those found using the MoTBF-based method.
Abstract:
The objective of this research was to evaluate genetic aspects related to in vitro embryo production in the Guzerá breed. The first study focused on estimating genetic and phenotypic (co)variances for traits related to embryo production and on detecting a possible association with age at first calving (AFC). Low and moderate heritabilities were detected for traits related to oocyte and embryo production. There was a weak genetic association between traits linked to artificial reproduction and age at first calving. The second study evaluated genetic and inbreeding trends in a Guzerá population in Brazil. Donors and in vitro-produced embryos were treated as two subpopulations in order to compare differences in annual genetic change and in the inbreeding coefficient. The annual trend of the inbreeding coefficient (F) was higher for the general population, and a quadratic effect was detected. However, the mean F for the embryo subpopulation was higher than in the general population and in the donors. A greater annual genetic gain for age at first calving and for milk yield (305 days) was observed among in vitro-produced embryos than among donors or in the general population. The third study examined the effects of the inbreeding coefficients of the donor, of the sire (used in in vitro fertilization) and of the embryos on the results of in vitro embryo production in the Guzerá breed. An effect of donor and embryo inbreeding on the traits studied was detected. The fourth (and last) study was designed to compare the adequacy of linear and generalized mixed models under the Restricted Maximum Likelihood (REML) method and their suitability for discrete variables. Four hierarchical models assuming different distributions were considered for the count data found in the database.
Inference was based on residual diagnostics and on the comparison of ratios between variance components for the models in each variable. Poisson models outperformed both the linear model (with and without transformation of the variable) and the negative binomial model in goodness of fit and predictive ability, despite clear differences observed in the distribution of the variables. Among the models tested, the worst goodness of fit was obtained for the linear model with a logarithmic transformation (Log10 X +1) of the response variable.
Abstract:
The aim of this work is to investigate and develop continuous and discrete optimization strategies for Optimal Power Flow (OPF) problems in which the control variables associated with the taps of in-phase transformers and with the switching of shunt capacitor and reactor banks must be treated as discrete variables, and in which the number of control actions must be limited and/or even minimized. In this work, the OPF problem is addressed through three strategies. In the first proposal, the OPF problem is modeled as a Nonlinear Programming problem with Continuous and Discrete Variables (PNLCD) for the minimization of active transmission losses; three approaches using discretization functions are proposed for handling the discrete variables. In the second proposal, the OPF problem, with discrete transformer taps and fixed shunt capacitor and reactor banks, is assumed to have a limit on the number of control actions; binary variables associated with the number of control actions are handled by a quadratic function. In the third proposal, the OPF problem is modeled as a Multiobjective Optimization problem. The weighted-sum method and the ε-constraint method are used to convert the proposed multiobjective problems into single-objective problems. The binary variables associated with the control actions are handled by two functions, one sigmoidal and one polynomial. To verify the effectiveness and robustness of the models and algorithms developed, tests are carried out with the IEEE 14-, 30-, 57-, 118- and 300-bus electrical systems. All algorithms and models were implemented in the General Algebraic Modeling System (GAMS), and the solvers CONOPT, IPOPT, KNITRO and DICOPT were used to solve the problems.
The results obtained confirm that the discretization strategies are efficient and that the modeling proposals for binary variables make it possible to find feasible solutions to the problems involving control actions, whereas the solvers DICOPT and KNITRO, when used to model binary variables, do not find solutions.
Abstract:
We present a derivative-free optimization algorithm coupled with a chemical process simulator for the optimal design of individual and complex distillation processes using a rigorous tray-by-tray model. The proposed approach serves as an alternative tool to the various models based on nonlinear programming (NLP) or mixed-integer nonlinear programming (MINLP). This is accomplished by combining the advantages of using a commercial process simulator (Aspen Hysys), including especially suited numerical methods developed for the convergence of distillation columns, with the benefits of the particle swarm optimization (PSO) metaheuristic algorithm, which does not require gradient information and has the ability to escape from local optima. Our method inherits the superstructure developed in Yeomans, H.; Grossmann, I. E. Optimal design of complex distillation columns using rigorous tray-by-tray disjunctive programming models. Ind. Eng. Chem. Res. 2000, 39 (11), 4326–4335, in which the nonexisting trays are considered as simple bypasses of liquid and vapor flows. The implemented tool provides the optimal configuration of distillation column systems, which includes continuous and discrete variables, through the minimization of the total annual cost (TAC). The robustness and flexibility of the method are demonstrated through the successful design and synthesis of three distillation systems of increasing complexity.
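To make the PSO idea in this abstract concrete, here is a minimal Python sketch of a particle swarm that handles continuous and integer variables by rounding. It is a generic textbook-style PSO, not the authors' tool: the function names, the coefficient values, and in particular `toy_cost` (a hypothetical stand-in for a total-annual-cost evaluation that the paper obtains from a process simulator) are assumptions for illustration only.

```python
import random


def toy_cost(x):
    """Hypothetical stand-in for a simulator call returning a cost to minimize."""
    n_trays, reflux = x
    return 5.0 + (n_trays - 12) ** 2 + (reflux - 1.8) ** 2


def pso_minimize(cost, bounds, is_integer, n_particles=20, n_iter=100, seed=0):
    """Minimize cost(x) over a box; coordinates flagged in is_integer are rounded."""
    rng = random.Random(seed)
    dim = len(bounds)

    def decode(x):
        # Round integer-valued coordinates (e.g. a number of trays) before evaluation.
        return [round(v) if is_integer[d] else v for d, v in enumerate(x)]

    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    best_pos = [p[:] for p in pos]
    best_val = [cost(decode(p)) for p in pos]
    g = min(range(n_particles), key=lambda i: best_val[i])
    g_pos, g_val = best_pos[g][:], best_val[g]

    w, c1, c2 = 0.7, 1.5, 1.5  # inertia and acceleration coefficients (assumed values)
    for _ in range(n_iter):
        for i in range(n_particles):
            for d, (lo, hi) in enumerate(bounds):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (best_pos[i][d] - pos[i][d])
                             + c2 * r2 * (g_pos[d] - pos[i][d]))
                # Clamp to the box so the cost function only sees inputs within bounds.
                pos[i][d] = min(hi, max(lo, pos[i][d] + vel[i][d]))
            val = cost(decode(pos[i]))
            if val < best_val[i]:
                best_val[i], best_pos[i] = val, pos[i][:]
                if val < g_val:
                    g_val, g_pos = val, pos[i][:]
    return decode(g_pos), g_val
```

A call such as `pso_minimize(toy_cost, [(5, 30), (1.0, 4.0)], [True, False])` searches over an integer tray count and a continuous reflux ratio. No gradient of `cost` is needed, which is what makes this family of methods usable with a black-box simulator.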
Abstract:
nlcheck is a simple diagnostic tool that can be used after fitting a model to quickly check the linearity assumption for a given predictor. nlcheck categorizes the predictor into bins, refits the model including dummy variables for the bins, and then performs a joint Wald test for the added parameters. Alternatively, nlcheck uses linear splines for the adaptive model. Support for discrete variables is also provided. Optionally, nlcheck also displays a graph of the adjusted linear predictions from the original model and the adaptive model.
Abstract:
Cluster analysis via a finite mixture model approach is considered. With this approach to clustering, the data can be partitioned into a specified number of clusters g by first fitting a mixture model with g components. An outright clustering of the data is then obtained by assigning an observation to the component to which it has the highest estimated posterior probability of belonging; that is, the ith cluster consists of those observations assigned to the ith component (i = 1,..., g). The focus is on the use of mixtures of normal components for the cluster analysis of data that can be regarded as being continuous. But attention is also given to the case of mixed data, where the observations consist of both continuous and discrete variables.
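The assignment rule in this abstract (each observation goes to the component with the highest estimated posterior probability) can be sketched in a few lines of Python for a univariate normal mixture. The sketch assumes the mixture has already been fitted, for example by an EM algorithm that is not shown; the function names and any parameter values passed in are hypothetical.

```python
import math


def normal_pdf(x, mean, sd):
    """Density of a univariate normal distribution."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def posterior_probabilities(x, weights, means, sds):
    """Posterior probability that x belongs to each of the g components."""
    joint = [w * normal_pdf(x, m, s) for w, m, s in zip(weights, means, sds)]
    total = sum(joint)  # mixture density at x; assumed to be non-zero
    return [j / total for j in joint]


def assign_clusters(data, weights, means, sds):
    """Outright clustering: index of the component with the highest posterior."""
    labels = []
    for x in data:
        tau = posterior_probabilities(x, weights, means, sds)
        labels.append(max(range(len(tau)), key=tau.__getitem__))
    return labels
```

For well-separated components the posterior probabilities are close to 0 or 1; near the boundary between two components they approach 0.5, which is where the outright assignment is least certain.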
Abstract:
The ultimate problem considered in this thesis is modeling a high-dimensional joint distribution over a set of discrete variables. For this purpose, we consider classes of context-specific graphical models and the main emphasis is on learning the structure of such models from data. Traditional graphical models compactly represent a joint distribution through a factorization justified by statements of conditional independence which are encoded by a graph structure. Context-specific independence is a natural generalization of conditional independence that only holds in a certain context, specified by the conditioning variables. We introduce context-specific generalizations of both Bayesian networks and Markov networks by including statements of context-specific independence which can be encoded as a part of the model structures. For the purpose of learning context-specific model structures from data, we derive score functions, based on results from Bayesian statistics, by which the plausibility of a structure is assessed. To identify high-scoring structures, we construct stochastic and deterministic search algorithms designed to exploit the structural decomposition of our score functions. Numerical experiments on synthetic and real-world data show that the increased flexibility of context-specific structures can more accurately emulate the dependence structure among the variables and thereby improve the predictive accuracy of the models.
Abstract:
Introduction In 2014 a study protocol was proposed for the surveillance of cystic lesions of the pancreas (understood as IPMN), named the PACYFIC Study. Objectives The primary objective was to establish the impact of a surveillance program in terms of patients enrolled and patients with a surgical indication. Secondary objectives were: 1) to establish the impact of demographic, clinical and radiological factors and of the surveillance strategy on surgical indication, on the ability to identify malignant lesions, and on survival. Materials and Methods The data were collected in a prospective, multicenter, international cohort study. The study included individuals with an IPMN, newly or previously diagnosed, that warrants surveillance or surgical treatment. Clinical, demographic, radiological and surgical data were collected in a prospective database. Discrete variables were expressed as frequencies and percentages, and continuous variables as means and standard deviations or medians and interquartile ranges (IQR). The statistical analysis used Fisher's test, the chi-square test, Spearman's test and Student's t-test. Multivariate analysis was performed using logistic regression, expressed as odds ratios with 95% confidence intervals. The Kaplan-Meier method was used for survival. Multivariate survival analysis was performed using Cox regression. Results The surveillance protocol allowed the enrollment of 516 patients. 53 patients reached a surgical indication. The overall survival of the cohort was 326.8 ± 9.1 months. The predictive factors for survival were age (OR 1.07, P-value<0.001), sex (OR 1.82, P-value=0.006), jaundice and mural nodules (OR 4.84, P-value=0.018 and OR 2.19, P-value=0.016), and surgery (OR 0.46, P-value=0.038).
Conclusions The introduction of the surveillance protocol led to an increase in the identification of lesions and had an impact on survival.
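As a reminder of what the Kaplan-Meier method mentioned in this abstract computes, here is a minimal Python sketch of the product-limit estimator. It is a generic illustration, not the study's analysis code; the function name and any follow-up times passed to it are hypothetical.

```python
def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct time with at least one observed event.

    times: follow-up times (e.g. in months);
    events: 1 if the event was observed at that time, 0 if the subject was censored.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = removed = 0
        # Process everyone who leaves the risk set at time t together, so that
        # subjects censored at t still count as at risk for the events at t.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            survival *= 1.0 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= removed
    return curve
```

Censored subjects never produce a step in the curve; they only shrink the risk set for later event times.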
Abstract:
Conselho Nacional de Desenvolvimento Científico e Tecnológico (CNPq)
Abstract:
Integrated choice and latent variable (ICLV) models represent a promising new class of models which merge classic choice models with the structural equation approach (SEM) for latent variables. Despite their conceptual appeal, applications of ICLV models in marketing remain rare. We extend previous ICLV applications by first estimating a multinomial choice model and, second, by estimating hierarchical relations between latent variables. An empirical study on travel mode choice clearly demonstrates the value of ICLV models to enhance the understanding of choice processes. In addition to the usually studied directly observable variables such as travel time, we show how abstract motivations such as power and hedonism as well as attitudes such as a desire for flexibility impact on travel mode choice. Furthermore, we show that it is possible to estimate such a complex ICLV model with the widely available structural equation modeling package Mplus. This finding is likely to encourage more widespread application of this appealing model class in the marketing field.
Abstract:
The dynamical discrete web (DyDW), introduced in the recent work of Howitt and Warren, is a system of coalescing simple symmetric one-dimensional random walks which evolve in an extra continuous dynamical time parameter tau. The evolution is by independent updating of the underlying Bernoulli variables indexed by discrete space-time that define the discrete web at any fixed tau. In this paper, we study the existence of exceptional (random) values of tau where the paths of the web do not behave like usual random walks and the Hausdorff dimension of the set of such exceptional tau. Our results are motivated by those about exceptional times for dynamical percolation in high dimension by Haggstrom, Peres and Steif, and in dimension two by Schramm and Steif. The exceptional behavior of the walks in the DyDW is rather different from the situation for the dynamical random walks of Benjamini, Haggstrom, Peres and Steif. For example, we prove that the walk from the origin S(0)(tau) violates the law of the iterated logarithm (LIL) on a set of tau of Hausdorff dimension one. We also discuss how these and other results should extend to the dynamical Brownian web, the natural scaling limit of the DyDW. (C) 2009 Elsevier B.V. All rights reserved.
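The construction in this abstract (coalescing walks driven by Bernoulli variables attached to space-time sites, with those variables independently updated in a second time parameter tau) can be illustrated by a small Python simulation. This is a rough sketch under stated assumptions, not the paper's formal definition: the class name `DiscreteWeb` is hypothetical, arrows are sampled lazily, and `resample` is a crude discrete stand-in for letting tau advance, redrawing each stored arrow independently with a given probability.

```python
import random


class DiscreteWeb:
    """Arrows xi(x, t) in {-1, +1}, one per space-time site, sampled on demand."""

    def __init__(self, seed=0):
        self.rng = random.Random(seed)
        self.arrows = {}

    def arrow(self, x, t):
        # The same site always returns the same arrow, which forces coalescence.
        if (x, t) not in self.arrows:
            self.arrows[(x, t)] = self.rng.choice((-1, 1))
        return self.arrows[(x, t)]

    def path(self, x0, t0, steps):
        """Positions of the walk started at x0 at time t0 over `steps` steps."""
        xs = [x0]
        for k in range(steps):
            xs.append(xs[-1] + self.arrow(xs[-1], t0 + k))
        return xs

    def resample(self, probability):
        """Redraw each stored arrow independently with the given probability."""
        for site in self.arrows:
            if self.rng.random() < probability:
                self.arrows[site] = self.rng.choice((-1, 1))
```

Two paths started at different sites follow the same arrows once they meet, so they move together from then on. Calling `resample` and recomputing `path(0, 0, n)` gives the walk from the origin at a later value of the dynamical parameter.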
Abstract:
In this paper we consider the existence of the maximal and mean square stabilizing solutions for a set of generalized coupled algebraic Riccati equations (GCARE for short) associated with the infinite-horizon stochastic optimal control problem of discrete-time Markov jump linear systems with multiplicative noise. The weighting matrices of the state and control for the quadratic part are allowed to be indefinite. We present a sufficient condition, based only on some positive semi-definite and kernel restrictions on some matrices, under which there exists the maximal solution, and a necessary and sufficient condition under which there exists the mean square stabilizing solution for the GCARE. We also present a solution for the discounted and long run average cost problems when the performance criterion is assumed to be composed of a linear combination of an indefinite quadratic part and a linear part in the state and control variables. The paper is concluded with a numerical example for a pension fund with regime switching.
Abstract:
A discrete protocol for teleportation of superpositions of coherent states of optical-cavity fields is presented. Displacement and parity operators are unconventionally used in Bell-like measurement for field states.
Abstract:
Summary Forests are key ecosystems of the earth and associated with a large range of functions. Many of these functions are beneficial to humans and are referred to as ecosystem services. Sustainable development requires that all relevant ecosystem services are quantified, managed and monitored equally. Natural resource management therefore targets the services associated with ecosystems. The main hypothesis of this thesis is that the spatial and temporal domains of relevant services do not correspond to a discrete forest ecosystem. As a consequence, the services are not quantified, managed and monitored in an equal and sustainable manner. The thesis aims were therefore to test this hypothesis, establish an improved conceptual approach and provide spatial applications for the relevant land cover and structure variables. The study was carried out in western Switzerland and based primarily on data from a countrywide landscape inventory. This inventory is part of the third Swiss national forest inventory and assesses continuous landscape variables based on a regular sampling of true colour aerial imagery. In addition, land cover variables were derived from Landsat 5 TM passive sensor data and land structure variables from active sensor data from a small footprint laserscanning system. The results confirmed the main hypothesis, as relevant services did not scale well with the forest ecosystem. Instead, a new conceptual approach for sustainable management of natural resources was described. This concept quantifies the services as a continuous function of the landscape, rather than a discrete function of the forest ecosystem. The explanatory landscape variables are therefore called continuous fields and the forest becomes a dependent and function-driven management unit. Continuous field mapping methods were established for land cover and structure variables. In conclusion, the discrete forest ecosystem is an adequate planning and management unit. 
However, monitoring the state of and trends in sustainability of services requires them to be quantified as a continuous function of the landscape. Sustainable natural resource management iteratively combines the ecosystem and gradient approaches. Résumé Forests are key ecosystems of the earth and a large number of functions are attributed to them. Many of these functions are beneficial to humans and are called ecosystem services. Sustainable development requires that these ecosystem services all be quantified, managed and monitored equally. Natural resource management therefore targets the services attributed to ecosystems. The main hypothesis of this thesis is that the spatial and temporal domains of the services attributed to the forest do not correspond to a discrete ecosystem. Consequently, the services are not quantified, managed and monitored in an equivalent and sustainable manner. The aims of the thesis were to test this hypothesis, to establish a new conceptual approach to natural resource management and to prepare spatial applications for the relevant landscape and structure variables. The study was carried out in western Switzerland, mainly on the basis of a national-scale landscape inventory. This inventory is part of the third Swiss national forest inventory and measures landscape variables continuously on the basis of regular sampling of colour aerial photographs. In addition, land cover variables were derived from Landsat 5 TM passive sensor data, and structure variables from laser scanning, an active sensor. The results confirm the main hypothesis, because the scale of the services does not match that of the forest ecosystem. Instead, a new approach was developed for the sustainable management of natural resources.
This concept represents the services as a continuous function of the landscape, rather than a discrete function of the forest ecosystem. Consequently, the explanatory landscape variables are called continuous fields and the forest becomes a dependent entity, defined by the main function of the landscape. Corresponding methods for land cover and structure were developed. In conclusion, the discrete forest ecosystem is an adequate unit for planning and management. By contrast, monitoring the sustainability of the state and its evolution requires that the services be quantified as a continuous function of the landscape. Sustainable natural resource management therefore combines the ecosystem approach with the gradient approach iteratively.