925 results for Hierarchical logistic model


Abstract:

Logistic models are studied as a tool to convert dynamical forecast information (deterministic and ensemble) into probability forecasts. A logistic model is obtained by setting the log-odds equal to a linear combination of the inputs. As with any statistical model, logistic models will suffer from overfitting if the number of inputs is comparable to the number of forecast instances. Computational approaches to avoid overfitting by regularization are discussed, and efficient techniques for model assessment and selection are presented. A logit version of the lasso (originally a linear regression technique) is discussed. In lasso models, less important inputs are identified and the corresponding coefficients are set to zero, providing an efficient and automatic model reduction procedure. For the same reason, lasso models are particularly appealing for diagnostic purposes.
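
The shrinkage this abstract describes can be illustrated with a minimal sketch, which is not the authors' implementation. It assumes a mean log-loss objective with an L1 penalty on the slopes (the intercept is left unpenalised) and fits it by proximal gradient descent; the function names, the toy data and the penalty values are hypothetical.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def soft_threshold(z, t):
    """Shrink z toward zero by t; values within [-t, t] become exactly 0."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0

def fit_lasso_logistic(X, y, lam, iters=2000):
    """Minimise mean log-loss + lam * sum(|beta_j|) by proximal gradient descent."""
    n, d = len(X), len(X[0])
    # Step size 1/L, where L is bounded by a quarter of the mean squared
    # row norm (the leading 1.0 accounts for the intercept column).
    mean_sq = sum(1.0 + sum(v * v for v in row) for row in X) / n
    step = 4.0 / mean_sq
    b0, beta = 0.0, [0.0] * d
    for _ in range(iters):
        resid = [sigmoid(b0 + sum(b * v for b, v in zip(beta, row))) - yi
                 for row, yi in zip(X, y)]
        g0 = sum(resid) / n
        grad = [sum(r * row[j] for r, row in zip(resid, X)) / n for j in range(d)]
        b0 -= step * g0  # plain gradient step for the intercept
        beta = [soft_threshold(b - step * g, step * lam) for b, g in zip(beta, grad)]
    return b0, beta

# Hypothetical toy data: the first input is informative, the second is noise.
X = [[-2.0, 1.0], [-1.5, -1.0], [-1.0, 1.0], [-0.5, -1.0],
     [0.5, -1.0], [1.0, 1.0], [1.5, -1.0], [2.0, 1.0]]
y = [0, 0, 0, 1, 0, 1, 1, 1]

_, heavy = fit_lasso_logistic(X, y, lam=10.0)   # strong penalty
_, light = fit_lasso_logistic(X, y, lam=0.01)   # weak penalty
```

With a strong penalty every slope is driven exactly to zero, which is the automatic model reduction mentioned above; with a weak penalty the informative input keeps a nonzero coefficient.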

Abstract:

The purpose of this article is to present a new method to predict the response variable of an observation in a new cluster for a multilevel logistic regression model. The central idea is based on the empirical best estimator for the random effect. Two estimation methods for the multilevel model are compared: penalized quasi-likelihood and Gauss-Hermite quadrature. The performance measures for predicting the probability of a new-cluster observation under the multilevel logistic model, in comparison with the usual logistic model, are examined through simulations and an application.
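
As a rough illustration of where Gauss-Hermite quadrature enters a random-intercept logistic model, the sketch below averages the success probability over a normal random effect. It is not the article's empirical best estimator. It assumes a random intercept u ~ N(0, sigma^2) and uses a deliberately coarse three-point rule; the names `marginal_probability`, `eta` and `sigma` are hypothetical.

```python
import math

def expit(z):
    return 1.0 / (1.0 + math.exp(-z))

# Three-point Gauss-Hermite rule for the weight exp(-t^2), in closed form.
GH_NODES = (-math.sqrt(1.5), 0.0, math.sqrt(1.5))
GH_WEIGHTS = (math.sqrt(math.pi) / 6.0,
              2.0 * math.sqrt(math.pi) / 3.0,
              math.sqrt(math.pi) / 6.0)

def marginal_probability(eta, sigma):
    """Approximate E[expit(eta + u)] for u ~ N(0, sigma^2).

    The substitution u = sqrt(2) * sigma * t turns the normal integral
    into the Gauss-Hermite form (1 / sqrt(pi)) * sum(w_k * f(t_k)).
    """
    total = sum(w * expit(eta + math.sqrt(2.0) * sigma * t)
                for t, w in zip(GH_NODES, GH_WEIGHTS))
    return total / math.sqrt(math.pi)

# Fixed-effect linear predictor 1.0, with and without between-cluster variation.
p_no_cluster_effect = marginal_probability(1.0, 0.0)
p_new_cluster = marginal_probability(1.0, 1.0)
```

For an observation in a new cluster the random effect is unknown, so averaging over its distribution pulls the predicted probability toward 0.5 relative to the fixed-effects-only value. A practical fit would use more quadrature nodes.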

Abstract:

Coordenação de Aperfeiçoamento de Pessoal de Nível Superior (CAPES)

Abstract:

This work develops a new methodology to discriminate between models for interval-censored data, based on bootstrap simulation of residuals and on the difference in deviance between one model and another, following Hinde (1992). Generally, this sort of data can generate a large number of tied observations and, in this case, survival time can be regarded as discrete. Therefore, the Cox proportional hazards model for grouped data (Prentice & Gloeckler, 1978) and the logistic model (Lawless, 1982) can be fitted by means of generalized linear models. Whitehead (1989) treated censoring as an indicator variable with a binomial distribution and fitted the Cox proportional hazards model using the complementary log-log link function. In addition, a logistic model can be fitted using the logit link function. The proposed methodology is an alternative to the score tests developed by Colosimo et al. (2000), in which such models are obtained for discrete binary data as particular cases of the asymmetric Aranda-Ordaz family of distributions. These tests are thus based on link functions to generate such a fit. The motivating example is a dataset from an experiment carried out on a flax cultivar planted on four substrata susceptible to the pathogen Fusarium oxysporum. The response variable, the time until blighting, was observed in intervals over 52 days. The results were compared using the model fit and the AIC values.
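
The relationship between the two link functions and the asymmetric Aranda-Ordaz family can be checked numerically. The sketch below assumes the usual form of the asymmetric Aranda-Ordaz link, log(((1 - mu)^(-lambda) - 1) / lambda); the function names are hypothetical and the code is not taken from the work summarised above.

```python
import math

def logit(mu):
    return math.log(mu / (1.0 - mu))

def cloglog(mu):
    """Complementary log-log link: log(-log(1 - mu))."""
    return math.log(-math.log1p(-mu))

def aranda_ordaz(mu, lam):
    """Asymmetric Aranda-Ordaz link, log(((1 - mu)**(-lam) - 1) / lam), for lam > 0."""
    # expm1 and log1p keep the computation accurate when lam is close to zero.
    return math.log(math.expm1(-lam * math.log1p(-mu)) / lam)

mu = 0.3
at_one = aranda_ordaz(mu, 1.0)        # coincides with the logit link
near_zero = aranda_ordaz(mu, 1e-6)    # approaches the complementary log-log link
```

Setting lambda = 1 recovers the logistic model, while letting lambda tend to 0 recovers the grouped-data proportional hazards model, which is why both appear as particular cases of the family.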

Abstract:

Conselho Nacional de Desenvolvimento Científico e Tecnológico (CNPq)

Abstract:

Conselho Nacional de Desenvolvimento Científico e Tecnológico (CNPq)

Abstract:

Analyses of ecological data should account for the uncertainty in the process(es) that generated the data. However, accounting for these uncertainties is a difficult task, since ecology is known for its complexity. Measurement and/or process errors are often the only sources of uncertainty modeled when addressing complex ecological problems, yet analyses should also account for uncertainty in sampling design, in model specification, in parameters governing the specified model, and in initial and boundary conditions. Only then can we be confident in the scientific inferences and forecasts made from an analysis. Probability and statistics provide a framework that accounts for multiple sources of uncertainty. Given the complexities of ecological studies, the hierarchical statistical model is an invaluable tool. This approach is not new in ecology, and there are many examples (both Bayesian and non-Bayesian) in the literature illustrating the benefits of this approach. In this article, we provide a baseline for concepts, notation, and methods, from which discussion on hierarchical statistical modeling in ecology can proceed. We have also planted some seeds for discussion and tried to show where the practical difficulties lie. Our thesis is that hierarchical statistical modeling is a powerful way of approaching ecological analysis in the presence of inevitable but quantifiable uncertainties, even if practical issues sometimes require pragmatic compromises.

Abstract:

The motivating problem concerns the estimation of the growth curve of solitary corals that follow the nonlinear Von Bertalanffy Growth Function (VBGF). The most common parameterization of the VBGF for corals is based on two parameters: the ultimate length L∞ and the growth rate k. One aim was to find a more reliable method for estimating these parameters, one that can capture the influence of environmental covariates. The main issue with current methods is that they force the linearization of the VBGF and neglect intra-individual variability. The idea was to use a hierarchical nonlinear model, which takes into account the influence of collection sites, possible intra-site measurement correlation and variance heterogeneity, and which can handle the influence of environmental factors and all the reliable information that might influence coral growth. This method was applied to two databases of different solitary corals, Balanophyllia europaea and Leptopsammia pruvoti, collected at six sites under different environmental conditions, and it produced a decisive improvement in the results. Nevertheless, the theory of the energy balance in growth establishes the linear correlation of the two parameters and the independence of the ultimate length L∞ from environmental covariates, so a further aim of the thesis was to propose a new parameterization based on the ultimate length and a parameter c, which explicitly describes the part of growth ascribable to site-specific conditions such as environmental factors. We explored the possibility of estimating the parameters of this new VBGF parameterization via the nonlinear hierarchical model. Again there was a general improvement with respect to traditional methods. The results of the two parameterizations were similar, although a very slight improvement was observed with the new one. The new parameterization is, nevertheless, more suitable from a theoretical point of view when considering environmental covariates.
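
For orientation, the two-parameter VBGF mentioned in this abstract can be written as a short function. This sketch assumes the common form L(t) = L∞ (1 - exp(-k t)) with zero length at age zero. The site names and parameter values are purely hypothetical, and the thesis's alternative parameterization in terms of c is not reproduced because the abstract does not define it.

```python
import math

def vbgf_length(age, l_inf, k):
    """Von Bertalanffy length at a given age, assuming zero length at age 0."""
    return l_inf * (1.0 - math.exp(-k * age))

# Hypothetical site-specific growth rates sharing one ultimate length,
# mimicking the idea that sites differ in k but not in l_inf.
l_inf = 20.0
site_growth_rates = {"site_A": 0.08, "site_B": 0.12}

lengths_at_age_10 = {site: vbgf_length(10.0, l_inf, k)
                     for site, k in site_growth_rates.items()}
```

In a hierarchical nonlinear model the site-level rates would be treated as draws from a common distribution and estimated jointly with it, rather than fixed as they are here.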

Abstract:

Ordinal logistic regression models are used to analyze a dependent variable with multiple outcomes that can be ranked, but they have been underutilized. In this methodological study, we describe four logistic regression models for analyzing an ordinal response variable: (1) the multinomial logistic model, (2) the adjacent-category logistic model [9], (3) the continuation-ratio logistic model [10], and (4) the proportional odds logistic model [11]. We illustrate and compare the fit of these models using data from a survey designed by PCCaSO (Promoting Colon Cancer Screening in people 50 and Over), a research project of the University of Texas School of Public Health, to study patients' confidence in completing colorectal cancer screening (CRCS). The purpose of this study is twofold: first, to provide a synthesized review of models for analyzing data with an ordinal response, and second, to evaluate their usefulness in epidemiological research, with particular emphasis on model formulation, interpretation of model coefficients, and their implications. We recommend that the analyst perform (1) goodness-of-fit tests and (2) sensitivity analysis by fitting and comparing different models.
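
Of the four models listed, the proportional odds model is the easiest to sketch in a few lines. The example below assumes the common parameterization P(Y <= j) = expit(cutpoint_j - eta), with a single linear predictor eta shared across categories. The cutpoints and function name are hypothetical and unrelated to the PCCaSO data.

```python
import math

def expit(z):
    return 1.0 / (1.0 + math.exp(-z))

def proportional_odds_probs(eta, cutpoints):
    """Category probabilities when P(Y <= j) = expit(cutpoints[j] - eta).

    cutpoints must be strictly increasing; there is one more category
    than there are cutpoints.
    """
    cumulative = [expit(c - eta) for c in cutpoints] + [1.0]
    probs, previous = [], 0.0
    for c in cumulative:
        probs.append(c - previous)  # difference of adjacent cumulative probabilities
        previous = c
    return probs

baseline = proportional_odds_probs(0.0, [-1.0, 0.0, 1.0])
shifted = proportional_odds_probs(2.0, [-1.0, 0.0, 1.0])
```

Raising the linear predictor moves probability mass toward the higher categories while the cutpoints stay fixed, which is the sense in which one coefficient vector applies to every cumulative split.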

Abstract:

The history of the logistic function since its introduction in 1838 is reviewed, and the logistic model for a polychotomous response variable is presented with a discussion of the assumptions involved in its derivation and use. Following this, the maximum likelihood estimators for the model parameters are derived along with a Newton-Raphson iterative procedure for evaluation. A rigorous mathematical derivation of the limiting distribution of the maximum likelihood estimators is then presented using a characteristic function approach. An appendix with theorems on the asymptotic normality of sample sums when the observations are not identically distributed, with proofs, supports the presentation on asymptotic properties of the maximum likelihood estimators. Finally, two applications of the model are presented using data from the Hypertension Detection and Follow-up Program, a prospective, population-based, randomized trial of treatment for hypertension. The first application compares the risk of five-year mortality from cardiovascular causes with that from noncardiovascular causes; the second application compares risk factors for fatal or nonfatal coronary heart disease with those for fatal or nonfatal stroke.
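
The Newton-Raphson procedure mentioned above can be sketched for the simplest case. The abstract treats a polychotomous response; the example below is a deliberate simplification to a binary response with one covariate, and the function name and toy data are hypothetical.

```python
import math

def expit(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic_newton(x, y, iters=25):
    """Maximum likelihood for logit P(y = 1) = b0 + b1 * x by Newton-Raphson."""
    b0, b1 = 0.0, 0.0
    for _ in range(iters):
        p = [expit(b0 + b1 * xi) for xi in x]
        # Score vector (gradient of the log-likelihood).
        g0 = sum(yi - pi for yi, pi in zip(y, p))
        g1 = sum((yi - pi) * xi for yi, pi, xi in zip(y, p, x))
        # Observed information matrix (negative Hessian), a 2 x 2 matrix.
        w = [pi * (1.0 - pi) for pi in p]
        h00 = sum(w)
        h01 = sum(wi * xi for wi, xi in zip(w, x))
        h11 = sum(wi * xi * xi for wi, xi in zip(w, x))
        det = h00 * h11 - h01 * h01
        # Newton step: add (information matrix)^-1 times the score.
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1

# Hypothetical data that are not perfectly separated, so the MLE is finite.
x = [-2.0, -1.5, -1.0, -0.5, 0.5, 1.0, 1.5, 2.0]
y = [0, 0, 0, 1, 0, 1, 1, 1]
b0_hat, b1_hat = fit_logistic_newton(x, y)
```

At convergence the score is zero, and the inverse of the information matrix supplies the asymptotic covariance that underlies the limiting normal distribution of the estimators.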

Abstract:

Complex diseases such as cancer result from multiple genetic changes and environmental exposures. Owing to the rapid development of genotyping and sequencing technologies, we are now able to assess the causal effects of many genetic and environmental factors more accurately. Genome-wide association studies have localized many causal genetic variants predisposing to certain diseases. However, these studies explain only a small portion of the heritability of diseases. More advanced statistical models are urgently needed to identify and characterize additional genetic and environmental factors and their interactions, which will enable us to better understand the causes of complex diseases. In the past decade, thanks to increasing computational capabilities and novel statistical developments, Bayesian methods have been widely applied in genetics and genomics research and have demonstrated superiority over some standard approaches in certain research areas. Gene-environment and gene-gene interaction studies are among the areas where Bayesian methods can fully exert their advantages. This dissertation focuses on developing new Bayesian statistical methods for data analysis with complex gene-environment and gene-gene interactions, as well as extending some existing methods for gene-environment interactions to other related areas. It includes three sections: (1) deriving the Bayesian variable selection framework for hierarchical gene-environment and gene-gene interactions; (2) developing the Bayesian Natural and Orthogonal Interaction (NOIA) models for gene-environment interactions; and (3) extending the applications of two Bayesian statistical methods, which were developed for gene-environment interaction studies, to other related types of studies such as adaptive borrowing of historical data.
We propose a Bayesian hierarchical mixture model framework that allows us to investigate genetic and environmental effects, gene-by-gene interactions (epistasis) and gene-by-environment interactions in the same model. It is well known that, in many practical situations, there exists a natural hierarchical structure between the main effects and interactions in the linear model. Here we propose a model that incorporates this hierarchical structure into the Bayesian mixture model, such that irrelevant interaction effects can be removed more efficiently, resulting in more robust, parsimonious and powerful models. We evaluate both the 'strong hierarchical' and 'weak hierarchical' models, which specify, respectively, that both or one of the main effects of the interacting factors must be present for the interaction to be included in the model. Extensive simulation results show that the proposed strong and weak hierarchical mixture models control the proportion of false positive discoveries and yield a powerful approach to identifying the predisposing main effects and interactions in studies with complex gene-environment and gene-gene interactions. We also compare these two models with the 'independent' model, which does not impose this hierarchical constraint, and observe their superior performance in most of the situations considered. The proposed models are applied to real data on gene-environment interactions from lung cancer and cutaneous melanoma case-control studies. Bayesian statistical models have the advantage of allowing useful prior information to be incorporated in the modeling process. Moreover, the Bayesian mixture model outperforms the multivariate logistic model in terms of parameter estimation and variable selection in most cases.
Our proposed models impose the hierarchical constraints, which further improve the Bayesian mixture model by reducing the proportion of false positive findings among the identified interactions while successfully identifying the reported associations. This is practically appealing for studies investigating causal factors among a moderate number of candidate genetic and environmental factors along with a relatively large number of interactions. The natural and orthogonal interaction (NOIA) models of genetic effects were previously developed to provide an analysis framework in which the estimates of effects for a quantitative trait are statistically orthogonal regardless of whether Hardy-Weinberg Equilibrium (HWE) holds within loci. Ma et al. (2012) recently developed a NOIA model for gene-environment interaction studies and showed its advantages for detecting the true main effects and interactions, compared with the usual functional model. In this project, we propose a novel Bayesian statistical model that combines the Bayesian hierarchical mixture model with the NOIA statistical model and the usual functional model. The proposed Bayesian NOIA model demonstrates more power at detecting the non-null effects, with higher marginal posterior probabilities. We also review two Bayesian statistical models (the Bayesian empirical shrinkage-type estimator and Bayesian model averaging) that were developed for gene-environment interaction studies. Inspired by these Bayesian models, we develop two novel statistical methods that can handle related problems such as borrowing data from historical studies. The proposed methods are analogous to the methods for gene-environment interactions in that they balance statistical efficiency and bias in a unified model.
Through extensive simulation studies, we compare the operating characteristics of the proposed models with those of existing models, including the hierarchical meta-analysis model. The results show that the proposed approaches borrow the historical data adaptively, in a data-driven way. These novel models may have a broad range of statistical applications in both genetic/genomic and clinical studies.

Abstract:

The performance of the Hosmer-Lemeshow global goodness-of-fit statistic for logistic regression models was explored in a wide variety of conditions not previously fully investigated. Computer simulations, each consisting of 500 regression models, were run to assess the statistic in 23 different situations. The items which varied among the situations included the number of observations used in each regression, the number of covariates, the degree of dependence among the covariates, the combinations of continuous and discrete variables, and the generation of the values of the dependent variable for model fit or lack of fit. The study found that the $C_g^*$ statistic was adequate in tests of significance for most situations. However, when testing data which deviate from a logistic model, the statistic has low power to detect such deviation. Although grouping of the estimated probabilities into quantiles from 8 to 30 was studied, the deciles of risk approach was generally sufficient. Subdividing the estimated probabilities into more than 10 quantiles when there are many covariates in the model is not necessary, despite theoretical reasons which suggest otherwise. Because it does not follow a $\chi^2$ distribution, the statistic is not recommended for use in models containing only categorical variables with a limited number of covariate patterns. The statistic performed adequately when there were at least 10 observations per quantile. Large numbers of observations per quantile did not lead to incorrect conclusions that the model did not fit the data when it actually did. However, the statistic failed to detect lack of fit when it existed and should be supplemented with further tests for the influence of individual observations.
Careful examination of the parameter estimates is also essential since the statistic did not perform as desired when there was moderate to severe collinearity among covariates. Two methods studied for handling tied values of the estimated probabilities made only a slight difference in conclusions about model fit. Neither method split observations with identical probabilities into different quantiles. Approaches which create equal size groups by separating ties should be avoided.
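
A minimal sketch of the statistic under study may help fix ideas. It assumes the usual form, the sum over groups of (O - E)^2 / (E (1 - E / n)), with groups formed by sorting on the fitted probability. The function name and toy data are hypothetical. The sketch also assumes distinct fitted probabilities: because it cuts groups by position, it would separate tied values, which the abstract advises against.

```python
def hosmer_lemeshow(probs, outcomes, groups=10):
    """Hosmer-Lemeshow statistic: sum over groups of (O - E)^2 / (E * (1 - E / n)).

    Assumes every fitted probability lies strictly between 0 and 1.
    """
    pairs = sorted(zip(probs, outcomes))  # order observations by fitted probability
    n = len(pairs)
    stat = 0.0
    for g in range(groups):
        chunk = pairs[g * n // groups:(g + 1) * n // groups]
        if not chunk:
            continue
        expected = sum(p for p, _ in chunk)  # expected number of events in the group
        observed = sum(y for _, y in chunk)  # observed number of events in the group
        stat += (observed - expected) ** 2 / (expected * (1.0 - expected / len(chunk)))
    return stat

# Hypothetical fitted probabilities with a well-calibrated and a badly calibrated outcome vector.
probs = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9]
well_calibrated = hosmer_lemeshow(probs, [0, 0, 0, 1, 0, 1, 1, 1], groups=2)
badly_calibrated = hosmer_lemeshow(probs, [1, 1, 1, 1, 0, 0, 0, 0], groups=2)
```

The statistic is close to zero when observed and expected counts agree within every group and grows as they diverge. It is conventionally referred to a chi-squared distribution with groups minus 2 degrees of freedom.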

Abstract:

Forest fires are the main cause of tree mortality in Mediterranean Europe and the most serious threat to Spanish forest ecosystems. In the Spanish autonomous region of Valencia, the forest administration deploys a fleet of about 100 surveillance vehicles every day, and their allocation is based mainly on a fire risk index calculated from meteorological conditions. This thesis focuses on the design and validation of a new Integrated Wildland Fire Risk Index, specially adapted to the Mediterranean region, that supports decision-making in the daily allocation of forest fire surveillance resources. Following the integrated risk approach introduced in the last decade, the index includes two risk components: Wildland Fire Danger and Vulnerability. The former represents the probability that a fire ignites and the potential danger of its spread, while Vulnerability accounts for the characteristics of the land and the potential effects of fire on it. To calculate Wildland Fire Danger, indicators were identified for the natural and human agents that cause fires, historical occurrence, spread rate, and fuel conditions, the last of which are closely related to meteorology and species. For Vulnerability, indicators were used that represent both the potential effects of fire (fire behaviour, defence infrastructure) and the characteristics of the land (value, regeneration capacity…). These indicators make up a hierarchical structure in which, following the recommendations of the European Commission for fire risk indices, both short-term and long-term risk indicators are included. The final value of the index is obtained by progressively aggregating the components at each level of the hierarchical structure and then integrating the levels into a single value. Because multicriteria decision methods are oriented to hierarchically structured problems and to situations in which conflicting goals prevail, the TOPSIS method was used for the final integration. Expert opinion was incorporated by weighting each component of the index: the Analytic Hierarchy Process (AHP) was used to obtain each expert's weights and to aggregate them into a single weight per indicator. Generalized Estimating Equations, which account for possibly correlated responses, were used to validate the index. Official records of daily fire occurrence from 1994 to 2003, referenced to a 10x10 km grid, were used, with fire occurrence and burned area as the dependent variables. The validation showed good performance of the Wildland Fire Danger component, with a high degree of correlation between Danger and occurrence, a good fit of the logistic model used, and good discriminating power. The Vulnerability component showed no significant correlation between its values and the burned area, which does not rule out its validity, because some of its components are subjective and independent of the burned area. Overall, the index performs well for allocating surveillance resources according to ignition danger. Nevertheless, new lines of research that could improve the overall fit of the index are identified and discussed. More specifically, there is a need to study more closely the apparent relationship, observed in the province of Valencia, between the forested area of each 10 km cell and its fire risk, in which a smaller forested area appears to correspond to a higher fire risk. Other points to be researched are the sensitivity of the weights of the index components and the possibility of including indicators related to firefighting resources in the Vulnerability component.
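
The TOPSIS step referred to in this abstract can be outlined in a few lines. The sketch below is a generic textbook version, not the thesis's implementation. It assumes that every criterion is of the "larger means more risk" type, that the weights come from some prior step such as AHP, and that the grid-cell scores are hypothetical.

```python
import math

def topsis(matrix, weights):
    """Relative closeness of each alternative (row) to the ideal solution.

    Every criterion (column) is treated as 'larger is better'; weights
    are assumed to be given, for example from an AHP weighting step.
    """
    n_crit = len(weights)
    # Vector-normalise each column, then apply the criterion weights.
    norms = [math.sqrt(sum(row[j] ** 2 for row in matrix)) for j in range(n_crit)]
    scaled = [[weights[j] * row[j] / norms[j] for j in range(n_crit)] for row in matrix]
    best = [max(row[j] for row in scaled) for j in range(n_crit)]
    worst = [min(row[j] for row in scaled) for j in range(n_crit)]
    scores = []
    for row in scaled:
        d_best = math.sqrt(sum((v - b) ** 2 for v, b in zip(row, best)))
        d_worst = math.sqrt(sum((v - w) ** 2 for v, w in zip(row, worst)))
        scores.append(d_worst / (d_best + d_worst))
    return scores

# Hypothetical grid cells scored on two components (danger, vulnerability).
cells = [[0.9, 0.8], [0.5, 0.4], [0.1, 0.2]]
risk_scores = topsis(cells, weights=[0.6, 0.4])
```

A cell that is highest on every component receives a closeness of 1 and a cell that is lowest on every component receives 0, so the scores can be read as a single ranked risk value per cell.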

Abstract:

Introduction: In 2008, a low level of physical activity (< 30 min of moderate/vigorous activity per day) was responsible for 9% of deaths worldwide. It is also associated with mobility impairment in older adults aged 80 and over. However, owing to methodological difficulties, few population-based studies have examined the association of a low level of physical activity with mobility impairment and risk of death using an objective method to assess physical activity, and no studies are known to have examined this association in Latin America. Objective: To identify the prevalence of a low level of physical activity and its association with mobility impairment and risk of death in older adults aged 65 and over living in the municipality of São Paulo in 2010. Methods: Exploratory, quantitative, population-based study that used the 2010 database of the SABE Study and the occurrence of death in 2014. A total of 599 individuals were evaluated in 2010. The level of physical activity was analyzed in two ways: 1) low level (< 30 minutes of moderate and/or vigorous activity per day) versus high level (> 30 minutes of moderate and/or vigorous activity per day); and 2) the sample was divided into tertiles according to counts per minute and grouped into two groups, with older adults in the lowest tertile classified as having a low level of physical activity and those in the other two tertiles as having an intermediate/high level. Hierarchical logistic regression was used to: 1) identify the variables associated with a low level of physical activity; 2) analyze the association of a low level of physical activity with mobility impairment; and 3) estimate the risk of death in older adults with a low level of physical activity. The survival curve was analyzed with the Kaplan-Meier method using the log-rank test, and the proportional hazard was calculated with the Cox proportional hazards model. Results: The prevalence of a low level of physical activity among older adults was 85.4%, and the associated variables, after adjustment, were sex (female), age group (> 75 years), multimorbidity (> 2 chronic diseases), chronic pain (chronic pain in the last 3 months) and body mass index (higher mean value). A low level of physical activity remained significantly associated with mobility impairment (OR = 3.49; 95% CI = 2.00 to 6.13) and with the risk of death (RP = 2.79; 95% CI = 1.71 to 4.57), even after adjustment for sociodemographic and clinical variables. Conclusion: The prevalence of a low level of physical activity among older people living in the municipality of São Paulo is higher than that found in the Brazilian population, but it is close to that of other populations assessed with the same physical activity measurement method. A low level of physical activity (< 30 min of moderate/vigorous activity) was associated with sociodemographic variables (female sex and age group) and clinical variables (multimorbidity, chronic pain and body mass index). A low level of physical activity (lowest tertile of counts per minute) was associated with mobility impairment and with the risk of death within four years. A low level of physical activity can therefore be used as an adequate way to identify older adults who are more likely to present mobility impairment and an increased risk of death.

Abstract:

The cell concentration and size distribution of the microalgae Nannochloropsis gaditana were studied over the whole growth process. Various samples were taken during the light and dark periods to which the algae were exposed. The distributions obtained exhibited positive skew, and no change in the type of distribution was observed during the growth process. The size distribution shifted to smaller diameters in dark periods, while the opposite occurred in light periods. The overall trend during the growth process was a shift of the size distribution to larger cell diameters, with the differences between the initial and final distributions of individual cycles becoming smaller. A model based on the logistic model for cell concentration as a function of time in the dark period, which also takes into account cell respiration and growth processes during dark and light periods, respectively, was proposed and successfully applied. This model provides a picture that is closer to the real growth and evolution of cultures, and reveals a clear effect of light and dark periods on the different ways in which cell concentration and diameter evolve with time.
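
The logistic model on which the proposed growth model is based can be written down directly. The sketch below shows only the plain logistic curve, under the usual closed form. The respiration and light/dark terms of the authors' model are not given in the abstract and are not reproduced, and all parameter values are hypothetical.

```python
import math

def logistic_concentration(t, n0, r, k):
    """Logistic growth: starts at n0, grows at intrinsic rate r, saturates at carrying capacity k."""
    return k / (1.0 + (k - n0) / n0 * math.exp(-r * t))

# Hypothetical culture: initial concentration 1, carrying capacity 100, rate 0.5 per day.
trajectory = [logistic_concentration(day, 1.0, 0.5, 100.0) for day in range(0, 31, 5)]
```

The curve rises almost exponentially at first and then levels off near the carrying capacity. A light/dark extension would modulate this baseline within each cycle.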