920 results for predictive regression model


Relevance:

90.00%

Publisher:

Abstract:

It is well known that an identification problem exists in the analysis of age-period-cohort data because of the linear relationship among the three factors (date of birth + age at death = date of death). Numerous approaches to analyzing such data have been suggested, but none has been fully satisfactory. The purpose of this study is to provide another analytic method by extending Cox's life-table regression model with time-dependent covariates. The new approach has the following features: (1) It is based on the conditional maximum likelihood procedure using the proportional hazards function described by Cox (1972), treating the age factor as the underlying hazard in order to estimate the parameters for the cohort and period factors. (2) The model is flexible, so that both the cohort and period factors can be treated as dummy or continuous variables, and parameter estimates can be obtained for numerous combinations of variables, as in a regression analysis. (3) The model is applicable even when the time periods are unequally spaced. Two specific models are considered to illustrate the new approach and are applied to U.S. prostate cancer data. We find that there are significant differences between all cohorts and that there is a significant period effect for both whites and nonwhites. The underlying hazard increases exponentially with age, indicating that older people have a much higher risk than younger people. A log transformation of relative risk shows that prostate cancer risk declined in recent cohorts under both models. However, prostate cancer risk declined 5 cohorts (25 years) earlier for whites than for nonwhites under the period factor model (0 0 0 1 1 1 1). These latter results are similar to those of the previous study by Holford (1983). The new approach offers a general method for analyzing age-period-cohort data without imposing any arbitrary constraint on the model.
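
The identification problem described in this abstract can be illustrated with a small numerical sketch. It assumes a purely linear predictor in age, period and cohort, which is a simplification for illustration and not the Cox-type model of the study. The function names, slopes and data cells are hypothetical. Because cohort = period - age, shifting the three slopes by any constant delta leaves every fitted value unchanged, so the data alone cannot separate the three linear effects.

```python
def linear_predictor(age, period, a, p, c):
    """Linear age-period-cohort predictor; cohort is fixed by period - age."""
    cohort = period - age
    return a * age + p * period + c * cohort


def shifted(a, p, c, delta):
    """Alternative slopes that yield exactly the same predictor."""
    return a + delta, p - delta, c + delta


# Hypothetical (age at death, year of death) cells.
cells = [(50, 1950), (60, 1950), (60, 1970), (75, 1980)]
base = (0.08, 0.01, -0.02)          # made-up slopes for age, period, cohort
alt = shifted(*base, delta=0.5)     # a different, equally valid set of slopes

for age, period in cells:
    eta_base = linear_predictor(age, period, *base)
    eta_alt = linear_predictor(age, period, *alt)
    # Both slope sets give the same value up to floating-point rounding.
    print(age, period, round(eta_base, 6), round(eta_alt, 6))
```

Any method for age-period-cohort data therefore has to resolve this ambiguity in some way, which is the motivation the abstract gives for the proposed approach.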

Relevance:

90.00%

Publisher:

Abstract:

Developing sustainable agricultural systems is a challenge in the context of policies and incentives aimed at conserving natural resources, especially in dryland areas. This study examines demographic and production variables that influence the adoption of soil conservation technologies among 90 small farmers in the interior dryland of Central Chile, in the municipalities of Pencahue and Curepto. A Probit regression model was used, which relates technology adoption to the following variables: farmer's age, family size, farm area, and form of land tenure; the presence of forest plantations, greenhouses, compost piles, and large livestock on the farm; the farmer's marketing experience; and participation in training activities. The selected model has high predictive power, correctly classifying 92.2% of the observations. The econometric results show that participation in extension activities, farm area, the presence of forest plantations, and the use of compost piles have a positive and significant influence on the adoption of conservation technologies. The impact of training on the adoption of high-investment technologies is notable, as is the incorporation of low-investment conservation practices such as compost piles.

Relevance:

90.00%

Publisher:

Abstract:

Interannual environmental variability in Peru is dominated by the El Niño Southern Oscillation (ENSO). The most dramatic changes are associated with the warm El Niño (EN) phase (the opposite of the cold La Niña phase), which disrupts the normal coastal upwelling and affects the dynamics of many coastal marine and terrestrial resources. This study presents a trophic model for Sechura Bay, located at the northern extension of the Peruvian upwelling system, where ENSO-induced environmental variability is most extreme. Using an initial steady-state model for the year 1996, we explore the dynamics of the ecosystem through the year 2003 (including the strong EN of 1997/98 and the weaker EN of 2002/03). Based on support from the literature, we force the biomass of several non-trophically-mediated 'drivers' (e.g. Scallops, Benthic detritivores, Octopus, and Littoral fish) to observe whether the fit between historical changes and those simulated by the trophic model is improved. The results indicate that the Sechura Bay ecosystem is a relatively inefficient system from a community energetics point of view, likely due to the periodic perturbations of ENSO. A combination of high system productivity and low-trophic-level target species of invertebrates (i.e. scallops) and fish (i.e. anchoveta) results in high catches and an efficient fishery. The importance of environmental drivers is suggested, given the relatively small improvements in the fit of the simulation when trophic drivers are added to the dynamics of the remaining functional groups. An additional multivariate regression model is presented for the scallop Argopecten purpuratus, which demonstrates a significant correlation of both spawning stock size and riverine discharge-mediated mortality with catch levels. These results are discussed in the context of the appropriateness of trophodynamic modeling in relatively open systems, and of how management strategies may be focused given the highly environmentally influenced marine resources of the region.

Relevance:

90.00%

Publisher:

Abstract:

The changes in rainfall expected over the next decades may cause significant changes in the hydroperiod of temporary wetlands and, consequently, shifts in plant community distributions. Predicting plant community responses to changes in the hydroperiod is a key issue for the conservation and management of temporary wetlands. We present a predictive distribution model for Arthrocnemum macrostachyum communities in the Doñana wetland (Southern Spain). Logistic regression was used to fit the model, with the number of days of inundation and the mean water height as predictors. The internal validation of the model yielded good performance measures. The model was applied to a set of expected scenarios of hydroperiod change to anticipate the most likely shifts in the distribution of Arthrocnemum macrostachyum communities.

Relevance:

90.00%

Publisher:

Abstract:

A study has been carried out on two-lane highways in the Madrid Region to propose an alternative model for the speed-flow relationship using regular loop data. The model differs in shape and, in some cases, in slope from that of the Highway Capacity Manual (HCM). A model is proposed for a road in a mountainous area, a case for which the HCM does not explicitly provide a solution. The problem of a mountain road carrying high flows to access a popular recreational area is discussed, and some solutions are proposed. Up to 7 one-way sections of two-lane highways were selected, aiming to cover a significant number of different characteristics, in order to verify the proposed method across the different classes into which the Manual classifies highways. A large amount of data was used to formulate the model and to verify the basic variables of these types of roads. The counts were collected in the same way as the Madrid Region Highway Agency performs its counts. A total of 1,471 hours of data were collected, in 5-minute periods. The models were verified by means of specific statistical tests (R², Student's t, Durbin-Watson, ANOVA, etc.) and diagnostic checks of the model assumptions (normality, linearity, homoscedasticity and independence). The model proposed for this type of highway under base conditions can explain the different behaviors as traffic volumes increase, and follows an S-shaped polynomial multiple regression model of order 3. As secondary results of this research, the levels of service and the capacities of this road were measured with the 2000 HCM methodology, and the results are discussed. © 2011 Published by Elsevier Ltd.

Relevance:

90.00%

Publisher:

Abstract:

During the last 40 years, decision support tools for plant species selection in ecological restoration in Spain have been based on species distribution models (also called ecological niche models), which estimate the probability of occurrence of a species as a function of environmental predictors (e.g., climate, soil). This thesis proposes some methodological improvements intended to increase the predictive performance of such models, given the data currently available in Spain and focusing on the application of the models to species selection for ecological restoration. Fine-grained species distribution data are required to train models for use at the scale of ecological restoration projects, but such data are not always available. On the other hand, coarse-grained data are available for almost every plant species in Spain. A recalibration method is proposed that updates a coarse-grained logistic regression model using a new fine-grained sample. The method achieves acceptable predictive performance with a reasonably small updating sample (25 occurrences of the species), in contrast with the much larger samples (more than 100 occurrences) required by a conventional modeling approach that discards the coarse-grained data. The choice of statistical method may have a dramatic effect on model performance, and comparisons of methods have therefore received much interest in the last decade. Previous studies have shown poorer performance for logistic regression than for newer methods such as maximum entropy models. The results of this thesis show that the observed difference arises because maximum entropy models include regularization techniques, whereas the versions of logistic regression used in the comparisons do not. Once regularization is added to logistic regression through a penalization procedure, the differences in model performance disappear. Penalized logistic regression may therefore be considered one of the best-performing methods for modeling species distributions, on a par with maximum entropy models. Species distribution models usually do not consider soil-related predictors, because direct measurements of the chemical or physical properties of the soil are often lacking. The inclusion of coarse-grained soil data from national or continental soil maps could be a reasonable alternative. The results of this thesis suggest that the predictive performance of fine-grained models increases slightly, but statistically significantly, when soil predictors from coarse-grained soil maps are included. Model validation is a key stage in the development of empirical models such as species distribution models. The usual approach is to evaluate model performance for each species separately, i.e., by comparing the observed presence or absence of the species with the predicted probabilities across a set of sites. This kind of evaluation does not answer a common question in ecological restoration projects: which n species are the most suitable for the environment of the site to be restored? A method has been proposed to address this question by estimating the ability of a set of models to discriminate between the species present at and absent from an evaluation site. The method has been successfully applied to the validation of 188 distribution models of woody species used to support decisions on species selection for ecological restoration in Spain. The proposed methodological approaches improve the predictive performance of species distribution models applied to species selection in ecological restoration and increase the number of species for which a decision-supporting model can be fitted.
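
The effect of adding a penalty to logistic regression can be sketched on toy data. The abstract does not state which penalty was used, so the L2 (ridge) penalty below is an assumption, as are the function names, the learning rate and the six made-up observations. On perfectly separable data the unpenalized slope keeps growing as gradient descent continues, while the penalized slope settles at a finite value.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(xs, ys, l2=0.0, lr=0.5, steps=2000):
    """Gradient descent for a one-predictor logistic regression.

    l2 is the weight of an assumed ridge penalty on the slope;
    the intercept is left unpenalized.
    """
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        grad_w = sum((sigmoid(w * x + b) - y) * x for x, y in zip(xs, ys)) / n
        grad_b = sum(sigmoid(w * x + b) - y for x, y in zip(xs, ys)) / n
        w -= lr * (grad_w + l2 * w)   # penalty gradient shrinks the slope
        b -= lr * grad_b
    return w, b


# Hypothetical standardized predictor and presence (1) / absence (0) records.
xs = [-2.0, -1.0, -0.5, 0.5, 1.0, 2.0]
ys = [0, 0, 0, 1, 1, 1]

w_plain, b_plain = fit_logistic(xs, ys, l2=0.0)
w_ridge, b_ridge = fit_logistic(xs, ys, l2=0.1)
print("unpenalized slope:", w_plain)
print("penalized slope:  ", w_ridge)
```

In practice the penalty weight would be chosen by cross-validation rather than fixed in advance; the value 0.1 here is arbitrary.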

Relevance:

90.00%

Publisher:

Abstract:

This paper analyses the relationship between productive efficiency and online social networks (OSN) in Spanish telecommunications firms. Data envelopment analysis (DEA) is used, and several indicators of business "social media" activities are incorporated. Super-efficiency analysis and bootstrapping techniques are applied to increase the model's robustness and accuracy. A logistic regression model is then applied to characterise the factors and drivers of good performance in OSN. The results reveal that a company's ability to absorb and utilise OSN is a key factor in improving productive efficiency. The paper presents a model for assessing the strategic performance of presence and activity in OSN.

Relevance:

90.00%

Publisher:

Abstract:

Road accidents are a major concern in many countries, and macroeconomic models are frequently applied by academia and public administrations in efforts to reduce their frequency and consequences. For the selection of the set of explanatory variables and the response transformation parameter within a Bayesian framework, the TIM and 3IM (two-input and three-input model) procedures are proposed. The procedure also uses the DIC and pseudo-R² goodness-of-fit criteria. The methodology is applied to a dynamic regression model with a Box-Cox transformation (BCT) for the explanatory variables and an autoregressive (AR) structure for the response. An initial set of 22 explanatory variables is identified, and the effects of these factors on the frequency of fatal accidents in Spain during 2000-2012 are estimated. The dependent variable is constructed considering the stochastic trend component.
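
For readers unfamiliar with the Box-Cox transformation mentioned in this abstract, the following minimal sketch shows the standard one-parameter form. The function name and the example values are illustrative only; the abstract does not report the transformation parameters that were estimated.

```python
import math


def box_cox(x, lam):
    """Standard Box-Cox transform of a positive value x with parameter lam."""
    if x <= 0:
        raise ValueError("Box-Cox requires a positive input")
    if lam == 0:
        return math.log(x)            # limiting case as lam tends to 0
    return (x ** lam - 1.0) / lam


# Illustrative values only: lam = 1 is a shift, lam = 0.5 a square-root scale.
for lam in (1.0, 0.5, 0.0):
    print(lam, box_cox(4.0, lam))
```

Treating the parameter as unknown lets the data decide between, for example, a linear and a logarithmic scale for each variable.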

Relevance:

90.00%

Publisher:

Abstract:

Survival data analysis has traditionally been based on the Cox regression model (Cox, 1972). However, the proportional hazards assumption of this model may not hold in many practical situations. This restriction of the Cox model has generated interest in alternative approaches, among them dynamic models that allow covariate effects to vary over time. In this work, the main dynamic survival models with additive and multiplicative structure were reviewed in the nonparametric and semiparametric settings. Residual-based graphical methods were presented for assessing the goodness of fit of these models. A time-dependent version of the area under the ROC curve, denoted AUC(t), was proposed for assessing and comparing the predictive quality of survival models with additive and multiplicative structures. The performance of AUC(t) was evaluated through a simulation study. Data from three studies described in the literature were also analyzed to illustrate or complement the scenarios considered in the simulation study. Overall, the results indicated that the graphical methods presented for assessing model adequacy, together with AUC(t), constitute a useful set of statistical tools for evaluating dynamic survival models in the nonparametric and semiparametric settings. In addition, applying this set of tools to several data sets showed that, while dynamic models are attractive because they allow time-dependent covariates, they may not be appropriate for all data sets, since estimation may be subject to restrictions for some of them.

Relevance:

90.00%

Publisher:

Abstract:

wgttest performs a test proposed by DuMouchel and Duncan (1983) to evaluate whether the weighted and unweighted estimates of a regression model are significantly different.
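
One common formulation of the DuMouchel and Duncan idea is to augment the unweighted regression with the weight and its interactions with the predictors, and then test the added terms jointly with an F test. The sketch below follows that formulation for a single predictor. It is not the wgttest source code, and the function names and the data are hypothetical. It returns the F statistic and its degrees of freedom; a p-value would additionally require an F distribution function, which the Python standard library does not provide.

```python
def solve(a, b):
    """Solve a linear system by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def rss(rows, y):
    """Residual sum of squares of an OLS fit of y on the given design rows."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    beta = solve(xtx, xty)
    return sum((yi - sum(bj * rj for bj, rj in zip(beta, r))) ** 2
               for r, yi in zip(rows, y))


def weight_test(x, w, y):
    """F statistic for adding w and w*x to the unweighted regression of y on x."""
    restricted = [[1.0, xi] for xi in x]
    full = [[1.0, xi, wi, wi * xi] for xi, wi in zip(x, w)]
    rss_r, rss_f = rss(restricted, y), rss(full, y)
    q, df = 2, len(y) - 4             # added terms, residual degrees of freedom
    return ((rss_r - rss_f) / q) / (rss_f / df), (q, df)


# Hypothetical data in which the slope depends on the sampling weight.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
w = [1.0, 3.0, 1.0, 3.0, 2.0, 2.0, 1.0, 3.0]
noise = [0.1, -0.1, 0.05, -0.05, 0.1, -0.1, 0.05, -0.05]
y = [1.0 + 2.0 * xi + 0.5 * wi * xi + e for xi, wi, e in zip(x, w, noise)]

f_stat, dfs = weight_test(x, w, y)
print("F =", f_stat, "with degrees of freedom", dfs)
```

A large F statistic, as in this constructed example, suggests that weighted and unweighted estimates would differ and that the weights carry information the unweighted model misses.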

Relevance:

90.00%

Publisher:

Abstract:

The identification of biomarkers capable of providing a reliable molecular diagnostic test for prostate cancer (PCa) is highly desirable clinically. We describe here 4 biomarkers, UDP-N-Acetyl-alpha-D-galactosamine transferase (GalNAc-T3; not previously associated with PCa), PSMA, Hepsin and DD3/PCA3, which, in combination, distinguish prostate cancer from benign prostate hyperplasia (BPH). GalNAc-T3 was identified as overexpressed in PCa tissues by microarray analysis, confirmed by quantitative real-time PCR, and shown immunohistochemically to be localised to prostate epithelial cells, with higher expression in malignant cells. Real-time quantitative PCR analysis across 21 PCa and 34 BPH tissues showed 4.6-fold overexpression of GalNAc-T3 (p = 0.005). The noncoding mRNA (DD3/PCA3) was overexpressed 140-fold (p = 0.007) in the cancer samples compared to BPH tissues. Hepsin was overexpressed 21-fold (p = 0.049), whereas the overexpression for PSMA was 66-fold (p = 0.047). When the gene expression data for these 4 biomarkers were combined in a logistic regression model, a predictive index was obtained that distinguished 100% of the PCa samples from all of the BPH samples. Therefore, combining these genes in a real-time PCR assay represents a powerful new approach to diagnosing PCa by molecular profiling. (c) 2005 Wiley-Liss, Inc.

Relevance:

90.00%

Publisher:

Abstract:

Count data with excess zeros relative to a Poisson distribution are common in many biomedical applications. A popular approach to the analysis of such data is to use a zero-inflated Poisson (ZIP) regression model. Often, because of the hierarchical study design or the data collection procedure, zero-inflation and lack of independence may occur simultaneously, which renders the standard ZIP model inadequate. To account for the preponderance of zero counts and the inherent correlation of observations, a class of multi-level ZIP regression models with random effects is presented. Model fitting is facilitated using an expectation-maximization algorithm, whereas variance components are estimated via residual maximum likelihood estimating equations. A score test for zero-inflation is also presented. The multi-level ZIP model is then generalized to cope with a more complex correlation structure. Application to the analysis of correlated count data from a longitudinal infant feeding study illustrates the usefulness of the approach.
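
The zero-inflated Poisson distribution underlying this abstract mixes a point mass at zero with an ordinary Poisson count. The sketch below shows only that basic probability function; the random effects and the multi-level structure of the proposed model are not represented. The function name and parameter values are illustrative.

```python
import math


def zip_pmf(k, lam, pi):
    """P(Y = k) for a zero-inflated Poisson with rate lam and zero-inflation pi."""
    poisson = math.exp(-lam) * lam ** k / math.factorial(k)
    if k == 0:
        # Zeros arise either from the inflation component or from the Poisson.
        return pi + (1.0 - pi) * poisson
    return (1.0 - pi) * poisson


# Illustrative parameters: 30% structural zeros, Poisson mean 2.
lam, pi = 2.0, 0.3
print("P(Y = 0) under ZIP:    ", zip_pmf(0, lam, pi))
print("P(Y = 0) under Poisson:", math.exp(-lam))
```

In a ZIP regression both lam and pi are typically linked to covariates, usually through a log link and a logit link respectively.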

Relevance:

90.00%

Publisher:

Abstract:

Pharmacodynamics (PD) is the study of the biochemical and physiological effects of drugs. This paper considers the construction of optimal designs for dose-ranging trials with multiple periods, where the outcome of the trial (the effect of the drug) is a binary response: the success or failure of a drug in bringing about a particular change in the subject after a given amount of time. The carryover effect of each dose from one period to the next is assumed to be proportional to the direct effect. It is shown for a logistic regression model that the efficiency of an optimal parallel (single-period) or crossover (two-period) design is substantially greater than that of a balanced design. The optimal designs are also shown to be robust to misspecification of the parameter values. Finally, the parallel and crossover designs are combined to provide the experimenter with greater flexibility.

Relevance:

90.00%

Publisher:

Abstract:

Solving many scientific problems requires effective regression and/or classification models for large high-dimensional datasets. Experts from these problem domains (e.g. biologists, chemists, financial analysts) have insights into the domain which can be helpful in developing powerful models, but they need a modelling framework that helps them to use these insights. Data visualisation is an effective technique for presenting data and obtaining feedback from the experts. A single global regression model can rarely capture the full behavioural variability of a huge multi-dimensional dataset. Instead, local regression models, each focused on a separate area of input space, often work better since the behaviour of different areas may vary. Classical local models such as Mixture of Experts segment the input space automatically, which is not always effective and does not involve the domain experts in guiding a meaningful segmentation of the input space. In this paper we address this issue by allowing domain experts to interactively segment the input space using data visualisation. The resulting segmentation is then used to develop effective local regression models.