923 results for model selection in binary regression
Abstract:
Background. The mTOR pathway is commonly altered in human tumors and promotes cell survival and proliferation. Preliminary evidence suggests this pathway's involvement in chemoresistance to platinum and taxanes, the first-line therapy for epithelial ovarian cancer. A pathway-based approach was used to identify individual germline single nucleotide polymorphisms (SNPs) and cumulative effects of multiple genetic variants in mTOR pathway genes and their association with clinical outcome in women with ovarian cancer. Methods. The case series was restricted to 319 non-Hispanic white women with high-grade ovarian cancer treated with surgery and platinum-based chemotherapy. 135 SNPs in 20 representative genes in the mTOR pathway were genotyped. Hazard ratios (HRs) for death and odds ratios (ORs) for failure to respond to primary therapy were estimated for each SNP using the multivariate Cox proportional hazards model and multivariate logistic regression model, respectively, adjusting for age, stage, histology, and treatment sequence. A survival tree analysis of SNPs with a statistically significant association (p < 0.05) was performed to identify higher-order gene-gene interactions and their association with overall survival. Results. There was no statistically significant difference in survival by tumor histology or treatment regimen. The median survival for the cohort was 48.3 months. Seven SNPs were significantly associated with decreased survival. Compared to those with no unfavorable genotypes, the HR for death increased significantly with an increasing number of unfavorable genotypes, and women in the highest risk category had an HR of 4.06 (95% CI 2.29–7.21). The survival tree analysis also identified patients with different survival patterns based on their genetic profiles. 13 SNPs in five different genes were significantly associated with treatment response, defined as no evidence of disease after completion of primary therapy. The rare homozygous genotype of SNP rs6973428 showed a 5.5-fold increased risk compared with genotypes carrying the wild-type allele. In the cumulative effect analysis, the highest risk group (individuals with ≥8 unfavorable genotypes) was significantly less likely to respond to chemotherapy (OR = 8.40, 95% CI 3.10–22.75) compared to the low risk group (≤4 unfavorable genotypes). Conclusions. A pathway-based approach can demonstrate cumulative effects of multiple genetic variants on clinical response to chemotherapy and survival. Therapy targeting the mTOR pathway may modify outcome in select patients.
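As a rough illustration of the per-SNP scan described above, the sketch below fits one adjusted Cox proportional hazards model with the lifelines library on simulated data; the column names, covariate set, and values are hypothetical stand-ins, not the study's data.

```python
# A minimal sketch of one adjusted per-SNP Cox model, assuming the lifelines
# library; all columns and values are simulated, not the study's data.
import numpy as np
import pandas as pd
from lifelines import CoxPHFitter

rng = np.random.default_rng(0)
n = 319
df = pd.DataFrame({
    "months": rng.exponential(48.0, n),   # follow-up time (median ~48 months)
    "death": rng.integers(0, 2, n),       # 1 = death observed, 0 = censored
    "age": rng.normal(60, 10, n),
    "stage": rng.integers(3, 5, n),       # tumor stage (adjustment covariate)
    "snp": rng.integers(0, 3, n),         # genotype coded as 0/1/2 risk alleles
})

# Cox proportional hazards model for one SNP, adjusted for age and stage
# (histology and treatment sequence omitted from this toy example).
cph = CoxPHFitter()
cph.fit(df, duration_col="months", event_col="death")
print(cph.summary[["exp(coef)", "p"]])    # hazard ratios and p-values
```

In the study's design this fit would be repeated for each of the 135 SNPs, with the significant ones feeding the cumulative-genotype and survival tree analyses.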
Abstract:
Interruption is a known human factor that contributes to errors and catastrophic events in healthcare as well as other high-risk industries. The landmark Institute of Medicine (IOM) report, To Err is Human, brought attention to the significance of preventable errors in medicine and suggested that interruptions could be a contributing factor. Previous studies of interruptions in healthcare did not offer a conceptual model by which to study interruptions. Given the serious consequences of interruptions documented in other high-risk industries, there is a need to develop a model to describe, understand, explain, and predict interruptions and their consequences in healthcare. The purpose of this study was therefore to develop a model grounded in the literature and to use the model to describe and explain interruptions in healthcare, specifically those occurring in a Level One Trauma Center. A trauma center was chosen because this environment is characterized as intense, unpredictable, and interrupt-driven. The first step in developing the model was a review of the literature, which revealed that the concept of interruption did not have a consistent definition in either the healthcare or non-healthcare literature. Walker and Avant's method of concept analysis was used to clarify and define the concept. The analysis led to the identification of five defining attributes: (1) a human experience, (2) an intrusion of a secondary, unplanned, and unexpected task, (3) discontinuity, (4) externally or internally initiated, and (5) situated within a context. However, before an interruption can commence, five conditions known as antecedents must occur: (1) an intent to interrupt is formed by the initiator, (2) a physical signal passes a threshold test of detection by the recipient, (3) the sensory system of the recipient is stimulated to respond to the initiator, (4) an interruption task is presented to the recipient, and (5) the interruption task is either accepted or rejected by the recipient. An interruption was determined to be quantifiable by (1) the frequency of occurrence of an interruption, (2) the number of times the primary task has been suspended to perform an interrupting task, (3) the length of time the primary task has been suspended, and (4) the frequency of returning or not returning to the primary task. As a result of the concept analysis, a definition of an interruption was derived from the literature. An interruption is defined as a break in the performance of a human activity, initiated internal or external to the recipient and occurring within the context of a setting or location; this break results in the suspension of the initial task by initiating the performance of an unplanned task, with the assumption that the initial task will be resumed. The definition is inclusive of all the defining attributes of an interruption and is a standard definition that can be used by the healthcare industry. From the definition, a visual model of an interruption was developed. The model was used to describe and explain the interruptions recorded in an instrumental case study of physicians and registered nurses (RNs) working in a Level One Trauma Center. Five physicians were observed for a total of 29 hours, 31 minutes. Eight registered nurses were observed for a total of 40 hours, 9 minutes.
Observations were made on either the 0700–1500 or the 1500–2300 shift using the shadowing technique and recorded as field notes. The field notes were analyzed by a hybrid method of categorizing activities and interruptions, developed by combining a deductive a priori classification framework with the inductive process of line-by-line coding and constant comparison as described in Grounded Theory. The following categories were identified as relevant to this study:
Intended Recipient – the person to be interrupted
Unintended Recipient – not the intended recipient of an interruption (e.g., receiving a phone call that was incorrectly dialed)
Indirect Recipient – the incidental recipient of an interruption (e.g., talking with another person, thereby suspending the original activity)
Recipient Blocked – the intended recipient does not accept the interruption
Recipient Delayed – the intended recipient postpones an interruption
Self-interruption – a person, independent of another person, suspends one activity to perform another (e.g., while walking, stops abruptly and talks to another person)
Distraction – briefly disengaging from a task
Organizational Design – the physical layout of the workspace that causes a disruption in workflow
Artifacts Not Available – supplies and equipment that are not available in the workspace, causing a disruption in workflow
Initiator – a person who initiates an interruption
Interruption by Organizational Design and Artifacts Not Available were identified as two new categories of interruption that had not previously been cited in the literature. Analysis of the observations indicated that physicians performed slightly fewer activities per hour than RNs; this variance may be attributed to differing roles and responsibilities. Physicians had more activities interrupted than RNs, but RNs experienced more interruptions per hour. Other people were the most common medium through which an interruption was delivered; additional mediums included the telephone, pager, and one's self. Both physicians and RNs were observed to resume an original interrupted activity more often than not, and in most cases performed only one or two interrupting activities before returning to the original interrupted activity. In conclusion, the model was found to explain all interruptions observed during the study. However, the model will require a more comprehensive study to establish its predictive value.
Abstract:
Complex diseases such as cancer result from multiple genetic changes and environmental exposures. Due to the rapid development of genotyping and sequencing technologies, we are now able to more accurately assess the causal effects of many genetic and environmental factors. Genome-wide association studies have been able to localize many causal genetic variants predisposing to certain diseases. However, these studies explain only a small portion of the variation in the heritability of diseases. More advanced statistical models are urgently needed to identify and characterize additional genetic and environmental factors and their interactions, which will enable us to better understand the causes of complex diseases. In the past decade, thanks to increasing computational capabilities and novel statistical developments, Bayesian methods have been widely applied in genetics/genomics research, demonstrating superiority over some standard approaches in certain research areas. Gene-environment and gene-gene interaction studies are among the areas where Bayesian methods can fully exploit their advantages. This dissertation focuses on developing new Bayesian statistical methods for data analysis with complex gene-environment and gene-gene interactions, as well as extending some existing methods for gene-environment interactions to other related areas. It includes three sections: (1) deriving a Bayesian variable selection framework for hierarchical gene-environment and gene-gene interactions; (2) developing Bayesian Natural and Orthogonal Interaction (NOIA) models for gene-environment interactions; and (3) extending the applications of two Bayesian statistical methods developed for gene-environment interaction studies to related problems such as adaptively borrowing historical data. We propose a Bayesian hierarchical mixture model framework that allows us to investigate genetic and environmental effects, gene-gene interactions (epistasis), and gene-environment interactions in the same model. It is well known that, in many practical situations, there exists a natural hierarchical structure between the main effects and interactions in a linear model. Here we propose a model that incorporates this hierarchical structure into the Bayesian mixture model, such that irrelevant interaction effects can be removed more efficiently, resulting in more robust, parsimonious, and powerful models. We evaluate both the 'strong hierarchical' and 'weak hierarchical' models, which specify that both, or at least one, of the main effects of interacting factors must be present for the interactions to be included in the model. Extensive simulation results show that the proposed strong and weak hierarchical mixture models control the proportion of false positive discoveries and yield a powerful approach to identifying the predisposing main effects and interactions in studies with complex gene-environment and gene-gene interactions. We also compare these two models with an 'independent' model that does not impose this hierarchical constraint and observe their superior performance in most of the situations considered. The proposed models are applied to real data analyses of gene-environment interactions in lung cancer and cutaneous melanoma case-control studies. Bayesian statistical models also have the advantage of allowing useful prior information to be incorporated into the modeling process.
Moreover, the Bayesian mixture model outperforms the multivariate logistic model in terms of parameter estimation and variable selection in most cases. Our proposed models impose hierarchical constraints, which further improve the Bayesian mixture model by reducing the proportion of false positive findings among the identified interactions and successfully identifying the reported associations. This is practically appealing for studies investigating causal factors among a moderate number of candidate genetic and environmental factors together with a relatively large number of interactions. The natural and orthogonal interaction (NOIA) models of genetic effects were previously developed to provide an analysis framework in which the estimates of effects for a quantitative trait are statistically orthogonal regardless of whether Hardy-Weinberg Equilibrium (HWE) holds within loci. Ma et al. (2012) recently developed a NOIA model for gene-environment interaction studies and showed the advantages of using it for detecting true main effects and interactions, compared with the usual functional model. In this project, we propose a novel Bayesian statistical model that combines the Bayesian hierarchical mixture model with the NOIA statistical model and the usual functional model. The proposed Bayesian NOIA model demonstrates more power at detecting the non-null effects with higher marginal posterior probabilities. We also review two Bayesian statistical models (a Bayesian empirical shrinkage-type estimator and Bayesian model averaging) that were developed for gene-environment interaction studies. Inspired by these Bayesian models, we develop two novel statistical methods to handle related problems such as borrowing data from historical studies. The proposed methods are analogous to the gene-environment interaction methods in that they balance statistical efficiency and bias within a unified model. Through extensive simulation studies, we compare the operating characteristics of the proposed models with existing models, including the hierarchical meta-analysis model. The results show that the proposed approaches adaptively borrow the historical data in a data-driven way. These novel models may have a broad range of statistical applications in both genetic/genomic and clinical studies.
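The hierarchical constraint itself is easy to illustrate outside the Bayesian mixture machinery. The sketch below enumerates candidate logistic models under strong- and weak-heredity rules and scores them by BIC with statsmodels; this is a simplified frequentist analogue of the dissertation's Bayesian variable selection, on simulated data with hypothetical names.

```python
# Simplified frequentist analogue of the strong/weak heredity constraints:
# enumerate logistic models, keep only those whose interaction terms have the
# required parent main effects, and score by BIC. Names are hypothetical.
import itertools
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
n = 500
G = rng.integers(0, 3, n).astype(float)   # genotype (0/1/2)
E = rng.normal(size=n)                    # environmental exposure
y = rng.binomial(1, 1 / (1 + np.exp(-(0.5 * G + 0.8 * G * E))))

terms = {"G": G, "E": E, "GxE": G * E}

def allowed(model, mode):
    # Strong heredity: an interaction requires both parent main effects;
    # weak heredity: at least one parent main effect suffices.
    if "GxE" not in model:
        return True
    parents = {"G", "E"} & set(model)
    return parents == {"G", "E"} if mode == "strong" else bool(parents)

for mode in ("strong", "weak"):
    best = None
    for k in range(1, len(terms) + 1):
        for model in itertools.combinations(terms, k):
            if not allowed(model, mode):
                continue
            X = sm.add_constant(np.column_stack([terms[t] for t in model]))
            bic = sm.Logit(y, X).fit(disp=0).bic
            if best is None or bic < best[0]:
                best = (bic, model)
    print(mode, "->", best)
```

The constraint shrinks the search space and, as in the dissertation's simulations, keeps interactions from entering the model without their parent main effects.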
Abstract:
It is well known that an identification problem exists in the analysis of age-period-cohort data because of the relationship among the three factors (date of birth + age at death = date of death). There are numerous suggestions about how to analyze such data, but no single solution has been satisfactory. The purpose of this study is to provide another analytic method by extending Cox's life-table regression model with time-dependent covariates. The new approach has the following features: (1) it is based on the conditional maximum likelihood procedure using the proportional hazards function described by Cox (1972), treating the age factor as the underlying hazard in order to estimate the parameters for the cohort and period factors; (2) the model is flexible, so that both the cohort and period factors can be treated as dummy or continuous variables, and parameter estimates can be obtained for numerous combinations of variables as in a regression analysis; and (3) the model is applicable even when the time periods are unequally spaced. Two specific models are considered to illustrate the new approach and applied to U.S. prostate cancer data. We find that there are significant differences between all cohorts and that there is a significant period effect for both whites and nonwhites. The underlying hazard increases exponentially with age, indicating that old people have a much higher risk than young people. A log transformation of relative risk shows that prostate cancer risk declined in recent cohorts under both models. However, prostate cancer risk declined 5 cohorts (25 years) earlier for whites than for nonwhites under the period factor model (0 0 0 1 1 1 1). These latter results are similar to the previous study by Holford (1983). The new approach offers a general method to analyze age-period-cohort data without imposing any arbitrary constraint in the model.
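A minimal sketch of this idea, assuming the lifelines library and simulated data: age serves as the survival time scale, so the age effect is absorbed into the baseline hazard, and only cohort and period coefficients are estimated. This illustrates the modeling setup, not the original conditional-likelihood code.

```python
# Sketch of the age-period-cohort Cox approach: age is the time scale
# (underlying hazard); cohort and period enter as covariates. Simulated data.
import numpy as np
import pandas as pd
from lifelines import CoxPHFitter

rng = np.random.default_rng(2)
n = 2000
birth_year = rng.integers(1900, 1941, n)       # birth cohort
age_at_death = 50 + rng.exponential(15, n)     # age as the underlying time scale
death_year = birth_year + age_at_death         # date of birth + age = date of death

df = pd.DataFrame({
    "age": age_at_death,
    "event": 1,                                # all deaths observed in this toy set
    "cohort": (birth_year - 1900) // 5,        # cohort as a grouped covariate
    "period": (death_year >= 1970).astype(int) # period dummy, e.g. (0 0 0 1 1 1 1)
})

# Age plays the role of survival time, so its effect is absorbed into the
# baseline hazard; only cohort and period parameters are estimated.
cph = CoxPHFitter().fit(df, duration_col="age", event_col="event")
print(cph.summary[["exp(coef)", "p"]])
```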
Abstract:
The performance of the Hosmer-Lemeshow global goodness-of-fit statistic for logistic regression models was explored in a wide variety of conditions not previously fully investigated. Computer simulations, each consisting of 500 regression models, were run to assess the statistic in 23 different situations. The items which varied among the situations included the number of observations used in each regression, the number of covariates, the degree of dependence among the covariates, the combinations of continuous and discrete variables, and the generation of the values of the dependent variable for model fit or lack of fit. The study found that the Cg* statistic was adequate in tests of significance for most situations. However, when testing data which deviate from a logistic model, the statistic has low power to detect such deviation. Although grouping of the estimated probabilities into quantiles from 8 to 30 was studied, the deciles-of-risk approach was generally sufficient. Subdividing the estimated probabilities into more than 10 quantiles when there are many covariates in the model is not necessary, despite theoretical reasons which suggest otherwise. Because it does not follow a χ² distribution, the statistic is not recommended for use in models containing only categorical variables with a limited number of covariate patterns. The statistic performed adequately when there were at least 10 observations per quantile. Large numbers of observations per quantile did not lead to incorrect conclusions that the model did not fit the data when it actually did. However, the statistic failed to detect lack of fit when it existed and should be supplemented with further tests for the influence of individual observations. Careful examination of the parameter estimates is also essential, since the statistic did not perform as desired when there was moderate to severe collinearity among covariates. Two methods studied for handling tied values of the estimated probabilities made only a slight difference in conclusions about model fit. Neither method split observations with identical probabilities into different quantiles. Approaches which create equal-size groups by separating ties should be avoided.
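For reference, the Hosmer-Lemeshow statistic with deciles of risk can be computed in a few lines. The following is a minimal sketch, assuming predicted probabilities p and binary outcomes y, with the usual χ² reference on g − 2 degrees of freedom.

```python
# Minimal Hosmer-Lemeshow goodness-of-fit statistic with g risk groups.
import numpy as np
from scipy import stats

def hosmer_lemeshow(y, p, g=10):
    """Return the HL chi-square statistic and p-value for g quantile groups."""
    order = np.argsort(p)
    groups = np.array_split(order, g)        # near-equal-size groups sorted by risk
    chi2 = 0.0
    for idx in groups:
        obs = y[idx].sum()                   # observed events in the group
        exp = p[idx].sum()                   # expected events in the group
        n_k = len(idx)
        chi2 += (obs - exp) ** 2 / (exp * (1 - exp / n_k))
    return chi2, stats.chi2.sf(chi2, df=g - 2)

rng = np.random.default_rng(3)
p = rng.uniform(0.05, 0.95, 500)             # fitted probabilities
y = rng.binomial(1, p)                       # outcomes generated from the model
print(hosmer_lemeshow(y, p))                 # should not reject fit here
```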
Abstract:
The use of smokeless tobacco products is undergoing an alarming resurgence in the United States. Several national surveys have reported a higher prevalence of use among those employed in blue-collar occupations, and national objectives now target this group for health promotion programs that reduce the health risks associated with tobacco use. Drawn from a larger data set measuring health behaviors, this cross-sectional study tested the applicability of two related theories, the Theory of Reasoned Action (TRA) and the Theory of Planned Behavior (TPB), to smokeless tobacco (SLT) cessation in a blue-collar population of gas pipeline workers. In order to understand the determinants of SLT cessation, measures were obtained of demographic and normative characteristics of the population and of specific constructs: attitude toward the act of quitting (AACT) and subjective norm (SN) are constructs common to both models, perceived behavioral control (PBC) is unique to the TPB, and the number of past quit attempts is not contained in either model. In addition, a self-reported measure was taken of SLT use at two-month follow-up. The study population comprised all male SLT users who were field employees in a large gas pipeline company with gas compressor stations extending from Texas to the Canadian border. At baseline, 199 employees responded to the SLT portion of the survey, 118 completed some portion of the two-month follow-up, and 101 could be matched across time. As hypothesized, significant correlations were found between constructs antecedent to AACT and SN, although crossover effects occurred. Significant differences were found between SLT cessation intenders and non-intenders with regard to their personal and normative beliefs about quitting, as well as their outcome expectancies and motivation to comply with others' beliefs. These differences occurred in the expected direction, with the mean intender score consistently higher than that of the non-intender. Contrary to hypothesis, AACT predicted intention to quit but SN did not. However, confirming the TPB, PBC, operationalized as self-efficacy, independently contributed to the prediction of intention. Statistically significant relationships were not found between intention, perceived behavioral control, their interactive effects, and use behavior at two-month follow-up. The introduction of the number of quit attempts into the logistic regression model resulted in nonsignificant findings for independent and interactive effects. The findings from this study are discussed in relation to their implications for program development and practice, especially within the worksite. Recommendations for future research to confirm and extend the findings of this investigation are also discussed.
Abstract:
Secchi depth is a measure of water transparency. In the Baltic Sea region, Secchi depth maps are used to assess eutrophication and as input for habitat models. Due to their spatial and temporal coverage, satellite data would be the most suitable data source for such maps. But the Baltic Sea's optical properties are so different from the open ocean that globally calibrated standard models suffer from large errors. Regional predictive models that take the Baltic Sea's special optical properties into account are thus needed. This paper tests how accurately generalized linear models (GLMs) and generalized additive models (GAMs) with MODIS/Aqua and auxiliary data as inputs can predict Secchi depth at a regional scale. It uses cross-validation to test the prediction accuracy of hundreds of GAMs and GLMs with up to 5 input variables. A GAM with 3 input variables (chlorophyll a, remote sensing reflectance at 678 nm, and long-term mean salinity) made the most accurate predictions. Tested against field observations not used for model selection and calibration, the best model's mean absolute error (MAE) for daily predictions was 1.07 m (22%), more than 50% lower than for other publicly available Baltic Sea Secchi depth maps. The MAE for predicting monthly averages was 0.86 m (15%). Thus, the proposed model selection process was able to find a regional model with good prediction accuracy. It could be useful to find predictive models for environmental variables other than Secchi depth, using data from other satellite sensors, and for other regions where non-standard remote sensing models are needed for prediction and mapping. Annual and monthly mean Secchi depth maps for 2003-2012 come with this paper as Supplementary materials.
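The model selection process described above amounts to scoring every small predictor subset by cross-validated prediction error. The sketch below shows the idea with scikit-learn linear models standing in for the paper's GAM/GLM fits; the predictor names and simulated data are hypothetical.

```python
# Exhaustive subset search scored by cross-validated MAE, standing in for
# the paper's GAM/GLM selection; predictor names and data are hypothetical.
import itertools
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(4)
n = 400
X_all = {
    "chl_a": rng.lognormal(1, 0.5, n),       # chlorophyll a
    "rrs_678": rng.uniform(0.001, 0.01, n),  # reflectance at 678 nm
    "salinity": rng.uniform(3, 12, n),       # long-term mean salinity
    "sst": rng.normal(10, 4, n),             # an extra candidate predictor
}
secchi = (8 - 1.5 * np.log(X_all["chl_a"]) - 300 * X_all["rrs_678"]
          + 0.2 * X_all["salinity"] + rng.normal(0, 1, n))

results = []
for k in range(1, 5):                        # subsets of up to 4 predictors here
    for subset in itertools.combinations(X_all, k):
        X = np.column_stack([X_all[v] for v in subset])
        mae = -cross_val_score(LinearRegression(), X, secchi,
                               scoring="neg_mean_absolute_error", cv=5).mean()
        results.append((mae, subset))

print(sorted(results)[:3])                   # best subsets by cross-validated MAE
```

The paper's setup additionally holds out field observations never used in selection or calibration, which is what justifies reporting the 1.07 m MAE as a genuine out-of-sample error.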
Abstract:
In this paper, a new digital elevation model (DEM) is derived for the ice sheet in western Dronning Maud Land, Antarctica. It is based on differential interferometric synthetic aperture radar (SAR) from the European Remote Sensing 1/2 (ERS-1/2) satellites, in combination with ICESat's Geoscience Laser Altimeter System (GLAS). A DEM mosaic is compiled out of 116 scenes from the ERS-1 ice phase in 1994 and the ERS-1/2 tandem mission between 1996 and 1997, with the GLAS data acquired in 2003 serving as ground control. Using three different SAR processors, uncertainties in phase stability and baseline model, resulting in height errors of up to 20 m, are exemplified. Atmospheric influences of the same order of magnitude are demonstrated, and corresponding scenes are excluded. For validation of the DEM mosaic, covering an area of about 130,000 km² on a 50-m grid, independent ICESat heights (2004-2007), ground-based kinematic GPS (2005), and airborne laser scanner data (ALS, 2007) are used. Excluding small areas with low phase coherence, the DEM differs in mean and standard deviation by 0.5 ± 10.1, 1.1 ± 6.4, and 3.1 ± 4.0 m from ICESat, GPS, and ALS, respectively. The excluded data points may deviate by more than 50 m. In order to suppress the spatially variable noise below a 5-m threshold, 18% of the DEM area is selectively averaged to a final product at varying horizontal spatial resolution. Apart from mountainous areas, the new DEM outperforms other currently available DEMs and may serve as a benchmark for future elevation models such as from the TanDEM-X mission to spatially monitor ice sheet elevation.
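The selective averaging step can be illustrated with a toy computation: average DEM cells in progressively larger blocks until the estimated per-cell noise drops below the 5-m threshold. This is a simplified stand-in for the paper's actual procedure, on synthetic data.

```python
# Toy version of selective averaging: enlarge the averaging block until the
# per-cell noise estimate falls below a threshold. Not the paper's algorithm.
import numpy as np

def selective_average(dem, noise_sigma, threshold=5.0, max_block=8):
    """Block-average a DEM so the noise of each averaged cell is < threshold."""
    block = 1
    while noise_sigma / block > threshold and block < max_block:
        block *= 2                  # averaging b*b cells divides sigma by b
    h, w = dem.shape
    trimmed = dem[: h - h % block, : w - w % block]
    th, tw = trimmed.shape
    coarse = trimmed.reshape(th // block, block, tw // block, block).mean(axis=(1, 3))
    return coarse, block

rng = np.random.default_rng(5)
dem = 1000 + rng.normal(0, 12, (512, 512))   # 50-m grid with 12 m noise
coarse, block = selective_average(dem, noise_sigma=12.0)
print(block, coarse.shape)                    # block=4 here, i.e. 200-m cells
```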
Abstract:
During the last 40 years, decision support tools for plant species selection in ecological restoration in Spain have been based on species distribution models (also called ecological niche models), which estimate the probability of occurrence of a species as a function of environmental predictors (e.g., climate, soil). In this thesis, some methodological improvements are proposed to contribute to better predictive performance of such models, given the data currently available in Spain and focusing on the application of the models to the selection of species for ecological restoration. Fine-grained species distribution data are required to train models to be used at the scale of ecological restoration projects, but this kind of data is not always available for every species. On the other hand, coarse-grained data are available for almost every species in Spain. A recalibration method is proposed that updates a coarse-grained logistic regression model using a new fine-grained updating sample. The method allows acceptable predictive performance to be obtained with a reasonably small updating sample (25 occurrences of the species), in contrast with the much larger samples (more than 100 occurrences) required by a conventional modeling approach that discards the coarse-grained data. The choice of statistical method may have a dramatic effect on model performance, so comparisons of methods have received much interest in the last decade. Previous studies have shown a poorer performance of logistic regression compared to novel methods like maximum entropy models. The results of this thesis show that the observed difference is caused by the fact that maximum entropy models include regularization techniques while the versions of logistic regression compared did not. Once regularization has been added to logistic regression using a penalization procedure, the differences in model performance disappear. Therefore, penalized logistic regression may be considered one of the best-performing methods to model species distributions, on a par with modern methods such as maximum entropy. Usually, species distribution models do not consider soil-related predictors because direct measurements of chemical or physical soil properties are often lacking. The inclusion of coarse-grained soil data from national or continental soil maps could be a reasonable alternative. The results of this thesis suggest that the predictive performance of high-resolution species distribution models increases slightly, but statistically significantly, after including soil predictors from coarse-grained soil maps. Model validation is a key stage of the development of empirical models such as species distribution models. The usual way of validating is based on the evaluation of model performance for each species separately, i.e., comparing observed species presences or absences to predicted probabilities at a set of sites. This kind of evaluation does not answer a key question in ecological restoration projects: which n species are the most suitable for the environment of the site to be restored? A validation method is proposed to address this question, which estimates the ability of a set of models to discriminate between the species present at and absent from a particular site. The method has been successfully applied to the validation of 188 distribution models of woody species used to support decisions on species selection for ecological restoration in Spain. The proposed methodological approaches improve the predictive performance of species distribution models applied to species selection in ecological restoration and increase the number of species for which a decision-supporting model can be fitted.
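The recalibration idea lends itself to a compact illustration. The sketch below uses statsmodels to refit just an intercept and a slope on the coarse model's linear predictor from a small fine-grained sample (logistic recalibration); it is a simplified stand-in for the thesis' method, with simulated predictors.

```python
# Minimal sketch of logistic recalibration: the coarse model's linear
# predictor becomes a single covariate, so only two parameters are estimated
# from the small fine-grained sample. Simulated data, hypothetical predictors.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(6)

# Coarse-grained model fitted earlier on abundant low-resolution data.
X_coarse = rng.normal(size=(1000, 2))                 # e.g. climate predictors
y_coarse = rng.binomial(1, 1 / (1 + np.exp(-0.8 * X_coarse[:, 0])))
coarse = sm.Logit(y_coarse, sm.add_constant(X_coarse)).fit(disp=0)

# Small fine-grained updating sample (the thesis works with ~25 presences).
X_fine = rng.normal(size=(50, 2))
y_fine = rng.binomial(1, 1 / (1 + np.exp(-1.2 * X_fine[:, 0])))

# Recalibration: refit intercept and slope on the coarse linear predictor.
lp = sm.add_constant(X_fine) @ coarse.params
recal = sm.Logit(y_fine, sm.add_constant(lp)).fit(disp=0)
print(recal.params)   # slope near 1 means the coarse model transfers well
```

Because only two parameters are estimated, the small updating sample goes much further than it would in a full refit, which is the intuition behind the 25-versus-100 occurrence contrast above.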
Abstract:
Leaf nitrogen and leaf surface area influence the exchange of gases between terrestrial ecosystems and the atmosphere, and play a significant role in the global cycles of carbon, nitrogen and water. The purpose of this study is to use field-based and satellite remote-sensing-based methods to assess leaf nitrogen pools in five diverse European agricultural landscapes located in Denmark, Scotland (United Kingdom), Poland, the Netherlands and Italy. REGFLEC (REGularized canopy reFLECtance) is an advanced image-based inverse canopy radiative transfer modelling system which has shown proficiency for regional mapping of leaf area index (LAI) and leaf chlorophyll (CHLl) using remote sensing data. In this study, high spatial resolution (10–20 m) remote sensing images acquired from the multispectral sensors aboard the SPOT (Satellite For Observation of Earth) satellites were used to assess the capability of REGFLEC for mapping spatial variations in LAI and CHLl and their relation to leaf nitrogen (Nl) data in five diverse European agricultural landscapes. REGFLEC is based on physical laws and includes an automatic model parameterization scheme which makes the tool independent of field data for model calibration. In this study, REGFLEC performance was evaluated using LAI measurements and non-destructive measurements (using a SPAD meter) of leaf-scale CHLl and Nl concentrations in 93 fields representing crop- and grasslands of the five landscapes. Furthermore, empirical relationships between field measurements (LAI, CHLl and Nl) and five spectral vegetation indices (the Normalized Difference Vegetation Index, the Simple Ratio, the Enhanced Vegetation Index-2, the Green Normalized Difference Vegetation Index, and the green chlorophyll index) were used to assess field data coherence and to serve as a comparison basis for assessing REGFLEC model performance. The field measurements showed strong vertical CHLl gradient profiles in 26% of fields, which affected REGFLEC performance as well as the relationships between spectral vegetation indices (SVIs) and field measurements. When the range of surface types increased, the REGFLEC results were in better agreement with field data than the empirical SVI regression models. Selecting only homogeneous canopies with uniform CHLl distributions as reference data for evaluation, REGFLEC was able to explain 69% of LAI observations (rmse = 0.76), 46% of measured canopy chlorophyll contents (rmse = 719 mg m−2) and 51% of measured canopy nitrogen contents (rmse = 2.7 g m−2). Better results were obtained for individual landscapes, except for Italy, where REGFLEC performed poorly due to a lack of dense vegetation canopies at the time of satellite recording; presence of vegetation is needed to parameterize the REGFLEC model. Combining REGFLEC- and SVI-based model results to minimize errors for a "snap-shot" assessment of total leaf nitrogen pools in the five landscapes, results varied from 0.6 to 4.0 t km−2. Differences in leaf nitrogen pools between landscapes are attributed to seasonal variations, extents of agricultural area, species variations, and spatial variations in nutrient availability. In order to facilitate a substantial assessment of variations in Nl pools and their relation to landscape-based nitrogen and carbon cycling processes, time series of satellite data are needed.
The upcoming Sentinel-2 satellite mission will provide new multiple narrowband data opportunities at high spatio-temporal resolution which are expected to further improve remote sensing capabilities for mapping LAI, CHLl and Nl.
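The empirical SVI baseline mentioned above is straightforward: compute a vegetation index from band reflectances and regress the field variable on it. A minimal sketch with hypothetical reflectance data follows (the study used SPOT bands and five indices; only NDVI is shown here).

```python
# Minimal empirical SVI-regression baseline: NDVI from red/NIR reflectance,
# then a linear fit against a field variable. Hypothetical simulated data.
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(7)
red = rng.uniform(0.02, 0.15, 93)            # red reflectance, 93 fields
nir = rng.uniform(0.2, 0.5, 93)              # near-infrared reflectance
ndvi = (nir - red) / (nir + red)             # Normalized Difference Vegetation Index
lai = 6 * ndvi + rng.normal(0, 0.5, 93)      # simulated field-measured LAI

model = LinearRegression().fit(ndvi.reshape(-1, 1), lai)
print(model.score(ndvi.reshape(-1, 1), lai)) # R² of the empirical relation
```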
Abstract:
Objective. The main purpose of this research is the novel use of artificial metaplasticity on a multilayer perceptron (AMMLP) as a data mining tool for predicting the outcome of patients with acquired brain injury (ABI) after cognitive rehabilitation. The final goal is to increase knowledge in the field of rehabilitation theory based on cognitive affectation. Methods and materials. The data set used in this study contains records belonging to 123 ABI patients with moderate to severe cognitive affectation (according to the Glasgow Coma Scale) who underwent rehabilitation at Institut Guttmann Neurorehabilitation Hospital (IG) using the tele-rehabilitation platform PREVIRNEC©. The variables included in the analysis comprise the neuropsychological initial evaluation of the patient (cognitive affectation profile), the results of the rehabilitation tasks performed by the patient in PREVIRNEC© and the outcome of the patient after a 3–5 month treatment. To achieve the treatment outcome prediction, we apply and compare three different data mining techniques: the AMMLP model, a backpropagation neural network (BPNN) and a C4.5 decision tree. Results. The prediction performance of the models was measured by ten-fold cross-validation and several architectures were tested. The results obtained by the AMMLP model are clearly superior, with an average predictive performance of 91.56%; the BPNN and C4.5 models have average prediction accuracies of 80.18% and 89.91%, respectively. The best single AMMLP model provided a specificity of 92.38%, a sensitivity of 91.76% and a prediction accuracy of 92.07%. Conclusions. The prediction model proposed in this study makes it possible to increase knowledge about the factors contributing to an ABI patient's recovery and to estimate treatment efficacy in individual patients. The ability to predict treatment outcomes may provide new insights toward improving effectiveness and creating personalized therapeutic interventions based on clinical evidence.
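The comparison protocol is standard ten-fold cross-validation. The sketch below reproduces its shape with scikit-learn, where a plain MLPClassifier and DecisionTreeClassifier stand in for the AMMLP and C4.5 models (neither of which ships with scikit-learn); the data are simulated.

```python
# Ten-fold cross-validated model comparison, with standard scikit-learn
# models standing in for AMMLP and C4.5. Simulated data, n matched to study.
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.neural_network import MLPClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=123, n_features=20, random_state=0)

models = {
    "MLP (backprop)": MLPClassifier(hidden_layer_sizes=(10,), max_iter=2000,
                                    random_state=0),
    "Decision tree": DecisionTreeClassifier(random_state=0),
}
for name, model in models.items():
    acc = cross_val_score(model, X, y, cv=10).mean()
    print(f"{name}: {acc:.2%} mean accuracy over 10 folds")
```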
Abstract:
In this paper, multiple regression analysis is used to model the top of descent (TOD) location of user-preferred descent trajectories computed by the flight management system (FMS) on over 1000 commercial flights into Melbourne, Australia. In addition to recording TOD, the cruise altitude, final altitude, cruise Mach, descent speed, wind, and engine type were also identified for use as the independent variables in the regression analysis. Both first-order and second-order models are considered, where cross-validation, hypothesis testing, and additional analysis are used to compare models. This identifies the models that should give the smallest errors if used to predict TOD location for new data in the future. A model that is linear in TOD altitude, final altitude, descent speed, and wind gives an estimated standard deviation of 3.9 nmi for TOD location given the trajectory parameters, which means about 80% of predictions would have error less than 5 nmi in absolute value. This accuracy is better than demonstrated by other ground automation predictions using kinetic models. Furthermore, this approach would enable online learning of the model. Additional data or further knowledge of algorithms is necessary to conclude definitively that no second-order terms are appropriate. Possible applications of the linear model are described, including enabling arriving aircraft to fly optimized descents computed by the FMS even in congested airspace.
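A minimal version of the first-order model can be written down directly: regress TOD distance on the trajectory parameters and check the cross-validated residual spread. The sketch below uses simulated flights with hypothetical units and coefficients, not the study's data.

```python
# First-order linear model for TOD location, evaluated by cross-validated
# residuals. Simulated flights; coefficients and units are hypothetical.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_predict

rng = np.random.default_rng(8)
n = 1000
cruise_alt = rng.uniform(30000, 40000, n)       # ft
final_alt = rng.uniform(4000, 10000, n)         # ft
descent_speed = rng.uniform(260, 310, n)        # kt
wind = rng.normal(0, 30, n)                     # kt along-track

# TOD distance from the runway, with ~4 nmi unexplained spread.
tod = (0.003 * (cruise_alt - final_alt) + 0.1 * descent_speed
       + 0.15 * wind + rng.normal(0, 3.9, n))

X = np.column_stack([cruise_alt, final_alt, descent_speed, wind])
pred = cross_val_predict(LinearRegression(), X, tod, cv=10)
resid_sd = np.std(tod - pred)
within_5 = np.mean(np.abs(tod - pred) < 5)
print(f"residual sd = {resid_sd:.1f} nmi, within 5 nmi: {within_5:.0%}")
```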
Abstract:
Road accidents are a very relevant social phenomenon and one of the main causes of death in industrialized countries. Sophisticated econometric models are applied in academic work and by public administrations for a better understanding of this complex phenomenon. This thesis is devoted to the analysis of macroscopic models for road accidents in Spain. The objectives of the thesis may be divided into two blocks: a. To achieve a better understanding of the road accident phenomenon by means of the application and comparison of two of the most frequently used macro models: DRAG (demand for road use, accidents and their gravity) and UCM (unobserved components model), applied to van-involved accident data in Spain in the period 2000-2009. The analysis was carried out within the frequentist framework using the programs TRIO, SAS and TRAMO/SEATS. b. Model application and the selection of the most relevant input variables are active research topics, and this thesis develops and applies a methodology intended to improve, by theoretical and practical means, the understanding of model selection and comparison procedures for macroscopic models. Methodologies were developed both for model selection and for model comparison; the model selection methodology was applied to fatal accidents on the road network during the period 2000-2011, and the proposed methodology for comparing macroscopic models was applied to the frequency and severity of van-involved accidents in 2000-2009. These developments resulted in the following contributions: a. Insight into the models was gained through interpretation of the effects of the input variables on the response and through the prediction accuracy of both models; the behavior of van-involved road accidents was explained in this process. b1. Development of a procedure for selecting the input variables relevant to explaining the occurrence of road accidents. Following the results of a), the proposed methodology is based on DRAG-like models, whose parameters were estimated within the Bayesian framework and applied to fatal accident data in Spain for 2000-2011. This novel methodology was compared with dynamic regression (DR) models, the most common models for working with stochastic processes; the results are comparable, and the new proposal optimizes the model selection process at little computational cost. b2. A methodology was designed for theoretical comparison between the competing models through the joint application of Monte Carlo simulation, experimental design, and analysis of variance (ANOVA). The competing models have different structures, which affect the estimation of the effects of the explanatory variables. Building on b1), this study aims to determine how the stochastic trend component that a UCM models explicitly is captured by a DRAG model, which has no specific method for modeling this element. The results of this study are important for deciding whether the series needs to be differenced prior to modeling. b3. New algorithms were developed to carry out the methodological exercises, implemented in different programs such as R, WinBUGS, and MATLAB. The fulfillment of the thesis objectives through these developments is summarized in the following conclusions: 1. The road accident phenomenon has been analyzed by means of two macroscopic models. The estimated effects of the influential factors differ depending on the methodology applied, while prediction results are similar, with a slight superiority of the DRAG methodology. 2. The variable and model selection methodology provides practical results as far as the explanation of road accidents is concerned; prediction accuracy and interpretability have also been improved by means of this more efficient input variable and model selection procedure. 3. Insight has been gained into the relationship between the estimates of the effects of the two competing models, DRAG and UCM. A very relevant issue here is the interpretation of the trend by two different models, from which very useful information for analysts in the modeling field has been obtained. The results have provided a very satisfactory extension of knowledge about the modeling process and the understanding of both van-involved and total fatal accidents in Spain.
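Of the two model families, the UCM side has a direct open-source counterpart in statsmodels' structural time-series models. The sketch below fits a local linear (stochastic) trend with an explanatory variable to simulated monthly data; it illustrates the model class, not the thesis' DRAG/UCM implementations in TRIO or TRAMO/SEATS.

```python
# Unobserved components model (UCM): stochastic trend plus an explanatory
# variable, fitted with statsmodels on simulated monthly data.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(9)
n = 120                                          # ten years of monthly data
trend = np.cumsum(rng.normal(0, 0.05, n))        # stochastic (local level) trend
exposure = rng.normal(0, 1, n)                   # e.g. a traffic volume index
y = 5 + trend - 0.3 * exposure + rng.normal(0, 0.2, n)   # log accident rate

ucm = sm.tsa.UnobservedComponents(y, level="local linear trend",
                                  exog=exposure[:, None])
res = ucm.fit(disp=0)
print(res.params)   # variances of level/trend/irregular plus exposure effect
```

Because the UCM carries the trend as an explicit component, a series with a stochastic trend can be modeled directly; a DRAG-style regression would instead need the series differenced first, which is exactly the question the comparison study above addresses.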
Abstract:
Bayesian network classifiers are widely used in machine learning because they intuitively represent causal relations. Multi-label classification problems require each instance to be assigned a subset of a defined set of h labels. This problem is equivalent to finding a multi-valued decision function that predicts a vector of h binary classes. In this paper we obtain the decision boundaries of two widely used Bayesian network approaches for building multi-label classifiers: Multi-label Bayesian network classifiers built using the binary relevance method and Bayesian network chain classifiers. We extend our previous single-label results to multi-label chain classifiers, and we prove that, as expected, chain classifiers provide a more expressive model than the binary relevance method.
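The two constructions compared in the paper map directly onto scikit-learn building blocks. The sketch below trains binary relevance (independent per-label classifiers) and a classifier chain over the same naive Bayes base classifier, a simple Bayesian network classifier; the data are synthetic.

```python
# Binary relevance vs classifier chain on a synthetic multi-label problem,
# with naive Bayes as a simple Bayesian-network base classifier.
from sklearn.datasets import make_multilabel_classification
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split
from sklearn.multioutput import ClassifierChain, MultiOutputClassifier
from sklearn.naive_bayes import BernoulliNB

X, Y = make_multilabel_classification(n_samples=500, n_labels=3, n_classes=5,
                                      random_state=0)
X_tr, X_te, Y_tr, Y_te = train_test_split(X, Y, random_state=0)

# Binary relevance: h independent binary classifiers, one per label.
br = MultiOutputClassifier(BernoulliNB()).fit(X_tr, Y_tr)

# Classifier chain: each classifier also conditions on the previous labels,
# giving the more expressive decision boundaries discussed above.
chain = ClassifierChain(BernoulliNB(), order="random", random_state=0).fit(X_tr, Y_tr)

for name, model in [("binary relevance", br), ("classifier chain", chain)]:
    print(name, f1_score(Y_te, model.predict(X_te), average="micro"))
```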
Abstract:
Since the memristor was first built in 2008 at HP Labs, a wealth of devices and models have been presented, and new applications appear frequently. However, the integration of the device at the circuit level is not straightforward, because available models are still immature and/or impose high computational loads, making their simulation long and cumbersome. This study assists circuit and systems designers in the integration of memristors in their applications, while aiding model developers in the validation of their proposals. We introduce a memristor application framework to support the work of both the model developer and the circuit designer. First, the framework includes a library with the best-known memristor models, easily extensible with upcoming models. Systematic modifications have been applied to these models to provide better convergence and significant simulation speedups. Second, a quick device simulator allows the study of the response of the models under different scenarios, helping the designer with stimuli and operation time selection. Third, fine tuning of the device, including parameter variations and threshold determination, is also supported. Finally, SPICE/Spectre subcircuit generation is provided to ease the integration of the devices in application circuits. The framework gives the designer total control over convergence, computational load, and the evolution of system variables, overcoming usual problems in the integration of memristive devices.
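As an example of the kind of model such a library would contain, the sketch below integrates the well-known HP linear ion drift memristor model with SciPy under a sinusoidal drive; parameter values are illustrative, and this is not code from the framework itself.

```python
# HP linear ion drift memristor model under a 1 V, 1 Hz sinusoidal drive,
# integrated with SciPy. Illustrative parameters, not framework code.
import numpy as np
from scipy.integrate import solve_ivp

R_on, R_off = 100.0, 16e3        # limiting resistances (ohm)
D, mu_v = 10e-9, 1e-14           # thickness (m), ion mobility (m^2 s^-1 V^-1)

def v_in(t):
    return np.sin(2 * np.pi * t)                     # drive voltage

def dwdt(t, w):
    w0 = np.clip(w[0], 0.0, D)                       # keep state in [0, D]
    m = R_on * w0 / D + R_off * (1.0 - w0 / D)       # memristance M(w)
    i = v_in(t) / m                                  # device current
    return [mu_v * R_on / D * i]                     # linear ion drift law

sol = solve_ivp(dwdt, (0.0, 2.0), [0.1 * D], max_step=1e-3, dense_output=True)
t = np.linspace(0.0, 2.0, 2000)
w = np.clip(sol.sol(t)[0], 0.0, D)
m = R_on * w / D + R_off * (1.0 - w / D)
print(f"memristance range: {m.min():.0f} .. {m.max():.0f} ohm")
```

Plotting current against voltage over these two periods would show the pinched hysteresis loop characteristic of memristive devices, which is the behavior such device simulators let designers explore before committing to a SPICE netlist.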