927 results for LINEAR-REGRESSION MODELS
Abstract:
The mortality rate of older patients with intertrochanteric fractures has been increasing with the aging of the population in China. The purpose of this study was 1) to develop an artificial neural network (ANN) using clinical information to predict the 1-year mortality of elderly patients with intertrochanteric fractures, and 2) to compare the ANN's predictive ability with that of logistic regression models. The ANN model was tested against actual outcomes from an intertrochanteric femoral fracture database in China. The ANN model was generated with eight clinical inputs and a single output. The ANN's performance was compared with that of a logistic regression model built from the same inputs in terms of accuracy, sensitivity, specificity, and discriminability. The study population comprised 2150 patients (679 males and 1471 females): 1432 in the training group and 718 new patients in the testing group. The ANN model with eight neurons in the hidden layer had the highest accuracy of the four ANN models: 92.46% and 85.79% in the training and testing datasets, respectively. The areas under the receiver operating characteristic curves of the automatically selected ANN model for the two datasets were 0.901 (95%CI=0.814-0.988) and 0.869 (95%CI=0.748-0.990), higher than the 0.745 (95%CI=0.612-0.879) and 0.728 (95%CI=0.595-0.862) of the logistic regression model. The ANN model can be used for predicting 1-year mortality in elderly patients with intertrochanteric fractures. It outperformed logistic regression on multiple performance measures when given the same variables.
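The four comparison criteria named in this abstract (accuracy, sensitivity, specificity, and discriminability as the area under the ROC curve) can be sketched as follows. This is a minimal illustration, not the study's code: the function name `classification_metrics`, the 0.5 decision threshold, and the toy data are assumptions.

```python
import numpy as np

def classification_metrics(y_true, y_prob, threshold=0.5):
    """Accuracy, sensitivity, specificity and AUC for a binary outcome.

    y_true: 0/1 labels (1 = event, e.g. death within one year).
    y_prob: predicted event probabilities from any model.
    threshold: assumed cut-off for turning probabilities into classes.
    """
    y_true = np.asarray(y_true, dtype=int)
    y_prob = np.asarray(y_prob, dtype=float)
    y_pred = (y_prob >= threshold).astype(int)

    tp = np.sum((y_pred == 1) & (y_true == 1))
    tn = np.sum((y_pred == 0) & (y_true == 0))
    fp = np.sum((y_pred == 1) & (y_true == 0))
    fn = np.sum((y_pred == 0) & (y_true == 1))

    # AUC as the probability that a random positive case is scored
    # higher than a random negative case (ties count one half).
    pos = y_prob[y_true == 1]
    neg = y_prob[y_true == 0]
    diff = pos[:, None] - neg[None, :]
    auc = (np.sum(diff > 0) + 0.5 * np.sum(diff == 0)) / (len(pos) * len(neg))

    return {
        "accuracy": float((tp + tn) / len(y_true)),
        "sensitivity": float(tp / (tp + fn)),
        "specificity": float(tn / (tn + fp)),
        "auc": float(auc),
    }

# Toy example with made-up labels and probabilities.
metrics = classification_metrics([1, 1, 0, 0], [0.9, 0.4, 0.6, 0.1])
```

Applying the same function to the predictions of two models on the same test set gives a like-for-like comparison of the kind the abstract describes.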
Abstract:
This study developed a gluten-free granola and evaluated it during storage by applying multivariate and regression analysis to the sensory and instrumental parameters. The physicochemical, sensory, and nutritional characteristics of a product containing quinoa, amaranth and linseed were evaluated. The crude protein and lipid contents were 97.49 and 122.72 g kg-1 of food, respectively. The polyunsaturated/saturated and n-6:n-3 fatty acid ratios were 2.82 and 2.59:1, respectively. Granola had the best alpha-linolenic acid content, nutritional indices in the lipid fraction, and mineral content. There were good hygienic and sanitary conditions during storage, probably due to the low water activity of the formulation, which helped inhibit microbial growth. The sensory attributes ranged from 'like very much' to 'like slightly', and the regression models showed high goodness of fit and correlation during the storage period. A reduction in the sensory attribute levels and in the product physical stabilisation was verified by principal component analysis. The use of the affective acceptance test and instrumental analysis combined with statistical methods allowed us to obtain promising results about the characteristics of gluten-free granola.
Abstract:
Relationships between surface sediment diatom assemblages and lake trophic status were studied in 50 Canadian Precambrian Shield lakes in the Muskoka-Haliburton and southern Ontario regions. The purpose of this study was to develop mathematical regression models to infer lake trophic status from diatom assemblage data. To achieve this goal, however, additional investigations dealing with the evaluation of lake trophic status and the autecological features of key diatom species were carried out. Because a unifying index and classification for lake trophic status was not available, a new multiple index was developed in this study from the physical, chemical and biological data of 85 southern Ontario lakes. Using the new trophic parameter, the lake trophic level (TL) was determined as $\mathrm{TL} = 1.37 \ln[1 + (\mathrm{TP} \times \text{Chl-a} / \mathrm{SD})]$, where TP = total phosphorus, Chl-a = chlorophyll-a and SD = Secchi depth. The boundaries between seven lake trophic categories were established (ultra-oligotrophic lakes: 0-0.24; oligotrophic lakes: 0.241-1.8; oligomesotrophic lakes: 1.81-3.0; mesotrophic lakes: 3.01-4.20; mesoeutrophic lakes: 4.21-5.4; eutrophic lakes: 5.41-10; and hyper-eutrophic lakes: above 10). The new trophic parameter was more convenient for water quality management, communication to the public and comparison with other lake trophic status indices than many previously published indices, because the TL index attempts to increase understanding of the characteristics of lakes and their comprehensive trophic states. It is more reasonable and clear for a unifying determination of the true trophic states of lakes. Diatom species autecology analysis was central to this thesis. However, the autecological relationship between diatom species and lake trophic status had not previously been well documented. Based on an investigation of diatom composition and the variation in species abundance in 30 study lakes, the distribution optima of diatom species were determined. 
These determinations were based on a quantitative method called "weighted average" (Charles 1985). On this basis, the diatom species were classified into five trophic categories (oligotrophic, oligomesotrophic, mesotrophic, mesoeutrophic and eutrophic species groups). The resulting diatom trophic status autecological features were used in the regression analysis between diatom assemblages and lake trophic status. When the TL values of the 30 lakes were regressed against their five corresponding diatom trophic groups, two mathematical equations expressing the assumed linear relationship between diatom assemblage composition and lake trophic level were determined: (1) using a single regression technique, $\mathrm{TL} = 2.643 - 7.575 \log(\text{Index D})$ ($r = 0.88$, $r^2 = 0.77$, $P = 0.0001$, $n = 30$), where $\text{Index D} = (\mathrm{O\%} + \mathrm{OM\%} + \mathrm{M\%})/(\mathrm{E\%} + \mathrm{ME\%} + \mathrm{M\%})$; (2) using a multiple regression technique, $\mathrm{TL} = 4.285 - 0.076\,\mathrm{O\%} - 0.055\,\mathrm{OM\%} - 0.026\,\mathrm{M\%} + 0.033\,\mathrm{ME\%} + 0.065\,\mathrm{E\%}$ ($r = 0.89$, $r^2 = 0.792$, $P = 0.0001$, $n = 30$). There was a significant correlation between measured and diatom-inferred trophic levels for both the single and multiple regression methods ($P < 0.0001$, $n = 20$) when both models were applied to another 20 test lakes. The corresponding $r^2$ values were also statistically significant ($r^2 > 0.68$, $n = 20$). As such, the two transfer function models between diatoms and lake trophic status were validated. The two models were thus developed using one group of lakes and then tested on an entirely different group of lakes. This study indicated that diatom assemblages are sensitive to lake trophic status. As indicators of lake trophic status, diatoms are especially useful in situations where no local trophic information is available and in studies of the paleotrophic history of lakes. Diatom autecological information was used to develop a theory for assessing water quality and lake trophic status.
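The TL index, its seven categories, and the two diatom transfer functions quoted in this abstract can be evaluated with a short sketch like the one below. The coefficients and category boundaries are taken from the abstract; everything else is an assumption made for illustration: the function names, the reading of "log" in the single-regression equation as base 10, and the measurement units of TP, Chl-a and SD, which the abstract does not state.

```python
import math

# Upper bounds of the trophic categories quoted in the abstract;
# anything above 10 is treated as hyper-eutrophic.
TL_CATEGORIES = [
    (0.24, "ultra-oligotrophic"),
    (1.8, "oligotrophic"),
    (3.0, "oligomesotrophic"),
    (4.2, "mesotrophic"),
    (5.4, "mesoeutrophic"),
    (10.0, "eutrophic"),
]

def trophic_level(tp, chl_a, secchi_depth):
    """TL = 1.37 ln[1 + (TP x Chl-a / SD)]; units are not given in the abstract."""
    return 1.37 * math.log(1.0 + tp * chl_a / secchi_depth)

def trophic_category(tl):
    """Map a TL value to the first category whose upper bound it does not exceed."""
    for upper, name in TL_CATEGORIES:
        if tl <= upper:
            return name
    return "hyper-eutrophic"

def tl_single(o, om, m, me, e):
    """Single-regression model on Index D (log assumed to be base 10).

    Arguments are the percentages of oligotrophic (O), oligomesotrophic (OM),
    mesotrophic (M), mesoeutrophic (ME) and eutrophic (E) diatom species.
    """
    index_d = (o + om + m) / (e + me + m)
    return 2.643 - 7.575 * math.log10(index_d)

def tl_multiple(o, om, m, me, e):
    """Multiple-regression model on the five diatom group percentages."""
    return 4.285 - 0.076 * o - 0.055 * om - 0.026 * m + 0.033 * me + 0.065 * e

# Hypothetical lake: made-up water chemistry and diatom percentages.
measured = trophic_level(tp=10.0, chl_a=2.0, secchi_depth=4.0)
inferred = tl_multiple(o=20, om=10, m=30, me=10, e=20)
print(trophic_category(measured), trophic_category(inferred))
```

Comparing `measured` with `inferred` across a set of held-out lakes mirrors the validation step described in the abstract, where the models were checked on 20 lakes not used for fitting.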
Abstract:
Two groups of rainbow trout were acclimated to 2°, 10°, and 18°C. Plasma sodium, potassium, and chloride levels were determined for both. One group was employed in the estimation of branchial and renal (Na+-K+)-stimulated, (HCO3-)-stimulated, and (Mg++)-dependent ATPase activities, while the other was used in the measurement of carbonic anhydrase activity in the blood, gill and kidney. Assays were conducted using two incubation temperature schemes. One provided for incubation of all preparations at a common temperature of 25°C, a value equivalent to the upper incipient lethal level for this species. In the other procedure the preparations were incubated at the appropriate acclimation temperature of the sampled fish. Trout were able to maintain plasma sodium and chloride levels essentially constant over the temperature range employed. The different incubation temperature protocols produced different levels of activity and, in some cases, contrary trends with respect to acclimation temperature. This information was discussed in relation to previous work on gill and kidney. The standing-gradient flow hypothesis was discussed with reference to the structure of the chloride cell, known thermally induced changes in ion uptake, and the enzyme activities obtained in this study. Modifications of the model of gill ion uptake suggested by Maetz (1971) were proposed, resulting in high- and low-temperature models. In short, ion transport at the gill at low temperatures appears to involve sodium and chloride uptake by heteroionic exchange mechanisms working in association with carbonic anhydrase. Gill (Na+-K+)-ATPase and erythrocyte carbonic anhydrase seem to provide the supplemental uptake required at higher temperatures. It appears that the kidney is prominent in ion transport at low temperatures while the gill is more important at high temperatures. 
Linear regression analyses involving weight, plasma ion levels, and enzyme activities indicated several trends, the most significant being the interrelationship observed between plasma sodium and chloride. This and other data obtained in the study were considered in light of the theory that a link exists between plasma sodium and chloride regulatory mechanisms.
Abstract:
Objective: To determine which socio-demographic, exposure, morbidity and symptom variables are associated with health-related quality of life (HRQoL) among former and current heavy smokers. Methods: Cross-sectional data from 2537 participants were studied. All participants were at ≥2% risk of developing lung cancer within 6 years. Linear and logistic regression models utilizing a multivariable fractional polynomial selection process identified variables associated with HRQoL, measured by the EQ-5D. Results: Upstream and downstream associations between smoking cessation and higher HRQoL were evident. Significant upstream associations, such as education level and current working status, were explained by the addition of morbidities and symptoms to the regression models. Having arthritis, decreased forced expiratory volume in one second, fatigue, poor appetite or dyspnea were most highly and commonly associated with decreased HRQoL. Discussion: Upstream factors such as educational attainment, employment status and smoking cessation should be targeted to prevent decreased HRQoL. Practitioners should focus treatment on downstream factors, especially symptoms, to improve HRQoL.
Abstract:
In the context of multivariate linear regression (MLR) models, it is well known that commonly employed asymptotic test criteria are seriously biased towards overrejection. In this paper, we propose a general method for constructing exact tests of possibly nonlinear hypotheses on the coefficients of MLR systems. For the case of uniform linear hypotheses, we present exact distributional invariance results concerning several standard test criteria. These include Wilks' likelihood ratio (LR) criterion as well as trace and maximum root criteria. The normality assumption is not necessary for most of the results to hold. Implications for inference are two-fold. First, invariance to nuisance parameters entails that the technique of Monte Carlo tests can be applied on all these statistics to obtain exact tests of uniform linear hypotheses. Second, the invariance property of the latter statistic is exploited to derive general nuisance-parameter-free bounds on the distribution of the LR statistic for arbitrary hypotheses. Even though it may be difficult to compute these bounds analytically, they can easily be simulated, hence yielding exact bounds Monte Carlo tests. Illustrative simulation experiments show that the bounds are sufficiently tight to provide conclusive results with a high probability. Our findings illustrate the value of the bounds as a tool to be used in conjunction with more traditional simulation-based test methods (e.g., the parametric bootstrap) which may be applied when the bounds are not conclusive.
Abstract:
This paper proposes finite-sample procedures for testing the SURE specification in multi-equation regression models, i.e. whether the disturbances in different equations are contemporaneously uncorrelated or not. We apply the technique of Monte Carlo (MC) tests [Dwass (1957), Barnard (1963)] to obtain exact tests based on standard LR and LM zero correlation tests. We also suggest an MC quasi-LR (QLR) test based on feasible generalized least squares (FGLS). We show that the latter statistics are pivotal under the null, which provides the justification for applying MC tests. Furthermore, we extend the exact independence test proposed by Harvey and Phillips (1982) to the multi-equation framework. Specifically, we introduce several induced tests based on a set of simultaneous Harvey/Phillips-type tests and suggest a simulation-based solution to the associated combination problem. The properties of the proposed tests are studied in a Monte Carlo experiment which shows that standard asymptotic tests exhibit substantial size distortions, while MC tests achieve complete size control and display good power. Moreover, MC-QLR tests performed best in terms of power, a result of interest from the point of view of simulation-based tests. The power of the MC induced tests improves appreciably in comparison to standard Bonferroni tests and, in certain cases, outperforms the likelihood-based MC tests. The tests are applied to data used by Fischer (1993) to analyze the macroeconomic determinants of growth.
Abstract:
In this paper, we develop finite-sample inference procedures for stationary and nonstationary autoregressive (AR) models. The method is based on special properties of Markov processes and a split-sample technique. The results on Markovian processes (intercalary independence and truncation) only require the existence of conditional densities. They are proved for possibly nonstationary and/or non-Gaussian multivariate Markov processes. In the context of a linear regression model with AR(1) errors, we show how these results can be used to simplify the distributional properties of the model by conditioning a subset of the data on the remaining observations. This transformation leads to a new model which has the form of a two-sided autoregression to which standard classical linear regression inference techniques can be applied. We show how to derive tests and confidence sets for the mean and/or autoregressive parameters of the model. We also develop a test on the order of an autoregression. We show that a combination of subsample-based inferences can improve the performance of the procedure. An application to U.S. domestic investment data illustrates the method.
Abstract:
A wide range of tests for heteroskedasticity have been proposed in the econometric and statistics literature. Although a few exact homoskedasticity tests are available, the commonly employed procedures are generally based on asymptotic approximations which may not provide good size control in finite samples. A number of recent studies have sought to improve the reliability of common heteroskedasticity tests using Edgeworth, Bartlett, jackknife and bootstrap methods. Yet the latter remain approximate. In this paper, we describe a solution to the problem of controlling the size of homoskedasticity tests in linear regression contexts. We study procedures based on the standard test statistics [e.g., the Goldfeld-Quandt, Glejser, Bartlett, Cochran, Hartley, Breusch-Pagan-Godfrey, White and Szroeter criteria] as well as tests for autoregressive conditional heteroskedasticity (ARCH-type models). We also suggest several extensions of the existing procedures (sup-type or combined test statistics) to allow for unknown breakpoints in the error variance. We exploit the technique of Monte Carlo tests to obtain provably exact p-values for both the standard and the new tests suggested. We show that the MC test procedure conveniently solves intractable null distribution problems, in particular those raised by the sup-type and combined test statistics as well as (when relevant) unidentified nuisance parameter problems under the null hypothesis. The proposed method works in exactly the same way with both Gaussian and non-Gaussian disturbance distributions [such as heavy-tailed or stable distributions]. The performance of the procedures is examined by simulation. 
The Monte Carlo experiments conducted focus on: (1) ARCH, GARCH, and ARCH-in-mean alternatives; (2) the case where the variance increases monotonically with (i) one exogenous variable and (ii) the mean of the dependent variable; (3) grouped heteroskedasticity; (4) breaks in variance at unknown points. We find that the proposed tests achieve perfect size control and have good power.
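The Monte Carlo test idea summarized in this abstract can be illustrated with a Goldfeld-Quandt-type statistic. The sketch below is not the authors' implementation: the function names, the fixed split point, the choice of 99 replications and the default Gaussian error generator are assumptions. It relies on the fact that, under homoskedasticity, the ratio of subsample residual variances does not depend on the regression coefficients or the error scale, so its null distribution can be simulated by regressing pure noise on the same regressors.

```python
import numpy as np

def gq_statistic(y, X, split):
    """Goldfeld-Quandt-type ratio: residual variance after `split` over before."""
    def rss(y_part, X_part):
        beta, *_ = np.linalg.lstsq(X_part, y_part, rcond=None)
        resid = y_part - X_part @ beta
        return resid @ resid

    n, k = X.shape
    s1 = rss(y[:split], X[:split]) / (split - k)
    s2 = rss(y[split:], X[split:]) / (n - split - k)
    return s2 / s1

def mc_pvalue(observed, simulated):
    """Monte Carlo p-value: rank of the observed statistic among N simulated ones."""
    simulated = np.asarray(simulated)
    return float((1 + np.sum(simulated >= observed)) / (len(simulated) + 1))

def mc_gq_test(y, X, split, n_rep=99, draw_errors=None, seed=0):
    """MC test of homoskedasticity against a variance increase after `split`.

    draw_errors(rng, size) generates errors under the null; it defaults to
    standard normal but can be any fully specified distribution (e.g. Student-t).
    """
    rng = np.random.default_rng(seed)
    if draw_errors is None:
        draw_errors = lambda r, size: r.standard_normal(size)
    y = np.asarray(y, dtype=float)
    s0 = gq_statistic(y, X, split)
    # The statistic is invariant to the coefficients and the error scale,
    # so simulated errors alone reproduce its null distribution.
    sims = [gq_statistic(draw_errors(rng, len(y)), X, split) for _ in range(n_rep)]
    return s0, mc_pvalue(s0, sims)
```

With 99 replications the possible p-values are multiples of 1/100, so rejecting when the p-value is at most 0.05 gives a test whose size is 5% under the assumed error distribution, without appeal to asymptotics.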
Abstract:
In this paper we propose exact likelihood-based mean-variance efficiency tests of the market portfolio in the context of the Capital Asset Pricing Model (CAPM), allowing for a wide class of error distributions which include normality as a special case. These tests are developed in the framework of multivariate linear regressions (MLR). It is well known, however, that despite their simple statistical structure, standard asymptotically justified MLR-based tests are unreliable. In financial econometrics, exact tests have been proposed for a few specific hypotheses [Jobson and Korkie (Journal of Financial Economics, 1982), MacKinlay (Journal of Financial Economics, 1987), Gibbons, Ross and Shanken (Econometrica, 1989), Zhou (Journal of Finance 1993)], most of which depend on normality. For the Gaussian model, our tests correspond to Gibbons, Ross and Shanken’s mean-variance efficiency tests. In non-Gaussian contexts, we reconsider mean-variance efficiency tests allowing for multivariate Student-t and Gaussian mixture errors. Our framework allows us to shed more light on whether the normality assumption is too restrictive when testing the CAPM. We also propose exact multivariate diagnostic checks (including tests for multivariate GARCH and a multivariate generalization of the well-known variance ratio tests) and goodness-of-fit tests, as well as a set estimate for the intervening nuisance parameters. Our results [over five-year subperiods] show the following: (i) multivariate normality is rejected in most subperiods, (ii) residual checks reveal no significant departures from the multivariate i.i.d. assumption, and (iii) mean-variance efficiency of the market portfolio is not rejected as frequently once the possibility of non-normal errors is allowed for.
Abstract:
We propose methods for testing hypotheses of non-causality at various horizons, as defined in Dufour and Renault (1998, Econometrica). We study in detail the case of VAR models and we propose linear methods based on running vector autoregressions at different horizons. While the hypotheses considered are nonlinear, the proposed methods only require linear regression techniques as well as standard Gaussian asymptotic distributional theory. Bootstrap procedures are also considered. For the case of integrated processes, we propose extended regression methods that avoid nonstandard asymptotics. The methods are applied to a VAR model of the U.S. economy.
Abstract:
The efficient price is latent: it is contaminated by microstructure frictions, or noise. We explore the measurement and forecasting of fundamental volatility using high-frequency data. In the first paper, keeping the standard framework of an additive model for the noise and the efficient price, we show that using trading volume, bid and ask volumes, the trade direction indicator, and the bid-ask spread to absorb the noise improves the precision of volatility estimators. If the noise is only partially absorbed, the residual noise is closer to white noise than the original noise, which reduces misspecification of the noise characteristics. In the second paper, we start from an empirical fact that we model as a linear form of the microstructure noise variance in the fundamental volatility. Using the representation of the general class of stochastic volatility models, we explore the forecasting performance of different volatility measures under the assumptions of our model. In the third paper, we derive new realized measures using bid and ask prices and volumes. As an alternative to the standard additive model for prices contaminated by microstructure noise, we make assumptions on the distribution of the frictionless price, which is assumed to be bounded by the bid and ask prices.
Abstract:
This thesis presents methods for handling count data in particular and discrete data in general. It is part of a strategic project of the CRNSG, named CC-Bio, whose objective is to assess the impact of climate change on the distribution of animal and plant species. After a brief introduction to the notions of biogeography and to generalized linear mixed models in Chapters 1 and 2 respectively, my thesis is organized around three major ideas. First, in Chapter 3 we introduce a new form of distribution whose components have Poisson or Skellam marginal distributions. This new specification makes it possible to incorporate relevant information on the nature of the correlations among all components. In addition, we present some properties of this distribution. Unlike the multivariate Poisson distribution that it generalizes, it can handle variables with positive and/or negative correlations. A simulation illustrates the estimation methods in the bivariate case. The results obtained with Bayesian Markov chain Monte Carlo (MCMC) methods indicate a fairly small relative bias of less than 5% for the regression coefficients of the means, in contrast to those of the covariance term, which appear somewhat more volatile. Second, Chapter 4 presents an extension of multivariate Poisson regression with random effects having a gamma density. Because species abundance data exhibit strong dispersion, which would make the resulting estimators and standard errors misleading, we favor an approach based on Monte Carlo integration through importance sampling. 
The approach remains the same as in the previous chapter: the idea is to simulate independent latent variables and thereby return to the framework of a conventional generalized linear mixed model (GLMM) with gamma-distributed random effects. Although the assumption that the dispersion parameters are known a priori seems too strong, a sensitivity analysis based on goodness of fit demonstrates the robustness of our method. Third, in the last chapter, we turn to the definition and construction of a measure of concordance, and hence of correlation, for zero-inflated data through Gaussian copula modeling. Unlike Kendall's tau, whose values lie in an interval whose bounds vary with the frequency of ties between pairs, this measure has the advantage of taking its values in (-1, 1). Originally introduced to model correlations between continuous variables, its extension to the discrete case entails certain restrictions. Indeed, the new measure can be interpreted as the correlation between the continuous random variables whose discretization yields our non-negative discrete observations. Two estimation methods for zero-inflated models are presented, in the frequentist and Bayesian settings, based respectively on maximum likelihood and Gauss-Hermite integration. Finally, a simulation study shows the robustness and the limits of our approach.
Abstract:
The main objective of this work is to study in depth certain advanced biostatistical techniques in evaluative research in adult cardiac surgery. The studies were designed to integrate the concepts of survival analysis, regression analysis with propensity scores, and cost analysis. The first manuscript evaluates survival after surgical repair of acute dissection of the ascending aorta. The statistical analyses used include: survival analyses with parametric hazard-phase regression and other parametric (exponential, Weibull), semi-parametric (Cox) or non-parametric (Kaplan-Meier) methods; survival compared with a cohort matched for age, sex and race using government survival tables; and regression models with bootstrapping and a multinomial logit model. The study showed that survival improved over 25 years in connection with changes in surgical techniques and diagnostic imaging. The second manuscript focuses on the outcomes of isolated coronary artery bypass grafting in patients with a history of percutaneous coronary intervention. The statistical analyses used include: regression models with propensity scores; a complex matching algorithm (1:3); and statistical analyses appropriate for matched groups (standardized differences, generalized estimating equations, stratified Cox model). The study showed that percutaneous coronary intervention undergone 14 days or more before coronary artery bypass surgery is not associated with adverse short- or long-term outcomes. The third manuscript evaluates the financial consequences and demographic changes for a university hospital center following the establishment of a satellite cardiac surgery program. 
The statistical analyses used include: multivariate "two-way" ANOVA regression models (logistic, linear or ordinal); propensity scores; and cost analyses with parametric log-normal models. "Survival" analysis models were also explored, using "costs" instead of "time" as the dependent variable, and led to similar conclusions. The study showed that, after the satellite program was established, fewer low-complexity patients were referred from the satellite program's region to the university hospital center, with an increase in nursing workload and costs.
Abstract:
The software used for data simulation and analysis is Conquest V.3.