910 results for weighted linear regression
Abstract:
ECG criteria for left ventricular hypertrophy (LVH) have been almost exclusively elaborated and calibrated in white populations. Because several interethnic differences in ECG characteristics have been found, the applicability of these criteria to African individuals remains to be demonstrated. We therefore investigated the performance of classic ECG criteria for LVH detection in an African population. Digitized 12-lead ECG tracings were obtained from 334 African individuals randomly selected from the general population of the Republic of Seychelles (Indian Ocean). Left ventricular mass (LVM) was calculated with M-mode echocardiography and indexed to body height. LVH was defined by taking the 95th percentile of body height-indexed LVM values in a reference subgroup. In the entire study sample, 16 men and 15 women (prevalence 9.3%) were finally declared to have LVH, 9 of whom belonged to the reference subgroup. Sensitivity, specificity, accuracy, and positive and negative predictive values for LVH were calculated for 9 classic ECG criteria, and receiver operating characteristic curves were computed. We also generated a new composite time-voltage criterion with stepwise multiple linear regression: weighted time-voltage criterion = (0.2366 R(aVL) + 0.0551 R(V5) + 0.0785 S(V3) + 0.2993 T(V1)) × QRS duration. The Sokolow-Lyon criterion reached the highest sensitivity (61%) and the R(aVL) voltage criterion reached the highest specificity (97%) when evaluated at their traditional partition values. However, at a fixed specificity of 95%, the sensitivity of these 10 criteria ranged from 16% to 32%. Best accuracy was obtained with the R(aVL) voltage criterion and the new composite time-voltage criterion (89% for both). Positive and negative predictive values varied considerably depending on the concomitant presence of 3 clinical risk factors for LVH (hypertension, age ≥50 years, overweight). Median positive and negative predictive values of the 10 ECG criteria were 15% and 95%, respectively, for subjects with none or 1 of these risk factors compared with 63% and 76% for subjects with all of them. In conclusion, the performance of classic ECG criteria for LVH detection varied widely and appeared to be lower in this population of East African origin than in white subjects. A newly generated composite time-voltage criterion might provide improved performance. The predictive value of ECG criteria for LVH was considerably enhanced with the integration of information on concomitant clinical risk factors for LVH.
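As a rough illustration of the composite criterion and the diagnostic measures named in this abstract, the sketch below evaluates the weighted time-voltage formula and computes sensitivity, specificity, accuracy and predictive values from a confusion table. The coefficients are those quoted in the abstract; the function names, the assumed units (millivolts and milliseconds) and all numbers in the usage lines are hypothetical and are not taken from the study.

```python
def weighted_time_voltage(r_avl, r_v5, s_v3, t_v1, qrs_duration):
    """Composite criterion: weighted sum of four voltages times QRS duration.

    Coefficients come from the abstract. Units are an assumption
    (voltages in mV, duration in ms); the abstract does not state them.
    """
    voltage_sum = (0.2366 * r_avl + 0.0551 * r_v5
                   + 0.0785 * s_v3 + 0.2993 * t_v1)
    return voltage_sum * qrs_duration


def diagnostic_metrics(tp, fp, tn, fn):
    """Standard diagnostic measures from true/false positive/negative counts."""
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "ppv": tp / (tp + fp),   # positive predictive value
        "npv": tn / (tn + fn),   # negative predictive value
    }


# Hypothetical inputs, for illustration only.
score = weighted_time_voltage(1.0, 1.0, 1.0, 1.0, 100.0)
metrics = diagnostic_metrics(tp=8, fp=2, tn=18, fn=2)
```

Predictive values depend on prevalence, which is consistent with the abstract's observation that they shift markedly with the number of clinical risk factors present.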
Abstract:
In this paper we study the relevance of multiple kernel learning (MKL) for the automatic selection of time series inputs. Recently, MKL has gained great attention in the machine learning community due to its flexibility in modelling complex patterns and performing feature selection. In general, MKL constructs the kernel as a weighted linear combination of basis kernels, exploiting different sources of information. SimpleMKL, an efficient algorithm that wraps a Support Vector Regression model to optimize the MKL weights, is used for the analysis. In this setting, MKL performs feature selection by discarding inputs/kernels with low or null weights. The proposed approach is tested with simulated linear and nonlinear time series (autoregressive, Hénon and Lorenz series).
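The weighted kernel combination described in this abstract can be sketched as follows. This is only an illustration of the combination and selection step: the weights are set by hand here, whereas SimpleMKL would learn them jointly with the regression model. The Gaussian basis kernel, the function names and the tolerance are assumptions, not details from the paper.

```python
import math


def rbf_kernel(values, gamma=1.0):
    """Gaussian (RBF) Gram matrix built from a single scalar input series."""
    return [[math.exp(-gamma * (a - b) ** 2) for b in values] for a in values]


def combine_kernels(kernels, weights):
    """Weighted linear combination K = sum_m d_m * K_m of basis kernels."""
    n = len(kernels[0])
    return [[sum(w * k[i][j] for w, k in zip(weights, kernels))
             for j in range(n)] for i in range(n)]


def selected_inputs(weights, tol=1e-8):
    """Indices of inputs/kernels kept, i.e. those with non-negligible weight."""
    return [m for m, w in enumerate(weights) if w > tol]


# Three hypothetical lagged inputs, one basis kernel per input.
inputs = [[0.1, 0.4, 0.9], [1.0, 0.5, 0.2], [0.3, 0.3, 0.8]]
basis = [rbf_kernel(series) for series in inputs]
weights = [0.7, 0.0, 0.3]          # hand-picked; a solver would estimate these
combined = combine_kernels(basis, weights)
kept = selected_inputs(weights)    # the second input is discarded
```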
Abstract:
We introduce several exact nonparametric tests for finite sample multivariate linear regressions, and compare their powers. This fills an important gap in the literature where the only known nonparametric tests are either asymptotic, or assume one covariate only.
Abstract:
Random coefficient regression models have been applied in different fields and they constitute a unifying setup for many statistical problems. The nonparametric study of this model started with Beran and Hall (1992) and it has become a fruitful framework. In this paper we propose and study statistics for testing a basic hypothesis concerning this model: the constancy of coefficients. The asymptotic behavior of the statistics is investigated and bootstrap approximations are used in order to determine the critical values of the test statistics. A simulation study illustrates the performance of the proposals.
Abstract:
We present an exact test for whether two random variables that have known bounds on their support are negatively correlated. The alternative hypothesis is that they are not negatively correlated. No assumptions are made on the underlying distributions. We show by example that the Spearman rank correlation test, the competing exact test of correlation in nonparametric settings, rests on an additional assumption on the data generating process without which it is not valid as a test for correlation. We then show how to test for the significance of the slope in a linear regression analysis that involves a single independent variable and where outcomes of the dependent variable belong to a known bounded set.
Abstract:
The aim of this paper is twofold. First, we study the determinants of economic growth among a wide set of potential variables for the Spanish provinces (NUTS3). Among others, we include various types of private, public and human capital in the group of growth factors. Also, we analyse whether Spanish provinces have converged in economic terms in recent decades. The second objective is to obtain cross-section and panel data parameter estimates that are robust to model specification. For this purpose, we use a Bayesian Model Averaging (BMA) approach. Bayesian methodology constructs parameter estimates as a weighted average of linear regression estimates for every possible combination of included variables. The weight of each regression estimate is given by the posterior probability of each model.
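The averaging step described in this abstract can be illustrated with a short sketch: each candidate regression contributes its coefficient estimates, weighted by the model's posterior probability. The function name, the variable names and every number below are hypothetical. Treating a variable's coefficient as zero in models that exclude it is a common convention and is assumed here; the abstract does not spell it out.

```python
def bma_coefficients(models):
    """Average coefficients over models, weighting by posterior probability.

    `models` is a list of (posterior_probability, {variable: coefficient}).
    A variable missing from a model is treated as having coefficient 0
    (an assumed convention). Probabilities are renormalized to sum to 1.
    """
    total = sum(prob for prob, _ in models)
    names = sorted({name for _, coefs in models for name in coefs})
    averaged, inclusion = {}, {}
    for name in names:
        averaged[name] = sum(prob * coefs.get(name, 0.0)
                             for prob, coefs in models) / total
        # Posterior inclusion probability: mass of models containing the variable.
        inclusion[name] = sum(prob for prob, coefs in models
                              if name in coefs) / total
    return averaged, inclusion


# Hypothetical posterior probabilities and estimates, for illustration only.
candidate_models = [
    (0.5, {"private_capital": 0.4, "human_capital": 0.2}),
    (0.3, {"private_capital": 0.6}),
    (0.2, {"human_capital": 0.1}),
]
averaged, inclusion = bma_coefficients(candidate_models)
```

With many candidate regressors the number of models grows as 2^k, so practical BMA implementations usually sample the model space rather than enumerate it.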
Abstract:
In the areas where irrigated rice is grown in the south of Brazil, few studies have been carried out to investigate the spatial variability structure of soil properties and to establish new forms of soil management as well as determine soil corrective and fertilizer applications. This study therefore aimed to evaluate the spatial variability of chemical, physical and biological soil properties in a lowland area under irrigated rice cultivation in the conventional till system. For this purpose, a 10 x 10 m grid of 100 points was established in an experimental field of the Embrapa Clima Temperado, in the County of Capão do Leão, State of Rio Grande do Sul. The spatial variability structure was evaluated by geostatistical tools and the number of subsamples required to represent each soil property in future studies was calculated using classical statistics. Results showed that the spatial variability structure of sand, silt, SMP index, cation exchange capacity (pH 7.0), Al3+ and total N properties could be detected by geostatistical analysis. A pure nugget effect was observed for the nutrients K, S and B, as well as macroporosity, mean weighted diameter of aggregates, and soil water storage. The cross validation procedure, based on linear regression and the determination coefficient, was more effective for evaluating the quality of the adjusted mathematical model than the degree of spatial dependence. It was also concluded that combining classical statistics with geostatistics can in many cases simplify the soil sampling process without losing information quality.
Abstract:
The practice of land leveling alters the soil surface to create a uniform slope to improve land conditions for the application of all agricultural practices. The aims of this study were to evaluate the impacts of land leveling through the magnitudes, variances and spatial distributions of selected soil physical properties of a lowland area in the State of Rio Grande do Sul, Brazil; the relationships between the magnitude of cuts and/or fills and soil physical properties after the leveling process; and evaluation of the effect of leveling on the spatial distribution of the top of the B horizon in relation to the soil surface. In the 0-0.20 m layer, a 100-point geo-referenced grid covering two taxonomic soil classes was used in assessment of the following soil properties: soil particle density (Pd) and bulk density (Bd); total porosity (Tp), macroporosity (Macro) and microporosity (Micro); available water capacity (AWC); sand, silt, clay, and dispersed clay in water (Disp clay) contents; electrical conductivity (EC); and weighted average diameter of aggregates (WAD). Soil depth to the top of the B horizon was also measured before leveling. The overall effect of leveling on selected soil physical properties was evaluated by paired "t" tests. The effect on the variability of each property was evaluated through the homogeneity of variance test. The thematic maps constructed by kriging or by the inverse of the square of the distances were visually analyzed to evaluate the effect of leveling on the spatial distribution of the properties and of the top of the B horizon in relation to the soil surface. Linear regression models were fitted with the aim of evaluating the relationship between soil properties and the magnitude of cuts and fills. Leveling altered the mean value of several soil properties and the agronomic effect was negative. The mean values of Bd and Disp clay increased and Tp, Macro and Micro, WAD, AWC and EC decreased. 
Spatial distributions of all soil physical properties changed as a result of leveling and its effect on all soil physical properties occurred in the whole area and not specifically in the cutting or filling areas. In future designs of leveling, we recommend overlaying a cut/fill map on the map of soil depth to the top of the B horizon in order to minimize areas with shallow surface soil after leveling.
Abstract:
It is well known that regression analyses involving compositional data need special attention because the data are not of full rank. For a regression analysis where both the dependent and independent variable are components we propose a transformation of the components emphasizing their role as dependent and independent variables. A simple linear regression can be performed on the transformed components. The regression line can be depicted in a ternary diagram facilitating the interpretation of the analysis in terms of components. An example with time-budgets illustrates the method and the graphical features.
Abstract:
Background: Epidemiological evidence of the effects of long-term exposure to air pollution on the chronic processes of atherogenesis is limited. Objective: We investigated the association of long-term exposure to traffic-related air pollution with subclinical atherosclerosis, measured by carotid intima media thickness (IMT) and ankle–brachial index (ABI). Methods: We performed a cross-sectional analysis using data collected during the reexamination (2007–2010) of 2,780 participants in the REGICOR (Registre Gironí del Cor: the Gerona Heart Register) study, a population-based prospective cohort in Girona, Spain. Long-term exposure across residences was calculated as the last 10 years' time-weighted average of residential nitrogen dioxide (NO2) estimates (based on a local-scale land-use regression model), traffic intensity in the nearest street, and traffic intensity in a 100 m buffer. Associations with IMT and ABI were estimated using linear regression and multinomial logistic regression, respectively, controlling for sex, age, smoking status, education, marital status, and several other potential confounders or intermediates. Results: Exposure contrasts between the 5th and 95th percentiles for NO2 (25 μg/m3), traffic intensity in the nearest street (15,000 vehicles/day), and traffic load within 100 m (7,200,000 vehicle-m/day) were associated with differences in IMT of 0.56% (95% CI: –1.5, 2.6%), 2.32% (95% CI: 0.48, 4.17%), and 1.91% (95% CI: –0.24, 4.06%), respectively. Exposures were positively associated with an ABI of > 1.3, but not an ABI of < 0.9. Stronger associations were observed among those with a high level of education and in men ≥ 60 years of age. Conclusions: Long-term traffic-related exposures were associated with subclinical markers of atherosclerosis. Prospective studies are needed to confirm associations and further examine differences among population subgroups.
Key words: ankle–brachial index, average daily traffic, cardiovascular disease, exposure assessment, exposure to tailpipe emissions, intima media thickness, land use regression model, Mediterranean diet, nitrogen dioxide
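The time-weighted average exposure mentioned in the Methods can be sketched as a duration-weighted mean over a participant's residences. The function name and the residence histories below are hypothetical; the abstract does not describe how partial years or missing addresses were handled, so none of that is modelled here.

```python
def time_weighted_average(residences):
    """Duration-weighted mean exposure over a residential history.

    `residences` is a list of (exposure_estimate, years_at_address) pairs.
    """
    total_years = sum(years for _, years in residences)
    if total_years <= 0:
        raise ValueError("total residence time must be positive")
    return sum(level * years for level, years in residences) / total_years


# Hypothetical participant: 6 years at one address, 4 years at another,
# with assumed NO2 estimates of 30 and 20 (in the study's concentration units).
no2_exposure = time_weighted_average([(30.0, 6), (20.0, 4)])
```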
Abstract:
Singular Value Decomposition (SVD), Principal Component Analysis (PCA) and Multiple Linear Regression (MLR) are some of the mathematical preliminaries that are discussed prior to explaining PLS and PCR models. Both PLS and PCR are applied to real spectral data and their differences and similarities are discussed in this thesis. The challenge lies in establishing the optimum number of components to be included in either of the models but this has been overcome by using various diagnostic tools suggested in this thesis. Correspondence analysis (CA) and PLS were applied to ecological data. The idea of CA was to correlate the macrophyte species and lakes. The differences between the PLS model for ecological data and PLS for spectral data are noted and explained in this thesis.
Abstract:
In the literature on tests of normality, much concern has been expressed over the problems associated with residual-based procedures. Indeed, the specialized tables of critical points which are needed to perform the tests have been derived for the location-scale model; hence reliance on available significance points in the context of regression models may cause size distortions. We propose a general solution to the problem of controlling the size of normality tests for the disturbances of standard linear regression, which is based on using the technique of Monte Carlo tests.
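A minimal sketch of a Monte Carlo test of this kind follows, under stated assumptions: a single regressor with an intercept, and the Jarque-Bera statistic standing in for whichever normality statistics the paper actually studies. The idea is that a scale-invariant statistic computed from OLS residuals has a null distribution that depends only on the regressors, so it can be simulated by regressing standard normal draws on the same regressors. All function names and the example data are hypothetical.

```python
import random


def ols_residuals(x, y):
    """Residuals from a simple OLS regression of y on x with an intercept."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    slope = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / sxx
    intercept = mean_y - slope * mean_x
    return [b - intercept - slope * a for a, b in zip(x, y)]


def jarque_bera(residuals):
    """Jarque-Bera statistic (used here as a stand-in normality statistic)."""
    n = len(residuals)
    mean = sum(residuals) / n
    m2 = sum((r - mean) ** 2 for r in residuals) / n
    m3 = sum((r - mean) ** 3 for r in residuals) / n
    m4 = sum((r - mean) ** 4 for r in residuals) / n
    skewness = m3 / m2 ** 1.5
    kurtosis = m4 / m2 ** 2
    return n / 6.0 * (skewness ** 2 + (kurtosis - 3.0) ** 2 / 4.0)


def mc_normality_pvalue(x, y, n_rep=99, seed=0):
    """Monte Carlo p-value: (1 + number of simulated stats >= observed) / (n_rep + 1)."""
    rng = random.Random(seed)
    observed = jarque_bera(ols_residuals(x, y))
    exceed = 0
    for _ in range(n_rep):
        # Under the null the residual-based statistic does not depend on the
        # regression coefficients or error scale, so N(0, 1) draws suffice.
        simulated = [rng.gauss(0.0, 1.0) for _ in x]
        if jarque_bera(ols_residuals(x, simulated)) >= observed:
            exceed += 1
    return (exceed + 1) / (n_rep + 1)


# Hypothetical data: an exact line plus one gross outlier.
x = [float(i) for i in range(30)]
y = [2.0 * a + 1.0 for a in x]
y[15] += 100.0
p_value = mc_normality_pvalue(x, y)
```

With 99 replications the smallest attainable p-value is 0.01, so the number of replications bounds the resolution of the test.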
Abstract:
In this paper, we propose several finite-sample specification tests for multivariate linear regressions (MLR) with applications to asset pricing models. We focus on departures from the i.i.d. errors assumption, at univariate and multivariate levels, with Gaussian and non-Gaussian (including Student t) errors. The univariate tests studied extend existing exact procedures by allowing for unspecified parameters in the error distributions (e.g., the degrees of freedom in the case of the Student t distribution). The multivariate tests are based on properly standardized multivariate residuals to ensure invariance to MLR coefficients and error covariances. We consider tests for serial correlation, tests for multivariate GARCH and sign-type tests against general dependencies and asymmetries. The procedures proposed provide exact versions of those applied in Shanken (1990), which consist in combining univariate specification tests. Specifically, we combine tests across equations using the MC test procedure to avoid Bonferroni-type bounds. Since non-Gaussian based tests are not pivotal, we apply the “maximized MC” (MMC) test method [Dufour (2002)], where the MC p-value for the tested hypothesis (which depends on nuisance parameters) is maximized (with respect to these nuisance parameters) to control the test’s significance level. The tests proposed are applied to an asset pricing model with observable risk-free rates, using monthly returns on New York Stock Exchange (NYSE) portfolios over five-year subperiods from 1926-1995. Our empirical results reveal the following. Whereas univariate exact tests indicate significant serial correlation, asymmetries and GARCH in some equations, such effects are much less prevalent once error cross-equation covariances are accounted for. In addition, significant departures from the i.i.d. hypothesis are less evident once we allow for non-Gaussian errors.