972 resultados para Regression method
Resumo:
La regressió basada en distàncies és un mètode de predicció que consisteix en dos passos: a partir de les distàncies entre observacions obtenim les variables latents, les quals passen a ser els regressors en un model lineal de mínims quadrats ordinaris. Les distàncies les calculem a partir dels predictors originals fent us d'una funció de dissimilaritats adequada. Donat que, en general, els regressors estan relacionats de manera no lineal amb la resposta, la seva selecció amb el test F usual no és possible. En aquest treball proposem una solució a aquest problema de selecció de predictors definint tests estadístics generalitzats i adaptant un mètode de bootstrap no paramètric per a l'estimació dels p-valors. Incluim un exemple numèric amb dades de l'assegurança d'automòbils.
Resumo:
Macroporosity is often used in the determination of soil compaction. Reduced macroporosity can lead to poor drainage, low root aeration and soil degradation. The aim of this study was to develop and test different models to estimate macro and microporosity efficiently, using multiple regression. Ten soils were selected within a large range of textures: sand (Sa) 0.07-0.84; silt 0.03-0.24; clay 0.13-0.78 kg kg-1 and subjected to three compaction levels (three bulk densities, BD). Two models with similar accuracy were selected, with a mean error of about 0.02 m³ m-3 (2 %). The model y = a + b.BD + c.Sa, named model 2, was selected for its simplicity to estimate Macro (Ma), Micro (Mi) or total porosity (TP): Ma = 0.693 - 0.465 BD + 0.212 Sa; Mi = 0.337 + 0.120 BD - 0.294 Sa; TP = 1.030 - 0.345 BD 0.082 Sa; porosity values were expressed in m³ m-3; BD in kg dm-3; and Sa in kg kg-1. The model was tested with 76 datum set of several other authors. An error of about 0.04 m³ m-3 (4 %) was observed. Simulations of variations in BD as a function of Sa are presented for Ma = 0 and Ma = 0.10 (10 %). The macroporosity equation was remodeled to obtain other compaction indexes: a) to simulate maximum bulk density (MBD) as a function of Sa (Equation 11), in agreement with literature data; b) to simulate relative bulk density (RBD) as a function of BD and Sa (Equation 13); c) another model to simulate RBD as a function of Ma and Sa (Equation 16), confirming the independence of this variable in relation to Sa for a fixed value of macroporosity and, also, proving the hypothesis of Hakansson & Lipiec that RBD = 0.87 corresponds approximately to 10 % macroporosity (Ma = 0.10 m³ m-3).
Resumo:
OBJECTIVE: To describe a method to obtain a profile of the duration and intensity (speed) of walking periods over 24 hours in women under free-living conditions. DESIGN: A new method based on accelerometry was designed for analyzing walking activity. In order to take into account inter-individual variability of acceleration, an individual calibration process was used. Different experiments were performed to highlight the variability of acceleration vs walking speed relationship, to analyze the speed prediction accuracy of the method, and to test the assessment of walking distance and duration over 24-h. SUBJECTS: Twenty-eight women were studied (mean+/-s.d.) age: 39.3+/-8.9 y; body mass: 79.7+/-11.1 kg; body height: 162.9+/-5.4 cm; and body mass index (BMI) 30.0+/-3.8 kg/m(2). RESULTS: Accelerometer output was significantly correlated with speed during treadmill walking (r=0.95, P<0.01), and short unconstrained walks (r=0.86, P<0.01), although with a large inter-individual variation of the regression parameters. By using individual calibration, it was possible to predict walking speed on a standard urban circuit (predicted vs measured r=0.93, P<0.01, s.e.e.=0.51 km/h). In the free-living experiment, women spent on average 79.9+/-36.0 (range: 31.7-168.2) min/day in displacement activities, from which discontinuous short walking activities represented about 2/3 and continuous ones 1/3. Total walking distance averaged 2.1+/-1.2 (range: 0.4-4.7) km/day. It was performed at an average speed of 5.0+/-0.5 (range: 4.1-6.0) km/h. CONCLUSION: An accelerometer measuring the anteroposterior acceleration of the body can estimate walking speed together with the pattern, intensity and duration of daily walking activity.
Resumo:
Many of the most interesting questions ecologists ask lead to analyses of spatial data. Yet, perhaps confused by the large number of statistical models and fitting methods available, many ecologists seem to believe this is best left to specialists. Here, we describe the issues that need consideration when analysing spatial data and illustrate these using simulation studies. Our comparative analysis involves using methods including generalized least squares, spatial filters, wavelet revised models, conditional autoregressive models and generalized additive mixed models to estimate regression coefficients from synthetic but realistic data sets, including some which violate standard regression assumptions. We assess the performance of each method using two measures and using statistical error rates for model selection. Methods that performed well included generalized least squares family of models and a Bayesian implementation of the conditional auto-regressive model. Ordinary least squares also performed adequately in the absence of model selection, but had poorly controlled Type I error rates and so did not show the improvements in performance under model selection when using the above methods. Removing large-scale spatial trends in the response led to poor performance. These are empirical results; hence extrapolation of these findings to other situations should be performed cautiously. Nevertheless, our simulation-based approach provides much stronger evidence for comparative analysis than assessments based on single or small numbers of data sets, and should be considered a necessary foundation for statements of this type in future.
Resumo:
La regressió basada en distàncies és un mètode de predicció que consisteix en dos passos: a partir de les distàncies entre observacions obtenim les variables latents, les quals passen a ser els regressors en un model lineal de mínims quadrats ordinaris. Les distàncies les calculem a partir dels predictors originals fent us d'una funció de dissimilaritats adequada. Donat que, en general, els regressors estan relacionats de manera no lineal amb la resposta, la seva selecció amb el test F usual no és possible. En aquest treball proposem una solució a aquest problema de selecció de predictors definint tests estadístics generalitzats i adaptant un mètode de bootstrap no paramètric per a l'estimació dels p-valors. Incluim un exemple numèric amb dades de l'assegurança d'automòbils.
Resumo:
This report provides techniques and procedures for estimating the probable magnitude and frequency of floods at ungaged sites on Iowa streams. Physiographic characteristics were used to define the boundaries of five hydrologic regions. Regional regression equations that relate the size of the drainage area to flood magnitude are defined for estimating peak discharges having specified recurrence intervals of 2, 5, 10, 25, 50, and 100 years. Regional regression equations are applicable to sites on streams that have drainage areas ranging from 0.04 to 5,150 square miles provided that the streams are not affected significantly by regulation upstream from the sites and that the drainage areas upstream from the sites are not mostly urban areas. Flood-frequency characteristics for the mainstems of selected rivers are presented in graphs as a function of drainage area.
Resumo:
This paper investigates the use of ensemble of predictors in order to improve the performance of spatial prediction methods. Support vector regression (SVR), a popular method from the field of statistical machine learning, is used. Several instances of SVR are combined using different data sampling schemes (bagging and boosting). Bagging shows good performance, and proves to be more computationally efficient than training a single SVR model while reducing error. Boosting, however, does not improve results on this specific problem.
Resumo:
Spatial data analysis mapping and visualization is of great importance in various fields: environment, pollution, natural hazards and risks, epidemiology, spatial econometrics, etc. A basic task of spatial mapping is to make predictions based on some empirical data (measurements). A number of state-of-the-art methods can be used for the task: deterministic interpolations, methods of geostatistics: the family of kriging estimators (Deutsch and Journel, 1997), machine learning algorithms such as artificial neural networks (ANN) of different architectures, hybrid ANN-geostatistics models (Kanevski and Maignan, 2004; Kanevski et al., 1996), etc. All the methods mentioned above can be used for solving the problem of spatial data mapping. Environmental empirical data are always contaminated/corrupted by noise, and often with noise of unknown nature. That's one of the reasons why deterministic models can be inconsistent, since they treat the measurements as values of some unknown function that should be interpolated. Kriging estimators treat the measurements as the realization of some spatial randomn process. To obtain the estimation with kriging one has to model the spatial structure of the data: spatial correlation function or (semi-)variogram. This task can be complicated if there is not sufficient number of measurements and variogram is sensitive to outliers and extremes. ANN is a powerful tool, but it also suffers from the number of reasons. of a special type ? multiplayer perceptrons ? are often used as a detrending tool in hybrid (ANN+geostatistics) models (Kanevski and Maignank, 2004). Therefore, development and adaptation of the method that would be nonlinear and robust to noise in measurements, would deal with the small empirical datasets and which has solid mathematical background is of great importance. The present paper deals with such model, based on Statistical Learning Theory (SLT) - Support Vector Regression. SLT is a general mathematical framework devoted to the problem of estimation of the dependencies from empirical data (Hastie et al, 2004; Vapnik, 1998). SLT models for classification - Support Vector Machines - have shown good results on different machine learning tasks. The results of SVM classification of spatial data are also promising (Kanevski et al, 2002). The properties of SVM for regression - Support Vector Regression (SVR) are less studied. First results of the application of SVR for spatial mapping of physical quantities were obtained by the authorsin for mapping of medium porosity (Kanevski et al, 1999), and for mapping of radioactively contaminated territories (Kanevski and Canu, 2000). The present paper is devoted to further understanding of the properties of SVR model for spatial data analysis and mapping. Detailed description of the SVR theory can be found in (Cristianini and Shawe-Taylor, 2000; Smola, 1996) and basic equations for the nonlinear modeling are given in section 2. Section 3 discusses the application of SVR for spatial data mapping on the real case study - soil pollution by Cs137 radionuclide. Section 4 discusses the properties of the modelapplied to noised data or data with outliers.
Resumo:
Peer-reviewed
Resumo:
Kolmen eri hitsausliitoksen väsymisikä arvio on analysoitu monimuuttuja regressio analyysin avulla. Regression perustana on laaja S-N tietokanta joka on kerätty kirjallisuudesta. Tarkastellut liitokset ovat tasalevy liitos, krusiformi liitos ja pitkittäisripa levyssä. Muuttujina ovat jännitysvaihtelu, kuormitetun levyn paksuus ja kuormitus tapa. Paksuus effekti on käsitelty uudelleen kaikkia kolmea liitosta ajatellen. Uudelleen käsittelyn avulla on varmistettu paksuus effektin olemassa olo ennen monimuuttuja regressioon siirtymistä. Lineaariset väsymisikä yhtalöt on ajettu kolmelle hitsausliitokselle ottaen huomioon kuormitetun levyn paksuus sekä kuormitus tapa. Väsymisikä yhtalöitä on verrattu ja keskusteltu testitulosten valossa, jotka on kerätty kirjallisuudesta. Neljä tutkimustaon tehty kerättyjen väsymistestien joukosta ja erilaisia väsymisikä arvio metodeja on käytetty väsymisiän arviointiin. Tuloksia on tarkasteltu ja niistä keskusteltu oikeiden testien valossa. Tutkimuksissa on katsottu 2mm ja 6mm symmetristäpitkittäisripaa levyssä, 12.7mm epäsymmetristä pitkittäisripaa, 38mm symmetristä pitkittäisripaa vääntökuormituksessa ja 25mm/38mm kuorman kantavaa krusiformi liitosta vääntökuormituksessa. Mallinnus on tehty niin lähelle testi liitosta kuin mahdollista. Väsymisikä arviointi metodit sisältävät hot-spot metodin jossa hot-spot jännitys on laskettu kahta lineaarista ja epälineaarista ekstrapolointiakäyttäen sekä paksuuden läpi integrointia käyttäen. Lovijännitys ja murtumismekaniikka metodeja on käytetty krusiformi liitosta laskiessa.
Resumo:
Personal results are presented to illustrate the development of immunoscintigraphy for the detection of cancer over the last 12 years, from the early experimental results in nude mice grafted with human colon carcinoma to the most modern form of immunoscintigraphy applied to patients, using I123 labeled Fab fragments from monoclonal anti-CEA antibodies detected by single photon emission computerized tomography (SPECT). The first generation of immunoscintigraphy used I131 labeled, immunoadsorbent purified, polyclonal anti-CEA antibodies and planar scintigraphy, as the detection system. The second generation used I131 labeled monoclonal anti-CEA antibodies and SPECT, while the third generation employed I123 labeled fragments of monoclonal antibodies and SPECT. The improvement in the precision of tumor images with the most recent forms of immunoscintigraphy is obvious. However, we think the usefulness of immunoscintigraphy for routine cancer management has not yet been entirely demonstrated. Further prospective trials are still necessary to determine the precise clinical role of immunoscintigraphy. A case report is presented on a patient with two liver metastases from a sigmoid carcinoma, who received through the hepatic artery a therapeutic dose (100 mCi) of I131 coupled to 40 mg of a mixture of two high affinity anti-CEA monoclonal antibodies. Excellent localisation in the metastases of the I131 labeled antibodies was demonstrated by SPECT and the treatment was well tolerated. The irradiation dose to the tumor, however, was too low at 4300 rads (with 1075 rads to the normal liver and 88 rads to the bone marrow), and no evidence of tumor regression was obtained. Different approaches for increasing the irradiation dose delivered to the tumor by the antibodies are considered.
Resumo:
PURPOSE: According to estimations around 230 people die as a result of radon exposure in Switzerland. This public health concern makes reliable indoor radon prediction and mapping methods necessary in order to improve risk communication to the public. The aim of this study was to develop an automated method to classify lithological units according to their radon characteristics and to develop mapping and predictive tools in order to improve local radon prediction. METHOD: About 240 000 indoor radon concentration (IRC) measurements in about 150 000 buildings were available for our analysis. The automated classification of lithological units was based on k-medoids clustering via pair-wise Kolmogorov distances between IRC distributions of lithological units. For IRC mapping and prediction we used random forests and Bayesian additive regression trees (BART). RESULTS: The automated classification groups lithological units well in terms of their IRC characteristics. Especially the IRC differences in metamorphic rocks like gneiss are well revealed by this method. The maps produced by random forests soundly represent the regional difference of IRCs in Switzerland and improve the spatial detail compared to existing approaches. We could explain 33% of the variations in IRC data with random forests. Additionally, the influence of a variable evaluated by random forests shows that building characteristics are less important predictors for IRCs than spatial/geological influences. BART could explain 29% of IRC variability and produced maps that indicate the prediction uncertainty. CONCLUSION: Ensemble regression trees are a powerful tool to model and understand the multidimensional influences on IRCs. Automatic clustering of lithological units complements this method by facilitating the interpretation of radon properties of rock types. This study provides an important element for radon risk communication. Future approaches should consider taking into account further variables like soil gas radon measurements as well as more detailed geological information.
Resumo:
The functional method is a new test theory using a new scoring method that assumes complexity in test structure, and thus takes into account every correlation between factors and items. The main specificity of the functional method is to model test scores by multiple regression instead of estimating them by using simplistic sums of points. In order to proceed, the functional method requires the creation of hyperspherical measurement space, in which item responses are expressed by their correlation with orthogonal factors. This method has three main qualities. First, measures are expressed in the absolute metric of correlations; therefore, items, scales and persons are expressed in the same measurement space using the same single metric. Second, factors are systematically orthogonal and without errors, which is optimal in order to predict other outcomes. Such predictions can be performed to estimate how one would answer to other tests, or even to model one's response strategy if it was perfectly coherent. Third, the functional method provides measures of individuals' response validity (i.e., control indices). Herein, we propose a standard procedure in order to identify whether test results are interpretable and to exclude invalid results caused by various response biases based on control indices.
Resumo:
It is well known that regression analyses involving compositional data need special attention because the data are not of full rank. For a regression analysis where both the dependent and independent variable are components we propose a transformation of the components emphasizing their role as dependent and independent variables. A simple linear regression can be performed on the transformed components. The regression line can be depicted in a ternary diagram facilitating the interpretation of the analysis in terms of components. An exemple with time-budgets illustrates the method and the graphical features
Resumo:
The quantitative structure property relationship (QSPR) for the boiling point (Tb) of polychlorinated dibenzo-p-dioxins and polychlorinated dibenzofurans (PCDD/Fs) was investigated. The molecular distance-edge vector (MDEV) index was used as the structural descriptor. The quantitative relationship between the MDEV index and Tb was modeled by using multivariate linear regression (MLR) and artificial neural network (ANN), respectively. Leave-one-out cross validation and external validation were carried out to assess the prediction performance of the models developed. For the MLR method, the prediction root mean square relative error (RMSRE) of leave-one-out cross validation and external validation was 1.77 and 1.23, respectively. For the ANN method, the prediction RMSRE of leave-one-out cross validation and external validation was 1.65 and 1.16, respectively. A quantitative relationship between the MDEV index and Tb of PCDD/Fs was demonstrated. Both MLR and ANN are practicable for modeling this relationship. The MLR model and ANN model developed can be used to predict the Tb of PCDD/Fs. Thus, the Tb of each PCDD/F was predicted by the developed models.