824 results for mathematical regression
Abstract:
Two field experiments were conducted to evaluate the effects of multispecies weed competition on wheat grain yield and to determine the weeds' economic threshold in the crop. The experiments were conducted in 2002, on two sites in Iran: at the Agricultural Research Station of Ferdowsi University of Mashhad (E1) and on the fields of Shirvan's Agricultural College (E2). A 15 x 50 m area of a 15 ha wheat field in E1 and a 15 x 50 m area of a 28 ha wheat field in E2 were selected as experimental sites. These areas were managed like other parts of the fields, except that no herbicides were applied. At the beginning of the shooting stage, 30 points were randomly selected by dropping a 50 x 50 cm square marker at each site. The weeds present in E1 were: Avena ludoviciana, Chenopodium album, Solanum nigrum, Stellaria holostea, Convolvulus spp., Fumaria spp., Sonchus spp., and Polygonum aviculare. In E2 the weeds were A. ludoviciana, Erysimum sp., P. aviculare, Rapistrum rugosum, C. album, Salsola kali, and Sonchus sp. The data obtained within the sampled squares were fitted with regression equations, and weed densities were expressed in terms of TCL (Total Competitive Load). The regression model indicated that only A. ludoviciana, Convolvulus spp., and C. album in E1, and A. ludoviciana, S. kali, and R. rugosum in E2, had a significant effect on wheat yield reduction. Weed economic thresholds were 5.23 TCL in E1 and 6.16 TCL in E2, which were equivalent to 5 plants m-2 of A. ludoviciana, 12 plants m-2 of Convolvulus spp., or 19 plants m-2 of C. album in E1; and 6 plants m-2 of A. ludoviciana, 13 plants m-2 of S. kali, or 27 plants m-2 of R. rugosum in E2. Simulations of economic weed thresholds using several wheat grain prices and weed control costs allowed a better comparison of the experiments, suggesting that the lower weed competitive ability at E1 was due to the crop being more competitive there than at E2.
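As a hedged illustration of the threshold idea above (the abstract does not state the form of the fitted yield-loss model), the following Python sketch assumes a linear yield-loss regression and invented prices, costs and coefficients:

    # Hedged sketch: deriving an economic weed threshold from a fitted
    # yield-loss regression. Functional form, coefficient, prices and costs
    # are illustrative assumptions, not values from the study.
    b = 0.012                  # assumed fractional yield loss per unit TCL
    yield_weed_free = 4000.0   # kg/ha, assumed weed-free wheat yield
    grain_price = 0.20         # $/kg, assumed
    control_cost = 50.0        # $/ha, assumed herbicide + application cost
    efficacy = 0.95            # assumed fraction of weed load removed

    # Economic threshold: the TCL at which the value of the yield saved by
    # control just pays for the control itself.
    threshold_tcl = control_cost / (grain_price * yield_weed_free * b * efficacy)
    print(f"economic threshold ~ {threshold_tcl:.2f} TCL")

    # Converting TCL to a single-species density needs that species'
    # competitive index; e.g. if one A. ludoviciana plant = 1.0 TCL unit:
    print(f"~ {threshold_tcl / 1.0:.0f} A. ludoviciana plants per m^2")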
Abstract:
Relationships between surface sediment diatom assemblages and lake trophic status were studied in 50 Canadian Precambrian Shield lakes in the Muskoka-Haliburton and southern Ontario regions. The purpose of this study was to develop mathematical regression models to infer lake trophic status from diatom assemblage data. To achieve this goal, however, additional investigations dealing with the evaluation of lake trophic status and the autecological features of key diatom species were carried out. Because a unifying index and classification for lake trophic status was not available, a new multiple index was developed in this study from the physical, chemical and biological data of 85 southern Ontario lakes. Using the new trophic parameter, the lake trophic level (TL) was determined as TL = 1.37 ln[1 + (TP x Chl-a / SD)], where TP = total phosphorus, Chl-a = chlorophyll-a and SD = Secchi depth. The boundaries between 7 lake trophic categories (Ultra-oligotrophic lakes: 0-0.24; Oligotrophic lakes: 0.241-1.8; Oligomesotrophic lakes: 1.81-3.0; Mesotrophic lakes: 3.01-4.20; Mesoeutrophic lakes: 4.21-5.4; Eutrophic lakes: 5.41-10; and Hyper-eutrophic lakes: above 10) were established. The new trophic parameter was more convenient for management of water quality, communication to the public and comparison with other lake trophic status indices than many of the previously published indices, because the TL index attempts to increase understanding of the characteristics of lakes and their comprehensive trophic states. It is more reasonable and clear as a unifying determination of the true trophic states of lakes. Diatom species autecology analysis was central to this thesis. However, the autecological relationship between diatom species and lake trophic status had not previously been well documented. Based on the investigation of the diatom composition and variability of species abundance in 30 study lakes, the distribution optima of diatom species were determined. These determinations were based on a quantitative method called "weighted average" (Charles 1985). On this basis, the diatom species were classified into five trophic categories (oligotrophic, oligomesotrophic, mesotrophic, mesoeutrophic and eutrophic species groups). The resulting diatom trophic status autecological features were used in the regression analysis between diatom assemblages and lake trophic status. When the TL trophic level values of the 30 lakes were regressed against their five corresponding diatom trophic groups, two mathematical equations expressing the assumed linear relationship with the diatom assemblage composition were determined: (1) using a single regression technique, TL = 2.643 - 7.575 log(Index D) (r = 0.88, r2 = 0.77, P = 0.0001, n = 30), where Index D = (O% + OM% + M%)/(E% + ME% + M%); and (2) using a multiple regression technique, TL = 4.285 - 0.076 O% - 0.055 OM% - 0.026 M% + 0.033 ME% + 0.065 E% (r = 0.89, r2 = 0.792, P = 0.0001, n = 30). When both models were applied to another 20 test lakes, there was a significant correlation between measured and diatom-inferred trophic levels by both the single and multiple regression methods (P < 0.0001, n = 20). Their correlation coefficients (r2) were also statistically significant (r2 > 0.68, n = 20). As such, the two transfer function models between diatoms and lake trophic status were validated: the two models were developed using one group of lakes and then tested using an entirely different group of lakes.
This study indicated that diatom assemblages are sensitive to lake trophic status. As indicators of lake trophic status, diatoms are especially useful in situations where no local trophic information is available and in studies of the paleotrophic history of lakes. Diatom autecological information was used to develop a theory for assessing water quality and lake trophic status.
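A hedged Python sketch of the two quoted formulas, implemented directly from the abstract with invented input values (units as in the study, which the abstract does not restate):

    # Hedged sketch: the TL trophic index and the single-regression diatom
    # transfer function quoted in the abstract. Input values are invented.
    import math

    def trophic_level(tp, chl_a, sd):
        """TL = 1.37 ln[1 + (TP * Chl-a / SD)] as given in the abstract.
        tp: total phosphorus, chl_a: chlorophyll-a, sd: Secchi depth."""
        return 1.37 * math.log(1.0 + tp * chl_a / sd)

    def tl_from_diatoms(o, om, m, me, e):
        """TL = 2.643 - 7.575 log10(Index D), with
        Index D = (O% + OM% + M%) / (E% + ME% + M%)."""
        index_d = (o + om + m) / (e + me + m)
        return 2.643 - 7.575 * math.log10(index_d)

    # Illustrative lake: TP = 12, Chl-a = 3, Secchi depth = 4 m.
    print(f"TL = {trophic_level(12, 3, 4):.2f}")
    # Illustrative diatom percentages in the order O, OM, M, ME, E.
    print(f"diatom-inferred TL = {tl_from_diatoms(35, 20, 15, 15, 10):.2f}")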
Abstract:
Objectives. This paper seeks to assess the effect of regression model misspecification on statistical power in a variety of situations. Methods and results. The effect of misspecification in regression can be approximated by evaluating the correlation between the correct specification and the misspecification of the outcome variable (Harris 2010). In this paper, three misspecified models (linear, categorical and fractional polynomial) were considered. In the first section, the mathematical method of calculating the correlation between correct and misspecified models with simple mathematical forms was derived and demonstrated. In the second section, data from the National Health and Nutrition Examination Survey (NHANES 2007-2008) were used to examine such correlations. Our study shows that, compared to linear or categorical models, the fractional polynomial models, with their higher correlations, provided a better approximation of the true relationship, as illustrated by LOESS regression. In the third section, we present the results of simulation studies demonstrating that misspecification in regression can produce marked decreases in power with small sample sizes. However, the categorical model had the greatest power, ranging from 0.877 to 0.936 depending on sample size and outcome variable used. The power of the fractional polynomial model was close to that of the linear model, ranging from 0.69 to 0.83, and appeared to be affected by the increased degrees of freedom of this model. Conclusion. Correlations between alternative model specifications can be used to provide a good approximation of the effect of misspecification on statistical power when the sample size is large. When model specifications have known simple mathematical forms, such correlations can be calculated mathematically. Actual public health data from NHANES 2007-2008 were used as examples to demonstrate situations with an unknown or complex correct model specification. Simulation of power for misspecified models confirmed the results based on correlation methods, and also illustrated the effect of model degrees of freedom on power.
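In the same spirit as the third section, a hedged simulation sketch (the paper's NHANES variables and fractional polynomial fits are not reproduced): a quadratic truth is fitted with misspecified linear and categorical models, and power is estimated as the rejection rate.

    # Hedged simulation sketch: power under misspecification. The truth is
    # quadratic; we fit a (misspecified) linear model and a categorical
    # (quartile) model and estimate power as the rejection rate. All
    # settings are invented, not the paper's NHANES setup.
    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(0)

    def power(n, fit_pvalue, n_sims=1000, alpha=0.05):
        rejections = 0
        for _ in range(n_sims):
            x = rng.uniform(0, 1, n)
            y = 1.0 + 2.0 * (x - 0.5) ** 2 + rng.normal(0, 0.5, n)  # quadratic truth
            rejections += fit_pvalue(x, y) < alpha
        return rejections / n_sims

    def linear_p(x, y):
        # The symmetric quadratic truth leaves the linear fit nearly powerless.
        return stats.linregress(x, y).pvalue

    def categorical_p(x, y):
        # Quartile categories pick up the U-shape that the line misses.
        q = np.quantile(x, [0.25, 0.5, 0.75])
        groups = [y[np.digitize(x, q) == k] for k in range(4)]
        return stats.f_oneway(*groups).pvalue

    for n in (50, 200):
        print(n, power(n, linear_p), power(n, categorical_p))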
Abstract:
The aim of this research work was primarily to examine the relevance of patient parameters, ward structures, procedures and practices, in respect of the potential hazards of wound cross-infection and nasal colonisation with multiple resistant strains of Staphylococcus aureus, which it is thought might provide a useful indication of a patient's general susceptibility to wound infection. Information from a large cross-sectional survey involving 12,000 patients from some 41 hospitals and 375 wards was collected over a five-year period from 1967-72, and its validity checked before any subsequent analysis was carried out. Many environmental factors and procedures which had previously been thought (but never conclusively proved) to have an influence on wound infection or nasal colonisation rates, were assessed, and subsequently dismissed as not being significant, provided that the standard of the current range of practices and procedures is maintained and not allowed to deteriorate. Retrospective analysis revealed that the probability of wound infection was influenced by the patient's age, duration of pre-operative hospitalisation, sex, type of wound, presence and type of drain, number of patients in ward, and other special risk factors, whilst nasal colonisation was found to be influenced by the patient's age, total duration of hospitalisation, sex, antibiotics, proportion of occupied beds in the ward, average distance between bed centres and special risk factors. A multi-variate regression analysis technique was used to develop statistical models, consisting of variable patient and environmental factors which were found to have a significant influence on the risks pertaining to wound infection and nasal colonisation. A relationship between wound infection and nasal colonisation was then established and this led to the development of a more advanced model for predicting wound infections, taking advantage of the additional knowledge of the patient's state of nasal colonisation prior to operation.
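The abstract does not give the study's statistical models; as a hedged stand-in, the sketch below fits a generic logistic regression for wound-infection probability on invented factors of the kind the study lists:

    # Hedged sketch: a generic logistic risk model for wound infection on
    # invented patient/ward factors. The coefficients, factors and data are
    # illustrative assumptions, not the study's models.
    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(1)
    n = 500
    age = rng.uniform(18, 90, n)
    preop_days = rng.poisson(3, n)             # pre-operative stay (days)
    drain = rng.integers(0, 2, n)              # wound drain present (0/1)
    ward_occupancy = rng.uniform(0.5, 1.0, n)  # proportion of beds occupied

    # Invented "true" risk, used only to generate example outcomes.
    logit = -6 + 0.03 * age + 0.15 * preop_days + 0.8 * drain + 2.0 * ward_occupancy
    infected = rng.binomial(1, 1 / (1 + np.exp(-logit)))

    X = sm.add_constant(np.column_stack([age, preop_days, drain, ward_occupancy]))
    model = sm.Logit(infected, X).fit(disp=0)
    print(model.params)  # fitted coefficients approximate the invented ones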
Abstract:
Adaptability and invisibility are hallmarks of modern terrorism, and keeping pace with its dynamic nature presents a serious challenge for societies throughout the world. Innovations in computer science have incorporated applied mathematics to develop a wide array of predictive models to support the variety of approaches to counterterrorism. Predictive models are usually designed to forecast the location of attacks. Although this may protect individual structures or locations, it does not reduce the threat; it merely changes the target. While predictive models dedicated to events or social relationships receive much attention where the mathematical and social science communities intersect, models dedicated to terrorist locations such as safe-houses (rather than their targets or training sites) are rare and possibly nonexistent. At the time of this research, there were no publicly available models designed to predict locations where violent extremists are likely to reside. This research uses France as a case study to present a complex systems model that incorporates multiple quantitative, qualitative and geospatial variables that differ in terms of scale, weight, and type. Though many of these variables are recognized by specialists in security studies, there remains controversy with respect to their relative importance, degree of interaction, and interdependence. Additionally, some of the variables proposed in this research are not generally recognized as drivers, yet they warrant examination based on their potential role within a complex system. This research tested multiple regression models and determined that geographically weighted regression analysis produced the most accurate results, as it accommodates non-stationary coefficient behavior, demonstrating that geographic variables are critical to understanding and predicting the phenomenon of terrorism. This dissertation presents a flexible prototypical model that can be refined and applied to other regions to inform stakeholders such as policy-makers and law enforcement in their efforts to improve national security and enhance quality-of-life.
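A hedged sketch of the core of geographically weighted regression, local weighted least squares with a distance kernel, on invented coordinates and a single covariate (the dissertation's variables are not reproduced):

    # Hedged sketch: geographically weighted regression (GWR) as local
    # weighted least squares with a Gaussian distance kernel. Coordinates,
    # covariate and bandwidth are invented.
    import numpy as np

    rng = np.random.default_rng(2)
    n = 300
    coords = rng.uniform(0, 100, (n, 2))      # site locations (km)
    x = rng.normal(size=(n, 1))               # one standardized covariate
    # Invented spatially varying coefficient: stronger effect in the east.
    beta_true = 0.5 + 0.02 * coords[:, 0:1]
    y = (x * beta_true).ravel() + rng.normal(0, 0.3, n)

    X = np.column_stack([np.ones(n), x])      # intercept + covariate

    def gwr_coeffs(site, bandwidth=20.0):
        """Weighted least squares at one location with Gaussian weights."""
        d = np.linalg.norm(coords - site, axis=1)
        w = np.exp(-0.5 * (d / bandwidth) ** 2)
        W = np.diag(w)
        return np.linalg.solve(X.T @ W @ X, X.T @ W @ y)

    # Local slope should drift from ~0.6 in the west to ~2.4 in the east.
    for east in (5.0, 50.0, 95.0):
        print(east, gwr_coeffs(np.array([east, 50.0]))[1].round(2))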
Abstract:
In this paper, we consider testing for additivity in a class of nonparametric stochastic regression models. Two test statistics are constructed and their asymptotic distributions are established. We also conduct a small sample study for one of the test statistics through a simulated example.
Abstract:
We consider a mixture model approach to the regression analysis of competing-risks data. Attention is focused on inference concerning the effects of factors on both the probability of occurrence and the hazard rate conditional on each of the failure types. These two quantities are specified in the mixture model using the logistic model and the proportional hazards model, respectively. We propose a semi-parametric mixture method to estimate the logistic and regression coefficients jointly, whereby the component-baseline hazard functions are completely unspecified. Estimation is by maximum likelihood on the basis of the full likelihood, implemented via an expectation-conditional maximization (ECM) algorithm. Simulation studies are performed to compare the performance of the proposed semi-parametric method with a fully parametric mixture approach. The results show that when the component-baseline hazard is monotonic increasing, the semi-parametric and fully parametric mixture approaches are comparable for mildly and moderately censored samples. When the component-baseline hazard is not monotonic increasing, the semi-parametric method consistently provides less biased estimates than the fully parametric approach and is comparable in efficiency in the estimation of the parameters for all levels of censoring. The methods are illustrated using a real data set of prostate cancer patients treated with different dosages of the drug diethylstilbestrol.
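A hedged, fully parametric miniature of the mixture formulation (logistic mixing probabilities, exponential component hazards); the paper's semi-parametric ECM machinery is not reproduced, and all data are invented:

    # Hedged sketch: a fully parametric mixture model for two competing
    # failure types. Observed failures contribute pi_j(x) * f_j(t); censored
    # observations contribute pi_1 * S_1(t) + pi_2 * S_2(t).
    import numpy as np
    from scipy.optimize import minimize

    rng = np.random.default_rng(3)
    n = 400
    x = rng.normal(size=n)                     # one covariate (e.g. dose)
    p1 = 1 / (1 + np.exp(-(0.5 + 1.0 * x)))   # true P(type 1 | x)
    cause = rng.binomial(1, p1)                # 1 = type 1, 0 = type 2
    rate = np.where(cause == 1, np.exp(0.2 + 0.5 * x), np.exp(-0.3))
    t = rng.exponential(1 / rate)
    c = rng.exponential(2.0, n)                # censoring times
    time, event = np.minimum(t, c), (t <= c).astype(int)

    def neg_loglik(theta):
        a0, a1, b0, b1, g0 = theta
        pi1 = 1 / (1 + np.exp(-(a0 + a1 * x)))          # mixing probability
        lam1, lam2 = np.exp(b0 + b1 * x), np.exp(g0)    # exponential hazards
        f1, f2 = lam1 * np.exp(-lam1 * time), lam2 * np.exp(-lam2 * time)
        s1, s2 = np.exp(-lam1 * time), np.exp(-lam2 * time)
        dens = np.where(cause == 1, pi1 * f1, (1 - pi1) * f2)
        surv = pi1 * s1 + (1 - pi1) * s2                # either type pending
        ll = np.where(event == 1, np.log(dens), np.log(surv))
        return -ll.sum()

    fit = minimize(neg_loglik, np.zeros(5), method="Nelder-Mead",
                   options={"maxiter": 4000})
    print(fit.x.round(2))  # estimates of (a0, a1, b0, b1, g0)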
Abstract:
Uncertainty quantification of petroleum reservoir models is one of the present challenges, which is usually approached with a wide range of geostatistical tools linked with statistical optimisation and/or inference algorithms. Recent advances in machine learning offer a novel approach to modelling the spatial distribution of petrophysical properties in complex reservoirs, an alternative to geostatistics. The approach is based on semi-supervised learning, which handles both 'labelled' observed data and 'unlabelled' data, which have no measured value but describe prior knowledge and other relevant data in the form of manifolds in the input space where the modelled property is continuous. The proposed semi-supervised Support Vector Regression (SVR) model has demonstrated its capability to represent realistic geological features and describe the stochastic variability and non-uniqueness of spatial properties. On the other hand, it is able to capture and preserve key spatial dependencies such as the connectivity of high-permeability geo-bodies, which is often difficult in contemporary petroleum reservoir studies. Semi-supervised SVR, as a data-driven algorithm, is designed to integrate various kinds of conditioning information and learn dependencies from them. The semi-supervised SVR model is able to balance signal/noise levels and control the prior belief in the available data. In this work, the stochastic semi-supervised SVR geomodel is integrated into a Bayesian framework to quantify the uncertainty of reservoir production with multiple models fitted to past dynamic observations (production history). Multiple history-matched models are obtained using stochastic sampling and/or MCMC-based inference algorithms, which evaluate the posterior probability distribution. Uncertainty of the model is described by the posterior probability of the model parameters that represent key geological properties: spatial correlation size, continuity strength, and smoothness/variability of the spatial property distribution. The developed approach is illustrated with a fluvial reservoir case. The resulting probabilistic production forecasts are described by uncertainty envelopes. The paper compares the performance of models with different combinations of unknown parameters and discusses sensitivity issues.
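As a hedged illustration of the Bayesian uncertainty step (not the paper's SVR geomodel or reservoir simulator), the sketch below runs a Metropolis sampler over one invented geomodel parameter against synthetic production history:

    # Hedged sketch: Metropolis sampling of a posterior over one parameter
    # (a spatial correlation length) given noisy production history. The
    # forward model is a toy stand-in for a reservoir simulator.
    import numpy as np

    rng = np.random.default_rng(10)

    def forward(corr_len, tt):
        """Toy decline curve whose rate depends on the parameter."""
        return 100.0 * np.exp(-tt / (5.0 + corr_len))

    tt = np.linspace(0, 10, 20)
    observed = forward(8.0, tt) + rng.normal(0, 2.0, tt.size)  # "history"

    def log_post(corr_len):
        if not 0.0 < corr_len < 50.0:              # uniform prior bounds
            return -np.inf
        resid = observed - forward(corr_len, tt)
        return -0.5 * np.sum((resid / 2.0) ** 2)   # Gaussian likelihood

    chain, cur = [], 20.0
    for _ in range(5000):
        prop = cur + rng.normal(0, 1.0)
        if np.log(rng.uniform()) < log_post(prop) - log_post(cur):
            cur = prop
        chain.append(cur)

    post = np.array(chain[1000:])                   # drop burn-in
    print(f"posterior corr. length: {post.mean():.1f} +/- {post.std():.1f}")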
Abstract:
In CoDaWork'05, we presented an application of discriminant function analysis (DFA) to 4 different compositional datasets and modelled the first canonical variable using a segmented regression model based solely on an observation about the scatter plots. In this paper, multiple linear regressions are applied to different datasets to confirm the validity of our proposed model. In addition to dating the unknown tephras by calibration as discussed previously, another method is proposed: mapping the unknown tephras onto samples of the reference set, or onto missing samples in between consecutive reference samples. The application of these methodologies is demonstrated with both simulated and real datasets. This newly proposed methodology provides an alternative, more acceptable approach for geologists, as their focus is on matching the unknown tephra with relevant eruptive events rather than estimating the age of the unknown tephra. Key words: Tephrochronology; Segmented regression
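A hedged sketch of segmented regression itself (the tephra data and DFA scores are not reproduced): a continuous two-segment line fitted by least squares, with the breakpoint found by grid search.

    # Hedged sketch: broken-stick (two-segment) regression on simulated
    # data with a true breakpoint at x = 4.
    import numpy as np

    rng = np.random.default_rng(4)
    x = np.sort(rng.uniform(0, 10, 120))
    y = np.where(x < 4, 1.0 + 0.5 * x, 3.0 + 2.0 * (x - 4)) + rng.normal(0, 0.3, 120)

    def fit_segments(bp):
        """Least squares for a continuous two-segment line with breakpoint bp."""
        X = np.column_stack([np.ones_like(x), x, np.clip(x - bp, 0, None)])
        coef, res, *_ = np.linalg.lstsq(X, y, rcond=None)
        return coef, float(res[0])

    candidates = np.linspace(1, 9, 161)
    sse = [fit_segments(bp)[1] for bp in candidates]
    best = candidates[int(np.argmin(sse))]
    print(f"estimated breakpoint ~ {best:.2f}")  # true value is 4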
Abstract:
Spatial data analysis, mapping and visualization are of great importance in various fields: environment, pollution, natural hazards and risks, epidemiology, spatial econometrics, etc. A basic task of spatial mapping is to make predictions based on some empirical data (measurements). A number of state-of-the-art methods can be used for the task: deterministic interpolations; methods of geostatistics, i.e. the family of kriging estimators (Deutsch and Journel, 1997); machine learning algorithms such as artificial neural networks (ANN) of different architectures; hybrid ANN-geostatistics models (Kanevski and Maignan, 2004; Kanevski et al., 1996); etc. All the methods mentioned above can be used for solving the problem of spatial data mapping. Environmental empirical data are always contaminated/corrupted by noise, often of unknown nature. That is one of the reasons why deterministic models can be inconsistent, since they treat the measurements as values of some unknown function that should be interpolated. Kriging estimators treat the measurements as the realization of some spatial random process. To obtain an estimate with kriging, one has to model the spatial structure of the data: the spatial correlation function or (semi-)variogram. This task can be complicated if there is not a sufficient number of measurements, and the variogram is sensitive to outliers and extremes. ANNs are a powerful tool, but they also suffer from a number of drawbacks. ANNs of a special type, multilayer perceptrons, are often used as a detrending tool in hybrid (ANN + geostatistics) models (Kanevski and Maignan, 2004). Therefore, the development and adaptation of a method that is nonlinear and robust to noise in measurements, can deal with small empirical datasets, and has a solid mathematical background is of great importance. The present paper deals with such a model, based on Statistical Learning Theory (SLT): Support Vector Regression. SLT is a general mathematical framework devoted to the problem of estimating dependencies from empirical data (Hastie et al., 2004; Vapnik, 1998). SLT models for classification, Support Vector Machines, have shown good results on different machine learning tasks. The results of SVM classification of spatial data are also promising (Kanevski et al., 2002). The properties of SVM for regression, Support Vector Regression (SVR), are less studied. First results of the application of SVR for spatial mapping of physical quantities were obtained by the authors for mapping of medium porosity (Kanevski et al., 1999) and for mapping of radioactively contaminated territories (Kanevski and Canu, 2000). The present paper is devoted to further understanding of the properties of the SVR model for spatial data analysis and mapping. A detailed description of SVR theory can be found in (Cristianini and Shawe-Taylor, 2000; Smola, 1996), and the basic equations for nonlinear modelling are given in Section 2. Section 3 discusses the application of SVR to spatial data mapping on a real case study: soil pollution by the Cs137 radionuclide. Section 4 discusses the properties of the model applied to noisy data or data with outliers.
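A hedged sketch of SVR as a spatial interpolator on a synthetic two-dimensional field (the Cs137 case-study data are not used); the hyperparameters are illustrative:

    # Hedged sketch: SVR interpolating a noisy synthetic spatial field.
    import numpy as np
    from sklearn.svm import SVR
    from sklearn.preprocessing import StandardScaler
    from sklearn.pipeline import make_pipeline

    rng = np.random.default_rng(5)
    pts = rng.uniform(0, 10, (200, 2))              # measurement locations
    field = np.sin(pts[:, 0]) * np.cos(pts[:, 1])   # smooth "pollution" field
    z = field + rng.normal(0, 0.1, 200)             # noisy measurements

    # epsilon controls noise tolerance, C the smoothness/fit trade-off;
    # in practice both are tuned (e.g. by cross-validation).
    model = make_pipeline(StandardScaler(), SVR(kernel="rbf", C=10.0, epsilon=0.1))
    model.fit(pts, z)

    grid = np.array([[2.0, 3.0], [7.5, 1.0]])       # prediction locations
    print(model.predict(grid).round(2))
    print("truth:", (np.sin(grid[:, 0]) * np.cos(grid[:, 1])).round(2))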
Abstract:
It is well known that regression analyses involving compositional data need special attention because the data are not of full rank. For a regression analysis where both the dependent and independent variables are components, we propose a transformation of the components emphasizing their role as dependent and independent variables. A simple linear regression can be performed on the transformed components. The regression line can be depicted in a ternary diagram, facilitating the interpretation of the analysis in terms of components. An example with time-budgets illustrates the method and the graphical features.
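The paper's specific transformation is not given in the abstract; as a hedged stand-in, the sketch below regresses one composition on another after generic additive log-ratio (alr) transforms:

    # Hedged sketch: regression between compositions via alr coordinates.
    # The compositions and coefficients are invented.
    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(6)
    n = 80
    # Invented independent composition (3 parts summing to 1).
    u = rng.dirichlet([4, 3, 2], n)
    alr_u = np.log(u[:, 0] / u[:, 2])          # one alr coordinate

    # Invented dependent two-part composition driven by the independent one.
    eta = 0.2 + 0.8 * alr_u + rng.normal(0, 0.2, n)
    v = np.column_stack([np.exp(eta), np.ones(n)])
    v = v / v.sum(axis=1, keepdims=True)       # closure to sum 1
    alr_v = np.log(v[:, 0] / v[:, 1])          # equals eta by construction

    fit = stats.linregress(alr_u, alr_v)
    print(f"slope = {fit.slope:.2f}, intercept = {fit.intercept:.2f}")  # ~0.8, ~0.2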
Abstract:
The broiler rectal temperature (t rectal) is one of the most important physiological responses used to classify animal thermal comfort. Therefore, the aim of this study was to adjust regression models in order to predict the rectal temperature (t rectal) of broiler chickens under different thermal conditions based on age (A) and a meteorological variable (air temperature, t air), a thermal comfort index (temperature and humidity index, THI, or black globe humidity index, BGHI), or a physical quantity, enthalpy (H). In addition, through the inversion of these models and the expected t rectal intervals for each age, the comfort limits of t air, THI, BGHI and H for the chicks in the heating phase were determined, aiding in the validation of the equations and establishing preliminary limits for H. The experimental data used to adjust the mathematical models were collected in two commercial poultry farms, with Cobb chicks, from 1 to 14 days of age. By applying the four adjusted models, it was possible to predict t rectal satisfactorily, to determine the lower and upper comfort thresholds of broilers from the expected t rectal, and to invert the models to predict the environmental H for the chicks' first 14 days of life.
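A hedged sketch of the fit-then-invert idea on invented data (the study's coefficients and comfort bands are not reproduced):

    # Hedged sketch: fit t_rectal against age and a thermal index, then
    # invert the fit to get comfort limits of the index from a target
    # t_rectal band. Coefficients, data and the comfort band are invented.
    import numpy as np

    rng = np.random.default_rng(7)
    n = 200
    age = rng.uniform(1, 14, n)                # days
    thi = rng.uniform(60, 85, n)               # temperature-humidity index
    t_rectal = 39.0 + 0.04 * age + 0.03 * (thi - 70) + rng.normal(0, 0.1, n)

    # Ordinary least squares: t_rectal = b0 + b1*age + b2*THI.
    X = np.column_stack([np.ones(n), age, thi])
    b0, b1, b2 = np.linalg.lstsq(X, t_rectal, rcond=None)[0]

    # Inversion: for a given age, the THI band keeping t_rectal inside an
    # assumed comfort interval (39.4-39.8 degrees C for 7-day-old chicks).
    age_q = 7.0
    low = (39.4 - b0 - b1 * age_q) / b2
    high = (39.8 - b0 - b1 * age_q) / b2
    print(f"comfort THI at day {age_q:.0f}: {low:.1f} to {high:.1f}")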
Abstract:
Based on experimental tests, the equations for drying, equilibrium moisture content, the latent heat of vaporization of the water contained in the product, and the specific heat of cassava starch pellets were obtained; these are essential parameters for the modeling and mathematical simulation of the mechanical drying of cassava starch under a newly proposed technique, consisting of preforming the starch by pelleting and subsequently drying the pellets artificially. Drying tests were conducted in an experimental chamber by varying the air temperature, relative humidity, air velocity and product load. The specific heat of the starch was determined by differential scanning calorimetry. The generated equations were validated through regression analysis, which found an appropriate correlation with the data, indicating that these equations can be used to accurately model and simulate the drying process of cassava starch pellets.
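A hedged sketch of the validation step: fitting a thin-layer drying equation by nonlinear regression. The Page model and the data points are stand-ins, not the study's equations.

    # Hedged sketch: nonlinear least-squares fit of the Page drying model
    # MR = exp(-k * t**n) to invented moisture-ratio data.
    import numpy as np
    from scipy.optimize import curve_fit

    def page(t, k, n):
        """Page model: moisture ratio MR = exp(-k * t**n)."""
        return np.exp(-k * t**n)

    # Invented drying data: time (h) and moisture ratio (dimensionless).
    t = np.array([0.5, 1.0, 2.0, 3.0, 4.0, 6.0, 8.0])
    mr = np.array([0.78, 0.62, 0.41, 0.28, 0.20, 0.10, 0.05])

    (k, n_exp), _ = curve_fit(page, t, mr, p0=[0.3, 1.0])
    pred = page(t, k, n_exp)
    r2 = 1 - np.sum((mr - pred) ** 2) / np.sum((mr - mr.mean()) ** 2)
    print(f"k = {k:.3f}, n = {n_exp:.3f}, R^2 = {r2:.4f}")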
Abstract:
Singular Value Decomposition (SVD), Principal Component Analysis (PCA) and Multiple Linear Regression (MLR) are some of the mathematical preliminaries that are discussed prior to explaining the PLS and PCR models. Both PLS and PCR are applied to real spectral data, and their differences and similarities are discussed in this thesis. The challenge lies in establishing the optimum number of components to be included in either of the models, but this has been overcome by using various diagnostic tools suggested in this thesis. Correspondence analysis (CA) and PLS were applied to ecological data. The idea of CA was to correlate the macrophyte species and lakes. The differences between the PLS model for ecological data and PLS for spectral data are noted and explained in this thesis.
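A hedged side-by-side of PCR and PLS on synthetic collinear "spectra" (the thesis's real spectral and ecological data are not reproduced); cross-validated R^2 is one diagnostic for choosing the number of components:

    # Hedged sketch: PCR (unsupervised compression, then regression) versus
    # PLS (components chosen to covary with y) on synthetic collinear data.
    import numpy as np
    from sklearn.cross_decomposition import PLSRegression
    from sklearn.decomposition import PCA
    from sklearn.linear_model import LinearRegression
    from sklearn.pipeline import make_pipeline
    from sklearn.model_selection import cross_val_score

    rng = np.random.default_rng(8)
    n, p = 100, 60                     # 60 collinear "wavelengths"
    latent = rng.normal(size=(n, 2))
    X = latent @ rng.normal(size=(2, p)) + 0.1 * rng.normal(size=(n, p))
    y = latent[:, 0] - 0.5 * latent[:, 1] + 0.1 * rng.normal(size=n)

    for k in (1, 2, 3):
        pcr = make_pipeline(PCA(n_components=k), LinearRegression())
        pls = PLSRegression(n_components=k)
        print(k,
              cross_val_score(pcr, X, y, cv=5).mean().round(3),
              cross_val_score(pls, X, y, cv=5).mean().round(3))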
Abstract:
Several methods are used to estimate the anaerobic threshold (AT) during exercise. The aim of the present study was to compare the AT obtained by a graphic visual method for the estimate of ventilatory and metabolic variables (gold standard) with a bi-segmental linear regression mathematical model based on Hinkley's algorithm applied to heart rate (HR) and carbon dioxide output (VCO2) data. Thirteen young (24 ± 2.63 years old) and 16 postmenopausal (57 ± 4.79 years old) healthy and sedentary women were submitted to a continuous ergospirometric incremental test on an electromagnetically braked cycloergometer with 10 to 20 W/min increases until physical exhaustion. The ventilatory variables were recorded breath-to-breath and HR was obtained beat-to-beat in real time. Data were analyzed by the nonparametric Friedman test and the Spearman correlation test with the level of significance set at 5%. Power output (W), HR (bpm), oxygen uptake (VO2; mL kg-1 min-1), VO2 (mL/min), VCO2 (mL/min), and minute ventilation (VE; L/min) data observed at the AT level were similar for both methods and groups studied (P > 0.05). The VO2 (mL kg-1 min-1) data showed a significant correlation (P < 0.05) between the gold standard method and the mathematical model when applied to HR (rs = 0.75) and VCO2 (rs = 0.78) data for the subjects as a whole (N = 29). The proposed mathematical method for the detection of changes in response patterns of VCO2 and HR was adequate and promising for AT detection in young and middle-aged women, representing a semi-automatic, non-invasive and objective AT measurement.
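A hedged sketch in the spirit of the bi-segmental model (Hinkley-type breakpoint estimation is simplified here to a least-squares grid search): locating the breakpoint, taken as the threshold, in a simulated VCO2-versus-power ramp.

    # Hedged sketch: bi-segmental least squares. Two separate lines are
    # fitted on either side of each candidate breakpoint; the breakpoint
    # minimizing total SSE estimates the threshold. Data are simulated.
    import numpy as np

    rng = np.random.default_rng(9)
    power = np.arange(20, 201, 5).astype(float)          # watts
    vco2 = np.where(power < 120, 8.0 * power, 8.0 * 120 + 14.0 * (power - 120))
    vco2 = vco2 + rng.normal(0, 40, power.size)          # mL/min, noisy

    def sse_two_lines(bp):
        """Total SSE of separate lines fitted left and right of bp."""
        total = 0.0
        for mask in (power < bp, power >= bp):
            A = np.column_stack([np.ones(mask.sum()), power[mask]])
            _, res, *_ = np.linalg.lstsq(A, vco2[mask], rcond=None)
            total += float(res[0])
        return total

    candidates = power[3:-3]             # keep >= 3 points in each segment
    at = candidates[int(np.argmin([sse_two_lines(bp) for bp in candidates]))]
    print(f"estimated threshold ~ {at:.0f} W")  # simulated breakpoint: 120 W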