882 results for non-linear regression
Abstract:
Correlation and regression are two of the statistical procedures most widely used by optometrists. However, these tests are often misused or interpreted incorrectly, leading to erroneous conclusions from clinical experiments. This review examines the major statistical tests concerned with correlation and regression that are most likely to arise in clinical investigations in optometry. First, the use, interpretation and limitations of Pearson's product moment correlation coefficient are described. Second, the least squares method of fitting a linear regression to data and methods for testing how well a regression line fits the data are described. Third, the problems of using linear regression methods in observational studies, when there are errors in measuring the independent variable, and of predicting a new value of Y for a given X, are discussed. Finally, methods for testing whether a non-linear relationship provides a better fit to the data and for comparing two or more regression lines are considered.
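The first two procedures named in this abstract, Pearson's product moment correlation coefficient and a least squares regression line, can be illustrated with a minimal sketch. The function name and the sample data are hypothetical and are not taken from the review; the formulas are the standard textbook ones.

```python
import math

def pearson_and_least_squares(x, y):
    """Return (r, slope, intercept) for paired samples x and y.

    Hypothetical helper for illustration only.
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of squares and cross-products about the means.
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    r = sxy / math.sqrt(sxx * syy)        # Pearson correlation coefficient
    slope = sxy / sxx                     # least squares slope of Y on X
    intercept = mean_y - slope * mean_x   # the line passes through the means
    return r, slope, intercept

# Made-up measurements, purely for demonstration.
r, slope, intercept = pearson_and_least_squares([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
print(f"r = {r:.3f}, fitted line: Y = {intercept:.2f} + {slope:.2f} X")
```

For a simple linear regression, the square of r equals the proportion of variance in Y explained by the fitted line, which is one common way of judging how well the line fits.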
Abstract:
Background - The binding between peptide epitopes and major histocompatibility complex proteins (MHCs) is an important event in the cellular immune response. Accurate prediction of the binding between short peptides and the MHC molecules has long been a principal challenge for immunoinformatics. Recently, the modeling of MHC-peptide binding has come to emphasize quantitative predictions: instead of categorizing peptides as "binders" or "non-binders" or as "strong binders" and "weak binders", recent methods seek to make predictions about precise binding affinities. Results - We developed a quantitative support vector machine regression (SVR) approach, called SVRMHC, to model peptide-MHC binding affinities. As a non-linear method, SVRMHC was able to generate models that out-performed existing linear models, such as the "additive method". By adopting a new "11-factor encoding" scheme, SVRMHC takes into account similarities in the physicochemical properties of the amino acids constituting the input peptides. When applied to MHC-peptide binding data for three mouse class I MHC alleles, the SVRMHC models produced more accurate predictions than those produced previously. Furthermore, comparisons based on Receiver Operating Characteristic (ROC) analysis indicated that SVRMHC was able to out-perform several prominent methods in identifying strongly binding peptides. Conclusion - As a method with demonstrated performance in the quantitative modeling of MHC-peptide binding and in identifying strong binders, SVRMHC is a promising immunoinformatics tool with not inconsiderable future potential.
Abstract:
In non-linear random effects models, some attention has very recently been devoted to the analysis of suitable transformations of the response variables, separately (Taylor 1996) or not (Oberg and Davidian 2000) from the transformations of the covariates, and, as far as we know, no investigation has been carried out on the choice of link function in such models. In our study we consider the use of a random effects model when a parameterized family of links (Aranda-Ordaz 1981, Prentice 1996, Pregibon 1980, Stukel 1988 and Czado 1997) is introduced. We point out the advantages and drawbacks associated with this data-driven kind of modelling. Difficulties in the interpretation of regression parameters, and therefore in understanding the influence of covariates, as well as problems related to loss of efficiency of estimates and overfitting, are discussed. A case study on radiotherapy usage in breast cancer treatment is presented.
Abstract:
Analysis of risk measures associated with price series movements, and their prediction, is of strategic importance in financial markets as well as to policy makers, in particular for short- and long-term planning when setting economic growth targets. For example, oil price risk management focuses primarily on when and how an organization can best prevent costly exposure to price risk. Value-at-Risk (VaR) is the commonly practised instrument for measuring risk and is evaluated by analysing the negative/positive tail of the probability distribution of the returns (profit or loss). In modelling applications, least-squares estimation (LSE)-based linear regression models are often employed for modelling and analysing correlated data. These linear models are optimal and perform relatively well under conditions such as errors following normal or approximately normal distributions, being free of large outliers and satisfying the Gauss-Markov assumptions. However, in practical situations the LSE-based linear regression models often fail to provide optimal results, for instance in non-Gaussian situations, especially when the errors follow distributions with fat tails and error terms possess a finite variance. This is the situation in risk analysis, which involves analysing tail distributions. Thus, the appropriateness of LSE-based regression models may be questioned, and they may have limited applicability. We have carried out a risk analysis of Iranian crude oil price data based on Lp-norm regression models and have noted that the LSE-based models do not always perform best. We discuss results from the L1, L2 and L∞-norm based linear regression models. ACM Computing Classification System (1998): B.1.2, F.1.3, F.2.3, G.3, J.2.
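To see why the choice of norm matters when errors have fat tails, it may help to look at the simplest possible case, an intercept-only model. This is an illustrative simplification, not the authors' regression models: when a single constant is fitted, the L1 solution is the median, the L2 (least squares) solution is the mean, and the L∞ solution is the midrange. The function name and the data below are hypothetical.

```python
import statistics

def lp_location_estimates(values):
    """Best-fitting constant under the L1, L2 and L-infinity norms.

    Intercept-only special case of Lp-norm regression (illustration only).
    """
    return {
        "L1": statistics.median(values),            # minimises sum of |residuals|
        "L2": statistics.mean(values),              # minimises sum of squared residuals
        "Linf": (min(values) + max(values)) / 2.0,  # minimises the largest |residual|
    }

# Made-up sample with one extreme value standing in for a fat-tailed error.
estimates = lp_location_estimates([1.0, 2.0, 3.0, 4.0, 100.0])
for norm, value in estimates.items():
    print(norm, value)
```

The single extreme observation barely moves the L1 estimate, pulls the L2 estimate substantially, and dominates the L∞ estimate, which is the kind of sensitivity that motivates comparing the three norms on tail-heavy data.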
Abstract:
Annual average daily traffic (AADT) is important information for many transportation planning, design, operation, and maintenance activities, as well as for the allocation of highway funds. Many studies have attempted AADT estimation using the factor approach, regression analysis, time series, and artificial neural networks. However, these methods are unable to account for the spatially variable influence of independent variables on the dependent variable, even though it is well known that spatial context is important for many transportation problems, including AADT estimation. In this study, applications of geographically weighted regression (GWR) methods to estimating AADT were investigated. The GWR-based methods considered the influence of correlations among the variables over space and the spatial non-stationarity of the variables. A GWR model allows different relationships between the dependent and independent variables to exist at different points in space. In other words, model parameters vary from location to location, and the locally linear regression parameters at a point are affected more by observations near that point than by observations further away. The study area was Broward County, Florida. Broward County lies on the Atlantic coast between Palm Beach and Miami-Dade counties. In this study, a total of 67 variables were considered as potential AADT predictors, and six variables (lanes, speed, regional accessibility, direct access, density of roadway length, and density of seasonal household) were selected to develop the models. To investigate the predictive power of the various AADT predictors over space, statistics including local r-square, local parameter estimates, and local errors were examined and mapped. The local variations in relationships among parameters were investigated, measured, and mapped to assess the usefulness of GWR methods.
The results indicated that the GWR models were able to better explain the variation in the data and to predict AADT with smaller errors than the ordinary linear regression models for the same dataset. Additionally, GWR was able to model the spatial non-stationarity in the data, i.e., the spatially varying relationship between AADT and predictors, which cannot be modeled in ordinary linear regression.
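The central idea described in this abstract, that regression parameters at a point are influenced more by nearby observations than by distant ones, can be sketched as a distance-weighted least squares fit. The sketch below assumes a Gaussian distance kernel, a single predictor and a fixed bandwidth; none of these choices, nor the function name or data, come from the study itself.

```python
import math

def local_linear_fit(coords, x, y, target, bandwidth):
    """Weighted least squares line fitted around the location `target`.

    Hypothetical single-predictor sketch of geographically weighted
    regression. Returns (slope, intercept) of the local model.
    """
    # Gaussian kernel (an assumed choice): weight decays with distance.
    weights = [
        math.exp(-0.5 * (math.hypot(cx - target[0], cy - target[1]) / bandwidth) ** 2)
        for cx, cy in coords
    ]
    total = sum(weights)
    mean_x = sum(w * xi for w, xi in zip(weights, x)) / total
    mean_y = sum(w * yi for w, yi in zip(weights, y)) / total
    sxx = sum(w * (xi - mean_x) ** 2 for w, xi in zip(weights, x))
    sxy = sum(w * (xi - mean_x) * (yi - mean_y) for w, xi, yi in zip(weights, x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Made-up data: ten sites along a line, with a stronger relationship in the east.
coords = [(float(i), 0.0) for i in range(10)]
x = [1.0, 2.0, 3.0, 1.0, 2.0, 3.0, 1.0, 2.0, 3.0, 1.0]
y = [xi * (1.0 if i < 5 else 3.0) for i, xi in enumerate(x)]
print("west:", local_linear_fit(coords, x, y, (0.0, 0.0), bandwidth=1.5))
print("east:", local_linear_fit(coords, x, y, (9.0, 0.0), bandwidth=1.5))
```

Repeating the fit at every location of interest yields a surface of local parameter estimates that can be mapped. An ordinary linear regression on the same data would return one global slope and hide the west-to-east difference.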
Abstract:
Cohort programs have been instituted at many universities to accommodate the growing number of mature adult graduate students who pursue degrees while maintaining multiple commitments such as work and family. While it is estimated that as many as 40–60% of students who begin graduate study fail to complete degrees, it is thought that attrition may be even higher for this population of students. Yet, little is known about the impact of cohorts on the learning environment and whether cohort programs affect graduate student retention. Retention theory stresses the importance of the academic department, quality of faculty-student relationships and student involvement in the life of the academic community as critical determinants in students' decisions to persist to degree completion. However, students who are employed full-time typically spend little time on campus engaged in the learning environment. Using academic and social integration theory, this study examined the experiences of working adult graduate students enrolled in cohort (CEP) and non-cohort (non-CEP) programs and the influence of these experiences on intention to persist. The Graduate Program Context Questionnaire was administered to graduate students (N = 310) to examine measures of academic and social integration and intention to persist. Sample t tests and ANOVAs were conducted to determine whether differences in perceptions could be identified between cohort and non-cohort students. Multiple linear regression was used to identify variables that predict students' intention to persist. While there were many similarities, significant differences were found between CEP and non-CEP student groups on two measures. CEP students rated peer-student relationships higher and scored higher on the intention to persist measure than non-CEP students. The psychological integration measure, however, was the strongest predictor of intention to persist for both the CEP and non-CEP groups. 
This study supports the research literature which suggests that CEP programs encourage the development of peer-student relationships and promote students' commitment to persistence.
Abstract:
BACKGROUND: Moderate-to-vigorous physical activity (MVPA) is an important determinant of children’s physical health, and is commonly measured using accelerometers. A major limitation of accelerometers is non-wear time, which is the time the participant did not wear their device. Given that non-wear time is traditionally discarded from the dataset prior to estimating MVPA, final estimates of MVPA may be biased. Therefore, alternate approaches should be explored. OBJECTIVES: The objectives of this thesis were to 1) develop and describe an imputation approach that uses the socio-demographic, time, health, and behavioural data from participants to replace non-wear time accelerometer data, 2) determine the extent to which imputation of non-wear time data influences estimates of MVPA, and 3) determine if imputation of non-wear time data influences the associations between MVPA, body mass index (BMI), and systolic blood pressure (SBP). METHODS: Seven days of accelerometer data were collected using Actical accelerometers from 332 children aged 10-13. Three methods for handling missing accelerometer data were compared: 1) the “non-imputed” method wherein non-wear time was deleted from the dataset, 2) imputation dataset I, wherein the imputation of MVPA during non-wear time was based upon socio-demographic factors of the participant (e.g., age), health information (e.g., BMI), and time characteristics of the non-wear period (e.g., season), and 3) imputation dataset II wherein the imputation of MVPA was based upon the same variables as imputation dataset I, plus organized sport information. Associations between MVPA and health outcomes in each method were assessed using linear regression. RESULTS: Non-wear time accounted for 7.5% of epochs during waking hours. The average minutes/day of MVPA was 56.8 (95% CI: 54.2, 59.5) in the non-imputed dataset, 58.4 (95% CI: 55.8, 61.0) in imputed dataset I, and 59.0 (95% CI: 56.3, 61.5) in imputed dataset II. 
Estimates did not differ significantly between datasets. The strengths of the relationships of MVPA with BMI and SBP were comparable across all three datasets. CONCLUSION: These findings suggest that studies that achieve high accelerometer compliance with unsystematic patterns of missing data can use the traditional approach of deleting non-wear time from the dataset to obtain MVPA measures without substantial bias.
Abstract:
Veterinary medicines (VMs) from agricultural industry can enter the environment in a number of ways. This includes direct exposure through aquaculture, accidental spillage and disposal, and indirect entry by leaching from manure or runoff after treatment. Many compounds used in animal treatments have ecotoxic properties that may have chronic or sometimes lethal effects when they come into contact with non-target organisms. VMs enter the environment in mixtures, potentially having additive effects. Traditional ecotoxicology tests are used to determine the lethal and sometimes reproductive effects on freshwater and terrestrial organisms. However, organisms used in ecotoxicology tests can be unrepresentative of the populations that are likely to be exposed to the compound in the environment. Most often the tests are on single compound toxicity but mixture effects may be significant and should be included in ecotoxicology testing. This work investigates the use, measured environmental concentrations (MECs) and potential impact of sea lice treatments on salmon farms in Scotland. Alternative methods for ecotoxicology testing including mixture toxicity, and the use of in silico techniques to predict the chronic impact of VMs on different species of aquatic organisms were also investigated. The Scottish Environmental Protection Agency (SEPA) provided information on the use of five sea lice treatments from 2008-2011 on Scottish salmon farms. This information was combined with the recently available data on sediment MECs for the years 2009-2012 provided by SEPA using ArcGIS 10.1. In depth analysis of this data showed that from a total of 55 sites, 30 sites had a MEC higher than the maximum allowable concentration (MAC) as set out by SEPA for emamectin benzoate and 7 sites had a higher MEC than MAC for teflubenzuron. 
A number of sites that were up to 16 km away from the nearest salmon farm reported as using either emamectin benzoate or teflubenzuron measured positive for the two treatments. There was no relationship between current direction and the distribution of the sea lice treatments, nor was there any evidence for alternative sources of the compounds, e.g. land treatments. The sites that had MECs higher than the MAC could pose a risk to non-target organisms and disrupt the species dynamics of the area. There was evidence that some marine protected sites might be at risk of exposure to these compounds. To complement this work, the acute mixture toxicity of the 5 sea lice treatments, plus one major metabolite, 3-phenoxybenzoic acid (3PBA), was measured using an assay based on the bioluminescent bacterium Aliivibrio fischeri. When exposed to the 5 sea lice treatments and 3PBA, A. fischeri showed a response to 3PBA, emamectin benzoate and azamethiphos, as well as to combinations of the three. In order to establish any additive effect of the sea lice treatments, the efficacy of two mixture prediction equations, concentration addition (CA) and independent action (IA), was tested using the results from single-compound dose response curves. In this instance IA was the more effective prediction method, with a linear regression confidence interval of 82.6% compared with 22.6% for CA. In silico molecular docking was carried out to predict the chronic effects of 15 VMs (including the five used as sea lice control). Molecular docking has been proposed as an alternative screening method for the chronic effects of large animal treatments on non-target organisms. Oestrogen receptor alpha (ERα) of 7 non-target bony fish and the African clawed frog Xenopus laevis were modelled using SwissModel.
These models were then ‘docked’ to oestradiol, the synthetic oestrogen ethinylestradiol, two known xenoestrogens, dichlorodiphenyltrichloroethane (DDT) and bisphenol A (BPA), the antioestrogen breast cancer treatment tamoxifen, and 15 VMs using AutoDock 4. Based on the results of this work, four VMs were identified as possible xenoestrogens or anti-oestrogens: cypermethrin, deltamethrin, fenbendazole and teflubenzuron. Further investigation of these four VMs using in vitro assays has been suggested as future work. A modified recombinant yeast oestrogen screen (YES) was attempted using the cDNA of the ERα of the zebrafish Danio rerio and the rainbow trout Oncorhynchus mykiss. Owing to time constraints and difficulties with the cloning protocols, this work could not be completed. Use of such in vitro assays would allow further investigation of the oestrogenic potential of the highlighted VMs. In conclusion, VMs used as sea lice treatments, such as teflubenzuron and emamectin benzoate, may be more persistent and have a wider range in the environment than previously thought. Mixtures of sea lice treatments have been found to persist together in the environment, and the effects of these mixtures on the bacterium A. fischeri can be predicted using the IA equation. Finally, molecular docking may be a suitable tool to predict chronic endocrine disrupting effects and identify varying degrees of impact on the ERα of nine species of aquatic organisms.
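The two mixture prediction equations named in this abstract, concentration addition (CA) and independent action (IA), have standard textbook forms, sketched below. The function names and the numbers are hypothetical and are not the thesis's data. Effects are expressed as fractions between 0 and 1, and the CA form shown predicts the mixture's effect concentration from the components' proportions in the mixture.

```python
import math

def independent_action(single_effects):
    """IA: combined fractional effect of components acting independently.

    E_mix = 1 - product(1 - E_i), where each E_i is the fractional effect
    of component i alone at its concentration in the mixture.
    """
    return 1.0 - math.prod(1.0 - e for e in single_effects)

def concentration_addition_ecx(fractions, ecx_values):
    """CA: predicted ECx of a mixture.

    ECx_mix = 1 / sum(p_i / ECx_i), where p_i is the proportion of
    component i in the mixture and ECx_i its single-compound ECx.
    """
    return 1.0 / sum(p / ecx for p, ecx in zip(fractions, ecx_values))

# Made-up values for demonstration only.
print(independent_action([0.2, 0.5]))
print(concentration_addition_ecx([0.5, 0.5], [1.0, 4.0]))
```

In a comparison like the one described, each model's predictions would be regressed against the observed mixture responses to see which equation tracks the measurements more closely.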
Abstract:
The flow rates of drying and nebulizing gas, the heat block and desolvation line temperatures, and the interface voltage are potential electrospray ionization parameters, as they may enhance the sensitivity of the mass spectrometer. The conditions that give higher sensitivity for 13 pharmaceuticals were explored. First, a Plackett-Burman design was implemented to screen for significant factors, and it was concluded that interface voltage and nebulizing gas flow were the only factors that influence the signal intensity for all pharmaceuticals. This fractional factorial design was projected onto a full 2^2 factorial design with center points. The lack-of-fit test proved to be significant. Then, a central composite face-centered design was conducted. Finally, a stepwise multiple linear regression and, subsequently, an optimization were carried out. Two main drug clusters were found concerning the signal intensities of all runs of the augmented factorial design. p-Aminophenol, salicylic acid, and nimesulide constitute one cluster as a result of showing much higher sensitivity than the remaining drugs. The other cluster is more homogeneous, with some sub-clusters comprising one pharmaceutical and its respective metabolite. It was observed that the instrumental signal increased when both significant factors increased, with the maximum signal occurring when both coded factors are set at level +1. It was also found that, for most of the pharmaceuticals, interface voltage influences the signal intensity more than the nebulizing gas flow rate. The only exceptions are nimesulide, where the relative importance of the factors is reversed, and salicylic acid, where both factors influence the instrumental signal equally.
Abstract:
Four-skills tests for young native speakers commonly do not generate incongruent correlations concerning the cognitive strategies frequently reported. For non-native speakers, there is sparse evidence to determine which tasks are important for properly assessing cognitive and academic language proficiency (Cummins, 1980; 2012). Research questions: It is highly probable that young students with an immigrant background differ significantly in their communication strategies and skills in a second-language processing context (1); attached to this first assumption, it is supposed that teachers differ significantly depending on their scientific area and previous training (2). Purpose: This study examines whether school teachers (K-12) from different scientific domains of teaching and training perceive an adapted four-skills scale, in European Portuguese, differently. Research methods: 77 teachers from five scientific areas, mean years of teaching service = 32 (SD = 2.7), 57 males and 46 females (from basic and high school levels). Main findings: ANOVA (effect size and post-hoc Tukey tests) and linear regression analysis (stepwise method) revealed statistically significant differences among teachers of different areas, mainly between language teachers and science teachers. Language teachers perceive more accurately how tasks relate in multiple ways to the broad skills that need to be measured in non-native students. Conclusion: If teachers perceive the importance of the big-four tasks differently, there will be incongruence in the skills measurements that teachers select for immigrant pupils. Unbalanced tasks and teachers' perceptions of evaluation and of students' competence would likely limit the academic and cognitive development of non-native students.
Furthermore, the results showed sufficient evidence to conclude that teachers perceive tasks differently with respect to the importance of specific skill subareas. Reading skills are rated more highly than oral comprehension skills in non-native students.
Abstract:
Purpose: Television viewing time, independent of leisure-time physical activity, has cross-sectional relationships with the metabolic syndrome and its individual components. We examined whether baseline and five-year changes in self-reported television viewing time are associated with changes in continuous biomarkers of cardio-metabolic risk (waist circumference, triglycerides, high density lipoprotein cholesterol, systolic and diastolic blood pressure, fasting plasma glucose; and a clustered cardio-metabolic risk score) in Australian adults. Methods: AusDiab is a prospective, population-based cohort study with biological, behavioral, and demographic measures collected in 1999–2000 and 2004–2005. Non-institutionalized adults aged ≥ 25 years were measured at baseline (11,247; 55% of those completing an initial household interview); 6,400 took part in the five-year follow-up biomedical examination, and 3,846 met the inclusion criteria for this analysis. Multiple linear regression analysis was used and unstandardized B coefficients (95% CI) are provided. Results: Baseline television viewing time (10 hours/week unit) was not significantly associated with change in any of the biomarkers of cardio-metabolic risk. Increases in television viewing time over five years (10 hours/week unit) were associated with increases in: waist circumference (cm) (men: 0.43 (0.08, 0.78), P = 0.02; women: 0.68 (0.30, 1.05), P <0.001), diastolic blood pressure (mmHg) (women: 0.47 (0.02, 0.92), P = 0.04), and the clustered cardio-metabolic risk score (women: 0.03 (0.01, 0.05), P = 0.007). These associations were independent of baseline television viewing time and baseline and change in physical activity and other potential confounders. Conclusion: These findings indicate that an increase in television viewing time is associated with adverse cardio-metabolic biomarker changes. 
Further prospective studies using objective measures of several sedentary behaviors are required to confirm causality of the associations found.
Abstract:
In this thesis we are interested in financial risk, and the instrument we use is Value-at-Risk (VaR). VaR is the maximum loss over a given period of time at a given confidence level. Many definitions of VaR exist and some will be introduced throughout this thesis. There are two main ways to measure risk and VaR: through volatility and through percentiles. Large volatility in financial returns implies a greater probability of large losses, but also a greater probability of large profits. Percentiles describe tail behaviour. The estimation of VaR is a complex task. It is important to know the main characteristics of financial data in order to choose the best model. The existing literature is very wide, perhaps controversial, but helpful in drawing a picture of the problem. It is commonly recognised that financial data are characterised by heavy tails, time-varying volatility, asymmetric response to bad and good news, and skewness. Ignoring any of these features can lead to underestimating VaR, with a possible ultimate consequence being the default of the protagonist (firm, bank or investor). In recent years, skewness has attracted special attention. An open problem is the detection and modelling of time-varying skewness. Is skewness constant, or is there significant variability that in turn can affect the estimation of VaR? This thesis aims to answer this question and to open the way to a new approach for simultaneously modelling time-varying volatility (conditional variance) and skewness. The new tools are modifications of the Generalised Lambda Distributions (GLDs). They are four-parameter distributions, which allow the first four moments to be modelled nearly independently: in particular we are interested in what we will call para-moments, i.e., mean, variance, skewness and kurtosis. The GLDs will be used in two different ways. Firstly, semi-parametrically, we consider a moving window to estimate the parameters and calculate the percentiles of the GLDs.
Secondly, parametrically, we attempt to extend the GLDs to include time-varying dependence in the parameters. We used local linear regression to estimate the conditional mean and conditional variance semi-parametrically. The method is not efficient enough to capture all the dependence structure in the three indices (ASX 200, S&P 500 and FT 30); however, it provides an idea of the DGP underlying the process and helps in choosing a good technique to model the data. We find that the GLDs suggest that moments up to the fourth order do not always exist; their existence appears to vary over time. This is a very important finding, considering that past papers (see for example Bali et al., 2008; Hashmi and Tay, 2007; Lanne and Pentti, 2007) modelled time-varying skewness, implicitly assuming the existence of the third moment. However, the GLDs suggest that the mean, variance, skewness and, in general, the conditional distribution vary over time, as already suggested by the existing literature. The GLDs give good results in estimating VaR on three real indices, ASX 200, S&P 500 and FT 30, with results very similar to those provided by historical simulation.
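The percentile view of VaR, and the historical simulation benchmark mentioned at the end of this abstract, can be sketched in a few lines. This is not the GLD-based method of the thesis. It assumes one common empirical-quantile convention (the ceil(confidence × n)-th smallest loss), and the function name and returns are made up for illustration.

```python
import math

def historical_var(returns, confidence):
    """Historical-simulation VaR as a positive loss figure.

    Assumed convention: take the k-th smallest loss with
    k = ceil(confidence * n). Other quantile conventions exist.
    """
    losses = sorted(-r for r in returns)   # losses are negated returns
    n = len(losses)
    # Small tolerance guards against floating-point noise in confidence * n.
    k = max(1, math.ceil(confidence * n - 1e-9))
    return losses[k - 1]

# Made-up daily returns for demonstration only.
returns = [-0.05, 0.01, 0.02, -0.01, 0.03, -0.02, 0.00, 0.015, -0.03, 0.005]
print(historical_var(returns, 0.90))
print(historical_var(returns, 0.95))
```

Applying the same function to a moving window of recent returns gives a time-varying estimate, which is roughly the role the moving window plays in the semi-parametric approach described above.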
Abstract:
Hot and cold temperatures significantly increase mortality rates around the world, but which measure of temperature is the best predictor of mortality is not known. We used mortality data from 107 US cities for the years 1987–2000 and examined the association between temperature and mortality using Poisson regression and modelled a non-linear temperature effect and a non-linear lag structure. We examined mean, minimum and maximum temperature with and without humidity, and apparent temperature and the Humidex. The best measure was defined as that with the minimum cross-validated residual. We found large differences in the best temperature measure between age groups, seasons and cities, and there was no one temperature measure that was superior to the others. The strong correlation between different measures of temperature means that, on average, they have the same predictive ability. The best temperature measure for new studies can be chosen based on practical concerns, such as choosing the measure with the least amount of missing data.
Abstract:
During the past decade, a significant amount of research has been conducted internationally with the aim of developing, implementing, and verifying "advanced analysis" methods suitable for non-linear analysis and design of steel frame structures. Application of these methods permits comprehensive assessment of the actual failure modes and ultimate strengths of structural systems in practical design situations, without resort to simplified elastic methods of analysis and semi-empirical specification equations. Advanced analysis has the potential to extend the creativity of structural engineers and simplify the design process, while ensuring greater economy and more uniform safety with respect to the ultimate limit state. The application of advanced analysis methods has previously been restricted to steel frames comprising only members with compact cross-sections that are not subject to the effects of local buckling. This precluded the use of advanced analysis from the design of steel frames comprising a significant proportion of the most commonly used Australian sections, which are non-compact and subject to the effects of local buckling. This thesis contains a detailed description of research conducted over the past three years in an attempt to extend the scope of advanced analysis by developing methods that include the effects of local buckling in a non-linear analysis formulation, suitable for practical design of steel frames comprising non-compact sections. Two alternative concentrated plasticity formulations are presented in this thesis: the refined plastic hinge method and the pseudo plastic zone method. Both methods implicitly account for the effects of gradual cross-sectional yielding, longitudinal spread of plasticity, initial geometric imperfections, residual stresses, and local buckling. 
The accuracy and precision of the methods for the analysis of steel frames comprising non-compact sections has been established by comparison with a comprehensive range of analytical benchmark frame solutions. Both the refined plastic hinge and pseudo plastic zone methods are more accurate and precise than the conventional individual member design methods based on elastic analysis and specification equations. For example, the pseudo plastic zone method predicts the ultimate strength of the analytical benchmark frames with an average conservative error of less than one percent, and has an acceptable maximum unconservative error of less than five percent. The pseudo plastic zone model can allow the design capacity to be increased by up to 30 percent for simple frames, mainly due to the consideration of inelastic redistribution. The benefits may be even more significant for complex frames with significant redundancy, which provides greater scope for inelastic redistribution. The analytical benchmark frame solutions were obtained using a distributed plasticity shell finite element model. A detailed description of this model and the results of all the 120 benchmark analyses are provided. The model explicitly accounts for the effects of gradual cross-sectional yielding, longitudinal spread of plasticity, initial geometric imperfections, residual stresses, and local buckling. Its accuracy was verified by comparison with a variety of analytical solutions and the results of three large-scale experimental tests of steel frames comprising non-compact sections. A description of the experimental method and test results is also provided.
Abstract:
Artificial neural network (ANN) learning methods provide a robust and non-linear approach to approximating the target function for many classification, regression and clustering problems. ANNs have demonstrated good predictive performance in a wide variety of practical problems. However, there are strong arguments as to why ANNs are not sufficient for the general representation of knowledge. The arguments are the poor comprehensibility of the learned ANN, and the inability to represent explanation structures. The overall objective of this thesis is to address these issues by: (1) explanation of the decision process in ANNs in the form of symbolic rules (predicate rules with variables); and (2) provision of explanatory capability by mapping the general conceptual knowledge that is learned by the neural networks into a knowledge base to be used in a rule-based reasoning system. A multi-stage methodology GYAN is developed and evaluated for the task of extracting knowledge from the trained ANNs. The extracted knowledge is represented in the form of restricted first-order logic rules, and subsequently allows user interaction by interfacing with a knowledge based reasoner. The performance of GYAN is demonstrated using a number of real world and artificial data sets. The empirical results demonstrate that: (1) an equivalent symbolic interpretation is derived describing the overall behaviour of the ANN with high accuracy and fidelity, and (2) a concise explanation is given (in terms of rules, facts and predicates activated in a reasoning episode) as to why a particular instance is being classified into a certain category.