937 results for multiple linear regression models
Abstract:
Non-linear relationships are common in microbiological research and often necessitate the use of the statistical techniques of non-linear regression or curve fitting. In some circumstances, the investigator may wish to fit an exponential model to the data, i.e., to test the hypothesis that a quantity Y either increases or decays exponentially with increasing X. This type of model is straightforward to fit, because taking logarithms of the Y variable linearises the relationship, which can then be treated by the methods of linear regression.
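The log-transform approach described in this abstract can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not code from the cited work: the model is taken to be Y = a * exp(b * X), the function name `fit_exponential` is hypothetical, and the data are synthetic and noise-free.

```python
import math

def fit_exponential(xs, ys):
    """Fit Y = a * exp(b * X) by ordinary least squares on ln(Y)."""
    if any(y <= 0 for y in ys):
        raise ValueError("all Y values must be positive to take logarithms")
    log_ys = [math.log(y) for y in ys]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_ly = sum(log_ys) / n
    # Slope and intercept of the straight line ln(Y) = ln(a) + b * X.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (ly - mean_ly) for x, ly in zip(xs, log_ys))
    b = sxy / sxx
    ln_a = mean_ly - b * mean_x
    return math.exp(ln_a), b

# Synthetic illustration data generated from Y = 2 * exp(0.5 * X).
xs = [0, 1, 2, 3, 4]
ys = [2.0 * math.exp(0.5 * x) for x in xs]
a, b = fit_exponential(xs, ys)
print(f"a = {a:.3f}, b = {b:.3f}")
```

Note that least squares on ln(Y) minimises relative rather than absolute errors in Y, so the estimates can differ from those of a direct non-linear fit when the data are noisy.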
Abstract:
In some circumstances, there may be no scientific model of the relationship between X and Y that can be specified in advance and indeed the objective of the investigation may be to provide a ‘curve of best fit’ for predictive purposes. In such an example, the fitting of successive polynomials may be the best approach. There are various strategies to decide on the polynomial of best fit depending on the objectives of the investigation.
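One possible strategy for fitting successive polynomials is to raise the degree step by step and watch how much the residual sum of squares falls. The sketch below assumes that strategy and uses hypothetical data; the helper names (`solve_linear_system`, `fit_polynomial`, `residual_ss`) are illustrative and not taken from the cited work.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x

def fit_polynomial(xs, ys, degree):
    """Least-squares coefficients c0..cd of c0 + c1*x + ... + cd*x**d."""
    k = degree + 1
    # Normal equations (X'X) c = X'y for the powers of x.
    xtx = [[sum(x ** (i + j) for x in xs) for j in range(k)] for i in range(k)]
    xty = [sum(y * x ** i for x, y in zip(xs, ys)) for i in range(k)]
    return solve_linear_system(xtx, xty)

def residual_ss(xs, ys, coeffs):
    """Sum of squared differences between observed and fitted Y."""
    preds = [sum(c * x ** i for i, c in enumerate(coeffs)) for x in xs]
    return sum((y - p) ** 2 for y, p in zip(ys, preds))

# Hypothetical data: roughly quadratic with small deviations.
xs = [0, 1, 2, 3, 4, 5, 6]
ys = [1.1, 1.9, 5.2, 9.8, 17.1, 25.9, 37.2]
rss_by_degree = {}
for degree in (1, 2, 3):
    coeffs = fit_polynomial(xs, ys, degree)
    rss_by_degree[degree] = residual_ss(xs, ys, coeffs)
    print(degree, round(rss_by_degree[degree], 3))
```

With these data the residual sum of squares drops sharply from degree 1 to degree 2 and barely changes at degree 3, which would argue for stopping at the quadratic. A formal choice would normally rest on a significance test for each added term.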
Abstract:
1. The techniques associated with regression, whether linear or non-linear, are some of the most useful statistical procedures that can be applied in clinical studies in optometry. 2. In some cases, there may be no scientific model of the relationship between X and Y that can be specified in advance and the objective may be to provide a ‘curve of best fit’ for predictive purposes. In such cases, the fitting of a general polynomial type curve may be the best approach. 3. An investigator may have a specific model in mind that relates Y to X and the data may provide a test of this hypothesis. Some of these curves can be reduced to a linear regression by transformation, e.g., the exponential and negative exponential decay curves. 4. In some circumstances, e.g., the asymptotic curve or logistic growth law, a more complex process of curve fitting involving non-linear estimation will be required.
Abstract:
In the global economy, innovation is one of the most important competitive assets for companies willing to compete in international markets. As competition moves from standardised products to customised ones, tailored to the needs of each specific market, economies of scale are no longer the only winning strategy. Innovation requires firms to establish processes to acquire and absorb new knowledge, leading to the recent theory of Open Innovation. Knowledge sharing and acquisition happen when firms are embedded in networks with other firms, universities, institutions and many other economic actors. Several typologies of innovation and firm networks have been identified, with various geographical spans. One of the first to be modelled was the Industrial Cluster (in Italian, Distretto Industriale), which was long considered the benchmark for innovation and economic development. Other kinds of networks have been modelled since the late 1970s; Regional Innovation Systems represent one of the latest and most widespread models of innovation networks, specifically introduced to combine local networks and the global economy. This model has been exploited qualitatively since its introduction but, together with National Innovation Systems, it is among the most inspiring for policy makers and is often cited by them, not always properly. The aim of this research is to set up an econometric model describing Regional Innovation Systems, making it one of the first attempts to test and enhance this theory with a quantitative approach. A dataset of 104 secondary and primary data from European regions was built in order to run a multiple linear regression, testing whether Regional Innovation Systems are really correlated with regional innovation and with regional innovation in cooperation with foreign partners.
Furthermore, an exploratory multiple linear regression was performed to verify which variables, among those describing a Regional Innovation System, are the most significant for innovating, alone or with foreign partners. The effectiveness of present innovation policies was also tested against the findings of the econometric model. The developed model confirmed the role of Regional Innovation Systems in creating innovation, even in cooperation with international partners: this represents one of the first quantitative confirmations of a theory previously based on qualitative models only. The results of this model also confirmed a minor influence of National Innovation Systems: comparing the analysis of existing innovation policies, at both regional and national level, with our findings revealed the need for a potentially pivotal change in the direction currently followed by policy makers. Lastly, while confirming the role of a learning environment in a region and the catalyst role of regional administration, this research offers a potential new perspective for the whole private sector in creating a Regional Innovation System.
Abstract:
2000 Mathematics Subject Classification: 62P10, 92C20
Abstract:
Highways are generally designed to serve a mixed traffic flow that consists of passenger cars, trucks, buses, recreational vehicles, etc. The fact that the impacts of these different vehicle types are not uniform creates problems in highway operations and safety. A common approach to reducing the impacts of truck traffic on freeways has been to restrict trucks to certain lane(s) to minimize the interaction between trucks and other vehicles and to compensate for their differences in operational characteristics. The performance of different truck lane restriction alternatives differs under different traffic and geometric conditions. Thus, a good estimate of the operational performance of different truck lane restriction alternatives under prevailing conditions is needed to help make informed decisions on truck lane restriction alternatives. This study develops operational performance models that can be applied to help identify the most operationally efficient truck lane restriction alternative on a freeway under prevailing conditions. The operational performance measures examined in this study include average speed, throughput, speed difference, and lane changes. Prevailing conditions include number of lanes, interchange density, free-flow speeds, volumes, truck percentages, and ramp volumes. Recognizing the difficulty of collecting sufficient data for an empirical modeling procedure that involves a high number of variables, the simulation approach was used to estimate the performance values for various truck lane restriction alternatives under various scenarios. Both the CORSIM and VISSIM simulation models were examined for their ability to model truck lane restrictions. Due to a major problem found in the CORSIM model for truck lane modeling, the VISSIM model was adopted as the simulator for this study.
The VISSIM model was calibrated mainly to replicate the capacity given in the 2000 Highway Capacity Manual (HCM) for various free-flow speeds under the ideal basic freeway section conditions. Non-linear regression models for average speed, throughput, average number of lane changes, and speed difference between the lane groups were developed. Based on the performance models developed, a simple decision procedure was recommended to select the desired truck lane restriction alternative for prevailing conditions.
Abstract:
Annual average daily traffic (AADT) is important information for many transportation planning, design, operation, and maintenance activities, as well as for the allocation of highway funds. Many studies have attempted AADT estimation using the factor approach, regression analysis, time series, and artificial neural networks. However, these methods are unable to account for the spatially variable influence of independent variables on the dependent variable, even though it is well known that spatial context is important to many transportation problems, including AADT estimation. In this study, applications of geographically weighted regression (GWR) methods to estimating AADT were investigated. The GWR-based methods considered the influence of correlations among the variables over space and the spatial non-stationarity of the variables. A GWR model allows different relationships between the dependent and independent variables to exist at different points in space. In other words, model parameters vary from location to location and the locally linear regression parameters at a point are affected more by observations near that point than observations further away. The study area was Broward County, Florida. Broward County lies on the Atlantic coast between Palm Beach and Miami-Dade counties. In this study, a total of 67 variables were considered as potential AADT predictors, and six variables (lanes, speed, regional accessibility, direct access, density of roadway length, and density of seasonal household) were selected to develop the models. To investigate the predictive powers of various AADT predictors over space, statistics including local r-square, local parameter estimates, and local errors were examined and mapped. The local variations in relationships among parameters were investigated, measured, and mapped to assess the usefulness of GWR methods.
The results indicated that the GWR models were able to better explain the variation in the data and to predict AADT with smaller errors than the ordinary linear regression models for the same dataset. Additionally, GWR was able to model the spatial non-stationarity in the data, i.e., the spatially varying relationship between AADT and predictors, which cannot be modeled in ordinary linear regression.
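The central idea of GWR in this abstract, that the regression parameters at a point depend more on nearby observations than on distant ones, can be illustrated with a locally weighted fit. The sketch below is not the study's model: it assumes a single predictor, a Gaussian distance-decay kernel, and a fixed bandwidth, and the function name `gwr_local_fit` and all data values are hypothetical.

```python
import math

def gwr_local_fit(points, target, bandwidth):
    """Weighted least-squares line y = b0 + b1 * x calibrated at one location.

    points: list of (easting, northing, x, y) observations.
    target: (easting, northing) of the calibration location.
    Weights follow a Gaussian kernel of the distance to the target.
    """
    te, tn = target
    weights = []
    for e, n, _, _ in points:
        d = math.hypot(e - te, n - tn)
        weights.append(math.exp(-0.5 * (d / bandwidth) ** 2))
    sw = sum(weights)
    mean_x = sum(w * p[2] for w, p in zip(weights, points)) / sw
    mean_y = sum(w * p[3] for w, p in zip(weights, points)) / sw
    sxx = sum(w * (p[2] - mean_x) ** 2 for w, p in zip(weights, points))
    sxy = sum(w * (p[2] - mean_x) * (p[3] - mean_y)
              for w, p in zip(weights, points))
    b1 = sxy / sxx
    b0 = mean_y - b1 * mean_x
    return b0, b1

# Hypothetical data: (easting, northing, lanes, AADT). In the western cluster
# AADT is 1000 per lane; in the eastern cluster it is 3000 per lane.
points = [
    (0, 0, 2, 2000), (1, 0, 4, 4000), (0, 1, 6, 6000),
    (10, 0, 2, 6000), (11, 0, 4, 12000), (10, 1, 6, 18000),
]
_, b1_west = gwr_local_fit(points, target=(0, 0), bandwidth=2.0)
_, b1_east = gwr_local_fit(points, target=(10, 0), bandwidth=2.0)
print(round(b1_west), round(b1_east))
```

A single global regression over these six points would report one compromise slope, whereas the local fits recover a different lanes-to-AADT relationship on each side of the study area. In practice the bandwidth is usually chosen by cross-validation or an information criterion rather than fixed by hand.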
Abstract:
This research aimed to analyse the effect of different territorial divisions on the random fluctuation of socio-economic indicators related to social determinants of health. This is an ecological study resulting from a combination of statistical methods, including individual and aggregate data analysis, using five databases derived from the database of the 2010 Brazilian demographic census: overall results of the sample by weighting area. These data were grouped into the following levels: households; weighting areas; cities; Immediate Urban Associated Regions; and Intermediate Urban Associated Regions. A theoretical model related to social determinants of health was used, with Household with death as the dependent variable and the following independent variables: Black race; Income; Childcare and school non-attendance; Illiteracy; and Low schooling. The data were analysed using Poisson regression on an individual basis, multilevel Poisson regression and multiple linear regression, in light of the theoretical framework of the area. A greater proportion of households with deaths was identified among those with at least one resident who is black, lower-income, illiterate, less educated, or who does not attend or has not attended school or day-care. The analysis of the adjusted model showed that the highest adjusted prevalence ratio was related to Income, with a risk value of 1.33 for households with at least one resident whose average personal income was below R$ 655,00 (Brazilian currency). The multilevel analysis demonstrated that there was a context effect when the variables were subjected to the effects of areas, insofar as the random effects were significant for all models and the prevalence rates differed, being higher in the areas with smaller dimensions: Weighting areas, with a coefficient of 0.035, and Cities, with a coefficient of 0.024.
The ecological analyses showed that the variables Income and Low schooling had explanatory potential for the outcome in all models, with Income having the greater power to determine household deaths, especially in the models for Immediate Urban Associated Regions, with a standardized coefficient of -0.616, and Intermediate Urban Associated Regions, with a standardized coefficient of -0.618. It was concluded that there was a context effect on the random fluctuation of the socioeconomic indicators related to social determinants of health. This effect was explained by the characteristics of the territorial divisions and of the individuals who live or work there. Context effects were better identified in the areas with smaller dimensions, which are more favourable for explaining phenomena related to social determinants of health, especially in studies of societies marked by social inequalities. The composition effects were better identified in the Regions of Urban Articulation, shaped through mechanisms similar to the phenomenon under study.
Abstract:
The amount and quality of available biomass is a key factor for a sustainable livestock industry and for decision making in agricultural management. Globally, 31.5% of land cover is grassland, while 80% of Ireland’s agricultural land is grassland. In Ireland, grasslands are intensively managed and provide the cheapest feed source for animals. This dissertation presents a detailed state-of-the-art review of satellite remote sensing of grasslands, and the potential application of optical (Moderate Resolution Imaging Spectroradiometer (MODIS)) and radar (TerraSAR-X) time series imagery to estimate the grassland biomass at two study sites (Moorepark and Grange) in the Republic of Ireland using both statistical and state-of-the-art machine learning algorithms. High quality weather data available from the on-site weather station were also used to calculate the Growing Degree Days (GDD) for Grange to determine the impact of ancillary data on biomass estimation. In situ and satellite data covering 12 years for the Moorepark and 6 years for the Grange study sites were used to predict grassland biomass using multiple linear regression and Adaptive Neuro-Fuzzy Inference System (ANFIS) models. The results demonstrate that a dense (8-day composite) MODIS image time series, along with high quality in situ data, can be used to retrieve grassland biomass with high performance (R² = 0.86; p < 0.05, RMSE = 11.07 for Moorepark). The model for Grange was modified to evaluate the synergistic use of vegetation indices derived from remote sensing time series and accumulated GDD information. As GDD is strongly linked to plant development, or phenological stage, an improvement in biomass estimation would be expected. It was observed that, using the ANFIS model, the biomass estimation accuracy increased from R² = 0.76 (p < 0.05) to R² = 0.81 (p < 0.05) and the root mean square error was reduced by 2.72%.
The work on the application of optical remote sensing was further developed using a TerraSAR-X Staring Spotlight mode time series over the Moorepark study site to explore the extent to which very high resolution Synthetic Aperture Radar (SAR) data of interferometrically coherent paddocks can be exploited to retrieve grassland biophysical parameters. After filtering out the non-coherent plots, it is demonstrated that interferometric coherence can be used to retrieve grassland biophysical parameters (i.e., height, biomass), and that it is possible to detect changes due to grass growth and to grazing and mowing events when the temporal baseline is short (11 days). However, it is not possible to automatically and uniquely identify the cause of these changes based only on the SAR backscatter and coherence, due to the ambiguity caused by tall grass laid down by the wind. Overall, the work presented in this dissertation has demonstrated the potential of dense remote sensing and weather data time series to predict grassland biomass using machine-learning algorithms, where high quality ground data were used for training. At present, a major limitation for national scale biomass retrieval is the lack of spatial and temporal ground samples, which can be partially resolved by a minor modification of the existing PastureBaseIreland database: adding the location and extent of each grassland paddock. As far as remote sensing data requirements are concerned, MODIS is useful for large scale evaluation, but due to its coarse resolution it is not possible to detect the variations within and between fields at the farm scale. However, this issue will be resolved in terms of spatial resolution by the Sentinel-2 mission, and when both satellites (Sentinel-2A and Sentinel-2B) are operational the revisit time will reduce to 5 days, which, together with Landsat-8, should enable sufficient cloud-free data for operational biomass estimation at a national scale.
The Synthetic Aperture Radar Interferometry (InSAR) approach is feasible if there are enough coherent interferometric pairs available; however, this is difficult to achieve due to the temporal decorrelation of the signal. For repeat-pass InSAR over a vegetated area, even an 11-day temporal baseline is too large. In order to achieve better coherence, a very high resolution is required at the cost of spatial coverage, which limits its scope for use in an operational context at a national scale. Future InSAR missions with pair acquisition in Tandem mode will minimize the temporal decorrelation over vegetated areas for more focused studies. The proposed approach complements the current paradigm of Big Data in Earth Observation, and illustrates the feasibility of integrating data from multiple sources. In future, this framework can be used to build an operational decision support system for retrieval of grassland biophysical parameters based on data from long term planned optical missions (e.g., Landsat, Sentinel) that will ensure the continuity of data acquisition. Similarly, the Spanish X-band PAZ and TerraSAR-X2 missions will ensure the continuity of TerraSAR-X and COSMO-SkyMed.
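The accumulated Growing Degree Days (GDD) mentioned in this abstract as an ancillary input can be computed from daily temperature records. The sketch below assumes the simple averaging method (daily mean temperature minus a base temperature, floored at zero). The base temperature of 5.0 °C and the temperature values are assumptions for illustration only; the abstract does not state the method or base temperature actually used.

```python
def growing_degree_days(t_min, t_max, t_base):
    """Daily GDD by the simple averaging method, floored at zero."""
    return max(0.0, (t_min + t_max) / 2.0 - t_base)

def accumulated_gdd(daily_min_max, t_base):
    """Running total of daily GDD over a sequence of (t_min, t_max) pairs."""
    total = 0.0
    series = []
    for t_min, t_max in daily_min_max:
        total += growing_degree_days(t_min, t_max, t_base)
        series.append(total)
    return series

# Hypothetical daily minimum and maximum temperatures in degrees Celsius.
daily_temps = [(2, 6), (4, 12), (6, 14), (8, 16)]
T_BASE = 5.0  # assumed base temperature, not taken from the dissertation
accumulated = accumulated_gdd(daily_temps, T_BASE)
print(accumulated)  # [0.0, 3.0, 8.0, 15.0]
```

The resulting accumulated series could then be aligned with the dates of the satellite composites and supplied to a regression model as one more predictor alongside the vegetation indices.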
Abstract:
Statistical association between a single nucleotide polymorphism (SNP) genotype and a quantitative trait in genome-wide association studies is usually assessed using a linear regression model, or, in the case of non-normally distributed trait values, using the Kruskal-Wallis test. While linear regression models assume an additive mode of inheritance via equidistant genotype scores, the Kruskal-Wallis test merely tests global differences in trait values associated with the three genotype groups. Both approaches thus exhibit suboptimal power when the underlying inheritance mode is dominant or recessive. Furthermore, these tests do not perform well in the common situations when only a few trait values are available in a rare genotype category (imbalance), or when the values associated with the three genotype categories exhibit unequal variance (variance heterogeneity). We propose a maximum test based on a Marcus-type multiple contrast test for relative effect sizes. This test allows model-specific testing of either dominant, additive or recessive mode of inheritance, and it is robust against variance heterogeneity. We show how to obtain mode-specific simultaneous confidence intervals for the relative effect sizes to aid in interpreting the biological relevance of the results. Further, we discuss the use of a related all-pairwise comparisons contrast test with range-preserving confidence intervals as an alternative to the Kruskal-Wallis heterogeneity test. We applied the proposed maximum test to the Bogalusa Heart Study dataset, and gained a remarkable increase in the power to detect association, particularly for rare genotypes. Our simulation study also demonstrated that the proposed non-parametric tests control the family-wise error rate in the presence of non-normality and variance heterogeneity, contrary to the standard parametric approaches.
We provide a publicly available R library nparcomp that can be used to estimate simultaneous confidence intervals or compatible multiplicity-adjusted p-values associated with the proposed maximum test.
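To make the point about equidistant genotype scores concrete, the sketch below codes the same genotypes under additive, dominant and recessive assumptions and compares how well a simple linear regression on each coding explains a trait. It illustrates only the coding issue raised in the abstract; it is not the proposed multiple contrast test, which the authors provide in the R library nparcomp. The genotype labels, trait values and function name are hypothetical.

```python
# Genotype-to-score codings for a biallelic SNP with alleles "A" and "a".
CODINGS = {
    "additive": {"AA": 0, "Aa": 1, "aa": 2},   # equidistant scores
    "dominant": {"AA": 0, "Aa": 1, "aa": 1},   # one copy of "a" suffices
    "recessive": {"AA": 0, "Aa": 0, "aa": 1},  # two copies of "a" required
}

def r_squared(xs, ys):
    """Coefficient of determination of the simple regression of y on x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy ** 2 / (sxx * syy)

# Hypothetical data in which the trait follows a dominant pattern and the
# "aa" genotype is rare.
genotypes = ["AA", "AA", "AA", "Aa", "Aa", "Aa", "aa", "aa"]
trait = [1.0, 1.2, 0.9, 2.0, 2.1, 1.9, 2.0, 2.2]

r2 = {}
for mode, scores in CODINGS.items():
    xs = [scores[g] for g in genotypes]
    r2[mode] = r_squared(xs, trait)
    print(mode, round(r2[mode], 3))
```

With these data the dominant coding explains the trait best, which shows in miniature why a regression that assumes additive scores can lose power when the true mode of inheritance is dominant or recessive.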
Abstract:
In an industrial environment, knowing the process one is working with is crucial to ensuring that it functions well. In the present work, carried out at the Prio Biocombustíveis S.A. facilities, the methanol recovery process was characterized using process data collected during the work together with historical process data, starting with the characterization of key process streams. Based on the information retrieved from the stream characterization, Aspen Plus® process simulation software was used to replicate the process and perform a sensitivity analysis with the objective of assessing the relative importance of certain key process variables (reflux/feed ratio, reflux temperature, reboiler outlet temperature, and methanol, glycerol and water feed compositions). The work proceeded with the application of a set of statistical tools, starting with Principal Components Analysis (PCA), from which the interactions between process variables and their contribution to the process variability were studied. Next, Design of Experiments (DoE) was to be used to acquire experimental data and, with it, create a model for the amount of water in the distillate. However, the conditions necessary to perform this method were not met, so it was abandoned. The Multiple Linear Regression (MLR) method was then applied to the available data, creating several empirical models for the water in the distillate; the best-fitting model had an R² of 92.93% and an AARD of 19.44%. Although the AARD is still relatively high, the model is adequate for making fast estimates of the distillate’s quality. As for fouling, its presence was noticed many times during this work. Since it was not possible to measure the fouling directly, the reboiler inlet steam pressure was used as an indicator of fouling growth and of how that growth varies with the amount of Used Cooking Oil incorporated in the whole process.
Comparing the steam cost associated with the reboiler’s operation when fouling is low (steam pressure of 1.5 bar) and when fouling is high (steam pressure of 3 bar) shows an increase of about 58% as the fouling increases.
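The two goodness-of-fit figures quoted in this abstract, R² and AARD, can be computed as sketched below. The abstract does not define AARD; the sketch assumes the common definition of average absolute relative deviation, the mean of |predicted - observed| / |observed| expressed as a percentage. The function names and data values are hypothetical.

```python
def aard_percent(observed, predicted):
    """Average absolute relative deviation, in percent (assumed definition)."""
    rel = [abs(p - o) / abs(o) for o, p in zip(observed, predicted)]
    return 100.0 * sum(rel) / len(rel)

def r_squared_percent(observed, predicted):
    """Coefficient of determination, in percent."""
    mean_o = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_o) ** 2 for o in observed)
    return 100.0 * (1.0 - ss_res / ss_tot)

# Hypothetical observed and model-predicted water contents in the distillate.
observed = [1.0, 2.0, 4.0]
predicted = [1.1, 1.8, 4.0]
aard = aard_percent(observed, predicted)
r2 = r_squared_percent(observed, predicted)
print(f"AARD = {aard:.2f}%, R2 = {r2:.2f}%")
```

Because AARD divides each error by the observed value, small observed values inflate it, which is one way a model can show a high R² together with a relatively high AARD.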
Abstract:
This study aims to determine the influence of entrepreneur characteristics, investor characteristics and company characteristics on the amount initially invested by business angels (BA). For this purpose, we used a sample extracted from the Kauffman Foundation database, whose data refer to BA from the United States of America. Descriptive statistics were analysed and the hypotheses were tested through six multiple linear regression models. The results show that six of the nine factors identified relate to the entrepreneur and the management team.
Abstract:
Master's dissertation in Business Management. Faculdade de Economia, Univ. do Algarve, 2004