15 results for LINEAR-REGRESSION MODELS
at Indian Institute of Science, Bangalore, India
Abstract:
Processor architects face the challenging task of evaluating a large design space consisting of several interacting parameters and optimizations. To assist architects in making crucial design decisions, we build linear regression models that relate processor performance to micro-architectural parameters, using simulation-based experiments. We obtain good approximate models through an iterative process in which Akaike's information criterion (AIC) is used to extract a good linear model from a small set of simulations, and limited further simulation is guided by the model using D-optimal experimental designs. The iterative process is repeated until the desired error bounds are achieved. We used this procedure to establish the relationship of the CPI (cycles per instruction) performance response to 26 key micro-architectural parameters using a detailed cycle-by-cycle superscalar processor simulator. The resulting models provide a significance ordering of all micro-architectural parameters and their interactions, and explain the performance variations of micro-architectural techniques.
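The abstract above uses AIC to extract a linear model from a small set of simulations. A minimal sketch of one common way to do that, greedy forward selection over main effects and pairwise interactions, is shown below. The parameter names, the 40 "simulations", and the synthetic CPI response are hypothetical, and the D-optimal step that chooses further simulations is not shown.

```python
import itertools
import numpy as np

def aic_ols(design, y):
    """AIC of an OLS fit with Gaussian errors, up to an additive constant."""
    n, k = design.shape
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    rss = float(np.sum((y - design @ beta) ** 2))
    return n * np.log(rss / n) + 2 * k

def forward_select_aic(columns, y):
    """Greedily add the candidate column that lowers AIC most; stop when none does."""
    n = len(y)
    chosen = []
    design = np.ones((n, 1))  # start from the intercept-only model
    best = aic_ols(design, y)
    while True:
        trials = [(aic_ols(np.column_stack([design, col]), y), name)
                  for name, col in columns.items() if name not in chosen]
        if not trials:
            break
        aic, name = min(trials)
        if aic >= best:
            break
        best = aic
        chosen.append(name)
        design = np.column_stack([design, columns[name]])
    return chosen, best

# Hypothetical example: 6 two-level (coded -1/+1) parameters, 40 simulated runs.
rng = np.random.default_rng(0)
params = ["rob_size", "iq_size", "l2_lat", "width", "bpred", "l1_size"]
X = rng.choice([-1.0, 1.0], size=(40, len(params)))
columns = {p: X[:, i] for i, p in enumerate(params)}
for (i, a), (j, b) in itertools.combinations(enumerate(params), 2):
    columns[f"{a}*{b}"] = X[:, i] * X[:, j]  # pairwise interaction terms
# Synthetic CPI response with two main effects and one interaction (not real data).
cpi = (1.5 + 0.4 * X[:, 0] - 0.3 * X[:, 2] + 0.2 * X[:, 0] * X[:, 2]
       + rng.normal(0.0, 0.02, 40))
selected, aic = forward_select_aic(columns, cpi)
print(selected)
```

The order in which terms enter gives a rough significance ranking, in the spirit of the ordering the abstract describes.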
Abstract:
The chemical composition of rainwater changes from the sea to inland areas under the influence of several major factors: the topographic location of the area, its distance from the sea, and annual rainfall. A model is developed here to quantify the variation in precipitation chemistry under the influence of inland distance and rainfall amount. Various sites in India categorized as 'urban', 'suburban' and 'rural' have been considered for model development. pH, HCO3, NO3 and Mg do not change much from the coast to inland, while SO4 and Ca are subject to local emissions. Cl and Na originate solely from sea salt and are therefore used as the chemistry parameters in the model. Non-linear multiple regressions performed for the various categories revealed that both rainfall amount and precipitation chemistry obeyed a power-law reduction with distance from the sea. Cl and Na decrease rapidly over the first 100 km from the sea, then decrease marginally over the next 100 km, and later stabilize. Regression parameters estimated for different cases were found to be consistent (R² ≈ 0.8). Variation in one of the parameters accounted for urbanization. The model was validated using data points from the southern peninsular region of the country; the estimates are found to be within the 99.9% confidence interval. Finally, this relationship between the three parameters (rainfall amount, coastline distance, and concentration in terms of Cl and Na) was validated with experiments conducted in a small experimental watershed in south-west India. Chemistry estimated using the model was in good correlation with observed values, with a relative error of about 5%. Monthly variation in the chemistry is predicted from a downscaling model and then compared with the observed data. Hence, the model developed for rain chemistry is useful for estimating concentrations at different spatio-temporal scales and is especially applicable to the south-west region of India. © 2008 Elsevier Ltd. All rights reserved.
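The abstract reports non-linear regressions in which concentration follows a power-law reduction with distance from the sea. A minimal sketch of fitting such a law, assuming the single-predictor form C = a·d^(-b) (the paper's full model with rainfall and the urbanization parameter is not reproduced), runs Gauss-Newton iterations started from a log-log linear fit. The distances and concentrations are synthetic, in arbitrary units.

```python
import numpy as np

def fit_power_law(d, c, iters=20):
    """Fit c ≈ a * d**(-b) by Gauss-Newton, starting from a log-log linear fit."""
    d = np.asarray(d, float)
    c = np.asarray(c, float)
    # Initial guess: ordinary least squares on log c = log a - b log d.
    slope, intercept = np.polyfit(np.log(d), np.log(c), 1)
    a, b = np.exp(intercept), -slope
    for _ in range(iters):
        model = a * d ** (-b)
        r = c - model
        # Jacobian of the model with respect to (a, b).
        J = np.column_stack([d ** (-b), -model * np.log(d)])
        step, *_ = np.linalg.lstsq(J, r, rcond=None)
        a, b = a + step[0], b + step[1]
        if np.max(np.abs(step)) < 1e-12:
            break
    return a, b

# Synthetic example: distance from the coast (km) and a Cl-like concentration.
rng = np.random.default_rng(1)
dist_km = np.linspace(5, 400, 30)
cl = 120.0 * dist_km ** (-0.6) + rng.normal(0.0, 0.2, dist_km.size)
a, b = fit_power_law(dist_km, cl)
print(a, b)
```

Fitting in the original units, rather than stopping at the log-log fit, weights errors on the concentration scale instead of the log scale.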
Abstract:
Lateral displacement and global stability are the two main stability criteria for soil nail walls. Conventional design methods do not adequately address the deformation behaviour of soil nail walls, owing to the complexity of handling a large number of influencing factors. Consequently, only a few methods for estimating deformation, based on empirical relationships and in situ performance monitoring, are available in the literature. It is therefore desirable to use numerical techniques and statistical methods to gain better insight into the deformation behaviour of soil nail walls. In the present study, numerical experiments are conducted using a 2^4 factorial design method. Based on analysis of the maximum lateral deformation and factor-of-safety observations from the numerical experiments, regression models for predicting maximum lateral deformation and factor of safety are developed and checked for adequacy. Selection of suitable design factors for the 2^4 factorial design of numerical experiments enabled the use of the proposed regression models over a practical range of soil nail wall heights and in situ soil variability. The model adequacy analyses and an illustrative example show that the proposed regression models provide reasonably good estimates of the lateral deformation and global factor of safety of soil nail walls.
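A 2^4 factorial design runs every combination of four factors at two levels (16 runs). A minimal sketch of fitting a regression model with main effects and two-factor interactions to such a design follows. The factor names (wall height, nail length ratio, friction angle, cohesion) and the responses are hypothetical stand-ins for the numerical-model outputs.

```python
import itertools
import numpy as np

factors = ["H", "L_over_H", "phi", "c"]  # hypothetical factor names
# Full 2^4 factorial design in coded units (-1 = low level, +1 = high level).
design = np.array(list(itertools.product([-1.0, 1.0], repeat=4)))

def model_matrix(D):
    """Columns: intercept, 4 main effects, then the 6 two-factor interactions."""
    cols = [np.ones(len(D))] + [D[:, i] for i in range(D.shape[1])]
    cols += [D[:, i] * D[:, j] for i, j in itertools.combinations(range(D.shape[1]), 2)]
    return np.column_stack(cols)

# Hypothetical responses standing in for 16 numerical-model runs
# (e.g., maximum lateral deformation in mm); not data from the study.
rng = np.random.default_rng(2)
true = np.zeros(11)
true[[0, 1, 3, 5]] = [40.0, 12.0, -6.0, 3.0]  # intercept, H, phi, H x L_over_H
X = model_matrix(design)
y = X @ true + rng.normal(0.0, 0.5, 16)

coef, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ coef
r2 = 1 - resid @ resid / np.sum((y - y.mean()) ** 2)  # simple adequacy check
print(np.round(coef, 2), round(r2, 4))
```

Because the columns of a full two-level factorial are mutually orthogonal, each coefficient is estimated independently of the others, which keeps the effects easy to interpret.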
Abstract:
The momentum balance of the linear-combination integral model for the transition zone is investigated for constant-pressure flows. The imbalance is found to be small enough to be negligible for all practical purposes. [S0889-504X(00)00703-0].
Abstract:
The effects of the initial height on the temporal persistence probability of steady-state height fluctuations in up-down symmetric linear models of surface growth are investigated. We study the (1 + 1)-dimensional Family model and the (1 + 1)- and (2 + 1)-dimensional larger curvature (LC) model. Both the Family and LC models have up-down symmetry, so the positive and negative persistence probabilities in the steady state, averaged over all values of the initial height h(0), are equal to each other. However, these two probabilities are not equal if one considers a fixed nonzero value of h(0). Plots of the positive persistence probability for negative initial height versus time exhibit power-law behavior if the magnitude of the initial height is larger than the interface width at saturation. By symmetry, the negative persistence probability for positive initial height exhibits the same behavior. The persistence exponent that describes this power-law decay decreases as the magnitude of the initial height increases. The dependence of the persistence probability on the initial height, the system size, and the discrete sampling time is found to exhibit scaling behavior.
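A minimal sketch of the (1 + 1)-dimensional Family model and an empirical positive-persistence estimator follows. The conventions here are assumptions, not taken from the abstract: periodic boundaries, heights measured relative to the instantaneous mean, one sample per monolayer, and "positive persistence" meaning the height stays strictly above its initial value. The hypothetical `h0_max` argument keeps only start times whose initial height is at most that value, for example negative initial heights.

```python
import numpy as np

def family_deposit(h, rng):
    """One Family-model deposition on a ring: a particle lands on a random site i and
    sticks at the lowest of i-1, i, i+1 (staying at i if i is among the lowest,
    otherwise picking randomly among the lowest neighbours)."""
    L = h.size
    i = rng.integers(L)
    nbrs = [(i - 1) % L, i, (i + 1) % L]
    hmin = min(h[j] for j in nbrs)
    if h[i] == hmin:
        target = i
    else:
        lowest = [j for j in nbrs if h[j] == hmin]
        target = lowest[rng.integers(len(lowest))]
    h[target] += 1

def sample_heights(L, warmup, samples, site=0, seed=0):
    """Grow for `warmup` monolayers, then record one site's height relative to the
    mean height once per monolayer."""
    rng = np.random.default_rng(seed)
    h = np.zeros(L, dtype=np.int64)
    for _ in range(warmup * L):
        family_deposit(h, rng)
    series = np.empty(samples)
    for t in range(samples):
        for _ in range(L):
            family_deposit(h, rng)
        series[t] = h[site] - h.mean()
    return series, h

def positive_persistence(series, max_lag, h0_max=None):
    """p[k] = fraction of start times t0 whose series stays strictly above series[t0]
    for lags 1..k+1 (optionally only start times with series[t0] <= h0_max)."""
    n = series.size
    starts = [t0 for t0 in range(n - max_lag)
              if h0_max is None or series[t0] <= h0_max]
    surviving = np.zeros(max_lag)
    for t0 in starts:
        above = series[t0 + 1 : t0 + 1 + max_lag] > series[t0]
        run = max_lag if above.all() else int(np.argmin(above))  # first failure
        surviving[:run] += 1
    return surviving / max(len(starts), 1)

series, h = sample_heights(L=16, warmup=300, samples=2000, seed=3)
p_all = positive_persistence(series, max_lag=50)
p_neg = positive_persistence(series, max_lag=50, h0_max=-1.0)
print(p_all[:5], p_neg[:5])
```

A small system is used here only to keep the run short. Studying the power-law regime the abstract describes would need much larger systems and longer runs.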
Abstract:
Multiple-input multiple-output (MIMO) systems with a large number of antennas have been gaining wide attention because they enable very high throughputs. A major impediment is the receiver complexity needed to detect the transmitted data. To this end, we propose a new receiver, called LRR (Linear Regression of MMSE Residual), which improves on the MMSE receiver by learning a linear regression model for the error of the MMSE receiver. The LRR receiver uses pilot data to estimate the channel, and then uses locally generated training data (not transmitted over the channel) to find the linear regression parameters. The proposed receiver is suitable for applications where the channel remains constant for a long period (slow-fading channels) and performs quite well: at a bit error rate (BER) of 10^-3, the SNR gain over the MMSE receiver is about 7 dB for a 16 × 16 system and about 8.5 dB for a 64 × 64 system. For large coherence time, the complexity order of the LRR receiver is the same as that of the MMSE receiver, and in simulations we find that it needs about 4 times as many floating-point operations. We also show that a further gain of about 4 dB is obtained by a local search around the estimate given by the LRR receiver.
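A minimal sketch of the LRR idea follows: compute the MMSE estimate, then learn by least squares a linear model of its error from locally generated training vectors. Several details are assumptions rather than the paper's method. The model is real-valued with BPSK symbols and the channel `H` is taken as already estimated. The regressors (the MMSE soft estimates, their hard decisions, and a bias) are also our choice; regressing only on the soft estimates would leave the receiver linear in the received signal, and so no better in MSE than MMSE itself.

```python
import numpy as np

def mmse_estimate(H, Y, noise_var):
    """Linear MMSE estimates for each column of Y (real model y = H x + n, unit-power x)."""
    n_t = H.shape[1]
    G = np.linalg.solve(H.T @ H + noise_var * np.eye(n_t), H.T)
    return G @ Y

def features(Xm):
    """Assumed regressors for the residual model: soft estimates, hard decisions, bias."""
    return np.vstack([Xm, np.sign(Xm), np.ones((1, Xm.shape[1]))])

def train_lrr(H, noise_var, n_train, rng):
    """Least-squares fit of the MMSE error e = x - x_mmse on locally generated BPSK data."""
    n_r, n_t = H.shape
    X = rng.choice([-1.0, 1.0], size=(n_t, n_train))
    Y = H @ X + rng.normal(0.0, np.sqrt(noise_var), size=(n_r, n_train))
    Xm = mmse_estimate(H, Y, noise_var)
    F = features(Xm)
    W_T, *_ = np.linalg.lstsq(F.T, (X - Xm).T, rcond=None)
    W = W_T.T  # shape (n_t, 2 * n_t + 1)
    mse_before = np.mean((X - Xm) ** 2)
    mse_after = np.mean((X - Xm - W @ F) ** 2)
    return W, mse_before, mse_after

def lrr_detect(H, Y, noise_var, W):
    """MMSE estimate plus the learned residual correction, followed by hard decisions."""
    Xm = mmse_estimate(H, Y, noise_var)
    return np.sign(Xm + W @ features(Xm))

rng = np.random.default_rng(4)
n = 16
H = rng.normal(0.0, 1.0 / np.sqrt(n), size=(n, n))  # hypothetical slow-fading channel
noise_var = 0.05
W, before, after = train_lrr(H, noise_var, 20000, rng)
X_test = rng.choice([-1.0, 1.0], size=(n, 2000))
Y_test = H @ X_test + rng.normal(0.0, np.sqrt(noise_var), size=X_test.shape)
ber_mmse = np.mean(np.sign(mmse_estimate(H, Y_test, noise_var)) != X_test)
ber_lrr = np.mean(lrr_detect(H, Y_test, noise_var, W) != X_test)
print(before, after, ber_mmse, ber_lrr)
```

Training needs only the channel estimate, so for a slowly fading channel its cost is amortized over many received vectors.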
Abstract:
This paper proposes the use of empirical modeling techniques for building microarchitecture-sensitive models for compiler optimizations. The models we build relate program performance to settings of compiler optimization flags, associated heuristics and key microarchitectural parameters. Unlike traditional analytical modeling methods, this relationship is learned entirely from data obtained by measuring performance at a small number of carefully selected compiler/microarchitecture configurations. We evaluate three different learning techniques in this context, viz. linear regression, adaptive regression splines and radial basis function networks. We use the generated models to (a) predict program performance at arbitrary compiler/microarchitecture configurations, (b) quantify the significance of complex interactions between optimizations and the microarchitecture, and (c) efficiently search for 'optimal' settings of optimization flags and heuristics for any given microarchitectural configuration. Our evaluation using benchmarks from the SPEC CPU2000 suite suggests that accurate models (< 5% average prediction error) can be generated using a reasonable number of simulations. We also find that using compiler settings prescribed by a model-based search can improve program performance by as much as 19% (9.5% on average) over highly optimized binaries.
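A minimal sketch of the linear-regression variant of this workflow: fit a model with pairwise interactions on a sample of coded flag/microarchitecture settings, then search the model exhaustively for the flag setting with the lowest predicted runtime at a fixed microarchitecture. The five on/off flags, the single coded microarchitecture parameter, and the `fake_runtime` function are hypothetical stand-ins for real measurements.

```python
import itertools
import numpy as np

def expand(C):
    """Intercept, main effects and pairwise interactions of coded (-1/+1) settings."""
    k = C.shape[1]
    cols = [np.ones(len(C))] + [C[:, i] for i in range(k)]
    cols += [C[:, i] * C[:, j] for i, j in itertools.combinations(range(k), 2)]
    return np.column_stack(cols)

def fake_runtime(C):
    """Hypothetical stand-in for measured cycles; column 5 is the microarchitecture."""
    return (100 - 4 * C[:, 0] - 3 * C[:, 1] + 2 * C[:, 0] * C[:, 5]
            + 1.5 * C[:, 2] * C[:, 3] + 5 * C[:, 5])

rng = np.random.default_rng(5)
train = rng.choice([-1.0, 1.0], size=(64, 6))  # 64 sampled configurations
y = fake_runtime(train) + rng.normal(0.0, 0.3, len(train))
beta, *_ = np.linalg.lstsq(expand(train), y, rcond=None)

def best_flags(beta, uarch_value, n_flags=5):
    """Exhaustively search flag settings for the lowest predicted runtime."""
    flags = np.array(list(itertools.product([-1.0, 1.0], repeat=n_flags)))
    C = np.column_stack([flags, np.full(len(flags), uarch_value)])
    pred = expand(C) @ beta
    i = int(np.argmin(pred))
    return flags[i], pred[i]

f, p = best_flags(beta, uarch_value=1.0)
print(f, p)
```

Exhaustive search is practical only for a handful of flags. With many flags and heuristics, a heuristic search over the model's predictions would be needed instead.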
Abstract:
Predictions of two popular closed-form models for unsaturated hydraulic conductivity (K) are compared with in situ measurements made in a sandy loam field soil. Whereas the Van Genuchten model estimates were very close to field-measured values, the Brooks-Corey model predictions were higher by about one order of magnitude in the wetter range. Estimating the parameters of the Van Genuchten soil moisture characteristic (SMC) equation, however, requires non-linear regression techniques, whereas the Brooks-Corey SMC equation has the advantage of being amenable to linear regression for estimating its parameters from retention data. A conversion technique, whereby known Brooks-Corey model parameters may be converted into Van Genuchten model parameters, is formulated. The proposed conversion algorithm may be used to obtain the parameters of the preferred Van Genuchten model from in situ retention data without the use of non-linear regression techniques.
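The Brooks-Corey retention curve, Se = (h_b/h)^λ for suctions h above the air-entry value h_b, becomes a straight line in log-log space: ln Se = λ ln h_b - λ ln h. This is why linear regression suffices. A minimal sketch follows, assuming the saturated and residual water contents are known; the retention data are synthetic. The paper's Brooks-Corey to Van Genuchten conversion is not reproduced here.

```python
import numpy as np

def fit_brooks_corey(h, theta, theta_s, theta_r):
    """Estimate Brooks-Corey (lambda, h_b) by linear regression of ln Se on ln h.
    Assumes theta_s and theta_r are known; only points with 0 < Se < 1 are used."""
    h = np.asarray(h, float)
    se = (np.asarray(theta, float) - theta_r) / (theta_s - theta_r)
    mask = (se > 0) & (se < 1)
    slope, intercept = np.polyfit(np.log(h[mask]), np.log(se[mask]), 1)
    lam = -slope                      # slope = -lambda
    h_b = np.exp(intercept / lam)     # intercept = lambda * ln(h_b)
    return lam, h_b

# Synthetic retention data (suction in cm), generated from lambda = 0.5, h_b = 20 cm.
theta_s, theta_r = 0.42, 0.05
h = np.geomspace(30, 3000, 15)
theta = theta_r + (theta_s - theta_r) * (20.0 / h) ** 0.5
lam, h_b = fit_brooks_corey(h, theta, theta_s, theta_r)
print(lam, h_b)
```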
Abstract:
Several statistical downscaling models have been developed over the past couple of decades to assess the hydrologic impacts of climate change by projecting station-scale hydrological variables from large-scale atmospheric variables simulated by general circulation models (GCMs). This paper presents and compares statistical downscaling models that use multiple linear regression (MLR), positive coefficient regression (PCR), stepwise regression (SR), and support vector machine (SVM) techniques for estimating monthly rainfall amounts in the state of Florida. Mean sea level pressure, air temperature, geopotential height, specific humidity, U wind, and V wind are used as the explanatory variables (predictors) in the downscaling models. Data for these variables are obtained from the National Centers for Environmental Prediction-National Center for Atmospheric Research (NCEP-NCAR) reanalysis dataset and from simulations of the Canadian Centre for Climate Modelling and Analysis (CCCma) Coupled Global Climate Model, version 3 (CGCM3). Principal component analysis (PCA) and the fuzzy c-means (FCM) clustering method are used within the downscaling model to reduce the dimensionality of the dataset and to identify clusters in the data, respectively. Evaluation of model performance using different error and statistical measures indicates that the SVM-based model performed better than all the other models in reproducing most monthly rainfall statistics at 18 sites. Output from the third-generation CGCM3 GCM for the A1B scenario was used for future projections. For the projection period 2001-10, MLR was used to relate variables at the GCM and NCEP grid scales. Using MLR to link the predictor variables at the GCM and NCEP grid scales yielded better reproduction of monthly rainfall statistics at most of the stations (12 out of 18) than the spatial interpolation technique used in earlier studies.
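A minimal sketch of the PCA plus MLR chain described above: standardize the predictors, keep the leading principal components, and regress rainfall on the component scores. The 95% retained-variance threshold, the 180/60 train/test split, and the synthetic predictors (six columns driven by two latent factors) are assumptions. The FCM clustering step and the SVM model are not shown.

```python
import numpy as np

def fit_pca_mlr(P, y, var_frac=0.95):
    """Standardize predictors, keep the leading principal components that explain
    var_frac of the variance, then fit MLR of y on those component scores."""
    mu, sd = P.mean(axis=0), P.std(axis=0)
    Z = (P - mu) / sd
    _, s, Vt = np.linalg.svd(Z, full_matrices=False)
    explained = np.cumsum(s ** 2) / np.sum(s ** 2)
    k = int(np.searchsorted(explained, var_frac) + 1)
    V = Vt[:k].T                       # loadings of the retained components
    A = np.column_stack([np.ones(len(y)), Z @ V])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    return {"mu": mu, "sd": sd, "V": V, "beta": beta}

def predict(model, P):
    scores = ((P - model["mu"]) / model["sd"]) @ model["V"]
    return model["beta"][0] + scores @ model["beta"][1:]

# Synthetic stand-in: 240 months, 6 correlated predictors, rainfall in mm.
rng = np.random.default_rng(6)
n = 240
latent = rng.normal(size=(n, 2))
P = latent @ rng.normal(size=(2, 6)) + 0.1 * rng.normal(size=(n, 6))
rain = 50 + 10 * latent[:, 0] - 5 * latent[:, 1] + rng.normal(0.0, 2.0, n)
model = fit_pca_mlr(P[:180], rain[:180])
pred = predict(model, P[180:])
r = np.corrcoef(pred, rain[180:])[0, 1]
print(model["V"].shape[1], r)
```

Regressing on component scores rather than the raw predictors avoids the unstable coefficients that strongly correlated atmospheric variables would otherwise produce.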
Abstract:
Background: A genetic network can be represented as a directed graph in which a node corresponds to a gene and a directed edge specifies the direction of influence of one gene on another. The reconstruction of such networks from transcript profiling data remains an important yet challenging endeavor. A transcript profile specifies the abundances of many genes in a biological sample of interest. Prevailing strategies for learning the structure of a genetic network from high-dimensional transcript profiling data assume sparsity and linearity. Many methods consider relatively small directed graphs, inferring graphs with up to a few hundred nodes. This work examines large undirected graph representations of genetic networks, that is, graphs with many thousands of nodes in which an undirected edge between two nodes does not indicate the direction of influence, and the problem of estimating the structure of such a sparse linear genetic network (SLGN) from transcript profiling data. Results: The structure learning task is cast as a sparse linear regression problem, which is then posed as a LASSO (l1-constrained fitting) problem and finally solved by formulating a linear program (LP). A bound on the generalization error of this approach is given in terms of the leave-one-out error. The accuracy and utility of LP-SLGNs are assessed quantitatively and qualitatively using simulated and real data. The Dialogue for Reverse Engineering Assessments and Methods (DREAM) initiative provides gold-standard data sets and evaluation metrics that enable and facilitate the comparison of algorithms for deducing the structure of networks. The structures of LP-SLGNs estimated from the INSILICO1, INSILICO2 and INSILICO3 simulated DREAM2 data sets are comparable to those proposed by the first- and/or second-ranked teams in the DREAM2 competition. The structures of LP-SLGNs estimated from two published Saccharomyces cerevisiae cell cycle transcript profiling data sets capture known regulatory associations.
In each S. cerevisiae LP-SLGN, the number of nodes with a particular degree follows an approximate power law, suggesting that its degree distribution is similar to that observed in real-world networks. Inspection of these LP-SLGNs suggests biological hypotheses amenable to experimental verification. Conclusion: A statistically robust and computationally efficient LP-based method for estimating the topology of a large sparse undirected graph from high-dimensional data yields representations of genetic networks that are biologically plausible and useful abstractions of the structures of real genetic networks. Analysis of the statistical and topological properties of learned LP-SLGNs may have practical value; for example, genes with high random walk betweenness, a measure of the centrality of a node in a graph, are good candidates for intervention studies and hence for integrated computational-experimental investigations designed to infer more realistic and sophisticated probabilistic directed graphical model representations of genetic networks. The LP-based solutions of the sparse linear regression problem described here may provide a method for learning the structure of transcription factor networks from transcript profiling and transcription factor binding motif data.
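A minimal sketch of learning a sparse undirected network by sparse linear regression: regress each gene on all the others with an l1 penalty, and connect two genes if either regression gives a non-zero coefficient. Note that the l1-penalized least-squares problem is solved here by cyclic coordinate descent, a swapped-in solver, not the LP formulation of the paper. The penalty `lam`, the OR rule for symmetrizing, and the synthetic expression data are assumptions.

```python
import numpy as np

def lasso_cd(X, y, lam, iters=200):
    """Minimize (1/2n)||y - Xw||^2 + lam*||w||_1 by cyclic coordinate descent
    (columns of X assumed centred; no intercept)."""
    n, p = X.shape
    w = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n
    r = y.copy()                          # residual y - Xw
    for _ in range(iters):
        for j in range(p):
            if col_sq[j] == 0:
                continue
            r += X[:, j] * w[j]           # remove coordinate j from the fit
            rho = X[:, j] @ r / n
            w[j] = np.sign(rho) * max(abs(rho) - lam, 0.0) / col_sq[j]  # soft threshold
            r -= X[:, j] * w[j]
    return w

def sparse_network(data, lam):
    """Regress each gene on all others; connect i and j if either coefficient is non-zero."""
    Z = (data - data.mean(axis=0)) / data.std(axis=0)
    p = Z.shape[1]
    A = np.zeros((p, p), dtype=bool)
    for i in range(p):
        others = [j for j in range(p) if j != i]
        w = lasso_cd(Z[:, others], Z[:, i], lam)
        for j, wj in zip(others, w):
            if abs(wj) > 1e-8:
                A[i, j] = A[j, i] = True
    return A

# Synthetic profiles: 100 samples x 8 genes, with a chain gene0 -> gene1 -> gene2.
rng = np.random.default_rng(7)
G = rng.normal(size=(100, 8))
G[:, 1] = 0.9 * G[:, 0] + 0.3 * rng.normal(size=100)
G[:, 2] = -0.8 * G[:, 1] + 0.3 * rng.normal(size=100)
A = sparse_network(G, lam=0.1)
print(np.argwhere(np.triu(A)))
```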
Abstract:
The soil moisture characteristic (SMC) forms an important input to mathematical models of water and solute transport in the unsaturated-soil zone. Owing to their simplicity and ease of use, texture-based regression models are commonly used to estimate the SMC from basic soil properties. In this study, the performance of six such regression models was evaluated on three soils. Moisture characteristics generated by the regression models were statistically compared with the characteristics developed independently from laboratory and in situ retention data of the soil profiles. Results of the statistical performance evaluation, while providing useful information on the errors involved in estimating the SMC, also highlighted the importance of the nature of the data set underlying the regression models. Among the models evaluated, the one based on an underlying data set of in situ measurements was found to be the best estimator of the in situ SMC for all the soils. Considerable errors arose when a textural model based on laboratory data was used to estimate the field retention characteristics of unsaturated soils.
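The abstract does not list the statistics used to compare predicted and observed characteristics. A minimal sketch with three common choices (mean error as bias, RMSE, and a Nash-Sutcliffe-style efficiency) follows; the five water-content pairs are hypothetical.

```python
import numpy as np

def compare_smc(theta_pred, theta_obs):
    """Goodness-of-fit statistics between predicted and observed volumetric water contents."""
    p = np.asarray(theta_pred, float)
    o = np.asarray(theta_obs, float)
    err = p - o
    return {
        "mean_error": float(err.mean()),                 # bias (+ = overestimation)
        "rmse": float(np.sqrt(np.mean(err ** 2))),
        "efficiency": float(1 - np.sum(err ** 2) / np.sum((o - o.mean()) ** 2)),
    }

# Hypothetical water contents (m3/m3) at five suctions.
obs = [0.40, 0.35, 0.28, 0.22, 0.18]
pred = [0.42, 0.36, 0.30, 0.25, 0.20]
stats = compare_smc(pred, obs)
print(stats)
```

Reporting bias alongside RMSE separates a systematic offset, such as the one expected when laboratory-based models are applied to field soils, from random scatter.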
Abstract:
Sub-pixel classification is essential for the successful description of many land cover (LC) features whose spatial extent is smaller than an image pixel. A commonly used approach for sub-pixel classification is the linear mixture model (LMM). Even though LMMs have shown acceptable results, in practice purely linear mixtures do not exist. A non-linear mixture model may therefore better describe the resultant mixture spectra for the endmember (pure pixel) distribution. In this paper, we propose a new methodology for inferring LC fractions by a process called the automatic linear-nonlinear mixture model (AL-NLMM). AL-NLMM is a three-step process in which the endmembers are first derived by an automated algorithm. These endmembers are used by the LMM in the second step, which provides linear abundance estimates. Finally, the abundance values, along with training samples representing the actual proportions, are fed as input to a multi-layer perceptron (MLP), which is trained to refine the abundance estimates and account for the non-linear nature of the mixing of the classes of interest. AL-NLMM is validated on computer-simulated hyperspectral data with 200 bands. Compared with the actual class proportions, the output showed an overall RMSE of 0.0089 ± 0.0022 with LMM and 0.0030 ± 0.0001 with the MLP-based AL-NLMM, indicating that individual class abundances obtained from AL-NLMM are very close to the real observations.
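A minimal sketch of the linear (second) step: estimate the abundances `a` of a pixel spectrum as a least-squares combination of endmember spectra `E`, with the sum-to-one condition enforced approximately by appending a heavily weighted row of ones. Non-negativity is not enforced here. The weight `delta`, the random endmembers, and the 200-band pixel are assumptions, and the endmember extraction and MLP refinement steps are not shown.

```python
import numpy as np

def unmix_sum_to_one(E, pixel, delta=1e3):
    """Least-squares abundances a with E @ a ≈ pixel and sum(a) ≈ 1
    (enforced by an extra row of ones weighted by delta)."""
    _, m = E.shape
    E_aug = np.vstack([E, delta * np.ones((1, m))])
    p_aug = np.append(pixel, delta)
    a, *_ = np.linalg.lstsq(E_aug, p_aug, rcond=None)
    return a

# Synthetic example: 3 endmembers over 200 bands, noise-free linear mixture.
rng = np.random.default_rng(9)
E = rng.uniform(0.05, 0.6, size=(200, 3))
a_true = np.array([0.6, 0.3, 0.1])
pixel = E @ a_true
a = unmix_sum_to_one(E, pixel)
print(a)
```

For a noise-free linear mixture the true abundances are recovered exactly. Real mixtures with non-linear effects would leave systematic errors, which is what the MLP stage in the abstract is meant to correct.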
Abstract:
Climate change impact assessment studies involve downscaling large-scale atmospheric predictor variables (LSAPVs) simulated by general circulation models (GCMs) to site-scale meteorological variables. This article presents a least-squares support vector machine (LS-SVM)-based methodology for multi-site downscaling of maximum and minimum daily temperature series. The methodology involves (1) delineation of sites in the study area into clusters based on the correlation structure of the predictands, (2) downscaling LSAPVs to monthly time series of predictands at a representative site identified in each cluster, (3) translation of the downscaled information in each cluster from the representative site to the other sites using LS-SVM inter-site regression relationships, and (4) disaggregation of the information at each site from the monthly to the daily time scale using a k-nearest neighbour disaggregation methodology. The effectiveness of the methodology is demonstrated by application to data from four sites in the Beas River basin, India. Simulations of the Canadian coupled global climate model (CGCM3.1/T63) for four IPCC SRES scenarios, namely A1B, A2, B1 and COMMIT, were downscaled to future projections of the predictands in the study area. Comparison with the recently proposed multivariate multiple linear regression (MMLR)-based downscaling method and the multi-site multivariate statistical downscaling (MMSD) method indicates that the proposed method is promising and can be considered a feasible choice in statistical downscaling studies. The method performed better in downscaling daily minimum temperature than in downscaling daily maximum temperature. Results indicate an increase in annual average maximum and minimum temperatures at all sites for the A1B, A2 and B1 scenarios. The projected increase is highest for the A2 scenario, followed by the A1B, B1 and COMMIT scenarios.
Projections, in general, indicated an increase in mean monthly maximum and minimum temperatures from January to February and from October to December.
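A minimal sketch of an LS-SVM regression, the model type used in step (3) for inter-site relationships. It solves the standard LS-SVM dual linear system with an RBF kernel. The kernel choice, the hyperparameters `gamma` and `sigma`, and the synthetic temperature pair (a representative site and a second site) are assumptions. Clustering and k-nearest neighbour disaggregation are not shown.

```python
import numpy as np

def rbf_kernel(A, B, sigma):
    """Gaussian kernel matrix between the rows of A and the rows of B."""
    d2 = np.sum(A ** 2, 1)[:, None] + np.sum(B ** 2, 1)[None, :] - 2 * A @ B.T
    return np.exp(-np.maximum(d2, 0) / (2 * sigma ** 2))

def lssvm_fit(X, y, gamma=10.0, sigma=1.0):
    """Solve the LS-SVM dual system [[0, 1^T], [1, K + I/gamma]] [b; alpha] = [0; y]."""
    n = len(y)
    M = np.zeros((n + 1, n + 1))
    M[0, 1:] = 1.0
    M[1:, 0] = 1.0
    M[1:, 1:] = rbf_kernel(X, X, sigma) + np.eye(n) / gamma
    sol = np.linalg.solve(M, np.concatenate([[0.0], y]))
    return sol[0], sol[1:]            # bias b, dual weights alpha

def lssvm_predict(X_train, b, alpha, X_new, sigma=1.0):
    """f(x) = sum_i alpha_i K(x, x_i) + b."""
    return rbf_kernel(X_new, X_train, sigma) @ alpha + b

# Synthetic monthly Tmax (deg C) at a representative site and at another site.
rng = np.random.default_rng(8)
x = rng.uniform(10, 35, size=(150, 1))
y = 0.8 * x[:, 0] + 2 + 1.5 * np.sin(x[:, 0] / 3) + rng.normal(0.0, 0.3, 150)
b, alpha = lssvm_fit(x[:120], y[:120], gamma=100.0, sigma=3.0)
pred = lssvm_predict(x[:120], b, alpha, x[120:], sigma=3.0)
rmse = np.sqrt(np.mean((pred - y[120:]) ** 2))
print(rmse)
```

Unlike a standard SVM, which solves a quadratic program, LS-SVM training reduces to one linear solve, which keeps it simple to apply across many site pairs.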
Abstract:
In this paper, we present a novel algorithm for piecewise linear regression that can learn continuous as well as discontinuous piecewise linear functions. The main idea is to repeatedly partition the data and learn a linear model in each partition. The proposed algorithm is similar in spirit to the k-means clustering algorithm. We show that our algorithm can also be viewed as a special case of an EM algorithm for maximum likelihood estimation under a reasonable probability model. We empirically demonstrate the effectiveness of our approach by comparing its performance with that of state-of-the-art algorithms on various datasets. © 2014 Elsevier Inc. All rights reserved.
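A minimal sketch of the k-means-like alternation for 1-D inputs: fit one line per partition, reassign each point to the line with the smallest squared residual, and repeat until the assignments stop changing. This is a generic variant written for illustration, not necessarily the paper's exact partitioning rule. Contiguous initial chunks and the synthetic two-piece data are assumptions, and predicting at new inputs would need an extra rule for choosing a partition.

```python
import numpy as np

def fit_piecewise_linear(x, y, k=2, iters=50):
    """Alternate between fitting one line per partition and reassigning each point
    to the line with the smallest squared residual."""
    X = np.column_stack([x, np.ones_like(x)])
    labels = np.empty(len(x), dtype=int)
    labels[np.argsort(x)] = np.arange(len(x)) * k // len(x)  # contiguous initial chunks
    coefs = np.zeros((k, 2))
    for _ in range(iters):
        for j in range(k):
            idx = labels == j
            if idx.sum() >= 2:
                coefs[j], *_ = np.linalg.lstsq(X[idx], y[idx], rcond=None)
        resid = (y[:, None] - X @ coefs.T) ** 2   # n x k squared residuals
        new_labels = resid.argmin(axis=1)
        if np.array_equal(new_labels, labels):
            break
        labels = new_labels
    sse = float(resid[np.arange(len(x)), labels].sum())
    return coefs, labels, sse

# Synthetic data: two linear pieces meeting at x = 0, plus small noise.
rng = np.random.default_rng(10)
x = np.linspace(-3, 3, 120)
y = np.where(x < 0, 1 + 2 * x, 1 - 1.5 * x) + rng.normal(0.0, 0.05, x.size)
coefs, labels, sse = fit_piecewise_linear(x, y, k=2)
print(coefs, sse)
```

As with k-means, the result depends on the initial partition, so several restarts are a common safeguard.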
Abstract:
Nanoparticle deposition behavior observed at the Darcy scale represents an average of the processes occurring at the pore scale. Hence, the effect of various pore-scale parameters on nanoparticle deposition can be understood by studying nanoparticle transport at the pore scale and upscaling the results to the Darcy scale. In this work, correlation equations for the deposition rate coefficients of nanoparticles in a cylindrical pore are developed as a function of nine pore-scale parameters: the pore radius, nanoparticle radius, mean flow velocity, solution ionic strength, viscosity, temperature, solution dielectric constant, and nanoparticle and collector surface potentials. Based on the dominant processes, the pore space is divided into three regions, namely the bulk, diffusion, and potential regions. Advection-diffusion equations for nanoparticle transport are prescribed for the bulk and diffusion regions, while the interaction between the diffusion and potential regions is included as a boundary condition. This interaction is modeled as first-order reversible kinetic adsorption. The expressions for the mass transfer rate coefficients between the diffusion and potential regions are derived in terms of the interaction energy profile. Among other effects, we account for the influence of nanoparticle-collector interaction forces on deposition. The resulting equations are solved numerically for a range of values of the pore-scale parameters. The nanoparticle concentration profile obtained for the cylindrical pore is averaged over a moving averaging volume within the pore to obtain the 1-D concentration field. The latter is fitted to the 1-D advection-dispersion equation with an equilibrium or kinetic adsorption model to determine the values of the average deposition rate coefficients. In this study, pore-scale simulations are performed for three values of the Peclet number: Pe = 0.05, 5, and 50.
We find that under unfavorable conditions, nanoparticle deposition at the pore scale is best described by an equilibrium model at low Peclet numbers (Pe = 0.05) and by a kinetic model at high Peclet numbers (Pe = 50). At intermediate Pe (e.g., near Pe = 5), however, both equilibrium and kinetic models fit the 1-D concentration field. Correlation equations for the pore-averaged nanoparticle deposition rate coefficients under unfavorable conditions are derived by performing a multiple-linear regression analysis between the estimated deposition rate coefficients for a single pore and the various pore-scale parameters. The correlation equations, which take the form of a power law in the nine pore-scale parameters, are found to be consistent with column-scale and pore-scale experimental results, and qualitatively agree with colloid filtration theory. These equations can be incorporated into pore network models to study the effect of pore-scale parameters on nanoparticle deposition at larger length scales, such as the Darcy scale.
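A power-law correlation k = C · p1^a1 · … · p9^a9 becomes linear after taking logarithms, ln k = ln C + Σ a_i ln p_i, so ordinary multiple linear regression recovers C and the exponents. A minimal sketch follows, assuming all nine parameters are strictly positive (for example, using magnitudes of surface potentials). The exponents and the simulated rate coefficients are hypothetical, not the paper's correlation.

```python
import numpy as np

def fit_power_law_correlation(P, k):
    """Fit k = C * prod_i P[:, i]**a_i by least squares on ln k = ln C + sum_i a_i ln P_i."""
    A = np.column_stack([np.ones(len(k)), np.log(P)])
    coef, *_ = np.linalg.lstsq(A, np.log(k), rcond=None)
    return np.exp(coef[0]), coef[1:]

def predict_rate(C, a, P):
    return C * np.prod(P ** a, axis=1)

# Hypothetical data: 200 simulated pores, 9 positive (scaled) parameters.
rng = np.random.default_rng(11)
P = np.exp(rng.uniform(-1.0, 1.0, size=(200, 9)))
a_true = np.array([-0.7, 0.5, -0.3, 0.2, 0.4, 0.1, -0.2, 0.3, -0.6])
k = 2.5 * np.prod(P ** a_true, axis=1) * np.exp(rng.normal(0.0, 0.05, 200))
C, a = fit_power_law_correlation(P, k)
print(C, np.round(a, 3))
```

Fitting in log space treats errors as multiplicative, which suits rate coefficients that span several orders of magnitude.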