963 resultados para Regression methods
Resumo:
Fractal and multifractal are concepts that have grown increasingly popular in recent years in the soil analysis, along with the development of fractal models. One of the common steps is to calculate the slope of a linear fit commonly using least squares method. This shouldn?t be a special problem, however, in many situations using experimental data the researcher has to select the range of scales at which is going to work neglecting the rest of points to achieve the best linearity that in this type of analysis is necessary. Robust regression is a form of regression analysis designed to circumvent some limitations of traditional parametric and non-parametric methods. In this method we don?t have to assume that the outlier point is simply an extreme observation drawn from the tail of a normal distribution not compromising the validity of the regression results. In this work we have evaluated the capacity of robust regression to select the points in the experimental data used trying to avoid subjective choices. Based on this analysis we have developed a new work methodology that implies two basic steps: ? Evaluation of the improvement of linear fitting when consecutive points are eliminated based on R pvalue. In this way we consider the implications of reducing the number of points. ? Evaluation of the significance of slope difference between fitting with the two extremes points and fitted with the available points. We compare the results applying this methodology and the common used least squares one. The data selected for these comparisons are coming from experimental soil roughness transect and simulated based on middle point displacement method adding tendencies and noise. The results are discussed indicating the advantages and disadvantages of each methodology.
Resumo:
The purpose of this study was to compare a number of state-of-the-art methods in airborne laser scan- ning (ALS) remote sensing with regards to their capacity to describe tree size inequality and other indi- cators related to forest structure. The indicators chosen were based on the analysis of the Lorenz curve: Gini coefficient ( GC ), Lorenz asymmetry ( LA ), the proportions of basal area ( BALM ) and stem density ( NSLM ) stocked above the mean quadratic diameter. Each method belonged to one of these estimation strategies: (A) estimating indicators directly; (B) estimating the whole Lorenz curve; or (C) estimating a complete tree list. Across these strategies, the most popular statistical methods for area-based approach (ABA) were used: regression, random forest (RF), and nearest neighbour imputation. The latter included distance metrics based on either RF (NN–RF) or most similar neighbour (MSN). In the case of tree list esti- mation, methods based on individual tree detection (ITD) and semi-ITD, both combined with MSN impu- tation, were also studied. The most accurate method was direct estimation by best subset regression, which obtained the lowest cross-validated coefficients of variation of their root mean squared error CV(RMSE) for most indicators: GC (16.80%), LA (8.76%), BALM (8.80%) and NSLM (14.60%). Similar figures [CV(RMSE) 16.09%, 10.49%, 10.93% and 14.07%, respectively] were obtained by MSN imputation of tree lists by ABA, a method that also showed a number of additional advantages, such as better distributing the residual variance along the predictive range. In light of our results, ITD approaches may be clearly inferior to ABA with regards to describing the structural properties related to tree size inequality in for- ested areas.
Resumo:
Social behavior is mainly based on swarm colonies, in which each individual shares its knowledge about the environment with other individuals to get optimal solutions. Such co-operative model differs from competitive models in the way that individuals die and are born by combining information of alive ones. This paper presents the particle swarm optimization with differential evolution algorithm in order to train a neural network instead the classic back propagation algorithm. The performance of a neural network for particular problems is critically dependant on the choice of the processing elements, the net architecture and the learning algorithm. This work is focused in the development of methods for the evolutionary design of artificial neural networks. This paper focuses in optimizing the topology and structure of connectivity for these networks
Resumo:
As world communication, technology, and trade become increasingly integrated through globalization, multinational corporations seek employees with global leadership experience and skills. However, the demand for these skills currently outweighs the supply. Given the rarity of globally ready leaders, global competency development should be emphasized in higher education programs. The reality, however, is that university graduate programs are often outdated and focus mostly on cognitive learning. Global leadership competence requires moving beyond the cognitive domain of learning to create socially responsible and culturally connected global leaders. This requires attention to development methods; however, limited research in global leadership development methods has been conducted. A new conceptual model, the global leadership development ecosystem, was introduced in this study to guide the design and evaluation of global leadership development programs. It was based on three theories of learning and was divided into four development methodologies. This study quantitatively tested the model and used it as a framework for an in-depth examination of the design of one International MBA program. The program was first benchmarked, by means of a qualitative best practices analysis, against the top-ranking IMBA programs in the world. Qualitative data from students, faculty, administrators, and staff was then examined, using descriptive and focused data coding. Quantitative data analysis, using PASW Statistics software, and a hierarchical regression, showed the individual effect of each of the four development methods, as well as their combined effect, on student scores on a global leadership assessment. The analysis revealed that each methodology played a distinct and important role in developing different competencies of global leadership. It also confirmed the critical link between self-efficacy and global leadership development.
Resumo:
624-B
Resumo:
Thesis (Master's)--University of Washington, 2016-06
Resumo:
Background and Objective: To examine if commonly recommended assumptions for multivariable logistic regression are addressed in two major epidemiological journals. Methods: Ninety-nine articles from the Journal of Clinical Epidemiology and the American Journal of Epidemiology were surveyed for 10 criteria: six dealing with computation and four with reporting multivariable logistic regression results. Results: Three of the 10 criteria were addressed in 50% or more of the articles. Statistical significance testing or confidence intervals were reported in all articles. Methods for selecting independent variables were described in 82%, and specific procedures used to generate the models were discussed in 65%. Fewer than 50% of the articles indicated if interactions were tested or met the recommended events per independent variable ratio of 10: 1. Fewer than 20% of the articles described conformity to a linear gradient, examined collinearity, reported information on validation procedures, goodness-of-fit, discrimination statistics, or provided complete information on variable coding. There was no significant difference (P >.05) in the proportion of articles meeting the criteria across the two journals. Conclusion: Articles reviewed frequently did not report commonly recommended assumptions for using multivariable logistic regression. (C) 2004 Elsevier Inc. All rights reserved.
Resumo:
Background: Protein tertiary structure can be partly characterized via each amino acid's contact number measuring how residues are spatially arranged. The contact number of a residue in a folded protein is a measure of its exposure to the local environment, and is defined as the number of C-beta atoms in other residues within a sphere around the C-beta atom of the residue of interest. Contact number is partly conserved between protein folds and thus is useful for protein fold and structure prediction. In turn, each residue's contact number can be partially predicted from primary amino acid sequence, assisting tertiary fold analysis from sequence data. In this study, we provide a more accurate contact number prediction method from protein primary sequence. Results: We predict contact number from protein sequence using a novel support vector regression algorithm. Using protein local sequences with multiple sequence alignments (PSI-BLAST profiles), we demonstrate a correlation coefficient between predicted and observed contact numbers of 0.70, which outperforms previously achieved accuracies. Including additional information about sequence weight and amino acid composition further improves prediction accuracies significantly with the correlation coefficient reaching 0.73. If residues are classified as being either contacted or non-contacted, the prediction accuracies are all greater than 77%, regardless of the choice of classification thresholds. Conclusion: The successful application of support vector regression to the prediction of protein contact number reported here, together with previous applications of this approach to the prediction of protein accessible surface area and B-factor profile, suggests that a support vector regression approach may be very useful for determining the structure-function relation between primary sequence and higher order consecutive protein structural and functional properties.
Resumo:
Background Regression to the mean (RTM) is a statistical phenomenon that can make natural variation in repeated data look like real change. It happens when unusually large or small measurements tend to be followed by measurements that are closer to the mean. Methods We give some examples of the phenomenon, and discuss methods to overcome it at the design and analysis stages of a study. Results The effect of RTM in a sample becomes more noticeable with increasing measurement error and when follow-up measurements are only examined on a sub-sample selected using a baseline value. Conclusions RTM is a ubiquitous phenomenon in repeated data and should always be considered as a possible cause of an observed change. Its effect can be alleviated through better study design and use of suitable statistical methods.
Resumo:
Background: The residue-wise contact order (RWCO) describes the sequence separations between the residues of interest and its contacting residues in a protein sequence. It is a new kind of one-dimensional protein structure that represents the extent of long-range contacts and is considered as a generalization of contact order. Together with secondary structure, accessible surface area, the B factor, and contact number, RWCO provides comprehensive and indispensable important information to reconstructing the protein three-dimensional structure from a set of one-dimensional structural properties. Accurately predicting RWCO values could have many important applications in protein three-dimensional structure prediction and protein folding rate prediction, and give deep insights into protein sequence-structure relationships. Results: We developed a novel approach to predict residue-wise contact order values in proteins based on support vector regression (SVR), starting from primary amino acid sequences. We explored seven different sequence encoding schemes to examine their effects on the prediction performance, including local sequence in the form of PSI-BLAST profiles, local sequence plus amino acid composition, local sequence plus molecular weight, local sequence plus secondary structure predicted by PSIPRED, local sequence plus molecular weight and amino acid composition, local sequence plus molecular weight and predicted secondary structure, and local sequence plus molecular weight, amino acid composition and predicted secondary structure. When using local sequences with multiple sequence alignments in the form of PSI-BLAST profiles, we could predict the RWCO distribution with a Pearson correlation coefficient (CC) between the predicted and observed RWCO values of 0.55, and root mean square error (RMSE) of 0.82, based on a well-defined dataset with 680 protein sequences. Moreover, by incorporating global features such as molecular weight and amino acid composition we could further improve the prediction performance with the CC to 0.57 and an RMSE of 0.79. In addition, combining the predicted secondary structure by PSIPRED was found to significantly improve the prediction performance and could yield the best prediction accuracy with a CC of 0.60 and RMSE of 0.78, which provided at least comparable performance compared with the other existing methods. Conclusion: The SVR method shows a prediction performance competitive with or at least comparable to the previously developed linear regression-based methods for predicting RWCO values. In contrast to support vector classification (SVC), SVR is very good at estimating the raw value profiles of the samples. The successful application of the SVR approach in this study reinforces the fact that support vector regression is a powerful tool in extracting the protein sequence-structure relationship and in estimating the protein structural profiles from amino acid sequences.
Resumo:
Promoted-ignition testing on carbon steel rods of varying cross-sectional area and shape was performed in high pressure oxygen to assess the effect of sample geometry on the regression rate of the melting interface. Cylindrical and rectangular geometries and three different cross sections were tested and the regression rates of the cylinders were compared to the regression rates of the rectangular samples at test pressures around 6.9 MPa. Tests were recorded and video analysis used to determine the regression rate of the melting interface by a new method based on a drop cycle which was found to provide a good basis for statistical analysis and provide excellent agreement to the standard averaging methods used. Both geometries tested showed the typical trend of decreasing regression rate of the melting interface with increasing cross-sectional area; however, it was shown that the effect of geometry is more significant as the sample's cross sections become larger. Discussion is provided regarding the use of 3.2-mm square rods rather than 3.2-mm cylindrical rods within the standard ASTM test and any effect this may have on the observed regression rate of the melting interface.
Resumo:
The Bayesian analysis of neural networks is difficult because a simple prior over weights implies a complex prior distribution over functions. In this paper we investigate the use of Gaussian process priors over functions, which permit the predictive Bayesian analysis for fixed values of hyperparameters to be carried out exactly using matrix operations. Two methods, using optimization and averaging (via Hybrid Monte Carlo) over hyperparameters have been tested on a number of challenging problems and have produced excellent results.
Resumo:
Gaussian processes provide natural non-parametric prior distributions over regression functions. In this paper we consider regression problems where there is noise on the output, and the variance of the noise depends on the inputs. If we assume that the noise is a smooth function of the inputs, then it is natural to model the noise variance using a second Gaussian process, in addition to the Gaussian process governing the noise-free output value. We show that prior uncertainty about the parameters controlling both processes can be handled and that the posterior distribution of the noise rate can be sampled from using Markov chain Monte Carlo methods. Our results on a synthetic data set give a posterior noise variance that well-approximates the true variance.
Resumo:
The problem of regression under Gaussian assumptions is treated generally. The relationship between Bayesian prediction, regularization and smoothing is elucidated. The ideal regression is the posterior mean and its computation scales as O(n3), where n is the sample size. We show that the optimal m-dimensional linear model under a given prior is spanned by the first m eigenfunctions of a covariance operator, which is a trace-class operator. This is an infinite dimensional analogue of principal component analysis. The importance of Hilbert space methods to practical statistics is also discussed.
Resumo:
It is generally assumed when using Bayesian inference methods for neural networks that the input data contains no noise or corruption. For real-world (errors in variable) problems this is clearly an unsafe assumption. This paper presents a Bayesian neural network framework which allows for input noise given that some model of the noise process exists. In the limit where this noise process is small and symmetric it is shown, using the Laplace approximation, that there is an additional term to the usual Bayesian error bar which depends on the variance of the input noise process. Further, by treating the true (noiseless) input as a hidden variable and sampling this jointly with the network's weights, using Markov Chain Monte Carlo methods, it is demonstrated that it is possible to infer the unbiassed regression over the noiseless input.