919 results for generalised least squares
Abstract:
Machine learning provides tools for the automated construction of predictive models in data-intensive areas of engineering and science. The family of regularized kernel methods has in recent years become one of the mainstream approaches to machine learning, owing to a number of advantages these methods share. The approach provides theoretically well-founded solutions to the problems of under- and overfitting, allows learning from structured data, and has been empirically demonstrated to yield high predictive performance across a wide range of application domains. Historically, classification and regression problems have received most of the attention in the field. In this thesis we focus on another type of learning problem, that of learning to rank. In learning to rank, the aim is to learn, from a set of past observations, a ranking function that can order new objects according to how well they match some underlying criterion of goodness. As an important special case of this setting, we recover the bipartite ranking problem, which corresponds to maximizing the area under the ROC curve (AUC) in binary classification. Ranking applications appear in a large variety of settings; examples encountered in this thesis include document retrieval in web search, recommender systems, information extraction and automated parsing of natural language. We consider the pairwise approach to learning to rank, where ranking models are learned by minimizing the expected probability of ranking any two randomly drawn test examples incorrectly. Developing computationally efficient kernel methods based on this approach has proven challenging in the past. Moreover, it is not clear which techniques for estimating the predictive performance of learned models are the most reliable in the ranking setting, or how these techniques can be implemented efficiently. The contributions of this thesis are as follows.
First, we develop RankRLS, a computationally efficient kernel method for learning to rank, which is based on minimizing a regularized pairwise least-squares loss. In addition to training methods, we introduce a variety of algorithms for tasks such as model selection, multi-output learning, and cross-validation, based on computational shortcuts from matrix algebra. Second, we improve the fastest known training method for the linear version of the RankSVM algorithm, one of the most well-established methods for learning to rank. Third, we study the combination of the empirical kernel map and reduced set approximation, which allows large-scale training of kernel machines using linear solvers, and propose computationally efficient solutions to cross-validation when using this approach. Next, through an extensive simulation study, we explore the problem of reliable cross-validation when AUC is used as the performance criterion. We demonstrate that the proposed leave-pair-out cross-validation approach leads to more reliable performance estimation than commonly used alternative approaches. Finally, we present a case study on applying machine learning to information extraction from biomedical literature, which combines several of the approaches considered in the thesis. The thesis is divided into two parts: Part I provides the background for the research work and summarizes the most central results, and Part II consists of the five original research articles that are the main contribution of this thesis.
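The following is a minimal sketch of the closed form that a pairwise regularized least-squares ranker admits. It assumes that the squared loss is summed over all pairs of training examples and that a Gaussian kernel is used; the function names, kernel and parameter values are illustrative assumptions, not the thesis's RankRLS implementation. For residuals r = y − Ka, the all-pairs loss equals rᵀLr with L = nI − 11ᵀ. Adding λaᵀKa and setting the gradient to zero (for an invertible K) gives the dual coefficients a = (LK + λI)⁻¹Ly. The `auc` helper counts correctly ordered positive-negative pairs, which is the bipartite special case mentioned in the abstract.

```python
import numpy as np


def gaussian_kernel(A, B, gamma=1.0):
    """Gaussian (RBF) kernel matrix between the rows of A and B."""
    sq_dists = (
        np.sum(A**2, axis=1)[:, None]
        + np.sum(B**2, axis=1)[None, :]
        - 2.0 * A @ B.T
    )
    return np.exp(-gamma * sq_dists)


def pairwise_rls_fit(K, y, lam):
    """Dual coefficients minimising the all-pairs squared loss plus lam * a^T K a.

    sum_{i<j} ((y_i - y_j) - (f_i - f_j))^2 equals r^T L r with r = y - K a
    and L = n I - 1 1^T; for invertible K the optimum solves (L K + lam I) a = L y.
    """
    n = K.shape[0]
    L = n * np.eye(n) - np.ones((n, n))
    return np.linalg.solve(L @ K + lam * np.eye(n), L @ y)


def auc(scores, labels):
    """Fraction of (positive, negative) pairs ranked correctly; ties count 1/2."""
    pos = scores[labels == 1]
    neg = scores[labels == 0]
    diff = pos[:, None] - neg[None, :]
    return (np.sum(diff > 0) + 0.5 * np.sum(diff == 0)) / diff.size


rng = np.random.default_rng(0)
X = rng.normal(size=(60, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=60)

K = gaussian_kernel(X, X, gamma=0.2)
a = pairwise_rls_fit(K, y, lam=1.0)
train_scores = K @ a

# Binary labels (above / below the median) turn AUC into a pairwise ranking score.
labels = (y > np.median(y)).astype(int)
print(f"training AUC: {auc(train_scores, labels):.3f}")
```

Because L1 = 0, adding a constant to every target leaves the solution unchanged. This matches the intuition that only the relative order of the examples matters in the pairwise setting.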
Abstract:
This study aims to present an alternative calculation methodology, based on the Least Squares Method, for determining the modulus of elasticity in bending of wooden beams of structural dimensions. The developed equations require displacements measured at three or five points along the piece, which increases the reliability of the response variable. The static three-point bending test was performed non-destructively, imposing small displacements of L/300 and L/200, the larger of which is the value stipulated by the Brazilian standard NBR 7190:1997. The species tested were Angico, Cumaru, Garapa and Jatoba. In addition to the proposed alternative methodology, the moduli of elasticity were also obtained using NBR 7190:1997, adapted to the condition of non-destructive testing (small displacements) and to pieces of structural dimensions. For all four species, the moduli of elasticity obtained with the two calculation approaches proved equivalent, indicating that the calculation methodology adapted from the Brazilian standard provides a good approximation.
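The abstract does not reproduce the derived equations. The following is therefore only a sketch of how a least-squares estimate of the modulus of elasticity can combine several measured deflections. It assumes an ideal simply supported beam with a central point load P and the classical elastic deflection curve v(a) = P·a·(3L² − 4a²)/(48·E·I), where a ≤ L/2 is the distance from the nearest support. Because each deflection is proportional to 1/E, minimising Σ(vᵢ − gᵢ/E)² over 1/E gives 1/E = Σgᵢvᵢ / Σgᵢ². The function names, beam dimensions and load are hypothetical.

```python
import numpy as np


def deflection_shape(x, span, load, inertia):
    """Return g(x) such that the elastic deflection at x is g(x) / E.

    Simply supported beam with a central point load; a is the distance from
    the nearest support, which makes the formula valid on both halves.
    """
    a = np.minimum(x, span - x)
    return load * a * (3.0 * span**2 - 4.0 * a**2) / (48.0 * inertia)


def modulus_least_squares(x, v, span, load, inertia):
    """Least-squares estimate of E from deflections v measured at positions x."""
    g = deflection_shape(np.asarray(x, float), span, load, inertia)
    v = np.asarray(v, float)
    compliance = np.dot(g, v) / np.dot(g, g)  # best-fit value of 1/E
    return 1.0 / compliance


# Hypothetical 60 mm x 160 mm section on a 3.0 m span, SI units throughout.
b, h = 0.06, 0.16
inertia = b * h**3 / 12.0
span, load = 3.0, 2000.0
E_true = 15e9  # Pa

x5 = np.array([0.75, 1.25, 1.5, 1.75, 2.25])  # five measurement points (m)
v5 = deflection_shape(x5, span, load, inertia) / E_true  # ideal, noise-free

print(f"E = {modulus_least_squares(x5, v5, span, load, inertia) / 1e9:.2f} GPa")
```

With these hypothetical values, the midspan deflection is about 3.7 mm, well below L/300 = 10 mm. With real, noisy readings, each extra point contributes another term to both sums of the estimate.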
Abstract:
This study aimed to identify wavelengths, based on leaf reflectance (400-1050 nm), for estimating white mold severity in common beans in different seasons. Two experiments were carried out, one in the fall and another in the winter. Partial Least Squares (PLS) regression was used to establish the set of wavelengths that best estimates disease severity on a specific date. For this purpose, the observations were first divided into two subgroups: the first (calibration) was used for model building and the second for model testing. Error measures and the correlation between measured and predicted values of the disease severity index were used to select the best wavelengths in both seasons. The average indexes in the two experiments were 5.8% and 7.4%, which are considered low. The transition bands between blue and green, green and red, and red and infrared were the most sensitive for disease estimation. Beyond these transition ranges, other spectral regions, such as the red, green, and near infrared, also presented wavelengths with potential for determining disease severity.
Abstract:
Singular Value Decomposition (SVD), Principal Component Analysis (PCA) and Multiple Linear Regression (MLR) are some of the mathematical preliminaries discussed before PLS and PCR models are explained. Both PLS and PCR are applied to real spectral data, and their differences and similarities are discussed in this thesis. The main challenge lies in establishing the optimal number of components to include in either model; this has been overcome by using various diagnostic tools suggested in this thesis. Correspondence analysis (CA) and PLS were applied to ecological data, with CA used to relate macrophyte species to lakes. The differences between the PLS model for ecological data and the PLS model for spectral data are noted and explained in this thesis.
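The following sketch illustrates the PLS regression used in this and several neighbouring abstracts. It implements the classical NIPALS algorithm for a single response (PLS1) in NumPy. It is an illustrative implementation, not the code used in any of these studies; in practice a library routine such as scikit-learn's `PLSRegression` would normally be used. The argument `n_components` is the quantity whose choice the abstract identifies as the main challenge. With as many components as predictors, PLS1 reduces to ordinary least squares, and the training error cannot increase as components are added.

```python
import numpy as np


def pls1_nipals(X, y, n_components):
    """PLS1 via NIPALS; returns (coefficients, intercept) in original units."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xk = X - x_mean  # deflated predictor matrix
    yk = y - y_mean  # deflated response
    n_features = X.shape[1]
    W = np.zeros((n_features, n_components))  # weight vectors
    P = np.zeros((n_features, n_components))  # X loadings
    q = np.zeros(n_components)                # y loadings

    for k in range(n_components):
        w = Xk.T @ yk
        w /= np.linalg.norm(w)          # unit weight vector
        t = Xk @ w                      # score vector
        tt = t @ t
        P[:, k] = Xk.T @ t / tt
        q[k] = yk @ t / tt
        W[:, k] = w
        Xk = Xk - np.outer(t, P[:, k])  # deflate X
        yk = yk - q[k] * t              # deflate y

    # Map the component model back to coefficients on the original predictors.
    coef = W @ np.linalg.solve(P.T @ W, q)
    intercept = y_mean - x_mean @ coef
    return coef, intercept


rng = np.random.default_rng(1)
X = rng.normal(size=(40, 6))
y = X @ np.array([0.5, -1.0, 0.0, 2.0, 0.3, 0.0]) + 4.0 + 0.05 * rng.normal(size=40)

for a in (1, 2, 6):
    coef, b0 = pls1_nipals(X, y, a)
    rmse = np.sqrt(np.mean((X @ coef + b0 - y) ** 2))
    print(f"{a} component(s): training RMSE = {rmse:.3f}")
```

Training error alone always favours more components. This is why diagnostic tools such as cross-validated prediction error are needed to choose `n_components`.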
Abstract:
A model for predicting temperature evolution for automatic control systems in manufacturing processes that require coiling bars on the transfer table is presented. Although the method is general, this work presents it for the manufacturing of steel plates in hot rolling mills. The prediction strategy is based on a mathematical model of temperature evolution in a bar being coiled and uncoiled, formulated as a parabolic partial differential equation on a shape-changing domain. The model is solved numerically by space discretization with geometrically adaptive finite elements, which accommodate the change in shape of the domain, using a computationally novel treatment of the thermal contact problem that arises from coiling. Time is discretized with a Crank-Nicolson scheme. Since the physical process takes less time than the process control computer needs to solve the full mathematical model, a special predictive device was developed in the form of a set of least-squares polynomials based on off-line numerical solutions of the mathematical model.
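The model in this abstract uses adaptive finite elements on a domain that changes shape during coiling, which is beyond the scope of a short example. As a much-simplified sketch of the two ingredients the abstract names, the following code advances the 1-D heat equation u_t = α·u_xx with the Crank-Nicolson scheme. It runs on a fixed unit interval with zero-temperature ends. It then fits a least-squares polynomial to the computed temperature history, which can be evaluated almost instantly, in the spirit of the off-line/on-line split described above. The fixed domain, the boundary conditions, the polynomial degree and all parameter values are assumptions made for illustration.

```python
import numpy as np


def crank_nicolson_heat(u0, alpha, dx, dt, n_steps):
    """Advance u_t = alpha * u_xx with u = 0 at both ends (interior nodes only).

    Returns an array of shape (n_steps + 1, len(u0)) holding the full history.
    """
    n = u0.size
    r = alpha * dt / dx**2
    # Second-difference matrix for the interior nodes (Dirichlet ends).
    D2 = -2.0 * np.eye(n) + np.eye(n, k=1) + np.eye(n, k=-1)
    A = np.eye(n) - 0.5 * r * D2  # implicit half of the time step
    B = np.eye(n) + 0.5 * r * D2  # explicit half of the time step
    history = [u0.copy()]
    u = u0.copy()
    for _ in range(n_steps):
        u = np.linalg.solve(A, B @ u)
        history.append(u)
    return np.array(history)


# Off-line stage: solve the (simplified) model once.
n_cells, alpha, dt, n_steps = 50, 1.0, 1e-3, 100
dx = 1.0 / n_cells
x = np.linspace(dx, 1.0 - dx, n_cells - 1)  # interior nodes
hist = crank_nicolson_heat(np.sin(np.pi * x), alpha, dx, dt, n_steps)
times = dt * np.arange(n_steps + 1)
centre = hist[:, (n_cells - 1) // 2]  # node at x = 0.5

# Least-squares cubic in time: a cheap stand-in for the on-line predictor.
coeffs = np.polyfit(times, centre, deg=3)
predicted = np.polyval(coeffs, times)
print(f"max surrogate error: {np.max(np.abs(predicted - centre)):.2e}")
```

The initial profile sin(πx) is chosen because the exact solution sin(πx)·exp(−π²αt) is known. This makes it easy to check the Crank-Nicolson result before trusting the polynomial fitted to it.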
Abstract:
The purpose of this thesis is to investigate whether different private equity fund characteristics influence fund performance. Fund characteristics include fund type (venture capital or buyouts), fund size (divided into six ranges), fund investment industry, fund sequence (first fund or follow-on fund) and investment market (US or EMEA). Fund performance is measured by the internal rate of return (IRR) and tested by cross-sectional regression analysis estimated with Ordinary Least Squares. The data comprise the performance and characteristics of 997 private equity funds from 1985 to 2008. We find that fund type affects fund performance: the average IRR of venture capital funds is 2.7% lower than that of buyout funds. However, we did not find any relationship between fund size and performance, or between fund sequence and performance. Funds based in the US market perform better than funds based in the EMEA market. Fund performance also differs across industries: the average IRRs of the industrial/energy, consumer-related, communications and media, and medical/health industries are higher than the average IRR of other industries.
Abstract:
Experiential marketing is increasingly seen as a new magical key to consumers’ hearts. Brands are turning brick-and-mortar stores into state-of-the-art retail spaces in the hope of creating memorable experiences and strong brand relationships. Around the globe, several brands have opened a special store format: the experience store. Although much has been speculated about the positive effects of experiences, few studies have provided empirical, quantified evidence of the link between store experiences and brand success. Research was therefore needed to find out whether experience stores truly are so special. The purpose of this thesis was to investigate whether store experiences are capable of building brands and influencing store performance. For this purpose, empirical research was conducted in the Samsung Experience Store Helsinki. The main constructs measured were store experience, brand equity, store performance, and product class involvement, along with relevant background variables. Data were collected with an electronic survey of actual customers of the store, yielding a sample of 131 respondents. Partial least squares structural equation modeling (PLS) was used to analyze the research model, and regression analysis was also conducted to test for mediation and moderation effects. The results showed that store experiences positively influence, first, store performance and, second, separate dimensions of brand equity (namely brand awareness, brand personality, and brand loyalty). The effect of store experiences on store performance was also found to be mediated by brand equity. Interestingly, customers’ product class involvement was found to moderate the effect of store experience on store performance: customers who were highly involved with electronics had greater store experiences and also displayed a stronger link between store experience and store performance.
The results encourage marketers to continue their efforts to create great experiences for their customers. Experience stores can, and should, be seen as both powerful brand-building tools and profitable sales channels. Creating exceptional experiences can become an important function of physical stores in the face of severe online competition.
Abstract:
The aim of this study was to test the hypothesis that healthy men and women undergoing an exercise stress test differ in performance, including in ST-T wave changes. Two hundred (45.4%) men and 241 (54.6%) women (mean age: 38.7 ± 11.0 years) underwent an exercise stress test. Physiologic and electrocardiographic variables were compared using the Student t-test and the chi-square test. To test the hypothesis of differences in ST-segment changes, the data were ranked and analyzed with functional models based on weighted least squares. To evaluate the influence of gender and age on the diagnosis of ST-segment abnormality, a logistic model was fitted; P < 0.05 was considered significant. Rate-pressure product, exercise duration and estimated functional capacity were higher in men (P < 0.05). Sixteen (6.7%) women and 9 (4.5%) men showed ST-segment upsloping ≥0.15 mV or downsloping ≥0.10 mV; the difference was not statistically significant. Each one-year increase in age raised the chance of ST-segment upsloping ≥0.15 mV or downsloping ≥0.10 mV by 4% (P = 0.03; risk ratio = 1.040, 95% confidence interval (CI) = 1.002-1.080). Heart rate recovery was higher in women (P < 0.05). The chance of women showing an increase in systolic blood pressure ≤30 mmHg was 85% higher (P = 0.01; risk ratio = 1.85, 95%CI = 1.1-3.05). No significant difference in the frequency of ST-T wave changes was observed between men and women. Other differences may be related to differences in physical conditioning.
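In the usual formulation for categorical data, functional models of this kind fit a linear model to functions of response proportions by weighted least squares (WLS), with weights derived from the estimated covariance. The abstract does not give the study's exact specification. The following sketch therefore shows only the core WLS estimator, β̂ = (XᵀWX)⁻¹XᵀWy, for a diagonal weight matrix. It is a generic illustration on made-up numbers, not the study's analysis. A convenient sanity check is that integer weights give the same estimate as ordinary least squares on rows repeated that many times.

```python
import numpy as np


def weighted_least_squares(X, y, w):
    """Solve min_beta sum_i w_i * (y_i - x_i^T beta)^2 for positive weights w."""
    sw = np.sqrt(w)
    # Scaling each row by sqrt(w_i) turns WLS into an ordinary least-squares problem.
    beta, *_ = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)
    return beta


# Made-up example: intercept plus one covariate, unequal weights.
X = np.column_stack([np.ones(5), np.array([20.0, 30.0, 40.0, 50.0, 60.0])])
y = np.array([0.02, 0.04, 0.05, 0.08, 0.09])
w = np.array([3.0, 1.0, 2.0, 1.0, 4.0])

beta_wls = weighted_least_squares(X, y, w)

# Same fit by repeating each row w_i times and using plain OLS.
reps = w.astype(int)
beta_ols_rep, *_ = np.linalg.lstsq(
    np.repeat(X, reps, axis=0), np.repeat(y, reps), rcond=None
)
print(beta_wls, beta_ols_rep)
```

Solving through the row-scaled `lstsq` problem, rather than forming (XᵀWX)⁻¹ explicitly, avoids squaring the condition number of X.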
Abstract:
Research on the molecular mechanisms of carcinogenesis plays an important role in diagnosing and treating gastric cancer. Metabolic profiling may offer an opportunity to understand these mechanisms and to identify, non-invasively, potential biomarkers for the early diagnosis of human gastric cancer. The aims of this study were to explore the metabolic mechanisms underlying gastric cancer and to identify biomarkers associated with morbidity. Gas chromatography/mass spectrometry (GC/MS) was used to analyze the serum metabolites of 30 Chinese gastric cancer patients and 30 healthy controls. Diagnostic models for gastric cancer were constructed using orthogonal partial least squares discriminant analysis (OPLS-DA). The acquired metabolomic data were analyzed with the nonparametric Wilcoxon test to find serum metabolic biomarkers for gastric cancer. The OPLS-DA model showed adequate discrimination between cancer and non-cancer cohorts but failed to discriminate between the pathological stages (I-IV) of gastric cancer patients. A total of 44 endogenous metabolites, including amino acids, organic acids, carbohydrates, fatty acids, and steroids, were detected, of which 18 were identified as differential metabolites with significant differences. Thirteen variables made the greatest contribution to the discriminating OPLS-DA model [variable importance in the projection (VIP) value >1.0], of which 11 metabolites were identified using both VIP values (VIP >1) and the Wilcoxon test. These metabolites potentially revealed perturbations of glycolysis and of amino acid, fatty acid, cholesterol, and nucleotide metabolism in gastric cancer patients. These results suggest that serum metabolic profiling has great potential for detecting gastric cancer and for helping to understand its metabolic mechanisms.
Abstract:
A comparative analysis of the theoretical-experimental study developed by Hsu on the hydration of Amsoy 71 soybean grains was performed through several soaking experiments with CD 202 soybeans at 10, 20, 30, 40, and 50 °C, measuring moisture content over time. The results showed that the equilibrium moisture content of CD 202 soybean, Xeq, does not depend on temperature and is 21% higher than that found by Hsu, suggesting that the soybean cultivar strongly influences Xeq. The Hsu model was solved numerically and its parameters were fitted by the least squares method, with maximum deviations of ±10% relative to the experimental values. The limiting step in the mass transfer process during hydration is water diffusion inside the grain, which leads to radial moisture gradients that decrease over time and with increasing temperature. Regardless of the soybean cultivar, diffusivity increases as temperature or moisture content increases. However, the diffusivity values for Amsoy 71 were higher than those for CD 202: very close at the beginning of hydration at 20 °C and almost three times higher at the end of hydration at 50 °C.
Abstract:
Decreased gustatory and olfactory capacity is one of the problems caused by tobacco use. The objectives of this study were to determine the sensory profile of six grape nectar samples sweetened with different sweeteners and to identify the drivers of liking in two distinct consumer groups: smokers and nonsmokers. The sensory profile was constructed by twelve trained panelists using quantitative descriptive analysis (QDA). Consumer tests were performed with 112 smokers and 112 nonsmokers. Partial least squares regression analysis was used to identify the drivers of acceptance and rejection of the grape nectars in the two consumer groups. According to the QDA, the samples differed in six of the nineteen attributes generated. The absolute mean scores in the affective test were lower in the group of smokers, possibly because smoking influences acceptance and eating preferences, especially with regard to sweet foods. The results showed that grape flavor was the main driver of acceptance of the nectar, while astringency, wine aroma, bitterness, sweetness, and bitter aftertaste were drivers of rejection in both consumer groups, with some differences between the groups.
Abstract:
This work investigates theoretical properties of symmetric and anti-symmetric kernels. The first chapters give an overview of the theory of kernels used in supervised machine learning. The central focus is on the regularized least squares algorithm, which is motivated as a problem of function reconstruction through an abstract inverse problem. A brief review of reproducing kernel Hilbert spaces shows how kernels define an implicit hypothesis space with multiple equivalent characterizations and how this space may be modified by incorporating prior knowledge. Mathematical results on the abstract inverse problem, in particular spectral properties, the pseudoinverse and regularization, are recalled and then specialized to kernels. Symmetric and anti-symmetric kernels are applied to relation learning problems that incorporate the prior knowledge that the relation is symmetric or anti-symmetric, respectively. Theoretical properties of these kernels are proved in a draft on which this thesis is based and are comprehensively referenced here. These proofs show that these kernels are guaranteed to learn only symmetric or anti-symmetric relations, and that they can learn any relation learnable with the original kernel once it is restricted to its symmetric or anti-symmetric part. Further results establish spectral properties of these kernels, the central result being a simple inequality for the trace of the estimator, also called the effective dimension. This quantity is used in learning bounds to guarantee smaller variance.
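The following small sketch illustrates the objects discussed. It assumes the commonly used definitions of the symmetric and anti-symmetric pairwise kernels built from a base kernel k, K((a,b),(c,d)) = ½[k(a,c)k(b,d) ± k(a,d)k(b,c)]. It also assumes the regularized least-squares estimator with dual coefficients (K + λI)⁻¹y. The effective dimension is computed here as trace(K(K + λI)⁻¹); the scaling of λ varies between texts. The Gaussian base kernel, the random data and all parameter values are illustrative assumptions rather than the thesis's setup. Whatever labels are used for training, predictions from the symmetric kernel satisfy f(a,b) = f(b,a), and those from the anti-symmetric kernel satisfy f(a,b) = −f(b,a).

```python
import numpy as np


def rbf(A, B, gamma=0.5):
    """Gaussian base kernel between the rows of A and B."""
    d2 = np.sum(A**2, 1)[:, None] + np.sum(B**2, 1)[None, :] - 2.0 * A @ B.T
    return np.exp(-gamma * d2)


def pairwise_kernel(P, Q, sign, gamma=0.5):
    """Symmetric (sign=+1) or anti-symmetric (sign=-1) kernel between pair sets.

    P and Q have shape (n, 2, d): each row holds the two objects of a pair.
    """
    k_ac = rbf(P[:, 0], Q[:, 0], gamma)
    k_bd = rbf(P[:, 1], Q[:, 1], gamma)
    k_ad = rbf(P[:, 0], Q[:, 1], gamma)
    k_bc = rbf(P[:, 1], Q[:, 0], gamma)
    return 0.5 * (k_ac * k_bd + sign * k_ad * k_bc)


def rls_fit(K, y, lam):
    """Dual coefficients of regularized least squares: (K + lam I)^{-1} y."""
    return np.linalg.solve(K + lam * np.eye(K.shape[0]), y)


def effective_dimension(K, lam):
    """trace(K (K + lam I)^{-1}), computed from the eigenvalues of K."""
    eig = np.linalg.eigvalsh(K)
    return float(np.sum(eig / (eig + lam)))


rng = np.random.default_rng(2)
train = rng.normal(size=(30, 2, 3))  # 30 training pairs of 3-d objects
y = rng.normal(size=30)              # arbitrary relation labels
test = rng.normal(size=(5, 2, 3))
test_swapped = test[:, ::-1]         # (a, b) -> (b, a)

for sign, name in ((1, "symmetric"), (-1, "anti-symmetric")):
    K = pairwise_kernel(train, train, sign)
    alpha = rls_fit(K, y, lam=0.1)
    f = pairwise_kernel(test, train, sign) @ alpha
    f_swapped = pairwise_kernel(test_swapped, train, sign) @ alpha
    # Symmetric: f(a,b) - f(b,a) = 0; anti-symmetric: f(a,b) + f(b,a) = 0.
    violation = np.max(np.abs(f - sign * f_swapped))
    print(f"{name}: max violation {violation:.1e}, "
          f"effective dimension {effective_dimension(K, 0.1):.2f}")
```

The labels are deliberately random. The symmetry of the predictions comes from the kernel alone, not from the data.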
Abstract:
The aim of this study was to contribute to current knowledge-based theory by addressing a research gap in the empirical determination of the simultaneous but distinguishable effects of intellectual capital (IC) assets and knowledge management (KM) practices on organisational performance (OP). The analysis built on past research and on theorised interactions between latent constructs: the IC and KM constructs were specified using survey items measured in a sample of Finnish companies, and the dependent OP construct was determined from information available in financial databases. Two measures widely used and commonly recommended in the management science literature, the return on total assets (ROA) and the return on equity (ROE), were calculated for OP. The hypothesised relationships between IC, KM and OP could thus be tested using objectively derived performance indicators. Using financial OP measures also strengthened the dynamic features of the data needed to analyse simultaneous and causal dependencies between the constructs specified in the structural path models. The parameters of the structural path models were estimated using a partial least squares-based regression estimator. The results showed that the path dependencies between IC and OP, and between KM and OP, were always insignificant when analysed separately from any other interactions or indirect effects arising from simultaneous modelling, regardless of whether OP was measured by ROA or ROE. The dependency between KM and IC appeared very strong and was always significant when modelled simultaneously with other possible interactions between the constructs, with either ROA or ROE used to define OP.
This study, however, did not find statistically unambiguous evidence for the hypothesised causal mediation effects, for instance that the effects of KM practices on OP are mediated by IC assets. Because some indication of fluctuations in the causal effects was found, it was concluded that further studies are needed to verify the fundamental and likely hidden causal effects between the constructs of interest. It was therefore also recommended that complementary modelling and data processing be carried out to elucidate whether mediation effects occur between IC, KM and OP; verifying this requires further investigation of the measured items and can build on the findings of this study.
Abstract:
This thesis concerns the analysis of epidemic models. We adopt the Bayesian paradigm and develop suitable Markov Chain Monte Carlo (MCMC) algorithms, taking the 1995 Ebola outbreak in the Democratic Republic of Congo (formerly Zaïre) as a case study for SEIR epidemic models. We model the Ebola epidemic deterministically using ODEs and stochastically through SDEs to take into account a possible bias in each compartment. To estimate the unknown model parameters, we use several methods, such as least squares, maximum likelihood and MCMC. We favour MCMC over other existing methods because it can tackle complicated nonlinear problems with a large number of parameters. First, for a deterministic Ebola model, we compute the likelihood function from the sum of squared residuals and estimate the parameters using the LSQ and MCMC methods. We sample the parameters and then use them to calculate the basic reproduction number and to study the disease-free equilibrium. Using the chain sampled from the posterior, we check convergence diagnostics and confirm the viability of the model. The results show that the Ebola model fits the observed onset data with high precision and that all unknown model parameters are well identified. Second, we convert the ODE model into an SDE Ebola model. We compute the likelihood function using the extended Kalman filter (EKF) and estimate the parameters again. The motivation for the SDE formulation is to account for the impact of modelling errors. Moreover, the EKF approach allows us to formulate a filtered likelihood for the parameters of such a stochastic model. We use MCMC to obtain the posterior distributions of the parameters of the drift and diffusion parts of the SDE Ebola model. We analyse two cases. (1) The model error covariance matrix of the dynamic noise is close to zero, i.e. only a small amount of stochasticity is added to the model. The results are then similar to those obtained from the deterministic Ebola model, even though the likelihood function is computed differently. (2) The model error covariance matrix differs from zero, i.e. considerable stochasticity is introduced into the Ebola model. This accounts for the situation in which we know that the model is not exact. As a result, we obtain parameter posteriors with larger variances, and the model predictions show larger uncertainties, in accordance with the assumption of an incomplete model.
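The following toy version of the deterministic least-squares step illustrates the approach. It integrates an SEIR model with a fixed-step fourth-order Runge-Kutta scheme. It then estimates the transmission rate β by minimising the sum of squared residuals over a grid and reports R₀ = β/γ, which holds for this basic SEIR model without births or deaths. The synthetic data, the fixed incubation and recovery rates, the grid search and all function names are assumptions for illustration. The thesis instead fits real onset data and samples the posterior with MCMC, where a likelihood built from the same sum of squares (for example, proportional to exp(−SSR/(2σ²))) would be used.

```python
import numpy as np


def seir_rhs(state, beta, kappa, gamma):
    """Right-hand side of the SEIR model written in population fractions."""
    s, e, i, r = state
    new_inf = beta * s * i
    return np.array([-new_inf, new_inf - kappa * e, kappa * e - gamma * i, gamma * i])


def simulate_seir(beta, kappa, gamma, state0, days, steps_per_day=4):
    """Classical RK4 integration; returns the infectious fraction at each day."""
    h = 1.0 / steps_per_day
    state = np.array(state0, dtype=float)
    infectious = [state[2]]
    for _ in range(days):
        for _ in range(steps_per_day):
            k1 = seir_rhs(state, beta, kappa, gamma)
            k2 = seir_rhs(state + 0.5 * h * k1, beta, kappa, gamma)
            k3 = seir_rhs(state + 0.5 * h * k2, beta, kappa, gamma)
            k4 = seir_rhs(state + h * k3, beta, kappa, gamma)
            state = state + h * (k1 + 2 * k2 + 2 * k3 + k4) / 6.0
        infectious.append(state[2])
    return np.array(infectious)


def ssr(beta, data, kappa, gamma, state0):
    """Sum of squared residuals between model and observed infectious fractions."""
    model = simulate_seir(beta, kappa, gamma, state0, len(data) - 1)
    return float(np.sum((model - data) ** 2))


# Synthetic "observations" from known parameters (hypothetical values).
kappa, gamma, beta_true = 1 / 5.3, 1 / 5.6, 0.35
state0 = (1 - 1e-4, 0.0, 1e-4, 0.0)
data = simulate_seir(beta_true, kappa, gamma, state0, days=150)

grid = np.linspace(0.2, 0.5, 61)  # step 0.005
costs = [ssr(b, data, kappa, gamma, state0) for b in grid]
beta_hat = grid[int(np.argmin(costs))]
print(f"beta_hat = {beta_hat:.3f}, R0 = beta/gamma = {beta_hat / gamma:.2f}")
```

With noise-free synthetic data the grid point at the true β has essentially zero cost. With real data, the spread of the cost around its minimum is what an MCMC sampler would turn into posterior uncertainty.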
Abstract:
The purpose of this study was to identify the impact of stressors and offsetting satisfiers, measured here with the Stress Offset Score (SOS), on intentions to quit, and to examine the mediating and moderating effects of three facets of work satisfaction (job satisfaction, pay satisfaction, and satisfaction with supervisor) and two facets of organizational commitment (affective and normative commitment) on this relationship. The sample comprised 2990 employees from 21 public and private organizations. The interaction of each type of work satisfaction and organizational commitment with SOS was tested using Ordinary Least Squares (OLS) procedures, with intentions to quit as the dependent variable. The research questions were: (1) Does SOS predict intentions to quit? (2) Does work satisfaction mediate the predictive relationship of SOS on intentions to quit? (3) Does organizational commitment mediate the predictive relationship of SOS on intentions to quit? (4) Does work satisfaction moderate the predictive relationship of SOS on intentions to quit? and (5) Does organizational commitment moderate the predictive relationship of SOS on intentions to quit? The results indicated that SOS was negatively correlated with intentions to quit. Each of the work satisfaction and organizational commitment variables showed a partially mediated relationship with SOS, and each relationship was highly significant, with normative commitment explaining more of the relationship than the other mediators. The study also tested for interactions, but no statistically significant relationships were found between any of the interaction terms (e.g., SOS × Job Satisfaction and SOS × Affective Commitment) and intentions to quit.
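The moderation tests described in this abstract amount to adding a product term to an OLS regression and checking its coefficient. The following sketch does this with NumPy on simulated data. The variable names, effect sizes and sample are invented for illustration and are not the study's data or results. The predictors are mean-centred before the product is formed, a common but optional choice that makes the main-effect coefficients easier to interpret.

```python
import numpy as np


def ols(X, y):
    """OLS coefficients, standard errors and t statistics (X includes intercept)."""
    n, p = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    sigma2 = resid @ resid / (n - p)  # unbiased residual variance
    cov = sigma2 * np.linalg.inv(X.T @ X)
    se = np.sqrt(np.diag(cov))
    return beta, se, beta / se


rng = np.random.default_rng(3)
n = 500
sos = rng.normal(size=n)      # hypothetical stress offset score
job_sat = rng.normal(size=n)  # hypothetical moderator
# Simulated outcome with main effects only (no true interaction).
intent = 3.0 - 0.6 * sos - 0.4 * job_sat + rng.normal(scale=1.0, size=n)

sos_c = sos - sos.mean()
sat_c = job_sat - job_sat.mean()
X = np.column_stack([np.ones(n), sos_c, sat_c, sos_c * sat_c])
beta, se, t = ols(X, intent)
for name, b, tv in zip(["intercept", "SOS", "JobSat", "SOS x JobSat"], beta, t):
    print(f"{name:>13}: b = {b:+.3f}, t = {tv:+.2f}")
```

A moderation effect would show up as a clearly non-zero t statistic on the `SOS x JobSat` term. In this simulation the interaction is absent by construction.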