905 resultados para partial least-squares regression
Resumo:
Quantile regression (QR) was first introduced by Roger Koenker and Gilbert Bassett in 1978. It is robust to outliers which affect least squares estimator on a large scale in linear regression. Instead of modeling mean of the response, QR provides an alternative way to model the relationship between quantiles of the response and covariates. Therefore, QR can be widely used to solve problems in econometrics, environmental sciences and health sciences. Sample size is an important factor in the planning stage of experimental design and observational studies. In ordinary linear regression, sample size may be determined based on either precision analysis or power analysis with closed form formulas. There are also methods that calculate sample size based on precision analysis for QR like C.Jennen-Steinmetz and S.Wellek (2005). A method to estimate sample size for QR based on power analysis was proposed by Shao and Wang (2009). In this paper, a new method is proposed to calculate sample size based on power analysis under hypothesis test of covariate effects. Even though error distribution assumption is not necessary for QR analysis itself, researchers have to make assumptions of error distribution and covariate structure in the planning stage of a study to obtain a reasonable estimate of sample size. In this project, both parametric and nonparametric methods are provided to estimate error distribution. Since the method proposed can be implemented in R, user is able to choose either parametric distribution or nonparametric kernel density estimation for error distribution. User also needs to specify the covariate structure and effect size to carry out sample size and power calculation. The performance of the method proposed is further evaluated using numerical simulation. The results suggest that the sample sizes obtained from our method provide empirical powers that are closed to the nominal power level, for example, 80%.
Resumo:
One of the most disputable matters in the theory of finance has been the theory of capital structure. The seminal contributions of Modigliani and Miller (1958, 1963) gave rise to a multitude of studies and debates. Since the initial spark, the financial literature has offered two competing theories of financing decision: the trade-off theory and the pecking order theory. The trade-off theory suggests that firms have an optimal capital structure balancing the benefits and costs of debt. The pecking order theory approaches the firm capital structure from information asymmetry perspective and assumes a hierarchy of financing, with firms using first internal funds, followed by debt and as a last resort equity. This thesis analyses the trade-off and pecking order theories and their predictions on a panel data consisting 78 Finnish firms listed on the OMX Helsinki stock exchange. Estimations are performed for the period 2003–2012. The data is collected from Datastream system and consists of financial statement data. A number of capital structure characteristics are identified: firm size, profitability, firm growth opportunities, risk, asset tangibility and taxes, speed of adjustment and financial deficit. A regression analysis is used to examine the effects of the firm characteristics on capitals structure. The regression models were formed based on the relevant theories. The general capital structure model is estimated with fixed effects estimator. Additionally, dynamic models play an important role in several areas of corporate finance, but with the combination of fixed effects and lagged dependent variables the model estimation is more complicated. A dynamic partial adjustment model is estimated using Arellano and Bond (1991) first-differencing generalized method of moments, the ordinary least squares and fixed effects estimators. The results for Finnish listed firms show support for the predictions of profitability, firm size and non-debt tax shields. However, no conclusive support for the pecking-order theory is found. However, the effect of pecking order cannot be fully ignored and it is concluded that instead of being substitutes the trade-off and pecking order theory appear to complement each other. For the partial adjustment model the results show that Finnish listed firms adjust towards their target capital structure with a speed of 29% a year using book debt ratio.
Resumo:
A miniaturised gas analyser is described and evaluated based on the use of a substrate-integrated hollow waveguide (iHWG) coupled to a microsized near-infrared spectrophotometer comprising a linear variable filter and an array of InGaAs detectors. This gas sensing system was applied to analyse surrogate samples of natural fuel gas containing methane, ethane, propane and butane, quantified by using multivariate regression models based on partial least square (PLS) algorithms and Savitzky-Golay 1(st) derivative data preprocessing. The external validation of the obtained models reveals root mean square errors of prediction of 0.37, 0.36, 0.67 and 0.37% (v/v), for methane, ethane, propane and butane, respectively. The developed sensing system provides particularly rapid response times upon composition changes of the gaseous sample (approximately 2 s) due the minute volume of the iHWG-based measurement cell. The sensing system developed in this study is fully portable with a hand-held sized analyser footprint, and thus ideally suited for field analysis. Last but not least, the obtained results corroborate the potential of NIR-iHWG analysers for monitoring the quality of natural gas and petrochemical gaseous products.
Resumo:
This article analyzes the Brazilian political system from the local perspective. Following Cox (1997), we review the problems with electoral coordination that emerge from a given institutional framework. Due to the characteristics of the Brazilian Federal system and its electoral rules, linkage between the three levels of government is not guaranteed a priori, but demands a coordinating effort by the parties' leadership. According to our hypothesis, the parties are capable of coordinating their election strategies at different levels in the party system. Regression models based on two-stage least squares (2SLS) and TOBIT, analyzing a panel of Brazilian municipalities with data from the 1994 and 2000 elections, show that the proportion of votes received by a party in a given election correlates closely with its previous votes in majoritarian elections. Despite institutional incentives, the Brazilian party system shows evidence that it is organized nationally to the extent that it links the competition for votes at the three levels of government (National, State, and Municipal).
Resumo:
The majority of past and current individual-tree growth modelling methodologies have failed to characterise and incorporate structured stochastic components. Rather, they have relied on deterministic predictions or have added an unstructured random component to predictions. In particular, spatial stochastic structure has been neglected, despite being present in most applications of individual-tree growth models. Spatial stochastic structure (also called spatial dependence or spatial autocorrelation) eventuates when spatial influences such as competition and micro-site effects are not fully captured in models. Temporal stochastic structure (also called temporal dependence or temporal autocorrelation) eventuates when a sequence of measurements is taken on an individual-tree over time, and variables explaining temporal variation in these measurements are not included in the model. Nested stochastic structure eventuates when measurements are combined across sampling units and differences among the sampling units are not fully captured in the model. This review examines spatial, temporal, and nested stochastic structure and instances where each has been characterised in the forest biometry and statistical literature. Methodologies for incorporating stochastic structure in growth model estimation and prediction are described. Benefits from incorporation of stochastic structure include valid statistical inference, improved estimation efficiency, and more realistic and theoretically sound predictions. It is proposed in this review that individual-tree modelling methodologies need to characterise and include structured stochasticity. Possibilities for future research are discussed. (C) 2001 Elsevier Science B.V. All rights reserved.
Resumo:
This article examines the efficiency of the National Football League (NFL) betting market. The standard ordinary least squares (OLS) regression methodology is replaced by a probit model. This circumvents potential econometric problems, and allows us to implement more sophisticated betting strategies where bets are placed only when there is a relatively high probability of success. In-sample tests indicate that probit-based betting strategies generate statistically significant profits. Whereas the profitability of a number of these betting strategies is confirmed by out-of-sample testing, there is some inconsistency among the remaining out-of-sample predictions. Our results also suggest that widely documented inefficiencies in this market tend to dissipate over time.
Resumo:
This paper examines the trade relationship between the Gulf Cooperation Council (GCC) and the European Union (EU). A simultaneous equation regression model is developed and estimated to assist with the analysis. The regression results, using both the two stage least squares (2SLS) and ordinary least squares (OLS) estimation methods, reveal the existence of feedback effects between the two economic integrations. The results also show that during times of slack in oil prices, the GCC income from its investments overseas helped to finance its imports from the EU.
Resumo:
Environmental pollution continues to be an emerging study field, as there are thousands of anthropogenic compounds mixed in the environment whose possible mechanisms of toxicity and physiological outcomes are of great concern. Developing methods to access and prioritize the screening of these compounds at trace levels in order to support regulatory efforts is, therefore, very important. A methodology based on solid phase extraction followed by derivatization and gas chromatography-mass spectrometry analysis was developed for the assessment of four endocrine disrupting compounds (EDCs) in water matrices: bisphenol A, estrone, 17b-estradiol and 17a-ethinylestradiol. The study was performed, simultaneously, by two different laboratories in order to evaluate the robustness of the method and to increase the quality control over its application in routine analysis. Validation was done according to the International Conference on Harmonisation recommendations and other international guidelines with specifications for the GC-MS methodology. Matrix-induced chromatographic response enhancement was avoided by using matrix-standard calibration solutions and heteroscedasticity has been overtaken by a weighted least squares linear regression model application. Consistent evaluation of key analytical parameters such as extraction efficiency, sensitivity, specificity, linearity, limits of detection and quantification, precision, accuracy and robustness was done in accordance with standards established for acceptance. Finally, the application of the optimized method in the assessment of the selected analytes in environmental samples suggested that it is an expedite methodology for routine analysis of EDC residues in water matrices.
Resumo:
BACKGROUNDWhile the pharmaceutical industry keeps an eye on plasmid DNA production for new generation gene therapies, real-time monitoring techniques for plasmid bioproduction are as yet unavailable. This work shows the possibility of in situ monitoring of plasmid production in Escherichia coli cultures using a near infrared (NIR) fiber optic probe. RESULTSPartial least squares (PLS) regression models based on the NIR spectra were developed for predicting bioprocess critical variables such as the concentrations of biomass, plasmid, carbon sources (glucose and glycerol) and acetate. In order to achieve robust models able to predict the performance of plasmid production processes, independently of the composition of the cultivation medium, cultivation strategy (batch versus fed-batch) and E. coli strain used, three strategies were adopted, using: (i) E. coliDH5 cultures conducted under different media compositions and culture strategies (batch and fed-batch); (ii) engineered E. coli strains, MG1655endArecApgi and MG1655endArecA, grown on the same medium and culture strategy; (iii) diverse E. coli strains, over batch and fed-batch cultivations and using different media compositions. PLS models showed high accuracy for predicting all variables in the three groups of cultures. CONCLUSIONNIR spectroscopy combined with PLS modeling provides a fast, inexpensive and contamination-free technique to accurately monitoring plasmid bioprocesses in real time, independently of the medium composition, cultivation strategy and the E. coli strain used.
Resumo:
Human mesenchymal stem/stromal cells (MSCs) have received considerable attention in the field of cell-based therapies due to their high differentiation potential and ability to modulate immune responses. However, since these cells can only be isolated in very low quantities, successful realization of these therapies requires MSCs ex-vivo expansion to achieve relevant cell doses. The metabolic activity is one of the parameters often monitored during MSCs cultivation by using expensive multi-analytical methods, some of them time-consuming. The present work evaluates the use of mid-infrared (MIR) spectroscopy, through rapid and economic high-throughput analyses associated to multivariate data analysis, to monitor three different MSCs cultivation runs conducted in spinner flasks, under xeno-free culture conditions, which differ in the type of microcarriers used and the culture feeding strategy applied. After evaluating diverse spectral preprocessing techniques, the optimized partial least square (PLS) regression models based on the MIR spectra to estimate the glucose, lactate and ammonia concentrations yielded high coefficients of determination (R2 ≥ 0.98, ≥0.98, and ≥0.94, respectively) and low prediction errors (RMSECV ≤ 4.7%, ≤4.4% and ≤5.7%, respectively). Besides PLS models valid for specific expansion protocols, a robust model simultaneously valid for the three processes was also built for predicting glucose, lactate and ammonia, yielding a R2 of 0.95, 0.97 and 0.86, and a RMSECV of 0.33, 0.57, and 0.09 mM, respectively. Therefore, MIR spectroscopy combined with multivariate data analysis represents a promising tool for both optimization and control of MSCs expansion processes.
Resumo:
Hyperspectral remote sensing exploits the electromagnetic scattering patterns of the different materials at specific wavelengths [2, 3]. Hyperspectral sensors have been developed to sample the scattered portion of the electromagnetic spectrum extending from the visible region through the near-infrared and mid-infrared, in hundreds of narrow contiguous bands [4, 5]. The number and variety of potential civilian and military applications of hyperspectral remote sensing is enormous [6, 7]. Very often, the resolution cell corresponding to a single pixel in an image contains several substances (endmembers) [4]. In this situation, the scattered energy is a mixing of the endmember spectra. A challenging task underlying many hyperspectral imagery applications is then decomposing a mixed pixel into a collection of reflectance spectra, called endmember signatures, and the corresponding abundance fractions [8–10]. Depending on the mixing scales at each pixel, the observed mixture is either linear or nonlinear [11, 12]. Linear mixing model holds approximately when the mixing scale is macroscopic [13] and there is negligible interaction among distinct endmembers [3, 14]. If, however, the mixing scale is microscopic (or intimate mixtures) [15, 16] and the incident solar radiation is scattered by the scene through multiple bounces involving several endmembers [17], the linear model is no longer accurate. Linear spectral unmixing has been intensively researched in the last years [9, 10, 12, 18–21]. It considers that a mixed pixel is a linear combination of endmember signatures weighted by the correspondent abundance fractions. Under this model, and assuming that the number of substances and their reflectance spectra are known, hyperspectral unmixing is a linear problem for which many solutions have been proposed (e.g., maximum likelihood estimation [8], spectral signature matching [22], spectral angle mapper [23], subspace projection methods [24,25], and constrained least squares [26]). In most cases, the number of substances and their reflectances are not known and, then, hyperspectral unmixing falls into the class of blind source separation problems [27]. Independent component analysis (ICA) has recently been proposed as a tool to blindly unmix hyperspectral data [28–31]. ICA is based on the assumption of mutually independent sources (abundance fractions), which is not the case of hyperspectral data, since the sum of abundance fractions is constant, implying statistical dependence among them. This dependence compromises ICA applicability to hyperspectral images as shown in Refs. [21, 32]. In fact, ICA finds the endmember signatures by multiplying the spectral vectors with an unmixing matrix, which minimizes the mutual information among sources. If sources are independent, ICA provides the correct unmixing, since the minimum of the mutual information is obtained only when sources are independent. This is no longer true for dependent abundance fractions. Nevertheless, some endmembers may be approximately unmixed. These aspects are addressed in Ref. [33]. Under the linear mixing model, the observations from a scene are in a simplex whose vertices correspond to the endmembers. Several approaches [34–36] have exploited this geometric feature of hyperspectral mixtures [35]. Minimum volume transform (MVT) algorithm [36] determines the simplex of minimum volume containing the data. The method presented in Ref. [37] is also of MVT type but, by introducing the notion of bundles, it takes into account the endmember variability usually present in hyperspectral mixtures. The MVT type approaches are complex from the computational point of view. Usually, these algorithms find in the first place the convex hull defined by the observed data and then fit a minimum volume simplex to it. For example, the gift wrapping algorithm [38] computes the convex hull of n data points in a d-dimensional space with a computational complexity of O(nbd=2cþ1), where bxc is the highest integer lower or equal than x and n is the number of samples. The complexity of the method presented in Ref. [37] is even higher, since the temperature of the simulated annealing algorithm used shall follow a log( ) law [39] to assure convergence (in probability) to the desired solution. Aiming at a lower computational complexity, some algorithms such as the pixel purity index (PPI) [35] and the N-FINDR [40] still find the minimum volume simplex containing the data cloud, but they assume the presence of at least one pure pixel of each endmember in the data. This is a strong requisite that may not hold in some data sets. In any case, these algorithms find the set of most pure pixels in the data. PPI algorithm uses the minimum noise fraction (MNF) [41] as a preprocessing step to reduce dimensionality and to improve the signal-to-noise ratio (SNR). The algorithm then projects every spectral vector onto skewers (large number of random vectors) [35, 42,43]. The points corresponding to extremes, for each skewer direction, are stored. A cumulative account records the number of times each pixel (i.e., a given spectral vector) is found to be an extreme. The pixels with the highest scores are the purest ones. N-FINDR algorithm [40] is based on the fact that in p spectral dimensions, the p-volume defined by a simplex formed by the purest pixels is larger than any other volume defined by any other combination of pixels. This algorithm finds the set of pixels defining the largest volume by inflating a simplex inside the data. ORA SIS [44, 45] is a hyperspectral framework developed by the U.S. Naval Research Laboratory consisting of several algorithms organized in six modules: exemplar selector, adaptative learner, demixer, knowledge base or spectral library, and spatial postrocessor. The first step consists in flat-fielding the spectra. Next, the exemplar selection module is used to select spectral vectors that best represent the smaller convex cone containing the data. The other pixels are rejected when the spectral angle distance (SAD) is less than a given thresh old. The procedure finds the basis for a subspace of a lower dimension using a modified Gram–Schmidt orthogonalizati on. The selected vectors are then projected onto this subspace and a simplex is found by an MV T pro cess. ORA SIS is oriented to real-time target detection from uncrewed air vehicles using hyperspectral data [46]. In this chapter we develop a new algorithm to unmix linear mixtures of endmember spectra. First, the algorithm determines the number of endmembers and the signal subspace using a newly developed concept [47, 48]. Second, the algorithm extracts the most pure pixels present in the data. Unlike other methods, this algorithm is completely automatic and unsupervised. To estimate the number of endmembers and the signal subspace in hyperspectral linear mixtures, the proposed scheme begins by estimating sign al and noise correlation matrices. The latter is based on multiple regression theory. The signal subspace is then identified by selectin g the set of signal eigenvalue s that best represents the data, in the least-square sense [48,49 ], we note, however, that VCA works with projected and with unprojected data. The extraction of the end members exploits two facts: (1) the endmembers are the vertices of a simplex and (2) the affine transformation of a simplex is also a simplex. As PPI and N-FIND R algorithms, VCA also assumes the presence of pure pixels in the data. The algorithm iteratively projects data on to a direction orthogonal to the subspace spanned by the endmembers already determined. The new end member signature corresponds to the extreme of the projection. The algorithm iterates until all end members are exhausted. VCA performs much better than PPI and better than or comparable to N-FI NDR; yet it has a computational complexity between on e and two orders of magnitude lower than N-FINDR. The chapter is structure d as follows. Section 19.2 describes the fundamentals of the proposed method. Section 19.3 and Section 19.4 evaluate the proposed algorithm using simulated and real data, respectively. Section 19.5 presents some concluding remarks.
Resumo:
Submitted in partial fulfillment for the Requirements for the Degree of PhD in Mathematics, in the Speciality of Statistics in the Faculdade de Ciências e Tecnologia
Resumo:
Durante as últimas décadas observou-se o crescimento da importância das avaliações fornecidas pelas agências de rating, sendo este um fator decisivo na tomada de decisão dos investidores. Também os emitentes de dívida são largamente afetados pelas alterações das classificações atribuídas por estas agências. Esta investigação pretende, por um lado, compreender se estas agências têm poder para conseguirem influenciar a evolução da dívida pública e qual o seu papel no mercado financeiro. Por outro, pretende compreender quais os fatores determinantes da dívida pública portuguesa, bem como a realização de uma análise por percentis com o objetivo de lhe atribuir um rating. Para a análise dos fatores que poderão influenciar a dívida pública, a metodologia utilizada é uma regressão linear múltipla estimada através do Método dos Mínimos Quadrados (Ordinary Least Squares – OLS), em que num cenário inicial era composta por onze variáveis independentes, sendo a dívida pública a variável dependente, para um período compreendido entre 1996 e 2013. Foram realizados vários testes ao modelo inicial, com o objetivo de encontrar um modelo que fosse o mais explicativo possível. Conseguimos ainda identificar uma relação inversa entre o rating atribuído por estas agências e a evolução da dívida pública, no sentido em que para períodos em que o rating desce, o crescimento da dívida é mais acentuado. Não nos foi, no entanto, possível atribuir um rating à dívida pública através de uma análise de percentis.
Resumo:
Nowadays, reducing energy consumption is one of the highest priorities and biggest challenges faced worldwide and in particular in the industrial sector. Given the increasing trend of consumption and the current economical crisis, identifying cost reductions on the most energy-intensive sectors has become one of the main concerns among companies and researchers. Particularly in industrial environments, energy consumption is affected by several factors, namely production factors(e.g. equipments), human (e.g. operators experience), environmental (e.g. temperature), among others, which influence the way of how energy is used across the plant. Therefore, several approaches for identifying consumption causes have been suggested and discussed. However, the existing methods only provide guidelines for energy consumption and have shown difficulties in explaining certain energy consumption patterns due to the lack of structure to incorporate context influence, hence are not able to track down the causes of consumption to a process level, where optimization measures can actually take place. This dissertation proposes a new approach to tackle this issue, by on-line estimation of context-based energy consumption models, which are able to map operating context to consumption patterns. Context identification is performed by regression tree algorithms. Energy consumption estimation is achieved by means of a multi-model architecture using multiple RLS algorithms, locally estimated for each operating context. Lastly, the proposed approach is applied to a real cement plant grinding circuit. Experimental results prove the viability of the overall system, regarding both automatic context identification and energy consumption estimation.
Resumo:
Kernel-Functions, Machine Learning, Least Squares, Speech Recognition, Classification, Regression