918 results for Heuristic constrained linear least squares
Abstract:
Submitted in partial fulfillment of the requirements for the degree of PhD in Mathematics, in the speciality of Statistics, at the Faculdade de Ciências e Tecnologia.
Abstract:
Over the last few decades, the ratings issued by credit rating agencies have grown in importance, becoming a decisive factor in investors' decision-making. Debt issuers are also strongly affected by changes in the classifications assigned by these agencies. This research aims, on the one hand, to understand whether these agencies have the power to influence the evolution of public debt and what their role is in the financial market. On the other hand, it seeks to identify the determinants of Portuguese public debt and to carry out a percentile analysis with the goal of assigning it a rating. To analyse the factors that may influence public debt, the methodology used is a multiple linear regression estimated by Ordinary Least Squares (OLS), initially comprising eleven independent variables, with public debt as the dependent variable, for the period between 1996 and 2013. Several tests were performed on the initial model in order to arrive at a model with the greatest possible explanatory power. We were also able to identify an inverse relationship between the rating assigned by these agencies and the evolution of public debt, in the sense that in periods when the rating falls, debt growth is steeper. It was not, however, possible to assign a rating to public debt through a percentile analysis.
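The OLS regression described above can be sketched in a few lines. This is a minimal illustration on synthetic data; the dimensions and coefficients are invented stand-ins, not the study's public-debt series:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in: 18 yearly observations (1996-2013), 3 regressors
n, k = 18, 3
X = rng.normal(size=(n, k))
beta_true = np.array([2.0, -1.0, 0.5])
y = 1.0 + X @ beta_true + rng.normal(scale=0.1, size=n)

# OLS: prepend an intercept column and solve the least-squares problem
Xd = np.column_stack([np.ones(n), X])
beta_hat, *_ = np.linalg.lstsq(Xd, y, rcond=None)

print(beta_hat.round(2))  # close to [1.0, 2.0, -1.0, 0.5]
```

In practice a study like this would also run the usual specification and residual diagnostics on the fitted model before interpreting the coefficients.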
Abstract:
In this work, kriging with covariates is used to model and map the spatial distribution of salinity measurements gathered by an autonomous underwater vehicle in a sea outfall monitoring campaign, with the aim of distinguishing the effluent plume from the receiving waters and characterizing its spatial variability in the vicinity of the discharge. Four different geostatistical linear models for salinity were assumed, with the distance to the diffuser, the west-east positioning, and the south-north positioning used as covariates. Sample variograms were fitted with Matérn models using both weighted least squares and maximum likelihood estimation, as a way to detect eventual discrepancies. The maximum likelihood method typically estimated very low ranges, which limited the kriging process. So, at least for these data sets, weighted least squares proved to be the more appropriate estimation method for variogram fitting. The kriged maps clearly show the spatial variation of salinity, and it is possible to identify the effluent plume in the area studied. The results provide some guidelines for sewage monitoring when a geostatistical analysis of the data is intended: it is important to handle anomalous values properly and to adopt a sampling strategy that includes transects parallel and perpendicular to the effluent dispersion.
Abstract:
The aim of this study is the development and validation of spectroscopic methods (NIR spectroscopy) that may replace the conventional chemical methods for quantifying hydroxyl groups in alkyd resins. The alkyd resins studied in this work are typically used in two-component coating systems, in which their hydroxyl groups react with isocyanate pre-polymers to form high-hardness coatings. For this reason, and for process-related questions tied to the stoichiometry of the reaction in that application, quantifying these groups is extremely important. The most common method for quantifying hydroxyl groups is known as the titration method. It is time-consuming, since each measurement involves an experimental procedure of about two hours, and it is also very expensive. The influences of temperature, heterogeneity, and cell filling level on spectrum acquisition were studied. The conclusions of these studies led to fixing an ideal residence time for the cell inside the spectrophotometer chamber before the spectrum is measured. In addition, it was concluded that, for standard batches, heterogeneity is not a significant variable. The cell filling level must be kept constant. The methods developed, based on the quality standard ISO 15063:2011, were built from Partial Least Squares Regression (PLS) algorithms, using a Büchi NIRVIS instrument. Good linear regression coefficients were obtained for Resin A (R² > 0.9). The remaining results indicate the possibility of application to resins of the same type. This method delivers results 8 times faster, with material costs amounting to 1% of those of the standard method.
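A one-component PLS regression of the kind applied here to NIR spectra can be sketched in the NIPALS style. The simulated "spectra", dimensions and noise levels below are illustrative assumptions, not the study's data:

```python
import numpy as np

rng = np.random.default_rng(5)
n, p = 40, 50                  # 40 samples, 50 hypothetical "wavelengths"

# One latent factor drives both the spectra and the response
t_true = rng.normal(size=n)
loadings = rng.normal(size=p)
X = np.outer(t_true, loadings) + 0.01 * rng.normal(size=(n, p))
y = 2.0 * t_true + 0.05 * rng.normal(size=n)

# One-component PLS (NIPALS style): weight vector from the X'y covariance
Xc, yc = X - X.mean(axis=0), y - y.mean()
w = Xc.T @ yc
w /= np.linalg.norm(w)
t = Xc @ w                     # scores
b = (t @ yc) / (t @ t)         # inner regression coefficient

pred = y.mean() + t * b
r2 = np.corrcoef(pred, y)[0, 1] ** 2
print(round(r2, 3))            # close to 1 for this one-factor simulation
```

A calibration on real spectra would extract several components by deflating X and y and choose their number by cross-validation.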
Abstract:
This study addresses the issue of the presence of a unit root in growth rate estimation by the least-squares approach. We argue that when the log of a variable contains a unit root, i.e., it is not stationary, then the growth rate estimate from the log-linear trend model is not a valid representation of the actual growth of the series. In fact, under such a situation, we show that the growth of the series is the cumulative impact of a stochastic process. As such, the growth estimate from such a model is just a spurious representation of the actual growth of the series, which we refer to as a "pseudo growth rate". Hence such an estimate should be interpreted with caution. On the other hand, we highlight that the statistical representation of a series as containing a unit root is not easy to separate from an alternative description which represents the series as fundamentally deterministic (no unit root) but containing a structural break. In search of a way around this, our study presents a survey of both the theoretical and empirical literature on unit root tests that take possible structural breaks into account. We show that when a series is trend-stationary with breaks, it is possible to use the log-linear trend model to obtain well-defined estimates of growth rates for sub-periods, which are valid representations of the actual growth of the series. Finally, to highlight the above issues, we carry out an empirical application in which we estimate meaningful growth rates of real wages per worker for 51 industries from the organised manufacturing sector in India for the period 1973-2003, which are not only unbiased but also asymptotically efficient. We use these growth rate estimates to highlight the evolving inter-industry wage structure in India.
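The log-linear trend estimator discussed above can be illustrated on a simulated trend-stationary series, the case in which the least-squares slope of the log series is a valid growth-rate estimate (with a unit root the same slope would only be a "pseudo growth rate"). All numbers are invented:

```python
import numpy as np

rng = np.random.default_rng(1)

# Trend-stationary series: log y_t = a + g*t + noise, with g = 3% per period
t = np.arange(31)
g_true = 0.03
log_y = 0.5 + g_true * t + rng.normal(scale=0.02, size=t.size)

# Growth rate = least-squares slope of the log-linear trend model
A = np.column_stack([np.ones(t.size), t])
(a_hat, g_hat), *_ = np.linalg.lstsq(A, log_y, rcond=None)

print(round(g_hat, 3))  # close to 0.03
```

On a series with a unit root the same regression would still return a slope, which is exactly why the paper recommends testing for unit roots (allowing for breaks) before interpreting it as growth.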
Abstract:
There are far-reaching conceptual similarities between bi-static surface georadar and post-stack, "zero-offset" seismic reflection data, which are expressed in largely identical processing flows. One important difference is, however, that standard deconvolution algorithms routinely used to enhance the vertical resolution of seismic data are notoriously problematic or even detrimental to the overall signal quality when applied to surface georadar data. We have explored various options for alleviating this problem and have tested them on a geologically well-constrained surface georadar dataset. Standard stochastic and direct deterministic deconvolution approaches proved to be largely unsatisfactory. While least-squares-type deterministic deconvolution showed some promise, the inherent uncertainties involved in estimating the source wavelet introduced some artificial "ringiness". In contrast, we found spectral balancing approaches to be effective, practical and robust means for enhancing the vertical resolution of surface georadar data, particularly, but not exclusively, in the uppermost part of the georadar section, which is notoriously plagued by the interference of the direct air- and groundwaves. For the data considered in this study, it can be argued that band-limited spectral blueing may provide somewhat better results than standard band-limited spectral whitening, particularly in the uppermost part of the section affected by the interference of the air- and groundwaves. Interestingly, this finding is consistent with the fact that the amplitude spectrum resulting from least-squares-type deterministic deconvolution is characterized by a systematic enhancement of higher frequencies at the expense of lower frequencies and hence is blue rather than white. It is also consistent with increasing evidence that spectral "blueness" is a seemingly universal, albeit enigmatic, property of the distribution of reflection coefficients in the Earth.
Our results therefore indicate that spectral balancing techniques in general and spectral blueing in particular represent simple, yet effective means of enhancing the vertical resolution of surface georadar data and, in many cases, could turn out to be a preferable alternative to standard deconvolution approaches.
Abstract:
Inconsistencies about dynamic asymmetry between the on- and off-transient responses in VO2 are found in the literature. Therefore the purpose of this study was to examine VO2 on- and off-transients during moderate- and heavy-intensity cycling exercise in trained subjects. Ten men underwent an initial incremental test for the estimation of ventilatory threshold (VT) and, on different days, two bouts of square-wave exercise at moderate (<VT) and heavy (>VT) intensities. VO2 kinetics in exercise and recovery were better described by a single exponential model (<VT), or by a double exponential with two time delays (>VT). For moderate exercise, we found a symmetry of VO2 kinetics between the on- and off-transients (i.e., fundamental component), consistent with a system manifesting linear control dynamics. For heavy exercise, a slow component superimposed on the fundamental phase was expressed in both the exercise and recovery, with similar parameter estimates. But the on-transient values of the time constant were appreciably faster than the associated off-transient, and independent of the work rate imposed (<VT and >VT). Our results do not support a dynamically linear system model of VO2 during cycling exercise in the heavy-intensity domain.
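The single-exponential model used above for the moderate-intensity transients can be sketched as follows. The amplitude and time constant are illustrative values, the data are noiseless, and the fit uses a simple log-linearisation rather than the nonlinear estimation typically used in such studies:

```python
import numpy as np

# Mono-exponential on-transient (moderate domain): VO2(t) = A*(1 - exp(-t/tau))
A, tau = 1.5, 30.0                 # hypothetical amplitude (L/min) and tau (s)
t = np.arange(0.0, 180.0, 1.0)
vo2 = A * (1.0 - np.exp(-t / tau))

# With the amplitude known, tau follows from the linearised form
# log(A - VO2) = log(A) - t/tau, a straight line in t
mask = (A - vo2) > 1e-9            # avoid log(0) at the plateau
slope = np.polyfit(t[mask], np.log(A - vo2[mask]), 1)[0]
tau_hat = -1.0 / slope

print(round(tau_hat, 1))  # recovers 30.0 on this noiseless data
```

The heavy-intensity case in the abstract adds a second, delayed exponential (the slow component), which requires nonlinear least squares rather than this linearisation.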
Abstract:
Several methods have been suggested for estimating non-linear models with interaction terms in the presence of measurement error. Structural equation models eliminate measurement error bias, but require large samples. Ordinary least squares regression on summated scales, regression on factor scores, and partial least squares are appropriate for small samples but do not correct measurement error bias. Two-stage least squares regression does correct measurement error bias, but the results depend strongly on the choice of instrumental variables. This article discusses the old disattenuated regression method as an alternative for correcting measurement error in small samples. The method is extended to the case of interaction terms and is illustrated on a model that examines the interaction effect of innovation and style of budget use on business performance. Alternative reliability estimates that can be used to disattenuate the estimates are discussed, and a comparison is made with the alternative methods. Methods that do not correct for measurement error bias perform very similarly to one another, and considerably worse than disattenuated regression.
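The disattenuation idea discussed above, dividing the OLS slope by the predictor's reliability, can be sketched on simulated data (all values are illustrative, and the reliability is taken as known by construction rather than estimated):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 100_000
beta = 0.9

# True score xi is observed with measurement error: x = xi + e
xi = rng.normal(size=n)
err_var = 0.25
x = xi + rng.normal(scale=err_var**0.5, size=n)
y = beta * xi + rng.normal(scale=0.5, size=n)

# The OLS slope is attenuated by the reliability rxx = var(xi)/var(x)
b_ols = np.cov(x, y)[0, 1] / np.var(x, ddof=1)
rxx = 1.0 / (1.0 + err_var)        # known by construction: 0.8
b_corrected = b_ols / rxx          # disattenuated estimate

print(round(b_ols, 2), round(b_corrected, 2))  # ≈ 0.72 and ≈ 0.9
```

In applications rxx must come from a reliability estimate (e.g. Cronbach's alpha), which is exactly the choice the article discusses.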
Abstract:
BACKGROUND Functional brain images such as Single-Photon Emission Computed Tomography (SPECT) and Positron Emission Tomography (PET) have been widely used to guide clinicians in Alzheimer's Disease (AD) diagnosis. However, the subjectivity involved in their evaluation has favoured the development of Computer Aided Diagnosis (CAD) systems. METHODS A novel combination of feature extraction techniques is proposed to improve the diagnosis of AD. Firstly, Regions of Interest (ROIs) are selected by means of a t-test carried out on 3D Normalised Mean Square Error (NMSE) features restricted to lie within a predefined brain activation mask. In order to address the small-sample-size problem, the dimension of the feature space was further reduced by: Large Margin Nearest Neighbours using a rectangular matrix (LMNN-RECT), Principal Component Analysis (PCA) or Partial Least Squares (PLS) (the latter two also analysed with an LMNN transformation). Regarding the classifiers, kernel Support Vector Machines (SVMs) and LMNN using Euclidean, Mahalanobis and Energy-based metrics were compared. RESULTS Several experiments were conducted in order to evaluate the proposed LMNN-based feature extraction algorithms and their benefits as: i) a linear transformation of the PLS- or PCA-reduced data, ii) a feature reduction technique, and iii) a classifier (with Euclidean, Mahalanobis or Energy-based metrics). The system was evaluated by means of k-fold cross-validation, yielding accuracy, sensitivity and specificity values of 92.78%, 91.07% and 95.12% (for SPECT) and 90.67%, 88% and 93.33% (for PET), respectively, when the NMSE-PLS-LMNN feature extraction method was used in combination with an SVM classifier, thus outperforming recently reported baseline methods. CONCLUSIONS All the proposed methods turned out to be valid solutions for the presented problem. One of the advances is the robustness of the LMNN algorithm, which not only provides a higher separation rate between the classes but also (in combination with NMSE and PLS) makes the variation of this rate more stable. Their generalization ability is a further advance, since the experiments were performed on two image modalities (SPECT and PET).
Abstract:
Customer satisfaction and retention are key issues for organizations in today's competitive marketplace. As such, much research and revenue has been invested in developing accurate ways of assessing consumer satisfaction at both the macro (national) and micro (organizational) level, facilitating comparisons in performance both within and between industries. Since the inception of the national customer satisfaction indices (CSI), partial least squares (PLS) has been used to estimate the CSI models in preference to structural equation models (SEM) because it does not rely on strict assumptions about the data. However, this choice was based on some misconceptions about the use of SEMs and does not take into consideration more recent advances in SEM, including estimation methods that are robust to non-normality and missing data. In this paper, the SEM and PLS approaches are compared by evaluating perceptions of the Isle of Man Post Office products and customer service using a CSI format. The new robust SEM procedures were found to be advantageous over PLS. Product quality was found to be the only driver of customer satisfaction, while image and satisfaction were the only predictors of loyalty, thus arguing for the specificity of postal services.
Abstract:
The analysis of multiexponential decays is challenging because of their complex nature. When analyzing these signals, not only the parameters but also the orders of the models have to be estimated. We present an improved spectroscopic technique especially suited for this purpose. The proposed algorithm combines an iterative linear filter with an iterative deconvolution method. A thorough analysis of the noise effect is presented. The performance is tested with synthetic and experimental data.
Abstract:
Distance-based regression is a prediction method consisting of two steps: from the distances between observations we obtain latent variables, which then become the regressors in an ordinary least squares linear model. The distances are computed from the original predictors using a suitable dissimilarity function. Since, in general, the regressors are non-linearly related to the response, they cannot be selected with the usual F test. In this work we propose a solution to this predictor selection problem by defining generalized test statistics and adapting a non-parametric bootstrap method to estimate their p-values. A numerical example with automobile insurance data is included.
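The two-step procedure just described can be sketched with Euclidean distances and classical scaling. This is a minimal illustration on synthetic data; general dissimilarities and the bootstrap selection tests are omitted:

```python
import numpy as np

rng = np.random.default_rng(3)
n, p = 60, 2
Z = rng.normal(size=(n, p))                    # original predictors
y = 1.0 + Z @ np.array([2.0, -1.0]) + rng.normal(scale=0.1, size=n)

# Step 1: squared Euclidean distances between observations (any suitable
# dissimilarity, e.g. Gower for mixed data, could be used instead)
D2 = ((Z[:, None, :] - Z[None, :, :]) ** 2).sum(axis=-1)

# Step 2: classical scaling -- double-centre -D2/2 and take the leading
# eigenvectors as latent variables
J = np.eye(n) - np.ones((n, n)) / n
G = -0.5 * J @ D2 @ J
vals, vecs = np.linalg.eigh(G)
top = np.argsort(vals)[::-1][:p]
latent = vecs[:, top] * np.sqrt(vals[top])

# Step 3: ordinary least squares of the response on the latent variables
A = np.column_stack([np.ones(n), latent])
coef, *_ = np.linalg.lstsq(A, y, rcond=None)
r2 = np.corrcoef(A @ coef, y)[0, 1] ** 2
print(round(r2, 3))  # close to 1: the latent variables recover Z here
```

With Euclidean distances the latent variables reproduce the original predictors up to rotation, so OLS on them matches ordinary regression; the method's value comes from non-Euclidean dissimilarities, where the relation to the response is non-linear.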
Abstract:
Objective: Health status measures usually have an asymmetric distribution and present a high percentage of respondents with the best possible score (ceiling effect), especially when they are assessed in the general population. Different methods that take the ceiling effect into account have been proposed to model this type of variable: tobit models, Censored Least Absolute Deviations (CLAD) models, or two-part models, among others. The objective of this work was to describe the tobit model and compare it with the Ordinary Least Squares (OLS) model, which ignores the ceiling effect. Methods: Two different data sets were used to compare both models: a) real data from the European Study of Mental Disorders (ESEMeD), in order to model the EQ-5D index, one of the utility measures most commonly used for the evaluation of health status; and b) data obtained from simulation. Cross-validation was used to compare the predicted values of the tobit and OLS models. The following estimators were compared: the percentage of absolute error (R1), the percentage of squared error (R2), the Mean Squared Error (MSE) and the Mean Absolute Prediction Error (MAPE). Different data sets were created for different values of the error variance and different percentages of individuals with ceiling effect. The coefficient estimates, the percentage of explained variance and the plots of residuals versus predicted values obtained under each model were compared. Results: With regard to the results of the ESEMeD study, the predicted values obtained with the OLS model and those obtained with the tobit model were very similar. The regression coefficients of the linear model were consistently smaller than those from the tobit model. In the simulation study, we observed that when the error variance was small (s=1), the tobit model presented unbiased coefficient estimates and accurate predicted values, especially when the percentage of individuals with the highest possible score was small. However, when the error variance was greater (s=10 or s=20), the percentage of explained variance for the tobit model and the predicted values were more similar to those obtained with an OLS model. Conclusions: The proportion of variability accounted for by the models and the percentage of individuals with the highest possible score have an important effect on the performance of the tobit model in comparison with the linear model.
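The ceiling-effect bias that motivates the tobit model can be illustrated by simulation: OLS on an outcome censored at its best possible score attenuates the slope. This is a sketch with invented numbers, and no tobit MLE is implemented here:

```python
import numpy as np

rng = np.random.default_rng(4)
n = 50_000
x = rng.normal(size=n)
y_latent = 0.7 + 0.3 * x + rng.normal(scale=0.3, size=n)

# Ceiling effect: observed scores cannot exceed the best possible value 1.0
y = np.minimum(y_latent, 1.0)
share_ceiling = (y == 1.0).mean()

# OLS on the censored outcome attenuates the slope; a tobit MLE would
# model the censoring explicitly (not implemented in this sketch)
A = np.column_stack([np.ones(n), x])
(b0, b1), *_ = np.linalg.lstsq(A, y, rcond=None)
print(round(share_ceiling, 2), round(b1, 2))  # slope well below the true 0.3
```

This is the same pattern the study reports: the linear model's coefficients are consistently smaller than the tobit model's.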
Abstract:
This article presents an experimental study of the classification ability of several classifiers for multi-class classification of cannabis seedlings. As the cultivation of drug-type cannabis is forbidden in Switzerland, law enforcement authorities regularly ask forensic laboratories to determine the chemotype of a seized cannabis plant and then to conclude whether the plantation is legal or not. This classification is mainly performed when the plant is mature, as required by the EU official protocol, making the classification of cannabis seedlings a time-consuming and costly procedure. A previous study by the authors investigated this problem [1] and showed that it is possible to differentiate between drug-type (illegal) and fibre-type (legal) cannabis at an early stage of growth using gas chromatography interfaced with mass spectrometry (GC-MS), based on the relative proportions of eight major leaf compounds. The aims of the present work are, on the one hand, to continue the former work and optimize the methodology for the discrimination of drug- and fibre-type cannabis developed in the previous study and, on the other hand, to investigate the possibility of predicting illegal cannabis varieties. Seven classifiers for differentiating between cannabis seedlings are evaluated in this paper, namely Linear Discriminant Analysis (LDA), Partial Least Squares Discriminant Analysis (PLS-DA), Nearest Neighbour Classification (NNC), Learning Vector Quantization (LVQ), Radial Basis Function Support Vector Machines (RBF SVMs), Random Forest (RF) and Artificial Neural Networks (ANN). The performance of each method was assessed using the same analytical dataset, which consists of 861 samples split into drug- and fibre-type cannabis, with drug-type cannabis being made up of 12 varieties (i.e. 12 classes). The results show that linear classifiers are not able to handle the distribution of classes, in which some overlap areas exist, for either classification problem. Unlike linear classifiers, NNC and RBF SVMs best differentiate cannabis samples for both the 2-class and 12-class classifications, with average classification results of up to 99% and 98%, respectively. Furthermore, RBF SVMs correctly classified into drug-type cannabis the independent validation set, which consists of cannabis plants coming from police seizures. For forensic casework, this study shows that discrimination between cannabis samples at an early stage of growth is possible with fairly high classification performance, both between cannabis chemotypes and between drug-type cannabis varieties.