905 resultados para partial least-squares regression
Resumo:
Based on the quantitative analysis of diatom assemblages preserved in 274 surface sediment samples recovered in the Pacific, Atlantic and western Indian sectors of the Southern Ocean we have defined a new reference database for quantitative estimation of late-middle Pleistocene Antarctic sea ice fields using the transfer function technique. The Detrended Canonical Analysis (DCA) of the diatom data set points to a unimodal distribution of the diatom assemblages. Canonical Correspondence Analysis (CCA) indicates that winter sea ice (WSI) but also summer sea surface temperature (SSST) represent the most prominent environmental variables that control the spatial species distribution. To test the applicability of transfer functions for sea ice reconstruction in terms of concentration and occurrence probability we applied four different methods, the Imbrie and Kipp Method (IKM), the Modern Analog Technique (MAT), Weighted Averaging (WA), and Weighted Averaging Partial Least Squares (WAPLS), using logarithm-transformed diatom data and satellite-derived (1981-2010) sea ice data as a reference. The best performance for IKM results was obtained using a subset of 172 samples with 28 diatom taxa/taxa groups, quadratic regression and a three-factor model (IKM-D172/28/3q) resulting in root mean square errors of prediction (RMSEP) of 7.27% and 11.4% for WSI and summer sea ice (SSI) concentration, respectively. MAT estimates were calculated with different numbers of analogs (4, 6) using a 274-sample/28-taxa reference data set (MAT-D274/28/4an, -6an) resulting in RMSEP's ranging from 5.52% (4an) to 5.91% (6an) for WSI as well as 8.93% (4an) to 9.05% (6an) for SSI. WA and WAPLS performed less well with the D274 data set, compared to MAT, achieving WSI concentration RMSEP's of 9.91% with WA and 11.29% with WAPLS, recommending the use of IKM and MAT. The application of IKM and MAT to surface sediment data revealed strong relations to the satellite-derived winter and summer sea ice field. Sea ice reconstructions performed on an Atlantic- and a Pacific Southern Ocean sediment core, both documenting sea ice variability over the past 150,000 years (MIS 1 - MIS 6), resulted in similar glacial/interglacial trends of IKM and MAT-based sea-ice estimates. On the average, however, IKM estimates display smaller WSI and slightly higher SSI concentration and probability at lower variability in comparison with MAT. This pattern is a result of different estimation techniques with integration of WSI and SSI signals in one single factor assemblage by applying IKM and selecting specific single samples, thus keeping close to the original diatom database and included variability, by MAT. In contrast to the estimation of WSI, reconstructions of past SSI variability remains weaker. Combined with diatom-based estimates, the abundance and flux pattern of biogenic opal represents an additional indication for the WSI and SSI extent.
Resumo:
Quantitative Structure-Activity Relationship (QSAR) has been applied extensively in predicting toxicity of Disinfection By-Products (DBPs) in drinking water. Among many toxicological properties, acute and chronic toxicities of DBPs have been widely used in health risk assessment of DBPs. These toxicities are correlated with molecular properties, which are usually correlated with molecular descriptors. The primary goals of this thesis are: 1) to investigate the effects of molecular descriptors (e.g., chlorine number) on molecular properties such as energy of the lowest unoccupied molecular orbital (ELUMO) via QSAR modelling and analysis; 2) to validate the models by using internal and external cross-validation techniques; 3) to quantify the model uncertainties through Taylor and Monte Carlo Simulation. One of the very important ways to predict molecular properties such as ELUMO is using QSAR analysis. In this study, number of chlorine (NCl) and number of carbon (NC) as well as energy of the highest occupied molecular orbital (EHOMO) are used as molecular descriptors. There are typically three approaches used in QSAR model development: 1) Linear or Multi-linear Regression (MLR); 2) Partial Least Squares (PLS); and 3) Principle Component Regression (PCR). In QSAR analysis, a very critical step is model validation after QSAR models are established and before applying them to toxicity prediction. The DBPs to be studied include five chemical classes: chlorinated alkanes, alkenes, and aromatics. In addition, validated QSARs are developed to describe the toxicity of selected groups (i.e., chloro-alkane and aromatic compounds with a nitro- or cyano group) of DBP chemicals to three types of organisms (e.g., Fish, T. pyriformis, and P.pyosphoreum) based on experimental toxicity data from the literature. The results show that: 1) QSAR models to predict molecular property built by MLR, PLS or PCR can be used either to select valid data points or to eliminate outliers; 2) The Leave-One-Out Cross-Validation procedure by itself is not enough to give a reliable representation of the predictive ability of the QSAR models, however, Leave-Many-Out/K-fold cross-validation and external validation can be applied together to achieve more reliable results; 3) ELUMO are shown to correlate highly with the NCl for several classes of DBPs; and 4) According to uncertainty analysis using Taylor method, the uncertainty of QSAR models is contributed mostly from NCl for all DBP classes.
Resumo:
A number of studies have shown that Fourier transform infrared spectroscopy (FTIR) can be applied to quantitatively assess lacustrine sediment constituents. In this study, we developed calibration models based on FTIR for the quantitative determination of biogenic silica (BSi; n = 420; gradient: 0.9-56.5%), total organic carbon (TOC; n = 309; gradient: 0-2.9%), and total inorganic carbon (TIC; n= 152; gradient: 0-0.4%) in a 318 m-long sediment record with a basal age of 3.6 million years from Lake El'gygytgyn, Far East Russian Arctic. The developed partial least squares (PLS) regression models yield high cross-validated (CV) R2CV = 0.86-0.91 and low root mean square error of cross-validation (RMSECV) (3.1-7.0% of the gradient for the different properties). By applying these models to 6771 samples from the entire sediment record, we obtained detailed insight into bioproductivity variations in Lake El'gygytgyn throughout the middle to late Pliocene and Quaternary. High accumulation rates of BSi indicate a productivity maximum during the middle Pliocene (3.6-3.3 Ma), followed by gradually decreasing rates during the late Pliocene and Quaternary. The average BSi accumulation during the middle Pliocene was ~3 times higher than maximum accumulation rates during the past 1.5 million years. The indicated progressive deterioration of environmental and climatic conditions in the Siberian Arctic starting at ca. 3.3 Ma is consistent with the first occurrence of glacial periods and the finally complete establishment of glacial-interglacial cycles during the Quaternary.
Resumo:
This work outlines the theoretical advantages of multivariate methods in biomechanical data, validates the proposed methods and outlines new clinical findings relating to knee osteoarthritis that were made possible by this approach. New techniques were based on existing multivariate approaches, Partial Least Squares (PLS) and Non-negative Matrix Factorization (NMF) and validated using existing data sets. The new techniques developed, PCA-PLS-LDA (Principal Component Analysis – Partial Least Squares – Linear Discriminant Analysis), PCA-PLS-MLR (Principal Component Analysis – Partial Least Squares –Multiple Linear Regression) and Waveform Similarity (based on NMF) were developed to address the challenging characteristics of biomechanical data, variability and correlation. As a result, these new structure-seeking technique revealed new clinical findings. The first new clinical finding relates to the relationship between pain, radiographic severity and mechanics. Simultaneous analysis of pain and radiographic severity outcomes, a first in biomechanics, revealed that the knee adduction moment’s relationship to radiographic features is mediated by pain in subjects with moderate osteoarthritis. The second clinical finding was quantifying the importance of neuromuscular patterns in brace effectiveness for patients with knee osteoarthritis. I found that brace effectiveness was more related to the patient’s unbraced neuromuscular patterns than it was to mechanics, and that these neuromuscular patterns were more complicated than simply increased overall muscle activity, as previously thought.
Resumo:
This paper formulates a linear kernel support vector machine (SVM) as a regularized least-squares (RLS) problem. By defining a set of indicator variables of the errors, the solution to the RLS problem is represented as an equation that relates the error vector to the indicator variables. Through partitioning the training set, the SVM weights and bias are expressed analytically using the support vectors. It is also shown how this approach naturally extends to Sums with nonlinear kernels whilst avoiding the need to make use of Lagrange multipliers and duality theory. A fast iterative solution algorithm based on Cholesky decomposition with permutation of the support vectors is suggested as a solution method. The properties of our SVM formulation are analyzed and compared with standard SVMs using a simple example that can be illustrated graphically. The correctness and behavior of our solution (merely derived in the primal context of RLS) is demonstrated using a set of public benchmarking problems for both linear and nonlinear SVMs.
Resumo:
This paper is part of a special issue of Applied Geochemistry focusing on reliable applications of compositional multivariate statistical methods. This study outlines the application of compositional data analysis (CoDa) to calibration of geochemical data and multivariate statistical modelling of geochemistry and grain-size data from a set of Holocene sedimentary cores from the Ganges-Brahmaputra (G-B) delta. Over the last two decades, understanding near-continuous records of sedimentary sequences has required the use of core-scanning X-ray fluorescence (XRF) spectrometry, for both terrestrial and marine sedimentary sequences. Initial XRF data are generally unusable in ‘raw-format’, requiring data processing in order to remove instrument bias, as well as informed sequence interpretation. The applicability of these conventional calibration equations to core-scanning XRF data are further limited by the constraints posed by unknown measurement geometry and specimen homogeneity, as well as matrix effects. Log-ratio based calibration schemes have been developed and applied to clastic sedimentary sequences focusing mainly on energy dispersive-XRF (ED-XRF) core-scanning. This study has applied high resolution core-scanning XRF to Holocene sedimentary sequences from the tidal-dominated Indian Sundarbans, (Ganges-Brahmaputra delta plain). The Log-Ratio Calibration Equation (LRCE) was applied to a sub-set of core-scan and conventional ED-XRF data to quantify elemental composition. This provides a robust calibration scheme using reduced major axis regression of log-ratio transformed geochemical data. Through partial least squares (PLS) modelling of geochemical and grain-size data, it is possible to derive robust proxy information for the Sundarbans depositional environment. The application of these techniques to Holocene sedimentary data offers an improved methodological framework for unravelling Holocene sedimentation patterns.
Resumo:
Motivated by environmental protection concerns, monitoring the flue gas of thermal power plant is now often mandatory due to the need to ensure that emission levels stay within safe limits. Optical based gas sensing systems are increasingly employed for this purpose, with regression techniques used to relate gas optical absorption spectra to the concentrations of specific gas components of interest (NOx, SO2 etc.). Accurately predicting gas concentrations from absorption spectra remains a challenging problem due to the presence of nonlinearities in the relationships and the high-dimensional and correlated nature of the spectral data. This article proposes a generalized fuzzy linguistic model (GFLM) to address this challenge. The GFLM is made up of a series of “If-Then” fuzzy rules. The absorption spectra are input variables in the rule antecedent. The rule consequent is a general nonlinear polynomial function of the absorption spectra. Model parameters are estimated using least squares and gradient descent optimization algorithms. The performance of GFLM is compared with other traditional prediction models, such as partial least squares, support vector machines, multilayer perceptron neural networks and radial basis function networks, for two real flue gas spectral datasets: one from a coal-fired power plant and one from a gas-fired power plant. The experimental results show that the generalized fuzzy linguistic model has good predictive ability, and is competitive with alternative approaches, while having the added advantage of providing an interpretable model.
Resumo:
Motivated by environmental protection concerns, monitoring the flue gas of thermal power plant is now often mandatory due to the need to ensure that emission levels stay within safe limits. Optical based gas sensing systems are increasingly employed for this purpose, with regression techniques used to relate gas optical absorption spectra to the concentrations of specific gas components of interest (NOx, SO2 etc.). Accurately predicting gas concentrations from absorption spectra remains a challenging problem due to the presence of nonlinearities in the relationships and the high-dimensional and correlated nature of the spectral data. This article proposes a generalized fuzzy linguistic model (GFLM) to address this challenge. The GFLM is made up of a series of “If-Then” fuzzy rules. The absorption spectra are input variables in the rule antecedent. The rule consequent is a general nonlinear polynomial function of the absorption spectra. Model parameters are estimated using least squares and gradient descent optimization algorithms. The performance of GFLM is compared with other traditional prediction models, such as partial least squares, support vector machines, multilayer perceptron neural networks and radial basis function networks, for two real flue gas spectral datasets: one from a coal-fired power plant and one from a gas-fired power plant. The experimental results show that the generalized fuzzy linguistic model has good predictive ability, and is competitive with alternative approaches, while having the added advantage of providing an interpretable model.
Resumo:
Inverse simulations of musculoskeletal models computes the internal forces such as muscle and joint reaction forces, which are hard to measure, using the more easily measured motion and external forces as input data. Because of the difficulties of measuring muscle forces and joint reactions, simulations are hard to validate. One way of reducing errors for the simulations is to ensure that the mathematical problem is well-posed. This paper presents a study of regularity aspects for an inverse simulation method, often called forward dynamics or dynamical optimization, that takes into account both measurement errors and muscle dynamics. The simulation method is explained in detail. Regularity is examined for a test problem around the optimum using the approximated quadratic problem. The results shows improved rank by including a regularization term in the objective that handles the mechanical over-determinancy. Using the 3-element Hill muscle model the chosen regularization term is the norm of the activation. To make the problem full-rank only the excitation bounds should be included in the constraints. However, this results in small negative values of the activation which indicates that muscles are pushing and not pulling. Despite this unrealistic behavior the error maybe small enough to be accepted for specific applications. These results is a starting point start for achieving better results of inverse musculoskeletal simulations from a numerical point of view.
Resumo:
In this paper we consider two sources of enhancement for the meshfree Lagrangian particle method smoothed particle hydrodynamics (SPH) by improving the accuracy of the particle approximation. Namely, we will consider shape functions constructed using: moving least-squares approximation (MLS); radial basis functions (RBF). Using MLS approximation is appealing because polynomial consistency of the particle approximation can be enforced. RBFs further appeal as they allow one to dispense with the smoothing-length - the parameter in the SPH method which governs the number of particles within the support of the shape function. Currently, only ad hoc methods for choosing the smoothing-length exist. We ensure that any enhancement retains the conservative and meshfree nature of SPH. In doing so, we derive a new set of variationally-consistent hydrodynamic equations. Finally, we demonstrate the performance of the new equations on the Sod shock tube problem.
Resumo:
The routine analysis for quantization of organic acids and sugars are generally slow methods that involve the use and preparation of several reagents, require trained professional, the availability of special equipment and is expensive. In this context, it has been increasing investment in research whose purpose is the development of substitutive methods to reference, which are faster, cheap and simple, and infrared spectroscopy have been highlighted in this regard. The present study developed multivariate calibration models for the simultaneous and quantitative determination of ascorbic acid, citric, malic and tartaric and sugars sucrose, glucose and fructose, and soluble solids in juices and fruit nectars and classification models for ACP. We used methods of spectroscopy in the near infrared (Near Infrared, NIR) in association with the method regression of partial least squares (PLS). Were used 42 samples between juices and fruit nectars commercially available in local shops. For the construction of the models were performed with reference analysis using high-performance liquid chromatography (HPLC) and refractometry for the analysis of soluble solids. Subsequently, the acquisition of the spectra was done in triplicate, in the spectral range 12500 to 4000 cm-1. The best models were applied to the quantification of analytes in study on natural juices and juice samples produced in the Paraná Southwest Region. The juices used in the application of the models also underwent physical and chemical analysis. Validation of chromatographic methodology has shown satisfactory results, since the external calibration curve obtained R-square value (R2) above 0.98 and coefficient of variation (%CV) for intermediate precision and repeatability below 8.83%. Through the Principal Component Analysis (PCA) was possible to separate samples of juices into two major groups, grape and apple and tangerine and orange, while for nectars groups separated guava and grape, and pineapple and apple. Different validation methods, and pre-processes that were used separately and in combination, were obtained with multivariate calibration models with average forecast square error (RMSEP) and cross validation (RMSECV) errors below 1.33 and 1.53 g.100 mL-1, respectively and R2 above 0.771, except for malic acid. The physicochemical analysis enabled the characterization of drinks, including the pH working range (variation of 2.83 to 5.79) and acidity within the parameters Regulation for each flavor. Regression models have demonstrated the possibility of determining both ascorbic acids, citric, malic and tartaric with successfully, besides sucrose, glucose and fructose by means of only a spectrum, suggesting that the models are economically viable for quality control and product standardization in the fruit juice and nectars processing industry.
Quantificação de açúcares com uma língua eletrónica: calibração multivariada com seleção de sensores
Resumo:
Este trabalho incide na análise dos açúcares majoritários nos alimentos (glucose, frutose e sacarose) com uma língua eletrónica potenciométrica através de calibração multivariada com seleção de sensores. A análise destes compostos permite contribuir para a avaliação do impacto dos açúcares na saúde e seu efeito fisiológico, além de permitir relacionar atributos sensoriais e atuar no controlo de qualidade e autenticidade dos alimentos. Embora existam diversas metodologias analíticas usadas rotineiramente na identificação e quantificação dos açúcares nos alimentos, em geral, estes métodos apresentam diversas desvantagens, tais como lentidão das análises, consumo elevado de reagentes químicos e necessidade de pré-tratamentos destrutivos das amostras. Por isso se decidiu aplicar uma língua eletrónica potenciométrica, construída com sensores poliméricos selecionados considerando as sensibilidades aos açucares obtidas em trabalhos anteriores, na análise dos açúcares nos alimentos, visando estabelecer uma metodologia analítica e procedimentos matemáticos para quantificação destes compostos. Para este propósito foram realizadas análises em soluções padrão de misturas ternárias dos açúcares em diferentes níveis de concentração e em soluções de dissoluções de amostras de mel, que foram previamente analisadas em HPLC para se determinar as concentrações de referência dos açúcares. Foi então feita uma análise exploratória dos dados visando-se remover sensores ou observações discordantes através da realização de uma análise de componentes principais. Em seguida, foram construídos modelos de regressão linear múltipla com seleção de variáveis usando o algoritmo stepwise e foi verificado que embora fosse possível estabelecer uma boa relação entre as respostas dos sensores e as concentrações dos açúcares, os modelos não apresentavam desempenho de previsão satisfatório em dados de grupo de teste. Dessa forma, visando contornar este problema, novas abordagens foram testadas através da construção e otimização dos parâmetros de um algoritmo genético para seleção de variáveis que pudesse ser aplicado às diversas ferramentas de regressão, entre elas a regressão pelo método dos mínimos quadrados parciais. Foram obtidos bons resultados de previsão para os modelos obtidos com o método dos mínimos quadrados parciais aliado ao algoritmo genético, tanto para as soluções padrão quanto para as soluções de mel, com R²ajustado acima de 0,99 e RMSE inferior a 0,5 obtidos da relação linear entre os valores previstos e experimentais usando dados dos grupos de teste. O sistema de multi-sensores construído se mostrou uma ferramenta adequada para a análise dos iii açúcares, quando presentes em concentrações maioritárias, e alternativa a métodos instrumentais de referência, como o HPLC, por reduzir o tempo da análise e o valor monetário da análise, bem como, ter um preparo mínimo das amostras e eliminar produtos finais poluentes.
Resumo:
A finite-strain solid–shell element is proposed. It is based on least-squares in-plane assumed strains, assumed natural transverse shear and normal strains. The singular value decomposition (SVD) is used to define local (integration-point) orthogonal frames-of-reference solely from the Jacobian matrix. The complete finite-strain formulation is derived and tested. Assumed strains obtained from least-squares fitting are an alternative to the enhanced-assumed-strain (EAS) formulations and, in contrast with these, the result is an element satisfying the Patch test. There are no additional degrees-of-freedom, as it is the case with the enhanced-assumed-strain case, even by means of static condensation. Least-squares fitting produces invariant finite strain elements which are shear-locking free and amenable to be incorporated in large-scale codes. With that goal, we use automatically generated code produced by AceGen and Mathematica. All benchmarks show excellent results, similar to the best available shell and hybrid solid elements with significantly lower computational cost.
Resumo:
A finite-strain solid–shell element is proposed. It is based on least-squares in-plane assumed strains, assumed natural transverse shear and normal strains. The singular value decomposition (SVD) is used to define local (integration-point) orthogonal frames-of- reference solely from the Jacobian matrix. The complete finite-strain formulation is derived and tested. Assumed strains obtained from least-squares fitting are an alternative to the enhanced-assumed-strain (EAS) formulations and, in contrast with these, the result is an element satisfying the Patch test. There are no additional degrees-of-freedom, as it is the case with the enhanced- assumed-strain case, even by means of static condensation. Least-squares fitting produces invariant finite strain elements which are shear-locking free and amenable to be incorporated in large-scale codes. With that goal, we use automatically generated code produced by AceGen and Mathematica. All benchmarks show excellent results, similar to the best available shell and hybrid solid elements with significantly lower computational cost.
Resumo:
Two novelties are introduced: (i) a finite-strain semi-implicit integration algorithm compatible with current element technologies and (ii) the application to assumed-strain hexahedra. The Löwdin algo- rithm is adopted to obtain evolving frames applicable to finite strain anisotropy and a weighted least- squares algorithm is used to determine the mixed strain. Löwdin frames are very convenient to model anisotropic materials. Weighted least-squares circumvent the use of internal degrees-of-freedom. Het- erogeneity of element technologies introduce apparently incompatible constitutive requirements. Assumed-strain and enhanced strain elements can be either formulated in terms of the deformation gradient or the Green–Lagrange strain, many of the high-performance shell formulations are corotational and constitutive constraints (such as incompressibility, plane stress and zero normal stress in shells) also depend on specific element formulations. We propose a unified integration algorithm compatible with possibly all element technologies. To assess its validity, a least-squares based hexahedral element is implemented and tested in depth. Basic linear problems as well as 5 finite-strain examples are inspected for correctness and competitive accuracy.