920 results for Multivariate curve resolution-alternating least squares
Abstract:
This study developed and validated a method for determining moisture in artisanal Minas cheese using near-infrared spectroscopy and partial least squares (PLS) regression. Model robustness was ensured by broad sample diversity, real routine-analysis conditions, variable selection, outlier detection and analytical validation. The model covered a moisture range of 28.5 to 55.5% w/w, with a root-mean-square error of prediction (RMSEP) of 1.6%. After the method was adopted, its stability was confirmed over a period of two years by means of a control chart. Beyond this specific method, the present study sought to provide an example of a multivariate metrological methodology that could be applied in several areas, including new aspects such as a more stringent evaluation of the linearity of multivariate methods.
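The PLS calibration and the RMSEP figure can be illustrated with a short sketch. It assumes NumPy, a matrix `X` of spectra (one row per sample) and a vector `y` of reference moisture values. The NIPALS PLS1 routine and the function names are illustrative choices, not the authors' software, and the variable selection, outlier detection and validation steps described in the study are not reproduced.

```python
import numpy as np

def pls1_fit(X, y, n_components):
    """Fit a single-response PLS model with the NIPALS algorithm.

    Returns (coef, x_mean, y_mean) so that predictions are (X - x_mean) @ coef + y_mean.
    """
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xk, yk = X - x_mean, y - y_mean
    W, P, q = [], [], []
    for _ in range(n_components):
        w = Xk.T @ yk
        w /= np.linalg.norm(w)          # weight vector
        t = Xk @ w                      # scores
        tt = t @ t
        p = Xk.T @ t / tt               # X loadings
        qk = (yk @ t) / tt              # y loading
        Xk = Xk - np.outer(t, p)        # deflate X
        yk = yk - qk * t                # deflate y
        W.append(w)
        P.append(p)
        q.append(qk)
    W, P, q = np.array(W).T, np.array(P).T, np.array(q)
    coef = W @ np.linalg.solve(P.T @ W, q)   # regression vector B = W (P^T W)^-1 q
    return coef, x_mean, y_mean

def pls1_predict(model, X):
    coef, x_mean, y_mean = model
    return (X - x_mean) @ coef + y_mean

def rmsep(y_true, y_pred):
    """Root-mean-square error of prediction."""
    diff = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean(diff ** 2)))
```

In practice `rmsep` would be evaluated on an independent prediction set rather than on the calibration samples, and the number of latent variables would be chosen by validation; both are assumptions of this sketch.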
Abstract:
Machine learning provides tools for the automated construction of predictive models in data-intensive areas of engineering and science. In recent years, the family of regularized kernel methods has become one of the mainstream approaches to machine learning, owing to a number of advantages these methods share. The approach provides theoretically well-founded solutions to the problems of under- and overfitting, allows learning from structured data, and has been shown empirically to yield high predictive performance across a wide range of application domains. Historically, classification and regression problems have received most of the attention in the field. In this thesis we focus on another type of learning problem: learning to rank. In learning to rank, the aim is to learn, from a set of past observations, a ranking function that can order new objects according to how well they match some underlying criterion of goodness. An important special case of this setting is the bipartite ranking problem, which corresponds to maximizing the area under the ROC curve (AUC) in binary classification. Ranking applications arise in a wide variety of settings; examples encountered in this thesis include document retrieval in web search, recommender systems, information extraction and automated parsing of natural language. We consider the pairwise approach to learning to rank, in which ranking models are learned by minimizing the expected probability of ranking two randomly drawn test examples incorrectly. Developing computationally efficient kernel methods based on this approach has proven challenging in the past. Moreover, it is not clear which techniques for estimating the predictive performance of learned models are the most reliable in the ranking setting, or how these techniques can be implemented efficiently. The contributions of this thesis are as follows. 
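The correspondence between bipartite ranking and AUC can be made concrete: AUC equals the fraction of (positive, negative) pairs that a scoring function orders correctly. A minimal sketch, assuming labels coded as 1 (positive) and 0 (negative) and counting tied scores as one half; the quadratic pair loop favors clarity over efficiency, and the function name is hypothetical.

```python
from itertools import product

def pairwise_auc(scores, labels):
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count 1/2."""
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    correct = 0.0
    for sp, sn in product(pos, neg):
        if sp > sn:
            correct += 1.0      # pair ordered correctly
        elif sp == sn:
            correct += 0.5      # tie: half credit
    return correct / (len(pos) * len(neg))
```

For example, `pairwise_auc([0.9, 0.8, 0.3, 0.1], [1, 0, 1, 0])` gives 0.75: three of the four positive-negative pairs are ordered correctly.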
First, we develop RankRLS, a computationally efficient kernel method for learning to rank, which is based on minimizing a regularized pairwise least-squares loss. In addition to training methods, we introduce a variety of algorithms for tasks such as model selection, multi-output learning and cross-validation, based on computational shortcuts from matrix algebra. Second, we improve the fastest known training method for the linear version of the RankSVM algorithm, one of the best-established methods for learning to rank. Third, we study the combination of the empirical kernel map and reduced set approximation, which allows kernel machines to be trained at large scale using linear solvers, and propose computationally efficient solutions for cross-validation with this approach. Next, through an extensive simulation study, we explore the problem of reliable cross-validation when AUC is used as the performance criterion. We demonstrate that the proposed leave-pair-out cross-validation approach yields more reliable performance estimates than commonly used alternatives. Finally, we present a case study on applying machine learning to information extraction from biomedical literature, which combines several of the approaches considered in the thesis. The thesis is divided into two parts: Part I provides the background for the research and summarizes the central results, and Part II consists of the five original research articles that constitute the main contribution of this thesis.
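The regularized pairwise least-squares loss behind RankRLS can be sketched for the linear case. Summing the squared error over all pairs gives sum_{i<j} ((y_i − y_j) − (f_i − f_j))² = eᵀ(nI − 11ᵀ)e with e = y − Xw, so minimizing this loss plus λ‖w‖² amounts to solving (XᵀLX + λI)w = XᵀLy with L = nI − 11ᵀ. The sketch assumes NumPy, that all pairs are compared, and a hypothetical function name. It illustrates the loss only, not the thesis's kernel algorithms or its cross-validation shortcuts. It does use one simple matrix-algebra shortcut: XᵀLX is formed from XᵀX and column sums, without building the n × n matrix L.

```python
import numpy as np

def linear_rankrls(X, y, lam):
    """Minimise sum_{i<j} ((y_i - y_j) - (x_i - x_j) @ w)**2 + lam * ||w||**2.

    L = n*I - 1 1^T is used implicitly: X^T L X = n X^T X - s s^T and
    X^T L y = n X^T y - s * sum(y), where s = X^T 1 (the column sums).
    """
    n, d = X.shape
    s = X.sum(axis=0)
    XtLX = n * (X.T @ X) - np.outer(s, s)
    XtLy = n * (X.T @ y) - s * y.sum()
    return np.linalg.solve(XtLX + lam * np.eye(d), XtLy)
```

Because L annihilates constant vectors, adding a constant offset to every `y_i` leaves the solution unchanged; only score differences matter, as they should for a ranking loss.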
Abstract:
The 3700–3000 Å absorption spectra of CH3CHO and its isotopic variants CH3CDO, CD3CHO and CD3CDO were studied in the gas phase at room temperature and at low temperatures. Low-resolution spectra were recorded with a 1.5 m Bausch and Lomb grating spectrograph. High-resolution spectra were recorded with an Ebert spectrograph, using an echelle grating and a holographic grating separately. Multiple-reflection cells were used to achieve long path lengths. The pressure-path length product used for the absorption spectrum of CH3CHO was up to 100 mm Hg × 91.43 m. The emission and excitation spectra of CH3CHO were also recorded. Satellite band patterns calculated by the method of Lewis were compared with the observed near-UV absorption spectrum of acetaldehyde. The calculated patterns belonged to two cases, namely the barriers-in-phase case and the barriers-out-of-phase case, and each corresponded to a stable conformation of acetaldehyde in the excited state. The comparisons showed that the patterns in the observed absorption spectra corresponded to the H–H eclipsed conformations of acetaldehyde in the excited state. A least-squares fitting analysis showed that the barrier heights in the excited state were higher than in the ground state. Finally, the isotopic shifts of the acetaldehyde isotopologues were compared with those of compounds with similar deuterium substitution.
Abstract:
This paper proposes finite-sample procedures for testing the SURE specification in multi-equation regression models, i.e., whether the disturbances in different equations are contemporaneously uncorrelated. We apply the technique of Monte Carlo (MC) tests [Dwass (1957), Barnard (1963)] to obtain exact tests based on the standard LR and LM zero-correlation tests. We also suggest an MC quasi-LR (QLR) test based on feasible generalized least squares (FGLS). We show that the latter statistics are pivotal under the null, which justifies the application of MC tests. Furthermore, we extend the exact independence test proposed by Harvey and Phillips (1982) to the multi-equation framework. Specifically, we introduce several induced tests based on a set of simultaneous Harvey/Phillips-type tests and suggest a simulation-based solution to the associated combination problem. The properties of the proposed tests are studied in a Monte Carlo experiment, which shows that standard asymptotic tests exhibit important size distortions, while MC tests achieve complete size control and display good power. Moreover, MC-QLR tests performed best in terms of power, a result of interest from the point of view of simulation-based tests. The power of the MC induced tests is appreciably higher than that of standard Bonferroni tests and, in certain cases, exceeds that of the likelihood-based MC tests. The tests are applied to data used by Fischer (1993) to analyze the macroeconomic determinants of growth.
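The Monte Carlo test idea can be sketched with the LM zero-correlation statistic of Breusch and Pagan, LM = T · sum_{i<j} r_ij², computed from equation-by-equation OLS residuals. Assuming fixed regressors and Gaussian errors, each residual vector is M_i u_i and r_ij does not depend on the error scales or coefficients, so the null distribution can be simulated exactly; the MC p-value is (1 + #{S_k ≥ S_0}) / (N + 1). This is a sketch under those assumptions, with hypothetical function names; it does not reproduce the paper's LR, QLR or induced tests.

```python
import numpy as np

def ols_residuals(X, y):
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return y - X @ beta

def lm_zero_correlation(resid):
    """Breusch-Pagan LM statistic T * sum_{i<j} r_ij^2 from a T x p residual matrix."""
    T = resid.shape[0]
    S = resid.T @ resid / T
    d = np.sqrt(np.diag(S))
    r = S / np.outer(d, d)                  # residual correlation matrix
    iu = np.triu_indices_from(r, k=1)
    return float(T * np.sum(r[iu] ** 2))

def mc_pvalue(stat_obs, stats_sim):
    """Dwass/Barnard Monte Carlo p-value (ties have probability zero for continuous statistics)."""
    stats_sim = np.asarray(stats_sim)
    return float((1 + np.sum(stats_sim >= stat_obs)) / (len(stats_sim) + 1))

def mc_lm_test(Xs, Y, n_rep=99, seed=0):
    """Xs: list of p regressor matrices (T x k_i); Y: T x p matrix of dependent variables."""
    rng = np.random.default_rng(seed)
    T, p = Y.shape
    resid = np.column_stack([ols_residuals(Xs[i], Y[:, i]) for i in range(p)])
    s0 = lm_zero_correlation(resid)
    sims = []
    for _ in range(n_rep):
        # Under the null: independent Gaussian errors; scales and coefficients are irrelevant.
        U = rng.standard_normal((T, p))
        R = np.column_stack([ols_residuals(Xs[i], U[:, i]) for i in range(p)])
        sims.append(lm_zero_correlation(R))
    return s0, mc_pvalue(s0, sims)
```

With N = 99 replications, the smallest attainable p-value is 0.01, and a test at the 5% level rejects when the observed statistic ranks among the top five of the 100 values.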
Abstract:
This work presents a method for the numerical solution of the two-dimensional shallow water equations, which model the flow of water bodies whose surface extent is much larger than their depth. These equations describe the gravity-driven evolution over time of a given initial state in water bodies with a free surface. This class includes problems such as the behavior of waves on shallow beaches or the movement of a flood wave in a river. These examples clearly show that the method must account for the influence of topography and must handle wet/dry transitions. This dissertation presents a finite-volume method, highly accurate in regions of sufficient water depth, for numerically computing the time evolution of the solution of the two-dimensional shallow water equations from given initial and boundary conditions on an unstructured grid. The method accounts for the influence of topographic source terms on the flow and, in so-called "lake at rest" steady states, balances this influence exactly against the numerical fluxes. It is based on a first-order finite-volume approach, extended by a WENO reconstruction using the least-squares method and a so-called space-time expansion, with the aim of obtaining a scheme of arbitrarily high order. The Riemann problems arising in the method are solved with the Riemann solver of Chinnayya, LeRoux and Seguin (1999), which takes the influence of topography on the flow into account. It is proven that the coefficients of the reconstruction polynomials computed by the WENO procedure approximate the spatial derivatives of the function being reconstructed to a degree of accuracy consistent with the order of the method. 
Likewise, it is proven that the coefficients of the polynomial resulting from the space-time expansion approximate the spatial and temporal derivatives of the solution of the initial value problem. Furthermore, the well-balancedness of the method is proven for arbitrarily high numerical order. For the treatment of wet/dry transitions, an order-reduction method depending on water depth and cell size is proposed. This is necessary to avoid negative water depths in the computation, which can arise as a consequence of oscillations of the space-time polynomial. Numerical results confirming the theoretical order of the method are presented, as are examples demonstrating the excellent properties of the overall method in computing challenging problems.
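The "lake at rest" balance can be illustrated with a much simpler scheme than the one in the dissertation. The sketch is a first-order, one-dimensional finite-volume method using the hydrostatic reconstruction of Audusse et al. (2004) with a Rusanov (local Lax-Friedrichs) flux. This is a swapped-in, well-known well-balanced technique, not the Chinnayya–LeRoux–Seguin solver, WENO reconstruction or space-time expansion of the source. It assumes NumPy, g = 9.81, zero-gradient (transmissive) boundaries and strictly positive depths, so no wet/dry treatment is included; all names are hypothetical.

```python
import numpy as np

G = 9.81

def velocity(h, q, eps=1e-12):
    return np.where(h > eps, q / np.maximum(h, eps), 0.0)

def physical_flux(h, q):
    u = velocity(h, q)
    return np.stack([q, q * u + 0.5 * G * h ** 2])

def rusanov(hL, qL, hR, qR):
    uL, uR = velocity(hL, qL), velocity(hR, qR)
    a = np.maximum(np.abs(uL) + np.sqrt(G * hL), np.abs(uR) + np.sqrt(G * hR))
    return 0.5 * (physical_flux(hL, qL) + physical_flux(hR, qR)) - 0.5 * a * np.stack([hR - hL, qR - qL])

def step(h, q, z, dx, cfl=0.45):
    """One explicit step for depth h, discharge q = h*u and bottom z (cell averages)."""
    hg, qg, zg = (np.pad(v, 1, mode="edge") for v in (h, q, z))   # ghost cells
    uL, uR = velocity(hg[:-1], qg[:-1]), velocity(hg[1:], qg[1:])
    zs = np.maximum(zg[:-1], zg[1:])                 # interface bottom elevation
    hLs = np.maximum(0.0, hg[:-1] + zg[:-1] - zs)    # hydrostatically reconstructed depths
    hRs = np.maximum(0.0, hg[1:] + zg[1:] - zs)
    F = rusanov(hLs, hLs * uL, hRs, hRs * uR)        # interface k lies between real cells k-1 and k
    u = velocity(h, q)
    dt = cfl * dx / np.max(np.abs(u) + np.sqrt(G * h) + 1e-12)
    F_right = F[:, 1:].copy()                        # flux at each cell's right interface
    F_right[1] += 0.5 * G * (h ** 2 - hLs[1:] ** 2)  # well-balancing pressure correction
    F_left = F[:, :-1].copy()                        # flux at each cell's left interface
    F_left[1] += 0.5 * G * (h ** 2 - hRs[:-1] ** 2)
    h_new = h - dt / dx * (F_right[0] - F_left[0])
    q_new = q - dt / dx * (F_right[1] - F_left[1])
    return h_new, q_new, dt
```

For a lake at rest (h + z constant, q = 0), both reconstructed depths at every interface coincide, so the corrected fluxes on either side of each cell both reduce to (0, g h_i²/2) and the state is preserved up to rounding error, which is the discrete analogue of the balance property described above.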
Abstract:
Customer satisfaction and retention are key issues for organizations in today's competitive marketplace. As such, considerable research effort and funding have been invested in developing accurate ways of assessing consumer satisfaction at both the macro (national) and micro (organizational) levels, facilitating comparisons of performance both within and between industries. Since the introduction of national customer satisfaction indices (CSI), partial least squares (PLS) has been used to estimate CSI models in preference to structural equation models (SEM), because PLS does not rely on strict assumptions about the data. However, this choice was based on some misconceptions about the use of SEMs and does not take into account more recent advances in SEM, including estimation methods that are robust to non-normality and missing data. In this paper, the SEM and PLS approaches were compared by evaluating perceptions of the Isle of Man Post Office's products and customer service using a CSI format. The new robust SEM procedures were found to be advantageous over PLS. Product quality was found to be the only driver of customer satisfaction, while image and satisfaction were the only predictors of loyalty, thus arguing for the specificity of postal services.
Abstract:
It has long been recognized that internal conflicts directly affect individual-level variables such as people's health, their levels of schooling and the forced displacement of those affected. However, only in the last decade has academic research turned to rigorously documenting and quantifying the collateral effects of violence on individuals' living conditions. This study examines how exposure to conflict in Colombia has affected people's labor market decisions. The identification strategy internalizes the well-known endogeneity problems between conflict and variables of economic activity and development, and produces results that are robust to internal migration and displacement. In terms of labor force participation and unemployment, heterogeneous effects by gender are found in response to the violence experienced. In particular, women's probability of labor force participation increases as a consequence of exposure to conflict, while their probability of unemployment decreases. For men, the results show a lower probability of participation, the opposite of the effect for women, and an analogous effect on unemployment. The study finds no differential effects on labor informality.
Abstract:
Vibration-rotation spectra of HOCl have been measured at a resolution of 0.05 cm−1 to determine vibration-rotation constants and the 35Cl/37Cl isotope shifts in the vibrational frequencies. The spectrum of DOCl has also been recorded, and a preliminary analysis of its band origins has been made. The vibrational frequency data and centrifugal distortion constants have been used to determine the harmonic force field in a least-squares refinement; the resulting force field also gives a good fit to the data on vibrational contributions to the inertial defect. The equilibrium rotational constants of HOCl have been obtained, and an equilibrium structure has been estimated.
Abstract:
Vibration-rotation spectra of HO15NO and DO15NO have been measured at a resolution of 0.04 cm−1 to determine the isotopic shifts in the vibrational band origins. These shifts have been used, together with recently determined data on vibrational band origins, Coriolis constants and centrifugal distortion constants, to determine the harmonic force field of both cis- and trans-nitrous acid in least-squares refinement calculations. The results are discussed in relation to recent ab initio calculations, the inertia defects and the torsional potential function.
Abstract:
High-resolution vibration-rotation spectra of 13C2H2 were recorded in a number of regions from 2000 to 5200 cm−1 at Doppler- or pressure-limited resolution. In these spectral ranges, cold and hot bands involving the bending-stretching combination levels have been analyzed up to high J values. Anharmonic quartic resonances for the combination levels ν1 + mν4 + nν5, ν2 + mν4 + (n + 2)ν5 and ν3 + (m − 1)ν4 + (n + 1)ν5 have been studied, and the l-type resonances within each polyad have been explicitly taken into account in the analysis of the data. The least-squares refinement provides deperturbed values for band origins and rotational constants, obtained by fitting rotation lines only up to J ≈ 20 with root-mean-square errors of ≈ 0.0003 cm−1. The band origins allowed us to determine a number of the anharmonicity constants x_ij^0.
Abstract:
The Fourier-transform spectrum of CH3F from 2800 to 3100 cm−1, obtained by Guelachvili in Orsay at a resolution of about 0.003 cm−1, was analyzed. The effective Hamiltonian used contained all symmetry-allowed interactions up to second order in the Amat-Nielsen classification, together with selected third-order terms, among the set of nine vibrational basis functions represented by the states ν1(A1), ν4(E), 2ν2(A1), ν2 + ν5(E), 2ν5^0(A1) and 2ν5^±2(E). A number of strong Fermi and Coriolis resonances are involved. The vibrational Hamiltonian matrix was not factorized beyond the requirements of symmetry. A total of 59 molecular parameters were refined in a simultaneous least-squares fit to over 1500 upper-state energy levels for J ≤ 20, with a standard deviation of 0.013 cm−1. Although the standard deviation remains an order of magnitude greater than the precision of the measurements, this work breaks new ground in the simultaneous analysis of interacting symmetric-top vibrational levels, both in the number of interacting vibrational states and in the number of parameters in the Hamiltonian.
Abstract:
The rheological properties of fresh gluten, measured in small-amplitude oscillatory shear (SAOS) and by creep recovery after a short application of stress, were related to the hearth bread-baking performance of wheat flours using partial least squares (PLS) regression, a multivariate statistical method. The picture was completed by dough mixing and extensional properties, flour protein size distribution determined by SE-HPLC, and high-molecular-weight glutenin subunit (HMW-GS) composition. The sample set comprised 20 wheat cultivars grown at two different levels of nitrogen fertilizer in one location. Flours yielding stiffer and more elastic glutens, with higher elastic and viscous moduli (G′ and G″) and lower tan δ values in SAOS, gave doughs that were better able to retain their shape during proving and baking, resulting in breads with high form ratios. Creep recovery measurements after a short application of stress showed that glutens from flours of good breadmaking quality had high relative elastic recovery. The nitrogen fertilizer level affected the protein size distribution by increasing the proportion of monomeric proteins (gliadins), which gave glutens with higher tan δ and flatter bread loaves (lower form ratio).
Abstract:
A construction algorithm for multioutput radial basis function (RBF) network modelling is introduced by combining locally regularised orthogonal least squares (LROLS) model selection with a D-optimality experimental design. The proposed algorithm aims to maximise model robustness and sparsity via two effective and complementary approaches. The LROLS method alone is capable of producing a very parsimonious RBF network model with excellent generalisation performance. The D-optimality design criterion enhances model efficiency and robustness. A further advantage of the combined approach is that the user only needs to specify a weighting for the D-optimality cost in the combined RBF model selection criterion, after which the entire model construction procedure becomes automatic. The value of this weighting does not critically influence the model selection procedure, and it can easily be chosen from a wide range of values.
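The orthogonal least squares (OLS) forward selection on which LROLS builds can be sketched as follows. This is the classical, unregularised version that ranks candidate regressors by their error reduction ratio (ERR); the local regularisation and the weighted D-optimality cost of the proposed algorithm are not implemented, and the function name is hypothetical. The sketch does return the squared norms κ_k of the orthogonalised regressors, because for the selected regressor matrix P_s one has det(P_sᵀP_s) = Π κ_k, which is the quantity a D-optimal design maximises.

```python
import numpy as np

def ols_forward_select(P, y, n_terms):
    """Classical OLS forward selection (modified Gram-Schmidt) by error reduction ratio.

    Returns the selected column indices and kappa_k = w_k^T w_k for each
    orthogonalised selected regressor w_k.
    """
    N, M = P.shape
    W = P.astype(float).copy()          # candidates, orthogonalised in place
    yy = float(y @ y)
    selected, kappas = [], []
    for _ in range(n_terms):
        best, best_err = None, -1.0
        for j in range(M):
            if j in selected:
                continue
            kappa = W[:, j] @ W[:, j]
            if kappa < 1e-12:           # candidate already spanned by chosen terms
                continue
            g = (W[:, j] @ y) / kappa
            err = g * g * kappa / yy    # fraction of output energy explained
            if err > best_err:
                best, best_err = j, err
        if best is None:
            break
        wk = W[:, best].copy()
        kk = wk @ wk
        selected.append(best)
        kappas.append(kk)
        for j in range(M):              # orthogonalise remaining candidates against w_k
            if j not in selected:
                W[:, j] -= (wk @ W[:, j]) / kk * wk
    return selected, np.array(kappas)
```

Since P_s = W_s A with A unit upper triangular, the product of the returned κ values can be monitored during selection at no extra cost. How the source weights this term against the LROLS criterion is not reproduced here.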
Abstract:
Many kernel classifier construction algorithms adopt classification accuracy as the performance metric in model evaluation. Moreover, equal weighting is often applied to each data sample in parameter estimation. These modeling practices often become problematic if the data sets are imbalanced. We present a kernel classifier construction algorithm using orthogonal forward selection (OFS) to optimize model generalization for imbalanced two-class data sets. This kernel classifier identification algorithm is based on a new regularized orthogonal weighted least squares (ROWLS) estimator and on the model selection criterion of maximal leave-one-out area under the curve (LOO-AUC) of the receiver operating characteristic (ROC). It is shown that, owing to the orthogonalization procedure, the LOO-AUC can be calculated via an analytic formula based on the new ROWLS parameter estimator, without actually splitting the estimation data set. The proposed algorithm can achieve minimal computational expense via a set of forward recursive updating formulae when searching for the model terms with maximal incremental LOO-AUC value. Numerical examples are used to demonstrate the efficacy of the algorithm.
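The idea of replacing equal sample weights with class-dependent weights in a regularized least-squares estimator can be sketched in a few lines. This is a plain weighted ridge solution, with weights inversely proportional to class size as an illustrative assumption; it is not the ROWLS estimator and omits the orthogonal forward selection and the analytic LOO-AUC of the source. Function names are hypothetical.

```python
import numpy as np

def balanced_weights(y):
    """Weight each sample by 1 / (size of its class), so each class carries total weight 1."""
    y = np.asarray(y)
    w = np.empty(len(y), dtype=float)
    for c in np.unique(y):
        mask = y == c
        w[mask] = 1.0 / mask.sum()
    return w

def weighted_ridge(Phi, y, weights, lam):
    """Solve min_theta sum_i w_i (y_i - phi_i @ theta)**2 + lam * ||theta||**2."""
    weights = np.asarray(weights, dtype=float)
    Wphi = Phi * weights[:, None]                 # rows of Phi scaled by w_i
    A = Phi.T @ Wphi + lam * np.eye(Phi.shape[1]) # Phi^T W Phi + lam I
    b = Wphi.T @ y                                # Phi^T W y
    return np.linalg.solve(A, b)
```

With targets coded as ±1, the fitted scores `Phi @ theta` can then be ranked and evaluated with an AUC measure rather than with classification accuracy.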
Abstract:
In this brief, a new complex-valued B-spline neural network is introduced to model the complex-valued Wiener system using observed input/output data. The complex-valued nonlinear static function in the Wiener system is represented by the tensor product of two univariate B-spline neural networks, applied to the real and imaginary parts of the system input. Following a simple least-squares parameter initialization scheme, the Gauss-Newton algorithm is applied for parameter estimation; it incorporates the De Boor algorithm, including recursions for both the B-spline curve and its first-order derivatives. Numerical examples, including a nonlinear high-power amplifier model in communication systems, are used to demonstrate the efficacy of the proposed approaches.
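The B-spline evaluation underlying such a model can be sketched with the textbook Cox-de Boor recursion for the basis functions and their first derivatives, combined in a tensor product over the real and imaginary parts of the input. The knot vectors, spline order and complex weight matrix are hypothetical choices for illustration. The sketch does not implement the brief's Gauss-Newton estimation, its initialization scheme or its exact De Boor formulation.

```python
import numpy as np

def bspline_basis(x, knots, order):
    """All B-spline basis functions of the given order (degree order - 1) at scalar x,
    via the Cox-de Boor recursion. Returns an array of length len(knots) - order."""
    t = np.asarray(knots, dtype=float)
    # order 1: indicator functions of the half-open intervals [t_i, t_{i+1})
    B = np.array([1.0 if t[i] <= x < t[i + 1] else 0.0 for i in range(len(t) - 1)])
    for k in range(2, order + 1):
        B_next = np.zeros(len(t) - k)
        for i in range(len(t) - k):
            a, b = t[i + k - 1] - t[i], t[i + k] - t[i + 1]
            left = (x - t[i]) / a * B[i] if a > 0 else 0.0
            right = (t[i + k] - x) / b * B[i + 1] if b > 0 else 0.0
            B_next[i] = left + right
        B = B_next
    return B

def bspline_basis_deriv(x, knots, order):
    """First derivatives of the order-`order` basis functions (order >= 2)."""
    t = np.asarray(knots, dtype=float)
    Bl = bspline_basis(x, t, order - 1)
    d = np.zeros(len(t) - order)
    for i in range(len(t) - order):
        a, b = t[i + order - 1] - t[i], t[i + order] - t[i + 1]
        d[i] = (order - 1) * ((Bl[i] / a if a > 0 else 0.0) - (Bl[i + 1] / b if b > 0 else 0.0))
    return d

def complex_bspline_model(u, weights, knots_re, knots_im, order):
    """Hypothetical tensor-product map C -> C: sum_ij w_ij * B_i(Re u) * B_j(Im u)."""
    br = bspline_basis(u.real, knots_re, order)
    bi = bspline_basis(u.imag, knots_im, order)
    return complex(br @ weights @ bi)
```

With uniform knots 0, 1, ..., 10 and order 4 (cubic), there are seven basis functions; inside [3, 7) they sum to one and their derivatives sum to zero. Consequently, a weight matrix of all ones maps any input with real and imaginary parts in that range to 1.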