918 results for Heuristic constrained linear least squares


Relevance:

100.00%

Abstract:

Distance-based regression is a prediction method consisting of two steps: from the distances between observations we obtain latent variables, which then become the regressors in an ordinary least squares linear model. The distances are computed from the original predictors using a suitable dissimilarity function. Since, in general, the regressors are nonlinearly related to the response, their selection with the usual F test is not possible. In this work we propose a solution to this predictor selection problem by defining generalized statistical tests and adapting a nonparametric bootstrap method for estimating the p-values. We include a numerical example with automobile insurance data.
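A minimal sketch of the two-step procedure, assuming classical multidimensional scaling as the distance-to-latent-variable step and a residual bootstrap as a simple stand-in for the paper's generalized tests; all data and names are illustrative:

```python
import numpy as np

def classical_mds(D, k):
    """Latent coordinates from a distance matrix via double centering."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n
    B = -0.5 * J @ (D ** 2) @ J                 # Gram matrix
    w, V = np.linalg.eigh(B)
    idx = np.argsort(w)[::-1][:k]               # top-k eigenvalues
    return V[:, idx] * np.sqrt(np.maximum(w[idx], 0.0))

rng = np.random.default_rng(0)
X = rng.normal(size=(80, 3))                    # original predictors (toy)
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=80)
D = np.linalg.norm(X[:, None] - X[None, :], axis=2)  # dissimilarity matrix

Z1 = np.column_stack([np.ones(80), classical_mds(D, k=4)])  # latent regressors
beta, *_ = np.linalg.lstsq(Z1, y, rcond=None)   # OLS on the latent variables

# Residual bootstrap p-value for H0: the first latent regressor has no
# effect (a crude stand-in for the paper's generalized tests).
Z0 = np.delete(Z1, 1, axis=1)                   # null model without it
b0, *_ = np.linalg.lstsq(Z0, y, rcond=None)
resid0 = y - Z0 @ b0
t_obs, boot = abs(beta[1]), []
for _ in range(999):
    y_star = Z0 @ b0 + rng.choice(resid0, size=80, replace=True)
    b_star, *_ = np.linalg.lstsq(Z1, y_star, rcond=None)
    boot.append(abs(b_star[1]))
print((1 + sum(b >= t_obs for b in boot)) / 1000)   # bootstrap p-value
```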

Relevance:

100.00%

Abstract:

Intensity-modulated radiotherapy (IMRT) treatment plan verification by comparison with measured data requires access to the linear accelerator and is time consuming. In this paper, we propose a method for monitor unit (MU) calculation and plan comparison for step-and-shoot IMRT based on the Monte Carlo code EGSnrc/BEAMnrc. The beamlets of an IMRT treatment plan are individually simulated using Monte Carlo and converted into absorbed dose to water per MU. The dose of the whole treatment can then be expressed as a linear matrix equation in the MU and dose per MU of every beamlet. Because both the absorbed dose and the MU values must be non-negative, this equation is solved for the MU values with a non-negative least-squares (NNLS) optimization algorithm. The Monte Carlo plan is formed by multiplying the Monte Carlo absorbed dose to water per MU by the Monte Carlo/NNLS MU. Several treatment plan localizations calculated with a commercial treatment planning system (TPS) are compared with the proposed method for validation. The Monte Carlo/NNLS MUs are close to those calculated by the TPS and lead to a treatment dose distribution that is clinically equivalent to the one calculated by the TPS. This procedure can be used for IMRT quality assurance, and further development could allow the technique to be applied to other radiotherapy techniques such as tomotherapy or volumetric modulated arc therapy.
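The MU solve described above reduces to non-negative least squares; a sketch using scipy.optimize.nnls, with a synthetic stand-in for the beamlet dose-per-MU matrix (shapes and names are illustrative, not the paper's data):

```python
import numpy as np
from scipy.optimize import nnls

rng = np.random.default_rng(1)
n_voxels, n_beamlets = 500, 12
D = rng.random((n_voxels, n_beamlets))      # Monte Carlo dose-to-water per MU
mu_true = np.abs(rng.normal(50, 15, n_beamlets))
d_prescribed = D @ mu_true                   # target dose distribution

mu, residual = nnls(D, d_prescribed)         # enforces mu >= 0
dose_mc = D @ mu                             # the "Monte Carlo plan" dose
print(residual, np.allclose(mu, mu_true, atol=1e-6))
```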

Relevance:

100.00%

Abstract:

In 1957, the Iowa State Highway Commission, with financial assistance from the aluminum industry, constructed a 220-ft (67-m) long, four-span continuous, aluminum girder bridge to carry traffic on Clive Road (86th Street) over Interstate 80 near Des Moines, Iowa. The bridge had four welded I-shape girders that were fabricated in pairs, with welded diaphragms between an exterior and an interior girder. The interior diaphragms between the girder pairs were bolted to girder brackets. A composite, reinforced concrete deck served as the roadway surface. The bridge, which had performed successfully for about 35 years of service, was removed in the fall of 1993 to make way for an interchange at the same location. Prior to the bridge demolition, load tests were conducted to monitor girder and diaphragm bending strains and deflections in the northern end span. Fatigue testing of the aluminum girders that were removed from the end spans was conducted by applying constant-amplitude cyclic loads. These tests established the fatigue strength of an existing welded flange-splice detail and of added welded flange cover-plate and horizontal web-plate attachment details. This part, Part 2, of the final report focuses on the fatigue tests of the aluminum girder sections that were removed from the bridge and on the analysis of the experimental data to establish the fatigue strength of full-size specimens. Seventeen fatigue fractures that were classified as Category E weld details developed in the seven girder test specimens. Linear regression analyses of the fatigue test results established both nominal and experimental stress-range versus load-cycle relationships (S-N curves) for the fatigue strength of fillet-welded connections. The nominal-strength S-N curve obtained by this research essentially matched the S-N curve for Category E aluminum weldments given in the AASHTO LRFD specifications. All of the Category E fatigue fractures that developed in the girder test specimens satisfied the allowable S-N relationship specified by the fatigue provisions of the Aluminum Association. The lower-bound strength line, set at two standard deviations below the least-squares regression line through the fatigue fracture data points, agreed well with the Aluminum Association S-N curve. The results from the experimental tests of this research have provided additional information regarding behavioral characteristics of full-size aluminum members and have confirmed that aluminum has the strength properties needed for highway bridge girders.
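The S-N analysis mentioned above amounts to a least-squares fit in log-log space plus a lower-bound line two standard deviations below it; a sketch with made-up numbers (not the report's data):

```python
import numpy as np

N = np.array([2e5, 5e5, 1e6, 2e6, 5e6])      # cycles to failure (illustrative)
S = np.array([9.0, 7.2, 6.1, 5.0, 3.9])      # stress range, ksi (illustrative)

x, y = np.log10(N), np.log10(S)
slope, intercept = np.polyfit(x, y, 1)        # least-squares S-N regression
resid = y - (slope * x + intercept)
sd = resid.std(ddof=2)                        # residual standard deviation

def s_mean(n):  return 10 ** (intercept + slope * np.log10(n))
def s_lower(n): return 10 ** (intercept - 2 * sd + slope * np.log10(n))

print(s_mean(1e6), s_lower(1e6))              # nominal vs lower-bound strength
```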

Relevance:

100.00%

Abstract:

BACKGROUND AND PURPOSE: Knowledge of cerebral blood flow (CBF) alterations in cases of acute stroke could be valuable in the early management of these cases. Among imaging techniques affording evaluation of cerebral perfusion, perfusion CT studies involve sequential acquisition of cerebral CT sections obtained in an axial mode during the IV administration of iodinated contrast material. They are thus very easy to perform in emergency settings. Perfusion CT values of CBF have proved to be accurate in animals, and perfusion CT affords plausible values in humans. The purpose of this study was to validate perfusion CT studies of CBF by comparison with the results provided by stable xenon CT, which have been reported to be accurate, and to evaluate acquisition and processing modalities of CT data, notably the possible deconvolution methods and the selection of the reference artery. METHODS: Twelve stable xenon CT and perfusion CT cerebral examinations were performed within an interval of a few minutes in patients with various cerebrovascular diseases. CBF maps were obtained from perfusion CT data by deconvolution using singular value decomposition and least mean square methods. The CBF values were compared with the stable xenon CT results in multiple regions of interest through linear regression analysis and bilateral t tests for matched variables. RESULTS: Linear regression analysis showed good correlation between perfusion CT and stable xenon CT CBF values (singular value decomposition method: R² = 0.79, slope = 0.87; least mean square method: R² = 0.67, slope = 0.83). Bilateral t tests for matched variables did not identify a significant difference between the two imaging methods (P > .1). Both deconvolution methods were equivalent (P > .1). The choice of the reference artery is a major concern and has a strong influence on the final perfusion CT CBF map. CONCLUSION: Perfusion CT studies of CBF achieved with adequate acquisition parameters and processing lead to accurate and reliable results.
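A sketch of deconvolution by truncated singular value decomposition, the first of the two methods named above, on synthetic curves; the curve shapes, 1-s sampling, and truncation threshold are assumptions, not the study's parameters:

```python
import numpy as np

t = np.arange(40.0)                             # time samples, 1 s apart
aif = t ** 3 * np.exp(-t / 2.0); aif /= aif.max()   # arterial input (toy)
residue = np.exp(-t / 8.0)                      # tissue residue function
cbf_true = 0.6
A = np.array([[aif[i - j] if i >= j else 0.0 for j in range(40)]
              for i in range(40)])              # convolution matrix (dt = 1)
tissue = cbf_true * (A @ residue)               # simulated tissue curve

U, s, Vt = np.linalg.svd(A)
s_inv = np.where(s > 0.2 * s.max(), 1.0 / s, 0.0)   # truncate small values
flow_residue = Vt.T @ (s_inv * (U.T @ tissue))  # recovers CBF * residue(t)
print(cbf_true, round(flow_residue.max(), 3))   # CBF taken as the peak
```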

Relevance:

100.00%

Abstract:

We present the first density model of Stromboli volcano (Aeolian Islands, Italy) obtained by simultaneously inverting land-based (543) and sea-surface (327) relative gravity data. Modern positioning technology, a 1 x 1 m digital elevation model, and a 15 x 15 m bathymetric model made it possible to obtain a detailed 3-D density model through an iteratively reweighted smoothness-constrained least-squares inversion that explained the land-based gravity data to 0.09 mGal and the sea-surface data to 5 mGal. Our inverse formulation avoids introducing any assumptions about density magnitudes. At 125 m depth from the land surface, the inferred mean density of the island is 2380 kg m^-3, with corresponding 2.5 and 97.5 percentiles of 2200 and 2530 kg m^-3. This density range covers the rock densities of new and previously published samples of Paleostromboli I, Vancori, Neostromboli and San Bartolo lava flows. High-density anomalies in the central and southern part of the island can be related to two main degassing faults crossing the island (N41 and NM) that are interpreted as preferential regions of dyke intrusions. In addition, two low-density anomalies are found in the northeastern part and in the summit area of the island. These anomalies appear to be geographically related to past paroxysmal explosive phreato-magmatic events that have played important roles in the evolution of Stromboli Island by forming the Scari caldera and the Neostromboli crater, respectively.
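A generic sketch of an iteratively reweighted, smoothness-constrained least-squares inversion of the kind named above, in the form min ||W_d(Gm - d)||² + α||Lm||²; the forward operator, weights, and data here are toy stand-ins, not the gravity model:

```python
import numpy as np

rng = np.random.default_rng(2)
n_data, n_model = 60, 40
G = rng.random((n_data, n_model))             # linear forward operator (toy)
m_true = np.zeros(n_model); m_true[15:25] = 1.0
d = G @ m_true + 0.05 * rng.normal(size=n_data)

L = np.diff(np.eye(n_model), axis=0)          # first-difference smoothness
alpha = 1.0
w = np.ones(n_data)                           # initial data weights
for _ in range(5):                            # IRLS loop
    Wd = np.diag(w)
    A = G.T @ Wd @ G + alpha * L.T @ L
    m = np.linalg.solve(A, G.T @ Wd @ d)      # smoothness-constrained solve
    r = d - G @ m                             # residuals drive reweighting
    w = 1.0 / np.maximum(np.abs(r), 1e-3)     # robust (approx. L1) weights
print(np.round(m[12:28], 2))
```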

Relevance:

100.00%

Abstract:

A genetic algorithm was used for variable selection in the simultaneous determination of mixtures of glucose, maltose and fructose by mid-infrared spectroscopy. Different models, using partial least squares (PLS) and multiple linear regression (MLR) with and without data pre-processing, were compared. The results showed that a simpler model (multiple linear regression with variable selection by genetic algorithm) produces results comparable to those of more complex methods (partial least squares). The relative errors obtained for the best model were around 3% for the sugar determination, which is acceptable for this kind of determination.
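A minimal sketch of genetic-algorithm variable selection wrapped around multiple linear regression: binary chromosomes mark which wavelengths enter the model, and fitness is cross-validated error. Population size, mutation rate, and the synthetic "spectra" are illustrative choices, not the paper's settings:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(3)
X = rng.normal(size=(60, 30))                 # "spectra" (toy)
y = X[:, [3, 7, 19]] @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=60)

def fitness(mask):
    if not mask.any():
        return -np.inf                        # empty model is never selected
    score = cross_val_score(LinearRegression(), X[:, mask], y,
                            scoring="neg_mean_squared_error", cv=5)
    return score.mean()

pop = rng.random((20, X.shape[1])) < 0.2      # initial population of masks
for gen in range(30):
    fit = np.array([fitness(m) for m in pop])
    pop = pop[np.argsort(fit)[::-1]]          # sort by fitness, best first
    parents = pop[:10]
    kids = []
    for _ in range(10):
        a, b = parents[rng.integers(0, 10, 2)]
        cut = rng.integers(1, X.shape[1])
        child = np.concatenate([a[:cut], b[cut:]])   # one-point crossover
        flip = rng.random(X.shape[1]) < 0.02         # mutation
        kids.append(child ^ flip)
    pop = np.vstack([parents, kids])
print(np.where(pop[0])[0])                    # selected variables
```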

Relevance:

100.00%

Abstract:

Learning of preference relations has recently received significant attention in the machine learning community. It is closely related to classification and regression analysis and can be reduced to these tasks. However, preference learning involves predicting an ordering of the data points rather than a single numerical value, as in regression, or a class label, as in classification. Therefore, studying preference relations within a separate framework not only facilitates a better theoretical understanding of the problem, but also motivates the development of efficient algorithms for the task. Preference learning has many applications in domains such as information retrieval, bioinformatics, and natural language processing. For example, algorithms that learn to rank are frequently used in search engines for ordering the documents retrieved by a query. Preference learning methods have also been applied to collaborative filtering problems for predicting individual customer choices from the vast amount of user-generated feedback. In this thesis we propose several algorithms for learning preference relations. These algorithms stem from the well-founded and robust class of regularized least-squares methods and have many attractive computational properties. In order to improve the performance of our methods, we introduce several non-linear kernel functions. Thus, the contribution of this thesis is twofold: kernel functions for structured data, used to take advantage of various non-vectorial data representations, and preference learning algorithms suitable for different tasks, namely efficient learning of preference relations, learning with large amounts of training data, and semi-supervised preference learning. The proposed kernel-based algorithms and kernels are applied to the parse ranking task in natural language processing, document ranking in information retrieval, and remote homology detection in bioinformatics. Training of kernel-based ranking algorithms can be infeasible when the size of the training set is large. This problem is addressed by proposing a preference learning algorithm whose computational complexity scales linearly with the number of training data points. We also introduce a sparse approximation of the algorithm that can be efficiently trained with large amounts of data. For situations where a small amount of labeled data but a large amount of unlabeled data is available, we propose a co-regularized preference learning algorithm. To conclude, the methods presented in this thesis address not only the efficient training of the algorithms but also fast regularization parameter selection, multiple-output prediction, and cross-validation. Furthermore, the proposed algorithms lead to notably better performance in many of the preference learning tasks considered.
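A linear sketch of the pairwise regularized least-squares idea underlying such methods: fit a weight vector so preferred items outscore the others by a margin, in closed form. The data, margin, and regularization value are illustrative, and a linear kernel stands in for the general kernels used in the thesis:

```python
import numpy as np

rng = np.random.default_rng(4)
X = rng.normal(size=(50, 5))
u = np.array([2.0, -1.0, 0.5, 0.0, 1.0])      # hidden utility (toy)
scores = X @ u

# Preference pairs (i preferred over j), capped for brevity
pairs = [(i, j) for i in range(50) for j in range(50)
         if scores[i] > scores[j] + 0.5][:400]
D = np.array([X[i] - X[j] for i, j in pairs])  # pairwise difference features

# Regularized least squares on the pairs:
# min sum ((w . (x_i - x_j)) - 1)^2 + lam ||w||^2, solved in closed form.
lam = 1.0
w = np.linalg.solve(D.T @ D + lam * np.eye(5), D.T @ np.ones(len(D)))

pred = D @ w
print((pred > 0).mean())                       # fraction of pairs ordered right
```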

Relevance:

100.00%

Abstract:

Recent years have seen great advances in instrumentation technology. The amount of available data has been increasing due to the simplicity, speed and accuracy of current spectroscopic instruments. Most of these data are, however, meaningless without proper analysis. This has been one of the reasons for the growing success of multivariate handling of such data. Industrial data are commonly not designed data; in other words, there is no exact experimental design, but rather the data have been collected as a routine procedure during an industrial process. This places certain demands on the multivariate modeling, as the selection of samples and variables can have an enormous effect. Common approaches to the modeling of industrial data are PCA (principal component analysis) and PLS (projection to latent structures, or partial least squares), but other methods should also be considered. The more advanced methods include multi-block modeling and nonlinear modeling. In this thesis it is shown that the results of data analysis vary according to the modeling approach used, thus making the selection of the modeling approach dependent on the purpose of the model. If the model is intended to provide accurate predictions, the approach should differ from the case where the purpose of modeling is mostly to obtain information about the variables and the process. For industrial applicability it is essential that the methods are robust and sufficiently simple to apply. In this way the methods and results can be compared and an approach selected that is suitable for the intended purpose. Differences between data analysis methods are compared with data from different fields of industry in this thesis. In the first two papers, the multi-block method is considered for data originating from the oil and fertilizer industries. The results are compared to those from PLS and priority PLS. The third paper considers the applicability of multivariate models to process control for a reactive crystallization process. In the fourth paper, nonlinear modeling is examined with a data set from the oil industry. The response has a nonlinear relation to the descriptor matrix, and the results are compared between linear modeling, polynomial PLS and nonlinear modeling using nonlinear score vectors.

Relevance:

100.00%

Abstract:

A new analytical method was developed to non-destructively determine the pH and degree of polymerisation (DP) of cellulose in fibres in 19th- and 20th-century painting canvases, and to identify the fibre type: cotton, linen, hemp, ramie or jute. The method is based on NIR spectroscopy and multivariate data analysis, while for calibration and validation a reference collection of 199 historical canvas samples was used. The reference collection was analysed destructively using microscopy and chemical analytical methods. Partial least squares regression was used to build quantitative methods to determine pH and DP, and linear discriminant analysis was used to determine the fibre type. To interpret the chemical information obtained, an expert assessment panel developed a categorisation system to discriminate between canvases that may not be fit to withstand excessive mechanical stress, e.g. transportation. The limiting DP for this category was found to be 600. With the new method and categorisation system, canvases of 12 Dalí paintings from the Fundació Gala-Salvador Dalí (Figueres, Spain) were non-destructively analysed for pH, DP and fibre type, and their fitness determined, which informs conservation recommendations. The study demonstrates that collection-wide canvas condition surveys can be performed efficiently and non-destructively, which could significantly improve collection management.
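A sketch of the two model types named above, PLS regression for a continuous property and linear discriminant analysis for fibre class, using scikit-learn on synthetic stand-ins for the NIR spectra; the component count, class labels, and property definition are assumptions:

```python
import numpy as np
from sklearn.cross_decomposition import PLSRegression
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

rng = np.random.default_rng(5)
spectra = rng.normal(size=(199, 120))          # "NIR spectra" (toy)
dp = spectra[:, :10].sum(axis=1) * 60 + 600    # stand-in for DP
fibre = rng.integers(0, 5, 199)                # 5 fibre classes (toy labels)

pls = PLSRegression(n_components=6).fit(spectra, dp)   # quantitative model
dp_hat = pls.predict(spectra).ravel()

lda = LinearDiscriminantAnalysis().fit(spectra, fibre)  # fibre classifier
print(np.corrcoef(dp, dp_hat)[0, 1], lda.score(spectra, fibre))

# Fitness categorisation per the limiting DP reported above
fit_for_transport = dp_hat >= 600
```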

Relevance:

100.00%

Abstract:

Two spectrophotometric methods are described for the simultaneous determination of ezetimibe (EZE) and simvastatin (SIM) in pharmaceutical preparations. The data obtained were evaluated using two different chemometric techniques, principal component regression (PCR) and partial least squares (PLS-1). In these techniques, the concentration data matrix was prepared using mixtures containing these drugs in methanol. The corresponding absorbance data matrix was obtained by measuring absorbances in their zero-order spectra over the range 240 - 300 nm at 61 wavelengths (intervals of Δλ = 1 nm). Calibration (regression) models were then built from the absorbance and concentration data matrices to predict the unknown concentrations of EZE and SIM in their mixtures. The procedure did not require any separation step. The linear range was found to be 5 - 20 µg mL^-1 for both EZE and SIM in both methods. The accuracy and precision of the methods were assessed. Both methods were successfully applied to a pharmaceutical preparation (tablets), and the results were compared with each other.
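A sketch of the two calibration techniques, PCR and PLS-1, on a synthetic two-component absorbance matrix with 61 wavelengths as in the text; the pure-component spectra, concentrations, and component counts are invented for illustration:

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression
from sklearn.cross_decomposition import PLSRegression

rng = np.random.default_rng(6)
wl = np.linspace(240, 300, 61)
eze = np.exp(-((wl - 255) / 8) ** 2)           # toy pure-component spectra
sim = np.exp(-((wl - 280) / 10) ** 2)
C = rng.uniform(5, 20, size=(25, 2))           # concentrations, ug/mL
A = C[:, [0]] * eze + C[:, [1]] * sim + 0.01 * rng.normal(size=(25, 61))

# PCR: regress concentrations on principal-component scores
scores = PCA(n_components=3).fit_transform(A)
pcr = LinearRegression().fit(scores, C)

# PLS-1: one model per analyte
pls_eze = PLSRegression(n_components=3).fit(A, C[:, 0])
pls_sim = PLSRegression(n_components=3).fit(A, C[:, 1])
print(pls_eze.score(A, C[:, 0]), pls_sim.score(A, C[:, 1]))
```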

Relevance:

100.00%

Abstract:

Genetic algorithm with multiple linear regression (GA-MLR), partial least squares (GA-PLS), kernel PLS (GA-KPLS) and Levenberg-Marquardt artificial neural network (L-M ANN) techniques were used to investigate the correlation between retention index (RI) and molecular descriptors for 116 diverse compounds in the essential oils of six Stachys species. The leave-group-out cross-validated correlation coefficient (Q²) between experimental and predicted RI for the test set was 0.886, 0.912, 0.937 and 0.964 for GA-MLR, GA-PLS, GA-KPLS and L-M ANN, respectively. This is the first QSRR study of essential oil compounds against the RI using GA-KPLS and L-M ANN.
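A sketch of how a leave-group-out cross-validated Q² can be computed, here for a plain MLR model standing in for the GA-selected models; the descriptors and retention indices are synthetic, and 5-fold splits stand in for the group structure:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_predict, KFold

rng = np.random.default_rng(7)
X = rng.normal(size=(116, 8))                  # GA-selected descriptors (toy)
ri = X @ rng.normal(size=8) * 50 + 1200 + 10 * rng.normal(size=116)

cv = KFold(n_splits=5, shuffle=True, random_state=0)   # leave groups out
ri_hat = cross_val_predict(LinearRegression(), X, ri, cv=cv)

# Q^2 = 1 - PRESS / total sum of squares about the mean
q2 = 1 - np.sum((ri - ri_hat) ** 2) / np.sum((ri - ri.mean()) ** 2)
print(round(q2, 3))
```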

Relevance:

100.00%

Abstract:

Machine learning provides tools for the automated construction of predictive models in data-intensive areas of engineering and science. The family of regularized kernel methods has in recent years become one of the mainstream approaches to machine learning, due to a number of advantages the methods share. The approach provides theoretically well-founded solutions to the problems of under- and overfitting, allows learning from structured data, and has been empirically demonstrated to yield high predictive performance on a wide range of application domains. Historically, the problems of classification and regression have received the majority of attention in the field. In this thesis we focus on another type of learning problem: learning to rank. In learning to rank, the aim is to learn, from a set of past observations, a ranking function that can order new objects according to how well they match some underlying criterion of goodness. As an important special case of the setting, we can recover the bipartite ranking problem, corresponding to maximizing the area under the ROC curve (AUC) in binary classification. Ranking applications appear in a large variety of settings; examples encountered in this thesis include document retrieval in web search, recommender systems, information extraction and automated parsing of natural language. We consider the pairwise approach to learning to rank, where ranking models are learned by minimizing the expected probability of ranking any two randomly drawn test examples incorrectly. The development of computationally efficient kernel methods based on this approach has in the past proven to be challenging. Moreover, it is not clear which techniques for estimating the predictive performance of learned models are the most reliable in the ranking setting, or how these techniques can be implemented efficiently. The contributions of this thesis are as follows. First, we develop RankRLS, a computationally efficient kernel method for learning to rank that is based on minimizing a regularized pairwise least-squares loss. In addition to training methods, we introduce a variety of algorithms for tasks such as model selection, multi-output learning, and cross-validation, based on computational shortcuts from matrix algebra. Second, we improve the fastest known training method for the linear version of the RankSVM algorithm, one of the most well-established methods for learning to rank. Third, we study the combination of the empirical kernel map and reduced set approximation, which allows the large-scale training of kernel machines using linear solvers, and propose computationally efficient solutions to cross-validation when using this approach. Next, we explore the problem of reliable cross-validation when using AUC as a performance criterion, through an extensive simulation study. We demonstrate that the proposed leave-pair-out cross-validation approach leads to more reliable performance estimation than commonly used alternatives. Finally, we present a case study on applying machine learning to information extraction from biomedical literature, which combines several of the approaches considered in the thesis. The thesis is divided into two parts: Part I provides the background for the research work and summarizes the most central results, while Part II consists of the five original research articles that are the main contribution of this thesis.
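A sketch of leave-pair-out cross-validation for AUC, the estimation approach advocated above: each positive-negative pair is held out in turn, the model is refit, and the pair is scored. A small logistic-regression model stands in for the kernel methods of the thesis; the data are synthetic:

```python
import numpy as np
from itertools import product
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(8)
X = rng.normal(size=(30, 4))
y = (X[:, 0] + 0.5 * rng.normal(size=30) > 0).astype(int)

pos, neg = np.where(y == 1)[0], np.where(y == 0)[0]
pairs = list(product(pos, neg))
wins = 0.0
for i, j in pairs:
    keep = np.setdiff1d(np.arange(30), [i, j])       # leave the pair out
    clf = LogisticRegression(max_iter=1000).fit(X[keep], y[keep])
    si, sj = clf.decision_function(X[[i, j]])
    wins += 1.0 if si > sj else (0.5 if si == sj else 0.0)
print(round(wins / len(pairs), 3))                   # leave-pair-out AUC
```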

Relevance:

100.00%

Abstract:

Singular Value Decomposition (SVD), Principal Component Analysis (PCA) and Multiple Linear Regression (MLR) are some of the mathematical preliminaries that are discussed prior to explaining the PLS and PCR models. Both PLS and PCR are applied to real spectral data, and their differences and similarities are discussed in this thesis. The challenge lies in establishing the optimum number of components to be included in either of the models, but this has been overcome by using various diagnostic tools suggested in this thesis. Correspondence analysis (CA) and PLS were applied to ecological data. The idea of CA was to correlate the macrophyte species and lakes. The differences between the PLS model for ecological data and the PLS model for spectral data are noted and explained in this thesis.

Relevance:

100.00%

Abstract:

This paper studies seemingly unrelated linear models with integrated regressors and stationary errors. By adding leads and lags of the first differences of the regressors and estimating this augmented dynamic regression model by feasible generalized least squares using the long-run covariance matrix, we obtain an efficient estimator of the cointegrating vector that has a limiting mixed normal distribution. Simulation results suggest that this new estimator compares favorably with others already proposed in the literature. We apply these new estimators to the testing of purchasing power parity (PPP) among the G-7 countries. The test based on the efficient estimates rejects the PPP hypothesis for most countries.
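A sketch of the leads-and-lags augmentation on synthetic cointegrated data; here plain OLS with HAC standard errors stands in for the paper's feasible GLS step based on the long-run covariance matrix, and the lag length is an arbitrary choice:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(9)
T = 300
x = np.cumsum(rng.normal(size=T))              # integrated regressor
y = 2.0 * x + rng.normal(size=T)               # cointegrated, beta = 2

k = 2                                          # leads and lags of dx
dx = np.diff(x, prepend=x[0])
cols = [x] + [np.roll(dx, s) for s in range(-k, k + 1)]
Z = np.column_stack(cols)[k:-k]                # trim rows hit by the shifts
yy = y[k:-k]

res = sm.OLS(yy, sm.add_constant(Z)).fit(cov_type="HAC",
                                         cov_kwds={"maxlags": 4})
print(round(res.params[1], 3))                 # cointegrating coefficient
```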

Relevance:

100.00%

Abstract:

This paper proposes finite-sample procedures for testing the SURE specification in multi-equation regression models, i.e. whether the disturbances in different equations are contemporaneously uncorrelated or not. We apply the technique of Monte Carlo (MC) tests [Dwass (1957), Barnard (1963)] to obtain exact tests based on standard LR and LM zero correlation tests. We also suggest a MC quasi-LR (QLR) test based on feasible generalized least squares (FGLS). We show that the latter statistics are pivotal under the null, which provides the justification for applying MC tests. Furthermore, we extend the exact independence test proposed by Harvey and Phillips (1982) to the multi-equation framework. Specifically, we introduce several induced tests based on a set of simultaneous Harvey/Phillips-type tests and suggest a simulation-based solution to the associated combination problem. The properties of the proposed tests are studied in a Monte Carlo experiment which shows that standard asymptotic tests exhibit important size distortions, while MC tests achieve complete size control and display good power. Moreover, MC-QLR tests performed best in terms of power, a result of interest from the point of view of simulation-based tests. The power of the MC induced tests improves appreciably in comparison to standard Bonferroni tests and, in certain cases, outperforms the likelihood-based MC tests. The tests are applied to data used by Fischer (1993) to analyze the macroeconomic determinants of growth.
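A sketch of the Monte Carlo test idea for zero contemporaneous correlation in a two-equation system: because the statistic is pivotal under the null, its null distribution can be simulated exactly and an MC p-value computed. The LM form and Gaussian errors here are illustrative simplifications of the tests in the paper:

```python
import numpy as np

rng = np.random.default_rng(10)
T = 50

def lm_stat(e1, e2):
    r = np.corrcoef(e1, e2)[0, 1]              # residual correlation
    return T * r ** 2                           # LM statistic, 1 restriction

# "Observed" residuals from the two equations (stand-ins)
e1, e2 = rng.normal(size=T), rng.normal(size=T)
s_obs = lm_stat(e1, e2)

# Simulate the null distribution: independent errors
N = 999
s_null = np.array([lm_stat(rng.normal(size=T), rng.normal(size=T))
                   for _ in range(N)])
p_mc = (1 + np.sum(s_null >= s_obs)) / (N + 1)  # exact MC p-value
print(round(p_mc, 3))
```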