28 resultados para sparse Bayesian regression
em Doria (National Library of Finland DSpace Services) - National Library of Finland, Finland
Resumo:
Raw measurement data does not always immediately convey useful information, but applying mathematical statistical analysis tools into measurement data can improve the situation. Data analysis can offer benefits like acquiring meaningful insight from the dataset, basing critical decisions on the findings, and ruling out human bias through proper statistical treatment. In this thesis we analyze data from an industrial mineral processing plant with the aim of studying the possibility of forecasting the quality of the final product, given by one variable, with a model based on the other variables. For the study mathematical tools like Qlucore Omics Explorer (QOE) and Sparse Bayesian regression (SB) are used. Later on, linear regression is used to build a model based on a subset of variables that seem to have most significant weights in the SB model. The results obtained from QOE show that the variable representing the desired final product does not correlate with other variables. For SB and linear regression, the results show that both SB and linear regression models built on 1-day averaged data seriously underestimate the variance of true data, whereas the two models built on 1-month averaged data are reliable and able to explain a larger proportion of variability in the available data, making them suitable for prediction purposes. However, it is concluded that no single model can fit well the whole available dataset and therefore, it is proposed for future work to make piecewise non linear regression models if the same available dataset is used, or the plant to provide another dataset that should be collected in a more systematic fashion than the present data for further analysis.
Resumo:
Forest inventories are used to estimate forest characteristics and the condition of forest for many different applications: operational tree logging for forest industry, forest health state estimation, carbon balance estimation, land-cover and land use analysis in order to avoid forest degradation etc. Recent inventory methods are strongly based on remote sensing data combined with field sample measurements, which are used to define estimates covering the whole area of interest. Remote sensing data from satellites, aerial photographs or aerial laser scannings are used, depending on the scale of inventory. To be applicable in operational use, forest inventory methods need to be easily adjusted to local conditions of the study area at hand. All the data handling and parameter tuning should be objective and automated as much as possible. The methods also need to be robust when applied to different forest types. Since there generally are no extensive direct physical models connecting the remote sensing data from different sources to the forest parameters that are estimated, mathematical estimation models are of "black-box" type, connecting the independent auxiliary data to dependent response data with linear or nonlinear arbitrary models. To avoid redundant complexity and over-fitting of the model, which is based on up to hundreds of possibly collinear variables extracted from the auxiliary data, variable selection is needed. To connect the auxiliary data to the inventory parameters that are estimated, field work must be performed. In larger study areas with dense forests, field work is expensive, and should therefore be minimized. To get cost-efficient inventories, field work could partly be replaced with information from formerly measured sites, databases. The work in this thesis is devoted to the development of automated, adaptive computation methods for aerial forest inventory. The mathematical model parameter definition steps are automated, and the cost-efficiency is improved by setting up a procedure that utilizes databases in the estimation of new area characteristics.
Resumo:
In order to reduce greenhouse emissions from forest degradation and deforestation the international programme REDD (Reducing Emissions from Deforestation and forest Degradation) was established in 2005 by the United Nations Framework Convention on Climate Change (UNFCCC). This programme is aimed to financially reward to developing countries for any emissions reductions. Under this programm the project of setting up the payment system in Nepal was established. This project is aimed to engage local communities in forest monitoring. The major objective of this thesis is to compare and verify data obtained from di erect sources - remotely sensed data, namely LiDAR and field sample measurements made by two groups of researchers using two regression models - Sparse Bayesian Regression and Bayesian Regression with Orthogonal Variables.
Resumo:
The main objective of this study was todo a statistical analysis of ecological type from optical satellite data, using Tipping's sparse Bayesian algorithm. This thesis uses "the Relevence Vector Machine" algorithm in ecological classification betweenforestland and wetland. Further this bi-classification technique was used to do classification of many other different species of trees and produces hierarchical classification of entire subclasses given as a target class. Also, we carried out an attempt to use airborne image of same forest area. Combining it with image analysis, using different image processing operation, we tried to extract good features and later used them to perform classification of forestland and wetland.
Resumo:
Summary
Resumo:
Abstract
Resumo:
Abstract
Resumo:
Kolmen eri hitsausliitoksen väsymisikä arvio on analysoitu monimuuttuja regressio analyysin avulla. Regression perustana on laaja S-N tietokanta joka on kerätty kirjallisuudesta. Tarkastellut liitokset ovat tasalevy liitos, krusiformi liitos ja pitkittäisripa levyssä. Muuttujina ovat jännitysvaihtelu, kuormitetun levyn paksuus ja kuormitus tapa. Paksuus effekti on käsitelty uudelleen kaikkia kolmea liitosta ajatellen. Uudelleen käsittelyn avulla on varmistettu paksuus effektin olemassa olo ennen monimuuttuja regressioon siirtymistä. Lineaariset väsymisikä yhtalöt on ajettu kolmelle hitsausliitokselle ottaen huomioon kuormitetun levyn paksuus sekä kuormitus tapa. Väsymisikä yhtalöitä on verrattu ja keskusteltu testitulosten valossa, jotka on kerätty kirjallisuudesta. Neljä tutkimustaon tehty kerättyjen väsymistestien joukosta ja erilaisia väsymisikä arvio metodeja on käytetty väsymisiän arviointiin. Tuloksia on tarkasteltu ja niistä keskusteltu oikeiden testien valossa. Tutkimuksissa on katsottu 2mm ja 6mm symmetristäpitkittäisripaa levyssä, 12.7mm epäsymmetristä pitkittäisripaa, 38mm symmetristä pitkittäisripaa vääntökuormituksessa ja 25mm/38mm kuorman kantavaa krusiformi liitosta vääntökuormituksessa. Mallinnus on tehty niin lähelle testi liitosta kuin mahdollista. Väsymisikä arviointi metodit sisältävät hot-spot metodin jossa hot-spot jännitys on laskettu kahta lineaarista ja epälineaarista ekstrapolointiakäyttäen sekä paksuuden läpi integrointia käyttäen. Lovijännitys ja murtumismekaniikka metodeja on käytetty krusiformi liitosta laskiessa.
Resumo:
Huolimatta korkeasta automaatioasteesta sorvausteollisuudessa, muutama keskeinen ongelma estää sorvauksen täydellisen automatisoinnin. Yksi näistä ongelmista on työkalun kuluminen. Tämä työ keskittyy toteuttamaan automaattisen järjestelmän kulumisen, erityisesti viistekulumisen, mittaukseen konenäön avulla. Kulumisen mittausjärjestelmä poistaa manuaalisen mittauksen tarpeen ja minimoi ajan, joka käytetään työkalun kulumisen mittaukseen. Mittauksen lisäksi tutkitaan kulumisen mallinnusta sekä ennustamista. Automaattinen mittausjärjestelmä sijoitettiin sorvin sisälle ja järjestelmä integroitiin onnistuneesti ulkopuolisten järjestelmien kanssa. Tehdyt kokeet osoittivat, että mittausjärjestelmä kykenee mittaamaan työkalun kulumisen järjestelmän oikeassa ympäristössä. Mittausjärjestelmä pystyy myös kestämään häiriöitä, jotka ovat konenäköjärjestelmille yleisiä. Työkalun kulumista mallinnusta tutkittiin useilla eri menetelmillä. Näihin kuuluivat muiden muassa neuroverkot ja tukivektoriregressio. Kokeet osoittivat, että tutkitut mallit pystyivät ennustamaan työkalun kulumisasteen käytetyn ajan perusteella. Parhaan tuloksen antoivat neuroverkot Bayesiläisellä regularisoinnilla.
Resumo:
This thesis investigates performance persistence among the equity funds investing in Russia during 2003-2007. Fund performance is measured using several methods including the Jensen alpha, the Fama-French 3- factor alpha, the Sharpe ratio and two of its variations. Moreover, we apply the Bayesian shrinkage estimation in performance measurement and evaluate its usefulness compared with the OLS 3-factor alphas. The pattern of performance persistence is analyzed using the Spearman rank correlation test, cross-sectional regression analysis and stacked return time series. Empirical results indicate that the Bayesian shrinkage estimates may provide better and more accurate estimates of fund performance compared with the OLS 3-factor alphas. Secondly, based on the results it seems that the degree of performance persistence is strongly related to length of the observation period. For the full sample period the results show strong signs of performance reversal whereas for the subperiod analysis the results indicate performance persistence during the most recent years.
Resumo:
In mathematical modeling the estimation of the model parameters is one of the most common problems. The goal is to seek parameters that fit to the measurements as well as possible. There is always error in the measurements which implies uncertainty to the model estimates. In Bayesian statistics all the unknown quantities are presented as probability distributions. If there is knowledge about parameters beforehand, it can be formulated as a prior distribution. The Bays’ rule combines the prior and the measurements to posterior distribution. Mathematical models are typically nonlinear, to produce statistics for them requires efficient sampling algorithms. In this thesis both Metropolis-Hastings (MH), Adaptive Metropolis (AM) algorithms and Gibbs sampling are introduced. In the thesis different ways to present prior distributions are introduced. The main issue is in the measurement error estimation and how to obtain prior knowledge for variance or covariance. Variance and covariance sampling is combined with the algorithms above. The examples of the hyperprior models are applied to estimation of model parameters and error in an outlier case.
Resumo:
Learning of preference relations has recently received significant attention in machine learning community. It is closely related to the classification and regression analysis and can be reduced to these tasks. However, preference learning involves prediction of ordering of the data points rather than prediction of a single numerical value as in case of regression or a class label as in case of classification. Therefore, studying preference relations within a separate framework facilitates not only better theoretical understanding of the problem, but also motivates development of the efficient algorithms for the task. Preference learning has many applications in domains such as information retrieval, bioinformatics, natural language processing, etc. For example, algorithms that learn to rank are frequently used in search engines for ordering documents retrieved by the query. Preference learning methods have been also applied to collaborative filtering problems for predicting individual customer choices from the vast amount of user generated feedback. In this thesis we propose several algorithms for learning preference relations. These algorithms stem from well founded and robust class of regularized least-squares methods and have many attractive computational properties. In order to improve the performance of our methods, we introduce several non-linear kernel functions. Thus, contribution of this thesis is twofold: kernel functions for structured data that are used to take advantage of various non-vectorial data representations and the preference learning algorithms that are suitable for different tasks, namely efficient learning of preference relations, learning with large amount of training data, and semi-supervised preference learning. Proposed kernel-based algorithms and kernels are applied to the parse ranking task in natural language processing, document ranking in information retrieval, and remote homology detection in bioinformatics domain. Training of kernel-based ranking algorithms can be infeasible when the size of the training set is large. This problem is addressed by proposing a preference learning algorithm whose computation complexity scales linearly with the number of training data points. We also introduce sparse approximation of the algorithm that can be efficiently trained with large amount of data. For situations when small amount of labeled data but a large amount of unlabeled data is available, we propose a co-regularized preference learning algorithm. To conclude, the methods presented in this thesis address not only the problem of the efficient training of the algorithms but also fast regularization parameter selection, multiple output prediction, and cross-validation. Furthermore, proposed algorithms lead to notably better performance in many preference learning tasks considered.
Resumo:
Stratospheric ozone can be measured accurately using a limb scatter remote sensing technique at the UV-visible spectral region of solar light. The advantages of this technique includes a good vertical resolution and a good daytime coverage of the measurements. In addition to ozone, UV-visible limb scatter measurements contain information about NO2, NO3, OClO, BrO and aerosols. There are currently several satellite instruments continuously scanning the atmosphere and measuring the UVvisible region of the spectrum, e.g., the Optical Spectrograph and Infrared Imager System (OSIRIS) launched on the Odin satellite in February 2001, and the Scanning Imaging Absorption SpectroMeter for Atmospheric CartograpHY (SCIAMACHY) launched on Envisat in March 2002. Envisat also carries the Global Ozone Monitoring by Occultation of Stars (GOMOS) instrument, which also measures limb-scattered sunlight under bright limb occultation conditions. These conditions occur during daytime occultation measurements. The global coverage of the satellite measurements is far better than any other ozone measurement technique, but still the measurements are sparse in the spatial domain. Measurements are also repeated relatively rarely over a certain area, and the composition of the Earth’s atmosphere changes dynamically. Assimilation methods are therefore needed in order to combine the information of the measurements with the atmospheric model. In recent years, the focus of assimilation algorithm research has turned towards filtering methods. The traditional Extended Kalman filter (EKF) method takes into account not only the uncertainty of the measurements, but also the uncertainty of the evolution model of the system. However, the computational cost of full blown EKF increases rapidly as the number of the model parameters increases. Therefore the EKF method cannot be applied directly to the stratospheric ozone assimilation problem. The work in this thesis is devoted to the development of inversion methods for satellite instruments and the development of assimilation methods used with atmospheric models.