929 results for Asymptotic behaviour, Bayesian methods, Mixture models, Overfitting, Posterior concentration
Abstract:
Theoretical population synthesis makes it possible to model the photometric properties of star clusters and galaxies by combining the radiation produced by individual stars, as obtained from theoretical stellar evolution models. By choosing a suitable mass distribution for newly formed stars, one can construct a simple stellar population consisting of stars of the same age and uniform chemical composition. More complex stellar populations can be formed by convolving the luminosity of simple stellar populations with a chosen star formation history and by combining populations formed in this way. This work examines how new, refined evolutionary models of asymptotic giant branch (AGB) stars affect the results of population synthesis, both for simple stellar populations and for the more complex stellar populations suited to modelling galaxies. The main purpose of the work is to produce, from the updated models, relations between the population's mass-to-luminosity ratio and its colour (MLC relations). MLC relations can be used to determine the mass of a population from its photometric properties (colour, luminosity). In addition, the effect of interstellar dust on the MLC relations of a simple spiral galaxy model is studied. The stellar evolution models used in this work are based on Marigo et al. (Astronomy & Astrophysics 482, 2008). The contribution of AGB stars to the integrated luminosity of the population is found to be small at visible wavelengths but significant in the near-infrared. Correspondingly, the effect on the MLC relations is significant when luminosity is measured in the near-infrared and when colours combining optical and near-infrared bands are used. It is concluded that using MLC relations in the near-infrared requires including the refined AGB phase in population synthesis models.
The effect of interstellar dust on the MLC relations is found to depend on the band and colour used, and is largest for colours that combine optical and near-infrared bands.
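The convolution step and the use of an MLC relation described above can be sketched as follows. This is a minimal sketch under stated assumptions. The SSP luminosity per unit formed mass is assumed to be tabulated on a uniform age grid. The MLC relation is assumed to take the linear form log10(M/L) = a + b·colour. The function names and the coefficients a and b are hypothetical and are not values from Marigo et al. (2008).

```python
def composite_luminosity(sfr, l_ssp, dt):
    """Luminosity of a composite population on a uniform time grid.

    sfr[j]   -- star formation rate at time j*dt (mass per unit time)
    l_ssp[i] -- luminosity per unit formed mass of an SSP of age i*dt
    Returns L[k] = sum_{i=0..k} sfr[k-i] * l_ssp[i] * dt, a Riemann-sum
    approximation of the convolution integral of SFR(t - tau) * L_SSP(tau).
    """
    n = min(len(sfr), len(l_ssp))
    return [sum(sfr[k - i] * l_ssp[i] * dt for i in range(k + 1)) for k in range(n)]


def mass_from_colour(luminosity, colour, a, b):
    """Stellar mass from an MLC relation of the assumed form log10(M/L) = a + b*colour.

    a and b are hypothetical coefficients; real values depend on the band,
    the colour and the population models used.
    """
    return luminosity * 10.0 ** (a + b * colour)
```

A single burst (`sfr = [1, 0, 0]`) returns the SSP luminosity curve itself. A constant star formation rate accumulates the contributions of all past generations.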
Abstract:
The results shown in this thesis are based on selected publications from the 2000s. The work was carried out in several national and EC-funded public research projects and in close cooperation with industrial partners. The main objective of the thesis was to study and quantify the most important phenomena in circulating fluidized bed (CFB) combustors by developing and applying suitable experimental and modelling methods using laboratory-scale equipment. An understanding of these phenomena plays an essential role in improving the combustion and emission performance, availability and controllability of CFB boilers. Experimental procedures for studying fuel combustion behaviour under CFB conditions are presented in the thesis. Steady-state and dynamic measurements under well-controlled conditions were carried out to produce the data needed to develop high-efficiency, utility-scale CFB technology. The importance of combustion control and furnace dynamics is emphasized when CFB boilers are scaled up with a once-through steam cycle. Qualitative information on fuel combustion characteristics was obtained directly by comparing flue gas oxygen responses during impulse-change experiments with the fuel feed. A one-dimensional, time-dependent model was developed to analyse the measurement data. Emission formation was studied in combination with fuel combustion behaviour. Correlations were developed for NO, N2O, CO and char loading as functions of temperature and oxygen concentration in the bed area. An online method for characterizing char loading under CFB conditions was developed and validated in pilot-scale CFB tests. Finally, a new method for controlling the air and fuel feeds in CFB combustion was introduced. The method is based on models and on an analysis of fluctuations in the flue gas oxygen concentration.
The effect of high oxygen concentrations on fuel combustion behaviour was also studied to evaluate the potential of CFB boilers to apply oxygen-firing (oxy-fuel) technology for carbon capture and storage (CCS). In future studies, it will be necessary to go through the whole scale-up chain, from laboratory phenomenon test devices through pilot-scale test rigs to large commercial boilers, in order to validate the applicability and scalability of the results. This thesis covers the link between the laboratory-scale phenomenon test rig (bench scale) and the CFB process test rig (pilot scale). CFB technology has been scaled up successfully from industrial scale to utility scale during the last decade. The work presented in the thesis has, for its part, supported this development by producing new, detailed information on combustion under CFB conditions.
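The char-loading concept can be illustrated with a simple mass balance. This is a minimal sketch under stated assumptions. Char is assumed to enter the bed as a fixed fraction of the fuel feed and to burn at a first-order rate proportional to both the char inventory and the bed oxygen fraction, so that dW/dt = f_char·F − k·x_O2·W. The parameter values are illustrative. This is not the one-dimensional model of the thesis.

```python
def simulate_char_inventory(fuel_feed, f_char, k, x_o2, dt, w0=0.0):
    """Explicit Euler integration of dW/dt = f_char*F(t) - k*x_o2*W.

    fuel_feed -- fuel feed rate F (kg/s) at each time step
    f_char    -- mass fraction of the feed that remains as char (assumed)
    k         -- char burning rate constant per unit O2 fraction (1/s, assumed)
    x_o2      -- bed-area O2 volume fraction, held constant here
    Returns the char inventory W (kg) after each step.
    """
    w = w0
    history = []
    for feed in fuel_feed:
        w += dt * (f_char * feed - k * x_o2 * w)
        history.append(w)
    return history


def steady_state_char(feed, f_char, k, x_o2):
    """Char load at which char feed equals char burnout."""
    return f_char * feed / (k * x_o2)
```

In this toy balance, the time constant 1/(k·x_O2) sets how slowly the char inventory, and with it the oxygen consumption, follows a change in fuel feed. Lower oxygen levels therefore mean a larger char load and a slower response.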
Abstract:
Forest inventories are used to estimate forest characteristics and forest condition for many different applications: operational tree logging for the forest industry, estimation of forest health, carbon balance estimation, and land-cover and land-use analysis to prevent forest degradation, among others. Recent inventory methods rely heavily on remote sensing data combined with field sample measurements, which are used to produce estimates covering the whole area of interest. Depending on the scale of the inventory, remote sensing data from satellites, aerial photographs or airborne laser scanning are used. To be applicable operationally, forest inventory methods need to be easy to adjust to the local conditions of the study area at hand. All data handling and parameter tuning should be as objective and automated as possible. The methods also need to be robust when applied to different forest types. There are generally no comprehensive physical models that directly connect remote sensing data from different sources to the forest parameters being estimated. The mathematical estimation models are therefore of "black-box" type, connecting the independent auxiliary data to the dependent response data with arbitrary linear or nonlinear models. Such a model may be based on up to hundreds of possibly collinear variables extracted from the auxiliary data, so variable selection is needed to avoid redundant complexity and overfitting. To connect the auxiliary data to the inventory parameters being estimated, field work must be performed. In larger study areas with dense forests, field work is expensive and should therefore be minimized. To make inventories cost-efficient, field work could partly be replaced with information from previously measured sites stored in databases. The work in this thesis is devoted to the development of automated, adaptive computation methods for aerial forest inventory.
The mathematical model parameter definition steps are automated, and the cost-efficiency is improved by setting up a procedure that utilizes databases in the estimation of new area characteristics.
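Automated variable selection of the kind motivated above can be sketched as greedy forward selection scored by leave-one-out error. This is a minimal sketch under stated assumptions. A k-nearest-neighbour estimator is assumed, since the abstract does not name the estimator. Features are assumed to be on comparable scales, and the function names are hypothetical.

```python
import math


def loo_rmse(X, y, features, k=3):
    """Leave-one-out RMSE of k-NN regression that uses only the given feature indices."""
    n = len(y)
    sq_err = 0.0
    for i in range(n):
        # Squared Euclidean distance from plot i to every other plot.
        dists = [
            (sum((X[i][f] - X[j][f]) ** 2 for f in features), y[j])
            for j in range(n)
            if j != i
        ]
        dists.sort(key=lambda t: t[0])
        pred = sum(v for _, v in dists[:k]) / k
        sq_err += (pred - y[i]) ** 2
    return math.sqrt(sq_err / n)


def forward_select(X, y, max_features=5, k=3):
    """Greedily add the feature that most reduces LOO RMSE; stop when none helps."""
    selected, best = [], float("inf")
    remaining = list(range(len(X[0])))
    while remaining and len(selected) < max_features:
        score, f = min((loo_rmse(X, y, selected + [f], k), f) for f in remaining)
        if score >= best:
            break  # no candidate improves the cross-validated error
        best = score
        selected.append(f)
        remaining.remove(f)
    return selected, best
```

Scoring each candidate by cross-validated error, rather than by training fit, is what guards against adding redundant, collinear variables.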
Abstract:
In any decision making under uncertainty, the goal is usually to minimize the expected cost. Minimizing cost under uncertainty is usually done by optimization. For simple models, the optimization can easily be done with deterministic methods. However, many practical models contain complex and varying parameters that cannot easily be handled by the usual deterministic optimization methods. It is therefore important to look for other methods that give insight into such models. The Markov chain Monte Carlo (MCMC) method is one practical method for optimizing stochastic models under uncertainty. It is simulation-based and provides a general methodology that can be applied to nonlinear and non-Gaussian state models. MCMC is important for practical applications because it is a unified estimation procedure that simultaneously estimates both parameters and state variables. It computes the distribution of the state variables and parameters given the measured data. The MCMC method is also faster in terms of computing time than other optimization methods. This thesis discusses the use of MCMC methods for optimizing stochastic models under uncertainty. The thesis begins with a short discussion of Bayesian inference, MCMC and stochastic optimization methods. An example is then given of how MCMC can be applied to maximize production at minimum cost in a chemical reaction process. The method is observed to perform better in optimizing the given cost function, with very high certainty.
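The simulation-based approach can be sketched with a random-walk Metropolis sampler, the simplest MCMC algorithm. This is a minimal sketch for a one-dimensional, unnormalized log-density. The chemical-reaction cost model of the thesis is not reproduced, and the target in the usage example is a stand-in.

```python
import math
import random


def metropolis(log_density, x0, n_samples, step, seed=0):
    """Random-walk Metropolis sampling from an unnormalized log-density.

    Returns the list of samples and the fraction of accepted proposals.
    """
    rng = random.Random(seed)
    x, lp = x0, log_density(x0)
    samples, accepted = [], 0
    for _ in range(n_samples):
        proposal = x + rng.gauss(0.0, step)  # symmetric Gaussian proposal
        lp_prop = log_density(proposal)
        # Accept with probability min(1, p(proposal) / p(x)).
        if rng.random() < math.exp(min(0.0, lp_prop - lp)):
            x, lp = proposal, lp_prop
            accepted += 1
        samples.append(x)
    return samples, accepted / n_samples
```

For example, `metropolis(lambda x: -0.5 * (x - 3.0) ** 2, 0.0, 20000, 1.0)` samples a unit-variance Gaussian centred at 3. For cost minimization, one common choice is to sample from exp(−C(x)/T) for a cost C and temperature T and track the lowest-cost state visited. That choice is an assumption here, not the thesis' stated formulation.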
Abstract:
Machine learning provides tools for the automated construction of predictive models in data-intensive areas of engineering and science. In recent years, the family of regularized kernel methods has become one of the mainstream approaches to machine learning, owing to a number of advantages these methods share. The approach provides theoretically well-founded solutions to the problems of under- and overfitting, allows learning from structured data, and has been empirically demonstrated to yield high predictive performance in a wide range of application domains. Historically, the problems of classification and regression have received the majority of attention in the field. In this thesis we focus on another type of learning problem: learning to rank. In learning to rank, the aim is to learn, from a set of past observations, a ranking function that can order new objects according to how well they match some underlying criterion of goodness. As an important special case of this setting, we recover the bipartite ranking problem, which corresponds to maximizing the area under the ROC curve (AUC) in binary classification. Ranking applications appear in a wide variety of settings. Examples encountered in this thesis include document retrieval in web search, recommender systems, information extraction and automated parsing of natural language. We consider the pairwise approach to learning to rank, in which ranking models are learned by minimizing the expected probability of ranking any two randomly drawn test examples incorrectly. Developing computationally efficient kernel methods based on this approach has proven challenging. Moreover, it is not clear which techniques for estimating the predictive performance of learned models are the most reliable in the ranking setting, or how these techniques can be implemented efficiently. The contributions of this thesis are as follows.
First, we develop RankRLS, a computationally efficient kernel method for learning to rank that is based on minimizing a regularized pairwise least-squares loss. In addition to training methods, we introduce a variety of algorithms for tasks such as model selection, multi-output learning and cross-validation, based on computational shortcuts from matrix algebra. Second, we improve the fastest known training method for the linear version of the RankSVM algorithm, one of the best-established methods for learning to rank. Third, we study the combination of the empirical kernel map and reduced set approximation, which allows large-scale training of kernel machines using linear solvers, and propose computationally efficient solutions for cross-validation when using this approach. Next, through an extensive simulation study, we explore the problem of reliable cross-validation when AUC is used as the performance criterion. We demonstrate that the proposed leave-pair-out cross-validation approach leads to more reliable performance estimation than commonly used alternative approaches. Finally, we present a case study on applying machine learning to information extraction from biomedical literature, which combines several of the approaches considered in the thesis. The thesis is divided into two parts. Part I provides the background for the research work and summarizes the most central results. Part II consists of the five original research articles that are the main contribution of this thesis.
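The pairwise least-squares loss behind RankRLS has a closed-form minimizer in the linear case, which makes the idea concrete. This is a minimal sketch under stated assumptions: all pairs of training examples are compared, and the model is linear, f(x) = w·x. Summing ((y_i − y_j) − (f(x_i) − f(x_j)))² over all pairs equals rᵀLr, where r = y − Xw and L = nI − 11ᵀ is the Laplacian of the complete pair graph. The regularized minimizer is therefore w = (XᵀLX + λI)⁻¹XᵀLy. This illustrates the loss only. It is not the thesis' kernel implementation or its computational shortcuts, and the function names are hypothetical.

```python
import numpy as np


def linear_rankrls(X, y, lam=1.0):
    """Minimize sum_{i<j} ((y_i - y_j) - (w.x_i - w.x_j))^2 + lam * ||w||^2."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, d = X.shape
    L = n * np.eye(n) - np.ones((n, n))  # Laplacian of the complete graph over examples
    A = X.T @ L @ X + lam * np.eye(d)
    b = X.T @ L @ y
    return np.linalg.solve(A, b)


def pairwise_loss(X, y, w):
    """Explicit O(n^2) pairwise squared loss, for checking the matrix form."""
    f = np.asarray(X, dtype=float) @ w
    y = np.asarray(y, dtype=float)
    n = len(y)
    return sum(((y[i] - y[j]) - (f[i] - f[j])) ** 2
               for i in range(n) for j in range(i + 1, n))
```

Because the loss depends only on differences, adding a constant to all targets leaves the solution unchanged. Only the ordering information is used.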
Abstract:
The aim of this work is to apply approximate Bayesian computation in combination with Markov chain Monte Carlo methods to estimate the parameters of tuberculosis transmission. The methods are applied to San Francisco data, and the results are compared with the outcomes of previous work. In addition, a methodological idea for reducing computational time is described. Although this approach is shown to work appropriately, further analysis is needed to understand and test its behaviour in different cases. Suggestions for its further enhancement are described in the corresponding chapter.
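The combination of approximate Bayesian computation with MCMC can be sketched in the style of the ABC-MCMC scheme of Marjoram et al. (2003). A proposal is accepted only if data simulated under it reproduce the observed summary statistic within a tolerance. This is a minimal sketch with a toy Poisson model. The tuberculosis transmission model and the San Francisco data are not reproduced. The summary statistic (the sample mean), the tolerance, the uniform prior and the function names are all assumptions.

```python
import math
import random


def poisson_draw(lam, rng):
    """Knuth's multiplication method; adequate for the small rates used here."""
    limit = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1


def abc_mcmc(observed, n_iter, eps, step, theta0, prior_bounds=(0.0, 20.0), seed=1):
    """ABC-MCMC for the rate of a Poisson model, using the sample mean as summary."""
    rng = random.Random(seed)
    obs_mean = sum(observed) / len(observed)
    lo, hi = prior_bounds
    theta, chain = theta0, []
    for _ in range(n_iter):
        prop = theta + rng.gauss(0.0, step)  # symmetric random-walk proposal
        # Uniform prior: the prior ratio is 1 inside the bounds and 0 outside.
        if lo < prop < hi:
            sim = [poisson_draw(prop, rng) for _ in observed]
            # Accept only if simulated data match the observed summary within eps.
            if abs(sum(sim) / len(sim) - obs_mean) <= eps:
                theta = prop
        chain.append(theta)
    return chain
```

The likelihood is never evaluated. Each step only needs to simulate the model once, which is what makes the approach usable for transmission models with intractable likelihoods.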
Abstract:
Human activity recognition in everyday environments is a critical but challenging task for Ambient Intelligence applications that aim at proper Ambient Assisted Living, and key challenges remain before robust methods can be realized. One of the major limitations of today's Ambient Intelligence systems is the lack of semantic models of the activities in the environment. Such models would allow the system to recognize the specific activity being performed by the user(s) and act accordingly. In this context, this thesis addresses the general problem of knowledge representation in Smart Spaces. The main objective is to develop knowledge-based models, equipped with semantics, to learn, infer and monitor human behaviours in Smart Spaces. Moreover, some aspects of this problem clearly involve a high degree of uncertainty, so the developed models must be equipped with mechanisms to manage this type of information. A fuzzy ontology and a semantic hybrid system are presented that allow modelling and recognition of a set of complex real-life scenarios in which vagueness and uncertainty are inherent to the human nature of the users performing them. The handling of uncertain, incomplete and vague data (i.e., missing sensor readings and variations in activity execution, since human behaviour is non-deterministic) is approached for the first time through a fuzzy ontology validated in real-time settings within a hybrid data-driven and knowledge-based architecture. The semantics of activities, sub-activities and real-time object interaction are taken into consideration. The proposed framework consists of two main modules: the low-level sub-activity recognizer and the high-level activity recognizer. The first module detects sub-activities (i.e., actions or basic activities) from input data taken directly from a depth sensor (Kinect).
The main contribution of this thesis concerns the second component of the hybrid system. This component lies on top of the first at a higher level of abstraction: it takes the first module's output as input and performs ontological inference to give semantics to users, activities and their influence on the environment. This component is thus knowledge-based, and a fuzzy ontology was designed to model the high-level activities. Activity recognition requires context awareness and the ability to discriminate among activities in different environments. The semantic framework therefore allows common-sense knowledge to be modelled as a rule-based system that supports expressions close to natural language, in the form of fuzzy linguistic labels. The advantages of the framework were evaluated on CAD-120, a new and challenging public dataset, achieving accuracies of 90.1% and 91.1% for low- and high-level activities, respectively. This is an improvement over both purely data-driven and purely ontology-based approaches. As an added value, a development framework was built to make the system simple and flexible enough to be managed by non-expert users, and thus to facilitate the transfer of research to industry. It is composed of a programming toolbox, a hybrid crisp and fuzzy architecture, and graphical models to represent and configure human behaviour in Smart Spaces, and it gives the framework greater usability in the final application. As a result, human behaviour recognition can help assist people with special needs, for example in healthcare, independent living for the elderly, remote rehabilitation monitoring, industrial process guideline control and many other cases. This thesis presents use cases in these areas.
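Fuzzy linguistic labels of the kind used in the rule-based layer can be sketched with trapezoidal membership functions. A rule's antecedents are combined with the minimum t-norm (fuzzy AND). This is a minimal sketch. The labels, breakpoints and the "drinking" rule are hypothetical. They are not taken from the thesis' ontology or from CAD-120.

```python
def trapezoid(x, a, b, c, d):
    """Trapezoidal membership: rises on [a, b], equals 1 on [b, c], falls on [c, d]."""
    if b <= x <= c:
        return 1.0
    if x <= a or x >= d:
        return 0.0
    if x < b:
        return (x - a) / (b - a)
    return (d - x) / (d - c)


# Hypothetical linguistic labels (breakpoints a, b, c, d).
LONG_DURATION = (2.0, 5.0, 60.0, 90.0)  # seconds
NEAR_OBJECT = (-1.0, 0.0, 0.2, 0.5)     # metres from hand to cup


def rule_drinking(duration_s, hand_to_cup_m):
    """IF duration IS long AND hand IS near cup THEN activity IS drinking (min t-norm)."""
    return min(trapezoid(duration_s, *LONG_DURATION),
               trapezoid(hand_to_cup_m, *NEAR_OBJECT))
```

The rule fires to a degree between 0 and 1 rather than true or false. This is how vague sensor evidence and variations in execution are tolerated.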
Abstract:
The two main objectives of Bayesian inference are to estimate parameters and states. In this thesis, we are interested in how this can be done in the framework of state-space models when the initial state of a continuous nonlinear dynamical system is completely or partially unknown. In the literature, similar problems have been referred to as diffuse initialization problems. The first objective is achieved by extending previously developed diffuse-initialization Kalman filtering techniques for discrete systems to continuous systems. The second objective is to estimate parameters using MCMC methods with a likelihood function obtained from the diffuse filtering. These methods are applied to data collected during the 1995 Ebola outbreak in Kikwit, DRC, to estimate the parameters of the system.
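In practice, diffuse initialization is often approximated by starting a Kalman filter with a very large initial variance and leaving the first innovation out of the likelihood. This is a minimal sketch of that approximation for a scalar, linear, discrete-time model. It returns the prediction-error log-likelihood that an MCMC sampler could target. It is not the exact diffuse filter for continuous systems developed in the thesis. The model x_k = a·x_{k−1} + w_k, y_k = x_k + v_k and its parameters are illustrative.

```python
import math


def kalman_loglik(ys, a, q, r, kappa=1e7, n_diffuse=1):
    """Scalar Kalman filter with a vague prior N(0, kappa) on the initial state.

    Model: x_k = a*x_{k-1} + w_k, w_k ~ N(0, q);  y_k = x_k + v_k, v_k ~ N(0, r).
    The first n_diffuse innovations are excluded from the log-likelihood,
    since their variance is dominated by the arbitrary kappa.
    """
    m, p = 0.0, kappa
    loglik = 0.0
    filtered = []
    for k, y in enumerate(ys):
        m, p = a * m, a * a * p + q  # predict
        s = p + r                    # innovation variance
        e = y - m                    # innovation
        if k >= n_diffuse:
            loglik += -0.5 * (math.log(2.0 * math.pi * s) + e * e / s)
        m = m + (p / s) * e          # update with gain p / s
        p = p * r / s                # equals (1 - gain) * p, without cancellation
        filtered.append(m)
    return loglik, filtered
```

Because the first observation effectively fixes the state, the resulting log-likelihood is almost insensitive to the chosen κ.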
Abstract:
Solid mixtures for refreshments are already fully integrated into Brazilian consumers' daily routine because of their quick preparation, yield and reasonable price. That price is considerably lower than that of ready-to-drink products or products for immediate consumption, which makes the mixtures more economically accessible to low-income populations. In this context, the aim of this work was to evaluate the physicochemical and mineral composition, as well as the hygroscopic behavior, of four different brands of solid mixture for mango refreshment. The BET, GAB, Oswin and Henderson mathematical models were fitted to the experimental adsorption isotherm data. The physicochemical evaluation showed that the solid mixtures for refreshments are considerable sources of ascorbic acid and reducing sugars. With regard to minerals, they are significant sources of calcium, sodium and potassium. The solid mixtures of all four brands studied were also found to be highly hygroscopic.
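For reference, the four isotherm models named above can be written as functions of water activity a_w that return the equilibrium moisture content X. This is a minimal sketch using the commonly cited forms of the equations. The Henderson model is written in its isothermal form, 1 − a_w = exp(−k·Xⁿ), with temperature absorbed into k. That choice is an assumption, since the abstract does not state which variant was used. No parameter values from the study are given.

```python
import math


def bet(aw, xm, c):
    """BET: X = Xm*C*aw / ((1 - aw) * (1 + (C - 1)*aw)); usually applied at low aw."""
    return xm * c * aw / ((1.0 - aw) * (1.0 + (c - 1.0) * aw))


def gab(aw, xm, c, k):
    """GAB: X = Xm*C*K*aw / ((1 - K*aw) * (1 - K*aw + C*K*aw)); K = 1 recovers BET."""
    return xm * c * k * aw / ((1.0 - k * aw) * (1.0 - k * aw + c * k * aw))


def oswin(aw, a, b):
    """Oswin: X = a * (aw / (1 - aw))**b."""
    return a * (aw / (1.0 - aw)) ** b


def henderson(aw, k, n):
    """Henderson (isothermal form): 1 - aw = exp(-k * X**n), solved for X."""
    return (-math.log(1.0 - aw) / k) ** (1.0 / n)
```

The parameters (Xm, C, K, a, b, k, n) would be estimated by nonlinear least squares against the measured isotherm points, for example with `scipy.optimize.curve_fit`.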
Abstract:
Background: The purpose of this study was to examine the relationships of physical activity and healthy eating behaviour with participants' motives and goals for each health behaviour. Methods: Participants (N = 121; 93.2% female) who were enrolled in commercial weight-loss programs at the time of data collection completed self-report instruments through a web-based interface. The instruments were based on Deci and Ryan's (2002) Self-Determination Theory (SDT). Results: Multiple linear regression models revealed that motivation and goals collectively accounted for between 0.21 and 0.29 and between 0.03 and 0.16 of the variance in physical activity and healthy eating behaviours, respectively, in this sample. In general, goals regarding either behaviour did not appear to have strong predictive relationships with either health behaviour beyond the contributions of motives. Discussion: Overall, the findings suggest that motives matter more than goals for both physical activity and healthy eating behaviour among clients of commercial weight-loss programs. Commercial weight-loss program implementers may therefore want to place more emphasis on their clients' motives than on their goals when designing weight-loss and weight-maintenance initiatives.
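The comparison of goals "beyond the contributions of motives" corresponds to an incremental R² in a hierarchical regression. The first model uses motives only, the second adds goals, and the two are compared on explained variance. This is a minimal sketch using NumPy least squares. The variable names are hypothetical, and the study's instruments and data are not reproduced.

```python
import numpy as np


def r_squared(X, y):
    """R^2 of an OLS fit of y on the columns of X plus an intercept."""
    y = np.asarray(y, dtype=float)
    Z = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(Z, y, rcond=None)
    resid = y - Z @ beta
    return 1.0 - resid @ resid / np.sum((y - y.mean()) ** 2)


def incremental_r2(motives, goals, y):
    """Return (R^2 motives only, R^2 motives + goals, increment due to goals)."""
    r2_base = r_squared(motives, y)
    r2_full = r_squared(np.column_stack([motives, goals]), y)
    return r2_base, r2_full, r2_full - r2_base
```

Because the models are nested, the increment is never negative. A small increment means that goals explain little variance once motives are in the model.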
Abstract:
The purpose of this study is to examine the impact of the choice of cut-off points, sampling procedures and the business cycle on the accuracy of bankruptcy prediction models. Misclassification can result in erroneous predictions that lead to prohibitive costs for firms, investors and the economy. To test the impact of the choice of cut-off points and sampling procedures, three bankruptcy prediction models are assessed: Bayesian, Hazard and Mixed Logit. A salient feature of the study is that the analysis includes both parametric and nonparametric bankruptcy prediction models. A sample of firms from the Lynn M. LoPucki Bankruptcy Research Database in the U.S. was used to evaluate the relative performance of the three models. The choice of cut-off point and sampling procedure was found to affect the rankings of the models. In general, the results indicate that the empirical cut-off point estimated from the training sample resulted in the lowest misclassification costs for all three models. The Hazard and Mixed Logit models resulted in lower misclassification costs in the randomly selected samples. However, the Mixed Logit model did not perform as well across the phases of the business cycle. In general, the Hazard model has the highest predictive power. However, when the ratio of the cost of Type I errors to the cost of Type II errors is high, the Bayesian model has higher predictive power, and this advantage is relatively consistent across all sampling methods. This advantage may make the Bayesian model more attractive in the current economic environment. This study extends recent research comparing the performance of bankruptcy prediction models by identifying the conditions under which a model performs better. It also addresses the concerns of a range of user groups, including auditors, shareholders, employees, suppliers, rating agencies and creditors, with respect to assessing failure risk.
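An empirical cut-off can be chosen to minimize misclassification cost by searching over candidate thresholds on training-sample scores. Type I errors (a bankrupt firm classified as healthy) and Type II errors (a healthy firm classified as bankrupt) are weighted by their costs. This is a minimal sketch. The scores, labels and cost values are hypothetical. The Bayesian, Hazard and Mixed Logit models themselves are not implemented; any model producing a bankruptcy score could feed this search.

```python
def expected_cost(scores, labels, cutoff, cost_type1, cost_type2):
    """Average misclassification cost; labels: 1 = bankrupt, 0 = healthy.

    A firm is predicted bankrupt when its score is >= cutoff.
    """
    type1 = sum(1 for s, l in zip(scores, labels) if l == 1 and s < cutoff)
    type2 = sum(1 for s, l in zip(scores, labels) if l == 0 and s >= cutoff)
    return (cost_type1 * type1 + cost_type2 * type2) / len(labels)


def best_cutoff(scores, labels, cost_type1, cost_type2):
    """Empirical cut-off minimizing cost on the training sample (lowest one on ties)."""
    candidates = sorted(set(scores)) + [float("inf")]
    return min(candidates,
               key=lambda c: expected_cost(scores, labels, c, cost_type1, cost_type2))
```

With scores `[0.1, 0.2, 0.35, 0.4, 0.8, 0.9]` and labels `[0, 0, 1, 0, 1, 1]`, a high Type I cost (10:1) selects the cut-off 0.35. A high Type II cost (1:10) selects 0.8. The relative cost of the two error types shifts the optimal threshold.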
Abstract:
Volume- (density-) independent pair potentials cannot describe metallic cohesion adequately, because the free electron gas makes the total energy strongly dependent on the electron density. The embedded atom method (EAM) addresses this issue by replacing part of the total energy with an explicitly density-dependent term called the embedding function. Finnis and Sinclair proposed a model in which the embedding function is taken to be proportional to the square root of the electron density. Models of this type are known as Finnis-Sinclair many-body potentials. In this work we study a particular parametrization of the Finnis-Sinclair potential, called the "Sutton-Chen" model, and a later version, called the "Quantum Sutton-Chen" model. We use them to investigate the phonon spectra and the temperature dependence of the thermodynamic properties of fcc metals. Both models give poor results for thermal expansion, which can be traced to a rapid softening of the transverse phonon frequencies with increasing lattice parameter. We identify the power-law decay of the electron density with distance assumed by the model as the main cause of this behaviour. We show that an exponentially decaying form of the charge density improves the results significantly. Results for the Sutton-Chen model and our improved version of it are compared for four fcc metals: Cu, Ag, Au and Pt. The calculated properties are the phonon spectra, thermal expansion coefficient, isobaric heat capacity, adiabatic and isothermal bulk moduli, atomic root-mean-square displacement and Grüneisen parameter. For comparison, we also considered two other models in which the distance dependence of the charge density is an exponential multiplied by polynomials. None of these models exhibits the instability against thermal expansion (premature melting) shown by the Sutton-Chen model.
We also present results obtained via pure pair potential models, in order to identify advantages and disadvantages of methods used to obtain the parameters of these potentials.
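The Sutton-Chen total energy consists of a pair repulsion plus a square-root embedding term: U = ε[½ Σ_i Σ_{j≠i} (a/r_ij)ⁿ − c Σ_i √ρ_i], with ρ_i = Σ_{j≠i} (a/r_ij)ᵐ. The power law (a/r)ᵐ in ρ_i is the density decay identified above as the cause of the thermal-expansion problem. This is a minimal sketch for a small cluster, with no cutoff radius and no periodic boundaries. The parameter values in the test are illustrative and are not fitted values for Cu, Ag, Au or Pt.

```python
import itertools
import math


def sutton_chen_energy(positions, eps, a, c, n, m):
    """Total Sutton-Chen energy of a finite cluster (no cutoff, no periodic images).

    U = eps * [ 1/2 * sum_i sum_{j != i} (a/r_ij)^n  -  c * sum_i sqrt(rho_i) ],
    rho_i = sum_{j != i} (a/r_ij)^m.
    """
    N = len(positions)
    rho = [0.0] * N
    pair = 0.0
    for i, j in itertools.combinations(range(N), 2):
        r = math.dist(positions[i], positions[j])
        pair += (a / r) ** n     # each unordered pair once = 1/2 of the double sum
        density = (a / r) ** m   # power-law density contribution
        rho[i] += density
        rho[j] += density
    return eps * (pair - c * sum(math.sqrt(x) for x in rho))
```

Replacing `(a / r) ** m` with an exponentially decaying function is the kind of change the abstract describes. The exact functional form used in this work is not given here.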
Abstract:
We study the problem of measuring the uncertainty of CGE (or RBC)-type model simulations associated with parameter uncertainty. We describe two approaches for building confidence sets on model endogenous variables. The first one uses a standard Wald-type statistic. The second approach assumes that a confidence set (sampling or Bayesian) is available for the free parameters, from which confidence sets are derived by a projection technique. The latter has two advantages: first, confidence set validity is not affected by model nonlinearities; second, we can easily build simultaneous confidence intervals for an unlimited number of variables. We study conditions under which these confidence sets take the form of intervals and show they can be implemented using standard methods for solving CGE models. We present an application to a CGE model of the Moroccan economy to study the effects of policy-induced increases of transfers from Moroccan expatriates.
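The projection technique can be sketched directly. If C is a confidence set for the free parameters, then [min over C of g, max over C of g] is a confidence interval for any function g of them. Its coverage is at least that of C, whatever the nonlinearity of g. This is a minimal sketch that approximates the minimum and maximum by sampling points of a Wald-type ellipsoid. The function g stands in for a model endogenous variable and is hypothetical, as are the estimates and covariance in the test. In a real CGE application, each evaluation of g would require solving the model.

```python
import numpy as np


def projection_interval(g, theta_hat, cov, crit, n_draws=20000, seed=0):
    """Approximate [min g, max g] over the Wald ellipsoid
    {theta : (theta - theta_hat)' cov^{-1} (theta - theta_hat) <= crit}.

    Points are drawn on the boundary and in the interior of the ellipsoid, so
    the result is an inner approximation of the exact projection interval.
    """
    rng = np.random.default_rng(seed)
    theta_hat = np.asarray(theta_hat, dtype=float)
    chol = np.linalg.cholesky(np.asarray(cov, dtype=float))
    d = theta_hat.size
    u = rng.normal(size=(n_draws, d))
    u /= np.linalg.norm(u, axis=1, keepdims=True)          # random unit directions
    radii = np.sqrt(crit) * rng.uniform(size=(n_draws, 1)) ** (1.0 / d)
    radii[: n_draws // 2] = np.sqrt(crit)                   # half the draws on the boundary
    points = theta_hat + (u * radii) @ chol.T               # map the ball onto the ellipsoid
    values = np.array([g(p) for p in points])
    return values.min(), values.max()
```

For a linear g such as g(θ) = θ₀, the exact projection is θ̂₀ ± √(crit·Σ₀₀), which gives a check on the sampler. Nonlinear model outputs need the numerical search.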
Abstract:
In the context of multivariate regression (MLR) and seemingly unrelated regressions (SURE) models, it is well known that commonly employed asymptotic test criteria are seriously biased towards overrejection. In this paper, we propose finite- and large-sample likelihood-based test procedures for possibly nonlinear hypotheses on the coefficients of MLR and SURE systems.
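One standard asymptotic criterion in this setting is the likelihood-ratio statistic LR = T(ln|Σ̂₀| − ln|Σ̂₁|). Here Σ̂₁ and Σ̂₀ are the maximum-likelihood residual covariance estimates of the unrestricted and restricted systems. The statistic is compared with a χ² distribution with as many degrees of freedom as there are restrictions. This is a minimal sketch for an MLR system with the same regressors in every equation, testing zero restrictions on selected coefficients in all equations. It illustrates the kind of asymptotic test the abstract refers to. It does not implement the paper's finite-sample procedures, and the function name is hypothetical.

```python
import numpy as np


def mlr_lr_statistic(Y, X, restricted_cols):
    """LR statistic for H0: coefficients on X[:, restricted_cols] are zero in every equation.

    Y is T x p (one column per equation); X is T x k and shared by all
    equations, so equation-by-equation OLS coincides with Gaussian ML.
    Returns (LR, degrees of freedom); under H0, LR is asymptotically chi-square.
    """
    Y = np.asarray(Y, dtype=float)
    X = np.asarray(X, dtype=float)
    T, p = Y.shape
    drop = set(restricted_cols)
    keep = [j for j in range(X.shape[1]) if j not in drop]

    def ml_error_cov(Z):
        B, *_ = np.linalg.lstsq(Z, Y, rcond=None)
        E = Y - Z @ B
        return E.T @ E / T  # ML (not degrees-of-freedom corrected) estimate

    _, logdet_unrestricted = np.linalg.slogdet(ml_error_cov(X))
    _, logdet_restricted = np.linalg.slogdet(ml_error_cov(X[:, keep]))
    return T * (logdet_restricted - logdet_unrestricted), p * len(drop)
```

When the number of equations is large relative to T, the χ² approximation for this statistic becomes unreliable. That is the small-sample problem motivating exact or simulation-based alternatives.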