24 results for PROBABILISTIC FORECASTS


Relevance:

20.00%

Publisher:

Abstract:

Minimum Description Length (MDL) is an information-theoretic principle that can be used for model selection and other statistical inference tasks. There are various ways to use the principle in practice. One theoretically valid way is to use the normalized maximum likelihood (NML) criterion. Due to computational difficulties, this approach has not been used very often. This thesis presents efficient floating-point algorithms that make it possible to compute the NML for multinomial, Naive Bayes and Bayesian forest models. None of the presented algorithms relies on asymptotic analysis, and for the first two model classes we also discuss how to compute exact rational number solutions.
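
As a rough illustration of what computing the NML for a multinomial model involves, the sketch below evaluates the normalizing sum (the parametric complexity) in exact rational arithmetic. It assumes the linear-time recurrence C(K+2, n) = C(K+1, n) + (n/K) C(K, n) known from the NML literature. The function names are illustrative, and this is not necessarily the algorithm presented in the thesis.

```python
from fractions import Fraction
from math import comb, log


def binary_complexity(n):
    """Exact normalizing sum C(2, n) for a two-outcome multinomial (n >= 1)."""
    total = Fraction(0)
    for h in range(n + 1):
        # Maximized likelihood of any sequence with h ones and n - h zeros.
        ml = Fraction(h, n) ** h * Fraction(n - h, n) ** (n - h)
        total += comb(n, h) * ml
    return total


def multinomial_complexity(k, n):
    """Exact normalizing sum C(k, n) via C(j+2) = C(j+1) + (n/j) * C(j)."""
    if n == 0 or k == 1:
        return Fraction(1)
    prev, curr = Fraction(1), binary_complexity(n)  # C(1, n) and C(2, n)
    for j in range(1, k - 1):
        prev, curr = curr, curr + Fraction(n, j) * prev
    return curr


def nml_code_length(counts):
    """Stochastic complexity in nats: -log(max likelihood) + log C(k, n)."""
    n = sum(counts)
    log_ml = sum(c * log(c / n) for c in counts if c > 0)
    return -log_ml + log(multinomial_complexity(len(counts), n))
```

Using `Fraction` keeps the normalizing sum exact, in the spirit of the rational-number solutions mentioned in the abstract; only the final logarithm is taken in floating point.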

Relevance:

20.00%

Publisher:

Abstract:

What can the statistical structure of natural images teach us about the human brain? Even though the visual cortex is one of the most studied parts of the brain, surprisingly little is known about how exactly images are processed to leave us with a coherent percept of the world around us, so we can recognize a friend or drive on a crowded street without any effort. By constructing probabilistic models of natural images, the goal of this thesis is to understand the structure of the stimulus that is the raison d'être for the visual system. Following the hypothesis that the optimal processing has to be matched to the structure of that stimulus, we attempt to derive computational principles, features that the visual system should compute, and properties that cells in the visual system should have. Starting from machine learning techniques such as principal component analysis and independent component analysis we construct a variety of statistical models to discover structure in natural images that can be linked to receptive field properties of neurons in primary visual cortex such as simple and complex cells. We show that by representing images with phase invariant, complex cell-like units, a better statistical description of the visual environment is obtained than with linear simple cell units, and that complex cell pooling can be learned by estimating both layers of a two-layer model of natural images. We investigate how a simplified model of the processing in the retina, where adaptation and contrast normalization take place, is connected to the natural stimulus statistics. Analyzing the effect that retinal gain control has on later cortical processing, we propose a novel method to perform gain control in a data-driven way. Finally, we show how models like those presented here can be extended to capture whole visual scenes rather than just small image patches. 
By using a Markov random field approach we can model images of arbitrary size, while still being able to estimate the model parameters from the data.
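
The phase invariance attributed to complex cell-like units above can be illustrated with the classical energy model: a quadrature pair of linear filters whose squared outputs are summed. The sketch below is a toy one-dimensional version under assumed parameters (a 64-sample grating with 4 cycles); it does not reproduce the two-layer models estimated in the thesis.

```python
import math


def responses(phase, n=64, cycles=4):
    """Outputs of an even and an odd linear filter for a phase-shifted grating."""
    w = 2 * math.pi * cycles / n
    stimulus = [math.cos(w * x + phase) for x in range(n)]
    even = sum(s * math.cos(w * x) for x, s in enumerate(stimulus))
    odd = sum(s * math.sin(w * x) for x, s in enumerate(stimulus))
    return even, odd


def complex_cell_energy(phase):
    """Energy model: sum of the squared outputs of the two linear units."""
    even, odd = responses(phase)
    return even ** 2 + odd ** 2
```

Each linear (simple cell-like) output changes with the phase of the grating, whereas the pooled energy stays constant, which is the sense in which the pooled unit is phase invariant.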

Relevance:

20.00%

Publisher:

Abstract:

We study how probabilistic reasoning and inductive querying can be combined within ProbLog, a recent probabilistic extension of Prolog. ProbLog can be regarded as a database system that supports both probabilistic and inductive reasoning through a variety of querying mechanisms. After a short introduction to ProbLog, we provide a survey of the different types of inductive queries that ProbLog supports, and show how it can be applied to the mining of large biological networks.
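
To make the querying idea concrete, the sketch below computes the success probability of a path query over a small probabilistic graph by brute-force enumeration of possible worlds, which is the semantics behind probabilistic facts of the form `0.8::edge(a, b)`. The graph, its probabilities and the function names are hypothetical, and practical systems rely on far more efficient inference than this enumeration.

```python
from itertools import product

# Hypothetical probabilistic edges; each is present independently.
EDGES = {("a", "b"): 0.8, ("b", "c"): 0.6, ("a", "c"): 0.5}


def reachable(edges, source, target):
    """Depth-first search over the edges present in one possible world."""
    stack, seen = [source], {source}
    while stack:
        node = stack.pop()
        if node == target:
            return True
        for u, v in edges:
            if u == node and v not in seen:
                seen.add(v)
                stack.append(v)
    return False


def success_probability(source, target, prob_edges=EDGES):
    """Sum the probabilities of all worlds in which the path query succeeds."""
    items = list(prob_edges.items())
    total = 0.0
    for choices in product([True, False], repeat=len(items)):
        weight = 1.0
        world = []
        for (edge, p), present in zip(items, choices):
            weight *= p if present else 1.0 - p
            if present:
                world.append(edge)
        if reachable(world, source, target):
            total += weight
    return total
```

For this toy graph, a path from `a` to `c` exists either directly or through `b`, so the result equals 1 - (1 - 0.5)(1 - 0.8 * 0.6) = 0.74.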

Relevance:

20.00%

Publisher:

Abstract:

The trade of the financial analyst is a much-debated issue in today’s media. As a large part of investment analysis is conducted under the broker firms’ regime, the incentives of the financial analyst and the investor do not always align. The broker firm’s commercial incentives may be to maximise its commission from securities trading and underwriting fees. The purpose of this thesis is to extend our understanding of the work of a financial analyst, the incentives he faces and how these affect his actions. The first essay investigates how the economic significance of the coverage of a particular firm impacts the accuracy of analysts’ estimates. The hypothesis is that analysts put more effort into analysing firms with a relatively higher trading volume, as these firms usually yield higher commissions. The second essay investigates how analysts interpret new financial statement information. The essay shows that analysts underreact or overreact to prior reported earnings, depending on the short-term pattern in reported earnings. The third essay investigates the possible investment value of Finnish stock recommendations issued by sell-side analysts. It is established that consensus recommendations issued on Finnish stocks contain investment value. Further, the investment value of consensus recommendations improves significantly through the exclusion of recommendations issued by banks. The fourth essay investigates investors’ behaviour prior to financial analysts’ earnings forecast revisions. Lately, the financial press has reported cases where financial analysts warn their preferred clients of possible earnings forecast revisions. However, in the light of the empirical results, the problem of analysts leaking information to selected customers does not appear to be systematic on the Finnish stock market.

Relevance:

20.00%

Publisher:

Abstract:

Forecasting daily flow in the general planning of municipal water and sewage works.

Relevance:

10.00%

Publisher:

Abstract:

In this dissertation, I present an overall methodological framework for studying linguistic alternations, focusing specifically on lexical variation in denoting a single meaning, that is, synonymy. As the practical example, I employ the synonymous set of the four most common Finnish verbs denoting THINK, namely ajatella, miettiä, pohtia and harkita ‘think, reflect, ponder, consider’. As a continuation to previous work, I describe in considerable detail the extension of statistical methods from dichotomous linguistic settings (e.g., Gries 2003; Bresnan et al. 2007) to polytomous ones, that is, concerning more than two possible alternative outcomes. The applied statistical methods are arranged into a succession of stages with increasing complexity, proceeding from univariate via bivariate to multivariate techniques in the end. As the central multivariate method, I argue for the use of polytomous logistic regression and demonstrate its practical application to the studied phenomenon, thus extending the work by Bresnan et al. (2007), who applied simple (binary) logistic regression to a dichotomous structural alternation in English. The results of the various statistical analyses confirm that a wide range of contextual features across different categories are indeed associated with the use and selection of the studied THINK lexemes; however, a substantial proportion of these features is not exemplified in current Finnish lexicographical descriptions. The multivariate analysis results indicate that the semantic classifications of syntactic argument types are on average the most distinctive feature category, followed by overall semantic characterizations of the verb chains, and then syntactic argument types alone, with morphological features pertaining to the verb chain and extra-linguistic features relegated to the last position. 
In terms of overall performance of the multivariate analysis and modeling, the prediction accuracy seems to reach a ceiling at a Recall rate of roughly two-thirds of the sentences in the research corpus. The analysis of these results suggests a limit to what can be explained and determined within the immediate sentential context and applying the conventional descriptive and analytical apparatus based on currently available linguistic theories and models. The results also support Bresnan’s (2007) and others’ (e.g., Bod et al. 2003) probabilistic view of the relationship between linguistic usage and the underlying linguistic system, in which only a minority of linguistic choices are categorical, given the known context – represented as a feature cluster – that can be analytically grasped and identified. Instead, most contexts exhibit degrees of variation as to their outcomes, resulting in proportionate choices over longer stretches of usage in texts or speech.
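
A minimal sketch of how a polytomous logistic regression model turns contextual features into probabilities over the four lexemes follows. It assumes the multinomial (softmax) formulation, which is only one of several ways to set up a polytomous model, and the feature names and weights are hypothetical values chosen for illustration, not estimates from the dissertation.

```python
import math

LEXEMES = ["ajatella", "miettiä", "pohtia", "harkita"]

# Hypothetical weights for two made-up contextual features.
WEIGHTS = {
    "ajatella": {"intercept": 1.0, "human_agent": 0.5, "abstract_patient": -0.2},
    "miettiä": {"intercept": 0.3, "human_agent": 0.2, "abstract_patient": 0.4},
    "pohtia": {"intercept": 0.0, "human_agent": -0.3, "abstract_patient": 0.9},
    "harkita": {"intercept": -0.2, "human_agent": 0.1, "abstract_patient": 0.3},
}


def lexeme_probabilities(features):
    """Softmax over per-lexeme linear scores for one sentence context."""
    scores = {}
    for lexeme in LEXEMES:
        w = WEIGHTS[lexeme]
        scores[lexeme] = w["intercept"] + sum(w[f] * v for f, v in features.items())
    top = max(scores.values())  # subtract the maximum for numerical stability
    exp_scores = {k: math.exp(s - top) for k, s in scores.items()}
    norm = sum(exp_scores.values())
    return {k: e / norm for k, e in exp_scores.items()}
```

The output is a probability for each lexeme rather than a single categorical choice, which matches the probabilistic view of usage described in the abstract: most contexts favour one verb without excluding the others.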

Relevance:

10.00%

Publisher:

Abstract:

The general change in the population structure and its impacts on the forest ownership structure were investigated in the thesis. The research assumed that structural change in society affects the outlook of non-industrial private forest ownership. The changes in the structure of society considered were mainly restricted to population, education and occupation structures. The migration of the rural population into cities was also taken into consideration. The structural changes in both society and non-industrial private forest ownership were examined as phenomena, and their development since the middle of the 1970s was investigated. It could be established that the structural changes were mainly of the same kind in society as in the forest owner structure. The clearest similarities between the changes in population and forest owner structure were an increased mean age and decreases in the shares of the 18 to 39 age bracket, of those without a degree and of farmers. Furthermore, migration into cities had taken place among both the forest owners and the general population. The main part of the research concentrated on estimating regression models that explain the change in non-industrial private forest ownership by the structural change in the population. Panel data were gathered from population statistics and previous forest ownership research. The panel contained the years 1990 and 1999. With the panel data it was possible to estimate regression and fixed-effects models that explained the structural changes in non-industrial private forest ownership by the evolution of the whole population. The authorities' population forecasts were used as inputs when applying the estimated models. Only a few of the estimated models were statistically significant. This may be explained by the lack of a larger panel data set. 
In addition, the structural change of non-industrial forest ownership was forecast using trends.
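
Because the panel covers only two years, a fixed-effects model of the kind mentioned above reduces to a regression on first differences, which removes each unit's constant level. The sketch below shows that estimator for a single explanatory variable. The variable meanings and numbers are hypothetical, and a common time effect is deliberately left out to keep the example short.

```python
def fixed_effects_slope(panel):
    """Within (fixed-effects) slope for a two-period panel.

    `panel` maps a unit to ((x_1990, y_1990), (x_1999, y_1999)). With two
    periods, demeaning within units is equivalent to first differencing.
    """
    num = den = 0.0
    for (x0, y0), (x1, y1) in panel.values():
        dx, dy = x1 - x0, y1 - y0
        num += dx * dy
        den += dx * dx
    return num / den


# Hypothetical regions: x = mean age of the population,
# y = mean age of forest owners; each region has its own base level.
example_panel = {
    "region_1": ((40.0, 90.0), (42.0, 94.0)),
    "region_2": ((38.0, 70.0), (41.0, 76.0)),
}
slope = fixed_effects_slope(example_panel)
```

With only two observations per unit, little variation is left after differencing, which is consistent with the remark that few of the estimated models were statistically significant.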

Relevance:

10.00%

Publisher:

Abstract:

The aim of this thesis is to develop a fully automatic lameness detection system that operates in a milking robot. The instrumentation, measurement software, algorithms for data analysis and a neural network model for lameness detection were developed. Automatic milking has become a common practice in dairy husbandry, and in the year 2006 about 4000 farms worldwide used over 6000 milking robots. There is a worldwide movement with the objective of fully automating every process from feeding to milking. Increase in automation is a consequence of increasing farm sizes, the demand for more efficient production and the growth of labour costs. As the level of automation increases, the time that the cattle keeper uses for monitoring animals often decreases. This has created a need for systems for automatically monitoring the health of farm animals. The popularity of milking robots also offers a new and unique possibility to monitor animals in a single confined space up to four times daily. Lameness is a crucial welfare issue in the modern dairy industry. Limb disorders cause serious welfare, health and economic problems, especially in loose housing of cattle. Lameness causes losses in milk production and leads to early culling of animals. These costs could be reduced with early identification and treatment. At present, only a few methods for automatically detecting lameness have been developed, and the most common methods used for lameness detection and assessment are various visual locomotion scoring systems. The problem with locomotion scoring is that it requires experience to be conducted properly, it is labour intensive as an on-farm method and the results are subjective. A four-balance system for measuring the leg load distribution of dairy cows during milking in order to detect lameness was developed and set up at the University of Helsinki research farm Suitia. 
The leg weights of 73 cows were successfully recorded during almost 10,000 robotic milkings over a period of 5 months. The cows were locomotion scored weekly, and the lame cows were inspected clinically for hoof lesions. Unsuccessful measurements, caused by cows standing outside the balances, were removed from the data with a special algorithm, and the mean leg loads and the number of kicks during milking were calculated. In order to develop an expert system to automatically detect lameness cases, a model was needed. A probabilistic neural network (PNN) classifier model was chosen for the task. The data were divided into two parts, and 5,074 measurements from 37 cows were used to train the model. The operation of the model was evaluated for its ability to detect lameness in the validation dataset, which had 4,868 measurements from 36 cows. The model was able to classify 96% of the measurements correctly as sound or lame cows, and 100% of the lameness cases in the validation data were identified. The number of measurements causing false alarms was 1.1%. The developed model has the potential to be used for on-farm decision support and can be used in a real-time lameness monitoring system.
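
A probabilistic neural network is, in essence, a Parzen-window classifier: each class density is estimated by averaging Gaussian kernels centred on the training patterns of that class, and a new measurement is assigned to the class with the highest estimate. The sketch below illustrates the idea under assumptions of my own: equal class priors, a hand-picked kernel width, and made-up features giving the share of body weight on each leg. The features, data and parameters of the thesis model are not given in the abstract.

```python
import math


def pnn_classify(sample, training, sigma=0.05):
    """Assign `sample` to the class with the largest kernel density estimate."""
    best_label, best_score = None, -1.0
    for label, patterns in training.items():
        total = 0.0
        for pattern in patterns:
            dist2 = sum((a - b) ** 2 for a, b in zip(sample, pattern))
            total += math.exp(-dist2 / (2 * sigma ** 2))
        score = total / len(patterns)  # average kernel response for this class
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical training patterns: fraction of weight on each of the four legs.
TRAINING = {
    "sound": [(0.27, 0.27, 0.23, 0.23), (0.28, 0.26, 0.23, 0.23)],
    "lame": [(0.30, 0.30, 0.28, 0.12), (0.29, 0.31, 0.13, 0.27)],
}
```

A cow that unloads one leg produces a load vector close to the "lame" patterns and is flagged accordingly; the kernel width `sigma` controls how far that similarity extends.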

Relevance:

10.00%

Publisher:

Abstract:

Poor farmers have often been blamed for the environmental problems of developing countries. It has been claimed that the struggle for survival forces them to use land and other natural resources short-sightedly. Few studies on the subject, however, have supported this claim; the degree of household poverty and the environmental impact that households cause have not been successfully linked. To clarify the poverty-environment debate, Thomas Reardon and Steven Vosti developed the concept of investment poverty. It identifies the possibly large group of farming households that are not poor according to traditional poverty measures but whose welfare is not sufficiently above the poverty lines to allow the household to invest in more sustainable land use. Reardon and Vosti also emphasized the effect of assets on household welfare and believed that assets influence production and investment decisions. This study seeks to answer two questions: How can investment poverty be understood and measured? And what is the welfare-increasing effect of farming households' assets? For this study, 402 farming households were interviewed in Herrera Province of the Republic of Panama, in Central America. The welfare of these households was measured by their consumption, and local poverty lines were calculated from local food prices. In Herrera, a person needs on average 494 dollars per year to obtain adequate nutrition, or 876 dollars per year to cover other essential expenses in addition to food. Of the households studied, 15.4% fell below the food poverty line, that is, the extreme poverty line, and 33.6% were moderately poor: they attained adequate nutrition but could not afford their other basic needs. Thus 51% of the households studied were above both poverty lines. 
There are significant differences between these poverty groups, not only in household wealth, income and investment strategies, but also in household structure, living environment and access to services. Measuring investment poverty proved challenging. In Herrera, farmers do not make investments purely for environmental protection, nor could the sustainability of land use otherwise be linked to the level of household welfare. Investment poverty was therefore sought as the welfare level below which productive land-improvement investments are no longer directly proportional to welfare. Such investments include planted fences, fertilization and the cultivation of improved pasture types. It was found that if a household's welfare falls below 1,000 dollars per person per year, such productive land-improvement investments become very rare. The investment poverty line is thus about twice the cost of adequate nutrition, and 42.3% of the households studied were above it. Typically, in these households both spouses work, are highly educated and are active in their community, the farm is more productive, more demanding crops are grown, and the household has accumulated more assets than households living below the investment poverty line. This study questioned the common assumption that assets invariably benefit a farming household. The effect of assets on household welfare was therefore studied by examining the pathways through which the land, livestock, education and working-age members of a household could increase its welfare. These welfare mechanisms were also thought to depend on many intervening factors. For example, education could increase welfare if it led to better-paid work or to starting a business, but these mechanisms may be affected by, for instance, distance from towns or whether the household owns a vehicle. 
Among the poorest households, education was in fact the only form of asset studied that promoted household welfare, whereas land, livestock and labour did not help households rise out of poverty. Among wealthier households, by contrast, land and labour as well as education produced higher welfare, although depending on many intervening variables, such as production inputs. Assets therefore do not automatically improve household welfare. Although the rich generally have more livestock than the poor, no mechanism was found in this data set through which the amount of livestock would produce higher welfare for farming households. The strategies for accumulating and using assets also change as welfare grows, and they are affected by many external factors. The relationship between the environment and poverty thus remains unclear. In the long run, overcoming poverty requires that farming households rise above the investment poverty line. They could then afford to start accumulating assets and to invest in more sustainable land use. At present, however, that line is far out of reach for a large share of Herreran households. How can a household reach consumption of more than a thousand dollars per household member if its standard of living does not even reach adequate nutrition? And even if welfare did improve, environmental improvements are not necessarily to be expected if cattle herds grow and erosion-prone pastures spread.
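
The three thresholds reported in this abstract (494, 876 and roughly 1,000 dollars per person per year) define nested welfare groups. A minimal sketch of that classification follows. The group labels and the function name are assumptions made for illustration, and the study's actual procedure for locating the investment poverty line is not reproduced here.

```python
FOOD_LINE = 494         # USD per person per year: adequate nutrition
BASIC_NEEDS_LINE = 876  # USD per person per year: food plus other necessities
INVESTMENT_LINE = 1000  # approximate level below which land investments are rare


def poverty_group(consumption):
    """Classify a household by its yearly per-capita consumption in USD."""
    if consumption < FOOD_LINE:
        return "extremely poor"
    if consumption < BASIC_NEEDS_LINE:
        return "moderately poor"
    if consumption < INVESTMENT_LINE:
        return "not poor, but investment poor"
    return "above the investment poverty line"
```

The third group is the one the concept of investment poverty is meant to expose: households above both conventional poverty lines that still lack the margin to invest in land improvements.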

Relevance:

10.00%

Publisher:

Abstract:

The aim of this work was to assess the structure and use of the conceptual model of occlusion in operational weather forecasting. First, a survey was made of the conceptual model of occlusion as introduced to operational forecasters at the Finnish Meteorological Institute (FMI). In the same context, an overview was made of the use of the conceptual model in modern operational weather forecasting, especially in connection with the widespread use of numerical forecasts. In order to evaluate the features of occlusions in operational weather forecasting, all the occlusion processes occurring during 2003 over Europe and the North Atlantic area were investigated using the conceptual model of occlusion and the methods suggested at the FMI. The investigation yielded a classification of the occluded cyclones on the basis of the extent to which the conceptual model fitted the observed thermal structure. The seasonal and geographical distribution of the classes was inspected. Some relevant cases belonging to different classes were collected and analyzed in detail; in this deeper investigation, tools and techniques that are not routinely used in operational weather forecasting were adopted. Both the statistical investigation of the occluded cyclones during 2003 and the case studies revealed that the traditional classification of occlusion types on the basis of thermal structure does not take into account the wider variety of occlusion structures that can be observed. Moreover, the conceptual model of occlusion often turned out to be inadequate for describing well-developed cyclones. A deep and constructive revision of the conceptual model of occlusion is therefore suggested in light of the results obtained in this work. 
The revision should take into account both the progress being made in building a theoretical footing for the occlusion process and the recent tools and meteorological quantities that are now available.

Relevance:

10.00%

Publisher:

Abstract:

Whether a statistician wants to complement a probability model for observed data with a prior distribution and carry out fully probabilistic inference, or base the inference only on the likelihood function, may be a fundamental question in theory, but in practice it may well be of less importance if the likelihood contains much more information than the prior. Maximum likelihood inference can be justified as a Gaussian approximation at the posterior mode, using flat priors. However, in situations where parametric assumptions in standard statistical models would be too rigid, more flexible model formulation, combined with fully probabilistic inference, can be achieved using hierarchical Bayesian parametrization. This work includes five articles, all of which apply probability modeling to various problems involving incomplete observation. Three of the papers apply maximum likelihood estimation and two of them apply hierarchical Bayesian modeling. Because maximum likelihood may be presented as a special case of Bayesian inference, but not the other way round, in the introductory part of this work we present a framework for probability-based inference using only Bayesian concepts. We also re-derive some results presented in the original articles using the toolbox developed herein, to show that they are also justifiable under this more general framework. Here the assumption of exchangeability and de Finetti's representation theorem are applied repeatedly for justifying the use of standard parametric probability models with conditionally independent likelihood contributions. It is argued that this same reasoning can be applied also under sampling from a finite population. The main emphasis here is on probability-based inference under incomplete observation due to study design. This is illustrated using a generic two-phase cohort sampling design as an example. 
The alternative approaches presented for analysis of such a design are full likelihood, which utilizes all observed information, and conditional likelihood, which is restricted to a completely observed set, conditioning on the rule that generated that set. Conditional likelihood inference is also applied for a joint analysis of prevalence and incidence data, a situation subject to both left censoring and left truncation. Other topics covered are model uncertainty and causal inference using posterior predictive distributions. We formulate a non-parametric monotonic regression model for one or more covariates and a Bayesian estimation procedure, and apply the model in the context of optimal sequential treatment regimes, demonstrating that inference based on posterior predictive distributions is feasible also in this case.
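
The statement that maximum likelihood corresponds to a Gaussian approximation at the posterior mode under a flat prior can be checked on a simple case. The sketch below uses a binomial likelihood with a uniform Beta(1, 1) prior, an example chosen here for illustration rather than taken from the articles, and assumes the number of successes lies strictly between 0 and n.

```python
import math


def log_posterior(p, successes, n, a=1.0, b=1.0):
    """Unnormalized log posterior: binomial likelihood times a Beta(a, b) prior."""
    return (successes + a - 1) * math.log(p) + (n - successes + b - 1) * math.log(1 - p)


def gaussian_approximation(successes, n):
    """Mean and standard deviation of the Gaussian fitted at the posterior mode."""
    mode = successes / n                 # with a flat prior this is the MLE
    curvature = n / (mode * (1 - mode))  # minus the second derivative at the mode
    return mode, math.sqrt(1.0 / curvature)
```

With 30 successes out of 100 trials, the mode is 0.3 and the approximate standard deviation is sqrt(0.3 * 0.7 / 100), the familiar standard error of the maximum likelihood estimate.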

Relevance:

10.00%

Publisher:

Abstract:

Advances in analysis techniques have led to a rapid accumulation of biological data in databases. Such data are often in the form of sequences of observations, examples including DNA sequences and amino acid sequences of proteins. The scale and quality of the data promise answers to various biologically relevant questions in more detail than has been possible before. For example, one may wish to identify areas in an amino acid sequence that are important for the function of the corresponding protein, or investigate how characteristics on the level of DNA sequence affect the adaptation of a bacterial species to its environment. Many of the interesting questions are intimately associated with the understanding of the evolutionary relationships among the items under consideration. The aim of this work is to develop novel statistical models and computational techniques to meet the challenge of deriving meaning from the increasing amounts of data. Our main focus is on modeling the evolutionary relationships based on the observed molecular data. We operate within a Bayesian statistical framework, which allows a probabilistic quantification of the uncertainties related to a particular solution. As the basis of our modeling approach we utilize a partition model, which is used to describe the structure of data by appropriately dividing the data items into clusters of related items. Generalizations and modifications of the partition model are developed and applied to various problems. Large-scale data sets also pose a computational challenge. The models used to describe the data must be realistic enough to capture the essential features of the current modeling task but, at the same time, simple enough to make it possible to carry out the inference in practice. The partition model fulfills these two requirements. The problem-specific features can be taken into account by modifying the prior probability distributions of the model parameters. 
The computational efficiency stems from the ability to integrate out the parameters of the partition model analytically, which enables the use of efficient stochastic search algorithms.
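
Integrating the cluster parameters out analytically means that any candidate partition can be scored in closed form, which is what makes stochastic search over partitions practical. The sketch below shows this for the simplest conjugate case, binary observations with a Beta prior on each cluster's Bernoulli parameter. That case is an assumption made here for brevity; the models in the thesis are richer.

```python
from math import exp, lgamma


def log_beta(a, b):
    """Logarithm of the Beta function."""
    return lgamma(a) + lgamma(b) - lgamma(a + b)


def log_marginal_cluster(values, a=1.0, b=1.0):
    """Log marginal likelihood of binary data in one cluster, with the
    Bernoulli parameter integrated out against a Beta(a, b) prior."""
    ones = sum(values)
    zeros = len(values) - ones
    return log_beta(a + ones, b + zeros) - log_beta(a, b)


def log_marginal_partition(partition):
    """Clusters are independent given the partition, so their scores add."""
    return sum(log_marginal_cluster(cluster) for cluster in partition)
```

A search algorithm only needs to recompute the terms of the clusters it modifies, for example when moving one item between two clusters, and can compare the resulting scores directly.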

Relevance:

10.00%

Publisher:

Abstract:

Genetics, the science of heredity and variation in living organisms, has a central role in medicine, in breeding crops and livestock, and in studying fundamental topics of biological sciences such as evolution and cell functioning. Currently the field of genetics is developing rapidly because of recent advances in the technologies by which molecular data can be obtained from living organisms. To extract the most information from such data, the analyses need to be carried out using statistical models that are tailored to the particular genetic processes. In this thesis we formulate and analyze Bayesian models for genetic marker data of contemporary individuals. The major focus is on the modeling of the unobserved recent ancestry of the sampled individuals (say, for tens of generations or so), which is carried out by using explicit probabilistic reconstructions of the pedigree structures accompanied by the gene flows at the marker loci. For such a recent history, the recombination process is the major genetic force that shapes the genomes of the individuals, and it is included in the model by assuming that the recombination fractions between the adjacent markers are known. The posterior distribution of the unobserved history of the individuals is studied conditionally on the observed marker data by using a Markov chain Monte Carlo (MCMC) algorithm. The example analyses consider estimation of the population structure, relatedness structure (both at the level of whole genomes and at each marker separately), and haplotype configurations. For situations where the pedigree structure is partially known, an algorithm to create an initial state for the MCMC algorithm is given. Furthermore, the thesis includes an extension of the model for the recent genetic history to situations where a quantitative phenotype has also been measured from the contemporary individuals. 
In that case the goal is to identify positions on the genome that affect the observed phenotypic values. This task is carried out within the Bayesian framework, where the number and the relative effects of the quantitative trait loci are treated as random variables whose posterior distribution is studied conditionally on the observed genetic and phenotypic data. In addition, the thesis contains an extension of a widely-used haplotyping method, the PHASE algorithm, to settings where genetic material from several individuals has been pooled together, and the allele frequencies of each pool are determined in a single genotyping.
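
The role of a known recombination fraction between adjacent markers can be seen in the smallest possible case: a parent carrying haplotypes AB and ab at two markers. The sketch below gives the probabilities of the four gametes that parent can transmit. The allele labels and function name are illustrative, and the thesis model chains such transitions along many markers and through a whole pedigree.

```python
def gamete_probabilities(r):
    """Gamete distribution for a parent with haplotypes AB and ab, given the
    recombination fraction r between the two markers."""
    if not 0.0 <= r <= 0.5:
        raise ValueError("recombination fraction must lie in [0, 0.5]")
    return {
        "AB": (1 - r) / 2,  # non-recombinant
        "ab": (1 - r) / 2,  # non-recombinant
        "Ab": r / 2,        # recombinant
        "aB": r / 2,        # recombinant
    }
```

At r = 0.5 the markers are unlinked and all four gametes are equally likely; as r approaches 0 the parental haplotypes are passed on almost intact.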