23 results for Probabilities.
Abstract:
In this paper, we re-examine the relationship between overweight and labour market success, using indicators of individual body composition along with BMI (Body Mass Index). We use a dataset from Finland in which weight, height, fat mass and waist circumference are not self-reported but measured as part of an overall health examination. We find that waist circumference, but not weight or fat mass, has a negative effect on wages for women, whereas all measures of obesity have negative effects on women’s employment probabilities. For men, the only obesity measure that is significant for employment probabilities is fat mass. One interpretation of our findings is that the negative effects of overweight on wages run through the discrimination channel, but that the negative effects of overweight on employment have more to do with ill health. All in all, measures of body composition provide a more refined picture of the effects of obesity on wages and employment.
Abstract:
Bootstrap likelihood ratio tests of cointegration rank are commonly used because they tend to have rejection probabilities that are closer to the nominal level than the rejection probabilities of the corresponding asymptotic tests. The effect of bootstrapping the test on its power is largely unknown. We show that a new computationally inexpensive procedure can be applied to the estimation of the power function of the bootstrap test of cointegration rank. The bootstrap test is found to have a power function close to that of the level-adjusted asymptotic test. The bootstrap test estimates the level-adjusted power of the asymptotic test highly accurately. The bootstrap test may have low power to reject the null hypothesis of cointegration rank zero, or underestimate the cointegration rank. An empirical application to Euribor interest rates is provided as an illustration of the findings.
Abstract:
The thesis presents a state-space model for a basketball league and a Kalman filter algorithm for the estimation of the state of the league. In the state-space model, each of the basketball teams is associated with a rating that represents its strength compared to the other teams. The ratings are assumed to evolve in time following a stochastic process with independent Gaussian increments. The estimation of the team ratings is based on the observed game scores, which are assumed to depend linearly on the true strengths of the teams and on independent Gaussian noise. The team ratings are estimated using a recursive Kalman filter algorithm that produces least squares optimal estimates for the team strengths and predictions for the scores of future games. Additionally, if the Gaussianity assumption holds, the predictions given by the Kalman filter maximize the likelihood of the observed scores. The team ratings allow probabilistic inference about the ranking of the teams and their relative strengths as well as about the teams’ winning probabilities in future games. The predictions about the winners of the games are correct 65-70% of the time. The team ratings explain 16% of the random variation observed in the game scores. Furthermore, the winning probabilities given by the model are consistent with the observed scores. The state-space model includes four independent parameters that involve the variances of the noise terms and the home court advantage observed in the scores. The thesis presents the estimation of these parameters using the maximum likelihood method as well as other techniques. The thesis also gives various example analyses related to the American professional basketball league, i.e., the National Basketball Association (NBA), and the regular seasons played in the years 2005 through 2010. Additionally, the season 2009-2010 is discussed in full detail, including the playoffs.
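The recursive rating update described in the abstract above can be illustrated with a minimal sketch. It assumes that each game yields one scalar observation (the home team's score margin), that ratings follow a random walk, and that the observation equals the rating difference plus a home advantage plus Gaussian noise. The function names and the default values of `h`, `q` and `r` are hypothetical and are not taken from the thesis.

```python
import math

def kalman_game_update(x, P, home, away, margin, h=3.0, q=0.01, r=140.0):
    """One predict-and-update step of a Kalman filter for a single game.

    x: list of team ratings, P: rating covariance (list of lists).
    margin: home score minus away score.
    h, q, r: hypothetical home advantage, process variance and
    observation noise variance (assumed values, not from the thesis).
    """
    n = len(x)
    P = [row[:] for row in P]  # work on a copy, leave the input untouched
    # Predict: ratings follow a random walk, so only the covariance grows.
    for k in range(n):
        P[k][k] += q
    # Observation row H = e_home - e_away, hence P H^T is a column difference.
    pht = [P[k][home] - P[k][away] for k in range(n)]
    s = pht[home] - pht[away] + r  # innovation variance H P H^T + r
    innovation = margin - (x[home] - x[away] + h)
    gain = [v / s for v in pht]  # Kalman gain
    x_new = [x[k] + gain[k] * innovation for k in range(n)]
    P_new = [[P[a][b] - gain[a] * pht[b] for b in range(n)] for a in range(n)]
    return x_new, P_new

def win_probability(x, P, home, away, h=3.0, r=140.0):
    """Probability that the home margin is positive under the Gaussian model."""
    mean = x[home] - x[away] + h
    var = P[home][home] + P[away][away] - 2.0 * P[home][away] + r
    return 0.5 * (1.0 + math.erf(mean / math.sqrt(2.0 * var)))
```

Calling `kalman_game_update` once per game, in chronological order, gives a running estimate of every team's rating, and `win_probability` turns the current estimate into a prediction for an upcoming game.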
Abstract:
Vegetation maps and bioclimatic zone classifications communicate the vegetation of an area and are used to explain how the environment regulates the occurrence of plants on large scales. Many practices and methods for dividing the world’s vegetation into smaller entities have been presented. Climatic parameters, floristic characteristics, or edaphic features have been relied upon as decisive factors, and plant species have been used as indicators for vegetation types or zones. Systems depicting vegetation patterns that mainly reflect climatic variation are termed ‘bioclimatic’ vegetation maps. Based on these it has been judged logical to deduce that plants moved between corresponding bioclimatic areas should thrive in the target location, whereas plants moved from a different zone should languish. This principle is routinely applied in forestry and horticulture, but actual tests of the validity of bioclimatic maps in this sense seem scarce. In this study I tested the Finnish bioclimatic vegetation zone system (BZS). Relying on the Kumpula collection of Helsinki University Botanic Garden, which according to the BZS is situated at the northern limit of the hemiboreal zone, I aimed to test how the plants’ survival depends on their provenance. My expectation was that plants from the hemiboreal or southern boreal zones should do best in Kumpula, whereas plants from more southern and more northern zones should show progressively lower survival probabilities. I estimated the probability of survival using logistic regression models and collection database information on plant accessions of known wild origin grown in Kumpula since the mid 1990s. The total number of accessions I included in the analyses was 494. Because of problems with some accessions I chose to separately analyse a subset of the complete data, which included 379 accessions.
I also analysed different growth forms separately in order to identify differences in probability of survival due to different life strategies. In most analyses accessions of temperate and hemiarctic origin showed lower survival probability than those originating from any of the boreal subzones, which all exhibited similarly high probabilities. Exceptionally mild and wet winters during the study period may have killed off hemiarctic plants. Some winters may have been too harsh for temperate accessions. Trees behaved differently: they showed an almost steadily increasing survival probability from temperate to northern boreal origins. Various factors that could not be controlled for may have affected the results, some of which were difficult to interpret. This was the case in particular with herbs, for which the reliability of the analysis suffered because of difficulties in managing their curatorial data. In all, the results gave some support to the BZS, and especially its hierarchical zonation. However, I question the validity of the formulation of the hypothesis I tested since it may not be entirely justified by the BZS, which was designed for intercontinental comparison of vegetation zones, but not specifically for transcontinental provenance trials. I conclude that botanic gardens should pay due attention to information management and curatorial practices to ensure the widest possible applicability of their plant collections.
Abstract:
Questions of the small size of non-industrial private forest (NIPF) holdings in Finland are considered and factors affecting their partitioning are analyzed. This work arises out of Finnish forest policy statements in which the small average size of holdings has been seen to have a negative influence on the economics of forestry. A survey of the literature indicates that the size of holdings is an important factor determining the costs of logging and silvicultural operations, while its influence on the timber supply is slight. The empirical data are based on a sample of 314 holdings collected by interviewing forest owners in the years 1980-86. In 1990-91 the same holdings were resurveyed by means of a postal inquiry and partly by interviewing forest owners. The principal objective in compiling the data is to assist in quantifying ownership factors that influence partitioning among different kinds of NIPF holdings. Thus the mechanisms of partitioning were described and a maximum likelihood logistic regression model was constructed using seven independent holding and ownership variables. One out of four holdings had undergone partitioning in conjunction with a change in ownership: one fifth of family-owned holdings and nearly a half of jointly owned holdings. The results of the logistic regression model indicate, for instance, that the odds of partitioning are about three times greater for jointly owned holdings than for family-owned ones. The probabilities of partitioning were also estimated, and the impact of independent dichotomous variables on the probability of partitioning ranged between 0.02 and 0.10. The low value of the Hosmer-Lemeshow test statistic indicates a good fit of the model, and the rate of correct classification was estimated to be 88 per cent with a cutoff point of 0.5. The average size of holdings undergoing ownership changes decreased from 29.9 ha to 28.7 ha over the approximate interval 1983-90.
In addition, the transition probability matrix showed that the trends towards smaller size categories mostly involved holdings in the small size categories, those of less than 20 ha. The results of the study can be used in considering the effects of the small size of holdings on forestry, and when the aim is to influence partitioning through forest or rural policy.
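To make the relation between an odds ratio and a change in probability concrete, here is a small sketch. It takes a baseline probability and multiplies the corresponding odds by a given odds ratio. The inputs in the usage note (a baseline of 0.20, the share of partitioned family-owned holdings quoted in the abstract, and an odds ratio of 3) are used only for illustration; the abstract's model adjusts for other variables, so its fitted probabilities need not match this calculation. The function name is hypothetical.

```python
def apply_odds_ratio(p0, odds_ratio):
    """Return the probability obtained by scaling the odds of p0 by odds_ratio."""
    odds0 = p0 / (1.0 - p0)      # convert probability to odds
    odds1 = odds0 * odds_ratio   # apply the multiplicative effect
    return odds1 / (1.0 + odds1) # convert odds back to probability
```

For example, `apply_odds_ratio(0.20, 3.0)` turns odds of 0.25 into odds of 0.75, which corresponds to a probability of 3/7, or roughly 0.43. An odds ratio of about three therefore does not mean a threefold probability.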
Abstract:
Modern sample surveys started to spread after statisticians at the U.S. Bureau of the Census developed a sampling design for the Current Population Survey (CPS) in the 1940s. Another significant factor was that digital computers became available to statisticians. At the beginning of the 1950s, the theory was documented in textbooks on survey sampling. This thesis is about the development of statistical inference for sample surveys. The idea of statistical inference was first enunciated by the French scientist P. S. Laplace. In 1781, he published a plan for a partial investigation in which he determined the sample size needed to reach the desired accuracy in estimation. The plan was based on Laplace's Principle of Inverse Probability and on his derivation of the Central Limit Theorem. They were published in a memoir in 1774, which is one of the origins of statistical inference. Laplace's inference model was based on Bernoulli trials and binomial probabilities. He assumed that populations were changing constantly. This was represented by assuming a priori distributions for parameters. Laplace's inference model dominated statistical thinking for a century. Sample selection in Laplace's investigations was purposive. At the International Statistical Institute meeting in 1894, the Norwegian Anders Kiaer presented the idea of the Representative Method for drawing samples. The idea was that the sample would be a miniature of the population, and it still prevails. The virtues of random sampling were known, but practical problems of sample selection and data collection hindered its use. Arthur Bowley realized the potential of Kiaer's method and at the beginning of the 20th century carried out several surveys in the UK. He also developed the theory of statistical inference for finite populations. It was based on Laplace's inference model. R. A. Fisher's contributions in the 1920s constitute a watershed in statistical science. He revolutionized the theory of statistics.
In addition, he introduced a new statistical inference model which is still the prevailing paradigm. The essential ideas are that samples are drawn repeatedly from the same population and that population parameters are constants. Fisher's theory did not include a priori probabilities. Jerzy Neyman adopted Fisher's inference model and applied it to finite populations, with the difference that Neyman's inference model does not include any assumptions about the distributions of the study variables. Applying Fisher's fiducial argument, he developed the theory of confidence intervals. Neyman's last contribution to survey sampling presented a theory for double sampling. This gave statisticians at the U.S. Census Bureau the central idea for developing the complex survey design for the CPS. An important criterion was to have a method in which the costs of data collection were acceptable and which provided approximately equal interviewer workloads, in addition to sufficient accuracy in estimation.
Abstract:
Understorey trees that develop beneath even-aged forest stands matter for timber harvesting, forest regeneration, visibility and landscape analyses, and the assessment of biodiversity and carbon balance. Airborne laser scanning has proven to be an efficient remote sensing method for measuring mature tree stands. The adoption of laser scanning in operational forest planning makes it possible to produce more accurate information on the understorey than before, provided that understorey characteristics can be interpreted from laser data. This study used accurately measured field plots and discrete-return LiDAR data from several years (flying altitude 1–2 km, 0.9–9.7 pulses per m²). The laser scanning data had been acquired with Optech ALTM3100 and Leica ALS50-II sensors. The plots represent Finnish even-aged pine stands at different stages of development. The research questions were: 1) What is the laser signal obtained from the understorey like at the level of a single pulse, and which factors affect the signal? 2) What is the explanatory power of the area-based laser features used in practical applications in predicting the characteristics of understorey trees? A particular aim was to determine how the energy that a laser pulse loses to the upper canopy layers affects the received signal, and whether the intensity of laser echoes can be corrected for these energy losses. Differences in echo intensity between tree species were small and varied from one scan to another. The potential of intensity for interpreting the tree species of the understorey is therefore very limited. Energy losses to the upper canopy layers introduced noise into the laser signal obtained from the understorey. The energy loss correction was applied to the second and third echoes of laser pulses returned from the understorey. The correction reduced the within-target variation in intensity and improved the classification accuracy of targets in the understorey layer. With second echoes, the percentage of correct classifications between ground and the most common tree species was 49.2–54.9% before the correction and 57.3–62.0% after it. The corresponding kappa values were 0.03–0.13 and 0.10–0.22. The most important factor explaining the energy losses was the intensity of the earlier echoes from the same pulse, although the geometry of the pulse's intersection with trees in the upper canopy layer also had a minor effect. Classification accuracy also improved for third echoes. Tree species differed in how readily they produce an echo when a laser pulse hits the tree. Spruce produced an echo with a higher probability than broadleaved trees. This difference was especially clear for pulses that had undergone energy losses. Features of the echo height distribution may therefore depend on tree species. Clear differences in intensity distributions were observed between sensors, which complicates combining data acquired with different sensors. Echo probabilities also differed somewhat between sensors, which caused small differences in the echo height distributions. Among the area-based laser features, some explained the stem number and mean height of the understorey well when the analysis was restricted to trees taller than 1 m. The explanatory power of the features was better for stem number than for mean height. The explanatory power did not decrease markedly as pulse density decreased, which is encouraging for practical applications. The proportion of broadleaved trees could not be explained. The results suggest that discrete-return laser scanning could be used, for example, to assess the need for pre-harvest clearing. More detailed classification of the understorey (e.g. tree species interpretation), however, may be difficult. The very smallest understorey trees cannot be detected. Further studies are needed to generalize the results to different kinds of stands.
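The abstract above reports classification accuracy together with kappa values. As a reminder of how the two relate, the following sketch computes Cohen's kappa from a confusion matrix. The function name and the matrix in the usage note are hypothetical and are not data from the study.

```python
def cohens_kappa(confusion):
    """Cohen's kappa for a square confusion matrix (rows: truth, columns: prediction)."""
    k = len(confusion)
    total = sum(sum(row) for row in confusion)
    # Observed agreement: share of cases on the diagonal.
    observed = sum(confusion[i][i] for i in range(k)) / total
    # Chance agreement: product of row and column marginals, summed over classes.
    expected = sum(
        sum(confusion[i]) * sum(row[i] for row in confusion) for i in range(k)
    ) / (total * total)
    return (observed - expected) / (1.0 - expected)
```

With a hypothetical balanced two-class matrix `[[30, 20], [20, 30]]`, accuracy is 60% while chance agreement is 50%, so kappa is about 0.2. Accuracies near 60% in a two-class problem therefore go together with low kappa values.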
Abstract:
Bayesian networks are compact, flexible, and interpretable representations of a joint distribution. When the network structure is unknown but there are observational data at hand, one can try to learn the network structure. This is called structure discovery. This thesis contributes to two areas of structure discovery in Bayesian networks: space-time tradeoffs and learning ancestor relations. The fastest exact algorithms for structure discovery in Bayesian networks are based on dynamic programming and use excessive amounts of space. Motivated by the space usage, several schemes for trading space against time are presented. These schemes are presented in a general setting for a class of computational problems called permutation problems; structure discovery in Bayesian networks is seen as a challenging variant of the permutation problems. The main contribution in the area of space-time tradeoffs is the partial order approach, in which the standard dynamic programming algorithm is extended to run over partial orders. In particular, a certain family of partial orders called parallel bucket orders is considered. A partial order scheme that provably yields an optimal space-time tradeoff within parallel bucket orders is presented. Practical issues concerning parallel bucket orders are also discussed. Learning ancestor relations, that is, directed paths between nodes, is motivated by the need for robust summaries of the network structures when there are unobserved nodes at work. Ancestor relations are nonmodular features and hence learning them is more difficult than learning modular features. A dynamic programming algorithm is presented for computing posterior probabilities of ancestor relations exactly. Empirical tests suggest that ancestor relations can be learned from observational data almost as accurately as arcs even in the presence of unobserved nodes.
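The space usage mentioned in the abstract above can be seen in a minimal sketch of the subset-based dynamic programming commonly used for exact structure discovery. This is not the thesis's partial order scheme. It assumes a decomposable score supplied by the caller through a hypothetical function `local_score(v, parents)`, where `parents` is a bitmask of parent nodes, and it stores a table entry for every subset of the n nodes.

```python
def best_network_score(n, local_score):
    """Maximum total score of a DAG on n nodes under a decomposable score.

    local_score(v, parents) is a caller-supplied (hypothetical) function
    giving the score of node v with the parent set encoded as a bitmask.
    """
    size = 1 << n
    # best_local[v][c]: best score of v with parents chosen inside candidate set c.
    best_local = [[float("-inf")] * size for _ in range(n)]
    for v in range(n):
        for mask in range(size):
            if mask >> v & 1:
                continue  # a node cannot be its own parent
            value = local_score(v, mask)
            for u in range(n):
                if mask >> u & 1:
                    # Smaller masks were filled earlier in the loop.
                    value = max(value, best_local[v][mask ^ (1 << u)])
            best_local[v][mask] = value
    # best[s]: best score of a network on node subset s. The last node (sink)
    # of some ordering of s picks its parents from the remaining nodes.
    best = [float("-inf")] * size
    best[0] = 0.0
    for mask in range(1, size):
        for v in range(n):
            if mask >> v & 1:
                rest = mask ^ (1 << v)
                best[mask] = max(best[mask], best[rest] + best_local[v][rest])
    return best[size - 1]
```

Both tables grow as 2^n, which is the memory cost that motivates trading space against time. Recovering the network itself would additionally require remembering the maximizing choices, which this sketch omits.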