982 results for Probability Distribution
Abstract:
Professor Sir David R. Cox (DRC) is widely acknowledged as among the most important scientists of the second half of the twentieth century. He inherited the mantle of statistical science from Pearson and Fisher, advanced their ideas, and translated statistical theory into practice so as to forever change the application of statistics in many fields, but especially biology and medicine. The logistic and proportional hazards models he substantially developed are arguably among the most influential biostatistical methods in current practice. This paper looks forward over the period from DRC's 80th to 90th birthdays to speculate about the future of biostatistics, drawing lessons from DRC's contributions along the way. We consider "Cox's model" (CM) of biostatistics, an approach to statistical science that: formulates scientific questions or quantities in terms of parameters gamma in probability models f(y; gamma) that represent, in a parsimonious fashion, the underlying scientific mechanisms (Cox, 1997); partitions the parameters gamma = (theta, eta) into a subset of interest theta and other "nuisance parameters" eta necessary to complete the probability distribution (Cox and Hinkley, 1974); develops methods of inference about the scientific quantities that depend as little as possible upon the nuisance parameters (Barndorff-Nielsen and Cox, 1989); and thinks critically about the appropriate conditional distribution on which to base inferences. We briefly review exciting biomedical and public health challenges that are capable of driving statistical developments in the next decade. We discuss the statistical models and model-based inferences central to the CM approach, contrasting them with computationally intensive strategies for prediction and inference advocated by Breiman and others (e.g. Breiman, 2001) and with more traditional design-based methods of inference (Fisher, 1935). We discuss the hierarchical (multi-level) model as an example of the future challenges and opportunities for model-based inference. We then consider the role of conditional inference, a second key element of the CM. Recent examples from genetics are used to illustrate these ideas. Finally, the paper examines causal inference and statistical computing, two other topics we believe will be central to biostatistics research and practice in the coming decade. Throughout the paper, we attempt to indicate how DRC's work and the "Cox Model" have set a standard of excellence to which all can aspire in the future.
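As an illustration of the parameter partition described in this abstract (standard textbook notation, not reproduced from the paper), the proportional hazards model separates a finite-dimensional parameter of interest from an infinite-dimensional nuisance baseline hazard, and the partial likelihood depends on the former alone:

```latex
% Proportional hazards: gamma = (theta, eta), with theta = beta (of interest)
% and eta = the baseline hazard lambda_0(.) (an infinite-dimensional nuisance).
\lambda(t \mid x) = \lambda_0(t)\, \exp(\beta^\top x)
% Partial likelihood over subjects with observed events (delta_i = 1) and
% risk sets {j : t_j >= t_i}; it does not involve lambda_0:
L_p(\beta) = \prod_{i:\ \delta_i = 1}
  \frac{\exp(\beta^\top x_i)}{\sum_{j:\ t_j \ge t_i} \exp(\beta^\top x_j)}
```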
Abstract:
In many applications the observed data can be viewed as a censored high-dimensional full-data random variable X. By the curse of dimensionality it is typically not possible to construct estimators that are asymptotically efficient at every probability distribution in a semiparametric censored-data model of such a high-dimensional censored data structure. We provide a general method for the construction of one-step estimators that are efficient at a chosen submodel of the full-data model, are still well behaved off this submodel, and can be chosen to always improve on a given initial estimator. These one-step estimators rely on good estimators of the censoring mechanism and thus will require a parametric or semiparametric model for the censoring mechanism. We present a general theorem that provides a template for proving the desired asymptotic results. We illustrate the general one-step estimation methods by constructing locally efficient one-step estimators of marginal distributions and regression parameters with right-censored data, current status data, and bivariate right-censored data, in all models allowing the presence of time-dependent covariates. The conditions of the asymptotics theorem are rigorously verified in one of the examples, and the key condition of the general theorem is verified for all examples.
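As a minimal sketch of the one-step idea (generic semiparametric-efficiency notation, not the paper's own derivation), an initial estimator is updated by the empirical mean of an estimated influence curve that is efficient at the chosen submodel:

```latex
% One-step update: hat(theta)_n^0 is the initial estimator, hat(IC) an estimated
% influence curve (efficient at the chosen submodel), hat(eta)_n the estimated
% censoring/nuisance mechanism, and the sum is the empirical mean over the data Y_i.
\hat{\theta}^{1}_{n} \;=\; \hat{\theta}^{0}_{n} \;+\;
  \frac{1}{n} \sum_{i=1}^{n} \widehat{IC}\!\left(Y_i;\ \hat{\theta}^{0}_{n}, \hat{\eta}_{n}\right)
```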
Analysis of spring break-up and its effects on a biomass feedstock supply chain in northern Michigan
Abstract:
Demand for bio-fuels is expected to increase, due to rising prices of fossil fuels and concerns over greenhouse gas emissions and energy security. The overall cost of biomass energy generation is primarily related to biomass harvesting activity, transportation, and storage. With a commercial-scale cellulosic ethanol processing facility in Kinross Township of Chippewa County, Michigan, about to be built, models, including a simulation model and an optimization model, have been developed to provide decision support for the facility. Both models track cost, emissions and energy consumption. While the optimization model provides guidance for a long-term strategic plan, the simulation model aims to present detailed output for specified operational scenarios over an annual period. Most importantly, the simulation model considers the uncertainty of spring break-up timing, i.e., seasonal road restrictions. Spring break-up timing is important because it will impact the feasibility of harvesting activity and the duration of transportation restrictions, which significantly changes the availability of feedstock for the processing facility. This thesis focuses on the statistical model of spring break-up used in the simulation model. Spring break-up timing depends on various factors, including temperature, road conditions and soil type, as well as individual decision making processes at the county level. The spring break-up model, based on the historical spring break-up data from 27 counties over the period of 2002-2010, starts by specifying the probability distribution of a particular county's spring break-up start day and end day, and then relates the spring break-up timing of the other counties in the harvesting zone to the first county. In order to estimate the dependence relationship between counties, regression analyses, including standard linear regression and reduced major axis regression, are conducted. Using realizations (scenarios) of spring break-up generated by the statistical spring break-up model, the simulation model is able to probabilistically evaluate different harvesting and transportation plans to help the bio-fuel facility select the most effective strategy. For early spring break-up, which usually indicates a longer than average break-up period, more log storage is required, total cost increases, and the probability of plant closure increases. The risk of plant closure may be partially offset through increased use of rail transportation, which is not subject to spring break-up restrictions. However, rail availability and rail yard storage may then become limiting factors in the supply chain. Rail use will impact total cost, energy consumption, system-wide CO2 emissions, and the reliability of providing feedstock to the bio-fuel processing facility.
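A minimal sketch of reduced major axis regression, one of the two regression techniques mentioned above; the county start-day values are hypothetical stand-ins for the 2002-2010 data, which are not reproduced here:

```python
import numpy as np

def rma_regression(x, y):
    """Reduced major axis (geometric mean) regression of y on x.

    The slope is sign(r) * sd(y) / sd(x); the line passes through the means.
    """
    x, y = np.asarray(x, float), np.asarray(y, float)
    r = np.corrcoef(x, y)[0, 1]
    slope = np.sign(r) * y.std(ddof=1) / x.std(ddof=1)
    intercept = y.mean() - slope * x.mean()
    return slope, intercept

# Hypothetical example: relate another county's break-up start day (day of year)
# to the reference county's start day across nine observed springs.
ref_start = np.array([68, 75, 72, 80, 70, 77, 74, 69, 79])
other_start = np.array([70, 78, 74, 83, 71, 80, 76, 72, 81])
slope, intercept = rma_regression(ref_start, other_start)
print(f"start_other ~ {intercept:.1f} + {slope:.2f} * start_ref")
```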
Abstract:
Let P be a probability distribution on q-dimensional space. The so-called Diaconis-Freedman effect means that for a fixed dimension d much smaller than q, most d-dimensional projections of P look like scale mixtures of spherically symmetric Gaussian distributions. The present paper provides necessary and sufficient conditions for this phenomenon in a suitable asymptotic framework with increasing dimension q. It turns out that the conditions formulated by Diaconis and Freedman (1984) are not only sufficient but necessary as well. Moreover, letting P̂ be the empirical distribution of n independent random vectors with distribution P, we investigate the behavior of the empirical process √n(P̂ − P) under random projections, conditional on P̂.
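The effect itself can be illustrated numerically; the following sketch (not from the paper) projects clearly non-Gaussian high-dimensional data onto a uniformly random direction and checks how close the projection is to a fitted normal distribution:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# High-dimensional, clearly non-Gaussian data: q i.i.d. exponential coordinates.
n, q = 2000, 500
X = rng.exponential(scale=1.0, size=(n, q))
X -= X.mean(axis=0)  # center so projections have mean ~0

# A "typical" one-dimensional projection: a uniformly random unit vector.
u = rng.standard_normal(q)
u /= np.linalg.norm(u)
proj = X @ u

# Compare the projected sample with a normal distribution fitted by moments.
ks = stats.kstest(proj, "norm", args=(proj.mean(), proj.std(ddof=1)))
print(f"KS distance of the random projection from normality: {ks.statistic:.3f}")
```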
Abstract:
Statistical physicists assume a probability distribution over micro-states to explain thermodynamic behavior. The question of this paper is whether these probabilities are part of a best system and can thus be interpreted as Humean chances. I consider two strategies, viz. a globalist one, as suggested by Loewer, and a localist one, as advocated by Frigg and Hoefer. Both strategies fail because the systems they are part of have rivals that are roughly equally good, while ontic probabilities should be part of a clearly winning system. I conclude with the diagnosis that well-defined micro-probabilities underestimate the robust character of explanations in statistical physics.
Abstract:
Statistical physicists assume a probability distribution over micro-states to explain thermodynamic behavior. The question of this paper is whether these probabilities are part of a best system and can thus be interpreted as Humean chances. I consider two Boltzmannian accounts of the Second Law, viz. a globalist and a localist one. In both cases, the probabilities fail to be chances because they have rivals that are roughly equally good. I conclude with the diagnosis that well-defined micro-probabilities underestimate the robust character of explanations in statistical physics.
Abstract:
In this paper, we extend the debate concerning Credit Default Swap valuation to include time-varying correlations and covariances. Traditional multivariate techniques treat the correlations between covariates as constant over time; however, this view is not supported by the data. Secondly, since financial data do not follow a normal distribution because of their heavy tails, modeling the data using a Generalized Linear Model (GLM) incorporating copulas emerges as a more robust technique than traditional approaches. This paper also includes an empirical analysis of the regime-switching dynamics of credit risk in the presence of liquidity by following the general practice of assuming that credit and market risk follow a Markov process. The study was based on Credit Default Swap data obtained from Bloomberg that spanned the period January 1, 2004 to August 8, 2006. The empirical examination of the regime-switching tendencies provided quantitative support to the anecdotal view that liquidity decreases as credit quality deteriorates. The analysis also examined the joint probability distribution of the credit risk determinants across credit quality through the use of a copula function, which disaggregates the behavior embedded in the marginal gamma distributions so as to isolate the level of dependence captured in the copula function. The results suggest that the time-varying joint correlation matrix performed far better than the constant correlation matrix, the centerpiece of linear regression models.
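A minimal sketch of the copula construction described above, assuming a Gaussian copula and gamma marginals with hypothetical parameters (the paper's specific copula family, parameters, and data are not reproduced):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)

# Assumed illustration: two credit-risk determinants with gamma marginals,
# joined by a Gaussian copula with correlation rho (all parameters hypothetical).
rho = 0.6
shape1, scale1 = 2.0, 1.5   # gamma marginal for determinant 1
shape2, scale2 = 3.0, 0.8   # gamma marginal for determinant 2

# 1. Sample correlated standard normals (the Gaussian copula).
cov = np.array([[1.0, rho], [rho, 1.0]])
z = rng.multivariate_normal(mean=[0.0, 0.0], cov=cov, size=10_000)

# 2. Transform to uniforms via the normal CDF, then to gamma marginals via the
#    inverse gamma CDF (probability integral transform).
u = stats.norm.cdf(z)
x1 = stats.gamma.ppf(u[:, 0], a=shape1, scale=scale1)
x2 = stats.gamma.ppf(u[:, 1], a=shape2, scale=scale2)

# The dependence lives in the copula; the marginals remain gamma.
print("rank (Spearman) correlation:", round(stats.spearmanr(x1, x2)[0], 3))
```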
Abstract:
Many datasets used by economists and other social scientists are collected by stratified sampling. The sampling scheme used to collect the data induces a probability distribution on the observed sample that differs from the target or underlying distribution for which inference is to be made. If this effect is not taken into account, subsequent statistical inference can be seriously biased. This paper shows how to do efficient semiparametric inference in moment restriction models when data from the target population is collected by three widely used sampling schemes: variable probability sampling, multinomial sampling, and standard stratified sampling.
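A minimal sketch of the reweighting idea that underlies correcting for a sampling scheme; the design, inclusion probabilities, and estimator below are illustrative assumptions rather than the paper's semiparametrically efficient procedure:

```python
import numpy as np

rng = np.random.default_rng(1)

# Target population: estimate the population mean of y via the moment E[y - mu] = 0.
N = 200_000
y = rng.normal(loc=10.0, scale=3.0, size=N)

# Variable probability sampling: units with larger y are oversampled
# (a hypothetical design; p_i is the known inclusion probability).
p = np.clip(0.01 + 0.02 * (y - y.min()) / (y.max() - y.min()), 0.01, 0.03)
sampled = rng.random(N) < p
y_s, p_s = y[sampled], p[sampled]

naive = y_s.mean()                          # ignores the sampling design -> biased
ipw = np.sum(y_s / p_s) / np.sum(1 / p_s)   # inverse-probability (Hajek) weighting

print(f"population mean {y.mean():.3f}, naive {naive:.3f}, weighted {ipw:.3f}")
```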
Abstract:
This paper explores the dynamic linkages that portray different facets of the joint probability distribution of stock market returns in NAFTA (i.e., Canada, Mexico, and the US). Our examination of interactions of the NAFTA stock markets considers three issues. First, we examine the long-run relationship between the three markets, using cointegration techniques. Second, we evaluate the dynamic relationships between the three markets, using impulse-response analysis. Finally, we explore the volatility transmission process between the three markets, using a variety of multivariate GARCH models. Our results exhibit significant, albeit not homogeneous, volatility transmission between the second moments of the NAFTA stock markets. The magnitude and trend of the conditional correlations indicate that in the last few years, the Mexican stock market exhibited a tendency toward increased integration with the US market. Finally, we note that there is evidence that the Peso and Asian financial crises, as well as the US stock-market crash, affected the return and volatility time-series relationships.
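A minimal sketch of the cointegration and impulse-response steps using statsmodels; the return series, lag choices, and shock ordering here are hypothetical and not the paper's specification:

```python
import numpy as np
import pandas as pd
from statsmodels.tsa.api import VAR
from statsmodels.tsa.vector_ar.vecm import coint_johansen

rng = np.random.default_rng(7)

# Hypothetical daily return series standing in for the three NAFTA markets.
n = 1000
common = rng.normal(scale=0.5, size=n)
returns = pd.DataFrame({
    "canada": common + rng.normal(scale=1.0, size=n),
    "mexico": 0.8 * common + rng.normal(scale=1.2, size=n),
    "us":     1.2 * common + rng.normal(scale=0.9, size=n),
})

# Johansen cointegration test (applied to cumulated returns here only so the
# example is self-contained; the paper would use observed price levels).
joh = coint_johansen(returns.cumsum(), det_order=0, k_ar_diff=1)
print("Johansen trace statistics:", joh.lr1.round(2))

# VAR on returns and orthogonalized impulse responses over 10 days.
var_res = VAR(returns).fit(maxlags=5, ic="aic")
irf = var_res.irf(10)
print("Response of Mexico to a US shock at horizon 1:",
      round(irf.orth_irfs[1, returns.columns.get_loc("mexico"),
                          returns.columns.get_loc("us")], 4))
```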
Abstract:
In Part One, the foundations of Bayesian inference are reviewed, and the technicalities of the Bayesian method are illustrated. Part Two applies the Bayesian meta-analysis program, the Confidence Profile Method (CPM), to clinical trial data and evaluates the merits of using Bayesian meta-analysis for overviews of clinical trials. The Bayesian method of meta-analysis produced results similar to the classical results because of the large sample size, along with the input of a non-preferential prior probability distribution. These results were anticipated through explanations in Part One of the mechanics of the Bayesian approach.
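The large-sample agreement noted above can be seen in the standard conjugate normal algebra for a fixed-effect meta-analysis (a textbook sketch, not the CPM's own formulation):

```latex
% Fixed-effect meta-analysis with a normal prior on the pooled effect mu.
% With study estimates y_i, within-study variances s_i^2, and prior mu ~ N(mu_0, tau_0^2):
\mu \mid y \;\sim\; N\!\left(
  \frac{\mu_0/\tau_0^2 + \sum_i y_i/s_i^2}{1/\tau_0^2 + \sum_i 1/s_i^2},\;
  \left(\tfrac{1}{\tau_0^2} + \sum_i \tfrac{1}{s_i^2}\right)^{-1}
\right)
% As tau_0^2 -> infinity (a non-preferential prior), the posterior mean tends to the
% classical inverse-variance weighted estimate, matching the agreement noted above.
```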
Abstract:
We present a new record of eolian dust flux to the western Subarctic North Pacific (SNP) covering the past 27,000 years based on a core from the Detroit Seamount. Comparing the SNP dust record to the NGRIP ice core record shows significant differences in the amplitude of dust changes to the two regions during the last deglaciation, while the timing of abrupt changes is synchronous. If dust deposition in the SNP faithfully records its mobilization in East Asian source regions, then the difference in the relative amplitude must reflect climate-related changes in atmospheric dust transport to Greenland. Based on the synchronicity in the timing of dust changes in the SNP and Greenland, we tie abrupt deglacial transitions in the 230Th-normalized 4He flux record to corresponding transitions in the well-dated NGRIP dust flux record to provide a new chronostratigraphic technique for marine sediments from the SNP. Results from this technique are complemented by radiocarbon dating, which allows us to independently constrain radiocarbon paleoreservoir ages. We find paleoreservoir ages of 745 ± 140 yr at 11,653 yr BP, 680 ± 228 yr at 14,630 yr BP and 790 ± 498 yr at 23,290 yr BP. Our reconstructed paleoreservoir ages are consistent with modern surface water reservoir ages in the western SNP. Good temporal synchronicity between eolian dust records from the Subantarctic Atlantic and equatorial Pacific and the ice core record from Antarctica supports the wider application of the proposed dust tuning method in other global ocean regions.
Abstract:
The modern subarctic Pacific is characterized by a steep salinity-driven surface water stratification, which hampers the supply of saline and nutrient-rich deeper waters into the euphotic zone, limiting productivity. However, the strength of the halocline might have varied in the past. Here, we present diatom oxygen (d18Odiat) and silicon (d30Sidiat) stable isotope data from the open subarctic North-East (NE) Pacific (SO202-27-6; Gulf of Alaska), in combination with other proxy data (Neogloboquadrina pachyderma (sin.) d18O, biogenic opal, Ca and Fe intensities, IRD), to evaluate changes in surface water hydrography and productivity during Marine Isotope Stage (MIS) 3, characterized by millennial-scale temperature changes (Dansgaard-Oeschger (D-O) cycles) documented in Greenland ice cores.
Abstract:
The glacial-to-Holocene evolution of subarctic Pacific surface water stratification and silicic acid (Si) dynamics is investigated based on new combined diatom oxygen (d18Odiat) and silicon (d30Sidiat) isotope records, along with new biogenic opal, subsurface foraminiferal d18O, alkenone-based sea surface temperature, sea ice, diatom, and core logging data from the NE Pacific. Our results suggest that d18Odiat values are primarily influenced by changes in freshwater discharge from the Cordilleran Ice Sheet (CIS), while corresponding d30Sidiat values are primarily influenced by changes in Si supply to surface waters. Our data indicate enhanced glacial to mid-Heinrich Stadial 1 (HS1) NE Pacific surface water stratification, generally limiting the Si supply to surface waters. However, we suggest that an increase in Si supply during early HS1, when surface waters were still stratified, is linked to increased North Pacific Intermediate Water formation. The coincidence between fresh surface waters during HS1 and enhanced ice-rafted debris sedimentation in the North Atlantic indicates a close link between CIS and Laurentide Ice Sheet dynamics and a dominant atmospheric control on CIS deglaciation. The Bølling/Allerød (B/A) is characterized by destratification in the subarctic Pacific and an increased supply of saline, Si-rich waters to surface waters. This change toward increased convection occurred prior to the Bølling warming and was likely triggered by a switch to sea ice-free conditions during late HS1. Our results furthermore indicate a decreased efficiency of the biological pump during late HS1 and the B/A (possibly also the Younger Dryas), suggesting that the subarctic Pacific was then a source region of atmospheric CO2.
Abstract:
A 6200 year old peat sequence, cored in a volcanic crater on the sub-Antarctic Ile de la Possession (Iles Crozet), has been investigated based on a multi-proxy approach. The methods applied are macrobotanical (mosses, seeds and fruits) and diatom analyses, complemented by geochemical (Rock-Eval6) and rock magnetic measurements. The chronology of the core is based on 5 radiocarbon dates. When all the proxy data are combined, the following changes can be inferred. From the onset of the peat formation (6200 cal yr BP) until ca. 5550 cal yr BP, biological production was high and climatic conditions must have been relatively warm. At ca. 5550 cal yr BP a shift to low biological production occurred, lasting until ca. 4600 cal yr BP. During this period the organic matter is well preserved, pointing to a cold and/or wet environment. At ca. 4600 cal yr BP, biological production increased again. From ca. 4600 cal yr BP until ca. 4100 cal yr BP a 'hollow and hummock' microtopography developed at the peat surface, resulting in the presence of a mixture of wetter and drier species in the macrobotanical record. After ca. 4100 cal yr BP, the wet species disappear and a generally drier, acidic bog came into existence. A major shift in all the proxy data is observed at ca. 2800 cal yr BP, pointing to wetter and especially windier climatic conditions on the island, probably caused by an intensification and/or latitudinal shift of the southern westerly belt. Driven by this stronger wind regime, erosion of the peat surface occurred at that time and a lake was formed in the peat deposits of the crater, which is still present today.
Abstract:
Three ice type regimes at Ice Station Belgica (ISB), during the 2007 International Polar Year SIMBA (Sea Ice Mass Balance in Antarctica) expedition, were characterized and assessed for elevation, snow depth, ice freeboard and thickness. Analyses of the probability distribution functions showed great potential for satellite-based altimetry in estimating ice thickness. In question is the altimeter sampling density required for reasonably accurate estimation of snow surface elevation, given the inherent spatial averaging. This study assesses an effort to determine the number of laser altimeter 'hits' of the ISB floe, as a representative Antarctic floe of mixed first- and multi-year ice types, needed to statistically recreate the in situ-determined ice-thickness and snow depth distribution based on the fractional coverage of each ice type. Estimates of the fractional coverage and spatial distribution of the ice types, referred to as ice 'towns', for the 5 km² floe were assessed by in situ mapping and photo-visual documentation. Simulated ICESat altimeter tracks, with spot size ~70 m and spacing ~170 m, sampled the floe's towns, generating a buoyancy-derived ice thickness distribution. 115 altimeter hits were required to statistically recreate the regional thickness mean and distribution for a three-town assemblage of mixed first- and multi-year ice, and 85 hits for a two-town assemblage of first-year ice only: equivalent to 19.5 and 14.5 km respectively of continuous altimeter track over a floe region of similar structure. Results have significant implications for modeling the sea-ice sampling performance of the ICESat laser altimeter record, as well as for maximizing the sampling characteristics of satellite/airborne laser and radar altimetry missions for sea-ice thickness.
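A minimal sketch of the buoyancy (hydrostatic equilibrium) conversion from freeboard and snow depth to ice thickness; the density values and simulated altimeter hits below are typical literature assumptions for illustration, not values reported by this study:

```python
import numpy as np

# Typical density assumptions (kg/m^3); not values reported by this study.
RHO_WATER, RHO_ICE, RHO_SNOW = 1024.0, 915.0, 320.0

def ice_thickness_from_freeboard(ice_freeboard_m, snow_depth_m):
    """Buoyancy-derived ice thickness from hydrostatic equilibrium.

    rho_i*h_i + rho_s*h_s = rho_w*(h_i - f_i)  =>
    h_i = (rho_w*f_i + rho_s*h_s) / (rho_w - rho_i)
    """
    return (RHO_WATER * ice_freeboard_m + RHO_SNOW * snow_depth_m) / (RHO_WATER - RHO_ICE)

# Hypothetical altimeter 'hits': (ice freeboard, snow depth) in metres.
rng = np.random.default_rng(3)
freeboard = rng.normal(0.25, 0.05, size=115).clip(min=0.05)
snow = rng.normal(0.30, 0.10, size=115).clip(min=0.0)
thickness = ice_thickness_from_freeboard(freeboard, snow)
print(f"mean thickness {thickness.mean():.2f} m, std {thickness.std(ddof=1):.2f} m")
```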