Digital Library

915 results for Normal distribution

A versatile gene-based test for genome-wide association studies

Relevance:

60.00%

Abstract:

We have derived a versatile gene-based test for genome-wide association studies (GWAS). Our approach, called VEGAS (versatile gene-based association study), is applicable to all GWAS designs, including family-based GWAS, meta-analyses of GWAS on the basis of summary data, and DNA-pooling-based GWAS, where existing approaches based on permutation are not possible, as well as singleton data, where they are. The test incorporates information from a full set of markers (or a defined subset) within a gene and accounts for linkage disequilibrium between markers by using simulations from the multivariate normal distribution. We show that for an association study using singletons, our approach produces results equivalent to those obtained via permutation in a fraction of the computation time. We demonstrate proof-of-principle by using the gene-based test to replicate several genes known to be associated on the basis of results from a family-based GWAS for height in 11,536 individuals and a DNA-pooling-based GWAS for melanoma in approximately 1300 cases and controls. Our method has the potential to identify novel associated genes; provide a basis for selecting SNPs for replication; and be directly used in network (pathway) approaches that require per-gene association test statistics. We have implemented the approach in both an easy-to-use web interface, which only requires the uploading of markers with their association p-values, and a separate downloadable application.
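The simulation step described in this abstract can be sketched roughly as follows. This is a minimal illustration, not the VEGAS implementation: it assumes the gene statistic is the sum of 1-df chi-square values derived from the SNP p-values, and that the LD structure is supplied as a correlation matrix. The function name `gene_based_pvalue` and its parameters are hypothetical.

```python
from statistics import NormalDist

import numpy as np


def gene_based_pvalue(snp_pvalues, ld_matrix, n_sims=10_000, seed=0):
    """Empirical gene-level p-value from SNP p-values and an LD correlation matrix.

    Assumption: the gene statistic is the sum of squared z-scores; the source
    abstract does not spell out the exact statistic.
    """
    std_normal = NormalDist()
    # Convert each two-sided p-value (0 < p <= 1) to a z-score; z**2 is a 1-df chi-square.
    z = np.array([std_normal.inv_cdf(p / 2.0) for p in snp_pvalues])
    observed = float(np.sum(z ** 2))

    # Under the null, z-scores are taken to be multivariate normal with mean zero
    # and covariance equal to the LD (correlation) matrix.
    rng = np.random.default_rng(seed)
    sims = rng.multivariate_normal(np.zeros(len(z)), np.asarray(ld_matrix), size=n_sims)
    null_stats = np.sum(sims ** 2, axis=1)

    # Add-one correction keeps the empirical p-value strictly positive.
    return (1 + int(np.sum(null_stats >= observed))) / (n_sims + 1)
```

Because the null distribution is simulated from the LD matrix rather than by permuting phenotypes, a sketch like this needs only summary data (p-values and LD), which is the property the abstract emphasizes.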

Stochastic model for simulating maize yield

Relevance:

60.00%

Abstract:

Maize is one of the most important crops in the world. The products generated from this crop are largely used in the starch industry, the animal and human nutrition sector, and biomass energy production and refineries. For these reasons, there is much interest in estimating the potential grain yield of maize genotypes in relation to the environment in which they will be grown, as productivity directly affects agribusiness and farm profitability. Questions like these can be investigated with ecophysiological crop models, which can be organized according to different philosophies and structures. The main objective of this work is to conceptualize a stochastic model for predicting maize grain yield and productivity under different conditions of water supply while considering the uncertainties of daily climate data. Therefore, one focus is to explain the model construction in detail, and the other is to present some results in light of the philosophy adopted. A deterministic model was built as the basis for the stochastic model. The former performed well in terms of the curve shape of the above-ground dry matter over time as well as the grain yield under full and moderate water deficit conditions. Through the use of a triangular distribution for the harvest index and a bivariate normal distribution of the averaged daily solar radiation and air temperature, the stochastic model satisfactorily simulated grain productivity: 10,604 kg ha⁻¹ was found to be the most likely grain productivity, very similar to the productivity simulated by the deterministic model and to that observed under real conditions in a field experiment. © 2012 American Society of Agricultural and Biological Engineers.
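The sampling scheme named in this abstract (a triangular harvest index combined with bivariate normal radiation and temperature) can be sketched as below. Every numeric value and the `toy_dry_matter` function are hypothetical placeholders chosen only to make the example run; they are not the paper's deterministic model or its parameters.

```python
import random


def toy_dry_matter(radiation, temperature, season_days=120, rue=1.0):
    """Placeholder dry-matter model (kg/ha); NOT the paper's deterministic model."""
    # Hypothetical linear temperature response, clamped to [0, 1].
    temp_factor = min(1.0, max(0.0, (temperature - 8.0) / (26.0 - 8.0)))
    grams_per_m2 = rue * max(radiation, 0.0) * season_days * temp_factor
    return grams_per_m2 * 10.0  # 1 g/m^2 equals 10 kg/ha


def simulate_grain_yield(n_draws=5000, seed=1):
    """Monte Carlo draws of grain yield = harvest index x above-ground dry matter."""
    rng = random.Random(seed)
    # All values below are illustrative assumptions.
    hi_low, hi_mode, hi_high = 0.40, 0.50, 0.55   # harvest index
    rad_mean, rad_sd = 20.0, 2.0                  # MJ m^-2 day^-1
    temp_mean, temp_sd = 24.0, 1.5                # degrees C
    rho = 0.6                                     # radiation-temperature correlation

    yields = []
    for _ in range(n_draws):
        harvest_index = rng.triangular(hi_low, hi_high, hi_mode)
        # Bivariate normal built from two independent standard normals.
        z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        radiation = rad_mean + rad_sd * z1
        temperature = temp_mean + temp_sd * (rho * z1 + (1.0 - rho ** 2) ** 0.5 * z2)
        yields.append(harvest_index * toy_dry_matter(radiation, temperature))
    return yields
```

A histogram of the returned draws would then indicate the most likely productivity under the assumed inputs.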

Bayesian synthetic likelihood

Relevance:

60.00%

Abstract:

Having the ability to work with complex models can be highly beneficial, but the computational cost of doing so is often large. Complex models often have intractable likelihoods, so methods that directly use the likelihood function are infeasible. In these situations, the benefits of working with likelihood-free methods become apparent. Likelihood-free methods, such as parametric Bayesian indirect likelihood that uses the likelihood of an alternative parametric auxiliary model, have been explored throughout the literature as a good alternative when the model of interest is complex. One of these methods is called the synthetic likelihood (SL), which assumes a multivariate normal approximation to the likelihood of a summary statistic of interest. This paper explores the accuracy and computational efficiency of the Bayesian version of the synthetic likelihood (BSL) approach in comparison to a competitor known as approximate Bayesian computation (ABC) and its sensitivity to its tuning parameters and assumptions. We relate BSL to pseudo-marginal methods and propose to use an alternative SL that uses an unbiased estimator of the exact working normal likelihood when the summary statistic has a multivariate normal distribution. Several applications of varying complexity are considered to illustrate the findings of this paper.
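As a rough illustration of the synthetic likelihood idea, the sketch below simulates summary statistics at a parameter value, fits a multivariate normal to them, and evaluates its log-density at the observed summary. It uses the plain plug-in estimate of the mean and covariance, not the unbiased estimator the paper proposes. The names `synthetic_loglik` and `toy_simulator` are hypothetical, and the toy model (normal data summarized by sample mean and variance) is an assumption made only for the example.

```python
import numpy as np


def toy_simulator(theta, n_sims, rng):
    """Hypothetical model: 50 draws from Normal(theta, 1), summarized by mean and variance."""
    data = rng.normal(theta, 1.0, size=(n_sims, 50))
    return np.column_stack([data.mean(axis=1), data.var(axis=1, ddof=1)])


def synthetic_loglik(theta, observed_summary, simulate_summaries, n_sims=200, seed=0):
    """Plug-in synthetic log-likelihood: Gaussian log-density of the observed summary."""
    rng = np.random.default_rng(seed)
    sims = np.asarray(simulate_summaries(theta, n_sims, rng))  # shape (n_sims, d)
    mu = sims.mean(axis=0)
    sigma = np.atleast_2d(np.cov(sims, rowvar=False))
    diff = np.atleast_1d(np.asarray(observed_summary, dtype=float)) - mu
    _, logdet = np.linalg.slogdet(sigma)
    quad = float(diff @ np.linalg.solve(sigma, diff))
    return -0.5 * (diff.size * np.log(2.0 * np.pi) + logdet + quad)
```

In a Bayesian setting this value would replace the intractable log-likelihood inside an MCMC acceptance ratio; the number of simulations `n_sims` is one of the tuning parameters whose effect the abstract says the paper examines.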

Reputation model based on rating data and application in recommender systems

Relevance:

60.00%

Abstract:

This thesis introduced two novel reputation models to generate accurate item reputation scores using ratings data and the statistics of the dataset. It also presented an innovative method that incorporates reputation awareness in recommender systems by employing voting system methods to produce more accurate top-N item recommendations. Additionally, this thesis introduced a personalisation method for generating reputation scores based on users' interests, where a single item can have different reputation scores for different users. The personalised reputation scores are then used in the proposed reputation-aware recommender systems to enhance the recommendation quality.

Estimating specific yield and transmissivity with magnetic resonance sounding in an unconfined sandstone aquifer (Niger)

Relevance:

60.00%

Abstract:

The unconfined aquifer of the Continental Terminal in Niger was investigated by magnetic resonance sounding (MRS) and by 14 pumping tests in order to improve calibration of MRS outputs at field scale. The reliability of the standard relationship used for estimating aquifer transmissivity by MRS was checked; it was found that the parametric factor can be estimated with an uncertainty ≤150% by a single point of calibration. The MRS water content (θ_MRS) was shown to be positively correlated with the specific yield (Sy), and θ_MRS always displayed higher values than Sy. A conceptual model was subsequently developed, based on estimated changes of the total porosity, Sy, and the specific retention Sr as a function of the median grain size. The resulting relationship between θ_MRS and Sy showed a reasonably good fit with the experimental dataset, considering the inherent heterogeneity of the aquifer matrix (residual error is ~60%). Interpreted in terms of aquifer parameters, MRS data suggest a log-normal distribution of the permeability and a one-sided Gaussian distribution of Sy. These results demonstrate the efficiency of the MRS method for fast and low-cost prospection of hydraulic parameters for large unconfined aquifers.

Statistical tools for analyzing water quality data

Relevance:

60.00%

Abstract:

Water quality data are often collected at different sites over time to improve water quality management. Water quality data usually exhibit the following characteristics: non-normal distribution, presence of outliers, missing values, values below detection limits (censored), and serial dependence. It is essential to apply appropriate statistical methodology when analyzing water quality data to draw valid conclusions and hence provide useful advice in water management. In this chapter, we provide and demonstrate various statistical tools for analyzing such water quality data, and also show how to use the statistical software R to apply them. A dataset collected from the Susquehanna River Basin is used to demonstrate the statistical methods provided in this chapter. The dataset can be downloaded from the website http://www.srbc.net/programs/CBP/nutrientprogram.htm.

Diagnostic Tests Based on Quantile Residuals for Nonlinear Time Series Models

Relevance:

60.00%

Abstract:

This thesis studies quantile residuals and uses different methodologies to develop test statistics that are applicable in evaluating linear and nonlinear time series models based on continuous distributions. Models based on mixtures of distributions are of special interest because it turns out that for those models traditional residuals, often referred to as Pearson's residuals, are not appropriate. As such models have become more and more popular in practice, especially with financial time series data, there is a need for reliable diagnostic tools that can be used to evaluate them. The aim of the thesis is to show how such diagnostic tools can be obtained and used in model evaluation. The quantile residuals considered here are defined in such a way that, when the model is correctly specified and its parameters are consistently estimated, they are approximately independent with a standard normal distribution. All the tests derived in the thesis are pure significance type tests and are theoretically sound in that they properly take the uncertainty caused by parameter estimation into account.
In Chapter 2 a general framework based on the likelihood function and smooth functions of univariate quantile residuals is derived that can be used to obtain misspecification tests for various purposes. Three easy-to-use tests aimed at detecting non-normality, autocorrelation, and conditional heteroscedasticity in quantile residuals are formulated. It also turns out that these tests can be interpreted as Lagrange Multiplier or score tests so that they are asymptotically optimal against local alternatives. Chapter 3 extends the concept of quantile residuals to multivariate models. The framework of Chapter 2 is generalized and tests aimed at detecting non-normality, serial correlation, and conditional heteroscedasticity in multivariate quantile residuals are derived based on it.
Score test interpretations are obtained for the serial correlation and conditional heteroscedasticity tests and in a rather restricted special case for the normality test. In Chapter 4 the tests are constructed using the empirical distribution function of quantile residuals. The so-called Khmaladze's martingale transformation is applied in order to eliminate the uncertainty caused by parameter estimation. Various test statistics are considered so that critical bounds for histogram type plots as well as Quantile-Quantile and Probability-Probability type plots of quantile residuals are obtained. Chapters 2, 3, and 4 contain simulations and empirical examples which illustrate the finite sample size and power properties of the derived tests and also how the tests and related graphical tools based on residuals are applied in practice.
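The definition of a quantile residual used in this abstract can be illustrated with a small sketch: each observation is passed through the model's cumulative distribution function and then through the inverse standard normal CDF. The example assumes an unconditional mixture of normals for simplicity, whereas the thesis works with conditional distributions of time series models; the function names are hypothetical.

```python
from statistics import NormalDist


def mixture_cdf(y, weights, means, sds):
    """CDF of a finite mixture of normal distributions evaluated at y."""
    return sum(w * NormalDist(m, s).cdf(y) for w, m, s in zip(weights, means, sds))


def quantile_residuals(series, weights, means, sds):
    """Quantile residuals r = Phi^{-1}(F(y)) under an assumed normal-mixture model."""
    std_normal = NormalDist()
    eps = 1e-12  # keep the probability strictly inside (0, 1) for inv_cdf
    residuals = []
    for y in series:
        u = min(max(mixture_cdf(y, weights, means, sds), eps), 1.0 - eps)
        residuals.append(std_normal.inv_cdf(u))
    return residuals
```

If the assumed model is correct, the residuals should look like independent standard normal draws, so ordinary normality and autocorrelation checks can be applied to them even though the data themselves are far from normal.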

Essays on the Modeling and Prediction of Volatility and Higher Moments of Stock Returns (summary section only)

Relevance:

60.00%

Abstract:

One of the most fundamental and widely accepted ideas in finance is that investors are compensated through higher returns for taking on non-diversifiable risk. Hence the quantification, modeling and prediction of risk have been, and still are, among the most prolific research areas in financial economics. It was recognized early on that there are predictable patterns in the variance of speculative prices. Later research has shown that there may also be systematic variation in the skewness and kurtosis of financial returns. Lacking in the literature so far is an out-of-sample forecast evaluation of the potential benefits of these new, more complicated models with time-varying higher moments. Such an evaluation is the topic of this dissertation. Essay 1 investigates the forecast performance of the GARCH(1,1) model when estimated with 9 different error distributions on Standard and Poor’s 500 Index Future returns. By utilizing the theory of realized variance to construct an appropriate ex post measure of variance from intra-day data, it is shown that allowing for a leptokurtic error distribution leads to significant improvements in variance forecasts compared to using the normal distribution. This result holds for daily, weekly as well as monthly forecast horizons. It is also found that allowing for skewness and time variation in the higher moments of the distribution does not further improve forecasts. In Essay 2, by using 20 years of daily Standard and Poor’s 500 index returns, it is found that density forecasts are much improved by allowing for constant excess kurtosis but not improved by allowing for skewness. By allowing the kurtosis and skewness to be time varying, the density forecasts are not further improved but on the contrary made slightly worse. In Essay 3 a new model incorporating conditional variance, skewness and kurtosis based on the Normal Inverse Gaussian (NIG) distribution is proposed.
The new model and two previously used NIG models are evaluated by their Value at Risk (VaR) forecasts on a long series of daily Standard and Poor’s 500 returns. The results show that only the new model produces satisfactory VaR forecasts for both 1% and 5% VaR. Taken together, the results of the thesis show that kurtosis appears not to exhibit predictable time variation, whereas some predictability is found in the skewness. However, the dynamic properties of the skewness are not completely captured by any of the models.
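For readers unfamiliar with the GARCH(1,1) model evaluated in Essay 1, its variance recursion can be sketched as follows. The sketch assumes zero-mean returns and already-estimated parameters; it does not cover estimation or the choice of error distribution, which is the actual subject of the essay. The function name is hypothetical.

```python
def garch11_variances(returns, omega, alpha, beta):
    """Conditional variances from the GARCH(1,1) recursion.

    sigma2[t] = omega + alpha * r[t-1]**2 + beta * sigma2[t-1]
    The last element is the one-step-ahead variance forecast.
    Assumes zero-mean returns and a covariance-stationary parameter set.
    """
    if not (omega > 0 and alpha >= 0 and beta >= 0 and alpha + beta < 1):
        raise ValueError("need omega > 0, alpha >= 0, beta >= 0 and alpha + beta < 1")
    variance = omega / (1.0 - alpha - beta)  # start at the unconditional variance
    variances = [variance]
    for r in returns:
        variance = omega + alpha * r * r + beta * variance
        variances.append(variance)
    return variances
```

The forecast in the final position could then be compared with an ex post measure such as realized variance, which is the style of evaluation the abstract describes.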

Modeling Nonlinearities and Asymmetries in Asset Pricing

Relevance:

60.00%

Abstract:

Financial time series tend not to behave as if drawn from a normal distribution. Asymmetries and nonlinearities are usually seen, and these characteristics need to be taken into account. Forecasting future return and risk is therefore rather complicated. The existing models for predicting risk help to a certain degree, but the complexity of financial time series data makes the task difficult. The introduction of nonlinearities and asymmetries for the purpose of better models and forecasts regarding both mean and variance is supported by the essays in this dissertation. Linear and nonlinear models are consequently introduced in this dissertation. The advantage of nonlinear models is that they can take asymmetries into account. Asymmetric patterns usually mean that large negative returns appear more often than positive returns of the same magnitude. This goes hand in hand with the fact that negative returns are associated with higher risk than positive returns of the same magnitude. These models are important because they make possible the best available estimates and predictions of future returns and risk.

Probabilistic Analysis of Cracking Moment of Reinforced Concrete Beams

Relevance:

60.00%

Abstract:

Probabilistic analysis of cracking moment from 22 simply supported reinforced concrete beams is performed. When the basic variables follow the distribution considered in this study, the cracking moment of a beam is found to follow a normal distribution. An expression is derived, for characteristic cracking moment, which will be useful in examining reinforced concrete beams for a limit state of cracking.
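The abstract does not give the derived expression. As a generic illustration only, a characteristic value of a normally distributed quantity is often taken as a lower fractile of the fitted distribution. The sketch below assumes the 5% fractile, a common convention that is not stated in the source, and uses a hypothetical function name.

```python
from statistics import NormalDist, mean, stdev


def characteristic_value(samples, fractile=0.05):
    """Lower fractile of a normal distribution fitted to the samples.

    The 5% default is an assumed convention, not taken from the source.
    """
    fitted = NormalDist(mean(samples), stdev(samples))
    return fitted.inv_cdf(fractile)
```

For a 5% fractile this reduces to the sample mean minus about 1.645 sample standard deviations.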

Climatic adaptation of Norway spruce (Picea abies (L.) Karsten) in Finland based on male flowering phenology.

Relevance:

60.00%

Abstract:

Anthesis was studied at the canopy level in 10 Norway spruce stands from 9 localities in Finland from 1963 to 1974. Distributions of pollen catches were compared to the normal (Gaussian) distribution. The basis for the timing studies was the 50 per cent point of the anthesis-fitted normal distribution. Development up to this point was given in calendar days, in degree days (>5 °C) and in period units. The count of each parameter began on March 19 (included). Male flowering in Norway spruce stands was found to have more annual variation in quantity than in Scots pine stands studied earlier. Anthesis in spruce in northern Finland occurred at a later date than in the south. The heat sums needed for anthesis varied latitudinally less in spruce than in pine. The variation of pollen catches in spruce increased towards the north-west, as in the case of Scots pine. In the unprocessed data, calendar days were found to be the most accurate forecast of anthesis in Norway spruce both for a single year and for the majority of cases of stand averages over several years. Locally, the period unit could be a more accurate parameter for the stand average. However, on a calendar day basis, when annual deviations between expected and measured heat sums were converted to days, period units were narrowly superior to days. The geographical correlations with respect to timing of flowering, calculated against distances measured along simulated post-glacial migration routes, were stronger than purely latitudinal correlations. Effects of the reinvasion of Norway spruce into Finland are thus still visible in spruce populations just as they were in Scots pine populations. The proportion of the average annual heat sum needed for spruce anthesis grew rapidly north of a latitude of ca. 63°, and the heat sum needed for anthesis decreased only slightly towards the timberline.
In light of flowering phenology, it seems probable that the northwesterly third of Finnish Norway spruce populations are incompletely adapted to the prevailing cold climate. A moderate warming of the climate would therefore be beneficial for Norway spruce. This accords roughly with the adaptive situation in Scots pine.
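Two of the quantities in this abstract lend themselves to a short sketch: the degree-day heat sum above 5 °C, and the 50 per cent point of a normal distribution fitted to pollen catches. The sketch assumes a simple moment fit, in which the 50 per cent point is the catch-weighted mean day; the study's actual fitting procedure and its "period units" are not described in the abstract. Function names are hypothetical.

```python
def degree_day_sum(daily_mean_temps, threshold=5.0):
    """Heat sum: total of daily mean temperature excess above the threshold (deg C)."""
    return sum(max(0.0, t - threshold) for t in daily_mean_temps)


def anthesis_midpoint(days, pollen_catches):
    """50 per cent point of a moment-fitted normal: the catch-weighted mean day."""
    total = sum(pollen_catches)
    return sum(d * c for d, c in zip(days, pollen_catches)) / total
```

With daily temperatures listed from March 19 up to the midpoint day, `degree_day_sum` would give the heat sum accumulated by the 50 per cent point of anthesis.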

Differences in the climatic adaptation of silver birch (Betula pendula) and downy birch (B. pubescens) in Finland based on male flowering phenology.

Relevance:

60.00%

Abstract:

Male flowering was studied at the canopy level in 10 silver birch (Betula pendula Roth) stands from 8 localities and in 14 downy birch (B. pubescens Ehrh.) stands from 10 localities in Finland from 1963 to 1973. Distributions of cumulative pollen catches were compared to the normal (Gaussian) distribution. The basis for the timing of flowering was the 50 per cent point of the anthesis-fitted normal distribution. To eliminate effects of background pollen, only the central, normally distributed part of the cumulative distribution was used. Development up to the median point of the distribution was measured and tested in calendar days, in degree days (>5 °C) and in period units. The count of each parameter began on and included March 19. Male flowering in silver birch occurred from late April to late June depending on latitude, and flowering in downy birch took place from early May to early July. The heat sums needed for male flowering varied latitudinally in downy birch stands, but there was practically no latitudinal variation in the heat sums needed for silver birch flowering. The amount of male flowering in stands of both birch species was found to have a large annual variation but without any clear periodicity. The between-years pollen catch variation in stands of either birch species did not show any significant latitudinal correlation, in contrast to Norway spruce stands. The period unit heat sum gave the most accurate forecast of the timing of flowering for 60 per cent of the silver birch stands and for 78.6 per cent of the downy birch stands. Calendar days, however, gave the best forecast for silver birch in 25 per cent of the cases, while degree days gave the best forecast for downy birch in 21.4 per cent of the cases. Silver birch seems to have a local inclination for a more fixed flowering date compared to downy birch, which could mean a considerable photoperiodic influence on the flowering time of silver birch.
Silver birch and downy birch had different geographical correlations. Hybridization of birch species occurs more often in northern Finland than at more southern latitudes. The different timing in flowering caused increasing scatter in flowering times in the north, especially in the case of downy birch. The chance of simultaneous flowering of silver birch and downy birch also increased northwards due to a more variable climate and higher altitudinal variations. Compared with conifers, the reproduction cycles of both birch species were found to be well protected from damage by frost.

Long-term monitoring of vegetation in a tropical deciduous forest in Mudumalai, southern India

Relevance:

60.00%

Abstract:

As part of an international network of large plots to study tropical vegetation dynamics on a long-term basis, a 50-hectare permanent plot was set up during 1988-89 in the deciduous forests of Mudumalai, southern India. Within this plot 25,929 living woody plants (71 species) above 1 cm DBH (diameter at breast height) were identified, measured, tagged and mapped. Species abundances corresponded to the characteristic log-normal distribution. The four most abundant species (Kydia calycina, Lagerstroemia microcarpa, Terminalia crenulata and Helicteres isora) constituted nearly 56% of total stems, while seven species were represented by only one individual each in the plot. Variance/mean ratios of density showed most species to have clumped distributions. The population declined overall by 14% during the first two years, largely due to elephant and fire-mediated damage to Kydia calycina and Helicteres isora. In this article we discuss the need for large plots to study vegetation dynamics.
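The variance/mean ratio mentioned in this abstract is a standard index of dispersion: a ratio near 1 suggests a random (Poisson-like) pattern, a ratio above 1 a clumped pattern, and a ratio below 1 a regular one. A minimal sketch, assuming stem counts per equal-sized quadrat as input (the quadrat layout used in the study is not given), with a hypothetical function name:

```python
from statistics import mean, variance


def variance_mean_ratio(counts_per_quadrat):
    """Index of dispersion: sample variance of counts divided by their mean."""
    return variance(counts_per_quadrat) / mean(counts_per_quadrat)
```

For example, counts of 0, 0, 0 and 12 stems across four quadrats give a ratio well above 1, which would be read as a clumped distribution.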

The collective dynamics of self-propelled particles

Relevance:

60.00%

Abstract:

We propose a method for the dynamic simulation of a collection of self-propelled particles in a viscous Newtonian fluid. We restrict attention to particles whose size and velocity are small enough that the fluid motion is in the creeping flow regime. We propose a simple model for a self-propelled particle, and extend the Stokesian Dynamics method to conduct dynamic simulations of a collection of such particles. In our description, each particle is treated as a sphere with an orientation vector p, whose locomotion is driven by the action of a force dipole Sp of constant magnitude S0 at a point slightly displaced from its centre. To simplify the calculation, we place the dipole at the centre of the particle, and introduce a virtual propulsion force Fp to effect propulsion. The magnitude F0 of this force is proportional to S0. The directions of Sp and Fp are determined by p. In isolation, a self-propelled particle moves at a constant velocity u0 p, with the speed u0 determined by S0. When it coexists with many such particles, its hydrodynamic interaction with the other particles alters its velocity and, more importantly, its orientation. As a result, the motion of the particle is chaotic. Our simulations are not restricted to low particle concentration, as we implement the full hydrodynamic interactions between the particles, but we restrict the motion of particles to two dimensions to reduce computation. We have studied the statistical properties of a suspension of self-propelled particles for a range of the particle concentration, quantified by the area fraction φa. We find several interesting features in the microstructure and statistics. We find that particles tend to swim in clusters wherein they are in close proximity. Consequently, incorporating the finite size of the particles and the near-field hydrodynamic interactions is of the essence. There is a continuous process of breakage and formation of the clusters.
We find that the distributions of particle velocity at low and high φa are qualitatively different; the distribution is close to the normal distribution at high φa, in agreement with experimental measurements. The motion of the particles is diffusive at long time, and the self-diffusivity decreases with increasing φa. The pair correlation function shows a large anisotropic build-up near contact, which decays rapidly with separation. There is also an anisotropic orientation correlation near contact, which decays more slowly with separation. Movies are available with the online version of the paper.

Robust formulations for clustering-based large-scale classification

Relevance:

60.00%

Abstract:

Chebyshev-inequality-based convex relaxations of Chance-Constrained Programs (CCPs) are shown to be useful for learning classifiers on massive datasets. In particular, an algorithm that integrates efficient clustering procedures and CCP approaches for computing classifiers on large datasets is proposed. The key idea is to identify high density regions or clusters from individual class conditional densities and then use a CCP formulation to learn a classifier on the clusters. The CCP formulation ensures that most of the data points in a cluster are correctly classified by employing a Chebyshev-inequality-based convex relaxation. This relaxation is heavily dependent on the second-order statistics. However, this formulation, and in general any relaxation that depends on second-order moments, is susceptible to moment estimation errors. One of the contributions of the paper is to propose several formulations that are robust to such errors. In particular, a generic way of making such formulations robust to moment estimation errors is illustrated using two novel confidence sets. An important contribution is to show that when either of the confidence sets is employed, for the special case of a spherical normal distribution of clusters, the robust variant of the formulation can be posed as a second-order cone program. Empirical results show that the robust formulations achieve accuracies comparable to those obtained with true moments, even when moment estimates are erroneous. Results also illustrate the benefits of employing the proposed methodology for robust classification of large-scale datasets.
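For context, the standard form of a Chebyshev-based relaxation of a chance constraint is sketched below. It is the textbook one-sided (multivariate Chebyshev) bound for a linear classifier, given here under the assumption that the paper's formulation is of this type; the abstract itself does not state the constraint. The symbols are introduced for illustration: w and b define the hyperplane, μ and Σ are a cluster's mean and covariance, and η is the required probability of correct classification.

```latex
\inf_{X \sim (\mu, \Sigma)} \Pr\left( w^{\top} X - b \ge 0 \right) \ge \eta
\quad \Longleftarrow \quad
w^{\top} \mu - b \;\ge\; \kappa \sqrt{ w^{\top} \Sigma\, w },
\qquad
\kappa = \sqrt{ \frac{\eta}{1 - \eta} }
```

The right-hand condition is a second-order cone constraint in (w, b). It depends on the data only through μ and Σ, which is why errors in the estimated moments matter and why robust variants are of interest.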
