854 resultados para mean-variance frontiers
Resumo:
Analyzing the relationship between the baseline value and subsequent change of a continuous variable is a frequent matter of inquiry in cohort studies. These analyses are surprisingly complex, particularly if only two waves of data are available. It is unclear for non-biostatisticians where the complexity of this analysis lies and which statistical method is adequate.With the help of simulated longitudinal data of body mass index in children,we review statistical methods for the analysis of the association between the baseline value and subsequent change, assuming linear growth with time. Key issues in such analyses are mathematical coupling, measurement error, variability of change between individuals, and regression to the mean. Ideally, it is better to rely on multiple repeated measurements at different times and a linear random effects model is a standard approach if more than two waves of data are available. If only two waves of data are available, our simulations show that Blomqvist's method - which consists in adjusting for measurement error variance the estimated regression coefficient of observed change on baseline value - provides accurate estimates. The adequacy of the methods to assess the relationship between the baseline value and subsequent change depends on the number of data waves, the availability of information on measurement error, and the variability of change between individuals.
Resumo:
This contribution compares existing and newly developed techniques for geometrically representing mean-variances-kewness portfolio frontiers based on the rather widely adapted methodology of polynomial goal programming (PGP) on the one hand and the more recent approach based on the shortage function on the other hand. Moreover, we explain the working of these different methodologies in detail and provide graphical illustrations. Inspired by these illustrations, we prove a generalization of the well-known two fund separation theorem from traditionalmean-variance portfolio theory.
Resumo:
This paper investigates a simple procedure to estimate robustly the mean of an asymmetric distribution. The procedure removes the observations which are larger or smaller than certain limits and takes the arithmetic mean of the remaining observations, the limits being determined with the help of a parametric model, e.g., the Gamma, the Weibull or the Lognormal distribution. The breakdown point, the influence function, the (asymptotic) variance, and the contamination bias of this estimator are explored and compared numerically with those of competing estimates.
Resumo:
In a series of seminal articles in 1974, 1975, and 1977, J. H. Gillespie challenged the notion that the "fittest" individuals are those that produce on average the highest number of offspring. He showed that in small populations, the variance in fecundity can determine fitness as much as mean fecundity. One likely reason why Gillespie's concept of within-generation bet hedging has been largely ignored is the general consensus that natural populations are of large size. As a consequence, essentially no work has investigated the role of the fecundity variance on the evolutionary stable state of life-history strategies. While typically large, natural populations also tend to be subdivided in local demes connected by migration. Here, we integrate Gillespie's measure of selection for within-generation bet hedging into the inclusive fitness and game theoretic measure of selection for structured populations. The resulting framework demonstrates that selection against high variance in offspring number is a potent force in large, but structured populations. More generally, the results highlight that variance in offspring number will directly affect various life-history strategies, especially those involving kin interaction. The selective pressures on three key traits are directly investigated here, namely within-generation bet hedging, helping behaviors, and the evolutionary stable dispersal rate. The evolutionary dynamics of all three traits are markedly affected by variance in offspring number, although to a different extent and under different demographic conditions.
Resumo:
This paper focused on four alternatives of analysis of experiments in square lattice as far as the estimation of variance components and some genetic parameters are concerned: 1) intra-block analysis with adjusted treatment and blocks within unadjusted repetitions; 2) lattice analysis as complete randomized blocks; 3) intrablock analysis with unadjusted treatment and blocks within adjusted repetitions; 4) lattice analysis as complete randomized blocks, by utilizing the adjusted means of treatments, obtained from the analysis with recovery of interblock information, having as mean square of the error the mean effective variance of this same analysis with recovery of inter-block information. For the four alternatives of analysis, the estimators and estimates were obtained for the variance components and heritability coefficients. The classification of material was also studied. The present study suggests that for each experiment and depending of the objectives of the analysis, one should observe which alternative of analysis is preferable, mainly in cases where a negative estimate is obtained for the variance component due to effects of blocks within adjusted repetitions.
Resumo:
Plant growth analysis presents difficulties related to statistical comparison of growth rates, and the analysis of variance of primary data could guide the interpretation of results. The objective of this work was to evaluate the analysis of variance of data from distinct harvests of an experiment, focusing especially on the homogeneity of variances and the choice of an adequate ANOVA model. Data from five experiments covering different crops and growth conditions were used. From the total number of variables, 19% were originally homoscedastic, 60% became homoscedastic after logarithmic transformation, and 21% remained heteroscedastic after transformation. Data transformation did not affect the F test in one experiment, whereas in the other experiments transformation modified the F test usually reducing the number of significant effects. Even when transformation has not altered the F test, mean comparisons led to divergent interpretations. The mixed ANOVA model, considering harvest as a random effect, reduced the number of significant effects of every factor which had the F test modified by this model. Examples illustrated that analysis of variance of primary variables provides a tool for identifying significant differences in growth rates. The analysis of variance imposes restrictions to experimental design thereby eliminating some advantages of the functional growth analysis.
Resumo:
Presently, conditions ensuring the validity of bootstrap methods for the sample mean of (possibly heterogeneous) near epoch dependent (NED) functions of mixing processes are unknown. Here we establish the validity of the bootstrap in this context, extending the applicability of bootstrap methods to a class of processes broadly relevant for applications in economics and finance. Our results apply to two block bootstrap methods: the moving blocks bootstrap of Künsch ( 989) and Liu and Singh ( 992), and the stationary bootstrap of Politis and Romano ( 994). In particular, the consistency of the bootstrap variance estimator for the sample mean is shown to be robust against heteroskedasticity and dependence of unknown form. The first order asymptotic validity of the bootstrap approximation to the actual distribution of the sample mean is also established in this heterogeneous NED context.
Resumo:
Les travaux portent sur l’estimation de la variance dans le cas d’une non- réponse partielle traitée par une procédure d’imputation. Traiter les valeurs imputées comme si elles avaient été observées peut mener à une sous-estimation substantielle de la variance des estimateurs ponctuels. Les estimateurs de variance usuels reposent sur la disponibilité des probabilités d’inclusion d’ordre deux, qui sont parfois difficiles (voire impossibles) à calculer. Nous proposons d’examiner les propriétés d’estimateurs de variance obtenus au moyen d’approximations des probabilités d’inclusion d’ordre deux. Ces approximations s’expriment comme une fonction des probabilités d’inclusion d’ordre un et sont généralement valides pour des plans à grande entropie. Les résultats d’une étude de simulation, évaluant les propriétés des estimateurs de variance proposés en termes de biais et d’erreur quadratique moyenne, seront présentés.
Resumo:
Computational Biology is the research are that contributes to the analysis of biological data through the development of algorithms which will address significant research problems.The data from molecular biology includes DNA,RNA ,Protein and Gene expression data.Gene Expression Data provides the expression level of genes under different conditions.Gene expression is the process of transcribing the DNA sequence of a gene into mRNA sequences which in turn are later translated into proteins.The number of copies of mRNA produced is called the expression level of a gene.Gene expression data is organized in the form of a matrix. Rows in the matrix represent genes and columns in the matrix represent experimental conditions.Experimental conditions can be different tissue types or time points.Entries in the gene expression matrix are real values.Through the analysis of gene expression data it is possible to determine the behavioral patterns of genes such as similarity of their behavior,nature of their interaction,their respective contribution to the same pathways and so on. Similar expression patterns are exhibited by the genes participating in the same biological process.These patterns have immense relevance and application in bioinformatics and clinical research.Theses patterns are used in the medical domain for aid in more accurate diagnosis,prognosis,treatment planning.drug discovery and protein network analysis.To identify various patterns from gene expression data,data mining techniques are essential.Clustering is an important data mining technique for the analysis of gene expression data.To overcome the problems associated with clustering,biclustering is introduced.Biclustering refers to simultaneous clustering of both rows and columns of a data matrix. Clustering is a global whereas biclustering is a local model.Discovering local expression patterns is essential for identfying many genetic pathways that are not apparent otherwise.It is therefore necessary to move beyond the clustering paradigm towards developing approaches which are capable of discovering local patterns in gene expression data.A biclusters is a submatrix of the gene expression data matrix.The rows and columns in the submatrix need not be contiguous as in the gene expression data matrix.Biclusters are not disjoint.Computation of biclusters is costly because one will have to consider all the combinations of columans and rows in order to find out all the biclusters.The search space for the biclustering problem is 2 m+n where m and n are the number of genes and conditions respectively.Usually m+n is more than 3000.The biclustering problem is NP-hard.Biclustering is a powerful analytical tool for the biologist.The research reported in this thesis addresses the problem of biclustering.Ten algorithms are developed for the identification of coherent biclusters from gene expression data.All these algorithms are making use of a measure called mean squared residue to search for biclusters.The objective here is to identify the biclusters of maximum size with the mean squared residue lower than a given threshold. All these algorithms begin the search from tightly coregulated submatrices called the seeds.These seeds are generated by K-Means clustering algorithm.The algorithms developed can be classified as constraint based,greedy and metaheuristic.Constarint based algorithms uses one or more of the various constaints namely the MSR threshold and the MSR difference threshold.The greedy approach makes a locally optimal choice at each stage with the objective of finding the global optimum.In metaheuristic approaches particle Swarm Optimization(PSO) and variants of Greedy Randomized Adaptive Search Procedure(GRASP) are used for the identification of biclusters.These algorithms are implemented on the Yeast and Lymphoma datasets.Biologically relevant and statistically significant biclusters are identified by all these algorithms which are validated by Gene Ontology database.All these algorithms are compared with some other biclustering algorithms.Algorithms developed in this work overcome some of the problems associated with the already existing algorithms.With the help of some of the algorithms which are developed in this work biclusters with very high row variance,which is higher than the row variance of any other algorithm using mean squared residue, are identified from both Yeast and Lymphoma data sets.Such biclusters which make significant change in the expression level are highly relevant biologically.
Resumo:
The behavior of the Asian summer monsoon is documented and compared using the European Centre for Medium-Range Weather Forecasts (ECMWF) Reanalysis (ERA) and the National Centers for Environmental Prediction-National Center for Atmospheric Research (NCEP-NCAR) Reanalysis. In terms of seasonal mean climatologies the results suggest that, in several respects, the ERA is superior to the NCEP-NCAR Reanalysis. The overall better simulation of the precipitation and hence the diabatic heating field over the monsoon domain in ERA means that the analyzed circulation is probably nearer reality. In terms of interannual variability, inconsistencies in the definition of weak and strong monsoon years based on typical monsoon indices such as All-India Rainfall (AIR) anomalies and the large-scale wind shear based dynamical monsoon index (DMI) still exist. Two dominant modes of interannual variability have been identified that together explain nearly 50% of the variance. Individually, they have many features in common with the composite flow patterns associated with weak and strong monsoons, when defined in terms of regional AIR anomalies and the large-scale DMI. The reanalyses also show a common dominant mode of intraseasonal variability that describes the latitudinal displacement of the tropical convergence zone from its oceanic-to-continental regime and essentially captures the low-frequency active/break cycles of the monsoon. The relationship between interannual and intraseasonal variability has been investigated by considering the probability density function (PDF) of the principal component of the dominant intraseasonal mode. Based on the DMI, there is an indication that in years with a weaker monsoon circulation, the PDF is skewed toward negative values (i,e., break conditions). Similarly, the PDFs for El Nino and La Nina years suggest that El Nino predisposes the system to more break spells, although the sample size may limit the statistical significance of the results.
Resumo:
The sensitivity of the UK Universities Global Atmospheric Modelling Programme (UGAMP) General Circulation Model (UGCM) to two very different approaches to convective parametrization is described. Comparison is made between a Kuo scheme, which is constrained by large-scale moisture convergence, and a convective-adjustment scheme, which relaxes to observed thermodynamic states. Results from 360-day integrations with perpetual January conditions are used to describe the model's tropical time-mean climate and its variability. Both convection schemes give reasonable simulations of the time-mean climate, but the representation of the main modes of tropical variability is markedly different. The Kuo scheme has much weaker variance, confined to synoptic frequencies near 4 days, and a poor simulation of intraseasonal variability. In contrast, the convective-adjustment scheme has much more transient activity at all time-scales. The various aspects of the two schemes which might explain this difference are discussed. The particular closure on moisture convergence used in this version of the Kuo scheme is identified as being inappropriate.
Resumo:
Observations suggest a possible link between the Atlantic Multidecadal Oscillation (AMO) and El Nino Southern Oscillation (ENSO) variability, with the warm AMO phase being related to weaker ENSO variability. A coupled ocean-atmosphere model is used to investigate this relationship and to elucidate mechanisms responsible for it. Anomalous sea surface temperatures (SSTs) associated with the positive AMO lead to change in the basic state in the tropical Pacific Ocean. This basic state change is associated with a deepened thermocline and reduced vertical stratification of the equatorial Pacific ocean, which in turn leads to weakened ENSO variability. We suggest a role for an atmospheric bridge that rapidly conveys the influence of the Atlantic Ocean to the tropical Pacific. The results suggest a non-local mechanism for changes in ENSO statistics and imply that anomalous Atlantic ocean SSTs can modulate both mean climate and climate variability over the Pacific.
Resumo:
Markowitz showed that assets can be combined to produce an 'Efficient' portfolio that will give the highest level of portfolio return for any level of portfolio risk, as measured by the variance or standard deviation. These portfolios can then be connected to generate what is termed an 'Efficient Frontier' (EF). In this paper we discuss the calculation of the Efficient Frontier for combinations of assets, again using the spreadsheet Optimiser. To illustrate the derivation of the Efficient Frontier, we use the data from the Investment Property Databank Long Term Index of Investment Returns for the period 1971 to 1993. Many investors might require a certain specific level of holding or a restriction on holdings in at least some of the assets. Such additional constraints may be readily incorporated into the model to generate a constrained EF with upper and/or lower bounds. This can then be compared with the unconstrained EF to see whether the reduction in return is acceptable. To see the effect that these additional constraints may have, we adopt a fairly typical pension fund profile, with no more than 20% of the total held in Property. The paper shows that it is now relatively easy to use the Optimiser available in at least one spreadsheet (EXCEL) to calculate efficient portfolios for various levels of risk and return, both constrained and unconstrained, so as to be able to generate any number of Efficient Frontiers.
Resumo:
This paper uses appropriately modified information criteria to select models from the GARCH family, which are subsequently used for predicting US dollar exchange rate return volatility. The out of sample forecast accuracy of models chosen in this manner compares favourably on mean absolute error grounds, although less favourably on mean squared error grounds, with those generated by the commonly used GARCH(1, 1) model. An examination of the orders of models selected by the criteria reveals that (1, 1) models are typically selected less than 20% of the time.
Resumo:
We discuss the estimation of the expected value of the quality-adjusted survival, based on multistate models. We generalize an earlier work, considering the sojourn times in health states are not identically distributed, for a given vector of covariates. Approaches based on semiparametric and parametric (exponential and Weibull distributions) methodologies are considered. A simulation study is conducted to evaluate the performance of the proposed estimator and the jackknife resampling method is used to estimate the variance of such estimator. An application to a real data set is also included.