950 results for Empirical evaluation


Relevance: 100.00%

Abstract:

Background: Genome-wide association studies (GWAS) have become the approach of choice for identifying genetic determinants of complex phenotypes and common diseases. The sheer amount of data generated and the use of distinct genotyping platforms with variable genomic coverage remain analytical challenges. Imputation algorithms combine information from directly genotyped markers with the haplotypic structure of the population of interest to infer poorly genotyped or missing markers, and are considered a near-zero-cost approach for comparing and combining data generated in different studies. Several reports have stated that imputed markers have an overall acceptable accuracy, but no published report has performed a pairwise comparison of imputed and empirical association statistics for a complete set of GWAS markers. Results: We identified a total of 73 imputed markers that yielded a nominally statistically significant association at P < 10^-5 for type 2 diabetes mellitus and compared them with results obtained from empirical allele frequencies. Interestingly, despite their overall high correlation, association statistics based on imputed frequencies were discordant for 35 of the 73 (47%) associated markers, considerably inflating the type I error rate of imputed markers. We comprehensively tested several quality thresholds, the haplotypic structure underlying imputed markers, and the use of flanking markers as predictors of inaccurate association statistics derived from imputed markers. Conclusions: Our results suggest that association statistics from imputed markers in specific minor allele frequency (MAF) ranges, located in weak linkage disequilibrium blocks, or strongly deviating from local patterns of association are prone to inflated false positive association signals. The present study highlights the potential of imputation procedures and proposes simple procedures for selecting the best imputed markers for follow-up genotyping studies.
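
As a rough illustration of the kind of marker filtering the conclusions point to, the sketch below flags imputed markers with low imputation quality or with minor allele frequencies outside a chosen band for follow-up genotyping. The column names and thresholds are assumptions made for the example, not values taken from the study.

```python
import pandas as pd

# Illustrative thresholds; the study tests several quality cut-offs, and
# these particular values are assumptions made for this sketch.
MIN_INFO = 0.8            # imputation quality score (e.g. an info / r^2 metric)
MAF_RANGE = (0.05, 0.50)  # minor allele frequency band treated as reliable

def flag_markers_for_followup(df: pd.DataFrame) -> pd.DataFrame:
    """Flag imputed markers whose association statistics may be inflated.

    Expects the (hypothetical) columns: 'snp', 'imputed' (bool), 'info', 'maf'.
    Markers outside the quality or MAF bands are marked for confirmation by
    direct genotyping.
    """
    df = df.copy()
    low_quality = df["imputed"] & (df["info"] < MIN_INFO)
    out_of_maf_band = ~df["maf"].between(*MAF_RANGE)
    df["needs_followup_genotyping"] = df["imputed"] & (low_quality | out_of_maf_band)
    return df
```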

Relevance: 100.00%

Abstract:

Two main approaches are commonly used to empirically evaluate linear factor pricing models: regression and SDF methods, with centred and uncentred versions of the latter. We show that, unlike standard two-step or iterated GMM procedures, single-step estimators such as continuously updated GMM yield numerically identical values for prices of risk, pricing errors, Jensen's alphas and overidentifying restrictions tests, irrespective of the validity of the model. Therefore, there is arguably a single approach regardless of whether the factors are traded or not, or whether excess or gross returns are used. We illustrate our results by revisiting Lustig and Verdelhan's (2007) empirical analysis of currency returns.
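
For readers unfamiliar with the single-step estimator referred to here, the following sketch spells out a continuously updated GMM criterion for a centred linear SDF on simulated data. It is a minimal numerical illustration under assumed data shapes, not the authors' code or their empirical specification.

```python
import numpy as np
from scipy.optimize import minimize

def cue_objective(b, excess_returns, factors):
    """Continuously updated GMM criterion for a linear SDF m_t = 1 - b'(f_t - f_bar).

    Pricing-error moments: E[m_t * R^e_t] = 0 for each test asset. Both the
    mean moment vector and the weighting matrix are evaluated at the same b,
    which is what makes the estimator 'continuously updated'.
    """
    f = factors - factors.mean(axis=0)       # de-meaned factors (centred SDF)
    m = 1.0 - f @ b                          # SDF realisations, shape (T,)
    g_t = excess_returns * m[:, None]        # moment contributions, shape (T, N)
    g_bar = g_t.mean(axis=0)                 # average pricing errors, shape (N,)
    S = np.cov(g_t, rowvar=False)            # moment covariance, shape (N, N)
    return float(g_bar @ np.linalg.solve(S, g_bar))

# Simulated data with assumed shapes: T periods, N assets, K factors.
rng = np.random.default_rng(0)
T, N, K = 200, 5, 2
factors = rng.normal(size=(T, K))
excess_returns = factors @ rng.normal(size=(K, N)) + rng.normal(scale=0.5, size=(T, N))

res = minimize(cue_objective, x0=np.zeros(K), args=(excess_returns, factors),
               method="Nelder-Mead")
b_hat = res.x          # estimated SDF loadings
j_stat = T * res.fun   # overidentifying restrictions (J) statistic
```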

Relevance: 100.00%

Abstract:

This paper investigates the comparative performance of five small area estimators. We use Monte Carlo simulation in the context of both theoretical and empirical populations. In addition to the direct and indirect estimators, we consider the optimal composite estimator with population weights, and two composite estimators with estimated weights: one that assumes homogeneity of within-area variance and squared bias, and another that uses area-specific estimates of variance and squared bias. We find that, among the feasible estimators, the best choice is the one that uses area-specific estimates of variance and squared bias.
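
A minimal sketch of the area-specific composite estimator described as the best feasible choice, assuming the direct estimator is roughly unbiased and the indirect estimator's MSE is dominated by its squared bias; the function and argument names are invented for illustration.

```python
import numpy as np

def composite_estimate(direct, indirect, var_direct, sq_bias_indirect):
    """Area-specific composite small-area estimator (conceptual sketch).

    The weight for area i approximates the MSE-minimising choice
        w_i = MSE(indirect_i) / (MSE(direct_i) + MSE(indirect_i)),
    using the area-specific variance of the direct estimator and the
    area-specific squared bias of the indirect (synthetic) estimator.
    """
    direct = np.asarray(direct, dtype=float)
    indirect = np.asarray(indirect, dtype=float)
    mse_direct = np.asarray(var_direct, dtype=float)          # direct estimator assumed unbiased
    mse_indirect = np.asarray(sq_bias_indirect, dtype=float)  # indirect variance assumed negligible
    w = mse_indirect / (mse_direct + mse_indirect)
    return w * direct + (1.0 - w) * indirect
```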

Relevance: 100.00%

Abstract:

Discussion on improving the power of genome-wide association studies to identify candidate variants and genes is generally centered on issues of maximizing sample size; less attention is given to the role of phenotype definition and ascertainment. The authors used genome-wide data from patients infected with human immunodeficiency virus type 1 (HIV-1) to assess whether differences in type of population (622 seroconverters vs. 636 seroprevalent subjects) or the number of measurements available for defining the phenotype resulted in differences in the effect sizes of associations between single nucleotide polymorphisms and the phenotype, HIV-1 viral load at set point. The effect estimate for the top 100 single nucleotide polymorphisms was 0.092 (95% confidence interval: 0.074, 0.110) log10 viral load (log10 copies of HIV-1 per mL of blood) greater in seroconverters than in seroprevalent subjects. The difference was even larger when the authors focused on chromosome 6 variants (0.153 log10 viral load) or on variants that achieved genome-wide significance (0.232 log10 viral load). The estimates of the genetic effects tended to be slightly larger when more viral load measurements were available, particularly among seroconverters and for variants that achieved genome-wide significance. Differences in phenotype definition and ascertainment may affect the estimated magnitude of genetic effects and should be considered in optimizing power for discovering new associations.

Relevance: 100.00%

Abstract:

Programming and mathematics are core areas of computer science (CS) and consequently also important parts of CS education. Introductory instruction in these two topics is, however, not without problems. Studies show that CS students find programming difficult to learn and that teaching mathematical topics to CS novices is challenging. One reason for the latter is the disconnection between mathematics and programming found in many CS curricula, which results in students not seeing the relevance of the subject for their studies. In addition, reports indicate that students' mathematical capability and maturity levels are dropping.

The challenges faced when teaching mathematics and programming at CS departments can also be traced back to gaps in students' prior education. In Finland the high school curriculum does not include CS as a subject; instead, the focus is on learning to use the computer and its applications as tools. Similarly, many of the mathematics courses emphasize application of formulas, while logic, formalisms and proofs, which are important in CS, are avoided. Consequently, high school graduates are not well prepared for studies in CS.

Motivated by these challenges, the goal of the present work is to describe new approaches to teaching mathematics and programming aimed at addressing these issues. Structured derivations is a logic-based approach to teaching mathematics, where formalisms and justifications are made explicit; the aim is to help students become better at communicating their reasoning using mathematical language and logical notation, while also becoming more confident with formalisms. The Python programming language was originally designed with education in mind and has a simple syntax compared to many other popular languages; the aim of using it in instruction is to address algorithms and their implementation in a way that allows the focus to be put on learning algorithmic thinking and programming rather than on learning a complex syntax. Invariant based programming is a diagrammatic approach to developing programs that are correct by construction; it is based on elementary propositional and predicate logic and makes explicit the underlying mathematical foundations of programming, the aim also being to show how mathematics in general, and logic in particular, can be used to create better programs.
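
As a toy illustration of the flavour of invariant based programming in Python (not an example taken from the thesis), the snippet below writes out a loop invariant explicitly and checks it at run time:

```python
def sum_of_squares(n: int) -> int:
    """Sum 1^2 + 2^2 + ... + n^2, with the loop invariant made explicit."""
    total, i = 0, 0
    while i < n:
        # Invariant: total == sum of k^2 for k in 1..i
        assert total == sum(k * k for k in range(1, i + 1))
        i += 1
        total += i * i
    # On exit i == n, so total == 1^2 + ... + n^2
    return total

assert sum_of_squares(10) == 385
```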

Relevance: 100.00%

Abstract:

Two experiments compared people's interpretation of verbal and numerical descriptions of the risk of medication side effects occurring. The verbal descriptors were selected from those recommended for use by the European Union (very common, common, uncommon, rare, very rare). Both experiments used a controlled empirical methodology, in which nearly 500 members of the general population were presented with a fictitious (but realistic) scenario about visiting the doctor and being prescribed medication, together with information about the medicine's side effects and their probability of occurrence. Experiment 1 found that, in all three age groups tested (18-40, 41-60 and over 60), participants given a verbal descriptor (very common) estimated side effect risk to be considerably higher than those given a comparable numerical description. Furthermore, the differences in interpretation were reflected in their judgements of side effect severity, risk to health, and intention to comply. Experiment 2 confirmed these findings using two different verbal descriptors (common and rare) and in scenarios which described either relatively severe or relatively mild side effects. Strikingly, only 7 out of 180 participants in this study gave a probability estimate which fell within the EU-assigned numerical range. Thus, large-scale use of the descriptors could have serious negative consequences for individual and public health. We therefore recommend that the EU and national authorities suspend their recommendations regarding these descriptors until a more substantial evidence base is available to support their appropriate use.
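
For context, the sketch below encodes the frequency bands conventionally attached to these EU descriptors and the within-band check underlying the "7 out of 180" finding. The numerical bands are the commonly cited guideline values and are included here as background, not as data reported in the abstract.

```python
# Commonly cited EU frequency bands for the verbal descriptors studied;
# exact wording and band edges in leaflets may vary.
EU_FREQUENCY_BANDS = {
    "very common": (0.10, 1.00),     # more than 1 in 10
    "common":      (0.01, 0.10),     # 1 in 100 to 1 in 10
    "uncommon":    (0.001, 0.01),    # 1 in 1,000 to 1 in 100
    "rare":        (0.0001, 0.001),  # 1 in 10,000 to 1 in 1,000
    "very rare":   (0.0, 0.0001),    # less than 1 in 10,000
}

def estimate_within_band(descriptor: str, estimated_probability: float) -> bool:
    """Check whether a participant's probability estimate falls inside the
    band assigned to the verbal descriptor (band edges treated inclusively)."""
    low, high = EU_FREQUENCY_BANDS[descriptor]
    return low <= estimated_probability <= high
```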

Relevance: 100.00%

Abstract:

We compare three frequently used volatility modelling techniques: GARCH, Markovian switching and cumulative daily volatility models. Our primary goal is to highlight a practical and systematic way to measure the relative effectiveness of these techniques. Evaluation comprises the analysis of the validity of the statistical requirements of the various models and of their performance in simple options hedging strategies, the latter putting them to the test in a "real life" application. Though there was not much difference between the three techniques, a tendency in favour of the cumulative daily volatility estimates, based on tick data, seems clear. As the improvement is not very big, the message for the practitioner - on the restricted evidence of our experiment - is that probably not much is lost by working with the Markovian switching method. This highlights that, in terms of volatility estimation, no clear winner exists among the more sophisticated techniques.
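
As background on the first of the three techniques, here is a minimal GARCH(1,1) conditional variance recursion; the parameter values in the usage comment are placeholders, not estimates from the paper.

```python
import numpy as np

def garch11_variance(returns, omega, alpha, beta):
    """Conditional variance recursion of a GARCH(1,1) model,
        sigma^2_t = omega + alpha * r^2_{t-1} + beta * sigma^2_{t-1}.
    Parameters would normally be fitted by maximum likelihood; here they
    are taken as given.
    """
    r = np.asarray(returns, dtype=float)
    sigma2 = np.empty_like(r)
    sigma2[0] = r.var()  # a common initialisation choice
    for t in range(1, len(r)):
        sigma2[t] = omega + alpha * r[t - 1] ** 2 + beta * sigma2[t - 1]
    return sigma2

# Example with illustrative parameters (not estimates from the paper):
# vol = np.sqrt(garch11_variance(daily_returns, omega=1e-6, alpha=0.05, beta=0.90))
```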

Relevance: 100.00%

Abstract:

Random Forests™ is reported to be one of the most accurate classification algorithms for complex data analysis. It shows excellent performance even when most predictors are noisy and the number of variables is much larger than the number of observations. In this thesis, Random Forests was applied to a large-scale lung cancer case-control study, a novel way of automatically selecting prognostic factors was proposed, and a synthetic positive control was used to validate the Random Forests method. Throughout this study we showed that Random Forests can deal with a large number of weak input variables without overfitting and can account for non-additive interactions between these input variables. Random Forests can also be used for variable selection without being adversely affected by collinearities. Random Forests can handle large-scale data sets without rigorous data preprocessing and has a robust variable importance ranking measure. We propose a novel variable selection method in the context of Random Forests that uses the data noise level as the cut-off value to determine the subset of important predictors. This new approach enhanced the ability of the Random Forests algorithm to automatically identify important predictors in complex data. The cut-off value can also be adjusted based on the results of the synthetic positive control experiments. When the data set had a high variable-to-observation ratio, Random Forests complemented the established logistic regression. This study suggests that Random Forests is recommended for such high-dimensionality data: one can use Random Forests to select the important variables and then use logistic regression, or Random Forests itself, to estimate the effect size of the predictors and to classify new observations. We also found that the mean decrease in accuracy is a more reliable variable ranking measure than the mean decrease in Gini.
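
The general idea of a noise-level cut-off for variable importance can be sketched as follows with scikit-learn: append pure-noise columns, fit a forest, and keep only the predictors that outrank the noisiest noise column. This is an illustration of the concept under assumed inputs, not the thesis's exact procedure.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def select_above_noise(X, y, feature_names, n_noise=10, random_state=0):
    """Keep predictors whose importance exceeds that of pure-noise columns.

    X is assumed to be a NumPy array of shape (n_samples, n_features).
    """
    rng = np.random.default_rng(random_state)
    noise = rng.normal(size=(X.shape[0], n_noise))   # synthetic noise predictors
    X_aug = np.hstack([X, noise])

    forest = RandomForestClassifier(n_estimators=500, random_state=random_state)
    forest.fit(X_aug, y)

    importances = forest.feature_importances_
    cutoff = importances[X.shape[1]:].max()          # highest importance among noise columns
    keep = importances[: X.shape[1]] > cutoff
    return [name for name, kept in zip(feature_names, keep) if kept]
```

Note that scikit-learn's built-in feature_importances_ is an impurity (Gini) based measure; sklearn.inspection.permutation_importance gives an accuracy-style alternative closer to the mean-decrease-in-accuracy ranking favoured here.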

Relevance: 100.00%

Abstract:

One way to achieve the large sample sizes required for genetic studies of complex traits is to combine samples collected by different groups. It is not often clear, however, whether this practice is reasonable from a genetic perspective. To assess the comparability of samples from the Australian and the Netherlands twin studies, we estimated FST (the proportion of total genetic variability attributable to genetic differences between cohorts) based on 359 short tandem repeat polymorphisms in 1068 individuals. FST was estimated to be 0.30% between the Australian and the Netherlands cohorts, a smaller value than between many European groups. We conclude that it is reasonable to combine the Australian and the Netherlands samples for joint genetic analyses.
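
For orientation, a toy biallelic Wright-style FST calculation from cohort allele frequencies is sketched below; the study itself used 359 multi-allelic STR markers and a standard FST estimator, so this is conceptual only.

```python
import numpy as np

def wright_fst(p1, p2):
    """Simple biallelic Wright-style F_ST from per-locus allele frequencies
    in two cohorts of (assumed) equal weight: (H_T - H_S) / H_T."""
    p1, p2 = np.asarray(p1, float), np.asarray(p2, float)
    p_bar = (p1 + p2) / 2.0
    h_s = (2 * p1 * (1 - p1) + 2 * p2 * (1 - p2)) / 2.0  # mean within-cohort heterozygosity
    h_t = 2 * p_bar * (1 - p_bar)                         # total heterozygosity
    # Combine loci as a ratio of sums, the usual multi-locus convention.
    return (h_t.sum() - h_s.sum()) / h_t.sum()
```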

Relevance: 100.00%

Abstract:

In empirical studies of evolutionary algorithms, it is usually desirable to evaluate and compare algorithms using as many different parameter settings and test problems as possible, in order to have a clear and detailed picture of their performance. Unfortunately, the total number of experiments required may be very large, which often makes such research work computationally prohibitive. In this paper, the application of a statistical method called racing is proposed as a general-purpose tool to reduce the computational requirements of large-scale experimental studies of evolutionary algorithms. Experimental results are presented that show that racing typically requires only a small fraction of the cost of an exhaustive experimental study.
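
A minimal racing loop might look like the sketch below: surviving configurations are evaluated instance by instance and dropped once they are significantly worse than the incumbent best. Standard F-Race uses Friedman-based tests; this sketch substitutes paired t-tests for brevity and is not the paper's implementation.

```python
import numpy as np
from scipy.stats import ttest_rel

def race(candidates, evaluate, instances, alpha=0.05, min_rounds=5):
    """Race candidate configurations on a stream of test instances.

    `evaluate(candidate, instance)` must return a cost (lower is better) and
    candidates must be hashable; both are assumptions of this sketch.
    """
    alive = list(candidates)
    costs = {c: [] for c in alive}

    for round_idx, instance in enumerate(instances, start=1):
        for c in alive:
            costs[c].append(evaluate(c, instance))
        if round_idx < min_rounds or len(alive) < 2:
            continue
        best = min(alive, key=lambda c: np.mean(costs[c]))
        survivors = [best]
        for c in alive:
            if c is best:
                continue
            _, p_value = ttest_rel(costs[c], costs[best])
            worse_on_average = np.mean(costs[c]) > np.mean(costs[best])
            if not (worse_on_average and p_value < alpha):
                survivors.append(c)   # keep: not yet significantly worse
        alive = survivors
    return alive
```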

Relevance: 100.00%

Abstract:

The part-of or part-whole construct is a fundamental element of many conceptual modeling grammars that is used to associate one thing (a component) with another thing (a composite). Substantive theoretical issues surrounding the part-whole construct remain to be resolved, however. For instance, contrary to widespread claims, the relationship between components and composites is not always transitive. Moreover, how the part-whole construct should be represented in a conceptual schema diagram remains a contentious issue. Some analysts argue composites should be represented as a relationship or association. Others argue they should be represented as an entity. In this paper we use an ontological theory to support our arguments that composites should be represented as entities and not as relationships or associations. We also describe an experiment that we undertook to test whether representing composites as relationships or entities enables users to understand a domain better. Our results support our arguments that using entities to represent composites enables users to better understand a domain.
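
To make the recommendation concrete, the sketch below models a composite as a first-class entity (with its own identity and attributes) that holds part-of links to its components, rather than collapsing the composite into a bare relationship between components. The class and attribute names are invented for illustration.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Component:
    component_id: int
    name: str

@dataclass
class Assembly:  # the composite, represented as an entity in its own right
    assembly_id: int
    name: str
    parts: List[Component] = field(default_factory=list)  # part-of links

engine = Assembly(assembly_id=1, name="engine",
                  parts=[Component(10, "piston"), Component(11, "crankshaft")])
```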

Relevance: 100.00%

Abstract:

The binding theme of this thesis is the examination of both phakic and pseudophakic accommodation by means of theoretical modelling and the application of a new biometric measuring technique. Anterior Segment Optical Coherence Tomography (AS-OCT) was used to assess phakic accommodative changes in 30 young subjects (19.4 ± 2.0 years; range, 18 to 25 years). A new method of assessing curvature change with this technique was employed, with limited success. Changes in axial accommodative spacing, however, proved to be very similar to those of the Scheimpflug-based data. A unique biphasic trend in the position of the posterior crystalline lens surface during accommodation was discovered, which has not been alluded to in the literature. All axial changes with accommodation were statistically significant (p < 0.01), with the exception of corneal thickness (p = 0.81). A two-year follow-up study was undertaken for a cohort of subjects previously implanted with a new accommodating intraocular lens (AIOL) (Lenstec Tetraflex KH3500). All measures of best corrected distance visual acuity (BCDVA; +0.04 ± 0.24 logMAR), distance corrected near visual acuity (DCNVA; +0.61 ± 0.17 logMAR) and contrast sensitivity (+1.35 ± 0.21 log units) were good. The subjective accommodation response quantified with the push-up technique (1.53 ± 0.64 D) and defocus curves (0.77 ± 0.29 D) was greater than the objective stimulus response (0.21 ± 0.19 D). AS-OCT measures with accommodation stimulus revealed a small mean posterior movement of the AIOLs (0.02 ± 0.03 mm for a 4.0 D stimulus); this is contrary to the proposed mechanism of the anterior focus-shift principle.