944 results for Data snooping bias
Outperformance in exchange-traded fund pricing deviations: Generalized control of data snooping bias
Abstract:
An investigation into exchange-traded fund (ETF) outperformance during the period 2008-2012 is undertaken using a data set of 288 U.S.-traded securities. ETFs are tested for net asset value (NAV) premium, underlying index and market benchmark outperformance, with Sharpe, Treynor, and Sortino ratios employed as risk-adjusted performance measures. A key contribution is the application of an innovative generalized stepdown procedure to control for data snooping bias. We find that a large proportion of optimized-replication and debt asset class ETFs display risk-adjusted premiums, with energy- and precious-metals-focused funds outperforming the S&P 500 market benchmark.
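The three risk-adjusted measures named in this abstract can be illustrated with their common textbook definitions. This is a minimal sketch, not the authors' code: the per-period risk-free rate `rf`, the Sortino target of zero, and a downside deviation computed over all periods are assumptions, and the return series are hypothetical.

```python
import numpy as np

def sharpe_ratio(returns, rf=0.0):
    """Mean excess return per unit of total volatility (sample SD)."""
    excess = np.asarray(returns, float) - rf
    return excess.mean() / excess.std(ddof=1)

def treynor_ratio(returns, benchmark_returns, rf=0.0):
    """Mean excess return per unit of systematic risk (beta against the benchmark)."""
    r = np.asarray(returns, float) - rf
    m = np.asarray(benchmark_returns, float) - rf
    beta = np.cov(r, m, ddof=1)[0, 1] / m.var(ddof=1)
    return r.mean() / beta

def sortino_ratio(returns, rf=0.0, target=0.0):
    """Mean excess return above `target` per unit of downside deviation, computed
    as the root mean square of shortfalls below `target` over all periods."""
    excess = np.asarray(returns, float) - rf
    shortfall = np.minimum(excess - target, 0.0)
    return (excess.mean() - target) / np.sqrt(np.mean(shortfall ** 2))

# Hypothetical monthly returns of one ETF and of the S&P 500 benchmark
etf = np.array([0.02, -0.01, 0.03, 0.01, -0.02])
sp500 = 0.5 * etf
print(sharpe_ratio(etf), treynor_ratio(etf, sp500), sortino_ratio(etf))
```

Annualization and the benchmark used to estimate beta would need to follow the study's own conventions.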
Abstract:
Background: Jumping to conclusions (JTC) is associated with psychotic disorder and psychotic symptoms. If JTC represents a trait, the rate should (i) be increased in people with elevated levels of psychosis proneness, such as individuals diagnosed with borderline personality disorder (BPD), and (ii) show a degree of stability over time. Methods: The JTC rate was examined in three groups, patients with first-episode psychosis (FEP), BPD patients and controls, using the Beads Task. The PANSS, SIS-R and CAPE scales were used to assess positive psychotic symptoms, and four WAIS-III subtests were used to assess IQ. Results: A total of 61 FEP patients, 26 BPD patients and 150 controls were evaluated, and 29 FEP patients were re-evaluated after one year. A JTC reasoning bias was displayed by 44% of FEP patients (OR = 8.4, 95% CI: 3.9-17.9), versus 19% of BPD patients (OR = 2.5, 95% CI: 0.8-7.8) and 9% of controls. JTC was not associated with the level of psychotic symptoms, or specifically with delusionality, across the different groups. Differences between FEP patients and controls were independent of sex, educational level, cannabis use and IQ. After one year, 47.8% of FEP patients with JTC at baseline again displayed JTC. Conclusions: JTC in part reflects trait vulnerability to develop disorders with expression of psychotic symptoms.
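The reported odds ratios can be checked with a crude 2×2 calculation and a Woolf (log-scale) confidence interval. The counts below are hypothetical, back-calculated from the reported percentages (27 of 61 FEP patients, 5 of 26 BPD patients and 13 of 150 controls with JTC); the abstract does not state them. Under that assumption the crude estimates agree with the reported values to one decimal, although the study may have used a different estimator.

```python
import math

def odds_ratio_woolf(cases_exposed, n_exposed, cases_ref, n_ref, z=1.96):
    """Crude odds ratio with a Woolf (log-scale) 95% confidence interval."""
    a, b = cases_exposed, n_exposed - cases_exposed   # group of interest: JTC / no JTC
    c, d = cases_ref, n_ref - cases_ref               # reference group: JTC / no JTC
    or_ = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lo = math.exp(math.log(or_) - z * se_log)
    hi = math.exp(math.log(or_) + z * se_log)
    return or_, lo, hi

# Hypothetical counts back-calculated from the reported percentages
controls = (13, 150)   # about 9% of 150 controls
groups = {"FEP": (27, 61), "BPD": (5, 26)}

for label, (k, n) in groups.items():
    or_, lo, hi = odds_ratio_woolf(k, n, *controls)
    print(f"{label}: OR = {or_:.1f} (95% CI {lo:.1f}-{hi:.1f})")
# FEP: OR = 8.4 (95% CI 3.9-17.9)
# BPD: OR = 2.5 (95% CI 0.8-7.8)
```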
Abstract:
We provide a comprehensive study of out-of-sample forecasts for the EUR/USD exchange rate based on multivariate macroeconomic models and forecast combinations. We use profit maximization measures based on directional accuracy and trading strategies in addition to standard loss minimization measures. When comparing predictive accuracy and profit measures, tests free of data snooping bias are used. The results indicate that forecast combinations, in particular those based on principal components of forecasts, help to improve over benchmark trading strategies, although the excess return per unit of deviation is limited.
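One way to read "forecast combinations based on principal components of forecasts" together with "directional accuracy" is sketched below: the K individual forecasts are summarized by their first principal component, the target is regressed on that component, and the combined forecast is scored by the share of correctly predicted signs and by a simple long/short sign strategy. This is a hypothetical sketch on synthetic data; the paper's combination scheme, estimation window and trading rule are not given in the abstract.

```python
import numpy as np

def pc_combination(F_train, y_train, F_test):
    """Combine K forecasts via the first principal component of the forecast panel."""
    mu = F_train.mean(axis=0)
    _, _, vt = np.linalg.svd(F_train - mu, full_matrices=False)
    w = vt[0]                                   # loadings of the first PC
    pc_train = (F_train - mu) @ w
    pc_test = (F_test - mu) @ w
    X = np.column_stack([np.ones_like(pc_train), pc_train])
    coef, *_ = np.linalg.lstsq(X, y_train, rcond=None)  # y = a + b * PC1
    return coef[0] + coef[1] * pc_test

def directional_accuracy(y_true, y_pred):
    """Fraction of periods where the forecast has the same sign as the outcome."""
    return float(np.mean(np.sign(y_true) == np.sign(y_pred)))

def sign_strategy_returns(y_true, y_pred):
    """Long when the forecast change is positive, short when it is negative."""
    return np.sign(y_pred) * y_true

# Hypothetical data: y are period returns; F holds K = 5 noisy model forecasts of y
rng = np.random.default_rng(0)
T, K = 400, 5
signal = rng.normal(size=T)
y = signal + 0.3 * rng.normal(size=T)
F = signal[:, None] + rng.normal(size=(T, K))
combo = pc_combination(F[:200], y[:200], F[200:])
print(directional_accuracy(y[200:], combo), sign_strategy_returns(y[200:], combo).mean())
```

The regression step also resolves the arbitrary sign of the principal component, since a flipped component simply receives a negative slope.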
Abstract:
Simulations of the top-of-atmosphere radiative-energy budget from the Met Office global numerical weather-prediction model are evaluated using new data from the Geostationary Earth Radiation Budget (GERB) instrument on board the Meteosat-8 satellite. Systematic discrepancies between the model simulations and GERB measurements, greater than 20 W m⁻² in outgoing long-wave radiation (OLR) and greater than 60 W m⁻² in reflected short-wave radiation (RSR), are identified over the period April-September 2006 using 12 UTC data. In the model, convective cloud over equatorial Africa is spatially less organized and less reflective than in the GERB data. This bias depends strongly on convective-cloud cover, which is highly sensitive to changes in the model convective parametrization. Underestimates in model OLR over the Gulf of Guinea coincide with unrealistic southerly cloud outflow from convective centres to the north. Large overestimates in model RSR over the subtropical ocean, greater than 50 W m⁻² at 12 UTC, are explained by unrealistic radiative properties of low-level cloud relating to overestimation of cloud liquid water compared with independent satellite measurements. The results of this analysis contribute to the development and improvement of parametrizations in the global forecast model.
Abstract:
This thesis addresses the problem of information hiding in low-dimensional digital data, focusing on issues of privacy and security in Electronic Patient Health Records (EPHRs). The thesis proposes a new security protocol for EPHRs based on data hiding techniques and contends that embedding sensitive patient information inside the EPHR is the most appropriate solution currently available to resolve the security issues in EPHRs. Watermarking techniques are applied to one-dimensional time series data such as the electroencephalogram (EEG) to show that they add a level of confidence (in terms of privacy and security) in an individual’s diverse bio-profile (the digital fingerprint of an individual’s medical history), provide assurance that the data being analysed does indeed belong to the correct person, and ensure that it is not being accessed by unauthorised personnel. Embedding information inside single-channel biomedical time series data is more difficult than the standard application to images because of the reduced redundancy. A data hiding approach with a built-in capability to protect against illegal data snooping is developed. The capability of this secure method is enhanced by embedding not just a single message but multiple messages into an example one-dimensional EEG signal. Embedding multiple messages of similar characteristics, for example the identities of clinicians accessing the medical record, helps in creating a log of access, while embedding multiple messages of dissimilar characteristics into an EPHR enhances confidence in the use of the EPHR. The novel method demonstrated in this thesis, which embeds multiple messages of both similar and dissimilar characteristics into a single-channel EEG, shows how such embedding supports the secure implementation and use of the EPHR.
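The abstract does not specify the embedding scheme. As a hedged illustration of embedding several messages into one single-channel time series, the sketch below swaps in a generic technique, dither-modulation quantization index modulation (QIM), writing each message to a disjoint set of sample positions chosen by a secret key. The step size `DELTA`, the key, the message bits and the synthetic "EEG" are all hypothetical, and this toy offers no security guarantee on its own.

```python
import numpy as np

DELTA = 0.5  # hypothetical quantization step: larger is more robust but distorts the EEG more

def key_positions(n_samples, n_messages, msg_len, key):
    """Keyed, non-overlapping sample positions, one slice per message."""
    perm = np.random.default_rng(key).permutation(n_samples)
    return [perm[m * msg_len:(m + 1) * msg_len] for m in range(n_messages)]

def qim_embed(signal, bits, positions, delta=DELTA):
    """Snap each chosen sample onto the lattice for its bit: multiples of delta
    for 0, multiples of delta shifted by delta/2 for 1."""
    out = np.array(signal, dtype=float)
    for bit, i in zip(bits, positions):
        offset = bit * delta / 2.0
        out[i] = delta * np.round((out[i] - offset) / delta) + offset
    return out

def qim_extract(signal, positions, delta=DELTA):
    """Recover each bit by checking which of the two lattices is nearer."""
    bits = []
    for i in positions:
        x = signal[i]
        d0 = abs(x - delta * np.round(x / delta))
        d1 = abs(x - (delta * np.round((x - delta / 2) / delta) + delta / 2))
        bits.append(int(d1 < d0))
    return bits

# Hypothetical single-channel "EEG": 10 s at 256 Hz of a sine plus noise (arbitrary units)
rng = np.random.default_rng(1)
t = np.arange(2560) / 256.0
eeg = 20 * np.sin(2 * np.pi * 10 * t) + rng.normal(0, 5, t.size)

clinician_ids = [[1, 0, 1, 1, 0, 0, 1, 0], [0, 1, 1, 0, 1, 0, 0, 1]]  # two 8-bit messages
slots = key_positions(eeg.size, len(clinician_ids), 8, key=1234)
marked = eeg
for msg, pos in zip(clinician_ids, slots):
    marked = qim_embed(marked, msg, pos)

recovered = [qim_extract(marked, pos) for pos in slots]
print(recovered == clinician_ids)
```

Each modified sample moves by at most `DELTA / 2`, which is the usual QIM trade-off: a larger step survives more noise but distorts the signal more.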
Abstract:
Population dynamics are generally viewed as the result of intrinsic (purely density-dependent) and extrinsic (environmental) processes. Both components, and potential interactions between the two, have to be modelled in order to understand and predict the dynamics of natural populations, a topic of great importance in population management and conservation. This thesis focuses on modelling environmental effects in population dynamics and on how effects of potentially relevant environmental variables can be statistically identified and quantified from time series data. Chapter I presents some useful models of multiplicative environmental effects for unstructured density-dependent populations. The presented models can be written as standard multiple regression models that are easy to fit to data. Chapters II to IV constitute empirical studies that statistically model environmental effects on the population dynamics of several migratory bird species with different life history characteristics and migration strategies. In Chapter II, spruce cone crops are found to have a strong positive effect on the population growth of the great spotted woodpecker (Dendrocopos major), while cone crops of pine, another important food resource for the species, do not effectively explain population growth. The study compares rate- and ratio-dependent effects of cone availability, using state-space models that distinguish between process and observation error in the time series data. Chapter III shows how drought, in combination with settling behaviour during migration, produces asymmetric spatially synchronous patterns of population dynamics in North American ducks (genus Anas). Chapter IV investigates the dynamics of a Finnish population of skylark (Alauda arvensis) and points out effects of rainfall and habitat quality on population growth.
Because the skylark time series and some of the environmental variables show strong positive autocorrelation, statistical significance is calculated using a Monte Carlo method in which random autocorrelated time series are generated. Chapter V is a simulation-based study showing that ignoring observation error in analyses of population time series data can bias the estimated effects and measures of uncertainty if the environmental variables are autocorrelated. It is concluded that state-space models are an effective way to obtain more accurate results. In summary, several biological assumptions and methodological issues can affect the inferential outcome when estimating environmental effects from time series data and therefore need special attention. The functional form of the environmental effects and potential interactions between environment and population density are important to deal with. Other issues that should be considered are assumptions about density-dependent regulation, modelling potential observation error and, when needed, accounting for spatial and/or temporal autocorrelation.
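Two ideas from this abstract, writing a multiplicative environmental effect as a multiple regression and assessing significance against random autocorrelated series, can be combined in a small sketch. It assumes a Gompertz-type model on the log scale, r_t = a + b log N_t + c E_t, and AR(1) surrogate series matched to the lag-1 autocorrelation and spread of the environmental variable; both are illustrative choices rather than the thesis's exact models, and the data are simulated. Like the simplest analyses, it ignores observation error, which Chapter V shows can bias such estimates.

```python
import numpy as np

def fit_log_growth(log_n, env):
    """OLS fit of r_t = a + b*log(N_t) + c*E_t, where r_t = log(N_{t+1}) - log(N_t).
    Returns the coefficient vector (a, b, c)."""
    r = np.diff(log_n)
    X = np.column_stack([np.ones(r.size), log_n[:-1], env[:-1]])
    coef, *_ = np.linalg.lstsq(X, r, rcond=None)
    return coef

def ar1_series(rng, n, phi, sd):
    """Stationary Gaussian AR(1) series with lag-1 autocorrelation phi and marginal sd."""
    x = np.empty(n)
    x[0] = rng.normal(0.0, sd)
    innov_sd = sd * np.sqrt(1.0 - phi ** 2)
    for t in range(1, n):
        x[t] = phi * x[t - 1] + rng.normal(0.0, innov_sd)
    return x

def monte_carlo_p_value(log_n, env, n_sim=999, seed=0):
    """Two-sided Monte Carlo p-value for c: refit the model with random AR(1)
    series that share env's lag-1 autocorrelation and sd but are unrelated to
    the population, and count how often |c| is at least as large as observed."""
    rng = np.random.default_rng(seed)
    c_obs = fit_log_growth(log_n, env)[2]
    phi = np.corrcoef(env[:-1], env[1:])[0, 1]
    sd = env.std(ddof=1)
    c_null = np.array([fit_log_growth(log_n, ar1_series(rng, env.size, phi, sd))[2]
                       for _ in range(n_sim)])
    return (1 + np.sum(np.abs(c_null) >= abs(c_obs))) / (n_sim + 1)

# Hypothetical 40-year series with an autocorrelated rainfall-like index
rng = np.random.default_rng(1)
T = 40
rain = ar1_series(rng, T, phi=0.6, sd=1.0)
log_n = np.empty(T)
log_n[0] = 5.0
for t in range(T - 1):
    # Gompertz-type dynamics with a multiplicative (log-additive) rainfall effect
    log_n[t + 1] = log_n[t] + 1.0 - 0.2 * log_n[t] + 0.3 * rain[t] + rng.normal(0, 0.05)

print(fit_log_growth(log_n, rain), monte_carlo_p_value(log_n, rain))
```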
Abstract:
BACKGROUND: Measurement of CD4+ T-lymphocytes (CD4) is a crucial parameter in the management of HIV patients, particularly in determining eligibility to initiate antiretroviral treatment (ART). A number of technologies exist for CD4 enumeration, with considerable variation in cost, complexity, and operational requirements. We conducted a systematic review of the performance of technologies for CD4 enumeration. METHODS AND FINDINGS: Studies were identified by searching the electronic databases MEDLINE and EMBASE using a pre-defined search strategy. Data on test accuracy and precision included bias and limits of agreement with a reference standard, and misclassification probabilities around CD4 thresholds of 200 and 350 cells/μl over a clinically relevant range. The secondary outcome measure was test imprecision, expressed as % coefficient of variation. Thirty-two studies evaluating 15 CD4 technologies were included, of which fewer than half presented data on bias and misclassification compared with the same reference technology. Compared with the BD FACSCount as a reference technology, bias ranged from -35.2 to +13.1 cells/μl at CD4 counts <350 cells/μl and from -70.7 to +47 cells/μl at counts >350 cells/μl. Misclassification around the threshold of 350 cells/μl ranged from 1% to 29% for upward classification, resulting in undertreatment, and from 7% to 68% for downward classification, resulting in overtreatment. Fewer than half of these studies reported within-laboratory precision or reproducibility of the CD4 values obtained. CONCLUSIONS: A wide range of bias and percent misclassification around treatment thresholds was reported for the CD4 enumeration technologies included in this review, with few studies reporting assay precision. 
The lack of standardised methodology on test evaluation, including the use of different reference standards, is a barrier to assessing relative assay performance and could hinder the introduction of new point-of-care assays in countries where they are most needed.
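The accuracy measures used in this review, bias with limits of agreement against a reference and misclassification around a treatment threshold, can be computed as in the following minimal sketch. It assumes Bland-Altman limits of bias ± 1.96 SD, an ART-eligibility cutoff of "below 350 cells/μl", and misclassification rates expressed as proportions of the reference-defined groups; the paired counts are hypothetical.

```python
import numpy as np

def bland_altman(test, reference):
    """Mean bias (test - reference) and 95% limits of agreement, bias +/- 1.96 SD."""
    diff = np.asarray(test, float) - np.asarray(reference, float)
    bias = diff.mean()
    sd = diff.std(ddof=1)
    return bias, bias - 1.96 * sd, bias + 1.96 * sd

def misclassification(test, reference, threshold=350.0):
    """Upward: reference < threshold but test >= threshold (risk of undertreatment).
    Downward: reference >= threshold but test < threshold (risk of overtreatment).
    Each is a proportion of the reference-defined group."""
    test = np.asarray(test, float)
    ref = np.asarray(reference, float)
    below = ref < threshold
    upward = float(np.mean(test[below] >= threshold)) if below.any() else float("nan")
    downward = float(np.mean(test[~below] < threshold)) if (~below).any() else float("nan")
    return upward, downward

# Hypothetical paired counts (cells/μl): point-of-care device vs. reference cytometer
reference = [300, 340, 360, 400]
device = [310, 355, 345, 390]
print(bland_altman(device, reference))        # bias 0.0, limits about -28.9 to +28.9
print(misclassification(device, reference))   # (0.5, 0.5)
```

Reporting both directions separately matters because they carry different clinical costs, which is why the review lists upward and downward rates apart.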
Abstract:
Background: Skeletal muscle wasting and weakness are significant complications of critical illness, associated with the degree of illness severity and periods of reduced mobility during mechanical ventilation. They contribute to the profound physical and functional deficits observed in survivors. These impairments may persist for many years following discharge from the intensive care unit (ICU) and may markedly influence health-related quality of life. Rehabilitation is a key strategy in the recovery of patients following critical illness, and exercise-based interventions are aimed at targeting this muscle wasting and weakness. Physical rehabilitation delivered during ICU admission has been systematically evaluated and shown to be beneficial. However, its effectiveness when initiated after ICU discharge has yet to be established. Objectives: To assess the effectiveness of exercise rehabilitation programmes, initiated after ICU discharge, on functional exercise capacity and health-related quality of life in adult ICU survivors who have been mechanically ventilated for more than 24 hours. Search methods: We searched the following databases: the Cochrane Central Register of Controlled Trials (CENTRAL) (The Cochrane Library), OvidSP MEDLINE, OvidSP EMBASE, and CINAHL via EBSCOhost to 15th May 2014. We used a specific search strategy for each database, which included synonyms for ICU and critical illness, exercise training and rehabilitation. We searched the reference lists of included studies and contacted primary authors to obtain further information regarding potentially eligible studies. We also searched major clinical trials registries (Clinical Trials and Current Controlled Trials) and the personal libraries of the review authors. We applied no language or publication restriction. We reran the search in February 2015 and will deal with any studies of interest when we update the review.
Selection criteria: We included randomized controlled trials (RCTs), quasi-RCTs, and controlled clinical trials (CCTs) that compared an exercise intervention initiated after ICU discharge with any other intervention or a control or ‘usual care’ programme in adult (≥18 years) survivors of critical illness. Data collection and analysis: We used standard methodological procedures expected by The Cochrane Collaboration. Main results: We included six trials (483 adult ICU participants). Exercise-based interventions were delivered on the ward in two studies, both on the ward and in the community in one study, and in the community in three studies. The duration of the intervention varied according to the length of stay in hospital following ICU discharge (up to a fixed duration of 12 weeks). Risk of bias was variable for all domains across all trials. High risk of bias was evident in all studies for performance bias, although blinding of participants and personnel in therapeutic rehabilitation trials can be pragmatically challenging. Low risk of bias was at least 50% for all other domains across all trials, although high risk of bias was present in one study for random sequence generation (selection bias), incomplete outcome data (attrition bias) and other sources. Risk of bias was unclear for the remaining studies across the domains. All six studies measured the effect on the primary outcome of functional exercise capacity, although there was wide variability in the nature of the intervention, outcome measures and associated metrics, and data reporting. Overall quality of the evidence was very low. Only two studies used the same outcome measure for functional exercise capacity and thus had the potential for pooling of data and assessment of heterogeneity. On statistical advice, performing this analysis was considered inappropriate, and study findings were therefore described qualitatively. Individually, three studies reported positive results in favour of the intervention.
A small benefit (versus control) was evident in anaerobic threshold in one study (mean difference, MD (95% confidence interval, CI), 1.8 mlO2/kg/min (0.4 to 3.2), P value = 0.02), although this effect was short-term, and in a second study, both incremental (MD 4.7 (95% CI 1.69 to 7.75) Watts, P value = 0.003) and endurance (MD 4.12 (95% CI 0.68 to 7.56) minutes, P value = 0.021) exercise testing demonstrated improvement. Finally, self-reported physical function increased significantly following a rehabilitation manual (P value = 0.006). The remaining studies found no effect of the intervention. Similar variability in findings was also evident for the primary outcome of health-related quality of life. Only two studies evaluated this outcome. Following statistical advice, these data again were considered inappropriate for pooling to determine overall effect and assessment of heterogeneity, so findings were described qualitatively. Individually, neither study reported differences between intervention and control groups in health-related quality of life as a result of the intervention. Overall quality of the evidence was very low. Mortality was reported by all studies, ranging from 0% to 18.8%. Only one non-mortality adverse event was reported across all patients in all studies (a minor musculoskeletal injury). Withdrawals, reported in four studies, ranged from 0% to 26.5% in control groups and from 8.2% to 27.6% in intervention groups. Loss to follow-up, reported in all studies, ranged from 0% to 14% in control groups and from 0% to 12.5% in intervention groups. Authors’ conclusions: We are unable, at this time, to determine an overall effect on functional exercise capacity, or health-related quality of life, of an exercise-based intervention initiated after ICU discharge in survivors of critical illness. Meta-analysis of findings was not appropriate because of the insufficient number of studies and data. Individual study findings were inconsistent.
Some studies reported a beneficial effect of the intervention on functional exercise capacity, and others did not. No effect was reported on health-related quality of life. Methodological rigour was lacking across a number of domains, influencing the quality of the evidence. There was also wide variability in the characteristics of interventions, outcome measures and associated metrics, and data reporting. If further trials are identified, we may be able to determine the effect of exercise-based interventions following ICU discharge on functional exercise capacity and health-related quality of life in survivors of critical illness.
Adjusting HIV Prevalence Estimates for Non-participation: an Application to Demographic Surveillance
Abstract:
Introduction: HIV testing is a cornerstone of efforts to combat the HIV epidemic, and testing conducted as part of surveillance provides invaluable data on the spread of infection and the effectiveness of campaigns to reduce the transmission of HIV. However, participation in HIV testing can be low, and if respondents systematically choose not to be tested because they know or suspect they are HIV positive (and fear disclosure), standard approaches to dealing with missing data will fail to remove selection bias. We implemented Heckman-type selection models, which can be used to adjust for data that are missing not at random, and established the extent of selection bias in a population-based HIV survey in an HIV hyperendemic community in rural South Africa.
Methods: We used data from a population-based HIV survey carried out in 2009 in rural KwaZulu-Natal, South Africa. In this survey, 5565 women (35%) and 2567 men (27%) provided blood for an HIV test. We accounted for missing data using interviewer identity as a selection variable which predicted consent to HIV testing but was unlikely to be independently associated with HIV status. Our approach involved using this selection variable to examine the HIV status of residents who would ordinarily refuse to test, except that they were allocated a persuasive interviewer. Our copula model allows for flexibility when modelling the dependence structure between HIV survey participation and HIV status.
Results: For women, our selection model generated an HIV prevalence estimate of 33% (95% CI 27–40) for all people eligible to consent to HIV testing in the survey. This estimate is higher than the estimate of 24% generated when only information from respondents who participated in testing is used in the analysis, and the estimate of 27% when imputation analysis is used to predict missing data on HIV status. For men, we found an HIV prevalence of 25% (95% CI 15–35) using the selection model, compared to 16% among those who participated in testing, and 18% estimated with imputation. We provide new confidence intervals that correct for the fact that the relationship between testing and HIV status is unknown and requires estimation.
Conclusions: We confirm the feasibility and value of adopting selection models to account for missing data in population-based HIV surveys and surveillance systems. Elements of survey design, such as interviewer identity, present the opportunity to adopt this approach in routine applications. Where non-participation is high, true confidence intervals are much wider than standard approaches to dealing with missing data suggest.
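Why standard missing-data methods fail when people decline testing because of their HIV status can be illustrated with a toy simulation. The sketch below does not fit the Heckman-type or copula selection model used in the study; it only shows (i) that the complete-case prevalence is biased downward when HIV-positive residents are less likely to consent, and (ii) that a selection variable such as interviewer assignment, which shifts consent but not HIV status, changes the prevalence observed among those tested, which is the signal a selection model exploits. All rates are hypothetical.

```python
import numpy as np

def simulate_survey(n=200_000, prevalence=0.30, seed=0):
    """Toy survey in which non-participation is missing not at random:
    HIV-positive residents are less willing to test, and a 'persuasive'
    interviewer raises consent for everyone. Every rate here is hypothetical."""
    rng = np.random.default_rng(seed)
    hiv = rng.random(n) < prevalence
    persuasive = rng.random(n) < 0.5                # assigned independently of HIV status
    p_consent = np.where(hiv, 0.25, 0.45) + np.where(persuasive, 0.30, 0.0)
    tested = rng.random(n) < p_consent
    return hiv, persuasive, tested

hiv, persuasive, tested = simulate_survey()
print("true prevalence:       ", hiv.mean())                       # about 0.30
print("complete-case estimate:", hiv[tested].mean())               # about 0.22
print("tested, persuasive:    ", hiv[tested & persuasive].mean())  # about 0.24
print("tested, other:         ", hiv[tested & ~persuasive].mean()) # about 0.19
```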
Abstract:
This study tests the validity of the efficient market hypothesis in the Ibovespa index futures market by means of so-called technical analysis strategies. Tests of predictive ability are used to examine the hypothesis that these decision rules are superior as a form of investment. These tests have the advantage of accounting for possible data snooping in the choice of the best strategy, making it possible to determine whether the apparent predictive ability of these models is truly significant or merely a product of chance. The results indicate that technical analysis strategies are not able to generate statistically significant returns once the effects of data snooping are taken into account. These results are consistent with the weak form of the efficient market hypothesis.
Abstract:
This study examines the profitability of technical analysis models in the Brazilian foreign exchange market. Using the methodology of White (2000) to test 1712 rules generated from four technical analysis models, it is found that the best rule has no significant predictive power once the effects of data snooping are taken into account. The results indicate that the Brazilian foreign exchange market is consistent with the efficient market hypothesis suggested by the literature.
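White's (2000) Reality Check asks whether the best rule in a large universe beats a benchmark once the search over rules is accounted for. The sketch below is a minimal, non-studentized version using the Politis-Romano stationary bootstrap, applied to a hypothetical family of 16 moving-average crossover rules on a simulated random-walk price series (not the 1712 rules or the four models used in the study). The zero-return benchmark, the mean block length of 10 and the 500 bootstrap replications are assumptions.

```python
import numpy as np

def ma_rule_returns(prices, short, long):
    """Log returns of a long/short moving-average crossover rule, ignoring costs.
    The position set at the close of day d (long if SMA_short > SMA_long, else
    short) earns the log return from day d to day d+1."""
    p = np.asarray(prices, float)
    r = np.diff(np.log(p))                        # r[d]: day d -> day d+1

    def sma(w):
        # k-th value averages p[k:k+w], i.e. it ends on day k+w-1
        return np.convolve(p, np.ones(w) / w, mode="valid")

    s, l = sma(short)[long - short:], sma(long)   # both now end on days long-1 .. n-1
    position = np.where(s > l, 1.0, -1.0)
    return position[:-1] * r[long - 1:]

def stationary_bootstrap(rng, n, mean_block):
    """Politis-Romano stationary bootstrap indices with geometric block lengths."""
    idx = np.empty(n, dtype=int)
    idx[0] = rng.integers(n)
    for t in range(1, n):
        idx[t] = rng.integers(n) if rng.random() < 1.0 / mean_block else (idx[t - 1] + 1) % n
    return idx

def reality_check(perf, n_boot=500, mean_block=10, seed=0):
    """Reality Check p-value for H0: no rule beats the benchmark.
    perf[t, k] is rule k's return minus the benchmark return in period t."""
    rng = np.random.default_rng(seed)
    n = perf.shape[0]
    f_bar = perf.mean(axis=0)
    v = np.sqrt(n) * f_bar.max()
    v_star = np.array([
        np.sqrt(n) * (perf[stationary_bootstrap(rng, n, mean_block)].mean(axis=0) - f_bar).max()
        for _ in range(n_boot)
    ])
    return float(np.mean(v_star >= v))

rng = np.random.default_rng(42)
prices = 100 * np.exp(np.cumsum(rng.normal(0, 0.01, 1500)))   # hypothetical random walk
rules = [(s, l) for s in (1, 2, 5, 10) for l in (20, 50, 100, 150)]
n_common = len(prices) - max(l for _, l in rules)             # align all rules on one sample
perf = np.column_stack([ma_rule_returns(prices, s, l)[-n_common:] for s, l in rules])
print(perf.mean(axis=0).max(), reality_check(perf))           # best mean return, RC p-value
```

Centring each bootstrap mean on the sample mean imposes the null of no outperformance, and taking the maximum over all rules is what accounts for having searched the whole universe.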
Abstract:
Ever since Adam Smith, economists have argued that share contracts do not provide proper incentives. This paper uses tenancy data from India to assess the existence of missing incentives in this classical example of moral hazard. Sharecroppers are found to be less productive than owners, but as productive as fixed-rent tenants. Also, the productivity gap between owners and both types of tenants is driven by sample-selection issues. An endogenous selection rule matches tenancy contracts with less-skilled farmers and lower-quality lands. Due to complementarity, such a matching affects tenants’ input choices. Controlling for that, the contract form has no effect on the expected output. Next, I explicitly model farmers’ optimal decisions to test for the existence of non-contractible inputs being misused. No evidence of missing incentives is found.
Abstract:
Image orientation is a basic problem in Digital Photogrammetry. While interior and relative orientations have been successfully automated, the same cannot be said of absolute orientation. This process can be automated by using an approach based on relational matching and a heuristic that uses the analytical relation between straight features in object space and their homologues in image space. A built-in self-diagnosis is also used in this method, based on implementing the data snooping statistical test in the process of spatial resection using Iterated Extended Kalman Filtering (IEKF). The aim of this paper is to present the basic principles of the proposed approach and results based on real data.
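A common form of the data snooping test in geodesy is Baarda's w-test, which standardizes each least-squares residual by its own standard deviation and flags the largest one if it exceeds a normal critical value. The sketch below assumes that form, a plain unweighted least-squares adjustment with a known a priori σ0 (not the IEKF spatial resection of the paper), and a critical value of 3.29 (α = 0.001, two-sided); the line-fit data are hypothetical.

```python
import numpy as np

CRIT = 3.29  # two-sided standard normal critical value for alpha = 0.001 (a common choice)

def data_snooping(A, l, sigma0):
    """One round of Baarda-style data snooping for an unweighted least-squares
    adjustment l = A x + e with known a priori standard deviation sigma0.
    Returns (x_hat, w, index of the rejected observation or None)."""
    A = np.asarray(A, float)
    l = np.asarray(l, float)
    N_inv = np.linalg.inv(A.T @ A)
    x_hat = N_inv @ A.T @ l
    e = l - A @ x_hat                                  # least-squares residuals
    Q_ee = np.eye(l.size) - A @ N_inv @ A.T            # residual cofactor matrix
    w = e / (sigma0 * np.sqrt(np.diag(Q_ee)))          # standardized residuals
    k = int(np.argmax(np.abs(w)))
    return x_hat, w, (k if abs(w[k]) > CRIT else None)

# Hypothetical line fit y = 2 + 0.5 x with sigma0 = 0.1 and a gross error at observation 7
rng = np.random.default_rng(3)
x = np.arange(20.0)
A = np.column_stack([np.ones_like(x), x])
l = 2.0 + 0.5 * x + rng.normal(0, 0.1, x.size)
l[7] += 2.0
x_hat, w, flagged = data_snooping(A, l, sigma0=0.1)
print(flagged)  # 7
```

In practice the flagged observation is removed and the test is repeated until no standardized residual exceeds the critical value.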
Abstract:
The identification of ground control on photographs or images is usually carried out by a human operator, who uses his natural skills to make interpretations. In Digital Photogrammetry, which uses digital image processing techniques, the extraction of ground control can be automated by using an approach based on relational matching and a heuristic that uses the analytical relation between straight features in object space and their homologues in image space. A built-in self-diagnosis is also used in this method. It is based on implementing the data snooping statistical test in the process of spatial resection using Iterated Extended Kalman Filtering (IEKF). The aim of this paper is to present the basic principles of the proposed approach and results based on real data.
Abstract:
Molecular data are now widely used in epidemiological studies to investigate the transmission, distribution, biology, and diversity of pathogens. Our objective was to establish recommendations to support good scientific reporting of molecular epidemiological studies to encourage authors to consider specific threats to valid inference. The statement Strengthening the Reporting of Molecular Epidemiology for Infectious Diseases (STROME-ID) builds upon the Strengthening the Reporting of Observational Studies in Epidemiology (STROBE) initiative. The STROME-ID statement was developed by a working group of epidemiologists, statisticians, bioinformaticians, virologists, and microbiologists with expertise in control of infection and communicable diseases. The statement focuses on issues relating to the reporting of epidemiological studies of infectious diseases using molecular data that were not addressed by STROBE. STROME-ID addresses terminology, measures of genetic diversity within pathogen populations, laboratory methods, sample collection, use of molecular markers, molecular clocks, timeframe, multiple-strain infections, non-independence of infectious-disease data, missing data, ascertainment bias, consistency between molecular and epidemiological data, and ethical considerations with respect to infectious-disease research. In total, 20 items were added to the 22-item STROBE checklist. When used, the STROME-ID recommendations should advance the quality and transparency of scientific reporting, with clear benefits for evidence reviews and health-policy decision making.