993 results for Data Pooling
Abstract:
Combining datasets across independent studies can boost statistical power by increasing the number of observations and can yield more accurate estimates of effect sizes. This is especially important for genetic studies, where a large number of observations is required to obtain sufficient power to detect and replicate genetic effects. There is a need to develop and evaluate methods for joint analyses of the rich datasets collected in imaging genetics studies. The ENIGMA-DTI consortium is developing and evaluating approaches for obtaining pooled estimates of heritability through meta- and mega-genetic analytical approaches, to estimate the additive genetic contributions to the intersubject variance in fractional anisotropy (FA) measured from diffusion tensor imaging (DTI). We used the ENIGMA-DTI data harmonization protocol for uniform processing of DTI data from multiple sites. We evaluated this protocol in five family-based cohorts providing data from a total of 2248 children and adults (ages 9-85) collected with various imaging protocols. We used the imaging genetics analysis tool SOLAR-Eclipse to combine twin and family data from Dutch, Australian and Mexican-American cohorts into one large "mega-family". We showed that heritability estimates may vary from one cohort to another. We used two meta-analytical approaches (sample-size weighted and standard-error weighted) and a mega-genetic analysis to calculate across-population heritability estimates. We performed a leave-one-out analysis of the joint heritability estimates, removing a different cohort each time, to understand the variability of the estimates. Overall, meta- and mega-genetic analyses produced robust estimates of heritability.
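A minimal sketch of the two meta-analytic pooling schemes named above (sample-size weighted and standard-error weighted), together with a leave-one-out pass, assuming illustrative per-cohort heritability estimates rather than the actual ENIGMA-DTI values:

```python
# Illustrative pooling of per-cohort heritability estimates; all numbers invented.
import numpy as np

cohorts = ["A", "B", "C", "D", "E"]
h2 = np.array([0.55, 0.62, 0.48, 0.70, 0.58])   # per-cohort heritability estimates
se = np.array([0.08, 0.06, 0.10, 0.07, 0.09])   # their standard errors
n  = np.array([500, 400, 350, 600, 398])        # per-cohort sample sizes

def se_weighted(h2, se):
    """Standard-error (inverse-variance) weighted pooled estimate."""
    w = 1.0 / se**2
    pooled = np.sum(w * h2) / np.sum(w)
    pooled_se = np.sqrt(1.0 / np.sum(w))
    return pooled, pooled_se

def n_weighted(h2, n):
    """Sample-size weighted pooled estimate."""
    return np.sum(n * h2) / np.sum(n)

pooled, pooled_se = se_weighted(h2, se)
print(f"SE-weighted pooled h2 = {pooled:.3f} (SE {pooled_se:.3f})")
print(f"N-weighted pooled h2  = {n_weighted(h2, n):.3f}")

# Leave-one-out: drop each cohort in turn to gauge the stability of the estimate.
for i, name in enumerate(cohorts):
    keep = np.arange(len(cohorts)) != i
    loo, _ = se_weighted(h2[keep], se[keep])
    print(f"without cohort {name}: pooled h2 = {loo:.3f}")
```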
Abstract:
Issued May 1978.
Abstract:
Client owners usually need an estimate or forecast of their likely building costs in advance of detailed design in order to confirm the financial feasibility of their projects. Because of their timing in the project life cycle, these early-stage forecasts are characterized by the minimal amount of information available about the new (target) project, often to the point that only its size and type are known. One approach is to use the mean contract sum of a sample, or base group, of previous projects of a similar type and size to the project for which the estimate is needed. Bernoulli’s law of large numbers implies that this base group should be as large as possible. However, increasing the size of the base group inevitably involves including projects that are less and less similar to the target project. Deciding on the optimal number of base group projects is known as the homogeneity, or pooling, problem. A method of solving the homogeneity problem is described that uses closed-form equations to compare three different sampling arrangements of previous projects for their simulated forecasting ability by cross-validation, in which a series of targets is extracted, with replacement, from the groups and compared with the mean value of the projects in the base groups. The procedure is then demonstrated with 450 Hong Kong projects (of different project types: residential, commercial centre, car parking, social community centre, school, office, hotel, industrial, university and hospital) clustered into base groups according to their type and size.
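The cross-validation idea can be sketched as follows, under the simplifying assumption of a single project type and synthetic size/cost data (not the Hong Kong sample): each project is treated in turn as the target, its forecast is the mean contract sum of the k most similar remaining projects, and the base-group size k with the lowest error is retained.

```python
# Choosing a base-group size by cross-validation; data are synthetic placeholders.
import numpy as np

rng = np.random.default_rng(0)
size = rng.uniform(1_000, 50_000, 200)                 # gross floor area (m2), synthetic
cost = 2_000 * size * rng.lognormal(0.0, 0.25, 200)    # contract sums, synthetic

def cv_error(k):
    errors = []
    for i in range(len(cost)):
        others = np.delete(np.arange(len(cost)), i)
        # base group = k projects most similar in size to the target
        nearest = others[np.argsort(np.abs(size[others] - size[i]))[:k]]
        forecast = cost[nearest].mean()
        errors.append(abs(forecast - cost[i]) / cost[i])
    return np.mean(errors)

candidate_k = range(5, 101, 5)
best_k = min(candidate_k, key=cv_error)
print(f"base-group size minimising mean absolute % error: {best_k}")
```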
Abstract:
Objective: Ankylosing spondylitis (AS) is a debilitating chronic inflammatory condition with a high degree of familiality (λs = 82) and heritability (>90%) that primarily affects the spinal and sacroiliac joints. Whole-genome scans for linkage to AS phenotypes have been conducted, although results have been inconsistent between studies and all have had modest sample sizes. One potential solution to these issues is to combine data from multiple studies in a retrospective meta-analysis. Methods: The International Genetics of Ankylosing Spondylitis Consortium combined data from three whole-genome linkage scans for AS (n = 3744 subjects) to determine chromosomal markers that show evidence of linkage with disease. Linkage markers typed in different centres were integrated into a consensus map to facilitate effective data pooling. We performed a weighted meta-analysis to combine the linkage results, and compared them with the three individual scans and a combined pooled scan. Results: In addition to the expected region surrounding the HLA-B27 gene on chromosome 6, we determined that several marker regions showed significant evidence of linkage with disease status. Regions on chromosomes 10q and 16q achieved 'suggestive' evidence of linkage, and regions on chromosomes 1q, 3q, 5q, 6q, 9q, 17q and 19q showed at least nominal linkage in two or more scans and in the weighted meta-analysis. Regions previously associated with AS on chromosomes 2q (the IL-1 gene cluster) and 22q (CYP2D6) exhibited nominal linkage in the meta-analysis, providing further statistical support for their involvement in susceptibility to AS. Conclusion: These findings provide a useful guide for future studies aiming to identify the genes involved in this highly heritable condition.
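One standard way to carry out such a weighted meta-analysis of linkage evidence, shown here purely as a hedged illustration with invented LOD scores and sample sizes (the consortium's exact weighting scheme may differ), is to convert each scan's LOD score at a shared map position to a Z score and combine the scores with square-root-of-sample-size weights:

```python
# Stouffer-style weighted combination of linkage evidence; numbers are invented.
import numpy as np
from scipy.stats import norm

lod = np.array([1.8, 0.9, 2.4])     # per-scan LOD scores at one marker region
n   = np.array([1200, 900, 1644])   # per-scan numbers of genotyped subjects

z = np.sqrt(2 * np.log(10) * lod)   # LOD -> chi-square(1 df) -> Z
w = np.sqrt(n) / np.sqrt(n.sum())   # weights proportional to sqrt(sample size)
z_meta = np.sum(w * z)              # weighted combined evidence
p_meta = norm.sf(z_meta)            # one-sided pointwise p-value

print(f"combined Z = {z_meta:.2f}, pointwise p = {p_meta:.1e}")
```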
Abstract:
There has been a recent spate of high-profile infrastructure cost overruns in Australia and internationally. This is just the tip of a longer-term and more deep-seated problem with initial budget estimating practice, well recognised in both academic research and industry reviews: the problem of uncertainty. A case study of the Sydney Opera House is used to identify and illustrate the key causal factors and system dynamics of cost overruns. It is conventionally the role of risk management to deal with such uncertainty, but the type and extent of the uncertainty involved in complex projects is shown to render established risk management techniques ineffective. This paper considers a radical advance on current budget estimating practice, involving a particular approach to statistical modelling complemented by explicit training in estimating practice. The statistical modelling approach combines the probability management techniques of Savage, which operate on actual distributions of values rather than flawed representations of distributions, and the data pooling technique of Skitmore, where the size of the reference set is optimised. Estimating training employs the calibration development methods pioneered by Hubbard, which reduce the bias of experts caused by over-confidence and improve the consistency of subjective decision-making. A new framework for initial budget estimating practice is developed based on the combined statistical and training methods, with each technique explained and discussed.
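A rough sketch of how the two statistical ingredients could fit together, with invented figures: budget scenarios are drawn directly from a pooled reference set of observed past outcomes (probability management style), where membership of that reference set would be chosen by the data-pooling step.

```python
# Resampling actual outcomes from a pooled reference set; all figures invented.
import numpy as np

rng = np.random.default_rng(1)

# pooled reference set: observed cost-overrun ratios (final / initial budget)
# from the most comparable past projects (hypothetical values)
overrun_ratios = np.array([1.05, 1.22, 0.98, 1.40, 1.10, 1.65, 1.08, 1.31])

initial_estimate = 250e6                                        # point estimate ($)
draws = rng.choice(overrun_ratios, size=10_000, replace=True)   # resample actuals
budget_distribution = initial_estimate * draws

p50, p80 = np.percentile(budget_distribution, [50, 80])
print(f"P50 budget: ${p50/1e6:.0f}m, P80 budget: ${p80/1e6:.0f}m")
```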
Abstract:
We consider a setting in which several operators offer downlink wireless data access services in a certain geographical region. Each operator deploys several base stations or access points, and registers some subscribers. In such a situation, if operators pool their infrastructure and permit subscribers to be served by any of the cooperating operators, then there can be better overall user satisfaction and increased operator revenue. We use coalitional game theory to investigate such resource pooling and cooperation between operators. We use utility functions to model user satisfaction, and show that the resulting coalitional game has the property that if all operators cooperate (i.e., form a grand coalition), then there is an operating point that maximizes the sum utility over the operators while providing the operators revenues such that no subset of operators has an incentive to break away from the coalition. We investigate whether such operating points can result in utility unfairness between users of the various operators. We also study other revenue-sharing concepts, namely the nucleolus and the Shapley value. Such investigations throw light on criteria for operators to accept or reject subscribers, based on the service level agreements proposed by them. We also investigate the situation in which only certain subsets of operators may be willing to cooperate.
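As a toy illustration of the cooperative-game quantities involved, the sketch below computes Shapley-value revenue shares for three hypothetical operators under an arbitrary coalition value function; the paper's utility model is not reproduced here.

```python
# Shapley value for a small coalition game; the value function is made up.
from itertools import permutations

operators = ["op1", "op2", "op3"]

# v(S): total revenue achievable by coalition S (illustrative, superadditive)
v = {
    frozenset(): 0,
    frozenset({"op1"}): 4, frozenset({"op2"}): 3, frozenset({"op3"}): 2,
    frozenset({"op1", "op2"}): 9, frozenset({"op1", "op3"}): 8,
    frozenset({"op2", "op3"}): 6,
    frozenset({"op1", "op2", "op3"}): 14,
}

def shapley(player):
    """Average marginal contribution of `player` over all join orders."""
    orders = list(permutations(operators))
    total = 0.0
    for order in orders:
        before = frozenset(order[: order.index(player)])
        total += v[before | {player}] - v[before]
    return total / len(orders)

for op in operators:
    print(f"Shapley share of {op}: {shapley(op):.2f}")
```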
Abstract:
"August 1987"--P. iii.
Abstract:
The use of expert knowledge to quantify a Bayesian Network (BN) is necessary when data are not available. This, however, raises questions about how opinions from multiple experts can be used in a BN. Linear pooling is a popular method for combining probability assessments from multiple experts. In particular, Prior Linear Pooling (PrLP), which pools the opinions and then places them into the BN, is a common method. This paper first proposes an alternative pooling method, Posterior Linear Pooling (PoLP), which constructs a BN for each expert and then pools the resulting probabilities at the nodes of interest. Second, it investigates the advantages and disadvantages of using these pooling methods to combine the opinions of multiple experts. Finally, the methods are applied to an existing BN, the Wayfinding Bayesian Network Model, to investigate the behaviour of different groups of people and how these different methods may be able to capture such differences. The paper focuses on six nodes (Human Factors, Environmental Factors, Wayfinding, Communication, Visual Elements of Communication and Navigation Pathway) and three subgroups (Gender: female, male; Travel Experience: experienced, inexperienced; Travel Purpose: business, personal), and finds that different behaviours can indeed be captured by the different methods.
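The contrast between the two schemes can be seen on a deliberately tiny two-node network with two hypothetical experts (not the Wayfinding model itself): PrLP averages the elicited probability tables and then infers, while PoLP infers per expert and averages the answers.

```python
# PrLP vs PoLP on a two-node network A -> B; all probabilities are invented.
import numpy as np

# expert-elicited parameters: P(A=1) and P(B=1 | A)
experts = [
    {"pA": 0.30, "pB_given_A": {0: 0.20, 1: 0.70}},
    {"pA": 0.60, "pB_given_A": {0: 0.10, 1: 0.90}},
]
w = [0.5, 0.5]  # equal expert weights

def marginal_B(pA, pB_given_A):
    """P(B=1) by summing over the states of A."""
    return (1 - pA) * pB_given_A[0] + pA * pB_given_A[1]

# Prior Linear Pooling: pool the probability tables, then infer
pA_pool = sum(wi * e["pA"] for wi, e in zip(w, experts))
pBgA_pool = {a: sum(wi * e["pB_given_A"][a] for wi, e in zip(w, experts)) for a in (0, 1)}
prlp = marginal_B(pA_pool, pBgA_pool)

# Posterior Linear Pooling: infer per expert, then pool the results
polp = sum(wi * marginal_B(e["pA"], e["pB_given_A"]) for wi, e in zip(w, experts))

print(f"P(B=1): PrLP = {prlp:.3f}, PoLP = {polp:.3f}")  # the two schemes differ
```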
Abstract:
Stable isotope (SI) values of carbon (δ13C) and nitrogen (δ15N) are useful for determining the trophic connectivity between species within an ecosystem, but interpretation of these data involves important assumptions about sources of intrapopulation variability. We compared intrapopulation variability in δ13C and δ15N for an estuarine omnivore, Spotted Seatrout (Cynoscion nebulosus), to test assumptions and assess the utility of SI analysis for delineating the connectivity of this species with other species in estuarine food webs. Both δ13C and δ15N values showed patterns of enrichment in fish caught from coastal to offshore sites and as a function of fish size. Results for δ13C were consistent in liver and muscle tissue, but liver δ15N showed a negative bias when compared with muscle that increased with absolute δ15N value. Natural variability in both isotopes was 5–10 times higher than that observed in laboratory populations, indicating that environmentally driven intrapopulation variability is detectable, particularly after individual bias is removed through sample pooling. These results corroborate the utility of SI analysis for examining the position of Spotted Seatrout in an estuarine food web. On the basis of these results, we conclude that interpretation of SI data in fishes should account for measurable and ecologically relevant intrapopulation variability for each species and system on a case-by-case basis.
Abstract:
Motivation: Many biomedical experiments are carried out by pooling individual biological samples. However, pooling samples can potentially hide biological variance and give false confidence concerning the significance of the data. In the context of microarray experiments for detecting differentially expressed genes, recent publications have addressed the problem of the efficiency of sample pooling, and some approximate formulas were provided for the power and sample size calculations. It is desirable to have exact formulas for these calculations and to have the approximate results checked against the exact ones. We show that the difference between the approximate and the exact results can be large.
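A rough sketch of the kind of calculation at issue, under a simple variance model assumed here (pooling k subjects per array divides the biological variance by k but leaves technical variance unchanged): power for a two-group comparison computed both from the noncentral t distribution and from the usual normal approximation, which can disagree noticeably when the number of arrays is small.

```python
# Power of a pooled two-group microarray design; variance model and numbers assumed.
import numpy as np
from scipy import stats

def power_pooled(delta, sigma_b, sigma_e, k, n, alpha=0.05):
    """delta: true log-fold change; k: subjects per pool; n: arrays per group."""
    var_array = sigma_b**2 / k + sigma_e**2        # per-array variance under pooling
    ncp = delta / np.sqrt(2 * var_array / n)       # noncentrality parameter
    df = 2 * n - 2
    tcrit = stats.t.ppf(1 - alpha / 2, df)
    exact = stats.nct.sf(tcrit, df, ncp) + stats.nct.cdf(-tcrit, df, ncp)
    zcrit = stats.norm.ppf(1 - alpha / 2)
    approx = stats.norm.sf(zcrit - ncp) + stats.norm.cdf(-zcrit - ncp)
    return exact, approx

exact, approx = power_pooled(delta=1.0, sigma_b=0.8, sigma_e=0.3, k=4, n=4)
print(f"exact power = {exact:.3f}, normal approximation = {approx:.3f}")
```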
Abstract:
There is substantial international variation in human papillomavirus (HPV) prevalence; this study details the first report from Northern Ireland and additionally provides a systematic review and meta-analysis pooling the prevalence of high-risk HPV (HR-HPV) subtypes among women with normal cytology in the UK and Ireland. Between February and December 2009, routine liquid-based cytology (LBC) samples were collected for HPV detection (Roche Cobas® 4800 [PCR]) among unselected women attending for cervical cytology testing. Four electronic databases, including MEDLINE, were then searched from their inception until April 2011. A random-effects meta-analysis was used to calculate a pooled HR-HPV prevalence and associated 95% confidence intervals (CI). 5,712 women, mean age 39 years (±SD 11.9 years; range 20-64 years), were included in the analysis, of whom 5,068 (88.7%), 417 (7.3%) and 72 (1.3%) had normal, low-grade, and high-grade cytological findings, respectively. Crude HR-HPV prevalence was 13.2% (95% CI, 12.7-13.7) among women with normal cytology and increased with cytological grade. In the meta-analysis, the pooled HR-HPV prevalence among those with normal cytology was 0.12 (95% CI, 0.10-0.14; 21 studies), with the highest prevalence in younger women. HPV 16 and HPV 18 specific estimates were 0.03 (95% CI, 0.02-0.05) and 0.01 (95% CI, 0.01-0.02), respectively. The findings of this Northern Ireland study and meta-analysis verify the prevalent nature of HPV infection among younger women. Reporting of the type-specific prevalence of HPV infection is relevant for evaluating the impact of future HPV immunization initiatives, particularly against HR-HPV types other than HPV 16 and 18. J. Med. Virol. 85:295-308, 2013. © 2012 Wiley Periodicals, Inc.
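A random-effects pooling of study-level prevalences of the kind quoted above can be sketched with the DerSimonian-Laird estimator; the studies below are invented, and a real analysis would typically transform the proportions (e.g. logit or Freeman-Tukey) before pooling.

```python
# DerSimonian-Laird random-effects pooling of prevalences; study data invented.
import numpy as np

events = np.array([120, 45, 300, 80])      # HR-HPV positives per study
totals = np.array([1000, 400, 2400, 650])  # women with normal cytology per study

p = events / totals
var = p * (1 - p) / totals                 # within-study variance of a proportion
w_fixed = 1 / var

# DerSimonian-Laird estimate of the between-study variance tau^2
p_fixed = np.sum(w_fixed * p) / np.sum(w_fixed)
Q = np.sum(w_fixed * (p - p_fixed) ** 2)
df = len(p) - 1
C = np.sum(w_fixed) - np.sum(w_fixed**2) / np.sum(w_fixed)
tau2 = max(0.0, (Q - df) / C)

w_rand = 1 / (var + tau2)
p_pooled = np.sum(w_rand * p) / np.sum(w_rand)
se_pooled = np.sqrt(1 / np.sum(w_rand))
print(f"pooled prevalence = {p_pooled:.3f} "
      f"(95% CI {p_pooled - 1.96*se_pooled:.3f}-{p_pooled + 1.96*se_pooled:.3f})")
```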
Abstract:
Aim: Species distribution models (SDMs) based on current species ranges underestimate the potential distribution when projected in time and/or space. A multi-temporal model calibration approach has been suggested as an alternative, and we evaluate this using 13,000 years of data. Location: Europe. Methods: We used fossil-based records of presence for Picea abies, Abies alba and Fagus sylvatica and six climatic variables for the period 13,000 to 1000 yr BP. To measure the contribution of each 1000-year time step to the total niche of each species (the niche measured by pooling all the data), we employed a principal components analysis (PCA) calibrated with data over the entire range of possible climates. We then projected both the total niche and the partial niches from single time frames into the PCA space, and tested whether the partial niches were more similar to the total niche than expected by chance. Using an ensemble forecasting approach, we calibrated SDMs for each time frame and for the pooled database. We projected each model to current climate and evaluated the results against current pollen data. We also projected all models into the future. Results: Niche similarity between the partial and the total SDMs was almost always statistically significant and increased through time. SDMs calibrated from single time frames gave different results when projected to current climate, providing evidence of a change in the species' realized niches through time. Moreover, they predicted limited climate suitability when compared with the total SDMs. The same results were obtained when projected to future climates. Main conclusions: The realized climatic niche of species differed for current and future climates when SDMs were calibrated considering different past climates. Building the niche as an ensemble through time represents a way forward to a better understanding of a species' range and its ecology in a changing climate.
Abstract:
We consider forecasting using a combination, when no model coincides with a non-constant data generation process (DGP). Practical experience suggests that combining forecasts adds value, and can even dominate the best individual device. We show why this can occur when forecasting models are differentially mis-specified, and is likely to occur when the DGP is subject to location shifts. Moreover, averaging may then dominate over estimated weights in the combination. Finally, it cannot be proved that only non-encompassed devices should be retained in the combination. Empirical and Monte Carlo illustrations confirm the analysis.
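A small simulation in the spirit of this argument, with arbitrary parameters: two differentially mis-specified models are estimated before a location shift and compared post-shift against their equal-weight average (estimated combination weights are omitted here for brevity).

```python
# Forecast combination under a location shift; DGP and parameters are arbitrary.
import numpy as np

rng = np.random.default_rng(2)
T = 400
shift = np.where(np.arange(T) >= 300, 3.0, 0.0)     # location shift at t = 300
x1 = rng.normal(size=T)
x2 = rng.normal(size=T)
y = 1.0 + x1 + x2 + shift + rng.normal(scale=0.5, size=T)

# each model omits one regressor and is estimated on pre-shift data only
pre = slice(0, 300)
b1 = np.linalg.lstsq(np.c_[np.ones(300), x1[pre]], y[pre], rcond=None)[0]
b2 = np.linalg.lstsq(np.c_[np.ones(300), x2[pre]], y[pre], rcond=None)[0]

post = slice(300, T)
f1 = b1[0] + b1[1] * x1[post]
f2 = b2[0] + b2[1] * x2[post]
favg = 0.5 * (f1 + f2)

for name, f in [("model 1", f1), ("model 2", f2), ("average", favg)]:
    print(f"{name}: post-shift MSE = {np.mean((y[post] - f) ** 2):.2f}")
```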
Abstract:
In this paper, we propose a novel approach to econometric forecasting of stationary and ergodic time series within a panel-data framework. Our key element is to employ the bias-corrected average forecast. Using panel-data sequential asymptotics, we show that it is potentially superior to other techniques in several contexts. In particular, it delivers a zero-limiting mean-squared error if the number of forecasts and the number of post-sample time periods are sufficiently large. We also develop a zero-mean test for the average bias. Monte Carlo simulations are conducted to evaluate the performance of this new technique in finite samples. An empirical exercise, based upon data from well-known surveys, is also presented. Overall, these results show promise for the bias-corrected average forecast.
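A stylized sketch of the bias-corrected average forecast idea on simulated panel data (not the authors' estimator verbatim): average the individual forecasts at each date, estimate the average bias on a training window, and subtract it out of sample.

```python
# Bias-corrected average forecast on a simulated forecaster panel; data invented.
import numpy as np

rng = np.random.default_rng(3)
T, N = 200, 30                               # time periods, forecasters
y = rng.normal(size=T).cumsum() * 0.1 + 5.0  # target series

# each forecaster = truth + its own systematic bias + idiosyncratic noise
bias = rng.normal(0.5, 0.3, size=N)          # forecasters optimistic on average
forecasts = y[:, None] + bias[None, :] + rng.normal(scale=1.0, size=(T, N))

train, test = slice(0, 150), slice(150, T)
avg = forecasts.mean(axis=1)                 # cross-sectional average forecast
bhat = np.mean(avg[train] - y[train])        # estimated average bias
bcaf = avg - bhat                            # bias-corrected average forecast

mse = lambda f: np.mean((y[test] - f[test]) ** 2)
print(f"average forecast MSE      : {mse(avg):.3f}")
print(f"bias-corrected average MSE: {mse(bcaf):.3f}")
```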