32 resultados para missing data recovery
em CentAUR: Central Archive University of Reading - UK
Resumo:
Resolving the relationships between Metazoa and other eukaryotic groups as well as between metazoan phyla is central to the understanding of the origin and evolution of animals. The current view is based on limited data sets, either a single gene with many species (e.g., ribosomal RNA) or many genes but with only a few species. Because a reliable phylogenetic inference simultaneously requires numerous genes and numerous species, we assembled a very large data set containing 129 orthologous proteins (similar to30,000 aligned amino acid positions) for 36 eukaryotic species. Included in the alignments are data from the choanoflagellate Monosiga ovata, obtained through the sequencing of about 1,000 cDNAs. We provide conclusive support for choanoflagellates as the closest relative of animals and for fungi as the second closest. The monophyly of Plantae and chromalveolates was recovered but without strong statistical support. Within animals, in contrast to the monophyly of Coelomata observed in several recent large-scale analyses, we recovered a paraphyletic Coelamata, with nematodes and platyhelminths nested within. To include a diverse sample of organisms, data from EST projects were used for several species, resulting in a large amount of missing data in our alignment (about 25%). By using different approaches, we verify that the inferred phylogeny is not sensitive to these missing data. Therefore, this large data set provides a reliable phylogenetic framework for studying eukaryotic and animal evolution and will be easily extendable when large amounts of sequence information become available from a broader taxonomic range.
Resumo:
An important feature of agribusiness promotion programs is their lagged impact on consumption. Efficient investment in advertising requires reliable estimates of these lagged responses and it is desirable from both applied and theoretical standpoints to have a flexible method for estimating them. This note derives an alternative Bayesian methodology for estimating lagged responses when investments occur intermittently within a time series. The method exploits a latent-variable extension of the natural-conjugate, normal-linear model, Gibbs sampling and data augmentation. It is applied to a monthly time series on Turkish pasta consumption (1993:5-1998:3) and three, nonconsecutive promotion campaigns (1996:3, 1997:3, 1997:10). The results suggest that responses were greatest to the second campaign, which allocated its entire budget to television media; that its impact peaked in the sixth month following expenditure; and that the rate of return (measured in metric tons additional consumption per thousand dollars expended) was around a factor of 20.
Resumo:
An improved algorithm for the generation of gridded window brightness temperatures is presented. The primary data source is the International Satellite Cloud Climatology Project, level B3 data, covering the period from July 1983 to the present. The algorithm rakes window brightness, temperatures from multiple satellites, both geostationary and polar orbiting, which have already been navigated and normalized radiometrically to the National Oceanic and Atmospheric Administration's Advanced Very High Resolution Radiometer, and generates 3-hourly global images on a 0.5 degrees by 0.5 degrees latitude-longitude grid. The gridding uses a hierarchical scheme based on spherical kernel estimators. As part of the gridding procedure, the geostationary data are corrected for limb effects using a simple empirical correction to the radiances, from which the corrected temperatures are computed. This is in addition to the application of satellite zenith angle weighting to downweight limb pixels in preference to nearer-nadir pixels. The polar orbiter data are windowed on the target time with temporal weighting to account for the noncontemporaneous nature of the data. Large regions of missing data are interpolated from adjacent processed images using a form of motion compensated interpolation based on the estimation of motion vectors using an hierarchical block matching scheme. Examples are shown of the various stages in the process. Also shown are examples of the usefulness of this type of data in GCM validation.
Resumo:
Relationships between the four families placed in the angiosperm order Fabales (Leguminosae, Polygalaceae, Quillajaceae, Surianaceae) were hitherto poorly resolved. We combine published molecular data for the chloroplast regions matK and rbcL with 66 morphological characters surveyed for 73 ingroup and two outgroup species, and use Parsimony and Bayesian approaches to explore matrices with different missing data. All combined analyses using Parsimony recovered the topology Polygalaceae (Leguminosae (Quillajaceae + Surianaceae)). Bayesian analyses with matched morphological and molecular sampling recover the same topology, but analyses based on other data recover a different Bayesian topology: ((Polygalaceae + Leguminosae) (Quillajaceae + Surianaceae)). We explore the evolution of floral characters in the context of the more consistent topology: Polygalaceae (Leguminosae (Quillajaceae + Surianaceae)). This reveals synapomorphies for (Leguminosae (Quillajaceae + Surianaceae)) as the presence of free filaments and marginal/ventral placentation, for (Quillajaceae + Surianaceae) as pentamery and apocarpy, and for Leguminosae the presence of an abaxial median sepal and unicarpellate gynoecium. An octamerous androecium is synapomorphic for Polygalaceae. The development of papilionate flowers, and the evolutionary context in which these phenotypes appeared in Leguminosae and Polygalaceae, shows that the morphologies are convergent rather than synapomorphic within Fabales.
Resumo:
Data augmentation is a powerful technique for estimating models with latent or missing data, but applications in agricultural economics have thus far been few. This paper showcases the technique in an application to data on milk market participation in the Ethiopian highlands. There, a key impediment to economic development is an apparently low rate of market participation. Consequently, economic interest centers on the “locations” of nonparticipants in relation to the market and their “reservation values” across covariates. These quantities are of policy interest because they provide measures of the additional inputs necessary in order for nonparticipants to enter the market. One quantity of primary interest is the minimum amount of surplus milk (the “minimum efficient scale of operations”) that the household must acquire before market participation becomes feasible. We estimate this quantity through routine application of data augmentation and Gibbs sampling applied to a random-censored Tobit regression. Incorporating random censoring affects markedly the marketable-surplus requirements of the household, but only slightly the covariates requirements estimates and, generally, leads to more plausible policy estimates than the estimates obtained from the zero-censored formulation
Resumo:
Considerable effort is presently being devoted to producing high-resolution sea surface temperature (SST) analyses with a goal of spatial grid resolutions as low as 1 km. Because grid resolution is not the same as feature resolution, a method is needed to objectively determine the resolution capability and accuracy of SST analysis products. Ocean model SST fields are used in this study as simulated “true” SST data and subsampled based on actual infrared and microwave satellite data coverage. The subsampled data are used to simulate sampling errors due to missing data. Two different SST analyses are considered and run using both the full and the subsampled model SST fields, with and without additional noise. The results are compared as a function of spatial scales of variability using wavenumber auto- and cross-spectral analysis. The spectral variance at high wavenumbers (smallest wavelengths) is shown to be attenuated relative to the true SST because of smoothing that is inherent to both analysis procedures. Comparisons of the two analyses (both having grid sizes of roughly ) show important differences. One analysis tends to reproduce small-scale features more accurately when the high-resolution data coverage is good but produces more spurious small-scale noise when the high-resolution data coverage is poor. Analysis procedures can thus generate small-scale features with and without data, but the small-scale features in an SST analysis may be just noise when high-resolution data are sparse. Users must therefore be skeptical of high-resolution SST products, especially in regions where high-resolution (~5 km) infrared satellite data are limited because of cloud cover.
Resumo:
Background: Dietary intervention studies suggest that flavan-3-ol intake can improve vascular function and reduce the risk of cardiovascular diseases (CVD). However, results from prospective studies failed to show a consistent beneficial effect. Objective: To investigate associations between flavan-3-ol intake and CVD risk in the Norfolk arm of the European Prospective Investigation into Cancer and Nutrition (EPIC-Norfolk). Design: Data was available from 24,885 (11,252 men; 13,633 women) participants, recruited between 1993 and 1997 into the EPIC-Norfolk study. Flavan-3-ol intake was assessed using 7-day food diaries and the FLAVIOLA Flavanol Food Composition database. Missing data for plasma cholesterol and vitamin C were imputed using multiple imputation. Associations between flavan-3-ol intake and blood pressure at baseline were determined using linear regression models. Associations with CVD risk were estimated using Cox regression analyses. Results: Median intake of total flavan-3-ols was 1034 mg/d (range: 0 – 8531 mg/d) for men and 970 mg/d (0 – 6695 mg/d) for women, median intake of flavan-3-ol monomers was 233 mg/d (0 – 3248 mg/d) for men and 217 (0 – 2712 mg/d) for women. There were no consistent associations between flavan-3-ol monomer intake and baseline systolic and diastolic blood pressure (BP). After 286,147 person-years of follow up, there were 8463 cardio-vascular events and 1987 CVD related deaths; no consistent association between flavan-3-ol intake and CVD risk (HR 0.93, 95% CI:0.87; 1.00; Q1 vs Q5) or mortality was observed (HR 0.93, 95% CI: 0.84; 1.04). Conclusions: Flavan-3-ol intake in EPIC-Norfolk is not sufficient to achieve a statistically significant reduction in CVD risk.
Resumo:
Background Cognitive–behavioural therapy (CBT) for childhood anxiety disorders is associated with modest outcomes in the context of parental anxiety disorder. Objectives This study evaluated whether or not the outcome of CBT for children with anxiety disorders in the context of maternal anxiety disorders is improved by the addition of (i) treatment of maternal anxiety disorders, or (ii) treatment focused on maternal responses. The incremental cost-effectiveness of the additional treatments was also evaluated. Design Participants were randomised to receive (i) child cognitive–behavioural therapy (CCBT); (ii) CCBT with CBT to target maternal anxiety disorders [CCBT + maternal cognitive–behavioural therapy (MCBT)]; or (iii) CCBT with an intervention to target mother–child interactions (MCIs) (CCBT + MCI). Setting A NHS university clinic in Berkshire, UK. Participants Two hundred and eleven children with a primary anxiety disorder, whose mothers also had an anxiety disorder. Interventions All families received eight sessions of individual CCBT. Mothers in the CCBT + MCBT arm also received eight sessions of CBT targeting their own anxiety disorders. Mothers in the MCI arm received 10 sessions targeting maternal parenting cognitions and behaviours. Non-specific interventions were delivered to balance groups for therapist contact. Main outcome measures Primary clinical outcomes were the child’s primary anxiety disorder status and degree of improvement at the end of treatment. Follow-up assessments were conducted at 6 and 12 months. Outcomes in the economic analyses were identified and measured using estimated quality-adjusted life-years (QALYs). QALYS were combined with treatment, health and social care costs and presented within an incremental cost–utility analysis framework with associated uncertainty. Results MCBT was associated with significant short-term improvement in maternal anxiety; however, after children had received CCBT, group differences were no longer apparent. CCBT + MCI was associated with a reduction in maternal overinvolvement and more confident expectations of the child. However, neither CCBT + MCBT nor CCBT + MCI conferred a significant post-treatment benefit over CCBT in terms of child anxiety disorder diagnoses [adjusted risk ratio (RR) 1.18, 95% confidence interval (CI) 0.87 to 1.62, p = 0.29; adjusted RR CCBT + MCI vs. control: adjusted RR 1.22, 95% CI 0.90 to 1.67, p = 0.20, respectively] or global improvement ratings (adjusted RR 1.25, 95% CI 1.00 to 1.59, p = 0.05; adjusted RR 1.20, 95% CI 0.95 to 1.53, p = 0.13). CCBT + MCI outperformed CCBT on some secondary outcome measures. Furthermore, primary economic analyses suggested that, at commonly accepted thresholds of cost-effectiveness, the probability that CCBT + MCI will be cost-effective in comparison with CCBT (plus non-specific interventions) is about 75%. Conclusions Good outcomes were achieved for children and their mothers across treatment conditions. There was no evidence of a benefit to child outcome of supplementing CCBT with either intervention focusing on maternal anxiety disorder or maternal cognitions and behaviours. However, supplementing CCBT with treatment that targeted maternal cognitions and behaviours represented a cost-effective use of resources, although the high percentage of missing data on some economic variables is a shortcoming. Future work should consider whether or not effects of the adjunct interventions are enhanced in particular contexts. The economic findings highlight the utility of considering the use of a broad range of services when evaluating interventions with this client group. Trial registration Current Controlled Trials ISRCTN19762288. Funding This trial was funded by the Medical Research Council (MRC) and Berkshire Healthcare Foundation Trust and managed by the National Institute for Health Research (NIHR) on behalf of the MRC–NIHR partnership (09/800/17) and will be published in full in Health Technology Assessment; Vol. 19, No. 38.
Resumo:
1. Comparative analyses are used to address the key question of what makes a species more prone to extinction by exploring the links between vulnerability and intrinsic species’ traits and/or extrinsic factors. This approach requires comprehensive species data but information is rarely available for all species of interest. As a result comparative analyses often rely on subsets of relatively few species that are assumed to be representative samples of the overall studied group. 2. Our study challenges this assumption and quantifies the taxonomic, spatial, and data type biases associated with the quantity of data available for 5415 mammalian species using the freely available life-history database PanTHERIA. 3. Moreover, we explore how existing biases influence results of comparative analyses of extinction risk by using subsets of data that attempt to correct for detected biases. In particular, we focus on links between four species’ traits commonly linked to vulnerability (distribution range area, adult body mass, population density and gestation length) and conduct univariate and multivariate analyses to understand how biases affect model predictions. 4. Our results show important biases in data availability with c.22% of mammals completely lacking data. Missing data, which appear to be not missing at random, occur frequently in all traits (14–99% of cases missing). Data availability is explained by intrinsic traits, with larger mammals occupying bigger range areas being the best studied. Importantly, we find that existing biases affect the results of comparative analyses by overestimating the risk of extinction and changing which traits are identified as important predictors. 5. Our results raise concerns over our ability to draw general conclusions regarding what makes a species more prone to extinction. Missing data represent a prevalent problem in comparative analyses, and unfortunately, because data are not missing at random, conventional approaches to fill data gaps, are not valid or present important challenges. These results show the importance of making appropriate inferences from comparative analyses by focusing on the subset of species for which data are available. Ultimately, addressing the data bias problem requires greater investment in data collection and dissemination, as well as the development of methodological approaches to effectively correct existing biases.
Resumo:
A new formulation of a pose refinement technique using ``active'' models is described. An error term derived from the detection of image derivatives close to an initial object hypothesis is linearised and solved by least squares. The method is particularly well suited to problems involving external geometrical constraints (such as the ground-plane constraint). We show that the method is able to recover both the pose of a rigid model, and the structure of a deformable model. We report an initial assessment of the performance and cost of pose and structure recovery using the active model in comparison with our previously reported ``passive'' model-based techniques in the context of traffic surveillance. The new method is more stable, and requires fewer iterations, especially when the number of free parameters increases, but shows somewhat poorer convergence.
Resumo:
The common GIS-based approach to regional analyses of soil organic carbon (SOC) stocks and changes is to define geographic layers for which unique sets of driving variables are derived, which include land use, climate, and soils. These GIS layers, with their associated attribute data, can then be fed into a range of empirical and dynamic models. Common methodologies for collating and formatting regional data sets on land use, climate, and soils were adopted for the project Assessment of Soil Organic Carbon Stocks and Changes at National Scale (GEFSOC). This permitted the development of a uniform protocol for handling the various input for the dynamic GEFSOC Modelling System. Consistent soil data sets for Amazon-Brazil, the Indo-Gangetic Plains (IGP) of India, Jordan and Kenya, the case study areas considered in the GEFSOC project, were prepared using methodologies developed for the World Soils and Terrain Database (SOTER). The approach involved three main stages: (1) compiling new soil geographic and attribute data in SOTER format; (2) using expert estimates and common sense to fill selected gaps in the measured or primary data; (3) using a scheme of taxonomy-based pedotransfer rules and expert-rules to derive soil parameter estimates for similar soil units with missing soil analytical data. The most appropriate approach varied from country to country, depending largely on the overall accessibility and quality of the primary soil data available in the case study areas. The secondary SOTER data sets discussed here are appropriate for a wide range of environmental applications at national scale. These include agro-ecological zoning, land evaluation, modelling of soil C stocks and changes, and studies of soil vulnerability to pollution. Estimates of national-scale stocks of SOC, calculated using SOTER methods, are presented as a first example of database application. Independent estimates of SOC stocks are needed to evaluate the outcome of the GEFSOC Modelling System for current conditions of land use and climate. (C) 2007 Elsevier B.V. All rights reserved.
Resumo:
Studies on aging and emotion suggest an increase in reported positive affect, a processing bias of positive over negative information, as well as increasingly adaptive regulation in response to negative events with advancing age. These findings imply that older individuals evaluate information differently, resulting in lowered reactivity to, and/or faster recovery from, negative information, while maintaining more positive responding to positive information. We examined this hypothesis in an ongoing study on Midlife in the US (MIDUS II) where emotional reactivity and recovery were assessed in a large number of respondents (N = 159) from a wide age range (36–84 years). We recorded eye-blink startle magnitudes and corrugator activity during and after the presentation of positive, neutral and negative pictures. The most robust age effect was found in response to neutral stimuli, where increasing age is associated with a decreased corrugator and eyeblink startle response to neutral stimuli. These data suggest that an age-related positivity effect does not essentially alter the response to emotion-laden information, but is reflected in a more positive interpretation of affectively ambiguous information. Furthermore, older women showed reduced corrugator recovery from negative pictures relative to the younger women and men, suggesting that an age-related prioritization of well-being is not necessarily reflected in adaptive regulation of negative affect.
Resumo:
To construct Biodiversity richness maps from Environmental Niche Models (ENMs) of thousands of species is time consuming. A separate species occurrence data pre-processing phase enables the experimenter to control test AUC score variance due to species dataset size. Besides, removing duplicate occurrences and points with missing environmental data, we discuss the need for coordinate precision, wide dispersion, temporal and synonymity filters. After species data filtering, the final task of a pre-processing phase should be the automatic generation of species occurrence datasets which can then be directly ’plugged-in’ to the ENM. A software application capable of carrying out all these tasks will be a valuable time-saver particularly for large scale biodiversity studies.
Resumo:
Evolutionary theory predicts that individuals, in order to increase their relative fitness, can evolve behaviours that are detrimental for the group or population. This mismatch is particularly visible in social organisms. Despite its potential to affect the population dynamics of social animals, this principle has not yet been applied to real-life conservation. Social group structure has been argued to stabilize population dynamics due to the buffering effects of nonreproducing subordinates. However, competition for breeding positions in such species can also interfere with the reproduction of breeding pairs. Seychelles magpie robins, Copsychus sechellarum, live in social groups where subordinate individuals do not breed. Analysis of long-term individual-based data and short-term behavioural observations show that subordinates increase the territorial takeover frequency of established breeders. Such takeovers delay offspring production and decrease territory productivity. Individual-based simulations of the Seychelles magpie robin population parameterized with the long-term data show that this process has significantly postponed the recovery of the species from the Critically Endangered status. Social conflict thus can extend the period of high extinction risk, which we show to have population consequences that should be taken into account in management programmes. This is the first quantitative assessment of the effects of social conflict on conservation.
Resumo:
The contribution investigates the problem of estimating the size of a population, also known as the missing cases problem. Suppose a registration system is targeting to identify all cases having a certain characteristic such as a specific disease (cancer, heart disease, ...), disease related condition (HIV, heroin use, ...) or a specific behavior (driving a car without license). Every case in such a registration system has a certain notification history in that it might have been identified several times (at least once) which can be understood as a particular capture-recapture situation. Typically, cases are left out which have never been listed at any occasion, and it is this frequency one wants to estimate. In this paper modelling is concentrating on the counting distribution, e.g. the distribution of the variable that counts how often a given case has been identified by the registration system. Besides very simple models like the binomial or Poisson distribution, finite (nonparametric) mixtures of these are considered providing rather flexible modelling tools. Estimation is done using maximum likelihood by means of the EM algorithm. A case study on heroin users in Bangkok in the year 2001 is completing the contribution.