949 results for missing data imputation
Abstract:
Over the last two decades social vulnerability has emerged as a major area of study, with increasing attention to the study of vulnerable populations. Generally, the elderly are among the most vulnerable members of any society, and widespread population aging has led to greater focus on elderly vulnerability. However, the absence of a valid and practical measure constrains the ability of policy-makers to address this issue in a comprehensive way. This study developed a composite indicator, the Elderly Social Vulnerability Index (ESVI), and used it to undertake a comparative analysis of the availability of support for elderly Jamaicans based on their access to human, material and social resources. The results of the ESVI indicated that while the elderly are more vulnerable overall, certain segments of the population appear to be at greater risk. Females had consistently lower scores than males, and the oldest-old had the highest scores of all groups of older persons. Vulnerability scores also varied according to place of residence, with more rural parishes having higher scores than their urban counterparts. These findings support the political economy framework, which locates disadvantage in old age within political and ideological structures. The findings also point to the pervasiveness and persistence of gender inequality, as argued by feminist theories of aging. Based on the results of the study, it is clear that there is a need for policies that target specific population segments, in addition to universal policies that could make the experience of old age less challenging for the majority of older persons. Overall, the ESVI has displayed usefulness as a tool for theoretical analysis and demonstrated its potential as a policy instrument to assist decision-makers in determining where to target their efforts as they seek to address the issue of social vulnerability in old age. Data for this study came from the 2001 population and housing census of Jamaica, with multiple imputation for missing data. The index was derived from the linear aggregation of three equally weighted domains, comprising eleven unweighted indicators that were normalized using z-scores. Indicators were selected based on theoretical relevance and data availability.
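A minimal sketch of the index construction described above: indicators normalized as z-scores, averaged within domains, and three equally weighted domains aggregated linearly. The indicator and domain names are hypothetical placeholders, not the study's variables:

```python
# Sketch of an ESVI-style composite index: z-score the indicators, average
# them within domains, then aggregate three equally weighted domains
# linearly. Indicator/domain names are hypothetical.
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
df = pd.DataFrame(rng.normal(size=(100, 4)),
                  columns=["schooling", "income", "housing", "social_ties"])

domains = {                       # hypothetical grouping of indicators
    "human":    ["schooling"],
    "material": ["income", "housing"],
    "social":   ["social_ties"],
}

z = (df - df.mean()) / df.std(ddof=0)            # z-score normalization
domain_scores = pd.DataFrame(
    {name: z[cols].mean(axis=1) for name, cols in domains.items()}
)
esvi = domain_scores.mean(axis=1)                # equal weights across domains
print(esvi.describe())
```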
Abstract:
Providing transportation system operators and travelers with accurate travel time information allows them to make more informed decisions, yielding benefits for individual travelers and for the entire transportation system. Most existing advanced traveler information systems (ATIS) and advanced traffic management systems (ATMS) use instantaneous travel time values estimated from current measurements, assuming that traffic conditions remain constant in the near future. For more effective applications, it has been proposed that ATIS and ATMS should use travel times predicted for short-term future conditions rather than instantaneous travel times measured or estimated for current conditions. This dissertation research investigates short-term freeway travel time prediction using Dynamic Neural Networks (DNN) based on traffic detector data collected by radar traffic detectors installed along a freeway corridor. DNNs comprise a class of neural networks that are particularly suitable for predicting variables like travel time but have not been adequately investigated for this purpose. Before this investigation, it was necessary to identify methods for data imputation to account for the missing data usually encountered when collecting data using traffic detectors. It was also necessary to identify a method to estimate the travel time on the freeway corridor based on data collected using point traffic detectors. A new travel time estimation method, referred to as the Piecewise Constant Acceleration Based (PCAB) method, was developed and compared with other methods reported in the literature. The results show that one of the simple travel time estimation methods (the average speed method) can work as well as the PCAB method, and both of them outperform the other methods. This study also compared the travel time prediction performance of three different DNN topologies with different memory setups. The results show that one DNN topology (the time-delay neural network) outperforms the other two DNN topologies for the investigated prediction problem. This topology also performs slightly better than the simple multilayer perceptron (MLP) neural network topology that has been used in a number of previous studies for travel time prediction.
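A minimal sketch of the time-delay idea mentioned above: a tapped delay line of recent detector speeds feeds a feedforward regressor that predicts corridor travel time a few steps ahead. The synthetic speed series, corridor length, lag window, and horizon are illustrative assumptions, not the dissertation's data or network:

```python
# Time-delay setup: predict corridor travel time `horizon` steps ahead from
# the last `n_lags` detector speed samples. Purely illustrative.
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(1)
t = np.arange(2000)                                    # 5-min samples (assumed)
speed = 60 + 15 * np.sin(2 * np.pi * t / 288) + rng.normal(0, 3, t.size)  # km/h
travel_time = 10.0 / speed * 60                        # minutes over a 10 km corridor

n_lags, horizon = 6, 3
samples = len(speed) - n_lags - horizon + 1
X = np.column_stack([speed[i:i + samples] for i in range(n_lags)])  # delay line
y = travel_time[n_lags + horizon - 1:]                 # future travel time

split = int(0.8 * len(X))
model = MLPRegressor(hidden_layer_sizes=(16,), max_iter=2000, random_state=0)
model.fit(X[:split], y[:split])
mae = np.mean(np.abs(model.predict(X[split:]) - y[split:]))
print(f"held-out MAE: {mae:.2f} min")
```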
Abstract:
This paper provides a method for constructing a new historical global nitrogen fertilizer application map (0.5° × 0.5° resolution) for the period 1961-2010 based on country-specific information from Food and Agriculture Organization statistics (FAOSTAT) and various global datasets. This new map incorporates the fraction of NH₄⁺ (and NO₃⁻) in N fertilizer inputs by utilizing fertilizer species information in FAOSTAT, in which species can be categorized as NH₄⁺- and/or NO₃⁻-forming N fertilizers. During data processing, we applied a statistical data imputation method for the missing data (19 % of national N fertilizer consumption) in FAOSTAT. The multiple imputation method enabled us to fill gaps in the time-series data with plausible values using covariate information (year, population, GDP, and crop area). After the imputation, we downscaled the national consumption data to a gridded cropland map. We also applied the multiple imputation method to the available chemical fertilizer species consumption, allowing for the estimation of the NH₄⁺/NO₃⁻ ratio in national fertilizer consumption. In this study, the synthetic N fertilizer inputs in 2000 showed a general consistency with the existing N fertilizer map (Potter et al., 2010, doi:10.1175/2009EI288.1) in relation to the ranges of N fertilizer inputs. Globally, the estimated N fertilizer inputs based on the sum of filled data increased from 15 Tg-N to 110 Tg-N during 1961-2010. On the other hand, the global NO₃⁻ input started to decline after the late 1980s, and the fraction of NO₃⁻ in global N fertilizer decreased consistently from 35 % to 13 % over the 50-year period. NH₄⁺-based fertilizers are dominant in most countries; however, the NH₄⁺/NO₃⁻ ratio in N fertilizer inputs shows clear differences temporally and geographically. This new map can be used as input data for global modelling studies and can bring new insights to the assessment of historical changes in terrestrial N cycling.
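A minimal sketch of gap-filling a national fertilizer-consumption time series with multiple imputation, using year, population, GDP, and crop area as covariates as described above. The data are synthetic and the use of scikit-learn's IterativeImputer is an assumption about tooling, not the paper's actual procedure:

```python
# Multiple imputation of a gappy national N-fertilizer series: draw several
# completed datasets with a chained-equations imputer and pool them.
import numpy as np
import pandas as pd
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer

rng = np.random.default_rng(2)
years = np.arange(1961, 2011)
df = pd.DataFrame({
    "year": years,
    "population": np.linspace(5e6, 9e6, years.size) * rng.normal(1, 0.01, years.size),
    "gdp": np.linspace(1e9, 8e9, years.size) * rng.normal(1, 0.05, years.size),
    "crop_area": np.linspace(2e5, 3e5, years.size) * rng.normal(1, 0.02, years.size),
    "n_fertilizer": np.linspace(1e4, 2e5, years.size) * rng.normal(1, 0.1, years.size),
})
df.loc[rng.choice(years.size, 10, replace=False), "n_fertilizer"] = np.nan  # ~20% gaps

draws = []
for seed in range(5):                         # several imputed datasets
    imp = IterativeImputer(sample_posterior=True, random_state=seed, max_iter=20)
    completed = pd.DataFrame(imp.fit_transform(df), columns=df.columns)
    draws.append(completed["n_fertilizer"])
pooled = pd.concat(draws, axis=1).mean(axis=1)  # pooled point estimate per year
print(pooled.round(0).head())
```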
Abstract:
A certain type of bacterial inclusion, known as a bacterial microcompartment, was recently identified and imaged through cryo-electron tomography. A reconstructed 3D object from single-axis, limited-angle tilt-series cryo-electron tomography contains missing regions; this is known as the missing wedge problem. Because of these missing regions in the reconstructed images, analyzing the 3D structures is challenging. Existing methods overcome this problem by aligning and averaging several similarly shaped objects. These schemes work well if the objects are symmetric and several objects with almost similar shapes and sizes are available. Since the bacterial inclusions studied here are not symmetric, are deformed, and show a wide range of shapes and sizes, the existing approaches are not appropriate. This research develops new statistical methods for analyzing geometric properties, such as volume, symmetry, aspect ratio, and polyhedral structure, of these bacterial inclusions in the presence of missing data. These methods work with deformed, non-symmetric objects of varied shape and do not require multiple objects for handling the missing wedge problem. The developed methods and contributions include: (a) an improved method for manual image segmentation, (b) a new approach to 'complete' the segmented and reconstructed incomplete 3D images, (c) a polyhedral structural distance model to predict the polyhedral shapes of these microstructures, (d) a new shape descriptor for polyhedral shapes, named the polyhedron profile statistic, and (e) Bayes, linear discriminant analysis, and support vector machine classifiers for supervised classification of incomplete polyhedral shapes. Finally, the predicted 3D shapes for these bacterial microstructures belong to the Johnson solids family, and these shapes, along with their other geometric properties, are important for a better understanding of their chemical and biological characteristics.
Abstract:
Previously developed models for predicting absolute risk of invasive epithelial ovarian cancer have included a limited number of risk factors and have had low discriminatory power (area under the receiver operating characteristic curve (AUC) < 0.60). Because of this, we developed and internally validated a relative risk prediction model that incorporates 17 established epidemiologic risk factors and 17 genome-wide significant single nucleotide polymorphisms (SNPs) using data from 11 case-control studies in the United States (5,793 cases; 9,512 controls) from the Ovarian Cancer Association Consortium (data accrued from 1992 to 2010). We developed a hierarchical logistic regression model for predicting case-control status that included imputation of missing data. We randomly divided the data into an 80% training sample and used the remaining 20% for model evaluation. The AUC for the full model was 0.664. A reduced model without SNPs performed similarly (AUC = 0.649). Both models performed better than a baseline model that included age and study site only (AUC = 0.563). The best predictive power was obtained in the full model among women younger than 50 years of age (AUC = 0.714); however, the addition of SNPs increased the AUC the most for women older than 50 years of age (AUC = 0.638 vs. 0.616). Adapting this improved model to estimate absolute risk and evaluating it in prospective data sets is warranted.
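A minimal sketch of the evaluation scheme described above: fit a logistic model for case-control status on an 80% training split and report the AUC on the held-out 20%. The synthetic features stand in for the 17 epidemiologic risk factors and 17 SNPs; the study's hierarchical modelling and imputation of missing data are not reproduced here:

```python
# Train/test split with AUC evaluation for a case-control classifier.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(3)
n, p = 5000, 34                                   # 17 risk factors + 17 SNPs (illustrative)
X = rng.normal(size=(n, p))
logit = X[:, :5] @ rng.normal(0.3, 0.1, 5)        # weak signal in a few features
y = rng.binomial(1, 1 / (1 + np.exp(-logit)))     # case (1) vs. control (0)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)
model = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
auc = roc_auc_score(y_te, model.predict_proba(X_te)[:, 1])
print(f"held-out AUC: {auc:.3f}")
```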
Abstract:
BACKGROUND: Moderate-to-vigorous physical activity (MVPA) is an important determinant of children’s physical health, and is commonly measured using accelerometers. A major limitation of accelerometers is non-wear time, which is the time the participant did not wear their device. Given that non-wear time is traditionally discarded from the dataset prior to estimating MVPA, final estimates of MVPA may be biased. Therefore, alternate approaches should be explored. OBJECTIVES: The objectives of this thesis were to 1) develop and describe an imputation approach that uses the socio-demographic, time, health, and behavioural data from participants to replace non-wear time accelerometer data, 2) determine the extent to which imputation of non-wear time data influences estimates of MVPA, and 3) determine if imputation of non-wear time data influences the associations between MVPA, body mass index (BMI), and systolic blood pressure (SBP). METHODS: Seven days of accelerometer data were collected using Actical accelerometers from 332 children aged 10-13 years. Three methods for handling missing accelerometer data were compared: 1) the “non-imputed” method, wherein non-wear time was deleted from the dataset; 2) imputation dataset I, wherein the imputation of MVPA during non-wear time was based upon socio-demographic factors of the participant (e.g., age), health information (e.g., BMI), and time characteristics of the non-wear period (e.g., season); and 3) imputation dataset II, wherein the imputation of MVPA was based upon the same variables as imputation dataset I, plus organized sport information. Associations between MVPA and health outcomes in each method were assessed using linear regression. RESULTS: Non-wear time accounted for 7.5% of epochs during waking hours. The average minutes/day of MVPA was 56.8 (95% CI: 54.2, 59.5) in the non-imputed dataset, 58.4 (95% CI: 55.8, 61.0) in imputed dataset I, and 59.0 (95% CI: 56.3, 61.5) in imputed dataset II. Estimates between datasets were not significantly different. The strength of the relationships between MVPA and BMI and between MVPA and SBP was comparable across all three datasets. CONCLUSION: These findings suggest that studies that achieve high accelerometer compliance with unsystematic patterns of missing data can use the traditional approach of deleting non-wear time from the dataset to obtain MVPA measures without substantial bias.
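A minimal, day-level sketch of the comparison described above: estimate average MVPA after (1) deleting non-wear periods and (2) replacing them with values predicted from participant covariates. The synthetic data, covariates, and linear model are illustrative stand-ins, not the thesis's epoch-level imputation approach:

```python
# Compare "delete non-wear" vs. "impute from covariates" MVPA estimates.
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(4)
n = 332 * 7                                   # child-days (illustrative)
df = pd.DataFrame({
    "age": rng.integers(10, 14, n),
    "bmi": rng.normal(19, 3, n),
    "season": rng.integers(0, 4, n),
})
df["mvpa_min"] = 70 - 2 * (df["age"] - 10) - 0.5 * (df["bmi"] - 19) + rng.normal(0, 10, n)
nonwear = rng.random(n) < 0.075               # ~7.5% of days affected (assumed)
observed = df["mvpa_min"].mask(nonwear)       # non-wear days become missing

# (1) non-imputed: drop missing days before averaging
print("non-imputed mean:", round(observed.dropna().mean(), 1))

# (2) imputed: predict missing MVPA from covariates fitted on observed days
X, y = df[["age", "bmi", "season"]], observed
model = LinearRegression().fit(X[y.notna()], y.dropna())
filled = y.fillna(pd.Series(model.predict(X), index=df.index))
print("imputed mean:   ", round(filled.mean(), 1))
```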
Abstract:
Background: Primary total knee replacement is a common operation that is performed to provide pain relief and restore functional ability. Inpatient physiotherapy is routinely provided after surgery to enhance recovery prior to hospital discharge. However, international variation exists in the provision of outpatient physiotherapy after hospital discharge. While evidence indicates that outpatient physiotherapy can improve short-term function, the longer-term benefits are unknown. The aim of this randomised controlled trial is to evaluate the long-term clinical effectiveness and cost-effectiveness of a 6-week group-based outpatient physiotherapy intervention following knee replacement. Methods/design: Two hundred and fifty-six patients waiting for knee replacement because of osteoarthritis will be recruited from two orthopaedic centres. Participants randomised to the usual-care group (n = 128) will be given a booklet about exercise and referred for physiotherapy if deemed appropriate by the clinical care team. The intervention group (n = 128) will receive the same usual care and additionally be invited to attend a group-based outpatient physiotherapy class starting 6 weeks after surgery. The 1-hour class will be run on a weekly basis over 6 weeks and will involve task-orientated and individualised exercises. The primary outcome will be the Lower Extremity Functional Scale at 12 months post-operatively. Secondary outcomes include: quality of life, knee pain and function, depression, anxiety and satisfaction. Data collection will be by questionnaire prior to surgery and 3, 6 and 12 months after surgery and will include a resource-use questionnaire to enable a trial-based economic evaluation. Trial participation and satisfaction with the classes will be evaluated through structured telephone interviews. The primary statistical and economic analyses will be conducted on an intention-to-treat basis with and without imputation of missing data. The primary economic result will estimate the incremental cost per quality-adjusted life year gained from this intervention from a National Health Service (NHS) and personal social services perspective. Discussion: This research aims to benefit patients and the NHS by providing evidence on the long-term effectiveness and cost-effectiveness of outpatient physiotherapy after knee replacement. If the intervention is found to be effective and cost-effective, implementation into clinical practice could lead to improvement in patients’ outcomes and improved health care resource efficiency.
Abstract:
Snapper (Pagrus auratus) is widely distributed throughout subtropical and temperate southern oceans and forms a significant recreational and commercial fishery in Queensland, Australia. Using data from government reports, media sources, popular publications and a government fisheries survey carried out in 1910, we compiled information on individual snapper fishing trips that took place prior to the commencement of fishery-wide organized data collection, from 1871 to 1939. In addition to extracting all available quantitative data, we translated qualitative information into bounded estimates and used multiple imputation to handle missing values, forming 287 records for which catch rate (snapper fisher⁻¹ h⁻¹) could be derived. Uncertainty was handled through a parametric maximum likelihood framework (a transformed trivariate Gaussian), which facilitated statistical comparisons between data sources. No statistically significant differences in catch rates were found among media sources and the government fisheries survey. Catch rates remained stable throughout the time series, averaging 3.75 snapper fisher⁻¹ h⁻¹ (95% confidence interval, 3.42–4.09) as the fishery expanded into new grounds. In comparison, a contemporary (1993–2002) south-east Queensland charter fishery produced an average catch rate of 0.4 snapper fisher⁻¹ h⁻¹ (95% confidence interval, 0.31–0.58). These data illustrate the productivity of a fishery during its earliest years of development and represent the earliest catch rate data globally for this species. By adopting a formalized approach to address issues common to many historical records – missing data, a lack of quantitative information and reporting bias – our analysis demonstrates the potential for historical narratives to contribute to contemporary fisheries management.
Abstract:
Credible spatial information characterizing the structure and site quality of forests is critical to sustainable forest management and planning, especially given the increasing demands on and threats to forest products and services. Forest managers and planners are required to evaluate forest conditions over a broad range of scales, contingent on operational or reporting requirements. Traditionally, forest inventory estimates are generated via a design-based approach that involves generalizing sample plot measurements to characterize an unknown population across a larger area of interest. However, field plot measurements are costly and, as a consequence, spatial coverage is limited. Remote sensing technologies have shown remarkable success in augmenting limited sample plot data to generate stand- and landscape-level spatial predictions of forest inventory attributes. Further enhancement of forest inventory approaches that couple field measurements with cutting-edge remotely sensed and geospatial datasets is essential to sustainable forest management. We evaluated a novel Random Forest-based k-Nearest Neighbors (RF-kNN) imputation approach to couple remote sensing and geospatial data with field inventory collected by different sampling methods to generate forest inventory information across large spatial extents. The forest inventory data collected by the FIA program of the US Forest Service were integrated with optical remote sensing and other geospatial datasets to produce biomass distribution maps for a part of the Lake States and species-specific site index maps for the entire Lake States. Targeting small-area application of state-of-the-art remote sensing, LiDAR (light detection and ranging) data were integrated with field data collected by an inexpensive method, called variable plot sampling, in the Ford Forest of Michigan Tech to derive a standing volume map in a cost-effective way. The outputs of the RF-kNN imputation were compared with independent validation datasets and extant map products based on different sampling and modeling strategies. The RF-kNN modeling approach was found to be very effective, especially for large-area estimation, and produced results statistically equivalent to the field observations or the estimates derived from secondary data sources. The models are useful to resource managers for operational and strategic purposes.
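A minimal sketch of the RF-kNN idea, under the assumption that proximity between a target pixel and a reference plot is the fraction of trees in which they share a terminal node; the synthetic predictors and biomass values are placeholders, not the study's data or implementation:

```python
# RF-kNN imputation sketch: train a random forest on reference plots, use
# shared-leaf proximity to find each pixel's k most similar plots, and
# impute the pixel attribute from those plots.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(5)
n_plots, n_pixels, n_feat = 300, 1000, 6            # spectral/geospatial predictors
X_plots = rng.normal(size=(n_plots, n_feat))
biomass = 100 + 20 * X_plots[:, 0] - 10 * X_plots[:, 1] + rng.normal(0, 5, n_plots)
X_pixels = rng.normal(size=(n_pixels, n_feat))

rf = RandomForestRegressor(n_estimators=200, random_state=0).fit(X_plots, biomass)
leaves_plots = rf.apply(X_plots)                     # (n_plots, n_trees) leaf indices
leaves_pixels = rf.apply(X_pixels)                   # (n_pixels, n_trees)

k = 3
imputed = np.empty(n_pixels)
for i in range(n_pixels):
    proximity = (leaves_pixels[i] == leaves_plots).mean(axis=1)  # share-a-leaf rate
    nearest = np.argsort(proximity)[-k:]                         # k most similar plots
    imputed[i] = biomass[nearest].mean()
print("imputed biomass range:", round(imputed.min(), 1), "-", round(imputed.max(), 1))
```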
Abstract:
Some factors complicate comparisons between linkage maps from different studies. This problem can be resolved if measures of precision, such as confidence intervals and frequency distributions, are associated with markers. We examined the precision of distances and ordering of microsatellite markers in the consensus linkage maps of chromosomes 1, 3 and 4 from two F2 reciprocal Brazilian chicken populations, using bootstrap sampling. Single and consensus maps were constructed. The consensus map was compared with the International Consensus Linkage Map and with the whole genome sequence. Some loci showed segregation distortion and missing data, but this did not affect the analyses negatively. Several inversions and position shifts were detected, based on 95% confidence intervals and frequency distributions of loci. Some discrepancies in distances between loci and in ordering were due to chance, whereas others could be attributed to other effects, including reciprocal crosses, sampling error of the founder animals from the two populations, F2 population structure, number of and distance between microsatellite markers, number of informative meioses, loci segregation patterns, and sex. In the Brazilian consensus GGA1, locus LEI1038 was in a position closer to the true genome sequence than in the International Consensus Map, whereas for GGA3 and GGA4, no such differences were found. Extending these analyses to the remaining chromosomes should facilitate comparisons and the integration of several available genetic maps, allowing meta-analyses for map construction and quantitative trait loci (QTL) mapping. The precision of the estimates of QTL positions and their effects would be increased with such information.
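A minimal sketch of the bootstrap idea used above: resample individuals with replacement and re-estimate the map distance between two linked markers to obtain a 95% confidence interval. For simplicity this assumes a setting where the recombination fraction is a direct recombinant count; the study re-estimated full F2 maps on each bootstrap sample:

```python
# Bootstrap confidence interval for the map distance between two markers.
import numpy as np

rng = np.random.default_rng(6)
true_r = 0.03                                  # ~3 cM between adjacent markers (assumed)
n_ind = 350
recombinant = rng.random(n_ind) < true_r       # True = recombinant between the markers

def haldane_cm(r):
    return -50 * np.log(1 - 2 * r)             # Haldane mapping function (cM)

boot = []
for _ in range(2000):                          # resample individuals with replacement
    sample = rng.choice(recombinant, size=n_ind, replace=True)
    boot.append(haldane_cm(sample.mean()))
lo, hi = np.percentile(boot, [2.5, 97.5])
print(f"estimated distance: {haldane_cm(recombinant.mean()):.2f} cM "
      f"(95% CI {lo:.2f}-{hi:.2f})")
```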
Abstract:
Background: Cerebral palsy (CP) patients have motor limitations that can affect functionality and abilities for activities of daily living (ADL). Health-related quality of life and health status instruments validated for these patients do not directly approach the concepts of functionality or ADL. The Child Health Assessment Questionnaire (CHAQ) seems to be a good instrument to approach this dimension, but it had never been used for CP patients. The purpose of the study was to verify the psychometric properties of the CHAQ applied to children and adolescents with CP. Methods: Parents or guardians of children and adolescents with CP, aged 5 to 18 years, answered the CHAQ. A healthy group of 314 children and adolescents was recruited during the validation of the Brazilian version of the CHAQ. Data quality, reliability and validity were studied. Motor function was evaluated by the Gross Motor Function Measure (GMFM). Results: Ninety-six parents/guardians answered the questionnaire. The age of the patients ranged from 5 to 17.9 years (average: 9.3). The rate of missing data was low (<9.3%). The floor effect was observed in two domains, being higher only in the visual analogue scales (≤35.5%). The ceiling effect was significant in all domains and particularly high in patients with quadriplegia (81.8 to 90.9%) and extrapyramidal CP (45.4 to 91.0%). The Cronbach alpha coefficient ranged from 0.85 to 0.95. Validity was appropriate: for discriminant validity, the correlation of the disability index with the visual analogue scales was not significant; for convergent validity, the CHAQ disability index had a strong correlation with the GMFM (0.77); for divergent validity, there was no correlation between the GMFM and the pain and overall evaluation scales; for criterion validity, the GMFM as well as the CHAQ detected differences in scores among the clinical types of CP (p < 0.01); for construct validity, the patients' disability index score (mean: 2.16; SD: 0.72) was higher than that of the healthy group (mean: 0.12; SD: 0.23) (p < 0.01). Conclusion: CHAQ reliability and validity were adequate for this population. However, further studies are necessary to verify the influence of the ceiling effect on the responsiveness of the instrument.
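A minimal sketch of two of the psychometric checks reported above: Cronbach's alpha for internal consistency and a correlation coefficient for convergent validity (a disability index versus an external motor measure such as the GMFM). The item scores are synthetic placeholders, not study data:

```python
# Cronbach's alpha and a convergent-validity correlation on synthetic items.
import numpy as np

rng = np.random.default_rng(7)
n, k = 96, 8                                      # respondents x items (illustrative)
latent = rng.normal(size=n)
items = latent[:, None] + rng.normal(0, 0.6, size=(n, k))   # correlated item scores

def cronbach_alpha(x):
    item_var = x.var(axis=0, ddof=1).sum()        # sum of item variances
    total_var = x.sum(axis=1).var(ddof=1)         # variance of total score
    n_items = x.shape[1]
    return n_items / (n_items - 1) * (1 - item_var / total_var)

disability_index = items.mean(axis=1)
external = 0.8 * latent + rng.normal(0, 0.5, n)   # external criterion (sign/scale arbitrary)
print("Cronbach's alpha:", round(cronbach_alpha(items), 2))
print("convergent validity r:", round(np.corrcoef(disability_index, external)[0, 1], 2))
```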
Abstract:
When building genetic maps, it is necessary to choose from several marker ordering algorithms and criteria, and the choice is not always simple. In this study, we evaluate the efficiency of the algorithms try (TRY), seriation (SER), rapid chain delineation (RCD), recombination counting and ordering (RECORD) and unidirectional growth (UG), as well as the criteria PARF (product of adjacent recombination fractions), SARF (sum of adjacent recombination fractions), SALOD (sum of adjacent LOD scores) and LHMC (likelihood through hidden Markov chains), used with the RIPPLE algorithm for error verification, in the construction of genetic linkage maps. A linkage map of a hypothetical diploid and monoecious plant species was simulated, containing one linkage group and 21 markers with a fixed distance of 3 cM between them. In all, 700 F2 populations were randomly simulated with 100 and 400 individuals, with different combinations of dominant and co-dominant markers, as well as 10 and 20% missing data. The simulations showed that, in the presence of co-dominant markers only, any combination of algorithm and criteria may be used, even for a reduced population size. In the case of a smaller proportion of dominant markers, any of the algorithms and criteria (except SALOD) investigated may be used. In the presence of high proportions of dominant markers and smaller samples (around 100), the probability of linkage in repulsion between markers increases and, in this case, use of the algorithms TRY and SER associated with RIPPLE and the LHMC criterion would provide better results. Heredity (2009) 103, 494-502; doi:10.1038/hdy.2009.96; published online 29 July 2009
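A minimal sketch of one ordering criterion mentioned above, SARF (sum of adjacent recombination fractions): given pairwise recombination-fraction estimates, each candidate marker order is scored by summing the fractions between adjacent markers, and lower SARF is preferred. The recombination matrix here is synthetic, built from Haldane's mapping function:

```python
# Score candidate marker orders by SARF and keep the best one.
import numpy as np
from itertools import permutations

n_markers = 5                                  # small example; 3 cM between neighbours
def r_from_cm(d):
    return 0.5 * (1 - np.exp(-2 * d / 100))    # inverse Haldane mapping function

# Pairwise recombination fractions implied by the true (synthetic) order 0..4.
rf = np.array([[r_from_cm(3 * abs(i - j)) for j in range(n_markers)]
               for i in range(n_markers)])

def sarf(order, rf):
    return sum(rf[a, b] for a, b in zip(order[:-1], order[1:]))

# Exhaustive search is feasible for 5 markers (ties with the reversed order).
best = min(permutations(range(n_markers)), key=lambda o: sarf(o, rf))
print("best order by SARF:", best, "score:", round(sarf(best, rf), 4))
```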
Abstract:
Crown group Archosauria, which includes birds, dinosaurs, crocodylomorphs, and several extinct Mesozoic groups, is a primary division of the vertebrate tree of life. However, the higher-level phylogenetic relationships within Archosauria are poorly resolved and controversial, despite years of study. The phylogeny of crocodile-line archosaurs (Crurotarsi) is particularly contentious, and has been plagued by problematic taxon and character sampling. Recent discoveries and renewed focus on archosaur anatomy enable the compilation of a new dataset, which assimilates and standardizes character data pertinent to higher-level archosaur phylogeny, and is scored across the largest group of taxa yet analysed. This dataset includes 47 new characters (25% of total) and eight taxa that have yet to be included in an analysis, and total taxonomic sampling is more than twice that of any previous study. This analysis produces a well-resolved phylogeny, which recovers mostly traditional relationships within Avemetatarsalia, places Phytosauria as a basal crurotarsan clade, finds a close relationship between Aetosauria and Crocodylomorpha, and recovers a monophyletic Rauisuchia comprised of two major subclades. Support values are low, suggesting rampant homoplasy and missing data within Archosauria, but the phylogeny is highly congruent with stratigraphy. Comparison with alternative analyses identifies numerous scoring differences, but indicates that character sampling is the main source of incongruence. The phylogeny implies major missing lineages in the Early Triassic and may support a Carnian-Norian extinction event.