924 resultados para Divergence time estimation
Resumo:
This paper considers a wide class of semiparametric problems with a parametric part for some covariate effects and repeated evaluations of a nonparametric function. Special cases in our approach include marginal models for longitudinal/clustered data, conditional logistic regression for matched case-control studies, multivariate measurement error models, generalized linear mixed models with a semiparametric component, and many others. We propose profile-kernel and backfitting estimation methods for these problems, derive their asymptotic distributions, and show that in likelihood problems the methods are semiparametric efficient. While generally not true, with our methods profiling and backfitting are asymptotically equivalent. We also consider pseudolikelihood methods where some nuisance parameters are estimated from a different algorithm. The proposed methods are evaluated using simulation studies and applied to the Kenya hemoglobin data.
Resumo:
In many clinical trials to evaluate treatment efficacy, it is believed that there may exist latent treatment effectiveness lag times after which medical procedure or chemical compound would be in full effect. In this article, semiparametric regression models are proposed and studied to estimate the treatment effect accounting for such latent lag times. The new models take advantage of the invariance property of the additive hazards model in marginalizing over random effects, so parameters in the models are easy to be estimated and interpreted, while the flexibility without specifying baseline hazard function is kept. Monte Carlo simulation studies demonstrate the appropriateness of the proposed semiparametric estimation procedure. Data collected in the actual randomized clinical trial, which evaluates the effectiveness of biodegradable carmustine polymers for treatment of recurrent brain tumors, are analyzed.
Resumo:
In environmental epidemiology, exposure X and health outcome Y vary in space and time. We present a method to diagnose the possible influence of unmeasured confounders U on the estimated effect of X on Y and to propose several approaches to robust estimation. The idea is to use space and time as proxy measures for the unmeasured factors U. We start with the time series case where X and Y are continuous variables at equally-spaced times and assume a linear model. We define matching estimator b(u)s that correspond to pairs of observations with specific lag u. Controlling for a smooth function of time, St, using a kernel estimator is roughly equivalent to estimating the association with a linear combination of the b(u)s with weights that involve two components: the assumptions about the smoothness of St and the normalized variogram of the X process. When an unmeasured confounder U exists, but the model otherwise correctly controls for measured confounders, the excess variation in b(u)s is evidence of confounding by U. We use the plot of b(u)s versus lag u, lagged-estimator-plot (LEP), to diagnose the influence of U on the effect of X on Y. We use appropriate linear combination of b(u)s or extrapolate to b(0) to obtain novel estimators that are more robust to the influence of smooth U. The methods are extended to time series log-linear models and to spatial analyses. The LEP plot gives us a direct view of the magnitude of the estimators for each lag u and provides evidence when models did not adequately describe the data.
Resumo:
Multi-site time series studies of air pollution and mortality and morbidity have figured prominently in the literature as comprehensive approaches for estimating acute effects of air pollution on health. Hierarchical models are generally used to combine site-specific information and estimate pooled air pollution effects taking into account both within-site statistical uncertainty, and across-site heterogeneity. Within a site, characteristics of time series data of air pollution and health (small pollution effects, missing data, highly correlated predictors, non linear confounding etc.) make modelling all sources of uncertainty challenging. One potential consequence is underestimation of the statistical variance of the site-specific effects to be combined. In this paper we investigate the impact of variance underestimation on the pooled relative rate estimate. We focus on two-stage normal-normal hierarchical models and on under- estimation of the statistical variance at the first stage. By mathematical considerations and simulation studies, we found that variance underestimation does not affect the pooled estimate substantially. However, some sensitivity of the pooled estimate to variance underestimation is observed when the number of sites is small and underestimation is severe. These simulation results are applicable to any two-stage normal-normal hierarchical model for combining information of site-specific results, and they can be easily extended to more general hierarchical formulations. We also examined the impact of variance underestimation on the national average relative rate estimate from the National Morbidity Mortality Air Pollution Study and we found that variance underestimation as much as 40% has little effect on the national average.
Resumo:
Submicroscopic changes in chromosomal DNA copy number dosage are common and have been implicated in many heritable diseases and cancers. Recent high-throughput technologies have a resolution that permits the detection of segmental changes in DNA copy number that span thousands of basepairs across the genome. Genome-wide association studies (GWAS) may simultaneously screen for copy number-phenotype and SNP-phenotype associations as part of the analytic strategy. However, genome-wide array analyses are particularly susceptible to batch effects as the logistics of preparing DNA and processing thousands of arrays often involves multiple laboratories and technicians, or changes over calendar time to the reagents and laboratory equipment. Failure to adjust for batch effects can lead to incorrect inference and requires inefficient post-hoc quality control procedures that exclude regions that are associated with batch. Our work extends previous model-based approaches for copy number estimation by explicitly modeling batch effects and using shrinkage to improve locus-specific estimates of copy number uncertainty. Key features of this approach include the use of diallelic genotype calls from experimental data to estimate batch- and locus-specific parameters of background and signal without the requirement of training data. We illustrate these ideas using a study of bipolar disease and a study of chromosome 21 trisomy. The former has batch effects that dominate much of the observed variation in quantile-normalized intensities, while the latter illustrates the robustness of our approach to datasets where as many as 25% of the samples have altered copy number. Locus-specific estimates of copy number can be plotted on the copy-number scale to investigate mosaicism and guide the choice of appropriate downstream approaches for smoothing the copy number as a function of physical position. The software is open source and implemented in the R package CRLMM available at Bioconductor (http:www.bioconductor.org).
Resumo:
Numerous time series studies have provided strong evidence of an association between increased levels of ambient air pollution and increased levels of hospital admissions, typically at 0, 1, or 2 days after an air pollution episode. An important research aim is to extend existing statistical models so that a more detailed understanding of the time course of hospitalization after exposure to air pollution can be obtained. Information about this time course, combined with prior knowledge about biological mechanisms, could provide the basis for hypotheses concerning the mechanism by which air pollution causes disease. Previous studies have identified two important methodological questions: (1) How can we estimate the shape of the distributed lag between increased air pollution exposure and increased mortality or morbidity? and (2) How should we estimate the cumulative population health risk from short-term exposure to air pollution? Distributed lag models are appropriate tools for estimating air pollution health effects that may be spread over several days. However, estimation for distributed lag models in air pollution and health applications is hampered by the substantial noise in the data and the inherently weak signal that is the target of investigation. We introduce an hierarchical Bayesian distributed lag model that incorporates prior information about the time course of pollution effects and combines information across multiple locations. The model has a connection to penalized spline smoothing using a special type of penalty matrix. We apply the model to estimating the distributed lag between exposure to particulate matter air pollution and hospitalization for cardiovascular and respiratory disease using data from a large United States air pollution and hospitalization database of Medicare enrollees in 94 counties covering the years 1999-2002.
Resumo:
The amount and type of ground cover is an important characteristic to measure when collecting soil disturbance monitoring data after a timber harvest. Estimates of ground cover and bare soil can be used for tracking changes in invasive species, plant growth and regeneration, woody debris loadings, and the risk of surface water runoff and soil erosion. A new method of assessing ground cover and soil disturbance was recently published by the U.S. Forest Service, the Forest Soil Disturbance Monitoring Protocol (FSDMP). This protocol uses the frequency of cover types in small circular (15cm) plots to compare ground surface in pre- and post-harvest condition. While both frequency and percent cover are common methods of describing vegetation, frequency has rarely been used to measure ground surface cover. In this study, three methods for assessing ground cover percent (step-point, 15cm dia. circular and 1x5m visual plot estimates) were compared to the FSDMP frequency method. Results show that the FSDMP method provides significantly higher estimates of ground surface condition for most soil cover types, except coarse wood. The three cover methods had similar estimates for most cover values. The FSDMP method also produced the highest value when bare soil estimates were used to model erosion risk. In a person-hour analysis, estimating ground cover percent in 15cm dia. plots required the least sampling time, and provided standard errors similar to the other cover estimates even at low sampling intensities (n=18). If ground cover estimates are desired in soil monitoring, then a small plot size (15cm dia. circle), or a step-point method can provide a more accurate estimate in less time than the current FSDMP method.
Resumo:
Regional flood frequency techniques are commonly used to estimate flood quantiles when flood data is unavailable or the record length at an individual gauging station is insufficient for reliable analyses. These methods compensate for limited or unavailable data by pooling data from nearby gauged sites. This requires the delineation of hydrologically homogeneous regions in which the flood regime is sufficiently similar to allow the spatial transfer of information. It is generally accepted that hydrologic similarity results from similar physiographic characteristics, and thus these characteristics can be used to delineate regions and classify ungauged sites. However, as currently practiced, the delineation is highly subjective and dependent on the similarity measures and classification techniques employed. A standardized procedure for delineation of hydrologically homogeneous regions is presented herein. Key aspects are a new statistical metric to identify physically discordant sites, and the identification of an appropriate set of physically based measures of extreme hydrological similarity. A combination of multivariate statistical techniques applied to multiple flood statistics and basin characteristics for gauging stations in the Southeastern U.S. revealed that basin slope, elevation, and soil drainage largely determine the extreme hydrological behavior of a watershed. Use of these characteristics as similarity measures in the standardized approach for region delineation yields regions which are more homogeneous and more efficient for quantile estimation at ungauged sites than those delineated using alternative physically-based procedures typically employed in practice. The proposed methods and key physical characteristics are also shown to be efficient for region delineation and quantile development in alternative areas composed of watersheds with statistically different physical composition. In addition, the use of aggregated values of key watershed characteristics was found to be sufficient for the regionalization of flood data; the added time and computational effort required to derive spatially distributed watershed variables does not increase the accuracy of quantile estimators for ungauged sites. This dissertation also presents a methodology by which flood quantile estimates in Haiti can be derived using relationships developed for data rich regions of the U.S. As currently practiced, regional flood frequency techniques can only be applied within the predefined area used for model development. However, results presented herein demonstrate that the regional flood distribution can successfully be extrapolated to areas of similar physical composition located beyond the extent of that used for model development provided differences in precipitation are accounted for and the site in question can be appropriately classified within a delineated region.
Resumo:
Renewable energy is growing in demand, and thus the the manufacture of solar cells and photovoltaic arrays has advanced dramatically in recent years. This is proved by the fact that the photovoltaic production has doubled every 2 years, increasing by an average of 48% each year since 2002. Covering the general overview of solar cell working, and its model, this thesis will start with the three generations of photovoltaic solar cell technology, and move to the motivation of dedicating research to nanostructured solar cell. For the current generation solar cells, among several factors, like photon capture, photon reflection, carrier generation by photons, carrier transport and collection, the efficiency also depends on the absorption of photons. The absorption coefficient,α, and its dependence on the wavelength, λ, is of major concern to improve the efficiency. Nano-silicon structures (quantum wells and quantum dots) have a unique advantage compared to bulk and thin film crystalline silicon that multiple direct and indirect band gaps can be realized by appropriate size control of the quantum wells. This enables multiple wavelength photons of the solar spectrum to be absorbed efficiently. There is limited research on the calculation of absorption coefficient in nano structures of silicon. We present a theoretical approach to calculate the absorption coefficient using quantum mechanical calculations on the interaction of photons with the electrons of the valence band. One model is that the oscillator strength of the direct optical transitions is enhanced by the quantumconfinement effect in Si nanocrystallites. These kinds of quantum wells can be realized in practice in porous silicon. The absorption coefficient shows a peak of 64638.2 cm-1 at = 343 nm at photon energy of ξ = 3.49 eV ( = 355.532 nm). I have shown that a large value of absorption coefficient α comparable to that of bulk silicon is possible in silicon QDs because of carrier confinement. Our results have shown that we can enhance the absorption coefficient by an order of 10, and at the same time a nearly constant absorption coefficient curve over the visible spectrum. The validity of plots is verified by the correlation with experimental photoluminescence plots. A very generic comparison for the efficiency of p-i-n junction solar cell is given for a cell incorporating QDs and sans QDs. The design and fabrication technique is discussed in brief. I have shown that by using QDs in the intrinsic region of a cell, we can improve the efficiency by a factor of 1.865 times. Thus for a solar cell of efficiency of 26% for first generation solar cell, we can improve the efficiency to nearly 48.5% on using QDs.
Resumo:
Large Power transformers, an aging and vulnerable part of our energy infrastructure, are at choke points in the grid and are key to reliability and security. Damage or destruction due to vandalism, misoperation, or other unexpected events is of great concern, given replacement costs upward of $2M and lead time of 12 months. Transient overvoltages can cause great damage and there is much interest in improving computer simulation models to correctly predict and avoid the consequences. EMTP (the Electromagnetic Transients Program) has been developed for computer simulation of power system transients. Component models for most equipment have been developed and benchmarked. Power transformers would appear to be simple. However, due to their nonlinear and frequency-dependent behaviors, they can be one of the most complex system components to model. It is imperative that the applied models be appropriate for the range of frequencies and excitation levels that the system experiences. Thus, transformer modeling is not a mature field and newer improved models must be made available. In this work, improved topologically-correct duality-based models are developed for three-phase autotransformers having five-legged, three-legged, and shell-form cores. The main problem in the implementation of detailed models is the lack of complete and reliable data, as no international standard suggests how to measure and calculate parameters. Therefore, parameter estimation methods are developed here to determine the parameters of a given model in cases where available information is incomplete. The transformer nameplate data is required and relative physical dimensions of the core are estimated. The models include a separate representation of each segment of the core, including hysteresis of the core, λ-i saturation characteristic, capacitive effects, and frequency dependency of winding resistance and core loss. Steady-state excitation, and de-energization and re-energization transients are simulated and compared with an earlier-developed BCTRAN-based model. Black start energization cases are also simulated as a means of model evaluation and compared with actual event records. The simulated results using the model developed here are reasonable and more correct than those of the BCTRAN-based model. Simulation accuracy is dependent on the accuracy of the equipment model and its parameters. This work is significant in that it advances existing parameter estimation methods in cases where the available data and measurements are incomplete. The accuracy of EMTP simulation for power systems including three-phase autotransformers is thus enhanced. Theoretical results obtained from this work provide a sound foundation for development of transformer parameter estimation methods using engineering optimization. In addition, it should be possible to refine which information and measurement data are necessary for complete duality-based transformer models. To further refine and develop the models and transformer parameter estimation methods developed here, iterative full-scale laboratory tests using high-voltage and high-power three-phase transformer would be helpful.
Resumo:
Range estimation is the core of many positioning systems such as radar, and Wireless Local Positioning Systems (WLPS). The estimation of range is achieved by estimating Time-of-Arrival (TOA). TOA represents the signal propagation delay between a transmitter and a receiver. Thus, error in TOA estimation causes degradation in range estimation performance. In wireless environments, noise, multipath, and limited bandwidth reduce TOA estimation performance. TOA estimation algorithms that are designed for wireless environments aim to improve the TOA estimation performance by mitigating the effect of closely spaced paths in practical (positive) signal-to-noise ratio (SNR) regions. Limited bandwidth avoids the discrimination of closely spaced paths. This reduces TOA estimation performance. TOA estimation methods are evaluated as a function of SNR, bandwidth, and the number of reflections in multipath wireless environments, as well as their complexity. In this research, a TOA estimation technique based on Blind signal Separation (BSS) is proposed. This frequency domain method estimates TOA in wireless multipath environments for a given signal bandwidth. The structure of the proposed technique is presented and its complexity and performance are theoretically evaluated. It is depicted that the proposed method is not sensitive to SNR, number of reflections, and bandwidth. In general, as bandwidth increases, TOA estimation performance improves. However, spectrum is the most valuable resource in wireless systems and usually a large portion of spectrum to support high performance TOA estimation is not available. In addition, the radio frequency (RF) components of wideband systems suffer from high cost and complexity. Thus, a novel, multiband positioning structure is proposed. The proposed technique uses the available (non-contiguous) bands to support high performance TOA estimation. This system incorporates the capabilities of cognitive radio (CR) systems to sense the available spectrum (also called white spaces) and to incorporate white spaces for high-performance localization. First, contiguous bands that are divided into several non-equal, narrow sub-bands that possess the same SNR are concatenated to attain an accuracy corresponding to the equivalent full band. Two radio architectures are proposed and investigated: the signal is transmitted over available spectrum either simultaneously (parallel concatenation) or sequentially (serial concatenation). Low complexity radio designs that handle the concatenation process sequentially and in parallel are introduced. Different TOA estimation algorithms that are applicable to multiband scenarios are studied and their performance is theoretically evaluated and compared to simulations. Next, the results are extended to non-contiguous, non-equal sub-bands with the same SNR. These are more realistic assumptions in practical systems. The performance and complexity of the proposed technique is investigated as well. This study’s results show that selecting bandwidth, center frequency, and SNR levels for each sub-band can adapt positioning accuracy.