939 results for non-parametric estimation
Abstract:
We propose an iterative procedure to minimize the sum of squares function which avoids the nonlinear nature of estimating the first order moving average parameter and provides a closed form of the estimator. The asymptotic properties of the method are discussed and the consistency of the linear least squares estimator is proved for the invertible case. We perform various Monte Carlo experiments in order to compare the sample properties of the linear least squares estimator with its nonlinear counterpart for the conditional and unconditional cases. Some examples are also discussed.
Abstract:
We present a Bayesian approach for estimating the relative frequencies of multi-single nucleotide polymorphism (SNP) haplotypes in populations of the malaria parasite Plasmodium falciparum by using microarray SNP data from human blood samples. Each sample comes from a malaria patient and contains one or several parasite clones that may genetically differ. Samples containing multiple parasite clones with different genetic markers pose a special challenge. The situation is comparable with a polyploid organism. The data from each blood sample indicates whether the parasites in the blood carry a mutant or a wildtype allele at various selected genomic positions. If both mutant and wildtype alleles are detected at a given position in a multiply infected sample, the data indicates the presence of both alleles, but the ratio is unknown. Thus, the data only partially reveals which specific combinations of genetic markers (i.e. haplotypes across the examined SNPs) occur in distinct parasite clones. In addition, SNP data may contain errors at non-negligible rates. We use a multinomial mixture model with partially missing observations to represent this data and a Markov chain Monte Carlo method to estimate the haplotype frequencies in a population. Our approach addresses both challenges, multiple infections and data errors.
Abstract:
Preface. The starting point for this work, and eventually the subject of the whole thesis, was the question of how to estimate the parameters of affine stochastic volatility jump-diffusion models. These models are very important for contingent claim pricing. Their major advantage, the availability of analytical solutions for characteristic functions, has made them the models of choice for many theoretical constructions and practical applications. At the same time, estimating the parameters of stochastic volatility jump-diffusion models is not a straightforward task. The problem comes from the variance process, which is non-observable. There are several estimation methodologies that deal with the estimation problems of latent variables. One appeared to be particularly interesting. It proposes an estimator that, in contrast to the other methods, requires neither discretization nor simulation of the process: the Continuous Empirical Characteristic Function (ECF) estimator based on the unconditional characteristic function. However, the procedure was derived only for stochastic volatility models without jumps. Thus, it has become the subject of my research. This thesis consists of three parts. Each one is written as an independent and self-contained article. At the same time, the questions answered by the second and third parts of this work arise naturally from the issues investigated and the results obtained in the first one. The first chapter is the theoretical foundation of the thesis. It proposes an estimation procedure for stochastic volatility models with jumps in both the asset price and variance processes. The estimation procedure is based on the joint unconditional characteristic function for the stochastic process. The major analytical result of this part, as well as of the whole thesis, is the closed form expression for the joint unconditional characteristic function for stochastic volatility jump-diffusion models.
The empirical part of the chapter suggests that, besides stochastic volatility, jumps in both the mean and the volatility equation are relevant for modelling returns of the S&P500 index, which has been chosen as a general representative of the stock asset class. Hence, the next question is which jump process to use to model returns of the S&P500. The decision about the jump process in the framework of affine jump-diffusion models boils down to defining the intensity of the compound Poisson process, a constant or some function of the state variables, and to choosing the distribution of the jump size. While the jump in the variance process is usually assumed to be exponential, there are at least three distributions of the jump size which are currently used for the asset log-prices: normal, exponential and double exponential. The second part of this thesis shows that normal jumps in the asset log-returns should be used if we are to model the S&P500 index by a stochastic volatility jump-diffusion model. This is a surprising result. The exponential distribution has fatter tails, and for this reason either an exponential or a double exponential jump size was expected to provide the best fit of the stochastic volatility jump-diffusion models to the data. The idea of testing the efficiency of the Continuous ECF estimator on simulated data had already appeared when the first estimation results of the first chapter were obtained. In the absence of a benchmark or any ground for comparison, it is unreasonable to be sure that our parameter estimates and the true parameters of the models coincide. The conclusion of the second chapter provides one more reason to do that kind of test. Thus, the third part of this thesis concentrates on the estimation of parameters of stochastic volatility jump-diffusion models on the basis of asset price time series simulated from various "true" parameter sets.
The goal is to show that the Continuous ECF estimator based on the joint unconditional characteristic function is capable of finding the true parameters, and the third chapter demonstrates that our estimator indeed has the ability to do so. Once it is clear that the Continuous ECF estimator based on the unconditional characteristic function works, the next question arises immediately: can the computational effort be reduced without affecting the efficiency of the estimator, or can the efficiency of the estimator be improved without dramatically increasing the computational burden? The efficiency of the Continuous ECF estimator depends on the number of dimensions of the joint unconditional characteristic function which is used for its construction. Theoretically, the more dimensions there are, the more efficient the estimation procedure is. In practice, however, this relationship is not so straightforward due to the increasing computational difficulties. The second chapter, for example, in addition to the choice of the jump process, discusses the possibility of using the marginal, i.e. one-dimensional, unconditional characteristic function in the estimation instead of the joint, bi-dimensional, unconditional characteristic function. As a result, the preference for one or the other depends on the model to be estimated. Thus, the computational effort can be reduced in some cases without affecting the efficiency of the estimator. Improving the estimator's efficiency by increasing its dimensionality faces more difficulties. The third chapter of this thesis, in addition to what was discussed above, compares the performance of the estimators with bi- and three-dimensional unconditional characteristic functions on the simulated data.
It shows that the theoretical efficiency of the Continuous ECF estimator based on the three-dimensional unconditional characteristic function is not attainable in practice, at least for the moment, due to the limitations on the computer power and optimization toolboxes available to the general public. Thus, the Continuous ECF estimator based on the joint, bi-dimensional, unconditional characteristic function has all the reasons to exist and to be used for the estimation of parameters of the stochastic volatility jump-diffusion models.
Abstract:
Numerous sources of evidence point to the fact that heterogeneity within the Earth's deep crystalline crust is complex and hence may be best described through stochastic rather than deterministic approaches. As seismic reflection imaging arguably offers the best means of sampling deep crustal rocks in situ, much interest has been expressed in using such data to characterize the stochastic nature of crustal heterogeneity. Previous work on this problem has shown that the spatial statistics of seismic reflection data are indeed related to those of the underlying heterogeneous seismic velocity distribution. As of yet, however, the nature of this relationship has remained elusive due to the fact that most of the work was either strictly empirical or based on incorrect methodological approaches. Here, we introduce a conceptual model, based on the assumption of weak scattering, that allows us to quantitatively link the second-order statistics of a 2-D seismic velocity distribution with those of the corresponding processed and depth-migrated seismic reflection image. We then perform a sensitivity study in order to investigate what information regarding the stochastic model parameters describing crustal velocity heterogeneity might potentially be recovered from the statistics of a seismic reflection image using this model. Finally, we present a Monte Carlo inversion strategy to estimate these parameters and we show examples of its application at two different source frequencies and using two different sets of prior information. Our results indicate that the inverse problem is inherently non-unique and that many different combinations of the vertical and lateral correlation lengths describing the velocity heterogeneity can yield seismic images with the same 2-D autocorrelation structure. 
The ratio of all of these possible combinations of vertical and lateral correlation lengths, however, remains roughly constant which indicates that, without additional prior information, the aspect ratio is the only parameter describing the stochastic seismic velocity structure that can be reliably recovered.
Abstract:
The amino acid composition of the protein from three strains of rat (Wistar, Zucker lean and Zucker obese), subjected to reference and high-fat diets has been used to determine the mean empirical formula, molecular weight and N content of whole-rat protein. The combined whole protein of the rat was uniform for the six experimental groups, containing an estimate of 17.3% N and a mean aminoacyl residue molecular weight of 103.7. This suggests that the appropriate protein factor for the calculation of rat protein from its N content should be 5.77 instead of the classical 6.25. In addition, an estimate of the size of the non-protein N mass in the whole rat gave a figure in the range of 5.5 % of all N. The combination of the two calculations gives a protein factor of 5.5 for the conversion of total N into rat protein.
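The arithmetic behind the two factors can be checked with a short sketch. It assumes the conventional definition of a nitrogen-to-protein factor as 100 divided by the protein's N percentage, and it assumes the non-protein N share is applied as a simple multiplicative correction. The function names are illustrative and do not come from the paper.

```python
def protein_factor(n_percent):
    """N-to-protein conversion factor for a protein containing n_percent % nitrogen."""
    return 100.0 / n_percent


def total_n_factor(factor, non_protein_n_fraction):
    """Factor for converting total N into protein when part of the N is non-protein.

    Assumption: only the protein share of total N is multiplied by the factor.
    """
    return factor * (1.0 - non_protein_n_fraction)


# Values quoted in the abstract: 17.3 % N, factor 5.77, about 5.5 % non-protein N.
factor_from_n = protein_factor(17.3)
combined = total_n_factor(5.77, 0.055)

print(round(factor_from_n, 2))  # 5.78
print(round(combined, 2))       # 5.45
```

Dividing 100 by the rounded 17.3 % gives about 5.78 rather than the reported 5.77, presumably because the published N content is itself rounded. The combined value of about 5.45 is consistent with the reported factor of 5.5.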
Abstract:
A method is proposed for the estimation of absolute binding free energy of interaction between proteins and ligands. Conformational sampling of the protein-ligand complex is performed by molecular dynamics (MD) in vacuo and the solvent effect is calculated a posteriori by solving the Poisson or the Poisson-Boltzmann equation for selected frames of the trajectory. The binding free energy is written as a linear combination of the buried surface upon complexation, SASbur, the electrostatic interaction energy between the ligand and the protein, Eelec, and the difference of the solvation free energies of the complex and the isolated ligand and protein, deltaGsolv. The method uses the buried surface upon complexation to account for the non-polar contribution to the binding free energy because it is less sensitive to the details of the structure than the van der Waals interaction energy. The parameters of the method are developed for a training set of 16 HIV-1 protease-inhibitor complexes of known 3D structure. A correlation coefficient of 0.91 was obtained with an unsigned mean error of 0.8 kcal/mol. When applied to a set of 25 HIV-1 protease-inhibitor complexes of unknown 3D structures, the method provides a satisfactory correlation between the calculated binding free energy and the experimental pIC50 without reparametrization.
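The linear combination described in this abstract can be written out as follows. This is only a sketch of the functional form: the coefficient symbols \alpha, \beta and \gamma are placeholders introduced here, and the abstract does not say whether the fitted model also includes a constant term.

```latex
\Delta G_{\mathrm{bind}} \approx
  \alpha\,\mathrm{SAS}_{\mathrm{bur}}
  + \beta\,E_{\mathrm{elec}}
  + \gamma\,\Delta G_{\mathrm{solv}},
\qquad
\Delta G_{\mathrm{solv}} =
  G_{\mathrm{solv}}^{\mathrm{complex}}
  - \left( G_{\mathrm{solv}}^{\mathrm{protein}} + G_{\mathrm{solv}}^{\mathrm{ligand}} \right)
```

Under this reading, the three coefficients are the parameters fitted on the 16-complex training set.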
Abstract:
A major issue in the application of waveform inversion methods to crosshole ground-penetrating radar (GPR) data is the accurate estimation of the source wavelet. Here, we explore the viability and robustness of incorporating this step into a recently published time-domain inversion procedure through an iterative deconvolution approach. Our results indicate that, at least in non-dispersive electrical environments, such an approach provides remarkably accurate and robust estimates of the source wavelet even in the presence of strong heterogeneity of both the dielectric permittivity and electrical conductivity. Our results also indicate that the proposed source wavelet estimation approach is relatively insensitive to ambient noise and to the phase characteristics of the starting wavelet. Finally, there appears to be little to no trade-off between the wavelet estimation and the tomographic imaging procedures.
Abstract:
Captan and folpet are two fungicides largely used in agriculture, but biomonitoring data are mostly limited to measurements of captan metabolite concentrations in spot urine samples of workers, which complicates interpretation of results in terms of internal dose estimation, daily variations according to tasks performed, and most plausible routes of exposure. This study aimed at performing repeated biological measurements of exposure to captan and folpet in field workers (i) to better assess internal dose along with main routes-of-entry according to tasks and (ii) to establish the most appropriate sampling and analysis strategies. The detailed urinary excretion time courses of specific and non-specific biomarkers of exposure to captan and folpet were established in tree farmers (n = 2) and grape growers (n = 3) over a typical workweek (seven consecutive days), including spraying and harvest activities. The impact of the expression of urinary measurements [excretion rate values adjusted or not for creatinine or cumulative amounts over given time periods (8, 12, and 24 h)] was evaluated. Absorbed doses and main routes-of-entry were then estimated from the 24-h cumulative urinary amounts through the use of a kinetic model. The time courses showed that exposure levels were higher during spraying than harvest activities. Model simulations also suggest a limited absorption in the studied workers and an exposure mostly through the dermal route. They further pointed out the advantage of expressing biomarker values in terms of body weight-adjusted amounts in repeated 24-h urine collections as compared to concentrations or excretion rates in spot samples, without the necessity for creatinine corrections.
Abstract:
As a thorough aggregation of probability and graph theory, Bayesian networks currently enjoy widespread interest as a means for studying factors that affect the coherent evaluation of scientific evidence in forensic science. Paper I of this series of papers intends to contribute to the discussion of Bayesian networks as a framework that is helpful for both illustrating and implementing statistical procedures that are commonly employed for the study of uncertainties (e.g. the estimation of unknown quantities). While the respective statistical procedures are widely described in the literature, the primary aim of this paper is to offer an essentially non-technical introduction on how interested readers may use these analytical approaches - with the help of Bayesian networks - for processing their own forensic science data. Attention is mainly drawn to the structure and underlying rationale of a series of basic and context-independent network fragments that users may incorporate as building blocks while constructing larger inference models. As an example of how this may be done, the proposed concepts will be used in a second paper (Part II) for specifying graphical probability networks whose purpose is to assist forensic scientists in the evaluation of scientific evidence encountered in the context of forensic document examination (i.e. results of the analysis of black toners present on printed or copied documents).
Abstract:
Multi-centre data repositories like the Alzheimer's Disease Neuroimaging Initiative (ADNI) offer a unique research platform, but pose questions concerning comparability of results when using a range of imaging protocols and data processing algorithms. The variability is mainly due to the non-quantitative character of the widely used structural T1-weighted magnetic resonance (MR) images. Although the stability of the main effect of Alzheimer's disease (AD) on brain structure across platforms and field strength has been addressed in previous studies using multi-site MR images, there are only sparse empirically-based recommendations for processing and analysis of pooled multi-centre structural MR data acquired at different magnetic field strengths (MFS). Aiming to minimise potential systematic bias when using ADNI data we investigate the specific contributions of spatial registration strategies and the impact of MFS on voxel-based morphometry in AD. We perform a whole-brain analysis within the framework of Statistical Parametric Mapping, testing for main effects of various diffeomorphic spatial registration strategies, of MFS and their interaction with disease status. Beyond the confirmation of medial temporal lobe volume loss in AD, we detect a significant impact of spatial registration strategy on estimation of AD related atrophy. Additionally, we report a significant effect of MFS on the assessment of brain anatomy (i) in the cerebellum, (ii) the precentral gyrus and (iii) the thalamus bilaterally, showing no interaction with the disease status. We provide empirical evidence in support of pooling data in multi-centre VBM studies irrespective of disease status or MFS.
Abstract:
Image registration has been proposed as an automatic method for recovering cardiac displacement fields from Tagged Magnetic Resonance Imaging (tMRI) sequences. Initially performed as a set of pairwise registrations, these techniques have evolved to the use of 3D+t deformation models, requiring metrics of joint image alignment (JA). However, only linear combinations of cost functions defined with respect to the first frame have been used. In this paper, we have applied k-Nearest Neighbors Graphs (kNNG) estimators of the α-entropy (Hα) to measure the joint similarity between frames, and to combine the information provided by different cardiac views in a unified metric. Experiments performed on six subjects showed a significantly higher accuracy (p < 0.05) with respect to a standard pairwise alignment (PA) approach in terms of mean positional error and variance with respect to manually placed landmarks. The developed method was used to study strains in patients with myocardial infarction, showing a consistency between strain, infarction location, and coronary occlusion. This paper also presents an interesting clinical application of graph-based metric estimators, showing their value for solving practical problems found in medical imaging.
Abstract:
OBJECTIVE: To compare the heart-rate monitoring with the doubly labelled water (2H2(18)O) method to estimate total daily energy expenditure in obese and non-obese children. DESIGN: Cross sectional study of obese and normal weight children. SUBJECTS: 13 prepubertal children: six obese (4M, 2F, 9.1 +/- 1.5 years, 47.3 +/- 9.7 kg) and seven non-obese (3M, 4F, 9.3 +/- 0.6 years, 31.8 +/- 3.2 kg). MEASUREMENTS: Total daily energy expenditure was assessed by means of the doubly labelled water method (TEEDLW) and of heart-rate monitoring (TEEHR). RESULTS: TEEHR was significantly (P < 0.05) higher than TEEDLW in obese children (9.47 +/- 0.84 MJ/d vs 8.99 +/- 0.63 MJ/d) whereas it was not different in non-obese children (8.43 +/- 2.02 MJ/d vs 8.42 +/- 2.30 MJ/d, P = NS). The difference of TEE assessed by HR monitoring in the obese group averaged 6.2 +/- 4.7%. At the individual level, the degree of agreement (difference between TEEHR and TEEDLW +/- 2s.d.) was low both in obese (-0.36, 1.32 MJ/d) and in non-obese children (-1.30, 1.34 MJ/d). At the group level, the agreement between the two methods was good in nonobese children (95% c.i. for the bias:-0.59, 0.63 MJ/d) but not in obese children (0.04, 0.92 MJ/d). Duration of sleep and energy expenditure during resting and physical activity were not significantly different in the two groups. Patterns of heart-rate (or derived energy expenditure) during the day-time were similar in obese and non-obese children. CONCLUSION: The HR monitoring technique provides an estimation of TEE close to that assessed by the DLW method in non-obese prepubertal children. In comparison with DLW, the HR monitoring method yields a greater TEE value in obese children.
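The individual-level agreement reported here (mean difference +/- 2 s.d.) can be sketched as below. This is a minimal illustration of that interval, assuming paired measurements per child. The function name and the sample numbers are invented for the example and are not data from the study.

```python
from statistics import mean, stdev


def limits_of_agreement(method_a, method_b, k=2.0):
    """Return (bias, lower, upper) for paired measurements.

    bias is the mean of the pairwise differences a - b; the limits are
    bias -/+ k sample standard deviations of those differences.
    """
    diffs = [a - b for a, b in zip(method_a, method_b)]
    bias = mean(diffs)
    spread = k * stdev(diffs)
    return bias, bias - spread, bias + spread


# Hypothetical paired values in MJ/d, made up purely for illustration.
tee_hr = [9.0, 10.0, 11.0]
tee_dlw = [9.0, 9.0, 9.0]

bias, lower, upper = limits_of_agreement(tee_hr, tee_dlw)
print(bias, lower, upper)
```

As a consistency check on the abstract's own figures, the midpoint of the obese-group limits (-0.36, 1.32 MJ/d) is 0.48 MJ/d, which matches the difference between the reported group means (9.47 - 8.99 MJ/d).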
Abstract:
This paper deals with the goodness of the Gaussian assumption when designing second-order blind estimation methods in the context of digital communications. The low- and high-signal-to-noise ratio (SNR) asymptotic performance of the maximum likelihood estimator (derived assuming Gaussian transmitted symbols) is compared with the performance of the optimal second-order estimator, which exploits the actual distribution of the discrete constellation. The asymptotic study concludes that the Gaussian assumption leads to the optimal second-order solution if the SNR is very low or if the symbols belong to a multilevel constellation such as quadrature-amplitude modulation (QAM) or amplitude-phase-shift keying (APSK). On the other hand, the Gaussian assumption can yield important losses at high SNR if the transmitted symbols are drawn from a constant modulus constellation such as phase-shift keying (PSK) or continuous-phase modulations (CPM). These conclusions are illustrated for the problem of direction-of-arrival (DOA) estimation of multiple digitally-modulated signals.
Abstract:
In this paper, the theory of hidden Markov models (HMM) is applied to the problem of blind (without training sequences) channel estimation and data detection. Within an HMM framework, the Baum–Welch (BW) identification algorithm is frequently used to find maximum-likelihood (ML) estimates of the corresponding model. However, such a procedure assumes the model (i.e., the channel response) to be static throughout the observation sequence. By means of introducing a parametric model for time-varying channel responses, a version of the algorithm which is more appropriate for mobile channels [time-dependent Baum–Welch (TDBW)] is derived. Aiming to compare algorithm behavior, a set of computer simulations for a GSM scenario is provided. Results indicate that, in comparison to other Baum–Welch (BW) versions of the algorithm, the TDBW approach attains a remarkable enhancement in performance. For that purpose, only a moderate increase in computational complexity is needed.