888 resultados para likelihood-based inference
Resumo:
Abstract Background A popular model for gene regulatory networks is the Boolean network model. In this paper, we propose an algorithm to perform an analysis of gene regulatory interactions using the Boolean network model and time-series data. Actually, the Boolean network is restricted in the sense that only a subset of all possible Boolean functions are considered. We explore some mathematical properties of the restricted Boolean networks in order to avoid the full search approach. The problem is modeled as a Constraint Satisfaction Problem (CSP) and CSP techniques are used to solve it. Results We applied the proposed algorithm in two data sets. First, we used an artificial dataset obtained from a model for the budding yeast cell cycle. The second data set is derived from experiments performed using HeLa cells. The results show that some interactions can be fully or, at least, partially determined under the Boolean model considered. Conclusions The algorithm proposed can be used as a first step for detection of gene/protein interactions. It is able to infer gene relationships from time-series data of gene expression, and this inference process can be aided by a priori knowledge available.
Resumo:
[EN]Our study concentrates on the epistemic adverbs used in conveying author stance in academic English. The Contrastive Interlanguage Analysis (Granger, 1996) was run to three sets of corpora comprising doctoral dissertations written by native and non-native academic authors of English. Epistemic adverbs occurring in the dissertations were identified through a computer programme and their frequencies were separately computed for each corpus. Lastly, a log-likelihood test was administered to see whether there is a statistically significant difference across the groups in concern concerning the use of these adverbs.
Resumo:
In this treatise we consider finite systems of branching particles where the particles move independently of each other according to d-dimensional diffusions. Particles are killed at a position dependent rate, leaving at their death position a random number of descendants according to a position dependent reproduction law. In addition particles immigrate at constant rate (one immigrant per immigration time). A process with above properties is called a branching diffusion withimmigration (BDI). In the first part we present the model in detail and discuss the properties of the BDI under our basic assumptions. In the second part we consider the problem of reconstruction of the trajectory of a BDI from discrete observations. We observe positions of the particles at discrete times; in particular we assume that we have no information about the pedigree of the particles. A natural question arises if we want to apply statistical procedures on the discrete observations: How can we find couples of particle positions which belong to the same particle? We give an easy to implement 'reconstruction scheme' which allows us to redraw or 'reconstruct' parts of the trajectory of the BDI with high accuracy. Moreover asymptotically the whole path can be reconstructed. Further we present simulations which show that our partial reconstruction rule is tractable in practice. In the third part we study how the partial reconstruction rule fits into statistical applications. As an extensive example we present a nonparametric estimator for the diffusion coefficient of a BDI where the particles move according to one-dimensional diffusions. This estimator is based on the Nadaraya-Watson estimator for the diffusion coefficient of one-dimensional diffusions and it uses the partial reconstruction rule developed in the second part above. We are able to prove a rate of convergence of this estimator and finally we present simulations which show that the estimator works well even if we leave our set of assumptions.
Resumo:
In the last couple of decades we assisted to a reappraisal of spatial design-based techniques. Usually the spatial information regarding the spatial location of the individuals of a population has been used to develop efficient sampling designs. This thesis aims at offering a new technique for both inference on individual values and global population values able to employ the spatial information available before sampling at estimation level by rewriting a deterministic interpolator under a design-based framework. The achieved point estimator of the individual values is treated both in the case of finite spatial populations and continuous spatial domains, while the theory on the estimator of the population global value covers the finite population case only. A fairly broad simulation study compares the results of the point estimator with the simple random sampling without replacement estimator in predictive form and the kriging, which is the benchmark technique for inference on spatial data. The Monte Carlo experiment is carried out on populations generated according to different superpopulation methods in order to manage different aspects of the spatial structure. The simulation outcomes point out that the proposed point estimator has almost the same behaviour as the kriging predictor regardless of the parameters adopted for generating the populations, especially for low sampling fractions. Moreover, the use of the spatial information improves substantially design-based spatial inference on individual values.
Resumo:
Die vorliegende Arbeit ist motiviert durch biologische Fragestellungen bezüglich des Verhaltens von Membranpotentialen in Neuronen. Ein vielfach betrachtetes Modell für spikende Neuronen ist das Folgende. Zwischen den Spikes verhält sich das Membranpotential wie ein Diffusionsprozess X der durch die SDGL dX_t= beta(X_t) dt+ sigma(X_t) dB_t gegeben ist, wobei (B_t) eine Standard-Brown'sche Bewegung bezeichnet. Spikes erklärt man wie folgt. Sobald das Potential X eine gewisse Exzitationsschwelle S überschreitet entsteht ein Spike. Danach wird das Potential wieder auf einen bestimmten Wert x_0 zurückgesetzt. In Anwendungen ist es manchmal möglich, einen Diffusionsprozess X zwischen den Spikes zu beobachten und die Koeffizienten der SDGL beta() und sigma() zu schätzen. Dennoch ist es nötig, die Schwellen x_0 und S zu bestimmen um das Modell festzulegen. Eine Möglichkeit, dieses Problem anzugehen, ist x_0 und S als Parameter eines statistischen Modells aufzufassen und diese zu schätzen. In der vorliegenden Arbeit werden vier verschiedene Fälle diskutiert, in denen wir jeweils annehmen, dass das Membranpotential X zwischen den Spikes eine Brown'sche Bewegung mit Drift, eine geometrische Brown'sche Bewegung, ein Ornstein-Uhlenbeck Prozess oder ein Cox-Ingersoll-Ross Prozess ist. Darüber hinaus beobachten wir die Zeiten zwischen aufeinander folgenden Spikes, die wir als iid Treffzeiten der Schwelle S von X gestartet in x_0 auffassen. Die ersten beiden Fälle ähneln sich sehr und man kann jeweils den Maximum-Likelihood-Schätzer explizit angeben. Darüber hinaus wird, unter Verwendung der LAN-Theorie, die Optimalität dieser Schätzer gezeigt. In den Fällen OU- und CIR-Prozess wählen wir eine Minimum-Distanz-Methode, die auf dem Vergleich von empirischer und wahrer Laplace-Transformation bezüglich einer Hilbertraumnorm beruht. Wir werden beweisen, dass alle Schätzer stark konsistent und asymptotisch normalverteilt sind. Im letzten Kapitel werden wir die Effizienz der Minimum-Distanz-Schätzer anhand simulierter Daten überprüfen. Ferner, werden Anwendungen auf reale Datensätze und deren Resultate ausführlich diskutiert.
Resumo:
In the present thesis, a new methodology of diagnosis based on advanced use of time-frequency technique analysis is presented. More precisely, a new fault index that allows tracking individual fault components in a single frequency band is defined. More in detail, a frequency sliding is applied to the signals being analyzed (currents, voltages, vibration signals), so that each single fault frequency component is shifted into a prefixed single frequency band. Then, the discrete Wavelet Transform is applied to the resulting signal to extract the fault signature in the frequency band that has been chosen. Once the state of the machine has been qualitatively diagnosed, a quantitative evaluation of the fault degree is necessary. For this purpose, a fault index based on the energy calculation of approximation and/or detail signals resulting from wavelet decomposition has been introduced to quantify the fault extend. The main advantages of the developed new method over existing Diagnosis techniques are the following: - Capability of monitoring the fault evolution continuously over time under any transient operating condition; - Speed/slip measurement or estimation is not required; - Higher accuracy in filtering frequency components around the fundamental in case of rotor faults; - Reduction in the likelihood of false indications by avoiding confusion with other fault harmonics (the contribution of the most relevant fault frequency components under speed-varying conditions are clamped in a single frequency band); - Low memory requirement due to low sampling frequency; - Reduction in the latency of time processing (no requirement of repeated sampling operation).
Resumo:
When estimating the effect of treatment on HIV using data from observational studies, standard methods may produce biased estimates due to the presence of time-dependent confounders. Such confounding can be present when a covariate, affected by past exposure, is both a predictor of the future exposure and the outcome. One example is the CD4 cell count, being a marker for disease progression for HIV patients, but also a marker for treatment initiation and influenced by treatment. Fitting a marginal structural model (MSM) using inverse probability weights is one way to give appropriate adjustment for this type of confounding. In this paper we study a simple and intuitive approach to estimate similar treatment effects, using observational data to mimic several randomized controlled trials. Each 'trial' is constructed based on individuals starting treatment in a certain time interval. An overall effect estimate for all such trials is found using composite likelihood inference. The method offers an alternative to the use of inverse probability of treatment weights, which is unstable in certain situations. The estimated parameter is not identical to the one of an MSM, it is conditioned on covariate values at the start of each mimicked trial. This allows the study of questions that are not that easily addressed fitting an MSM. The analysis can be performed as a stratified weighted Cox analysis on the joint data set of all the constructed trials, where each trial is one stratum. The model is applied to data from the Swiss HIV cohort study.
Resumo:
Background The estimation of demographic parameters from genetic data often requires the computation of likelihoods. However, the likelihood function is computationally intractable for many realistic evolutionary models, and the use of Bayesian inference has therefore been limited to very simple models. The situation changed recently with the advent of Approximate Bayesian Computation (ABC) algorithms allowing one to obtain parameter posterior distributions based on simulations not requiring likelihood computations. Results Here we present ABCtoolbox, a series of open source programs to perform Approximate Bayesian Computations (ABC). It implements various ABC algorithms including rejection sampling, MCMC without likelihood, a Particle-based sampler and ABC-GLM. ABCtoolbox is bundled with, but not limited to, a program that allows parameter inference in a population genetics context and the simultaneous use of different types of markers with different ploidy levels. In addition, ABCtoolbox can also interact with most simulation and summary statistics computation programs. The usability of the ABCtoolbox is demonstrated by inferring the evolutionary history of two evolutionary lineages of Microtus arvalis. Using nuclear microsatellites and mitochondrial sequence data in the same estimation procedure enabled us to infer sex-specific population sizes and migration rates and to find that males show smaller population sizes but much higher levels of migration than females. Conclusion ABCtoolbox allows a user to perform all the necessary steps of a full ABC analysis, from parameter sampling from prior distributions, data simulations, computation of summary statistics, estimation of posterior distributions, model choice, validation of the estimation procedure, and visualization of the results.
Resumo:
Accurate diagnosis of the causes of chest pain and dyspnea remain challenging. In this preliminary observational study with a 5-year follow-up, we attempted to find a simplified approach to selecting patients with chest pain needing immediate care based on the initial evaluation in ED. During a 24-month period were randomly selected 301 patients and a conditional inference tree (CIT) was used as the basis of the prognostic rule. Common diagnoses were musculoskeletal chest pain (27%), ACS (19%) and panic attack (12%). Using variables of ACS symptoms we estimated the likelihood of ACS based on a CIT to be high at 91% (32), low at 4% (198) and intermediate at 20.5-40% in (71) patients. Coronary catheterization was performed within 24 hours in 91% of the patients with ACS. A culprit lesion was found in 79%. Follow-up (median 4.2 years) information was available for 70% of the patients. Of the 164 patients without ACS who were followed up, 5 were treated with revascularization for stable angina pectoris, 2 were treated with revascularization for myocardial infarction, and 25 died. Although a simple triage decision tree could theoretically help to efficient select patients needing immediate care we need also to be vigilant for those presenting with atypical symptoms.
Resumo:
OBJECTIVES: In this population-based study, reference values were generated for renal length, and the heritability and factors associated with kidney length were assessed. METHODS: Anthropometric parameters and renal ultrasound measurements were assessed in randomly selected nuclear families of European ancestry (Switzerland). The adjusted narrow sense heritability of kidney size parameters was estimated by maximum likelihood assuming multivariate normality after power transformation. Gender-specific reference centiles were generated for renal length according to body height in the subset of non-diabetic non-obese participants with normal renal function. RESULTS: We included 374 men and 419 women (mean ± SD, age 47 ± 18 and 48 ± 17 years, BMI 26.2 ± 4 and 24.5 ± 5 kg/m(2), respectively) from 205 families. Renal length was 11.4 ± 0.8 cm in men and 10.7 ± 0.8 cm in women; there was no difference between right and left renal length. Body height, weight and estimated glomerular filtration rate (eGFR) were positively associated with renal length, kidney function negatively, age quadratically, whereas gender and hypertension were not. The adjusted heritability estimates of renal length and volume were 47.3 ± 8.5 % and 45.5 ± 8.8 %, respectively (P < 0.001). CONCLUSION: The significant heritability of renal length and volume highlights the familial aggregation of this trait, independently of age and body size. Population-based references for renal length provide a useful guide for clinicians. KEY POINTS: • Renal length and volume are heritable traits, independent of age and size. • Based on a European population, gender-specific reference values/percentiles are provided for renal length. • Renal length correlates positively with body length and weight. • There was no difference between right and left renal lengths in this study. • This negates general teaching that the left kidney is larger and longer.
Resumo:
Major histocompatibility complex (MHC) antigen-presenting genes are the most variable loci in vertebrate genomes. Host-parasite co-evolution is assumed to maintain the excessive polymorphism in the MHC loci. However, the molecular mechanisms underlying the striking diversity in the MHC remain contentious. The extent to which recombination contributes to the diversity at MHC loci in natural populations is still controversial, and there have been only few comparative studies that make quantitative estimates of recombination rates. In this study, we performed a comparative analysis for 15 different ungulates species to estimate the population recombination rate, and to quantify levels of selection. As expected for all species, we observed signatures of strong positive selection, and identified individual residues experiencing selection that were congruent with those constituting the peptide-binding region of the human DRB gene. However, in addition for each species, we also observed recombination rates that were significantly different from zero on the basis of likelihood-permutation tests, and in other non-quantitative analyses. Patterns of synonymous and non-synonymous sequence diversity were consistent with differing demographic histories between species, but recent simulation studies by other authors suggest inference of selection and recombination is likely to be robust to such deviations from standard models. If high rates of recombination are common in MHC genes of other taxa, re-evaluation of many inference-based phylogenetic analyses of MHC loci, such as estimates of the divergence time of alleles and trans-specific polymorphism, may be required.
Resumo:
Generalized linear mixed models (GLMM) are generalized linear models with normally distributed random effects in the linear predictor. Penalized quasi-likelihood (PQL), an approximate method of inference in GLMMs, involves repeated fitting of linear mixed models with “working” dependent variables and iterative weights that depend on parameter estimates from the previous cycle of iteration. The generality of PQL, and its implementation in commercially available software, has encouraged the application of GLMMs in many scientific fields. Caution is needed, however, since PQL may sometimes yield badly biased estimates of variance components, especially with binary outcomes. Recent developments in numerical integration, including adaptive Gaussian quadrature, higher order Laplace expansions, stochastic integration and Markov chain Monte Carlo (MCMC) algorithms, provide attractive alternatives to PQL for approximate likelihood inference in GLMMs. Analyses of some well known datasets, and simulations based on these analyses, suggest that PQL still performs remarkably well in comparison with more elaborate procedures in many practical situations. Adaptive Gaussian quadrature is a viable alternative for nested designs where the numerical integration is limited to a small number of dimensions. Higher order Laplace approximations hold the promise of accurate inference more generally. MCMC is likely the method of choice for the most complex problems that involve high dimensional integrals.
Resumo:
Outcome-dependent, two-phase sampling designs can dramatically reduce the costs of observational studies by judicious selection of the most informative subjects for purposes of detailed covariate measurement. Here we derive asymptotic information bounds and the form of the efficient score and influence functions for the semiparametric regression models studied by Lawless, Kalbfleisch, and Wild (1999) under two-phase sampling designs. We show that the maximum likelihood estimators for both the parametric and nonparametric parts of the model are asymptotically normal and efficient. The efficient influence function for the parametric part aggress with the more general information bound calculations of Robins, Hsieh, and Newey (1995). By verifying the conditions of Murphy and Van der Vaart (2000) for a least favorable parametric submodel, we provide asymptotic justification for statistical inference based on profile likelihood.
Resumo:
In this paper, we focus on the model for two types of tumors. Tumor development can be described by four types of death rates and four tumor transition rates. We present a general semi-parametric model to estimate the tumor transition rates based on data from survival/sacrifice experiments. In the model, we make a proportional assumption of tumor transition rates on a common parametric function but no assumption of the death rates from any states. We derived the likelihood function of the data observed in such an experiment, and an EM algorithm that simplified estimating procedures. This article extends work on semi-parametric models for one type of tumor (see Portier and Dinse and Dinse) to two types of tumors.
Resumo:
We propose robust and e±cient tests and estimators for gene-environment/gene-drug interactions in family-based association studies. The methodology is designed for studies in which haplotypes, quantitative pheno- types and complex exposure/treatment variables are analyzed. Using causal inference methodology, we derive family-based association tests and estimators for the genetic main effects and the interactions. The tests and estimators are robust against population admixture and strati¯cation without requiring adjustment for confounding variables. We illustrate the practical relevance of our approach by an application to a COPD study. The data analysis suggests a gene-environment interaction between a SNP in the Serpine gene and smok- ing status/pack years of smoking that reduces the FEV1 volume by about 0.02 liter per pack year of smoking. Simulation studies show that the pro- posed methodology is su±ciently powered for realistic sample sizes and that it provides valid tests and effect size estimators in the presence of admixture and stratification.