186 resultados para EFFICIENT ESTIMATION
em Université de Lausanne, Switzerland
Resumo:
MOTIVATION: The detection of positive selection is widely used to study gene and genome evolution, but its application remains limited by the high computational cost of existing implementations. We present a series of computational optimizations for more efficient estimation of the likelihood function on large-scale phylogenetic problems. We illustrate our approach using the branch-site model of codon evolution. RESULTS: We introduce novel optimization techniques that substantially outperform both CodeML from the PAML package and our previously optimized sequential version SlimCodeML. These techniques can also be applied to other likelihood-based phylogeny software. Our implementation scales well for large numbers of codons and/or species. It can therefore analyse substantially larger datasets than CodeML. We evaluated FastCodeML on different platforms and measured average sequential speedups of FastCodeML (single-threaded) versus CodeML of up to 5.8, average speedups of FastCodeML (multi-threaded) versus CodeML on a single node (shared memory) of up to 36.9 for 12 CPU cores, and average speedups of the distributed FastCodeML versus CodeML of up to 170.9 on eight nodes (96 CPU cores in total).Availability and implementation: ftp://ftp.vital-it.ch/tools/FastCodeML/. CONTACT: selectome@unil.ch or nicolas.salamin@unil.ch.
Resumo:
SummaryDiscrete data arise in various research fields, typically when the observations are count data.I propose a robust and efficient parametric procedure for estimation of discrete distributions. The estimation is done in two phases. First, a very robust, but possibly inefficient, estimate of the model parameters is computed and used to indentify outliers. Then the outliers are either removed from the sample or given low weights, and a weighted maximum likelihood estimate (WML) is computed.The weights are determined via an adaptive process such that if the data follow the model, then asymptotically no observation is downweighted.I prove that the final estimator inherits the breakdown point of the initial one, and that its influence function at the model is the same as the influence function of the maximum likelihood estimator, which strongly suggests that it is asymptotically fully efficient.The initial estimator is a minimum disparity estimator (MDE). MDEs can be shown to have full asymptotic efficiency, and some MDEs have very high breakdown points and very low bias under contamination. Several initial estimators are considered, and the performances of the WMLs based on each of them are studied.It results that in a great variety of situations the WML substantially improves the initial estimator, both in terms of finite sample mean square error and in terms of bias under contamination. Besides, the performances of the WML are rather stable under a change of the MDE even if the MDEs have very different behaviors.Two examples of application of the WML to real data are considered. In both of them, the necessity for a robust estimator is clear: the maximum likelihood estimator is badly corrupted by the presence of a few outliers.This procedure is particularly natural in the discrete distribution setting, but could be extended to the continuous case, for which a possible procedure is sketched.RésuméLes données discrètes sont présentes dans différents domaines de recherche, en particulier lorsque les observations sont des comptages.Je propose une méthode paramétrique robuste et efficace pour l'estimation de distributions discrètes. L'estimation est faite en deux phases. Tout d'abord, un estimateur très robuste des paramètres du modèle est calculé, et utilisé pour la détection des données aberrantes (outliers). Cet estimateur n'est pas nécessairement efficace. Ensuite, soit les outliers sont retirés de l'échantillon, soit des faibles poids leur sont attribués, et un estimateur du maximum de vraisemblance pondéré (WML) est calculé.Les poids sont déterminés via un processus adaptif, tel qu'asymptotiquement, si les données suivent le modèle, aucune observation n'est dépondérée.Je prouve que le point de rupture de l'estimateur final est au moins aussi élevé que celui de l'estimateur initial, et que sa fonction d'influence au modèle est la même que celle du maximum de vraisemblance, ce qui suggère que cet estimateur est pleinement efficace asymptotiquement.L'estimateur initial est un estimateur de disparité minimale (MDE). Les MDE sont asymptotiquement pleinement efficaces, et certains d'entre eux ont un point de rupture très élevé et un très faible biais sous contamination. J'étudie les performances du WML basé sur différents MDEs.Le résultat est que dans une grande variété de situations le WML améliore largement les performances de l'estimateur initial, autant en terme du carré moyen de l'erreur que du biais sous contamination. De plus, les performances du WML restent assez stables lorsqu'on change l'estimateur initial, même si les différents MDEs ont des comportements très différents.Je considère deux exemples d'application du WML à des données réelles, où la nécessité d'un estimateur robuste est manifeste : l'estimateur du maximum de vraisemblance est fortement corrompu par la présence de quelques outliers.La méthode proposée est particulièrement naturelle dans le cadre des distributions discrètes, mais pourrait être étendue au cas continu.
Resumo:
Preface The starting point for this work and eventually the subject of the whole thesis was the question: how to estimate parameters of the affine stochastic volatility jump-diffusion models. These models are very important for contingent claim pricing. Their major advantage, availability T of analytical solutions for characteristic functions, made them the models of choice for many theoretical constructions and practical applications. At the same time, estimation of parameters of stochastic volatility jump-diffusion models is not a straightforward task. The problem is coming from the variance process, which is non-observable. There are several estimation methodologies that deal with estimation problems of latent variables. One appeared to be particularly interesting. It proposes the estimator that in contrast to the other methods requires neither discretization nor simulation of the process: the Continuous Empirical Characteristic function estimator (EGF) based on the unconditional characteristic function. However, the procedure was derived only for the stochastic volatility models without jumps. Thus, it has become the subject of my research. This thesis consists of three parts. Each one is written as independent and self contained article. At the same time, questions that are answered by the second and third parts of this Work arise naturally from the issues investigated and results obtained in the first one. The first chapter is the theoretical foundation of the thesis. It proposes an estimation procedure for the stochastic volatility models with jumps both in the asset price and variance processes. The estimation procedure is based on the joint unconditional characteristic function for the stochastic process. The major analytical result of this part as well as of the whole thesis is the closed form expression for the joint unconditional characteristic function for the stochastic volatility jump-diffusion models. The empirical part of the chapter suggests that besides a stochastic volatility, jumps both in the mean and the volatility equation are relevant for modelling returns of the S&P500 index, which has been chosen as a general representative of the stock asset class. Hence, the next question is: what jump process to use to model returns of the S&P500. The decision about the jump process in the framework of the affine jump- diffusion models boils down to defining the intensity of the compound Poisson process, a constant or some function of state variables, and to choosing the distribution of the jump size. While the jump in the variance process is usually assumed to be exponential, there are at least three distributions of the jump size which are currently used for the asset log-prices: normal, exponential and double exponential. The second part of this thesis shows that normal jumps in the asset log-returns should be used if we are to model S&P500 index by a stochastic volatility jump-diffusion model. This is a surprising result. Exponential distribution has fatter tails and for this reason either exponential or double exponential jump size was expected to provide the best it of the stochastic volatility jump-diffusion models to the data. The idea of testing the efficiency of the Continuous ECF estimator on the simulated data has already appeared when the first estimation results of the first chapter were obtained. In the absence of a benchmark or any ground for comparison it is unreasonable to be sure that our parameter estimates and the true parameters of the models coincide. The conclusion of the second chapter provides one more reason to do that kind of test. Thus, the third part of this thesis concentrates on the estimation of parameters of stochastic volatility jump- diffusion models on the basis of the asset price time-series simulated from various "true" parameter sets. The goal is to show that the Continuous ECF estimator based on the joint unconditional characteristic function is capable of finding the true parameters. And, the third chapter proves that our estimator indeed has the ability to do so. Once it is clear that the Continuous ECF estimator based on the unconditional characteristic function is working, the next question does not wait to appear. The question is whether the computation effort can be reduced without affecting the efficiency of the estimator, or whether the efficiency of the estimator can be improved without dramatically increasing the computational burden. The efficiency of the Continuous ECF estimator depends on the number of dimensions of the joint unconditional characteristic function which is used for its construction. Theoretically, the more dimensions there are, the more efficient is the estimation procedure. In practice, however, this relationship is not so straightforward due to the increasing computational difficulties. The second chapter, for example, in addition to the choice of the jump process, discusses the possibility of using the marginal, i.e. one-dimensional, unconditional characteristic function in the estimation instead of the joint, bi-dimensional, unconditional characteristic function. As result, the preference for one or the other depends on the model to be estimated. Thus, the computational effort can be reduced in some cases without affecting the efficiency of the estimator. The improvement of the estimator s efficiency by increasing its dimensionality faces more difficulties. The third chapter of this thesis, in addition to what was discussed above, compares the performance of the estimators with bi- and three-dimensional unconditional characteristic functions on the simulated data. It shows that the theoretical efficiency of the Continuous ECF estimator based on the three-dimensional unconditional characteristic function is not attainable in practice, at least for the moment, due to the limitations on the computer power and optimization toolboxes available to the general public. Thus, the Continuous ECF estimator based on the joint, bi-dimensional, unconditional characteristic function has all the reasons to exist and to be used for the estimation of parameters of the stochastic volatility jump-diffusion models.
Resumo:
Restriction site-associated DNA sequencing (RADseq) provides researchers with the ability to record genetic polymorphism across thousands of loci for nonmodel organisms, potentially revolutionizing the field of molecular ecology. However, as with other genotyping methods, RADseq is prone to a number of sources of error that may have consequential effects for population genetic inferences, and these have received only limited attention in terms of the estimation and reporting of genotyping error rates. Here we use individual sample replicates, under the expectation of identical genotypes, to quantify genotyping error in the absence of a reference genome. We then use sample replicates to (i) optimize de novo assembly parameters within the program Stacks, by minimizing error and maximizing the retrieval of informative loci; and (ii) quantify error rates for loci, alleles and single-nucleotide polymorphisms. As an empirical example, we use a double-digest RAD data set of a nonmodel plant species, Berberis alpina, collected from high-altitude mountains in Mexico.
Resumo:
According to the hypothesis of Traub, also known as the 'formula of Traub', postmortem values of glucose and lactate found in the cerebrospinal fluid or vitreous humor are considered indicators of antemortem blood glucose levels. However, because the lactate concentration increases in the vitreous and cerebrospinal fluid after death, some authors postulated that using the sum value to estimate antemortem blood glucose levels could lead to an overestimation of the cases of glucose metabolic disorders with fatal outcomes, such as diabetic ketoacidosis. The aim of our study, performed on 470 consecutive forensic cases, was to ascertain the advantages of the sum value to estimate antemortem blood glucose concentrations and, consequently, to rule out fatal diabetic ketoacidosis as the cause of death. Other biochemical parameters, such as blood 3-beta-hydroxybutyrate, acetoacetate, acetone, glycated haemoglobin and urine glucose levels, were also determined. In addition, postmortem native CT scan, autopsy, histology, neuropathology and toxicology were performed to confirm diabetic ketoacidosis as the cause of death. According to our results, the sum value does not add any further information for the estimation of antemortem blood glucose concentration. The vitreous glucose concentration appears to be the most reliable marker to estimate antemortem hyperglycaemia and, along with the determination of other biochemical markers (such as blood acetone and 3-beta-hydroxybutyrate, urine glucose and glycated haemoglobin), to confirm diabetic ketoacidosis as the cause of death.
Resumo:
The dispersal process, by which individuals or other dispersing agents such as gametes or seeds move from birthplace to a new settlement locality, has important consequences for the dynamics of genes, individuals, and species. Many of the questions addressed by ecology and evolutionary biology require a good understanding of species' dispersal patterns. Much effort has thus been devoted to overcoming the difficulties associated with dispersal measurement. In this context, genetic tools have long been the focus of intensive research, providing a great variety of potential solutions to measuring dispersal. This methodological diversity is reviewed here to help (molecular) ecologists find their way toward dispersal inference and interpretation and to stimulate further developments.
Resumo:
In 1903, the eastern slope of Turtle Mountain (Alberta) was affected by a 30 M m3-rockslide named Frank Slide that resulted in more than 70 casualties. Assuming that the main discontinuity sets, including bedding, control part of the slope morphology, the structural features of Turtle Mountain were investigated using a digital elevation model (DEM). Using new landscape analysis techniques, we have identified three main joint and fault sets. These results are in agreement with those sets identified through field observations. Landscape analysis techniques, using a DEM, confirm and refine the most recent geology model of the Frank Slide. The rockslide was initiated along bedding and a fault at the base of the slope and propagated up slope by a regressive process following a surface composed of pre-existing discontinuities. The DEM analysis also permits the identification of important geological structures along the 1903 slide scar. Based on the so called Sloping Local Base Level (SLBL) an estimation was made of the present unstable volumes in the main scar delimited by the cracks, and around the south area of the scar (South Peak). The SLBL is a method permitting a geometric interpretation of the failure surface based on a DEM. Finally we propose a failure mechanism permitting the progressive failure of the rock mass that considers gentle dipping wedges (30°). The prisms or wedges defined by two discontinuity sets permit the creation of a failure surface by progressive failure. Such structures are more commonly observed in recent rockslides. This method is efficient and is recommended as a preliminary analysis prior to field investigation.
Resumo:
BACKGROUND: Recommendations for statin use for primary prevention of coronary heart disease (CHD) are based on estimation of the 10- year CHD risk. We compared the 10-year CHD risk assessments and eligibility percentages for statin therapy using three scoring algorithms currently used in Europe. METHODS: We studied 5683 women and men, aged 35-75, without overt cardiovascular disease (CVD), in a population-based study in Switzerland. We compared the 10-year CHD risk using three scoring schemes, i.e., the Framingham risk score (FRS) from the U.S. National Cholesterol Education Program's Adult Treatment Panel III (ATP III), the PROCAM scoring scheme from the International Atherosclerosis Society (IAS), and the European risk SCORE for low-risk countries, without and with extrapolation to 60 years as recommended by the European Society of Cardiology guidelines (ESC). With FRS and PROCAM, high-risk was defined as a 10- year risk of fatal or non-fatal CHD>20% and a 10-year risk of fatal CVD≥5% with SCORE. We compared the proportions of high-risk participants and eligibility for statin use according to these three schemes. For each guideline, we estimated the impact of increased statin use from current partial compliance to full compliance on potential CHD deaths averted over 10 years, using a success proportion of 27% for statins. RESULTS: Participants classified at high-risk (both genders) were 5.8% according to FRS and 3.0% to the PROCAM, whereas the European risk SCORE classified 12.5% at high-risk (15.4% with extrapolation to 60 years). For the primary prevention of CHD, 18.5% of participants were eligible for statin therapy using ATP III, 16.6% using IAS, and 10.3% using ESC (13.0% with extrapolation) because ESC guidelines recommend statin therapy only in high-risk subjects. In comparison with IAS, agreement to identify eligible adults for statins was good with ATP III, but moderate with ESC. Using a population perspective, a full compliance with ATP III guidelines would reduce up to 17.9% of the 24′ 310 CHD deaths expected over 10 years in Switzerland, 17.3% with IAS and 10.8% with ESC (11.5% with extrapolation). CONCLUSIONS: Full compliance with guidelines for statin therapy would result in substantial health benefits, but proportions of high-risk adults and eligible adults for statin use varied substantially depending on the scoring systems and corresponding guidelines used for estimating CHD risk in Europe.
Resumo:
Innate immune responses play a central role in neuroprotection and neurotoxicity during inflammatory processes that are triggered by pathogen-associated molecular pattern-exhibiting agents such as bacterial lipopolysaccharide (LPS) and that are modulated by inflammatory cytokines such as interferon γ (IFNγ). Recent findings describing the unexpected complexity of mammalian genomes and transcriptomes have stimulated further identification of novel transcripts involved in specific physiological and pathological processes, such as the neural innate immune response that alters the expression of many genes. We developed a system for efficient subtractive cloning that employs both sense and antisense cRNA drivers, and coupled it with in-house cDNA microarray analysis. This system enabled effective direct cloning of differentially expressed transcripts, from a small amount (0.5 µg) of total RNA. We applied this system to isolation of genes activated by LPS and IFNγ in primary-cultured cortical cells that were derived from newborn mice, to investigate the mechanisms involved in neuroprotection and neurotoxicity in maternal/perinatal infections that cause various brain injuries including periventricular leukomalacia. A number of genes involved in the immune and inflammatory response were identified, showing that neonatal neuronal/glial cells are highly responsive to LPS and IFNγ. Subsequent RNA blot analysis revealed that the identified genes were activated by LPS and IFNγ in a cooperative or distinctive manner, thereby supporting the notion that these bacterial and cellular inflammatory mediators can affect the brain through direct but complicated pathways. We also identified several novel clones of apparently non-coding RNAs that potentially harbor various regulatory functions. Characterization of the presently identified genes will give insights into mechanisms and interventions not only for perinatal infection-induced brain damage, but also for many other innate immunity-related brain disorders.
Resumo:
An online algorithm for determining respiratory mechanics in patients using non-invasive ventilation (NIV) in pressure support mode was developed and embedded in a ventilator system. Based on multiple linear regression (MLR) of respiratory data, the algorithm was tested on a patient bench model under conditions with and without leak and simulating a variety of mechanics. Bland-Altman analysis indicates reliable measures of compliance across the clinical range of interest (± 11-18% limits of agreement). Resistance measures showed large quantitative errors (30-50%), however, it was still possible to qualitatively distinguish between normal and obstructive resistances. This outcome provides clinically significant information for ventilator titration and patient management.
Resumo:
Astrocytes are now considered as key players in brain information processing because of their newly discovered roles in synapse formation and plasticity, energy metabolism and blood flow regulation. However, our understanding of astrocyte function is still fragmented compared to other brain cell types. A better appreciation of the biology of astrocytes requires the development of tools to generate animal models in which astrocyte-specific proteins and pathways can be manipulated. In addition, it is becoming increasingly evident that astrocytes are also important players in many neurological disorders. Targeted modulation of protein expression in astrocytes would be critical for the development of new therapeutic strategies. Gene transfer is valuable to target a subpopulation of cells and explore their function in experimental models. In particular, viral-mediated gene transfer provides a rapid, highly flexible and cost-effective, in vivo paradigm to study the impact of genes of interest during central nervous system development or in adult animals. We will review the different strategies that led to the recent development of efficient viral vectors that can be successfully used to selectively transduce astrocytes in the mammalian brain.