154 results for Bayesian approaches
Abstract:
In the forensic examination of DNA mixtures, the question of how to set the total number of contributors (N) is a topic of ongoing interest. Part of the discussion gravitates around issues of bias, in particular when assessments of the number of contributors are not made prior to considering the genotypic configuration of potential donors. A further complication may stem from the observation that, in some cases, there may be numbers of contributors that are incompatible with the set of alleles seen in the profile of a mixed crime stain, given the genotype of a potential contributor. In such situations, procedures that take a single, fixed number of contributors as their output can lead to inferential impasses. Assessing the number of contributors within a probabilistic framework can help avoid such complications. Using elements of decision theory, this paper analyses two strategies for inference on the number of contributors. One procedure is deterministic and focuses on the minimum number of contributors required to 'explain' an observed set of alleles. The other is probabilistic: using Bayes' theorem, it provides a probability distribution over a set of numbers of contributors, based on the set of observed alleles as well as their respective rates of occurrence. The discussion concentrates on mixed stains of varying quality (i.e., different numbers of loci for which genotyping information is available). A so-called qualitative interpretation is pursued, since quantitative information such as peak area and height data is not taken into account. The competing procedures are compared using a standard scoring rule that penalizes the degree of divergence between a given agreed value for N, that is the number of contributors, and the actual value taken by N.
Using only modest assumptions and a discussion with reference to a casework example, this paper reports on analyses using simulation techniques and graphical models (i.e., Bayesian networks) to point out that setting the number of contributors to a mixed crime stain in probabilistic terms is, for the conditions assumed in this study, preferable to a decision policy that relies on categorical assumptions about N.
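The probabilistic strategy and the scoring rule described above can be illustrated with a minimal sketch. The paper's model uses Bayesian networks over multiple loci; this simplified single-locus version (allele labels, frequencies, and the uniform prior are all invented for illustration) computes, by inclusion-exclusion, the probability that 2N independently drawn alleles form exactly the observed set, then applies Bayes' theorem over candidate values of N:

```python
from itertools import combinations

def prob_exact_allele_set(freqs, observed, n_contributors):
    """P(2N alleles drawn i.i.d. from freqs form exactly the observed set),
    computed by inclusion-exclusion over subsets of the observed alleles."""
    draws = 2 * n_contributors
    k = len(observed)
    total = 0.0
    for r in range(k + 1):
        for subset in combinations(sorted(observed), r):
            p = sum(freqs[a] for a in subset)
            total += (-1) ** (k - r) * p ** draws
    return total

def posterior_over_n(freqs, observed, n_values):
    """Posterior distribution for the number of contributors N, uniform prior."""
    joint = {n: prob_exact_allele_set(freqs, observed, n) for n in n_values}
    z = sum(joint.values())
    return {n: v / z for n, v in joint.items()}

def quadratic_score(decided_n, true_n):
    """One possible scoring rule penalising divergence between decided and true N."""
    return (decided_n - true_n) ** 2

# Hypothetical single-locus example: three distinct alleles seen in the mixture.
freqs = {"A": 0.4, "B": 0.3, "C": 0.2, "D": 0.1}
observed = {"A", "B", "C"}
post = posterior_over_n(freqs, observed, n_values=[2, 3, 4])
```

Note how the model automatically assigns probability zero to N = 1 here (two draws cannot produce three distinct alleles), which is exactly the kind of incompatibility that trips up a fixed-N policy.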
Abstract:
Ubiquitous assessment of swimming velocity (the main performance metric) is essential for the coach to provide tailored feedback to the trainee. We present a probabilistic framework for the data-driven estimation of swimming velocity at every cycle using a low-cost wearable inertial measurement unit (IMU). Statistical validation of the method on 15 swimmers shows that an average relative error of 0.1 ± 9.6% and a high correlation with the tethered reference system (r = 0.91) are achievable. In addition, a simple tool to analyze the influence of sacrum kinematics on performance is provided.
Abstract:
The Online Mendelian Inheritance in Man database (OMIM) reports about 3000 Mendelian diseases with a known causal gene and about 2000 that remain to be mapped. These cases are often difficult to solve because of the rarity of the disease, the structure of the family (too big or too small) or the heterogeneity of the phenotype. The goal of this thesis is to explore the genetic tools available before the advent of ultra-high-throughput sequencing, and to integrate them in the attempt to map the genes behind the four studied cases. In this framework we have studied a small family with a recessive disease, a modifier gene for the penetrance of a dominant mutation, and a large extended family with a cardiac phenotype and clinical and/or allelic heterogeneity, and we have molecularly analyzed a balanced chromosomal translocation.
Abstract:
The genetic characterization of unbalanced mixed stains remains an important area where improvement is imperative. In fact, with current methods for DNA analysis (Polymerase Chain Reaction with the SGM Plus™ multiplex kit), it is generally not possible to obtain a conventional autosomal DNA profile of the minor contributor if the ratio between the two contributors in a mixture is smaller than 1:10. This is a consequence of the fact that the major contributor's profile 'masks' that of the minor contributor. Besides known remedies to this problem, such as Y-STR analysis, a new compound genetic marker that consists of a Deletion/Insertion Polymorphism (DIP), linked to a Short Tandem Repeat (STR) polymorphism, has recently been developed and proposed in the literature [1]. The present paper reports on the derivation of an approach for the probabilistic evaluation of DIP-STR profiling results obtained from unbalanced DNA mixtures. The procedure is based on object-oriented Bayesian networks (OOBNs) and uses the likelihood ratio as an expression of the probative value. OOBNs are retained in this paper because they allow one to provide a clear description of the genotypic configuration observed for the mixed stain as well as for the various potential contributors (e.g., victim and suspect). These models also allow one to depict the assumed relevance relationships and perform the necessary probabilistic computations.
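The likelihood-ratio computation that an OOBN of this kind encodes can be sketched, for the simplest qualitative two-person-mixture case, without the network machinery. The allele labels and frequencies below are invented, and the genotype enumeration assumes Hardy-Weinberg proportions; this is a toy stand-in for the paper's DIP-STR evaluation, not its actual model:

```python
from itertools import combinations_with_replacement

def lr_two_person_mixture(mix, victim, suspect, freq):
    """Qualitative likelihood ratio for a two-person mixture.
    Hp: victim + suspect contributed.  Hd: victim + an unknown contributed."""
    mix, v, s = set(mix), set(victim), set(suspect)
    # Numerator: 1 if victim and suspect together account for the mixture exactly.
    num = 1.0 if v | s == mix else 0.0
    # Denominator: total genotype probability of unknowns that, with the victim,
    # reproduce exactly the observed allele set (Hardy-Weinberg assumed).
    den = 0.0
    for a1, a2 in combinations_with_replacement(sorted(freq), 2):
        if v | {a1, a2} == mix:
            p = freq[a1] * freq[a2]
            den += p if a1 == a2 else 2.0 * p
    return num / den

# Hypothetical marker: mixture shows alleles {a, b}; victim is aa, suspect is bb.
freq = {"a": 0.1, "b": 0.2, "c": 0.3, "d": 0.4}
lr = lr_two_person_mixture({"a", "b"}, victim=("a", "a"), suspect=("b", "b"), freq=freq)
```

Here the denominator sums the genotype probabilities of the unknowns able to explain the mixture (heterozygote ab and homozygote bb), so LR = 1 / (2·p_a·p_b + p_b²).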
Abstract:
This paper presents and discusses the use of Bayesian procedures - introduced through the use of Bayesian networks in Part I of this series of papers - for 'learning' probabilities from data. The discussion will relate to a set of real data on characteristics of black toners commonly used in printing and copying devices. Particular attention is drawn to the incorporation of the proposed procedures as an integral part in probabilistic inference schemes (notably in the form of Bayesian networks) that are intended to address uncertainties related to particular propositions of interest (e.g., whether or not a sample originates from a particular source). The conceptual tenets of the proposed methodologies are presented along with aspects of their practical implementation using currently available Bayesian network software.
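The simplest instance of 'learning' a probability from data in the Bayesian sense described above is the conjugate Beta-Binomial update, where prior knowledge and observed counts combine in closed form. The toner counts below are invented for illustration; the paper's own data and network structure are not reproduced here:

```python
def beta_posterior(alpha, beta, successes, failures):
    """Conjugate update: Beta(alpha, beta) prior + binomial counts -> Beta posterior."""
    return alpha + successes, beta + failures

def beta_mean(alpha, beta):
    """Posterior mean of the unknown probability."""
    return alpha / (alpha + beta)

# Hypothetical data: 7 of 50 black-toner samples show a given characteristic,
# combined with a uniform Beta(1, 1) prior.
a, b = beta_posterior(1.0, 1.0, successes=7, failures=43)
post_mean = beta_mean(a, b)   # 8/52: the raw 7/50 shrunk slightly toward the prior
```

The posterior Beta(8, 44) can then be plugged into a Bayesian network node as its probability table entry, which is the kind of integration the paper advocates.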
Abstract:
High-resolution tomographic imaging of the shallow subsurface is becoming increasingly important for a wide range of environmental, hydrological and engineering applications. Because of their superior resolution power, their sensitivity to pertinent petrophysical parameters, and their far-reaching complementarities, both seismic and georadar crosshole imaging are of particular importance. To date, corresponding approaches have largely relied on asymptotic, ray-based approaches, which only account for a very small part of the observed wavefields, inherently suffer from limited resolution, and in complex environments may prove to be inadequate. These problems can potentially be alleviated through waveform inversion. We have developed an acoustic waveform inversion approach for crosshole seismic data whose kernel is based on a finite-difference time-domain (FDTD) solution of the 2-D acoustic wave equations. This algorithm is tested on and applied to synthetic data from seismic velocity models of increasing complexity and realism, and the results are compared to those obtained using state-of-the-art ray-based traveltime tomography. Regardless of the heterogeneity of the underlying models, the waveform inversion approach has the potential to reliably resolve both the geometry and the acoustic properties of features smaller than half a dominant wavelength. Our results do, however, also indicate that, within their inherent resolution limits, ray-based approaches provide an effective and efficient means to obtain satisfactory tomographic reconstructions of the seismic velocity structure in the presence of mild to moderate heterogeneity and in the absence of strong scattering. Conversely, the excess effort of waveform inversion provides the greatest benefits for the most heterogeneous, and arguably most realistic, environments, where multiple scattering effects tend to be prevalent and ray-based methods lose most of their effectiveness.
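The FDTD forward-modelling kernel at the heart of such an inversion can be sketched in one dimension. This is a hedged illustration of the standard second-order scheme, not the paper's 2-D solver; the grid size, velocity, time step, and Ricker-wavelet source parameters are all invented (the time step is chosen to satisfy the CFL stability condition c·dt/dx ≤ 1):

```python
import math

def ricker(t, f0, t0):
    """Ricker wavelet source with peak frequency f0 and delay t0."""
    a = (math.pi * f0 * (t - t0)) ** 2
    return (1.0 - 2.0 * a) * math.exp(-a)

def fdtd_1d(velocity, nt, dt, dx, src_pos, src):
    """Second-order FDTD for the 1-D acoustic wave equation p_tt = c^2 p_xx,
    with fixed (p = 0) boundaries."""
    nx = len(velocity)
    p_prev, p = [0.0] * nx, [0.0] * nx
    for it in range(nt):
        p_next = [0.0] * nx
        for i in range(1, nx - 1):
            lap = (p[i + 1] - 2.0 * p[i] + p[i - 1]) / dx ** 2
            p_next[i] = 2.0 * p[i] - p_prev[i] + (velocity[i] * dt) ** 2 * lap
        p_next[src_pos] += src(it * dt) * dt ** 2   # inject the source term
        p_prev, p = p, p_next
    return p

# Homogeneous toy model: 301 cells, c = 1500 m/s, dx = 1 m, CFL = 0.75.
c, nx, dx, dt = 1500.0, 301, 1.0, 5e-4
wavefield = fdtd_1d([c] * nx, nt=160, dt=dt, dx=dx, src_pos=nx // 2,
                    src=lambda t: ricker(t, f0=50.0, t0=0.02))
```

A waveform inversion then wraps such a forward solver in an optimisation loop that updates the velocity model to reduce the misfit between simulated and observed waveforms.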
Abstract:
Ground-penetrating radar (GPR) has the potential to provide valuable information on hydrological properties of the vadose zone because of its strong sensitivity to soil water content. In particular, recent evidence has suggested that the stochastic inversion of crosshole GPR data within a coupled geophysical-hydrological framework may allow for effective estimation of subsurface van-Genuchten-Mualem (VGM) parameters and their corresponding uncertainties. An important and still unresolved issue, however, is how to best integrate GPR data into a stochastic inversion in order to estimate the VGM parameters and their uncertainties, thus improving hydrological predictions. Recognizing the importance of this issue, the aim of the research presented in this thesis was first to introduce a fully Bayesian Markov chain Monte Carlo (MCMC) strategy to perform the stochastic inversion of steady-state GPR data to estimate the VGM parameters and their uncertainties. Within this study, the choice of the prior parameter probability distributions from which potential model configurations are drawn and tested against observed data was also investigated. Analysis of both synthetic and field data collected at the Eggborough (UK) site indicates that the geophysical data alone contain valuable information regarding the VGM parameters. However, significantly better results are obtained when these data are combined with a realistic, informative prior. A subsequent study explored in detail the dynamic infiltration case, specifically the extent to which time-lapse ZOP GPR data, collected during a forced infiltration experiment at the Arrenaes field site (Denmark), can help to quantify VGM parameters and their uncertainties using the MCMC inversion strategy. The findings indicate that the stochastic inversion of time-lapse GPR data does indeed allow for a substantial refinement in the inferred posterior VGM parameter distributions.
In turn, this significantly improves knowledge of the hydraulic properties, which are required to predict hydraulic behaviour. Finally, another aspect that needed to be addressed involved the comparison of time-lapse GPR data collected under different infiltration conditions (i.e., natural loading and forced infiltration conditions) to estimate the VGM parameters using the MCMC inversion strategy. The results show that for the synthetic example, considering data collected during a forced infiltration test helps to better refine soil hydraulic properties compared to data collected under natural infiltration conditions. When investigating data collected at the Arrenaes field site, further complications arose due to model error, showing the importance of also including a rigorous analysis of the propagation of model error with time and depth when considering time-lapse data. Although the efforts in this thesis were focused on GPR data, the corresponding findings are likely to have general applicability to other types of geophysical data and field environments. Moreover, the obtained results give confidence for future developments in the integration of geophysical data with stochastic inversions to improve the characterization of the unsaturated zone, but they also reveal important issues linked with stochastic inversions, namely model error, that should be addressed in future research.
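The core of an MCMC stochastic inversion like the one described above is a Metropolis-type sampler that repeatedly proposes parameter values and accepts or rejects them according to the posterior ratio. The sketch below is a minimal random-walk Metropolis on a toy one-dimensional target (a standard normal standing in for a single hydraulic parameter); the target, starting point, step size, and burn-in length are all illustrative assumptions, not the thesis's actual setup:

```python
import math
import random

def metropolis(log_post, x0, n_samples, step, seed=0):
    """Random-walk Metropolis sampler over an unnormalised log-posterior."""
    rng = random.Random(seed)
    x, lp = x0, log_post(x0)
    samples = []
    for _ in range(n_samples):
        prop = x + rng.gauss(0.0, step)      # propose a nearby model configuration
        lp_prop = log_post(prop)
        # Accept with probability min(1, posterior ratio).
        if math.log(rng.random()) < lp_prop - lp:
            x, lp = prop, lp_prop
        samples.append(x)
    return samples

# Toy target: a standard-normal "posterior" for one VGM-like parameter.
samples = metropolis(lambda x: -0.5 * x * x, x0=3.0, n_samples=20000, step=1.0)
kept = samples[2000:]                        # discard burn-in
mean = sum(kept) / len(kept)
var = sum((s - mean) ** 2 for s in kept) / len(kept)
```

In a real geophysical application, `log_post` would call a forward model (e.g. GPR travel-time simulation under the VGM parameterisation) and compare its output with the observed data under a noise model.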
Abstract:
Recent advances in high-throughput sequencing and genotyping protocols allow rapid investigation of Mendelian and complex diseases on a scale not previously possible. In my thesis research I took advantage of these modern techniques to study retinitis pigmentosa (RP), a rare inherited disease characterized by progressive loss of photoreceptors and leading to blindness, and hypertension, a common condition affecting 30% of the adult population. First, I compared the performance of different next-generation sequencing (NGS) platforms in the sequencing of the RP-linked gene PRPF31. The gene contained a mutation in an intronic repetitive element, which presented difficulties for both classic sequencing methods and NGS. We showed that all NGS platforms are powerful tools to identify rare and common DNA variants, even in the case of more complex sequences. Moreover, we evaluated the features of different NGS platforms that are important in re-sequencing projects. The main focus of my thesis was then to investigate the involvement of pre-mRNA splicing factors in autosomal dominant RP (adRP). I screened 5 candidate genes in a large cohort of patients by using long-range PCR as an enrichment step, followed by NGS. We tested two different approaches: in one, all target PCRs from all patients were pooled and sequenced as a single DNA library; in the other, PCRs from each patient were separated within the pool by DNA barcodes. The first solution was more cost-effective, while the second yielded faster and more accurate results, but overall both proved to be effective strategies for gene screening in many samples. We could in fact identify novel missense mutations in the SNRNP200 gene, encoding an RNA helicase essential for splicing catalysis. Interestingly, one of these mutations showed incomplete penetrance in one family with adRP.
Thus, we started to study the possible molecular causes underlying phenotypic differences between asymptomatic and affected members of this family. For the study of hypertension, I joined a European consortium to perform genome-wide association studies (GWAS). Thanks to the use of very informative genotyping arrays and of phenotypically well-characterized cohorts, we could identify a novel susceptibility locus for hypertension in the promoter region of the endothelial nitric oxide synthase gene (NOS3). Moreover, we have proven the direct causality of the associated SNP using three different methods: 1) targeted resequencing, 2) luciferase assays, and 3) a population study.
Abstract:
Individuals sampled in hybrid zones are usually analysed according to their sampling locality, morphology, behaviour or karyotype. However, the increasing availability of genetic information increasingly favours its use for individual sorting purposes, and numerous assignment methods based on the genetic composition of individuals have been developed. The shrews of the Sorex araneus group offer a good opportunity to test genetic assignment on individuals identified by their karyotype. Here we explored the potential and efficiency of a Bayesian assignment method, with or without a reference dataset, to study admixture and individual assignment in the difficult context of two hybrid zones between karyotypic species of the Sorex araneus group. Overall, we assigned more than 80% of the individuals to their respective karyotypic categories (i.e. 'pure' species or hybrids). This assignment level is comparable to what was obtained for the same species away from hybrid zones. Additionally, we showed that the assignment result for several individuals was strongly affected by whether or not a reference dataset was included. This highlights the importance of such comparisons when analysing hybrid zones. Finally, differences between the admixture levels detected in the two hybrid zones support the hypothesis of an impact of chromosomal rearrangements on gene flow.
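The basic idea behind genotype-based Bayesian assignment can be sketched as follows. This is a simplified two-population version with invented allele frequencies and loci; it assumes Hardy-Weinberg proportions and independent loci within each population, and it omits the admixture modelling that the study's actual method performs:

```python
def genotype_likelihood(genotype, locus_freqs):
    """P(multilocus genotype | population), assuming Hardy-Weinberg proportions
    and independent loci. genotype is a list of (allele1, allele2) pairs."""
    like = 1.0
    for (a1, a2), f in zip(genotype, locus_freqs):
        like *= f[a1] * f[a2] * (1.0 if a1 == a2 else 2.0)
    return like

def assign(genotype, populations, prior=None):
    """Posterior probability of each candidate source population (Bayes' theorem)."""
    prior = prior or [1.0 / len(populations)] * len(populations)
    joint = [p * genotype_likelihood(genotype, f) for p, f in zip(prior, populations)]
    z = sum(joint)
    return [j / z for j in joint]

# Two hypothetical karyotypic taxa, two loci with divergent allele frequencies.
pop1 = [{"A": 0.9, "B": 0.1}, {"C": 0.8, "D": 0.2}]
pop2 = [{"A": 0.2, "B": 0.8}, {"C": 0.3, "D": 0.7}]
post = assign([("A", "A"), ("C", "C")], [pop1, pop2])
```

Supplying a reference dataset corresponds, in this sketch, to estimating `pop1` and `pop2` allele frequencies from pre-identified individuals rather than jointly from the mixed sample, which is why its inclusion can change individual assignments.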
Abstract:
Host genome studies are increasingly available for the study of infectious disease susceptibility. Current technologies include large-scale genotyping, genome-wide screens such as transcriptome and silencing (silencing RNA) studies, and, increasingly, the possibility of sequencing complete genomes. These approaches are of interest for the study of individuals who remain uninfected despite documented exposure to human immunodeficiency virus type 1. The main limitation remains the ascertainment of exposure and the establishment of large cohorts of informative individuals. The pattern of enrichment for CCR5 Δ32 homozygosity should serve as the standard for assessing the extent to which a given cohort (of white subjects) includes a large proportion of exposed uninfected individuals.
Abstract:
Aim We investigated the late Quaternary history of two closely related and partly sympatric species of Primula from the south-western European Alps, P. latifolia Lapeyr. and P. marginata Curtis, by combining phylogeographical and palaeodistribution modelling approaches. In particular, we were interested in whether the two approaches were congruent and identified the same glacial refugia. Location South-western European Alps. Methods For the phylogeographical analysis we included 353 individuals from 28 populations of P. marginata and 172 individuals from 15 populations of P. latifolia and used amplified fragment length polymorphisms (AFLPs). For palaeodistribution modelling, species distribution models (SDMs) were based on extant species occurrences and then projected to climate models (CCSM, MIROC) of the Last Glacial Maximum (LGM), approximately 21 ka. Results The locations of the modelled LGM refugia were confirmed by various indices of genetic variation. The refugia of the two species were largely geographically isolated, overlapping by only 6% to 11% of the species' total LGM distribution. This overlap decreased when the position of the glacial ice sheet and the differential elevational and edaphic distributions of the two species were considered. Main conclusions The combination of phylogeography and palaeodistribution modelling proved useful in locating putative glacial refugia of two alpine species of Primula. The phylogeographical data allowed us to identify those parts of the modelled LGM refugial area that were likely source areas for recolonization. The use of SDMs predicted LGM refugial areas substantially larger and geographically more divergent than could have been predicted from phylogeographical data alone.
Abstract:
Part I of this series of articles focused on the construction of graphical probabilistic inference procedures, at various levels of detail, for assessing the evidential value of gunshot residue (GSR) particle evidence. The proposed models - in the form of Bayesian networks - address the issues of background presence of GSR particles, analytical performance (i.e., the efficiency of evidence searching and analysis procedures) and contamination. The use and practical implementation of Bayesian networks for case pre-assessment is also discussed. This paper, Part II, concentrates on Bayesian parameter estimation. This topic complements Part I in that it offers means for producing estimates usable for the numerical specification of the proposed probabilistic graphical models. Bayesian estimation procedures are given primary attention because they allow the scientist to combine his or her prior knowledge about the problem of interest with newly acquired experimental data. The present paper also considers further topics such as the sensitivity of the likelihood ratio to uncertainty in parameters and the study of likelihood ratio values obtained for members of particular populations (e.g., individuals with or without exposure to GSR).
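One simple way to study the sensitivity of a likelihood ratio to parameter uncertainty, in the spirit described above, is to draw the uncertain parameter from its Bayesian posterior and look at the induced spread of LR values. The sketch below is entirely illustrative: the survey counts, the Beta(1, 1) prior, and the toy LR form (LR = 1/background rate, i.e. Hd explains the particles only as background presence) are all assumptions, not the paper's model:

```python
import random

def lr_given_rate(bg_rate):
    """Toy likelihood ratio for GSR particles found on a person of interest:
    Hp predicts their presence; Hd explains them only as background at bg_rate."""
    return 1.0 / bg_rate

def lr_sensitivity(alpha, beta, n_draws=10000, seed=1):
    """Propagate Beta-posterior uncertainty in the background rate into the LR,
    returning the median LR and a central 95% interval."""
    rng = random.Random(seed)
    lrs = sorted(lr_given_rate(rng.betavariate(alpha, beta)) for _ in range(n_draws))
    median = lrs[n_draws // 2]
    interval = (lrs[int(0.025 * n_draws)], lrs[int(0.975 * n_draws)])
    return median, interval

# Hypothetical survey: 3 of 100 sampled individuals carried background GSR
# particles; uniform Beta(1, 1) prior gives a Beta(4, 98) posterior on the rate.
median_lr, (lo, hi) = lr_sensitivity(1 + 3, 1 + 97)
```

Reporting the interval alongside the point value makes explicit how much the evidential strength depends on how well the background rate is known, which is precisely the sensitivity question the paper raises.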