3 results for multilocus genotype data
in Helda - Digital Repository of University of Helsinki
Abstract:
This thesis, which consists of an introduction and four peer-reviewed original publications, studies the problems of haplotype inference (haplotyping) and local alignment significance. Both problems belong to the broad area of bioinformatics and computational biology. The presented solutions are computationally fast and accurate, which makes them practical in high-throughput sequence data analysis. Haplotype inference is a computational problem where the goal is to estimate haplotypes from a sample of genotypes as accurately as possible. The problem is important because haplotypes are difficult to measure directly, whereas genotypes are easier to quantify. Haplotypes are key players in, for example, the study of the genetic causes of diseases. In this thesis, three methods are presented for the haplotype inference problem: HaploParser, HIT, and BACH. HaploParser is based on a combinatorial mosaic model and hierarchical parsing that together mimic recombinations and point mutations in a biologically plausible way. In this mosaic model, the current population is assumed to have evolved from a small founder population. Thus, the haplotypes of the current population are recombinations of the (implicit) founder haplotypes with some point mutations. HIT (Haplotype Inference Technique) uses a hidden Markov model for haplotypes, and efficient algorithms are presented to learn this model from genotype data. The model structure of HIT is analogous to the mosaic model of HaploParser with founder haplotypes. Therefore, it can be seen as a probabilistic model of recombinations and point mutations. BACH (Bayesian Context-based Haplotyping) utilizes a context tree weighting algorithm to efficiently sum over all variable-length Markov chains to evaluate the posterior probability of a haplotype configuration. Algorithms are presented that find haplotype configurations with high posterior probability.
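As a small illustration of why haplotype inference is a nontrivial estimation problem, the sketch below enumerates every haplotype pair that is consistent with one unphased genotype. It is not HaploParser, HIT, or BACH. The genotype encoding (per-site counts 0, 1, 2 of one allele, with 1 marking a heterozygous site) and the function name `compatible_haplotype_pairs` are assumptions made for this example.

```python
from itertools import product


def compatible_haplotype_pairs(genotype):
    """Enumerate unordered haplotype pairs consistent with an unphased genotype.

    genotype: sequence of per-site allele counts (0, 1, or 2).
    A count of 1 marks a heterozygous site whose phase is unknown.
    (This encoding is an assumption of the sketch.)
    """
    het_sites = [i for i, g in enumerate(genotype) if g == 1]
    # Homozygous sites are fixed: count 0 -> allele 0, count 2 -> allele 1.
    # Heterozygous sites get a placeholder 0 that is overwritten below.
    base = [g // 2 for g in genotype]
    pairs = set()
    for phase in product((0, 1), repeat=len(het_sites)):
        h1, h2 = list(base), list(base)
        for site, allele in zip(het_sites, phase):
            h1[site] = allele
            h2[site] = 1 - allele
        # Sort the two haplotypes so (h1, h2) and (h2, h1) count once.
        pairs.add(tuple(sorted((tuple(h1), tuple(h2)))))
    return sorted(pairs)


if __name__ == "__main__":
    for pair in compatible_haplotype_pairs([0, 1, 1, 2]):
        print(pair)
```

A genotype with k heterozygous sites (k at least 1) admits 2^(k-1) unordered haplotype pairs, so the number of candidate explanations grows exponentially with k. This is why inference methods rely on a population model rather than enumeration.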
BACH is the most accurate method presented in this thesis and has comparable performance to the best available software for haplotype inference. Local alignment significance is a computational problem where one asks whether the local similarities between two sequences arise because the sequences are related or merely by chance. The similarity of two sequences is measured by their best local alignment score, from which a p-value is computed. This p-value is the probability that two sequences drawn from the null model have a best local alignment score at least as good. Local alignment significance is used routinely, for example, in homology searches. In this thesis, a general framework is sketched that allows one to compute a tight upper bound for the p-value of a local pairwise alignment score. Unlike previous methods, the presented framework is not affected by so-called edge effects and can handle gaps (deletions and insertions) without troublesome sampling and curve fitting.
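To make the p-value definition above concrete, the following sketch computes a best local alignment score with the Smith-Waterman recurrence and then estimates the p-value by sampling. Note that sampling is exactly what the thesis framework avoids, so this only illustrates the quantity being bounded, not the thesis method. The scoring parameters, the uniform i.i.d. null model over `ACGT`, and both function names are assumptions of this example.

```python
import random


def best_local_score(a, b, match=1, mismatch=-1, gap=-2):
    """Best local alignment score (Smith-Waterman, linear gap penalty)."""
    prev = [0] * (len(b) + 1)
    best = 0
    for x in a:
        cur = [0]
        for j, y in enumerate(b, start=1):
            diag = prev[j - 1] + (match if x == y else mismatch)
            # Local alignment: a cell never drops below zero.
            cur.append(max(0, diag, prev[j] + gap, cur[j - 1] + gap))
        best = max(best, max(cur))
        prev = cur
    return best


def empirical_p_value(score, len_a, len_b, alphabet="ACGT", trials=200, seed=0):
    """Monte Carlo estimate of P(best local score >= score) under an
    assumed null model of uniform i.i.d. letters."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        a = "".join(rng.choice(alphabet) for _ in range(len_a))
        b = "".join(rng.choice(alphabet) for _ in range(len_b))
        if best_local_score(a, b) >= score:
            hits += 1
    # Add-one correction keeps the estimate strictly positive.
    return (hits + 1) / (trials + 1)


if __name__ == "__main__":
    s = best_local_score("GATTACAGATTACA", "CCGATTACATT")
    print(s, empirical_p_value(s, 14, 11))
```

The estimate can never fall below 1 / (trials + 1), which shows why small p-values are expensive to obtain by sampling and why an analytic upper bound is attractive.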
Abstract:
In genetic epidemiology, population-based disease registries are commonly used to collect genotype or other risk factor information concerning affected subjects and their relatives. This work presents two new approaches for statistical inference from ascertained data: a conditional likelihood approach and a full likelihood approach for a disease phenotype with variable age at onset, using familial data obtained from a population-based registry of incident cases. The aim is to obtain statistically reliable estimates of the general population parameters. The statistical analysis of familial data with variable age at onset becomes more complicated when some of the study subjects are non-susceptible, that is, when these subjects never get the disease. A statistical model for variable age at onset with long-term survivors is proposed for studies of familial aggregation, using a latent variable approach, as well as for prospective genetic association studies with candidate genes. In addition, we explore the possibility of a genetic explanation for the observed increase in the incidence of Type 1 diabetes (T1D) in Finland in recent decades and the hypothesis of non-Mendelian transmission of T1D-associated genes. Both classical and Bayesian statistical inference were used in the modelling and estimation. Although this work contains five studies with different statistical models, they all concern data obtained from nationwide registries of T1D and the genetics of T1D. In the analyses of T1D data, non-Mendelian transmission of T1D susceptibility alleles was not observed. In addition, non-Mendelian transmission of T1D susceptibility genes was not a plausible explanation for the increase in T1D incidence in Finland. Instead, the Human Leucocyte Antigen associations with T1D were confirmed in the population-based analysis, which combines T1D registry information, a reference sample of healthy subjects, and birth cohort information of the Finnish population.
Finally, substantial familial variation in susceptibility to T1D nephropathy was observed. The presented studies show the benefits of sophisticated statistical modelling in exploring risk factors for complex diseases.
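The idea of a variable age at onset with long-term survivors, mentioned in this abstract, can be illustrated with the generic mixture cure formulation below. This is a textbook form under assumed notation, not necessarily the model used in the thesis: $\pi$ is the probability that a subject is susceptible, and $S_0(t)$ is the probability that a susceptible subject is still disease-free at age $t$.

```latex
S(t) = (1 - \pi) + \pi \, S_0(t), \qquad \lim_{t \to \infty} S(t) = 1 - \pi .
```

Because susceptibility is unobserved for subjects who are disease-free at the end of follow-up, it acts as a latent variable, and the population survival curve levels off at $1 - \pi$ instead of falling to zero.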
Abstract:
Staphylococcus aureus is one of the most important bacteria that cause disease in humans, and methicillin-resistant S. aureus (MRSA) has become the most commonly identified antibiotic-resistant pathogen in many parts of the world. MRSA rates have been stable for many years in the Nordic countries and the Netherlands, which have a low MRSA prevalence within Europe, but in recent decades MRSA rates have increased in those low-prevalence countries as well. MRSA has been established as a major hospital pathogen, but it has also been found increasingly in long-term facilities (LTF) and in communities of persons with no connections to the health-care setting. In Finland, the annual number of MRSA isolates reported to the National Infectious Disease Register (NIDR) has constantly increased, especially outside the Helsinki metropolitan area. Molecular typing has revealed numerous outbreak strains of MRSA, some of which have previously been associated with community acquisition. In this work, data on MRSA cases notified to the NIDR and on MRSA strain types identified with pulsed-field gel electrophoresis (PFGE), multilocus sequence typing (MLST), and staphylococcal cassette chromosome mec (SCCmec) typing at the National Reference Laboratory (NRL) in Finland from 1997 to 2004 were analyzed. An increasing trend in MRSA incidence in Finland from 1997 to 2004 was shown. In addition, non-multi-drug resistant (NMDR) MRSA isolates, especially those resistant only to methicillin/oxacillin, showed an emerging trend. The predominant MRSA strains changed over time and place, but two internationally spread epidemic strains of MRSA, FIN-16 and FIN-21, were related to the increase detected most recently. Those strains were also one cause of the striking increase in invasive MRSA findings. A rise in MRSA strains with SCCmec types IV or V, possibly community-acquired MRSA, was also detected.
The diagnostic methods used for MRSA identification in Finnish microbiology laboratories and the number of MRSA screening specimens studied were reviewed with questionnaires. Surveys focusing on the MRSA situation in long-term facilities in 2001 and on the background information of MRSA-positive persons in 2001-2003 were also carried out. The rates of MRSA and screening practices varied widely across geographic regions. Some of the NMDR MRSA strains could remain undetected in some laboratories because the diagnostic techniques used were insufficient. The increasing proportion of elderly people carrying MRSA suggests that MRSA is an emerging problem in Finnish long-term facilities. Among the patients, 50% of the specimens were taken on a clinical basis, 43% on a screening basis after exposure to MRSA, 3% on a screening basis because of hospital contact abroad, and 4% for other reasons. In response to an outbreak of MRSA possessing a new genotype that occurred in a health care ward and in an associated nursing home of a small municipality in Northern Finland in autumn 2003, a point-prevalence survey was performed six months later. In the same study, the molecular epidemiology of MRSA and methicillin-sensitive S. aureus (MSSA) strains was also assessed, the results were compared to the national strain collection, and difficulties of MRSA screening were encountered with low-level oxacillin-resistant isolates. The original MRSA outbreak in LTF, which consisted of isolates possessing a nationally new PFGE profile (FIN-22) and an internationally rare MLST type (ST-27), was confined. Another previously unrecognized MRSA strain was found with additional screening, possibly indicating that current routine MRSA screening methods may be insufficiently sensitive for strains possessing low-level oxacillin resistance.
Most of the MSSA strains found were genotypically related to the epidemic MRSA strains, but only a few of them had acquired the SCCmec element, and all those strains possessed the new SCCmec type V. In the second largest nursing home in Finland, colonization by S. aureus and MRSA, and the effect of screening sites and broth enrichment culture on the sensitivity of S. aureus detection, were studied. Combining enrichment broth with perineal swabbing, in addition to swabbing of the nostrils and skin lesions, may be an alternative to throat swabs in the nursing home setting, especially when residents are uncooperative. Finally, to evaluate the phenotypic and genotypic methods needed for reliable laboratory diagnostics of MRSA, oxacillin disk diffusion and MIC tests were compared to the cefoxitin disk diffusion method at both +35°C and +30°C, with and without the addition of sodium chloride (NaCl) to the Müller Hinton test medium, and in-house PCR was compared to two commercial molecular methods (the GenoType® MRSA test and the EVIGENE™ MRSA Detection test) using different bacterial species in addition to S. aureus. The cefoxitin disk diffusion method was superior to oxacillin disk diffusion and to the MIC tests in predicting mecA-mediated resistance in S. aureus when incubating at +35°C with or without the addition of NaCl to the test medium. Both the GenoType® MRSA and EVIGENE™ MRSA Detection tests are usable, accurate, cost-effective, and sufficiently fast methods for rapid MRSA confirmation from a pure culture.