42 results for computational biology
in Helda - Digital Repository of University of Helsinki
Abstract:
This thesis, which consists of an introduction and four peer-reviewed original publications, studies the problems of haplotype inference (haplotyping) and local alignment significance. The problems studied here belong to the broad area of bioinformatics and computational biology. The presented solutions are computationally fast and accurate, which makes them practical in high-throughput sequence data analysis. Haplotype inference is a computational problem where the goal is to estimate haplotypes from a sample of genotypes as accurately as possible. This problem is important because the direct measurement of haplotypes is difficult, whereas genotypes are easier to quantify. Haplotypes are key players when studying, for example, the genetic causes of diseases. In this thesis, three methods are presented for the haplotype inference problem, referred to as HaploParser, HIT, and BACH. HaploParser is based on a combinatorial mosaic model and hierarchical parsing that together mimic recombinations and point mutations in a biologically plausible way. In this mosaic model, the current population is assumed to have evolved from a small founder population. Thus, the haplotypes of the current population are recombinations of the (implicit) founder haplotypes with some point mutations. HIT (Haplotype Inference Technique) uses a hidden Markov model for haplotypes, and efficient algorithms are presented to learn this model from genotype data. The model structure of HIT is analogous to the mosaic model of HaploParser with founder haplotypes. Therefore, it can be seen as a probabilistic model of recombinations and point mutations. BACH (Bayesian Context-based Haplotyping) utilizes a context tree weighting algorithm to efficiently sum over all variable-length Markov chains to evaluate the posterior probability of a haplotype configuration. Algorithms are presented that find haplotype configurations with high posterior probability. 
BACH is the most accurate method presented in this thesis and has comparable performance to the best available software for haplotype inference. Local alignment significance is a computational problem where one is interested in whether the local similarities in two sequences arise because the sequences are related or just by chance. Similarity of sequences is measured by their best local alignment score, and from that a p-value is computed. This p-value is the probability of picking two sequences from the null model whose best local alignment score is as good or better. Local alignment significance is used routinely, for example, in homology searches. In this thesis, a general framework is sketched that allows one to compute a tight upper bound for the p-value of a local pairwise alignment score. Unlike the previous methods, the presented framework is not affected by so-called edge effects and can handle gaps (deletions and insertions) without troublesome sampling and curve fitting.
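To make the p-value definition above concrete, the following is a minimal sketch that estimates it by brute-force sampling. This is the kind of sampling approach the thesis framework is designed to avoid, not the upper-bound method of the thesis itself. The scoring parameters, the uniform i.i.d. null model over the DNA alphabet, and the function names are all assumptions made for illustration.

```python
import random


def local_alignment_score(a, b, match=1, mismatch=-1, gap=-2):
    """Best Smith-Waterman local alignment score with a linear gap penalty.

    The scoring values are illustrative assumptions, not taken from the thesis.
    """
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # A local alignment may restart anywhere, so scores never drop below 0.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


def empirical_p_value(a, b, trials=50, alphabet="ACGT", seed=0):
    """Monte Carlo estimate of P(null score >= observed score).

    Assumed null model: both sequences are i.i.d. uniform over `alphabet`
    and have the same lengths as the observed pair.
    """
    rng = random.Random(seed)
    observed = local_alignment_score(a, b)
    hits = 0
    for _ in range(trials):
        x = "".join(rng.choice(alphabet) for _ in range(len(a)))
        y = "".join(rng.choice(alphabet) for _ in range(len(b)))
        if local_alignment_score(x, y) >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly positive.
    return (hits + 1) / (trials + 1)
```

A sampled estimate like this can never fall below 1 / (trials + 1), which is one reason analytical bounds are attractive for the very small p-values that matter in homology searches.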
Abstract:
Large-scale chromosome rearrangements such as copy number variants (CNVs) and inversions encompass a considerable proportion of the genetic variation between human individuals. In a number of cases, they have been closely linked with various inheritable diseases. Single-nucleotide polymorphisms (SNPs) are another large part of the genetic variance between individuals. They are also typically abundant, and measuring them is straightforward and cheap. This thesis presents computational means of using SNPs to detect the presence of inversions and deletions, a particular variety of CNVs. Technically, the inversion-detection algorithm detects the suppressed recombination rate between inverted and non-inverted haplotype populations, whereas the deletion-detection algorithm uses the EM algorithm to estimate the haplotype frequencies of a window with and without a deletion haplotype. As a contribution to population biology, a coalescent simulator for simulating inversion polymorphisms has been developed. Coalescent simulation is a backward-in-time method of modelling population ancestry. Technically, the simulator also models multiple crossovers by using the Counting model as the chiasma interference model. Finally, this thesis includes an experimental section. The aforementioned methods were tested on synthetic data to evaluate their power and specificity. They were also applied to the HapMap Phase II and Phase III data sets, yielding a number of candidates for previously unknown inversions and deletions and also correctly detecting known rearrangements of these kinds.
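The idea of estimating haplotype frequencies with the EM algorithm can be illustrated with the classic two-SNP case. The sketch below is not the deletion-detection algorithm of the thesis: it has no deletion haplotype and no sliding window. The genotype encoding (count of allele 1 at each locus) and the function name are assumptions made for illustration.

```python
def em_haplotype_frequencies(genotypes, iterations=50):
    """EM estimate of two-SNP haplotype frequencies from unphased genotypes.

    Each genotype is a pair (g1, g2), where g is the number of copies (0, 1, 2)
    of allele 1 at that SNP. The result lists the frequencies of haplotypes
    00, 01, 10, 11 in that order. This encoding is an illustrative assumption.
    """
    freqs = [0.25, 0.25, 0.25, 0.25]
    n_haplotypes = 2 * len(genotypes)
    for _ in range(iterations):
        counts = [0.0, 0.0, 0.0, 0.0]
        for g1, g2 in genotypes:
            if g1 == 1 and g2 == 1:
                # E-step: a double heterozygote is either 00/11 or 01/10.
                cis = freqs[0] * freqs[3]
                trans = freqs[1] * freqs[2]
                total = cis + trans
                w = cis / total if total > 0 else 0.5
                counts[0] += w
                counts[3] += w
                counts[1] += 1 - w
                counts[2] += 1 - w
            else:
                # At most one heterozygous locus: the phase is unambiguous.
                first = (1 if g1 >= 1 else 0, 1 if g2 >= 1 else 0)
                second = (1 if g1 == 2 else 0, 1 if g2 == 2 else 0)
                counts[2 * first[0] + first[1]] += 1
                counts[2 * second[0] + second[1]] += 1
        # M-step: renormalise the expected haplotype counts.
        freqs = [c / n_haplotypes for c in counts]
    return freqs
```

For example, a sample made of 00/00 and 11/11 homozygotes plus double heterozygotes pushes the estimate toward haplotypes 00 and 11 only, because the unambiguous individuals make the 00/11 phasing of the heterozygotes far more likely than 01/10.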
Abstract:
Mass spectrometry (MS) has become a standard tool for identifying metabolites in biological tissues, and metabolomics is slowly being acknowledged as a legitimate research discipline for characterizing biological conditions. The computational analyses of metabolomics, however, lag behind the rapid advances in analytical aspects for two reasons. The first is the lack of a standardized data repository for mass spectra: each research institution is flooded with gigabytes of mass-spectral data from its own analytical groups and cannot host a world-class repository for mass spectra. The second reason is the lack of informatics experts who are fully experienced with spectral analyses. The two barriers must be overcome to establish a publicly free data server for MS analysis in metabolomics, as GenBank does in genomics and UniProt in proteomics. The workshop brought together bioinformaticians working on mass spectral analyses in Finland and Japan with the goal of establishing a consortium to freely exchange and publicize mass spectra of metabolites measured on various platforms, computational tools to analyze spectra, and spectral knowledge that is computationally predicted from standardized data. This book contains the abstracts of the presentations given in the workshop. The programme of the workshop consisted of oral presentations from Japan and Finland, invited lectures from Steffen Neumann (Leibniz Institute of Plant Biochemistry), Matej Oresic (VTT), Merja Penttila (VTT) and Nicola Zamboni (ETH Zurich), as well as free-form discussion among the participants. The event was funded by the Academy of Finland (grants 139203 and 118653), the Japan Society for the Promotion of Science (JSPS Japan-Finland Bilateral Seminar Program 2010) and the Department of Computer Science, University of Helsinki. We would like to thank all the people contributing to the technical programme and the sponsors for making the workshop possible. Helsinki, October 2010 Masanori Arita, Markus Heinonen and Juho Rousu
Abstract:
The thesis discusses the ways in which concepts and methodology developed in evolutionary biology can be applied to the explanation and research of language change. The parallel nature of the mechanisms of biological evolution and language change is explored along with the history of the exchange of ideas between these two disciplines. Against this background, computational methods developed in evolutionary biology are taken into consideration in terms of their applicability to the study of historical relationships between languages. Different phylogenetic methods are explained in common terminology, avoiding the technical language of statistics. The thesis is on the one hand a synthesis of earlier scientific discussion, and on the other an attempt to map out the problems of earlier approaches in addition to finding new guidelines in the study of language change on their basis. The source material consists primarily of literature about the connections between evolutionary biology and language change, along with research articles describing applications of phylogenetic methods to language change. The thesis starts out by describing the initial development of the disciplines of evolutionary biology and historical linguistics, a process which right from the beginning can be seen to have involved an exchange of ideas concerning the mechanisms of language change and biological evolution. The historical discussion lays the foundation for the handling of the generalised account of selection developed during the last few decades. This account is aimed at creating a theoretical framework capable of explaining both biological evolution and cultural change as selection processes acting on self-replicating entities. This thesis focusses on the capacity of the generalised account of selection to describe language change as a process of this kind. In biology, the mechanisms of evolution are seen to form populations of genetically related organisms through time. 
One of the central questions explored in this thesis is whether selection theory makes it possible to picture languages as forming populations of a similar kind, and what a perspective like this can offer to the understanding of language in general. In historical linguistics, the comparative method and other, complementing methods have been traditionally used to study the development of languages from a common ancestral language. Computational, quantitative methods have not become widely used as part of the central methodology of historical linguistics. After the limited popularity enjoyed by the lexicostatistical method since the 1950s faded, only in recent years have the computational methods of phylogenetic inference used in evolutionary biology also been applied to the study of early language history. In this thesis the possibilities offered by the traditional methodology of historical linguistics and the new phylogenetic methods are compared. The methods are approached through the ways in which they have been applied to the Indo-European languages, which is the language family most thoroughly investigated using both the traditional and the phylogenetic methods. The problems of these applications along with the optimal form of the linguistic data used in these methods are explored in the thesis. The mechanisms of biological evolution are seen in the thesis as parallel in a limited sense to the mechanisms of language change, but sufficiently so that the development of a generalised account of selection is deemed possibly fruitful for understanding language change. These similarities are also seen to support the validity of using phylogenetic methods in the study of language history, although the use of linguistic data and the models of language change employed by these methods are seen to await further development.
Abstract:
Atherosclerosis is a disease of the arteries; its characteristic features include chronic inflammation, extra- and intracellular lipid accumulation, extracellular matrix remodeling, and an increase in extracellular matrix volume. The underlying mechanisms in the pathogenesis of advanced atherosclerotic plaques that involve local acidity of the extracellular fluid are still incompletely understood. In this thesis project, my co-workers and I studied the different mechanisms by which local extracellular acidity could promote accumulation of the atherogenic apolipoprotein B-100 (apoB-100)-containing plasma lipoprotein particles in the inner layer of the arterial wall, the intima. We found that lipolysis of atherogenic apoB-100-containing plasma lipoprotein particles (LDL, IDL, and sVLDL) by the secretory phospholipase A2 group V (sPLA2-V) enzyme was increased at acidic pH. Also, the binding of apoB-100-containing plasma lipoprotein particles to human aortic proteoglycans was dramatically enhanced at acidic pH. Additionally, lipolysis by the sPLA2-V enzyme further increased this binding. Using proteoglycan-affinity chromatography, we found that sVLDL lipoprotein particles consist of populations that differ in their affinities toward proteoglycans. These populations also contained different amounts of apolipoprotein E (apoE) and apolipoprotein C-III (apoC-III); the amounts of apoC-III and apoE per particle were highest in the population with the lowest affinity toward proteoglycans. Since PLA2-modification of LDL particles has been shown to change their aggregation behavior, we also studied the effect of acidic pH on the monolayer structure covering lipoprotein particles after PLA2-induced hydrolysis. Using molecular dynamics simulations, we found that, in acidity, the monolayer is more tightly packed laterally; moreover, its spontaneous curvature is negative, suggesting that acidity may promote lipoprotein particle fusion. 
In addition to extracellular lipid accumulation, the apoB-100-containing plasma lipoprotein particles can be taken up by inflammatory cells, namely macrophages. Using radiolabeled lipoprotein particles and cell cultures, we showed that sPLA2-V-modification of LDL, IDL, and sVLDL lipoprotein particles, at neutral or acidic pH, increased their uptake by human monocyte-derived macrophages.
Abstract:
Torque teno virus (TTV) was discovered in 1997 in the serum of a Japanese patient who had a post-transfusion hepatitis of unknown etiology. It is a small virus containing a circular single-stranded DNA genome which is unique among human viruses. Within a few years after its discovery, the TTVs were noted to form a large family of viruses with numerous genotypes. TTV is highly prevalent among the general population throughout the world, and persistent infections and co-infections with several genotypes occur frequently. However, the pathogenicity and the mechanism for the sustained occurrence of the virus in blood are at present unclear. To determine the prevalence of TTV in Finland, we set up PCR methods and examined the sera of asymptomatic subjects for the presence of TTV DNA and for genotype-6 DNA. TTV was found to be highly prevalent also in Finland; 85% of adults harbored TTV in their blood, and 4% were infected with genotype-6. In addition, TTV DNA was detected in a number of different tissues, with no tissue-type or symptom specificity. Most cell-biological events during TTV infections are at the moment unknown. Replicating TTV DNA has, however, been detected in liver and the hematopoietic compartment, and three mRNAs are known to be generated. To characterize TTV cell biology in more detail, we cloned in full length the genome of TTV genotype 6. We showed that in human kidney-derived cells TTV produces altogether six proteins with distinct subcellular localizations. TTV mRNA transcription was detected in all cell lines transfected with the full-length clone, and TTV DNA replicated in several of them, including those of erythroid, kidney, and hepatic origin. Furthermore, the viral DNA replication was shown to utilize the cellular DNA polymerases. Diagnoses of TTV infections have been based almost solely on PCR, whereas serological tests, measuring antibody responses, would give more information on many aspects of these infections. 
To investigate TTV immunology in more detail, we produced all six TTV proteins for use as antigens in serological tests. In human sera, we detected IgM and IgG antibodies occurring simultaneously with TTV DNA, and observed the appearance of TTV DNA regardless of pre-existing antibodies and the disappearance of TTV DNA after antibody appearance. The genotype-6 nucleotide sequence remained stable for years within the infected subjects, suggesting that some mechanism other than mutations is used by this minute virus to evade our immune system and to establish chronic infections in immunocompetent subjects.
Abstract:
Progressive myoclonus epilepsy of Unverricht-Lundborg type (EPM1) is an autosomal recessively inherited disorder characterized by age of onset at 6-15 years, stimulus-sensitive myoclonus, tonic-clonic epileptic seizures and a progressive course. Mutations in the cystatin B (CSTB) gene underlie EPM1. The most common mutation underlying EPM1 is a dodecamer repeat expansion in the promoter region of CSTB. In addition, nine other mutations have been identified. CSTB, a cysteine protease inhibitor, is a ubiquitously expressed inhibitor of cathepsins, but its physiological function is unknown. The purpose of this study was to investigate CSTB gene expression and CSTB protein function in normal and pathological conditions. The basal CSTB promoter was mapped and characterized using different promoter-luciferase gene constructs. The binding activity of transcription factors to one ARE half, five Sp1 and four AP1 sites in the CSTB promoter was demonstrated. The CSTB promoter activity was clearly decreased when using a CSTB promoter with "premutation" repeat expansions and in individuals with similar expansions. The expression of CSTB mRNA and protein was markedly reduced in patient cells. The endogenous CSTB protein localized to the nucleus, cytoplasm and lysosomes, and in differentiated cells merely to the cytoplasm. This suggests that the subcellular distribution of CSTB is dependent on the differentiation status of the cells. The proteins representing patient missense mutations failed to associate with lysosomes, implying the importance of the lysosomal association for the proper physiological function of CSTB. Several alternatively spliced CSTB isoforms were identified. Of these, CSTB2 was widely expressed at very low levels, whereas the other alternatively spliced forms seemed to have limited tissue expression. In patients, CSTB2 expression was reduced similarly to that of CSTB. The physiological relevance of CSTB alternative splicing remains unknown. 
The mouse Cstb transcript was shown to be present in all embryonic stages and adult tissues examined. The expression was highest at embryonic day 7 and in thymus, as well as in postnatal brain in the cortex, caudate putamen, thalamus, hippocampus, and in the Purkinje cell layer of the cerebellum. Our data imply that CSTB expression is tightly regulated, both temporally and spatially. The data presented in my thesis lay the basis for further understanding of the role of CSTB in health and disease.
Abstract:
Epilysin (MMP-28) is the most recently identified member of the matrix metalloproteinase (MMP) family of extracellular proteases. Together these enzymes are capable of degrading almost all components of the extracellular matrix (ECM) and are thus involved in important biological processes such as development, wound healing and immune functions, but also in pathological processes such as tumor invasion, metastasis and arthritis. MMPs do not act solely by degrading the ECM. They also regulate cell behavior by releasing growth factors and biologically active peptides from the ECM, by modulating cell surface receptors and adhesion molecules and by regulating the activity of many important mediators in inflammatory pathways. The aim of this study was to define the unique role of epilysin within the MMP family, to elucidate how and when it is expressed and how its catalytic activity is regulated. To gain information on its essential functions and substrates, the specific aim was to characterize how epilysin affects the phenotype of epithelial cells, where it is biologically expressed. During the course of the study we found that the epilysin promoter contains a well conserved GT-box that is essential for the basic expression of this gene. Transcription factors Sp1 and Sp3 bind this sequence and could hence regulate both the basic and the cell type and differentiation stage specific expression of epilysin. We cloned mouse epilysin cDNA and found that epilysin is well conserved between human and mouse genomes and that epilysin is glycosylated and activated by furin. As in human tissues, epilysin is normally expressed in a number of mouse tissues. The expression pattern differs from that of most other MMPs, which are expressed only in response to injury or inflammation and in pathological processes like cancer. 
These findings imply that epilysin could be involved in tissue homeostasis, perhaps fine-tuning the phenotype of epithelial cells according to signals from the ECM. In view of these results, it was unexpected to find that epilysin can induce a stable epithelial to mesenchymal transition (EMT) when overexpressed in epithelial lung carcinoma cells. Transforming growth factor β (TGF-β) was recognized as a crucial mediator of this process, which was characterized by the loss of E-cadherin mediated cell-cell adhesion, elevated expression of gelatinase B and MT1-MMP and increased cell migration and invasion into collagen I gels. We also observed that epilysin is bound to the surface of epithelial cells and that this interaction is lost upon cell transformation and is susceptible to degradation by membrane type-1-MMP (MT1-MMP). The wide expression of epilysin under physiological conditions implies that its effects on epithelial cell phenotype in vivo are not as dramatic as seen in our in vitro cell system. Nevertheless, current results indicate a possible interaction between epilysin and TGF-β also under physiological circumstances, where epilysin activity may not induce EMT but, instead, trigger less permanent changes in TGF-β signaling and cell motility. Epilysin may thus play an important role in TGF-β regulated events such as wound healing and inflammation, processes where involvement of epilysin has been indicated.
Abstract:
The molecular-level structure of mixtures of water and alcohols is very complicated and has been under intense research in the recent past. Both experimental and computational methods have been used in the studies. One method for studying the intra- and intermolecular bindings in the mixtures is the use of the so-called difference Compton profiles, which are a way to obtain information about changes in the electron wave functions. In the process of Compton scattering a photon scatters inelastically from an electron. The Compton profile that is obtained from the electron wave functions is directly proportional to the probability of photon scattering at a given energy to a given solid angle. In this work we develop a method to compute Compton profiles numerically for mixtures of liquids. In order to obtain the electronic wave functions necessary to calculate the Compton profiles we need some statistical information about atomic coordinates. Acquiring this using ab initio molecular dynamics is beyond our computational capabilities, and therefore we use classical molecular dynamics to model the movement of atoms in the mixture. We discuss the validity of the chosen method in view of the results obtained from the simulations. There are some difficulties in using classical molecular dynamics for the quantum mechanical calculations, but these can possibly be overcome by parameter tuning. According to the calculations, clear differences can be seen in the Compton profiles of different mixtures. This prediction needs to be tested in experiments in order to find out whether the approximations made are valid.
Abstract:
There is intense activity in the area of theoretical chemistry of gold. It is now possible to predict new molecular species, and more recently, solids by combining relativistic methodology with isoelectronic thinking. In this thesis we predict a series of solid sheet-type crystals for Group-11 cyanides, MCN (M=Cu, Ag, Au), and Group-2 and 12 carbides MC2 (M=Be-Ba, Zn-Hg). The idea of sheets is then extended to nanostrips which can be bent to nanorings. The bending energies and deformation frequencies can be systematized by treating these molecules as elastic bodies. In these species Au atoms act as an 'intermolecular glue'. Further suggested molecular species are the new uncongested aurocarbons, and the neutral Au_nHg_m clusters. Many of the suggested species are expected to be stabilized by aurophilic interactions. We also estimate the MP2 basis-set limit of the aurophilicity for the model compounds [ClAuPH_3]_2 and [P(AuPH_3)_4]^+. Besides investigating the size of the basis set applied, our research confirms that the 19-VE TZVP+2f level, used a decade ago, already produced 74 % of the present aurophilic attraction energy for the [ClAuPH_3]_2 dimer. Likewise we verify the preferred C4v structure for the [P(AuPH_3)_4]^+ cation at the MP2 level. We also perform the first calculation on model aurophilic systems using the SCS-MP2 method and compare the results to high-accuracy CCSD(T) ones. The recently obtained high-resolution microwave spectra on MCN molecules (M=Cu, Ag, Au) provide an excellent testing ground for quantum chemistry. MP2 or CCSD(T) calculations, correlating all 19 valence electrons of Au and including BSSE and SO corrections, are able to give bond lengths to 0.6 pm, or better. Our calculated vibrational frequencies are expected to be better than the currently available experimental estimates. Qualitative evidence for multiple Au-C bonding in triatomic AuCN is also found.