15 results for Exact sequence
in Helda - Digital Repository of University of Helsinki
Abstract:
Inorganic pyrophosphatases (PPases, EC 3.6.1.1) hydrolyse pyrophosphate in a reaction that provides the thermodynamic 'push' for many reactions in the cell, including DNA and protein synthesis. Soluble PPases can be classified into two families that differ completely in both sequence and structure. While Family I PPases are found in all kingdoms, Family II PPases occur only in certain prokaryotes. The enzyme from baker's yeast (Saccharomyces cerevisiae) is very well characterised both kinetically and structurally, but the exact mechanism has remained elusive. The enzyme uses divalent cations as cofactors; in vivo the metal is magnesium. Two metals are permanently bound to the enzyme, while two come with the substrate. The reaction cycle involves the activation of the nucleophilic oxygen and allows different pathways for product release. In this thesis I have solved the crystal structures of wild-type yeast PPase and seven active site variants in the presence of the native cofactor magnesium. These structures explain the effects of the mutations and have allowed me to describe each intermediate along the catalytic pathway with a structure. Although establishing the 'choreography' of the heavy atoms is an important step in understanding the mechanism, the hydrogen atoms are also crucial to it. The most unambiguous method to determine the positions of these hydrogen atoms is neutron crystallography. In order to determine the neutron structure of yeast PPase, I perdeuterated the enzyme and grew large crystals of it. Since the crystals were not stable at ambient temperature, a cooling device was developed to allow neutron data collection. Investigating the structural changes during the reaction in real time by time-resolved crystallography requires a photolysable substrate precursor. I synthesised a candidate molecule and characterised its photolysis kinetics, but unfortunately it is hydrolysed by both yeast and Thermotoga maritima PPases.
The mechanism of Family II PPases is subtly different from that of Family I. The native metal cofactor is manganese instead of magnesium, but the metal activation is more complex because the metal ions that arrive with the substrate are magnesium, and thus different from those permanently bound to the enzyme. I determined the crystal structures of wild-type Bacillus subtilis PPase with the inhibitor imidodiphosphate and of an inactive H98Q variant with the substrate pyrophosphate. These structures revealed a new trimetal site that activates the nucleophile. Using anomalous X-ray scattering, I also determined that the metal ion sites were partially occupied by manganese and iron.
Abstract:
NMR spectroscopy enables the study of biomolecules from peptides and carbohydrates to proteins at atomic resolution. The technique uniquely allows for structure determination of molecules in the solution state. It also gives insights into dynamics and intermolecular interactions important for determining biological function. Detailed molecular information is entangled in the nuclear spin states. The information can be extracted by pulse sequences designed to measure the desired molecular parameters. Advancement of pulse sequence methodology therefore plays a key role in the development of biomolecular NMR spectroscopy. A range of novel pulse sequences for solution-state NMR spectroscopy is presented in this thesis. The pulse sequences are described in relation to the molecular information they provide. The pulse sequence experiments represent several advances in NMR spectroscopy, with particular emphasis on applications for proteins. Some of the novel methods focus on methyl-containing amino acids, which are pivotal for structure determination. Methyl-specific assignment schemes are introduced for increasing the size range of 13C,15N labeled proteins amenable to structure determination without resorting to more elaborate labeling schemes. Furthermore, cost-effective means are presented for monitoring amide and methyl correlations simultaneously. Residual dipolar couplings can be applied for structure refinement as well as for studying dynamics. Accurate methods for measuring residual dipolar couplings in small proteins are devised, along with special techniques applicable when proteins require high pH or high temperature solvent conditions. Finally, a new technique is demonstrated to diminish strong-coupling induced artifacts in HMBC, a routine experiment for establishing long-range correlations in unlabeled molecules. The presented experiments facilitate structural studies of biomolecules by NMR spectroscopy.
Abstract:
The analysis of sequential data is required in many diverse areas such as telecommunications, stock market analysis, and bioinformatics. A basic problem related to the analysis of sequential data is the sequence segmentation problem. A sequence segmentation is a partition of the sequence into a number of non-overlapping segments that cover all data points, such that each segment is as homogeneous as possible. This problem can be solved optimally using a standard dynamic programming algorithm. In the first part of the thesis, we present a new approximation algorithm for the sequence segmentation problem. This algorithm has a smaller running time than the optimal dynamic programming algorithm, while it has a bounded approximation ratio. The basic idea is to divide the input sequence into subsequences, solve the problem optimally in each subsequence, and then appropriately combine the solutions to the subproblems into one final solution. In the second part of the thesis, we study alternative segmentation models that are devised to better fit the data. More specifically, we focus on clustered segmentations and segmentations with rearrangements. While in the standard segmentation of a multidimensional sequence all dimensions share the same segment boundaries, in a clustered segmentation the multidimensional sequence is segmented in such a way that dimensions are allowed to form clusters. Each cluster of dimensions is then segmented separately. We formally define the problem of clustered segmentations and we experimentally show that segmenting sequences using this segmentation model leads to solutions with smaller error for the same model cost. Segmentation with rearrangements is a novel variation of the segmentation problem: in addition to partitioning the sequence we also seek to apply a limited amount of reordering, so that the overall representation error is minimized.
We formulate the problem of segmentation with rearrangements and we show that it is an NP-hard problem to solve or even to approximate. We devise effective algorithms for the proposed problem, combining ideas from dynamic programming and outlier detection algorithms in sequences. In the final part of the thesis, we discuss the problem of aggregating results of segmentation algorithms on the same set of data points. In this case, we are interested in producing a partitioning of the data that agrees as much as possible with the input partitions. We show that this problem can be solved optimally in polynomial time using dynamic programming. Furthermore, we show that not all data points are candidates for segment boundaries in the optimal solution.
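The standard dynamic programming algorithm mentioned in this abstract can be sketched as follows. This is a minimal illustration, not the thesis's code: it assumes a one-dimensional sequence, takes "homogeneous" to mean low squared error around the segment mean, and the names `segment`, `cost`, `best` and `back` are hypothetical.

```python
from itertools import accumulate


def segment(values, k):
    """Split `values` into k contiguous segments with minimal total squared error.

    Returns (error, boundaries), where boundaries are half-open (start, end) pairs.
    Assumption: segment quality is the squared error around the segment mean.
    """
    n = len(values)
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and len(values)")

    # Prefix sums give the squared error of any segment in constant time.
    s1 = [0.0] + list(accumulate(values))
    s2 = [0.0] + list(accumulate(v * v for v in values))

    def cost(i, j):
        # Squared error of values[i:j] around its own mean.
        total = s1[j] - s1[i]
        return (s2[j] - s2[i]) - total * total / (j - i)

    inf = float("inf")
    # best[m][j]: minimal error when values[:j] is split into m segments.
    best = [[inf] * (n + 1) for _ in range(k + 1)]
    back = [[0] * (n + 1) for _ in range(k + 1)]
    best[0][0] = 0.0
    for m in range(1, k + 1):
        for j in range(m, n + 1):
            for i in range(m - 1, j):
                candidate = best[m - 1][i] + cost(i, j)
                if candidate < best[m][j]:
                    best[m][j] = candidate
                    back[m][j] = i

    # Follow the back-pointers to recover the segment boundaries.
    boundaries = []
    j = n
    for m in range(k, 0, -1):
        i = back[m][j]
        boundaries.append((i, j))
        j = i
    boundaries.reverse()
    return best[k][n], boundaries
```

The triple loop makes the running time quadratic in the sequence length for each segment count, which is the cost that a divide-and-combine approximation of the kind described above aims to reduce.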
Abstract:
Evolutionary genetics incorporates traditional population genetics and studies of the origins of genetic variation by mutation and recombination, and the molecular evolution of genomes. Among the primary forces that have potential to affect the genetic variation within and among populations, including those that may lead to adaptation and speciation, are genetic drift, gene flow, mutations and natural selection. The main challenge in understanding the genetic basis of evolutionary changes is to distinguish the adaptive selection forces that cause existent DNA sequence variants and also to identify the nucleotide differences responsible for the observed phenotypic variation. To understand the effects of various forces, interpretation of gene sequence variation has been the principal basis of many evolutionary genetic studies. The main aim of this thesis was to assess different forms of teleost gene sequence polymorphisms in evolutionary genetic studies of Atlantic salmon (Salmo salar) and other species. Firstly, the level of Darwinian adaptive evolution affecting coding regions of the growth hormone (GH) gene during teleost evolution was investigated based on the sequence data existing in public databases. Secondly, a target gene approach was used to identify within-population variation in the growth hormone 1 (GH1) gene in salmon. Then, a new strategy for single nucleotide polymorphism (SNP) discovery in salmonid fishes was introduced, and, finally, the usefulness of a limited number of SNP markers as molecular tools in several applications of population genetics in Atlantic salmon was assessed. This thesis showed that the gene sequences in databases can be utilized to perform comparative studies of molecular evolution, and some putative evidence of the existence of Darwinian selection during the teleost GH evolution was presented.
In addition, existent sequence data were exploited to investigate GH1 gene variation within Atlantic salmon populations throughout the species' range. Purifying selection is suggested to be the predominant evolutionary force controlling the genetic variation of this gene in salmon, and some support for gene flow between continents was also observed. The novel approach to SNP discovery in species with duplicated genome fragments introduced here proved to be an effective method, and this may have several applications in evolutionary genetics with different species, e.g. when developing gene-targeted markers to investigate quantitative genetic variation. The thesis also demonstrated that only a few SNPs gave signals highly similar to those of the microsatellite markers in some of the population genetic analyses. This may have useful applications when estimating genetic diversity in genes having a potential role in ecological and conservation issues, or when using hard biological samples in genetic studies, as SNPs can be applied with relatively highly degraded DNA.
Abstract:
Visual pigments of different animal species must have evolved at some stage to match the prevailing light environments, since all visual functions depend on their ability to absorb available photons and transduce the event into a reliable neural signal. There is a large literature on correlation between the light environment and spectral sensitivity between different fish species. However, little work has been done on evolutionary adaptation between separated populations within species. More generally, little is known about the rate of evolutionary adaptation to changing spectral environments. The objective of this thesis is to illuminate the constraints under which the evolutionary tuning of visual pigments works as evident in: scope, tempo, available molecular routes, and signal/noise trade-offs. Aquatic environments offer Nature's own laboratories for research on visual pigment properties, as naturally occurring light environments offer an enormous range of variation in both spectral composition and intensity. The present thesis focuses on the visual pigments that serve dim-light vision in two groups of model species, teleost fishes and mysid crustaceans. The geographical emphasis is on the brackish Baltic Sea area with its well-known postglacial isolation history and its aquatic fauna of both marine and fresh-water origin. The absorbance spectra of the (single) dim-light visual pigment were recorded by microspectrophotometry (MSP) in single rods of 26 fish species and single rhabdoms of 8 opossum shrimp populations of the genus Mysis inhabiting marine, brackish or freshwater environments. Additionally, spectral sensitivity was determined from six Mysis populations by electroretinogram (ERG) recording. The rod opsin gene was sequenced in individuals of four allopatric populations of the sand goby (Pomatoschistus minutus). Rod opsins of two other goby species were investigated as outgroups for comparison.
Rod absorbance spectra of the Baltic subspecies or populations of the primarily marine species herring (Clupea harengus membras), sand goby (P. minutus), and flounder (Platichthys flesus) were long-wavelength-shifted compared to their marine populations. The spectral shifts are consistent with adaptation for improved quantum catch (QC) as well as improved signal-to-noise ratio (SNR) of vision in the Baltic light environment. Since the chromophore of the pigment was pure A1 in all cases, this has apparently been achieved by evolutionary tuning of the opsin visual pigment. By contrast, no opsin-based differences were evident between lake and sea populations of species of fresh-water origin, which can tune their pigment by varying chromophore ratios. A more detailed analysis of differences in absorbance spectra and opsin sequence between and within populations was conducted using the sand goby as model species. Four allopatric populations from the Baltic Sea (B), Swedish west coast (S), English Channel (E), and Adriatic Sea (A) were examined. Rod absorbance spectra, characterized by the wavelength of maximum absorbance (λmax), differed between populations and correlated with differences in the spectral light transmission of the respective water bodies. The greatest λmax shift as well as the greatest opsin sequence difference was between the Baltic and the Adriatic populations. The significant within-population variation of the Baltic λmax values (506-511 nm) was analyzed on the level of individuals and was shown to correlate well with opsin sequence substitutions. The sequences of individuals with λmax at shorter wavelengths were identical to that of the Swedish population, whereas those with λmax at longer wavelengths additionally had substitution F261F/Y in the sixth transmembrane helix of the protein. This substitution (Y261) was also present in the Baltic common gobies and is known to redshift spectra. 
The tuning mechanism of the long-wavelength type Baltic sand gobies is assumed to be the co-expression of F261 and Y261 in all rods to produce ≈ 5 nm redshift. The polymorphism of the Baltic sand goby population possibly indicates ambiguous selection pressures in the Baltic Sea. The visual pigments of all lake populations of the opossum shrimp (Mysis relicta) were red-shifted by 25 nm compared with all Baltic Sea populations. This is calculated to confer a significant advantage in both QC and SNR in many humus-rich lakes with reddish water. Since only A2 chromophore was present, the differences obviously reflect evolutionary tuning of the visual protein, the opsin. The changes have occurred within the ca. 9000 years that the lakes have been isolated from the Sea after the most recent glaciation. At present, it seems that the mechanism explaining the spectral differences between lake and sea populations is not an amino acid substitution at any other conventional tuning site, but the mechanism is yet to be found.
Abstract:
A distributed system is a collection of networked autonomous processing units which must work in a cooperative manner. Currently, large-scale distributed systems, such as various telecommunication and computer networks, are abundant and used in a multitude of tasks. The field of distributed computing studies what can be computed efficiently in such systems. Distributed systems are usually modelled as graphs where nodes represent the processors and edges denote communication links between processors. This thesis concentrates on the computational complexity of the distributed graph colouring problem. The objective of the graph colouring problem is to assign a colour to each node in such a way that no two nodes connected by an edge share the same colour. In particular, it is often desirable to use only a small number of colours. This task is a fundamental symmetry-breaking primitive in various distributed algorithms. A graph that has been coloured in this manner using at most k different colours is said to be k-coloured. This work examines the synchronous message-passing model of distributed computation: every node runs the same algorithm, and the system operates in discrete synchronous communication rounds. During each round, a node can communicate with its neighbours and perform local computation. In this model, the time complexity of a problem is the number of synchronous communication rounds required to solve the problem. It is known that 3-colouring any k-coloured directed cycle requires at least ½(log* k - 3) communication rounds and is possible in ½(log* k + 7) communication rounds for all k ≥ 3. This work shows that for any k ≥ 3, colouring a k-coloured directed cycle with at most three colours is possible in ½(log* k + 3) rounds. In contrast, it is also shown that for some values of k, colouring a directed cycle with at most three colours requires at least ½(log* k + 1) communication rounds. 
Furthermore, in the case of directed rooted trees, reducing a k-colouring into a 3-colouring requires at least log* k + 1 rounds for some k and is possible in log* k + 3 rounds for all k ≥ 3. The new positive and negative results are derived using computational methods, as the existence of distributed colouring algorithms corresponds to the colourability of so-called neighbourhood graphs. The colourability of these graphs is analysed using Boolean satisfiability (SAT) solvers. Finally, this thesis shows that similar methods are applicable in capturing the existence of distributed algorithms for other graph problems, such as the maximal matching problem.
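To make the log* bounds in this abstract more concrete, here is a small simulation of colour reduction on a directed cycle. It is only a sketch: it uses the well-known Cole-Vishkin bit trick followed by a simple six-to-three reduction, not the tighter algorithms the thesis derives with SAT solvers, so its round count is larger than ½(log* k + 3). The function names are hypothetical, and the input is assumed to be a proper colouring of a cycle with at least three nodes, listed in cycle order.

```python
import math


def log_star(k):
    """Iterated base-2 logarithm: number of times log2 is applied until the value is <= 1."""
    count = 0
    x = float(k)
    while x > 1:
        x = math.log2(x)
        count += 1
    return count


def cv_round(colours):
    """One synchronous round: every node sees only its predecessor's colour."""
    new = []
    for v in range(len(colours)):
        own, pred = colours[v], colours[v - 1]  # index -1 wraps around the cycle
        diff = own ^ pred                       # non-zero because the colouring is proper
        i = (diff & -diff).bit_length() - 1     # lowest bit position where the colours differ
        new.append(2 * i + ((own >> i) & 1))
    return new


def reduce_to_three(colours):
    """Remove colours 5, 4 and 3, one colour class per round."""
    n = len(colours)
    colours = list(colours)
    for x in (5, 4, 3):
        updated = list(colours)
        for v in range(n):
            if colours[v] == x:
                # Nodes of colour x are pairwise non-adjacent, so they can
                # all recolour in the same round without conflicts.
                used = {colours[v - 1], colours[(v + 1) % n]}
                updated[v] = min(c for c in (0, 1, 2) if c not in used)
        colours = updated
    return colours


def three_colour_cycle(colours):
    """Return (3-colouring, number of simulated communication rounds)."""
    rounds = 0
    while max(colours) >= 6:
        colours = cv_round(colours)
        rounds += 1
    return reduce_to_three(colours), rounds + 3
```

For example, `three_colour_cycle(list(range(100)))` starts from 100 distinct identifiers used as colours and returns a colouring with values in {0, 1, 2}, together with the number of rounds this particular sketch needed.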
Abstract:
A new classification and linear sequence of the gymnosperms based on previous molecular and morphological phylogenetic and other studies is presented. Currently accepted genera are listed for each family and arranged according to their (probable) phylogenetic position. A full synonymy is provided, and types are listed for accepted genera. An index to genera assists in easy access to synonymy and family placement of genera.
Abstract:
Throughout the history of the classification of extant ferns (monilophytes) and lycophytes, familial and generic concepts have been in great flux. For the organisation of lycophytes and ferns in herbaria, books, checklists, indices and spore banks and on the internet, this poses a problem, and a standardized linear sequence of these plants is therefore greatly needed. We provide here a linear classification of the extant lycophytes and ferns based on current phylogenetic knowledge; this provides a standardized guide for organisation of fern collections into a more natural sequence. Two new families, Diplaziopsidaceae and Rhachidosoraceae, are here introduced.
Abstract:
Bayesian networks are compact, flexible, and interpretable representations of a joint distribution. When the network structure is unknown but there are observational data at hand, one can try to learn the network structure. This is called structure discovery. This thesis contributes to two areas of structure discovery in Bayesian networks: space-time tradeoffs and learning ancestor relations. The fastest exact algorithms for structure discovery in Bayesian networks are based on dynamic programming and use excessive amounts of space. Motivated by the space usage, several schemes for trading space against time are presented. These schemes are presented in a general setting for a class of computational problems called permutation problems; structure discovery in Bayesian networks is seen as a challenging variant of the permutation problems. The main contribution in the area of the space-time tradeoffs is the partial order approach, in which the standard dynamic programming algorithm is extended to run over partial orders. In particular, a certain family of partial orders called parallel bucket orders is considered. A partial order scheme that provably yields an optimal space-time tradeoff within parallel bucket orders is presented. Practical issues concerning parallel bucket orders are also discussed. Learning ancestor relations, that is, directed paths between nodes, is motivated by the need for robust summaries of the network structures when there are unobserved nodes at work. Ancestor relations are nonmodular features and hence learning them is more difficult than learning modular features. A dynamic programming algorithm is presented for computing posterior probabilities of ancestor relations exactly. Empirical tests suggest that ancestor relations can be learned from observational data almost as accurately as arcs even in the presence of unobserved nodes.
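The dynamic programming over node subsets that this abstract refers to can be illustrated with a small sketch. It is a textbook-style simplification, not the partial order scheme of the thesis: `best_network` and `local_score` are hypothetical names, the scoring function is assumed to be supplied by the caller (higher is better), and parent sets are enumerated naively. The `score` table holds one entry per subset of nodes, which shows where the heavy space usage comes from.

```python
from itertools import combinations


def best_network(nodes, local_score):
    """Find a highest-scoring DAG by dynamic programming over node subsets.

    local_score(v, parents) is a hypothetical user-supplied function that
    returns the score of node v given a frozenset of parents.
    Returns (total_score, {node: frozenset_of_parents}).
    """
    nodes = tuple(nodes)

    def best_parent_set(v, candidates):
        # Naive search over all subsets of the candidate parents.
        best = None
        for r in range(len(candidates) + 1):
            for ps in combinations(candidates, r):
                s = local_score(v, frozenset(ps))
                if best is None or s > best[0]:
                    best = (s, frozenset(ps))
        return best

    # score[S]: best total score of a DAG on the node subset S.
    # choice[S]: the sink node chosen for S and its parent set.
    score = {frozenset(): 0.0}
    choice = {}
    for size in range(1, len(nodes) + 1):
        for subset in combinations(nodes, size):
            current = frozenset(subset)
            best = None
            for v in subset:
                rest = current - {v}
                s, ps = best_parent_set(v, tuple(sorted(rest)))
                total = score[rest] + s
                if best is None or total > best[0]:
                    best = (total, v, ps)
            score[current] = best[0]
            choice[current] = (best[1], best[2])

    # Peel off sinks one by one to recover the parent sets.
    parents = {}
    remaining = frozenset(nodes)
    while remaining:
        v, ps = choice[remaining]
        parents[v] = ps
        remaining = remaining - {v}
    return score[frozenset(nodes)], parents
```

As a toy check, a score that penalises every deviation from the chain A → B → C should return exactly that chain; real scores would be computed from data, which this sketch does not attempt.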