36 results for genomic sequence
Abstract:
The analysis of sequential data is required in many diverse areas such as telecommunications, stock market analysis, and bioinformatics. A basic problem related to the analysis of sequential data is the sequence segmentation problem. A sequence segmentation is a partition of the sequence into a number of non-overlapping segments that cover all data points, such that each segment is as homogeneous as possible. This problem can be solved optimally using a standard dynamic programming algorithm. In the first part of the thesis, we present a new approximation algorithm for the sequence segmentation problem. This algorithm has a shorter running time than the optimal dynamic programming algorithm, while retaining a bounded approximation ratio. The basic idea is to divide the input sequence into subsequences, solve the problem optimally in each subsequence, and then appropriately combine the solutions to the subproblems into one final solution. In the second part of the thesis, we study alternative segmentation models that are devised to better fit the data. More specifically, we focus on clustered segmentations and segmentations with rearrangements. While in the standard segmentation of a multidimensional sequence all dimensions share the same segment boundaries, in a clustered segmentation the multidimensional sequence is segmented in such a way that dimensions are allowed to form clusters. Each cluster of dimensions is then segmented separately. We formally define the problem of clustered segmentations and we experimentally show that segmenting sequences using this segmentation model leads to solutions with smaller error for the same model cost. Segmentation with rearrangements is a novel variation of the segmentation problem: in addition to partitioning the sequence, we also seek to apply a limited amount of reordering, so that the overall representation error is minimized. 
We formulate the problem of segmentation with rearrangements and we show that it is an NP-hard problem to solve or even to approximate. We devise effective algorithms for the proposed problem, combining ideas from dynamic programming and outlier detection algorithms in sequences. In the final part of the thesis, we discuss the problem of aggregating results of segmentation algorithms on the same set of data points. In this case, we are interested in producing a partitioning of the data that agrees as much as possible with the input partitions. We show that this problem can be solved optimally in polynomial time using dynamic programming. Furthermore, we show that not all data points are candidates for segment boundaries in the optimal solution.
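The standard dynamic programming approach mentioned in this abstract can be sketched as follows. This is a minimal illustration, not the thesis's own code: it assumes a one-dimensional sequence, piecewise-constant segments, and squared error as the homogeneity measure, and the function name `optimal_segmentation` is hypothetical.

```python
from typing import List, Tuple


def optimal_segmentation(seq: List[float], k: int) -> Tuple[float, List[int]]:
    """Split seq into exactly k contiguous segments with minimum squared error.

    Assumption: each segment is represented by its mean, and the error is the
    sum of squared deviations from that mean. Returns (error, segment starts).
    """
    n = len(seq)
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and len(seq)")

    # Prefix sums of values and squared values give any segment's error in O(1).
    s1 = [0.0] * (n + 1)
    s2 = [0.0] * (n + 1)
    for i, x in enumerate(seq):
        s1[i + 1] = s1[i] + x
        s2[i + 1] = s2[i] + x * x

    def sse(i: int, j: int) -> float:
        # Squared error of seq[i:j] around its own mean.
        total = s1[j] - s1[i]
        return (s2[j] - s2[i]) - total * total / (j - i)

    inf = float("inf")
    # cost[m][j]: least error when seq[:j] is covered by exactly m segments.
    cost = [[inf] * (n + 1) for _ in range(k + 1)]
    back = [[0] * (n + 1) for _ in range(k + 1)]
    cost[0][0] = 0.0
    for m in range(1, k + 1):
        for j in range(m, n + 1):
            for i in range(m - 1, j):  # the last segment is seq[i:j]
                c = cost[m - 1][i] + sse(i, j)
                if c < cost[m][j]:
                    cost[m][j] = c
                    back[m][j] = i

    # Walk the back-pointers to recover where each segment starts.
    starts = []
    j = n
    for m in range(k, 0, -1):
        j = back[m][j]
        starts.append(j)
    starts.reverse()
    return cost[k][n], starts
```

For example, `optimal_segmentation([1, 1, 1, 5, 5, 5, 9, 9], 3)` places the segment starts at indices 0, 3, and 6. The triple loop costs O(k n²) time, which is the kind of running time an approximation algorithm built on splitting the input into subsequences aims to reduce.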
Abstract:
Segmentation is a data mining technique yielding simplified representations of sequences of ordered points. A sequence is divided into some number of homogeneous blocks, and all points within a segment are described by a single value. The focus in this thesis is on piecewise-constant segments, where the most likely description for each segment and the most likely segmentation into some number of blocks can be computed efficiently. Representing sequences as segmentations is useful in, e.g., storage and indexing tasks in sequence databases, and segmentation can be used as a tool in learning about the structure of a given sequence. The discussion in this thesis begins with basic questions related to segmentation analysis, such as choosing the number of segments, and evaluating the obtained segmentations. Standard model selection techniques are shown to perform well for the sequence segmentation task. Segmentation evaluation is proposed with respect to a known segmentation structure. Applying segmentation on certain features of a sequence is shown to yield segmentations that are significantly close to the known underlying structure. Two extensions to the basic segmentation framework are introduced: unimodal segmentation and basis segmentation. The former is concerned with segmentations where the segment descriptions first increase and then decrease, and the latter with the interplay between different dimensions and segments in the sequence. These problems are formally defined and algorithms for solving them are provided and analyzed. Practical applications for segmentation techniques include time series and data stream analysis, text analysis, and biological sequence analysis. In this thesis segmentation applications are demonstrated in analyzing genomic sequences.
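To make the piecewise-constant representation concrete, the sketch below describes every point in a segment by a single value. It assumes the segment mean as that value (the maximum-likelihood choice under Gaussian noise, which is an assumption here rather than something the abstract states), and the helper name `describe_segments` is hypothetical.

```python
from typing import List, Sequence, Tuple


def describe_segments(
    seq: Sequence[float], starts: Sequence[int]
) -> List[Tuple[int, int, float, float]]:
    """Summarise each segment as (start, end, mean, squared error).

    starts must be strictly increasing, begin with 0, and stay below len(seq).
    """
    if not starts or starts[0] != 0:
        raise ValueError("the first segment must start at index 0")
    bounds = list(starts) + [len(seq)]
    if any(a >= b for a, b in zip(bounds, bounds[1:])):
        raise ValueError("segment starts must be strictly increasing and in range")

    summary = []
    for a, b in zip(bounds, bounds[1:]):
        block = seq[a:b]
        mean = sum(block) / len(block)
        error = sum((x - mean) ** 2 for x in block)
        summary.append((a, b, mean, error))
    return summary
```

Calling `describe_segments([2.0, 4.0, 10.0, 10.0], [0, 2])` yields two blocks with means 3.0 and 10.0. Summing the per-segment errors gives the representation error that model selection would weigh against the number of segments.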
Abstract:
The work covered in this thesis is focused on the development of technology for bioconversion of glucose into D-erythorbic acid (D-EA) and 5-ketogluconic acid (5-KGA). The task was to show on proof-of-concept level the functionality of the enzymatic conversion or one-step bioconversion of glucose to these acids. The feasibility of both studies to be further developed for production processes was also evaluated. The glucose - D-EA bioconversion study was based on the use of a cloned gene encoding a D-EA forming soluble flavoprotein, D-gluconolactone oxidase (GLO). GLO was purified from Penicillium cyaneo-fulvum and partially sequenced. The peptide sequences obtained were used to isolate a cDNA clone encoding the enzyme. The cloned gene (GenBank accession no. AY576053) is homologous to the other known eukaryotic lactone oxidases and also to some putative prokaryotic lactone oxidases. Analysis of the deduced protein sequence of GLO indicated the presence of a typical secretion signal sequence at the N-terminus of the enzyme. No other targeting/anchoring signals were found, suggesting that GLO is the first known lactone oxidase that is secreted rather than targeted to the membranes of the endoplasmic reticulum or mitochondria. Experimental evidence supports this analysis, as near complete secretion of GLO was observed in two different yeast expression systems. Highest expression levels of GLO were obtained using Pichia pastoris as an expression host. Recombinant GLO was characterised and the suitability of purified GLO for the production of D-EA was studied. Immobilised GLO was found to be rapidly inactivated during D-EA production. The feasibility of in vivo glucose - D-EA conversion using a P. pastoris strain co-expressing the genes of GLO and glucose oxidase (GOD, E.C. 1.1.3.4) of A. niger was demonstrated. The glucose - 5-KGA bioconversion study followed a similar strategy to that used in the D-EA production research. 
The rationale was based on the use of a cloned gene encoding a membrane-bound pyrroloquinoline quinone (PQQ)-dependent gluconate 5-dehydrogenase (GA 5-DH). GA 5-DH was purified to homogeneity from the only source of this enzyme known in literature, Gluconobacter suboxydans, and partially sequenced. Using the amino acid sequence information, the GA 5-DH gene was cloned from a genomic library of G. suboxydans. The cloned gene was sequenced (GenBank accession no. AJ577472) and found to be an operon of two adjacent genes encoding two subunits of GA 5-DH. It turned out that GA 5-DH is a rather close homologue of a sorbitol dehydrogenase from another G. suboxydans strain. It was also found that GA 5-DH has significant polyol dehydrogenase activity. The G. suboxydans GA 5-DH gene was poorly expressed in E. coli. Under optimised conditions maximum expression levels of GA 5-DH did not exceed the levels found in wild-type G. suboxydans. Attempts to increase expression levels resulted in repression of growth and extensive cell lysis. However, the expression levels were sufficient to demonstrate the possibility of bioconversion of glucose and gluconate into 5-KGA using recombinant strains of E. coli. An uncharacterised homologue of GA 5-DH was identified in Xanthomonas campestris using in silico screening. This enzyme encoded by chromosomal locus NP_636946 was found by a sequencing project of X. campestris and named as a hypothetical glucose dehydrogenase. The gene encoding this uncharacterised enzyme was cloned, expressed in E. coli and found to encode a gluconate/polyol dehydrogenase without glucose dehydrogenase activity. Moreover, the X. campestris GA 5-DH gene was expressed in E. coli at nearly 30 times higher levels than the G. suboxydans GA 5-DH gene. Good expressability of the X. campestris GA-5DH gene makes it a valuable tool not only for 5-KGA production in the tartaric acid (TA) bioprocess, but possibly also for other bioprocesses (e.g. 
oxidation of sorbitol into L-sorbose). In addition to glucose - 5-KGA bioconversion, a preliminary study of the feasibility of enzymatic conversion of 5-KGA into TA was carried out. Here, the efficacy of the first step of a prospective two-step conversion route including a transketolase and a dehydrogenase was confirmed. It was found that transketolase converts 5-KGA into TA semialdehyde. A candidate for the second step was suggested to be succinic dehydrogenase, but this was not tested. The analysis of the two subprojects indicated that bioconversion of glucose to TA using X. campestris GA 5-DH should be prioritised, and that future process development efforts should focus on developing more efficient GA 5-DH production strains by screening for a more suitable production host and by protein engineering.
Abstract:
Genetic studies on phylogeography and adaptive divergence in Northern Hemisphere fish species such as three-spined stickleback (Gasterosteus aculeatus) provide an excellent opportunity to investigate genetic mechanisms underlying population differentiation. According to theory, the process of population differentiation results from a complex interplay between random and deterministic processes as well as historical factors. The main aim of this thesis was to study how historical factors like the Pleistocene ice ages have shaped the patterns of molecular diversity in three-spined stickleback populations in Europe and how this information could be utilized in the conservation genetic context. Furthermore, identifying footprints of natural selection at the DNA level might be used in identifying genes involved in evolutionary change. Overall, the results from phylogeographic studies indicate that the three-spined stickleback has colonized the Atlantic basin relatively recently but constitutes three major evolutionary lineages in Europe. In addition, the colonization of freshwater appears to result from multiple and independent invasions by the marine conspecifics. Molecular data together with morphology suggest that the most divergent freshwater populations are located in the Balkan Peninsula and these populations deserve a special conservation genetic status without warranting further taxonomical classification. In order to investigate the adaptive divergence in Fennoscandian three-spined stickleback populations, several approaches were used. First, sequence variability in the Eda gene, which determines the number of lateral plates, was concordant with the previously observed global pattern. The full-plated allele occurs at high frequency among marine populations, whereas the low-plated allele dominates in the freshwater populations. Second, a microsatellite-based genome scan identified indications of both balancing and directional selection in the three-spined stickleback genome, i.e. 
loci with unusually similar or unusually different allele frequencies over populations. The directionally selected loci were mainly associated with the adaptation to freshwater. A follow up study conducting a more detailed analysis in a chromosome region containing a putatively selected gene locus identified a fairly large genomic region affected by natural selection. However, this region contained several gene predictions, all of which might be the actual target of natural selection. All in all, the phylogeographic and adaptive divergence studies indicate that most of the genetic divergence has occurred in the freshwater populations whereas the marine populations have remained relatively uniform.
Abstract:
The time of the large sequencing projects has enabled unprecedented possibilities of investigating more complex aspects of living organisms. Among the high-throughput technologies based on the genomic sequences, the DNA microarrays are widely used for many purposes, including the measurement of the relative quantity of the messenger RNAs. However, the reliability of microarrays has been strongly doubted as robust analysis of the complex microarray output data has been developed only after the technology had already been spread in the community. An objective of this study consisted of increasing the performance of microarrays, and was measured by the successful validation of the results by independent techniques. To this end, emphasis has been given to the possibility of selecting candidate genes with remarkable biological significance within specific experimental design. Along with literature evidence, the re-annotation of the probes and model-based normalization algorithms were found to be beneficial when analyzing Affymetrix GeneChip data. Typically, the analysis of microarrays aims at selecting genes whose expression is significantly different in different conditions followed by grouping them in functional categories, enabling a biological interpretation of the results. Another approach investigates the global differences in the expression of functionally related groups of genes. Here, this technique has been effective in discovering patterns related to temporal changes during infection of human cells. Another aspect explored in this thesis is related to the possibility of combining independent gene expression data for creating a catalog of genes that are selectively expressed in healthy human tissues. Not all the genes present in human cells are active; some involved in basic activities (named housekeeping genes) are expressed ubiquitously. 
Other genes (named tissue-selective genes) provide more specific functions and are expressed preferentially in certain cell types or tissues. Defining the tissue-selective genes is also important as these genes can cause disease with a phenotype in the tissues where they are expressed. The hypothesis that gene expression could be used as a measure of the relatedness of the tissues has also been confirmed. Microarray experiments provide long lists of candidate genes that are often difficult to interpret and prioritize. Extending the power of microarray results is possible by inferring the relationships of genes under certain conditions. Transcription of a gene is constantly regulated by the coordinated binding of proteins, named transcription factors, to specific portions of its promoter sequence. In this study, the analysis of promoters from groups of candidate genes has been utilized for predicting gene networks and highlighting modules of transcription factors playing a central role in the regulation of their transcription. Specific modules have been found regulating the expression of genes selectively expressed in the hippocampus, an area of the brain having a central role in major depressive disorder. Similarly, gene networks derived from microarray results have elucidated aspects of the development of the mesencephalon, another region of the brain, which is involved in Parkinson's disease.
Abstract:
Gastric cancer is the fourth most common cancer and the second most common cause of cancer-related death worldwide. Due to lack of early symptoms, gastric cancer is characterized by late stage diagnosis and unsatisfactory options for curative treatment. Several genomic alterations have been identified in gastric cancer, but the major factors contributing to initiation and progression of gastric cancer remain poorly known. Gene copy number alterations play a key role in the development of gastric cancer, and a change in gene copy number is one of the fundamental mechanisms for a cancer cell to control the expression of potential oncogenes and tumor suppressor genes. This thesis aims at clarifying the complex genomic alterations of gastric cancer to identify novel molecular biomarkers for diagnostic purposes as well as for targeted treatment. To highlight genes of potential biological and clinical relevance, we carried out a systematic microarray-based survey of gene expression and copy number levels in primary gastric tumors and gastric cancer cell lines. Results were validated using immunohistochemistry, real-time qRT-PCR, and affinity capture-based transcript (TRAC) assay. Altogether 192 clinical gastric tissue samples and 7 gastric cancer cell lines were included in this study. Multiple chromosomal regions with recurrent copy number alterations were detected. The most frequent chromosomal alterations included gains at 7q, 8q, 17q, 19q, and 20q and losses at 9p, 18q, and 21q. Distinctive patterns of copy number alterations were detected for different histological subtypes (intestinal and diffuse) and for cancers located in different parts of the stomach. The impact of copy number alterations on gene expression was significant, as 6-10% of genes located in the regions of gains and losses also showed concomitant alterations in their expression. 
By combining the information from the DNA- and RNA-level analyses, many novel gastric cancer-related genes, such as ALPK2, ENAH, HHIPL2, and OSMR, were identified. Independent genome-wide gene expression analysis of Finnish and Japanese gastric tumors revealed an additional set of genes that was differentially expressed in cancerous gastric tissues compared with normal tissue. Overexpression of one of these genes, CXCL1, was associated with improved survival of gastric cancer patients. Thus, using an integrative microarray analysis, several novel genes were identified that may be critically important for gastric carcinogenesis. Further studies of these genes may lead to novel biomarkers for gastric cancer diagnosis and targeted therapy.
Abstract:
Extraintestinal pathogenic Escherichia coli (ExPEC) represent a diverse group of strains of E. coli, which infect extraintestinal sites, such as the urinary tract, the bloodstream, the meninges, the peritoneal cavity, and the lungs. Urinary tract infections (UTIs) caused by uropathogenic E. coli (UPEC), the major subgroup of ExPEC, are among the most prevalent microbial diseases worldwide and a substantial burden for public health care systems. UTIs are responsible for serious morbidity and mortality in the elderly, in young children, and in immune-compromised and hospitalized patients. ExPEC strains are different, both from genetic and clinical perspectives, from commensal E. coli strains belonging to the normal intestinal flora and from intestinal pathogenic E. coli strains causing diarrhea. ExPEC strains are characterized by a broad range of alternate virulence factors, such as adhesins, toxins, and iron accumulation systems. Unlike diarrheagenic E. coli, whose distinctive virulence determinants evoke characteristic diarrheagenic symptoms and signs, ExPEC strains are exceedingly heterogeneous and are not known to possess any specific virulence factor or set of factors that is obligatory for the infection of a certain extraintestinal site (e.g. the urinary tract). The ExPEC genomes are highly diverse mosaic structures in permanent flux. These strains have obtained a significant amount of DNA (predicted to be up to 25% of the genomes) through acquisition of foreign DNA from diverse related or non-related donor species by lateral transfer of mobile genetic elements, including pathogenicity islands (PAIs), plasmids, phages, transposons, and insertion elements. The ability of ExPEC strains to cause disease is mainly derived from this horizontally acquired gene pool; the foreign DNA facilitates rapid adaptation of the pathogen to changing conditions and hence extends the spectrum of sites that can be infected. 
However, neither the amount of unique DNA in different ExPEC strains (or UPEC strains) nor the mechanisms lying behind the observed genomic mobility are known. Due to this extreme heterogeneity of the UPEC and ExPEC populations in general, the routine surveillance of ExPEC is exceedingly difficult. In this project, we presented a novel virulence gene algorithm (VGA) for the estimation of the extraintestinal virulence potential (VP, pathogenicity risk) of clinically relevant ExPECs and fecal E. coli isolates. The VGA was based on a DNA microarray specific for the ExPEC phenotype (ExPEC pathoarray). This array contained 77 DNA probes homologous with known (e.g. adhesion factors, iron accumulation systems, and toxins) and putative (e.g. genes predictably involved in adhesion, iron uptake, or in metabolic functions) ExPEC virulence determinants. In total, 25 of the DNA probes homologous with known virulence factors and 36 of the DNA probes representing putative extraintestinal virulence determinants were found at significantly higher frequency in virulent ExPEC isolates than in commensal E. coli strains. We showed that the ExPEC pathoarray and the VGA could be readily used for the differentiation of highly virulent ExPECs both from less virulent ExPEC clones and from commensal E. coli strains. When the VGA was applied to a group of unknown ExPECs (n=53) and fecal E. coli isolates (n=37), 83% of strains were correctly identified as extraintestinal virulent or commensal E. coli. Conversely, 15% of clinical ExPECs and 19% of fecal E. coli strains failed to fall into their respective pathogenic and non-pathogenic groups. Clinical data and virulence gene profiles of these strains were consistent with the estimated VPs; UPEC strains with atypically low risk-ratios were largely isolated from patients with certain medical history, including diabetes mellitus or catheterization, or from elderly patients. In addition, fecal E. 
coli strains with VPs characteristic for ExPEC were shown to represent the diagnostically important fraction of resident strains of the gut flora with a high potential of causing extraintestinal infections. Interestingly, a large fraction of DNA probes associated with the ExPEC phenotype corresponded to novel DNA sequences without any known function in UTIs and thus represented new genetic markers for extraintestinal virulence. These DNA probes included unknown DNA sequences originating from the genomic subtractions of four clinical ExPEC isolates as well as from five novel cosmid sequences identified in the UPEC strains HE300 and JS299. The characterized cosmid sequences (pJS332, pJS448, pJS666, pJS700, and pJS706) revealed complex modular DNA structures with known and unknown DNA fragments arranged in a puzzle-like manner and integrated into the common E. coli genomic backbone. Furthermore, cosmid pJS332 of the UPEC strain HE300, which carried a chromosomal virulence gene cluster (iroBCDEN) encoding the salmochelin siderophore system, was shown to be part of a transmissible plasmid of Salmonella enterica. Taken together, the results of this project pointed towards the assumptions that (i) homologous recombination, even within coding genes, contributes to the observed mosaicism of ExPEC genomes and (ii) besides en bloc transfer of large DNA regions (e.g. chromosomal PAIs), rearrangements of small DNA modules also provide a means of genomic plasticity. The data presented in this project supplemented previous whole genome sequencing projects of E. coli and indicated that each E. coli genome displays a unique assemblage of individual mosaic structures, which enable these strains to successfully colonize and infect different anatomical sites.
Abstract:
Evolutionary genetics incorporates traditional population genetics and studies of the origins of genetic variation by mutation and recombination, and the molecular evolution of genomes. Among the primary forces that have potential to affect the genetic variation within and among populations, including those that may lead to adaptation and speciation, are genetic drift, gene flow, mutations and natural selection. The main challenges in understanding the genetic basis of evolutionary change are to distinguish the adaptive selective forces that shape existing DNA sequence variants and to identify the nucleotide differences responsible for the observed phenotypic variation. To understand the effects of various forces, interpretation of gene sequence variation has been the principal basis of many evolutionary genetic studies. The main aim of this thesis was to assess different forms of teleost gene sequence polymorphisms in evolutionary genetic studies of Atlantic salmon (Salmo salar) and other species. Firstly, the extent to which Darwinian adaptive evolution has affected coding regions of the growth hormone (GH) gene during teleost evolution was investigated based on the sequence data existing in public databases. Secondly, a target gene approach was used to identify within-population variation in the growth hormone 1 (GH1) gene in salmon. Then, a new strategy for single nucleotide polymorphism (SNP) discovery in salmonid fishes was introduced, and, finally, the usefulness of a limited number of SNP markers as molecular tools in several applications of population genetics in Atlantic salmon was assessed. This thesis showed that the gene sequences in databases can be utilized to perform comparative studies of molecular evolution, and some putative evidence of the existence of Darwinian selection during teleost GH evolution was presented. 
In addition, existing sequence data were exploited to investigate GH1 gene variation within Atlantic salmon populations throughout the species' range. Purifying selection is suggested to be the predominant evolutionary force controlling the genetic variation of this gene in salmon, and some support for gene flow between continents was also observed. The novel approach to SNP discovery in species with duplicated genome fragments introduced here proved to be an effective method, and this may have several applications in evolutionary genetics with different species - e.g. when developing gene-targeted markers to investigate quantitative genetic variation. The thesis also demonstrated that even a few SNPs produced signals highly similar to those of microsatellite markers in some of the population genetic analyses. This may have useful applications when estimating genetic diversity in genes having a potential role in ecological and conservation issues, or when using difficult biological samples in genetic studies, as SNPs can be applied to relatively highly degraded DNA.
Abstract:
Visual pigments of different animal species must have evolved at some stage to match the prevailing light environments, since all visual functions depend on their ability to absorb available photons and transduce the event into a reliable neural signal. There is a large literature on correlation between the light environment and spectral sensitivity between different fish species. However, little work has been done on evolutionary adaptation between separated populations within species. More generally, little is known about the rate of evolutionary adaptation to changing spectral environments. The objective of this thesis is to illuminate the constraints under which the evolutionary tuning of visual pigments works, as evident in scope, tempo, available molecular routes, and signal/noise trade-offs. Aquatic environments offer Nature's own laboratories for research on visual pigment properties, as naturally occurring light environments offer an enormous range of variation in both spectral composition and intensity. The present thesis focuses on the visual pigments that serve dim-light vision in two groups of model species, teleost fishes and mysid crustaceans. The geographical emphasis is on the brackish Baltic Sea area with its well-known postglacial isolation history and its aquatic fauna of both marine and fresh-water origin. The absorbance spectra of the (single) dim-light visual pigment were recorded by microspectrophotometry (MSP) in single rods of 26 fish species and single rhabdoms of 8 opossum shrimp populations of the genus Mysis inhabiting marine, brackish or freshwater environments. Additionally, spectral sensitivity was determined from six Mysis populations by electroretinogram (ERG) recording. The rod opsin gene was sequenced in individuals of four allopatric populations of the sand goby (Pomatoschistus minutus). Rod opsins of two other goby species were investigated as outgroups for comparison. 
Rod absorbance spectra of the Baltic subspecies or populations of the primarily marine species herring (Clupea harengus membras), sand goby (P. minutus), and flounder (Platichthys flesus) were long-wavelength-shifted compared to their marine populations. The spectral shifts are consistent with adaptation for improved quantum catch (QC) as well as improved signal-to-noise ratio (SNR) of vision in the Baltic light environment. Since the chromophore of the pigment was pure A1 in all cases, this has apparently been achieved by evolutionary tuning of the opsin visual pigment. By contrast, no opsin-based differences were evident between lake and sea populations of species of fresh-water origin, which can tune their pigment by varying chromophore ratios. A more detailed analysis of differences in absorbance spectra and opsin sequence between and within populations was conducted using the sand goby as model species. Four allopatric populations from the Baltic Sea (B), Swedish west coast (S), English Channel (E), and Adriatic Sea (A) were examined. Rod absorbance spectra, characterized by the wavelength of maximum absorbance (λmax), differed between populations and correlated with differences in the spectral light transmission of the respective water bodies. The greatest λmax shift as well as the greatest opsin sequence difference was between the Baltic and the Adriatic populations. The significant within-population variation of the Baltic λmax values (506-511 nm) was analyzed on the level of individuals and was shown to correlate well with opsin sequence substitutions. The sequences of individuals with λmax at shorter wavelengths were identical to that of the Swedish population, whereas those with λmax at longer wavelengths additionally had substitution F261F/Y in the sixth transmembrane helix of the protein. This substitution (Y261) was also present in the Baltic common gobies and is known to redshift spectra. 
The tuning mechanism of the long-wavelength type Baltic sand gobies is assumed to be the co-expression of F261 and Y261 in all rods to produce ≈ 5 nm redshift. The polymorphism of the Baltic sand goby population possibly indicates ambiguous selection pressures in the Baltic Sea. The visual pigments of all lake populations of the opossum shrimp (Mysis relicta) were red-shifted by 25 nm compared with all Baltic Sea populations. This is calculated to confer a significant advantage in both QC and SNR in many humus-rich lakes with reddish water. Since only A2 chromophore was present, the differences obviously reflect evolutionary tuning of the visual protein, the opsin. The changes have occurred within the ca. 9000 years that the lakes have been isolated from the Sea after the most recent glaciation. At present, it seems that the mechanism explaining the spectral differences between lake and sea populations is not an amino acid substitution at any other conventional tuning site, but the mechanism is yet to be found.
Abstract:
The first part of this work investigates the molecular epidemiology of a human enterovirus (HEV), echovirus 30 (E-30). This project is part of a series of studies performed in our research team analyzing the molecular epidemiology of HEV-B viruses. A total of 129 virus strains had been isolated in different parts of Europe. The sequence analysis was performed in three different genomic regions: 420 nucleotides (nt) in the VP4/VP2 capsid protein coding region, the entire VP1 capsid protein coding gene of 876 nt, and 150 nt in the VP1/2A junction region. The analysis revealed a succession of dominant sublineages within a major genotype. The temporally earlier genotypes had been replaced by a genetically homogenous lineage that has been circulating in Europe since the late 1970s. The same genotype was found by other research groups in North America and Australia. Globally, other cocirculating genetic lineages also exist. The prevalence of a dominant genotype makes E-30 different from other previously studied HEVs, such as polioviruses and coxsackieviruses B4 and B5, for which several coexisting genetic lineages have been reported. The second part of this work deals with molecular epidemiology of human rhinoviruses (HRVs). A total of 61 field isolates were studied in the 420-nt stretch in the capsid coding region of VP4/VP2. The isolates were collected from children under two years of age in Tampere, Finland. Sequences from the clinical isolates clustered in the two previously known phylogenetic clades. Seasonal clustering was found. Also, several distinct serotype-like clusters were found to co-circulate during the same epidemic season. Reappearance of a cluster after disappearing for a season was observed. The molecular epidemiology of the analyzed strains turned out to be complex, and we decided to continue our studies of HRV. Only five previously published complete genome sequences of HRV prototype strains were available for analysis. 
Therefore, all designated HRV prototype strains (n=102) were sequenced in the VP4/VP2 region, and the possibility of genetic typing of HRV was evaluated. Seventy-six of the 102 prototype strains clustered in HRV genetic group A (HRV-A) and 25 in group B (HRV-B). Serotype 87 clustered separately from other HRVs with HEV species D. The field strains of HRV represented as many as 19 different genotypes, as judged with an approximate demarcation of a 20% nt difference in the VP4/VP2 region. The interserotypic differences of HRV were generally similar to those reported between different HEV serotypes (i.e. about 20%), but smaller differences, less than 10%, were also observed. Because some HRV serotypes are genetically so closely related, we suggest that the genetic typing be performed using the criterion "the closest prototype strain". This study is the first systematic genetic characterization of all known HRV prototype strains, providing a further taxonomic proposal for classification of HRV. We proposed to divide the genus Human rhinoviruses into HRV-A and HRV-B. The final part of the work comprises a phylogenetic analysis of a subset (48) of HRV prototype strains and field isolates (12) in the nonstructural part of the genome coding for the RNA-dependent RNA polymerase (3D). The proposed division of the HRV strains into the species HRV-A and HRV-B was also supported by the 3D region. HRV-B clustered closer to HEV species B, C, and also to polioviruses than to HRV-A. Intraspecies variation within both HRV-A and HRV-B was greater in the 3D coding region than in the VP4/VP2 coding region, in contrast to HEV. Moreover, the diversity of HRV in 3D exceeded that of HEV. One group of HRV-A, designated HRV-A', formed a separate cluster outside other HRV-A in the 3D region. It also formed a cluster in the capsid region, but one located within HRV-A. This may reflect a different evolutionary history of distinct genomic regions among HRV-A.
Furthermore, the tree topology within HRV-A in the 3D region differed from that in the VP4/VP2 region, suggesting possible recombination events in the evolution of the strains. No conflicting phylogenies were observed in any of the 12 field isolates. Possible recombination was further studied using the Similarity and Bootscanning analyses of the complete genome sequences of HRV available in public databases. Evidence for recombination among HRV-A was found, as HRV2 and HRV39 showed higher similarity in the nonstructural part of the genome. Whether the HRV2 and HRV39 strains, and perhaps also some other HRV-A strains not yet completely sequenced, are recombinants remains to be determined.
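The typing criterion and the similarity scan described in this abstract can be illustrated with a minimal sketch. It is not the authors' pipeline: the function names, the toy sequences, and the use of a plain uncorrected p-distance are assumptions made for illustration, and bootscanning itself (which needs bootstrapped trees) is not implemented. Only the approximate 20% demarcation is taken from the abstract.

```python
from typing import Dict, List, Tuple

# Approximate demarcation from the abstract (20% nt difference in VP4/VP2).
GENOTYPE_THRESHOLD = 0.20


def p_distance(a: str, b: str) -> float:
    """Proportion of differing sites between two aligned sequences.

    Columns where either sequence has a gap ('-') are skipped.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    pairs = [(x, y) for x, y in zip(a.upper(), b.upper())
             if x != "-" and y != "-"]
    if not pairs:
        raise ValueError("no comparable sites")
    return sum(x != y for x, y in pairs) / len(pairs)


def closest_prototype(query: str,
                      prototypes: Dict[str, str]) -> Tuple[str, float]:
    """Return (name, distance) of the prototype nearest to the query.

    This mirrors the "closest prototype strain" idea: the query is
    assigned to whichever prototype differs least, even when several
    prototypes fall under the demarcation threshold.
    """
    return min(((name, p_distance(query, seq))
                for name, seq in prototypes.items()),
               key=lambda item: item[1])


def window_similarity(a: str, b: str,
                      window: int, step: int) -> List[Tuple[int, float]]:
    """Similarity (1 - p-distance) in sliding windows along an alignment.

    A region where two strains become unusually similar compared with the
    rest of the genome is a hint, not proof, of recombination.
    """
    return [(start, 1.0 - p_distance(a[start:start + window],
                                     b[start:start + window]))
            for start in range(0, len(a) - window + 1, step)]


# Toy aligned sequences (hypothetical, far shorter than a real 420-nt region).
toy_prototypes = {"P1": "ACGTACGTAC", "P2": "ACGTTCGAAC"}
name, dist = closest_prototype("ACGTACGTAA", toy_prototypes)
same_genotype = dist < GENOTYPE_THRESHOLD
scan = window_similarity("AAAAAAAACCCC", "AAAAAAAAGGGG", window=4, step=4)
```

In practice the distances would be computed on a real multiple alignment and usually with a substitution model rather than a raw p-distance; the sketch only shows the shape of the two calculations.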
Resumo:
In the boreal coniferous forest zone, nitrogen is often the factor limiting plant growth. The nitrogen reserves of forest soil consist mainly of nitrogen compounds bound to organic matter, especially amino acids. Ectomycorrhizal fungi take part in nitrogen cycling in forest soil by breaking down organic nitrogen compounds and transporting them for plants to use. So far, fairly little is known about the mineralization of amino acids inside the fungal cell. Amino acid oxidases catalyze the mineralization of amino acids. L-amino acid oxidases (LAO) have been demonstrated in some genera of ectomycorrhiza-forming basidiomycetes. Until now, no LAO gene had been known from basidiomycetes. This work describes a LAO gene from basidiomycetes for the first time. The nucleotide sequences of the 5´ and 3´ ends of the cDNA of the LAO1 gene of hiekkatympönen (a Hebeloma species) were determined by RACE-PCR, and the resulting sequences were used to design primers for amplifying the cDNA and genomic DNA of the entire gene. The structure of the hiekkatympönen LAO1 gene was determined from the genomic DNA and cDNA sequences. The gene consists of five exons and four introns. A binding site for a protein involved in the regulation of nitrogen metabolism was found in the region upstream of the hiekkatympönen LAO1 gene. The partial genomic DNA sequence of the gene preceding LAO1 was determined. In the genome of kangaslohisieni (Laccaria bicolor), the gene preceding LAO1 had been predicted to be a pyruvate decarboxylase. In addition, the partial cDNA sequence of a second LAO homolog of hiekkatympönen was determined. The LAO gene of another basidiomycete, Laccaria bicolor, was also identified. The Laccaria bicolor gene model identified as a LAO gene had previously been annotated in the NCBI database as a protein of unknown function. Based on the protein phylogenetic tree, the ancestral form of the LAO of hiekkatympönen and Laccaria bicolor has undergone duplication.
The results of this work provide entirely new molecular-level information on the catabolic reactions of amino acids in ectomycorrhizal fungi. The ammonium ions formed as a result of amino acid mineralization may also be a significant nitrogen source for other soil microbes and for plants. It is possible that the LAO enzyme of ectomycorrhizal fungi is one significant factor in the nitrogen cycle of forest soil.