931 results for Bioinformatics
Abstract:
In oxygenic photosynthesis, the highly oxidizing reactions of water splitting produce reactive oxygen species (ROS) and other radicals that could damage the photosynthetic apparatus and affect cell viability. Under particular environmental conditions, more electrons are produced in water oxidation than can be harmlessly used by photochemical processes for the reduction of metabolic electron sinks. In these circumstances, the excess of electrons can be delivered, for instance, to O2, resulting in the production of ROS. To prevent detrimental reactions, a diversified assortment of photoprotection mechanisms has evolved in oxygenic photosynthetic organisms. In this thesis, I focus on the role of alternative electron transfer routes in photoprotection of the cyanobacterium Synechocystis sp. PCC 6803. Firstly, I discovered a novel subunit of the NDH-1 complex, NdhS, which is necessary for cyclic electron transfer around Photosystem I, and provides tolerance to high light intensities. Cyclic electron transfer is important in modulating the ATP/NADPH ratio under stressful environmental conditions. The NdhS subunit is conserved in many oxygenic phototrophs, such as cyanobacteria and higher plants. NdhS has been shown to link linear electron transfer to cyclic electron transfer by forming a bridge for electrons accumulating in the Ferredoxin pool to reach the NDH-1 complexes. Secondly, I thoroughly investigated the role of the entire flv4-2 operon in the photoprotection of Photosystem II under air level CO2 conditions and varying light intensities. The operon encodes three proteins: two flavodiiron proteins Flv2 and Flv4 and a small Sll0218 protein. Flv2 and Flv4 are involved in a novel electron transport pathway diverting electrons from the QB pocket of Photosystem II to electron acceptors, which still remain unknown. 
In my work, it is shown that the flv4-2 operon-encoded proteins safeguard Photosystem II activity by sequestering electrons and maintaining the oxidized state of the PQ pool. Further, Flv2/Flv4 was shown to boost Photosystem II activity by accelerating forward electron flow, triggered by an increased redox potential of QB. The Sll0218 protein was shown to be differentially regulated as compared to Flv2 and Flv4. Sll0218 appeared to be essential for Photosystem II accumulation and was assigned a stabilizing role for Photosystem II assembly/repair. It was also shown to be responsible for optimized light-harvesting. Thus, Sll0218 and Flv2/Flv4 cooperate to protect and enhance Photosystem II activity. Sll0218 ensures an increased number of active Photosystem II centers that efficiently capture light energy from antennae, whilst the Flv2/Flv4 heterodimer provides a higher electron sink availability, in turn promoting a safer and enhanced activity of Photosystem II. This intertwined function was shown to result in lowered singlet oxygen production. The flv4-2 operon-encoded photoprotective mechanism disperses excess excitation pressure in a complementary manner with the Orange Carotenoid Protein-mediated non-photochemical quenching. Bioinformatics analyses provided evidence for the loss of the flv4-2 operon in the genomes of cyanobacteria that have developed a stress-inducible D1 form. However, the occurrence of various mechanisms that dissipate excitation pressure at the acceptor side of Photosystem II was revealed in evolutionarily distant clades of organisms, i.e. cyanobacteria, algae and plants.
Abstract:
Bioinformatics applies computers to problems in molecular biology. Previous research has not addressed edit metric decoders. Decoders for quaternary edit metric codes are finding use in bioinformatics problems with applications to DNA. By using side effect machines we hope to provide efficient decoding algorithms for this open problem. Two ideas for decoding algorithms are presented and examined. Both decoders use Side Effect Machines (SEMs), which are generalizations of finite state automata. Single Classifier Machines (SCMs) use a single side effect machine to classify all words within a code. Locking Side Effect Machines (LSEMs) use multiple side effect machines to create a tree structure of subclassification. The goal is to examine these techniques and provide new decoders for existing codes. Ideas for best practices in creating these two new types of edit metric decoders are also presented.
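To make the decoding problem concrete, the sketch below shows a brute-force baseline: it maps a received word to the nearest codeword under the edit (Levenshtein) metric. It is not the SEM, SCM or LSEM approach from the abstract; the function names and the toy quaternary code `TOY_CODE` are assumptions made for illustration only.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum number of insertions, deletions
    and substitutions needed to turn `a` into `b`."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # match or substitute
        prev = curr
    return prev[-1]


def decode(word: str, code: list[str]) -> str:
    """Return the codeword closest to `word` (first one on ties)."""
    return min(code, key=lambda c: edit_distance(word, c))


# Hypothetical toy code over the DNA alphabet {A, C, G, T}.
TOY_CODE = ["AAAAAA", "CCCCCC", "GGGGGG", "TTTTTT"]
```

This baseline compares the received word against every codeword, so its cost grows with the size of the code; that cost is the kind of inefficiency a dedicated decoder would aim to avoid.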
Abstract:
Understanding the machinery of gene regulation to control gene expression has been one of the main focuses of bioinformaticians for years. We use a multi-objective genetic algorithm to evolve a specialized version of side effect machines for degenerate motif discovery. We compare some suggested objectives for the motifs they find, test different multi-objective scoring schemes and probabilistic models for the background sequence models and report our results on a synthetic dataset and some biological benchmarking suites. We conclude with a comparison of our algorithm with some widely used motif discovery algorithms in the literature and suggest future directions for research in this area.
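As a small illustration of what a degenerate motif is, the sketch below matches a motif written in IUPAC nucleotide codes against a DNA sequence. It does not reproduce the genetic algorithm or the side effect machines described in the abstract; the function names and the example sequence are assumptions.

```python
# IUPAC nucleotide codes: each symbol stands for a set of allowed bases.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def motif_matches(motif: str, site: str) -> bool:
    """True if every base of `site` is allowed by the motif symbol at that position."""
    return len(motif) == len(site) and all(
        base in IUPAC[symbol] for symbol, base in zip(motif, site)
    )


def find_sites(motif: str, sequence: str) -> list[int]:
    """Return the 0-based start positions where the degenerate motif occurs."""
    width = len(motif)
    return [i for i in range(len(sequence) - width + 1)
            if motif_matches(motif, sequence[i:i + width])]
```

For example, the degenerate motif `CANNTG` matches both `CAGCTG` and `CACGTG`, so a single motif can describe a family of related binding sites.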
Abstract:
Scientists have debated the origin of life on Earth for decades. A number of hypotheses have been proposed as to whether RNA or DNA emerged first, with most scientists favouring the "RNA World" hypothesis. Assuming RNA emerged first, it follows that RNA polymerases would have appeared before DNA polymerases. Using recombinant DNA technology and bioinformatics, we undertook this study to explore the relationship between RNA polymerases, reverse transcriptases and DNA polymerases. The working hypothesis is that DNA polymerases evolved from reverse transcriptases and that the latter evolved from RNA polymerases. If this hypothesis is correct, one would expect to find various ancient DNA polymerases with varying levels of reverse transcriptase activity. In the first phase of this research project, multiple sequence alignments were made of the protein sequences of 32 prokaryotic DNA-directed DNA polymerases originating from 11 prokaryotic families against 3 viral reverse transcriptases. The data from these alignments were not very conclusive. DNA polymerases with higher levels of reverse transcriptase activity were not confined to ancient organisms, as one would have expected. The second phase of this project focused on conditions that may alter DNA polymerase activity. Various reaction conditions, such as temperature and different ions (Ni2+, Mn2+, Mg2+), were tested. Interestingly, it was found that the DNA polymerase from Thermus aquaticus can be made to copy RNA into DNA (i.e. reverse transcriptase activity). Thus it was shown that under appropriate conditions (ions and reaction temperatures) reverse transcriptase activity can be induced in DNA polymerase. In the third phase of this study, recombinant DNA technology was used to generate a chimeric DNA polymerase in an attempt to identify the region(s) of the polymerase responsible for RNA-directed DNA polymerase activity.
The two DNA polymerases employed were those of Thermus aquaticus and Thermus thermophilus. As in the second phase, various reaction conditions were investigated. The data indicated that the newly engineered chimeric DNA polymerase can be induced to copy RNA into DNA. Thus the intrinsic reverse transcriptase activity found in ancient DNA polymerases was localized to a domain and can be induced via appropriate reaction conditions.
Abstract:
Genome sequence varies in numerous ways among individuals, although the gross architecture is fixed for all humans. Retrotransposons create one of the most abundant classes of structural variants in the human genome and are divided into many families, with certain members of some families, e.g., L1, Alu, SVA, and HERV-K, remaining active for transposition. Along with other types of genomic variants, retrotransposon-derived variants contribute to the whole spectrum of genome variants in humans. With the advancement of sequencing techniques, many human genomes are being sequenced at the individual level, fueling comparative research on these variants among individuals. In this thesis, the evolution and functional impact of structural variations are examined, focusing primarily on retrotransposons in the context of human evolution. The thesis comprises three studies, presented in three data chapters. First, the recent evolution of all human-specific AluYb members, representing the second most active subfamily of Alus, was tracked to identify their source/master copy using a novel approach. All human-specific AluYb elements from the reference genome were extracted and aligned with one another to construct clusters of similar copies, and each cluster was analyzed to infer the evolutionary relationships between its members. The approach resulted in the identification of one major driver copy of all human-specific Yb8 elements and the source copy of the Yb9 lineage. Three new subfamilies within the AluYb family (Yb8a1, Yb10 and Yb11) were also identified, with Yb11 being the youngest and most polymorphic. Second, an attempt to establish a relationship between transposable elements (TEs) and tandem repeats (TRs) was made at a genome-wide scale for the first time. Upon sequence comparison, positional cross-checking and other relevant analyses, it was observed that over 20% of all TRs are derived from TEs.
This result established the first connection between these two types of repetitive elements and extends our appreciation of the impact of TEs on genomes. Furthermore, only 6% of these TE-derived TRs follow the already postulated initiation and expansion mechanisms, suggesting that the others are likely to follow a yet-unidentified mechanism. Third, by combining multiple computational approaches involving all types of genetic variation published so far, including transposable elements, the first whole-genome sequence was constructed for the most recent common ancestor of all modern human populations, which diverged into different populations around 125,000-100,000 years ago. The study shows that the current reference genome sequence is 8.89 million base pairs larger than our common ancestor’s genome, a difference contributed by a whole spectrum of genetic mechanisms. The use of this ancestral reference genome to facilitate the analysis of personal genomes was demonstrated using an example genome, along with more insightful recent evolutionary analyses involving the Neanderthal genome. The three data chapters presented in this thesis conclude that tandem repeats and transposable elements are not two entirely distinct classes of elements, as over 20% of TRs are actually derived from TEs. Certain subfamilies of TEs are themselves still evolving, generating newer subfamilies. The evolutionary analyses of all TEs along with other genomic variants helped to construct the genome sequence of the most recent common ancestor of all modern human populations, which provides a better alternative to the human reference genome and can be a useful resource for the study of personal genomics, population genetics, and human and primate evolution.
Abstract:
DNA assembly is among the most fundamental and difficult problems in bioinformatics. Near-optimal assembly solutions are available for bacterial and other small genomes. However, assembling large and complex genomes, especially the human genome, using Next-Generation Sequencing (NGS) technologies has been shown to be very difficult because of the highly repetitive and complex nature of the human genome, short read lengths, uneven data coverage and tools that are not specifically built for human genomes. Moreover, many algorithms do not scale to human genome datasets containing hundreds of millions of short reads. The DNA assembly problem is usually divided into several subproblems, including DNA data error detection and correction, contig creation, scaffolding and contig orientation; each can be seen as a distinct research area. This thesis specifically focuses on creating contigs from the short reads and combining them with outputs from other tools in order to obtain better results. Three assemblers, SOAPdenovo [Li09], Velvet [ZB08] and Meraculous [CHS+11], are selected for comparative purposes in this thesis. The results show that this thesis’ work produces results comparable to those of other assemblers, and that combining our contigs with outputs from other tools produces the best results, outperforming all other investigated assemblers.
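The contig-creation step can be illustrated with a minimal de Bruijn graph sketch, the general technique used by assemblers such as Velvet and SOAPdenovo. This is not the thesis's own algorithm, which the abstract does not describe; the function names, the value of `k` and the toy reads are assumptions, and error correction, reverse complements and coverage are all ignored.

```python
from collections import defaultdict


def build_graph(reads, k):
    """Build successor and predecessor maps between (k-1)-mers,
    adding one edge for every k-mer observed in the reads."""
    succ, pred = defaultdict(set), defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            left, right = read[i:i + k - 1], read[i + 1:i + k]
            succ[left].add(right)
            pred[right].add(left)
    return succ, pred


def extend_contig(start, succ, pred):
    """Walk forward from `start` while the path is unambiguous
    (exactly one successor, which has exactly one predecessor)."""
    contig, node, seen = start, start, {start}
    while len(succ.get(node, ())) == 1:
        (nxt,) = succ[node]
        if nxt in seen or len(pred[nxt]) != 1:
            break  # stop at cycles and at branch points
        contig += nxt[-1]
        seen.add(nxt)
        node = nxt
    return contig
```

With the overlapping toy reads `ATGGCG`, `GGCGTG` and `CGTGCA` and `k = 4`, walking from `ATG` reconstructs the single contig `ATGGCGTGCA`. Repeats create branch points where the walk stops, which is one reason repetitive genomes are hard to assemble.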
Abstract:
Ordered gene problems are a very common class of optimization problems. Because of their popularity, countless algorithms have been developed in an attempt to find high-quality solutions to them. It is also common to see many different types of problems reduced to ordered gene style problems, since many popular heuristics and metaheuristics exist for them. Multiple ordered gene problems are studied, namely the travelling salesman problem, the bin packing problem, and the graph colouring problem. In addition, two bioinformatics problems not traditionally seen as ordered gene problems are studied: DNA error correction and DNA fragment assembly. These problems are studied with multiple variations and combinations of heuristics and metaheuristics with two distinct types of representations. The majority of the algorithms are built around the Recentering-Restarting Genetic Algorithm. The algorithm variations were successful on all problems studied, particularly on the two bioinformatics problems. For DNA error correction, multiple cases were found with 100% of the codes being corrected. The algorithm variations were also able to beat all other state-of-the-art DNA fragment assemblers on 13 out of 16 benchmark problem instances.
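The idea behind an ordered gene representation can be sketched as follows: a candidate solution is a permutation, and each problem supplies its own way of scoring that permutation. The example below scores the same kind of chromosome as a travelling salesman tour and as a bin packing order. It does not implement the Recentering-Restarting Genetic Algorithm; the function names and the toy instances are assumptions.

```python
import math


def tour_length(order, coords):
    """TSP objective: length of the closed tour visiting cities in `order`."""
    n = len(order)
    return sum(math.dist(coords[order[i]], coords[order[(i + 1) % n]])
               for i in range(n))


def first_fit_bins(order, sizes, capacity):
    """Bin packing objective: place items in `order` with the first-fit
    rule and return the number of bins used."""
    bins = []
    for item in order:
        for b in range(len(bins)):
            if bins[b] + sizes[item] <= capacity:
                bins[b] += sizes[item]
                break
        else:
            bins.append(sizes[item])  # no existing bin fits: open a new one
    return len(bins)
```

With item sizes `[5, 5, 4, 6]` and capacity 10, the order `[0, 1, 2, 3]` uses 2 bins while `[0, 2, 1, 3]` uses 3, so searching over permutations is enough to optimize the packing.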
Abstract:
The Madagascar periwinkle [Catharanthus roseus (L.) G. Don] is a commercially important horticultural flower species and is the only source for several pharmaceutically valuable monoterpenoid indole alkaloids (MIAs), including the powerful antihypertensive ajmalicine and the antineoplastic agents vincristine and vinblastine. While biosynthesis of MIA precursors has been elucidated, conversion of the common MIA precursor strictosidine to MIAs of different families, for example ajmalicine, catharanthine or vindoline, remains uncharacterized. Deglycosylation of strictosidine by the key enzyme Strictosidine beta-glucosidase (SGD) leads to a pool of uncharacterized reaction products that are diverted into the different MIA families, but the downstream reactions are uncharacterized. Screening of 3600 EMS (ethyl methane sulfonate) mutagenized C. roseus plants to identify mutants with altered MIA profiles yielded one plant with high ajmalicine, and low catharanthine and vindoline content. RNA sequencing and comparative bioinformatics of mutant and wildtype plants showed up-regulation of SGD and the transcriptional repressor Zinc finger Catharanthus transcription factor (ZCT1) in the mutant line. The increased SGD activity in mutants seems to yield a larger pool of uncharacterized SGD reaction products that are channeled away from catharanthine and vindoline towards biosynthesis of ajmalicine when compared to the wildtype. Further bioinformatic analyses, and crossings between mutant and wildtype suggest a transcription factor upstream of SGD and ZCT1 to be mutated, leading to up-regulation of Sgd and Zct1. The crossing experiments further show that biosynthesis of the different MIA families is differentially regulated and highly complex. Three new transcription factors were identified by bioinformatics that seem to be involved in the regulation of Zct1 and Sgd expression, leading to the high ajmalicine phenotype. 
Increased cathenamine reductase activity in the mutant converts the pool of SGD reaction products into ajmalicine and its stereoisomer tetrahydroalstonine. The stereochemistry of ajmalicine and tetrahydroalstonine biosynthesis in vivo and in vitro was further characterized. In addition, a new clade of perakine reductase-like enzymes was identified that reduces the SGD reaction product vallesiachotamine in a stereo-specific manner, characterizing one of the many reactions immediately downstream of SGD that determine the different MIA families. This study establishes that RNA sequencing and comparative bioinformatics, in combination with molecular and biochemical characterization, are valuable tools to determine the genetic basis for mutations that trigger phenotypes, and this approach can also be used for identification of new enzymes and transcription factors.
Abstract:
Affiliation: Institut de recherche en immunologie et en cancérologie, Université de Montréal
Abstract:
Affiliation: Claudia Kleinman, Nicolas Rodrigue & Hervé Philippe : Département de biochimie, Faculté de médecine, Université de Montréal
Abstract:
Affiliation: Centre Robert-Cedergren de l'Université de Montréal en bio-informatique et génomique & Département de biochimie, Université de Montréal
Abstract:
Affiliation: Département de biochimie, Faculté de médecine, Université de Montréal
Abstract:
Molecular phylogenetics provides a tool complementary to paleontological and geological studies by allowing phylogenetic relationships between species to be reconstructed and their divergence times to be estimated. However, when a phylogenetic tree is inferred, researchers focus mainly on the topology, that is, the relative branching order of the nodes. The branch lengths of the phylogeny are often regarded as by-products, nuisance parameters that carry little information. They nevertheless constitute the primary information for molecular dating. Saturation, the occurrence of multiple substitutions at the same position, is an artifact that leads to a systematic underestimation of branch lengths. We set out to estimate the influence of saturation and its impact on the estimation of divergence ages. We chose to study the mammalian mitochondrial genome, which is assumed to have a high level of saturation and is available for many species. In addition, the phylogenetic relationships of mammals are known, which allowed us to fix the topology and thereby control one of the parameters that influence branch lengths. We mainly used two approaches to improve the detection of multiple substitutions: (i) increasing the number of species in order to break the longest branches of the tree, and (ii) more or less realistic models of sequence evolution. The results showed that the underestimation of branch lengths was very large (up to a factor of 3) and that using a large number of species improves the detection of multiple substitutions far more than improving the models of sequence evolution does.
This suggests that even the most complex evolutionary models currently available (for example the CAT+Covarion model, which accounts for heterogeneity of the substitution process across positions and of evolutionary rates over time) are still far from capturing the full complexity of biological processes. Despite the extent of the branch-length underestimation, its impact on dating turned out to be relatively small, because the underestimation is more or less homothetic, that is, roughly proportional across branches. This is particularly true for the evolutionary models. However, since multiple substitutions are detected most effectively by breaking branches into the shortest possible fragments through the addition of species, the problem of bias in taxonomic sampling arises, a bias caused by extinction during the history of life on Earth. Because this bias causes a non-homothetic underestimation, we consider it essential to improve models of sequence evolution, and we propose that the protocol developed in this work will make it possible to evaluate their effectiveness with respect to saturation.
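The effect of saturation on branch lengths can be illustrated with the simplest substitution model, Jukes-Cantor (JC69), used here as a stand-in for the far richer models such as CAT+Covarion discussed in the abstract. Under JC69, a true distance of d substitutions per site yields an expected proportion of differing sites p = 3/4 (1 - exp(-4d/3)), which levels off at 3/4 as d grows. The function names below are assumptions.

```python
import math


def expected_p_distance(d: float) -> float:
    """Expected proportion of differing sites under JC69 for a true
    distance of `d` substitutions per site."""
    return 0.75 * (1.0 - math.exp(-4.0 * d / 3.0))


def jc69_distance(p: float) -> float:
    """JC69-corrected distance from an observed proportion of differing sites."""
    if not 0.0 <= p < 0.75:
        raise ValueError("p must lie in [0, 0.75) for the JC69 correction")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


if __name__ == "__main__":
    for d in (0.1, 0.5, 1.0, 2.0):
        p = expected_p_distance(d)
        print(f"true d = {d:.1f}  observed p = {p:.3f}  ratio d/p = {d / p:.2f}")
```

The observed proportion always falls below the true distance, and the gap widens on long branches. This is consistent with the abstract's observation that breaking long branches by adding species helps detect multiple substitutions.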
Abstract:
MicroRNAs belong to the family of small non-coding RNAs and act as inhibitors of messenger RNAs and/or their protein products. MicroRNAs differ from small interfering RNAs (siRNAs) in that they attenuate expression rather than eliminate it. In recent years, many microRNAs and their targets have been discovered in mammals and plants. Bioinformatics plays an important role in this field, and target-discovery programs have been made available to the scientific community. Each microRNA can regulate hundreds of genes, and the expression profiles of the latter can serve as classifiers for certain cancers. The design of artificial microRNAs is therefore justifiable, as one could target overexpressed oncogenes and promote the proliferation of healthy cells. A tool for creating artificial microRNAs, named MultiTar V1.0, was created and is available as a web application. The tool is based on structural and biochemical properties of microRNAs and uses tabu search, a metaheuristic. It is shown that microRNAs designed in silico can have effects when tested in vitro. The 3’UTR sequences of the genes E2F1, E2F2 and E2F3 were submitted as input to the MultiTar program, and the predicted microRNAs were then tested with luciferase assays, western blots and cell growth curves. At least one artificial microRNA is able to regulate all three genes in luciferase assays, and each of the microRNAs was able to regulate the expression of E2F1 and E2F2 in western blots. The growth curves show that each of the microRNAs interferes with cell growth. These results open new doors toward therapeutic possibilities.
Abstract:
Bioinformatics is a multidisciplinary field that uses biology, computer science, physics and mathematics to solve problems posed by biology. One of the themes of bioinformatics is the analysis of genomic sequences and the prediction of non-coding RNA genes. Non-coding RNAs (ncRNAs) are RNA molecules that are transcribed but not translated into protein and that have a function in the cell. Finding non-coding RNA genes with biochemical and molecular biology techniques is quite difficult and relatively costly. The prediction of ncRNA genes by bioinformatics methods is therefore an important challenge. This research describes a computational analysis to search for new ncRNAs in the pathogen Candida albicans, together with an experimental validation. Our strategy was a computational analysis combining several ncRNA identification programs. We validated a subset of the computational predictions with a DNA microarray experiment covering 1979 regions of the genome. Thanks to this experiment we identified 62 new transcripts in Candida albicans. This work also led to the development of an analysis method for tiling-array DNA microarrays. It also presents an attempt to improve ncRNA prediction with a method based on searching for RNA motifs in sequences.