10 results for Biocomputing
Abstract:
The activation domains (ADs) of the procarboxypeptidases of the A/B subfamily have always been surprising, since they account for a quarter of the proenzyme. Some studies have tried to uncover a possible alternative function for them, without success. However, the discovery of the high folding rate of the activation domain of human procarboxypeptidase A2 (ADA2h) led to the proposal that these domains might assist the folding of the enzymatic domain. Later, analysis of ADA2h folding at low pH revealed that this domain can form amyloid fibrils, and showed that increasing protein stability could prevent the formation of these aggregates. The thorough characterization of ADA2h folding made this protein a good amyloidogenic model, so a series of experiments was proposed, and carried out in the present work, to better understand this process. Aggregation kinetics studies were performed with 29 point variants of ADA2h to assess the contribution of the different amino acids of the polypeptide sequence. The contribution of stability was eliminated by using urea, and circular dichroism combined with a stopped-flow apparatus yielded two distinct rates, v1 and v2, which correspond to the formation of an intermediate and to its reorganization, respectively. Complementary experiments using infrared (IR) spectroscopy revealed the reorganization of the native state (in this case) to give the aggregated form. The IR kinetics showed that ADA2h forms the β structure typical of amyloid fibrils after first unfolding its α-helices. Finally, biocomputing studies were carried out to investigate possible alternative functions of the ADs. Structural superpositions appear to show similarity between the ADs and RNA recognition motif (RRM) domains.
This hypothesis was tested experimentally with ADA4h, which showed weak but detectable RNA binding.
Abstract:
Recently, kernel-based Machine Learning methods have gained great popularity in many data analysis and data mining fields: pattern recognition, biocomputing, speech and vision, engineering, remote sensing, etc. The paper describes the use of kernel methods to process large datasets from environmental monitoring networks. Several typical problems of the environmental sciences and their solutions provided by kernel-based methods are considered: classification of categorical data (soil type classification), mapping of continuous environmental and pollution information (pollution of soil by radionuclides), and mapping with auxiliary information (climatic data from the Aral Sea region). Promising developments, such as automatic emergency hot spot detection and monitoring network optimization, are discussed as well.
Abstract:
As a continuation of Jordi Romero’s final degree project “Desenvolupament d’un laboratori virtual per a les pràctiques de Biologia Molecular” (Development of a virtual laboratory for Molecular Biology practicals), a complementary molecule visualization tool has been built and integrated into the virtual laboratory itself. It is a tool for the graphical display of genes, ORFs, marks and restriction sequences of real or fictitious molecules. Being able to work with fictitious molecules is its great advantage over solutions such as GenBank, which only allows working with its own molecules. Working with fictitious molecules makes it an ideal solution for teaching, since it lets teachers carry out exercises or demonstrations with real molecules or with molecules designed specifically for the exercise at hand. In addition, it can display the different parts visually, either simultaneously or separately, which offers a first approach to interpreting the results. It also allows users to mark genes, create marks, locate restriction sequences and generate the ORFs of a molecule they create, or to modify an existing one. For the implementation, the idea of separating code from design in Flash applications has been maintained. To do so, the open-source platform Ariware ARPv2.02 was used, which provides a framework for developing object-oriented Flash applications with the code (ActionScript 2.0 classes) separated from the movieclip. Perl was used for data processing because it is widely used in bioinformatics and because of its computation speed. The generated data are stored in a MySQL database (freely distributed), from which data are extracted to generate XML files, using both PHP and the AMFPHP platform as the link between Flash and the other parts.
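The two sequence operations named in this abstract, locating restriction sequences and generating ORFs, can be illustrated with a minimal sketch. This is not the tool's Perl code: the function names `find_sites` and `find_orfs`, the forward-strand-only scan, and the `min_codons` threshold are all assumptions made for illustration.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_sites(sequence, site):
    """Return 0-based start positions of every occurrence of `site`,
    including overlapping ones."""
    sequence, site = sequence.upper(), site.upper()
    positions = []
    start = sequence.find(site)
    while start != -1:
        positions.append(start)
        start = sequence.find(site, start + 1)
    return positions


def find_orfs(sequence, min_codons=2):
    """Return (start, end) pairs for forward-strand ORFs.

    An ORF runs from an ATG to the first in-frame stop codon.
    `end` is exclusive and includes the stop codon; `min_codons`
    counts the codons before the stop (ATG included).
    """
    sequence = sequence.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(sequence) - 2, 3):
            codon = sequence[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3))
                start = None
    return sorted(orfs)
```

For example, `find_sites("GGAATTCCGAATTC", "GAATTC")` reports the two EcoRI recognition sequences in that fictitious molecule. A fuller version would also scan the reverse complement.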
Abstract:
DNA sequence representation methods are used to denote a gene structure effectively and help in the analysis of similarities and dissimilarities between coding sequences. Many different kinds of representations have been proposed in the literature. They can be broadly classified into Numerical, Graphical, Geometrical and Hybrid representation methods. Graphical and geometrical representation methods make DNA structure and function analysis easier, since they give a visual representation of a DNA structure. In numerical methods, numerical values are assigned to a sequence and digital signal processing methods are used to analyze the sequence. Hybrid approaches to analyzing DNA sequences are also reported in the literature. This paper reviews the latest developments in DNA sequence representation methods. We also present a taxonomy of the various methods and compare them wherever possible.
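As a rough illustration of the numerical approach described above, the sketch below uses one well-known encoding, binary indicator sequences (often called the Voss representation), and evaluates a single discrete Fourier transform coefficient. The abstract does not say which encodings the review covers, so this choice and the helper names `voss_indicators`, `power_at` and `total_power` are assumptions.

```python
import cmath


def voss_indicators(sequence):
    """Map a DNA string to four binary indicator sequences, one per base."""
    sequence = sequence.upper()
    return {
        base: [1 if symbol == base else 0 for symbol in sequence]
        for base in "ACGT"
    }


def power_at(signal, k):
    """Squared magnitude of the k-th DFT coefficient of a numeric signal."""
    n = len(signal)
    coeff = sum(
        x * cmath.exp(-2j * cmath.pi * k * t / n)
        for t, x in enumerate(signal)
    )
    return abs(coeff) ** 2


def total_power(sequence, k):
    """Sum of the k-th DFT power over the four indicator sequences."""
    return sum(power_at(sig, k) for sig in voss_indicators(sequence).values())
```

Once a sequence is turned into numeric signals this way, standard signal processing tools apply. For instance, evaluating `total_power(seq, len(seq) // 3)` probes the period-3 component that is commonly examined in coding regions.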
Abstract:
Biochemical computing is an emerging field of unconventional computing that attempts to process information with biomolecules and biological objects using digital logic. In this work we survey filtering, both in general and in biochemical computing, and summarize the experimental realization of an AND logic gate with sigmoid response in one of the inputs. The logic gate is realized with electrode-immobilized glucose-6-phosphate dehydrogenase enzyme, which catalyzes a reaction corresponding to the Boolean AND function. A kinetic model is also developed and used to evaluate the extent to which the performance of the experimentally realized logic gate is close to optimal.
Abstract:
Due to the wide diversity of unknown organisms in the environment, 99% of them cannot be grown in traditional culture media in laboratories. Therefore, metagenomics projects are proposed to study the microbial communities present in the environment using molecular techniques, especially sequencing. An accumulation of the sequences produced by these projects is therefore expected in the coming years. The sequences produced by genomics and metagenomics projects present several challenges for treatment, storage and analysis, such as the search for clones containing genes of interest. This work presents OCI Metagenomics, which allows the rules of clone selection in metagenomic libraries to be defined and managed dynamically, through an algebraic approach based on process algebra. Furthermore, a web interface was developed to allow researchers to easily create and execute their own rules to select clones in a genomic sequence database. This software has been tested on a metagenomic cosmid library, and it was able to select clones containing genes of interest. Copyright 2010 ACM.
Abstract:
The continuous increase in genome sequencing projects has produced a huge amount of data in the last 10 years: currently more than 600 prokaryotic and 80 eukaryotic genomes are fully sequenced and publicly available. However, sequencing a genome yields only raw nucleotide sequences. This is only the first step of the genome annotation process, which deals with assigning biological information to each sequence. Annotation is carried out at every level of the biological information processing mechanism, from DNA to protein, and cannot be accomplished by in vitro analysis alone, which is extremely expensive and time consuming when applied at this scale. Thus, in silico methods need to be used to accomplish the task. The aim of this work was the implementation of predictive computational methods to allow fast, reliable, and automated annotation of genomes and proteins starting from amino acid sequences. The first part of the work focused on the implementation of a new machine learning based method for predicting the subcellular localization of soluble eukaryotic proteins. The method is called BaCelLo, and was developed in 2006. Its main peculiarity is that it is independent of biases present in the training dataset, which cause the over-prediction of the most represented examples in all the other predictors developed so far. This important result was achieved through a modification, made by myself, of the standard Support Vector Machine (SVM) algorithm, creating the so-called Balanced SVM. BaCelLo is able to predict the most important subcellular localizations in eukaryotic cells, and three kingdom-specific predictors were implemented. In two extensive comparisons, carried out in 2006 and 2008, BaCelLo was reported to outperform all the currently available state-of-the-art methods for this prediction task. 
BaCelLo was subsequently used to completely annotate 5 eukaryotic genomes, by integrating it into a pipeline of predictors developed at the Bologna Biocomputing group by Dr. Pier Luigi Martelli and Dr. Piero Fariselli. An online database, called eSLDB, was developed by integrating, for each amino acid sequence extracted from the genome, the predicted subcellular localization with experimental and similarity-based annotations. In the second part of the work, a new machine learning based method was implemented for the prediction of GPI-anchored proteins. The method predicts from the raw amino acid sequence both the presence of the GPI anchor (by means of an SVM) and the position in the sequence of the post-translational modification event, the so-called ω-site (by means of a Hidden Markov Model (HMM)). The method is called GPIPE and was reported to greatly improve the prediction of GPI-anchored proteins over all previously developed methods. GPIPE was able to predict up to 88% of the experimentally annotated GPI-anchored proteins while keeping the false positive rate as low as 0.1%. GPIPE was used to completely annotate 81 eukaryotic genomes, and more than 15000 putative GPI-anchored proteins were predicted, 561 of which are found in H. sapiens. On average, 1% of a proteome is predicted as GPI-anchored. A statistical analysis of the composition of the regions surrounding the ω-site allowed the definition of specific amino acid abundances in the different regions considered. Furthermore, the hypothesis proposed in the literature that compositional biases are present among the four major eukaryotic kingdoms was tested and rejected. All the developed predictors and databases are freely available at: BaCelLo (http://gpcr.biocomp.unibo.it/bacello), eSLDB (http://gpcr.biocomp.unibo.it/esldb), GPIPE (http://gpcr.biocomp.unibo.it/gpipe).
Abstract:
Here I focus on three main topics that best cover the projects I worked on during my three-year PhD, which I spent in different research laboratories addressing computationally and practically important problems in modern molecular genomics. The first topic is the use of a livestock species (pigs) as a model of obesity, a complex human dysfunction. My efforts here concern the detection and annotation of Single Nucleotide Polymorphisms (SNPs). I developed a pipeline for mining human and porcine sequences. Starting from a set of human genes related to obesity, the platform returns a list of annotated porcine SNPs extracted from a new set of potential obesity genes. 565 of these SNPs were analyzed on an Illumina chip to test their involvement in obesity in a population of more than 500 pigs. Results will be discussed. The computational analyses and the experiments were done in collaboration with the Biocomputing group and Dr. Luca Fontanesi, respectively, under the direction of Prof. Rita Casadio at the University of Bologna, Italy. The second topic concerns developing a methodology, based on Factor Analysis, to simultaneously mine information from different levels of biological organization. With specific test cases we develop models of the complexity of the mRNA-miRNA molecular interaction in brain tumors, measured indirectly by microarray and quantitative PCR. This work was done under the supervision of Prof. Christine Nardini at the “CAS-MPG Partner Institute for Computational Biology” in Shanghai, China (co-founded by the Max Planck Society and the Chinese Academy of Sciences). The third topic concerns the development of a new method to overcome the variety of PCR technologies routinely adopted to characterize unknown flanking DNA regions of a viral integration locus of the human genome after clinical gene therapy. 
This new method is entirely based on next generation sequencing; it reduces the time required to detect insertion sites and decreases the complexity of the procedure. This work was done in collaboration with the group of Dr. Manfred Schmidt at the Nationales Centrum für Tumorerkrankungen (Heidelberg, Germany), supervised by Dr. Annette Deichmann and Dr. Ali Nowrouzi. Furthermore, I add as an Appendix the description of an R package for gene network reconstruction that I helped to develop for scientific usage (http://www.bioconductor.org/help/bioc-views/release/bioc/html/BUS.html).
Abstract:
Prokaryotic organisms are one of the most successful forms of life: they are present in all known ecosystems. The immense diversity of bacteria reflects their ability to colonise every environment. Human beings also host trillions of microorganisms in their body districts, including skin, mucosae, and gut. The same kind of symbiosis occurs in all other terrestrial and marine animals, as well as in plants. The term holobiont refers to the system that includes both the host and its symbiotic microbial species. The coevolution of bacteria within their ecological niches reflects the adaptation of both host and guest species, and it is shaped by complex interactions that are pivotal in determining the host state. Nowadays, thanks to current sequencing technologies (Next Generation Sequencing, NGS), we have unprecedented tools for investigating bacterial life by studying prokaryotic genome sequences. The NGS revolution has been sustained by advances in computational performance, in terms of speed, storage capacity, algorithm development and decreasing hardware costs, following Moore’s Law. Bioinformaticians and computational biologists design and implement ad hoc tools able to analyse high-throughput data and extract valuable biological information. Metagenomics requires the integration of life and computational sciences, and it is uncovering the immense diversity of the bacterial world. The present thesis work focuses mainly on the analysis of prokaryotic genomes from different perspectives. 
Supervised by two groups at the University of Bologna, the Biocomputing group and the group of Microbial Ecology of Health, I investigated three different topics: i) antimicrobial resistance, particularly with respect to missense point mutations involved in the resistant phenotype, ii) bacterial mechanisms involved in xenobiotic degradation, via the computational analysis of metagenomic samples, and iii) the variation of the human gut microbiota through ageing, in elderly and long-lived individuals.