894 results for SEQUENCE DATABASES


Relevance:

20.00%

Publisher:

Abstract:

The continuous growth of genome sequencing projects has produced a huge amount of data in the last 10 years: currently more than 600 prokaryotic and 80 eukaryotic genomes are fully sequenced and publicly available. However, sequencing a genome yields only raw nucleotide sequences. This is only the first step of the genome annotation process, which assigns biological information to each sequence. Annotation is carried out at every level of the biological information processing mechanism, from DNA to protein, and cannot be accomplished by in vitro analysis alone, which is extremely expensive and time consuming when applied at such a large scale. Thus, in silico methods are needed to accomplish the task. The aim of this work was the implementation of predictive computational methods to allow fast, reliable, and automated annotation of genomes and proteins starting from amino acid sequences. The first part of the work focused on the implementation of a new machine-learning-based method for predicting the subcellular localization of soluble eukaryotic proteins. The method is called BaCelLo and was developed in 2006. Its main peculiarity is that it is independent of biases present in the training dataset, which cause the over-prediction of the most represented examples in all the other available predictors developed so far. This important result was achieved through a modification, made by myself, to the standard Support Vector Machine (SVM) algorithm, creating the so-called Balanced SVM. BaCelLo is able to predict the most important subcellular localizations in eukaryotic cells, and three kingdom-specific predictors were implemented. In two extensive comparisons, carried out in 2006 and 2008, BaCelLo was reported to outperform all the currently available state-of-the-art methods for this prediction task.
BaCelLo was subsequently used to completely annotate 5 eukaryotic genomes by integrating it into a pipeline of predictors developed at the Bologna Biocomputing group by Dr. Pier Luigi Martelli and Dr. Piero Fariselli. An online database, called eSLDB, was developed by integrating, for each amino acid sequence extracted from the genome, the predicted subcellular localization with experimental and similarity-based annotations. In the second part of the work a new machine-learning-based method was implemented for the prediction of GPI-anchored proteins. The method efficiently predicts from the raw amino acid sequence both the presence of the GPI anchor (by means of an SVM) and the position in the sequence of the post-translational modification event, the so-called ω-site (by means of a Hidden Markov Model (HMM)). The method is called GPIPE and was reported to greatly improve the prediction of GPI-anchored proteins over all previously developed methods. GPIPE was able to predict up to 88% of the experimentally annotated GPI-anchored proteins while maintaining a false positive rate as low as 0.1%. GPIPE was used to completely annotate 81 eukaryotic genomes, and more than 15000 putative GPI-anchored proteins were predicted, 561 of which are found in H. sapiens. On average, 1% of a proteome is predicted to be GPI-anchored. A statistical analysis of the composition of the regions surrounding the ω-site allowed the definition of specific amino acid abundances in the different regions considered. Furthermore, the hypothesis proposed in the literature that compositional biases are present among the four major eukaryotic kingdoms was tested and rejected. All the developed predictors and databases are freely available at: BaCelLo http://gpcr.biocomp.unibo.it/bacello eSLDB http://gpcr.biocomp.unibo.it/esldb GPIPE http://gpcr.biocomp.unibo.it/gpipe

Relevance:

20.00%

Publisher:

Abstract:

In this paper, we address the challenge of gender classification using large databases of images with two goals. The first objective is to evaluate whether the error rate decreases compared to smaller databases. The second goal is to determine whether the classifier that provides the best classification rate for one database also improves the classification results for other databases, that is, the cross-database performance.

Relevance:

20.00%

Publisher:

Abstract:

Self-incompatibility (SI) systems have evolved in many flowering plants to prevent self-fertilization and thus promote outbreeding. Pear and apple, like many species belonging to the Rosaceae, exhibit RNase-mediated gametophytic self-incompatibility (GSI), a widespread system also found in the Solanaceae and Plantaginaceae. For this reason, pear orchards must contain at least two different cultivars that pollinate each other; to guarantee efficient cross-pollination, they should have overlapping flowering periods and must be genetically compatible. This compatibility is determined by the S-locus, which contains at least two genes encoding a female (pistil) and a male (pollen) determinant. The female determinant in the Rosaceae, Solanaceae and Plantaginaceae system is a stylar glycoprotein with ribonuclease activity (S-RNase) that acts as a specific cytotoxin in incompatible pollen tubes, degrading cellular RNAs. Since its identification, the S-RNase gene has been intensively studied, and the sequences of a large number of alleles are available in online databases. In contrast, the male determinant has only recently been identified as a pollen-expressed protein containing an F-box motif, called S-Locus F-box (abbreviated SLF or SFB). Since F-box proteins are best known for their participation in the SCF (Skp1 - Cullin - F-box) E3 ubiquitin ligase complex, which is involved in protein degradation through the 26S proteasome pathway, the male determinant is thought to mediate the ubiquitination of the S-RNases, targeting them for degradation in compatible pollen tubes. Attempts to clone SLF/SFB genes in the Pyrinae produced no results until very recently; in apple, the use of genomic libraries allowed the detection of two F-box genes linked to each S haplotype, called SFBB (S-locus F-Box Brothers). In Japanese pear, three SFBB genes linked to each haplotype were cloned from pollen cDNA.
The SFBB genes exhibit S haplotype-specific sequence divergence and pollen-specific expression; their multiplicity is a feature whose interpretation is unclear: it has been hypothesized that all of them participate in the S-specific interaction with the RNase, but it is also possible that only one of them is involved in this function. Moreover, even though the S locus male and female determinants are solely responsible for the specificity of pollen-pistil recognition, many other factors are thought to play a role in GSI; these are not linked to the S locus and act in an S-haplotype-independent manner. They can function in regulating the expression of S determinants (group 1 factors), modulating their activity (group 2), or acting downstream, in carrying out the acceptance or rejection of the pollen tube (group 3). This study aimed to elucidate the molecular mechanism of GSI in European pear (Pyrus communis) as well as in the other Pyrinae; it was divided into two parts, the first focusing on the characterization of male determinants, and the second on factors external to the S locus. The search for S locus F-box genes was primarily aimed at identifying such genes in European pear, for which sequence data are still not available; it also made it possible to investigate the S locus structure in the Pyrinae. The analysis was carried out on a pool of varieties of the three species Pyrus communis (European pear), Pyrus pyrifolia (Japanese pear), and Malus × domestica (apple); varieties carrying S haplotypes whose RNases are highly similar were chosen, in order to check whether the same level of similarity is also maintained between the male determinants. A total of 82 sequences were obtained, 47 of which represent the first S-locus F-box genes sequenced from European pear.
The sequence data strongly support the hypothesis that the S locus structure is conserved among the three species, and presumably among all the Pyrinae; at least five genes have homologs in the analysed S haplotypes, but the number of F-box genes surrounding the S-RNase could be even greater. The high level of sequence divergence and the similarity between alleles linked to highly conserved RNases suggest a shared ancestral polymorphism for the F-box genes as well. The F-box genes identified in European pear were mapped on a segregating population of 91 individuals from the cross 'Abbé Fétel' × 'Max Red Bartlett'. All the genes were placed on linkage group 17, where the S locus has been placed in both pear and apple maps, and proved to be strongly associated with the S-RNase gene. The linkage with the RNase was perfect for some of the F-box genes, while for others very rare single recombination events were identified. The second part of this study focused on the search for other genes involved in the SI response in pear; it aimed on the one hand at identifying genes differentially expressed in compatible and incompatible crosses, and on the other at cloning and characterizing the transglutaminase (TGase) gene, whose role may be crucial in pollen rejection. To identify differentially expressed genes, controlled pollinations were carried out in four combinations (self-pollination, incompatible, half-compatible and fully compatible cross-pollination); expression profiles were compared through cDNA-AFLP. Twenty-eight fragments displaying an expression pattern related to compatibility or incompatibility were identified, cloned and sequenced; sequence analysis made it possible to assign a putative annotation to some of them. The identified genes are involved in very different cellular processes or in defense mechanisms, suggesting a very complex change in gene expression following pollen/pistil recognition.
The pool of genes identified with this technique offers a good basis for further study toward a better understanding of how the SI response is carried out. Among the factors involved in the SI response, an important role may also be played by transglutaminase (TGase), an enzyme involved both in post-translational protein modification and in protein cross-linking. The TGase activity detected in pear styles was significantly higher after incompatible pollinations than after compatible ones, suggesting a role for this enzyme in the abnormal cytoskeletal reorganization observed during the pollen rejection reaction. The aim of this part of the work was thus to identify and clone the pear TGase gene; PCR amplification of fragments of this gene was achieved using primers designed on the alignment between the Arabidopsis TGase gene sequence and several apple EST fragments; the full-length coding sequence of the pear TGase gene was then cloned from cDNA, providing a valuable tool for further study of the in vitro and in vivo action of this enzyme.

Relevance:

20.00%

Publisher:

Abstract:

Longstanding taxonomic ambiguity and uncertainty surround the identification of the common (M. mustelus) and blackspotted (M. punctulatus) smooth-hound in the Adriatic Sea. The lack of a clear and accurate method of morphological identification, which leads to frequent misidentification, prevents the collation of species-specific landings and survey data for these fishes and hampers the delineation of the distribution ranges and stock boundaries of the species. In this context, adequate species-specific conservation and management strategies cannot be applied without risking population decline and local extinction. In this thesis I investigated the molecular ecology of the two smooth-hound sharks, which are abundant in the demersal trawl surveys carried out in the NC Adriatic Sea to monitor and assess fishery resources. Ecological and evolutionary relationships were assessed by two molecular tests: a DNA barcoding analysis to improve species identification (and consequently the knowledge of their spatial ecology and taxonomy) and a hybridization assay based on the nuclear codominant marker ITS2 to evaluate reproductive interactions (hybridization or gene introgression). The smooth-hound sharks (N=208) were collected during the MEDITS 2008 and 2010 campaigns along the Italian and Croatian coasts of the Adriatic Sea, in the Sicilian Channel and in the Algerian fisheries. Since identification based on morphological characters is not fully reliable, I performed a molecular identification of the specimens by producing for each one the cytochrome oxidase subunit 1 (COI) gene sequence (ca. 640 bp long) and comparing it with reference sequences from different databases (GenBank and BOLD). From these molecular ID data I inferred the distribution of the two target species in the NC Adriatic Sea. In almost all of the MEDITS hauls I found no evidence of species sympatry.
The data collected during the MEDITS survey showed almost entirely separate distributions of M. mustelus (confined to the Italian coasts) and M. punctulatus (confined to the Croatian coasts); just one sample (from the Gulf of Venice, where the ranges of the species probably overlap) contained catches of both species. Although these results suggested that no interaction occurred between the two target species, at least during the summer (the period in which the MEDITS survey is carried out), I still wanted to know whether there were inter-species reproductive interactions, so I developed a simple molecular genetic method to detect hybridization. This method is based on DNA sequence polymorphism among species at the nuclear ribosomal Internal Transcribed Spacer 2 (ITS2) locus. Its application to the 208 specimens collected raised important questions regarding the ecology of these two species in the Adriatic Sea. The results showed signs of hybridization and/or gene introgression in two sharks collected during the 2008 trawl survey and one collected during the 2010 survey along the Italian and Croatian coasts. If the hybrid nature of these individuals is confirmed, a spatiotemporal overlap in mating behaviour and ecology must occur. At the spatial level, the northern part of the Adriatic Sea (an area where the two species occur with a high frequency of immature individuals) could plausibly serve as a common nursery area for both species.
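
The barcoding step described in this abstract (comparing each specimen's COI sequence with reference sequences) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration: the function names, the 95% identity cut-off, and the short toy sequences, which are invented and are not real COI data. A real analysis would align ca. 640 bp sequences against GenBank or BOLD records with dedicated tools.

```python
def percent_identity(a: str, b: str) -> float:
    """Percent of matching positions between two pre-aligned, equal-length sequences.

    Positions where either sequence has a gap ('-') or an ambiguous base ('N')
    are skipped.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    matches = 0
    for x, y in zip(a.upper(), b.upper()):
        if x in "-N" or y in "-N":
            continue
        compared += 1
        matches += x == y
    if compared == 0:
        raise ValueError("no comparable positions")
    return 100.0 * matches / compared


def assign_species(query: str, references: dict, min_identity: float = 95.0):
    """Return (species, identity) for the best-matching reference.

    The species is None when the best identity is below min_identity
    (a hypothetical cut-off, not a value taken from the abstract).
    """
    best_species, best_score = None, -1.0
    for species, ref in references.items():
        score = percent_identity(query, ref)
        if score > best_score:
            best_species, best_score = species, score
    if best_score < min_identity:
        return None, best_score
    return best_species, best_score


# Invented toy sequences standing in for aligned COI references.
REFERENCES = {
    "Mustelus mustelus": "ATGGCACTTAGCCTACTAATCCGA",
    "Mustelus punctulatus": "ATGGCGCTTAGTCTACTGATTCGA",
}

if __name__ == "__main__":
    print(assign_species("ATGGCGCTTAGTCTACTGATTCGA", REFERENCES))
```

A specimen whose best match falls below the cut-off is left unassigned rather than forced into one of the two species, which mirrors the cautious use of barcoding when morphological identification is unreliable.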

Relevance:

20.00%

Publisher:

Abstract:

Wolf-Hirschhorn syndrome (WHS) is a complex and variable malformation and retardation syndrome that is caused by a deletion in the distal chromosomal region 4p16.3 and whose etiology and pathogenesis are still largely not understood. The aim of the present work was to identify and preliminarily characterize new genes that might be involved in the development of the syndrome. At the start of this work, the Wolf-Hirschhorn syndrome critical region (WHSCR) could be narrowed down to a region of approximately 2 Mb between the markers D4S43 and D4S142. To identify new genes, three larger genomic cosmid/PAC contigs (I-III) were first constructed in the region of markers D4S114 to D4S142 and examined for transcribed regions (exons) by exon amplification. A total of 67 putative 'exons' were isolated, some of which correspond to already known genes (ZNF141, PDEB, MYL5, GAK, DAGK4 and FGFR3). Two of these genes were mapped in this work to the distal region 4p16.3, either for the first time (DAGK4) or more precisely (GAK). Based on homology comparisons and/or EST cDNA homologies, the remaining exons can presumably be assigned to new genes or to pseudogenes (e.g. YWEE1hu). Because a further narrowing of the WHSCR to a 165 kb region proximal to the FGFR3 gene was published during the course of this work, further investigations concentrated on the detailed analysis of the WHSCR between marker D4S43 and FGFR3. Using exon amplification and computer-assisted analysis of the sequence data available for this region ('GRAIL', 'GENSCAN' and homology searches in the NCBI EST databases), several new genes were identified. In distal-to-proximal order, these are the genes LETM1, 51, 43, 45, 57 and POL4P.
LETM1 encodes a putative transmembrane protein with one leucine zipper and two EF-hand motifs and, owing to its possible involvement in Ca2+ homeostasis and/or signal transduction, could contribute to features of WHS (seizures, mental retardation and muscular hypotonia). Gene 51 corresponds to a gene described at about the same time by Stec et al. (1998) and Chesi et al. (1998) as WHSC1 and MMSET, respectively, and was therefore not characterized further. It is ubiquitously expressed, as is gene 43, which was described at the same time by Wright et al. (1999b) as WHSC2 and may play a role in transcription elongation. In contrast, gene 45, identified in the present work, shows a highly specific expression pattern (in neurons of the brain and in spermatids). Together with the structural similarity of the putative gene product to signalling molecules, this establishes an interesting link to features of WHS (for example cryptorchidism, uterine malformations or neurological defects). Gene 57, by contrast, may be a truncated pseudogene of the eRFS gene on chromosome 6q24 (Wallrapp et al., 1998). Finally, the POL4P gene is not a good candidate gene for specific features of the syndrome, on the basis of its genomic location and its possible function (as a DNA polymerase-like gene) alone, and was therefore not characterized in detail. To understand the involvement of the genes in the etiology and pathogenesis of the syndrome, the development of a mouse model (through the introduction of targeted deletions into the mouse genome) is planned. To make this possible, the orthologous region in the mouse was characterized in the present work. First, the orthologous mouse genes (Letm1, Whsc1, gene 43 (Whsc2h), gene 45 and Pol4p) were identified.
By constructing and precisely mapping a murine genomic P1/PAC clone contig, it was shown that the murine genes Fgfr3, Letm1, Whsc1, gene 43 (Whsc2h), gene 45 and Pol4p, as well as several other of the mouse EST cDNA clones examined, lie within a continuous block of synteny between human (POL4P to FGFR3) and mouse (Mmu 5.20), whose genomic extent roughly corresponds to that in human (between POL4P and FGFR3).

Relevance:

20.00%

Publisher:

Abstract:

A comparative genomic sequence analysis of a region of human chromosome 11p15.3 and its homologous segment on mouse chromosome 7, between the ST5 and LMO1 genes, has been performed. 158,201 bases were sequenced in the mouse and compared with the syntenic region in human, partially available in the public databases. The analysed region exhibits the typical eukaryotic genomic structure and, compared with the closely neighbouring regions, strikingly reflects the mosaic distribution pattern of (G+C) and repeat content despite its relatively short size. Within this region the novel gene STK33 (Stk33 in the mouse), which codes for a serine/threonine kinase, was discovered. The finding of this gene constitutes an excellent example of the strength of the comparative sequencing approach. Poor gene predictions in the mouse genomic sequence were corrected and improved by comparison with the unordered, publicly available data from the human genomic sequence. Phylogenetic analysis suggests that STK33 belongs to the calcium/calmodulin-dependent protein kinase group and seems to be a novelty in the chordate lineage. The gene as a whole seems to evolve under purifying selection, whereas some regions appear to be under strong positive selection. Both the human and mouse versions of serine/threonine kinase 33 consist of seventeen exons that are highly conserved in the coding regions, particularly in those coding for the core protein kinase domain. The exon/intron structure in the coding regions of the gene is also conserved between human and mouse. The existence and functionality of the gene is supported by the presence of entries in the EST databases and was fully confirmed in vivo by isolating specific transcripts from human uterus total RNA and from several mouse tissues. Strong evidence for alternative splicing was found, which may result in tissue-specific transcription start points and, to some extent, different protein N-termini.
RT-PCR and hybridisation experiments suggest that STK33/Stk33 is differentially expressed in a few tissues and at relatively low levels. STK33 has been shown to be reproducibly down-regulated in tumor tissues, particularly in ovarian tumors. RNA in situ hybridisation experiments using mouse Stk33-specific probes showed expression in dividing cells from lung and germinal epithelium and possibly also in macrophages from kidney and lungs. Preliminary experiments with antibodies designed in this work, performed in parallel with the preparation of this manuscript, seem to confirm this expression pattern. The fact that the chromosomal region 11p15, in which STK33 is located, may be associated with several human diseases, including tumor development, suggests that further investigation is necessary to establish the role of STK33 in human health.

Relevance:

20.00%

Publisher:

Abstract:

Visual tracking is the problem of estimating some variables related to a target given a video sequence depicting the target. Visual tracking is key to the automation of many tasks, such as visual surveillance, autonomous robot or vehicle navigation, and automatic video indexing in multimedia databases. Despite many years of research, long-term tracking of generic targets in real-world scenarios remains an unsolved problem. The main contribution of this thesis is the definition of effective algorithms that can foster a general solution to visual tracking by letting the tracker adapt to changing working conditions. In particular, we propose to adapt two crucial components of visual trackers: the transition model and the appearance model. The less general but widespread case of tracking from a static camera is also considered, and a novel change detection algorithm robust to sudden illumination changes is proposed. Based on this, a principled adaptive framework to model the interaction between Bayesian change detection and recursive Bayesian trackers is introduced. Finally, the problem of automatic tracker initialization is considered. In particular, a novel solution for the categorization of 3D data is presented. The category recognition algorithm is based on a novel 3D descriptor that is shown to achieve state-of-the-art performance in several surface matching applications.

Relevance:

20.00%

Publisher:

Abstract:

The objective of this study is the genetic characterization of the ancient population of the great white shark, Carcharodon carcharias, L. 1758, in the Mediterranean Sea. Using historical specimens, for the most part jaws but also whole stuffed specimens from various national museums, research institutes and private collections, a dataset of 18 specimens from the Mediterranean Sea was assembled in order to increase the information available on this species in the Mediterranean. The Mediterranean provenance is important because no genetic characterization of this species' population exists, which leaves gaps in the knowledge of this species in the Mediterranean. The genetic characterization of the individuals begins with the extraction of ancient DNA and the analysis of sequence variation in mitochondrial DNA markers. This approach allowed a genetic comparison between ancient populations of the Mediterranean and contemporary populations of the same geographical area. In addition, the genetic characterization of the Mediterranean white shark population allowed a genetic comparison with populations from global "hot spots", using published sequences from online databases (NCBI, GenBank). By analyzing the variability of the dataset in both space and time, I assessed the evolutionary relationships of the Mediterranean population of great whites with the global populations (Australia/New Zealand, South Africa, Pacific USA, West Atlantic) and the temporal trend in the variability of the Mediterranean population.
This method, based on sequencing two portions of mitochondrial DNA marker genes, showed that the population of great white sharks in the Mediterranean is genetically more similar to the populations of the Australian Pacific and the American Pacific than to the population of South Africa, and also that the South African population is abnormally distant from all other clusters. Interestingly, these results are inconsistent with the results from tagging of this species. In addition, there is evidence of differences between the ancient and the modern Mediterranean populations. This differentiation between the ancient and modern white shark populations may be the result of events affecting this species over the last two centuries.

Relevance:

20.00%

Publisher:

Abstract:

Different types of proteins exist with diverse functions that are essential for living organisms. An important class of proteins is represented by transmembrane proteins, which are specifically designed to be inserted into biological membranes and perform very important functions in the cell, such as cell communication and active transport across the membrane. Transmembrane β-barrels (TMBBs) are a sub-class of membrane proteins largely under-represented in structure databases because of the extreme difficulty of experimental structure determination. For this reason, computational tools that are able to predict the structure of TMBBs are needed. In this thesis, two computational problems related to TMBBs were addressed: the detection of TMBBs in large datasets of proteins and the prediction of the topology of TMBB proteins. Firstly, a method for TMBB detection was presented, based on a novel neural network framework for variable-length sequence classification. The proposed approach was validated on a non-redundant dataset of proteins. Furthermore, we carried out genome-wide detection using the entire Escherichia coli proteome. In both experiments, the method significantly outperformed other existing state-of-the-art approaches, reaching very high PPV (92%) and MCC (0.82). Secondly, a method was also introduced for TMBB topology prediction. The proposed approach is based on grammatical modelling and probabilistic discriminative models for sequence data labeling. The method was evaluated using a newly generated dataset of 38 TMBB proteins obtained from high-resolution data in the PDB. The results show that the model correctly predicts the topologies of 25 out of 38 protein chains in the dataset. When tested on previously released datasets, the performance of the proposed approach was comparable or superior to the current state of the art in TMBB topology prediction.
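
The two scores quoted in this abstract, PPV and MCC, are both computed from a confusion matrix. The sketch below shows the standard definitions; the counts in the usage example are hypothetical and were chosen only to give a PPV of 0.92, they are not the thesis's actual results. The last line uses the one figure the abstract does give directly (25 correct topologies out of 38 chains).

```python
import math


def ppv(tp: int, fp: int) -> float:
    """Positive predictive value (precision): TP / (TP + FP)."""
    return tp / (tp + fp)


def mcc(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient, in the range [-1, 1].

    Returns 0.0 when any marginal total is zero, a common convention
    for the otherwise undefined case.
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


if __name__ == "__main__":
    # Hypothetical counts for a detector run on 1000 proteins.
    tp, fp, fn, tn = 46, 4, 14, 936
    print(f"PPV = {ppv(tp, fp):.2f}")
    print(f"MCC = {mcc(tp, tn, fp, fn):.2f}")
    # Topology accuracy reported in the abstract: 25 of 38 chains.
    print(f"Topology accuracy = {25 / 38:.1%}")
```

MCC is a useful companion to PPV for detection tasks like this one because true TMBBs are rare in a proteome, so a score that also accounts for false negatives and true negatives is harder to inflate.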

Relevance:

20.00%

Publisher:

Abstract:

Which genetic differences make us different from our closest relatives, the chimpanzees, and on the other hand so similar to them? What we want to investigate and understand is the complex relationship between the multiple genetic and epigenetic differences whose interaction with diverse environmental and cultural factors results in the observed phenotypic differences. To clarify whether chromosomal rearrangements have contributed to the divergence between human and chimpanzee, and which selective forces have shaped their evolution, I examined the coding sequences of 2 Mb regions flanking the pericentric inversion breakpoints on chromosomes 1, 4, 5, 9, 12, 17 and 18. As a control, 4 Mb collinear regions on the rearranged chromosomes, located at least 10 Mb away from the breakpoint regions, were used. I found no higher rate of protein evolution in the breakpoint-flanking regions than in the control regions. My results do not support the chromosomal speciation hypothesis for human and chimpanzee, since the proportion of positively selected genes (5.1% in the breakpoint-flanking regions and 7% in the control regions) was similar in both regions. By comparing the numbers of positively and negatively selected genes per chromosome, I found that chromosome 9 contains the most and chromosome 5 the fewest positively selected genes in the breakpoint-flanking and control regions. The number of negatively selected genes (68) was much higher than the number of positively selected genes (17). A bioinformatic analysis of published microarray expression data (Affymetrix chips U95 and U133v2) yielded 31 genes that are differentially expressed between human and chimpanzee.
By examining the dN/dS ratio of these 31 genes, I identified 7 genes as negatively selected and only 1 gene as positively selected. This finding is consistent with the concept that gene expression levels evolve under stabilizing selection. Moreover, most positively selected genes play a role in reproduction. Many of these species differences result from changes in gene regulation rather than from structural changes in the gene products. Most differences in gene regulation are assumed to manifest at the transcriptional level. In this work, differences in DNA methylation between human and chimpanzee were investigated. To this end, the methylation patterns of the promoter CpG islands of 12 genes in the cortex of humans and chimpanzees were analysed by classical bisulfite sequencing and bisulfite pyrosequencing. The candidate genes were selected because of their differential expression patterns between human and chimpanzee and because of their association with human diseases or genomic imprinting. Apart from some individual positions, the majority of the analysed genes showed no high intra- or interspecific variation in DNA methylation between the two species. Only for one gene, CCRK, were clear intraspecific and interspecific differences in the degree of DNA methylation found. The differentially methylated CpG positions lay within a repetitive Alu-Sg1 element. The study of the CCRK gene provides a comprehensive analysis of the intra- and interspecific variability of DNA methylation of an Alu insertion in a regulatory region. The observed species differences indicate that the methylation patterns of the CCRK gene are probably evolving under positive selection, in adaptation to specific requirements for fine-tuning CCRK regulation.
The promoter of the CCRK gene is susceptible to epigenetic modification by DNA methylation, which can lead to complex transcription patterns. Owing to their genomic mobility, their high CpG content and their influence on gene expression, Alu insertions are excellent candidates for promoting changes in the developmental regulation of primate genes. Comparing the intra- and interspecific methylation of specific Alu insertions in other genes and tissues is a promising strategy.
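
The abstract classifies genes as positively or negatively selected from their dN/dS ratio. As a rough illustration of the idea only, the sketch below labels a gene from point estimates of dN and dS: a ratio below 1 suggests purifying (negative) selection and a ratio above 1 suggests positive selection. The function name and the tolerance parameter are assumptions for this example; the thesis's actual procedure is not described in the abstract, and real analyses normally rely on statistical tests rather than a bare threshold.

```python
def classify_selection(dn: float, ds: float, tolerance: float = 0.0) -> str:
    """Label a gene by its dN/dS ratio (omega).

    omega > 1 + tolerance  -> "positive"
    omega < 1 - tolerance  -> "negative"
    otherwise              -> "neutral"
    """
    if ds <= 0:
        raise ValueError("dS must be positive to form the ratio")
    omega = dn / ds
    if omega > 1 + tolerance:
        return "positive"
    if omega < 1 - tolerance:
        return "negative"
    return "neutral"


if __name__ == "__main__":
    # Hypothetical rate estimates for three genes.
    for name, dn, ds in [("geneA", 0.02, 0.10), ("geneB", 0.30, 0.10), ("geneC", 0.10, 0.10)]:
        print(name, classify_selection(dn, ds))
    # Counts reported in the abstract: 68 negatively vs 17 positively selected genes.
    print(f"negative : positive = {68 / 17:.0f} : 1")
```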

Relevância:

20.00% 20.00%

Publicador:

Resumo:

This work focused on the synthesis of novel monomers for the design of a series of oligo(p-benzamide)s (OPBAs) following two approaches: iterative solution synthesis and automated solid-phase protocols. These approaches provide a useful method for the sequence-controlled synthesis of side-chain and main-chain functionalized oligomers for the preparation of an immense variety of nanoscaffolds. The challenge in the synthesis of such materials was to modify them while maintaining their characteristic properties (physicochemical properties, shape persistence and anisotropy). The strategy for the preparation of predictable superstructures was devoted to the selective control of noncovalent interactions, monodispersity and monomer sequence. In addition, the structure-property correlation of the prepared rod-like soluble materials was highlighted. The first approach involved solution-based aramide synthesis by introducing the 2,4-dimethoxybenzyl N-amide protective group in an iterative synthetic strategy. The second approach focused on the implementation of the salicylic acid scaffold to introduce substituents on the aromatic backbone for the stabilization of the OPBA rotamers. The prepared oligomers were analyzed with regard to their solubility and aggregation properties by systematically changing the degree of rotational freedom of the amide bonds, side-chain polarity, monomer sequence and degree of oligomerization. The syntheses were performed on a modified commercial peptide synthesizer using a combination of fluorenylmethoxycarbonyl (Fmoc) and aramide chemistry. The automated synthesis allowed the preparation of aramides with potential applications as nanoscaffolds in supramolecular chemistry, e.g. comb-like-

Relevância:

20.00% 20.00%

Publicador:

Resumo:

In the last few decades, bioinformatics has played a fundamental role in making sense of the huge amount of data produced. Once the complete sequence of a genome has been obtained, the crucial problem is to learn as much as possible about its coding regions. Protein sequence annotation is challenging and, given the size of the problem, only computational approaches can provide a feasible solution. As recently pointed out by the Critical Assessment of Function Annotations (CAFA), the most accurate methods are those based on the transfer-by-homology approach, and the most incisive contribution comes from cross-genome comparisons. This thesis describes a non-hierarchical sequence clustering method for automatic large-scale protein annotation, called “The Bologna Annotation Resource Plus” (BAR+). The method is based on an all-against-all alignment of more than 13 million protein sequences, characterized by a very stringent metric. BAR+ can safely transfer functional features (Gene Ontology and Pfam terms) within clusters by means of statistical validation, even in the case of multi-domain proteins. Within BAR+ clusters it is also possible to transfer the three-dimensional structure (when a template is available). This is made possible by cluster-specific HMM profiles, which can be used to compute reliable template-to-target alignments even for distantly related proteins (sequence identity < 30%). Other BAR+ based applications were developed during my doctorate, including the prediction of magnesium-binding sites in human proteins, the classification of the ABC transporter superfamily and the functional prediction (GO terms) of the CAFA targets. Remarkably, in the CAFA assessment, BAR+ ranked among the ten most accurate methods. BAR+ is currently freely available as a web server for the functional and structural annotation of protein sequences at http://bar.biocomp.unibo.it/bar2.0.
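
The clustering idea can be sketched as connected components over the pairwise alignments that pass a stringent filter. This is a minimal sketch and not the BAR+ implementation: the function name, the tuple layout of `hits` and the two thresholds (`min_identity`, `min_coverage`) are assumptions chosen for illustration, since the abstract does not specify the metric.

```python
from collections import defaultdict


def cluster_sequences(ids, hits, min_identity=0.40, min_coverage=0.90):
    """Group sequences into non-hierarchical clusters.

    `hits` holds (seq_a, seq_b, identity, coverage) tuples from an
    all-against-all alignment. Both thresholds are illustrative
    assumptions, not values taken from the abstract.
    """
    parent = {s: s for s in ids}

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b, identity, coverage in hits:
        # Only alignments that pass both filters link two sequences.
        if identity >= min_identity and coverage >= min_coverage:
            root_a, root_b = find(a), find(b)
            if root_a != root_b:
                parent[root_b] = root_a

    clusters = defaultdict(set)
    for s in ids:
        clusters[find(s)].add(s)
    return sorted(sorted(members) for members in clusters.values())


# Hypothetical identifiers and alignment scores.
ids = ["P1", "P2", "P3", "P4"]
hits = [
    ("P1", "P2", 0.55, 0.95),
    ("P2", "P3", 0.45, 0.92),
    ("P3", "P4", 0.80, 0.50),  # fails the coverage filter
]
print(cluster_sequences(ids, hits))
```

Under these assumptions, a cluster is simply a set of sequences reachable from one another through accepted alignments, which is one way to see how annotation could be transferred inside a cluster even between sequences that were never aligned above threshold directly.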

Relevância:

20.00% 20.00%

Publicador:

Resumo:

From the late 1980s, the automation of sequencing techniques and the spread of computers gave rise to a flourishing number of new molecular structures and sequences and to a proliferation of new databases in which to store them. Three computational approaches are presented here that can analyse the massive amount of publicly available data in order to answer important biological questions. The first strategy studies the incorrect assignment of the first AUG codon in a messenger RNA (mRNA) due to the incomplete determination of its 5' end sequence. An extension of the mRNA 5' coding region was identified in 477 human loci, out of all known human mRNAs analysed, using an automated expressed sequence tag (EST)-based approach. Proof-of-concept confirmation was obtained by in vitro cloning and sequencing for GNB2L1, QARS and TDP2, and the consequences for functional studies are discussed. The second approach analyses codon bias, the phenomenon in which distinct synonymous codons are used with different frequencies, and, following integration with a gene expression profile, estimates the total number of codons present across all the expressed mRNAs (named here "codonome value") in a given biological condition. Systematic analyses across different pathological and normal human tissues and multiple species show a surprisingly tight correlation between the codon bias and the codonome bias. The third approach is used to study the expression of genes implicated in human autism spectrum disorder (ASD). ASD-implicated genes sharing microRNA response elements (MREs) for the same microRNA are co-expressed in brain samples from healthy and ASD-affected individuals. The differential expression of a recently identified long non-coding RNA, which has four MREs for the same microRNA, could disrupt the equilibrium in this network, but further analyses and experiments are needed.
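
The "codonome value" described above can be sketched as codon counts per coding sequence weighted by an expression level and summed over all expressed mRNAs. The function names, the toy sequences and the expression values below are hypothetical, and using the expression level as a plain multiplier is an assumption; the abstract does not give the exact weighting.

```python
from collections import Counter


def codon_counts(cds):
    """Count in-frame codons of a coding sequence (trailing bases ignored)."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3
    return Counter(cds[i:i + 3] for i in range(0, usable, 3))


def codonome(cds_by_gene, expression):
    """Sum codon counts over genes, weighted by expression level.

    `expression` maps gene name to an abundance estimate; genes that
    are absent or not expressed contribute nothing.
    """
    total = Counter()
    for gene, cds in cds_by_gene.items():
        level = expression.get(gene, 0)
        if level <= 0:
            continue
        for codon, n in codon_counts(cds).items():
            total[codon] += n * level
    return total


# Toy coding sequences and expression levels, for illustration only.
cds_by_gene = {"g1": "ATGGCTGCTTAA", "g2": "ATGAAATAA"}
expression = {"g1": 2, "g2": 3}
print(dict(codonome(cds_by_gene, expression)))
```

Comparing these weighted totals with the unweighted codon frequencies of the same genes is one way to contrast the codonome bias with the plain codon bias.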

Relevância:

20.00% 20.00%

Publicador:

Resumo:

The recent advent of next-generation sequencing technologies has revolutionized genome analysis. This innovation yields deeper information at lower cost and in less time, and provides data that are discrete measurements. One of the most important applications of these data is differential analysis, that is, investigating whether a gene exhibits a different expression level across two (or more) biological conditions (such as disease states or treatments received). On the statistical side, the final aim is statistical testing, and the Negative Binomial distribution is considered the most adequate for modelling these data, especially because it allows for "overdispersion". However, estimating the dispersion parameter is a very delicate issue because little information is usually available for it. Many strategies have been proposed, but they often result in procedures based on plug-in estimates, and in this thesis we show that this discrepancy between the estimation and the testing framework can lead to uncontrolled type I errors. We propose a mixture model that allows each gene to share information with other genes that exhibit similar variability. Three consistent statistical tests are then developed for differential expression analysis. We show that the proposed method improves the sensitivity of detecting differentially expressed genes compared with the common procedures, since it comes closest to the nominal type I error rate while retaining high power. The method is finally illustrated on prostate cancer RNA-seq data.
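
To make the role of the dispersion parameter concrete, the sketch below uses the common Negative Binomial parameterization in which the variance is mu + phi * mu^2, so phi = 0 recovers the Poisson case. The function names and the example counts are hypothetical. The naive per-gene moment estimator shown here is exactly the kind of plug-in estimate the abstract cautions about, and it is not the mixture model proposed in the thesis.

```python
from statistics import mean, variance


def nb_variance(mu, dispersion):
    """Variance of a Negative Binomial with mean mu and dispersion phi."""
    return mu + dispersion * mu ** 2


def moment_dispersion(counts):
    """Naive method-of-moments estimate of phi from replicate counts.

    Solves s^2 = m + phi * m^2 for phi and truncates at zero.
    With few replicates this estimate is very noisy.
    """
    m = mean(counts)
    if m == 0:
        return 0.0
    s2 = variance(counts)  # sample variance, needs at least 2 values
    return max((s2 - m) / m ** 2, 0.0)


# Hypothetical read counts for one gene in two replicates.
print(nb_variance(10, 0.5))
print(moment_dispersion([0, 20]))
```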