12 results for Sequence motif analysis
in AMS Tesi di Dottorato - Alm@DL - Università di Bologna
Abstract:
The vast majority of known proteins have not yet been experimentally characterized and little is known about their function. The design and implementation of computational tools can provide insight into the function of proteins based on their sequence, their structure, their evolutionary history and their association with other proteins. Knowledge of the three-dimensional (3D) structure of a protein can lead to a deep understanding of its mode of action and interaction, but currently the structures of <1% of sequences have been experimentally solved. For this reason, it became urgent to develop new methods that are able to computationally extract relevant information from protein sequence and structure. The starting point of my work has been the study of the properties of contacts between protein residues, since they constrain protein folding and characterize different protein structures. Prediction of residue contacts in proteins is an interesting problem whose solution may be useful in protein folding recognition and de novo design. The prediction of these contacts requires the study of the protein inter-residue distances related to the specific type of amino acid pair that are encoded in the so-called contact map. An interesting new way of analyzing those structures came out when network studies were introduced, with pivotal papers demonstrating that protein contact networks also exhibit small-world behavior. In order to highlight constraints for the prediction of protein contact maps and for applications in the field of protein structure prediction and/or reconstruction from experimentally determined contact maps, I studied to which extent the characteristic path length and clustering coefficient of the protein contacts network are values that reveal characteristic features of protein contact maps. 
Provided that residue contacts are known for a protein sequence, the major features of its 3D structure could be deduced by combining this knowledge with correctly predicted motifs of secondary structure. In the second part of my work I focused on a particular protein structural motif, the coiled-coil, known to mediate a variety of fundamental biological interactions. Coiled-coils are found in a variety of structural forms and in a wide range of proteins including, for example, small units such as leucine zippers that drive the dimerization of many transcription factors or more complex structures such as the family of viral proteins responsible for virus-host membrane fusion. The coiled-coil structural motif is estimated to account for 5-10% of the protein sequences in the various genomes. Given their biological importance, in my work I introduced a Hidden Markov Model (HMM) that exploits the evolutionary information derived from multiple sequence alignments, to predict coiled-coil regions and to discriminate coiled-coil sequences. The results indicate that the new HMM outperforms all the existing programs and can be adopted for the coiled-coil prediction and for large-scale genome annotation. Genome annotation is a key issue in modern computational biology, being the starting point towards the understanding of the complex processes involved in biological networks. The rapid growth in the number of protein sequences and structures available poses new fundamental problems that still deserve an interpretation. Nevertheless, these data are at the basis of the design of new strategies for tackling problems such as the prediction of protein structure and function. Experimental determination of the functions of all these proteins would be a hugely time-consuming and costly task and, in most instances, has not been carried out. As an example, currently, approximately only 20% of annotated proteins in the Homo sapiens genome have been experimentally characterized. 
A commonly adopted procedure for annotating protein sequences relies on "inheritance through homology", based on the notion that similar sequences share similar functions and structures. This procedure consists in assigning sequences to a specific group of functionally related sequences that have been grouped through clustering techniques. The clustering procedure is based on suitable similarity rules, since predicting protein structure and function from sequence largely depends on the value of sequence identity. However, additional levels of complexity arise from multi-domain proteins, from proteins that share common domains but do not necessarily share the same function, and from the finding that different combinations of shared domains can lead to different biological roles. In the last part of this study I developed and validated a system that contributes to sequence annotation by taking advantage of a validated transfer-through-inheritance procedure for molecular functions and structural templates. After a cross-genome comparison with the BLAST program, clusters were built on the basis of two stringent constraints on sequence identity and alignment coverage. The adopted measure explicitly addresses the problem of multi-domain protein annotation and allows a fine-grained division of the whole set of proteomes used, which ensures cluster homogeneity in terms of sequence length. A high level of coverage of structural templates over the length of the protein sequences within clusters ensures that multi-domain proteins, when present, can be templates for sequences of similar length. This annotation procedure makes it possible to reliably transfer statistically validated functions and structures to sequences, considering the information available in the present databases of molecular functions and structures.
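The two network measures named in this abstract, the clustering coefficient and the characteristic path length of a protein contact network, can be illustrated with a minimal sketch. It assumes that residue contacts are already available as a list of index pairs; the function names and the toy inputs are hypothetical and are not taken from the thesis.

```python
from collections import deque
from itertools import combinations


def contact_graph(contacts, n_residues):
    """Build an undirected adjacency map from (i, j) residue contact pairs."""
    adj = {i: set() for i in range(n_residues)}
    for i, j in contacts:
        if i != j:
            adj[i].add(j)
            adj[j].add(i)
    return adj


def clustering_coefficient(adj):
    """Average over all nodes of the fraction of neighbour pairs in contact."""
    total = 0.0
    for nbrs in adj.values():
        k = len(nbrs)
        if k < 2:
            continue  # nodes with fewer than two neighbours contribute 0
        links = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
        total += 2.0 * links / (k * (k - 1))
    return total / len(adj)


def characteristic_path_length(adj):
    """Mean shortest-path length over all connected ordered node pairs (BFS)."""
    total, pairs = 0, 0
    for src in adj:
        dist = {src: 0}
        queue = deque([src])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs if pairs else float("inf")


# Toy example: three residues in mutual contact
triangle = contact_graph([(0, 1), (1, 2), (0, 2)], 3)
print(clustering_coefficient(triangle), characteristic_path_length(triangle))
```

A small-world network combines a high clustering coefficient with a short characteristic path length, so comparing both values against those of a random graph of the same size is the usual next step.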
Abstract:
The recent advent of next-generation sequencing technologies has revolutionized genome analysis. These technologies yield deeper information at lower cost and in less time, and they produce data that are discrete measurements. One of the most important applications of these data is differential analysis, that is, investigating whether a gene exhibits different expression levels under two (or more) biological conditions (such as disease states or treatments received). From a statistical standpoint, the final aim is hypothesis testing, and the Negative Binomial distribution is considered the most adequate model for these data, especially because it allows for overdispersion. However, estimating the dispersion parameter is a delicate issue because little information is usually available for it. Many strategies have been proposed, but they often result in procedures based on plug-in estimates, and in this thesis we show that this discrepancy between the estimation and the testing framework can lead to uncontrolled type I errors. We propose a mixture model that allows each gene to share information with other genes that exhibit similar variability. Three consistent statistical tests are then developed for differential expression analysis. We show that the proposed method improves the sensitivity of detecting differentially expressed genes with respect to common procedures, since it comes closest to the nominal type I error rate while maintaining high power. The method is finally illustrated on prostate cancer RNA-seq data.
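To make the overdispersion point concrete, the sketch below uses the common mean-dispersion parameterization of the Negative Binomial, in which the variance is mu + phi * mu^2, together with a naive per-gene method-of-moments estimate of phi. Both the parameterization and the estimator are illustrative assumptions; they are not the mixture model proposed in the thesis.

```python
from statistics import mean, variance


def nb_variance(mu, phi):
    """Negative Binomial variance for mean mu and dispersion phi (assumed form)."""
    return mu + phi * mu ** 2


def moment_dispersion(counts):
    """Naive method-of-moments dispersion estimate for one gene.

    Solves s^2 = m + phi * m^2 for phi and truncates at zero, since a
    sample variance below the mean carries no evidence of overdispersion.
    """
    m = mean(counts)
    if m == 0:
        return 0.0
    s2 = variance(counts)  # needs at least two replicates
    return max(0.0, (s2 - m) / m ** 2)


# Hypothetical read counts for one gene across three replicates
print(moment_dispersion([12, 30, 21]))
```

With only two or three replicates per condition this per-gene estimate is very noisy, which is the motivation for letting genes with similar variability share information.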
Abstract:
Self-incompatibility (SI) systems have evolved in many flowering plants to prevent self-fertilization and thus promote outbreeding. Pear and apple, like many species belonging to the Rosaceae, exhibit RNase-mediated gametophytic self-incompatibility (GSI), a widespread system also found in the Solanaceae and Plantaginaceae. For this reason, pear orchards must contain at least two different cultivars that pollinate each other; to guarantee efficient cross-pollination, they should have overlapping flowering periods and must be genetically compatible. This compatibility is determined by the S-locus, which contains at least two genes encoding a female (pistil) and a male (pollen) determinant. The female determinant in the Rosaceae, Solanaceae and Plantaginaceae system is a stylar glycoprotein with ribonuclease activity (S-RNase) that acts as a specific cytotoxin in incompatible pollen tubes, degrading cellular RNAs. Since its identification, the S-RNase gene has been intensively studied, and the sequences of a large number of alleles are available in online databases. The male determinant, by contrast, has only recently been identified as a pollen-expressed protein containing an F-box motif, called S-Locus F-box (abbreviated SLF or SFB). Since F-box proteins are best known for their participation in the SCF (Skp1 - Cullin - F-box) E3 ubiquitin ligase complex, which is involved in protein degradation through the 26S proteasome pathway, the male determinant is thought to mediate the ubiquitination of the S-RNases, targeting them for degradation in compatible pollen tubes. Attempts to clone SLF/SFB genes in the Pyrinae produced no results until very recently; in apple, the use of genomic libraries allowed the detection of two F-box genes linked to each S haplotype, called SFBB (S-locus F-Box Brothers). In Japanese pear, three SFBB genes linked to each haplotype were cloned from pollen cDNA.
The SFBB genes exhibit S haplotype-specific sequence divergence and pollen-specific expression; how their multiplicity should be interpreted is unclear: it has been hypothesized that all of them participate in the S-specific interaction with the RNase, but it is also possible that only one of them is involved in this function. Moreover, even if the S locus male and female determinants are solely responsible for the specificity of pollen-pistil recognition, many other factors are thought to play a role in GSI; these are not linked to the S locus and act in an S-haplotype-independent manner. They may regulate the expression of S determinants (group 1 factors), modulate their activity (group 2) or act downstream, in carrying out the acceptance or rejection of the pollen tube (group 3). This study aimed to elucidate the molecular mechanism of GSI in European pear (Pyrus communis) as well as in the other Pyrinae; it was divided into two parts, the first focusing on the characterization of male determinants, and the second on factors external to the S locus. The search for S locus F-box genes was primarily aimed at identifying such genes in European pear, for which sequence data are still not available; it also made it possible to investigate the S locus structure in the Pyrinae. The analysis was carried out on a pool of varieties of the three species Pyrus communis (European pear), Pyrus pyrifolia (Japanese pear), and Malus × domestica (apple); varieties carrying S haplotypes whose RNases are highly similar were chosen, in order to check whether the same level of similarity is also maintained between the male determinants. A total of 82 sequences were obtained, 47 of which represent the first S-locus F-box genes sequenced from European pear.
The sequence data strongly support the hypothesis that the S locus structure is conserved among the three species, and presumably among all the Pyrinae; at least five genes have homologs in the analysed S haplotypes, but the number of F-box genes surrounding the S-RNase could be even greater. The high level of sequence divergence and the similarity between alleles linked to highly conserved RNases suggest a shared ancestral polymorphism for the F-box genes as well. The F-box genes identified in European pear were mapped on a segregating population of 91 individuals from the cross 'Abbé Fétel' × 'Max Red Bartlett'. All the genes were placed on linkage group 17, where the S locus has been placed in both pear and apple maps, and were strongly associated with the S-RNase gene. The linkage with the RNase was perfect for some of the F-box genes, while for others very rare single recombination events were identified. The second part of this study focused on the search for other genes involved in the SI response in pear; it aimed on one side at the identification of genes differentially expressed in compatible and incompatible crosses, and on the other at the cloning and characterization of the transglutaminase (TGase) gene, whose role may be crucial in pollen rejection. For the identification of differentially expressed genes, controlled pollinations were carried out in four combinations (self-pollination, incompatible, half-compatible and fully compatible cross-pollination); expression profiles were compared through cDNA-AFLP. Twenty-eight fragments displaying an expression pattern related to compatibility or incompatibility were identified, cloned and sequenced; sequence analysis made it possible to assign a putative annotation to some of them. The identified genes are involved in very different cellular processes or in defense mechanisms, suggesting a very complex change in gene expression following pollen/pistil recognition.
The pool of genes identified with this technique offers a good basis for further study toward a better understanding of how the SI response is carried out. Among the factors involved in the SI response, moreover, an important role may be played by transglutaminase (TGase), an enzyme involved both in post-translational protein modification and in protein cross-linking. The TGase activity detected in pear styles was significantly higher after pollination in incompatible combinations than in compatible ones, suggesting a role for this enzyme in the abnormal cytoskeletal reorganization observed during the pollen rejection reaction. The aim of this part of the work was thus to identify and clone the pear TGase gene; PCR amplification of fragments of this gene was achieved using primers designed on the alignment between the Arabidopsis TGase gene sequence and several apple EST fragments; the full-length coding sequence of the pear TGase gene was then cloned from cDNA, providing a valuable tool for further study of the in vitro and in vivo action of this enzyme.
Abstract:
Introduction: Phospholipase C β1 (PLC-β1) is a key player in the regulation of nuclear inositol lipid signaling and of a wide range of cellular functions, such as proliferation and differentiation (1,2,3). PLC-β1 signaling depends on the cleavage of phosphatidylinositol 4,5-bisphosphate and the formation of the second messengers diacylglycerol and inositol trisphosphate, which activate conventional protein kinase C (cPKC) isoforms. Here we describe a proteomic approach to identify a potential effector of nuclear PLC-β1-dependent signaling during insulin-stimulated myogenic differentiation. Methods: Nuclear lysates obtained from insulin-induced C2C12 myoblasts were immunoprecipitated with an anti-phospho-substrate cPKC antibody. Proteins, stained with Coomassie blue, were excised, digested and subsequently analysed by LC-MS/MS. For peptide sequence searching, the mass spectra were processed and analyzed using the Mascot MS/MS ion search program with the NCBI database. Western blotting, GST pull-down and co-immunoprecipitation were performed to study the interaction between eEF1A2 and cPKCs. Site-directed mutagenesis was performed to confirm the phosphorylated motif recognized by the antibody. Immunofluorescence analysis, a GFP-tagged eEF1A2 vector and subcellular fractionation were used to study the nuclear localization and relative distribution of eEF1A2. Results: We have previously shown that PLC-β1 is greatly increased at the nuclear level during insulin-induced myoblast differentiation and that this nuclear localization is essential for the induction of differentiation. Thus, nuclear proteins of insulin-stimulated C2C12 myoblasts were immunoprecipitated with an anti-phospho-substrate cPKC antibody. After electrophoretic gel separation of the immunoprecipitated proteins, several molecules were identified by LC-MS/MS. Among these, the most relevant and unexpected was eukaryotic elongation factor 1 alpha 2 (eEF1A2).
We found that eEF1A2 is phosphorylated by PKCβ1 and that these two molecules co-localize at the nucleolar level. eEF1A2 can be phosphorylated at many sites, including both threonine and serine residues. By site-directed mutagenesis we demonstrated that it is the serine residue of the motif recognized by the antibody that is specifically phosphorylated by PKCβ1. Silencing of PLC-β1 reduces the expression and phosphorylation levels of eEF1A2, indicating that this molecule is a target of the nuclear PLC-β1 regulatory network during myoblast differentiation.
Abstract:
The project was developed in three parts: the analysis of p63 isoforms in breast tumours; the study of intra-tumour heterogeneity in metaplastic breast carcinoma; and the analysis of oncocytic breast carcinoma. p63 is a sequence-specific DNA-binding factor, a homologue of the tumour suppressor and transcription factor p53. The human p63 gene is composed of 15 exons and transcription can occur from two distinct promoters: the transactivating isoforms (TAp63) are generated by a promoter upstream of exon 1, while the alternative promoter located in intron 3 leads to the expression of N-terminally truncated isoforms (ΔNp63). It has been demonstrated that anti-p63 antibodies decorate the majority of squamous cell carcinomas of different organs; moreover, breast tumours with myoepithelial differentiation show nuclear p63 expression. Two new isoforms have been described with the same sequence as TAp63 and ΔNp63 but lacking exon 4: d4TAp63 and ΔNp73L, respectively. The purpose of the study was to investigate the molecular expression of N-terminal p63 isoforms in benign and malignant breast tissues. In the present study, 40 specimens from normal breast, benign lesions, DIN/DCIS, and invasive carcinomas were analyzed by immunohistochemistry and RT-PCR (Reverse Transcriptase-PCR) in order to disclose the patterns of p63 expression. We observed that the full-length isoforms can be detected in non-neoplastic and neoplastic lesions, while the short isoforms are only present in the neoplastic cells of invasive carcinomas. Metaplastic carcinomas of the breast are a heterogeneous group of neoplasms which exhibit varied patterns of metaplasia and differentiation. The existence of non-modal populations harbouring distinct genetic aberrations may explain the phenotypic diversity observed within a given tumour. Intra-tumour morphological heterogeneity is not uncommon in breast cancer and it can often be appreciated in metaplastic breast carcinomas.
The aim of this study was to determine the existence of intra-tumour genetic heterogeneity in metaplastic breast cancers and whether areas with distinct morphological features in a given tumour might be underpinned by distinct patterns of genetic aberrations. Forty-seven cases of metaplastic breast carcinoma were retrieved. Of these 47 cases, 9 had areas of sufficient dimensions to be independently microdissected. Our results indicate that at least some breast cancers are composed of multiple non-modal populations of clonally related cells and provide direct evidence that at least some types of metaplastic breast cancer are composed of multiple non-modal clones harbouring distinct genetic aberrations. Oncocytic tumours represent a distinctive set of lesions with typical granular cytoplasmic eosinophilia of the neoplastic cells. Only rare examples of breast oncocytic carcinoma have been reported in the literature and the incidence is probably underestimated. In this study we analysed 33 cases of oncocytic invasive breast carcinoma, selected according to morphological and immunohistochemical criteria. These tumours were morphologically classified and studied by immunohistochemistry and aCGH. We concluded that oncocytic breast carcinoma is a morphologic entity with distinctive ultrastructural and histological features; immunohistochemically it is characterized by a luminal profile; it has a frequency of 19.8%; it has no distinctive clinical features; and, at the molecular level, it shows a specific constellation of genetic aberrations.
Abstract:
Due to the growing attention of consumers towards their food, improving the quality of animal products has become one of the main focuses of research. To this aim, the application of modern molecular genetics approaches has proved extremely useful and effective. This innovative drive covers the production of all livestock species, including pork. The Italian pig breeding industry is unique because it needs heavy pigs, slaughtered at about 160 kg, for the production of high-quality processed products. For this reason, it requires precise meat quality and carcass characteristics. Two aspects have been considered in this thesis: the application of transcriptome analysis in post mortem pig muscles as a possible method to evaluate meat quality parameters related to the pre mortem status of the animals, including health, nutrition and welfare, with potential applications for product traceability (chapters 3 and 4); and the study of candidate genes for obesity-related traits in order to identify markers associated with fatness in pigs that could be applied to improve carcass quality (chapters 5, 6, and 7). Chapter three addresses the first issue from a methodological point of view. When we considered this issue, it was not obvious that post mortem skeletal muscle could be useful for transcriptomic analysis. We therefore demonstrated that the quality of RNA extracted from skeletal muscle of pigs sampled at different post mortem intervals (20 minutes, 2 hours, 6 hours, and 24 hours) is good enough for downstream applications. Degradation occurred starting from 48 h post mortem, even if at this time it is still possible to use some RNA products. In the fourth chapter, in order to demonstrate the potential use of RNA obtained up to 24 hours post mortem, we present the results of RNA analysis with the Affymetrix microarray platform, which made it possible to assess the expression level of more than 24,000 mRNAs.
We did not identify any significant differences between the different post mortem times, suggesting that this technique could be applied to retrieve information from the transcriptome of skeletal muscle samples not collected immediately after slaughter. This study represents the first contribution of this kind applied to pork. In the fifth chapter, we investigated the TBC1D1 [TBC1 (tre-2/USP6, BUB2, cdc16) domain family, member 1] gene as a candidate for fat deposition. This gene is involved in mechanisms regulating energy homeostasis in skeletal muscle and is associated with predisposition to obesity in humans. By resequencing a fragment of the TBC1D1 gene we identified three synonymous mutations localized in exon 2 (g.40A>G, g.151C>T, and g.172T>C) and two polymorphisms localized in intron 2 (g.219G>A and g.252G>A). One of these polymorphisms (g.219G>A) was genotyped by high resolution melting (HRM) analysis and PCR-RFLP. Moreover, this gene sequence was mapped by radiation hybrid analysis to porcine chromosome 8. The association study was conducted in 756 performance-tested pigs of the Italian Large White and Italian Duroc breeds. Significant results were obtained for lean meat content, back fat thickness, visible intermuscular fat and ham weight. In chapter six, a second candidate gene (tribbles homolog 3, TRIB3) is analyzed in an association study with carcass and meat quality traits. The TRIB3 gene is involved in the energy metabolism of skeletal muscle and acts as a suppressor of adipocyte differentiation. We identified two polymorphisms in the first coding exon of the porcine TRIB3 gene: one is a synonymous SNP (c.132T>C), the other a missense mutation (c.146C>T, p.P49L). The two polymorphisms appear to be in complete linkage disequilibrium between and within breeds. The in silico analysis of the p.P49L substitution suggests that it might have a functional effect.
The association study in about 650 pigs indicates that this marker is associated with back fat thickness in the Italian Large White and Italian Duroc breeds in two different experimental designs. This polymorphism is also associated with the lactate content of the semimembranosus muscle in Italian Large White pigs. Expression analysis indicated that this gene is transcribed in skeletal muscle and adipose tissue as well as in other tissues. In the seventh chapter, we report the genotyping results for 677 SNPs in extreme divergent groups of pigs chosen according to their extreme estimated breeding values for back fat thickness. SNPs were identified by resequencing, literature mining and in silico database mining of 60 candidate genes for obesity. Genotyping was carried out using the GoldenGate (Illumina) platform. Of the analyzed SNPs, more than 300 were polymorphic in the genotyped population and had a minor allele frequency (MAF) >0.05. Of these SNPs, 65 were associated (P<0.10) with back fat thickness. One of the most significant gene markers was the same TBC1D1 SNP reported in chapter 5, confirming the role of this gene in fat deposition in pigs. These results could be important for better defining the pig as a model for human obesity, in addition to marker-assisted selection to improve carcass characteristics.
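The minor allele frequency filter mentioned above (MAF > 0.05) can be sketched as follows. The genotype coding (0, 1 or 2 copies of the alternative allele, `None` for a failed call), the function names and the example SNP identifiers are assumptions made for illustration, not the pipeline used in the thesis.

```python
def minor_allele_frequency(genotypes):
    """MAF for one biallelic SNP from diploid genotypes coded 0/1/2.

    Each genotype is the number of copies of the alternative allele;
    None marks a missing call and is ignored.
    """
    called = [g for g in genotypes if g is not None]
    if not called:
        raise ValueError("no called genotypes for this SNP")
    alt_freq = sum(called) / (2 * len(called))
    return min(alt_freq, 1.0 - alt_freq)


def filter_by_maf(snp_genotypes, threshold=0.05):
    """Keep the SNPs whose MAF is strictly above the threshold."""
    return {
        snp: genos
        for snp, genos in snp_genotypes.items()
        if minor_allele_frequency(genos) > threshold
    }


# Hypothetical panel: one common SNP and one rare SNP
panel = {
    "snp_common": [0, 1, 2, 2, None],
    "snp_rare": [0] * 19 + [1],
}
print(sorted(filter_by_maf(panel)))
```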
Abstract:
Animal neocentromeres are defined as ectopic centromeres that have formed in non-centromeric locations and lack some of the features, such as satellite DNA sequences, that normally characterize canonical centromeres. Despite this, they are stable, functional centromeres inherited through generations. The mere existence of neocentromeres provides convincing evidence that centromere specification is determined by epigenetic rather than sequence-specific mechanisms. For all these reasons, we used them as simplified models to investigate the molecular mechanisms that underlie the formation and maintenance of functional centromeres. We collected human cell lines carrying neocentromeres in different positions. To investigate the region involved in the process at the DNA sequence level, we applied a recent technology that integrates Chromatin Immuno-Precipitation and DNA microarrays (ChIP-on-chip), using rabbit polyclonal antibodies directed against the human centromeric proteins CENP-A or CENP-C. These DNA-binding proteins are required for kinetochore function and are exclusively targeted to functional centromeres. Thus, the immunoprecipitation of DNA bound by these proteins allows the isolation of centromeric sequences, including those of the neocentromeres. Neocentromeres arise even in regions containing protein-coding genes. We further analyzed whether the increased number of scaffold attachment sites and the correspondingly tighter chromatin of the region involved in the neocentromerization process were still permissive to transcription of the genes encoded within it. Centromere repositioning is a phenomenon in which a neocentromere that has arisen without altering the gene order becomes fixed in the population, following the inactivation of the canonical centromere. It is a process of chromosome rearrangement that is fundamental in evolution and lies at the basis of speciation.
The repeat-free region where the neocentromere initially forms progressively acquires extended arrays of satellite tandem repeats that may contribute to its functional stability. With this in view, we focused on the repositioned horse ECA11 centromere. ChIP-on-chip analysis was used to define the region involved, and studies of SNPs mapping within the region involved in neocentromerization were carried out. We were able to describe the structural polymorphism of the chromosome 11 centromeric domain in the Equus caballus population. This polymorphism was seen even between homologous chromosomes of the same cell. It is the first such polymorphism ever described. Genomic plasticity has had a fundamental role in evolution. Centromeres are not static, packaged regions of genomes. The key question that fascinates biologists is how this centromere plasticity can be reconciled with the stability and maintenance of centromeric function. Starting from the epigenetic point of view that underlies centromere formation, we decided to analyze the RNA content of centromeric chromatin. RNA, like the secondary chemical modifications that involve both histones and DNA, is a good candidate for guiding centromere formation and maintenance. Many observations suggest that transcription of centromeric DNA or of other non-coding RNAs could affect centromere formation. To date there has been no thorough investigation addressing the identity of the chromatin-associated RNAs (CARs) on a global scale. This prompted us to develop techniques to identify CARs in a genome-wide approach using high-throughput genomic platforms. The future goal of this study will be to focus specifically on what happens inside centromeric chromatin.
Abstract:
The objective of this work is to characterize the genome of chromosome 1 of A. thaliana, a small flowering plant used as a model organism in studies of biology and genetics, on the basis of a recent mathematical model of the genetic code. I analyze and compare different portions of the genome: genes, exons, coding sequences (CDS), introns, long introns, intergenes, untranslated regions (UTR) and regulatory sequences. To accomplish this task, I transformed nucleotide sequences into binary sequences based on the definition of three different dichotomic classes. The descriptive analysis of the binary strings indicates the presence of regularities in each portion of the genome considered. In particular, there are remarkable differences between coding sequences (CDS and exons) and non-coding sequences, suggesting that the frame is important only for coding sequences and that dichotomic classes can be useful to recognize them. Then, I assessed the existence of short-range dependence between binary sequences computed on the basis of the different dichotomic classes. I used three different measures of dependence: the well-known chi-squared test and two indices derived from the concept of entropy, i.e. Mutual Information (MI) and Sρ, a normalized version of the “Bhattacharyya Hellinger Matusita distance”. The results show that there is a significant short-range dependence structure only for the coding sequences, whose existence is a clue to an underlying error detection and correction mechanism. Further studies are certainly needed to assess how the information carried by dichotomic classes could discriminate between coding and non-coding sequences and, therefore, contribute to unveiling the role of the mathematical structure in error detection and correction mechanisms. Still, I have shown the potential of the presented approach for understanding the management of genetic information.
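As an illustration of one of the dependence measures named above, the following sketch estimates the Mutual Information (in bits) between two binary sequences, and between a binary sequence and a shifted copy of itself. The purine/pyrimidine encoding used in the example is only a stand-in: the dichotomic classes of the thesis are defined by its mathematical model of the genetic code and are not reproduced here, and all function names are hypothetical.

```python
from collections import Counter
from math import log2


def to_binary(seq, class_one):
    """Encode a nucleotide string as 0/1 by membership in a set (placeholder rule)."""
    return [1 if base in class_one else 0 for base in seq.upper()]


def mutual_information(x, y):
    """Plug-in estimate of the mutual information (bits) of two equal-length sequences."""
    if len(x) != len(y) or not x:
        raise ValueError("sequences must be non-empty and of equal length")
    n = len(x)
    joint, px, py = Counter(zip(x, y)), Counter(x), Counter(y)
    mi = 0.0
    for (a, b), count in joint.items():
        p_ab = count / n
        mi += p_ab * log2(p_ab / ((px[a] / n) * (py[b] / n)))
    return mi


def lagged_mi(bits, lag):
    """Mutual information between a binary sequence and itself shifted by `lag`."""
    if lag < 1 or lag >= len(bits):
        raise ValueError("lag must satisfy 1 <= lag < len(bits)")
    return mutual_information(bits[:-lag], bits[lag:])


# Stand-in encoding: purines (A, G) -> 1, pyrimidines (C, T) -> 0
bits = to_binary("ATGGCGTACGCTTAGC", {"A", "G"})
print([round(lagged_mi(bits, k), 3) for k in (1, 2, 3)])
```

Plug-in estimates of this kind are biased upward on short sequences, so in practice they would be compared against values obtained from shuffled sequences before any dependence is called significant.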
Abstract:
The goal of many plant scientists’ research is to explain natural phenotypic variation in terms of simple changes in DNA sequence. DNA-based molecular markers are extensively used for the construction of genome-wide molecular maps and to perform genetic analysis of simple and complex traits. The PhD thesis was divided into two main research lines according to the different approaches adopted. The first research line analyzes the genetic diversity in an Italian apple germplasm collection for the identification of markers tightly linked to targeted genes by an association genetics method. This made it possible to identify synonym and homonym accessions and triploids. The fruit red skin color trait was used to test the reliability of the genetic approaches in this species. The second line relates to the development of molecular markers closely linked to the Rvi13 and Rvi5 scab resistance genes, previously mapped on apple chromosomes 10 and 17, respectively, by the traditional linkage mapping method. Both regions have been fine-mapped with various types of markers that could be used for marker-assisted selection in future breeding programs and to isolate the two resistance genes.
Abstract:
From the late 1980s, the automation of sequencing techniques and the spread of computers gave rise to a rapidly growing number of new molecular structures and sequences and to a proliferation of new databases in which to store them. Three computational approaches are presented here that analyse the massive amount of publicly available data in order to answer important biological questions. The first strategy studies the incorrect assignment of the first AUG codon in a messenger RNA (mRNA), due to the incomplete determination of its 5' end sequence. An extension of the mRNA 5' coding region was identified in 477 human loci, out of all known human mRNAs analysed, using an automated expressed sequence tag (EST)-based approach. Proof-of-concept confirmation was obtained by in vitro cloning and sequencing for GNB2L1, QARS and TDP2, and the consequences for functional studies are discussed. The second approach analyses the codon bias, the phenomenon in which distinct synonymous codons are used with different frequencies, and, following integration with a gene expression profile, estimates the total number of codons present across all the expressed mRNAs (named here "codonome value") in a given biological condition. Systematic analyses across different pathological and normal human tissues and multiple species show a surprisingly tight correlation between the codon bias and the codonome bias. The third approach is used to study the expression of genes implicated in human autism spectrum disorder (ASD). ASD-implicated genes sharing microRNA response elements (MREs) for the same microRNA are co-expressed in brain samples from healthy and ASD-affected individuals. The differential expression of a recently identified long non-coding RNA, which has four MREs for the same microRNA, could disrupt the equilibrium in this network, but further analyses and experiments are needed.
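The integration of codon counts with an expression profile described in the abstract above can be sketched as follows. This is one plausible reading, not the thesis's method: it assumes that each gene's in-frame codon counts are multiplied by that gene's expression level and summed over all genes. The function names, the `cds_by_gene` and `expression` dictionaries, and the example data are all hypothetical.

```python
from collections import Counter


def codon_counts(cds):
    """Count in-frame codons of a coding sequence; a trailing partial codon is ignored."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3
    return Counter(cds[i:i + 3] for i in range(0, usable, 3))


def expression_weighted_codon_counts(cds_by_gene, expression):
    """Sum codon counts over genes, weighting each gene by its expression level.

    Assumption: genes missing from `expression` contribute nothing.
    """
    total = Counter()
    for gene, cds in cds_by_gene.items():
        weight = expression.get(gene, 0.0)
        for codon, n in codon_counts(cds).items():
            total[codon] += n * weight
    return total


# Made-up example data, for illustration only.
cds_by_gene = {"geneA": "ATGGCTGCT", "geneB": "ATGAAA"}
expression = {"geneA": 2.0, "geneB": 3.0}
weighted = expression_weighted_codon_counts(cds_by_gene, expression)
```

Comparing the unweighted codon frequencies with the expression-weighted ones is one way such a correlation could be examined. The abstract does not state how the comparison was actually carried out.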
Abstract:
The present study has been carried out with the following objectives: i) To investigate the attributes of source parameters of local and regional earthquakes; ii) To estimate, as accurately as possible, M0, fc, Δσ and their standard errors to infer their relationship with source size; iii) To quantify high-frequency earthquake ground motion and to study the source scaling. This work is based on observational data of micro-, small and moderate earthquakes for three selected seismic sequences, namely Parkfield (CA, USA), Maule (Chile) and Ferrara (Italy). For the Parkfield seismic sequence (CA), a data set of 757 repeating micro-earthquakes in 42 clusters (0 ≤ MW ≤ 2), collected using the borehole High Resolution Seismic Network (HRSN), has been analyzed and interpreted. We used the coda methodology to compute spectral ratios to obtain accurate values of fc, Δσ, and M0 for three target clusters (San Francisco, Los Angeles, and Hawaii) of our data. We also performed a general regression on peak ground velocities to obtain reliable seismic spectra of all earthquakes. For the Maule seismic sequence, a data set of 172 aftershocks of the 2010 MW 8.8 earthquake (3.7 ≤ MW ≤ 6.2), recorded by more than 100 temporary broadband stations, has been analyzed and interpreted to quantify high-frequency earthquake ground motion in this subduction zone. We completely calibrated the excitation and attenuation of the ground motion in Central Chile. For the Ferrara sequence, we calculated moment tensor solutions for 20 events from MW 5.63 (the largest main event, which occurred on May 20, 2012) down to MW 3.2, using a 1-D velocity model for the crust beneath the Pianura Padana that draws on all the geophysical and geological information available for the area. The PADANIA model allowed a numerical study on the characteristics of the ground motion in the thick sediments of the flood plain.
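For orientation, the source parameters named in the abstract above are commonly related as follows. This is a sketch under the assumption of a Brune-type circular source and is not stated in the abstract; the thesis may use a different source model. Here β is the shear-wave velocity near the source and r is the source radius (both symbols are introduced for illustration), and M0 is expressed in N·m.

```latex
\[
r = \frac{2.34\,\beta}{2\pi f_c}, \qquad
\Delta\sigma = \frac{7}{16}\,\frac{M_0}{r^{3}}, \qquad
M_W = \frac{2}{3}\left(\log_{10} M_0 - 9.1\right)
\]
```

Under this assumption, a constant stress drop implies that M0 scales as the inverse cube of fc. Studies of source scaling typically test whether this holds.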
Resumo:
Neisserial Heparin Binding Antigen (NHBA) is a surface-exposed lipoprotein ubiquitously expressed by genetically diverse Neisseria meningitidis strains and is an antigen of the multicomponent protein-based 4CMenB vaccine, able to induce bactericidal antibodies in humans and to bind heparin-like molecules. The aim of this study is to characterize the immunological and functional properties of NHBA. To evaluate immunogenicity and the contribution of amino acid sequence variability to vaccine coverage, we constructed recombinant isogenic strains that are susceptible to bactericidal killing only by anti-NHBA antibodies and engineered them to express equal levels of selected NHBA peptides. In these recombinant strains, we observed different titres associated with the different peptide variants. These recombinant strains were then further engineered to express NHBA chimeric proteins to investigate the regions important for immunogenicity. In natural strains, anti-NHBA antibodies were found to be cross-protective against strains expressing different peptides. To investigate the functional properties of this antigen, the purified recombinant NHBA protein was tested in in vitro binding studies and was found to bind epithelial cells. The binding was abolished when cells were treated specifically with heparinase III, suggesting that the interaction with the cells is mediated by heparan sulfate proteoglycans (HSPG). Mutation of the Arg-rich tract of NHBA abrogated the binding, confirming the importance of this region in mediating the binding to heparin-like molecules. In a panel of N. meningitidis strains, the deletion of nhba reduced adhesion with respect to each isogenic wild-type strain. Furthermore, the adhesion of the wild-type strain was prevented by using anti-NHBA polyclonal sera, demonstrating the specificity of the interaction. These results suggest that NHBA could be a novel meningococcal adhesin contributing to host-cell interaction.
Moreover, we analysed NHBA NalP-mediated cleavage in different NHBA peptides and showed that not all NHBA peptides are cleaved.