20 resultados para non-coding RNAs (ncRNAs)
Resumo:
The objective of this work is to characterize the genome of the chromosome 1 of A.thaliana, a small flowering plants used as a model organism in studies of biology and genetics, on the basis of a recent mathematical model of the genetic code. I analyze and compare different portions of the genome: genes, exons, coding sequences (CDS), introns, long introns, intergenes, untranslated regions (UTR) and regulatory sequences. In order to accomplish the task, I transformed nucleotide sequences into binary sequences based on the definition of the three different dichotomic classes. The descriptive analysis of binary strings indicate the presence of regularities in each portion of the genome considered. In particular, there are remarkable differences between coding sequences (CDS and exons) and non-coding sequences, suggesting that the frame is important only for coding sequences and that dichotomic classes can be useful to recognize them. Then, I assessed the existence of short-range dependence between binary sequences computed on the basis of the different dichotomic classes. I used three different measures of dependence: the well-known chi-squared test and two indices derived from the concept of entropy i.e. Mutual Information (MI) and Sρ, a normalized version of the “Bhattacharya Hellinger Matusita distance”. The results show that there is a significant short-range dependence structure only for the coding sequences whose existence is a clue of an underlying error detection and correction mechanism. No doubt, further studies are needed in order to assess how the information carried by dichotomic classes could discriminate between coding and noncoding sequence and, therefore, contribute to unveil the role of the mathematical structure in error detection and correction mechanisms. Still, I have shown the potential of the approach presented for understanding the management of genetic information.
Resumo:
From the late 1980s, the automation of sequencing techniques and the computer spread gave rise to a flourishing number of new molecular structures and sequences and to proliferation of new databases in which to store them. Here are presented three computational approaches able to analyse the massive amount of publicly avalilable data in order to answer to important biological questions. The first strategy studies the incorrect assignment of the first AUG codon in a messenger RNA (mRNA), due to the incomplete determination of its 5' end sequence. An extension of the mRNA 5' coding region was identified in 477 in human loci, out of all human known mRNAs analysed, using an automated expressed sequence tag (EST)-based approach. Proof-of-concept confirmation was obtained by in vitro cloning and sequencing for GNB2L1, QARS and TDP2 and the consequences for the functional studies are discussed. The second approach analyses the codon bias, the phenomenon in which distinct synonymous codons are used with different frequencies, and, following integration with a gene expression profile, estimates the total number of codons present across all the expressed mRNAs (named here "codonome value") in a given biological condition. Systematic analyses across different pathological and normal human tissues and multiple species shows a surprisingly tight correlation between the codon bias and the codonome bias. The third approach is useful to studies the expression of human autism spectrum disorder (ASD) implicated genes. ASD implicated genes sharing microRNA response elements (MREs) for the same microRNA are co-expressed in brain samples from healthy and ASD affected individuals. The different expression of a recently identified long non coding RNA which have four MREs for the same microRNA could disrupt the equilibrium in this network, but further analyses and experiments are needed.
Resumo:
In many application domains data can be naturally represented as graphs. When the application of analytical solutions for a given problem is unfeasible, machine learning techniques could be a viable way to solve the problem. Classical machine learning techniques are defined for data represented in a vectorial form. Recently some of them have been extended to deal directly with structured data. Among those techniques, kernel methods have shown promising results both from the computational complexity and the predictive performance point of view. Kernel methods allow to avoid an explicit mapping in a vectorial form relying on kernel functions, which informally are functions calculating a similarity measure between two entities. However, the definition of good kernels for graphs is a challenging problem because of the difficulty to find a good tradeoff between computational complexity and expressiveness. Another problem we face is learning on data streams, where a potentially unbounded sequence of data is generated by some sources. There are three main contributions in this thesis. The first contribution is the definition of a new family of kernels for graphs based on Directed Acyclic Graphs (DAGs). We analyzed two kernels from this family, achieving state-of-the-art results from both the computational and the classification point of view on real-world datasets. The second contribution consists in making the application of learning algorithms for streams of graphs feasible. Moreover,we defined a principled way for the memory management. The third contribution is the application of machine learning techniques for structured data to non-coding RNA function prediction. In this setting, the secondary structure is thought to carry relevant information. However, existing methods considering the secondary structure have prohibitively high computational complexity. We propose to apply kernel methods on this domain, obtaining state-of-the-art results.
Resumo:
Despite extensive research and introduction of innovative therapy, lung cancer prognosis remains poor, with a five years survival of only 17%. The success of pharmacological treatment is often impaired by drug resistance. Thus, the characterization of response mechanisms to anti-cancer compounds and of the molecular mechanisms supporting lung cancer aggressiveness are crucial for patient’s management. In the first part of this thesis, we characterized the molecular mechanism behind resistance of lung cancer cells to the Inhibitors of the Bromodomain and Extraterminal domain containing Proteins (BETi). Through a CRISPR/Cas9 screening we identified three Hippo Pathway members, LATS2, TAOK1 and NF2 as genes implicated in susceptibility to BETi. These genes confer sensitivity to BETi inhibiting TAZ activity. Conversely, TAZ overexpression increases resistance to BETi. We also displayed that BETi downregulate both YAP, TAZ and TEADs expression in several cancer cell lines, implying a novel BETi-dependent cytotoxic mechanism. In the second part of this work, we attempted to characterize the crosstalk between the TAZ gene and its cognate antisense long-non coding RNA (lncRNA) TAZ-AS202 in lung tumorigenesis. As for TAZ downregulation, TAZ-AS202 silencing impairs NSCLC cells proliferation, migration and invasion, suggesting a pro-tumorigenic function for this lncRNA during lung tumorigenesis. TAZ-AS202 regulates TAZ target genes without altering TAZ expression or localization. This finding implies an uncovered functional cooperation between TAZ and TAZ-AS202. Moreover, we found that the EPH-ephrin signaling receptor EPHB2 is a downstream effector affected by both TAZ and TAZ-AS202 silencing. EPHB2 downregulation significantly attenuates cells proliferation, migration and invasion, suggesting that, at least in part, TAZ-AS202 and TAZ pro-oncogenic activity depends on EPH-ephrin signaling final deregulation. Finally, we started to dissect the mechanism underlying the TAZ-AS202 regulatory activity on EPHB2 in lung cancer, which may involve the existence of an intermediate transcription factor and is the object of our ongoing research.
Resumo:
The thesis deals with channel coding theory applied to upper layers in the protocol stack of a communication link and it is the outcome of four year research activity. A specific aspect of this activity has been the continuous interaction between the natural curiosity related to the academic blue-sky research and the system oriented design deriving from the collaboration with European industry in the framework of European funded research projects. In this dissertation, the classical channel coding techniques, that are traditionally applied at physical layer, find their application at upper layers where the encoding units (symbols) are packets of bits and not just single bits, thus explaining why such upper layer coding techniques are usually referred to as packet layer coding. The rationale behind the adoption of packet layer techniques is in that physical layer channel coding is a suitable countermeasure to cope with small-scale fading, while it is less efficient against large-scale fading. This is mainly due to the limitation of the time diversity inherent in the necessity of adopting a physical layer interleaver of a reasonable size so as to avoid increasing the modem complexity and the latency of all services. Packet layer techniques, thanks to the longer codeword duration (each codeword is composed of several packets of bits), have an intrinsic longer protection against long fading events. Furthermore, being they are implemented at upper layer, Packet layer techniques have the indisputable advantages of simpler implementations (very close to software implementation) and of a selective applicability to different services, thus enabling a better matching with the service requirements (e.g. latency constraints). Packet coding technique improvement has been largely recognized in the recent communication standards as a viable and efficient coding solution: Digital Video Broadcasting standards, like DVB-H, DVB-SH, and DVB-RCS mobile, and 3GPP standards (MBMS) employ packet coding techniques working at layers higher than the physical one. In this framework, the aim of the research work has been the study of the state-of-the-art coding techniques working at upper layer, the performance evaluation of these techniques in realistic propagation scenario, and the design of new coding schemes for upper layer applications. After a review of the most important packet layer codes, i.e. Reed Solomon, LDPC and Fountain codes, in the thesis focus our attention on the performance evaluation of ideal codes (i.e. Maximum Distance Separable codes) working at UL. In particular, we analyze the performance of UL-FEC techniques in Land Mobile Satellite channels. We derive an analytical framework which is a useful tool for system design allowing to foresee the performance of the upper layer decoder. We also analyze a system in which upper layer and physical layer codes work together, and we derive the optimal splitting of redundancy when a frequency non-selective slowly varying fading channel is taken into account. The whole analysis is supported and validated through computer simulation. In the last part of the dissertation, we propose LDPC Convolutional Codes (LDPCCC) as possible coding scheme for future UL-FEC application. Since one of the main drawbacks related to the adoption of packet layer codes is the large decoding latency, we introduce a latency-constrained decoder for LDPCCC (called windowed erasure decoder). We analyze the performance of the state-of-the-art LDPCCC when our decoder is adopted. Finally, we propose a design rule which allows to trade-off performance and latency.