952 resultados para Noncoding Sequences
Resumo:
Abstract Background RNAs transcribed from intronic regions of genes are involved in a number of processes related to post-transcriptional control of gene expression. However, the complement of human genes in which introns are transcribed, and the number of intronic transcriptional units and their tissue expression patterns are not known. Results A survey of mRNA and EST public databases revealed more than 55,000 totally intronic noncoding (TIN) RNAs transcribed from the introns of 74% of all unique RefSeq genes. Guided by this information, we designed an oligoarray platform containing sense and antisense probes for each of 7,135 randomly selected TIN transcripts plus the corresponding protein-coding genes. We identified exonic and intronic tissue-specific expression signatures for human liver, prostate and kidney. The most highly expressed antisense TIN RNAs were transcribed from introns of protein-coding genes significantly enriched (p = 0.002 to 0.022) in the 'Regulation of transcription' Gene Ontology category. RNA polymerase II inhibition resulted in increased expression of a fraction of intronic RNAs in cell cultures, suggesting that other RNA polymerases may be involved in their biosynthesis. Members of a subset of intronic and protein-coding signatures transcribed from the same genomic loci have correlated expression patterns, suggesting that intronic RNAs regulate the abundance or the pattern of exon usage in protein-coding messages. Conclusion We have identified diverse intronic RNA expression patterns, pointing to distinct regulatory roles. This gene-oriented approach, using a combined intron-exon oligoarray, should permit further comparative analysis of intronic transcription under various physiological and pathological conditions, thus advancing current knowledge about the biological functions of these noncoding RNAs.
Resumo:
Abstract Background The metabolic capacity for nitrogen fixation is known to be present in several prokaryotic species scattered across taxonomic groups. Experimental detection of nitrogen fixation in microbes requires species-specific conditions, making it difficult to obtain a comprehensive census of this trait. The recent and rapid increase in the availability of microbial genome sequences affords novel opportunities to re-examine the occurrence and distribution of nitrogen fixation genes. The current practice for computational prediction of nitrogen fixation is to use the presence of the nifH and/or nifD genes. Results Based on a careful comparison of the repertoire of nitrogen fixation genes in known diazotroph species we propose a new criterion for computational prediction of nitrogen fixation: the presence of a minimum set of six genes coding for structural and biosynthetic components, namely NifHDK and NifENB. Using this criterion, we conducted a comprehensive search in fully sequenced genomes and identified 149 diazotrophic species, including 82 known diazotrophs and 67 species not known to fix nitrogen. The taxonomic distribution of nitrogen fixation in Archaea was limited to the Euryarchaeota phylum; within the Bacteria domain we predict that nitrogen fixation occurs in 13 different phyla. Of these, seven phyla had not hitherto been known to contain species capable of nitrogen fixation. Our analyses also identified protein sequences that are similar to nitrogenase in organisms that do not meet the minimum-gene-set criteria. The existence of nitrogenase-like proteins lacking conserved co-factor ligands in both diazotrophs and non-diazotrophs suggests their potential for performing other, as yet unidentified, metabolic functions. Conclusions Our predictions expand the known phylogenetic diversity of nitrogen fixation, and suggest that this trait may be much more common in nature than it is currently thought. The diverse phylogenetic distribution of nitrogenase-like proteins indicates potential new roles for anciently duplicated and divergent members of this group of enzymes.
Resumo:
Abstract Background Pancreatic ductal adenocarcinoma (PDAC) is known by its aggressiveness and lack of effective therapeutic options. Thus, improvement in current knowledge of molecular changes associated with pancreatic cancer is urgently needed to explore novel venues of diagnostics and treatment of this dismal disease. While there is mounting evidence that long noncoding RNAs (lncRNAs) transcribed from intronic and intergenic regions of the human genome may play different roles in the regulation of gene expression in normal and cancer cells, their expression pattern and biological relevance in pancreatic cancer is currently unknown. In the present work we investigated the relative abundance of a collection of lncRNAs in patients' pancreatic tissue samples aiming at identifying gene expression profiles correlated to pancreatic cancer and metastasis. Methods Custom 3,355-element spotted cDNA microarray interrogating protein-coding genes and putative lncRNA were used to obtain expression profiles from 38 clinical samples of tumor and non-tumor pancreatic tissues. Bioinformatics analyses were performed to characterize structure and conservation of lncRNAs expressed in pancreatic tissues, as well as to identify expression signatures correlated to tissue histology. Strand-specific reverse transcription followed by PCR and qRT-PCR were employed to determine strandedness of lncRNAs and to validate microarray results, respectively. Results We show that subsets of intronic/intergenic lncRNAs are expressed across tumor and non-tumor pancreatic tissue samples. Enrichment of promoter-associated chromatin marks and over-representation of conserved DNA elements and stable secondary structure predictions suggest that these transcripts are generated from independent transcriptional units and that at least a fraction is under evolutionary selection, and thus potentially functional. Statistically significant expression signatures comprising protein-coding mRNAs and lncRNAs that correlate to PDAC or to pancreatic cancer metastasis were identified. Interestingly, loci harboring intronic lncRNAs differentially expressed in PDAC metastases were enriched in genes associated to the MAPK pathway. Orientation-specific RT-PCR documented that intronic transcripts are expressed in sense, antisense or both orientations relative to protein-coding mRNAs. Differential expression of a subset of intronic lncRNAs (PPP3CB, MAP3K14 and DAPK1 loci) in metastatic samples was confirmed by Real-Time PCR. Conclusion Our findings reveal sets of intronic lncRNAs expressed in pancreatic tissues whose abundance is correlated to PDAC or metastasis, thus pointing to the potential relevance of this class of transcripts in biological processes related to malignant transformation and metastasis in pancreatic cancer.
Resumo:
The down-regulation of the tumor-suppressor gene RASSF1A has been shown to increase cell proliferation in several tumors. RASSF1A expression is regulated through epigenetic events involving the polycomb repressive complex 2 (PRC2); however, the molecular mechanisms modulating the recruitment of this epigenetic modifier to the RASSF1 locus remain largely unknown. Here, we identify and characterize ANRASSF1, an endogenous unspliced long noncoding RNA (lncRNA) that is transcribed from the opposite strand on the RASSF1 gene locus in several cell lines and tissues and binds PRC2. ANRASSF1 is transcribed through RNA polymerase II and is 5'-capped and polyadenylated; it exhibits nuclear localization and has a shorter half-life compared with other lncRNAs that bind PRC2. ANRASSF1 endogenous expression is higher in breast and prostate tumor cell lines compared with non-tumor, and an opposite pattern is observed for RASSF1A. ANRASSF1 ectopic overexpression reduces RASSF1A abundance and increases the proliferation of HeLa cells, whereas ANRASSF1 silencing causes the opposite effects. These changes in ANRASSF1 levels do not affect the RASSF1C isoform abundance. ANRASSF1 overexpression causes a marked increase in both PRC2 occupancy and histone H3K27me3 repressive marks, specifically at the RASSF1A promoter region. No effect of ANRASSF1 overexpression was detected on PRC2 occupancy and histone H3K27me3 at the promoter regions of RASSF1C and the four other neighboring genes, including two well-characterized tumor suppressor genes. Additionally, we demonstrated that ANRASSF1 forms an RNA/DNA hybrid and recruits PRC2 to the RASSF1A promoter. Together, these results demonstrate a novel mechanism of epigenetic repression of the RASSF1A tumor suppressor gene involving antisense unspliced lncRNA, in which ANRASSF1 selectively represses the expression of the RASSF1 isoform overlapping the antisense transcript in a location-specific manner. In a broader perspective, our findings suggest that other non-characterized unspliced intronic lncRNAs transcribed in the human genome might contribute to a location-specific epigenetic modulation of genes.
Resumo:
[EN] We present in this paper a variational approach to accurately estimate simultaneously the velocity field and its derivatives directly from PIV image sequences. Our method differs from other techniques that have been presented in the literature in the fact that the energy minimization used to estimate the particles motion depends on a second order Taylor development of the flow. In this way, we are not only able to compute the motion vector field, but we also obtain an accurate estimation of their derivatives. Hence, we avoid the use of numerical schemes to compute the derivatives from the estimated flow that usually yield to numerical amplification of the inherent uncertainty on the estimated flow. The performance of our approach is illustrated with the estimation of the motion vector field and the vorticity on both synthetic and real PIV datasets.
Resumo:
Facial expression recognition is one of the most challenging research areas in the image recognition ¯eld and has been actively studied since the 70's. For instance, smile recognition has been studied due to the fact that it is considered an important facial expression in human communication, it is therefore likely useful for human–machine interaction. Moreover, if a smile can be detected and also its intensity estimated, it will raise the possibility of new applications in the future
Resumo:
Forecasting the time, location, nature, and scale of volcanic eruptions is one of the most urgent aspects of modern applied volcanology. The reliability of probabilistic forecasting procedures is strongly related to the reliability of the input information provided, implying objective criteria for interpreting the historical and monitoring data. For this reason both, detailed analysis of past data and more basic research into the processes of volcanism, are fundamental tasks of a continuous information-gain process; in this way the precursor events of eruptions can be better interpreted in terms of their physical meanings with correlated uncertainties. This should lead to better predictions of the nature of eruptive events. In this work we have studied different problems associated with the long- and short-term eruption forecasting assessment. First, we discuss different approaches for the analysis of the eruptive history of a volcano, most of them generally applied for long-term eruption forecasting purposes; furthermore, we present a model based on the characteristics of a Brownian passage-time process to describe recurrent eruptive activity, and apply it for long-term, time-dependent, eruption forecasting (Chapter 1). Conversely, in an effort to define further monitoring parameters as input data for short-term eruption forecasting in probabilistic models (as for example, the Bayesian Event Tree for eruption forecasting -BET_EF-), we analyze some characteristics of typical seismic activity recorded in active volcanoes; in particular, we use some methodologies that may be applied to analyze long-period (LP) events (Chapter 2) and volcano-tectonic (VT) seismic swarms (Chapter 3); our analysis in general are oriented toward the tracking of phenomena that can provide information about magmatic processes. Finally, we discuss some possible ways to integrate the results presented in Chapters 1 (for long-term EF), 2 and 3 (for short-term EF) in the BET_EF model (Chapter 4).
Resumo:
The objective of this work is to characterize the genome of the chromosome 1 of A.thaliana, a small flowering plants used as a model organism in studies of biology and genetics, on the basis of a recent mathematical model of the genetic code. I analyze and compare different portions of the genome: genes, exons, coding sequences (CDS), introns, long introns, intergenes, untranslated regions (UTR) and regulatory sequences. In order to accomplish the task, I transformed nucleotide sequences into binary sequences based on the definition of the three different dichotomic classes. The descriptive analysis of binary strings indicate the presence of regularities in each portion of the genome considered. In particular, there are remarkable differences between coding sequences (CDS and exons) and non-coding sequences, suggesting that the frame is important only for coding sequences and that dichotomic classes can be useful to recognize them. Then, I assessed the existence of short-range dependence between binary sequences computed on the basis of the different dichotomic classes. I used three different measures of dependence: the well-known chi-squared test and two indices derived from the concept of entropy i.e. Mutual Information (MI) and Sρ, a normalized version of the “Bhattacharya Hellinger Matusita distance”. The results show that there is a significant short-range dependence structure only for the coding sequences whose existence is a clue of an underlying error detection and correction mechanism. No doubt, further studies are needed in order to assess how the information carried by dichotomic classes could discriminate between coding and noncoding sequence and, therefore, contribute to unveil the role of the mathematical structure in error detection and correction mechanisms. Still, I have shown the potential of the approach presented for understanding the management of genetic information.
Resumo:
It is usual to hear a strange short sentence: «Random is better than...». Why is randomness a good solution to a certain engineering problem? There are many possible answers, and all of them are related to the considered topic. In this thesis I will discuss about two crucial topics that take advantage by randomizing some waveforms involved in signals manipulations. In particular, advantages are guaranteed by shaping the second order statistic of antipodal sequences involved in an intermediate signal processing stages. The first topic is in the area of analog-to-digital conversion, and it is named Compressive Sensing (CS). CS is a novel paradigm in signal processing that tries to merge signal acquisition and compression at the same time. Consequently it allows to direct acquire a signal in a compressed form. In this thesis, after an ample description of the CS methodology and its related architectures, I will present a new approach that tries to achieve high compression by design the second order statistics of a set of additional waveforms involved in the signal acquisition/compression stage. The second topic addressed in this thesis is in the area of communication system, in particular I focused the attention on ultra-wideband (UWB) systems. An option to produce and decode UWB signals is direct-sequence spreading with multiple access based on code division (DS-CDMA). Focusing on this methodology, I will address the coexistence of a DS-CDMA system with a narrowband interferer. To do so, I minimize the joint effect of both multiple access (MAI) and narrowband (NBI) interference on a simple matched filter receiver. I will show that, when spreading sequence statistical properties are suitably designed, performance improvements are possible with respect to a system exploiting chaos-based sequences minimizing MAI only.
Resumo:
Most electronic systems can be described in a very simplified way as an assemblage of analog and digital components put all together in order to perform a certain function. Nowadays, there is an increasing tendency to reduce the analog components, and to replace them by operations performed in the digital domain. This tendency has led to the emergence of new electronic systems that are more flexible, cheaper and robust. However, no matter the amount of digital process implemented, there will be always an analog part to be sorted out and thus, the step of converting digital signals into analog signals and vice versa cannot be avoided. This conversion can be more or less complex depending on the characteristics of the signals. Thus, even if it is desirable to replace functions carried out by analog components by digital processes, it is equally important to do so in a way that simplifies the conversion from digital to analog signals and vice versa. In the present thesis, we have study strategies based on increasing the amount of processing in the digital domain in such a way that the implementation of analog hardware stages can be simplified. To this aim, we have proposed the use of very low quantized signals, i.e. 1-bit, for the acquisition and for the generation of particular classes of signals.
Resumo:
The present study has been carried out with the following objectives: i) To investigate the attributes of source parameters of local and regional earthquakes; ii) To estimate, as accurately as possible, M0, fc, Δσ and their standard errors to infer their relationship with source size; iii) To quantify high-frequency earthquake ground motion and to study the source scaling. This work is based on observational data of micro, small and moderate -earthquakes for three selected seismic sequences, namely Parkfield (CA, USA), Maule (Chile) and Ferrara (Italy). For the Parkfield seismic sequence (CA), a data set of 757 (42 clusters) repeating micro-earthquakes (0 ≤ MW ≤ 2), collected using borehole High Resolution Seismic Network (HRSN), have been analyzed and interpreted. We used the coda methodology to compute spectral ratios to obtain accurate values of fc , Δσ, and M0 for three target clusters (San Francisco, Los Angeles, and Hawaii) of our data. We also performed a general regression on peak ground velocities to obtain reliable seismic spectra of all earthquakes. For the Maule seismic sequence, a data set of 172 aftershocks of the 2010 MW 8.8 earthquake (3.7 ≤ MW ≤ 6.2), recorded by more than 100 temporary broadband stations, have been analyzed and interpreted to quantify high-frequency earthquake ground motion in this subduction zone. We completely calibrated the excitation and attenuation of the ground motion in Central Chile. For the Ferrara sequence, we calculated moment tensor solutions for 20 events from MW 5.63 (the largest main event occurred on May 20 2012), down to MW 3.2 by a 1-D velocity model for the crust beneath the Pianura Padana, using all the geophysical and geological information available for the area. The PADANIA model allowed a numerical study on the characteristics of the ground motion in the thick sediments of the flood plain.
Resumo:
Questa tesi si inserisce nell'ambito delle analisi statistiche e dei metodi stocastici applicati all'analisi delle sequenze di DNA. Nello specifico il nostro lavoro è incentrato sullo studio del dinucleotide CG (CpG) all'interno del genoma umano, che si trova raggruppato in zone specifiche denominate CpG islands. Queste sono legate alla metilazione del DNA, un processo che riveste un ruolo fondamentale nella regolazione genica. La prima parte dello studio è dedicata a una caratterizzazione globale del contenuto e della distribuzione dei 16 diversi dinucleotidi all'interno del genoma umano: in particolare viene studiata la distribuzione delle distanze tra occorrenze successive dello stesso dinucleotide lungo la sequenza. I risultati vengono confrontati con diversi modelli nulli: sequenze random generate con catene di Markov di ordine zero (basate sulle frequenze relative dei nucleotidi) e uno (basate sulle probabilità di transizione tra diversi nucleotidi) e la distribuzione geometrica per le distanze. Da questa analisi le proprietà caratteristiche del dinucleotide CpG emergono chiaramente, sia dal confronto con gli altri dinucleotidi che con i modelli random. A seguito di questa prima parte abbiamo scelto di concentrare le successive analisi in zone di interesse biologico, studiando l’abbondanza e la distribuzione di CpG al loro interno (CpG islands, promotori e Lamina Associated Domains). Nei primi due casi si osserva un forte arricchimento nel contenuto di CpG, e la distribuzione delle distanze è spostata verso valori inferiori, indicando che questo dinucleotide è clusterizzato. All’interno delle LADs si trovano mediamente meno CpG e questi presentano distanze maggiori. Infine abbiamo adottato una rappresentazione a random walk del DNA, costruita in base al posizionamento dei dinucleotidi: il walk ottenuto presenta caratteristiche drasticamente diverse all’interno e all’esterno di zone annotate come CpG island. Riteniamo pertanto che metodi basati su questo approccio potrebbero essere sfruttati per migliorare l’individuazione di queste aree di interesse nel genoma umano e di altri organismi.