6 results for Corpus annotation

in AMS Tesi di Dottorato - Alm@DL - Università di Bologna


Abstract:

The corpus luteum (CL) lifespan is characterized by rapid growth, differentiation and controlled regression of the luteal tissue, accompanied by intense angiogenesis and angioregression. Indeed, the CL is one of the most highly vascularised tissues in the body, with an endothelial cell proliferation rate 4- to 20-fold higher than in some of the most malignant human tumours. This angiogenic process needs to be rigorously controlled to allow repeated opportunities for fertilization. After a first period of rapid growth, the tissue becomes stably organized and prepares itself to switch to the phenotype required for its subsequent apoptotic regression. In pregnant swine, the lifespan of the CLs must be extended to support embryonic and foetal development, and vascularisation is necessary for the maintenance of luteal function. Among the molecules involved in angiogenesis, Vascular Endothelial Growth Factor (VEGF) is the main regulator, promoting endothelial cell proliferation, differentiation and survival as well as vascular permeability and vessel lumen formation. During vascular invasion and apoptosis, remodelling of the extracellular matrix is essential for the correct evolution of the CL, particularly through the action of a specific class of proteolytic enzymes known as matrix metalloproteinases (MMPs). Another important factor in angiogenesis and angioregression during CL formation and luteolysis is the isopeptide Endothelin-1 (ET-1), which is well known to be a potent vasoconstrictor and a mitogen for endothelial cells. The goal of the present thesis was to study the role and regulation of vascularisation in an adult vascular bed.
For this purpose, using a precisely controlled in vivo model of swine CL development and regression, we determined the expression levels of the members of the VEGF system (total VEGF and specific isoforms; VEGF receptor-1, VEGFR-1; VEGF receptor-2, VEGFR-2) and of the ET-1 system (ET-1; endothelin converting enzyme-1, ECE-1; endothelin receptor type A, ET-A), as well as the activity of the Ca++/Mg++-dependent endonucleases and gelatinases (MMP-2 and MMP-9). Three experiments were conducted to meet these objectives, using CLs isolated from the ovaries of cyclic, pregnant or fasted gilts. In Experiment I, we evaluated the influence of acute fasting on VEGF production and on VEGF, VEGFR-2, ET-1, ECE-1 and ET-A mRNA expression in CLs collected on day 6 after ovulation (midluteal phase). The results indicated a down-regulation of VEGF, VEGFR-2, ET-1 and ECE-1 mRNA expression, although no change was observed for VEGF protein. Furthermore, we observed that fasting stimulated steroidogenesis by luteal cells. On the basis of the main effects of VEGF (stimulation of vessel growth and endothelial permeability) and ET-1 (stimulation of endothelial cell proliferation and vasoconstriction, as well as VEGF stimulation), we concluded that feed restriction possibly inhibited luteal vessel development. This could be compensated, at least in part, by a decrease in vascular tone due to the reduction in ET-1, thus ensuring adequate blood flow and steroid production by the luteal cells. In Experiment II, we investigated the relationship between the activities of VEGF, gelatinases and Ca++/Mg++-dependent endonucleases and the functional stage of the CL throughout the oestrous cycle and during pregnancy. The results demonstrated differential expression patterns of these molecules corresponding to the different phases of the oestrous cycle. Immediately after ovulation, VEGF mRNA/protein levels and MMP-9 activity were maximal.
On days 5–14 after ovulation, VEGF expression and MMP-2 and MMP-9 activities were at basal levels, while Ca++/Mg++-dependent endonuclease levels increased significantly relative to day 1. Only at luteolysis (day 17) did Ca++/Mg++-dependent endonuclease and spontaneous MMP-2 activity increase significantly. During pregnancy, high levels of MMP-9 and VEGF were observed. These results suggested that during the very early luteal phase, high MMP activities coupled with high VEGF levels drive the tissue towards an angiogenic phenotype, allowing CL growth under the LH (Luteinising Hormone) stimulus, while during the late luteal phase, low VEGF and elevated MMP levels may play a role in apoptotic tissue and extracellular matrix remodelling during structural luteolysis. In Experiment III, we described the expression patterns of all distinct VEGF isoforms throughout the oestrous cycle. The mRNA expression and protein levels of both VEGF receptors were also evaluated. Four novel VEGF isoforms (VEGF144, VEGF147, VEGF182 and VEGF164b) were found for the first time in swine, and the seven identified isoforms presented four different patterns of expression. All isoforms showed their highest mRNA levels in newly formed CLs (day 1), followed by a decrease during the mid-late luteal phase (days 10–17), except for VEGF182, VEGF188 and VEGF144, which showed differential regulation during the late luteal phase (day 14) or at luteolysis (day 17). VEGF protein levels paralleled those of the most expressed and secreted isoforms, VEGF120 and VEGF164. The VEGF receptor mRNAs showed a pattern of expression different from that of their ligands, increasing between days 1 and 3 and gradually decreasing during the mid-late luteal phase.
The differential regulation of some VEGF isoforms, principally during the late luteal phase and luteolysis, suggested a specific role of VEGF during the tissue remodelling process that occurs either for CL maintenance in case of pregnancy or for the noncapillary vessel development essential for tissue removal during structural luteolysis. In summary, our findings allowed us to determine relationships among the factors involved in the angiogenesis and angioregression mechanisms that take place during the formation and regression of the CL. Thus, the CL provides a very interesting model for studying such factors in different fields of basic research.

Abstract:

The continuous increase in genome sequencing projects has produced a huge amount of data in the last 10 years: currently more than 600 prokaryotic and 80 eukaryotic genomes are fully sequenced and publicly available. However, sequencing a genome determines only raw nucleotide sequences. This is just the first step of the genome annotation process, which deals with assigning biological information to each sequence. Annotation is carried out at every level of the biological information processing mechanism, from DNA to protein, and cannot be accomplished by in vitro analysis alone, which is extremely expensive and time consuming when applied at this scale. Thus, in silico methods need to be used to accomplish the task. The aim of this work was the implementation of predictive computational methods to allow fast, reliable and automated annotation of genomes and proteins starting from amino acid sequences. The first part of the work focused on the implementation of a new machine learning based method for the prediction of the subcellular localization of soluble eukaryotic proteins. The method is called BaCelLo and was developed in 2006. Its distinguishing feature is its independence from biases in the training dataset, which cause over-prediction of the most represented classes in all other predictors developed so far. This important result was achieved through a modification that I made to the standard Support Vector Machine (SVM) algorithm, creating the so-called Balanced SVM. BaCelLo is able to predict the most important subcellular localizations in eukaryotic cells, and three kingdom-specific predictors were implemented. In two extensive comparisons, carried out in 2006 and 2008, BaCelLo was reported to outperform all the state-of-the-art methods then available for this prediction task.
BaCelLo was subsequently used to completely annotate 5 eukaryotic genomes, by integrating it into a pipeline of predictors developed at the Bologna Biocomputing group by Dr. Pier Luigi Martelli and Dr. Piero Fariselli. An online database, called eSLDB, was developed by integrating, for each amino acid sequence extracted from the genome, the predicted subcellular localization with experimental and similarity-based annotations. In the second part of the work, a new machine learning based method was implemented for the prediction of GPI-anchored proteins. From the raw amino acid sequence, the method efficiently predicts both the presence of the GPI-anchor (by means of an SVM) and the position in the sequence of the post-translational modification event, the so-called ω-site (by means of a Hidden Markov Model, HMM). The method is called GPIPE and was reported to greatly improve the prediction of GPI-anchored proteins over all previously developed methods. GPIPE was able to predict up to 88% of the experimentally annotated GPI-anchored proteins while maintaining a false positive rate as low as 0.1%. GPIPE was used to completely annotate 81 eukaryotic genomes, and more than 15,000 putative GPI-anchored proteins were predicted, 561 of which are found in H. sapiens. On average, 1% of a proteome is predicted to be GPI-anchored. A statistical analysis of the composition of the regions surrounding the ω-site allowed the definition of specific amino acid abundances in the different regions considered. Furthermore, the hypothesis proposed in the literature that compositional biases are present among the four major eukaryotic kingdoms was tested and rejected. All the developed predictors and databases are freely available: BaCelLo (http://gpcr.biocomp.unibo.it/bacello), eSLDB (http://gpcr.biocomp.unibo.it/esldb) and GPIPE (http://gpcr.biocomp.unibo.it/gpipe).
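
As a rough illustration of how the two GPIPE figures quoted above (88% of annotated GPI-anchored proteins recovered, 0.1% false positives) are conventionally computed, the sketch below derives sensitivity and false positive rate from confusion-matrix counts. The function names and the counts in the usage note are hypothetical, chosen only to reproduce those percentages; they are not data from the thesis.

```python
def sensitivity(true_pos: int, false_neg: int) -> float:
    """Fraction of real positives (e.g. annotated GPI-anchored proteins) that are predicted."""
    total = true_pos + false_neg
    if total == 0:
        raise ValueError("no positive examples")
    return true_pos / total


def false_positive_rate(false_pos: int, true_neg: int) -> float:
    """Fraction of real negatives (non-anchored proteins) wrongly predicted as positive."""
    total = false_pos + true_neg
    if total == 0:
        raise ValueError("no negative examples")
    return false_pos / total
```

With invented counts, `sensitivity(88, 12)` corresponds to 88% and `false_positive_rate(1, 999)` to 0.1%.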

Abstract:

The construction and use of multimedia corpora has long been advocated in the literature as one of the expected future application fields of Corpus Linguistics. This research project represents a pioneering experience aimed at applying a data-driven methodology to the study of audiovisual translation (AVT), similarly to what has been done in the last few decades in the macro-field of Translation Studies. This research was based on the experience of Forlixt 1, the Forlì Corpus of Screen Translation, developed at the University of Bologna’s Department of Interdisciplinary Studies in Translation, Languages and Culture. Indeed, in order to quantify the strategies of linguistic transfer of an AV product, we need to take into consideration not only the linguistic aspect of such a product but all the meaning-making resources deployed in the filmic text. Given that one major benefit of Forlixt 1 is the combination of audiovisual and textual data, this corpus allows the user to access primary data for scientific investigation, and thus no longer to rely on pre-processed material such as traditional annotated transcriptions. Based on this rationale, the first chapter of the thesis sets out to illustrate the state of the art of research in the disciplinary fields involved. The primary objective was to underline the main repercussions on multimedia texts resulting from the interaction of a double support, audio and video, and, accordingly, on the procedures, means and methods adopted in their translation. By drawing on previous research in semiotics and film studies, the relevant codes at work in the visual and acoustic channels were outlined. Subsequently, we concentrated on the analysis of the verbal component and on the peculiar characteristics of filmic orality as opposed to spontaneous dialogic production. In the second part, an overview of the main AVT modalities was presented (dubbing, voice-over, interlinguistic and intra-linguistic subtitling, audio-description, etc.)
in order to define the different technologies, processes and professional qualifications that this umbrella term presently includes. The second chapter focuses diachronically on the contribution of various theories (i.e. Descriptive Translation Studies, Polysystem Theory) to the application of Corpus Linguistics methods and tools to the field of Translation Studies. In particular, we discussed how the use of corpora can help reduce the gap between qualitative and quantitative approaches. Subsequently, we reviewed the tools traditionally employed by Corpus Linguistics in the construction of traditional “written language” corpora, to assess whether and how they can be adapted to meet the needs of multimedia corpora. In particular, we reviewed existing speech and spoken corpora, as well as multimedia corpora specifically designed to investigate translation. The third chapter reviews the main development steps of Forlixt 1 from a technical (IT design principles, data query functions) and a methodological point of view, laying down extensive scientific foundations for the annotation methods adopted, which presently encompass categories of a pragmatic, sociolinguistic, linguacultural and semiotic nature. Finally, we described the main query tools (free search, guided search, advanced search and combined search) and the main intended uses of the database from a pedagogical perspective. The fourth chapter lists the specific compilation criteria adopted, as well as statistics on the two sub-corpora, presenting data broken down by language pair (French-Italian and German-Italian) and genre (cinema comedies, television soap operas and crime series). Next, we concentrated on the discussion of the results obtained from the analysis of summary tables reporting the frequency of the categories applied to the French-Italian sub-corpus.
The detailed observation of the distribution of the categories identified in the original and dubbed corpus allowed us to empirically confirm some of the theories put forward in the literature, notably those concerning the nature of the filmic text, the dubbing process and the features of Italian dubbed language. This was possible by looking into some of the most problematic aspects, such as the rendering of sociolinguistic variation. The corpus also allowed us to consider hitherto neglected aspects, such as pragmatic, prosodic, kinetic, facial and semiotic elements, and their combination. At the end of this first exploration, some specific observations concerning possible macro-translation trends were made for each sub-genre considered (cinematic and TV genres). On the grounds of this first quantitative investigation, the fifth chapter set out to examine the data further by applying ad hoc models of analysis. Given the virtually infinite number of combinations of the categories adopted, and of the latter with searchable textual units, three possible qualitative and quantitative methods were designed, each concentrating on a particular translation dimension of the filmic text. The first was the cultural dimension, which specifically focused on the rendering of selected cultural references and on the investigation of recurrent translation choices and strategies justified on the basis of the occurrence of specific clusters of categories. The second analysis was conducted on the linguistic dimension by exploring the occurrence of phrasal verbs in the Italian dubbed corpus and by ascertaining the influence of possible semiotic traits, such as gestures and facial expressions, on the adoption of related translation strategies. Finally, the main aim of the third study was to verify whether, under which circumstances and through which modality graphic and iconic elements were translated into Italian from an original corpus of both German and French films.
After a review of the main translation techniques at work, an exhaustive account of possible causes for their non-translation was also provided. By way of conclusion, the discussion of the results obtained from the distribution of annotation categories in the French-Italian corpus, together with the application of specific models of analysis, allowed us to underline possible advantages and drawbacks of adopting a corpus-based approach to AVT studies. Even though possible updates and improvements were proposed to help solve some of the problems identified, it is argued that the added value of Forlixt 1 lies ultimately in having created a valuable instrument that makes it possible to carry out empirically sound contrastive studies, which may be usefully replicated on different language pairs and several types of multimedia texts. Furthermore, multimedia corpora can also play a crucial role in L2 and translation teaching, two disciplines in which their use still lacks systematic investigation.

Abstract:

In this doctoral thesis, a methodology was developed to fully apply the corpus-based approach to simultaneous interpreting research. DIRSI-C is a parallel (Italian-English/English-Italian) and aligned electronic corpus containing transcripts of recorded international medical conferences at which professional simultaneous interpreters worked both from and into their foreign language. Against this backdrop, directionality represents the research parameter used to analyze interpreters' performance by means of corpus linguistics tools.

Abstract:

In the last few decades, bioinformatics has played a fundamental role in making sense of the huge amount of data produced. Once the complete sequence of a genome has been obtained, the crucial problem is to learn as much as possible about its coding regions. Protein sequence annotation is challenging and, due to the size of the problem, only computational approaches can provide a feasible solution. As recently pointed out by the Critical Assessment of Functional Annotation (CAFA), the most accurate methods are those based on the transfer-by-homology approach, and the most incisive contribution is given by cross-genome comparisons. The present thesis describes a non-hierarchical sequence clustering method for automatic large-scale protein annotation, called “The Bologna Annotation Resource Plus” (BAR+). The method is based on an all-against-all alignment of more than 13 million protein sequences, evaluated with a very stringent metric. BAR+ can safely transfer functional features (Gene Ontology and Pfam terms) within clusters by means of a statistical validation, even in the case of multi-domain proteins. Within BAR+ clusters it is also possible to transfer the three-dimensional structure (when a template is available). This is possible by means of cluster-specific HMM profiles, which can be used to calculate reliable template-to-target alignments even in the case of distantly related proteins (sequence identity < 30%). Other BAR+ based applications were developed during my doctorate, including the prediction of magnesium binding sites in human proteins, the classification of the ABC transporter superfamily and the functional prediction (GO terms) of the CAFA targets. Remarkably, in the CAFA assessment, BAR+ placed among the ten most accurate methods. At present, BAR+ is freely available at http://bar.biocomp.unibo.it/bar2.0 as a web server for the functional and structural annotation of protein sequences.
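
One common way to obtain non-hierarchical clusters from all-against-all alignment results is to link every pair of sequences whose alignment passes fixed thresholds and then take the connected components of the resulting graph. The sketch below illustrates that general idea with a union-find structure. It is not the BAR+ implementation: the function name, the tuple layout of `hits`, and the identity and coverage cut-offs are placeholder assumptions, since the abstract only mentions "a very stringent metric".

```python
from collections import defaultdict


def cluster_sequences(ids, hits, min_identity=0.40, min_coverage=0.90):
    """Group sequence ids into connected components of the alignment graph.

    ids  -- iterable of sequence identifiers
    hits -- iterable of (id_a, id_b, identity, coverage) tuples (hypothetical layout)
    The two thresholds are illustrative placeholders, not values from the thesis.
    """
    ids = list(ids)
    parent = {i: i for i in ids}

    def find(x):
        # Follow parent links to the root, halving the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b, identity, coverage in hits:
        # Only alignments passing both thresholds create an edge.
        if identity >= min_identity and coverage >= min_coverage:
            root_a, root_b = find(a), find(b)
            if root_a != root_b:
                parent[root_b] = root_a

    clusters = defaultdict(list)
    for i in ids:
        clusters[find(i)].append(i)
    # Sort members and clusters so the output order is deterministic.
    return sorted(sorted(members) for members in clusters.values())
```

Given four invented sequences where P1-P2 and P2-P3 pass both thresholds and P3-P4 fails on coverage, the function returns one cluster of three sequences and one singleton. Annotation transfer within a cluster would then be a separate, statistically validated step, as the abstract describes.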