19 resultados para Annotation informatisée


Relevância:

10.00% 10.00%

Publicador:

Resumo:

This research project is based on the Multimodal Corpus of Chinese Court Interpreting (MUCCCI [mutʃɪ]), a small-scale multimodal corpus on the basis of eight authentic court hearings with Chinese-English interpreting in Mainland China. The corpus has approximately 92,500 word tokens in total. Besides the transcription of linguistic and para-linguistic features, utilizing the facial expression classification rules suggested by Black and Yacoob (1995), MUCCCI also includes approximately 1,200 annotations of facial expressions linked to the six basic types of human emotions, namely, anger, disgust, happiness, surprise, sadness, and fear (Black & Yacoob, 1995). This thesis is an example of conducting qualitative analysis on interpreter-mediated courtroom interactions through a multimodal corpus. In particular, miscommunication events (MEs) and the reasons behind them were investigated in detail. During the analysis, although queries were conducted based on non-verbal annotations when searching for MEs, both verbal and non-verbal features were considered indispensable parts contributing to the entire context. This thesis also includes a detailed description of the compilation process of MUCCCI utilizing ELAN, from data collection to transcription, POS tagging and non-verbal annotation. The research aims at assessing the possibility and feasibility of conducting qualitative analysis through a multimodal corpus of court interpreting. The concept of integrating both verbal and non-verbal features to contribute to the entire context is emphasized. The qualitative analysis focusing on MEs can provide an inspiration for improving court interpreters’ performances. All the constraints and difficulties presented can be regarded as a reference for similar research in the future.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

The development of Next Generation Sequencing promotes Biology in the Big Data era. The ever-increasing gap between proteins with known sequences and those with a complete functional annotation requires computational methods for automatic structure and functional annotation. My research has been focusing on proteins and led so far to the development of three novel tools, DeepREx, E-SNPs&GO and ISPRED-SEQ, based on Machine and Deep Learning approaches. DeepREx computes the solvent exposure of residues in a protein chain. This problem is relevant for the definition of structural constraints regarding the possible folding of the protein. DeepREx exploits Long Short-Term Memory layers to capture residue-level interactions between positions distant in the sequence, achieving state-of-the-art performances. With DeepRex, I conducted a large-scale analysis investigating the relationship between solvent exposure of a residue and its probability to be pathogenic upon mutation. E-SNPs&GO predicts the pathogenicity of a Single Residue Variation. Variations occurring on a protein sequence can have different effects, possibly leading to the onset of diseases. E-SNPs&GO exploits protein embeddings generated by two novel Protein Language Models (PLMs), as well as a new way of representing functional information coming from the Gene Ontology. The method achieves state-of-the-art performances and is extremely time-efficient when compared to traditional approaches. ISPRED-SEQ predicts the presence of Protein-Protein Interaction sites in a protein sequence. Knowing how a protein interacts with other molecules is crucial for accurate functional characterization. ISPRED-SEQ exploits a convolutional layer to parse local context after embedding the protein sequence with two novel PLMs, greatly surpassing the current state-of-the-art. All methods are published in international journals and are available as user-friendly web servers. They have been developed keeping in mind standard guidelines for FAIRness (FAIR: Findable, Accessible, Interoperable, Reusable) and are integrated into the public collection of tools provided by ELIXIR, the European infrastructure for Bioinformatics.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

This thesis focuses on automating the time-consuming task of manually counting activated neurons in fluorescent microscopy images, which is used to study the mechanisms underlying torpor. The traditional method of manual annotation can introduce bias and delay the outcome of experiments, so the author investigates a deep-learning-based procedure to automatize this task. The author explores two of the main convolutional-neural-network (CNNs) state-of-the-art architectures: UNet and ResUnet family model, and uses a counting-by-segmentation strategy to provide a justification of the objects considered during the counting process. The author also explores a weakly-supervised learning strategy that exploits only dot annotations. The author quantifies the advantages in terms of data reduction and counting performance boost obtainable with a transfer-learning approach and, specifically, a fine-tuning procedure. The author released the dataset used for the supervised use case and all the pre-training models, and designed a web application to share both the counting process pipeline developed in this work and the models pre-trained on the dataset analyzed in this work.

Relevância:

10.00% 10.00%

Publicador:

Resumo:

A Digital Scholarly Edition is a conceptually and structurally sophisticated entity. Throughout the centuries, diverse methodologies have been employed to reconstruct a text transmitted through one or multiple sources, resulting in various edition types. With the advent of digital technology in philology, these practices have undergone a significant transformation, compelling scholars to reconsider their approach in light of the web. In the digital age, philologists are expected to possess (too) advanced technical skills to prepare interactive and enriched editions, even though, in most cases, only mechanical or documentary editions are published online. The Śivadharma Database is a web Content Management System (CMS) designed to facilitate the preparation, publication, and updating of Digital Scholarly Editions. By providing scholars with a user-friendly CRUD web application to reconstruct and annotate a text, they can prepare their textus with additional components such as apparatus, notes, translations, citations, and parallels. It is possible by leveraging an annotation system based on HTML and graph data structure. This choice is made because the text entity is multidimensional and multifaceted, even if its sequential presentation constrains it. In particular, editions of South Asian texts of the Śivadharma corpus, the case study of this research, contain a series of phenomena that are difficult to manage formally, such as overlapping hierarchies. Hence, it becomes necessary to establish the data structure best suited to represent this complexity. In Śivadharma Database, the textus is an HTML file readily displayable. Textual fragments, annotated via an interface without requiring philologists to write code and saved in the backend, form the atomic unit of multiple relationships organised in a graph database. This approach enables the formal representation of complex and overlapping textual phenomena, allowing for good annotation expressiveness with minimal effort to learn the relevant technologies during the editing workflow.