16 resultados para Domain-specific language
em AMS Tesi di Laurea - Alm@DL - Università di Bologna
Resumo:
La tesi ha lo scopo di indagare le tecnologie disponibili per la realizzazione di linguaggi di programmazione e linguaggi domain specific in ambiente Java. In particolare, vengono proposti e analizzati tre strumenti presenti sul mercato: JavaCC, ANTLR e Xtext. Al termine dell’elaborato, il lettore dovrebbe avere un’idea generale dei principali meccanismi e sistemi utilizzati (come lexer, parser, AST, parse trees, etc.), oltre che del funzionamento dei tre tools presentati. Inoltre, si vogliono individuare vantaggi e svantaggi di ciascuno strumento attraverso un’analisi delle funzionalità offerte, così da fornire un giudizio critico per la scelta e la valutazione dei sistemi da utilizzare.
Resumo:
Scopo di questo elaborato di tesi è la modellazione e l’implementazione di una estensione del simulatore Alchemist, denominata Biochemistry, che permetta di simulare un ambiente multi-cellulare. Al fine di simulare il maggior numero possibile di processi biologici, il simulatore dovrà consentire di modellare l’eterogeneità cellulare attraverso la modellazione di diversi aspetti dei sistemi cellulari, quali: reazioni intracellulari, segnalazione tra cellule adiacenti, giunzioni cellulari e movimento. Dovrà, inoltre, essere ammissibile anche l’esecuzione di azioni impossibili nel mondo reale, come la distruzione o la creazione dal nulla di molecole chimiche. In maniera più specifica si sono modellati ed implementati i seguenti processi biochimici: creazione e distruzione di molecole chimiche, reazioni biochimiche intracellulari, scambio di molecole tra cellule adiacenti, creazione e distruzione di giunzioni cellulari. È stata dunque posta particolare enfasi nella modellazione delle reazioni tra cellule vicine, il cui meccanismo è simile a quello usato nella segnalazione cellulare. Ogni parte del sistema è stata modellata seguendo fenomeni realmente presenti nei sistemi multi-cellulari, e documentati in letteratura. Per la specifica delle reazioni chimiche, date in ingresso alla simulazione, è stata necessaria l’implementazione di un Domain Specific Language (DSL) che consente la scrittura di reazioni in modo simile al linguaggio naturale, consentendo l’uso del simulatore anche a persone senza particolari conoscenze di biologia. La correttezza del progetto è stata validata tramite test compiuti con dati presenti in letteratura e inerenti a processi biologici noti e ampiamente studiati.
Resumo:
Blazor è un innovativo framework di Microsoft per lo sviluppo di applicazioni web in C#, HTML e CSS. Questo framework non possiede un designer visuale, ovvero un supporto grafico "drag-and-drop" alla creazione delle web applications. Questa tesi affronta la progettazione e la prototipazione di "Blazor Designer", un DSL (Domain-Specific Language) grafico a supporto dello sviluppo applicazioni web a pagina singola (SPA) sviluppato in collaborazione con IPREL Progetti srl, società del gruppo SACMI. Nella tesi si fa una analisi delle tecnologie messe a disposizione da Blazor, compreso WebAssembly, si discutono le caratteristiche e i vantaggi dei DSL, si descrive la progettazione e l'implementazione di "Blazor Designer" come estensione di Visual Studio. La conclusione riassume i risultati raggiunti, i limiti e le opportunità future: un DSL è effettivamente in grado di rendere più user-friendly e semplice lo sviluppo, ma lo strumento deve essere integrato per essere sfruttato pienamente.
Resumo:
Nowadays the idea of injecting world or domain-specific structured knowledge into pre-trained language models (PLMs) is becoming an increasingly popular approach for solving problems such as biases, hallucinations, huge architectural sizes, and explainability lack—critical for real-world natural language processing applications in sensitive fields like bioinformatics. One recent work that has garnered much attention in Neuro-symbolic AI is QA-GNN, an end-to-end model for multiple-choice open-domain question answering (MCOQA) tasks via interpretable text-graph reasoning. Unlike previous publications, QA-GNN mutually informs PLMs and graph neural networks (GNNs) on top of relevant facts retrieved from knowledge graphs (KGs). However, taking a more holistic view, existing PLM+KG contributions mainly consider commonsense benchmarks and ignore or shallowly analyze performances on biomedical datasets. This thesis start from a propose of a deep investigation of QA-GNN for biomedicine, comparing existing or brand-new PLMs, KGs, edge-aware GNNs, preprocessing techniques, and initialization strategies. By combining the insights emerged in DISI's research, we introduce Bio-QA-GNN that include a KG. Working with this part has led to an improvement in state-of-the-art of MCOQA model on biomedical/clinical text, largely outperforming the original one (+3.63\% accuracy on MedQA). Our findings also contribute to a better understanding of the explanation degree allowed by joint text-graph reasoning architectures and their effectiveness on different medical subjects and reasoning types. Codes, models, datasets, and demos to reproduce the results are freely available at: \url{https://github.com/disi-unibo-nlp/bio-qagnn}.
Resumo:
Ontology design and population -core aspects of semantic technologies- re- cently have become fields of great interest due to the increasing need of domain-specific knowledge bases that can boost the use of Semantic Web. For building such knowledge resources, the state of the art tools for ontology design require a lot of human work. Producing meaningful schemas and populating them with domain-specific data is in fact a very difficult and time-consuming task. Even more if the task consists in modelling knowledge at a web scale. The primary aim of this work is to investigate a novel and flexible method- ology for automatically learning ontology from textual data, lightening the human workload required for conceptualizing domain-specific knowledge and populating an extracted schema with real data, speeding up the whole ontology production process. Here computational linguistics plays a fundamental role, from automati- cally identifying facts from natural language and extracting frame of relations among recognized entities, to producing linked data with which extending existing knowledge bases or creating new ones. In the state of the art, automatic ontology learning systems are mainly based on plain-pipelined linguistics classifiers performing tasks such as Named Entity recognition, Entity resolution, Taxonomy and Relation extraction [11]. These approaches present some weaknesses, specially in capturing struc- tures through which the meaning of complex concepts is expressed [24]. Humans, in fact, tend to organize knowledge in well-defined patterns, which include participant entities and meaningful relations linking entities with each other. In literature, these structures have been called Semantic Frames by Fill- 6 Introduction more [20], or more recently as Knowledge Patterns [23]. Some NLP studies has recently shown the possibility of performing more accurate deep parsing with the ability of logically understanding the structure of discourse [7]. In this work, some of these technologies have been investigated and em- ployed to produce accurate ontology schemas. The long-term goal is to collect large amounts of semantically structured information from the web of crowds, through an automated process, in order to identify and investigate the cognitive patterns used by human to organize their knowledge.
Resumo:
The aim of this dissertation is to provide a translation from English into Italian of a specialised scientific article published in the Cambridge Working Papers in Economics series. In this text, the authors estimate the economic consequences of the earthquake that hit the Abruzzo region in 2009. An extract of this translation will be published as part of conference proceedings. The main reason behind this choice is a personal interest in specialised translation in the economic domain. Moreover, the subject of the article is of particular interest to the Italian readership. The aim of this study is to show how a non-specialised translator can tackle with such a highly specialised translation with the use of appropriate terminology resources and the collaboration of field experts. The translation could be of help to other Italian linguists looking for translated material in this particular domain where English seems to be the dominant language. In order to ensure consistent terminology and adequate style, the document has been translated with the use of different resources, such as dictionaries, glossaries and specialised corpora. I also contacted field experts and the authors of text. The collaboration with the authors proved to be an invaluable resource yet one to be carefully managed. This work is divided into 5 chapters. The first deals with domain-specific sublanguages. The second gives an overview of corpus linguistics and describes the corpora designed for the translation. The third provides an analysis of the article, focusing on syntactical, lexical and structural features while the fourth presents the translation, side-by-side with the source text. The fifth comments on the main difficulties encountered in the translation and the strategies used, as well as the relationship with the authors and their review of the published text. Appendix I contains the econometric glossary English – Italian.
Resumo:
In today’s globalized world, air travel is one of the fastest growing markets. Millions of aircrafts take off and then touch down all around the world each day. This well-synchronized symphony, however, is much more complex than it seems, and communication – language - plays a crucial role during a plane’s journey. Misunderstandings and miscommunications can have disastrous effects, so the adoption of a standard phraseology to be used during flight is a means to overcome language barriers, avoid ambiguous expressions and guarantee a safe and effective operation of an aircraft. Little is known about the interaction that goes on between pilots and air traffic controllers (ATCOs), and even though the language of aviation is English, cockpit communication can be hard to understand for people who are not familiar with this specific language. The scope of this thesis is to examine the origins of this uncommon language, the characteristics and peculiarities of air communication and to shed a little light on this mystery called Aviation English.
Resumo:
The ability to create hybrid systems that blend different paradigms has now become a requirement for complex AI systems usually made of more than a component. In this way, it is possible to exploit the advantages of each paradigm and exploit the potential of different approaches such as symbolic and non-symbolic approaches. In particular, symbolic approaches are often exploited for their efficiency, effectiveness and ability to manage large amounts of data, while symbolic approaches are exploited to ensure aspects related to explainability, fairness, and trustworthiness in general. The thesis lies in this context, in particular in the design and development of symbolic technologies that can be easily integrated and interoperable with other AI technologies. 2P-Kt is a symbolic ecosystem developed for this purpose, it provides a logic-programming (LP) engine which can be easily extended and customized to deal with specific needs. The aim of this thesis is to extend 2P-Kt to support constraint logic programming (CLP) as one of the main paradigms for solving highly combinatorial problems given a declarative problem description and a general constraint-propagation engine. A real case study concerning school timetabling is described to show a practical usage of the CLP(FD) library implemented. Since CLP represents only a particular scenario for extending LP to domain-specific scenarios, in this thesis we present also a more general framework: Labelled Prolog, extending LP with labelled terms and in particular labelled variables. The designed framework shows how it is possible to frame all variations and extensions of LP under a single language reducing the huge amount of existing languages and libraries and focusing more on how to manage different domain needs using labels which can be associated with every kind of term. Mapping of CLP into Labeled Prolog is also discussed as well as the benefits of the provided approach.
Towards model driven software development for Arduino platforms: a DSL and automatic code generation
Resumo:
La tesi ha lo scopo di esplorare la produzione di sistemi software per Embedded Systems mediante l'utilizzo di tecniche relative al mondo del Model Driven Software Development. La fase più importante dello sviluppo sarà la definizione di un Meta-Modello che caratterizza i concetti fondamentali relativi agli embedded systems. Tale modello cercherà di astrarre dalla particolare piattaforma utilizzata ed individuare quali astrazioni caratterizzano il mondo degli embedded systems in generale. Tale meta-modello sarà quindi di tipo platform-independent. Per la generazione automatica di codice è stata adottata una piattaforma di riferimento, cioè Arduino. Arduino è un sistema embedded che si sta sempre più affermando perché coniuga un buon livello di performance ed un prezzo relativamente basso. Tale piattaforma permette lo sviluppo di sistemi special purpose che utilizzano sensori ed attuatori di vario genere, facilmente connessi ai pin messi a disposizione. Il meta-modello definito è un'istanza del meta-metamodello MOF, definito formalmente dall'organizzazione OMG. Questo permette allo sviluppatore di pensare ad un sistema sotto forma di modello, istanza del meta-modello definito. Un meta-modello può essere considerato anche come la sintassi astratta di un linguaggio, quindi può essere definito da un insieme di regole EBNF. La tecnologia utilizzata per la definizione del meta-modello è stata Xtext: un framework che permette la scrittura di regole EBNF e che genera automaticamente il modello Ecore associato al meta-modello definito. Ecore è l'implementazione di EMOF in ambiente Eclipse. Xtext genera inoltre dei plugin che permettono di avere un editor guidato dalla sintassi, definita nel meta-modello. La generazione automatica di codice è stata realizzata usando il linguaggio Xtend2. Tale linguaggio permette di esplorare l'Abstract Syntax Tree generato dalla traduzione del modello in Ecore e di generare tutti i file di codice necessari. Il codice generato fornisce praticamente tutta la schematic part dell'applicazione, mentre lascia all'application designer lo sviluppo della business logic. Dopo la definizione del meta-modello di un sistema embedded, il livello di astrazione è stato spostato più in alto, andando verso la definizione della parte di meta-modello relativa all'interazione di un sistema embedded con altri sistemi. Ci si è quindi spostati verso un ottica di Sistema, inteso come insieme di sistemi concentrati che interagiscono. Tale difinizione viene fatta dal punto di vista del sistema concentrato di cui si sta definendo il modello. Nella tesi viene inoltre introdotto un caso di studio che, anche se abbastanza semplice, fornisce un esempio ed un tutorial allo sviluppo di applicazioni mediante l'uso del meta-modello. Ci permette inoltre di notare come il compito dell'application designer diventi piuttosto semplice ed immediato, sempre se basato su una buona analisi del problema. I risultati ottenuti sono stati di buona qualità ed il meta-modello viene tradotto in codice che funziona correttamente.
Resumo:
Artificial Intelligence (AI) is gaining ever more ground in every sphere of human life, to the point that it is now even used to pass sentences in courts. The use of AI in the field of Law is however deemed quite controversial, as it could provide more objectivity yet entail an abuse of power as well, given that bias in algorithms behind AI may cause lack of accuracy. As a product of AI, machine translation is being increasingly used in the field of Law too in order to translate laws, judgements, contracts, etc. between different languages and different legal systems. In the legal setting of Company Law, accuracy of the content and suitability of terminology play a crucial role within a translation task, as any addition or omission of content or mistranslation of terms could entail legal consequences for companies. The purpose of the present study is to first assess which neural machine translation system between DeepL and ModernMT produces a more suitable translation from Italian into German of the atto costitutivo of an Italian s.r.l. in terms of accuracy of the content and correctness of terminology, and then to assess which translation proves to be closer to a human reference translation. In order to achieve the above-mentioned aims, two human and automatic evaluations are carried out based on the MQM taxonomy and the BLEU metric. Results of both evaluations show an overall better performance delivered by ModernMT in terms of content accuracy, suitability of terminology, and closeness to a human translation. As emerged from the MQM-based evaluation, its accuracy and terminology errors account for just 8.43% (as opposed to DeepL’s 9.22%), while it obtains an overall BLEU score of 29.14 (against DeepL’s 27.02). The overall performances however show that machines still face barriers in overcoming semantic complexity, tackling polysemy, and choosing domain-specific terminology, which suggests that the discrepancy with human translation may still be remarkable.
Resumo:
Magnetic Resonance Spectroscopy (MRS) is an advanced clinical and research application which guarantees a specific biochemical and metabolic characterization of tissues by the detection and quantification of key metabolites for diagnosis and disease staging. The "Associazione Italiana di Fisica Medica (AIFM)" has promoted the activity of the "Interconfronto di spettroscopia in RM" working group. The purpose of the study is to compare and analyze results obtained by perfoming MRS on scanners of different manufacturing in order to compile a robust protocol for spectroscopic examinations in clinical routines. This thesis takes part into this project by using the GE Signa HDxt 1.5 T at the Pavillion no. 11 of the S.Orsola-Malpighi hospital in Bologna. The spectral analyses have been performed with the jMRUI package, which includes a wide range of preprocessing and quantification algorithms for signal analysis in the time domain. After the quality assurance on the scanner with standard and innovative methods, both spectra with and without suppression of the water peak have been acquired on the GE test phantom. The comparison of the ratios of the metabolite amplitudes over Creatine computed by the workstation software, which works on the frequencies, and jMRUI shows good agreement, suggesting that quantifications in both domains may lead to consistent results. The characterization of an in-house phantom provided by the working group has achieved its goal of assessing the solution content and the metabolite concentrations with good accuracy. The goodness of the experimental procedure and data analysis has been demonstrated by the correct estimation of the T2 of water, the observed biexponential relaxation curve of Creatine and the correct TE value at which the modulation by J coupling causes the Lactate doublet to be inverted in the spectrum. The work of this thesis has demonstrated that it is possible to perform measurements and establish protocols for data analysis, based on the physical principles of NMR, which are able to provide robust values for the spectral parameters of clinical use.
Resumo:
In any terminological study, candidate term extraction is a very time-consuming task. Corpus analysis tools have automatized some processes allowing the detection of relevant data within the texts, facilitating term candidate selection as well. Nevertheless, these tools are (normally) not specific for terminology research; therefore, the units which are automatically extracted need manual evaluation. Over the last few years some software products have been specifically developed for automatic term extraction. They are based on corpus analysis, but use linguistic and statistical information to filter data more precisely. As a result, the time needed for manual evaluation is reduced. In this framework, we tried to understand if and how these new tools can really be an advantage. In order to develop our project, we simulated a terminology study: we chose a domain (i.e. legal framework for medicinal products for human use) and compiled a corpus from which we extracted terms and phraseologisms using AntConc, a corpus analysis tool. Afterwards, we compared our list with the lists extracted automatically from three different tools (TermoStat Web, TaaS e Sketch Engine) in order to evaluate their performance. In the first chapter we describe some principles relating to terminology and phraseology in language for special purposes and show the advantages offered by corpus linguistics. In the second chapter we illustrate some of the main concepts of the domain selected, as well as some of the main features of legal texts. In the third chapter we describe automatic term extraction and the main criteria to evaluate it; moreover, we introduce the term-extraction tools used for this project. In the fourth chapter we describe our research method and, in the fifth chapter, we show our results and draw some preliminary conclusions on the performance and usefulness of term-extraction tools.
Resumo:
Nowadays communication is switching from a centralized scenario, where communication media like newspapers, radio, TV programs produce information and people are just consumers, to a completely different decentralized scenario, where everyone is potentially an information producer through the use of social networks, blogs, forums that allow a real-time worldwide information exchange. These new instruments, as a result of their widespread diffusion, have started playing an important socio-economic role. They are the most used communication media and, as a consequence, they constitute the main source of information enterprises, political parties and other organizations can rely on. Analyzing data stored in servers all over the world is feasible by means of Text Mining techniques like Sentiment Analysis, which aims to extract opinions from huge amount of unstructured texts. This could lead to determine, for instance, the user satisfaction degree about products, services, politicians and so on. In this context, this dissertation presents new Document Sentiment Classification methods based on the mathematical theory of Markov Chains. All these approaches bank on a Markov Chain based model, which is language independent and whose killing features are simplicity and generality, which make it interesting with respect to previous sophisticated techniques. Every discussed technique has been tested in both Single-Domain and Cross-Domain Sentiment Classification areas, comparing performance with those of other two previous works. The performed analysis shows that some of the examined algorithms produce results comparable with the best methods in literature, with reference to both single-domain and cross-domain tasks, in $2$-classes (i.e. positive and negative) Document Sentiment Classification. However, there is still room for improvement, because this work also shows the way to walk in order to enhance performance, that is, a good novel feature selection process would be enough to outperform the state of the art. Furthermore, since some of the proposed approaches show promising results in $2$-classes Single-Domain Sentiment Classification, another future work will regard validating these results also in tasks with more than $2$ classes.