4 resultados para word boundaries

em AMS Tesi di Laurea - Alm@DL - Università di Bologna


Relevância:

20.00% 20.00%

Publicador:

Resumo:

This thesis is developed in the contest of Ritmare project WP1, which main objective is the development of a sustainable fishery through the identification of populations boundaries in commercially important species in Italian Seas. Three main objectives are discussed in order to help reach the main purpose of identification of stock boundaries in Parapenaeus longirostris: 1 -Development of a representative sampling design for Italian seas; 2 -Evaluation of 2b-RAD protocol; 3 -Investigation of populations through biological data analysis. First of all we defined and accomplished a sampling design which properly represents all Italian seas. Then we used information and data about nursery areas distribution, abundance of populations and importance of P. longirostris in local fishery, to develop an experimental design that prioritize the most important areas to maximize the results with actual project funds. We introduced for the first time the use of 2b-RAD on this species, a genotyping method based on sequencing the uniform fragments produced by type IIB restriction endonucleases. Thanks to this method we were able to move from genetics to the more complex genomics. In order to proceed with 2b-RAD we performed several tests to identify the best DNA extraction kit and protocol and finally we were able to extract 192 high quality DNA extracts ready to be processed. We tested 2b-RAD with five samples and after high-throughput sequencing of libraries we used the software “Stacks” to analyze the sequences. We obtained positive results identifying a great number of SNP markers among the five samples. To guarantee a multidisciplinary approach we used the biological data associated to the collected samples to investigate differences between geographical samples. Such approach assures continuity with other project, for instance STOCKMED, which utilize a combination of molecular and biological analysis as well.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

La Word Sense Disambiguation è un problema informatico appartenente al campo di studi del Natural Language Processing, che consiste nel determinare il senso di una parola a seconda del contesto in cui essa viene utilizzata. Se un processo del genere può apparire banale per un essere umano, può risultare d'altra parte straordinariamente complicato se si cerca di codificarlo in una serie di istruzioni esguibili da una macchina. Il primo e principale problema necessario da affrontare per farlo è quello della conoscenza: per operare una disambiguazione sui termini di un testo, un computer deve poter attingere da un lessico che sia il più possibile coerente con quello di un essere umano. Sebbene esistano altri modi di agire in questo caso, quello di creare una fonte di conoscenza machine-readable è certamente il metodo che permette di affrontare il problema in maniera più diretta. Nel corso di questa tesi si cercherà, come prima cosa, di spiegare in cosa consiste la Word Sense Disambiguation, tramite una descrizione breve ma il più possibile dettagliata del problema. Nel capitolo 1 esso viene presentato partendo da alcuni cenni storici, per poi passare alla descrizione dei componenti fondamentali da tenere in considerazione durante il lavoro. Verranno illustrati concetti ripresi in seguito, che spaziano dalla normalizzazione del testo in input fino al riassunto dei metodi di classificazione comunemente usati in questo campo. Il capitolo 2 è invece dedicato alla descrizione di BabelNet, una risorsa lessico-semantica multilingua di recente costruzione nata all'Università La Sapienza di Roma. Verranno innanzitutto descritte le due fonti da cui BabelNet attinge la propria conoscenza, WordNet e Wikipedia. In seguito saranno illustrati i passi della sua creazione, dal mapping tra le due risorse base fino alla definizione di tutte le relazioni che legano gli insiemi di termini all'interno del lessico. Infine viene proposta una serie di esperimenti che mira a mettere BabelNet su un banco di prova, prima per verificare la consistenza del suo metodo di costruzione, poi per confrontarla, in termini di prestazioni, con altri sistemi allo stato dell'arte sottoponendola a diversi task estrapolati dai SemEval, eventi internazionali dedicati alla valutazione dei problemi WSD, che definiscono di fatto gli standard di questo campo. Nel capitolo finale vengono sviluppate alcune considerazioni sulla disambiguazione, introdotte da un elenco dei principali campi applicativi del problema. Vengono in questa sede delineati i possibili sviluppi futuri della ricerca, ma anche i problemi noti e le strade recentemente intraprese per cercare di portare le prestazioni della Word Sense Disambiguation oltre i limiti finora definiti.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

This study focused on the role of oceanographic discontinuities and the presence of transitional areas in shaping the population structure and the phylogeography of the Raja miraletus species complex, coupled with the test of the effective occurrence of past speciation events. The comparisons between the Atlantic African and the North-Eastern Atlantic-Mediterranean geographic populations were unravelled using both Cytochrome Oxidase I and eight microsatellite loci. This approach guaranteed a robust dataset for the identification of a speciation event between the Atlantic African clade, corresponding to the ex Raja ocellifera nominal species, and the NE Atlantic-Mediterranean R. miraletus clade. As a matter of fact, the origin of the Atlantic Africa and the NE Atlantic-Mediterranean deep split dated about 11.74MYA and was likely due to the synergic influence currents and two upwelling areas crossing the Western African Waters. Within the Mediterranean Sea, particular attention was also paid to the transitional area represented by Adventura and Maltese Bank, that might have contributed in sustaining the connectivity of the Western and the Eastern Mediterranean geographical populations. Furthermore, the geology of the easternmost part of Sicily and the geo-morphological depression of the Calabrian Arc could have driven the differentiation of the Eastern Mediterranean Sea. Although bathymetric and oceanographic discontinuity could represent barriers to dispersal and migration between Eastern and Western Mediterranean samples, a clear and complete genetic separation among them was not detected. Results produced by this work identified a speciation event defining Raja ocellifera and R. miraletus as two different species, and describing the R. miraletus species complex as the most ancient cryptic speciation event in the family Rajidae, representing another example of how strictly connected the environment, the behavioural habits and the evolutionary and ecologic drivers are.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Internet traffic classification is a relevant and mature research field, anyway of growing importance and with still open technical challenges, also due to the pervasive presence of Internet-connected devices into everyday life. We claim the need for innovative traffic classification solutions capable of being lightweight, of adopting a domain-based approach, of not only concentrating on application-level protocol categorization but also classifying Internet traffic by subject. To this purpose, this paper originally proposes a classification solution that leverages domain name information extracted from IPFIX summaries, DNS logs, and DHCP leases, with the possibility to be applied to any kind of traffic. Our proposed solution is based on an extension of Word2vec unsupervised learning techniques running on a specialized Apache Spark cluster. In particular, learning techniques are leveraged to generate word-embeddings from a mixed dataset composed by domain names and natural language corpuses in a lightweight way and with general applicability. The paper also reports lessons learnt from our implementation and deployment experience that demonstrates that our solution can process 5500 IPFIX summaries per second on an Apache Spark cluster with 1 slave instance in Amazon EC2 at a cost of $ 3860 year. Reported experimental results about Precision, Recall, F-Measure, Accuracy, and Cohen's Kappa show the feasibility and effectiveness of the proposal. The experiments prove that words contained in domain names do have a relation with the kind of traffic directed towards them, therefore using specifically trained word embeddings we are able to classify them in customizable categories. We also show that training word embeddings on larger natural language corpuses leads improvements in terms of precision up to 180%.