2 resultados para Metric Embeddings

em AMS Tesi di Laurea - Alm@DL - Università di Bologna


Relevância:

20.00% 20.00%

Publicador:

Resumo:

Internet traffic classification is a relevant and mature research field, anyway of growing importance and with still open technical challenges, also due to the pervasive presence of Internet-connected devices into everyday life. We claim the need for innovative traffic classification solutions capable of being lightweight, of adopting a domain-based approach, of not only concentrating on application-level protocol categorization but also classifying Internet traffic by subject. To this purpose, this paper originally proposes a classification solution that leverages domain name information extracted from IPFIX summaries, DNS logs, and DHCP leases, with the possibility to be applied to any kind of traffic. Our proposed solution is based on an extension of Word2vec unsupervised learning techniques running on a specialized Apache Spark cluster. In particular, learning techniques are leveraged to generate word-embeddings from a mixed dataset composed by domain names and natural language corpuses in a lightweight way and with general applicability. The paper also reports lessons learnt from our implementation and deployment experience that demonstrates that our solution can process 5500 IPFIX summaries per second on an Apache Spark cluster with 1 slave instance in Amazon EC2 at a cost of $ 3860 year. Reported experimental results about Precision, Recall, F-Measure, Accuracy, and Cohen's Kappa show the feasibility and effectiveness of the proposal. The experiments prove that words contained in domain names do have a relation with the kind of traffic directed towards them, therefore using specifically trained word embeddings we are able to classify them in customizable categories. We also show that training word embeddings on larger natural language corpuses leads improvements in terms of precision up to 180%.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Increasing in resolution of numerical weather prediction models has allowed more and more realistic forecasts of atmospheric parameters. Due to the growing variability into predicted fields the traditional verification methods are not always able to describe the model ability because they are based on a grid-point-by-grid-point matching between observation and prediction. Recently, new spatial verification methods have been developed with the aim of show the benefit associated to the high resolution forecast. Nested in among of the MesoVICT international project, the initially aim of this work is to compare the newly tecniques remarking advantages and disadvantages. First of all, the MesoVICT basic examples, represented by synthetic precipitation fields, have been examined. Giving an error evaluation in terms of structure, amplitude and localization of the precipitation fields, the SAL method has been studied more thoroughly respect to the others approaches with its implementation in the core cases of the project. The verification procedure has concerned precipitation fields over central Europe: comparisons between the forecasts performed by the 00z COSMO-2 model and the VERA (Vienna Enhanced Resolution Analysis) have been done. The study of these cases has shown some weaknesses of the methodology examined; in particular has been highlighted the presence of a correlation between the optimal domain size and the extention of the precipitation systems. In order to increase ability of SAL, a subdivision of the original domain in three subdomains has been done and the method has been applied again. Some limits have been found in cases in which at least one of the two domains does not show precipitation. The overall results for the subdomains have been summarized on scatter plots. With the aim to identify systematic errors of the model the variability of the three parameters has been studied for each subdomain.