917 results for LHC, CMS, Big Data
Abstract:
With the advent of new technologies, it is increasingly easy to obtain data of different kinds from ever more accurate sensors that measure the most disparate physical quantities with different methodologies. Data collection thus becomes progressively more important and takes the form of archiving, cataloging, and online and offline consultation of information. Over time, the amount of data collected can become so large that it contains information that cannot easily be explored manually or with basic statistical techniques. Big Data therefore becomes the object of more advanced investigation techniques, such as Machine Learning and Deep Learning. This work describes some applications in precision livestock farming, focusing on the heat stress experienced by dairy cows. Experimental Italian and German barns were involved in the training and testing of the Random Forest algorithm, which predicted milk production from the microclimatic conditions of the previous days with satisfactory accuracy. Furthermore, to identify production drops objectively with respect to the Wood model, which is typically used as an analytical model of the lactation curve, a Robust Statistics technique was used. Its application to some sample lactations and the results obtained give us confidence in the future use of this method.
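As an illustration of the drop-detection idea, the sketch below evaluates the Wood curve, y(t) = a * t^b * exp(-c * t), and flags days whose yield falls far below it. The abstract does not say which robust technique was used; the median/MAD rule here is an assumption chosen for illustration, and the function names, the threshold k and the parameter values in the usage note are hypothetical.

```python
import math
from statistics import median

def wood_curve(day, a, b, c):
    """Wood lactation curve: expected yield a * day**b * exp(-c * day)."""
    return a * day ** b * math.exp(-c * day)

def flag_production_drops(days, yields, a, b, c, k=3.0):
    """Return the days whose yield lies more than k robust sigmas below the curve.

    Assumption: residuals are scored with the median and the median absolute
    deviation (MAD), which a few large drops cannot distort the way they
    distort a mean and standard deviation.
    """
    residuals = [y - wood_curve(d, a, b, c) for d, y in zip(days, yields)]
    centre = median(residuals)
    mad = median(abs(r - centre) for r in residuals)
    scale = 1.4826 * mad  # rescales the MAD to a standard deviation for Gaussian noise
    if scale == 0:
        return []
    return [d for d, r in zip(days, residuals) if r < centre - k * scale]
```

For example, with hypothetical parameters a=20, b=0.2, c=0.004 (yield in kg, time in days), a lactation that follows the curve except for two days that sit 5 kg below it would have only those two days returned.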
Abstract:
The thesis is the final outcome of the European Joint Doctorate programme in Law, Science & Technology, funded by the European Commission through the Marie Skłodowska-Curie Innovative Training Networks actions within H2020, grant agreement n. 814177. It investigates the tension between data protection and privacy on the one side and the need to allow further uses of processed personal data on the other, tracing the technological development of the de-anonymization/re-identification risk through an exploratory survey. After acknowledging the extent of this risk, the thesis asks whether a certain degree of anonymity can still be guaranteed, from two perspectives: an objective one and a subjective one. The objective perspective focuses on the data processing models per se, while the subjective perspective investigates whether the distribution of roles and responsibilities among stakeholders can ensure data anonymity.
Abstract:
This thesis investigates the legal, ethical, technical, and psychological issues of general data processing and artificial intelligence practices and the explainability of AI systems. It consists of two main parts. In the first part, we provide a comprehensive overview of the big data processing ecosystem and the main challenges we face today. We then evaluate the GDPR’s data privacy framework in the European Union. The Trustworthy AI Framework proposed by the EU’s High-Level Expert Group on AI (AI HLEG) is examined in detail. The ethical principles for the foundation and realization of Trustworthy AI are analyzed along with the assessment list prepared by the AI HLEG. Then, we list the main big data challenges identified by European researchers and institutions and provide a literature review on the technical and organizational measures to address these challenges. A quantitative analysis is conducted on the identified big data challenges and the measures to address them, which leads to practical recommendations for better data processing and AI practices in the EU. In the second part, we concentrate on the explainability of AI systems. We clarify the terminology and list the goals that the explainability of AI systems is meant to achieve. We identify the reasons for the explainability-accuracy trade-off and how it can be addressed. We conduct a comparative cognitive analysis between human reasoning and machine-generated explanations with the aim of understanding how explainable AI can contribute to human reasoning. We then focus on the technical and legal responses to remedy the explainability problem. In this part, the GDPR’s right to explanation framework and safeguards are analyzed in depth, together with their contribution to the realization of Trustworthy AI. Then, we analyze the explanation techniques applicable at different stages of machine learning and propose several recommendations in chronological order to develop GDPR-compliant and Trustworthy XAI systems.
Abstract:
Big data and AI are paving the way to promising scenarios in clinical practice and research. However, the use of such technologies might clash with GDPR requirements. Today, two forces are driving EU policies in this domain. The first is the necessity to protect individuals’ safety and fundamental rights. The second is to incentivize the deployment of innovative technologies. The first objective is pursued by legislative acts such as the GDPR or the AIA; the second is supported by the new data strategy recently launched by the European Commission. Against this background, the thesis analyses the issue of GDPR compliance when big data and AI systems are implemented in the health domain. The thesis focuses on the use of co-regulatory tools for compliance with the GDPR. This work argues that there are two levels of co-regulation in the EU legal system. The first, more general, is the approach pursued by the EU legislator when shaping legislative measures that deal with fast-evolving technologies. The GDPR can be deemed a co-regulatory solution since it mainly introduces general requirements, whose implementation must then be interpreted by the addressee of the law following a risk-based approach. This approach, although useful, is costly and sometimes burdensome for organisations. The second co-regulatory level is represented by specific co-regulatory tools, such as codes of conduct and certification mechanisms. These tools are meant to guide and support the interpretation effort of the addressee of the law. The thesis argues that the lack of co-regulatory tools meant to implement data protection law in specific situations could be an obstacle to the deployment of innovative solutions in complex scenarios such as the health ecosystem. The thesis advances hypotheses, at a theoretical level, about the reasons for this lack of co-regulatory solutions.
Abstract:
The topic of this thesis stems from the idea of bringing together two themes that are becoming increasingly important today, namely the circular economy and big data, and it aims to identify points of connection between the two. In today's technological world, which is turning everything we hold in our hands digital, more and more studies are being carried out to understand how sustainability can be supported by emerging technologies. The circular economy is a new economic paradigm capable of replacing growth models centred on a linear vision, aiming at a reduction of waste and a radical rethinking of how products are conceived and used over time. In this transition towards a circular economy, it may be useful to consider adopting emerging technologies to simplify production processes and implement more sustainable policies, which are also increasingly appreciated by consumers. All of this will be supported by the increasingly significant use of big data, that is, large volumes of information-rich data that make it possible, through careful analysis, to develop production plans that follow the circular paradigm; this is made possible by ever more innovative digital systems and by specialised professionals who are acquiring more and more knowledge in this field.
Abstract:
In this thesis work, a cosmic-ray telescope was set up in the INFN laboratories in Bologna using smaller-size replicas of CMS Drift Tube chambers, called MiniDTs, to test and develop new electronics for the CMS Phase-2 upgrade. The MiniDTs were assembled at the INFN National Laboratory in Legnaro, Italy. Scintillator tiles complete the telescope, providing a signal independent of the MiniDTs for offline analysis. The telescope readout is a test system for the CMS Phase-2 upgrade data acquisition design. The readout is based on an early prototype of a radiation-hard FPGA-based board developed for the High Luminosity LHC CMS upgrade, called On Board electronics for Drift Tubes. Once the set-up was operational, we developed an online monitor to display in real time the most important observables for checking the quality of the data acquisition. We performed an offline analysis of the collected data using a custom version of CMS software tools, which allowed us to estimate the time pedestal and drift velocity in each chamber, evaluate the efficiency of the different DT cells, and measure the space and time resolution of the telescope system.
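The relation between time pedestal, drift velocity and hit position can be sketched as follows. This is not the CMS software used in the thesis: the quantile-based edge estimate is a crude stand-in for a proper fit of the time-box edges, all function names are hypothetical, and the 21 mm half-cell width assumes the MiniDT cells keep the 42 mm pitch of the standard CMS DT cell.

```python
HALF_CELL_MM = 21.0  # assumed half-width of a drift cell (42 mm cell pitch)

def estimate_time_box(raw_times_ns, low_q=0.01, high_q=0.99):
    """Estimate the time pedestal t0 and the end of the drift-time box.

    Assumption: the edges are taken as low and high quantiles of the raw
    hit times, which only approximates a fit of the rising and falling edges.
    """
    ordered = sorted(raw_times_ns)
    last = len(ordered) - 1
    t0 = ordered[int(low_q * last)]
    t_end = ordered[int(high_q * last)]
    return t0, t_end

def drift_velocity(t0, t_end, half_cell_mm=HALF_CELL_MM):
    """Average drift velocity in mm/ns: half-cell width over maximum drift time."""
    return half_cell_mm / (t_end - t0)

def hit_distance_mm(raw_time_ns, t0, v_drift):
    """Distance of a hit from the wire, assuming a constant drift velocity."""
    return max(0.0, raw_time_ns - t0) * v_drift
```

Under the constant-velocity assumption, a hit recorded at the end of the time box maps to the full half-cell width, and a hit at or before t0 maps to the wire position.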
Abstract:
Conselho Nacional de Desenvolvimento Científico e Tecnológico (CNPq)
Abstract:
Conselho Nacional de Desenvolvimento Científico e Tecnológico (CNPq)
Abstract:
In this thesis work, the invariant mass spectrum of the J/psi pi+ pi- system, m(J/psi pi+ pi-), was studied in proton-proton collisions at the LHC at a centre-of-mass energy sqrt(s) of 8 TeV, in search of new hadronic states. The study was carried out on a data sample collected by CMS throughout 2012, corresponding to an integrated luminosity of 18.6 fb^-1. The invariant mass spectrum m(J/psi pi+ pi-) was reconstructed by selecting J/psi->mu+ mu- events associated with two oppositely charged tracks, assumed to be pions, coming from the same interaction vertex. Despite the high statistics available and the wide invariant mass region observed, between 3.6 and 6.0 GeV/c^2, only already known resonances were found: the psi(2S) charmonium resonance, the X(3872) state, and a more complex structure in the region around 5 GeV/c^2, which is characteristic of the mass of mesons containing the beauty quark (B mesons). To identify the nature of this structure, it was necessary to obtain a sample of events enriched in B hadrons. A selection was applied based on a large decay length, which reflects the fact that B hadrons have a relatively long mean lifetime (of the order of picoseconds) compared with other hadrons. In the sample cleaned in this way, three substructures could be distinguished in the invariant mass spectrum under study: one at 5.36 GeV/c^2, identified as B^0_s-> J/psi pi+ pi- decays, another at 5.28 GeV/c^2, identified as B^0-> J/psi pi+ pi- candidates, and a last, broader one between 5.1 and 5.2 GeV/c^2, due to reflection effects from pion-kaon swaps. This last structure was identified as consisting entirely of a combination of B^0-> J/psi K+ pi- and B^0_s-> J/psi K+ K- events.
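The invariant mass of a set of tracks, and the reason a kaon treated as a pion produces a reflection at lower mass, can be sketched as below. This is a minimal illustration in natural units (GeV, c = 1), not the analysis code of the thesis; the function names are hypothetical and the masses are rounded reference values.

```python
import math

M_PION = 0.13957  # charged pion mass in GeV/c^2 (rounded)
M_KAON = 0.49368  # charged kaon mass in GeV/c^2 (rounded)

def four_vector(px, py, pz, mass):
    """Build (E, px, py, pz) from a measured momentum and a mass hypothesis."""
    energy = math.sqrt(px * px + py * py + pz * pz + mass * mass)
    return (energy, px, py, pz)

def invariant_mass(*particles):
    """Invariant mass of the summed four-vectors: sqrt(E^2 - |p|^2)."""
    energy = sum(p[0] for p in particles)
    px = sum(p[1] for p in particles)
    py = sum(p[2] for p in particles)
    pz = sum(p[3] for p in particles)
    m2 = energy * energy - px * px - py * py - pz * pz
    return math.sqrt(max(m2, 0.0))  # guard against tiny negative rounding errors
```

Because the tracker measures momentum but not particle type, a kaon track given the pion mass hypothesis is assigned too little energy, so the reconstructed mass comes out below the true one. This is consistent with the reflections appearing below the B^0 peak.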
Abstract:
A substantial amount of information on the Internet is present in the form of text. The value of this semi-structured and unstructured data has been widely acknowledged, with consequent scientific and commercial exploitation. The ever-increasing data production, however, pushes data analytic platforms to their limit. This thesis proposes techniques for more efficient textual big data analysis suitable for the Hadoop analytic platform. This research explores the direct processing of compressed textual data. The focus is on developing novel compression methods with a number of desirable properties to support text-based big data analysis in distributed environments. The novel contributions of this work include the following. Firstly, a Content-aware Partial Compression (CaPC) scheme is developed. CaPC distinguishes between informational and functional content and compresses only the informational content. Thus, the compressed data is made transparent to existing software libraries, which often rely on functional content to work. Secondly, a context-free bit-oriented compression scheme (Approximated Huffman Compression) based on the Huffman algorithm is developed. This uses a hybrid data structure that allows pattern searching in compressed data in linear time. Thirdly, several modern compression schemes have been extended so that the compressed data can be safely split with respect to logical data records in distributed file systems. Furthermore, an innovative two-layer compression architecture is used, in which each compression layer is appropriate for the corresponding stage of data processing. Peripheral libraries are developed that seamlessly link the proposed compression schemes to existing analytic platforms and computational frameworks, and also make the use of the compressed data transparent to developers. The compression schemes have been evaluated for a number of standard MapReduce analysis tasks using a collection of real-world datasets. In comparison with existing solutions, they have shown substantial improvement in performance and significant reduction in system resource requirements.
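The idea of compressed data that can be split safely at logical record boundaries can be sketched as follows. This is not one of the schemes developed in the thesis: it uses plain zlib from the Python standard library, the function names and block size are hypothetical, and records are assumed to be newline-free byte strings.

```python
import zlib

def compress_in_record_blocks(records, max_block_bytes=64 * 1024):
    """Pack whole records into independently compressed blocks.

    A record never straddles two blocks, so each block can be handed to a
    different worker and decompressed without reading its neighbours.
    """
    blocks, current, size = [], [], 0
    for record in records:
        line = record + b"\n"
        if current and size + len(line) > max_block_bytes:
            blocks.append(zlib.compress(b"".join(current)))
            current, size = [], 0
        current.append(line)
        size += len(line)
    if current:
        blocks.append(zlib.compress(b"".join(current)))
    return blocks

def read_block(block):
    """Decompress one block on its own and return the records it holds."""
    return zlib.decompress(block).split(b"\n")[:-1]
```

The trade-off in this sketch is that smaller blocks give finer-grained splits but a worse compression ratio, since each block starts with an empty dictionary.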
Abstract:
Dissertation submitted for the degree of Master in Computer Engineering
Abstract:
Dissertation submitted for the degree of Master in Computer Engineering
Abstract:
We are living in the era of Big Data, a time characterized by the continuous creation of vast amounts of data, originating from different sources and in different formats. First with the rise of social networks and, more recently, with the advent of the Internet of Things (IoT), in which everyone and (eventually) everything is linked to the Internet, data with enormous potential for organizations is being continuously generated. In order to be more competitive, organizations want to access and explore all the richness present in those data. Indeed, Big Data is only as valuable as the insights organizations gather from it to make better decisions, which is the main goal of Business Intelligence. In this paper we describe an experiment in which data obtained from a NoSQL data source (a database technology explicitly developed to deal with the specificities of Big Data) is used to feed a Business Intelligence solution.
Abstract:
This work compares open-source and proprietary transport protocols for high-speed data transmission over IP networks. The commonly used TCP needs significant improvement, since it was developed as a general-purpose transport protocol and first introduced four decades ago. In today's networks, TCP does not meet all of society's communication needs. For this reason, other transport protocols have been developed and are used successfully, e.g. for Big Data movement. Within the scope of this research, the following protocols were investigated for their efficiency on 10 Gbps links: UDT, RBUDP, MTP and RWTP. The protocols were tested under different impairments, such as round-trip times of up to 400 ms and packet losses of up to 2%. The parameters investigated are the data rate under different network conditions, the CPU load on the sender and receiver during the experiments, the size of the feedback data, the CPU usage per Gbps, and the amount of feedback data per GiByte of effectively transmitted data. RWTP showed the best performance and fair resource consumption. Among the open-source projects, RBUDP showed the best behaviour.
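A back-of-the-envelope calculation shows why standard TCP struggles under the tested impairments. The sketch below is not taken from the paper: it computes the bandwidth-delay product of the link and the steady-state rate predicted by the Mathis et al. approximation for Reno-style congestion control. The function names and the 1460-byte segment size are assumptions.

```python
import math

def bandwidth_delay_product_bytes(rate_bps, rtt_s):
    """Bytes that must be in flight to keep a link of this rate and RTT full."""
    return rate_bps * rtt_s / 8

def mathis_tcp_rate_bps(mss_bytes, rtt_s, loss_rate):
    """Approximate steady-state TCP rate: (MSS / RTT) * sqrt(3/2) / sqrt(p).

    Assumption: classic Reno-style congestion avoidance with random loss;
    modern TCP variants behave differently, so this is only an estimate.
    """
    return (mss_bytes * 8 / rtt_s) * math.sqrt(1.5) / math.sqrt(loss_rate)

# The worst case tested in the paper: 10 Gbps link, 400 ms RTT, 2% packet loss.
bdp = bandwidth_delay_product_bytes(10e9, 0.4)   # about 500 MB in flight
rate = mathis_tcp_rate_bps(1460, 0.4, 0.02)      # roughly 0.25 Mbit/s
```

Under these assumptions a single such TCP flow would reach only a tiny fraction of the 10 Gbps link capacity, which illustrates the motivation for the rate-based and UDP-based protocols compared in the paper.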