937 resultados para Distributed data


Relevância:

30.00% 30.00%

Publicador:

Resumo:

To analyze the characteristics and predict the dynamic behaviors of complex systems over time, comprehensive research to enable the development of systems that can intelligently adapt to the evolving conditions and infer new knowledge with algorithms that are not predesigned is crucially needed. This dissertation research studies the integration of the techniques and methodologies resulted from the fields of pattern recognition, intelligent agents, artificial immune systems, and distributed computing platforms, to create technologies that can more accurately describe and control the dynamics of real-world complex systems. The need for such technologies is emerging in manufacturing, transportation, hazard mitigation, weather and climate prediction, homeland security, and emergency response. Motivated by the ability of mobile agents to dynamically incorporate additional computational and control algorithms into executing applications, mobile agent technology is employed in this research for the adaptive sensing and monitoring in a wireless sensor network. Mobile agents are software components that can travel from one computing platform to another in a network and carry programs and data states that are needed for performing the assigned tasks. To support the generation, migration, communication, and management of mobile monitoring agents, an embeddable mobile agent system (Mobile-C) is integrated with sensor nodes. Mobile monitoring agents visit distributed sensor nodes, read real-time sensor data, and perform anomaly detection using the equipped pattern recognition algorithms. The optimal control of agents is achieved by mimicking the adaptive immune response and the application of multi-objective optimization algorithms. The mobile agent approach provides potential to reduce the communication load and energy consumption in monitoring networks. The major research work of this dissertation project includes: (1) studying effective feature extraction methods for time series measurement data; (2) investigating the impact of the feature extraction methods and dissimilarity measures on the performance of pattern recognition; (3) researching the effects of environmental factors on the performance of pattern recognition; (4) integrating an embeddable mobile agent system with wireless sensor nodes; (5) optimizing agent generation and distribution using artificial immune system concept and multi-objective algorithms; (6) applying mobile agent technology and pattern recognition algorithms for adaptive structural health monitoring and driving cycle pattern recognition; (7) developing a web-based monitoring network to enable the visualization and analysis of real-time sensor data remotely. Techniques and algorithms developed in this dissertation project will contribute to research advances in networked distributed systems operating under changing environments.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

This thesis presents a study of the Grid data access patterns in distributed analysis in the CMS experiment at the LHC accelerator. This study ranges from the deep analysis of the historical patterns of access to the most relevant data types in CMS, to the exploitation of a supervised Machine Learning classification system to set-up a machinery able to eventually predict future data access patterns - i.e. the so-called dataset “popularity” of the CMS datasets on the Grid - with focus on specific data types. All the CMS workflows run on the Worldwide LHC Computing Grid (WCG) computing centers (Tiers), and in particular the distributed analysis systems sustains hundreds of users and applications submitted every day. These applications (or “jobs”) access different data types hosted on disk storage systems at a large set of WLCG Tiers. The detailed study of how this data is accessed, in terms of data types, hosting Tiers, and different time periods, allows to gain precious insight on storage occupancy over time and different access patterns, and ultimately to extract suggested actions based on this information (e.g. targetted disk clean-up and/or data replication). In this sense, the application of Machine Learning techniques allows to learn from past data and to gain predictability potential for the future CMS data access patterns. Chapter 1 provides an introduction to High Energy Physics at the LHC. Chapter 2 describes the CMS Computing Model, with special focus on the data management sector, also discussing the concept of dataset popularity. Chapter 3 describes the study of CMS data access patterns with different depth levels. Chapter 4 offers a brief introduction to basic machine learning concepts and gives an introduction to its application in CMS and discuss the results obtained by using this approach in the context of this thesis.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Over the last decade, there has been a trend where water utility companies aim to make water distribution networks more intelligent in order to improve their quality of service, reduce water waste, minimize maintenance costs etc., by incorporating IoT technologies. Current state of the art solutions use expensive power hungry deployments to monitor and transmit water network states periodically in order to detect anomalous behaviors such as water leakage and bursts. However, more than 97% of water network assets are remote away from power and are often in geographically remote underpopulated areas, facts that make current approaches unsuitable for next generation more dynamic adaptive water networks. Battery-driven wireless sensor/actuator based solutions are theoretically the perfect choice to support next generation water distribution. In this paper, we present an end-to-end water leak localization system, which exploits edge processing and enables the use of battery-driven sensor nodes. Our system combines a lightweight edge anomaly detection algorithm based on compression rates and an efficient localization algorithm based on graph theory. The edge anomaly detection and localization elements of the systems produce a timely and accurate localization result and reduce the communication by 99% compared to the traditional periodic communication. We evaluated our schemes by deploying non-intrusive sensors measuring vibrational data on a real-world water test rig that have had controlled leakage and burst scenarios implemented.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

This paper presents a distributed hierarchical multiagent architecture for detecting SQL injection attacks against databases. It uses a novel strategy, which is supported by a Case-Based Reasoning mechanism, which provides to the classifier agents with a great capacity of learning and adaptation to face this type of attack. The architecture combines strategies of intrusion detection systems such as misuse detection and anomaly detection. It has been tested and the results are presented in this paper.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Data sources are often dispersed geographically in real life applications. Finding a knowledge model may require to join all the data sources and to run a machine learning algorithm on the joint set. We present an alternative based on a Multi Agent System (MAS): an agent mines one data source in order to extract a local theory (knowledge model) and then merges it with the previous MAS theory using a knowledge fusion technique. This way, we obtain a global theory that summarizes the distributed knowledge without spending resources and time in joining data sources. New experiments have been executed including statistical significance analysis. The results show that, as a result of knowledge fusion, the accuracy of initial theories is significantly improved as well as the accuracy of the monolithic solution.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

The findings in this summary are based on the Iowa Barriers to Prenatal Care project. Ongoing since 1991, the purpose of this project is to obtain brief, accurate information about women delivering babies in Iowa hospitals. Specifically, the project seeks to learn about women’s experiences getting prenatal or delivery care during their current pregnancy. Other information is included which may be pertinent to health planners or those concerned with the systematic development of health care services. This project is a cooperative venture of all of Iowa’s maternity hospitals, the University of Northern Iowa Center for Social and Behavioral Research, and the Iowa Department of Public Health. The Robert Wood Johnson Foundation funded the first three years of this project. The current funding is provided by the Iowa Department of Public Health. The Director is Dr. Mary Losch, University of Northern Iowa Center for Social and Behavioral Research. The Coordinator for the project is Rodney Muilenburg. The questionnaire is distributed to nearly ninety maternity hospitals across the state of Iowa. Nursing staff or those responsible for obtaining birth certificate information in the obstetrics unit are responsible for approaching all birth mothers prior to dismissal and requesting their participation in the study. The questionnaire takes approximately ten minutes to complete. Completed questionnaires are returned to the University of Northern Iowa Center for Social and Behavioral Research for data entry and analysis. Returns are made monthly, weekly, or biweekly depending on the number of births per week in a given hospital. Except in the case of a mother who is too ill to complete the questionnaire, all mothers are eligible to be recruited for participation. The present yearly report includes an analysis of large Iowa cities, African American mothers, and a trend analysis of the last ten years. Also presented in this report is a frequency analysis of all variables included in the 2012 questionnaire. Unless otherwise noted, all entries reflect percentages. Please note that because percentages were rounded, total values may not equal 100%. Data presented are based upon 2012 questionnaires received to date (n = 23,674). All analyses reflect unweighted percentages of those responding.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Al giorno d'oggi il reinforcement learning ha dimostrato di essere davvero molto efficace nel machine learning in svariati campi, come ad esempio i giochi, il riconoscimento vocale e molti altri. Perciò, abbiamo deciso di applicare il reinforcement learning ai problemi di allocazione, in quanto sono un campo di ricerca non ancora studiato con questa tecnica e perchè questi problemi racchiudono nella loro formulazione un vasto insieme di sotto-problemi con simili caratteristiche, per cui una soluzione per uno di essi si estende ad ognuno di questi sotto-problemi. In questo progetto abbiamo realizzato un applicativo chiamato Service Broker, il quale, attraverso il reinforcement learning, apprende come distribuire l'esecuzione di tasks su dei lavoratori asincroni e distribuiti. L'analogia è quella di un cloud data center, il quale possiede delle risorse interne - possibilmente distribuite nella server farm -, riceve dei tasks dai suoi clienti e li esegue su queste risorse. L'obiettivo dell'applicativo, e quindi del data center, è quello di allocare questi tasks in maniera da minimizzare il costo di esecuzione. Inoltre, al fine di testare gli agenti del reinforcement learning sviluppati è stato creato un environment, un simulatore, che permettesse di concentrarsi nello sviluppo dei componenti necessari agli agenti, invece che doversi anche occupare di eventuali aspetti implementativi necessari in un vero data center, come ad esempio la comunicazione con i vari nodi e i tempi di latenza di quest'ultima. I risultati ottenuti hanno dunque confermato la teoria studiata, riuscendo a ottenere prestazioni migliori di alcuni dei metodi classici per il task allocation.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

The modern industrial environment is populated by a myriad of intelligent devices that collaborate for the accomplishment of the numerous business processes in place at the production sites. The close collaboration between humans and work machines poses new interesting challenges that industry must overcome in order to implement the new digital policies demanded by the industrial transition. The Industry 5.0 movement is a companion revolution of the previous Industry 4.0, and it relies on three characteristics that any industrial sector should have and pursue: human centrality, resilience, and sustainability. The application of the fifth industrial revolution cannot be completed without moving from the implementation of Industry 4.0-enabled platforms. The common feature found in the development of this kind of platform is the need to integrate the Information and Operational layers. Our thesis work focuses on the implementation of a platform addressing all the digitization features foreseen by the fourth industrial revolution, making the IT/OT convergence inside production plants an improvement and not a risk. Furthermore, we added modular features to our platform enabling the Industry 5.0 vision. We favored the human centrality using the mobile crowdsensing techniques and the reliability and sustainability using pluggable cloud computing services, combined with data coming from the crowd support. We achieved important and encouraging results in all the domains in which we conducted our experiments. Our IT/OT convergence-enabled platform exhibits the right performance needed to satisfy the strict requirements of production sites. The multi-layer capability of the framework enables the exploitation of data not strictly coming from work machines, allowing a more strict interaction between the company, its employees, and customers.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

The General Data Protection Regulation (GDPR) has been designed to help promote a view in favor of the interests of individuals instead of large corporations. However, there is the need of more dedicated technologies that can help companies comply with GDPR while enabling people to exercise their rights. We argue that such a dedicated solution must address two main issues: the need for more transparency towards individuals regarding the management of their personal information and their often hindered ability to access and make interoperable personal data in a way that the exercise of one's rights would result in straightforward. We aim to provide a system that helps to push personal data management towards the individual's control, i.e., a personal information management system (PIMS). By using distributed storage and decentralized computing networks to control online services, users' personal information could be shifted towards those directly concerned, i.e., the data subjects. The use of Distributed Ledger Technologies (DLTs) and Decentralized File Storage (DFS) as an implementation of decentralized systems is of paramount importance in this case. The structure of this dissertation follows an incremental approach to describing a set of decentralized systems and models that revolves around personal data and their subjects. Each chapter of this dissertation builds up the previous one and discusses the technical implementation of a system and its relation with the corresponding regulations. We refer to the EU regulatory framework, including GDPR, eIDAS, and Data Governance Act, to build our final system architecture's functional and non-functional drivers. In our PIMS design, personal data is kept in a Personal Data Space (PDS) consisting of encrypted personal data referring to the subject stored in a DFS. On top of that, a network of authorization servers acts as a data intermediary to provide access to potential data recipients through smart contracts.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

The application of modern ICT technologies is radically changing many fields pushing toward more open and dynamic value chains fostering the cooperation and integration of many connected partners, sensors, and devices. As a valuable example, the emerging Smart Tourism field derived from the application of ICT to Tourism so to create richer and more integrated experiences, making them more accessible and sustainable. From a technological viewpoint, a recurring challenge in these decentralized environments is the integration of heterogeneous services and data spanning multiple administrative domains, each possibly applying different security/privacy policies, device and process control mechanisms, service access, and provisioning schemes, etc. The distribution and heterogeneity of those sources exacerbate the complexity in the development of integrating solutions with consequent high effort and costs for partners seeking them. Taking a step towards addressing these issues, we propose APERTO, a decentralized and distributed architecture that aims at facilitating the blending of data and services. At its core, APERTO relies on APERTO FaaS, a Serverless platform allowing fast prototyping of the business logic, lowering the barrier of entry and development costs to newcomers, (zero) fine-grained scaling of resources servicing end-users, and reduced management overhead. APERTO FaaS infrastructure is based on asynchronous and transparent communications between the components of the architecture, allowing the development of optimized solutions that exploit the peculiarities of distributed and heterogeneous environments. In particular, APERTO addresses the provisioning of scalable and cost-efficient mechanisms targeting: i) function composition allowing the definition of complex workloads from simple, ready-to-use functions, enabling smarter management of complex tasks and improved multiplexing capabilities; ii) the creation of end-to-end differentiated QoS slices minimizing interfaces among application/service running on a shared infrastructure; i) an abstraction providing uniform and optimized access to heterogeneous data sources, iv) a decentralized approach for the verification of access rights to resources.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

The main objective of my thesis work is to exploit the Google native and open-source platform Kubeflow, specifically using Kubeflow pipelines, to execute a Federated Learning scalable ML process in a 5G-like and simplified test architecture hosting a Kubernetes cluster and apply the largely adopted FedAVG algorithm and FedProx its optimization empowered by the ML platform ‘s abilities to ease the development and production cycle of this specific FL process. FL algorithms are more are and more promising and adopted both in Cloud application development and 5G communication enhancement through data coming from the monitoring of the underlying telco infrastructure and execution of training and data aggregation at edge nodes to optimize the global model of the algorithm ( that could be used for example for resource provisioning to reach an agreed QoS for the underlying network slice) and after a study and a research over the available papers and scientific articles related to FL with the help of the CTTC that suggests me to study and use Kubeflow to bear the algorithm we found out that this approach for the whole FL cycle deployment was not documented and may be interesting to investigate more in depth. This study may lead to prove the efficiency of the Kubeflow platform itself for this need of development of new FL algorithms that will support new Applications and especially test the FedAVG algorithm performances in a simulated client to cloud communication using a MNIST dataset for FL as benchmark.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

LHC experiments produce an enormous amount of data, estimated of the order of a few PetaBytes per year. Data management takes place using the Worldwide LHC Computing Grid (WLCG) grid infrastructure, both for storage and processing operations. However, in recent years, many more resources are available on High Performance Computing (HPC) farms, which generally have many computing nodes with a high number of processors. Large collaborations are working to use these resources in the most efficient way, compatibly with the constraints imposed by computing models (data distributed on the Grid, authentication, software dependencies, etc.). The aim of this thesis project is to develop a software framework that allows users to process a typical data analysis workflow of the ATLAS experiment on HPC systems. The developed analysis framework shall be deployed on the computing resources of the Open Physics Hub project and on the CINECA Marconi100 cluster, in view of the switch-on of the Leonardo supercomputer, foreseen in 2023.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

The idea of Grid Computing originated in the nineties and found its concrete applications in contexts like the SETI@home project where a lot of computers (offered by volunteers) cooperated, performing distributed computations, inside the Grid environment analyzing radio signals trying to find extraterrestrial life. The Grid was composed of traditional personal computers but, with the emergence of the first mobile devices like Personal Digital Assistants (PDAs), researchers started theorizing the inclusion of mobile devices into Grid Computing; although impressive theoretical work was done, the idea was discarded due to the limitations (mainly technological) of mobile devices available at the time. Decades have passed, and now mobile devices are extremely more performant and numerous than before, leaving a great amount of resources available on mobile devices, such as smartphones and tablets, untapped. Here we propose a solution for performing distributed computations over a Grid Computing environment that utilizes both desktop and mobile devices, exploiting the resources from day-to-day mobile users that alternatively would end up unused. The work starts with an introduction on what Grid Computing is, the evolution of mobile devices, the idea of integrating such devices into the Grid and how to convince device owners to participate in the Grid. Then, the tone becomes more technical, starting with an explanation on how Grid Computing actually works, followed by the technical challenges of integrating mobile devices into the Grid. Next, the model, which constitutes the solution offered by this study, is explained, followed by a chapter regarding the realization of a prototype that proves the feasibility of distributed computations over a Grid composed by both mobile and desktop devices. To conclude future developments and ideas to improve this project are presented.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

I sistemi decentralizzati hanno permesso agli utenti di condividere informazioni senza la presenza di un intermediario centralizzato che possiede la sovranità sui dati scambiati, rischi di sicurezza e la possibilità di colli di bottiglia. Tuttavia, sono rari i sistemi pratici per il recupero delle informazioni salvate su di essi che non includano una componente centralizzata. In questo lavoro di tesi viene presentato lo sviluppo di un'applicazione il cui scopo è quello di consentire agli utenti di caricare immagini in un'architettura totalmente decentralizzata, grazie ai Decentralized File Storage e alla successiva ricerca e recupero di tali oggetti attraverso una Distributed Hash Table (DHT) in cui sono memorizzati i necessari Content IDentifiers (CID).\\ L'obiettivo principale è stato quello di trovare una migliore allocazione delle immagini all'interno del DHT attraverso l'uso dell'International Standard Content Code (ISCC), ovvero uno standard ISO che, attraverso funzioni hash content-driven, locality-sensitive e similarity-preserving, assegna i CID IPFS delle immagini ai nodi del DHT in modo efficiente, per ridurre il più possibile i salti tra i nodi e recuperare immagini coerenti con la query eseguita. Verranno, poi, analizzati i risultati ottenuti dall'allocazione dei CID delle immagini nei nodi mettendo a confronto ISCC e hash crittografico SHA-256, per verificare se ISCC rappresenti meglio la somiglianza tra le immagini allocando le immagini simili in nodi vicini tra loro.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

High-throughput screening of physical, genetic and chemical-genetic interactions brings important perspectives in the Systems Biology field, as the analysis of these interactions provides new insights into protein/gene function, cellular metabolic variations and the validation of therapeutic targets and drug design. However, such analysis depends on a pipeline connecting different tools that can automatically integrate data from diverse sources and result in a more comprehensive dataset that can be properly interpreted. We describe here the Integrated Interactome System (IIS), an integrative platform with a web-based interface for the annotation, analysis and visualization of the interaction profiles of proteins/genes, metabolites and drugs of interest. IIS works in four connected modules: (i) Submission module, which receives raw data derived from Sanger sequencing (e.g. two-hybrid system); (ii) Search module, which enables the user to search for the processed reads to be assembled into contigs/singlets, or for lists of proteins/genes, metabolites and drugs of interest, and add them to the project; (iii) Annotation module, which assigns annotations from several databases for the contigs/singlets or lists of proteins/genes, generating tables with automatic annotation that can be manually curated; and (iv) Interactome module, which maps the contigs/singlets or the uploaded lists to entries in our integrated database, building networks that gather novel identified interactions, protein and metabolite expression/concentration levels, subcellular localization and computed topological metrics, GO biological processes and KEGG pathways enrichment. This module generates a XGMML file that can be imported into Cytoscape or be visualized directly on the web. We have developed IIS by the integration of diverse databases following the need of appropriate tools for a systematic analysis of physical, genetic and chemical-genetic interactions. IIS was validated with yeast two-hybrid, proteomics and metabolomics datasets, but it is also extendable to other datasets. IIS is freely available online at: http://www.lge.ibi.unicamp.br/lnbio/IIS/.