14 resultados para Distributed data
em AMS Tesi di Dottorato - Alm@DL - Universit
Resumo:
The miniaturization race in the hardware industry aiming at continuous increasing of transistor density on a die does not bring respective application performance improvements any more. One of the most promising alternatives is to exploit a heterogeneous nature of common applications in hardware. Supported by reconfigurable computation, which has already proved its efficiency in accelerating data intensive applications, this concept promises a breakthrough in contemporary technology development. Memory organization in such heterogeneous reconfigurable architectures becomes very critical. Two primary aspects introduce a sophisticated trade-off. On the one hand, a memory subsystem should provide well organized distributed data structure and guarantee the required data bandwidth. On the other hand, it should hide the heterogeneous hardware structure from the end-user, in order to support feasible high-level programmability of the system. This thesis work explores the heterogeneous reconfigurable hardware architectures and presents possible solutions to cope the problem of memory organization and data structure. By the example of the MORPHEUS heterogeneous platform, the discussion follows the complete design cycle, starting from decision making and justification, until hardware realization. Particular emphasis is made on the methods to support high system performance, meet application requirements, and provide a user-friendly programmer interface. As a result, the research introduces a complete heterogeneous platform enhanced with a hierarchical memory organization, which copes with its task by means of separating computation from communication, providing reconfigurable engines with computation and configuration data, and unification of heterogeneous computational devices using local storage buffers. It is distinguished from the related solutions by distributed data-flow organization, specifically engineered mechanisms to operate with data on local domains, particular communication infrastructure based on Network-on-Chip, and thorough methods to prevent computation and communication stalls. In addition, a novel advanced technique to accelerate memory access was developed and implemented.
Resumo:
With the CERN LHC program underway, there has been an acceleration of data growth in the High Energy Physics (HEP) field and the usage of Machine Learning (ML) in HEP will be critical during the HL-LHC program when the data that will be produced will reach the exascale. ML techniques have been successfully used in many areas of HEP nevertheless, the development of a ML project and its implementation for production use is a highly time-consuming task and requires specific skills. Complicating this scenario is the fact that HEP data is stored in ROOT data format, which is mostly unknown outside of the HEP community. The work presented in this thesis is focused on the development of a ML as a Service (MLaaS) solution for HEP, aiming to provide a cloud service that allows HEP users to run ML pipelines via HTTP calls. These pipelines are executed by using the MLaaS4HEP framework, which allows reading data, processing data, and training ML models directly using ROOT files of arbitrary size from local or distributed data sources. Such a solution provides HEP users non-expert in ML with a tool that allows them to apply ML techniques in their analyses in a streamlined manner. Over the years the MLaaS4HEP framework has been developed, validated, and tested and new features have been added. A first MLaaS solution has been developed by automatizing the deployment of a platform equipped with the MLaaS4HEP framework. Then, a service with APIs has been developed, so that a user after being authenticated and authorized can submit MLaaS4HEP workflows producing trained ML models ready for the inference phase. A working prototype of this service is currently running on a virtual machine of INFN-Cloud and is compliant to be added to the INFN Cloud portfolio of services.
Resumo:
The Internet of Vehicles (IoV) paradigm has emerged in recent times, where with the support of technologies like the Internet of Things and V2X , Vehicular Users (VUs) can access different services through internet connectivity. With the support of 6G technology, the IoV paradigm will evolve further and converge into a fully connected and intelligent vehicular system. However, this brings new challenges over dynamic and resource-constrained vehicular systems, and advanced solutions are demanded. This dissertation analyzes the future 6G enabled IoV systems demands, corresponding challenges, and provides various solutions to address them. The vehicular services and application requests demands proper data processing solutions with the support of distributed computing environments such as Vehicular Edge Computing (VEC). While analyzing the performance of VEC systems it is important to take into account the limited resources, coverage, and vehicular mobility into account. Recently, Non terrestrial Networks (NTN) have gained huge popularity for boosting the coverage and capacity of terrestrial wireless networks. Integrating such NTN facilities into the terrestrial VEC system can address the above mentioned challenges. Additionally, such integrated Terrestrial and Non-terrestrial networks (T-NTN) can also be considered to provide advanced intelligent solutions with the support of the edge intelligence paradigm. In this dissertation, we proposed an edge computing-enabled joint T-NTN-based vehicular system architecture to serve VUs. Next, we analyze the terrestrial VEC systems performance for VUs data processing problems and propose solutions to improve the performance in terms of latency and energy costs. Next, we extend the scenario toward the joint T-NTN system and address the problem of distributed data processing through ML-based solutions. We also proposed advanced distributed learning frameworks with the support of a joint T-NTN framework with edge computing facilities. In the end, proper conclusive remarks and several future directions are provided for the proposed solutions.
Resumo:
Although the debate of what data science is has a long history and has not reached a complete consensus yet, Data Science can be summarized as the process of learning from data. Guided by the above vision, this thesis presents two independent data science projects developed in the scope of multidisciplinary applied research. The first part analyzes fluorescence microscopy images typically produced in life science experiments, where the objective is to count how many marked neuronal cells are present in each image. Aiming to automate the task for supporting research in the area, we propose a neural network architecture tuned specifically for this use case, cell ResUnet (c-ResUnet), and discuss the impact of alternative training strategies in overcoming particular challenges of our data. The approach provides good results in terms of both detection and counting, showing performance comparable to the interpretation of human operators. As a meaningful addition, we release the pre-trained model and the Fluorescent Neuronal Cells dataset collecting pixel-level annotations of where neuronal cells are located. In this way, we hope to help future research in the area and foster innovative methodologies for tackling similar problems. The second part deals with the problem of distributed data management in the context of LHC experiments, with a focus on supporting ATLAS operations concerning data transfer failures. In particular, we analyze error messages produced by failed transfers and propose a Machine Learning pipeline that leverages the word2vec language model and K-means clustering. This provides groups of similar errors that are presented to human operators as suggestions of potential issues to investigate. The approach is demonstrated on one full day of data, showing promising ability in understanding the message content and providing meaningful groupings, in line with previously reported incidents by human operators.
Resumo:
The recent trend in Web services is fostering a computing scenario where loosely coupled parties interact in a distributed and dynamic environment. Such interactions are sequences of xml messages and in order to assemble parties – either statically or dynamically – it is important to verify that the “contracts” of the parties are “compatible”. The Web Service Description Language (wsdl) is a standard used for describing one-way (asynchronous) and request/response (synchronous) interactions. Web Service Conversation Language extends wscl contracts by allowing the description of arbitrary, possibly cyclic sequences of exchanged messages between communicating parties. Unfortunately, neither wsdl nor wscl can effectively define a notion of compatibility, for the very simple reason that they do not provide any formal characterization of their contract languages. We define two contract languages for Web services. The first one is a data contract language and allow us to describe a Web service in terms of messages (xml documents) that can be sent or received. The second one is a behavioral contract language and allow us to give an abstract definition of the Web service conversation protocol. Both these languages are equipped with a sort of “sub-typing” relation and, therefore, they are suitable to be used for querying Web services repositories. In particular a query for a service compatible with a given contract may safely return services with “greater” contract.
Resumo:
By the end of the 19th century, geodesy has contributed greatly to the knowledge of regional tectonics and fault movement through its ability to measure, at sub-centimetre precision, the relative positions of points on the Earth’s surface. Nowadays the systematic analysis of geodetic measurements in active deformation regions represents therefore one of the most important tool in the study of crustal deformation over different temporal scales [e.g., Dixon, 1991]. This dissertation focuses on motion that can be observed geodetically with classical terrestrial position measurements, particularly triangulation and leveling observations. The work is divided into two sections: an overview of the principal methods for estimating longterm accumulation of elastic strain from terrestrial observations, and an overview of the principal methods for rigorously inverting surface coseismic deformation fields for source geometry with tests on synthetic deformation data sets and applications in two different tectonically active regions of the Italian peninsula. For the long-term accumulation of elastic strain analysis, triangulation data were available from a geodetic network across the Messina Straits area (southern Italy) for the period 1971 – 2004. From resulting angle changes, the shear strain rates as well as the orientation of the principal axes of the strain rate tensor were estimated. The computed average annual shear strain rates for the time period between 1971 and 2004 are γ˙1 = 113.89 ± 54.96 nanostrain/yr and γ˙2 = -23.38 ± 48.71 nanostrain/yr, with the orientation of the most extensional strain (θ) at N140.80° ± 19.55°E. These results suggests that the first-order strain field of the area is dominated by extension in the direction perpendicular to the trend of the Straits, sustaining the hypothesis that the Messina Straits could represents an area of active concentrated deformation. The orientation of θ agree well with GPS deformation estimates, calculated over shorter time interval, and is consistent with previous preliminary GPS estimates [D’Agostino and Selvaggi, 2004; Serpelloni et al., 2005] and is also similar to the direction of the 1908 (MW 7.1) earthquake slip vector [e.g., Boschi et al., 1989; Valensise and Pantosti, 1992; Pino et al., 2000; Amoruso et al., 2002]. Thus, the measured strain rate can be attributed to an active extension across the Messina Straits, corresponding to a relative extension rate ranges between < 1mm/yr and up to ~ 2 mm/yr, within the portion of the Straits covered by the triangulation network. These results are consistent with the hypothesis that the Messina Straits is an important active geological boundary between the Sicilian and the Calabrian domains and support previous preliminary GPS-based estimates of strain rates across the Straits, which show that the active deformation is distributed along a greater area. Finally, the preliminary dislocation modelling has shown that, although the current geodetic measurements do not resolve the geometry of the dislocation models, they solve well the rate of interseismic strain accumulation across the Messina Straits and give useful information about the locking the depth of the shear zone. Geodetic data, triangulation and leveling measurements of the 1976 Friuli (NE Italy) earthquake, were available for the inversion of coseismic source parameters. From observed angle and elevation changes, the source parameters of the seismic sequence were estimated in a join inversion using an algorithm called “simulated annealing”. The computed optimal uniform–slip elastic dislocation model consists of a 30° north-dipping shallow (depth 1.30 ± 0.75 km) fault plane with azimuth of 273° and accommodating reverse dextral slip of about 1.8 m. The hypocentral location and inferred fault plane of the main event are then consistent with the activation of Periadriatic overthrusts or other related thrust faults as the Gemona- Kobarid thrust. Then, the geodetic data set exclude the source solution of Aoudia et al. [2000], Peruzza et al. [2002] and Poli et al. [2002] that considers the Susans-Tricesimo thrust as the May 6 event. The best-fit source model is then more consistent with the solution of Pondrelli et al. [2001], which proposed the activation of other thrusts located more to the North of the Susans-Tricesimo thrust, probably on Periadriatic related thrust faults. The main characteristics of the leveling and triangulation data are then fit by the optimal single fault model, that is, these results are consistent with a first-order rupture process characterized by a progressive rupture of a single fault system. A single uniform-slip fault model seems to not reproduce some minor complexities of the observations, and some residual signals that are not modelled by the optimal single-fault plane solution, were observed. In fact, the single fault plane model does not reproduce some minor features of the leveling deformation field along the route 36 south of the main uplift peak, that is, a second fault seems to be necessary to reproduce these residual signals. By assuming movements along some mapped thrust located southward of the inferred optimal single-plane solution, the residual signal has been successfully modelled. In summary, the inversion results presented in this Thesis, are consistent with the activation of some Periadriatic related thrust for the main events of the sequence, and with a minor importance of the southward thrust systems of the middle Tagliamento plain.
Resumo:
The research activity carried out during the PhD course in Electrical Engineering belongs to the branch of electric and electronic measurements. The main subject of the present thesis is a distributed measurement system to be installed in Medium Voltage power networks, as well as the method developed to analyze data acquired by the measurement system itself and to monitor power quality. In chapter 2 the increasing interest towards power quality in electrical systems is illustrated, by reporting the international research activity inherent to the problem and the relevant standards and guidelines emitted. The aspect of the quality of voltage provided by utilities and influenced by customers in the various points of a network came out only in recent years, in particular as a consequence of the energy market liberalization. Usually, the concept of quality of the delivered energy has been associated mostly to its continuity. Hence the reliability was the main characteristic to be ensured for power systems. Nowadays, the number and duration of interruptions are the “quality indicators” commonly perceived by most customers; for this reason, a short section is dedicated also to network reliability and its regulation. In this contest it should be noted that although the measurement system developed during the research activity belongs to the field of power quality evaluation systems, the information registered in real time by its remote stations can be used to improve the system reliability too. Given the vast scenario of power quality degrading phenomena that usually can occur in distribution networks, the study has been focused on electromagnetic transients affecting line voltages. The outcome of such a study has been the design and realization of a distributed measurement system which continuously monitor the phase signals in different points of a network, detect the occurrence of transients superposed to the fundamental steady state component and register the time of occurrence of such events. The data set is finally used to locate the source of the transient disturbance propagating along the network lines. Most of the oscillatory transients affecting line voltages are due to faults occurring in any point of the distribution system and have to be seen before protection equipment intervention. An important conclusion is that the method can improve the monitored network reliability, since the knowledge of the location of a fault allows the energy manager to reduce as much as possible both the area of the network to be disconnected for protection purposes and the time spent by technical staff to recover the abnormal condition and/or the damage. The part of the thesis presenting the results of such a study and activity is structured as follows: chapter 3 deals with the propagation of electromagnetic transients in power systems by defining characteristics and causes of the phenomena and briefly reporting the theory and approaches used to study transients propagation. Then the state of the art concerning methods to detect and locate faults in distribution networks is presented. Finally the attention is paid on the particular technique adopted for the same purpose during the thesis, and the methods developed on the basis of such approach. Chapter 4 reports the configuration of the distribution networks on which the fault location method has been applied by means of simulations as well as the results obtained case by case. In this way the performance featured by the location procedure firstly in ideal then in realistic operating conditions are tested. In chapter 5 the measurement system designed to implement the transients detection and fault location method is presented. The hardware belonging to the measurement chain of every acquisition channel in remote stations is described. Then, the global measurement system is characterized by considering the non ideal aspects of each device that can concur to the final combined uncertainty on the estimated position of the fault in the network under test. Finally, such parameter is computed according to the Guide to the Expression of Uncertainty in Measurements, by means of a numeric procedure. In the last chapter a device is described that has been designed and realized during the PhD activity aiming at substituting the commercial capacitive voltage divider belonging to the conditioning block of the measurement chain. Such a study has been carried out aiming at providing an alternative to the used transducer that could feature equivalent performance and lower cost. In this way, the economical impact of the investment associated to the whole measurement system would be significantly reduced, making the method application much more feasible.
Resumo:
Recent progress in microelectronic and wireless communications have enabled the development of low cost, low power, multifunctional sensors, which has allowed the birth of new type of networks named wireless sensor networks (WSNs). The main features of such networks are: the nodes can be positioned randomly over a given field with a high density; each node operates both like sensor (for collection of environmental data) as well as transceiver (for transmission of information to the data retrieval); the nodes have limited energy resources. The use of wireless communications and the small size of nodes, make this type of networks suitable for a large number of applications. For example, sensor nodes can be used to monitor a high risk region, as near a volcano; in a hospital they could be used to monitor physical conditions of patients. For each of these possible application scenarios, it is necessary to guarantee a trade-off between energy consumptions and communication reliability. The thesis investigates the use of WSNs in two possible scenarios and for each of them suggests a solution that permits to solve relating problems considering the trade-off introduced. The first scenario considers a network with a high number of nodes deployed in a given geographical area without detailed planning that have to transmit data toward a coordinator node, named sink, that we assume to be located onboard an unmanned aerial vehicle (UAV). This is a practical example of reachback communication, characterized by the high density of nodes that have to transmit data reliably and efficiently towards a far receiver. It is considered that each node transmits a common shared message directly to the receiver onboard the UAV whenever it receives a broadcast message (triggered for example by the vehicle). We assume that the communication channels between the local nodes and the receiver are subject to fading and noise. The receiver onboard the UAV must be able to fuse the weak and noisy signals in a coherent way to receive the data reliably. It is proposed a cooperative diversity concept as an effective solution to the reachback problem. In particular, it is considered a spread spectrum (SS) transmission scheme in conjunction with a fusion center that can exploit cooperative diversity, without requiring stringent synchronization between nodes. The idea consists of simultaneous transmission of the common message among the nodes and a Rake reception at the fusion center. The proposed solution is mainly motivated by two goals: the necessity to have simple nodes (to this aim we move the computational complexity to the receiver onboard the UAV), and the importance to guarantee high levels of energy efficiency of the network, thus increasing the network lifetime. The proposed scheme is analyzed in order to better understand the effectiveness of the approach presented. The performance metrics considered are both the theoretical limit on the maximum amount of data that can be collected by the receiver, as well as the error probability with a given modulation scheme. Since we deal with a WSN, both of these performance are evaluated taking into consideration the energy efficiency of the network. The second scenario considers the use of a chain network for the detection of fires by using nodes that have a double function of sensors and routers. The first one is relative to the monitoring of a temperature parameter that allows to take a local binary decision of target (fire) absent/present. The second one considers that each node receives a decision made by the previous node of the chain, compares this with that deriving by the observation of the phenomenon, and transmits the final result to the next node. The chain ends at the sink node that transmits the received decision to the user. In this network the goals are to limit throughput in each sensor-to-sensor link and minimize probability of error at the last stage of the chain. This is a typical scenario of distributed detection. To obtain good performance it is necessary to define some fusion rules for each node to summarize local observations and decisions of the previous nodes, to get a final decision that it is transmitted to the next node. WSNs have been studied also under a practical point of view, describing both the main characteristics of IEEE802:15:4 standard and two commercial WSN platforms. By using a commercial WSN platform it is realized an agricultural application that has been tested in a six months on-field experimentation.
Resumo:
This thesis presents several data processing and compression techniques capable of addressing the strict requirements of wireless sensor networks. After introducing a general overview of sensor networks, the energy problem is introduced, dividing the different energy reduction approaches according to the different subsystem they try to optimize. To manage the complexity brought by these techniques, a quick overview of the most common middlewares for WSNs is given, describing in detail SPINE2, a framework for data processing in the node environment. The focus is then shifted on the in-network aggregation techniques, used to reduce data sent by the network nodes trying to prolong the network lifetime as long as possible. Among the several techniques, the most promising approach is the Compressive Sensing (CS). To investigate this technique, a practical implementation of the algorithm is compared against a simpler aggregation scheme, deriving a mixed algorithm able to successfully reduce the power consumption. The analysis moves from compression implemented on single nodes to CS for signal ensembles, trying to exploit the correlations among sensors and nodes to improve compression and reconstruction quality. The two main techniques for signal ensembles, Distributed CS (DCS) and Kronecker CS (KCS), are introduced and compared against a common set of data gathered by real deployments. The best trade-off between reconstruction quality and power consumption is then investigated. The usage of CS is also addressed when the signal of interest is sampled at a Sub-Nyquist rate, evaluating the reconstruction performance. Finally the group sparsity CS (GS-CS) is compared to another well-known technique for reconstruction of signals from an highly sub-sampled version. These two frameworks are compared again against a real data-set and an insightful analysis of the trade-off between reconstruction quality and lifetime is given.
Resumo:
The wide diffusion of cheap, small, and portable sensors integrated in an unprecedented large variety of devices and the availability of almost ubiquitous Internet connectivity make it possible to collect an unprecedented amount of real time information about the environment we live in. These data streams, if properly and timely analyzed, can be exploited to build new intelligent and pervasive services that have the potential of improving people's quality of life in a variety of cross concerning domains such as entertainment, health-care, or energy management. The large heterogeneity of application domains, however, calls for a middleware-level infrastructure that can effectively support their different quality requirements. In this thesis we study the challenges related to the provisioning of differentiated quality-of-service (QoS) during the processing of data streams produced in pervasive environments. We analyze the trade-offs between guaranteed quality, cost, and scalability in streams distribution and processing by surveying existing state-of-the-art solutions and identifying and exploring their weaknesses. We propose an original model for QoS-centric distributed stream processing in data centers and we present Quasit, its prototype implementation offering a scalable and extensible platform that can be used by researchers to implement and validate novel QoS-enforcement mechanisms. To support our study, we also explore an original class of weaker quality guarantees that can reduce costs when application semantics do not require strict quality enforcement. We validate the effectiveness of this idea in a practical use-case scenario that investigates partial fault-tolerance policies in stream processing by performing a large experimental study on the prototype of our novel LAAR dynamic replication technique. Our modeling, prototyping, and experimental work demonstrates that, by providing data distribution and processing middleware with application-level knowledge of the different quality requirements associated to different pervasive data flows, it is possible to improve system scalability while reducing costs.
Resumo:
In the era of the Internet of Everything, a user with a handheld or wearable device equipped with sensing capability has become a producer as well as a consumer of information and services. The more powerful these devices get, the more likely it is that they will generate and share content locally, leading to the presence of distributed information sources and the diminishing role of centralized servers. As of current practice, we rely on infrastructure acting as an intermediary, providing access to the data. However, infrastructure-based connectivity might not always be available or the best alternative. Moreover, it is often the case where the data and the processes acting upon them are of local scopus. Answers to a query about a nearby object, an information source, a process, an experience, an ability, etc. could be answered locally without reliance on infrastructure-based platforms. The data might have temporal validity limited to or bounded to a geographical area and/or the social context where the user is immersed in. In this envisioned scenario users could interact locally without the need for a central authority, hence, the claim of an infrastructure-less, provider-less platform. The data is owned by the users and consulted locally as opposed to the current approach of making them available globally and stay on forever. From a technical viewpoint, this network resembles a Delay/Disruption Tolerant Network where consumers and producers might be spatially and temporally decoupled exchanging information with each other in an adhoc fashion. To this end, we propose some novel data gathering and dissemination strategies for use in urban-wide environments which do not rely on strict infrastructure mediation. While preserving the general aspects of our study and without loss of generality, we focus our attention toward practical applicative scenarios which help us capture the characteristics of opportunistic communication networks.
Resumo:
The modern industrial environment is populated by a myriad of intelligent devices that collaborate for the accomplishment of the numerous business processes in place at the production sites. The close collaboration between humans and work machines poses new interesting challenges that industry must overcome in order to implement the new digital policies demanded by the industrial transition. The Industry 5.0 movement is a companion revolution of the previous Industry 4.0, and it relies on three characteristics that any industrial sector should have and pursue: human centrality, resilience, and sustainability. The application of the fifth industrial revolution cannot be completed without moving from the implementation of Industry 4.0-enabled platforms. The common feature found in the development of this kind of platform is the need to integrate the Information and Operational layers. Our thesis work focuses on the implementation of a platform addressing all the digitization features foreseen by the fourth industrial revolution, making the IT/OT convergence inside production plants an improvement and not a risk. Furthermore, we added modular features to our platform enabling the Industry 5.0 vision. We favored the human centrality using the mobile crowdsensing techniques and the reliability and sustainability using pluggable cloud computing services, combined with data coming from the crowd support. We achieved important and encouraging results in all the domains in which we conducted our experiments. Our IT/OT convergence-enabled platform exhibits the right performance needed to satisfy the strict requirements of production sites. The multi-layer capability of the framework enables the exploitation of data not strictly coming from work machines, allowing a more strict interaction between the company, its employees, and customers.
Resumo:
The General Data Protection Regulation (GDPR) has been designed to help promote a view in favor of the interests of individuals instead of large corporations. However, there is the need of more dedicated technologies that can help companies comply with GDPR while enabling people to exercise their rights. We argue that such a dedicated solution must address two main issues: the need for more transparency towards individuals regarding the management of their personal information and their often hindered ability to access and make interoperable personal data in a way that the exercise of one's rights would result in straightforward. We aim to provide a system that helps to push personal data management towards the individual's control, i.e., a personal information management system (PIMS). By using distributed storage and decentralized computing networks to control online services, users' personal information could be shifted towards those directly concerned, i.e., the data subjects. The use of Distributed Ledger Technologies (DLTs) and Decentralized File Storage (DFS) as an implementation of decentralized systems is of paramount importance in this case. The structure of this dissertation follows an incremental approach to describing a set of decentralized systems and models that revolves around personal data and their subjects. Each chapter of this dissertation builds up the previous one and discusses the technical implementation of a system and its relation with the corresponding regulations. We refer to the EU regulatory framework, including GDPR, eIDAS, and Data Governance Act, to build our final system architecture's functional and non-functional drivers. In our PIMS design, personal data is kept in a Personal Data Space (PDS) consisting of encrypted personal data referring to the subject stored in a DFS. On top of that, a network of authorization servers acts as a data intermediary to provide access to potential data recipients through smart contracts.
Resumo:
The application of modern ICT technologies is radically changing many fields pushing toward more open and dynamic value chains fostering the cooperation and integration of many connected partners, sensors, and devices. As a valuable example, the emerging Smart Tourism field derived from the application of ICT to Tourism so to create richer and more integrated experiences, making them more accessible and sustainable. From a technological viewpoint, a recurring challenge in these decentralized environments is the integration of heterogeneous services and data spanning multiple administrative domains, each possibly applying different security/privacy policies, device and process control mechanisms, service access, and provisioning schemes, etc. The distribution and heterogeneity of those sources exacerbate the complexity in the development of integrating solutions with consequent high effort and costs for partners seeking them. Taking a step towards addressing these issues, we propose APERTO, a decentralized and distributed architecture that aims at facilitating the blending of data and services. At its core, APERTO relies on APERTO FaaS, a Serverless platform allowing fast prototyping of the business logic, lowering the barrier of entry and development costs to newcomers, (zero) fine-grained scaling of resources servicing end-users, and reduced management overhead. APERTO FaaS infrastructure is based on asynchronous and transparent communications between the components of the architecture, allowing the development of optimized solutions that exploit the peculiarities of distributed and heterogeneous environments. In particular, APERTO addresses the provisioning of scalable and cost-efficient mechanisms targeting: i) function composition allowing the definition of complex workloads from simple, ready-to-use functions, enabling smarter management of complex tasks and improved multiplexing capabilities; ii) the creation of end-to-end differentiated QoS slices minimizing interfaces among application/service running on a shared infrastructure; i) an abstraction providing uniform and optimized access to heterogeneous data sources, iv) a decentralized approach for the verification of access rights to resources.