12 results for Data sources detection

in AMS Tesi di Dottorato - Alm@DL - Universit


Relevance:

100.00%

Publisher:

Abstract:

In recent years, there has been exponential growth in the use of virtual spaces, including dialogue systems, that handle personal information. The concept of personal privacy is debated and controversial in the literature, whereas in the technological field it directly influences the perceived reliability of an information system (privacy ‘as trust’). This work aims to protect the right to privacy of personal data (GDPR, 2018) and to avoid the loss of sensitive content by exploring the sensitive information detection (SID) task. It is grounded on the following research questions: (RQ1) What does sensitive data mean? How can a domain of personal sensitive information be defined? (RQ2) How can a state-of-the-art model for SID be created? (RQ3) How can the model be evaluated? RQ1 theoretically investigates the concepts of privacy and the state-of-the-art ontological representation of personal information. The Data Privacy Vocabulary (DPV) is the taxonomic resource taken as the authoritative reference for defining the knowledge domain. Concerning RQ2, we investigate two approaches to classifying sensitive data: the first (bottom-up) explores automatic learning methods based on transformer networks; the second (top-down) proposes logical-symbolic methods through the construction of privaframe, a knowledge graph of compositional frames representing personal data categories. Both approaches are tested. For the evaluation (RQ3), we create SPeDaC, a sentence-level labeled resource that can be used as a benchmark or as training data for the SID task, filling the gap left by the lack of a shared resource in this field. While the approach based on artificial neural networks confirms the validity of the direction adopted in the most recent studies on SID, the logical-symbolic approach emerges as the preferred way to classify fine-grained personal data categories, thanks to the semantically grounded, tailored modeling it allows. At the same time, the results highlight the strong potential of hybrid architectures in solving automatic tasks.

Relevance:

100.00%

Publisher:

Abstract:

In recent years, IoT technology has radically transformed many crucial industrial and service sectors, such as healthcare. The multifaceted heterogeneity of the devices and of the collected information provides important opportunities to develop innovative systems and services. However, the ubiquitous presence of data silos and the poor semantic interoperability of the IoT landscape constitute a significant obstacle to this goal. Moreover, deriving actionable knowledge from the collected data requires IoT information sources to be analysed using appropriate artificial intelligence techniques, such as automated reasoning. In this thesis, Semantic Web technologies are investigated as an approach to address both the data integration and the reasoning aspects of modern IoT systems. In particular, the contributions presented in this thesis are the following: (1) the IoT Fitness Ontology, an OWL ontology developed to overcome the issue of data silos and enable semantic interoperability in the IoT fitness domain; (2) a Linked Open Data web portal for collecting and sharing IoT health datasets with the research community; (3) a novel methodology for embedding knowledge in rule-defined IoT smart home scenarios; and (4) a knowledge-based IoT home automation system that supports the seamless integration of heterogeneous devices and data sources.

Relevance:

90.00%

Publisher:

Abstract:

This work is part of a project promoted by Emilia-Romagna that aims to encourage research activities supporting the innovation strategies of the regional economic system through the exploitation of new data sources. To this end, the Municipality of Bologna provides a database of administrative data, built by linking data from the Register Office of the Municipality with fiscal data from the tax returns submitted to the Revenue Agency and released by the Ministry of Economy and Finance for the period 2002-2017. The main purpose of the project is to analyse the medium-term financial and distributional trends in the income of citizens residing in the Municipality of Bologna. Exploiting this innovative source of data allows us to analyse the dynamics of income at the municipal level, overcoming the lack of information in official survey-based statistics. We investigate these trends by building inequality indicators and by examining the persistence of in-work poverty. Our results provide important information for improving the effectiveness and equity of welfare policies at the local level, and for guiding the distribution of economic and social support and of urban redevelopment interventions across different areas of the Municipality.

Relevance:

90.00%

Publisher:

Abstract:

With the CERN LHC program underway, data growth in the High Energy Physics (HEP) field has accelerated, and the use of Machine Learning (ML) in HEP will be critical during the HL-LHC program, when the data produced will reach the exascale. ML techniques have been used successfully in many areas of HEP; nevertheless, developing an ML project and implementing it for production use is a highly time-consuming task that requires specific skills. Complicating this scenario is the fact that HEP data is stored in the ROOT data format, which is mostly unknown outside the HEP community. The work presented in this thesis focuses on the development of an ML as a Service (MLaaS) solution for HEP, aiming to provide a cloud service that allows HEP users to run ML pipelines via HTTP calls. These pipelines are executed using the MLaaS4HEP framework, which allows reading data, processing data, and training ML models directly from ROOT files of arbitrary size from local or distributed data sources. Such a solution provides HEP users who are not ML experts with a tool for applying ML techniques in their analyses in a streamlined manner. Over the years, the MLaaS4HEP framework has been developed, validated, and tested, and new features have been added. A first MLaaS solution was developed by automating the deployment of a platform equipped with the MLaaS4HEP framework. Then, a service with APIs was developed so that a user, once authenticated and authorized, can submit MLaaS4HEP workflows that produce trained ML models ready for the inference phase. A working prototype of this service is currently running on a virtual machine of INFN-Cloud and meets the requirements to be added to the INFN Cloud portfolio of services.

Relevance:

80.00%

Publisher:

Abstract:

This PhD thesis aims to evaluate the impact of EU Cohesion policy on regional growth. It employs methodologies and data sources never before applied for this purpose. The main contributions to the literature on the effectiveness of EU regional policy are extensively analysed, and this overview of the current literature on Cohesion Policy indicates that the work introduces innovative features in the field. The work enriches the current literature in two respects. The first concerns the use of a Regression Discontinuity Design to examine whether growth outcomes differ between Objective 1 regions and non-Objective 1 regions at the cut-off point (75 percent of EU-15 GDP per capita in PPS) during the two programming periods, 1994-1999 and 2000-2006. The results confirm a significant difference of more than 0.5 percent per year between the two groups. The second empirical evaluation is a cross-section regression model based on convergence theory that analyses the dependence between regional per capita growth and EU Cohesion policy expenditure in several fields of intervention. We have built a highly detailed dataset of spending variables (certified expenditure), using data provided directly by the Regional Policy Directorate of the European Commission.
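
The regression discontinuity idea described in this abstract can be illustrated with a minimal sketch. It is not the thesis's estimation procedure: the function name `rdd_jump`, the bandwidth, and the synthetic data are assumptions made only for illustration. The sketch fits a separate straight line on each side of the 75 percent cut-off and compares the two fitted values at the cut-off.

```python
import numpy as np

def rdd_jump(running, outcome, cutoff, bandwidth):
    """Sharp RDD sketch: difference between the two local linear fits at the cut-off.

    Units below the cut-off are treated (eligible), units at or above it are not.
    """
    running = np.asarray(running, dtype=float)
    outcome = np.asarray(outcome, dtype=float)
    centered = running - cutoff
    below = (centered < 0) & (centered >= -bandwidth)
    above = (centered >= 0) & (centered <= bandwidth)
    # np.polyfit returns coefficients from highest degree down: [slope, intercept].
    _, intercept_below = np.polyfit(centered[below], outcome[below], 1)
    _, intercept_above = np.polyfit(centered[above], outcome[above], 1)
    # Because the running variable is centered, each intercept is the fitted
    # outcome exactly at the cut-off.
    return intercept_below - intercept_above

# Synthetic, noise-free data (made up, not results from the thesis):
# GDP per capita as a percentage of the EU average and yearly growth in percent.
gdp_share = np.linspace(50.0, 100.0, 101)
growth = 2.0 - 0.02 * (gdp_share - 75.0) + 0.6 * (gdp_share < 75.0)

jump = rdd_jump(gdp_share, growth, cutoff=75.0, bandwidth=10.0)
print(round(jump, 3))
```

In applied work the bandwidth and the functional form on each side are usually chosen with more care, and standard errors are reported alongside the estimated jump.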

Relevance:

80.00%

Publisher:

Abstract:

This thesis aims to investigate the complex picture of harassment within the family and in the workplace under Italian law and to compare it with a legal system belonging to the same legal tradition, the French one. The discussion reconstructs the socio-criminological and legal aspects of family harassment against vulnerable persons (women, minors, the elderly, or people with disabilities), of workplace harassment such as sexual harassment and mobbing, and of remote harassment or stalking, which in many respects remain a hidden and little-known phenomenon. The thesis focuses above all on psychological and lesser-known forms of harassment. The theoretical and normative reconstruction of these topics is complemented by the results of a quantitative and qualitative study drawn from the case law of the two countries. The work is therefore organized in two parts: the first centers on the theoretical, socio-criminological, and legal aspects, and the second is devoted to the empirical research, which was conducted using the judgments of the Italian and French Supreme Courts of Cassation as data sources.

Relevance:

80.00%

Publisher:

Abstract:

This doctoral thesis aims to contribute to the literature on transition economies, focusing on the Russian Federation and in particular on regional income convergence and fertility patterns. The first two chapters deal with the issue of income convergence across regions. Chapter 1 provides a historical-institutional analysis of the period between the last years of the Soviet Union and the most recent decade of economic growth, and presents the sample with a description of gross regional product composition, agrarian or industrial vocation, and labor. Chapter 2 contributes to the literature on exploratory spatial data analysis with an application to a panel of 77 regions in the period 1994-2008. It provides an analysis of spatial patterns and extends the theoretical framework of growth regressions by controlling for spatial correlation and heterogeneity. Chapter 3 analyses the national demographic patterns since 1960 and reviews the policies on maternity leave and family benefits. Data sources are the Statistical Yearbooks of the USSR, the Statistical Yearbooks of the Russian Soviet Federative Socialist Republic, and the Demographic Yearbooks of Russia. Chapter 4 analyses the demographic patterns in light of the theoretical framework of the Becker model, the Second Demographic Transition, and an economic-crisis argument. With national data from 1960, the theoretical issue of the pro- or countercyclical relation between income and fertility is graphically analyzed and discussed, together with female employment and education. With regional data after 1994, different panel data models are tested. Individual-level data from the Russian Longitudinal Monitoring Survey are analysed using the logit model. Chapter 5 employs data from the Generations and Gender Survey by UNECE to focus on postponement and second-birth intentions. Postponement is studied through cohort analysis of mean maternal age at first birth, while the methodology used for second-birth intentions is the ordered logit model.

Relevance:

80.00%

Publisher:

Abstract:

Spatial prediction of hourly rainfall via radar calibration is addressed. The change of support problem (COSP), which arises when the spatial supports of different data sources do not coincide, is addressed in a non-Gaussian setting; in fact, hourly rainfall in the Emilia-Romagna region, in Italy, is characterized by an abundance of zero values and by right-skewness of the distribution of positive amounts. Direct rain gauge measurements at sparsely distributed locations and hourly cumulated radar grids are provided by ARPA-SIMC Emilia-Romagna. We propose a three-stage Bayesian hierarchical model for radar calibration, using rain gauges as the reference measure. Rain probability and amounts are modeled via linear relationships with radar on the log scale; spatially correlated Gaussian effects capture the residual information. We employ a probit link for rainfall probability and a Gamma distribution for positive rainfall amounts; the two steps are joined via a two-part semicontinuous model. Three model specifications that address COSP in different ways are presented; in particular, a stochastic weighting of all radar pixels, driven by a latent Gaussian process defined on the grid, is employed. Estimation is performed via MCMC procedures implemented in C and linked to R software. The communication and evaluation of probabilistic, point, and interval predictions are investigated. A non-randomized PIT histogram is proposed for correctly assessing the calibration and coverage of two-part semicontinuous models. Predictions obtained with the different model specifications are evaluated via graphical tools (Reliability Plot, Sharpness Histogram, PIT Histogram, Brier Score Plot, and Quantile Decomposition Plot), proper scoring rules (Brier Score, Continuous Ranked Probability Score), and consistent scoring functions (Root Mean Square Error and Mean Absolute Error, addressing the predictive mean and median, respectively). Calibration is reached, and the inclusion of neighbouring information slightly improves predictions. All specifications outperform a benchmark model with uncorrelated effects, confirming the relevance of spatial correlation for modeling rainfall probability and accumulation.
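
The two-part structure described in this abstract (a probit model for whether it rains, a Gamma model for how much) can be sketched as follows. This is a simplified illustration, not the thesis's hierarchical model: the coefficient names, the log link for the Gamma mean, and the example numbers are assumptions, and the spatial effects are reduced to a plain additive term. The Brier score at the end is the standard mean squared difference between forecast probabilities and binary outcomes.

```python
import math

def normal_cdf(z):
    """Standard normal CDF, i.e. the inverse of the probit link."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def rain_probability(log_radar, a0, a1, spatial_effect=0.0):
    """Part 1: probability of non-zero rainfall via a probit link."""
    return normal_cdf(a0 + a1 * log_radar + spatial_effect)

def mean_positive_amount(log_radar, b0, b1, spatial_effect=0.0):
    """Part 2: mean of the Gamma-distributed positive amount (log link assumed)."""
    return math.exp(b0 + b1 * log_radar + spatial_effect)

def predictive_mean(log_radar, a0, a1, b0, b1):
    """Mean of the two-part model: P(rain) times the mean positive amount."""
    return rain_probability(log_radar, a0, a1) * mean_positive_amount(log_radar, b0, b1)

def brier_score(probabilities, outcomes):
    """Mean squared difference between forecast probabilities and 0/1 outcomes."""
    pairs = list(zip(probabilities, outcomes))
    return sum((p - o) ** 2 for p, o in pairs) / len(pairs)

# Hypothetical coefficients and radar reading, chosen only for illustration.
p_rain = rain_probability(log_radar=0.0, a0=0.0, a1=1.0)
expected_mm = predictive_mean(log_radar=0.0, a0=0.0, a1=1.0, b0=0.0, b1=1.0)
uninformative = brier_score([0.5, 0.5], [1, 0])
print(p_rain, expected_mm, uninformative)
```

A forecast that always says 0.5 scores 0.25 on the Brier score, which gives a simple reference point when comparing model specifications.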

Relevance:

80.00%

Publisher:

Abstract:

This dissertation consists of three standalone articles that contribute to the economics literature on technology adoption, information diffusion, and network economics, using two primary data sources from Ethiopia. The first empirical paper identifies the main behavioral factors affecting the adoption of brand new (radical) and upgraded (incremental) bioenergy innovations in Ethiopia. The results highlight the importance of targeting different instruments to increase the adoption rate of the two types of innovations. The second and third empirical papers use primary data collected from 3,693 high school students in Ethiopia and shed light on how informants should be selected to disseminate new information effectively and equitably, mainly concerning environmental issues. Several well-recognized standard centrality measures are used to select informants. These measures, however, are based on the network topology (shaped only by the number of connections) and fail to incorporate the intrinsic motivations of the informants. This thesis introduces an augmented centrality measure (ACM) that modifies the eigenvector centrality measure by weighting the adjacency matrix with the altruism levels of connected nodes. The results from the two papers suggest that targeting informants based on both network position and behavioral attributes ensures more effective and more equitable (from a gender perspective) transmission of information in social networks than selecting informants on network centrality measures alone, notably when the information concerns environmental issues.
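
The idea of an altruism-weighted eigenvector centrality can be sketched as below. The abstract does not specify the exact weighting, so this sketch assumes that each tie is scaled by the altruism level of the node it points to; the function names, the four-node network, and the altruism values are hypothetical. The leading eigenvector is found by power iteration on the matrix plus the identity, a common shift that leaves the eigenvectors unchanged while preventing oscillation on bipartite networks.

```python
import numpy as np

def eigenvector_centrality(matrix, iterations=1000, tol=1e-12):
    """Leading eigenvector of a non-negative matrix via shifted power iteration."""
    n = matrix.shape[0]
    shifted = matrix + np.eye(n)  # same eigenvectors, avoids +/- eigenvalue ties
    scores = np.full(n, 1.0 / np.sqrt(n))
    for _ in range(iterations):
        updated = shifted @ scores
        updated = updated / np.linalg.norm(updated)
        if np.linalg.norm(updated - scores) < tol:
            return updated
        scores = updated
    return scores

def augmented_centrality(adjacency, altruism):
    """Assumed weighting: the tie i-j counts in proportion to the altruism of node j."""
    weighted = adjacency * altruism[np.newaxis, :]
    return eigenvector_centrality(weighted)

# Hypothetical path network 0 - 1 - 2 - 3 with one highly altruistic end node.
adjacency = np.array([
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
], dtype=float)
altruism = np.array([0.1, 0.1, 0.1, 0.9])

plain = eigenvector_centrality(adjacency)
augmented = augmented_centrality(adjacency, altruism)
print(np.round(plain, 3), np.round(augmented, 3))
```

In this toy network nodes 1 and 2 are topologically equivalent, so the plain measure cannot separate them, while the weighted version ranks node 2 higher because it is tied to the most altruistic node.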

Relevance:

80.00%

Publisher:

Abstract:

Social interactions have been the focus of social science research for a century, but their study has recently been revolutionized by novel data sources and by methods from computer science, network science, and complex systems science. The study of social interactions is crucial for understanding complex societal behaviours. Social interactions are naturally represented as networks, which have emerged as a unifying mathematical language for understanding structural and dynamical aspects of socio-technical systems. Networks are, however, high-dimensional objects, especially when considering the scales of real-world systems and the need to model the temporal dimension. Hence the study of empirical data from social systems is challenging from both a conceptual and a computational standpoint. A possible approach to tackling this challenge is to use dimensionality reduction techniques that represent network entities in a low-dimensional feature space, preserving some desired properties of the original data. Low-dimensional vector space representations, also known as network embeddings, have been extensively studied, in part as a way to feed network data to machine learning algorithms. Network embeddings were initially developed for static networks and then extended to incorporate temporal network data. We focus on dimensionality reduction techniques for time-resolved social interaction data modelled as temporal networks. We introduce a novel embedding technique that models the temporal and structural similarities of events rather than nodes. Using empirical data on social interactions, we show that this representation captures information relevant for the study of dynamical processes unfolding over the network, such as epidemic spreading. We then turn to another large-scale dataset on social interactions: a popular Web-based crowdfunding platform. We show that tensor-based representations of the data and dimensionality reduction techniques such as tensor factorization allow us to uncover the structural and temporal aspects of the system and to relate them to geographic and temporal activity patterns.

Relevance:

80.00%

Publisher:

Abstract:

The application of modern ICT technologies is radically changing many fields, pushing toward more open and dynamic value chains that foster the cooperation and integration of many connected partners, sensors, and devices. A notable example is the emerging field of Smart Tourism, which applies ICT to tourism to create richer and more integrated experiences and to make them more accessible and sustainable. From a technological viewpoint, a recurring challenge in these decentralized environments is the integration of heterogeneous services and data spanning multiple administrative domains, each possibly applying different security and privacy policies, device and process control mechanisms, service access and provisioning schemes, and so on. The distribution and heterogeneity of those sources increase the complexity of developing integrated solutions, with consequent high effort and costs for the partners seeking them. Taking a step towards addressing these issues, we propose APERTO, a decentralized and distributed architecture that aims at facilitating the blending of data and services. At its core, APERTO relies on APERTO FaaS, a Serverless platform that allows fast prototyping of the business logic, lowers the barrier to entry and the development costs for newcomers, offers fine-grained scaling (down to zero) of the resources serving end users, and reduces management overhead. The APERTO FaaS infrastructure is based on asynchronous and transparent communications between the components of the architecture, allowing the development of optimized solutions that exploit the peculiarities of distributed and heterogeneous environments. In particular, APERTO addresses the provisioning of scalable and cost-efficient mechanisms targeting: i) function composition, allowing the definition of complex workloads from simple, ready-to-use functions and enabling smarter management of complex tasks and improved multiplexing capabilities; ii) the creation of end-to-end differentiated QoS slices that minimize interference among applications and services running on a shared infrastructure; iii) an abstraction providing uniform and optimized access to heterogeneous data sources; and iv) a decentralized approach to verifying access rights to resources.

Relevance:

80.00%

Publisher:

Abstract:

The pervasive availability of connected devices in every industrial and societal sector is pushing for an evolution of the well-established cloud computing model. The emerging paradigm of the cloud continuum embraces this decentralization trend and envisions virtualized computing resources physically located between traditional datacenters and data sources. By executing totally or partially closer to the network edge, applications can react more quickly to events, thus enabling advanced forms of automation and intelligence. However, these applications also induce new data-intensive workloads with low-latency constraints that require the adoption of specialized resources, such as high-performance communication options (e.g., RDMA, DPDK, XDP). Unfortunately, cloud providers still struggle to integrate these options into their infrastructures. This risks undermining the principle of generality that underlies the cloud computing scale economy by forcing developers to tailor their code to low-level APIs, non-standard programming models, and static execution environments. This thesis proposes a novel system architecture to empower cloud platforms across the whole cloud continuum with Network Acceleration as a Service (NAaaS). To provide commodity yet efficient access to acceleration, this architecture defines a layer of agnostic high-performance I/O APIs, exposed to applications and clearly separated from the heterogeneous protocols, interfaces, and hardware devices that implement it. A novel system component embodies this decoupling by offering a set of agnostic OS features to applications: memory management for zero-copy transfers, asynchronous I/O processing, and efficient packet scheduling. This thesis also explores the design space of the possible implementations of this architecture by proposing two reference middleware systems and by adopting them to support interactive use cases in the cloud continuum: a serverless platform and an Industry 4.0 scenario. A detailed discussion and a thorough performance evaluation demonstrate that the proposed architecture enables the easy-to-use, flexible integration of modern network acceleration into next-generation cloud platforms.