14 results for Data detection

in AMS Tesi di Dottorato - Alm@DL - Università di Bologna


Relevance: 40.00%

Abstract:

In recent years, there has been exponential growth in the use of virtual spaces, including dialogue systems, that handle personal information. The concept of personal privacy is widely discussed and controversial in the literature, whereas in the technological field it directly influences the degree of reliability perceived in the information system (privacy ‘as trust’). This work aims to protect the right to privacy over personal data (GDPR, 2018) and to avoid the loss of sensitive content by exploring the sensitive information detection (SID) task. It is grounded on the following research questions: (RQ1) What does sensitive data mean? How can a personal sensitive information domain be defined? (RQ2) How can a state-of-the-art model for SID be created? (RQ3) How can the model be evaluated? RQ1 theoretically investigates the concepts of privacy and the state-of-the-art ontological representation of personal information. The Data Privacy Vocabulary (DPV) is the taxonomic resource taken as the authoritative reference for the definition of the knowledge domain. Concerning RQ2, we investigate two approaches to classify sensitive data: the first, bottom-up, explores automatic learning methods based on transformer networks; the second, top-down, proposes logical-symbolic methods with the construction of privaframe, a knowledge graph of compositional frames representing personal data categories. Both approaches are tested. For the evaluation (RQ3) we create SPeDaC, a sentence-level labeled resource. It can be used as a benchmark or training resource for the SID task, filling the gap left by the lack of a shared resource in this field. While the approach based on artificial neural networks confirms the validity of the direction adopted in the most recent studies on SID, the logical-symbolic approach emerges as the preferred way for classifying fine-grained personal data categories, thanks to the semantically grounded, tailored modeling it allows. At the same time, the results highlight the strong potential of hybrid architectures in solving automatic tasks.
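
As a minimal illustration of the top-down direction described above, the sketch below maps sentences to personal data categories with hand-written patterns. The category names and patterns are hypothetical; this is not the thesis's frame-based knowledge graph nor its transformer model.

    import re

    # Illustrative (hypothetical) patterns mapping sentences to personal data
    # categories; not the thesis's frames or the DPV taxonomy itself.
    CATEGORY_PATTERNS = {
        "HealthRecord": re.compile(r"\b(diagnos\w+|prescription|therapy|illness)\b", re.I),
        "Ethnicity": re.compile(r"\b(ethnic\w*|racial)\b", re.I),
        "Contact": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"),  # e-mail address
    }

    def detect_sensitive_categories(sentence):
        """Return the personal data categories whose pattern matches the sentence."""
        return [cat for cat, pat in CATEGORY_PATTERNS.items() if pat.search(sentence)]

    if __name__ == "__main__":
        print(detect_sensitive_categories("My doctor changed my prescription last week."))
        # -> ['HealthRecord']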

Relevance: 30.00%

Abstract:

Recent progress in microelectronics and wireless communications has enabled the development of low-cost, low-power, multifunctional sensors, which has allowed the birth of a new type of network, the wireless sensor network (WSN). The main features of such networks are: the nodes can be positioned randomly over a given field with a high density; each node operates both as a sensor (for the collection of environmental data) and as a transceiver (for the transmission of information towards the data retrieval point); the nodes have limited energy resources. The use of wireless communications and the small size of the nodes make this type of network suitable for a large number of applications. For example, sensor nodes can be used to monitor a high-risk region, such as the area near a volcano; in a hospital they could be used to monitor the physical conditions of patients. For each of these possible application scenarios, it is necessary to guarantee a trade-off between energy consumption and communication reliability. The thesis investigates the use of WSNs in two possible scenarios and, for each of them, proposes a solution to the related problems while taking this trade-off into account. The first scenario considers a network with a high number of nodes, deployed without detailed planning over a given geographical area, that have to transmit data towards a coordinator node, named sink, assumed to be located on board an unmanned aerial vehicle (UAV). This is a practical example of reachback communication, characterized by a high density of nodes that have to transmit data reliably and efficiently towards a far receiver. It is assumed that each node transmits a common shared message directly to the receiver on board the UAV whenever it receives a broadcast message (triggered, for example, by the vehicle), and that the communication channels between the local nodes and the receiver are subject to fading and noise. The receiver on board the UAV must be able to fuse the weak and noisy signals in a coherent way to receive the data reliably. A cooperative diversity concept is proposed as an effective solution to the reachback problem. In particular, a spread spectrum (SS) transmission scheme is considered in conjunction with a fusion center that can exploit cooperative diversity without requiring stringent synchronization between nodes. The idea consists of the simultaneous transmission of the common message by the nodes and a Rake reception at the fusion center. The proposed solution is mainly motivated by two goals: the necessity of having simple nodes (to this aim, the computational complexity is moved to the receiver on board the UAV) and the importance of guaranteeing high levels of energy efficiency, thus increasing the network lifetime. The proposed scheme is analyzed in order to better understand the effectiveness of the approach. The performance metrics considered are both the theoretical limit on the maximum amount of data that can be collected by the receiver and the error probability with a given modulation scheme. Since we deal with a WSN, both of these performance metrics are evaluated taking the energy efficiency of the network into consideration. The second scenario considers the use of a chain network for the detection of fires, using nodes that have the double function of sensors and routers. The first function is the monitoring of a temperature parameter, which allows a local binary decision on the target (fire) being absent or present. The second function is that each node receives the decision made by the previous node of the chain, compares it with the one derived from its own observation of the phenomenon, and transmits the final result to the next node. The chain ends at the sink node, which transmits the received decision to the user. In this network, the goals are to limit the throughput on each sensor-to-sensor link and to minimize the probability of error at the last stage of the chain. This is a typical scenario of distributed detection. To obtain good performance, it is necessary to define, for each node, fusion rules that summarize the local observation and the decisions of the previous nodes into a final decision that is transmitted to the next node. WSNs have also been studied from a practical point of view, describing both the main characteristics of the IEEE 802.15.4 standard and two commercial WSN platforms. Using a commercial WSN platform, an agricultural application was realized and tested in a six-month on-field experimentation.
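
To make the chain (serial) distributed detection concrete, the minimal sketch below fuses each node's local decision with the decision received from the previous node using a simple OR rule. The detection and false-alarm probabilities and the fusion rule are illustrative assumptions, not the fusion rules derived in the thesis.

    import random

    # Chain distributed detection sketch: each node makes a noisy local binary
    # decision and OR-fuses it with the decision forwarded by the previous node.
    P_DETECTION = 0.8     # assumed probability a node detects a fire that is present
    P_FALSE_ALARM = 0.05  # assumed probability a node reports a fire that is absent

    def local_decision(fire_present):
        p = P_DETECTION if fire_present else P_FALSE_ALARM
        return 1 if random.random() < p else 0

    def chain_decision(n_nodes, fire_present):
        decision = 0  # the first node has no predecessor
        for _ in range(n_nodes):
            decision = decision or local_decision(fire_present)  # OR fusion
        return decision  # forwarded to the sink by the last node of the chain

    if __name__ == "__main__":
        print("Sink decision with fire present:", chain_decision(10, True))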

Relevance: 30.00%

Abstract:

Precipitation retrieval over high latitudes, particularly snowfall retrieval over ice and snow, using satellite-based passive microwave spectrometers, is currently an unsolved problem. The challenge results from the large variability of microwave emissivity spectra for snow and ice surfaces, which can mimic, to some degree, the spectral characteristics of snowfall. This work focuses on the investigation of a new snowfall detection algorithm specific for high latitude regions, based on a combination of active and passive sensors able to discriminate between snowing and non snowing areas. The space-borne Cloud Profiling Radar (on CloudSat), the Advanced Microwave Sensor units A and B (on NOAA-16) and the infrared spectrometer MODIS (on AQUA) have been co-located for 365 days, from October 1st 2006 to September 30th, 2007. CloudSat products have been used as truth to calibrate and validate all the proposed algorithms. The methodological approach followed can be summarised into two different steps. In a first step, an empirical search for a threshold, aimed at discriminating the case of no snow, was performed, following Kongoli et al. [2003]. This single-channel approach has not produced appropriate results, a more statistically sound approach was attempted. Two different techniques, which allow to compute the probability above and below a Brightness Temperature (BT) threshold, have been used on the available data. The first technique is based upon a Logistic Distribution to represent the probability of Snow given the predictors. The second technique, defined Bayesian Multivariate Binary Predictor (BMBP), is a fully Bayesian technique not requiring any hypothesis on the shape of the probabilistic model (such as for instance the Logistic), which only requires the estimation of the BT thresholds. The results obtained show that both methods proposed are able to discriminate snowing and non snowing condition over the Polar regions with a probability of correct detection larger than 0.5, highlighting the importance of a multispectral approach.
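
For context on the first technique, a logistic formulation of the probability of snow given a vector x of BT predictors can be written as follows (generic notation, not necessarily the thesis's):

    P(\text{snow} \mid \mathbf{x}) \;=\; \frac{1}{1 + \exp\!\left[-\left(\beta_0 + \boldsymbol{\beta}^{\top}\mathbf{x}\right)\right]},
    \qquad \mathbf{x} = (\mathrm{BT}_1, \ldots, \mathrm{BT}_n)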

Relevance: 30.00%

Abstract:

This study provides a comprehensive genetic overview of the endangered Italian wolf population. In particular, it focuses on two research lines. On the one hand, we focused on melanism in the wolf in order to isolate a mutation related to black coat colour in canids. With several reported black individuals (an exception at the European level), the Italian wolf population constituted a challenging research field posing many unanswered questions. As found in the North American wolf, we reported that melanism in the Italian population is caused by a different melanocortin pathway component, the K locus, in which a beta-defensin protein acts as an alternative ligand for Mc1r. This research project was conducted in collaboration with Prof. Gregory Barsh, Department of Genetics and Paediatrics, Stanford University. On the other hand, we performed analyses on a large number of SNPs thanks to a customized canine microarray, useful to complement or substitute the STR markers for genotyping individuals and detecting wolf-dog hybrids. Thanks to DNA microchip technology, we obtained a large amount of genetic data, which provides a solid base for future functional genomic studies. This study was undertaken in collaboration with Prof. Robert K. Wayne, Department of Ecology and Evolutionary Biology, University of California, Los Angeles (UCLA).

Relevance: 30.00%

Abstract:

Falls are caused by a complex interaction among multiple risk factors, which may be modified by age, disease and environment. A variety of methods and tools for fall risk assessment have been proposed, but none of them is universally accepted. Existing tools are generally not capable of providing a quantitative predictive assessment of fall risk. Objective, cost-effective and clinically applicable methods are needed to enable a quantitative assessment of fall risk on a subject-specific basis. Objectively tracking fall risk could provide timely feedback about the effectiveness of administered interventions, enabling intervention strategies to be modified or changed if found to be ineffective. Moreover, some of the fundamental factors leading to falls, and what actually happens during a fall, remain unclear. Objectively documented and measured falls are needed to improve knowledge of falls in order to develop more effective prevention strategies and prolong independent living. In the last decade, several research groups have developed sensor-based automatic or semi-automatic fall risk assessment tools using wearable inertial sensors; this approach may also serve to detect falls. At the moment, i) several fall-risk assessment studies based on inertial sensors, even if promising, lack a biomechanical model-based approach, which could provide accurate and more detailed measurements of interest (e.g., joint moments, forces), and ii) the amount of published real-world fall data of older people is minimal, since most authors have used simulations with healthy volunteers as a surrogate for real-world falls. With these limitations in mind, this thesis aims i) to propose a novel method for the kinematic and dynamic evaluation of functional motor tasks, often used in clinics for fall-risk evaluation, through a body sensor network and a biomechanical approach, and ii) to define guidelines for a fall detection algorithm based on the availability of a real-world fall database.
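
As a minimal illustration of inertial-sensor-based fall detection, the sketch below flags impact candidates when the acceleration magnitude from a wearable tri-axial accelerometer exceeds a threshold. The threshold, the synthetic data and the rule itself are assumptions for illustration, not the guidelines defined in the thesis.

    import math

    IMPACT_THRESHOLD_G = 2.5  # assumed impact threshold, in g

    def detect_fall_candidates(samples):
        """samples: list of (ax, ay, az) in g. Returns indices of candidate impacts."""
        candidates = []
        for i, (ax, ay, az) in enumerate(samples):
            magnitude = math.sqrt(ax * ax + ay * ay + az * az)
            if magnitude > IMPACT_THRESHOLD_G:
                candidates.append(i)
        return candidates

    if __name__ == "__main__":
        quiet_standing = [(0.0, 0.0, 1.0)] * 50   # roughly 1 g at rest
        impact = [(1.8, 0.5, 2.2)]                # brief high-magnitude spike
        print(detect_fall_candidates(quiet_standing + impact + quiet_standing))  # -> [50]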

Relevance: 30.00%

Abstract:

The diagnosis, grading and classification of tumours have benefited considerably from the development of DCE-MRI, which is now essential to the adequate clinical management of many tumour types due to its capability of detecting active angiogenesis. Several strategies have been proposed for DCE-MRI evaluation. Visual inspection of contrast agent concentration curves versus time is a very simple yet operator-dependent procedure; therefore, more objective approaches have been developed in order to facilitate comparison between studies. In so-called model-free approaches, descriptive or heuristic information extracted from the raw time series data is used for tissue classification. The main issue concerning these schemes is that they do not have a direct interpretation in terms of the physiological properties of the tissues. On the other hand, model-based investigations typically involve compartmental tracer kinetic modelling and pixel-by-pixel estimation of kinetic parameters via non-linear regression applied to regions of interest appropriately selected by the physician. This approach has the advantage of providing parameters directly related to the pathophysiological properties of the tissue, such as vessel permeability, local regional blood flow, extraction fraction, and the concentration gradient between plasma and the extravascular-extracellular space. However, non-linear modelling is computationally demanding, and the accuracy of the estimates can be affected by the signal-to-noise ratio and by the initial solutions. The principal aim of this thesis is to investigate the use of semi-quantitative and quantitative parameters for the segmentation and classification of breast lesions. The objectives can be subdivided as follows: to describe the principal techniques for evaluating the time-intensity curve in DCE-MRI, with a focus on the kinetic models proposed in the literature; to evaluate the influence of the parametrization choice for a classic bi-compartmental kinetic model; to evaluate the performance of a method for simultaneous tracer kinetic modelling and pixel classification; and to evaluate the performance of machine learning techniques trained for the segmentation and classification of breast lesions.
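
For context, a classic bi-compartmental formulation of the kind referred to above is the standard Tofts model (conventional notation, not necessarily the parametrization adopted in the thesis), which relates the tissue concentration C_t(t) to the plasma concentration C_p(t):

    C_t(t) \;=\; K^{\mathrm{trans}} \int_0^t C_p(\tau)\, e^{-k_{\mathrm{ep}}\,(t-\tau)}\, d\tau,
    \qquad k_{\mathrm{ep}} = \frac{K^{\mathrm{trans}}}{v_e}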

Relevance: 30.00%

Abstract:

Autism Spectrum Disorders (ASDs) describe a set of neurodevelopmental disorders and represent a significant public health problem. Currently, ASDs are not diagnosed before the second year of life, but an early identification would be crucial, as early interventions are much more effective than specific therapies starting in later childhood. To this aim, cheap and contact-less automatic approaches have recently aroused great clinical interest. Among them, the cry and the movements of the newborn, both involving the central nervous system, have been proposed as possible indicators of neurological disorders. This PhD work is a first step towards solving this challenging problem. An integrated system is presented enabling the recording of audio (crying) and video (movements) data of the newborn, their automatic analysis with innovative techniques for the extraction of clinically relevant parameters, and their classification with data mining techniques. New robust algorithms were developed for the selection of the voiced parts of the cry signal, the estimation of acoustic parameters based on the wavelet transform, and the analysis of the infant’s general movements (GMs) through a new body model for segmentation and 2D reconstruction. In addition, this thesis presents a thorough review of the state of the art on these topics, which shows that no studies exist concerning normative ranges for the newborn infant cry in the first 6 months of life, nor concerning the correlation between cry and movements. Through the new automatic methods, a population of control infants (“low-risk”, LR) was compared to a group of “high-risk” (HR) infants, i.e. siblings of children already diagnosed with ASD. A subset of LR infants clinically diagnosed as having Typical Development (TD) and one infant affected by ASD were also compared. The results show that the selected acoustic parameters allow a good differentiation between the two groups. This result opens new diagnostic and therapeutic perspectives.
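
As a minimal illustration of the kind of preprocessing mentioned above, the sketch below selects the voiced (high-energy) parts of a signal with a short-time energy threshold. The frame length and threshold are illustrative assumptions; the thesis develops more robust selection algorithms and wavelet-based acoustic parameters that are not shown here.

    def voiced_frames(signal, frame_len=256, rel_threshold=0.1):
        """Return (start, end) sample ranges whose short-time energy exceeds a
        fraction of the peak frame energy."""
        energies = []
        for start in range(0, len(signal) - frame_len + 1, frame_len):
            frame = signal[start:start + frame_len]
            energies.append(sum(x * x for x in frame))
        if not energies:
            return []
        threshold = rel_threshold * max(energies)
        return [(i * frame_len, i * frame_len + frame_len)
                for i, e in enumerate(energies) if e > threshold]

    if __name__ == "__main__":
        silence, burst = [0.0] * 512, [0.5, -0.5] * 128   # 256-sample loud burst
        print(voiced_frames(silence + burst + silence))    # -> [(512, 768)]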

Relevance: 30.00%

Abstract:

Remote sensing is an effective tool for environmental and land monitoring, thanks to the availability of sensors that image portions of the Earth's surface at a fixed temporal frequency. The acquired multi/hyperspectral images can provide information for different fields of application. This study addresses the issue of land consumption, which represents an important challenge for correct land management, since it is directly connected with urban runoff phenomena, ecosystem fragmentation and the loss of important agricultural land. A unique definition of land consumption, and likewise a methodology for measuring it, does not yet exist; in this study it was defined as the consumption that causes soil sealing. The chosen area is the Province of Bologna, which covers 3,702 km2 and is characterized by the Po Plain in the north and the Apennine chain in the south; according to ISTAT data, in the 2001-2011 period it was the fourth province in Italy in terms of land consumption. Through pixel-based classification, the phenomenon was mapped for five Landsat images. Although of medium resolution, and therefore not able to map all the details, these images are particularly suitable for large areas such as the one chosen, and they also guarantee a wider temporal coverage. The period considered spans from 1987 to 2013 and, through change detection procedures applied to the produced maps, the phenomenon was quantified, compared with the existing data and analysed in terms of its spatial distribution.
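
To illustrate the post-classification change detection step, the sketch below compares two classified maps and measures the newly sealed area. The class code, pixel size and synthetic maps are assumptions for illustration, not the actual legend or data used in the thesis.

    import numpy as np

    SEALED = 1  # assumed class code for sealed/impervious pixels

    def new_sealed_area_km2(map_t1, map_t2, pixel_size_m=30.0):
        """Area that changed from non-sealed at t1 to sealed at t2, in km^2."""
        changed = (map_t1 != SEALED) & (map_t2 == SEALED)
        return changed.sum() * (pixel_size_m ** 2) / 1e6

    if __name__ == "__main__":
        t1 = np.zeros((100, 100), dtype=np.uint8)
        t2 = t1.copy()
        t2[:10, :10] = SEALED               # 100 newly sealed pixels
        print(new_sealed_area_km2(t1, t2))  # 100 * 900 m^2 = 0.09 km^2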

Relevance: 30.00%

Abstract:

In this thesis, two approaches were applied to achieve a twofold general objective. The first chapter was dedicated to the study of the distribution of the expression of several bitter and fat taste receptor genes in different gastrointestinal tracts. A set of 7 genes for bitter taste and 3 genes for fat taste was amplified with real-time PCR from mRNA extracted from 5 gastrointestinal segments of weaned pigs. The presence of gene expression for several chemosensing receptors for bitter and fat taste in different compartments of the stomach confirms that this organ should be considered a player in the early detection of bolus composition. In the second chapter, we investigated in young pigs the distribution of the butyrate-sensing olfactory receptor OR51E1 along the GIT, its relation with some endocrine markers, its variation with age, and its variation after interventions affecting the gut environment and intestinal microbiota in piglets, across different tissues. Our results indicate that OR51E1 is strictly related to normal GIT enteroendocrine activity. In the third chapter, we investigated the differential gene expression between oxyntic and pyloric mucosa in seven starter pigs. The obtained data indicate significant differential gene expression between the oxyntic and the pyloric mucosa of the young pig, and further functional studies are needed to confirm its physiological importance. In the last chapter, thymol, which has been proposed as an oral alternative to antibiotics in the feed of pigs and broilers, was introduced directly into the stomach of 8 weaned pigs, and samples of gastric oxyntic and pyloric mucosa were collected. The analysis of whole-transcript expression shows that the stimulation of gastric proliferative activity and the control of digestive activity by thymol can positively influence gastric maturation and function in weaned pigs.

Relevance: 30.00%

Abstract:

Intelligent systems are now inherent to society, supporting a synergistic human-machine collaboration. Beyond economic and climate factors, energy consumption is strongly affected by the performance of computing systems, and the quality of software operation may invalidate any improvement attempt. In addition, data-driven machine learning algorithms are the basis for human-centered applications, and their interpretability is one of the most important features of computational systems. Software maintenance is a critical discipline to support automatic and life-long system operation. As most software registers its inner events by means of logs, log analysis is an approach to keep systems operational. Logs are characterized as Big Data assembled in large-flow streams, being unstructured, heterogeneous, imprecise, and uncertain. This thesis addresses fuzzy and neuro-granular methods to provide maintenance solutions applied to anomaly detection (AD) and log parsing (LP), dealing with data uncertainty and identifying ideal time periods for detailed software analyses; LP provides a deeper semantic interpretation of the anomalous occurrences. The solutions evolve over time and are general-purpose, being highly applicable, scalable, and maintainable. Granular classification models, namely the Fuzzy set-Based evolving Model (FBeM), the evolving Granular Neural Network (eGNN), and the evolving Gaussian Fuzzy Classifier (eGFC), are compared on the AD problem. The evolving Log Parsing (eLP) method is proposed to address the automatic parsing of system logs. All the methods perform recursive mechanisms to create, update, merge, and delete information granules according to the data behavior. For the first time in the evolving intelligent systems literature, the proposed method, eLP, is able to process streams of words and sentences. Regarding AD accuracy, FBeM achieved (85.64±3.69)%, eGNN reached (96.17±0.78)%, eGFC obtained (92.48±1.21)%, and eLP reached (96.05±1.04)%. Besides being competitive, eLP notably generates a log grammar and presents a higher level of model interpretability.
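
The generic sketch below only illustrates the create/update granule mechanism mentioned above; it is not an implementation of FBeM, eGNN, eGFC, or eLP, and the one-dimensional distance rule and radius are assumptions for illustration.

    class EvolvingGranularAD:
        """Toy evolving granular anomaly detector: a sample outside every
        existing granule creates a new granule and is flagged as anomalous."""

        def __init__(self, radius=1.0):
            self.radius = radius
            self.granules = []  # list of [center, count]

        def process(self, x):
            """Return True if x is anomalous; otherwise update the covering granule."""
            for g in self.granules:
                if abs(x - g[0]) <= self.radius:
                    g[1] += 1
                    g[0] += (x - g[0]) / g[1]  # recursive update of the center
                    return False
            self.granules.append([x, 1])       # create a new granule
            return True

    if __name__ == "__main__":
        detector = EvolvingGranularAD(radius=0.5)
        print([detector.process(v) for v in [1.0, 1.2, 1.1, 5.0, 1.05]])
        # -> [True, False, False, True, False]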

Relevance: 30.00%

Abstract:

The continuous and swift progression of both wireless and wired communication technologies in today's world owes its success to the foundational systems established earlier. These systems serve as the building blocks that enable the enhancement of services to cater to evolving requirements. Studying the vulnerabilities of previously designed systems and their current usage leads to the development of new communication technologies replacing the old ones, such as GSM-R in the railway field. Current industrial research has a specific focus on finding an appropriate telecommunication solution for railway communications to replace the GSM-R standard, which will be switched off in the coming years. Various standardization organizations are currently exploring and designing a radio-frequency-based standard solution, FRMCS (Future Railway Mobile Communication System), to substitute the current GSM-R. Bearing on this topic, the primary strategic objective of the research is to assess the feasibility of leveraging current public network technologies, such as LTE, to cater to mission- and safety-critical communication for low-density lines. The research aims to identify the constraints, define a service level agreement with telecom operators, and establish the implementations necessary to make the system as reliable as possible over an open and public network, while considering safety and cybersecurity aspects. The LTE infrastructure would be used to transmit the vital data for the communication of a railway system and to gather and transmit all the field measurements to the control room for maintenance purposes. Given the significance of maintenance activities in the railway sector, the ongoing research includes the implementation of a machine learning algorithm to detect railway equipment faults, reducing analysis time and human errors due to the large volume of measurements from the field.

Relevance: 30.00%

Abstract:

The Cherenkov Telescope Array (CTA) will be the next-generation ground-based observatory for studying the universe in the very-high-energy domain. The observatory will rely on a Science Alert Generation (SAG) system to analyze the real-time data from the telescopes and generate science alerts. The SAG system will play a crucial role in the search for and follow-up of transients from external alerts, enabling multi-wavelength and multi-messenger collaborations. It will maximize the potential for the detection of the rarest phenomena, such as gamma-ray bursts (GRBs), which are the science case for this study. This study presents an anomaly detection method based on deep learning for detecting gamma-ray burst events in real time. The performance of the proposed method is evaluated and compared against the standard Li & Ma technique in two use cases, serendipitous discoveries and follow-up observations, using short exposure times. The method shows promising results in detecting GRBs and is flexible enough to allow a real-time search for transient events on multiple time scales. The method assumes neither a background nor a source model and does not require a minimum number of photon counts to perform the analysis, making it well suited for real-time analysis. Future improvements involve further tests, relaxing some of the assumptions made in this study, and a post-trials correction of the detection significance. Moreover, the ability to detect other transient classes in different scenarios must be investigated for completeness. The system can be integrated within the SAG system of CTA and deployed on the on-site computing clusters. This would provide valuable insights into the method's performance in a real-world setting and would offer another valuable tool for discovering new transient events in real time. Overall, this study makes a significant contribution to the field of astrophysics by demonstrating the effectiveness of deep-learning-based anomaly detection techniques for real-time source detection in gamma-ray astronomy.
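
For context, the baseline significance of Li & Ma (1983), with N_on and N_off the counts in the on- and off-source regions and alpha the exposure ratio, is commonly written as:

    S \;=\; \sqrt{2}\,\left\{ N_{\mathrm{on}} \ln\!\left[\frac{1+\alpha}{\alpha}\,
        \frac{N_{\mathrm{on}}}{N_{\mathrm{on}}+N_{\mathrm{off}}}\right]
        \;+\; N_{\mathrm{off}} \ln\!\left[(1+\alpha)\,
        \frac{N_{\mathrm{off}}}{N_{\mathrm{on}}+N_{\mathrm{off}}}\right] \right\}^{1/2}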

Relevance: 30.00%

Abstract:

In medicine, innovation depends on a better knowledge of the human body's mechanisms, which represent a complex system of multi-scale constituents. Unraveling the complexity underlying diseases proves to be challenging, and a deep understanding of the inner workings requires dealing with much heterogeneous information. Exploring the molecular status and the organization of genes, proteins and metabolites provides insights into what is driving a disease, from aggressiveness to curability. Molecular constituents, however, are only the building blocks of the human body and cannot currently tell the whole story of diseases. This is why attention is nowadays growing towards the simultaneous exploitation of multi-scale information. Holistic methods are therefore drawing interest to address the problem of integrating heterogeneous data. The heterogeneity may derive from the diversity across data types and from the diversity within diseases. Here, four studies conducted data integration using custom-designed workflows that implement novel methods and views to tackle the heterogeneous characterization of diseases. The first study was devoted to determining shared gene regulatory signatures for onco-hematology, and it showed partial co-regulation across blood-related diseases. The second study focused on Acute Myeloid Leukemia and refined the unsupervised integration of genomic alterations, which turned out to better resemble clinical practice. In the third study, network integration for atherosclerosis demonstrated, as a proof of concept, the impact of network intelligibility when it comes to modelling heterogeneous data, and was shown to accelerate the identification of new potential pharmaceutical targets. Lastly, the fourth study introduced a new method to integrate multiple data types into a unique latent heterogeneous representation, which facilitated the selection of important data types for predicting the tumour stage of invasive ductal carcinoma. The results of these four studies laid the groundwork to ease the detection of new biomarkers, ultimately beneficial to medical practice and to the ever-growing field of Personalized Medicine.

Relevance: 30.00%

Abstract:

In this thesis, we investigate the role of applied physics in epidemiological surveillance through the application of mathematical models, network science and machine learning. The spread of a communicable disease depends on many biological, social, and health factors. The large masses of data available make it possible, on the one hand, to monitor the evolution and spread of pathogenic organisms and, on the other hand, to study the behavior of people, their opinions and their habits. Presented here are three lines of research in which an attempt was made to solve real epidemiological problems through data analysis and the use of statistical and mathematical models. In Chapter 1, we applied language-inspired Deep Learning models to transform influenza protein sequences into vectors encoding their information content. We then attempted to reconstruct the antigenic properties of different viral strains using regression models and to identify the mutations responsible for vaccine escape. In Chapter 2, we constructed a compartmental model to describe the spread of a bacterium within a hospital ward. The model was informed and validated on time series of clinical measurements, and a sensitivity analysis was used to assess the impact of different control measures. Finally (Chapter 3), we reconstructed the network of retweets among COVID-19-themed Twitter users in the early months of the SARS-CoV-2 pandemic. By means of community detection algorithms and centrality measures, we characterized users’ attention shifts in the network, showing that scientific communities, initially the most retweeted, lost influence over time to national political communities. In the Conclusion, we highlight the importance of the work done in light of the main contemporary challenges for epidemiological surveillance. In particular, we present reflections on the importance of nowcasting and forecasting, the relationship between data and scientific research, and the need to unite the different scales of epidemiological surveillance.
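
For context, a minimal example of the kind of compartmental model referred to in Chapter 2 is the classic SIR system (illustrative only; the hospital-ward model in the thesis has its own compartments and parameters):

    \frac{dS}{dt} = -\beta\,\frac{S I}{N}, \qquad
    \frac{dI}{dt} = \beta\,\frac{S I}{N} - \gamma I, \qquad
    \frac{dR}{dt} = \gamma I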