10 results for data accuracy

in AMS Tesi di Dottorato - Alm@DL - Università di Bologna


Relevance:

30.00%

Abstract:

The quality of temperature and humidity retrievals from the infrared SEVIRI sensors on the geostationary Meteosat Second Generation (MSG) satellites is assessed by means of a one-dimensional variational (1D-VAR) algorithm. The study aims to improve the spatial and temporal resolution of the observations available to feed analysis systems designed for high-resolution, regional-scale numerical weather prediction (NWP) models. The non-hydrostatic forecast model COSMO (COnsortium for Small scale MOdelling) in the ARPA-SIM operational configuration is used to provide background fields. Only clear-sky observations over sea are processed. An optimised 1D-VAR set-up comprising the two water vapour channels and the three window channels is selected. It maximises the reduction of errors in the model backgrounds while ensuring ease of operational implementation through accurate bias correction procedures and correct radiative transfer simulations. The 1D-VAR retrieval quality is first quantified in relative terms, employing statistics to estimate the reduction in the background model errors. Additionally, the absolute retrieval accuracy is assessed by comparing the analysis with independent radiosonde and satellite observations. The inclusion of satellite data brings a substantial reduction in the warm and dry biases present in the forecast model. Moreover, it is shown that the retrieval profiles generated by the 1D-VAR are well correlated with the radiosonde measurements. Subsequently, the 1D-VAR technique is applied to two three-dimensional case studies: a false alarm case that occurred in Friuli-Venezia Giulia on 8 July 2004 and a heavy precipitation case that occurred in the Emilia-Romagna region between 9 and 12 April 2005. The impact of satellite data for these two events is evaluated in terms of increments in the column-integrated water vapour and saturation water vapour, in the 2-metre temperature and specific humidity, and in the surface temperature.
To improve the 1D-VAR technique, a method to calculate flow-dependent model error covariance matrices is also assessed. The approach employs members from an ensemble forecast system generated by perturbing physical parameterisation schemes inside the model. The improved set-up, applied to the case of 8 July 2004, shows a substantially neutral impact.
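
As a rough illustration of the variational idea behind the abstract above, the sketch below computes the analysis that minimises the standard 1D-VAR cost function in the simplest possible setting: a linear observation operator and a single observation. This is not the thesis's set-up, which uses five SEVIRI channels and a radiative transfer model. The function name, the two-level profile and all numbers are hypothetical.

```python
def analysis_single_obs(x_b, B, h, y, r):
    """Analysis for a linear observation operator and ONE observation.

    Minimises J(x) = (x - x_b)^T B^-1 (x - x_b) + (y - h.x)^2 / r,
    whose closed-form solution is x_a = x_b + K (y - h.x_b),
    with gain K = B h^T / (h B h^T + r).

    x_b: background profile (list of floats)
    B:   background error covariance (list of lists, n x n)
    h:   linear observation operator (list of n weights)
    y:   observed value
    r:   observation error variance
    """
    n = len(x_b)
    # B h^T: background covariance projected onto the observation
    Bh = [sum(B[i][j] * h[j] for j in range(n)) for i in range(n)]
    # Innovation variance: h B h^T + r
    s = sum(h[i] * Bh[i] for i in range(n)) + r
    # Innovation: observation minus background in observation space
    innovation = y - sum(h[i] * x_b[i] for i in range(n))
    return [x_b[i] + Bh[i] / s * innovation for i in range(n)]


# Hypothetical two-level temperature background (K) and one observation
x_b = [280.0, 270.0]
B = [[1.0, 0.5], [0.5, 1.0]]
h = [0.5, 0.5]  # the observation sees the mean of the two levels
x_a = analysis_single_obs(x_b, B, h, y=276.0, r=0.25)
print(x_a)
```

With these made-up values the background maps to 275 K in observation space, the observation is 276 K, and the analysis moves both levels part of the way towards the observation in proportion to the assumed error variances.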

Relevance:

30.00%

Abstract:

In recent years, the use of Reverse Engineering systems has attracted considerable interest for a wide range of applications. Many research activities therefore focus on the accuracy and precision of the acquired data and on improvements to the post-processing phase. In this context, this PhD thesis defines two novel methods for data post-processing and for data fusion between physical and geometrical information. In particular, a technique has been defined to characterise the error in the coordinates of 3D points acquired by an optical triangulation laser scanner, with the aim of identifying adequate correction arrays to apply under different acquisition parameters and operating conditions. Systematic error in the acquired data is thus compensated in order to increase accuracy. Moreover, the definition of a 3D thermogram is examined. The geometrical information of an object and its thermal properties, obtained from a thermographic inspection, are combined in order to assign a temperature value to each recognizable point. Data acquired by an optical triangulation laser scanner are also used to normalize temperature values and make the thermal data independent of the thermal camera's point of view.

Relevance:

30.00%

Abstract:

The discovery of the Cosmic Microwave Background (CMB) radiation in 1965 is one of the fundamental milestones supporting the Big Bang theory. The CMB is one of the most important sources of information in cosmology. The excellent accuracy of the recent CMB data from the WMAP and Planck satellites confirmed the validity of the standard cosmological model and set a new challenge for data analysis processes and their interpretation. In this thesis we deal with several aspects of the data analysis and with useful tools for it. We focus on their optimization in order to fully exploit the Planck data and contribute to the final published results. The issues investigated are: the change of coordinates of CMB maps using the HEALPix package; the problem of the aliasing effect in the generation of low-resolution maps; and the comparison of the Angular Power Spectrum (APS) extraction performance of the optimal QML method, implemented in the code called BolPol, and the pseudo-Cl method, implemented in Cromaster. The QML method has then been applied to the Planck data at large angular scales to extract the CMB APS. The same method has also been applied to analyze the TT parity and the Low Variance anomalies in the Planck maps, showing a consistent deviation from the standard cosmological model; the possible origins of these results are discussed. The Cromaster code has instead been applied to the 408 MHz and 1.42 GHz surveys, focusing on the analysis of the APS of selected regions of the synchrotron emission. The new generation of CMB experiments will be dedicated to polarization measurements, which require high-accuracy devices for separating the polarizations. Here a new technology, called Photonic Crystals, is exploited to develop a new polarization splitter device, and its performance is compared with that of the devices currently in use.

Relevance:

30.00%

Abstract:

Big data are reshaping the way we interact with technology, thus fostering new applications to improve the safety assessment of foods. An extraordinary amount of information is analysed using machine learning approaches aimed at detecting the existence or predicting the likelihood of future risks. Food business operators have to share the results of these analyses when applying to place regulated products on the market, whereas agri-food safety agencies (including the European Food Safety Authority) are exploring new avenues to increase the accuracy of their evaluations by processing Big data. Such an informational endowment brings with it opportunities and risks related to the extraction of meaningful inferences from data. However, conflicting interests and tensions among the entities involved (the industry, food safety agencies, and consumers) hinder the finding of shared methods to steer the processing of Big data in a sound, transparent and trustworthy way. A recent reform of the EU sectoral legislation, the lack of trust and the presence of a considerable number of stakeholders highlight the need for ethical contributions aimed at steering the development and deployment of Big data applications. Moreover, Artificial Intelligence guidelines and charters published by European Union institutions and Member States have to be discussed in light of applied contexts, including the one at stake. This thesis aims to contribute to these goals by discussing what principles should be put forward when processing Big data in the context of agri-food safety-risk assessment. The research focuses on two intertwined topics, data ownership and data governance, by evaluating how the regulatory framework addresses the challenges raised by Big data analysis in these domains. The outcome of the project is a tentative Roadmap that aims to identify the principles to be observed when processing Big data in this domain and their possible implementations.

Relevance:

30.00%

Abstract:

Intelligent systems are now inherent to society, supporting a synergistic human-machine collaboration. Beyond economic and climate factors, energy consumption is strongly affected by the performance of computing systems. The quality of software operation may invalidate any improvement attempt. In addition, data-driven machine learning algorithms are the basis for human-centered applications, their interpretability being one of the most important features of computational systems. Software maintenance is a critical discipline for supporting automatic and life-long system operation. As most software records its internal events by means of logs, log analysis is one approach to keeping systems operational. Logs are characterized as Big data assembled in large-flow streams, being unstructured, heterogeneous, imprecise, and uncertain. This thesis addresses fuzzy and neuro-granular methods to provide maintenance solutions applied to anomaly detection (AD) and log parsing (LP), dealing with data uncertainty and identifying ideal time periods for detailed software analyses. LP provides a deeper semantic interpretation of the anomalous occurrences. The solutions evolve over time and are general-purpose, being highly applicable, scalable, and maintainable. Granular classification models, namely the Fuzzy set-Based evolving Model (FBeM), the evolving Granular Neural Network (eGNN), and the evolving Gaussian Fuzzy Classifier (eGFC), are compared on the AD problem. The evolving Log Parsing (eLP) method is proposed to address the automatic parsing of system logs. All the methods use recursive mechanisms to create, update, merge, and delete information granules according to the data behavior. For the first time in the evolving intelligent systems literature, the proposed method, eLP, is able to process streams of words and sentences.
In terms of AD accuracy, FBeM achieved (85.64 ± 3.69)%; eGNN reached (96.17 ± 0.78)%; eGFC obtained (92.48 ± 1.21)%; and eLP reached (96.05 ± 1.04)%. Besides being competitive, eLP generates a log grammar and offers a higher level of model interpretability.

Relevance:

30.00%

Abstract:

Deep learning methods are extremely promising machine learning tools for analyzing neuroimaging data. However, their potential use in clinical settings is limited by the existing challenges of applying these methods to neuroimaging data. In this study, first, a type of data leakage caused by a slice-level data split during the training and validation of a 2D CNN is surveyed, and a quantitative assessment of the resulting overestimation of the model's performance is presented. Second, an interpretable, leakage-free deep learning software package written in Python, with a wide range of options, has been developed to conduct both classification and regression analysis. The software was applied to the study of mild cognitive impairment (MCI) in patients with small vessel disease (SVD) using multi-parametric MRI data, where the cognitive performance of 58 patients, measured by five neuropsychological tests, is predicted using a multi-input CNN model that takes brain images and demographic data. Each of the cognitive test scores was predicted using different MRI-derived features. As MCI due to SVD has been hypothesized to be the effect of white matter damage, the DTI-derived features MD and FA produced the best prediction of the TMT-A score, which is consistent with the existing literature. In a second study, an interpretable deep learning system is developed, aimed at 1) classifying Alzheimer's disease (AD) and healthy subjects, 2) examining the neural correlates of the disease that causes cognitive decline in AD patients using CNN visualization tools, and 3) highlighting the potential of interpretability techniques to detect a biased deep learning model. Structural magnetic resonance imaging (MRI) data of 200 subjects were used by the proposed CNN model, which was trained using a transfer learning-based approach and produced a balanced accuracy of 71.6%. Brain regions in the frontal and parietal lobes showing cerebral cortex atrophy were highlighted by the visualization tools.
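
The slice-level leakage mentioned in the abstract above arises when 2D slices from the same subject end up in both the training and the validation set. A minimal sketch of the usual remedy, splitting by subject and not by slice, is given below. It is not the thesis's software; the function name, the subject identifiers and the split fraction are hypothetical.

```python
import random


def subject_level_split(slices, test_fraction=0.2, seed=0):
    """Split (subject_id, slice_index) pairs so no subject spans both sets.

    Subjects, not individual slices, are shuffled and assigned to the
    held-out set; every slice then follows its subject.
    """
    subjects = sorted({subject for subject, _ in slices})
    rng = random.Random(seed)
    rng.shuffle(subjects)
    n_test = max(1, round(len(subjects) * test_fraction))
    test_subjects = set(subjects[:n_test])
    train = [s for s in slices if s[0] not in test_subjects]
    test = [s for s in slices if s[0] in test_subjects]
    return train, test


# Hypothetical data set: 10 subjects with 5 slices each
slices = [(f"sub-{i:02d}", k) for i in range(10) for k in range(5)]
train, test = subject_level_split(slices)

train_subjects = {subject for subject, _ in train}
test_subjects = {subject for subject, _ in test}
print(len(train), len(test), train_subjects & test_subjects)
```

A naive shuffle over the 50 slices would almost certainly place slices of the same subject on both sides of the split, which is the situation the abstract associates with overestimated performance.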

Relevance:

30.00%

Abstract:

With the advent of new technologies, it is increasingly easy to obtain data of different kinds from ever more accurate sensors that measure the most disparate physical quantities using different methodologies. The collection of data thus becomes progressively more important and takes the form of archiving, cataloging, and online and offline consultation of information. Over time, the amount of data collected can become so large that it contains information that cannot easily be explored manually or with basic statistical techniques. Big Data therefore become the object of more advanced investigation techniques, such as Machine Learning and Deep Learning. This work describes some applications in precision livestock farming and in the study of heat stress suffered by dairy cows. Experimental Italian and German barns were involved in the training and testing of the Random Forest algorithm, which predicted milk production as a function of the microclimatic conditions of the previous days with satisfactory accuracy. Furthermore, a Robust Statistics technique was used to provide an objective method for identifying production drops relative to the Wood model, which is typically used as an analytical model of the lactation curve. Its application to some sample lactations, and the results obtained, give confidence in the future use of this method.
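
For readers unfamiliar with the Wood model mentioned in the abstract above, the sketch below evaluates its commonly cited form, y(t) = a · t^b · exp(-c · t), where t is the day in milk. The parameter values are hypothetical, and the robust statistics technique used in the thesis to flag production drops is not reproduced here.

```python
import math


def wood_yield(t, a, b, c):
    """Expected daily milk yield at day t under the Wood lactation curve."""
    return a * t ** b * math.exp(-c * t)


def wood_peak_day(b, c):
    """Day of peak yield: setting dy/dt = y * (b / t - c) to zero gives t = b / c."""
    return b / c


def relative_drop(observed, t, a, b, c):
    """Fraction by which an observed yield falls below the Wood curve."""
    expected = wood_yield(t, a, b, c)
    return (expected - observed) / expected


# Hypothetical parameters, chosen only to give a plausible curve shape
a, b, c = 20.0, 0.2, 0.004
peak = wood_peak_day(b, c)
print(peak, wood_yield(peak, a, b, c))
```

A production drop could then be expressed, for example, as the relative shortfall returned by `relative_drop`; how large a shortfall counts as a drop is exactly the question the thesis addresses with its own method.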

Relevance:

30.00%

Abstract:

This thesis investigates the legal, ethical, technical, and psychological issues of general data processing and artificial intelligence practices and the explainability of AI systems. It consists of two main parts. In the initial section, we provide a comprehensive overview of the big data processing ecosystem and the main challenges we face today. We then evaluate the GDPR’s data privacy framework in the European Union. The Trustworthy AI Framework proposed by the EU’s High-Level Expert Group on AI (AI HLEG) is examined in detail. The ethical principles for the foundation and realization of Trustworthy AI are analyzed along with the assessment list prepared by the AI HLEG. Then, we list the main big data challenges the European researchers and institutions identified and provide a literature review on the technical and organizational measures to address these challenges. A quantitative analysis is conducted on the identified big data challenges and the measures to address them, which leads to practical recommendations for better data processing and AI practices in the EU. In the subsequent part, we concentrate on the explainability of AI systems. We clarify the terminology and list the goals aimed at the explainability of AI systems. We identify the reasons for the explainability-accuracy trade-off and how we can address it. We conduct a comparative cognitive analysis between human reasoning and machine-generated explanations with the aim of understanding how explainable AI can contribute to human reasoning. We then focus on the technical and legal responses to remedy the explainability problem. In this part, GDPR’s right to explanation framework and safeguards are analyzed in-depth with their contribution to the realization of Trustworthy AI. Then, we analyze the explanation techniques applicable at different stages of machine learning and propose several recommendations in chronological order to develop GDPR-compliant and Trustworthy XAI systems.

Relevance:

30.00%

Abstract:

The aim of the present study was to develop a statistical approach to define the best cut-off for calling copy number alterations (CNAs) from genomic data provided by high-throughput experiments, able to predict a specific clinical end-point (early relapse, 18 months) in the context of Multiple Myeloma (MM). 743 newly diagnosed MM patients with SNP array-derived genomic data and clinical data were included in the study. CNAs were called both by a conventional (classic, CL) method and by an outcome-oriented (OO) method, and the Progression Free Survival (PFS) hazard ratios of CNAs called by the two approaches were compared. The OO approach successfully identified patients at higher risk of relapse, and the univariate survival analysis showed stronger prognostic effects for OO-defined high-risk alterations than for those defined by the CL approach; the difference was statistically significant for 12 CNAs. Overall, 155/743 patients relapsed within 18 months of the start of therapy. A small number of OO-defined CNAs were significantly recurrent in early-relapsed patients (ER-CNAs): amp1q, amp2p, del2p, del12p, del17p and del19p. Two groups of patients were identified, those carrying ≥1 ER-CNA and those carrying none (249 vs. 494, respectively), the former with significantly shorter PFS and overall survival (OS) (PFS HR 2.15, p<0001; OS HR 2.37, p<0.0001). The risk of relapse defined by the presence of ≥1 ER-CNA was independent of the risks conferred both by R-ISS 3 (HR=1.51; p=0.01) and by a low-quality (< stable disease) clinical response (HR=2.59; p=0.004). Notably, the type of induction therapy was not descriptive, suggesting that ER is strongly related to the patients' baseline genomic architecture. In conclusion, the OO approach made it possible to define CNA-specific dynamic clonality cut-offs, improving the accuracy of CNA calls in identifying the MM patients with the highest probability of ER. Being outcome-dependent, the OO approach is dynamic and might be adjusted according to the selected outcome variable of interest.

Relevance:

30.00%

Abstract:

The present Dissertation shows how recent statistical analysis tools and open datasets can be exploited to improve modelling accuracy in two distinct yet interconnected domains of flood hazard (FH) assessment. In the first Part, unsupervised artificial neural networks are employed as regional models for sub-daily rainfall extremes. The models aim to learn a robust relation to estimate locally the parameters of Gumbel distributions of extreme rainfall depths for any sub-daily duration (1-24h). The predictions depend on twenty morphoclimatic descriptors. A large study area in north-central Italy is adopted, where 2238 annual maximum series are available. Validation is performed over an independent set of 100 gauges. Our results show that multivariate ANNs may remarkably improve the estimation of percentiles relative to the benchmark approach from the literature, where Gumbel parameters depend on mean annual precipitation. Finally, we show that the very nature of the proposed ANN models makes them suitable for interpolating predicted sub-daily rainfall quantiles across space and time-aggregation intervals. In the second Part, decision trees are used to combine a selected blend of input geomorphic descriptors for predicting FH. Relative to existing DEM-based approaches, this method is innovative, as it relies on the combination of three characteristics: (1) simple multivariate models, (2) a set of exclusively DEM-based descriptors as input, and (3) an existing FH map as reference information. First, the methods are applied to northern Italy, represented with the MERIT DEM (∼90m resolution), and second, to the whole of Italy, represented with the EU-DEM (25m resolution). 
The results show that multivariate approaches may (a) significantly enhance the delineation of flood-prone areas relative to a selected univariate approach, (b) provide accurate predictions of expected inundation depths, (c) produce encouraging results in extrapolation, (d) complete the information of imperfect reference maps, and (e) conveniently convert binary maps into a continuous representation of FH.
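
The first Part of the abstract above estimates the parameters of Gumbel distributions of extreme rainfall depths. As a brief illustration of how such parameters translate into rainfall quantiles, the sketch below evaluates the Gumbel cumulative distribution function and its inverse. The location and scale values are hypothetical, and the neural network that predicts them in the thesis is not reproduced.

```python
import math


def gumbel_cdf(x, loc, scale):
    """Gumbel (maximum) CDF: F(x) = exp(-exp(-(x - loc) / scale))."""
    return math.exp(-math.exp(-(x - loc) / scale))


def gumbel_quantile(p, loc, scale):
    """Inverse CDF: x_p = loc - scale * ln(-ln(p)), for 0 < p < 1."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    return loc - scale * math.log(-math.log(p))


def return_level(return_period_years, loc, scale):
    """Depth exceeded on average once every T years (non-exceedance 1 - 1/T)."""
    return gumbel_quantile(1.0 - 1.0 / return_period_years, loc, scale)


# Hypothetical Gumbel parameters for a 1-hour annual maximum series (mm)
loc, scale = 25.0, 8.0
for T in (10, 50, 100):
    print(T, round(return_level(T, loc, scale), 1))
```

In a regional model of the kind described, `loc` and `scale` would be supplied per site and per duration by the fitted model, so the same two functions could be reused across space and time-aggregation intervals.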