8 results for Data Extraction
in AMS Tesi di Dottorato - Alm@DL - Università di Bologna
Abstract:
Salt deposits characterize the subsurface of Tuzla (Bosnia and Herzegovina) and have made the town famous since ancient times. Archaeological discoveries document a Neolithic pile-dwelling settlement linked to saltwater springs that turned much of the area into swampy ground. Since Roman times the town has been described as “the City of Salt Deposits and Springs”; "tuz" is the Turkish word for salt, the name the Ottomans gave the settlement in the 15th century after their conquest of medieval Bosnia (Donia and Fine, 1994). Natural brine springs were widespread, and salt has been evaporated over hot charcoal since pre-Roman times. This ancient use of salt was a modest exploitation compared with the massive production of the 20th century, carried out by conventional mining and, above all, by wild brine pumping. Whereas salt was once extracted by tapping natural brine springs, the modern technique relies on about 100 boreholes with pumps intercepting natural underground brine runs at an average depth of 400-500 m. The mining operations altered the hydrogeological conditions, enabling the downward flow of fresh water and causing additional salt dissolution. This process has induced severe ground subsidence over the last 60 years, reaching up to 10 meters of sinking in the most affected area. Stress and strain in the overlying rocks produced numerous fractures over a considerable area (3 km²). Serious damage consequently occurred to buildings and to infrastructure such as the water supply system, sewage networks and power lines. Urban life downtown was disrupted by the loss of more than 2000 buildings that collapsed or had to be demolished, forcing the resettlement of about 15000 inhabitants (Tatić, 1979). Salt extraction has recently been strongly reduced, but the underground water system is returning to its natural conditions, threatening to flood the most affected area. Over the last 60 years the local government developed a monitoring system for the phenomenon, collecting data on geodetic measurements, the amount of brine pumped, piezometry, lithostratigraphy, the extent of the salt body and geotechnical parameters. A database was created within a scientific cooperation between the municipality of Tuzla and the city of Rotterdam (D.O.O. Mining Institute Tuzla, 2000). The investigation presented in this dissertation was financially supported by a cooperation project between the Municipality of Tuzla, the University of Bologna (CIRSA) and the Province of Ravenna. The University of Tuzla (RGGF) provided important scientific support, in particular on the geological and hydrogeological features. Subsidence damage resulting from evaporite dissolution generates substantial losses throughout the world, but the causes are well understood in only a few areas (Gutierrez et al., 2008). The subject of this study is the collapse phenomenon occurring in the Tuzla area, with the aim of identifying and quantifying the several factors involved in the system and their correlations. The Tuzla subsidence can be defined as a geohazard, that is, the consequence of an adverse combination of geological processes and ground conditions precipitated by human activity with the potential to cause harm (Rosenbaum and Culshaw, 2003). Where a hazard induces a risk to a vulnerable element, a risk management process is required.
The individual factors involved in the subsidence of Tuzla can be considered hazards. The final objective of this dissertation is a preliminary risk assessment procedure and guidelines developed to quantify building vulnerability in relation to the overall geohazard affecting the town. The available historical database, never fully processed, has been analyzed by means of geographic information systems and mathematical interpolators (PART I). Modern geomatic applications have been implemented to investigate the most relevant hazards in depth (PART II). To monitor and quantify current subsidence rates, geodetic GPS technologies were deployed and four survey campaigns were carried out, one per year. The subsidence-related fracture system was identified by means of field surveys and a mathematical interpretation of the sinking surface known as curvature analysis. Comparing mapped and predicted fractures led to a better understanding of the problem, and the results confirmed the reliability of fracture identification by curvature analysis applied to sinking data rather than to topographic or seismic data. The evolution of urban change was reconstructed by analyzing topographic maps and satellite imagery, identifying the most damaged areas. This part of the investigation was essential for quantifying building vulnerability.
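To give a concrete idea of curvature analysis applied to a gridded sinking surface, the following is a minimal sketch, not the processing used in the dissertation: the grid file name, the 25 m spacing and the percentile threshold are illustrative assumptions.

```python
import numpy as np

# Illustrative only: z is a gridded subsidence (sinking) surface in metres,
# sampled on a regular grid with spacing dx, dy (assumed values).
dx = dy = 25.0                      # grid spacing in metres (assumption)
z = np.load("subsidence_grid.npy")  # hypothetical input file

# First and second derivatives of the surface via finite differences.
zy, zx = np.gradient(z, dy, dx)
zxy, zxx = np.gradient(zx, dy, dx)
zyy, _ = np.gradient(zy, dy, dx)

# Mean and Gaussian curvature of the surface z(x, y).
p = 1.0 + zx**2 + zy**2
gaussian = (zxx * zyy - zxy**2) / p**2
mean = ((1 + zy**2) * zxx - 2 * zx * zy * zxy + (1 + zx**2) * zyy) / (2 * p**1.5)

# Zones of strongly convex or concave curvature are candidate fracture
# locations (threshold chosen arbitrarily for illustration).
candidate_fractures = np.abs(mean) > np.percentile(np.abs(mean), 95)
```

The design idea is simply that fractures tend to concentrate where the sinking surface bends most sharply, so extreme curvature values flag candidate zones to be compared with field-mapped fractures.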
Abstract:
Machine learning comprises a series of techniques for the automatic extraction of meaningful information from large collections of noisy data. In many real-world applications the data are naturally represented in structured form. Since traditional machine learning methods deal with vectorial information, they require some a priori form of preprocessing. Among the learning techniques for structured data, kernel methods are recognized as having a strong theoretical background and being effective approaches. They do not require an explicit vectorial representation of the data in terms of features but rely on a measure of similarity between any pair of objects of a domain, the kernel function. Designing fast and good kernel functions is a challenging problem. In the case of tree-structured data two issues become relevant: kernels for trees should not be sparse and should be fast to compute. The sparsity problem arises when, given a dataset and a kernel function, most structures in the dataset are completely dissimilar to one another. In those cases the classifier has too little information to make correct predictions on unseen data; in fact, it tends to produce a discriminating function behaving like the nearest-neighbour rule. Sparsity is likely to arise for some standard tree kernel functions, such as the subtree and subset tree kernels, when they are applied to datasets whose node labels belong to a large domain. A second drawback of tree kernels is the time complexity required in both the learning and classification phases; such complexity can sometimes prevent the kernel from being applied in scenarios involving large amounts of data. This thesis proposes three contributions to resolving these issues for tree kernels. The first contribution aims at creating kernel functions that adapt to the statistical properties of the dataset, thus reducing its sparsity with respect to traditional tree kernel functions. Specifically, we propose to encode the input trees with an algorithm able to project the data onto a lower-dimensional space with the property that similar structures are mapped similarly. By building kernel functions on the lower-dimensional representation, we are able to perform inexact matchings between different inputs in the original space. The second contribution is the proposal of a novel kernel function based on the convolution kernel framework. A convolution kernel measures the similarity of two objects in terms of the similarities of their subparts. Most convolution kernels are based on counting the number of shared substructures, partially discarding information about their position in the original structure; the kernel function we propose is, instead, especially focused on this aspect. The third contribution is devoted to reducing the computational burden of calculating a kernel function between a tree and a forest of trees, which is a typical operation in the classification phase and, for some algorithms, also in the learning phase. We propose a general methodology applicable to convolution kernels, and we show an instantiation of our technique when kernels such as the subtree and subset tree kernels are employed. In those cases, Directed Acyclic Graphs can be used to compactly represent shared substructures in different trees, thus reducing the computational burden and storage requirements.
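To make the sparsity discussion concrete, here is a minimal sketch of the classical subtree (ST) kernel, which counts complete subtrees (a node with all its descendants) shared by two trees. The Tree class, the canonical-string encoding and the toy example are assumptions made for the illustration, not the thesis implementation.

```python
from collections import Counter

class Tree:
    """Minimal labelled ordered tree (illustrative data structure)."""
    def __init__(self, label, children=()):
        self.label = label
        self.children = list(children)

def subtree_ids(t, bag):
    """Encode every complete subtree of t as a canonical string and count it."""
    child_ids = [subtree_ids(c, bag) for c in t.children]
    sid = f"{t.label}({','.join(child_ids)})"
    bag[sid] += 1
    return sid

def subtree_kernel(t1, t2):
    """ST kernel: number of pairs of identical complete subtrees in t1 and t2."""
    b1, b2 = Counter(), Counter()
    subtree_ids(t1, b1)
    subtree_ids(t2, b2)
    return sum(b1[s] * b2[s] for s in b1.keys() & b2.keys())

# With a large label domain most trees share no subtree at all, so the Gram
# matrix is close to diagonal: the sparsity problem described above.
t1 = Tree("S", [Tree("NP", [Tree("dog")]), Tree("VP", [Tree("barks")])])
t2 = Tree("S", [Tree("NP", [Tree("cat")]), Tree("VP", [Tree("barks")])])
print(subtree_kernel(t1, t2))  # only "barks" and "VP(barks)" match -> 2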
Abstract:
The discovery of the Cosmic Microwave Background (CMB) radiation in 1965 is one of the fundamental milestones supporting the Big Bang theory, and the CMB remains one of the most important sources of information in cosmology. The excellent accuracy of the recent CMB data from the WMAP and Planck satellites confirmed the validity of the standard cosmological model and set a new challenge for the data analysis processes and their interpretation. In this thesis we deal with several aspects and useful tools of the data analysis, focusing on their optimization in order to fully exploit the Planck data and contribute to the final published results. The issues investigated are: the change of coordinates of CMB maps using the HEALPix package, the problem of the aliasing effect in the generation of low-resolution maps, and the comparison of the Angular Power Spectrum (APS) extraction performance of the optimal QML method, implemented in the code called BolPol, with that of the pseudo-Cl method, implemented in Cromaster. The QML method has then been applied to the Planck data at large angular scales to extract the CMB APS. The same method has also been applied to analyze the TT parity and Low Variance anomalies in the Planck maps, showing a consistent deviation from the standard cosmological model; the possible origins of these results are discussed. The Cromaster code, instead, has been applied to the 408 MHz and 1.42 GHz surveys, focusing on the analysis of the APS of selected regions of the synchrotron emission. The new generation of CMB experiments will be dedicated to polarization measurements, which require high-accuracy devices for separating the polarizations. Here a new technology, called Photonic Crystals, is exploited to develop a new polarization splitter device, and its performance is compared with that of the devices used nowadays.
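As a rough illustration of two of the operations mentioned (a change of coordinates of a HEALPix map and a simple pseudo-Cl spectrum estimate), the sketch below uses the public healpy package; the input file, target resolution and smoothing scale are assumed values, and this is not the BolPol or Cromaster code.

```python
import healpy as hp
import numpy as np

# Illustrative parameters, not those used in the Planck analysis.
nside_out = 64
cmb_map = hp.read_map("cmb_map.fits")          # hypothetical input map

# Change of coordinates (here Galactic -> Ecliptic) acting on map pixels.
rot = hp.Rotator(coord=["G", "E"])
cmb_ecl = rot.rotate_map_pixel(cmb_map)

# Degrade to a low-resolution map; smoothing beforehand limits the aliasing
# of small-scale power into the low-resolution pixels.
smoothed = hp.smoothing(cmb_ecl, fwhm=np.radians(3.0))
low_res = hp.ud_grade(smoothed, nside_out)

# Simple (pseudo-)Cl estimate of the angular power spectrum.
cl = hp.anafast(low_res, lmax=2 * nside_out)
```

A QML estimator would instead build the full pixel-pixel covariance matrix and is therefore limited to low-resolution maps, which is why the aliasing issue above matters.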
Abstract:
This thesis investigates methods and software architectures for discovering the typical and frequently occurring structures used for organizing knowledge in the Web. We identify these structures as Knowledge Patterns (KPs). KP discovery needs to address two main research problems: the heterogeneity of sources, formats and semantics in the Web (i.e., the knowledge soup problem) and the difficulty of drawing a relevant boundary around data that captures the meaningful knowledge with respect to a certain context (i.e., the knowledge boundary problem). Hence, we introduce two methods that provide different solutions to these problems by tackling KP discovery from two different perspectives: (i) the transformation of KP-like artifacts into KPs formalized as OWL2 ontologies; (ii) the bottom-up extraction of KPs by analyzing how data are organized in Linked Data. The two methods address the knowledge soup and boundary problems in different ways. The first method is based on a purely syntactic transformation of the original source to RDF, followed by a refactoring step whose aim is to add semantics to the RDF by selecting meaningful RDF triples. The second method draws boundaries around RDF data in Linked Data by analyzing type paths: a type path is a possible route through an RDF graph that takes into account the types associated with the nodes of a path. We then present K~ore, a software architecture conceived as the basis for developing KP discovery systems and designed according to two software architectural styles, i.e., Component-based and REST. Finally, we provide an example of KP reuse based on Aemoo, an exploratory search tool which exploits KPs for entity summarization.
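A toy sketch of what extracting type paths from an RDF graph might look like with the rdflib library; the input file and the simplification to a single rdf:type per node are assumptions for illustration, not the thesis pipeline.

```python
from collections import Counter
from rdflib import Graph, RDF

g = Graph()
g.parse("data.ttl", format="turtle")   # hypothetical Linked Data sample

def first_type(node):
    """Return one rdf:type of the node, if any (simplification for the sketch)."""
    return next(g.objects(node, RDF.type), None)

# A type path here is the triple (type of subject, property, type of object):
# it abstracts concrete resources into the schema-level route they follow.
type_paths = Counter()
for s, p, o in g:
    if p == RDF.type:
        continue
    ts, to = first_type(s), first_type(o)
    if ts is not None and to is not None:
        type_paths[(ts, p, to)] += 1

# Frequent type paths suggest boundaries for candidate Knowledge Patterns.
for path, freq in type_paths.most_common(10):
    print(freq, path)
```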
Abstract:
Over the past ten years, the cross-correlation of long time series of ambient seismic noise (ASN) has been widely adopted to extract the surface-wave part of the Green's Functions (GF). This stochastic procedure relies on the assumption that the ASN wavefield is diffuse and stationary. At frequencies below 1 Hz, the ASN is mainly composed of surface waves whose origin is attributed to the sea-wave climate. Consequently, marked directional properties may be observed, which call for an accurate investigation of the location and temporal evolution of the ASN sources before attempting any GF retrieval. Within this general context, this thesis investigates the feasibility and robustness of noise-based methods for imaging complex geological structures at the local (∼10-50 km) scale. The study focuses on the analysis of an extended (11-month) seismological dataset collected at the Larderello-Travale geothermal field (Italy), an area whose underground geological structures are well constrained thanks to decades of geothermal exploration. Focusing on the secondary microseism band (SM; f > 0.1 Hz), I first investigate the spectral features and kinematic properties of the noise wavefield using beamforming analysis, highlighting a marked variability with time and frequency. In the 0.1-0.3 Hz band and during spring and summer, the SM waves propagate with high apparent velocities and from well-defined directions, likely associated with ocean storms in the southern hemisphere. Conversely, at frequencies above 0.3 Hz the distribution of back-azimuths is more scattered, indicating that this frequency band is the most appropriate for the application of stochastic techniques. For this latter frequency interval, I tested two correlation-based methods, acting in the time (NCF) and frequency (modified SPAC) domains, respectively yielding estimates of the group- and phase-velocity dispersions. The velocity data provided by the two methods are markedly discordant; comparison with independent geological and geophysical constraints suggests that the NCF results are more robust and reliable.
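For readers unfamiliar with the NCF approach, the following is a minimal sketch of its core operation (cross-correlating spectrally whitened noise records from two stations and stacking over time windows); the sampling rate, window length and whitening scheme are assumptions, not the processing actually used in the thesis.

```python
import numpy as np

fs = 20.0                      # sampling rate in Hz (assumption)
win = int(3600 * fs)           # one-hour windows (assumption)

def whiten(x):
    """Spectral whitening: keep the phase, flatten the amplitude spectrum."""
    spec = np.fft.rfft(x)
    return spec / (np.abs(spec) + 1e-10)

def noise_cross_correlation(trace_a, trace_b):
    """Stack windowed cross-correlations of two continuous noise records."""
    n_win = min(len(trace_a), len(trace_b)) // win
    stack = np.zeros(win)
    for i in range(n_win):
        a = whiten(trace_a[i * win:(i + 1) * win])
        b = whiten(trace_b[i * win:(i + 1) * win])
        # Correlation in the frequency domain; for a diffuse wavefield the
        # stacked result approximates the inter-station Green's function.
        stack += np.fft.irfft(a * np.conj(b), n=win)
    return np.fft.fftshift(stack / max(n_win, 1))

# ncf = noise_cross_correlation(station1_trace, station2_trace)
```

Group-velocity dispersion curves would then be measured on the stacked correlations, which is where the comparison with the modified-SPAC phase velocities comes in.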
Abstract:
Big data are reshaping the way we interact with technology, fostering new applications that increase the safety assessment of foods. An extraordinary amount of information is analysed using machine learning approaches aimed at detecting the existence, or predicting the likelihood, of future risks. Food business operators have to share the results of these analyses when applying to place regulated products on the market, while agri-food safety agencies (including the European Food Safety Authority) are exploring new avenues to increase the accuracy of their evaluations by processing Big data. Such an informational endowment brings with it opportunities and risks related to the extraction of meaningful inferences from data. However, conflicting interests and tensions among the involved entities - the industry, food safety agencies, and consumers - hinder the adoption of shared methods to steer the processing of Big data in a sound, transparent and trustworthy way. A recent reform of the EU sectoral legislation, the lack of trust and the presence of a considerable number of stakeholders highlight the need for ethical contributions aimed at steering the development and deployment of Big data applications. Moreover, the Artificial Intelligence guidelines and charters published by European Union institutions and Member States have to be discussed in light of applied contexts, including the one at stake here. This thesis aims to contribute to these goals by discussing which principles should be put forward when processing Big data in the context of agri-food safety risk assessment. The research focuses on two intertwined topics - data ownership and data governance - by evaluating how the regulatory framework addresses the challenges raised by Big data analysis in these domains. The outcome of the project is a tentative Roadmap that identifies the principles to be observed when processing Big data in this domain and their possible implementations.
Abstract:
The increasing demand for alternatives to meat products, linked to ethical and environmental reasons, highlights the need to use different protein sources. Plant proteins provide a valid option thanks to their relatively low cost, high availability and wide range of supply sources. The process currently used to produce plant protein concentrates and isolates is alkaline extraction followed by isoelectric precipitation. However, despite the high purity of the resulting proteins, it presents some drawbacks. Innovative protein extraction processes are emerging with the aim of reducing the environmental impact and costs, as well as improving the functional properties. In this study, the traditional wet protein extraction and a simplified wet process were used to obtain protein-rich extracts from different plants. The sources considered in the project were de-oiled sunflower and canola, chickpea, lentils, and camelina meal, an emerging oleaginous seed of interest for its high content of omega-3. The extracts obtained from the two processes were then analysed for their capacity to hold water and fat, to form gels and to form a stable foam. The results highlighted strong differences in protein content, yield and functionality. The extracts obtained with the alkaline process confirmed the literature data for the four plant sources (sunflower, canola, chickpea and lentils) and allowed a camelina concentrate to be obtained with a protein content of 63% and a protein recovery of 41%. The simpler process was not effective in achieving protein enrichment in the oleaginous sources, whereas enrichments of 10% and 15% were obtained in chickpea and lentils, respectively. The functional properties were also completely different: the simpler process produced protein ingredients that are completely water-soluble at pH 7, with a fair foaming capacity compared to the extracts obtained with the alkaline process. These characteristics could make these extracts suitable for plant-based milk-analogue products.
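For clarity on how figures such as protein content, recovery and enrichment are typically computed, a generic sketch follows; the masses and fractions used are illustrative assumptions, not measurements from this work.

```python
# Generic definitions with illustrative values (not data from the study).
raw_mass, raw_protein_frac = 100.0, 0.30          # g of meal, protein fraction
extract_mass, extract_protein_frac = 20.0, 0.63   # g of extract, protein fraction

protein_content = extract_protein_frac * 100                      # % protein in the extract
protein_recovery = (extract_mass * extract_protein_frac) / (raw_mass * raw_protein_frac) * 100
enrichment = (extract_protein_frac - raw_protein_frac) * 100      # percentage-point increase

print(round(protein_content), round(protein_recovery, 1), round(enrichment))
```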
Abstract:
The fast development of Information and Communication Technologies (ICT) offers new opportunities to realize future smart cities. To understand, manage and forecast a city's behavior, it is necessary to analyse different kinds of data from the most varied data acquisition systems. The aim of this research activity, set in the framework of Data Science and Complex Systems Physics, is to provide stakeholders with new knowledge tools to improve the sustainability of mobility demand in future cities. From this perspective, the governance of the mobility demand generated by large tourist flows is becoming a vital issue for the quality of life in the historical centers of Italian cities, and it will worsen in the near future due to the continuing globalization process. Another critical theme is sustainable mobility, which aims to reduce private transportation in cities and improve multimodal mobility. We analyze the statistical properties of urban mobility in Venice, Rimini, and Bologna using different datasets provided by companies and local authorities. We develop algorithms and tools for cartography extraction, trip reconstruction, multimodality classification, and mobility simulation. We show the existence of characteristic mobility paths and statistical properties that depend on the transport means and user types. Finally, we use our results to model and simulate the overall behavior of the cars moving in the Emilia-Romagna Region and of the pedestrians moving in Venice, with software able to replicate in silico the demand for mobility and its dynamics.
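A toy sketch of the kind of trip reconstruction and multimodality classification that can be applied to GPS records; the file name, column names, time gap and speed thresholds are assumptions for illustration, not the project's actual pipeline.

```python
import pandas as pd

# Hypothetical GPS log for one device: timestamp, lat, lon, speed_kmh.
df = pd.read_csv("gps_log.csv", parse_dates=["timestamp"]).sort_values("timestamp")

# Trip reconstruction: a gap longer than 5 minutes starts a new trip (assumed threshold).
gap = df["timestamp"].diff() > pd.Timedelta(minutes=5)
df["trip_id"] = gap.cumsum()

def classify_mode(mean_speed_kmh):
    """Crude multimodality classification by average speed (illustrative thresholds)."""
    if mean_speed_kmh < 7:
        return "walking"
    if mean_speed_kmh < 25:
        return "bicycle"
    return "motorized"

trips = (df.groupby("trip_id")["speed_kmh"]
           .mean()
           .apply(classify_mode)
           .rename("mode"))
print(trips.value_counts())
```

Real pipelines would of course add map matching against the extracted cartography and more robust mode classifiers, but the gap-splitting and speed-based labelling above convey the basic idea.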