2 resultados para Multiple datasets
em AMS Tesi di Dottorato - Alm@DL - Università di Bologna
Resumo:
Several countries have acquired, over the past decades, large amounts of area covering Airborne Electromagnetic data. Contribution of airborne geophysics has dramatically increased for both groundwater resource mapping and management proving how those systems are appropriate for large-scale and efficient groundwater surveying. We start with processing and inversion of two AEM dataset from two different systems collected over the Spiritwood Valley Aquifer area, Manitoba, Canada respectively, the AeroTEM III (commissioned by the Geological Survey of Canada in 2010) and the “Full waveform VTEM” dataset, collected and tested over the same survey area, during the fall 2011. We demonstrate that in the presence of multiple datasets, either AEM and ground data, due processing, inversion, post-processing, data integration and data calibration is the proper approach capable of providing reliable and consistent resistivity models. Our approach can be of interest to many end users, ranging from Geological Surveys, Universities to Private Companies, which are often proprietary of large geophysical databases to be interpreted for geological and\or hydrogeological purposes. In this study we deeply investigate the role of integration of several complimentary types of geophysical data collected over the same survey area. We show that data integration can improve inversions, reduce ambiguity and deliver high resolution results. We further attempt to use the final, most reliable output resistivity models as a solid basis for building a knowledge-driven 3D geological voxel-based model. A voxel approach allows a quantitative understanding of the hydrogeological setting of the area, and it can be further used to estimate the aquifers volumes (i.e. potential amount of groundwater resources) as well as hydrogeological flow model prediction. In addition, we investigated the impact of an AEM dataset towards hydrogeological mapping and 3D hydrogeological modeling, comparing it to having only a ground based TEM dataset and\or to having only boreholes data.
Resumo:
Information is nowadays a key resource: machine learning and data mining techniques have been developed to extract high-level information from great amounts of data. As most data comes in form of unstructured text in natural languages, research on text mining is currently very active and dealing with practical problems. Among these, text categorization deals with the automatic organization of large quantities of documents in priorly defined taxonomies of topic categories, possibly arranged in large hierarchies. In commonly proposed machine learning approaches, classifiers are automatically trained from pre-labeled documents: they can perform very accurate classification, but often require a consistent training set and notable computational effort. Methods for cross-domain text categorization have been proposed, allowing to leverage a set of labeled documents of one domain to classify those of another one. Most methods use advanced statistical techniques, usually involving tuning of parameters. A first contribution presented here is a method based on nearest centroid classification, where profiles of categories are generated from the known domain and then iteratively adapted to the unknown one. Despite being conceptually simple and having easily tuned parameters, this method achieves state-of-the-art accuracy in most benchmark datasets with fast running times. A second, deeper contribution involves the design of a domain-independent model to distinguish the degree and type of relatedness between arbitrary documents and topics, inferred from the different types of semantic relationships between respective representative words, identified by specific search algorithms. The application of this model is tested on both flat and hierarchical text categorization, where it potentially allows the efficient addition of new categories during classification. Results show that classification accuracy still requires improvements, but models generated from one domain are shown to be effectively able to be reused in a different one.