2 results for Semantic search
in AMS Tesi di Dottorato - Alm@DL - Università di Bologna
Abstract:
Information is nowadays a key resource: machine learning and data mining techniques have been developed to extract high-level information from large amounts of data. As most data comes in the form of unstructured natural-language text, research on text mining is currently very active and addresses practical problems. Among these, text categorization deals with the automatic organization of large quantities of documents into predefined taxonomies of topic categories, possibly arranged in large hierarchies. In commonly proposed machine learning approaches, classifiers are automatically trained from pre-labeled documents: they can perform very accurate classification, but often require a substantial training set and notable computational effort. Methods for cross-domain text categorization have been proposed, making it possible to leverage a set of labeled documents from one domain to classify those of another. Most methods use advanced statistical techniques, usually involving parameter tuning. A first contribution presented here is a method based on nearest centroid classification, where profiles of categories are generated from the known domain and then iteratively adapted to the unknown one. Despite being conceptually simple and having easily tuned parameters, this method achieves state-of-the-art accuracy on most benchmark datasets with fast running times. A second, deeper contribution involves the design of a domain-independent model to distinguish the degree and type of relatedness between arbitrary documents and topics, inferred from the different types of semantic relationships between their respective representative words, identified by specific search algorithms. The application of this model is tested on both flat and hierarchical text categorization, where it potentially allows the efficient addition of new categories during classification. Results show that classification accuracy still requires improvement, but that models generated from one domain can be effectively reused in a different one.
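The first contribution, nearest centroid classification with iteratively adapted category profiles, can be illustrated with a minimal sketch. This is not the thesis's implementation: the bag-of-words weighting, the cosine similarity, the stopping rule, and every name below (`vectorize`, `cross_domain_classify`, `max_iters`) are assumptions chosen for illustration.

```python
from collections import Counter
import math


def vectorize(text):
    """Bag-of-words term counts for a whitespace-tokenized text (assumed weighting)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term-weight mappings."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def centroid(vectors):
    """Average of a non-empty list of term-weight mappings."""
    total = Counter()
    for vec in vectors:
        total.update(vec)
    return {term: weight / len(vectors) for term, weight in total.items()}


def nearest(vec, profiles):
    """Label of the most similar profile; ties go to the alphabetically first label."""
    return max(sorted(profiles), key=lambda label: cosine(vec, profiles[label]))


def cross_domain_classify(source_docs, target_docs, max_iters=10):
    """source_docs: list of (text, label) pairs; target_docs: list of unlabeled texts."""
    # Step 1: build one profile (centroid) per category from the known domain.
    by_label = {}
    for text, label in source_docs:
        by_label.setdefault(label, []).append(vectorize(text))
    profiles = {label: centroid(vecs) for label, vecs in by_label.items()}

    target_vecs = [vectorize(text) for text in target_docs]
    assignments = None
    for _ in range(max_iters):
        # Step 2: assign each target document to its nearest profile.
        new_assignments = [nearest(vec, profiles) for vec in target_vecs]
        if new_assignments == assignments:
            break  # assignments are stable, so the profiles have stopped changing
        assignments = new_assignments
        # Step 3: rebuild each profile from the target documents assigned to it.
        for label in profiles:
            members = [v for v, a in zip(target_vecs, assignments) if a == label]
            if members:
                profiles[label] = centroid(members)
    return assignments
```

In this sketch, a target document that shares no vocabulary with the source profiles can still end up in a sensible category on a later iteration, once the profiles have absorbed terms from the target domain.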
Abstract:
Personal archives are the archives created by individuals for their own purposes. Among these are the library and documentary collections of writers and scholars. It is only recently that archival literature has begun to focus on this category of archives, emphasising how their heterogeneous nature necessitates the conciliation of different approaches to archival description, and calling for a broader understanding of the principle of provenance, recognising that multiple creators, including subsequent researchers, can contribute to shaping personal archives over time by adding new layers of context. Despite these advances in the theoretical debate, current architectures for archival representation lag behind. Finding aids privilege a single point of view and do not allow subsequent users to embed their own, potentially conflicting, readings. Using semantic web technologies, this study aims to define a conceptual model for writers' archives based on existing and widely adopted models in the cultural heritage and humanities domains. The model developed can be used to represent different types of documents at various levels of analysis, as well as record content and components. It also enables the representation of complex relationships and the incorporation of additional layers of interpretation into the finding aid, transforming it from a static search tool into a dynamic research platform. The personal archive and library of Giuseppe Raimondi serves as a case study for the creation of an archival knowledge base using the proposed conceptual model. By querying the knowledge graph through SPARQL, the effectiveness of the model is evaluated. The results demonstrate that the model addresses the primary representation challenges identified in archival literature, from both a technological and methodological standpoint. The ultimate goal is to bring the output par excellence of archival science, i.e. the finding aid, more in line with the latest developments in archival thinking.
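The idea of letting several users record their own, potentially conflicting, readings can be illustrated with a small sketch. It uses plain Python tuples and wildcard matching as a stand-in for an RDF knowledge graph queried with SPARQL; it is not the conceptual model proposed in the thesis, and all identifiers and values below (`Statement`, `match`, `letter_12`, the dates) are hypothetical.

```python
from typing import NamedTuple


class Statement(NamedTuple):
    """One assertion about an archival item, attributed to whoever made it."""
    subject: str
    predicate: str
    obj: str
    asserted_by: str


def match(statements, subject=None, predicate=None, obj=None, asserted_by=None):
    """Return the statements matching every given field; None acts as a wildcard."""
    pattern = (subject, predicate, obj, asserted_by)
    return [
        s for s in statements
        if all(want is None or want == have for want, have in zip(pattern, s))
    ]


# Hypothetical data: two agents disagree about the date of the same letter.
statements = [
    Statement("letter_12", "hasDate", "1931", "archivist"),
    Statement("letter_12", "hasDate", "1933", "researcher_a"),
    Statement("letter_12", "hasType", "letter", "archivist"),
]

# All recorded readings of the date, whoever asserted them.
date_readings = match(statements, subject="letter_12", predicate="hasDate")

# Only the layer of interpretation added by one researcher.
researcher_layer = match(statements, asserted_by="researcher_a")
```

Because each statement carries its own attribution, conflicting readings coexist instead of overwriting one another, and a query can either merge all layers or isolate a single one.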