10 results for classification aided by clustering
in AMS Tesi di Dottorato - Alm@DL - Università di Bologna
Abstract:
The abundance of visual data and the push for robust AI are driving the need for automated visual sensemaking. Computer Vision (CV) faces growing demand for models that can discern not only what images "represent," but also what they "evoke." This is a demand for tools mimicking human perception at a high semantic level, categorizing images based on concepts like freedom, danger, or safety. However, automating this process is challenging due to entropy, scarcity, subjectivity, and ethical considerations. These challenges not only impact performance but also underscore the critical need for interoperability. This dissertation focuses on abstract concept-based (AC) image classification, guided by three technical principles: situated grounding, performance enhancement, and interpretability. We introduce ART-stract, a novel dataset of cultural images annotated with ACs, serving as the foundation for a series of experiments across four key domains: assessing the effectiveness of the end-to-end DL paradigm, exploring cognitive-inspired semantic intermediaries, incorporating cultural and commonsense aspects, and neuro-symbolic integration of sensory-perceptual data with cognitive-based knowledge. Our results demonstrate that integrating CV approaches with semantic technologies yields methods that surpass the current state of the art in AC image classification, outperforming the end-to-end deep vision paradigm. The results emphasize the role semantic technologies can play in developing both effective and interpretable systems, through the capturing, situating, and reasoning over knowledge related to visual data. Furthermore, this dissertation explores the complex interplay between technical and socio-technical factors. By merging technical expertise with an understanding of human and societal aspects, we advocate for responsible labeling and training practices in visual media. 
These insights and techniques not only advance efforts in CV and explainable artificial intelligence but also propel us toward an era of AI development that harmonizes technical prowess with deep awareness of its human and societal implications.
Abstract:
Sociology of work in Italy revived at the end of WWII, after thirty years of forced oblivion. This thesis examines the history of the discipline by considering three paths that it followed from its revival up to its institutionalization: the influence of the productivity drive, the role of trade unions, and the activity of early young researchers. The European Productivity Agency's Italian office, the Comitato Nazionale per la Produttività (CNP), promoted studies on management and on the effects of industrialization on work and society. Academics, technicians, and psychologists who worked for the CNP started rethinking the sociology of work, but the managerial use of sociology was unacceptable to both trade unions and young researchers. The "free union" CISL therefore created a School in Florence with keen attention to the social sciences as a means of becoming a new model union, while the Marxist CGIL, despite its ideological aversion to sociology, finally accepted the social sciences lexicon in order to explain changes in work and to resist the employers' association offensive. Meanwhile, political and social engagement led a first generation of sociologists to study social phenomena in recently industrialized Italy through sociological analysis. Finally, the thesis investigates the cultural transfers from France, whose industrial sociology (sociologie du travail) was considered a reference in continental Europe. Alongside the wide importance of French sociologie, financially aided by planning institutions in order to employ it in industrial reconstruction, other minor experiences, such as the social surveys carried out by worker-priests in the suburbs of industrial cities and the heterodox Marxism of the review "Socialisme ou Barbarie", influenced the Italian sociology of work.
Abstract:
The high quality of protected designation of origin (PDO) dry-cured pork products depends largely on the chemical and physical parameters of the fresh meat and on their variation during the production of the final product. The discovery of the mechanisms that regulate the variability of these parameters has been aided by the swine reference genome, used together with genetic analysis methods. This thesis can contribute to the discovery of genetic mechanisms that regulate the variability of some quality parameters of fresh meat for PDO dry-cured pork production. The first study, on gene expression, compared low and high glycolytic potential (GP) samples of Semimembranosus muscle of Italian Large White (ILW) pigs in the early postmortem period and showed that all but one of the differentially expressed genes were overexpressed in low GP samples. These genes were involved in ATP biosynthesis processes, calcium homeostasis, and lipid metabolism, and included the potential master regulator gene Peroxisome Proliferator-Activated Receptor Alpha (PPARA). The second study, in commercial hybrid pigs, evaluated correlations between carcass and fresh ham traits, including carcass and fresh ham lean meat percentages, the former being a potential predictor of the latter. In addition, a genome-wide association study identified chromosome-wide associations with phenotypic traits for 19 SNPs, and genome-wide associations for 14 SNPs for ferrochelatase activity. The latter could be a determinant of color variation in nitrite-free dry-cured ham. The third study showed gene expression differences in the Longissimus thoracis muscle of ILW pigs fed diets with extruded linseed (a source of polyunsaturated fatty acids) supplemented with either vitamin E and selenium (diet three) or natural antioxidants (diet four).
Diet three promoted a more rapid and massive immune system response, possibly determined by an improvement in muscle tissue function, while diet four promoted oxidative stability and increased the anti-inflammatory potential of muscle tissue.
Abstract:
This work examines the lexicalization of motion events in Russian in comparison with Italian. Our aim is twofold: on the one hand, we examine the spatial semantics of the verbal prefixes we selected and determine their semantic contribution to the lexicalization of the motion event. On the other hand, we analyze how Russian prefixed verbs of motion are rendered in translation into Italian. In particular, we focus on whether the contribution of the prefix is expressed, whether it is conveyed fully or partially, which nuances of its spatial semantics may be omitted, and which are obligatorily expressed and by what linguistic means. The work consists of an introduction, three chapters, and a conclusion. The first chapter presents the theoretical framework on which the contrastive analysis rests. The notions of motion and displacement are examined according to the semantic interpretations given both in the Russian-language literature and in works in other languages. In addition, the conceptualization of space is described within the cognitive approach to the study of language. We present the classification of languages by their lexicalization of the motion event introduced by L. Talmy, as well as the main subsequent studies, carried out within the typological approach, on the lexicalization of motion events in various languages. A separate section of the first chapter is devoted to the contribution of D. Slobin's research on the lexicalization of the components of motion, in particular the manner of motion, in various languages. The second chapter describes the system of unprefixed verbs of motion in Russian and the main approaches to their study. Parallels with the Italian system of verbs of motion are drawn throughout. The chapter then gives an overview of the system of prefixed verbs of motion in Russian. We consider separately the main approaches to the study of the semantics of verbal prefixes, focusing on their spatial meanings.
The third chapter presents the set of verbal prefixes with spatial semantics selected for the contrastive analysis. For each prefix, a dictionary definition is given, its spatial semantics is described according to the accounts developed by various authors, and the contexts of use of the prefixed verbs in Russian are analyzed together with possible strategies for rendering them in Italian. The findings are set out in the conclusion, and a bibliography is also included.
Abstract:
The recent widespread use of social media platforms and web services has produced a vast amount of behavioral data that can be used to model socio-technical systems. A significant part of this data can be represented as graphs or networks, which have become the prevalent mathematical framework for studying the structure and dynamics of complex interacting systems. However, analyzing and understanding these data presents new challenges due to their increasing complexity and diversity. For instance, characterizing real-world networks requires accounting for their temporal dimension and incorporating higher-order interactions beyond the traditional pairwise formalism. The ongoing growth of AI has led to the integration of traditional graph mining techniques with representation learning and low-dimensional embeddings of networks to address current challenges. These methods capture the underlying similarities and geometry of graph-shaped data, generating latent representations that make it possible to solve various tasks, such as link prediction, node classification, and graph clustering. As these techniques gain popularity, there is also growing concern about their responsible use. In particular, there has been an increased emphasis on addressing the limitations of interpretability in graph representation learning. This thesis contributes to the advancement of knowledge in the field of graph representation learning and has potential applications in a wide range of complex systems domains. We first focus on forecasting problems related to face-to-face contact networks with time-varying graph embeddings. Then, we study hyperedge prediction and reconstruction with simplicial complex embeddings. Finally, we analyze the problem of interpreting latent dimensions in node embeddings for graphs.
The proposed models are extensively evaluated in multiple experimental settings, and the results demonstrate their effectiveness and reliability, achieving state-of-the-art performance and providing valuable insights into the properties of the learned representations.
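To make the link prediction task mentioned in this abstract concrete, the following minimal Python sketch ranks candidate edges from node embeddings. It assumes a plain dot-product decoder passed through a logistic function, which is a common choice but is not stated in the abstract; the function names and the toy embeddings are hypothetical and are not taken from the thesis.

```python
from math import exp


def link_score(emb_u, emb_v):
    """Logistic of the dot product of two node embeddings (assumed decoder)."""
    dot = sum(a * b for a, b in zip(emb_u, emb_v))
    return 1.0 / (1.0 + exp(-dot))


def rank_candidate_links(embeddings, candidates):
    """Sort candidate node pairs from most to least likely under link_score."""
    return sorted(
        candidates,
        key=lambda pair: link_score(embeddings[pair[0]], embeddings[pair[1]]),
        reverse=True,
    )


# Toy, hand-written embeddings (illustrative only): "a" and "b" point in
# similar directions, "c" points the opposite way.
toy_embeddings = {"a": [1.0, 0.0], "b": [0.9, 0.1], "c": [-1.0, 0.0]}
ranking = rank_candidate_links(toy_embeddings, [("a", "c"), ("a", "b")])
print(ranking)
```

In a real setting the embeddings would be learned from the observed graph, and the decoder could be replaced by any other similarity function.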
Abstract:
The present work proposes a method based on CLV (Clustering around Latent Variables) for identifying groups of consumers in L-shape data. This kind of data structure is very common in consumer studies, where a panel of consumers is asked to assess the global liking of a certain number of products and the preference scores are then arranged in a two-way table Y. External information on both products (physico-chemical description or sensory attributes) and consumers (socio-demographic background, purchase behaviours or consumption habits) may be available in a row descriptor matrix X and in a column descriptor matrix Z, respectively. The aim of this method is to automatically provide a consumer segmentation in which all three matrices play an active role in the classification, yielding groups that are homogeneous from all points of view: preference, products and consumer characteristics. The proposed clustering method is illustrated on data from preference studies on food products: juices based on berry fruits and traditional cheeses from Trentino. The hedonic ratings given by the consumer panel on the products under study were explained with respect to the products' chemical compounds and sensory evaluation, and to the consumers' socio-demographic information, purchase behaviour and consumption habits.
Abstract:
The purpose of this Thesis is to develop a robust and powerful method to classify galaxies from large surveys, in order to establish and confirm the connections between the principal observational parameters of the galaxies (spectral features, colours, morphological indices), and to help unveil the evolution of these parameters from $z \sim 1$ to the local Universe. Within the framework of the zCOSMOS-bright survey, and making use of its large database of objects ($\sim 10\,000$ galaxies in the redshift range $0 < z \lesssim 1.2$) and its great reliability in redshift and spectral property determinations, we first adopt and extend the \emph{classification cube method}, as developed by Mignoli et al. (2009), to exploit the bimodal properties of galaxies (spectral, photometric and morphological) separately, and then combine these three subclassifications. We use this classification method as a test for a newly devised statistical classification, based on Principal Component Analysis and the Unsupervised Fuzzy Partition clustering method (PCA+UFP), which is able to define the galaxy population by exploiting its natural global bimodality, considering up to 8 different properties simultaneously. The PCA+UFP analysis is a very powerful and robust tool to probe the nature and the evolution of galaxies in a survey. It allows galaxies to be classified with less uncertainty and adds the flexibility to be adapted to different parameters: being a fuzzy classification, it avoids the problems due to a hard classification, such as the classification cube presented in the first part of the article. The PCA+UFP method can be easily applied to different datasets: it does not rely on the nature of the data, and for this reason it can be successfully employed with other observables (magnitudes, colours) or derived properties (masses, luminosities, SFRs, etc.). The agreement between the two classification cluster definitions is very high.
``Early'' and ``late'' type galaxies are well defined by the spectral, photometric and morphological properties, both when considering them separately and then combining the classifications (classification cube) and when treating them as a whole (PCA+UFP cluster analysis). Differences arise in the definition of outliers: the classification cube is much more sensitive to single measurement errors or misclassifications in one property than the PCA+UFP cluster analysis, in which errors are ``averaged out'' during the process. This method allowed us to observe the \emph{downsizing} effect taking place in the PC spaces: the migration from the blue cloud towards the red clump happens at higher redshifts for galaxies of larger mass. The determination of the transition mass $M_{\mathrm{cross}}$ is in significant agreement with other values in the literature.
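The contrast between hard and fuzzy classification drawn in this abstract can be illustrated with a small Python sketch. It uses standard fuzzy c-means as a stand-in, since the abstract does not detail the Unsupervised Fuzzy Partition method, and it works on one-dimensional values that can be thought of as scores along a first principal component (the PCA step itself is omitted). All names, the fuzzifier m = 2, and the toy data are illustrative assumptions, not the thesis's implementation.

```python
def fuzzy_memberships(points, centers, m=2.0):
    """Membership degree of each 1-D point in each cluster (each row sums to 1)."""
    exponent = 2.0 / (m - 1.0)
    memberships = []
    for x in points:
        dists = [abs(x - c) for c in centers]
        if 0.0 in dists:
            # A point sitting exactly on a center belongs fully to that cluster.
            hit = dists.index(0.0)
            row = [1.0 if j == hit else 0.0 for j in range(len(centers))]
        else:
            row = [
                1.0 / sum((d_j / d_k) ** exponent for d_k in dists)
                for d_j in dists
            ]
        memberships.append(row)
    return memberships


def fuzzy_c_means(points, initial_centers, m=2.0, n_iter=50):
    """Alternate membership and center updates for a fixed number of iterations."""
    centers = list(initial_centers)
    for _ in range(n_iter):
        u = fuzzy_memberships(points, centers, m)
        centers = [
            sum(u[i][j] ** m * x for i, x in enumerate(points))
            / sum(u[i][j] ** m for i in range(len(points)))
            for j in range(len(centers))
        ]
    return centers, fuzzy_memberships(points, centers, m)


# Toy scores: a "blue" group near -2, a "red" group near +2, one object between.
scores = [-2.1, -1.9, -2.0, 0.0, 1.9, 2.0, 2.1]
centers, u = fuzzy_c_means(scores, initial_centers=[-1.0, 1.0])
print(centers)
print(u[3])  # memberships of the intermediate object
```

A hard classifier would have to place the intermediate object in one of the two groups, whereas the membership row keeps its ambiguity explicit.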
Abstract:
The literature offers different ways to perform cluster analysis of categorical data, and the choice among them is strongly related to the aim of the researcher, if we do not take into account time and economic constraints. The main approaches to clustering are usually divided into model-based and distance-based methods: the former assume that objects belonging to the same class are similar in the sense that their observed values come from the same probability distribution, whose parameters are unknown and need to be estimated; the latter evaluate distances among objects by a defined dissimilarity measure and, based on it, allocate units to the closest group. In clustering, one may be interested in the classification of similar objects into groups, or one may be interested in finding observations that come from the same true homogeneous distribution. But do both of these aims lead to the same clustering? And how good are clustering methods designed to fulfil one of these aims in terms of the other? To answer these questions, two approaches, namely a latent class model (a mixture of multinomial distributions) and partitioning around medoids, are evaluated and compared by the Adjusted Rand Index, Average Silhouette Width and Pearson-Gamma indices in a fairly wide simulation study. Simulation outcomes are plotted in two-dimensional graphs via Multidimensional Scaling; the size of each point is proportional to the number of overlapping points, and different colours are used according to cluster membership.
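As an illustration of one of the comparison criteria named in this abstract, the following Python sketch computes the Adjusted Rand Index between two partitions of the same units from their contingency table. It follows the standard pair-counting definition; the function name, the handling of degenerate cases, and the toy labels are assumptions made for the example and are not taken from the thesis.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_a, labels_b):
    """Adjusted Rand Index between two partitions of the same units."""
    if len(labels_a) != len(labels_b):
        raise ValueError("both partitions must label the same units")
    n = len(labels_a)
    if n < 2:
        return 1.0  # convention assumed here: nothing to disagree on

    # Contingency table cells and its row/column margins.
    cell_counts = Counter(zip(labels_a, labels_b))
    row_counts = Counter(labels_a)
    col_counts = Counter(labels_b)

    sum_cells = sum(comb(c, 2) for c in cell_counts.values())
    sum_rows = sum(comb(c, 2) for c in row_counts.values())
    sum_cols = sum(comb(c, 2) for c in col_counts.values())

    expected = sum_rows * sum_cols / comb(n, 2)
    max_index = (sum_rows + sum_cols) / 2
    if max_index == expected:
        # Degenerate case, e.g. both partitions put every unit in one cluster.
        return 1.0
    return (sum_cells - expected) / (max_index - expected)


# Hypothetical labels from a model-based and a distance-based clustering.
model_based = [0, 0, 0, 1, 1, 1]
distance_based = [1, 1, 0, 0, 0, 0]
print(adjusted_rand_index(model_based, distance_based))
```

The index does not depend on how clusters are numbered, so it can compare a latent class solution with a medoid-based one directly, or either of them with the true simulated classes.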
Abstract:
In the last few decades, bioinformatics has played a fundamental role in making sense of the huge amount of data produced. Once the complete sequence of a genome has been obtained, learning as much as possible about its coding regions becomes the crucial problem. Protein sequence annotation is challenging and, due to the size of the problem, only computational approaches can provide a feasible solution. As recently pointed out by the Critical Assessment of Function Annotations (CAFA), the most accurate methods are those based on the transfer-by-homology approach, and the most incisive contribution is given by cross-genome comparisons. This thesis describes a non-hierarchical sequence clustering method for automatic large-scale protein annotation, called “The Bologna Annotation Resource Plus” (BAR+). The method is based on an all-against-all alignment of more than 13 million protein sequences characterized by a very stringent metric. BAR+ can safely transfer functional features (Gene Ontology and Pfam terms) inside clusters by means of a statistical validation, even in the case of multi-domain proteins. Within BAR+ clusters it is also possible to transfer the three-dimensional structure (when a template is available). This is made possible by cluster-specific HMM profiles that can be used to calculate reliable template-to-target alignments even in the case of distantly related proteins (sequence identity < 30%). Other BAR+ based applications have been developed during my doctorate, including the prediction of magnesium binding sites in human proteins, the classification of the ABC transporters superfamily, and the functional prediction (GO terms) of the CAFA targets. Remarkably, in the CAFA assessment, BAR+ placed among the ten most accurate methods. At present, BAR+ is freely available as a web server for functional and structural protein sequence annotation at http://bar.biocomp.unibo.it/bar2.0.
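One way to picture a non-hierarchical clustering built on all-against-all alignments is sketched below in Python. It assumes that two sequences are linked when their pairwise alignment passes an identity and a coverage threshold, and that clusters are the connected components of the resulting graph. The abstract only says the metric is "very stringent", so the threshold values, the tuple layout of the alignment hits, and all function names are hypothetical placeholders rather than the BAR+ implementation.

```python
def cluster_by_alignment(sequence_ids, alignments,
                         min_identity=0.4, min_coverage=0.9):
    """Group sequences into connected components of the 'passes thresholds' graph.

    alignments: iterable of (id_a, id_b, identity, coverage) tuples,
    with identity and coverage expressed as fractions in [0, 1].
    """
    parent = {s: s for s in sequence_ids}

    def find(s):
        # Union-find lookup with path halving.
        while parent[s] != s:
            parent[s] = parent[parent[s]]
            s = parent[s]
        return s

    for id_a, id_b, identity, coverage in alignments:
        if identity >= min_identity and coverage >= min_coverage:
            root_a, root_b = find(id_a), find(id_b)
            if root_a != root_b:
                parent[root_b] = root_a

    clusters = {}
    for s in sequence_ids:
        clusters.setdefault(find(s), set()).add(s)
    return sorted(clusters.values(), key=lambda members: sorted(members))


# Made-up alignment hits for five hypothetical sequences.
ids = ["P1", "P2", "P3", "P4", "P5"]
hits = [
    ("P1", "P2", 0.55, 0.95),
    ("P2", "P3", 0.45, 0.92),
    ("P3", "P4", 0.80, 0.50),  # high identity but low coverage: no link
    ("P4", "P5", 0.35, 0.99),  # high coverage but low identity: no link
]
print(cluster_by_alignment(ids, hits))
```

Requiring high coverage as well as high identity keeps a shared single domain from linking otherwise unrelated multi-domain proteins, which is one plausible reading of why a stringent metric matters for safe annotation transfer.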
Abstract:
Intelligent Transport Systems (ITS) consist in the application of ICT to transport in order to offer new and improved services for the mobility of people and freight. While using ITS, travellers produce large quantities of data that can be collected and analysed to study their behaviour and to provide information to decision makers and planners. The thesis proposes innovative deployments of classification algorithms for Intelligent Transport Systems with the aim of supporting decisions on traffic rerouting, bus transport demand and the behaviour of two-wheeled vehicles. The first part of this work provides an overview and a classification of a selection of clustering algorithms that can be implemented for the analysis of ITS data. The first contribution of this thesis is an innovative use of the agglomerative hierarchical clustering algorithm to classify similar travels in terms of their origin and destination, together with a proposed methodology to analyse drivers’ route choice behaviour using GPS coordinates and optimal alternatives. The clusters of repetitive travels made by a sample of drivers are then analysed to compare observed route choices with the modelled alternatives. The results of the analysis show that drivers select routes that are more reliable but more expensive in terms of travel time. Next, different types of users of a service that provides information on real-time bus arrivals at stops are classified using Support Vector Machines. The results show that the classification of different types of bus transport users can be used to update or complement the census on bus transport flows. Finally, the problem of classifying accidents involving two-wheeled vehicles is presented, together with possible future applications of clustering methodologies aimed at identifying and classifying the different types of accidents.
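The idea of grouping repetitive travels by origin and destination with agglomerative hierarchical clustering can be sketched in a few lines of Python. The sketch assumes planar coordinates in kilometres, a trip-to-trip distance equal to the sum of the origin and destination distances, complete linkage, and a fixed merge threshold. None of these choices is specified in the abstract, and the function names and sample trips are hypothetical.

```python
from itertools import combinations
from math import hypot


def trip_distance(trip_1, trip_2):
    """Origin-to-origin plus destination-to-destination distance (assumed metric)."""
    (o1, d1), (o2, d2) = trip_1, trip_2
    return (hypot(o1[0] - o2[0], o1[1] - o2[1])
            + hypot(d1[0] - d2[0], d1[1] - d2[1]))


def agglomerate_trips(trips, max_distance):
    """Complete-linkage agglomerative clustering, stopped at max_distance."""
    clusters = [[i] for i in range(len(trips))]

    def linkage(c1, c2):
        # Complete linkage: the largest distance between members of two clusters.
        return max(trip_distance(trips[i], trips[j]) for i in c1 for j in c2)

    while len(clusters) > 1:
        (a, b), best = min(
            (((a, b), linkage(clusters[a], clusters[b]))
             for a, b in combinations(range(len(clusters)), 2)),
            key=lambda item: item[1],
        )
        if best > max_distance:
            break
        # a < b always holds here, so deleting b leaves index a valid.
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return [sorted(c) for c in clusters]


# Made-up trips as ((origin_x, origin_y), (dest_x, dest_y)) in km:
# three similar outbound trips and two similar return trips.
trips = [
    ((0.0, 0.0), (5.0, 5.0)),
    ((0.1, 0.0), (5.0, 5.1)),
    ((5.0, 5.0), (0.0, 0.0)),
    ((0.0, 0.1), (4.9, 5.0)),
    ((5.1, 5.0), (0.0, 0.1)),
]
print(agglomerate_trips(trips, max_distance=0.5))
```

With complete linkage, every pair of trips inside a cluster is at most `max_distance` apart, so the threshold can be read directly as a tolerance on how far two "same" trips may start and end from each other. Real GPS traces would first need projecting from latitude and longitude to planar coordinates.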