5 results for embeddings

in AMS Tesi di Dottorato - Alm@DL - Università di Bologna


Relevance:

20.00%

Publisher:

Abstract:

The development of Next Generation Sequencing has moved Biology into the Big Data era. The ever-increasing gap between proteins with known sequences and those with a complete functional annotation requires computational methods for automatic structural and functional annotation. My research has focused on proteins and has so far led to the development of three novel tools, DeepREx, E-SNPs&GO and ISPRED-SEQ, based on Machine and Deep Learning approaches. DeepREx computes the solvent exposure of residues in a protein chain. This problem is relevant for the definition of structural constraints regarding the possible folding of the protein. DeepREx exploits Long Short-Term Memory layers to capture residue-level interactions between positions distant in the sequence, achieving state-of-the-art performance. With DeepREx, I conducted a large-scale analysis investigating the relationship between the solvent exposure of a residue and its probability of being pathogenic upon mutation. E-SNPs&GO predicts the pathogenicity of a Single Residue Variation. Variations occurring in a protein sequence can have different effects, possibly leading to the onset of diseases. E-SNPs&GO exploits protein embeddings generated by two novel Protein Language Models (PLMs), as well as a new way of representing functional information coming from the Gene Ontology. The method achieves state-of-the-art performance and is extremely time-efficient when compared to traditional approaches. ISPRED-SEQ predicts the presence of Protein-Protein Interaction sites in a protein sequence. Knowing how a protein interacts with other molecules is crucial for accurate functional characterization. ISPRED-SEQ exploits a convolutional layer to parse local context after embedding the protein sequence with two novel PLMs, greatly surpassing the current state of the art. All methods are published in international journals and are available as user-friendly web servers.
They have been developed keeping in mind standard guidelines for FAIRness (FAIR: Findable, Accessible, Interoperable, Reusable) and are integrated into the public collection of tools provided by ELIXIR, the European infrastructure for Bioinformatics.

Relevance:

10.00%

Publisher:

Abstract:

Social interactions have been the focus of social science research for a century, but their study has recently been revolutionized by novel data sources and by methods from computer science, network science, and complex systems science. The study of social interactions is crucial for understanding complex societal behaviours. Social interactions are naturally represented as networks, which have emerged as a unifying mathematical language to understand structural and dynamical aspects of socio-technical systems. Networks are, however, high-dimensional objects, especially when considering the scales of real-world systems and the need to model the temporal dimension. Hence the study of empirical data from social systems is challenging from both a conceptual and a computational standpoint. A possible approach to tackling such a challenge is to use dimensionality reduction techniques that represent network entities in a low-dimensional feature space, preserving some desired properties of the original data. Low-dimensional vector space representations, also known as network embeddings, have been extensively studied, including as a way to feed network data to machine learning algorithms. Network embeddings were initially developed for static networks and then extended to incorporate temporal network data. We focus on dimensionality reduction techniques for time-resolved social interaction data modelled as temporal networks. We introduce a novel embedding technique that models the temporal and structural similarities of events rather than nodes. Using empirical data on social interactions, we show that this representation captures information relevant for the study of dynamical processes unfolding over the network, such as epidemic spreading. We then turn to another large-scale dataset on social interactions: a popular Web-based crowdfunding platform.
We show that tensor-based representations of the data and dimensionality reduction techniques such as tensor factorization allow us to uncover the structural and temporal aspects of the system and to relate them to geographic and temporal activity patterns.

Relevance:

10.00%

Publisher:

Abstract:

Our objective in this thesis is to study the pseudo-metric and topological structure of the space of group equivariant non-expansive operators (GENEOs). We introduce the notions of compactification of a perception pair, collectionwise surjectivity, and compactification of a space of GENEOs. We obtain some compactification results for perception pairs and the space of GENEOs. We show that when the data spaces are totally bounded and the common domains are endowed with metric structures, the perception pairs and every collectionwise surjective space of GENEOs can be embedded isometrically into compact ones through compatible embeddings. An important part of the study of the topology of the space of GENEOs is to populate it in a rich manner. We introduce the notion of a generalized permutant and show that this concept too, like that of a permutant, is useful in defining new GENEOs. We define the analogues of some of the aforementioned concepts in a graph theoretic setting, enabling us to use the power of the theory of GENEOs for the study of graphs in an efficient way. We define the notions of a graph perception pair, graph permutant, and a graph GENEO. We develop two models for the theory of graph GENEOs. The first model addresses the case of graphs having weights assigned to their vertices, while the second one addresses graphs with weights on the edges. We prove some new results in the proposed theory of graph GENEOs and exhibit the power of our models by describing their applications to the structural study of simple graphs. We introduce the concept of a graph permutant and show that this concept can be used to define new graph GENEOs between distinct graph perception pairs, thereby enabling us to populate the space of graph GENEOs in a rich manner and shed more light on its structure.

Relevance:

10.00%

Publisher:

Abstract:

The recent widespread use of social media platforms and web services has led to a vast amount of behavioral data that can be used to model socio-technical systems. A significant part of this data can be represented as graphs or networks, which have become the prevalent mathematical framework for studying the structure and the dynamics of complex interacting systems. However, analyzing and understanding these data presents new challenges due to their increasing complexity and diversity. For instance, the characterization of real-world networks must account for their temporal dimension and incorporate higher-order interactions beyond the traditional pairwise formalism. The ongoing growth of AI has led to the integration of traditional graph mining techniques with representation learning and low-dimensional embeddings of networks to address current challenges. These methods capture the underlying similarities and geometry of graph-shaped data, generating latent representations that enable the resolution of various tasks, such as link prediction, node classification, and graph clustering. As these techniques gain popularity, there is also a growing concern about their responsible use. In particular, there has been an increased emphasis on addressing the limitations of interpretability in graph representation learning. This thesis contributes to the advancement of knowledge in the field of graph representation learning and has potential applications in a wide range of complex systems domains. We initially focus on forecasting problems related to face-to-face contact networks with time-varying graph embeddings. Then, we study hyperedge prediction and reconstruction with simplicial complex embeddings. Finally, we analyze the problem of interpreting latent dimensions in node embeddings for graphs.
The proposed models are extensively evaluated in multiple experimental settings and the results demonstrate their effectiveness and reliability, achieving state-of-the-art performance and providing valuable insights into the properties of the learned representations.

Relevance:

10.00%

Publisher:

Abstract:

This thesis investigates how individuals can develop, exercise, and maintain autonomy and freedom in the presence of information technology. It is particularly interested in how information technology can impose autonomy constraints. The first part identifies a problem with current autonomy discourse: there is no agreed-upon object of reference when bemoaning loss of or risk to an individual’s autonomy. Here, the thesis introduces a pragmatic conceptual framework to classify autonomy constraints. In essence, the proposed framework divides autonomy into three categories: intrinsic autonomy, relational autonomy and informational autonomy. The second part of the thesis investigates the role of information technology in enabling and facilitating autonomy constraints. The analysis identifies eleven characteristics of information technology, as it is embedded in society, so-called vectors of influence, that constitute a substantial risk to an individual’s autonomy. These vectors are assigned to three sets that correspond to the general sphere of the information transfer process to which they can be attributed, namely domain-specific vectors, agent-specific vectors and information recipient-specific vectors. The third part of the thesis investigates selected ethical and legal implications of autonomy constraints imposed by information technology. It shows the utility of the theoretical frameworks introduced earlier in the thesis when conducting an ethical analysis of autonomy-constraining technology. It also traces the concept of autonomy in the European Data Laws and investigates the impact of cultural embeddings of individuals on efforts to safeguard autonomy, showing intercultural flashpoints of autonomy differences. In view of this, the thesis approaches the exercise and constraint of autonomy in the presence of information technology systems holistically.
It contributes to establishing a common understanding of (intuitive) terminology and concepts, connects this to current phenomena arising out of ever-increasing interconnectivity and computational power, and helps operationalize the protection of autonomy through application of the proposed frameworks.