9 resultados para Structure learning

em AMS Tesi di Dottorato - Alm@DL - Università di Bologna


Relevância:

40.00% 40.00%

Publicador:

Resumo:

Different types of proteins exist with diverse functions that are essential for living organisms. An important class of proteins is represented by transmembrane proteins which are specifically designed to be inserted into biological membranes and devised to perform very important functions in the cell such as cell communication and active transport across the membrane. Transmembrane β-barrels (TMBBs) are a sub-class of membrane proteins largely under-represented in structure databases because of the extreme difficulty in experimental structure determination. For this reason, computational tools that are able to predict the structure of TMBBs are needed. In this thesis, two computational problems related to TMBBs were addressed: the detection of TMBBs in large datasets of proteins and the prediction of the topology of TMBB proteins. Firstly, a method for TMBB detection was presented based on a novel neural network framework for variable-length sequence classification. The proposed approach was validated on a non-redundant dataset of proteins. Furthermore, we carried-out genome-wide detection using the entire Escherichia coli proteome. In both experiments, the method significantly outperformed other existing state-of-the-art approaches, reaching very high PPV (92%) and MCC (0.82). Secondly, a method was also introduced for TMBB topology prediction. The proposed approach is based on grammatical modelling and probabilistic discriminative models for sequence data labeling. The method was evaluated using a newly generated dataset of 38 TMBB proteins obtained from high-resolution data in the PDB. Results have shown that the model is able to correctly predict topologies of 25 out of 38 protein chains in the dataset. When tested on previously released datasets, the performances of the proposed approach were measured as comparable or superior to the current state-of-the-art of TMBB topology prediction.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

The wide use of e-technologies represents a great opportunity for underserved segments of the population, especially with the aim of reintegrating excluded individuals back into society through education. This is particularly true for people with different types of disabilities who may have difficulties while attending traditional on-site learning programs that are typically based on printed learning resources. The creation and provision of accessible e-learning contents may therefore become a key factor in enabling people with different access needs to enjoy quality learning experiences and services. Another e-learning challenge is represented by m-learning (which stands for mobile learning), which is emerging as a consequence of mobile terminals diffusion and provides the opportunity to browse didactical materials everywhere, outside places that are traditionally devoted to education. Both such situations share the need to access materials in limited conditions and collide with the growing use of rich media in didactical contents, which are designed to be enjoyed without any restriction. Nowadays, Web-based teaching makes great use of multimedia technologies, ranging from Flash animations to prerecorded video-lectures. Rich media in e-learning can offer significant potential in enhancing the learning environment, through helping to increase access to education, enhance the learning experience and support multiple learning styles. Moreover, they can often be used to improve the structure of Web-based courses. These highly variegated and structured contents may significantly improve the quality and the effectiveness of educational activities for learners. For example, rich media contents allow us to describe complex concepts and process flows. Audio and video elements may be utilized to add a “human touch” to distance-learning courses. Finally, real lectures may be recorded and distributed to integrate or enrich on line materials. A confirmation of the advantages of these approaches can be seen in the exponential growth of video-lecture availability on the net, due to the ease of recording and delivering activities which take place in a traditional classroom. Furthermore, the wide use of assistive technologies for learners with disabilities injects new life into e-learning systems. E-learning allows distance and flexible educational activities, thus helping disabled learners to access resources which would otherwise present significant barriers for them. For instance, students with visual impairments have difficulties in reading traditional visual materials, deaf learners have trouble in following traditional (spoken) lectures, people with motion disabilities have problems in attending on-site programs. As already mentioned, the use of wireless technologies and pervasive computing may really enhance the educational learner experience by offering mobile e-learning services that can be accessed by handheld devices. This new paradigm of educational content distribution maximizes the benefits for learners since it enables users to overcome constraints imposed by the surrounding environment. While certainly helpful for users without disabilities, we believe that the use of newmobile technologies may also become a fundamental tool for impaired learners, since it frees them from sitting in front of a PC. In this way, educational activities can be enjoyed by all the users, without hindrance, thus increasing the social inclusion of non-typical learners. While the provision of fully accessible and portable video-lectures may be extremely useful for students, it is widely recognized that structuring and managing rich media contents for mobile learning services are complex and expensive tasks. Indeed, major difficulties originate from the basic need to provide a textual equivalent for each media resource composing a rich media Learning Object (LO). Moreover, tests need to be carried out to establish whether a given LO is fully accessible to all kinds of learners. Unfortunately, both these tasks are truly time-consuming processes, depending on the type of contents the teacher is writing and on the authoring tool he/she is using. Due to these difficulties, online LOs are often distributed as partially accessible or totally inaccessible content. Bearing this in mind, this thesis aims to discuss the key issues of a system we have developed to deliver accessible, customized or nomadic learning experiences to learners with different access needs and skills. To reduce the risk of excluding users with particular access capabilities, our system exploits Learning Objects (LOs) which are dynamically adapted and transcoded based on the specific needs of non-typical users and on the barriers that they can encounter in the environment. The basic idea is to dynamically adapt contents, by selecting them from a set of media resources packaged in SCORM-compliant LOs and stored in a self-adapting format. The system schedules and orchestrates a set of transcoding processes based on specific learner needs, so as to produce a customized LO that can be fully enjoyed by any (impaired or mobile) student.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

The goal of this thesis work is to develop a computational method based on machine learning techniques for predicting disulfide-bonding states of cysteine residues in proteins, which is a sub-problem of a bigger and yet unsolved problem of protein structure prediction. Improvement in the prediction of disulfide bonding states of cysteine residues will help in putting a constraint in the three dimensional (3D) space of the respective protein structure, and thus will eventually help in the prediction of 3D structure of proteins. Results of this work will have direct implications in site-directed mutational studies of proteins, proteins engineering and the problem of protein folding. We have used a combination of Artificial Neural Network (ANN) and Hidden Markov Model (HMM), the so-called Hidden Neural Network (HNN) as a machine learning technique to develop our prediction method. By using different global and local features of proteins (specifically profiles, parity of cysteine residues, average cysteine conservation, correlated mutation, sub-cellular localization, and signal peptide) as inputs and considering Eukaryotes and Prokaryotes separately we have reached to a remarkable accuracy of 94% on cysteine basis for both Eukaryotic and Prokaryotic datasets, and an accuracy of 90% and 93% on protein basis for Eukaryotic dataset and Prokaryotic dataset respectively. These accuracies are best so far ever reached by any existing prediction methods, and thus our prediction method has outperformed all the previously developed approaches and therefore is more reliable. Most interesting part of this thesis work is the differences in the prediction performances of Eukaryotes and Prokaryotes at the basic level of input coding when ‘profile’ information was given as input to our prediction method. And one of the reasons for this we discover is the difference in the amino acid composition of the local environment of bonded and free cysteine residues in Eukaryotes and Prokaryotes. Eukaryotic bonded cysteine examples have a ‘symmetric-cysteine-rich’ environment, where as Prokaryotic bonded examples lack it.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

In many application domains data can be naturally represented as graphs. When the application of analytical solutions for a given problem is unfeasible, machine learning techniques could be a viable way to solve the problem. Classical machine learning techniques are defined for data represented in a vectorial form. Recently some of them have been extended to deal directly with structured data. Among those techniques, kernel methods have shown promising results both from the computational complexity and the predictive performance point of view. Kernel methods allow to avoid an explicit mapping in a vectorial form relying on kernel functions, which informally are functions calculating a similarity measure between two entities. However, the definition of good kernels for graphs is a challenging problem because of the difficulty to find a good tradeoff between computational complexity and expressiveness. Another problem we face is learning on data streams, where a potentially unbounded sequence of data is generated by some sources. There are three main contributions in this thesis. The first contribution is the definition of a new family of kernels for graphs based on Directed Acyclic Graphs (DAGs). We analyzed two kernels from this family, achieving state-of-the-art results from both the computational and the classification point of view on real-world datasets. The second contribution consists in making the application of learning algorithms for streams of graphs feasible. Moreover,we defined a principled way for the memory management. The third contribution is the application of machine learning techniques for structured data to non-coding RNA function prediction. In this setting, the secondary structure is thought to carry relevant information. However, existing methods considering the secondary structure have prohibitively high computational complexity. We propose to apply kernel methods on this domain, obtaining state-of-the-art results.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

The advent of omic data production has opened many new perspectives in the quest for modelling complexity in biophysical systems. With the capability of characterizing a complex organism through the patterns of its molecular states, observed at different levels through various omics, a new paradigm of investigation is arising. In this thesis, we investigate the links between perturbations of the human organism, described as the ensemble of crosstalk of its molecular states, and health. Machine learning plays a key role within this picture, both in omic data analysis and model building. We propose and discuss different frameworks developed by the author using machine learning for data reduction, integration, projection on latent features, pattern analysis, classification and clustering of omic data, with a focus on 1H NMR metabolomic spectral data. The aim is to link different levels of omic observations of molecular states, from nanoscale to macroscale, to study perturbations such as diseases and diet interpreted as changes in molecular patterns. The first part of this work focuses on the fingerprinting of diseases, linking cellular and systemic metabolomics with genomic to asses and predict the downstream of perturbations all the way down to the enzymatic network. The second part is a set of frameworks and models, developed with 1H NMR metabolomic at its core, to study the exposure of the human organism to diet and food intake in its full complexity, from epidemiological data analysis to molecular characterization of food structure.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

The Standard Model (SM) of particle physics predicts the existence of a Higgs field responsible for the generation of particles' mass. However, some aspects of this theory remain unsolved, supposing the presence of new physics Beyond the Standard Model (BSM) with the production of new particles at a higher energy scale compared to the current experimental limits. The search for additional Higgs bosons is, in fact, predicted by theoretical extensions of the SM including the Minimal Supersymmetry Standard Model (MSSM). In the MSSM, the Higgs sector consists of two Higgs doublets, resulting in five physical Higgs particles: two charged bosons $H^{\pm}$, two neutral scalars $h$ and $H$, and one pseudoscalar $A$. The work presented in this thesis is dedicated to the search of neutral non-Standard Model Higgs bosons decaying to two muons in the model independent MSSM scenario. Proton-proton collision data recorded by the CMS experiment at the CERN LHC at a center-of-mass energy of 13 TeV are used, corresponding to an integrated luminosity of $35.9\ \text{fb}^{-1}$. Such search is sensitive to neutral Higgs bosons produced either via gluon fusion process or in association with a $\text{b}\bar{\text{b}}$ quark pair. The extensive usage of Machine and Deep Learning techniques is a fundamental element in the discrimination between signal and background simulated events. A new network structure called parameterised Neural Network (pNN) has been implemented, replacing a whole set of single neural networks trained at a specific mass hypothesis value with a single neural network able to generalise well and interpolate in the entire mass range considered. The results of the pNN signal/background discrimination are used to set a model independent 95\% confidence level expected upper limit on the production cross section times branching ratio, for a generic $\phi$ boson decaying into a muon pair in the 130 to 1000 GeV range.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

This dissertation contributes to the scholarly debate on temporary teams by exploring team interactions and boundaries.The fundamental challenge in temporary teams originates from temporary participation in the teams. First, as participants join the team for a short period of time, there is not enough time to build trust, share understanding, and have effective interactions. Consequently, team outputs and practices built on team interactions become vulnerable. Secondly, as team participants move on and off the teams, teams’ boundaries become blurred over time. It leads to uncertainty among team participants and leaders about who is/is not identified as a team member causing collective disagreement within the team. Focusing on the above mentioned challenges, we conducted this research in healthcare organisations since the use of temporary teams in healthcare and hospital setting is prevalent. In particular, we focused on orthopaedic teams that provide personalised treatments for patients using 3D printing technology. Qualitative and quantitative data were collected using interviews, observations, questionnaires and archival data at Rizzoli Orthopaedic Institute, Bologna, Italy. This study provides the following research outputs. The first is a conceptual study that explores temporary teams’ literature using bibliometric analysis and systematic literature review to highlight research gaps. The second paper qualitatively studies temporary relationships within the teams by collecting data using group interviews and observations. The results highlighted the role of short-term dyadic relationships as a ground to share and transfer knowledge at the team level. Moreover, hierarchical structure of the teams facilitates knowledge sharing by supporting dyadic relationships within and beyond the team meetings. The third paper investigates impact of blurred boundaries on temporary teams’ performance. Using quantitative data collected through questionnaires and archival data, we concluded that boundary blurring in terms of fluidity, overlap and dispersion differently impacts team performance at high and low levels of task complexity.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

The development of Next Generation Sequencing promotes Biology in the Big Data era. The ever-increasing gap between proteins with known sequences and those with a complete functional annotation requires computational methods for automatic structure and functional annotation. My research has been focusing on proteins and led so far to the development of three novel tools, DeepREx, E-SNPs&GO and ISPRED-SEQ, based on Machine and Deep Learning approaches. DeepREx computes the solvent exposure of residues in a protein chain. This problem is relevant for the definition of structural constraints regarding the possible folding of the protein. DeepREx exploits Long Short-Term Memory layers to capture residue-level interactions between positions distant in the sequence, achieving state-of-the-art performances. With DeepRex, I conducted a large-scale analysis investigating the relationship between solvent exposure of a residue and its probability to be pathogenic upon mutation. E-SNPs&GO predicts the pathogenicity of a Single Residue Variation. Variations occurring on a protein sequence can have different effects, possibly leading to the onset of diseases. E-SNPs&GO exploits protein embeddings generated by two novel Protein Language Models (PLMs), as well as a new way of representing functional information coming from the Gene Ontology. The method achieves state-of-the-art performances and is extremely time-efficient when compared to traditional approaches. ISPRED-SEQ predicts the presence of Protein-Protein Interaction sites in a protein sequence. Knowing how a protein interacts with other molecules is crucial for accurate functional characterization. ISPRED-SEQ exploits a convolutional layer to parse local context after embedding the protein sequence with two novel PLMs, greatly surpassing the current state-of-the-art. All methods are published in international journals and are available as user-friendly web servers. They have been developed keeping in mind standard guidelines for FAIRness (FAIR: Findable, Accessible, Interoperable, Reusable) and are integrated into the public collection of tools provided by ELIXIR, the European infrastructure for Bioinformatics.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

The recent widespread use of social media platforms and web services has led to a vast amount of behavioral data that can be used to model socio-technical systems. A significant part of this data can be represented as graphs or networks, which have become the prevalent mathematical framework for studying the structure and the dynamics of complex interacting systems. However, analyzing and understanding these data presents new challenges due to their increasing complexity and diversity. For instance, the characterization of real-world networks includes the need of accounting for their temporal dimension, together with incorporating higher-order interactions beyond the traditional pairwise formalism. The ongoing growth of AI has led to the integration of traditional graph mining techniques with representation learning and low-dimensional embeddings of networks to address current challenges. These methods capture the underlying similarities and geometry of graph-shaped data, generating latent representations that enable the resolution of various tasks, such as link prediction, node classification, and graph clustering. As these techniques gain popularity, there is even a growing concern about their responsible use. In particular, there has been an increased emphasis on addressing the limitations of interpretability in graph representation learning. This thesis contributes to the advancement of knowledge in the field of graph representation learning and has potential applications in a wide range of complex systems domains. We initially focus on forecasting problems related to face-to-face contact networks with time-varying graph embeddings. Then, we study hyperedge prediction and reconstruction with simplicial complex embeddings. Finally, we analyze the problem of interpreting latent dimensions in node embeddings for graphs. The proposed models are extensively evaluated in multiple experimental settings and the results demonstrate their effectiveness and reliability, achieving state-of-the-art performances and providing valuable insights into the properties of the learned representations.