22 resultados para Machine de Boltzmann restreinte

em AMS Tesi di Dottorato - Alm@DL - Università di Bologna


Relevância:

20.00% 20.00%

Publicador:

Resumo:

The goal of this thesis work is to develop a computational method based on machine learning techniques for predicting disulfide-bonding states of cysteine residues in proteins, which is a sub-problem of a bigger and yet unsolved problem of protein structure prediction. Improvement in the prediction of disulfide bonding states of cysteine residues will help in putting a constraint in the three dimensional (3D) space of the respective protein structure, and thus will eventually help in the prediction of 3D structure of proteins. Results of this work will have direct implications in site-directed mutational studies of proteins, proteins engineering and the problem of protein folding. We have used a combination of Artificial Neural Network (ANN) and Hidden Markov Model (HMM), the so-called Hidden Neural Network (HNN) as a machine learning technique to develop our prediction method. By using different global and local features of proteins (specifically profiles, parity of cysteine residues, average cysteine conservation, correlated mutation, sub-cellular localization, and signal peptide) as inputs and considering Eukaryotes and Prokaryotes separately we have reached to a remarkable accuracy of 94% on cysteine basis for both Eukaryotic and Prokaryotic datasets, and an accuracy of 90% and 93% on protein basis for Eukaryotic dataset and Prokaryotic dataset respectively. These accuracies are best so far ever reached by any existing prediction methods, and thus our prediction method has outperformed all the previously developed approaches and therefore is more reliable. Most interesting part of this thesis work is the differences in the prediction performances of Eukaryotes and Prokaryotes at the basic level of input coding when ‘profile’ information was given as input to our prediction method. And one of the reasons for this we discover is the difference in the amino acid composition of the local environment of bonded and free cysteine residues in Eukaryotes and Prokaryotes. Eukaryotic bonded cysteine examples have a ‘symmetric-cysteine-rich’ environment, where as Prokaryotic bonded examples lack it.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Quality control of medical radiological systems is of fundamental importance, and requires efficient methods for accurately determine the X-ray source spectrum. Straightforward measurements of X-ray spectra in standard operating require the limitation of the high photon flux, and therefore the measure has to be performed in a laboratory. However, the optimal quality control requires frequent in situ measurements which can be only performed using a portable system. To reduce the photon flux by 3 magnitude orders an indirect technique based on the scattering of the X-ray source beam by a solid target is used. The measured spectrum presents a lack of information because of transport and detection effects. The solution is then unfolded by solving the matrix equation that represents formally the scattering problem. However, the algebraic system is ill-conditioned and, therefore, it is not possible to obtain a satisfactory solution. Special strategies are necessary to circumvent the ill-conditioning. Numerous attempts have been done to solve this problem by using purely mathematical methods. In this thesis, a more physical point of view is adopted. The proposed method uses both the forward and the adjoint solutions of the Boltzmann transport equation to generate a better conditioned linear algebraic system. The procedure has been tested first on numerical experiments, giving excellent results. Then, the method has been verified with experimental measurements performed at the Operational Unit of Health Physics of the University of Bologna. The reconstructed spectra have been compared with the ones obtained with straightforward measurements, showing very good agreement.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Different types of proteins exist with diverse functions that are essential for living organisms. An important class of proteins is represented by transmembrane proteins which are specifically designed to be inserted into biological membranes and devised to perform very important functions in the cell such as cell communication and active transport across the membrane. Transmembrane β-barrels (TMBBs) are a sub-class of membrane proteins largely under-represented in structure databases because of the extreme difficulty in experimental structure determination. For this reason, computational tools that are able to predict the structure of TMBBs are needed. In this thesis, two computational problems related to TMBBs were addressed: the detection of TMBBs in large datasets of proteins and the prediction of the topology of TMBB proteins. Firstly, a method for TMBB detection was presented based on a novel neural network framework for variable-length sequence classification. The proposed approach was validated on a non-redundant dataset of proteins. Furthermore, we carried-out genome-wide detection using the entire Escherichia coli proteome. In both experiments, the method significantly outperformed other existing state-of-the-art approaches, reaching very high PPV (92%) and MCC (0.82). Secondly, a method was also introduced for TMBB topology prediction. The proposed approach is based on grammatical modelling and probabilistic discriminative models for sequence data labeling. The method was evaluated using a newly generated dataset of 38 TMBB proteins obtained from high-resolution data in the PDB. Results have shown that the model is able to correctly predict topologies of 25 out of 38 protein chains in the dataset. When tested on previously released datasets, the performances of the proposed approach were measured as comparable or superior to the current state-of-the-art of TMBB topology prediction.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

In this thesis, a strategy to model the behavior of fluids and their interaction with deformable bodies is proposed. The fluid domain is modeled by using the lattice Boltzmann method, thus analyzing the fluid dynamics by a mesoscopic point of view. It has been proved that the solution provided by this method is equivalent to solve the Navier-Stokes equations for an incompressible flow with a second-order accuracy. Slender elastic structures idealized through beam finite elements are used. Large displacements are accounted for by using the corotational formulation. Structural dynamics is computed by using the Time Discontinuous Galerkin method. Therefore, two different solution procedures are used, one for the fluid domain and the other for the structural part, respectively. These two solvers need to communicate and to transfer each other several information, i.e. stresses, velocities, displacements. In order to guarantee a continuous, effective, and mutual exchange of information, a coupling strategy, consisting of three different algorithms, has been developed and numerically tested. In particular, the effectiveness of the three algorithms is shown in terms of interface energy artificially produced by the approximate fulfilling of compatibility and equilibrium conditions at the fluid-structure interface. The proposed coupled approach is used in order to solve different fluid-structure interaction problems, i.e. cantilever beams immersed in a viscous fluid, the impact of the hull of the ship on the marine free-surface, blood flow in a deformable vessels, and even flapping wings simulating the take-off of a butterfly. The good results achieved in each application highlight the effectiveness of the proposed methodology and of the C++ developed software to successfully approach several two-dimensional fluid-structure interaction problems.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

The research activity focused on the study, design and evaluation of innovative human-machine interfaces based on virtual three-dimensional environments. It is based on the brain electrical activities recorded in real time through the electrical impulses emitted by the brain waves of the user. The achieved target is to identify and sort in real time the different brain states and adapt the interface and/or stimuli to the corresponding emotional state of the user. The setup of an experimental facility based on an innovative experimental methodology for “man in the loop" simulation was established. It allowed involving during pilot training in virtually simulated flights, both pilot and flight examiner, in order to compare the subjective evaluations of this latter to the objective measurements of the brain activity of the pilot. This was done recording all the relevant information versus a time-line. Different combinations of emotional intensities obtained, led to an evaluation of the current situational awareness of the user. These results have a great implication in the current training methodology of the pilots, and its use could be extended as a tool that can improve the evaluation of a pilot/crew performance in interacting with the aircraft when performing tasks and procedures, especially in critical situations. This research also resulted in the design of an interface that adapts the control of the machine to the situation awareness of the user. The new concept worked on, aimed at improving the efficiency between a user and the interface, and gaining capacity by reducing the user’s workload and hence improving the system overall safety. This innovative research combining emotions measured through electroencephalography resulted in a human-machine interface that would have three aeronautical related applications: • An evaluation tool during the pilot training; • An input for cockpit environment; • An adaptation tool of the cockpit automation.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Le tecniche di Machine Learning sono molto utili in quanto consento di massimizzare l’utilizzo delle informazioni in tempo reale. Il metodo Random Forests può essere annoverato tra le tecniche di Machine Learning più recenti e performanti. Sfruttando le caratteristiche e le potenzialità di questo metodo, la presente tesi di dottorato affronta due casi di studio differenti; grazie ai quali è stato possibile elaborare due differenti modelli previsionali. Il primo caso di studio si è incentrato sui principali fiumi della regione Emilia-Romagna, caratterizzati da tempi di risposta molto brevi. La scelta di questi fiumi non è stata casuale: negli ultimi anni, infatti, in detti bacini si sono verificati diversi eventi di piena, in gran parte di tipo “flash flood”. Il secondo caso di studio riguarda le sezioni principali del fiume Po, dove il tempo di propagazione dell’onda di piena è maggiore rispetto ai corsi d’acqua del primo caso di studio analizzato. Partendo da una grande quantità di dati, il primo passo è stato selezionare e definire i dati in ingresso in funzione degli obiettivi da raggiungere, per entrambi i casi studio. Per l’elaborazione del modello relativo ai fiumi dell’Emilia-Romagna, sono stati presi in considerazione esclusivamente i dati osservati; a differenza del bacino del fiume Po in cui ai dati osservati sono stati affiancati anche i dati di previsione provenienti dalla catena modellistica Mike11 NAM/HD. Sfruttando una delle principali caratteristiche del metodo Random Forests, è stata stimata una probabilità di accadimento: questo aspetto è fondamentale sia nella fase tecnica che in fase decisionale per qualsiasi attività di intervento di protezione civile. L'elaborazione dei dati e i dati sviluppati sono stati effettuati in ambiente R. Al termine della fase di validazione, gli incoraggianti risultati ottenuti hanno permesso di inserire il modello sviluppato nel primo caso studio all’interno dell’architettura operativa di FEWS.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Clinical and omics data are a promising field of application for machine learning techniques even though these methods are not yet systematically adopted in healthcare institutions. Despite artificial intelligence has proved successful in terms of prediction of pathologies or identification of their causes, the systematic adoption of these techniques still presents challenging issues due to the peculiarities of the analysed data. The aim of this thesis is to apply machine learning algorithms to both clinical and omics data sets in order to predict a patient's state of health and get better insights on the possible causes of the analysed diseases. In doing so, many of the arising issues when working with medical data will be discussed while possible solutions will be proposed to make machine learning provide feasible results and possibly become an effective and reliable support tool for healthcare systems.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Although the debate of what data science is has a long history and has not reached a complete consensus yet, Data Science can be summarized as the process of learning from data. Guided by the above vision, this thesis presents two independent data science projects developed in the scope of multidisciplinary applied research. The first part analyzes fluorescence microscopy images typically produced in life science experiments, where the objective is to count how many marked neuronal cells are present in each image. Aiming to automate the task for supporting research in the area, we propose a neural network architecture tuned specifically for this use case, cell ResUnet (c-ResUnet), and discuss the impact of alternative training strategies in overcoming particular challenges of our data. The approach provides good results in terms of both detection and counting, showing performance comparable to the interpretation of human operators. As a meaningful addition, we release the pre-trained model and the Fluorescent Neuronal Cells dataset collecting pixel-level annotations of where neuronal cells are located. In this way, we hope to help future research in the area and foster innovative methodologies for tackling similar problems. The second part deals with the problem of distributed data management in the context of LHC experiments, with a focus on supporting ATLAS operations concerning data transfer failures. In particular, we analyze error messages produced by failed transfers and propose a Machine Learning pipeline that leverages the word2vec language model and K-means clustering. This provides groups of similar errors that are presented to human operators as suggestions of potential issues to investigate. The approach is demonstrated on one full day of data, showing promising ability in understanding the message content and providing meaningful groupings, in line with previously reported incidents by human operators.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

In the last decade, manufacturing companies have been facing two significant challenges. First, digitalization imposes adopting Industry 4.0 technologies and allows creating smart, connected, self-aware, and self-predictive factories. Second, the attention on sustainability imposes to evaluate and reduce the impact of the implemented solutions from economic and social points of view. In manufacturing companies, the maintenance of physical assets assumes a critical role. Increasing the reliability and the availability of production systems leads to the minimization of systems’ downtimes; In addition, the proper system functioning avoids production wastes and potentially catastrophic accidents. Digitalization and new ICT technologies have assumed a relevant role in maintenance strategies. They allow assessing the health condition of machinery at any point in time. Moreover, they allow predicting the future behavior of machinery so that maintenance interventions can be planned, and the useful life of components can be exploited until the time instant before their fault. This dissertation provides insights on Predictive Maintenance goals and tools in Industry 4.0 and proposes a novel data acquisition, processing, sharing, and storage framework that addresses typical issues machine producers and users encounter. The research elaborates on two research questions that narrow down the potential approaches to data acquisition, processing, and analysis for fault diagnostics in evolving environments. The research activity is developed according to a research framework, where the research questions are addressed by research levers that are explored according to research topics. Each topic requires a specific set of methods and approaches; however, the overarching methodological approach presented in this dissertation includes three fundamental aspects: the maximization of the quality level of input data, the use of Machine Learning methods for data analysis, and the use of case studies deriving from both controlled environments (laboratory) and real-world instances.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Biology is now a “Big Data Science” thanks to technological advancements allowing the characterization of the whole macromolecular content of a cell or a collection of cells. This opens interesting perspectives, but only a small portion of this data may be experimentally characterized. From this derives the demand of accurate and efficient computational tools for automatic annotation of biological molecules. This is even more true when dealing with membrane proteins, on which my research project is focused leading to the development of two machine learning-based methods: BetAware-Deep and SVMyr. BetAware-Deep is a tool for the detection and topology prediction of transmembrane beta-barrel proteins found in Gram-negative bacteria. These proteins are involved in many biological processes and primary candidates as drug targets. BetAware-Deep exploits the combination of a deep learning framework (bidirectional long short-term memory) and a probabilistic graphical model (grammatical-restrained hidden conditional random field). Moreover, it introduced a modified formulation of the hydrophobic moment, designed to include the evolutionary information. BetAware-Deep outperformed all the available methods in topology prediction and reported high scores in the detection task. Glycine myristoylation in Eukaryotes is the binding of a myristic acid on an N-terminal glycine. SVMyr is a fast method based on support vector machines designed to predict this modification in dataset of proteomic scale. It uses as input octapeptides and exploits computational scores derived from experimental examples and mean physicochemical features. SVMyr outperformed all the available methods for co-translational myristoylation prediction. In addition, it allows (as a unique feature) the prediction of post-translational myristoylation. Both the tools here described are designed having in mind best practices for the development of machine learning-based tools outlined by the bioinformatics community. Moreover, they are made available via user-friendly web servers. All this make them valuable tools for filling the gap between sequential and annotated data.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Besides increasing the share of electric and hybrid vehicles, in order to comply with more stringent environmental protection limitations, in the mid-term the auto industry must improve the efficiency of the internal combustion engine and the well to wheel efficiency of the employed fuel. To achieve this target, a deeper knowledge of the phenomena that influence the mixture formation and the chemical reactions involving new synthetic fuel components is mandatory, but complex and time intensive to perform purely by experimentation. Therefore, numerical simulations play an important role in this development process, but their use can be effective only if they can be considered accurate enough to capture these variations. The most relevant models necessary for the simulation of the reacting mixture formation and successive chemical reactions have been investigated in the present work, with a critical approach, in order to provide instruments to define the most suitable approaches also in the industrial context, which is limited by time constraints and budget evaluations. To overcome these limitations, new methodologies have been developed to conjugate detailed and simplified modelling techniques for the phenomena involving chemical reactions and mixture formation in non-traditional conditions (e.g. water injection, biofuels etc.). Thanks to the large use of machine learning and deep learning algorithms, several applications have been revised or implemented, with the target of reducing the computing time of some traditional tasks by orders of magnitude. Finally, a complete workflow leveraging these new models has been defined and used for evaluating the effects of different surrogate formulations of the same experimental fuel on a proof-of-concept GDI engine model.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

The advent of omic data production has opened many new perspectives in the quest for modelling complexity in biophysical systems. With the capability of characterizing a complex organism through the patterns of its molecular states, observed at different levels through various omics, a new paradigm of investigation is arising. In this thesis, we investigate the links between perturbations of the human organism, described as the ensemble of crosstalk of its molecular states, and health. Machine learning plays a key role within this picture, both in omic data analysis and model building. We propose and discuss different frameworks developed by the author using machine learning for data reduction, integration, projection on latent features, pattern analysis, classification and clustering of omic data, with a focus on 1H NMR metabolomic spectral data. The aim is to link different levels of omic observations of molecular states, from nanoscale to macroscale, to study perturbations such as diseases and diet interpreted as changes in molecular patterns. The first part of this work focuses on the fingerprinting of diseases, linking cellular and systemic metabolomics with genomic to asses and predict the downstream of perturbations all the way down to the enzymatic network. The second part is a set of frameworks and models, developed with 1H NMR metabolomic at its core, to study the exposure of the human organism to diet and food intake in its full complexity, from epidemiological data analysis to molecular characterization of food structure.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

In the framework of industrial problems, the application of Constrained Optimization is known to have overall very good modeling capability and performance and stands as one of the most powerful, explored, and exploited tool to address prescriptive tasks. The number of applications is huge, ranging from logistics to transportation, packing, production, telecommunication, scheduling, and much more. The main reason behind this success is to be found in the remarkable effort put in the last decades by the OR community to develop realistic models and devise exact or approximate methods to solve the largest variety of constrained or combinatorial optimization problems, together with the spread of computational power and easily accessible OR software and resources. On the other hand, the technological advancements lead to a data wealth never seen before and increasingly push towards methods able to extract useful knowledge from them; among the data-driven methods, Machine Learning techniques appear to be one of the most promising, thanks to its successes in domains like Image Recognition, Natural Language Processes and playing games, but also the amount of research involved. The purpose of the present research is to study how Machine Learning and Constrained Optimization can be used together to achieve systems able to leverage the strengths of both methods: this would open the way to exploiting decades of research on resolution techniques for COPs and constructing models able to adapt and learn from available data. In the first part of this work, we survey the existing techniques and classify them according to the type, method, or scope of the integration; subsequently, we introduce a novel and general algorithm devised to inject knowledge into learning models through constraints, Moving Target. In the last part of the thesis, two applications stemming from real-world projects and done in collaboration with Optit will be presented.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Whole Exome Sequencing (WES) is rapidly becoming the first-tier test in clinics, both thanks to its declining costs and the development of new platforms that help clinicians in the analysis and interpretation of SNV and InDels. However, we still know very little on how CNV detection could increase WES diagnostic yield. A plethora of exome CNV callers have been published over the years, all showing good performances towards specific CNV classes and sizes, suggesting that the combination of multiple tools is needed to obtain an overall good detection performance. Here we present TrainX, a ML-based method for calling heterozygous CNVs in WES data using EXCAVATOR2 Normalized Read Counts. We select males and females’ non pseudo-autosomal chromosome X alignments to construct our dataset and train our model, make predictions on autosomes target regions and use HMM to call CNVs. We compared TrainX against a set of CNV tools differing for the detection method (GATK4 gCNV, ExomeDepth, DECoN, CNVkit and EXCAVATOR2) and found that our algorithm outperformed them in terms of stability, as we identified both deletions and duplications with good scores (0.87 and 0.82 F1-scores respectively) and for sizes reaching the minimum resolution of 2 target regions. We also evaluated the method robustness using a set of WES and SNP array data (n=251), part of the Italian cohort of Epi25 collaborative, and were able to retrieve all clinical CNVs previously identified by the SNP array. TrainX showed good accuracy in detecting heterozygous CNVs of different sizes, making it a promising tool to use in a diagnostic setting.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

The Three-Dimensional Single-Bin-Size Bin Packing Problem is one of the most studied problem in the Cutting & Packing category. From a strictly mathematical point of view, it consists of packing a finite set of strongly heterogeneous “small” boxes, called items, into a finite set of identical “large” rectangles, called bins, minimizing the unused volume and requiring that the items are packed without overlapping. The great interest is mainly due to the number of real-world applications in which it arises, such as pallet and container loading, cutting objects out of a piece of material and packaging design. Depending on these real-world applications, more objective functions and more practical constraints could be needed. After a brief discussion about the real-world applications of the problem and a exhaustive literature review, the design of a two-stage algorithm to solve the aforementioned problem is presented. The algorithm must be able to provide the spatial coordinates of the placed boxes vertices and also the optimal boxes input sequence, while guaranteeing geometric, stability, fragility constraints and a reduced computational time. Due to NP-hard complexity of this type of combinatorial problems, a fusion of metaheuristic and machine learning techniques is adopted. In particular, a hybrid genetic algorithm coupled with a feedforward neural network is used. In the first stage, a rich dataset is created starting from a set of real input instances provided by an industrial company and the feedforward neural network is trained on it. After its training, given a new input instance, the hybrid genetic algorithm is able to run using the neural network output as input parameter vector, providing as output the optimal solution. The effectiveness of the proposed works is confirmed via several experimental tests.