887 resultados para brush machine


Relevância:

20.00% 20.00%

Publicador:

Resumo:

Biology is now a “Big Data Science” thanks to technological advancements allowing the characterization of the whole macromolecular content of a cell or a collection of cells. This opens interesting perspectives, but only a small portion of this data may be experimentally characterized. From this derives the demand of accurate and efficient computational tools for automatic annotation of biological molecules. This is even more true when dealing with membrane proteins, on which my research project is focused leading to the development of two machine learning-based methods: BetAware-Deep and SVMyr. BetAware-Deep is a tool for the detection and topology prediction of transmembrane beta-barrel proteins found in Gram-negative bacteria. These proteins are involved in many biological processes and primary candidates as drug targets. BetAware-Deep exploits the combination of a deep learning framework (bidirectional long short-term memory) and a probabilistic graphical model (grammatical-restrained hidden conditional random field). Moreover, it introduced a modified formulation of the hydrophobic moment, designed to include the evolutionary information. BetAware-Deep outperformed all the available methods in topology prediction and reported high scores in the detection task. Glycine myristoylation in Eukaryotes is the binding of a myristic acid on an N-terminal glycine. SVMyr is a fast method based on support vector machines designed to predict this modification in dataset of proteomic scale. It uses as input octapeptides and exploits computational scores derived from experimental examples and mean physicochemical features. SVMyr outperformed all the available methods for co-translational myristoylation prediction. In addition, it allows (as a unique feature) the prediction of post-translational myristoylation. Both the tools here described are designed having in mind best practices for the development of machine learning-based tools outlined by the bioinformatics community. Moreover, they are made available via user-friendly web servers. All this make them valuable tools for filling the gap between sequential and annotated data.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Besides increasing the share of electric and hybrid vehicles, in order to comply with more stringent environmental protection limitations, in the mid-term the auto industry must improve the efficiency of the internal combustion engine and the well to wheel efficiency of the employed fuel. To achieve this target, a deeper knowledge of the phenomena that influence the mixture formation and the chemical reactions involving new synthetic fuel components is mandatory, but complex and time intensive to perform purely by experimentation. Therefore, numerical simulations play an important role in this development process, but their use can be effective only if they can be considered accurate enough to capture these variations. The most relevant models necessary for the simulation of the reacting mixture formation and successive chemical reactions have been investigated in the present work, with a critical approach, in order to provide instruments to define the most suitable approaches also in the industrial context, which is limited by time constraints and budget evaluations. To overcome these limitations, new methodologies have been developed to conjugate detailed and simplified modelling techniques for the phenomena involving chemical reactions and mixture formation in non-traditional conditions (e.g. water injection, biofuels etc.). Thanks to the large use of machine learning and deep learning algorithms, several applications have been revised or implemented, with the target of reducing the computing time of some traditional tasks by orders of magnitude. Finally, a complete workflow leveraging these new models has been defined and used for evaluating the effects of different surrogate formulations of the same experimental fuel on a proof-of-concept GDI engine model.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

The advent of omic data production has opened many new perspectives in the quest for modelling complexity in biophysical systems. With the capability of characterizing a complex organism through the patterns of its molecular states, observed at different levels through various omics, a new paradigm of investigation is arising. In this thesis, we investigate the links between perturbations of the human organism, described as the ensemble of crosstalk of its molecular states, and health. Machine learning plays a key role within this picture, both in omic data analysis and model building. We propose and discuss different frameworks developed by the author using machine learning for data reduction, integration, projection on latent features, pattern analysis, classification and clustering of omic data, with a focus on 1H NMR metabolomic spectral data. The aim is to link different levels of omic observations of molecular states, from nanoscale to macroscale, to study perturbations such as diseases and diet interpreted as changes in molecular patterns. The first part of this work focuses on the fingerprinting of diseases, linking cellular and systemic metabolomics with genomic to asses and predict the downstream of perturbations all the way down to the enzymatic network. The second part is a set of frameworks and models, developed with 1H NMR metabolomic at its core, to study the exposure of the human organism to diet and food intake in its full complexity, from epidemiological data analysis to molecular characterization of food structure.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

In the framework of industrial problems, the application of Constrained Optimization is known to have overall very good modeling capability and performance and stands as one of the most powerful, explored, and exploited tool to address prescriptive tasks. The number of applications is huge, ranging from logistics to transportation, packing, production, telecommunication, scheduling, and much more. The main reason behind this success is to be found in the remarkable effort put in the last decades by the OR community to develop realistic models and devise exact or approximate methods to solve the largest variety of constrained or combinatorial optimization problems, together with the spread of computational power and easily accessible OR software and resources. On the other hand, the technological advancements lead to a data wealth never seen before and increasingly push towards methods able to extract useful knowledge from them; among the data-driven methods, Machine Learning techniques appear to be one of the most promising, thanks to its successes in domains like Image Recognition, Natural Language Processes and playing games, but also the amount of research involved. The purpose of the present research is to study how Machine Learning and Constrained Optimization can be used together to achieve systems able to leverage the strengths of both methods: this would open the way to exploiting decades of research on resolution techniques for COPs and constructing models able to adapt and learn from available data. In the first part of this work, we survey the existing techniques and classify them according to the type, method, or scope of the integration; subsequently, we introduce a novel and general algorithm devised to inject knowledge into learning models through constraints, Moving Target. In the last part of the thesis, two applications stemming from real-world projects and done in collaboration with Optit will be presented.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Whole Exome Sequencing (WES) is rapidly becoming the first-tier test in clinics, both thanks to its declining costs and the development of new platforms that help clinicians in the analysis and interpretation of SNV and InDels. However, we still know very little on how CNV detection could increase WES diagnostic yield. A plethora of exome CNV callers have been published over the years, all showing good performances towards specific CNV classes and sizes, suggesting that the combination of multiple tools is needed to obtain an overall good detection performance. Here we present TrainX, a ML-based method for calling heterozygous CNVs in WES data using EXCAVATOR2 Normalized Read Counts. We select males and females’ non pseudo-autosomal chromosome X alignments to construct our dataset and train our model, make predictions on autosomes target regions and use HMM to call CNVs. We compared TrainX against a set of CNV tools differing for the detection method (GATK4 gCNV, ExomeDepth, DECoN, CNVkit and EXCAVATOR2) and found that our algorithm outperformed them in terms of stability, as we identified both deletions and duplications with good scores (0.87 and 0.82 F1-scores respectively) and for sizes reaching the minimum resolution of 2 target regions. We also evaluated the method robustness using a set of WES and SNP array data (n=251), part of the Italian cohort of Epi25 collaborative, and were able to retrieve all clinical CNVs previously identified by the SNP array. TrainX showed good accuracy in detecting heterozygous CNVs of different sizes, making it a promising tool to use in a diagnostic setting.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

The Three-Dimensional Single-Bin-Size Bin Packing Problem is one of the most studied problem in the Cutting & Packing category. From a strictly mathematical point of view, it consists of packing a finite set of strongly heterogeneous “small” boxes, called items, into a finite set of identical “large” rectangles, called bins, minimizing the unused volume and requiring that the items are packed without overlapping. The great interest is mainly due to the number of real-world applications in which it arises, such as pallet and container loading, cutting objects out of a piece of material and packaging design. Depending on these real-world applications, more objective functions and more practical constraints could be needed. After a brief discussion about the real-world applications of the problem and a exhaustive literature review, the design of a two-stage algorithm to solve the aforementioned problem is presented. The algorithm must be able to provide the spatial coordinates of the placed boxes vertices and also the optimal boxes input sequence, while guaranteeing geometric, stability, fragility constraints and a reduced computational time. Due to NP-hard complexity of this type of combinatorial problems, a fusion of metaheuristic and machine learning techniques is adopted. In particular, a hybrid genetic algorithm coupled with a feedforward neural network is used. In the first stage, a rich dataset is created starting from a set of real input instances provided by an industrial company and the feedforward neural network is trained on it. After its training, given a new input instance, the hybrid genetic algorithm is able to run using the neural network output as input parameter vector, providing as output the optimal solution. The effectiveness of the proposed works is confirmed via several experimental tests.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

With the CERN LHC program underway, there has been an acceleration of data growth in the High Energy Physics (HEP) field and the usage of Machine Learning (ML) in HEP will be critical during the HL-LHC program when the data that will be produced will reach the exascale. ML techniques have been successfully used in many areas of HEP nevertheless, the development of a ML project and its implementation for production use is a highly time-consuming task and requires specific skills. Complicating this scenario is the fact that HEP data is stored in ROOT data format, which is mostly unknown outside of the HEP community. The work presented in this thesis is focused on the development of a ML as a Service (MLaaS) solution for HEP, aiming to provide a cloud service that allows HEP users to run ML pipelines via HTTP calls. These pipelines are executed by using the MLaaS4HEP framework, which allows reading data, processing data, and training ML models directly using ROOT files of arbitrary size from local or distributed data sources. Such a solution provides HEP users non-expert in ML with a tool that allows them to apply ML techniques in their analyses in a streamlined manner. Over the years the MLaaS4HEP framework has been developed, validated, and tested and new features have been added. A first MLaaS solution has been developed by automatizing the deployment of a platform equipped with the MLaaS4HEP framework. Then, a service with APIs has been developed, so that a user after being authenticated and authorized can submit MLaaS4HEP workflows producing trained ML models ready for the inference phase. A working prototype of this service is currently running on a virtual machine of INFN-Cloud and is compliant to be added to the INFN Cloud portfolio of services.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Machine (and deep) learning technologies are more and more present in several fields. It is undeniable that many aspects of our society are empowered by such technologies: web searches, content filtering on social networks, recommendations on e-commerce websites, mobile applications, etc., in addition to academic research. Moreover, mobile devices and internet sites, e.g., social networks, support the collection and sharing of information in real time. The pervasive deployment of the aforementioned technological instruments, both hardware and software, has led to the production of huge amounts of data. Such data has become more and more unmanageable, posing challenges to conventional computing platforms, and paving the way to the development and widespread use of the machine and deep learning. Nevertheless, machine learning is not only a technology. Given a task, machine learning is a way of proceeding (a way of thinking), and as such can be approached from different perspectives (points of view). This, in particular, will be the focus of this research. The entire work concentrates on machine learning, starting from different sources of data, e.g., signals and images, applied to different domains, e.g., Sport Science and Social History, and analyzed from different perspectives: from a non-data scientist point of view through tools and platforms; setting a problem stage from scratch; implementing an effective application for classification tasks; improving user interface experience through Data Visualization and eXtended Reality. In essence, not only in a quantitative task, not only in a scientific environment, and not only from a data-scientist perspective, machine (and deep) learning can do the difference.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

In medicine, innovation depends on a better knowledge of the human body mechanism, which represents a complex system of multi-scale constituents. Unraveling the complexity underneath diseases proves to be challenging. A deep understanding of the inner workings comes with dealing with many heterogeneous information. Exploring the molecular status and the organization of genes, proteins, metabolites provides insights on what is driving a disease, from aggressiveness to curability. Molecular constituents, however, are only the building blocks of the human body and cannot currently tell the whole story of diseases. This is why nowadays attention is growing towards the contemporary exploitation of multi-scale information. Holistic methods are then drawing interest to address the problem of integrating heterogeneous data. The heterogeneity may derive from the diversity across data types and from the diversity within diseases. Here, four studies conducted data integration using customly designed workflows that implement novel methods and views to tackle the heterogeneous characterization of diseases. The first study devoted to determine shared gene regulatory signatures for onco-hematology and it showed partial co-regulation across blood-related diseases. The second study focused on Acute Myeloid Leukemia and refined the unsupervised integration of genomic alterations, which turned out to better resemble clinical practice. In the third study, network integration for artherosclerosis demonstrated, as a proof of concept, the impact of network intelligibility when it comes to model heterogeneous data, which showed to accelerate the identification of new potential pharmaceutical targets. Lastly, the fourth study introduced a new method to integrate multiple data types in a unique latent heterogeneous-representation that facilitated the selection of important data types to predict the tumour stage of invasive ductal carcinoma. The results of these four studies laid the groundwork to ease the detection of new biomarkers ultimately beneficial to medical practice and to the ever-growing field of Personalized Medicine.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

In this thesis, we investigate the role of applied physics in epidemiological surveillance through the application of mathematical models, network science and machine learning. The spread of a communicable disease depends on many biological, social, and health factors. The large masses of data available make it possible, on the one hand, to monitor the evolution and spread of pathogenic organisms; on the other hand, to study the behavior of people, their opinions and habits. Presented here are three lines of research in which an attempt was made to solve real epidemiological problems through data analysis and the use of statistical and mathematical models. In Chapter 1, we applied language-inspired Deep Learning models to transform influenza protein sequences into vectors encoding their information content. We then attempted to reconstruct the antigenic properties of different viral strains using regression models and to identify the mutations responsible for vaccine escape. In Chapter 2, we constructed a compartmental model to describe the spread of a bacterium within a hospital ward. The model was informed and validated on time series of clinical measurements, and a sensitivity analysis was used to assess the impact of different control measures. Finally (Chapter 3) we reconstructed the network of retweets among COVID-19 themed Twitter users in the early months of the SARS-CoV-2 pandemic. By means of community detection algorithms and centrality measures, we characterized users’ attention shifts in the network, showing that scientific communities, initially the most retweeted, lost influence over time to national political communities. In the Conclusion, we highlighted the importance of the work done in light of the main contemporary challenges for epidemiological surveillance. In particular, we present reflections on the importance of nowcasting and forecasting, the relationship between data and scientific research, and the need to unite the different scales of epidemiological surveillance.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

The rapid progression of biomedical research coupled with the explosion of scientific literature has generated an exigent need for efficient and reliable systems of knowledge extraction. This dissertation contends with this challenge through a concentrated investigation of digital health, Artificial Intelligence, and specifically Machine Learning and Natural Language Processing's (NLP) potential to expedite systematic literature reviews and refine the knowledge extraction process. The surge of COVID-19 complicated the efforts of scientists, policymakers, and medical professionals in identifying pertinent articles and assessing their scientific validity. This thesis presents a substantial solution in the form of the COKE Project, an initiative that interlaces machine reading with the rigorous protocols of Evidence-Based Medicine to streamline knowledge extraction. In the framework of the COKE (“COVID-19 Knowledge Extraction framework for next-generation discovery science”) Project, this thesis aims to underscore the capacity of machine reading to create knowledge graphs from scientific texts. The project is remarkable for its innovative use of NLP techniques such as a BERT + bi-LSTM language model. This combination is employed to detect and categorize elements within medical abstracts, thereby enhancing the systematic literature review process. The COKE project's outcomes show that NLP, when used in a judiciously structured manner, can significantly reduce the time and effort required to produce medical guidelines. These findings are particularly salient during times of medical emergency, like the COVID-19 pandemic, when quick and accurate research results are critical.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

In recent decades, two prominent trends have influenced the data modeling field, namely network analysis and machine learning. This thesis explores the practical applications of these techniques within the domain of drug research, unveiling their multifaceted potential for advancing our comprehension of complex biological systems. The research undertaken during this PhD program is situated at the intersection of network theory, computational methods, and drug research. Across six projects presented herein, there is a gradual increase in model complexity. These projects traverse a diverse range of topics, with a specific emphasis on drug repurposing and safety in the context of neurological diseases. The aim of these projects is to leverage existing biomedical knowledge to develop innovative approaches that bolster drug research. The investigations have produced practical solutions, not only providing insights into the intricacies of biological systems, but also allowing the creation of valuable tools for their analysis. In short, the achievements are: • A novel computational algorithm to identify adverse events specific to fixed-dose drug combinations. • A web application that tracks the clinical drug research response to SARS-CoV-2. • A Python package for differential gene expression analysis and the identification of key regulatory "switch genes". • The identification of pivotal events causing drug-induced impulse control disorders linked to specific medications. • An automated pipeline for discovering potential drug repurposing opportunities. • The creation of a comprehensive knowledge graph and development of a graph machine learning model for predictions. Collectively, these projects illustrate diverse applications of data science and network-based methodologies, highlighting the profound impact they can have in supporting drug research activities.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Background There is a wide variation of recurrence risk of Non-small-cell lung cancer (NSCLC) within the same Tumor Node Metastasis (TNM) stage, suggesting that other parameters are involved in determining this probability. Radiomics allows extraction of quantitative information from images that can be used for clinical purposes. The primary objective of this study is to develop a radiomic prognostic model that predicts a 3 year disease free-survival (DFS) of resected Early Stage (ES) NSCLC patients. Material and Methods 56 pre-surgery non contrast Computed Tomography (CT) scans were retrieved from the PACS of our institution and anonymized. Then they were automatically segmented with an open access deep learning pipeline and reviewed by an experienced radiologist to obtain 3D masks of the NSCLC. Images and masks underwent to resampling normalization and discretization. From the masks hundreds Radiomic Features (RF) were extracted using Py-Radiomics. Hence, RF were reduced to select the most representative features. The remaining RF were used in combination with Clinical parameters to build a DFS prediction model using Leave-one-out cross-validation (LOOCV) with Random Forest. Results and Conclusion A poor agreement between the radiologist and the automatic segmentation algorithm (DICE score of 0.37) was found. Therefore, another experienced radiologist manually segmented the lesions and only stable and reproducible RF were kept. 50 RF demonstrated a high correlation with the DFS but only one was confirmed when clinicopathological covariates were added: Busyness a Neighbouring Gray Tone Difference Matrix (HR 9.610). 16 clinical variables (which comprised TNM) were used to build the LOOCV model demonstrating a higher Area Under the Curve (AUC) when RF were included in the analysis (0.67 vs 0.60) but the difference was not statistically significant (p=0,5147).

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Il volume di tesi ha riguardato lo sviluppo di un'applicazione mobile che sfrutta la Realtà Aumentata e il Machine Learning nel contesto della biodiversità. Nello specifico si è realizzato un modello di AI che permetta la classificazione di immagini di fiori. Tale modello è stato poi integrato in Android, al fine della realizzazione di un'app che riesca a riconoscere specifiche specie di fiori, oltre a individuare gli insetti impollinatori attratti da essi e rappresentarli in Realtà Aumentata.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Il quark-gluon plasma (QGP) è uno stato della materia previsto dalla cromodinamica quantistica. L’esperimento ALICE a LHC ha tra i suoi obbiettivi principali lo studio della materia fortemente interagente e le proprietà del QGP attraverso collisioni di ioni pesanti ultra-relativistici. Per un’esaustiva comprensione di tali proprietà, le stesse misure effettuate su sistemi collidenti più piccoli (collisioni protone-protone e protone-ione) sono necessarie come riferimento. Le recenti analisi dei dati raccolti ad ALICE hanno mostrato che la nostra comprensione dei meccanismi di adronizzazione di quark pesanti non è completa, perchè i dati ottenuti in collisioni pp e p-Pb non sono riproducibili utilizzando modelli basati sui risultati ottenuti con collisioni e+e− ed ep. Per questo motivo, nuovi modelli teorici e fenomenologici, in grado di riprodurre le misure sperimentali, sono stati proposti. Gli errori associati a queste nuove misure sperimentali al momento non permettono di verificare in maniera chiara la veridicità dei diversi modelli proposti. Nei prossimi anni sarà quindi fondamentale aumentare la precisione di tali misure sperimentali; d’altra parte, stimare il numero delle diverse specie di particelle prodotte in una collisione può essere estremamente complicato. In questa tesi, il numero di barioni Lc prodotti in un campione di dati è stato ottenuto utilizzando delle tecniche di machine learning, in grado di apprendere pattern e imparare a distinguere candidate di segnale da quelle di fondo. Si sono inoltre confrontate tre diverse implementazioni di un algoritmo di Boosted Decision Trees (BDT) e si è utilizzata quella più performante per ricostruire il barione Lc in collisioni pp raccolte dall’esperimento ALICE.