10 results for Automatic annotation

in AMS Tesi di Dottorato - Alm@DL - Università di Bologna


Relevance:

70.00%

Abstract:

Biology is now a “Big Data Science” thanks to technological advances that allow the characterization of the whole macromolecular content of a cell or a collection of cells. This opens interesting perspectives, but only a small portion of these data can be characterized experimentally. Hence the demand for accurate and efficient computational tools for the automatic annotation of biological molecules. This is even more true for membrane proteins, the focus of my research project, which led to the development of two machine learning-based methods: BetAware-Deep and SVMyr. BetAware-Deep is a tool for the detection and topology prediction of transmembrane beta-barrel proteins found in Gram-negative bacteria. These proteins are involved in many biological processes and are primary candidates as drug targets. BetAware-Deep combines a deep learning framework (a bidirectional long short-term memory network) with a probabilistic graphical model (a grammatical-restrained hidden conditional random field). Moreover, it introduces a modified formulation of the hydrophobic moment, designed to include evolutionary information. BetAware-Deep outperformed all available methods in topology prediction and achieved high scores in the detection task. In Eukaryotes, glycine myristoylation is the attachment of a myristic acid to an N-terminal glycine. SVMyr is a fast method based on support vector machines, designed to predict this modification in proteome-scale datasets. It takes octapeptides as input and exploits computational scores derived from experimental examples together with mean physicochemical features. SVMyr outperformed all available methods for co-translational myristoylation prediction. In addition, it uniquely allows the prediction of post-translational myristoylation. Both tools were designed following the best practices for the development of machine learning-based tools outlined by the bioinformatics community.
Moreover, they are available through user-friendly web servers. All this makes them valuable tools for filling the gap between sequence data and annotated data.
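In its standard (Eisenberg-style) form, the hydrophobic moment is the magnitude of the vector sum of per-residue hydrophobicities, each rotated by a fixed angle per residue: about 100° for an α-helix and 180° for a β-strand, the relevant case for beta-barrel strands. The sketch below assumes the Kyte-Doolittle scale. The `profile_values` helper, which replaces each residue's hydrophobicity with its expected value under a per-position residue-frequency profile, is a hypothetical illustration of how evolutionary information could enter the formula; it is not BetAware-Deep's actual formulation, which the abstract does not give.

```python
import math
from typing import Dict, List, Sequence

# Kyte-Doolittle hydropathy scale (an assumed choice; other scales work too).
KD: Dict[str, float] = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydrophobic_moment(values: Sequence[float], angle_deg: float = 180.0) -> float:
    """Magnitude of the vector sum of h_n rotated by n * angle_deg.

    angle_deg is the rotation between consecutive residues: about 100 for an
    alpha-helix, 180 for a beta-strand (the beta-barrel case).
    """
    delta = math.radians(angle_deg)
    s = sum(h * math.sin(n * delta) for n, h in enumerate(values))
    c = sum(h * math.cos(n * delta) for n, h in enumerate(values))
    return math.hypot(s, c)


def sequence_values(seq: str) -> List[float]:
    """Per-residue hydrophobicity of a single sequence."""
    return [KD[aa] for aa in seq]


def profile_values(profile: Sequence[Dict[str, float]]) -> List[float]:
    """HYPOTHETICAL evolutionary variant: each position contributes the expected
    hydrophobicity under its residue frequencies (one dict per position)."""
    return [sum(freq * KD[aa] for aa, freq in column.items()) for column in profile]
```

For a strand, `hydrophobic_moment(sequence_values("LSLSLS"))` is large (about 13.8) because hydrophobic and polar residues alternate onto opposite faces, while a uniform `"LLLLLL"` gives a moment of about zero.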

Relevance:

30.00%

Abstract:

Over the last few decades, bioinformatics has played a fundamental role in making sense of the huge amount of data produced. Once the complete sequence of a genome has been obtained, the crucial problem is to learn as much as possible about its coding regions. Protein sequence annotation is challenging and, given the size of the problem, only computational approaches can provide a feasible solution. As recently pointed out by the Critical Assessment of Function Annotations (CAFA), the most accurate methods are those based on transfer by homology, and the most incisive contribution comes from cross-genome comparisons. This thesis describes a non-hierarchical sequence clustering method for automatic large-scale protein annotation, called “The Bologna Annotation Resource Plus” (BAR+). The method is based on an all-against-all alignment of more than 13 million protein sequences, characterized by a very stringent metric. BAR+ can safely transfer functional features (Gene Ontology and Pfam terms) within clusters by means of a statistical validation, even for multi-domain proteins. Within BAR+ clusters it is also possible to transfer the three-dimensional structure (when a template is available). This is done through cluster-specific HMM profiles, which can be used to compute reliable template-to-target alignments even for distantly related proteins (sequence identity < 30%). Other BAR+-based applications were developed during my doctorate, including the prediction of magnesium-binding sites in human proteins, the classification of the ABC transporter superfamily and the functional prediction (GO terms) of the CAFA targets. Remarkably, in the CAFA assessment, BAR+ placed among the ten most accurate methods. BAR+ is currently freely available as a web server for functional and structural protein sequence annotation at http://bar.biocomp.unibo.it/bar2.0.
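A non-hierarchical clustering of this kind can be pictured as the connected components of a graph whose nodes are sequences and whose edges are the all-against-all alignments that pass the metric. The sketch below assumes a hypothetical alignment record (query, target, percent identity, percent coverage) and illustrative thresholds of 40% identity and 90% coverage; the abstract says only that the metric is very stringent, so treat these values as placeholders. A union-find structure lets each alignment be processed once, without building the graph explicitly.

```python
from typing import Dict, Iterable, List, Tuple

# Hypothetical alignment record: (query_id, target_id, identity_pct, coverage_pct)
Alignment = Tuple[str, str, float, float]


class UnionFind:
    def __init__(self) -> None:
        self.parent: Dict[str, str] = {}

    def find(self, x: str) -> str:
        self.parent.setdefault(x, x)  # register unseen ids as singletons
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a: str, b: str) -> None:
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            self.parent[rb] = ra


def cluster(ids: Iterable[str], alignments: Iterable[Alignment],
            min_identity: float = 40.0, min_coverage: float = 90.0) -> List[List[str]]:
    """Flat partition: connected components over alignments passing both thresholds."""
    uf = UnionFind()
    for seq_id in ids:
        uf.find(seq_id)
    for query, target, identity, coverage in alignments:
        if identity >= min_identity and coverage >= min_coverage:
            uf.union(query, target)
    groups: Dict[str, List[str]] = {}
    for seq_id in uf.parent:
        groups.setdefault(uf.find(seq_id), []).append(seq_id)
    return sorted(sorted(g) for g in groups.values())
```

The statistically validated transfer of GO and Pfam terms would then operate cluster by cluster; that step is not sketched here.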

Relevance:

20.00%

Abstract:

The continuous growth of genome sequencing projects has produced a huge amount of data over the last 10 years: currently more than 600 prokaryotic and 80 eukaryotic genomes are fully sequenced and publicly available. However, sequencing a genome by itself only determines raw nucleotide sequences. This is only the first step of the genome annotation process, which deals with assigning biological information to each sequence. Annotation is carried out at every level of biological information processing, from DNA to protein, and cannot be accomplished by in vitro analyses alone, which become extremely expensive and time-consuming at such a large scale. Thus, in silico methods are needed to accomplish the task. The aim of this work was to implement predictive computational methods allowing fast, reliable and automated annotation of genomes and proteins starting from amino acid sequences. The first part of the work focused on a new machine learning-based method for predicting the subcellular localization of soluble eukaryotic proteins. The method, called BaCelLo, was developed in 2006. Its main distinctive feature is that it is independent of the biases present in the training dataset, which cause all other predictors developed so far to over-predict the most represented classes. This important result was achieved through a modification I made to the standard Support Vector Machine (SVM) algorithm, yielding the so-called Balanced SVM. BaCelLo predicts the most important subcellular localizations in eukaryotic cells, and three kingdom-specific predictors were implemented. In two extensive comparisons, carried out in 2006 and 2008, BaCelLo was reported to outperform all the state-of-the-art methods then available for this prediction task.
BaCelLo was subsequently used to annotate 5 eukaryotic genomes completely, by integrating it into a pipeline of predictors developed at the Bologna Biocomputing group by Dr. Pier Luigi Martelli and Dr. Piero Fariselli. An online database, called eSLDB, was developed that integrates, for each amino acid sequence extracted from a genome, the predicted subcellular localization with experimental and similarity-based annotations. In the second part of the work, a new machine learning-based method was implemented for the prediction of GPI-anchored proteins. The method efficiently predicts, from the raw amino acid sequence, both the presence of the GPI anchor (by means of an SVM) and the position in the sequence of the post-translational modification event, the so-called ω-site (by means of a Hidden Markov Model, HMM). The method, called GPIPE, was reported to greatly improve the prediction of GPI-anchored proteins over all previously developed methods. GPIPE predicted up to 88% of the experimentally annotated GPI-anchored proteins while keeping the false positive rate as low as 0.1%. GPIPE was used to annotate 81 eukaryotic genomes completely, and more than 15000 putative GPI-anchored proteins were predicted, 561 of which are found in H. sapiens. On average, 1% of a proteome is predicted as GPI-anchored. A statistical analysis of the composition of the regions surrounding the ω-site allowed the definition of specific amino acid abundances in the different regions considered. Furthermore, the hypothesis proposed in the literature that compositional biases exist among the four major eukaryotic kingdoms was tested and rejected. All the developed predictors and databases are freely available at:
BaCelLo: http://gpcr.biocomp.unibo.it/bacello
eSLDB: http://gpcr.biocomp.unibo.it/esldb
GPIPE: http://gpcr.biocomp.unibo.it/gpipe
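The abstract describes BaCelLo's Balanced SVM only as a modification of the standard SVM algorithm and does not say how it works. For context, a common and different way to counter class imbalance in SVMs is to scale the misclassification penalty C per class, inversely to class frequency, so that every class contributes the same total weight (n/k for n examples and k classes) to the loss. The sketch below computes such weights with the "balanced" heuristic n / (k · n_c), the same one behind scikit-learn's `class_weight="balanced"`; it is not a reconstruction of the Balanced SVM, and the `per_class_penalty` helper is hypothetical.

```python
from collections import Counter
from typing import Dict, Hashable, Sequence


def balanced_class_weights(labels: Sequence[Hashable]) -> Dict[Hashable, float]:
    """Weight n / (k * n_c): rare classes get proportionally larger weights,
    so each class sums to the same total weight n / k."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {cls: n / (k * count) for cls, count in counts.items()}


def per_class_penalty(C: float, labels: Sequence[Hashable]) -> Dict[Hashable, float]:
    """Hypothetical helper: per-class SVM penalties C_c = C * w_c."""
    return {cls: C * w for cls, w in balanced_class_weights(labels).items()}
```

With six "nucleus" and two "secreted" examples, the rare class receives weight 2.0 and the common one 2/3, so both classes carry a total weight of 4.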

Relevance:

20.00%

Abstract:

Process algebraic architectural description languages provide a formal means for modeling software systems and assessing their properties. To bridge the gap between system modeling and system implementation, this thesis proposes an approach for automatically generating multithreaded object-oriented code from process algebraic architectural descriptions in a way that preserves, under certain assumptions, the properties proved at the architectural level. The approach is divided into three phases, which are illustrated by means of a running example based on an audio processing system. First, we develop an architecture-driven technique for thread coordination management, which is completely automated through a suitable package. Second, we address the translation of the algebraically specified behavior of the individual software units into thread templates, which the software developer must fill in according to certain guidelines. Third, we discuss performance issues related to the suitability of synthesizing monitors rather than threads from software unit descriptions that satisfy specific constraints. In addition to the running example, we present two case studies, a video animation repainting system and the implementation of a leader election algorithm, to summarize the whole approach. The outcome of this thesis is the implementation of the proposed approach in a translator called PADL2Java and its integration into the architecture-centric verification tool TwoTowers.
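The third phase contrasts generating a thread for each software unit with synthesizing a monitor for units that satisfy certain constraints. As a generic illustration only (the abstract does not show PADL2Java's generated code, and this sketch is in Python rather than Java), a passive unit such as a one-slot buffer between a producer and a consumer can be a monitor: it owns no thread, and the callers' threads execute its methods under a single lock, waiting on a condition variable. Saving a dedicated thread and its context switches is one plausible source of the performance difference.

```python
import threading
from typing import Any


class OneSlotBuffer:
    """Monitor-style passive unit: one lock guards all state, and callers
    block on a condition variable instead of talking to a dedicated thread."""

    def __init__(self) -> None:
        self._cond = threading.Condition()
        self._item: Any = None
        self._full = False

    def put(self, item: Any) -> None:
        with self._cond:
            while self._full:  # wait until the consumer empties the slot
                self._cond.wait()
            self._item, self._full = item, True
            self._cond.notify_all()

    def get(self) -> Any:
        with self._cond:
            while not self._full:  # wait until the producer fills the slot
                self._cond.wait()
            item = self._item
            self._item, self._full = None, False
            self._cond.notify_all()
            return item
```

A producer thread calling `put` and a consumer calling `get` exchange items in order; the buffer itself never runs code on its own.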

Relevance:

20.00%

Abstract:

The identification of people by measuring some traits of individual anatomy or physiology has led to a specific research area called biometric recognition. This thesis focuses on improving fingerprint recognition systems by addressing three important problems: fingerprint enhancement, fingerprint orientation extraction and the automatic evaluation of fingerprint algorithms. An effective extraction of salient fingerprint features depends on the quality of the input fingerprint: if the fingerprint is very noisy, a reliable set of features cannot be detected. A new fingerprint enhancement method, both iterative and contextual, is proposed. This approach detects high-quality regions in fingerprints, selectively applies contextual filtering and iteratively expands, like wildfire, toward low-quality ones. A precise estimation of the orientation field would greatly simplify the estimation of other fingerprint features (singular points, minutiae) and improve the performance of a fingerprint recognition system. Fingerprint orientation extraction is improved along two directions. First, after introducing a new taxonomy of fingerprint orientation extraction methods, several variants of baseline methods are implemented and, by pointing out the role of pre- and post-processing, we show how to improve the extraction. Second, a new hybrid orientation extraction method, which follows an adaptive scheme, significantly improves orientation extraction in noisy fingerprints. Scientific papers typically propose recognition systems that integrate many modules, so an automatic evaluation of fingerprint algorithms is needed to isolate the contributions that determine actual progress in the state of the art.
The lack of a publicly available framework for comparing fingerprint orientation extraction algorithms motivates the introduction of a new benchmark area called FOE (including fingerprints and manually marked orientation ground truth), alongside fingerprint matching benchmarks, in the FVC-onGoing framework. The success of this framework is discussed through relevant statistics: more than 1450 algorithms submitted and two international competitions.
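As background for the orientation-extraction work, a classic baseline is the gradient-based least-squares estimate: within a block, gradient products are averaged with the angle doubled, so that opposite gradient directions reinforce rather than cancel, and the ridge orientation is taken perpendicular to the dominant gradient. The sketch below is that generic baseline in pure Python, not the thesis's new hybrid adaptive method. It assumes a small grayscale block given as a list of rows, central-difference gradients, and angles measured from the x axis in image coordinates (y pointing down).

```python
import math
from typing import List

Image = List[List[float]]


def block_orientation(img: Image) -> float:
    """Dominant ridge orientation of a grayscale block, in radians in [0, pi)."""
    gxx = gyy = gxy = 0.0
    rows, cols = len(img), len(img[0])
    for y in range(1, rows - 1):
        for x in range(1, cols - 1):
            # Central differences along x (columns) and y (rows).
            gx = (img[y][x + 1] - img[y][x - 1]) / 2.0
            gy = (img[y + 1][x] - img[y - 1][x]) / 2.0
            gxx += gx * gx
            gyy += gy * gy
            gxy += gx * gy
    # Doubling the angle makes gradients at theta and theta + pi agree.
    gradient_dir = 0.5 * math.atan2(2.0 * gxy, gxx - gyy)
    # Ridges run perpendicular to the dominant gradient direction.
    return (gradient_dir + math.pi / 2.0) % math.pi
```

Vertical stripes (intensity varying only along x) yield π/2, and horizontal stripes yield 0; real systems compute this per block and then smooth the resulting field, which is where pre- and post-processing come in.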

Relevance:

20.00%

Abstract:

This thesis proposes design methods and test tools for optical systems that may be used in an industrial environment, where ease of use matters as well as precision and reliability. The approach was conceived to be as general as possible, although the present work studies the design of a portable device for automatic identification applications, because this doctorate was funded by Datalogic Scanning Group s.r.l., a world-class producer of barcode readers. The main functional components of the complete device are the electro-optical imaging, illumination and pattern generator systems. For the electro-optical imaging system, a characterization tool and an analysis tool were developed to check whether the desired system performance has been achieved. Moreover, two design tools for optimizing the imaging system were implemented. The first optimizes only the core of the system, the optical part, improving its performance while ignoring all other contributions, and generates a good starting point for optimizing the whole complex system. The second tool optimizes the system taking into account its behavior through a model as close as possible to reality, including optics, electronics and detection. For the illumination and pattern generator systems, two tools were implemented. The first allows the design of free-form lenses described by an arbitrary analytical function and excited by an incoherent source, and can provide custom illumination conditions for all kinds of applications. The second tool is a new method for designing Diffractive Optical Elements excited by a coherent source for large pattern angles, using the Iterative Fourier Transform Algorithm. The design tools were validated, whenever possible, by comparing the performance of the designed systems with that of fabricated prototypes; in other cases, simulations were used.
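The Iterative Fourier Transform Algorithm alternates between the element plane and the far field (a Fourier pair under the Fraunhofer approximation), imposing the known constraint in each: a phase-only element (unit amplitude) and the target far-field amplitude. The sketch below is a minimal 1-D, Gerchberg-Saxton-style version with a naive O(N²) DFT so that it needs no external libraries. It assumes continuous (unquantized) phase and the small-angle Fourier model, and therefore ignores the large-angle effects that the thesis method specifically addresses.

```python
import cmath
import math
import random
from typing import List


def dft(x: List[complex], inverse: bool = False) -> List[complex]:
    """Naive O(N^2) discrete Fourier transform (the inverse includes 1/N)."""
    n = len(x)
    sign = 1.0 if inverse else -1.0
    out = []
    for k in range(n):
        s = sum(x[t] * cmath.exp(sign * 2j * math.pi * t * k / n) for t in range(n))
        out.append(s / n if inverse else s)
    return out


def ifta(target: List[float], iterations: int = 50, seed: int = 0) -> List[float]:
    """Phase (radians) of a 1-D phase-only element whose far-field amplitude
    approximates `target` (one amplitude per diffraction order)."""
    rng = random.Random(seed)
    phase = [rng.uniform(0.0, 2.0 * math.pi) for _ in target]
    for _ in range(iterations):
        near = [cmath.exp(1j * p) for p in phase]  # element constraint: |field| = 1
        far = dft(near)
        # Far-field constraint: impose the target amplitude, keep the computed phase.
        far = [a * cmath.exp(1j * cmath.phase(f)) for a, f in zip(target, far)]
        phase = [cmath.phase(z) for z in dft(far, inverse=True)]
    return phase


def far_field_intensity(phase: List[float]) -> List[float]:
    """Intensity of each diffraction order produced by the element."""
    return [abs(f) ** 2 for f in dft([cmath.exp(1j * p) for p in phase])]
```

For multi-order targets, the fraction of `sum(far_field_intensity(phase))` that falls on the target orders gives the diffraction efficiency of the design.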

Relevance:

20.00%

Abstract:

In distributed systems such as clouds or service-oriented frameworks, applications are typically assembled by deploying and connecting a large number of heterogeneous software components, ranging from fine-grained packages to coarse-grained complex services. The complexity of such systems requires a rich set of techniques and tools to support the automation of their deployment process. Relying on a formal model of components, a technique is devised for computing the sequence of actions that deploys a desired configuration. An efficient polynomial-time algorithm is described and proven to be sound and complete. Finally, a prototype tool implementing the proposed algorithm was developed. Experimental results support the adoption of this novel approach in real-life scenarios.
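The formal component model and the algorithm are not detailed in the abstract. As a deliberately simplified sketch, suppose each component (a hypothetical model) only requires and provides named ports, with no internal states or capacity limits. A valid deployment order can then be found greedily by repeatedly deploying every component whose required ports are already provided. Because deploying a component only ever adds ports, this loop finds an order whenever one exists, and it runs in polynomial time (at most n rounds over n components). The thesis's model, with component lifecycles, is richer than this.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical simplified model: name -> (required ports, provided ports)
Components = Dict[str, Tuple[Set[str], Set[str]]]


def deployment_plan(components: Components) -> List[str]:
    """Order in which to deploy components so that every required port is
    already provided by a deployed component; ValueError if no order exists."""
    provided: Set[str] = set()
    pending = dict(components)
    plan: List[str] = []
    while pending:
        # Components whose requirements are all satisfied right now.
        ready = sorted(name for name, (req, _) in pending.items() if req <= provided)
        if not ready:
            raise ValueError(f"unsatisfiable requirements: {sorted(pending)}")
        for name in ready:
            plan.append(name)
            provided |= pending.pop(name)[1]
    return plan
```

For example, a web application requiring a database and a cache is placed after both, and a load balancer requiring the application's HTTP port comes last; a cyclic dependency with no bootstrap component raises `ValueError`.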

Relevance:

20.00%

Abstract:

People tend to automatically mimic the facial expressions of others. While there is clear evidence of the effect of non-verbal behavior (emotion faces) on automatic facial mimicry, little is known about the role of verbal behavior (emotion language) in triggering such effects. Whereas it is well established that political affiliation modulates facial mimicry, there is no evidence on whether this modulation also passes through verbal means. This research addressed the role of verbal behavior in triggering automatic facial effects depending on whether verbal stimuli are attributed to leaders of different political parties. Study 1 investigated the role of interpersonal verbs, referring to positive and negative emotion expressions and encoding them at different levels of abstraction, in triggering corresponding facial muscle activation in a reader. Study 2 examined the role of verbs expressing positive and negative emotional behaviors of political leaders in modulating automatic facial effects, depending on the matched or mismatched political affiliation of participants and of left- and right-wing politicians. Study 3 examined whether verbs expressing happiness displays of ingroup politicians induce a more sincere (Duchenne) smile pattern among readers of the same political affiliation, relative to happiness expressions of outgroup politicians. Results showed that verbs encoding facial actions at different levels of abstraction elicited differential facial muscle activity (Study 1). Furthermore, political affiliation significantly modulated facial activation triggered by emotion verbs, as participants showed more congruent and enhanced facial activity towards ingroup politicians’ smiles and frowns than towards those of outgroup politicians (Study 2). Participants responded facially with a more sincere smile pattern to verbs expressing smiles of ingroup rather than outgroup politicians (Study 3).
Altogether, the results show that the role of political affiliation in modulating automatic facial effects also passes through verbal channels, and is revealed at a fine-grained level by quantitative and qualitative differences in readers’ automatic facial reactions.

Relevance:

20.00%

Abstract:

Dysfunction of the Autonomic Nervous System (ANS) is a typical feature of chronic heart failure and other cardiovascular diseases. As a simple, non-invasive technique, heart rate variability (HRV) analysis provides reliable information on the autonomic modulation of heart rate. The aim of this thesis was to research and develop automatic methods based on ANS assessment for evaluating risk in cardiac patients. Several feature selection and machine learning algorithms were combined to achieve these goals.
Automatic assessment of disease severity in Congestive Heart Failure (CHF) patients: a completely automatic method based on long-term HRV was proposed to assess the severity of CHF, achieving a sensitivity of 93% and a specificity of 64% in discriminating severe from mild patients.
Automatic identification of hypertensive patients at high risk of vascular events: a completely automatic system was proposed to identify hypertensive patients at higher risk of developing vascular events in the 12 months following the electrocardiographic recordings, achieving a sensitivity of 71% and a specificity of 86% in identifying high-risk subjects among hypertensive patients.
Automatic identification of hypertensive patients with a history of falls: the feasibility of automatically identifying fallers among hypertensive patients based on HRV was explored.
The results obtained in this thesis could have implications for both clinical practice and clinical research. The system was designed and developed to be clinically feasible. Moreover, since a 5-minute ECG recording is inexpensive, easy to perform and non-invasive, future research will focus on the clinical applicability of the system as a screening tool in non-specialized outpatient clinics, to identify high-risk patients to be shortlisted for more complex investigations.
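The abstract does not list the HRV features or classifiers used. For illustration, two standard time-domain HRV features computed from a series of RR (normal-to-normal) intervals are SDNN, the standard deviation of the intervals, and RMSSD, the root mean square of successive differences; the sketch also shows how the sensitivity and specificity figures quoted above are defined. It assumes artifact-free RR intervals in milliseconds and uses the sample (n - 1) estimator for SDNN.

```python
import math
from typing import Sequence, Tuple


def sdnn(rr_ms: Sequence[float]) -> float:
    """Sample standard deviation of NN intervals (ms)."""
    n = len(rr_ms)
    mean = sum(rr_ms) / n
    return math.sqrt(sum((x - mean) ** 2 for x in rr_ms) / (n - 1))


def rmssd(rr_ms: Sequence[float]) -> float:
    """Root mean square of successive differences between NN intervals (ms)."""
    diffs = [b - a for a, b in zip(rr_ms, rr_ms[1:])]
    return math.sqrt(sum(d * d for d in diffs) / len(diffs))


def sensitivity_specificity(y_true: Sequence[bool],
                            y_pred: Sequence[bool]) -> Tuple[float, float]:
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t and p)
    fn = sum(1 for t, p in pairs if t and not p)
    tn = sum(1 for t, p in pairs if not t and not p)
    fp = sum(1 for t, p in pairs if not t and p)
    return tp / (tp + fn), tn / (tn + fp)
```

A sensitivity of 93% with a specificity of 64%, as reported for CHF severity, means the classifier misses few severe patients at the cost of flagging a larger share of mild ones.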

Relevance:

20.00%

Abstract:

A method for the automatic scaling of oblique ionograms is introduced. The method also provides a rejection procedure, with a very good success rate, for ionograms considered to lack sufficient information. Examining the Kp index of each autoscaled ionogram shows that the behavior of the autoscaling program does not depend on geomagnetic conditions. The comparison between the MUF (maximum usable frequency) values provided by the software and those obtained by an experienced operator indicates that the procedure developed for detecting the nose of oblique ionogram traces is sufficiently efficient, and becomes much more efficient as ionogram quality improves. These results demonstrate that the program allows the real-time evaluation of the MUF values associated with a particular radio link through oblique radio sounding. The automatic recognition of part of the trace makes it possible to determine, for certain frequencies, the time taken by the radio wave to travel the path between the transmitter and the receiver. The reconstruction of the ionogram traces suggests that the electron density between the transmitter and the receiver could be estimated from an oblique ionogram. The results shown were obtained with a ray-tracing procedure based on the integration of the eikonal equation, using an analytical ionospheric model with free parameters. This indicates the possibility of applying an adaptive model and a ray-tracing algorithm to estimate the electron density in the ionosphere between the transmitter and the receiver. An additional study was conducted on a high-quality ionospheric sounding dataset, and another algorithm was designed to convert an oblique ionogram into a vertical one using Martyn's theorem. This enables further analysis of oblique soundings through the INGV Autoscala program for the automatic scaling of vertical ionograms.
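The oblique-to-vertical conversion rests on three classical results. Under the simplest assumptions (flat Earth, horizontally stratified ionosphere, no magnetic field or collisions), a ray reflected at incidence angle φ travels, by the Breit-Tuve theorem, a group path P′ equal to the two equal sides of a triangle over the ground range D. The secant law gives the equivalent vertical frequency f_v = f_ob cos φ, and Martyn's theorem states that the virtual height at f_v equals the height h′ of that triangle. The sketch converts one oblique trace point (frequency and group delay, the quantities the abstract says are recovered from the trace) into a vertical one; any Earth-curvature correction used in the thesis is not reproduced here.

```python
import math
from typing import Tuple

C_KM_PER_MS = 299.792458  # speed of light in km per millisecond


def oblique_to_vertical(f_oblique_mhz: float, group_delay_ms: float,
                        ground_range_km: float) -> Tuple[float, float]:
    """Flat-Earth conversion of one oblique-ionogram point into the equivalent
    vertical-incidence point: (f_v in MHz, virtual height h' in km)."""
    half_path = C_KM_PER_MS * group_delay_ms / 2.0  # one side of the equivalent triangle
    half_range = ground_range_km / 2.0
    if half_path <= half_range:
        raise ValueError("group path must exceed the ground range")
    h_virtual = math.sqrt(half_path ** 2 - half_range ** 2)  # Martyn's theorem height
    cos_phi = h_virtual / half_path  # phi: incidence angle at the reflection point
    return f_oblique_mhz * cos_phi, h_virtual  # secant law: f_ob = f_v * sec(phi)
```

Applying this point by point to a scaled oblique trace yields a synthetic vertical ionogram, the kind of input a vertical-ionogram autoscaler expects.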