12 results for sequence based alignments

in AMS Tesi di Dottorato - Alm@DL - Università di Bologna


Relevance:

80.00%

Abstract:

Since the turn of the century, fisheries have grown at a steady rate, while aquaculture has expanded more rapidly. Aquaculture can offer EU consumers more diverse, healthy, and sustainable food options, some of which are more popular elsewhere. The EU is investing heavily to develop the sector and supports innovative projects that promote the sustainable development of the seafood sectors and food security. Priority 3 promotes sector development through the dissemination of innovation. This doctoral dissertation examined innovation transfer in the Italian aquaculture sector, specifically the adoption of innovative tools, using a theoretical model to better understand the complexity of these processes. The work focused on innovation adoption and emphasised that adoption is the end point of a well-defined process. The Awareness Knowledge Adoption Implementation Effectiveness (AKAIE) model was created to better analyse the post-adoption phases and to evaluate the implementation and impact of technology adoption. Aquaculture actors were consulted to identify the drivers of, and barriers to, the AKAIE phases. Their perspectives were examined through "perceived complexity", i.e. barriers to adoption that are strongly influenced by contextual (socio-economic, institutional, cultural) factors. The new model contextualises the sequence with respect to technologies, entrepreneur traits, corporate and institutional contexts, and perceived complexity, which is the sequence's central node. Technology adoption can therefore also be studied by examining complexity perceptions along the AKAIE sequence. This study proposes a new model for evaluating the diffusion of a given technology, which allows policymakers to act promptly at any stage of the process. Developing responsible policies for evaluating the effectiveness of innovation is more necessary than ever, especially to orient strategies and interventions in the face of major scenarios of change.

Relevance:

40.00%

Abstract:

In the last few decades, bioinformatics has played a fundamental role in making sense of the huge amount of data produced. Once the complete sequence of a genome has been obtained, the crucial problem is to learn as much as possible about its coding regions. Protein sequence annotation is challenging and, given the size of the problem, only computational approaches can provide a feasible solution. As recently pointed out by the Critical Assessment of Function Annotations (CAFA), the most accurate methods are those based on the transfer-by-homology approach, and cross-genome comparisons make the most incisive contribution. This thesis describes a non-hierarchical sequence clustering method for automatic large-scale protein annotation, called “The Bologna Annotation Resource Plus” (BAR+). The method is based on an all-against-all alignment of more than 13 million protein sequences, characterized by a very stringent metric. Through statistical validation, BAR+ can safely transfer functional features (Gene Ontology and Pfam terms) within clusters, even for multi-domain proteins. Within BAR+ clusters it is also possible to transfer the three-dimensional structure (when a template is available). Cluster-specific HMM profiles make this possible, because they can be used to compute reliable template-to-target alignments even for distantly related proteins (sequence identity < 30%). Other BAR+-based applications were developed during my doctorate. These include the prediction of magnesium-binding sites in human proteins, the classification of the ABC transporter superfamily, and the functional prediction (GO terms) of the CAFA targets. Remarkably, BAR+ placed among the ten most accurate methods in the CAFA assessment. BAR+ is currently freely available as a web server for functional and structural protein sequence annotation at http://bar.biocomp.unibo.it/bar2.0.
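The abstract states that terms are transferred within clusters "by means of a statistical validation" but does not name the test. One common choice is assumed here as a minimal sketch: a hypergeometric upper-tail p-value that asks whether a GO/Pfam term is over-represented among a cluster's annotated members relative to the whole dataset. The function names and the `alpha` threshold are hypothetical. They are not taken from BAR+.

```python
from math import comb


def hypergeom_upper_tail(k: int, n: int, K: int, N: int) -> float:
    """P(X >= k) for X ~ Hypergeometric(population N, K successes, n draws).

    k: annotated cluster members carrying the term
    n: annotated cluster members
    K: annotated sequences in the whole dataset carrying the term
    N: annotated sequences in the whole dataset
    """
    upper = min(n, K)
    # math.comb returns 0 when the lower index exceeds the upper one,
    # so impossible configurations contribute nothing to the sum.
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / comb(N, n)


def term_is_transferable(k: int, n: int, K: int, N: int, alpha: float = 0.01) -> bool:
    """Hypothetical rule: transfer the term to unannotated cluster members
    only if its enrichment in the cluster is significant at level alpha."""
    return hypergeom_upper_tail(k, n, K, N) < alpha
```

For example, a term present in 8 of 10 annotated members of a cluster but in only 1% of the dataset passes the test. A term carried by half of the dataset does not pass, even if it appears in the cluster.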

Relevance:

30.00%

Abstract:

Biohybrid derivatives of π-conjugated materials are emerging as powerful tools for studying biological events through the (opto)electronic variations of the π-conjugated moieties. They can also direct and govern the self-assembly properties of the organic materials through the organization principles of the bio component. So far, very few examples of thiophene-based biohybrids have been reported. The aim of this Ph.D. thesis has been to develop oligothiophene-oligonucleotide hybrid derivatives for two purposes: as tools to detect DNA hybridisation events, and as model compounds to investigate thiophene-nucleobase interactions in the solid state. To obtain oligothiophene bioconjugates with the required high level of purity, we first developed new eco-friendly protocols for the synthesis of thiophene oligomers. Our innovative heterogeneous Suzuki coupling methodology, carried out in EtOH/water or isopropanol under microwave irradiation, allowed us to obtain alkyl-substituted oligothiophenes and thiophene-based co-oligomers in high yields and very short reaction times. The products were free from residual metals and had improved film-forming properties. These methodologies were subsequently applied to the synthesis of oligothiophene-oligonucleotide conjugates. Oligothiophene-5-labeled deoxyuridines were synthesized and incorporated into 19-mer oligonucleotide sequences. We showed that the resulting oligothiophene-labeled oligonucleotides can be used as probes to detect a single nucleotide polymorphism (SNP) in complementary DNA target sequences: all the probes showed marked variations in emission intensity upon hybridization with a complementary target sequence. The observed variations in emitted light were comparable or even superior to those reported in similar studies. This shows that the biohybrids could be useful for developing biosensors that detect DNA mismatches.
Finally, water-soluble, photoluminescent and electroactive dinucleotide-hybrid derivatives of quaterthiophene and quinquethiophene were synthesized. We combined spectroscopy and microscopy techniques, electrical characterizations, microfluidic measurements and theoretical calculations. This allowed us to demonstrate that the self-assembly modalities of the biohybrids in thin films are driven by the interplay of intra- and intermolecular interactions, in which π-stacking between the oligothiophene and the nucleotide bases plays a major role.

Relevance:

30.00%

Abstract:

The vast majority of known proteins have not yet been experimentally characterized, and little is known about their function. Computational tools can provide insight into protein function based on sequence, structure, evolutionary history and association with other proteins. Knowledge of the three-dimensional (3D) structure of a protein can lead to a deep understanding of its mode of action and interaction, but the structures of fewer than 1% of sequences have currently been solved experimentally. For this reason, it has become urgent to develop new methods that can computationally extract relevant information from protein sequence and structure. My work started from the study of the properties of contacts between protein residues, since these contacts constrain protein folding and characterize different protein structures. Predicting residue contacts in proteins is an interesting problem whose solution may be useful in fold recognition and de novo design. Predicting these contacts requires studying inter-residue distances in proteins, which depend on the specific type of amino acid pair and are encoded in the so-called contact map. Network studies introduced a new way of analyzing these structures, with pivotal papers demonstrating that protein contact networks also exhibit small-world behavior. I studied to what extent the characteristic path length and clustering coefficient of the protein contact network reveal characteristic features of protein contact maps. The aim was to highlight constraints for the prediction of protein contact maps and for applications in protein structure prediction and/or reconstruction from experimentally determined contact maps.
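The following minimal sketch shows how a contact network and its two small-world descriptors can be computed. It assumes that residues are represented by one 3D point each (e.g. Cα atoms) and that two residues are in contact when their distance is below a cutoff. The 8 Å default is a common choice but an assumption here. The function names are hypothetical.

```python
import math
from collections import deque
from itertools import combinations


def contact_network(coords, cutoff=8.0, min_separation=1):
    """Adjacency sets: residues i and j are in contact if their distance is
    below `cutoff` (assumed Å) and they are at least `min_separation` apart
    in sequence."""
    adj = {i: set() for i in range(len(coords))}
    for i, j in combinations(range(len(coords)), 2):
        if j - i >= min_separation and math.dist(coords[i], coords[j]) < cutoff:
            adj[i].add(j)
            adj[j].add(i)
    return adj


def characteristic_path_length(adj):
    """Mean shortest-path length over all connected ordered pairs (BFS)."""
    total, pairs = 0, 0
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs if pairs else 0.0


def clustering_coefficient(adj):
    """Average local clustering coefficient; nodes of degree < 2 count as 0."""
    coeffs = []
    for neigh in adj.values():
        k = len(neigh)
        if k < 2:
            coeffs.append(0.0)
            continue
        links = sum(1 for a, b in combinations(neigh, 2) if b in adj[a])
        coeffs.append(2 * links / (k * (k - 1)))
    return sum(coeffs) / len(coeffs)
```

For a straight toy chain with 3.8 Å spacing, each residue contacts its neighbours at sequence distance 1 and 2. This gives a characteristic path length of 1.3 and an average clustering coefficient of 23/30.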
If the residue contacts of a protein sequence are known, the major features of its 3D structure could be deduced by combining this knowledge with correctly predicted secondary structure motifs. In the second part of my work I focused on a particular protein structural motif, the coiled coil, which is known to mediate a variety of fundamental biological interactions. Coiled coils occur in a variety of structural forms and in a wide range of proteins. Examples range from small units, such as the leucine zippers that drive the dimerization of many transcription factors, to more complex structures, such as the family of viral proteins responsible for virus-host membrane fusion. The coiled-coil motif is estimated to occur in 5-10% of the protein sequences in the various genomes. Given their biological importance, I introduced a Hidden Markov Model (HMM) that exploits the evolutionary information derived from multiple sequence alignments to predict coiled-coil regions and to discriminate coiled-coil sequences. The results indicate that the new HMM outperforms all the existing programs and can be adopted for coiled-coil prediction and large-scale genome annotation. Genome annotation is a key issue in modern computational biology, because it is the starting point for understanding the complex processes involved in biological networks. The rapid growth in the number of available protein sequences and structures poses new fundamental problems that still await interpretation. Nevertheless, these data form the basis for designing new strategies to tackle problems such as the prediction of protein structure and function. Determining the functions of all these proteins experimentally would be a hugely time-consuming and costly task and, in most instances, has not been done. For example, only about 20% of the annotated proteins in the Homo sapiens genome have currently been characterized experimentally.
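The thesis HMM uses sequence profiles from multiple sequence alignments. The toy model below is far simpler and is not that model. It is a single-sequence sketch that only illustrates how an HMM with one non-coil state `N` and seven heptad states `a`-`g` can label coiled-coil regions with Viterbi decoding. The residue classes, the "core positions a/d favour hydrophobics" emissions and every probability are hypothetical values chosen for illustration.

```python
import math

HYDROPHOBIC = set("AILMFVW")   # assumed crude hydrophobic/polar split
HEPTAD = "abcdefg"
STATES = ["N"] + list(HEPTAD)   # N = non-coiled-coil, a..g = heptad positions
START = {"N": 0.95, "a": 0.05}  # a coil may only be entered at position a


def emission(state, residue):
    """P(residue class | state), with two classes: hydrophobic or polar."""
    if state == "N":
        p_h = 0.4
    elif state in ("a", "d"):   # core heptad positions favour hydrophobics
        p_h = 0.9
    else:
        p_h = 0.2
    return p_h if residue in HYDROPHOBIC else 1.0 - p_h


def transition(src, dst):
    if src == "N":
        return {"N": 0.95, "a": 0.05}.get(dst, 0.0)
    nxt = HEPTAD[(HEPTAD.index(src) + 1) % 7]   # a -> b -> ... -> g -> a
    return {nxt: 0.95, "N": 0.05}.get(dst, 0.0)


def log(p):
    return math.log(p) if p > 0 else -math.inf


def viterbi(seq):
    """Most probable state path for a non-empty residue string (log space)."""
    score = {s: log(START.get(s, 0.0)) + log(emission(s, seq[0])) for s in STATES}
    backpointers = []
    for residue in seq[1:]:
        new_score, ptr = {}, {}
        for dst in STATES:
            src = max(STATES, key=lambda s: score[s] + log(transition(s, dst)))
            new_score[dst] = (score[src] + log(transition(src, dst))
                              + log(emission(dst, residue)))
            ptr[dst] = src
        score = new_score
        backpointers.append(ptr)
    state = max(STATES, key=score.get)
    path = [state]
    for ptr in reversed(backpointers):   # trace the best path backwards
        state = ptr[state]
        path.append(state)
    return path[::-1]
```

On `"SG" * 7 + "LKELEKK" * 4 + "SG" * 7`, the four heptads with leucines at `a`/`d` are decoded as `abcdefg` repeats and the polar flanks as `N`.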
A commonly adopted procedure for annotating protein sequences relies on "inheritance through homology", which is based on the notion that similar sequences share similar functions and structures. The procedure assigns sequences to specific groups of functionally related sequences that were previously formed by clustering techniques. Clustering relies on suitable similarity rules, since predicting protein structure and function from sequence depends largely on the value of sequence identity. However, additional levels of complexity arise from three sources: multi-domain proteins, proteins that share common domains but not necessarily the same function, and the finding that different combinations of shared domains can lead to different biological roles. In the last part of this study I developed and validated a system that contributes to sequence annotation. It uses a validated procedure for transferring molecular functions and structural templates by inheritance. After a cross-genome comparison with the BLAST program, clusters were built on the basis of two stringent constraints: one on sequence identity and one on alignment coverage. The adopted measure explicitly addresses the problem of annotating multi-domain proteins. It also allows a fine-grained division of the whole set of proteomes used, which ensures that clusters are homogeneous in sequence length. Structural templates cover a large fraction of the length of the protein sequences within clusters, which ensures that multi-domain proteins, when present, can serve as templates for sequences of similar length. This annotation procedure makes it possible to reliably transfer statistically validated functions and structures to sequences, using the information available in current databases of molecular functions and structures.
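Building non-hierarchical clusters from BLAST hits under two constraints, identity and coverage, can be sketched as a union-find over the hits that pass both thresholds. This is a minimal sketch under stated assumptions. The 40% identity and 90% coverage defaults are illustrative values and are not taken from the text. Coverage is assumed to be measured against the longer of the two sequences, so that a short single-domain protein does not join a long multi-domain one. The `Hit` record is a hypothetical stand-in for parsed BLAST output.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    query: str
    subject: str
    identity: float   # percent identity of the alignment
    aln_len: int      # alignment length in residues


def cluster(lengths, hits, min_identity=40.0, min_coverage=0.9):
    """Group sequences linked by hits passing both constraints.

    lengths: {sequence_id: sequence length}
    Coverage = aln_len / max(len(query), len(subject)), i.e. the alignment
    must span both sequences (assumption, see lead-in).
    """
    parent = {seq: seq for seq in lengths}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]   # path halving
            x = parent[x]
        return x

    for h in hits:
        cov = h.aln_len / max(lengths[h.query], lengths[h.subject])
        if h.identity >= min_identity and cov >= min_coverage:
            root_q, root_s = find(h.query), find(h.subject)
            if root_q != root_s:
                parent[root_s] = root_q

    groups = {}
    for seq in lengths:
        groups.setdefault(find(seq), []).append(seq)
    return sorted(sorted(g) for g in groups.values())
```

In this design, a 600-residue two-domain protein that aligns with 90% identity over only 290 residues of a 300-residue protein stays in its own cluster, because its coverage is below the threshold.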

Relevance:

30.00%

Abstract:

One often hears a strange short sentence: «Random is better than...». Why is randomness a good solution to a certain engineering problem? There are many possible answers, and all of them depend on the topic under consideration. In this thesis I discuss two crucial topics that benefit from randomizing some of the waveforms involved in signal manipulation. In particular, the advantages are obtained by shaping the second-order statistics of the antipodal sequences involved in intermediate signal-processing stages. The first topic is in the area of analog-to-digital conversion and is known as Compressive Sensing (CS). CS is a novel signal-processing paradigm that merges signal acquisition and compression into a single step. Consequently, it allows a signal to be acquired directly in compressed form. I first give an ample description of the CS methodology and its related architectures. I then present a new approach that aims at high compression by designing the second-order statistics of a set of additional waveforms involved in the acquisition/compression stage. The second topic addressed in this thesis is in the area of communication systems; in particular, I focus on ultra-wideband (UWB) systems. One option for producing and decoding UWB signals is direct-sequence spreading with code-division multiple access (DS-CDMA). Focusing on this methodology, I address the coexistence of a DS-CDMA system with a narrowband interferer. To do so, I minimize the joint effect of multiple-access interference (MAI) and narrowband interference (NBI) on a simple matched-filter receiver. I show that suitably designed statistical properties of the spreading sequences improve performance. The comparison is with a system exploiting chaos-based sequences that minimize MAI only.
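The thesis does not specify how antipodal sequences with a designed second-order statistic are generated. The sketch below uses one standard construction, assumed for illustration: take the sign of a stationary Gaussian AR(1) process. By the arcsine law, the resulting ±1 sequence has lag-1 correlation (2/π)·asin(a). Here `a = 0` gives independent Rademacher rows, the usual CS baseline. The rows are then used as the sensing waveforms in y = Φx. All function names are hypothetical.

```python
import math
import random


def antipodal_sequence(n, a, rng):
    """Hypothetical generator: sign of x[k] = a*x[k-1] + sqrt(1-a^2)*w[k],
    with |a| < 1 and w[k] ~ N(0, 1). The Gaussian process has unit variance
    and lag-1 correlation a, so the +/-1 output has lag-1 correlation
    (2/pi)*asin(a) (arcsine law)."""
    x = rng.gauss(0.0, 1.0)          # stationary initial state
    out = []
    for _ in range(n):
        x = a * x + math.sqrt(1.0 - a * a) * rng.gauss(0.0, 1.0)
        out.append(1 if x >= 0 else -1)
    return out


def lag_correlation(seq, lag):
    """Empirical correlation between seq[k] and seq[k + lag]."""
    n = len(seq) - lag
    return sum(seq[k] * seq[k + lag] for k in range(n)) / n


def cs_measurements(signal, m, a, rng):
    """Compress a length-n signal into m inner products with antipodal rows."""
    n = len(signal)
    rows = [antipodal_sequence(n, a, rng) for _ in range(m)]
    y = [sum(r * s for r, s in zip(row, signal)) for row in rows]
    return y, rows
```

The measurement step is linear and needs only sign flips and additions, which is what makes antipodal waveforms attractive in hardware. Signal recovery (e.g. by sparse optimization) is a separate step that is not shown here.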

Relevance:

30.00%

Abstract:

In many application domains, data can be naturally represented as graphs. When analytical solutions to a given problem are infeasible, machine learning techniques can be a viable alternative. Classical machine learning techniques are defined for data in vector form, but some of them have recently been extended to deal directly with structured data. Among these, kernel methods have shown promising results in terms of both computational complexity and predictive performance. Kernel methods avoid an explicit mapping to vector form by relying on kernel functions, which, informally, compute a similarity measure between two entities. However, defining good kernels for graphs is challenging, because a good trade-off between computational complexity and expressiveness is difficult to find. Another problem we face is learning on data streams, where one or more sources generate a potentially unbounded sequence of data. This thesis makes three main contributions. The first is the definition of a new family of kernels for graphs based on Directed Acyclic Graphs (DAGs). We analyzed two kernels from this family, achieving state-of-the-art results on real-world datasets from both the computational and the classification point of view. The second contribution makes the application of learning algorithms to streams of graphs feasible; in addition, we defined a principled approach to memory management. The third contribution is the application of machine learning techniques for structured data to non-coding RNA function prediction. In this setting, the secondary structure is thought to carry relevant information, but existing methods that consider it have prohibitively high computational complexity. We propose applying kernel methods to this domain, obtaining state-of-the-art results.
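To make "a kernel function computes a similarity between two graphs without an explicit vector representation" concrete, the sketch below implements the well-known Weisfeiler-Lehman subtree kernel. This kernel is swapped in for illustration and is not the DAG-based kernel family defined in the thesis. Graphs are assumed to be given as adjacency lists with one discrete label per node. The kernel value is the dot product of the counts of original and iteratively refined node labels.

```python
from collections import Counter


def wl_features(adj, labels, iterations=2):
    """Counts of node labels after 0..iterations Weisfeiler-Lehman refinements.

    Each refinement replaces a node's label with the pair
    (own label, sorted tuple of neighbour labels).
    """
    current = dict(labels)
    features = Counter(current.values())
    for _ in range(iterations):
        current = {
            v: (current[v], tuple(sorted(current[u] for u in adj[v])))
            for v in adj
        }
        features.update(current.values())
    return features


def wl_kernel(g1, g2, iterations=2):
    """k(G1, G2) = <phi(G1), phi(G2)>; each graph is an (adj, labels) pair."""
    f1 = wl_features(*g1, iterations)
    f2 = wl_features(*g2, iterations)
    return sum(count * f2[label] for label, count in f1.items())
```

For a three-node triangle and a three-node path whose nodes are all labelled `"C"`, the kernel gives 27 for the triangle with itself, 19 for the path with itself, and 12 across the two. The shared initial labels and the shared "degree-2 node" label contribute to the cross value, while the deeper refinements do not match.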

Relevance:

30.00%

Abstract:

The recent advent of next-generation sequencing technologies has revolutionized the way the genome is analyzed. This innovation provides deeper information at a lower cost and in less time, and it produces data that are discrete measurements. One of the most important applications of these data is differential analysis, i.e. investigating whether a gene exhibits different expression levels under two (or more) biological conditions (such as disease states, treatments received and so on). Statistically, the final aim is hypothesis testing. The Negative Binomial distribution is considered the most adequate model for these data, especially because it allows for overdispersion. However, estimating the dispersion parameter is a very delicate issue, because little information is usually available for it. Many strategies have been proposed, but they often result in procedures based on plug-in estimates. In this thesis we show that this discrepancy between the estimation and the testing framework can lead to uncontrolled Type I errors. We propose a mixture model that allows each gene to share information with other genes that exhibit similar variability. Three consistent statistical tests are then developed for differential expression analysis. We show that the proposed method detects differentially expressed genes with higher sensitivity than common procedures, since it is the best at achieving the nominal Type I error rate while maintaining high power. The method is finally illustrated on prostate cancer RNA-seq data.
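The overdispersion argument can be made concrete with the usual mean/dispersion parameterization of the Negative Binomial, where Var(Y) = μ + φμ². The Poisson case is φ = 0. The sketch below also includes a per-gene method-of-moments estimate of φ. This is exactly the kind of plug-in estimate the abstract warns about, and it is not the thesis's mixture model. The function names are hypothetical.

```python
import math


def nb_pmf(y, mu, phi):
    """Negative Binomial pmf with mean mu > 0 and dispersion phi > 0,
    so that Var(Y) = mu + phi * mu**2."""
    r = 1.0 / phi            # "size" parameter
    p = r / (r + mu)         # success probability
    log_p = (math.lgamma(y + r) - math.lgamma(r) - math.lgamma(y + 1)
             + r * math.log(p) + y * math.log(1.0 - p))
    return math.exp(log_p)


def moment_dispersion(counts):
    """Plug-in estimate phi = (s^2 - mean) / mean^2, floored at 0.

    With few replicates per gene this estimate is very noisy, which is the
    motivation for sharing information across genes.
    """
    n = len(counts)
    mean = sum(counts) / n
    var = sum((c - mean) ** 2 for c in counts) / (n - 1)
    return max((var - mean) / mean ** 2, 0.0)
```

With μ = 10 and φ = 0.5, the variance is 10 + 0.5·100 = 60, six times the Poisson variance for the same mean.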

Relevance:

30.00%

Abstract:

Robotic applications are now widespread, and most manipulation tasks are solved efficiently. However, deformable objects (DOs) are still a major limitation for robots. The main difficulty in manipulating DOs is the uncertainty of their shape and dynamics. This uncertainty prevents the use of model-based approaches (which are excessively computationally complex) and makes sensory data difficult to interpret. This thesis reports research on applications in the robotic manipulation and sensing of deformable linear objects (DLOs), with a particular focus on electric wires. In all the works, significant effort went into finding an effective strategy for analyzing sensory signals with various machine learning algorithms. The first part of the document focuses on wire terminals, i.e. their detection, grasping, and insertion. A pipeline that integrates vision and tactile sensing is developed first, and further improvements are then proposed for each module. A novel procedure is proposed to gather and label massive amounts of training images for object detection with minimal human intervention. Together with this strategy, we extend a generic object detector based on convolutional neural networks to predict orientation. The insertion task is also extended with a closed-loop controller capable of guiding a longer, curved segment of wire through a hole, where a recurrent neural network estimates the contact forces. In the latter part of the thesis, the interest shifts to the DLO shape. Robotic reshaping of a DLO is addressed by means of a sequence of pick-and-place primitives. A decision-making process driven by visual data learns the optimal grasping locations using Deep Q-learning and finds the best release point. The success of the solution relies on a reliable interpretation of the DLO shape.
For this reason, the visual segmentation is developed further.
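Deep Q-learning approximates the action-value function Q(s, a) with a neural network, here one that takes images of the DLO as input. The underlying update rule is the same as in tabular Q-learning. The minimal sketch below uses a table instead of a network to show that rule and an ε-greedy choice among candidate grasp points. The state and action names, as well as `alpha` and `gamma`, are hypothetical.

```python
import random


def q_update(q, state, action, reward, next_state, alpha=0.5, gamma=0.9,
             terminal=False):
    """One Q-learning step: Q(s,a) <- Q(s,a) + alpha*(r + gamma*max Q(s',.) - Q(s,a))."""
    best_next = 0.0 if terminal else max(q[next_state].values())
    target = reward + gamma * best_next
    q[state][action] += alpha * (target - q[state][action])
    return q[state][action]


def epsilon_greedy(q, state, epsilon, rng):
    """Explore with probability epsilon, otherwise pick the best-valued action."""
    if rng.random() < epsilon:
        return rng.choice(list(q[state]))
    return max(q[state], key=q[state].get)


# Hypothetical table: states are coarse DLO shapes, actions are grasp points.
q_table = {
    "s0": {"grasp_1": 0.0, "grasp_2": 0.0},
    "s1": {"grasp_1": 2.0, "grasp_2": 1.0},
}
```

For instance, `q_update(q_table, "s0", "grasp_1", 1.0, "s1")` moves Q(s0, grasp_1) from 0 to 0.5 · (1 + 0.9 · 2) = 1.4. In the deep variant, the same target r + γ·max Q(s′, ·) becomes the regression target for the network.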

Relevance:

30.00%

Abstract:

DNA is a powerful building block and is widely used to assemble molecular structures and dynamic molecular devices, with potential applications ranging from synthetic biology to diagnostics. Sequence programmability makes it possible to predict how single-stranded DNA molecules fold and interact with one another. This feature has enabled the development of spatiotemporally controlled nanostructures and the engineering of supramolecular devices. The first part of this thesis addresses the development of an integrated chemiluminescence (CL)-based lab-on-chip sensor for detecting the life biomarker adenosine 5′-triphosphate (ATP) in extraterrestrial environments. Subsequently, we investigated whether the interaction and recognition between biomolecules and their targets can be studied while mimicking the intracellular environment in terms of crowding, confinement and compartmentalization. For this purpose, we developed a split G-quadruplex DNAzyme platform for the chemiluminescent, quantitative detection of antibodies. The platform is based on an antibody-induced co-localization (proximity) mechanism, in which guided spatial nanoconfinement leads a split G-quadruplex DNAzyme to reassemble into the functional native G-quadruplex conformation. The following part of this thesis aims at developing chemiluminescent nanoparticles for bioimaging and photodynamic therapy applications. Chapter 5 presents a realistic and accurate evaluation of the potential of electrochemistry and CL for biosensor development (i.e., is it better to “measure an electron or a photon”?). Chapter 6 investigates, for chemiluminescence detection, the emission anisotropy of an emitting dipole bound to the interface between two media with different refractive indices.

Relevance:

30.00%

Abstract:

In this thesis we will see that the DNA sequence is constantly shaped by its interactions with the environment at multiple levels. It carries footprints of DNA methylation, of its 3D organization and, in the case of bacteria, of the interaction with host organisms. In the first chapter, we will see that epigenetic and structural footprints can be detected by analyzing the distribution of distances between consecutive dinucleotides of the same type along the sequence. In particular, the CG distance distribution makes it possible to distinguish among organisms of different biological complexity, depending on the extent to which CG sites are involved in DNA methylation. Moreover, the CG and TA distributions can be described by the same fitting function, which suggests a relationship between the two. We will also provide an interpretation of the observed trend by simulating a positioning process guided by the presence and absence of memory. Finally, we will focus on the TA distance distribution and characterize deviations from the trend predicted by the best fitting function. We identify specific patterns that might be related to peculiar mechanical properties of DNA, as well as to epigenetic and structural processes. In the second chapter, we will see how the 3D structure of DNA can be mapped onto its sequence. In particular, we devised a network-based algorithm that takes Hi-C contact maps as input and produces a genome assembly from the genome's 3D configuration. Specifically, we will see how the different chromosomes can be identified and their sequences reconstructed by exploiting the spectral properties of the network's Laplacian operator. In the third chapter, we present a novel network-based method for source clustering and source attribution. It identifies host-bacteria interactions starting from the detection of single-nucleotide polymorphisms along bacterial genome sequences.
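The first chapter's raw observable is the list of distances between consecutive occurrences of one dinucleotide. The minimal sketch below assumes that distances are measured between start positions and that overlapping occurrences count (so `TATA` has two `TA` sites at distance 2). Fitting a function to the resulting histogram is a separate step and is not shown. The function names are hypothetical.

```python
from collections import Counter


def dinucleotide_distances(seq, dinuc="CG"):
    """Distances between start positions of consecutive occurrences of `dinuc`."""
    seq = seq.upper()
    positions = [i for i in range(len(seq) - 1) if seq[i:i + 2] == dinuc]
    return [b - a for a, b in zip(positions, positions[1:])]


def distance_distribution(seq, dinuc="CG"):
    """Histogram {distance: count} of consecutive-occurrence distances."""
    return Counter(dinucleotide_distances(seq, dinuc))
```

For example, `dinucleotide_distances("ACGTTCGACGA")` finds CG at positions 1, 5 and 8 and returns `[4, 3]`. In a methylated genome, CG depletion would show up as a shift of this histogram towards longer distances.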

Relevance:

30.00%

Abstract:

Whole Exome Sequencing (WES) is rapidly becoming the first-tier test in clinics, thanks both to its declining costs and to new platforms that help clinicians analyze and interpret SNVs and InDels. However, we still know very little about how CNV detection could increase the diagnostic yield of WES. A plethora of exome CNV callers have been published over the years. Each performs well on specific CNV classes and sizes, which suggests that good overall detection performance requires combining multiple tools. Here we present TrainX, an ML-based method for calling heterozygous CNVs in WES data using EXCAVATOR2 Normalized Read Counts. We use non-pseudo-autosomal chromosome X alignments from males and females to construct our dataset and train our model, make predictions on autosomal target regions, and use an HMM to call CNVs. We compared TrainX against a set of CNV tools that differ in their detection method (GATK4 gCNV, ExomeDepth, DECoN, CNVkit and EXCAVATOR2). Our algorithm outperformed them in terms of stability: it identified both deletions and duplications with good scores (F1-scores of 0.87 and 0.82, respectively), at sizes down to the minimum resolution of 2 target regions. We also evaluated the method's robustness on a set of WES and SNP array data (n=251) from the Italian cohort of the Epi25 collaborative, and retrieved all the clinical CNVs previously identified by the SNP array. TrainX showed good accuracy in detecting heterozygous CNVs of different sizes, making it a promising tool for use in a diagnostic setting.
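The abstract reports F1-scores but not how calls were matched to true CNVs. The minimal sketch below assumes a common criterion: a call matches a true CNV of the same type (deletion or duplication) on the same chromosome when their reciprocal overlap is at least 50%. Precision is computed over calls and recall over true events. The record layout `(chrom, start, end, type)` with half-open coordinates and the function names are hypothetical.

```python
def reciprocal_overlap(a, b):
    """Overlap length / length of the longer interval (0 if disjoint).

    Intervals are (chrom, start, end, ...) with half-open coordinates.
    overlap / max(len) >= t is equivalent to overlapping >= t of both.
    """
    if a[0] != b[0]:
        return 0.0
    overlap = min(a[2], b[2]) - max(a[1], b[1])
    if overlap <= 0:
        return 0.0
    return overlap / max(a[2] - a[1], b[2] - b[1])


def f1_score(calls, truth, min_ro=0.5):
    """F1 of CNV calls against a truth set under the matching rule above."""
    def matches(x, y):
        return x[3] == y[3] and reciprocal_overlap(x, y) >= min_ro

    tp_calls = sum(1 for c in calls if any(matches(c, t) for t in truth))
    found = sum(1 for t in truth if any(matches(c, t) for c in calls))
    precision = tp_calls / len(calls) if calls else 0.0
    recall = found / len(truth) if truth else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)
```

Computing the score separately for deletions and duplications, as the abstract reports, amounts to filtering `calls` and `truth` by type before calling `f1_score`.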

Relevance:

30.00%

Abstract:

Allostery is a phenomenon of fundamental importance in biology, allowing the regulation of function and the dynamic adaptability of enzymes and proteins. Although the allosteric effect was first observed more than a century ago, allostery remains a biophysical enigma and has been called the “second secret of life”. The challenge is mainly due to the rather complex nature of allosteric mechanisms. They manifest as an alteration of the biological function of a protein/enzyme (e.g. ligand/substrate binding at the active site) upon the binding of an “other object” (“allos stereos” in Greek) at a site distant (> 1 nanometer) from the active site, namely the effector site. At the heart of allostery, therefore, lies signal propagation from the effector site to the active site through a dense protein matrix. A fundamental challenge is to elucidate the physico-chemical interactions between amino acid residues that allow communication between the two binding sites, i.e. the “allosteric pathways”. Here, we propose a multidisciplinary approach that combines three components: computational chemistry, involving molecular dynamics simulations of protein motions; (bio)physical analysis of allosteric systems, including multiple sequence alignments of known allosteric systems; and mathematical tools based on graph theory and machine learning. Together, these can greatly help in understanding the complexity of the dynamical interactions involved in different allosteric systems. The project aims at developing robust and fast tools to identify unknown allosteric pathways. Characterizing and predicting such allosteric spots could elucidate, and allow full exploitation of, the power of allosteric modulation in enzymes and DNA-protein complexes, with great potential applications in enzyme engineering and drug discovery.
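One graph-theoretic way to extract a candidate allosteric pathway is sketched here under stated assumptions; it is not the project's actual tool. Residues become nodes, and each edge carries the correlation C_ij of the two residues' motions, e.g. from a molecular dynamics trajectory. Edge weights are set to −log|C_ij|, a choice commonly used in correlation-based network analyses, so that strongly coupled residues are "close". The shortest effector-to-active-site path is then found with Dijkstra's algorithm. The residue names and correlation values are hypothetical.

```python
import heapq
import math


def allosteric_path(correlations, source, target):
    """Shortest path with edge weight -log|C_ij| (strong coupling = short edge).

    correlations: {(residue_i, residue_j): C_ij}, with 0 < |C_ij| <= 1.
    Returns (path, total_weight), or (None, inf) if target is unreachable.
    """
    graph = {}
    for (i, j), c in correlations.items():
        if c == 0:
            continue                      # uncorrelated pairs give no edge
        w = -math.log(abs(c))
        graph.setdefault(i, []).append((j, w))
        graph.setdefault(j, []).append((i, w))

    dist, prev = {source: 0.0}, {}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == target:
            break
        if d > dist.get(u, math.inf):     # stale heap entry
            continue
        for v, w in graph.get(u, []):
            nd = d + w
            if nd < dist.get(v, math.inf):
                dist[v], prev[v] = nd, u
                heapq.heappush(heap, (nd, v))

    if target not in dist:
        return None, math.inf
    path = [target]
    while path[-1] != source:
        path.append(prev[path[-1]])
    return path[::-1], dist[target]
```

Because −log turns products of correlations into sums, the shortest path is the chain whose product of |C_ij| is largest. In the toy network tested here, two 0.9-correlated hops (product 0.81) beat a weakly correlated direct edge.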