9 results for structured sequence

in AMS Tesi di Dottorato - Alm@DL - Università di Bologna


Relevance: 20.00%

Abstract:

The need for convergence between semi-structured data management and Information Retrieval techniques is widely recognized in the scientific community. To meet this growing demand, the W3C has recently proposed XQuery Full Text, an IR-oriented extension of XQuery. However, query optimization requires the study of important properties such as query equivalence and containment; to this end, a formal representation of documents and queries is needed. The goal of this thesis is to establish such a formal background. We define a data model for XML documents and propose an algebra able to represent most XQuery Full-Text expressions. We show how an XQuery Full-Text expression can be translated into an algebraic expression and how an algebraic expression can be optimized.

Relevance: 20.00%

Abstract:

Machine learning comprises a series of techniques for the automatic extraction of meaningful information from large collections of noisy data. In many real-world applications, data is naturally represented in structured form. Since traditional methods in machine learning deal with vectorial information, they require an a priori form of preprocessing. Among the learning techniques for dealing with structured data, kernel methods are recognized to have a strong theoretical background and to be effective approaches. They do not require an explicit vectorial representation of the data in terms of features, but rely on a measure of similarity between any pair of objects of a domain, the kernel function. Designing fast and good kernel functions is a challenging problem. In the case of tree-structured data, two issues become relevant: kernels for trees should not be sparse and should be fast to compute. The sparsity problem arises when, given a dataset and a kernel function, most structures of the dataset are completely dissimilar to one another. In those cases the classifier has too little information for making correct predictions on unseen data. In fact, it tends to produce a discriminating function that behaves like the nearest neighbour rule. Sparsity is likely to arise for some standard tree kernel functions, such as the subtree and subset tree kernels, when they are applied to datasets with node labels belonging to a large domain. A second drawback of using tree kernels is the time complexity required in both the learning and classification phases. Such complexity can sometimes prevent the application of the kernel in scenarios involving large amounts of data. This thesis proposes three contributions for resolving the above issues of kernels for trees. A first contribution aims at creating kernel functions that adapt to the statistical properties of the dataset, thus reducing its sparsity with respect to traditional tree kernel functions.
Specifically, we propose to encode the input trees with an algorithm able to project the data onto a lower-dimensional space with the property that similar structures are mapped similarly. By building kernel functions on the lower-dimensional representation, we are able to perform inexact matching between different inputs in the original space. A second contribution is the proposal of a novel kernel function based on the convolution kernel framework. A convolution kernel measures the similarity of two objects in terms of the similarities of their subparts. Most convolution kernels are based on counting the number of shared substructures, partially discarding information about their position in the original structure. The kernel function we propose is, instead, especially focused on this aspect. A third contribution is devoted to reducing the computational burden of calculating a kernel function between a tree and a forest of trees, which is a typical operation in the classification phase and, for some algorithms, also in the learning phase. We propose a general methodology applicable to convolution kernels. Moreover, we show an instantiation of our technique when kernels such as the subtree and subset tree kernels are employed. In those cases, Directed Acyclic Graphs can be used to compactly represent shared substructures in different trees, thus reducing the computational burden and storage requirements.
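
To make the idea of counting shared substructures concrete, here is a minimal sketch of a subtree kernel in Python. It is an illustration under stated assumptions, not the thesis's implementation: trees are assumed to be ordered and written as `(label, [children])` pairs, and the function names are hypothetical. Each distinct complete subtree is stored once as a dictionary key, which loosely mirrors the idea of sharing repeated substructures.

```python
from collections import Counter

def subtree_counts(tree):
    """Count occurrences of every distinct complete subtree of an ordered tree.

    A tree is assumed to be a pair (label, [children]); this encoding is an
    illustrative choice, not one taken from the thesis.
    """
    counts = Counter()

    def visit(node):
        label, children = node
        # Canonical, hashable form of the subtree rooted at this node.
        key = (label, tuple(visit(child) for child in children))
        counts[key] += 1
        return key

    visit(tree)
    return counts

def subtree_kernel(t1, t2):
    """Number of pairs of identical complete subtrees shared by t1 and t2."""
    c1, c2 = subtree_counts(t1), subtree_counts(t2)
    return sum(n * c2[key] for key, n in c1.items())

t1 = ("a", [("b", []), ("c", [])])
t2 = ("a", [("b", []), ("b", [])])
print(subtree_kernel(t1, t2))  # only the leaf "b" is shared
```

When node labels come from a large domain, most pairs of trees share no subtree at all and the kernel value is zero, which is the sparsity problem the abstract describes.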

Relevance: 20.00%

Abstract:

This research work optimized the electrochemical system of LDHs as catalytic precursors on FeCrAlY foams. Preliminary syntheses were performed on flat surfaces in order to characterize the deposited material easily. From the study of pH evolution versus time at different cathodic potentials applied to a Pt electrode, the theoretical best working conditions for the synthesis of single hydroxides and LDH compounds were identified. In order to define the optimal potential for the synthesis of a particular LDH compound, the collected data were compared with the precipitation interval determined by titration with NaOH. However, the characterization of the material deposited on Pt surfaces did not confirm the deposition of a pure and homogeneous LDH phase during the synthesis. Instead, a sequential deposition linked to the precipitation pH of the elements involved was observed. The same behavior was observed during the synthesis of the RhMgAl LDH on FeCrAlY foam as catalytic precursor. Several parameters were considered in order to optimize the synthesis. Electrochemical cells with different features, such as the counter electrode dimensions or the contact between the foam and the potentiostat, were developed in order to obtain a better coating of the foam. The influence of the initial pH of the electrolyte solution, of the applied potential, and of the composition of the electrolytic solution was investigated in order to obtain a better coating of the catalyst support. After calcination of the coated foam, catalytic tests were performed for the CPO and SR reactions, showing improved performance as the precursor synthesis conditions were optimized.

Relevance: 20.00%

Abstract:

This research thesis focused on the development of new biomaterials and devices for application in regenerative medicine, particularly in the repair and regeneration of bone and osteochondral regions affected by degenerative diseases such as osteoarthritis and osteoporosis or by serious traumas. More specifically, the work focused on the synthesis and physico-chemical-morphological characterization of: i) a new superparamagnetic apatite phase; ii) new biomimetic superparamagnetic bone and osteochondral scaffolds; iii) new bioactive bone cements for regenerative vertebroplasty. The new bio-devices were designed to exhibit high biomimicry with hard human tissues and functionality promoting faster tissue repair and improved texturing. In particular, recent trends in tissue regeneration indicate magnetism as a new tool to stimulate cells towards tissue formation and organization; in this perspective, a new superparamagnetic apatite was synthesized by doping the apatite lattice with di- and trivalent iron ions during synthesis. This finding was the starting point for synthesizing newly conceived superparamagnetic bone and osteochondral scaffolds by reproducing in the laboratory the biological processes that yield new bone, i.e. the self-assembly and organization of collagen fibrils and the heterogeneous nucleation of nanosized, ionically substituted apatite mimicking the mineral part of bone. The new scaffolds can be magnetically switched on and off and function as workstations guiding fast tissue regeneration through minimally invasive and more efficient approaches. Moreover, in view of specific treatments for patients affected by osteoporosis or by traumas involving vertebral weakening or fracture, the present work was also dedicated to the development of new self-setting injectable pastes based on strontium-substituted calcium phosphates, able to harden in vivo and transform into strontium-substituted hydroxyapatite.
The addition of strontium may provide an anti-osteoporotic effect, helping to restore physiological bone turnover. Bio-polymers were also added to the ceramic-based paste; because they are progressively resorbed, they create additional porosity in the cement body that favours cell colonization and osseointegration.

Relevance: 20.00%

Abstract:

In the last few decades, bioinformatics has played a fundamental role in making sense of the huge amount of data produced. Once the complete sequence of a genome has been obtained, the crucial problem is to learn as much as possible about its coding regions. Protein sequence annotation is challenging and, due to the size of the problem, only computational approaches can provide a feasible solution. As recently pointed out by the Critical Assessment of Function Annotations (CAFA), the most accurate methods are those based on the transfer-by-homology approach, and the most incisive contribution is given by cross-genome comparisons. This thesis describes a non-hierarchical sequence clustering method for automatic large-scale protein annotation, called “The Bologna Annotation Resource Plus” (BAR+). The method is based on an all-against-all alignment of more than 13 million protein sequences characterized by a very stringent metric. BAR+ can safely transfer functional features (Gene Ontology and Pfam terms) inside clusters by means of a statistical validation, even in the case of multi-domain proteins. Within BAR+ clusters it is also possible to transfer the three-dimensional structure (when a template is available). This is made possible by cluster-specific HMM profiles that can be used to calculate reliable template-to-target alignments even in the case of distantly related proteins (sequence identity < 30%). Other BAR+ based applications were developed during my doctorate, including the prediction of magnesium binding sites in human proteins, the classification of the ABC transporter superfamily, and the functional prediction (GO terms) of the CAFA targets. Remarkably, in the CAFA assessment, BAR+ placed among the ten most accurate methods. At present, BAR+ is freely available as a web server for the functional and structural annotation of protein sequences at http://bar.biocomp.unibo.it/bar2.0.
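
One common way to obtain non-hierarchical clusters from all-against-all alignments is to link two sequences whenever their pairwise alignment passes stringent thresholds and then take the connected components of the resulting graph. The sketch below illustrates that general idea in Python; it is an assumption about how such a clustering could look, not the BAR+ pipeline itself. The function name, the hit tuple layout, and the threshold defaults are all hypothetical placeholders.

```python
def cluster_by_alignment(ids, hits, min_identity=0.4, min_coverage=0.9):
    """Group sequences into connected components of a thresholded hit graph.

    ids:  iterable of sequence identifiers.
    hits: iterable of (id_a, id_b, identity, coverage) tuples, with identity
          and coverage as fractions in [0, 1]. The layout and the default
          thresholds are illustrative assumptions.
    """
    ids = list(ids)
    parent = {i: i for i in ids}

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b, identity, coverage in hits:
        if identity >= min_identity and coverage >= min_coverage:
            root_a, root_b = find(a), find(b)
            if root_a != root_b:
                parent[root_b] = root_a

    clusters = {}
    for i in ids:
        clusters.setdefault(find(i), []).append(i)
    return sorted(sorted(members) for members in clusters.values())

ids = ["p1", "p2", "p3", "p4"]
hits = [("p1", "p2", 0.95, 0.98), ("p2", "p3", 0.50, 0.95), ("p3", "p4", 0.20, 0.99)]
print(cluster_by_alignment(ids, hits))
```

Note that p1 and p3 end up in the same cluster without a direct hit between them, because membership is transitive through p2; annotations would then be transferred within each cluster only after a separate validation step.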

Relevance: 20.00%

Abstract:

Clostridium difficile is an obligate anaerobic, Gram-positive, endospore-forming bacterium. Although an opportunistic pathogen, it is an important cause of healthcare-associated infections. While toxins TcdA and TcdB are the main virulence factors of C. difficile, the factors or processes involved in gut colonization during infection remain unclear. The biofilm-forming ability of bacterial pathogens has been associated with increased antibiotic resistance and chronic recurrent infections. Little is known about biofilm formation by anaerobic gut species. Biofilm formation by C. difficile could play a role in its virulence and persistence, as seen for other intestinal pathogens. We demonstrate that the C. difficile clinical strains 630 and R20291, the latter isolated in an outbreak, form structured biofilms in vitro. The biofilm matrix is made of proteins, DNA and polysaccharide. Strain R20291 accumulates substantially more biofilm. Employing isogenic mutants, we show that the virulence-associated proteins Cwp84, flagella, a putative quorum sensing regulator, LuxS, and Spo0A are required for maximal biofilm formation by C. difficile. Moreover, we demonstrate that bacteria in C. difficile biofilms are more resistant to high concentrations of vancomycin, a drug commonly used for the treatment of CDI, and that inhibitory and sub-inhibitory concentrations of the same antibiotic induce biofilm formation. Surprisingly, clinical C. difficile strains from the same outbreak, but of different origin, show differences in biofilm formation. Genome sequence analysis of these strains revealed a single nucleotide polymorphism (SNP) in the anti-σ factor RsbW, which regulates the stress-induced alternative sigma factor B (σB). We further demonstrate that RsbW, a negative regulator of alternative sigma factor B, has a role in biofilm formation and sporulation of C. difficile. Our data suggest that biofilm formation by C. difficile is a complex multifactorial process and may be a crucial mechanism for clostridial persistence in the host.

Relevance: 20.00%

Abstract:

This PhD thesis focuses on the molecular variability of some specific proteins that are part of the outer membrane of the pathogen Neisseria meningitidis and have been described as protective antigens and important virulence factors. These antigens have been employed as components of the vaccine developed by Novartis Vaccines against N. meningitidis of serogroup B, and their variability in the meningococcal population is a key aspect when the effect of the vaccine is evaluated. The PhD project led to the completion of three major studies described in three different manuscripts, of which two have been published and the third is in preparation. The thesis is structured in three main chapters, each dedicated to one of the three studies. The first, described in Chapter 1, is specifically dedicated to the analysis of the molecular conservation of meningococcal antigens in the genomes of all species classified in the genus Neisseria (Conservation of Meningococcal Antigens in the Genus Neisseria. A. Muzzi et al. 2013. mBio 4 (3)). The second study, described in Chapter 2, focuses on the analysis of the presence and conservation of the antigens in a panel of bacterial isolates obtained from cases of the disease and from healthy individuals, collected in the same year and in the same geographical area (Conservation of fHbp, NadA, and NHBA in carrier and pathogenic isolates of Neisseria meningitidis collected in the Czech Republic in 1993. A. Muzzi et al. Manuscript in preparation). Finally, Chapter 3 describes the molecular features of the antigens in a panel of bacterial isolates collected over a period of 50 years and representative of the epidemiological history of meningococcal disease in the Netherlands (An Analysis of the Sequence Variability of Meningococcal fHbp, NadA and NHBA over a 50-Year Period in the Netherlands. S. Bambini et al. 2013. PLoS ONE e65043).

Relevance: 20.00%

Abstract:

In many application domains, data can be naturally represented as graphs. When applying analytical solutions to a given problem is infeasible, machine learning techniques can be a viable way to solve it. Classical machine learning techniques are defined for data represented in vectorial form. Recently, some of them have been extended to deal directly with structured data. Among those techniques, kernel methods have shown promising results in terms of both computational complexity and predictive performance. Kernel methods make it possible to avoid an explicit mapping into vectorial form by relying on kernel functions, which informally are functions that calculate a similarity measure between two entities. However, the definition of good kernels for graphs is a challenging problem because of the difficulty of finding a good tradeoff between computational complexity and expressiveness. Another problem we face is learning on data streams, where a potentially unbounded sequence of data is generated by some sources. There are three main contributions in this thesis. The first contribution is the definition of a new family of kernels for graphs based on Directed Acyclic Graphs (DAGs). We analyzed two kernels from this family, achieving state-of-the-art results in terms of both computation and classification on real-world datasets. The second contribution consists in making the application of learning algorithms to streams of graphs feasible. Moreover, we defined a principled approach to memory management. The third contribution is the application of machine learning techniques for structured data to non-coding RNA function prediction. In this setting, the secondary structure is thought to carry relevant information. However, existing methods that consider the secondary structure have prohibitively high computational complexity. We propose to apply kernel methods to this domain, obtaining state-of-the-art results.
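
The interplay between kernel functions, unbounded streams, and bounded memory can be sketched with a kernel perceptron that keeps at most a fixed number of stored examples. This Python sketch is a generic illustration under loud assumptions: the eviction rule (drop the oldest stored example) is a simple stand-in rather than the memory management defined in the thesis, and the toy kernel ignores edges entirely, unlike the DAG-based kernels the abstract mentions. All names are hypothetical.

```python
from collections import Counter, deque

def label_histogram_kernel(g1, g2):
    """Toy graph kernel: dot product of vertex-label counts (edges ignored)."""
    c1, c2 = Counter(g1["labels"]), Counter(g2["labels"])
    return sum(n * c2[label] for label, n in c1.items())

class BudgetedKernelPerceptron:
    """Online kernel perceptron storing at most `budget` examples."""

    def __init__(self, kernel, budget):
        self.kernel = kernel
        # A deque with maxlen silently evicts the oldest entry when full.
        self.support = deque(maxlen=budget)

    def score(self, x):
        return sum(y * self.kernel(s, x) for s, y in self.support)

    def predict(self, x):
        return 1 if self.score(x) > 0 else -1

    def update(self, x, y):
        """Process one stream item: predict first, store it only on a mistake."""
        y_hat = self.predict(x)
        if y_hat != y:
            self.support.append((x, y))
        return y_hat

model = BudgetedKernelPerceptron(label_histogram_kernel, budget=2)
model.update({"labels": ["C", "C", "O"]}, 1)   # mistake: stored
model.update({"labels": ["N", "N"]}, -1)       # correct: not stored
print(model.predict({"labels": ["C", "O"]}))
```

The model never builds an explicit feature vector; it only evaluates the kernel between the incoming graph and the stored ones, so the cost per item is bounded by the budget.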

Relevance: 20.00%

Abstract:

The recent advent of next-generation sequencing technologies has revolutionized the way the genome is analyzed. This innovation makes it possible to obtain deeper information at a lower cost and in less time, and provides data in the form of discrete measurements. One of the most important applications of these data is differential analysis, that is, investigating whether a gene exhibits a different expression level under two (or more) biological conditions (such as disease states, treatments received and so on). As for the statistical analysis, the final aim is statistical testing, and for modeling these data the Negative Binomial distribution is considered the most adequate, especially because it allows for overdispersion. However, the estimation of the dispersion parameter is a very delicate issue because little information is usually available for estimating it. Many strategies have been proposed, but they often result in procedures based on plug-in estimates, and in this thesis we show that this discrepancy between the estimation and the testing framework can lead to uncontrolled type I errors. We propose a mixture model that allows each gene to share information with other genes that exhibit similar variability. Three consistent statistical tests are then developed for differential expression analysis. We show that the proposed method improves the sensitivity of detecting differentially expressed genes with respect to common procedures, since it comes closest to the nominal type I error rate while maintaining high power. The method is finally illustrated on prostate cancer RNA-seq data.
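
To see why the dispersion parameter is hard to estimate, recall that under a common Negative Binomial parameterization the variance is mu + phi * mu^2, where mu is the mean and phi the dispersion (phi = 0 recovers the Poisson case). The Python sketch below computes a simple per-gene method-of-moments estimate of phi. It is only a baseline for illustration, not the mixture model proposed in the thesis, and the function name is hypothetical.

```python
from statistics import mean, variance

def moment_dispersion(counts):
    """Method-of-moments estimate of phi in Var = mu + phi * mu**2.

    counts: read counts for one gene across replicates (at least two).
    Negative estimates are clipped to 0, i.e. treated as no overdispersion.
    """
    mu = mean(counts)
    var = variance(counts)  # sample variance with n - 1 in the denominator
    if mu == 0:
        return 0.0
    return max(0.0, (var - mu) / mu ** 2)

# Three replicates of a single gene: mean 20, sample variance 100.
print(moment_dispersion([10, 20, 30]))
```

With only a handful of replicates per gene, this estimate changes sharply when a single count changes, which is the motivation for letting genes with similar variability share information.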