Biblioteca Digital

951 results for Markov chains, uniformization, inexact methods, relaxed matrix-vector

TWO NEW WEAK CONSTRAINT QUALIFICATIONS AND APPLICATIONS

Relevance: 30.00%

Abstract:

We present two new constraint qualifications (CQs) that are weaker than the recently introduced relaxed constant positive linear dependence (RCPLD) CQ. RCPLD is based on the assumption that many subsets of the gradients of the active constraints preserve positive linear dependence locally. A major open question was to identify the exact set of gradients whose properties had to be preserved locally and that would still work as a CQ. This is done in the first new CQ, which we call the constant rank of the subspace component (CRSC) CQ. This new CQ also preserves many of the good properties of RCPLD, such as local stability and the validity of an error bound. We also introduce an even weaker CQ, called the constant positive generator (CPG), which can replace RCPLD in the analysis of the global convergence of algorithms. We close this work by extending convergence results of algorithms belonging to all the main classes of nonlinear optimization methods: sequential quadratic programming, augmented Lagrangians, interior point algorithms, and inexact restoration.

Chains of Infinite Order, Chains with Memory of Variable Length, and Maps of the Interval

Relevance: 30.00%

Abstract:

We show how to construct a topological Markov map of the interval whose invariant probability measure is the stationary law of a given stochastic chain of infinite order. In particular we characterize the maps corresponding to stochastic chains with memory of variable length. The problem treated here is the converse of the classical construction of the Gibbs formalism for Markov expanding maps of the interval.

A Single Statistic for Monitoring the Covariance Matrix of Bivariate Processes

Relevance: 30.00%

Abstract:

In this article, we present a new control chart for monitoring the covariance matrix in a bivariate process. In this method, n observations of the two variables were considered as if they came from a single variable (as a sample of 2n observations), and a sample variance was calculated. This statistic was used to build a new control chart specifically as a VMIX chart. The performance of the new control chart was compared with its main competitors: the generalized sampled variance chart, the likelihood ratio test, Nagao's test, probability integral transformation (v(t)), and the recently proposed VMAX chart. Among these statistics, only the VMAX chart was competitive with the VMIX chart. For shifts in both variances, the VMIX chart outperformed VMAX; however, VMAX showed better performance for large shifts (higher than 10%) in one variance.
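The pooling step described in this abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name is hypothetical, the two variables are assumed to be already on a common (standardized) scale, and no control limit is computed because the abstract does not give one.

```python
from statistics import variance


def pooled_variance_statistic(x, y):
    """Sample variance of the 2n values obtained by pooling x and y.

    Hypothetical helper: treats the n observations of each variable as one
    sample of 2n observations, as the abstract describes.
    """
    if len(x) != len(y):
        raise ValueError("x and y must have the same number of observations")
    pooled = list(x) + list(y)
    return variance(pooled)  # unbiased sample variance, divisor 2n - 1


# Illustrative data only (n = 3 observations of each variable).
x = [0.0, 1.0, -1.0]
y = [2.0, -2.0, 0.0]
stat = pooled_variance_statistic(x, y)
```

In a monitoring setting the statistic would be compared against a control limit chosen for a target false-alarm rate; that limit is outside the scope of this sketch.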

Evaluating different methods of microarray data normalization

Relevance: 30.00%

Abstract:

Background: With the development of DNA hybridization microarray technologies, it is now possible to simultaneously assess the expression levels of thousands to tens of thousands of genes. Quantitative comparison of microarrays uncovers distinct patterns of gene expression, which define different cellular phenotypes or cellular responses to drugs. Due to technical biases, normalization of the intensity levels is a prerequisite to performing further statistical analyses. Therefore, choosing a suitable approach for normalization can be critical and deserves judicious consideration. Results: Here, we considered three commonly used normalization approaches, namely Loess, Splines and Wavelets, and two non-parametric regression methods that have yet to be used for normalization, namely Kernel smoothing and Support Vector Regression. The results obtained were compared using artificial microarray data and benchmark studies. The results indicate that Support Vector Regression is the most robust to outliers and that Kernel smoothing is the worst normalization technique, while no practical differences were observed between Loess, Splines and Wavelets. Conclusion: In light of our results, Support Vector Regression is favored for microarray normalization because of its robustness in estimating the normalization curve.

Modeling gene expression regulatory networks with the sparse vector autoregressive model

Relevance: 30.00%

Abstract:

Background: To understand the molecular mechanisms underlying important biological processes, a detailed description of the gene product networks involved is required. In order to define and understand such molecular networks, some statistical methods have been proposed in the literature to estimate gene regulatory networks from time-series microarray data. However, several problems still need to be overcome. Firstly, information flow needs to be inferred, in addition to the correlation between genes. Secondly, we usually try to identify large networks from a large number of genes (parameters) originating from a smaller number of microarray experiments (samples). Due to this situation, which is rather frequent in Bioinformatics, it is difficult to perform statistical tests using methods that model large gene-gene networks. In addition, most of the models are based on dimension reduction using clustering techniques; therefore, the resulting network is not a gene-gene network but a module-module network. Here, we present the Sparse Vector Autoregressive (SVAR) model as a solution to these problems. Results: We have applied the SVAR model to estimate gene regulatory networks based on gene expression profiles obtained from time-series microarray experiments. Through extensive simulations, by applying the SVAR method to artificial regulatory networks, we show that SVAR can infer true positive edges even under conditions in which the number of samples is smaller than the number of genes. Moreover, it is possible to control for false positives, a significant advantage when compared to other methods described in the literature, which are based on ranks or score functions. By applying SVAR to actual HeLa cell cycle gene expression data, we were able to identify well-known transcription factor targets. Conclusion: The proposed SVAR method is able to model gene regulatory networks in frequent situations in which the number of samples is lower than the number of genes, making it possible to naturally infer partial Granger causalities without any a priori information. In addition, we present a statistical test to control the false discovery rate, which was not previously possible using other gene regulatory network models.

Effects of enamel matrix derivative and transforming growth factor-β1 on human osteoblastic cells

Relevance: 30.00%

Abstract:

Background: Extracellular matrix proteins are key factors that influence the regenerative capacity of tissues. The objective of the present study was to evaluate the effects of enamel matrix derivative (EMD), TGF-β1, and the combination of both factors (EMD+TGF-β1) on human osteoblastic cell cultures. Methods: Cells were obtained from alveolar bone of three adult patients using enzymatic digestion. The effects of EMD, TGF-β1, or a combination of both were analyzed on cell proliferation, bone sialoprotein (BSP), osteopontin (OPN) and alkaline phosphatase (ALP) immunodetection, total protein synthesis, ALP activity and bone-like nodule formation. Results: All treatments significantly increased cell proliferation compared to the control group at 24 h and 4 days. At day 7, the EMD group showed higher cell proliferation compared to the TGF-β1, EMD+TGF-β1 and control groups. OPN was detected in the majority of the cells for all groups, whereas fluorescence intensities for ALP labeling were greater in the control than in the treated groups; BSP was not detected in all groups. All treatments decreased ALP levels at 7 and 14 days and bone-like nodule formation at 21 days compared to the control group. Conclusions: The exposure of human osteoblastic cells to EMD, TGF-β1 and the combination of factors in vitro supports the development of a less differentiated phenotype, with enhanced proliferative activity and total cell number, and reduced ALP activity levels and matrix mineralization.

Comparison measures of maps generated by geostatistical methods

Relevance: 30.00%

Abstract:

This study uses several measures derived from the error matrix to compare two thematic maps generated with the same sample set. The reference map was generated with all the sample elements, and the map set as the model was generated without the two points detected as influential by the analysis of local influence diagnostics. The data analyzed refer to wheat productivity in an agricultural area of 13.55 ha, considering a sampling grid of 50 x 50 m comprising 50 georeferenced sample elements. The comparison measures derived from the error matrix indicated that, despite some similarity between the maps, they are different. The difference between the production estimated by the reference map and the actual production was 350 kilograms. The same difference calculated with the model map was 50 kilograms, indicating that the study of influential points is of fundamental importance for obtaining a more reliable estimate, and that the use of measures obtained from the error matrix is a good option for comparing thematic maps.
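As an illustration of what measures derived from an error matrix can look like, the sketch below computes overall accuracy and Cohen's kappa. The abstract does not say which measures the study used, so these two are assumptions chosen because they are common in map comparison; the function names and the example matrix are hypothetical.

```python
def overall_accuracy(matrix):
    """Fraction of cells on which the two maps assign the same class."""
    total = sum(sum(row) for row in matrix)
    agree = sum(matrix[i][i] for i in range(len(matrix)))
    return agree / total


def cohen_kappa(matrix):
    """Agreement between the two maps corrected for chance agreement."""
    total = sum(sum(row) for row in matrix)
    observed = overall_accuracy(matrix)
    row_totals = [sum(row) for row in matrix]
    col_totals = [sum(col) for col in zip(*matrix)]
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / total ** 2
    return (observed - expected) / (1 - expected)


# Hypothetical 2-class error matrix: rows = reference map, columns = model map.
error_matrix = [[40, 10],
                [10, 40]]
accuracy = overall_accuracy(error_matrix)
kappa = cohen_kappa(error_matrix)
```

Rows and columns must list the thematic classes in the same order; kappa is undefined when the expected agreement equals 1.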

Comparison of extraction and transesterification methods on the determination of the fatty acid contents of three Brazilian seaweed species

Relevance: 30.00%

Abstract:

Seaweeds are photosynthetic organisms important to their ecosystem and constitute a source of compounds with several different applications in the pharmaceutical, cosmetic and biotechnology industries, such as triacylglycerols, which can be converted to fatty acid methyl esters that make up biodiesel, an alternative fuel source applied in economically important areas. This study evaluates the fatty acid profiles and concentrations of three Brazilian seaweed species, Hypnea musciformis (Wulfen) J.V. Lamouroux (Rhodophyta), Sargassum cymosum C. Agardh (Heterokontophyta), and Ulva lactuca L. (Chlorophyta), comparing three extraction methods (Bligh & Dyer - B&D; AOAC Official Methods - AOM; and extraction with methanol and ultrasound - EMU) and two transesterification methods (7% BF3 in methanol - BF3; and 5% HCl in methanol - HCl). The fatty acid contents of the three seaweed species were significantly different when extracted and transesterified by the different methods. Moreover, the best method for one species was not the same for the other species. The best extraction and transesterification methods for H. musciformis, S. cymosum and U. lactuca were, respectively, AOM-HCl, B&D-BF3 and B&D-BF3/B&D-HCl. These results point to a matrix effect, so the method used for the analysis of the fatty acid content of different organisms should be selected carefully.

Long-term type 1 diabetes impairs decidualization and extracellular matrix remodeling during early embryonic development in mice

Relevance: 30.00%

Abstract:

Introduction: Endometrial decidualization and associated extracellular matrix (ECM) remodeling are critical events for the establishment of the maternal-fetal interface and successful pregnancy. Here, we investigated the impact of type 1 diabetes on these processes during early embryonic development, in order to contribute to the understanding of the maternal factors associated with diabetic embryopathies. Methods: Alloxan-induced diabetic Swiss female mice were bred after different periods of time to determine the effects of diabetes progression on the development of gestational complications. Furthermore, the analyses focused on decidual development as well as mRNA expression, protein deposition and ultrastructural organization of decidual ECM. Results: A decreased number of implantation sites and reduced decidual dimensions were observed in the group mated 90-110 days after diabetes induction (D), but not in the 50-70D group. Picrosirius staining showed augmentation of the fibrillar collagen network in the 90-110D group, and immunohistochemical examination showed that this was associated with an increase in types I and V collagens and a decrease in type III collagen and the collagen-associated proteoglycans biglycan and lumican. qPCR, however, demonstrated that only type I collagen mRNA levels were increased in the diabetic group. Alterations in the molecular ratio among distinct collagen types and proteoglycans were associated with abnormal collagen fibrillogenesis, analyzed by transmission electron microscopy. Conclusions: Our results support the concept that the development of pregnancy complications is directly related to the duration of diabetes (progression of the disease), and that this is a consequence of both systemic factors (i.e. disturbed maternal endocrine-metabolic profile) and uterine factors, including impaired decidualization and ECM remodeling.

A comparative analysis of the relative efficacy of vector-control strategies against dengue fever

Relevance: 30.00%

Abstract:

Dengue is considered one of the most important vector-borne infections, affecting almost half of the world population with 50 to 100 million cases every year. In this paper, we present one of the simplest models that can encapsulate all the important variables related to vector control of dengue fever. The model considers the human population, the adult mosquito population and the population of immature stages, which includes eggs, larvae and pupae. The model also considers the vertical transmission of dengue in the mosquitoes and the seasonal variation in the mosquito population. From this basic model describing the dynamics of dengue infection, we deduce thresholds for avoiding the introduction of the disease and for the elimination of the disease. In particular, we deduce a Basic Reproduction Number for dengue that includes parameters related to the immature stages of the mosquito. By neglecting seasonal variation, we calculate the equilibrium values of the model's variables. We also present a sensitivity analysis of the impact of four vector-control strategies on the Basic Reproduction Number, on the Force of Infection and on the human prevalence of dengue. Each of the strategies was studied separately from the others. The analysis presented allows us to conclude that, of the available vector-control strategies, adulticide application is the most effective, followed by the reduction of exposure to mosquito bites, locating and destroying breeding places and, finally, larvicides. Current vector-control methods are concentrated on mechanical destruction of mosquitoes' breeding places. Our results suggest that reducing the contact between vector and hosts (biting rates) is as efficient as the logistically difficult but very efficient control of adult mosquitoes.

Genetic characterization and molecular identification of the bloodmeal sources of the potential bluetongue vector Culicoides obsoletus in the Canary Islands, Spain

Relevance: 30.00%

Abstract:

Background: Culicoides (Diptera: Ceratopogonidae) biting midges are vectors for a diversity of pathogens, including bluetongue virus (BTV), that generate important economic losses. BTV has expanded its range in recent decades, probably due to the expansion of its main vector and the presence of other autochthonous competent vectors. Although the Canary Islands are still free of bluetongue disease (BTD), Spain and Europe have had to face up to a spread of bluetongue with disastrous consequences. Therefore, it is essential to identify the distribution of biting midges and understand their feeding patterns in areas susceptible to BTD. To that end, we captured biting midges on two farms in the Canary Islands (i) to identify the midge species in question and characterize their COI barcoding region and (ii) to ascertain the source of their bloodmeals using molecular tools. Methods: Biting midges were captured using CDC traps baited with a 4-W blacklight (UV) bulb on Gran Canaria and on Tenerife. Biting midges were quantified and identified according to their wing patterns. A 688 bp segment of the mitochondrial COI gene of 20 biting midges (11 from Gran Canaria and 9 from Tenerife) was PCR amplified using the primers LCO1490 and HCO2198. Moreover, after selecting all available females showing any trace of blood in their abdomen, a nested-PCR approach was used to amplify a fragment of the COI gene from vertebrate DNA contained in bloodmeals. The origin of bloodmeals was identified by comparison using the nucleotide-nucleotide basic local alignment search tool (BLAST). Results: The morphological identification of 491 female biting midges revealed the presence of a single morphospecies belonging to the Obsoletus group. When sequencing the barcoding region of the 20 females used to check genetic variability, we identified two haplotypes differing in a single base. Comparison analysis using BLAST showed that both haplotypes belong to Culicoides obsoletus, a potential BTV vector. In addition, using molecular tools we identified the feeding sources of 136 biting midges and were able to confirm that C. obsoletus females feed on goats and sheep on both islands. Conclusions: These results confirm that the feeding pattern of C. obsoletus is a potentially important factor in BTV transmission to susceptible hosts in case of introduction into the archipelago. Consequently, in the Canary Islands it is essential to maintain vigilance for Culicoides-transmitted viruses such as BTV and the novel Schmallenberg virus.

Design and implementation of bioinformatics tools for large scale genome annotation

Relevance: 30.00%

Abstract:

The continuous increase in genome sequencing projects has produced a huge amount of data in the last 10 years: currently more than 600 prokaryotic and 80 eukaryotic genomes are fully sequenced and publicly available. However, sequencing a genome determines only raw nucleotide sequences. This is just the first step of the genome annotation process, which deals with assigning biological information to each sequence. Annotation is done at each level of the biological information processing mechanism, from DNA to protein, and cannot be accomplished by in vitro analysis procedures alone, which are extremely expensive and time consuming when applied at this scale. Thus, in silico methods need to be used to accomplish the task. The aim of this work was the implementation of predictive computational methods to allow a fast, reliable, and automated annotation of genomes and proteins starting from amino acid sequences. The first part of the work focused on the implementation of a new machine-learning-based method for the prediction of the subcellular localization of soluble eukaryotic proteins. The method is called BaCelLo, and was developed in 2006. The main peculiarity of the method is that it is independent of biases present in the training dataset, which cause the over-prediction of the most represented examples in all the other available predictors developed so far. This important result was achieved by a modification, made by myself, to the standard Support Vector Machine (SVM) algorithm, creating the so-called Balanced SVM. BaCelLo is able to predict the most important subcellular localizations in eukaryotic cells, and three kingdom-specific predictors were implemented. In two extensive comparisons, carried out in 2006 and 2008, BaCelLo was reported to outperform all the currently available state-of-the-art methods for this prediction task. BaCelLo was subsequently used to completely annotate 5 eukaryotic genomes, by integrating it in a pipeline of predictors developed at the Bologna Biocomputing group by Dr. Pier Luigi Martelli and Dr. Piero Fariselli. An online database, called eSLDB, was developed by integrating, for each amino acid sequence extracted from the genome, the predicted subcellular localization merged with experimental and similarity-based annotations. In the second part of the work, a new machine-learning-based method was implemented for the prediction of GPI-anchored proteins. The method is able to efficiently predict from the raw amino acid sequence both the presence of the GPI anchor (by means of an SVM) and the position in the sequence of the post-translational modification event, the so-called ω-site (by means of a Hidden Markov Model (HMM)). The method is called GPIPE and was reported to greatly enhance the prediction performance for GPI-anchored proteins over all the previously developed methods. GPIPE was able to predict up to 88% of the experimentally annotated GPI-anchored proteins while maintaining a false positive rate as low as 0.1%. GPIPE was used to completely annotate 81 eukaryotic genomes, and more than 15000 putative GPI-anchored proteins were predicted, 561 of which are found in H. sapiens. On average, 1% of a proteome is predicted as GPI-anchored. A statistical analysis was performed on the composition of the regions surrounding the ω-site, which allowed the definition of specific amino acid abundances in the different regions considered. Furthermore, the hypothesis proposed in the literature that compositional biases are present among the four major eukaryotic kingdoms was tested and rejected. All the developed predictors and databases are freely available at: BaCelLo http://gpcr.biocomp.unibo.it/bacello eSLDB http://gpcr.biocomp.unibo.it/esldb GPIPE http://gpcr.biocomp.unibo.it/gpipe

Computational methods for genome screening

Relevance: 30.00%

Abstract:

Motivation: A current issue of great interest, from both a theoretical and an applied perspective, is the analysis of biological sequences to disclose the information that they encode. The development of new technologies for genome sequencing in recent years has opened new fundamental problems, since huge amounts of biological data still await interpretation. Indeed, sequencing is only the first step of the genome annotation process, which consists in the assignment of biological information to each sequence. Hence, given the large amount of available data, in silico methods have become useful and necessary in order to extract relevant information from sequences. The availability of data from Genome Projects gave rise to new strategies for tackling the basic problems of computational biology, such as the determination of the three-dimensional structures of proteins, their biological function and their reciprocal interactions. Results: The aim of this work has been the implementation of predictive methods that allow the extraction of information on the properties of genomes and proteins starting from the nucleotide and amino acid sequences, by taking advantage of the information provided by the comparison of the genome sequences from different species. In the first part of the work, a comprehensive large-scale genome comparison of 599 organisms is described. 2.6 million sequences coming from 551 prokaryotic and 48 eukaryotic genomes were aligned and clustered on the basis of their sequence identity. This procedure led to the identification of classes of proteins that are peculiar to the different groups of organisms. Moreover, the adopted similarity threshold produced clusters that are homogeneous from a structural point of view and that can be used for structural annotation of uncharacterized sequences. The second part of the work focuses on the characterization of thermostable proteins and on the development of tools able to predict the thermostability of a protein starting from its sequence. By means of Principal Component Analysis, the codon composition of a non-redundant database comprising 116 prokaryotic genomes has been analyzed, and it has been shown that a cross-genomic approach can allow the extraction of common determinants of thermostability at the genome level, leading to an overall accuracy in discriminating thermophilic coding sequences equal to 95%. This result outperforms those obtained in previous studies. Moreover, we investigated the effect of multiple mutations on protein thermostability. This issue is of great importance in the field of protein engineering, since thermostable proteins are generally more suitable than their mesostable counterparts in technological applications. A Support Vector Machine based method has been trained to predict whether a set of mutations can enhance the thermostability of a given protein sequence. The developed predictor achieves 88% accuracy.

Computational methods for the analysis of protein structure and function

Relevance: 30.00%

Abstract:

The vast majority of known proteins have not yet been experimentally characterized and little is known about their function. The design and implementation of computational tools can provide insight into the function of proteins based on their sequence, their structure, their evolutionary history and their association with other proteins. Knowledge of the three-dimensional (3D) structure of a protein can lead to a deep understanding of its mode of action and interaction, but currently the structures of <1% of sequences have been experimentally solved. For this reason, it has become urgent to develop new methods that are able to computationally extract relevant information from protein sequence and structure. The starting point of my work has been the study of the properties of contacts between protein residues, since they constrain protein folding and characterize different protein structures. Prediction of residue contacts in proteins is an interesting problem whose solution may be useful in protein fold recognition and de novo design. The prediction of these contacts requires the study of the protein inter-residue distances related to the specific type of amino acid pair, which are encoded in the so-called contact map. An interesting new way of analyzing those structures emerged when network studies were introduced, with pivotal papers demonstrating that protein contact networks also exhibit small-world behavior. In order to highlight constraints for the prediction of protein contact maps and for applications in the field of protein structure prediction and/or reconstruction from experimentally determined contact maps, I studied to what extent the characteristic path length and clustering coefficient of the protein contact network reveal characteristic features of protein contact maps.
Provided that residue contacts are known for a protein sequence, the major features of its 3D structure could be deduced by combining this knowledge with correctly predicted motifs of secondary structure. In the second part of my work I focused on a particular protein structural motif, the coiled-coil, known to mediate a variety of fundamental biological interactions. Coiled-coils are found in a variety of structural forms and in a wide range of proteins including, for example, small units such as leucine zippers that drive the dimerization of many transcription factors or more complex structures such as the family of viral proteins responsible for virus-host membrane fusion. The coiled-coil structural motif is estimated to account for 5-10% of the protein sequences in the various genomes. Given their biological importance, in my work I introduced a Hidden Markov Model (HMM) that exploits the evolutionary information derived from multiple sequence alignments to predict coiled-coil regions and to discriminate coiled-coil sequences. The results indicate that the new HMM outperforms all the existing programs and can be adopted for coiled-coil prediction and for large-scale genome annotation. Genome annotation is a key issue in modern computational biology, being the starting point towards the understanding of the complex processes involved in biological networks. The rapid growth in the number of protein sequences and structures available poses new fundamental problems that still await interpretation. Nevertheless, these data are the basis for the design of new strategies for tackling problems such as the prediction of protein structure and function. Experimental determination of the functions of all these proteins would be a hugely time-consuming and costly task and, in most instances, has not been carried out. As an example, currently only approximately 20% of annotated proteins in the Homo sapiens genome have been experimentally characterized.
A commonly adopted procedure for annotating protein sequences relies on "inheritance through homology", based on the notion that similar sequences share similar functions and structures. This procedure consists in the assignment of sequences to a specific group of functionally related sequences which have been grouped through clustering techniques. The clustering procedure is based on suitable similarity rules, since predicting protein structure and function from sequence largely depends on the value of sequence identity. However, additional levels of complexity are due to multi-domain proteins, to proteins that share common domains but do not necessarily share the same function, and to the finding that different combinations of shared domains can lead to different biological roles. In the last part of this study I developed and validated a system that contributes to sequence annotation by taking advantage of a validated transfer-through-inheritance procedure for the molecular functions and the structural templates. After a cross-genome comparison with the BLAST program, clusters were built on the basis of two stringent constraints on sequence identity and coverage of the alignment. The adopted measure explicitly addresses the problem of multi-domain protein annotation and allows a fine-grained division of the whole set of proteomes used, which ensures cluster homogeneity in terms of sequence length. A high level of coverage of structure templates on the length of protein sequences within clusters ensures that multi-domain proteins, when present, can be templates for sequences of similar length. This annotation procedure includes the possibility of reliably transferring statistically validated functions and structures to sequences, considering information available in the present databases of molecular functions and structures.
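The two network measures named in the first part of this abstract, characteristic path length and clustering coefficient, can be computed on any undirected contact graph. The sketch below is a generic illustration under stated assumptions: the adjacency-dictionary representation, the function names, and the toy graph are hypothetical, and how residue contacts are defined (for example, a distance cutoff) is not taken from the abstract.

```python
from collections import deque


def clustering_coefficient(adj):
    """Average local clustering coefficient of an undirected graph.

    adj maps each node to the set of its neighbours. Nodes with fewer
    than two neighbours contribute 0 to the average.
    """
    total = 0.0
    for nbrs in adj.values():
        nb = list(nbrs)
        k = len(nb)
        if k < 2:
            continue
        # Count edges that exist among the neighbours of this node.
        links = sum(1 for i in range(k) for j in range(i + 1, k)
                    if nb[j] in adj[nb[i]])
        total += 2.0 * links / (k * (k - 1))
    return total / len(adj)


def characteristic_path_length(adj):
    """Mean shortest-path length over all connected pairs of nodes."""
    dist_sum, pairs = 0, 0
    for source in adj:
        # Breadth-first search gives shortest paths in an unweighted graph.
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        dist_sum += sum(dist.values())
        pairs += len(dist) - 1
    return dist_sum / pairs


# Toy contact graph: a triangle (0, 1, 2) with one pendant node (3).
contacts = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}
c = clustering_coefficient(contacts)
path_len = characteristic_path_length(contacts)
```

A small-world network combines a clustering coefficient much higher than that of a random graph with a comparably short characteristic path length.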

Kernel Methods for Tree Structured Data

Relevance: 30.00%

Abstract:

Machine learning comprises a series of techniques for automatic extraction of meaningful information from large collections of noisy data. In many real-world applications, data is naturally represented in structured form. Since traditional methods in machine learning deal with vectorial information, they require an a priori form of preprocessing. Among all the learning techniques for dealing with structured data, kernel methods are recognized to have a strong theoretical background and to be effective approaches. They do not require an explicit vectorial representation of the data in terms of features, but rely on a measure of similarity between any pair of objects of a domain, the kernel function. Designing fast and good kernel functions is a challenging problem. In the case of tree-structured data two issues become relevant: kernels for trees should not be sparse and should be fast to compute. The sparsity problem arises when, given a dataset and a kernel function, most structures of the dataset are completely dissimilar to one another. In those cases the classifier has too little information for making correct predictions on unseen data. In fact, it tends to produce a discriminating function that behaves like the nearest neighbour rule. Sparsity is likely to arise for some standard tree kernel functions, such as the subtree and subset tree kernels, when they are applied to datasets with node labels belonging to a large domain. A second drawback of using tree kernels is the time complexity required in both the learning and classification phases. Such complexity can sometimes prevent the application of the kernel in scenarios involving large amounts of data. This thesis proposes three contributions for resolving the above issues of kernels for trees. A first contribution aims at creating kernel functions which adapt to the statistical properties of the dataset, thus reducing its sparsity with respect to traditional tree kernel functions. Specifically, we propose to encode the input trees by an algorithm able to project the data onto a lower-dimensional space with the property that similar structures are mapped similarly. By building kernel functions on the lower-dimensional representation, we are able to perform inexact matchings between different inputs in the original space. A second contribution is the proposal of a novel kernel function based on the convolution kernel framework. A convolution kernel measures the similarity of two objects in terms of the similarities of their subparts. Most convolution kernels are based on counting the number of shared substructures, partially discarding information about their position in the original structure. The kernel function we propose is, instead, especially focused on this aspect. A third contribution is devoted to reducing the computational burden related to the calculation of a kernel function between a tree and a forest of trees, which is a typical operation in the classification phase and, for some algorithms, also in the learning phase. We propose a general methodology applicable to convolution kernels. Moreover, we show an instantiation of our technique when kernels such as the subtree and subset tree kernels are employed. In those cases, Directed Acyclic Graphs can be used to compactly represent shared substructures in different trees, thus reducing the computational burden and storage requirements.
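To make the idea of counting shared substructures concrete, the following sketch implements a basic subtree kernel for ordered, labelled trees. It illustrates the standard subtree kernel that the abstract mentions as a baseline, not the kernels proposed in the thesis; the tuple-based tree representation and the function names are assumptions made for this example.

```python
from collections import Counter


def subtree_signatures(tree, bag):
    """Record a canonical string for every complete subtree of `tree`.

    A tree is a (label, children) pair, where children is a list of trees.
    Labels are assumed not to contain parentheses.
    """
    label, children = tree
    sig = "(" + label + "".join(subtree_signatures(c, bag) for c in children) + ")"
    bag[sig] += 1
    return sig


def subtree_kernel(t1, t2):
    """Number of pairs of identical complete subtrees shared by t1 and t2."""
    bag1, bag2 = Counter(), Counter()
    subtree_signatures(t1, bag1)
    subtree_signatures(t2, bag2)
    # Each matching signature contributes (count in t1) * (count in t2).
    return sum(n * bag2[sig] for sig, n in bag1.items())


# Two small trees that share only the leaf "A".
t1 = ("S", [("A", []), ("B", [])])
t2 = ("S", [("A", []), ("C", [("A", [])])])
k12 = subtree_kernel(t1, t2)
k11 = subtree_kernel(t1, t1)
```

Because only exact matches count, two trees over a large label alphabet often share nothing, which is the sparsity problem the abstract describes.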
