7 results for Directed acyclic graphs
in AMS Tesi di Dottorato - Alm@DL - Università di Bologna
Abstract:
In many application domains, data can be naturally represented as graphs. When analytical solutions to a given problem are infeasible, machine learning techniques can be a viable alternative. Classical machine learning techniques are defined for data represented in vectorial form. Recently, some of them have been extended to deal directly with structured data. Among those techniques, kernel methods have shown promising results in terms of both computational complexity and predictive performance. Kernel methods avoid an explicit mapping to a vectorial form by relying on kernel functions, which, informally, are functions that compute a similarity measure between two entities. However, defining good kernels for graphs is a challenging problem because of the difficulty of finding a good tradeoff between computational complexity and expressiveness. Another problem we face is learning on data streams, where a potentially unbounded sequence of data is generated by some sources. There are three main contributions in this thesis. The first contribution is the definition of a new family of kernels for graphs based on Directed Acyclic Graphs (DAGs). We analyzed two kernels from this family, achieving state-of-the-art results in terms of both computational cost and classification performance on real-world datasets. The second contribution consists in making the application of learning algorithms to streams of graphs feasible. Moreover, we defined a principled approach to memory management. The third contribution is the application of machine learning techniques for structured data to non-coding RNA function prediction. In this setting, the secondary structure is thought to carry relevant information. However, existing methods that consider the secondary structure have prohibitively high computational complexity. We propose applying kernel methods to this domain, obtaining state-of-the-art results.
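To make the notion of a kernel function as a similarity measure between two structured objects concrete, the following is a minimal sketch of a very simple graph kernel. It is not one of the DAG-based kernels of the thesis, whose definition the abstract does not give. It only counts pairs of edges, one from each graph, whose endpoint labels match. The graph representation (a dictionary of node labels plus an edge list) and the names `edge_features`, `edge_kernel` and `gram_matrix` are assumptions made for illustration.

```python
from collections import Counter


def edge_features(graph):
    """Count unordered (label, label) pairs over the edges of a graph.

    `graph` is a hypothetical representation: a pair (labels, edges), where
    `labels` maps node id -> label and `edges` is a list of (u, v) pairs.
    """
    labels, edges = graph
    return Counter(tuple(sorted((labels[u], labels[v]))) for u, v in edges)


def edge_kernel(g1, g2):
    """Number of edge pairs (one edge from each graph) with matching labels."""
    f1, f2 = edge_features(g1), edge_features(g2)
    # A Counter returns 0 for missing keys, so unmatched edges contribute nothing.
    return sum(count * f2[key] for key, count in f1.items())


def gram_matrix(graphs, kernel):
    """Pairwise similarities: the only input a kernel machine needs."""
    return [[kernel(a, b) for b in graphs] for a in graphs]


# Two tiny labelled graphs (illustrative data, not from the thesis).
g1 = ({0: "C", 1: "O", 2: "C"}, [(0, 1), (1, 2)])
g2 = ({0: "C", 1: "O", 2: "N"}, [(0, 1), (1, 2)])
K = gram_matrix([g1, g2], edge_kernel)
```

A learning algorithm such as a support vector machine can then work on the matrix `K` alone, without ever seeing a fixed-length vector for each graph. A kernel this simple is cheap but not very expressive, which illustrates the tradeoff the abstract mentions.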
Abstract:
Machine learning comprises a series of techniques for the automatic extraction of meaningful information from large collections of noisy data. In many real-world applications, data is naturally represented in structured form. Since traditional machine learning methods deal with vectorial information, they require an a priori preprocessing step. Among the learning techniques for dealing with structured data, kernel methods are recognized as effective approaches with a strong theoretical background. They do not require an explicit vectorial representation of the data in terms of features, but rely on a measure of similarity between any pair of objects of a domain, the kernel function. Designing fast and good kernel functions is a challenging problem. In the case of tree-structured data, two issues become relevant: kernels for trees should not be sparse and should be fast to compute. The sparsity problem arises when, given a dataset and a kernel function, most structures in the dataset are completely dissimilar to one another. In those cases, the classifier has too little information to make correct predictions on unseen data. In fact, it tends to produce a discriminating function that behaves like the nearest-neighbour rule. Sparsity is likely to arise for some standard tree kernel functions, such as the subtree and subset tree kernels, when they are applied to datasets with node labels belonging to a large domain. A second drawback of using tree kernels is the time complexity required in both the learning and classification phases. Such complexity can sometimes prevent the application of the kernel in scenarios involving large amounts of data. This thesis proposes three contributions for resolving the above issues of kernels for trees. The first contribution aims at creating kernel functions that adapt to the statistical properties of the dataset, thus reducing sparsity with respect to traditional tree kernel functions. 
Specifically, we propose to encode the input trees with an algorithm able to project the data onto a lower-dimensional space with the property that similar structures are mapped similarly. By building kernel functions on the lower-dimensional representation, we are able to perform inexact matching between different inputs in the original space. The second contribution is the proposal of a novel kernel function based on the convolution kernel framework. A convolution kernel measures the similarity of two objects in terms of the similarities of their subparts. Most convolution kernels are based on counting the number of shared substructures, partially discarding information about their position in the original structure. The kernel function we propose, instead, focuses specifically on this aspect. The third contribution is devoted to reducing the computational burden of calculating a kernel function between a tree and a forest of trees, which is a typical operation in the classification phase and, for some algorithms, also in the learning phase. We propose a general methodology applicable to convolution kernels. Moreover, we show an instantiation of our technique when kernels such as the subtree and subset tree kernels are employed. In those cases, Directed Acyclic Graphs can be used to compactly represent shared substructures in different trees, thus reducing the computational burden and storage requirements.
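The idea of sharing substructures through a DAG can be sketched as follows. This is an illustrative reading of the abstract rather than the thesis's algorithm. It assumes an unweighted subtree kernel (counting pairs of nodes whose complete subtrees are identical, with no decay factor), ordered children, and trees written as nested tuples `(label, child, child, ...)`. The class `ForestDag` and its methods are hypothetical names.

```python
from collections import Counter


def subtrees(tree):
    """Yield every complete subtree of `tree` (a node plus all its descendants)."""
    yield tree
    for child in tree[1:]:
        yield from subtrees(child)


def subtree_kernel(t1, t2):
    """Count pairs of nodes (one per tree) that root identical complete subtrees."""
    c1, c2 = Counter(subtrees(t1)), Counter(subtrees(t2))
    return sum(n * c2[s] for s, n in c1.items())


class ForestDag:
    """Stores each distinct subtree of a forest once, with an accumulated weight."""

    def __init__(self):
        self.nodes = {}          # (label, child ids) -> shared DAG node id
        self.weight = Counter()  # DAG node id -> summed weight of its occurrences

    def add(self, tree, weight=1.0):
        """Merge `tree` into the DAG; returns the id of the node for its root."""
        child_ids = tuple(self.add(child, weight) for child in tree[1:])
        node_id = self.nodes.setdefault((tree[0], child_ids), len(self.nodes))
        self.weight[node_id] += weight
        return node_id

    def score(self, tree):
        """Compute sum_i weight_i * subtree_kernel(tree, tree_i) in one traversal."""
        total = 0.0

        def visit(t):
            nonlocal total
            child_ids = tuple(visit(child) for child in t[1:])
            node_id = None
            if None not in child_ids:  # a subtree can match only if all children do
                node_id = self.nodes.get((t[0], child_ids))
            if node_id is not None:
                total += self.weight[node_id]
            return node_id

        visit(tree)
        return total


# Illustrative parse-like trees (not data from the thesis).
t1 = ("S", ("NP", ("D",), ("N",)), ("VP", ("V",)))
t2 = ("S", ("NP", ("D",), ("N",)), ("VP", ("V",), ("NP", ("D",), ("N",))))
dag = ForestDag()
dag.add(t1, weight=0.5)
dag.add(t2, weight=-1.0)
```

Here the two trees have 15 nodes in total, but the DAG keeps only 8, because repeated subtrees such as `("NP", ("D",), ("N",))` are stored once. Calling `dag.score(t1)` visits the input tree once instead of evaluating the kernel against every tree in the forest, which is the kind of saving in time and storage the abstract describes.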
Abstract:
Alzheimer's disease (AD) and cancer represent two of the main causes of death worldwide. They are complex multifactorial diseases, and several biochemical targets have been recognized to play a fundamental role in their development. Given their complex nature, a promising therapeutic approach could be the so-called "Multi-Target-Directed Ligand" (MTDL) approach. This new strategy is based on the assumption that a single molecule could hit several targets responsible for the onset and/or progression of the pathology. In AD in particular, most currently prescribed drugs aim to increase the level of acetylcholine in the brain by inhibiting the enzyme acetylcholinesterase (AChE). However, clinical experience shows that AChE inhibition is a palliative treatment, and the simple modulation of a single target does not address AD aetiology. Research into newer and more potent anti-AD agents is thus focused on compounds whose properties go beyond AChE inhibition (such as inhibition of the enzyme β-secretase and inhibition of the aggregation of beta-amyloid). Therefore, the MTDL strategy seems a more appropriate approach for addressing the complexity of AD and may provide new drugs for tackling its multifactorial nature. This thesis describes the design of new MTDLs able to tackle the multifactorial nature of AD. These new MTDLs are less flexible analogues of Caproctamine, one of the first MTDLs possessing biological properties useful for AD treatment. The new compounds are able to inhibit the enzymes AChE and β-secretase, as well as both AChE-induced and self-induced beta-amyloid aggregation. In particular, the most potent compound of the series is able to inhibit AChE in the subnanomolar range, to inhibit β-secretase at micromolar concentrations and to inhibit both AChE-induced and self-induced beta-amyloid aggregation at micromolar concentrations. 
Cancer, like AD, is a very complex pathology, and many different therapeutic approaches are currently used to treat it. However, due to its multifactorial nature, the MTDL approach could, in principle, also be applied to this pathology. The aim of this thesis has been the development of new molecules possessing different structural motifs that are able to interact simultaneously with some of the many targets responsible for the pathology. The designed compounds displayed cytotoxic activity in different cancer cell lines. In particular, the most potent compounds of the series were further evaluated and were able to bind DNA, proving 100-fold more potent than the reference compound Mitonafide. Furthermore, these compounds were able to trigger apoptosis through caspase activation and to inhibit PIN1 (preliminary result). This last protein is a very promising target because it is overexpressed in many human cancers, it functions as a critical catalyst for multiple oncogenic pathways, and in several cancer cell lines depletion of PIN1 causes mitotic arrest followed by apoptosis induction. In conclusion, this study may represent a promising starting point for the development of new MTDLs that may prove useful for cancer and AD treatment.
Abstract:
The MTDL (multi-target-directed ligand) design strategy is used to develop single chemical entities that are able to simultaneously modulate multiple targets. The development of such compounds might open new avenues for the treatment of a variety of pathologies (e.g. cancer, AIDS, neurodegenerative diseases) for which an effective cure is urgently needed. This strategy has been successfully applied to Alzheimer’s disease (AD) due to its multifactorial nature, involving cholinergic dysfunction, amyloid aggregation, and oxidative stress. Although many biological entities have been recognized as possibly AD-relevant, only four acetylcholinesterase inhibitors (AChEIs) and one NMDA receptor antagonist are used in therapy. Unfortunately, such compounds are not disease-modifying agents and behave only as cognition enhancers. Therefore, the MTDL strategy is emerging as a powerful drug design paradigm: pharmacophores of different drugs are combined in the same structure to afford hybrid molecules. In principle, each pharmacophore of these new drugs should retain the ability to interact with its specific site(s) on the target and, consequently, to produce specific pharmacological responses that, taken together, should slow or block the neurodegenerative process. To this end, the design and synthesis of several examples of MTDLs for combating neurodegenerative diseases have been published. This seems to be a more appropriate approach for addressing the complexity of AD and may provide new drugs for tackling its multifactorial nature and, hopefully, stopping its progression. In line with this emerging strategy, different classes of new molecular structures based on the MTDL approach have been developed in this thesis. Moreover, curcumin and its constrained analogs have recently received remarkable interest, as they have a unique conjugated structure with a pleiotropic profile that we considered a suitable framework for developing MTDLs. 
In fact, besides its well-known direct antioxidant activity, curcumin displays a wide range of biological properties, including anti-inflammatory and anti-amyloidogenic activities and an indirect antioxidant action through activation of the cytoprotective enzyme heme oxygenase (HO-1). Thus, since many lines of evidence suggest that oxidative stress and mitochondrial impairment have a central role in age-related neurodegenerative diseases such as AD, we designed mitochondria-targeted antioxidants by connecting curcumin analogs to different polyamine chains that, with the aid of electrostatic force, might drive the selected antioxidant moiety into mitochondria.