4 results for Acyclic stereocontrol
in AMS Tesi di Dottorato - Alm@DL - Università di Bologna
Abstract:
Machine learning comprises a series of techniques for the automatic extraction of meaningful information from large collections of noisy data. In many real-world applications, data is naturally represented in structured form. Since traditional machine learning methods deal with vectorial information, they require an a priori form of preprocessing. Among all the learning techniques for dealing with structured data, kernel methods are recognized to have a strong theoretical background and to be effective approaches. They do not require an explicit vectorial representation of the data in terms of features, but rely on a measure of similarity between any pair of objects of a domain, the kernel function. Designing fast and good kernel functions is a challenging problem. In the case of tree-structured data, two issues become relevant: kernels for trees should not be sparse and should be fast to compute. The sparsity problem arises when, given a dataset and a kernel function, most structures of the dataset are completely dissimilar to one another. In those cases the classifier has too little information for making correct predictions on unseen data. In fact, it tends to produce a discriminating function that behaves like the nearest neighbour rule. Sparsity is likely to arise for some standard tree kernel functions, such as the subtree and subset tree kernels, when they are applied to datasets with node labels belonging to a large domain. A second drawback of using tree kernels is the time complexity required in both the learning and the classification phases. Such complexity can sometimes prevent the application of kernels in scenarios involving large amounts of data. This thesis proposes three contributions for resolving the above issues of kernels for trees. A first contribution aims at creating kernel functions which adapt to the statistical properties of the dataset, thus reducing its sparsity with respect to traditional tree kernel functions.
Specifically, we propose to encode the input trees by an algorithm able to project the data onto a lower dimensional space with the property that similar structures are mapped similarly. By building kernel functions on the lower dimensional representation, we are able to perform inexact matching between different inputs in the original space. A second contribution is the proposal of a novel kernel function based on the convolution kernel framework. A convolution kernel measures the similarity of two objects in terms of the similarities of their subparts. Most convolution kernels are based on counting the number of shared substructures, partially discarding information about their position in the original structure. The kernel function we propose is, instead, especially focused on this aspect. A third contribution is devoted to reducing the computational burden related to the calculation of a kernel function between a tree and a forest of trees, which is a typical operation in the classification phase and, for some algorithms, also in the learning phase. We propose a general methodology applicable to convolution kernels. Moreover, we show an instantiation of our technique when kernels such as the subtree and subset tree kernels are employed. In those cases, Directed Acyclic Graphs can be used to compactly represent shared substructures in different trees, thus reducing the computational burden and storage requirements.
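The idea of counting shared substructures, and of storing each distinct substructure only once for a whole forest, can be illustrated with a minimal sketch. It is not the thesis's algorithm: it assumes the plain subtree kernel without a decay factor, ordered trees encoded as nested tuples, and a frequency table of distinct subtrees standing in for the compact DAG representation. The names `subtree_counts`, `forest_counts` and `subtree_kernel` are hypothetical.

```python
from collections import Counter

# Assumed encoding: a tree is a nested tuple (label, child_1, child_2, ...);
# a leaf is a one-element tuple (label,). Child order matters.


def subtree_counts(tree, counts=None):
    """Count every complete subtree of `tree`, keyed by the subtree itself."""
    if counts is None:
        counts = Counter()
    counts[tree] += 1
    for child in tree[1:]:
        subtree_counts(child, counts)
    return counts


def forest_counts(forest):
    """Merge the subtree counts of all trees in `forest` into one table.

    Each distinct subtree is stored once with its total frequency, which is
    the role a DAG with frequency annotations plays in a compact encoding.
    """
    total = Counter()
    for tree in forest:
        subtree_counts(tree, total)
    return total


def subtree_kernel(counts_a, counts_b):
    """Number of pairs of identical complete subtrees in the two tables."""
    if len(counts_a) > len(counts_b):
        counts_a, counts_b = counts_b, counts_a  # iterate over the smaller table
    return sum(n * counts_b[s] for s, n in counts_a.items() if s in counts_b)


if __name__ == "__main__":
    np_tree = ("NP", ("D",), ("N",))
    sentence = ("S", np_tree, ("VP", ("V",), np_tree))

    k_tree = subtree_kernel(subtree_counts(np_tree), subtree_counts(sentence))
    k_self = subtree_kernel(subtree_counts(np_tree), subtree_counts(np_tree))
    k_forest = subtree_kernel(subtree_counts(np_tree),
                              forest_counts([sentence, np_tree]))
    print(k_tree, k_self, k_forest)
```

Because this kernel is a sum over matching subtree pairs, evaluating one tree against the merged table of a forest gives the same value as summing the kernel over each tree of the forest separately, while each shared subtree is visited only once.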
Abstract:
The main aim of my PhD project was the design and synthesis of new pyrrolidine organocatalysts. New effective ferrocenyl pyrrolidine catalysts, active in benchmark organocatalytic reactions, have been developed. The ferrocenyl moiety, in combination with simple ethyl chains, is capable of fixing the enamine conformation, thereby directing the approach trajectory of the nucleophile in the reaction. The results obtained represent an interesting proof of concept, showing for the first time the remarkable effectiveness of the ferrocenyl moiety in providing enantioselectivity through conformational selection. This approach could be viably employed in the rational design of ligands for metals or of organocatalysts. Other hindered secondary amines have been prepared by alkylation of acyclic chiral nitro derivatives with alcohols in a highly diastereoselective fashion, giving access to functionalized, useful organocatalytic chiral pyrrolidines. A family of new pyrrolidines bearing stereogenic centers and functional groups is readily accessible by this methodology. The second purpose of the project was to study in depth the reactivity of stabilized carbocations in new metal-free and organocatalytic reactions. By taking advantage of the results of the kinetic studies described by Mayr, a simple and effective procedure for the direct formylation of aryltrifluoroborate salts has been developed. The coupling of a range of aryl- and heteroaryltrifluoroborate salts with 1,3-benzodithiolylium tetrafluoroborate has been achieved in moderate to good yields. Finally, a simple and general methodology for the enamine-mediated enantioselective α-alkylation of α-substituted aldehydes with 1,3-benzodithiolylium tetrafluoroborate has been reported. The introduction of the benzodithiole moiety permits the installation of different functional groups owing to its chameleonic behaviour.
Abstract:
In this work we presented several aspects regarding the possibility of using readily available propargylic alcohols as acyclic precursors to develop new stereoselective [Au(I)]-catalyzed cascade reactions for the synthesis of highly complex indole architectures. The use of indole-based propargylic alcohols of type 1 in a stereoselective [Au(I)]-catalyzed hydroindolination/iminium trapping reactive sequence opened access to a new class of tetracyclic indolines, dihydropyranylindolines A and furoindolines B. An enantioselective protocol was further explored in order to synthesize these molecules with high yields and ee. The suitability of propargylic alcohols in [Au(I)]-catalyzed cascade reactions was investigated in depth by developing cascade reactions in which it was possible not only to synthesize the indole core but also to achieve a second functionalization. Aniline-based propargylic alcohols 2 were found to be modular acyclic precursors for the synthesis of [1,2-a] azepinoindoles C. In describing this reactivity we additionally reported experimental evidence for an unprecedented NHCAu(I)-vinyl species which, in a chemoselective fashion, led to the annulation step, forming the N1-C2-connected seven-membered ring. The chemical flexibility of propargylic alcohols was further explored by changing the nature of the chemical surroundings with different preinstalled N-alkyl moieties in propargylic alcohols of type 3. In particular, in the case of a primary alcohol, [Au(I)] catalysis was found to be effective in the synthesis of a new class of [4,3-a]-oxazinoindoles D, while the use of an allylic alcohol led to the first example of [Au(I)]-catalyzed synthesis and enantioselective functionalization of this class of molecules (D*). With this work we established propargylic alcohols as excellent acyclic precursors for developing new [Au(I)]-catalyzed cascade reactions, providing new catalytic synthetic tools for the stereoselective synthesis of complex indole/indoline architectures.
Abstract:
In many application domains data can be naturally represented as graphs. When the application of analytical solutions to a given problem is unfeasible, machine learning techniques could be a viable way to solve the problem. Classical machine learning techniques are defined for data represented in a vectorial form. Recently some of them have been extended to deal directly with structured data. Among those techniques, kernel methods have shown promising results from both the computational complexity and the predictive performance points of view. Kernel methods avoid an explicit mapping to a vectorial form by relying on kernel functions, which informally are functions that compute a similarity measure between two entities. However, the definition of good kernels for graphs is a challenging problem because of the difficulty of finding a good tradeoff between computational complexity and expressiveness. Another problem we face is learning on data streams, where a potentially unbounded sequence of data is generated by some sources. There are three main contributions in this thesis. The first contribution is the definition of a new family of kernels for graphs based on Directed Acyclic Graphs (DAGs). We analyzed two kernels from this family, achieving state-of-the-art results from both the computational and the classification points of view on real-world datasets. The second contribution consists of making the application of learning algorithms to streams of graphs feasible. Moreover, we defined a principled approach to memory management. The third contribution is the application of machine learning techniques for structured data to non-coding RNA function prediction. In this setting, the secondary structure is thought to carry relevant information. However, existing methods that consider the secondary structure have prohibitively high computational complexity. We propose to apply kernel methods to this domain, obtaining state-of-the-art results.