948 resultados para Similarity Query
Resumo:
This Thesis describes the application of automatic learning methods for a) the classification of organic and metabolic reactions, and b) the mapping of Potential Energy Surfaces(PES). The classification of reactions was approached with two distinct methodologies: a representation of chemical reactions based on NMR data, and a representation of chemical reactions from the reaction equation based on the physico-chemical and topological features of chemical bonds. NMR-based classification of photochemical and enzymatic reactions. Photochemical and metabolic reactions were classified by Kohonen Self-Organizing Maps (Kohonen SOMs) and Random Forests (RFs) taking as input the difference between the 1H NMR spectra of the products and the reactants. The development of such a representation can be applied in automatic analysis of changes in the 1H NMR spectrum of a mixture and their interpretation in terms of the chemical reactions taking place. Examples of possible applications are the monitoring of reaction processes, evaluation of the stability of chemicals, or even the interpretation of metabonomic data. A Kohonen SOM trained with a data set of metabolic reactions catalysed by transferases was able to correctly classify 75% of an independent test set in terms of the EC number subclass. Random Forests improved the correct predictions to 79%. With photochemical reactions classified into 7 groups, an independent test set was classified with 86-93% accuracy. The data set of photochemical reactions was also used to simulate mixtures with two reactions occurring simultaneously. Kohonen SOMs and Feed-Forward Neural Networks (FFNNs) were trained to classify the reactions occurring in a mixture based on the 1H NMR spectra of the products and reactants. Kohonen SOMs allowed the correct assignment of 53-63% of the mixtures (in a test set). Counter-Propagation Neural Networks (CPNNs) gave origin to similar results. The use of supervised learning techniques allowed an improvement in the results. They were improved to 77% of correct assignments when an ensemble of ten FFNNs were used and to 80% when Random Forests were used. This study was performed with NMR data simulated from the molecular structure by the SPINUS program. In the design of one test set, simulated data was combined with experimental data. The results support the proposal of linking databases of chemical reactions to experimental or simulated NMR data for automatic classification of reactions and mixtures of reactions. Genome-scale classification of enzymatic reactions from their reaction equation. The MOLMAP descriptor relies on a Kohonen SOM that defines types of bonds on the basis of their physico-chemical and topological properties. The MOLMAP descriptor of a molecule represents the types of bonds available in that molecule. The MOLMAP descriptor of a reaction is defined as the difference between the MOLMAPs of the products and the reactants, and numerically encodes the pattern of bonds that are broken, changed, and made during a chemical reaction. The automatic perception of chemical similarities between metabolic reactions is required for a variety of applications ranging from the computer validation of classification systems, genome-scale reconstruction (or comparison) of metabolic pathways, to the classification of enzymatic mechanisms. Catalytic functions of proteins are generally described by the EC numbers that are simultaneously employed as identifiers of reactions, enzymes, and enzyme genes, thus linking metabolic and genomic information. Different methods should be available to automatically compare metabolic reactions and for the automatic assignment of EC numbers to reactions still not officially classified. In this study, the genome-scale data set of enzymatic reactions available in the KEGG database was encoded by the MOLMAP descriptors, and was submitted to Kohonen SOMs to compare the resulting map with the official EC number classification, to explore the possibility of predicting EC numbers from the reaction equation, and to assess the internal consistency of the EC classification at the class level. A general agreement with the EC classification was observed, i.e. a relationship between the similarity of MOLMAPs and the similarity of EC numbers. At the same time, MOLMAPs were able to discriminate between EC sub-subclasses. EC numbers could be assigned at the class, subclass, and sub-subclass levels with accuracies up to 92%, 80%, and 70% for independent test sets. The correspondence between chemical similarity of metabolic reactions and their MOLMAP descriptors was applied to the identification of a number of reactions mapped into the same neuron but belonging to different EC classes, which demonstrated the ability of the MOLMAP/SOM approach to verify the internal consistency of classifications in databases of metabolic reactions. RFs were also used to assign the four levels of the EC hierarchy from the reaction equation. EC numbers were correctly assigned in 95%, 90%, 85% and 86% of the cases (for independent test sets) at the class, subclass, sub-subclass and full EC number level,respectively. Experiments for the classification of reactions from the main reactants and products were performed with RFs - EC numbers were assigned at the class, subclass and sub-subclass level with accuracies of 78%, 74% and 63%, respectively. In the course of the experiments with metabolic reactions we suggested that the MOLMAP / SOM concept could be extended to the representation of other levels of metabolic information such as metabolic pathways. Following the MOLMAP idea, the pattern of neurons activated by the reactions of a metabolic pathway is a representation of the reactions involved in that pathway - a descriptor of the metabolic pathway. This reasoning enabled the comparison of different pathways, the automatic classification of pathways, and a classification of organisms based on their biochemical machinery. The three levels of classification (from bonds to metabolic pathways) allowed to map and perceive chemical similarities between metabolic pathways even for pathways of different types of metabolism and pathways that do not share similarities in terms of EC numbers. Mapping of PES by neural networks (NNs). In a first series of experiments, ensembles of Feed-Forward NNs (EnsFFNNs) and Associative Neural Networks (ASNNs) were trained to reproduce PES represented by the Lennard-Jones (LJ) analytical potential function. The accuracy of the method was assessed by comparing the results of molecular dynamics simulations (thermal, structural, and dynamic properties) obtained from the NNs-PES and from the LJ function. The results indicated that for LJ-type potentials, NNs can be trained to generate accurate PES to be used in molecular simulations. EnsFFNNs and ASNNs gave better results than single FFNNs. A remarkable ability of the NNs models to interpolate between distant curves and accurately reproduce potentials to be used in molecular simulations is shown. The purpose of the first study was to systematically analyse the accuracy of different NNs. Our main motivation, however, is reflected in the next study: the mapping of multidimensional PES by NNs to simulate, by Molecular Dynamics or Monte Carlo, the adsorption and self-assembly of solvated organic molecules on noble-metal electrodes. Indeed, for such complex and heterogeneous systems the development of suitable analytical functions that fit quantum mechanical interaction energies is a non-trivial or even impossible task. The data consisted of energy values, from Density Functional Theory (DFT) calculations, at different distances, for several molecular orientations and three electrode adsorption sites. The results indicate that NNs require a data set large enough to cover well the diversity of possible interaction sites, distances, and orientations. NNs trained with such data sets can perform equally well or even better than analytical functions. Therefore, they can be used in molecular simulations, particularly for the ethanol/Au (111) interface which is the case studied in the present Thesis. Once properly trained, the networks are able to produce, as output, any required number of energy points for accurate interpolations.
Resumo:
Dissertation presented at the Faculty of Science and Technology of the New University of Lisbon in fulfillment of the requirements for the Masters degree in Electrical Engineering and Computers
Resumo:
Dissertation presented in partial fulfillment of the requirements for the degree of Master in Biotechnology
Resumo:
A genetic algorithm used to design radio-frequency binary-weighted differential switched capacitor arrays (RFDSCAs) is presented in this article. The algorithm provides a set of circuits all having the same maximum performance. This article also describes the design, implementation, and measurements results of a 0.25 lm BiCMOS 3-bit RFDSCA. The experimental results show that the circuit presents the expected performance up to 40 GHz. The similarity between the evolutionary solutions, circuit simulations, and measured results indicates that the genetic synthesis method is a very useful tool for designing optimum performance RFDSCAs.
Resumo:
Feature selection is a central problem in machine learning and pattern recognition. On large datasets (in terms of dimension and/or number of instances), using search-based or wrapper techniques can be cornputationally prohibitive. Moreover, many filter methods based on relevance/redundancy assessment also take a prohibitively long time on high-dimensional. datasets. In this paper, we propose efficient unsupervised and supervised feature selection/ranking filters for high-dimensional datasets. These methods use low-complexity relevance and redundancy criteria, applicable to supervised, semi-supervised, and unsupervised learning, being able to act as pre-processors for computationally intensive methods to focus their attention on smaller subsets of promising features. The experimental results, with up to 10(5) features, show the time efficiency of our methods, with lower generalization error than state-of-the-art techniques, while being dramatically simpler and faster.
Resumo:
Dissertation submitted in partial fulfilment of the requirements for the Degree of Master of Science in Geospatial Technologies
Resumo:
Mestrado em Engenharia Informática - Área de Especialização em Sistemas Gráficos e Multimédia
Resumo:
Mestrado em Engenharia Electrotécnica e de Computadores - Área de Especialização de Telecomunicações
Resumo:
Dissertation submitted in partial fulfilment of the requirements for the Degree of Master of Science in Geospatial Technologies
Resumo:
Dissertation submitted in partial fulfilment of the requirements for the Degree of Master of Science in Geospatial Technologies
Resumo:
Com o crescimento da informação disponível na Web, arquivos pessoais e profissionais, protagonizado tanto pelo aumento da capacidade de armazenamento de dados, como pelo aumento exponencial da capacidade de processamento dos computadores, e do fácil acesso a essa mesma informação, um enorme fluxo de produção e distribuição de conteúdos audiovisuais foi gerado. No entanto, e apesar de existirem mecanismos para a indexação desses conteúdos com o objectivo de permitir a pesquisa e acesso aos mesmos, estes apresentam normalmente uma grande complexidade algorítmica ou exigem a contratação de pessoal altamente qualificado, para a verificação e categorização dos conteúdos. Nesta dissertação pretende-se estudar soluções de anotação colaborativa de conteúdos e desenvolver uma ferramenta que facilite a anotação de um arquivo de conteúdos audiovisuais. A abordagem implementada é baseada no conceito dos “Jogos com Propósito” (GWAP – Game With a Purpose) e permite que os utilizadores criem tags (metadatos na forma de palavras-chave) de forma a atribuir um significado a um objecto a ser categorizado. Assim, e como primeiro objectivo, foi desenvolvido um jogo com o propósito não só de entretenimento, mas também que permita a criação de anotações audiovisuais perante os vídeos que são apresentados ao jogador e, que desta forma, se melhore a indexação e categorização dos mesmos. A aplicação desenvolvida permite ainda a visualização dos conteúdos e metadatos categorizados, e com o objectivo de criação de mais um elemento informativo, permite a inserção de um like num determinado instante de tempo do vídeo. A grande vantagem da aplicação desenvolvida reside no facto de adicionar anotações a pontos específicos do vídeo, mais concretamente aos seus instantes de tempo. Trata-se de uma funcionalidade nova, não disponível em outras aplicações de anotação colaborativa de conteúdos audiovisuais. Com isto, o acesso aos conteúdos será bastante mais eficaz pois será possível aceder, por pesquisa, a pontos específicos no interior de um vídeo.
Resumo:
Esta dissertação enquadra-se no âmbito dos Sistemas de Informação, em concreto, no desenvolvimento de aplicações Web, como é o caso de um website. Com a utilização em larga escala dos meios tecnológicos tem-se verificado um crescimento exponencial dos mesmos, o que se traduz na facilidade com que podem ser encontradas na Internet diversos tipos de plataformas informáticas. Além disso, hoje em dia, uma grande parte das organizações possui o seu próprio sítio na Internet, onde procede à divulgação dos seus serviços e/ou produtos. Pretende-se com esta dissertação explorar estas novas tecnologias, nomeadamente, os diagramas UML - Unified Modeling Language e a concepção de bases de dados, e posteriormente desenvolver um website. Com o desenvolvimento deste website não se propõe a criação de uma nova tecnologia, mas o uso de diversas tecnologias em conjunto com recurso às ferramentas UML. Este encontra-se organizado em três fases principais: análise de requisitos, implementação e desenho das interfaces. Na análise de requisitos efectuou-se o levantamento dos objectivos propostos para o sistema e das necessidades/requisitos necessários à sua implementação, auxiliado essencialmente pelo Diagrama de Use Cases do sistema. Na fase de implementação foram elaborados os arquivos e directórios que formam a arquitectura lógica de acordo com os modelos descritos no Diagrama de Classes e no Diagrama de Entidade-Relação. Os requisitos identificados foram analisados e usados na composição das interfaces e sistema de navegação. Por fim, na fase de desenho das interfaces foram aperfeiçoadas as interfaces desenvolvidas, com base no conceito artístico e criativo do autor. Este aperfeiçoamento vai de encontro ao gosto pessoal e tem como objectivo elaborar uma interface que possa também agradar ao maior número possível de utilizadores. Este pode ser observado na maneira como se encontram distribuídas as ligações (links) entre páginas, nos títulos, nos cabeçalhos, nas cores e animações e no seu design em geral. Para o desenvolvimento do website foram utilizadas diferentes linguagens de programação, nomeadamente a HyperText Markup Language (HTML), a Page Hypertext Preprocessor (PHP) e Javascript. A HTML foi utilizada para a disposição de todo o conteúdo visível das páginas e para definição do layout das mesmas e a PHP para executar pequenos scripts que permitem interagir com as diferentes funcionalidades do site. A linguagem Javascript foi usada para definir o design das páginas e incluir alguns efeitos visuais nas mesmas. Para a construção das páginas que compõem o website foi utilizado o software Macromedia Dreamweaver, o que simplificou a sua implementação pela facilidade com que estas podem ser construídas. Para interacção com o sistema de gestão da base de dados, o MySQL, foi utilizada a aplicação phpMyAdmin, que simplifica o acesso à base de dados, permitindo definir, manipular e consultar os seus dados.
Resumo:
Extracting the semantic relatedness of terms is an important topic in several areas, including data mining, information retrieval and web recommendation. This paper presents an approach for computing the semantic relatedness of terms using the knowledge base of DBpedia — a community effort to extract structured information from Wikipedia. Several approaches to extract semantic relatedness from Wikipedia using bag-of-words vector models are already available in the literature. The research presented in this paper explores a novel approach using paths on an ontological graph extracted from DBpedia. It is based on an algorithm for finding and weighting a collection of paths connecting concept nodes. This algorithm was implemented on a tool called Shakti that extract relevant ontological data for a given domain from DBpedia using its SPARQL endpoint. To validate the proposed approach Shakti was used to recommend web pages on a Portuguese social site related to alternative music and the results of that experiment are reported in this paper.
Resumo:
Hepatobiliary alterations found in an autopsy case of massive Biliary Ascariasis, are reported on histological grounds. Severe cholangitis was the main finding, but other changes were also detected, such as pyloric and intestinal metaplasia, hyperplasia of the epithelial lining, with intraductal papillomas and adenomatous proliferation. Remnants of the worm were observed tightly adhered to the epithelium, forming microscopic intrahepatic calculi. Mucopolysaccharides, especially acid, showed to be strongly positive on the luminal border, and in proliferated glands around the ducts. The authors discuss the similarity between such findings and Oriental Cholangiohepatitis, and suggest that inflammation and the presence of the parasitic remnants are responsible for the hyperplastic and metaplastic changes, similarly with what occurs in chlonorchiasis, fascioliasis and schistosomiasis.
Resumo:
Relatório da Prática Profissional Supervisionada Mestrado em Educação Pré-Escolar