13 results for Statistical Error
Abstract:
This thesis describes the application of automatic learning methods for a) the classification of organic and metabolic reactions, and b) the mapping of Potential Energy Surfaces (PES). The classification of reactions was approached with two distinct methodologies: a representation of chemical reactions based on NMR data, and a representation of chemical reactions from the reaction equation based on the physico-chemical and topological features of chemical bonds. NMR-based classification of photochemical and enzymatic reactions. Photochemical and metabolic reactions were classified by Kohonen Self-Organizing Maps (Kohonen SOMs) and Random Forests (RFs), taking as input the difference between the 1H NMR spectra of the products and the reactants. Such a representation can be applied to the automatic analysis of changes in the 1H NMR spectrum of a mixture and their interpretation in terms of the chemical reactions taking place. Examples of possible applications are the monitoring of reaction processes, the evaluation of the stability of chemicals, or even the interpretation of metabonomic data. A Kohonen SOM trained with a data set of metabolic reactions catalysed by transferases was able to correctly classify 75% of an independent test set in terms of the EC number subclass. Random Forests improved the correct predictions to 79%. With photochemical reactions classified into 7 groups, an independent test set was classified with 86-93% accuracy. The data set of photochemical reactions was also used to simulate mixtures with two reactions occurring simultaneously. Kohonen SOMs and Feed-Forward Neural Networks (FFNNs) were trained to classify the reactions occurring in a mixture based on the 1H NMR spectra of the products and reactants. Kohonen SOMs allowed the correct assignment of 53-63% of the mixtures (in a test set). Counter-Propagation Neural Networks (CPNNs) gave similar results. 
The use of supervised learning techniques improved the results to 77% of correct assignments when an ensemble of ten FFNNs was used and to 80% when Random Forests were used. This study was performed with NMR data simulated from the molecular structure by the SPINUS program. In the design of one test set, simulated data were combined with experimental data. The results support the proposal of linking databases of chemical reactions to experimental or simulated NMR data for automatic classification of reactions and mixtures of reactions. Genome-scale classification of enzymatic reactions from their reaction equation. The MOLMAP descriptor relies on a Kohonen SOM that defines types of bonds on the basis of their physico-chemical and topological properties. The MOLMAP descriptor of a molecule represents the types of bonds available in that molecule. The MOLMAP descriptor of a reaction is defined as the difference between the MOLMAPs of the products and the reactants, and numerically encodes the pattern of bonds that are broken, changed, and made during a chemical reaction. The automatic perception of chemical similarities between metabolic reactions is required for a variety of applications, ranging from the computer validation of classification systems and the genome-scale reconstruction (or comparison) of metabolic pathways to the classification of enzymatic mechanisms. Catalytic functions of proteins are generally described by the EC numbers, which are simultaneously employed as identifiers of reactions, enzymes, and enzyme genes, thus linking metabolic and genomic information. Different methods should be available for the automatic comparison of metabolic reactions and for the automatic assignment of EC numbers to reactions not yet officially classified. 
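The descriptor arithmetic described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: each bond is taken to be already assigned to a neuron of a trained Kohonen SOM, the neuron indices and grid size are made up, the function names `molmap` and `reaction_molmap` are hypothetical, and a plain bond count per neuron is a simplification that may differ from the weighting used in the thesis.

```python
from collections import Counter


def molmap(bond_neurons, grid_size):
    """Count how many bonds of one molecule fall on each SOM neuron.

    bond_neurons: list of neuron indices (hypothetical), one per bond.
    grid_size: total number of neurons in the flattened SOM.
    """
    counts = Counter(bond_neurons)
    return [counts.get(i, 0) for i in range(grid_size)]


def reaction_molmap(reactants, products, grid_size):
    """Reaction descriptor = sum of product MOLMAPs minus sum of reactant MOLMAPs."""
    descriptor = [0] * grid_size
    for molecule in products:
        for i, value in enumerate(molmap(molecule, grid_size)):
            descriptor[i] += value
    for molecule in reactants:
        for i, value in enumerate(molmap(molecule, grid_size)):
            descriptor[i] -= value
    return descriptor


# Hypothetical example on a 4-neuron map: one bond moves from neuron 1 to neuron 2.
example = reaction_molmap(reactants=[[0, 1, 1]], products=[[0, 1, 2]], grid_size=4)
print(example)
```

Negative entries mark bond types lost in the reaction, positive entries mark bond types gained, and zeros mark bond types that are unchanged.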
In this study, the genome-scale data set of enzymatic reactions available in the KEGG database was encoded by the MOLMAP descriptors, and was submitted to Kohonen SOMs to compare the resulting map with the official EC number classification, to explore the possibility of predicting EC numbers from the reaction equation, and to assess the internal consistency of the EC classification at the class level. A general agreement with the EC classification was observed, i.e. a relationship between the similarity of MOLMAPs and the similarity of EC numbers. At the same time, MOLMAPs were able to discriminate between EC sub-subclasses. EC numbers could be assigned at the class, subclass, and sub-subclass levels with accuracies up to 92%, 80%, and 70% for independent test sets. The correspondence between chemical similarity of metabolic reactions and their MOLMAP descriptors was applied to the identification of a number of reactions mapped into the same neuron but belonging to different EC classes, which demonstrated the ability of the MOLMAP/SOM approach to verify the internal consistency of classifications in databases of metabolic reactions. RFs were also used to assign the four levels of the EC hierarchy from the reaction equation. EC numbers were correctly assigned in 95%, 90%, 85% and 86% of the cases (for independent test sets) at the class, subclass, sub-subclass and full EC number level, respectively. Experiments for the classification of reactions from the main reactants and products were performed with RFs: EC numbers were assigned at the class, subclass and sub-subclass level with accuracies of 78%, 74% and 63%, respectively. In the course of the experiments with metabolic reactions we suggested that the MOLMAP/SOM concept could be extended to the representation of other levels of metabolic information such as metabolic pathways. 
Following the MOLMAP idea, the pattern of neurons activated by the reactions of a metabolic pathway is a representation of the reactions involved in that pathway, that is, a descriptor of the metabolic pathway. This reasoning enabled the comparison of different pathways, the automatic classification of pathways, and a classification of organisms based on their biochemical machinery. The three levels of classification (from bonds to metabolic pathways) made it possible to map and perceive chemical similarities between metabolic pathways, even for pathways of different types of metabolism and pathways that do not share similarities in terms of EC numbers. Mapping of PES by neural networks (NNs). In a first series of experiments, ensembles of Feed-Forward NNs (EnsFFNNs) and Associative Neural Networks (ASNNs) were trained to reproduce PES represented by the Lennard-Jones (LJ) analytical potential function. The accuracy of the method was assessed by comparing the results of molecular dynamics simulations (thermal, structural, and dynamic properties) obtained from the NNs-PES and from the LJ function. The results indicated that for LJ-type potentials, NNs can be trained to generate accurate PES to be used in molecular simulations. EnsFFNNs and ASNNs gave better results than single FFNNs. The NN models showed a remarkable ability to interpolate between distant curves and accurately reproduce potentials to be used in molecular simulations. The purpose of the first study was to systematically analyse the accuracy of different NNs. Our main motivation, however, is reflected in the next study: the mapping of multidimensional PES by NNs to simulate, by Molecular Dynamics or Monte Carlo, the adsorption and self-assembly of solvated organic molecules on noble-metal electrodes. Indeed, for such complex and heterogeneous systems the development of suitable analytical functions that fit quantum mechanical interaction energies is a non-trivial or even impossible task. 
The data consisted of energy values, from Density Functional Theory (DFT) calculations, at different distances, for several molecular orientations and three electrode adsorption sites. The results indicate that NNs require a data set large enough to cover the diversity of possible interaction sites, distances, and orientations well. NNs trained with such data sets can perform as well as or even better than analytical functions. Therefore, they can be used in molecular simulations, particularly for the ethanol/Au(111) interface, which is the case studied in the present thesis. Once properly trained, the networks are able to produce, as output, any required number of energy points for accurate interpolations.
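In the first series of experiments described above, the reference data are values of the Lennard-Jones function. The sketch below shows how such reference points might be generated in Python, assuming the common 12-6 form of the potential in reduced units. The sampling grid, the parameter values and the function names are assumptions for illustration; the abstract does not give the parameters actually used, and no network training is shown here.

```python
def lennard_jones(r, epsilon=1.0, sigma=1.0):
    """12-6 Lennard-Jones energy: 4*eps*[(sigma/r)^12 - (sigma/r)^6]."""
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (sr6 * sr6 - sr6)


def sample_curve(r_min, r_max, n_points, epsilon=1.0, sigma=1.0):
    """Return (distance, energy) pairs on an evenly spaced grid.

    These pairs could serve as input/target examples for a regression model.
    """
    step = (r_max - r_min) / (n_points - 1)
    points = []
    for i in range(n_points):
        r = r_min + i * step
        points.append((r, lennard_jones(r, epsilon, sigma)))
    return points


# Hypothetical grid: 5 distances between 0.9 and 2.5 (reduced units).
for r, energy in sample_curve(0.9, 2.5, 5):
    print(f"r = {r:.2f}  V = {energy:+.4f}")
```

Two sanity checks follow from the formula: the energy is zero at r = sigma, and the minimum, of depth -epsilon, lies at r = 2^(1/6) sigma.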
Abstract:
Dissertation presented at the Faculdade de Ciências e Tecnologia of the Universidade Nova de Lisboa to obtain the degree of Master in Environmental Engineering
Abstract:
ABSTRACT: Since 1976, the Armed Forces have been developing measures to prevent the use of drugs and alcohol. In the 1980s, laboratory capacity was created and a programme of toxicological screening began. In the five years from 2001 to 2005, the proportion of positive results, across all types of screening, ranged between 3.7% and 1.5%. From October 2006 to July 2007, a cross-sectional analytical study was carried out to estimate the prevalence of drug use (cannabinoids, opiates, cocaine and amphetamines) in one branch of the Portuguese Armed Forces, based on the screenings performed by its laboratory. A simple random sample of 1039 military personnel was used, career (QP) and contract (RC), on active duty and of both sexes. Dedicated computer support was used throughout, from the selection of the personnel to be screened, through the chain of custody of the samples, to the obtaining of the result. The testing process used two immunoassay screening techniques and confirmation by GC/MS, in accordance with European recommendations, which makes it possible to establish a standard methodology for organizations and companies. The estimated prevalence of drug users was 3.8/1,000, with an error of 0.37% (95% confidence interval). The number of cases recorded (4) did not permit the use of statistical tests to identify characteristics that determine positivity, but it nevertheless revealed unexpected aspects. In conclusion, the observation of case series and the regular conduct of epidemiological studies, which help to redefine target groups and to understand the extent, determinants and consequences of drug use, are suggested.
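The reported prevalence and error can be checked approximately from the two numbers given in the abstract (4 cases in a sample of 1039). The Python sketch below assumes a normal-approximation (Wald) interval at the 95% level; the abstract does not say which method the authors used, so this is a plausibility check rather than a reconstruction of their calculation. The function name `wald_margin` is hypothetical.

```python
import math


def wald_margin(cases, sample_size, z=1.96):
    """Return (prevalence, margin of error) under the normal approximation.

    z = 1.96 corresponds to a 95% confidence level (an assumption here).
    """
    p = cases / sample_size
    margin = z * math.sqrt(p * (1.0 - p) / sample_size)
    return p, margin


prevalence, margin = wald_margin(cases=4, sample_size=1039)
print(f"prevalence = {prevalence * 1000:.2f} per 1,000")
print(f"margin of error = {margin * 100:.3f}%")
```

Under this assumption the margin comes out just under 0.38%, close to the 0.37% quoted. With only four cases the normal approximation is rough, so an exact or Wilson interval would usually be preferred for a rare outcome like this.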
Abstract:
Dissertation presented as a partial requirement for obtaining the degree of Master in Statistics and Information Management
Abstract:
Dissertation submitted in partial fulfillment of the requirements for the Degree of Master of Science in Geospatial Technologies.
Abstract:
Dissertation presented to obtain the degree of Doctor in Environmental Engineering from the Universidade Nova de Lisboa, Faculdade de Ciências e Tecnologia
Abstract:
A Work Project, presented as part of the requirements for the Award of a Master's Degree in Finance from the NOVA – School of Business and Economics
Abstract:
Dissertation submitted in fulfillment of the requirements for the Degree of Master in Biomedical Engineering
Visual function and reading performance in children in the 1st cycle of basic education in the municipality of Lisbon
Abstract:
ABSTRACT - This thesis aims to contribute to the study of visual function anomalies and their influence on reading performance. Its objectives were: (1) to identify the prevalence of visual function anomalies, (2) to characterize reading performance in children with and without visual function anomalies, (3) to identify how visual function anomalies influence reading performance, and (4) to identify the impact of the variables that determine reading performance. A convenience sample was collected of 672 children in the 1st cycle of basic education from 11 schools in the municipality of Lisbon, aged between 6 and 11 years (7.69±1.19), together with 670 parents or guardians and 34 teachers. Three instruments were used for data collection: two closed-question questionnaires, a visual function assessment, and a reading assessment test with 34 words. After examination, the children were classified into two groups: normal visual function (FVN=562) and altered visual function (FVA=110). The prevalence of children with FVA was 16.4%. In the reading test, these children read fewer words correctly (FVA=31.00; FVN=33.00; p<0.001) and with lower accuracy (FVA=91.18%; FVN=97.06%; p<0.001). This trend was also observed in the comparison across the 4 school years. Children with altered visual function showed a tendency to omit letters and confuse graphemes. Fluency (FVA=24.71; FVN=27.39; p=0.007) was lower in children with FVA in all school years except the 3rd year. Children with uncorrected hyperopia (p=0.003) and astigmatism (p=0.019) read fewer words correctly (30.00; 31.00) and with lower accuracy (88.24%; 91.18%) than children without significant refractive error (32.00; 94.12%). School performance as rated by teachers was lower in children with FVA, and more than one quarter needed special support measures at school. 
There were no significant differences in the reading performance of children with FVA across groups defined by the educational level of their parents or guardians. The risk of impaired reading performance was higher [OR=4.29; 95% CI (2.49; 7.38)] in children with FVA. Compared with the 1st school year, the 2nd, 3rd and 4th years showed a lower risk of impaired reading performance. The variables teaching method, educational level of parents or guardians, type of school (public/private), teacher's age and teacher's years of experience were not statistically significant factors in explaining impaired reading performance once the effect of visual function was included in the model. Poor reading performance was defined as an accuracy below 90%. This indicator can be used to identify children at risk who need an orthoptic/ophthalmological examination to confirm or rule out alterations of visual function. This work contributes to the identification of children at an educational disadvantage due to treatable visual function anomalies, proposing a model intended to guide teachers in identifying children with low reading performance.
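Two of the figures in this abstract can be cross-checked from the numbers it reports. The Python sketch below recomputes the prevalence of altered visual function from the group sizes (110 of 672) and, assuming the 95% confidence interval for the odds ratio was built symmetrically on the log scale (the usual construction for logistic-regression output, but an assumption here), recovers the implied point estimate and the standard error of the log odds ratio. The function name `log_scale_summary` is hypothetical.

```python
import math


def log_scale_summary(lower, upper, z=1.96):
    """Recover the point estimate and log-scale standard error from a CI.

    Assumes the interval is exp(log(OR) -/+ z * SE), so the point estimate
    is the geometric mean of the two bounds.
    """
    point = math.sqrt(lower * upper)
    standard_error = (math.log(upper) - math.log(lower)) / (2.0 * z)
    return point, standard_error


prevalence = 110 / 672
odds_ratio, se_log_or = log_scale_summary(2.49, 7.38)

print(f"prevalence of altered visual function = {prevalence * 100:.1f}%")
print(f"implied odds ratio = {odds_ratio:.2f}")
print(f"implied SE of log(OR) = {se_log_or:.3f}")
```

The geometric mean of the bounds agrees with the reported odds ratio of 4.29 to two decimals, which is consistent with an interval computed on the log scale.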
Abstract:
Ultra-wide field-of-view retinal images (fundus images) allow a large part of the retina to be visualized, although artifacts may appear in those images. Eyelashes and eyelids often cover the clinical region of interest and, worse, eyelashes can be mistaken for arteries and/or veins when those images are processed by automatic diagnosis or segmentation software, producing false positive results. Correcting this problem is the first step in the development of reliable automatic disease diagnosis programs, and thereby of an objective tool for assessing diseases that removes human error from those processes. This work proposes a tool that automatically delimits the clinical region of interest by extracting features from the images, which are then analyzed by an automatic classifier. The classifier evaluates this information and decides which part of the image is of interest and which part contains artifacts. The method was implemented in software written in C# and the results were validated through a statistical analysis. The results confirmed that the methodology presented is capable of detecting artifacts and selecting the clinical region of interest in fundus images of the retina.
Abstract:
This work project (WP) is a study of a clustering strategy for Sport Zone. The general objective of a cluster study is to create groups such that individuals within each group are similar to each other but different from those in other groups. Creating the clusters involves a mix of common sense, trial and error, and some supporting statistical techniques. Our particular objective is to help category managers better define the type of product to be displayed on store shelves by clustering stores. This research was carried out for Sport Zone and comprises an objective definition, a literature review, the clustering activity itself, some factor analysis, and a discriminant analysis to better frame our work. Alongside this quantitative part, a survey was addressed to category managers to better understand their key drivers for choosing the type of product for each store. Based on a non-random sample of 65 stores with data referring to 2013, the final result was the choice of 6 store clusters (Figure 1), which were individually characterized as the main outcome of this work. All of the selected variables were important for distinguishing between clusters, which supports the adequacy of their choice. The interpretation of the results gives category managers a tool to understand which products best fit the clustered stores. Furthermore, as a side finding of the clustering, an STP (Segmentation, Targeting and Positioning) process was initiated, with this WP being the first step of a continuous process.
Abstract:
This paper is mainly concerned with the tracking accuracy of Exchange Traded Funds (ETFs) listed on the London Stock Exchange (LSE), but it also evaluates their performance and pricing efficiency. The findings show that ETFs offer virtually the same return as their benchmarks but exhibit higher volatility. The pricing efficiency that should come from the creation and redemption process does not appear to hold fully, as equity ETFs show consistent price premiums. The tracking error of the funds is generally small and decreases over time. The risk of the ETF, daily price volatility and the total expense ratio explain a large part of the tracking error. Trading volume, fund size, bid-ask spread and average price premium or discount did not have an impact on the tracking error. Finally, it is concluded that market volatility and the tracking error are positively correlated.
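The abstract does not state how tracking error was measured. One common definition is the sample standard deviation of the difference between fund and benchmark returns over the same periods. The Python sketch below implements that definition only; the return series are made-up numbers for illustration, the function name `tracking_error` is hypothetical, and the paper may have used a different measure.

```python
import statistics


def tracking_error(fund_returns, benchmark_returns):
    """Sample standard deviation of period-by-period return differences."""
    if len(fund_returns) != len(benchmark_returns):
        raise ValueError("return series must have the same length")
    differences = [f - b for f, b in zip(fund_returns, benchmark_returns)]
    return statistics.stdev(differences)


# Hypothetical daily returns for a fund and its benchmark.
fund = [0.011, 0.019, 0.031, 0.039]
benchmark = [0.010, 0.020, 0.030, 0.040]
print(f"tracking error = {tracking_error(fund, benchmark):.6f}")
```

A fund that matches its benchmark exactly in every period has a tracking error of zero under this definition, even though it might still trade at a price premium or discount.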