4 resultados para Bayesian statistical decision theory
Resumo:
This Thesis describes the application of automatic learning methods for a) the classification of organic and metabolic reactions, and b) the mapping of Potential Energy Surfaces(PES). The classification of reactions was approached with two distinct methodologies: a representation of chemical reactions based on NMR data, and a representation of chemical reactions from the reaction equation based on the physico-chemical and topological features of chemical bonds. NMR-based classification of photochemical and enzymatic reactions. Photochemical and metabolic reactions were classified by Kohonen Self-Organizing Maps (Kohonen SOMs) and Random Forests (RFs) taking as input the difference between the 1H NMR spectra of the products and the reactants. The development of such a representation can be applied in automatic analysis of changes in the 1H NMR spectrum of a mixture and their interpretation in terms of the chemical reactions taking place. Examples of possible applications are the monitoring of reaction processes, evaluation of the stability of chemicals, or even the interpretation of metabonomic data. A Kohonen SOM trained with a data set of metabolic reactions catalysed by transferases was able to correctly classify 75% of an independent test set in terms of the EC number subclass. Random Forests improved the correct predictions to 79%. With photochemical reactions classified into 7 groups, an independent test set was classified with 86-93% accuracy. The data set of photochemical reactions was also used to simulate mixtures with two reactions occurring simultaneously. Kohonen SOMs and Feed-Forward Neural Networks (FFNNs) were trained to classify the reactions occurring in a mixture based on the 1H NMR spectra of the products and reactants. Kohonen SOMs allowed the correct assignment of 53-63% of the mixtures (in a test set). Counter-Propagation Neural Networks (CPNNs) gave origin to similar results. The use of supervised learning techniques allowed an improvement in the results. They were improved to 77% of correct assignments when an ensemble of ten FFNNs were used and to 80% when Random Forests were used. This study was performed with NMR data simulated from the molecular structure by the SPINUS program. In the design of one test set, simulated data was combined with experimental data. The results support the proposal of linking databases of chemical reactions to experimental or simulated NMR data for automatic classification of reactions and mixtures of reactions. Genome-scale classification of enzymatic reactions from their reaction equation. The MOLMAP descriptor relies on a Kohonen SOM that defines types of bonds on the basis of their physico-chemical and topological properties. The MOLMAP descriptor of a molecule represents the types of bonds available in that molecule. The MOLMAP descriptor of a reaction is defined as the difference between the MOLMAPs of the products and the reactants, and numerically encodes the pattern of bonds that are broken, changed, and made during a chemical reaction. The automatic perception of chemical similarities between metabolic reactions is required for a variety of applications ranging from the computer validation of classification systems, genome-scale reconstruction (or comparison) of metabolic pathways, to the classification of enzymatic mechanisms. Catalytic functions of proteins are generally described by the EC numbers that are simultaneously employed as identifiers of reactions, enzymes, and enzyme genes, thus linking metabolic and genomic information. Different methods should be available to automatically compare metabolic reactions and for the automatic assignment of EC numbers to reactions still not officially classified. In this study, the genome-scale data set of enzymatic reactions available in the KEGG database was encoded by the MOLMAP descriptors, and was submitted to Kohonen SOMs to compare the resulting map with the official EC number classification, to explore the possibility of predicting EC numbers from the reaction equation, and to assess the internal consistency of the EC classification at the class level. A general agreement with the EC classification was observed, i.e. a relationship between the similarity of MOLMAPs and the similarity of EC numbers. At the same time, MOLMAPs were able to discriminate between EC sub-subclasses. EC numbers could be assigned at the class, subclass, and sub-subclass levels with accuracies up to 92%, 80%, and 70% for independent test sets. The correspondence between chemical similarity of metabolic reactions and their MOLMAP descriptors was applied to the identification of a number of reactions mapped into the same neuron but belonging to different EC classes, which demonstrated the ability of the MOLMAP/SOM approach to verify the internal consistency of classifications in databases of metabolic reactions. RFs were also used to assign the four levels of the EC hierarchy from the reaction equation. EC numbers were correctly assigned in 95%, 90%, 85% and 86% of the cases (for independent test sets) at the class, subclass, sub-subclass and full EC number level,respectively. Experiments for the classification of reactions from the main reactants and products were performed with RFs - EC numbers were assigned at the class, subclass and sub-subclass level with accuracies of 78%, 74% and 63%, respectively. In the course of the experiments with metabolic reactions we suggested that the MOLMAP / SOM concept could be extended to the representation of other levels of metabolic information such as metabolic pathways. Following the MOLMAP idea, the pattern of neurons activated by the reactions of a metabolic pathway is a representation of the reactions involved in that pathway - a descriptor of the metabolic pathway. This reasoning enabled the comparison of different pathways, the automatic classification of pathways, and a classification of organisms based on their biochemical machinery. The three levels of classification (from bonds to metabolic pathways) allowed to map and perceive chemical similarities between metabolic pathways even for pathways of different types of metabolism and pathways that do not share similarities in terms of EC numbers. Mapping of PES by neural networks (NNs). In a first series of experiments, ensembles of Feed-Forward NNs (EnsFFNNs) and Associative Neural Networks (ASNNs) were trained to reproduce PES represented by the Lennard-Jones (LJ) analytical potential function. The accuracy of the method was assessed by comparing the results of molecular dynamics simulations (thermal, structural, and dynamic properties) obtained from the NNs-PES and from the LJ function. The results indicated that for LJ-type potentials, NNs can be trained to generate accurate PES to be used in molecular simulations. EnsFFNNs and ASNNs gave better results than single FFNNs. A remarkable ability of the NNs models to interpolate between distant curves and accurately reproduce potentials to be used in molecular simulations is shown. The purpose of the first study was to systematically analyse the accuracy of different NNs. Our main motivation, however, is reflected in the next study: the mapping of multidimensional PES by NNs to simulate, by Molecular Dynamics or Monte Carlo, the adsorption and self-assembly of solvated organic molecules on noble-metal electrodes. Indeed, for such complex and heterogeneous systems the development of suitable analytical functions that fit quantum mechanical interaction energies is a non-trivial or even impossible task. The data consisted of energy values, from Density Functional Theory (DFT) calculations, at different distances, for several molecular orientations and three electrode adsorption sites. The results indicate that NNs require a data set large enough to cover well the diversity of possible interaction sites, distances, and orientations. NNs trained with such data sets can perform equally well or even better than analytical functions. Therefore, they can be used in molecular simulations, particularly for the ethanol/Au (111) interface which is the case studied in the present Thesis. Once properly trained, the networks are able to produce, as output, any required number of energy points for accurate interpolations.
Resumo:
Over the last fifty years mobility practices have changed dramatically, improving the way travel takes place, the time it takes but also on matters like road safety and prevention. High mortality caused by high accident levels has reached untenable levels. But the research into road mortality stayed limited to comparative statistical exercises which go no further than defining accident types. In terms of sharing information and mapping accidents, little progress has been mad, aside from the normal publication of figures, either through simplistic tables or web pages. With considerable technological advances on geographical information technologies, research and development stayed rather static with only a few good examples on dynamic mapping. The use of Global Positioning System (GPS) devices as normal equipments on automobile industry resulted in a more dynamic mobility patterns but also with higher degrees of uncertainty on road traffic. This paper describes a road accident georeferencing project for the Lisbon District involving fatalities and serious injuries during 2007. In the initial phase, individual information summaries were compiled giving information on accidents and its majour characteristics, collected by the security forces: the Public Safety Police Force (Polícia de Segurança Pública - PSP) and the National Guard (Guarda Nacional Republicana - GNR). The Google Earth platform was used to georeference the information in order to inform the public and the authorities of the accident locations, the nature of the location, and the causes and consequences of the accidents. This paper also gives future insights about augmented reality technologies, considered crucial to advances to road safety and prevention studies. At the end, this exercise could be considered a success because of numerous consequences, as for stakeholders who decide what to do but also for the public awareness to the problem of road mortality.
Resumo:
Dissertação para obtenção do Grau de Doutor em Informática
Resumo:
Economics is a social science which, therefore, focuses on people and on the decisions they make, be it in an individual context, or in group situations. It studies human choices, in face of needs to be fulfilled, and a limited amount of resources to fulfill them. For a long time, there was a convergence between the normative and positive views of human behavior, in that the ideal and predicted decisions of agents in economic models were entangled in one single concept. That is, it was assumed that the best that could be done in each situation was exactly the choice that would prevail. Or, at least, that the facts that economics needed to explain could be understood in the light of models in which individual agents act as if they are able to make ideal decisions. However, in the last decades, the complexity of the environment in which economic decisions are made and the limits on the ability of agents to deal with it have been recognized, and incorporated into models of decision making in what came to be known as the bounded rationality paradigm. This was triggered by the incapacity of the unboundedly rationality paradigm to explain observed phenomena and behavior. This thesis contributes to the literature in three different ways. Chapter 1 is a survey on bounded rationality, which gathers and organizes the contributions to the field since Simon (1955) first recognized the necessity to account for the limits on human rationality. The focus of the survey is on theoretical work rather than the experimental literature which presents evidence of actual behavior that differs from what classic rationality predicts. The general framework is as follows. Given a set of exogenous variables, the economic agent needs to choose an element from the choice set that is avail- able to him, in order to optimize the expected value of an objective function (assuming his preferences are representable by such a function). If this problem is too complex for the agent to deal with, one or more of its elements is simplified. Each bounded rationality theory is categorized according to the most relevant element it simplifes. Chapter 2 proposes a novel theory of bounded rationality. Much in the same fashion as Conlisk (1980) and Gabaix (2014), we assume that thinking is costly in the sense that agents have to pay a cost for performing mental operations. In our model, if they choose not to think, such cost is avoided, but they are left with a single alternative, labeled the default choice. We exemplify the idea with a very simple model of consumer choice and identify the concept of isofin curves, i.e., sets of default choices which generate the same utility net of thinking cost. Then, we apply the idea to a linear symmetric Cournot duopoly, in which the default choice can be interpreted as the most natural quantity to be produced in the market. We find that, as the thinking cost increases, the number of firms thinking in equilibrium decreases. More interestingly, for intermediate levels of thinking cost, an equilibrium in which one of the firms chooses the default quantity and the other best responds to it exists, generating asymmetric choices in a symmetric model. Our model is able to explain well-known regularities identified in the Cournot experimental literature, such as the adoption of different strategies by players (Huck et al. , 1999), the inter temporal rigidity of choices (Bosch-Dom enech & Vriend, 2003) and the dispersion of quantities in the context of di cult decision making (Bosch-Dom enech & Vriend, 2003). Chapter 3 applies a model of bounded rationality in a game-theoretic set- ting to the well-known turnout paradox in large elections, pivotal probabilities vanish very quickly and no one should vote, in sharp contrast with the ob- served high levels of turnout. Inspired by the concept of rhizomatic thinking, introduced by Bravo-Furtado & Côrte-Real (2009a), we assume that each per- son is self-delusional in the sense that, when making a decision, she believes that a fraction of the people who support the same party decides alike, even if no communication is established between them. This kind of belief simplifies the decision of the agent, as it reduces the number of players he believes to be playing against { it is thus a bounded rationality approach. Studying a two-party first-past-the-post election with a continuum of self-delusional agents, we show that the turnout rate is positive in all the possible equilibria, and that it can be as high as 100%. The game displays multiple equilibria, at least one of which entails a victory of the bigger party. The smaller one may also win, provided its relative size is not too small; more self-delusional voters in the minority party decreases this threshold size. Our model is able to explain some empirical facts, such as the possibility that a close election leads to low turnout (Geys, 2006), a lower margin of victory when turnout is higher (Geys, 2006) and high turnout rates favoring the minority (Bernhagen & Marsh, 1997).