Digital Library

12 results for Data Interpretation, Statistical

Stratigraphic interpretation of Well-Log data of the Athabasca Oil Sands of Alberta Canada through Pattern recognition and Artificial Intelligence

Relevance: 50.00%

Abstract:

Dissertation submitted in partial fulfillment of the requirements for the Degree of Master of Science in Geospatial Technologies.

Automatic learning for the classification of chemical reactions and in statistical thermodynamics

Relevance: 40.00%

Abstract:

This Thesis describes the application of automatic learning methods to a) the classification of organic and metabolic reactions, and b) the mapping of Potential Energy Surfaces (PES). The classification of reactions was approached with two distinct methodologies: a representation of chemical reactions based on NMR data, and a representation of chemical reactions derived from the reaction equation using the physico-chemical and topological features of chemical bonds. NMR-based classification of photochemical and enzymatic reactions. Photochemical and metabolic reactions were classified by Kohonen Self-Organizing Maps (Kohonen SOMs) and Random Forests (RFs), taking as input the difference between the 1H NMR spectra of the products and those of the reactants. Such a representation can be applied to the automatic analysis of changes in the 1H NMR spectrum of a mixture and to their interpretation in terms of the chemical reactions taking place. Possible applications include monitoring reaction processes, evaluating the stability of chemicals, and interpreting metabonomic data. A Kohonen SOM trained with a data set of metabolic reactions catalysed by transferases correctly classified 75% of an independent test set in terms of the EC number subclass; Random Forests raised the correct predictions to 79%. With photochemical reactions classified into 7 groups, an independent test set was classified with 86-93% accuracy. The data set of photochemical reactions was also used to simulate mixtures with two reactions occurring simultaneously. Kohonen SOMs and Feed-Forward Neural Networks (FFNNs) were trained to classify the reactions occurring in a mixture from the 1H NMR spectra of the products and reactants. Kohonen SOMs correctly assigned 53-63% of the mixtures in a test set, and Counter-Propagation Neural Networks (CPNNs) gave similar results.
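The difference-spectrum representation described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the thesis's code. Each molecule is reduced to a hypothetical list of 1H chemical shifts (ppm), and signals are simply counted into 1 ppm bins from 0 to 12 ppm, which is an arbitrary choice. The descriptor is the products' binned spectrum minus the reactants'. Real or SPINUS-simulated spectra also carry intensities and multiplicities, which this sketch ignores. The resulting vector is the kind of input a Kohonen SOM or a Random Forest would receive.

```python
# Minimal sketch (assumption-laden, not the thesis's implementation):
# a reaction is encoded as the binned 1H NMR spectrum of its products
# minus the binned 1H NMR spectrum of its reactants.
from typing import List, Sequence

Molecule = Sequence[float]  # hypothetical: list of 1H chemical shifts in ppm


def bin_spectrum(shifts_ppm: Sequence[float], lo: float = 0.0,
                 hi: float = 12.0, n_bins: int = 12) -> List[int]:
    """Count 1H signals in n_bins equal-width bins covering [lo, hi) ppm."""
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for shift in shifts_ppm:
        if lo <= shift < hi:
            # Clamp guards against floating-point rounding at the upper edge.
            idx = min(int((shift - lo) // width), n_bins - 1)
            counts[idx] += 1
    return counts


def reaction_descriptor(reactants: Sequence[Molecule],
                        products: Sequence[Molecule],
                        **bin_kwargs) -> List[int]:
    """Products' binned spectrum minus reactants' binned spectrum."""
    # Pooling all signals on one side of the equation before binning gives
    # the same result as summing each molecule's binned spectrum.
    r = bin_spectrum([s for mol in reactants for s in mol], **bin_kwargs)
    p = bin_spectrum([s for mol in products for s in mol], **bin_kwargs)
    return [pi - ri for pi, ri in zip(p, r)]


if __name__ == "__main__":
    # Hypothetical shifts: a signal at 3.7 ppm disappears, one at 11.0 ppm appears.
    print(reaction_descriptor([[1.2, 3.7]], [[1.2, 11.0]]))
```

For the example shift lists, the 3-4 ppm bin becomes -1 and the 11-12 ppm bin becomes +1. The unchanged signal at 1.2 ppm cancels out, so the descriptor records only what the reaction changed.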
For the mixtures, supervised learning techniques improved the results: correct assignments rose to 77% when an ensemble of ten FFNNs was used and to 80% with Random Forests. This study was performed with NMR data simulated from the molecular structure by the SPINUS program; in the design of one test set, simulated data were combined with experimental data. The results support the proposal of linking databases of chemical reactions to experimental or simulated NMR data for the automatic classification of reactions and mixtures of reactions. Genome-scale classification of enzymatic reactions from their reaction equation. The MOLMAP descriptor relies on a Kohonen SOM that defines types of bonds on the basis of their physico-chemical and topological properties. The MOLMAP descriptor of a molecule represents the types of bonds available in that molecule. The MOLMAP descriptor of a reaction is defined as the difference between the MOLMAPs of the products and the reactants, and numerically encodes the pattern of bonds that are broken, changed, and made during a chemical reaction. The automatic perception of chemical similarities between metabolic reactions is required for a variety of applications, ranging from the computer validation of classification systems and the genome-scale reconstruction (or comparison) of metabolic pathways to the classification of enzymatic mechanisms. Catalytic functions of proteins are generally described by EC numbers, which are simultaneously employed as identifiers of reactions, enzymes, and enzyme genes, thus linking metabolic and genomic information. Methods are therefore needed to compare metabolic reactions automatically and to assign EC numbers automatically to reactions not yet officially classified.
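The MOLMAP construction described above can be sketched in a simplified form. The idea is that a SOM defines bond types, and the reaction descriptor is the products' map minus the reactants'. The following assumptions are labelled as such. The SOM is already trained and is given as a hypothetical `codebook` of neuron weight vectors. Each bond is represented by a made-up numeric feature vector standing in for its physico-chemical and topological properties. Each bond activates only its winning neuron, whereas the published descriptor may spread activation differently.

```python
# Simplified MOLMAP-style sketch (assumptions: a pre-trained SOM codebook is
# given; each bond activates only its winning neuron; feature values are
# hypothetical). Not the published MOLMAP code.
import math
from typing import List, Sequence

Vector = Sequence[float]


def winning_neuron(bond: Vector, codebook: Sequence[Vector]) -> int:
    """Index of the neuron whose weight vector is closest (Euclidean) to the bond."""
    return min(range(len(codebook)), key=lambda i: math.dist(bond, codebook[i]))


def molmap(bonds: Sequence[Vector], codebook: Sequence[Vector]) -> List[int]:
    """Molecule descriptor: how many of its bonds map onto each neuron."""
    counts = [0] * len(codebook)
    for bond in bonds:
        counts[winning_neuron(bond, codebook)] += 1
    return counts


def reaction_molmap(reactant_bonds: Sequence[Vector],
                    product_bonds: Sequence[Vector],
                    codebook: Sequence[Vector]) -> List[int]:
    """Reaction descriptor: MOLMAP(products) - MOLMAP(reactants).

    Negative entries mark bond types present in the reactants but lost
    (broken or changed bonds); positive entries mark bond types gained
    (made or changed bonds)."""
    r = molmap(reactant_bonds, codebook)
    p = molmap(product_bonds, codebook)
    return [pi - ri for pi, ri in zip(p, r)]
```

Consider a hypothetical four-neuron codebook `[(0, 0), (1, 0), (0, 1), (1, 1)]`. Suppose a reaction's reactant bonds fall on neurons 0 and 1, and its product bonds fall on neurons 0 and 3. That reaction gets the descriptor `[0, -1, 0, 1]`: the bond type on neuron 1 is lost and the one on neuron 3 is gained.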
In this study, the genome-scale data set of enzymatic reactions available in the KEGG database was encoded by MOLMAP descriptors and submitted to Kohonen SOMs. The aims were to compare the resulting map with the official EC number classification, to explore the possibility of predicting EC numbers from the reaction equation, and to assess the internal consistency of the EC classification at the class level. General agreement with the EC classification was observed, i.e. a relationship between the similarity of MOLMAPs and the similarity of EC numbers. At the same time, MOLMAPs were able to discriminate between EC sub-subclasses. EC numbers could be assigned at the class, subclass, and sub-subclass levels with accuracies of up to 92%, 80%, and 70% for independent test sets. The correspondence between the chemical similarity of metabolic reactions and their MOLMAP descriptors was used to identify a number of reactions mapped into the same neuron but belonging to different EC classes. This demonstrated the ability of the MOLMAP/SOM approach to verify the internal consistency of classifications in databases of metabolic reactions. RFs were also used to assign the four levels of the EC hierarchy from the reaction equation. EC numbers were correctly assigned in 95%, 90%, 85% and 86% of the cases (for independent test sets) at the class, subclass, sub-subclass and full EC number level, respectively. Experiments classifying reactions from the main reactants and products were performed with RFs; EC numbers were assigned at the class, subclass and sub-subclass level with accuracies of 78%, 74% and 63%, respectively. In the course of the experiments with metabolic reactions, we suggested that the MOLMAP/SOM concept could be extended to the representation of other levels of metabolic information, such as metabolic pathways.
Following the MOLMAP idea, the pattern of neurons activated by the reactions of a metabolic pathway is a representation of the reactions involved in that pathway, i.e. a descriptor of the metabolic pathway. This reasoning enabled the comparison of different pathways, the automatic classification of pathways, and a classification of organisms based on their biochemical machinery. The three levels of classification (from bonds to metabolic pathways) made it possible to map and perceive chemical similarities between metabolic pathways. This held even for pathways of different types of metabolism and for pathways that share no similarities in terms of EC numbers. Mapping of PES by neural networks (NNs). In a first series of experiments, ensembles of Feed-Forward NNs (EnsFFNNs) and Associative Neural Networks (ASNNs) were trained to reproduce PES represented by the Lennard-Jones (LJ) analytical potential function. The accuracy of the method was assessed by comparing the results of molecular dynamics simulations (thermal, structural, and dynamic properties) obtained from the NN-PES and from the LJ function. The results indicated that, for LJ-type potentials, NNs can be trained to generate PES accurate enough for use in molecular simulations. EnsFFNNs and ASNNs gave better results than single FFNNs. The NN models also showed a remarkable ability to interpolate between distant curves and to accurately reproduce potentials for use in molecular simulations. The purpose of this first study was to systematically analyse the accuracy of different NNs. Our main motivation, however, is reflected in the next study: the mapping of multidimensional PES by NNs to simulate the adsorption and self-assembly of solvated organic molecules on noble-metal electrodes, by Molecular Dynamics or Monte Carlo. Indeed, for such complex and heterogeneous systems, the development of suitable analytical functions that fit quantum mechanical interaction energies is a non-trivial or even impossible task.
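The sketch below shows the reference potential used in the first series of experiments, the standard LJ form V(r) = 4ε[(σ/r)^12 - (σ/r)^6], and how an ensemble prediction is formed. It is illustrative only. ε and σ default to 1 (reduced units, an assumption), and the grid bounds are arbitrary. The ensemble output is assumed to be a plain average of its members, and the members in the test are stand-in functions rather than trained FFNNs. ASNNs additionally correct the ensemble output using nearest neighbours, which is not shown.

```python
# Sketch: reference Lennard-Jones (LJ) potential, a training grid, and how an
# ensemble's prediction is formed. Model callables are stand-ins, not trained
# neural networks; epsilon/sigma are in reduced units (assumption).
from typing import Callable, List, Sequence, Tuple


def lennard_jones(r: float, epsilon: float = 1.0, sigma: float = 1.0) -> float:
    """V(r) = 4*epsilon*[(sigma/r)**12 - (sigma/r)**6]."""
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (sr6 * sr6 - sr6)


def training_grid(r_min: float, r_max: float, n: int,
                  epsilon: float = 1.0,
                  sigma: float = 1.0) -> List[Tuple[float, float]]:
    """n evenly spaced (r, V(r)) pairs, e.g. to serve as NN training targets."""
    step = (r_max - r_min) / (n - 1)
    return [(r_min + i * step, lennard_jones(r_min + i * step, epsilon, sigma))
            for i in range(n)]


def ensemble_predict(models: Sequence[Callable[[float], float]],
                     r: float) -> float:
    """Ensemble output, assumed here to be the plain average of the members."""
    return sum(m(r) for m in models) / len(models)
```

The function can be checked against two known facts. V(σ) = 0, and the minimum V = -ε lies at r = 2^(1/6)σ. The code reproduces both to floating-point precision.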
In this second study, the data consisted of energy values from Density Functional Theory (DFT) calculations at different distances, for several molecular orientations and three electrode adsorption sites. The results indicate that NNs require a data set large enough to cover the diversity of possible interaction sites, distances, and orientations well. NNs trained with such data sets can perform as well as or even better than analytical functions. They can therefore be used in molecular simulations, particularly for the ethanol/Au(111) interface, which is the case studied in the present Thesis. Once properly trained, the networks can produce, as output, any required number of energy points for accurate interpolations.

Using administrative data for research: the importance of appropriate statistical techniques.

Relevance: 40.00%

Exploratory Multivariate Statistical Methods Applied to Pharmaceutical Industry CRM Data

Relevance: 40.00%

Abstract:

Dissertation presented as a partial requirement for obtaining the Master's degree in Statistics and Information Management

Mining protein structure data

Relevance: 30.00%

Abstract:

The principal topic of this work is the application of data mining techniques, in particular machine learning, to knowledge discovery in a protein database. The first chapter presents a general background. Section 1.1 gives an overview of the methodology of a Data Mining project and its main algorithms, and section 1.2 introduces proteins and their supporting file formats. Section 1.3 defines the main problem we intend to address with this work: determining whether an amino acid is exposed or buried in a protein, in a discrete (i.e., not continuous) way, for five exposure levels: 2%, 10%, 20%, 25% and 30%. The second chapter, closely following the CRISP-DM methodology, presents the whole process of building the database that supported this work. It describes the loading of data from the Protein Data Bank, DSSP and SCOP. It then performs an initial data exploration and introduces a simple prediction model (baseline) of the relative solvent accessibility of an amino acid. It also introduces the Data Mining Table Creator, a program developed to produce the data mining tables required for this problem. In the third chapter, the results obtained are analyzed with statistical significance tests. First, the classifiers used (Neural Networks, C5.0, CART and CHAID) are compared, and C5.0 is found to be the most suitable for the problem at hand. The chapter also compares the influence on the accuracy of the predictive models of parameters such as the amino acid information level, the amino acid window size and the SCOP class type. The fourth chapter starts with a brief review of the literature on amino acid relative solvent accessibility, then gives an overview of the main results achieved and finally discusses possible future work. The fifth and last chapter consists of appendices. Appendix A contains the schema of the database that supported this thesis.
Appendix B contains a set of tables with additional information. Appendix C describes the software, provided on the DVD accompanying this thesis, that allows the present work to be reproduced.
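The discrete exposed/buried target defined in section 1.3 can be sketched as follows, together with the amino acid window whose size is studied in the third chapter. The following assumptions are not taken from the thesis. RSA is computed as 100 * ASA / max_ASA, and the caller supplies both the absolute accessibility (for example, as read from DSSP) and the residue's maximum accessibility. "Exposed" means RSA strictly above the threshold. Windows are padded with `None` beyond the chain ends.

```python
# Sketch of the discrete exposed/buried labelling and residue windows.
# Assumptions (not from the thesis): RSA = 100 * ASA / max_ASA with max_ASA
# supplied by the caller; "exposed" means RSA strictly above the threshold.
from typing import List, Optional

EXPOSURE_LEVELS = (2, 10, 20, 25, 30)  # thresholds (%) listed in the abstract


def relative_accessibility(asa: float, max_asa: float) -> float:
    """Relative solvent accessibility in percent."""
    return 100.0 * asa / max_asa


def exposure_label(rsa_percent: float, threshold: float) -> str:
    """Binary class for one exposure level."""
    return "exposed" if rsa_percent > threshold else "buried"


def residue_window(sequence: str, center: int,
                   size: int) -> List[Optional[str]]:
    """Residues in a window centred on `center`; None pads past the chain ends."""
    if size % 2 == 0:
        raise ValueError("window size must be odd")
    half = size // 2
    return [sequence[i] if 0 <= i < len(sequence) else None
            for i in range(center - half, center + half + 1)]
```

A residue with 25% RSA is labelled exposed, exposed, exposed, buried, buried across the five thresholds. This is why each exposure level defines a separate classification problem.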

Assessment of climate change statistical downscaling methods: Application and comparison of two statistical methods to a single site in Lisbon

Relevance: 30.00%

Abstract:

Dissertation presented at the Faculdade de Ciências e Tecnologia of the Universidade Nova de Lisboa to obtain the Master's degree in Environmental Engineering

Landslide susceptibility assessment in Karanganyar regency - Indonesia - Comparison of knowledge-based and Data-driven Models

Relevance: 30.00%

Abstract:

Dissertation submitted in partial fulfillment of the requirements for the Degree of Master of Science in Geospatial Technologies.

Spatial Patterns and Irregularities of the electoral data: general elections in Canada

Relevance: 30.00%

Abstract:

Dissertation submitted in partial fulfillment of the requirements for the Degree of Master of Science in Geospatial Technologies.

SDAR: a package for plotting and analyzing stratigraphy data in R

Relevance: 30.00%

Abstract:

Stratigraphic Columns (SC) are the most useful and common way to represent the field descriptions (e.g., grain size, thickness of rock packages, and fossil and lithological components) of rock sequences and well logs. In these representations, the width of the SC varies according to grain size (i.e., the wider the strata, the coarser the rocks (Miall 1990; Tucker 2011)), and the thickness of each layer is represented on the vertical axis of the diagram. Typically, these representations are drawn 'manually' using vector graphics editors (e.g., Adobe Illustrator, CorelDRAW, Inkscape). Various software packages now plot SCs automatically, but there are no versatile open-source tools, and it is very difficult both to store and to analyse stratigraphic information. This document presents Stratigraphic Data Analysis in R (SDAR), an analytical package designed both for plotting and for facilitating the analysis of stratigraphic data in R (R Core Team 2014). SDAR uses simple stratigraphic data and takes advantage of the flexible plotting tools available in R to produce detailed SCs. The main benefits of SDAR are as follows. (i) It generates accurate and complete SC plots including multiple features (e.g., sedimentary structures, samples, fossil content, color, structural data, contacts between beds). (ii) It was developed in a free software environment for statistical computing and graphics. (iii) It runs on a wide variety of platforms (i.e., UNIX, Windows, and MacOS). (iv) Both its plotting and analysing functions can be executed directly on R's command-line interface (CLI), so users can integrate SDAR's functions with the many other add-on packages available for R from the Comprehensive R Archive Network (CRAN).
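The basic geometry of an SC described above can be sketched as below: bed thickness sets the vertical extent, and width grows with grain size. This is not SDAR's API, since SDAR is an R package. The `Bed` type, the width scale (rank in a Wentworth-style grain-size list) and the bottom-to-top ordering of beds are hypothetical choices made for illustration.

```python
# Sketch of the geometry behind a stratigraphic column: bed thickness on the
# vertical axis, bar width increasing with grain size. This is NOT SDAR's API
# (SDAR is an R package); names and the width scale are hypothetical.
from dataclasses import dataclass
from typing import List, Tuple

# Wentworth-style grain-size classes, finest to coarsest.
GRAIN_SIZES = ("clay", "silt", "very fine sand", "fine sand", "medium sand",
               "coarse sand", "very coarse sand", "granule", "pebble",
               "cobble", "boulder")


@dataclass
class Bed:
    thickness: float  # e.g. metres
    grain_size: str   # one of GRAIN_SIZES


def column_rectangles(beds: List[Bed]) -> List[Tuple[float, float, int]]:
    """(base, top, width) for each bed, with beds listed from the bottom up."""
    rects = []
    base = 0.0
    for bed in beds:
        width = GRAIN_SIZES.index(bed.grain_size) + 1  # coarser -> wider
        top = base + bed.thickness
        rects.append((base, top, width))
        base = top
    return rects
```

Each `(base, top, width)` tuple corresponds to one rectangle, which any plotting library can draw. For instance, matplotlib's `Axes.barh(y=base, width=width, height=top - base, align="edge")` would place each bed at its stratigraphic position.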

The role of qualitative data and systems thinking in addressing service decline in market towns

Relevance: 30.00%

Abstract:

Retail services are a major contributor to municipal budgets and an activity that affects perceived quality of life, especially for people with mobility difficulties (e.g. the elderly, low-income citizens). However, there is evidence of a decline in some of the services that market towns provide to their citizens. In market towns, this decline has been reported across the western world, from North America to Australia. The aim of this research was to understand retail decline and to shed light on some ways of addressing it, using a case study of Thornbury, a small town in the Southwest of England. Data were collected through two participatory approaches: photo-surveys and multicriteria mapping. The data were interpreted both by using participants as analysts and by using systems thinking (systems diagramming and social trap theory) for theory building. This research moves away from mainstream economic and town planning perspectives by drawing on methods and concepts from several fields. These include anthropology and visual sociology (photo-surveys), and decision-making and ecological economics (multicriteria mapping and social trap theory). In sum, this research has experimented with different methods, applied outside their usual context, to analyse retail decline in a small town. This research developed a conceptual model for retail decline. It identified conflicting goals and interests, their implications for retail decline, and their causes. Most of the potential causes have received little attention in the literature. This research also found that some of the measures commonly used for dealing with retail decline may themselves be contributing to its causes. Additionally, this research reviewed some of the measures that can be used to deal with retail decline and their implications for policy-making. It also reflected on the use of the data collection and analysis methods in the context of small to medium towns.

How can clusters help in the product type choice? The case of a specialized retail store

Relevance: 30.00%

Abstract:

This work project (WP) is a study of a clustering strategy for Sport Zone. The general objective of a cluster study is to create groups in which individuals are similar to each other but different from those in other groups. Cluster creation is a mix of common sense, trial and error, and some supporting statistical techniques. Our particular objective is to help category managers better define the product type to be displayed on store shelves by building store clusters. This research was carried out for Sport Zone. It comprises an objective definition, a literature review, the clustering activity itself, and some factor analysis and discriminant analysis to better frame our work. Alongside this quantitative part, a survey of category managers was carried out to better understand their key drivers in choosing the type of product for each store. Based on a non-random sample of 65 stores with data referring to 2013, the final result was the choice of 6 store clusters (Figure 1). Each cluster was individually characterized, and these characterizations are the main outcome of this work. All of the selected variables were important for distinguishing between clusters, which supports the adequacy of their choice. The interpretation of the results gives category managers a tool to understand which products best fit the clustered stores. Furthermore, as a side result of the clustering, an STP (Segmentation, Targeting and Positioning) process was initiated, with this WP as the first step of a continuous process.
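A cluster analysis of the kind described above can be sketched with k-means on standardized store variables. The abstract does not name the clustering algorithm, so k-means is swapped in here purely for illustration. The naive initialization (first k stores) and the two-variable data used in the test are hypothetical. The factor and discriminant analyses mentioned in the study are not shown.

```python
# Sketch of store clustering with k-means on standardized variables.
# Assumptions: k-means is swapped in for illustration (the abstract does not
# name the algorithm) and initialization is naive (first k rows).
import math
from typing import List, Sequence, Tuple


def standardize(rows: Sequence[Sequence[float]]) -> List[List[float]]:
    """Z-score each column so variables on different scales weigh equally."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    # A constant column would have sd 0; fall back to 1 to avoid division by zero.
    sds = [math.sqrt(sum((x - m) ** 2 for x in c) / len(c)) or 1.0
           for c, m in zip(cols, means)]
    return [[(x - m) / s for x, m, s in zip(row, means, sds)] for row in rows]


def kmeans(points: Sequence[Sequence[float]], k: int,
           max_iter: int = 100) -> Tuple[List[int], List[List[float]]]:
    """Lloyd's algorithm: alternate nearest-centroid assignment and centroid update."""
    centroids = [list(p) for p in points[:k]]
    labels: List[int] = []
    for _ in range(max_iter):
        new_labels = [min(range(k), key=lambda j: math.dist(p, centroids[j]))
                      for p in points]
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old centroid if a cluster empties
                centroids[j] = [sum(c) / len(members) for c in zip(*members)]
    return labels, centroids
```

Standardizing first matters because store variables (hypothetically, floor area and revenue) can be on very different scales. In the study, the number of clusters (6) was reached through a mix of judgement and statistical support. In this sketch, `k` is simply passed in.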

Service users' complaints: a study of results and corrective measures in the context of the Health Centre Groupings of Lisboa e Vale do Tejo

Relevance: 30.00%

Abstract:

The quality of health care has evolved over time and is now considered a right and a fundamental pillar of health services. Service users' complaints can reveal information about users' experiences with health organizations. Complaints can therefore be regarded as quality indicators that make it possible to identify areas and/or opportunities for improvement. They are also highly representative in the process of continuous quality improvement in health care. It is essential to give a voice to users of the SNS (the Portuguese National Health Service) and to enable their active participation in improving the delivery of health care. This work therefore set out to study how service users' complaints contribute to improving the quality of health services in the ACES (Agrupamentos de Centros de Saúde, health centre groupings) of the Lisboa e Vale do Tejo Health Region. The main causes of complaint, the corresponding corrective measures, and the needs and/or difficulties in implementing them were identified and analysed. The study also covered the evaluation of the results obtained and the recommendations of the Coordinators of the Users' Offices (Gabinetes do Utente) in the ACES of the LVT Health Region. A literature review was carried out, and the disaggregated data on the causes most often cited in complaints within the scope of the study were consulted. Informal contacts were also made with the regional and national structure of the SIM-Cidadão system. Fifteen questionnaires were administered to the Local Coordinators of the Citizens' Offices (Gabinetes do Cidadão) of the ACES of the ARSLVT; the research is exploratory and qualitative in nature. The questionnaires were sent and returned anonymously through the Survey Monkey platform for statistical studies. They were analysed and interpreted by organizing the data systematically and categorizing the information so that it could be analysed.
The results showed that the service users' complaints submitted to the Citizens' Offices contributed, to some extent, to the process of quality improvement in the ACES of the Lisboa e Vale do Tejo Health Region. They did so through the adoption of corrective measures and actions, and some limitations were overcome through the creation of local strategies. However, it was evident that some limitations could not be overcome, because they involve decisions outside the remit of the ACES. The results achieved and the Coordinators' recommendations may reflect some organizational changes, but they convey the idea that there is still a long way to go.
