39 results for Databases - Duplicate tuples
Abstract:
Dissertation submitted for the degree of Master in Informatics Engineering
Abstract:
Dissertation submitted for the degree of Master in Informatics Engineering
Abstract:
Dissertation submitted for the degree of Master in Informatics Engineering
Abstract:
Current computer systems have evolved from featuring only a single processing unit and limited RAM, in the order of kilobytes or a few megabytes, to including several multicore processors, offering in the order of several tens of concurrent execution contexts, and main memory in the order of several tens to hundreds of gigabytes. This makes it possible to keep all the data of many applications in main memory, leading to the development of in-memory databases. Compared to disk-backed databases, in-memory databases (IMDBs) are expected to provide better performance by incurring less I/O overhead. In this dissertation, we present a scalability study of two general-purpose IMDBs on multicore systems. The results show that current general-purpose IMDBs do not scale on multicores, due to contention among threads running concurrent transactions. In this work, we explore different directions to overcome the scalability issues of IMDBs on multicores, while enforcing strong isolation semantics. First, we present a solution, called MacroDB, that requires no modification to either the database systems or the applications. MacroDB replicates the database among several engines, using a master-slave replication scheme, where update transactions execute on the master, while read-only transactions execute on slaves. This reduces contention, allowing MacroDB to offer scalable performance under read-only workloads, while update-intensive workloads suffer a performance loss when compared to the standalone engine. Second, we delve into the database engine and identify the concurrency control mechanism used by the storage sub-component as a scalability bottleneck. We then propose a new locking scheme that allows the removal of such mechanisms from the storage sub-component. This modification offers a performance improvement under all workloads, when compared to the standalone engine, while scalability is limited to read-only workloads.
Next, we address the scalability limitations for update-intensive workloads and propose reducing the locking granularity from the table level to the attribute level. This further improved performance for intensive and moderate update workloads, at a slight cost for read-only workloads. Scalability is limited to read-intensive and read-only workloads. Finally, we investigate the impact applications have on the performance of database systems, by studying how operation order inside transactions influences database performance. We then propose a Read before Write (RbW) interaction pattern, under which transactions perform all read operations before executing write operations. The RbW pattern allowed TPC-C to achieve scalable performance on our modified engine for all workloads. Additionally, the RbW pattern allowed our modified engine to achieve scalable performance on multicores, almost up to the total number of cores, while enforcing strong isolation.
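As a rough illustration of the Read before Write (RbW) ordering described in this abstract, the sketch below buffers writes inside a transaction and applies them only at commit, so every read reaches the store before any write does. The abstract does not say how the pattern is implemented; the `RbWTransaction` class, its methods, and the dictionary-backed store are hypothetical names chosen for this example, not the dissertation's code.

```python
class RbWTransaction:
    """Hypothetical helper: reads hit the store at once, writes are deferred."""

    def __init__(self, store):
        self.store = store          # assumed key-value store (a plain dict here)
        self.pending_writes = {}    # writes buffered until commit
        self.log = []               # order in which operations reach the store

    def read(self, key):
        # Serve the transaction's own buffered write without touching the store.
        if key in self.pending_writes:
            return self.pending_writes[key]
        self.log.append(("read", key))
        return self.store.get(key)

    def write(self, key, value):
        # Only buffer the write; nothing reaches the store yet.
        self.pending_writes[key] = value

    def commit(self):
        # All reads have already run, so the writes now execute as one batch.
        for key, value in self.pending_writes.items():
            self.log.append(("write", key))
            self.store[key] = value
        self.pending_writes.clear()


store = {"stock": 10, "price": 5}
tx = RbWTransaction(store)
qty = tx.read("stock")
tx.write("stock", qty - 1)        # interleaved in the application code ...
price = tx.read("price")
tx.write("last_price", price)
tx.commit()                       # ... but executed after every read
```

Here the application still interleaves reads and writes, yet the store sees both reads before either write. Whether the dissertation reorders operations in the application or in the engine is not stated in the abstract.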
Abstract:
This Thesis describes the application of automatic learning methods for a) the classification of organic and metabolic reactions, and b) the mapping of Potential Energy Surfaces (PES). The classification of reactions was approached with two distinct methodologies: a representation of chemical reactions based on NMR data, and a representation of chemical reactions from the reaction equation based on the physico-chemical and topological features of chemical bonds. NMR-based classification of photochemical and enzymatic reactions. Photochemical and metabolic reactions were classified by Kohonen Self-Organizing Maps (Kohonen SOMs) and Random Forests (RFs) taking as input the difference between the 1H NMR spectra of the products and the reactants. Such a representation can be applied in the automatic analysis of changes in the 1H NMR spectrum of a mixture and their interpretation in terms of the chemical reactions taking place. Examples of possible applications are the monitoring of reaction processes, evaluation of the stability of chemicals, or even the interpretation of metabonomic data. A Kohonen SOM trained with a data set of metabolic reactions catalysed by transferases was able to correctly classify 75% of an independent test set in terms of the EC number subclass. Random Forests improved the correct predictions to 79%. With photochemical reactions classified into 7 groups, an independent test set was classified with 86-93% accuracy. The data set of photochemical reactions was also used to simulate mixtures with two reactions occurring simultaneously. Kohonen SOMs and Feed-Forward Neural Networks (FFNNs) were trained to classify the reactions occurring in a mixture based on the 1H NMR spectra of the products and reactants. Kohonen SOMs allowed the correct assignment of 53-63% of the mixtures (in a test set). Counter-Propagation Neural Networks (CPNNs) gave similar results.
The use of supervised learning techniques improved the results to 77% of correct assignments when an ensemble of ten FFNNs was used and to 80% when Random Forests were used. This study was performed with NMR data simulated from the molecular structure by the SPINUS program. In the design of one test set, simulated data was combined with experimental data. The results support the proposal of linking databases of chemical reactions to experimental or simulated NMR data for automatic classification of reactions and mixtures of reactions. Genome-scale classification of enzymatic reactions from their reaction equation. The MOLMAP descriptor relies on a Kohonen SOM that defines types of bonds on the basis of their physico-chemical and topological properties. The MOLMAP descriptor of a molecule represents the types of bonds available in that molecule. The MOLMAP descriptor of a reaction is defined as the difference between the MOLMAPs of the products and the reactants, and numerically encodes the pattern of bonds that are broken, changed, and made during a chemical reaction. The automatic perception of chemical similarities between metabolic reactions is required for a variety of applications, ranging from the computer validation of classification systems and the genome-scale reconstruction (or comparison) of metabolic pathways to the classification of enzymatic mechanisms. Catalytic functions of proteins are generally described by the EC numbers that are simultaneously employed as identifiers of reactions, enzymes, and enzyme genes, thus linking metabolic and genomic information. Different methods should be available to automatically compare metabolic reactions and to automatically assign EC numbers to reactions not yet officially classified.
In this study, the genome-scale data set of enzymatic reactions available in the KEGG database was encoded by the MOLMAP descriptors, and was submitted to Kohonen SOMs to compare the resulting map with the official EC number classification, to explore the possibility of predicting EC numbers from the reaction equation, and to assess the internal consistency of the EC classification at the class level. A general agreement with the EC classification was observed, i.e. a relationship between the similarity of MOLMAPs and the similarity of EC numbers. At the same time, MOLMAPs were able to discriminate between EC sub-subclasses. EC numbers could be assigned at the class, subclass, and sub-subclass levels with accuracies up to 92%, 80%, and 70% for independent test sets. The correspondence between chemical similarity of metabolic reactions and their MOLMAP descriptors was applied to the identification of a number of reactions mapped into the same neuron but belonging to different EC classes, which demonstrated the ability of the MOLMAP/SOM approach to verify the internal consistency of classifications in databases of metabolic reactions. RFs were also used to assign the four levels of the EC hierarchy from the reaction equation. EC numbers were correctly assigned in 95%, 90%, 85% and 86% of the cases (for independent test sets) at the class, subclass, sub-subclass and full EC number level, respectively. Experiments for the classification of reactions from the main reactants and products were performed with RFs; EC numbers were assigned at the class, subclass and sub-subclass level with accuracies of 78%, 74% and 63%, respectively. In the course of the experiments with metabolic reactions we suggested that the MOLMAP/SOM concept could be extended to the representation of other levels of metabolic information such as metabolic pathways.
Following the MOLMAP idea, the pattern of neurons activated by the reactions of a metabolic pathway is a representation of the reactions involved in that pathway, that is, a descriptor of the metabolic pathway. This reasoning enabled the comparison of different pathways, the automatic classification of pathways, and a classification of organisms based on their biochemical machinery. The three levels of classification (from bonds to metabolic pathways) made it possible to map and perceive chemical similarities between metabolic pathways, even for pathways of different types of metabolism and pathways that do not share similarities in terms of EC numbers. Mapping of PES by neural networks (NNs). In a first series of experiments, ensembles of Feed-Forward NNs (EnsFFNNs) and Associative Neural Networks (ASNNs) were trained to reproduce PES represented by the Lennard-Jones (LJ) analytical potential function. The accuracy of the method was assessed by comparing the results of molecular dynamics simulations (thermal, structural, and dynamic properties) obtained from the NNs-PES and from the LJ function. The results indicated that for LJ-type potentials, NNs can be trained to generate accurate PES to be used in molecular simulations. EnsFFNNs and ASNNs gave better results than single FFNNs. The NN models showed a remarkable ability to interpolate between distant curves and to accurately reproduce potentials to be used in molecular simulations. The purpose of the first study was to systematically analyse the accuracy of different NNs. Our main motivation, however, is reflected in the next study: the mapping of multidimensional PES by NNs to simulate, by Molecular Dynamics or Monte Carlo, the adsorption and self-assembly of solvated organic molecules on noble-metal electrodes. Indeed, for such complex and heterogeneous systems the development of suitable analytical functions that fit quantum mechanical interaction energies is a non-trivial or even impossible task.
The data consisted of energy values, from Density Functional Theory (DFT) calculations, at different distances, for several molecular orientations and three electrode adsorption sites. The results indicate that NNs require a data set large enough to cover well the diversity of possible interaction sites, distances, and orientations. NNs trained with such data sets can perform equally well or even better than analytical functions. Therefore, they can be used in molecular simulations, particularly for the ethanol/Au (111) interface which is the case studied in the present Thesis. Once properly trained, the networks are able to produce, as output, any required number of energy points for accurate interpolations.
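The reaction descriptor defined in this abstract (the MOLMAP of the products minus the MOLMAP of the reactants) can be sketched in a few lines. This is a simplified illustration under stated assumptions, not the thesis implementation: each bond is assumed to have already been mapped to a SOM neuron index, a molecule's MOLMAP is taken to be a plain count of bonds per neuron, and the function names `molmap` and `reaction_molmap` are hypothetical.

```python
from collections import Counter


def molmap(bond_neurons, n_neurons):
    """Count how many bonds of a molecule fall on each neuron (assumed encoding)."""
    counts = Counter(bond_neurons)
    return [counts.get(i, 0) for i in range(n_neurons)]


def reaction_molmap(reactants, products, n_neurons):
    """Difference descriptor: sum over products minus sum over reactants."""
    total = [0] * n_neurons
    for molecule in products:
        for i, value in enumerate(molmap(molecule, n_neurons)):
            total[i] += value
    for molecule in reactants:
        for i, value in enumerate(molmap(molecule, n_neurons)):
            total[i] -= value
    return total


# Toy data: each molecule is a list of neuron indices, one per bond.
reactants = [[0, 0, 1], [2]]
products = [[0, 1, 3], [2]]
descriptor = reaction_molmap(reactants, products, n_neurons=4)
# Negative entries mark bond types that were lost, positive ones bond types formed.
```

In this toy case one bond of type 0 disappears and one of type 3 appears, so the descriptor is nonzero only at those two positions, while unchanged bonds cancel out.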
Abstract:
Master's dissertation in Informatics Engineering
Abstract:
Dissertation presented at the Faculdade de Ciências e Tecnologia da Universidade Nova de Lisboa for the degree of Master in Electrical and Computer Engineering
Abstract:
Dissertation presented as a partial requirement for the degree of Master in Geographic Information Science and Systems
Abstract:
Dissertation presented at the Faculdade de Ciências e Tecnologia da Universidade Nova de Lisboa for the degree of Master in Informatics Engineering
Abstract:
Software transactional memory is a promising programming model that adapts many concepts borrowed from the database world to control concurrent accesses to main memory (RAM) locations. This paper discusses how to support apparently irreversible operations, such as memory allocation and deallocation, within software libraries that will be used in software transactional memory contexts, and proposes a generic and elegant approach based on a handler system, which provides the means to create and execute compensation actions at key moments during the lifetime of a transaction.
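A minimal sketch of the handler idea described in this abstract, assuming a transaction object that lets library code register actions to run on commit or on abort. The `Transaction` and `ToyHeap` classes and the `tx_alloc` and `tx_free` helpers are hypothetical names for this illustration; the paper's actual handler interface is not given in the abstract.

```python
class Transaction:
    """Hypothetical transaction exposing commit and abort handlers."""

    def __init__(self):
        self._on_commit = []
        self._on_abort = []

    def on_commit(self, action):
        self._on_commit.append(action)

    def on_abort(self, action):
        self._on_abort.append(action)

    def commit(self):
        for action in self._on_commit:
            action()

    def abort(self):
        # Run compensation actions in reverse order of registration.
        for action in reversed(self._on_abort):
            action()


class ToyHeap:
    """Stand-in allocator that only tracks which block ids are live."""

    def __init__(self):
        self.live = set()
        self._next_id = 0

    def alloc(self):
        block = self._next_id
        self._next_id += 1
        self.live.add(block)
        return block

    def free(self, block):
        self.live.discard(block)


def tx_alloc(tx, heap):
    block = heap.alloc()
    tx.on_abort(lambda: heap.free(block))    # compensation: undo the allocation
    return block


def tx_free(tx, heap, block):
    tx.on_commit(lambda: heap.free(block))   # defer the irreversible free to commit


heap = ToyHeap()
aborted = Transaction()
leaked_candidate = tx_alloc(aborted, heap)
aborted.abort()                              # the block is released by the handler
```

Allocation is made reversible by registering a compensating free on abort, while deallocation is made safe by postponing it until the transaction is known to commit.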
Abstract:
In Portugal, especially from the 1970s onwards, women's studies influenced the emergence of the concept of gender and the feminist critique of the prevailing models of differences between the sexes. Until then, women had been absent from scientific research both as subject and as object. Feminism brought more reflexivity to scientific thinking. After the 25th of April 1974, because of the consequent political openness, several innovative research themes emerged, together with new concepts and fields of study. However, as far as the relationship between gender and science is concerned, such studies concentrate especially on higher education institutions. Feminist thinking seems to have two main objectives: to give women visibility, on the one hand, and to denounce men's dominance in the several fields of knowledge, on the other. In 1977, the “Feminine Commission” was created, and since then it has been publishing studies on women's condition and contributing to deeper reflection on the female condition at all levels. In the 1980s, the growing feminisation of tertiary education (both of students and academics) favoured the development of women's studies, especially on their condition within universities, with a special focus on the glass ceiling, despite the lack of statistical data by gender, which makes it difficult to analyse women's integration in several sectors, namely in educational and scientific research activities. Other unifying themes are family, social and legal condition, work, education, and women's intervention in political and social movements. In the 1990s, Women's Studies were institutionalised in the academic context with the creation of the first Master's in Women's Studies at the Universidade Aberta (Open University), in Lisbon. In 1999, the first Portuguese journal of women's studies was created: “Faces de Eva”. Seminars, conferences, theses, journals, and projects on women's studies are more and more common.
However, results and publications are not disseminated as widely as they should be, because of the lack of comprehensive and coordinated databases.
2. Analysis by topics
2.1. Horizontal and vertical segregation
Research questions
This is one of the main areas of research in Portugal. Essentially, two issues have been considered:
- The analysis of vertical gender segregation in educational and professional fields, which affects women's professional career progression, with special attention to men's power in positions of control and to the glass ceiling.
- The analysis of horizontal segregation, especially in higher education (teaching and research), where women have less visibility than men, and the under-representation of women in technology and technological careers.
Research in this area mainly focuses on description, showing the under-representation of women in certain scientific areas and senior positions. Nevertheless, the studies that analyse horizontal segregation in the field of education adopt a more analytical approach, which focuses on the mechanisms that reproduce gender stereotypes, especially socialisation, and influence educational and career choices.
Abstract:
Work presented within the Master's in Informatics Engineering, as a partial requirement for the degree of Master in Informatics Engineering
Abstract:
ABSTRACT - Climate change has altered the worldwide incidence and distribution of zoonoses by modifying the epidemiological profile of their vectors. Visceral leishmaniasis is re-emerging in the Mediterranean basin, and its real impact is underestimated. In Portugal, it is endemic in three regions, has been a notifiable disease since 1948, and its reservoir is the dog. The increasing incidence of the disease in dogs and the scarcity of epidemiological information made it relevant to investigate the national situation. Using the notifications database and the hospital homogeneous diagnostic groups database, all cases were identified, and the clinical records of all patients with hospital admissions in mainland Portugal between 1999 and 2009 were reviewed. There were 730 admissions for 375 individuals, most of them men, Caucasian, with a mean age of 27 years, and resident in Lisboa e Vale do Tejo. The patients' symptoms and comorbidities are in line with what is described internationally. The disease was under-notified, with an average delay of 19 days. Lethality was 5%. The mean incidence rate in mainland Portugal was 0.294 per 100,000 inhabitants, with no seasonal pattern. The Bortman endemic corridor that was built showed peaks with amplitudes of 2-3 years. Mapping the patients revealed cases in non-endemic regions, following the distribution of canine leishmaniasis. Future research could usefully build a mathematical model to confirm the trend of the endemic corridor (peak in 2011?) in order to trigger an alert system in the health services. It would also be useful to evaluate the geoclimatic conditions of the localities with cases, to highlight possible similarities across the territory.
Abstract:
Management Information Systems 2000, p. 103-111
Abstract:
Dissertation submitted in partial fulfilment of the requirements for the Degree of Master of Science in Geospatial Technologies.