136 resultados para Distributed database
Resumo:
In real world applications sequential algorithms of data mining and data exploration are often unsuitable for datasets with enormous size, high-dimensionality and complex data structure. Grid computing promises unprecedented opportunities for unlimited computing and storage resources. In this context there is the necessity to develop high performance distributed data mining algorithms. However, the computational complexity of the problem and the large amount of data to be explored often make the design of large scale applications particularly challenging. In this paper we present the first distributed formulation of a frequent subgraph mining algorithm for discriminative fragments of molecular compounds. Two distributed approaches have been developed and compared on the well known National Cancer Institute’s HIV-screening dataset. We present experimental results on a small-scale computing environment.
Resumo:
A mapping between chains in the Protein Databank and Enzyme Classification numbers is invaluable for research into structure-function relationships. Mapping at the chain level is a non-trivial problem and we present an automatically updated Web-server, which provides this link in a queryable form and as a downloadable XML or flat file.
Resumo:
The current version of this database on CD-ROM contains information on 14 127 cocoa (Theobroma cacao) clones and their 14 112 synonyms, the origin and history of the clones and the clone names, and accession lists for 48 of the major cocoa gene banks including quarantine stations. Also included are morphological data for leaves, fruits and seeds, disease reactions, quality and agronomic characters, and reference information on common abbreviations and acronyms, cocoa gene bank addresses and a full bibliography (with hyperlinked reference to data). New additions are 748 photographs and drawings of 428 individual clones in 11 different locations. Also included are 376 profiles for 15 simple sequence repeat primer pairs on 331 clones held in the University of Reading Intermediate Cocoa Quarantine Facility. Minimum system requirements are Windows 95 or later, a Pentium 166 with 32 MB RAM, CD-ROM drive and a minimum 20 MB hard disk space. A user guide is included in the package.
Resumo:
Dysregulation of lipid and glucose metabolism in the postprandial state are recognised as important risk factors for the development of cardiovascular disease and type 2 diabetes. Our objective was to create a comprehensive, standardised database of postprandial studies to provide insights into the physiological factors that influence postprandial lipid and glucose responses. Data were collated from subjects (n = 467) taking part in single and sequential meal postprandial studies conducted by researchers at the University of Reading, to form the DISRUPT (DIetary Studies: Reading Unilever Postprandial Trials) database. Subject attributes including age, gender, genotype, menopausal status, body mass index, blood pressure and a fasting biochemical profile, together with postprandial measurements of triacylglycerol (TAG), non-esterified fatty acids, glucose, insulin and TAG-rich lipoprotein composition are recorded. A particular strength of the studies is the frequency of blood sampling, with on average 10-13 blood samples taken during each postprandial assessment, and the fact that identical test meal protocols were used in a number of studies, allowing pooling of data to increase statistical power. The DISRUPT database is the most comprehensive postprandial metabolism database that exists worldwide and preliminary analysis of the pooled sequential meal postprandial dataset has revealed both confirmatory and novel observations with respect to the impact of gender and age on the postprandial TAG response. Further analysis of the dataset using conventional statistical techniques along with integrated mathematical models and clustering analysis will provide a unique opportunity to greatly expand current knowledge of the aetiology of inter-individual variability in postprandial lipid and glucose responses.
Resumo:
Distributed computing paradigms for sharing resources such as Clouds, Grids, Peer-to-Peer systems, or voluntary computing are becoming increasingly popular. While there are some success stories such as PlanetLab, OneLab, BOINC, BitTorrent, and SETI@home, a widespread use of these technologies for business applications has not yet been achieved. In a business environment, mechanisms are needed to provide incentives to potential users for participating in such networks. These mechanisms may range from simple non-monetary access rights, monetary payments to specific policies for sharing. Although a few models for a framework have been discussed (in the general area of a "Grid Economy"), none of these models has yet been realised in practice. This book attempts to fill this gap by discussing the reasons for such limited take-up and exploring incentive mechanisms for resource sharing in distributed systems. The purpose of this book is to identify research challenges in successfully using and deploying resource sharing strategies in open-source and commercial distributed systems.
Resumo:
The Java language first came to public attention in 1995. Within a year, it was being speculated that Java may be a good language for parallel and distributed computing. Its core features, including being objected oriented and platform independence, as well as having built-in network support and threads, has encouraged this view. Today, Java is being used in almost every type of computer-based system, ranging from sensor networks to high performance computing platforms, and from enterprise applications through to complex research-based.simulations. In this paper the key features that make Java a good language for parallel and distributed computing are first discussed. Two Java-based middleware systems, namely MPJ Express, an MPI-like Java messaging system, and Tycho, a wide-area asynchronous messaging framework with an integrated virtual registry are then discussed. The paper concludes by highlighting the advantages of using Java as middleware to support distributed applications.
Resumo:
In order to organize distributed educational resources efficiently, to provide active learners an integrated, extendible and cohesive interface to share the dynamically growing multimedia learning materials on the Internet, this paper proposes a generic resource organization model with semantic structures to improve expressiveness, scalability and cohesiveness. We developed an active learning system with semantic support for learners to access and navigate through efficient and flexible manner. We learning resources in an efficient and flexible manner. We provide facilities for instructors to manipulate the structured educational resources via a convenient visual interface. We also developed a resource discovering and gathering engine based on complex semantic associations for several specific topics.
Resumo:
Resource monitoring in distributed systems is required to understand the 'health' of the overall system and to help identify particular problems, such as dysfunctional hardware, a faulty, system or application software. Desirable characteristics for monitoring systems are the ability to connect to any number of different types of monitoring agents and to provide different views of the system, based on a client's particular preferences. This paper outlines and discusses the ongoing activities within the GridRM wide-area resource-monitoring project.
Resumo:
This paper presents a parallel Linear Hashtable Motion Estimation Algorithm (LHMEA). Most parallel video compression algorithms focus on Group of Picture (GOP). Based on LHMEA we proposed earlier [1][2], we developed a parallel motion estimation algorithm focus inside of frame. We divide each reference frames into equally sized regions. These regions are going to be processed in parallel to increase the encoding speed significantly. The theory and practice speed up of parallel LHMEA according to the number of PCs in the cluster are compared and discussed. Motion Vectors (MV) are generated from the first-pass LHMEA and used as predictors for second-pass Hexagonal Search (HEXBS) motion estimation, which only searches a small number of Macroblocks (MBs). We evaluated distributed parallel implementation of LHMEA of TPA for real time video compression.
Resumo:
This paper analyzes the performance of Enhanced relay-enabled Distributed Coordination Function (ErDCF) for wireless ad hoc networks under transmission errors. The idea of ErDCF is to use high data rate nodes to work as relays for the low data rate nodes. ErDCF achieves higher throughput and reduces energy consumption compared to IEEE 802.11 Distributed Coordination Function (DCF) in an ideal channel environment. However, there is a possibility that this expected gain may decrease in the presence of transmission errors. In this work, we modify the saturation throughput model of ErDCF to accurately reflect the impact of transmission errors under different rate combinations. It turns out that the throughput gain of ErDCF can still be maintained under reasonable link quality and distance.