950 resultados para Computational grids (Computer systems)
Resumo:
BACKGROUND: Molecular interaction Information is a key resource in modern biomedical research. Publicly available data have previously been provided in a broad array of diverse formats, making access to this very difficult. The publication and wide implementation of the Human Proteome Organisation Proteomics Standards Initiative Molecular Interactions (HUPO PSI-MI) format in 2004 was a major step towards the establishment of a single, unified format by which molecular interactions should be presented, but focused purely on protein-protein interactions. RESULTS: The HUPO-PSI has further developed the PSI-MI XML schema to enable the description of interactions between a wider range of molecular types, for example nucleic acids, chemical entities, and molecular complexes. Extensive details about each supported molecular interaction can now be captured, including the biological role of each molecule within that interaction, detailed description of interacting domains, and the kinetic parameters of the interaction. The format is supported by data management and analysis tools and has been adopted by major interaction data providers. Additionally, a simpler, tab-delimited format MITAB2.5 has been developed for the benefit of users who require only minimal information in an easy to access configuration. CONCLUSION: The PSI-MI XML2.5 and MITAB2.5 formats have been jointly developed by interaction data producers and providers from both the academic and commercial sector, and are already widely implemented and well supported by an active development community. PSI-MI XML2.5 enables the description of highly detailed molecular interaction data and facilitates data exchange between databases and users without loss of information. MITAB2.5 is a simpler format appropriate for fast Perl parsing or loading into Microsoft Excel.
Resumo:
Please consult the paper edition of this thesis to read. It is available on the 5th Floor of the Library at Call Number: Z 9999 E38 D56 1992
Resumo:
Modern computer systems are plagued with stability and security problems: applications lose data, web servers are hacked, and systems crash under heavy load. Many of these problems or anomalies arise from rare program behavior caused by attacks or errors. A substantial percentage of the web-based attacks are due to buffer overflows. Many methods have been devised to detect and prevent anomalous situations that arise from buffer overflows. The current state-of-art of anomaly detection systems is relatively primitive and mainly depend on static code checking to take care of buffer overflow attacks. For protection, Stack Guards and I-leap Guards are also used in wide varieties.This dissertation proposes an anomaly detection system, based on frequencies of system calls in the system call trace. System call traces represented as frequency sequences are profiled using sequence sets. A sequence set is identified by the starting sequence and frequencies of specific system calls. The deviations of the current input sequence from the corresponding normal profile in the frequency pattern of system calls is computed and expressed as an anomaly score. A simple Bayesian model is used for an accurate detection.Experimental results are reported which show that frequency of system calls represented using sequence sets, captures the normal behavior of programs under normal conditions of usage. This captured behavior allows the system to detect anomalies with a low rate of false positives. Data are presented which show that Bayesian Network on frequency variations responds effectively to induced buffer overflows. It can also help administrators to detect deviations in program flow introduced due to errors.
Resumo:
The memory hierarchy is the main bottleneck in modern computer systems as the gap between the speed of the processor and the memory continues to grow larger. The situation in embedded systems is even worse. The memory hierarchy consumes a large amount of chip area and energy, which are precious resources in embedded systems. Moreover, embedded systems have multiple design objectives such as performance, energy consumption, and area, etc. Customizing the memory hierarchy for specific applications is a very important way to take full advantage of limited resources to maximize the performance. However, the traditional custom memory hierarchy design methodologies are phase-ordered. They separate the application optimization from the memory hierarchy architecture design, which tend to result in local-optimal solutions. In traditional Hardware-Software co-design methodologies, much of the work has focused on utilizing reconfigurable logic to partition the computation. However, utilizing reconfigurable logic to perform the memory hierarchy design is seldom addressed. In this paper, we propose a new framework for designing memory hierarchy for embedded systems. The framework will take advantage of the flexible reconfigurable logic to customize the memory hierarchy for specific applications. It combines the application optimization and memory hierarchy design together to obtain a global-optimal solution. Using the framework, we performed a case study to design a new software-controlled instruction memory that showed promising potential.
Resumo:
Compute grids are used widely in many areas of environmental science, but there has been limited uptake of grid computing by the climate modelling community, partly because the characteristics of many climate models make them difficult to use with popular grid middleware systems. In particular, climate models usually produce large volumes of output data, and running them usually involves complicated workflows implemented as shell scripts. For example, NEMO (Smith et al. 2008) is a state-of-the-art ocean model that is used currently for operational ocean forecasting in France, and will soon be used in the UK for both ocean forecasting and climate modelling. On a typical modern cluster, a particular one year global ocean simulation at 1-degree resolution takes about three hours when running on 40 processors, and produces roughly 20 GB of output as 50000 separate files. 50-year simulations are common, during which the model is resubmitted as a new job after each year. Running NEMO relies on a set of complicated shell scripts and command utilities for data pre-processing and post-processing prior to job resubmission. Grid Remote Execution (G-Rex) is a pure Java grid middleware system that allows scientific applications to be deployed as Web services on remote computer systems, and then launched and controlled as if they are running on the user's own computer. Although G-Rex is general purpose middleware it has two key features that make it particularly suitable for remote execution of climate models: (1) Output from the model is transferred back to the user while the run is in progress to prevent it from accumulating on the remote system and to allow the user to monitor the model; (2) The client component is a command-line program that can easily be incorporated into existing model work-flow scripts. G-Rex has a REST (Fielding, 2000) architectural style, which allows client programs to be very simple and lightweight and allows users to interact with model runs using only a basic HTTP client (such as a Web browser or the curl utility) if they wish. This design also allows for new client interfaces to be developed in other programming languages with relatively little effort. The G-Rex server is a standard Web application that runs inside a servlet container such as Apache Tomcat and is therefore easy to install and maintain by system administrators. G-Rex is employed as the middleware for the NERC1 Cluster Grid, a small grid of HPC2 clusters belonging to collaborating NERC research institutes. Currently the NEMO (Smith et al. 2008) and POLCOMS (Holt et al, 2008) ocean models are installed, and there are plans to install the Hadley Centre’s HadCM3 model for use in the decadal climate prediction project GCEP (Haines et al., 2008). The science projects involving NEMO on the Grid have a particular focus on data assimilation (Smith et al. 2008), a technique that involves constraining model simulations with observations. The POLCOMS model will play an important part in the GCOMS project (Holt et al, 2008), which aims to simulate the world’s coastal oceans. A typical use of G-Rex by a scientist to run a climate model on the NERC Cluster Grid proceeds as follows :(1) The scientist prepares input files on his or her local machine. (2) Using information provided by the Grid’s Ganglia3 monitoring system, the scientist selects an appropriate compute resource. (3) The scientist runs the relevant workflow script on his or her local machine. This is unmodified except that calls to run the model (e.g. with “mpirun”) are simply replaced with calls to "GRexRun" (4) The G-Rex middleware automatically handles the uploading of input files to the remote resource, and the downloading of output files back to the user, including their deletion from the remote system, during the run. (5) The scientist monitors the output files, using familiar analysis and visualization tools on his or her own local machine. G-Rex is well suited to climate modelling because it addresses many of the middleware usability issues that have led to limited uptake of grid computing by climate scientists. It is a lightweight, low-impact and easy-to-install solution that is currently designed for use in relatively small grids such as the NERC Cluster Grid. A current topic of research is the use of G-Rex as an easy-to-use front-end to larger-scale Grid resources such as the UK National Grid service.
Resumo:
In molecular biology, it is often desirable to find common properties in large numbers of drug candidates. One family of methods stems from the data mining community, where algorithms to find frequent graphs have received increasing attention over the past years. However, the computational complexity of the underlying problem and the large amount of data to be explored essentially render sequential algorithms useless. In this paper, we present a distributed approach to the frequent subgraph mining problem to discover interesting patterns in molecular compounds. This problem is characterized by a highly irregular search tree, whereby no reliable workload prediction is available. We describe the three main aspects of the proposed distributed algorithm, namely, a dynamic partitioning of the search space, a distribution process based on a peer-to-peer communication framework, and a novel receiverinitiated load balancing algorithm. The effectiveness of the distributed method has been evaluated on the well-known National Cancer Institute’s HIV-screening data set, where we were able to show close-to linear speedup in a network of workstations. The proposed approach also allows for dynamic resource aggregation in a non dedicated computational environment. These features make it suitable for large-scale, multi-domain, heterogeneous environments, such as computational grids.
Resumo:
Frequent pattern discovery in structured data is receiving an increasing attention in many application areas of sciences. However, the computational complexity and the large amount of data to be explored often make the sequential algorithms unsuitable. In this context high performance distributed computing becomes a very interesting and promising approach. In this paper we present a parallel formulation of the frequent subgraph mining problem to discover interesting patterns in molecular compounds. The application is characterized by a highly irregular tree-structured computation. No estimation is available for task workloads, which show a power-law distribution in a wide range. The proposed approach allows dynamic resource aggregation and provides fault and latency tolerance. These features make the distributed application suitable for multi-domain heterogeneous environments, such as computational Grids. The distributed application has been evaluated on the well known National Cancer Institute’s HIV-screening dataset.
Resumo:
Many scientific and engineering applications involve inverting large matrices or solving systems of linear algebraic equations. Solving these problems with proven algorithms for direct methods can take very long to compute, as they depend on the size of the matrix. The computational complexity of the stochastic Monte Carlo methods depends only on the number of chains and the length of those chains. The computing power needed by inherently parallel Monte Carlo methods can be satisfied very efficiently by distributed computing technologies such as Grid computing. In this paper we show how a load balanced Monte Carlo method for computing the inverse of a dense matrix can be constructed, show how the method can be implemented on the Grid, and demonstrate how efficiently the method scales on multiple processors. (C) 2007 Elsevier B.V. All rights reserved.
Resumo:
Monitoring Earth's terrestrial water conditions is critically important to many hydrological applications such as global food production; assessing water resources sustainability; and flood, drought, and climate change prediction. These needs have motivated the development of pilot monitoring and prediction systems for terrestrial hydrologic and vegetative states, but to date only at the rather coarse spatial resolutions (∼10–100 km) over continental to global domains. Adequately addressing critical water cycle science questions and applications requires systems that are implemented globally at much higher resolutions, on the order of 1 km, resolutions referred to as hyperresolution in the context of global land surface models. This opinion paper sets forth the needs and benefits for a system that would monitor and predict the Earth's terrestrial water, energy, and biogeochemical cycles. We discuss six major challenges in developing a system: improved representation of surface‐subsurface interactions due to fine‐scale topography and vegetation; improved representation of land‐atmospheric interactions and resulting spatial information on soil moisture and evapotranspiration; inclusion of water quality as part of the biogeochemical cycle; representation of human impacts from water management; utilizing massively parallel computer systems and recent computational advances in solving hyperresolution models that will have up to 109 unknowns; and developing the required in situ and remote sensing global data sets. We deem the development of a global hyperresolution model for monitoring the terrestrial water, energy, and biogeochemical cycles a “grand challenge” to the community, and we call upon the international hydrologic community and the hydrological science support infrastructure to endorse the effort.
Resumo:
The evolution of commodity computing lead to the possibility of efficient usage of interconnected machines to solve computationally-intensive tasks, which were previously solvable only by using expensive supercomputers. This, however, required new methods for process scheduling and distribution, considering the network latency, communication cost, heterogeneous environments and distributed computing constraints. An efficient distribution of processes over such environments requires an adequate scheduling strategy, as the cost of inefficient process allocation is unacceptably high. Therefore, a knowledge and prediction of application behavior is essential to perform effective scheduling. In this paper, we overview the evolution of scheduling approaches, focusing on distributed environments. We also evaluate the current approaches for process behavior extraction and prediction, aiming at selecting an adequate technique for online prediction of application execution. Based on this evaluation, we propose a novel model for application behavior prediction, considering chaotic properties of such behavior and the automatic detection of critical execution points. The proposed model is applied and evaluated for process scheduling in cluster and grid computing environments. The obtained results demonstrate that prediction of the process behavior is essential for efficient scheduling in large-scale and heterogeneous distributed environments, outperforming conventional scheduling policies by a factor of 10, and even more in some cases. Furthermore, the proposed approach proves to be efficient for online predictions due to its low computational cost and good precision. (C) 2009 Elsevier B.V. All rights reserved.
Resumo:
The demands of image processing related systems are robustness, high recognition rates, capability to handle incomplete digital information, and magnanimous flexibility in capturing shape of an object in an image. It is exactly here that, the role of convex hulls comes to play. The objective of this paper is twofold. First, we summarize the state of the art in computational convex hull development for researchers interested in using convex hull image processing to build their intuition, or generate nontrivial models. Secondly, we present several applications involving convex hulls in image processing related tasks. By this, we have striven to show researchers the rich and varied set of applications they can contribute to. This paper also makes a humble effort to enthuse prospective researchers in this area. We hope that the resulting awareness will result in new advances for specific image recognition applications.
Resumo:
This paper elaborates the routing of cable cycle through available routes in a building in order to link a set of devices, in a most reasonable way. Despite of the similarities to other NP-hard routing problems, the only goal is not only to minimize the cost (length of the cycle) but also to increase the reliability of the path (in case of a cable cut) which is assessed by a risk factor. Since there is often a trade-off between the risk and length factors, a criterion for ranking candidates and deciding the most reasonable solution is defined. A set of techniques is proposed to perform an efficient and exact search among candidates. A novel graph is introduced to reduce the search-space, and navigate the search toward feasible and desirable solutions. Moreover, admissible heuristic length estimation helps to early detection of partial cycles which lead to unreasonable solutions. The results show that the method provides solutions which are both technically and financially reasonable. Furthermore, it is proved that the proposed techniques are very efficient in reducing the computational time of the search to a reasonable amount.
Resumo:
In this paper, we introduce a DAI approach called hereinafter Fuzzy Distributed Artificial Intelligence (FDAI). Through the use of fuzzy logic, we have been able to develop mechanisms that we feel may effectively improve current DAI systems, giving much more flexibility and providing the subsidies which a formal theory can bring. The appropriateness of the FDAI approach is explored in an important application, a fuzzy distributed traffic-light control system, where we have been able to aggregate and study several issues concerned with fuzzy and distributed artificial intelligence. We also present a number of current research directions necessary to develop the FDAI approach more fully.
Resumo:
Piecewise-Linear Programming (PLP) is an important area of Mathematical Programming and concerns the minimisation of a convex separable piecewise-linear objective function, subject to linear constraints. In this paper a subarea of PLP called Network Piecewise-Linear Programming (NPLP) is explored. The paper presents four specialised algorithms for NPLP: (Strongly Feasible) Primal Simplex, Dual Method, Out-of-Kilter and (Strongly Polynomial) Cost-Scaling and their relative efficiency is studied. A statistically designed experiment is used to perform a computational comparison of the algorithms. The response variable observed in the experiment is the CPU time to solve randomly generated network piecewise-linear problems classified according to problem class (Transportation, Transshipment and Circulation), problem size, extent of capacitation, and number of breakpoints per arc. Results and conclusions on performance of the algorithms are reported.
Resumo:
Regulatory authorities in many countries, in order to maintain an acceptable balance between appropriate customer service qualities and costs, are introducing a performance-based regulation. These regulations impose penalties, and in some cases rewards, which introduce a component of financial risk to an electric power utility due to the uncertainty associated with preserving a specific level of system reliability. In Brazil, for instance, one of the reliability indices receiving special attention by the utilities is the Maximum Continuous Interruption Duration per customer (MCID). This paper describes a chronological Monte Carlo simulation approach to evaluate probability distributions of reliability indices, including the MCID, and the corresponding penalties. In order to get the desired efficiency, modern computational techniques are used for modeling (UML -Unified Modeling Language) as well as for programming (Object- Oriented Programming). Case studies on a simple distribution network and on real Brazilian distribution systems are presented and discussed. © Copyright KTH 2006.