820 resultados para sistema distribuito data-grid cloud computing CERN LHC Hazelcast Elasticsearch
Resumo:
The increasing volume of data describing humandisease processes and the growing complexity of understanding, managing, and sharing such data presents a huge challenge for clinicians and medical researchers. This paper presents the@neurIST system, which provides an infrastructure for biomedical research while aiding clinical care, by bringing together heterogeneous data and complex processing and computing services. Although @neurIST targets the investigation and treatment of cerebral aneurysms, the system’s architecture is generic enough that it could be adapted to the treatment of other diseases.Innovations in @neurIST include confining the patient data pertaining to aneurysms inside a single environment that offers cliniciansthe tools to analyze and interpret patient data and make use of knowledge-based guidance in planning their treatment. Medicalresearchers gain access to a critical mass of aneurysm related data due to the system’s ability to federate distributed informationsources. A semantically mediated grid infrastructure ensures that both clinicians and researchers are able to seamlessly access andwork on data that is distributed across multiple sites in a secure way in addition to providing computing resources on demand forperforming computationally intensive simulations for treatment planning and research.
Resumo:
Presentation at Open Repositories 2014, Helsinki, Finland, June 9-13, 2014
Resumo:
The simulated annealing approach to structure solution from powder diffraction data, as implemented in the DASH program, is easily amenable to parallelization at the individual run level. Very large scale increases in speed of execution can therefore be achieved by distributing individual DASH runs over a network of computers. The GDASH program achieves this by packaging DASH in a form that enables it to run under the Univa UD Grid MP system, which harnesses networks of existing computing resources to perform calculations.
Resumo:
Ice cloud representation in general circulation models remains a challenging task, due to the lack of accurate observations and the complexity of microphysical processes. In this article, we evaluate the ice water content (IWC) and ice cloud fraction statistical distributions from the numerical weather prediction models of the European Centre for Medium-Range Weather Forecasts (ECMWF) and the UK Met Office, exploiting the synergy between the CloudSat radar and CALIPSO lidar. Using the last three weeks of July 2006, we analyse the global ice cloud occurrence as a function of temperature and latitude and show that the models capture the main geographical and temperature-dependent distributions, but overestimate the ice cloud occurrence in the Tropics in the temperature range from −60 °C to −20 °C and in the Antarctic for temperatures higher than −20 °C, but underestimate ice cloud occurrence at very low temperatures. A global statistical comparison of the occurrence of grid-box mean IWC at different temperatures shows that both the mean and range of IWC increases with increasing temperature. Globally, the models capture most of the IWC variability in the temperature range between −60 °C and −5 °C, and also reproduce the observed latitudinal dependencies in the IWC distribution due to different meteorological regimes. Two versions of the ECMWF model are assessed. The recent operational version with a diagnostic representation of precipitating snow and mixed-phase ice cloud fails to represent the IWC distribution in the −20 °C to 0 °C range, but a new version with prognostic variables for liquid water, ice and snow is much closer to the observed distribution. The comparison of models and observations provides a much-needed analysis of the vertical distribution of IWC across the globe, highlighting the ability of the models to reproduce much of the observed variability as well as the deficiencies where further improvements are required.
Resumo:
With the increasing awareness of protein folding disorders, the explosion of genomic information, and the need for efficient ways to predict protein structure, protein folding and unfolding has become a central issue in molecular sciences research. Molecular dynamics computer simulations are increasingly employed to understand the folding and unfolding of proteins. Running protein unfolding simulations is computationally expensive and finding ways to enhance performance is a grid issue on its own. However, more and more groups run such simulations and generate a myriad of data, which raises new challenges in managing and analyzing these data. Because the vast range of proteins researchers want to study and simulate, the computational effort needed to generate data, the large data volumes involved, and the different types of analyses scientists need to perform, it is desirable to provide a public repository allowing researchers to pool and share protein unfolding data. This paper describes efforts to provide a grid-enabled data warehouse for protein unfolding data. We outline the challenge and present first results in the design and implementation of the data warehouse.
Resumo:
The P-found protein folding and unfolding simulation repository is designed to allow scientists to perform analyses across large, distributed simulation data sets. There are two storage components in P-found: a primary repository of simulation data and a data warehouse. Here we demonstrate how grid technologies can support multiple, distributed P-found installations. In particular we look at two aspects, first how grid data management technologies can be used to access the distributed data warehouses; and secondly, how the grid can be used to transfer analysis programs to the primary repositories --- this is an important and challenging aspect of P-found because the data volumes involved are too large to be centralised. The grid technologies we are developing with the P-found system will allow new large data sets of protein folding simulations to be accessed and analysed in novel ways, with significant potential for enabling new scientific discoveries.
Resumo:
Smart healthcare is a complex domain for systems integration due to human and technical factors and heterogeneous data sources involved. As a part of smart city, it is such a complex area where clinical functions require smartness of multi-systems collaborations for effective communications among departments, and radiology is one of the areas highly relies on intelligent information integration and communication. Therefore, it faces many challenges regarding integration and its interoperability such as information collision, heterogeneous data sources, policy obstacles, and procedure mismanagement. The purpose of this study is to conduct an analysis of data, semantic, and pragmatic interoperability of systems integration in radiology department, and to develop a pragmatic interoperability framework for guiding the integration. We select an on-going project at a local hospital for undertaking our case study. The project is to achieve data sharing and interoperability among Radiology Information Systems (RIS), Electronic Patient Record (EPR), and Picture Archiving and Communication Systems (PACS). Qualitative data collection and analysis methods are used. The data sources consisted of documentation including publications and internal working papers, one year of non-participant observations and 37 interviews with radiologists, clinicians, directors of IT services, referring clinicians, radiographers, receptionists and secretary. We identified four primary phases of data analysis process for the case study: requirements and barriers identification, integration approach, interoperability measurements, and knowledge foundations. Each phase is discussed and supported by qualitative data. Through the analysis we also develop a pragmatic interoperability framework that summaries the empirical findings and proposes recommendations for guiding the integration in the radiology context.
Resumo:
The simulated annealing approach to crystal structure determination from powder diffraction data, as implemented in the DASH program, is readily amenable to parallelization at the individual run level. Very large scale increases in speed of execution can be achieved by distributing individual DASH runs over a network of computers. The CDASH program delivers this by using scalable on-demand computing clusters built on the Amazon Elastic Compute Cloud service. By way of example, a 360 vCPU cluster returned the crystal structure of racemic ornidazole (Z0 = 3, 30 degrees of freedom) ca 40 times faster than a typical modern quad-core desktop CPU. Whilst used here specifically for DASH, this approach is of general applicability to other packages that are amenable to coarse-grained parallelism strategies.
Resumo:
Pós-graduação em Engenharia Elétrica - FEIS
Resumo:
Data-intensive Grid applications require huge data transfers between grid computing nodes. These computing nodes, where computing jobs are executed, are usually geographically separated. A grid network that employs optical wavelength division multiplexing (WDM) technology and optical switches to interconnect computing resources with dynamically provisioned multi-gigabit rate bandwidth lightpath is called a Lambda Grid network. A computing task may be executed on any one of several computing nodes which possesses the necessary resources. In order to reflect the reality in job scheduling, allocation of network resources for data transfer should be taken into consideration. However, few scheduling methods consider the communication contention on Lambda Grids. In this paper, we investigate the joint scheduling problem while considering both optical network and computing resources in a Lambda Grid network. The objective of our work is to maximize the total number of jobs that can be scheduled in a Lambda Grid network. An adaptive routing algorithm is proposed and implemented for accomplishing the communication tasks for every job submitted in the network. Four heuristics (FIFO, ESTF, LJF, RS) are implemented for job scheduling of the computational tasks. Simulation results prove the feasibility and efficiency of the proposed solution.
Resumo:
Data-intensive Grid applications require huge data transfers between grid computing nodes. These computing nodes, where computing jobs are executed, are usually geographically separated. A grid network that employs optical wavelength division multiplexing (WDM) technology and optical switches to interconnect computing resources with dynamically provisioned multi-gigabit rate bandwidth lightpath is called a Lambda Grid network. A computing task may be executed on any one of several computing nodes which possesses the necessary resources. In order to reflect the reality in job scheduling, allocation of network resources for data transfer should be taken into consideration. However, few scheduling methods consider the communication contention on Lambda Grids. In this paper, we investigate the joint scheduling problem while considering both optical network and computing resources in a Lambda Grid network. The objective of our work is to maximize the total number of jobs that can be scheduled in a Lambda Grid network. An adaptive routing algorithm is proposed and implemented for accomplishing the communication tasks for every job submitted in the network. Four heuristics (FIFO, ESTF, LJF, RS) are implemented for job scheduling of the computational tasks. Simulation results prove the feasibility and efficiency of the proposed solution.
Resumo:
Current scientific applications have been producing large amounts of data. The processing, handling and analysis of such data require large-scale computing infrastructures such as clusters and grids. In this area, studies aim at improving the performance of data-intensive applications by optimizing data accesses. In order to achieve this goal, distributed storage systems have been considering techniques of data replication, migration, distribution, and access parallelism. However, the main drawback of those studies is that they do not take into account application behavior to perform data access optimization. This limitation motivated this paper which applies strategies to support the online prediction of application behavior in order to optimize data access operations on distributed systems, without requiring any information on past executions. In order to accomplish such a goal, this approach organizes application behaviors as time series and, then, analyzes and classifies those series according to their properties. By knowing properties, the approach selects modeling techniques to represent series and perform predictions, which are, later on, used to optimize data access operations. This new approach was implemented and evaluated using the OptorSim simulator, sponsored by the LHC-CERN project and widely employed by the scientific community. Experiments confirm this new approach reduces application execution time in about 50 percent, specially when handling large amounts of data.
Resumo:
L'obiettivo di questa tesi è studiare la fattibilità dello studio della produzione associata ttH del bosone di Higgs con due quark top nell'esperimento CMS, e valutare le funzionalità e le caratteristiche della prossima generazione di toolkit per l'analisi distribuita a CMS (CRAB versione 3) per effettuare tale analisi. Nel settore della fisica del quark top, la produzione ttH è particolarmente interessante, soprattutto perchè rappresenta l'unica opportunità di studiare direttamente il vertice t-H senza dover fare assunzioni riguardanti possibili contributi dalla fisica oltre il Modello Standard. La preparazione per questa analisi è cruciale in questo momento, prima dell'inizio del Run-2 dell'LHC nel 2015. Per essere preparati a tale studio, le implicazioni tecniche di effettuare un'analisi completa in un ambito di calcolo distribuito come la Grid non dovrebbero essere sottovalutate. Per questo motivo, vengono presentati e discussi un'analisi dello stesso strumento CRAB3 (disponibile adesso in versione di pre-produzione) e un confronto diretto di prestazioni con CRAB2. Saranno raccolti e documentati inoltre suggerimenti e consigli per un team di analisi che sarà eventualmente coinvolto in questo studio. Nel Capitolo 1 è introdotta la fisica delle alte energie a LHC nell'esperimento CMS. Il Capitolo 2 discute il modello di calcolo di CMS e il sistema di analisi distribuita della Grid. Nel Capitolo 3 viene brevemente presentata la fisica del quark top e del bosone di Higgs. Il Capitolo 4 è dedicato alla preparazione dell'analisi dal punto di vista degli strumenti della Grid (CRAB3 vs CRAB2). Nel capitolo 5 è presentato e discusso uno studio di fattibilità per un'analisi del canale ttH in termini di efficienza di selezione.
Resumo:
Energy consumption in data centers is nowadays a critical objective because of its dramatic environmental and economic impact. Over the last years, several approaches have been proposed to tackle the energy/cost optimization problem, but most of them have failed on providing an analytical model to target both the static and dynamic optimization domains for complex heterogeneous data centers. This paper proposes and solves an optimization problem for the energy-driven configuration of a heterogeneous data center. It also advances in the proposition of a new mechanism for task allocation and distribution of workload. The combination of both approaches outperforms previous published results in the field of energy minimization in heterogeneous data centers and scopes a promising area of research.