972 results for Non-dedicated clusters
Abstract:
The resource utilization level in open laboratories of several universities has been shown to be very low. Our aim is to take advantage of those idle resources for parallel computation without disturbing the local load. In order to provide a system that lets us execute parallel applications in such a non-dedicated cluster, we use an integral scheduling system that considers both Space and Time Sharing concerns. For dealing with the Time Sharing (TS) aspect, we use a technique based on the communication-driven coscheduling principle. This kind of TS system has implications for the Space Sharing (SS) system, which force us to modify the way job scheduling is traditionally done. In this paper, we analyze the relation between the TS and the SS systems in a non-dedicated cluster. As a consequence of this analysis, we propose a new technique, termed 3DBackfilling. This proposal implements the well-known SS technique of backfilling, but applies it to an environment where the MultiProgramming Level (MPL) of the parallel applications is greater than one. In addition, 3DBackfilling considers the requirements of the local workload running on each node. Our proposal was evaluated in a PVM/MPI Linux cluster, and it was compared with several more traditional SS policies applied to non-dedicated environments.
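Backfilling itself is a standard space-sharing technique; the following is a minimal, illustrative Python sketch of an EASY-style backfilling pass in which each node may host more than one parallel task (MPL > 1). It is not the paper's 3DBackfilling algorithm, which additionally accounts for the local workload on each node; the job names, sizes, runtimes, and the MPL value below are made up.

```python
# Illustrative sketch only (not 3DBackfilling): backfill jobs into the free
# node slots without delaying the reserved start of the job at the queue head.
from dataclasses import dataclass

MPL = 2  # hypothetical per-node multiprogramming level (MPL > 1)

@dataclass
class Job:
    name: str
    slots_needed: int   # node slots requested by the parallel job
    runtime: int        # user-estimated runtime

def backfill_pass(queue, free_slots, head_reservation, now):
    """Start the head job if it fits; otherwise backfill later jobs that fit
    in the free slots and finish before the head job's reserved start time."""
    started = []
    for job in list(queue):
        fits = job.slots_needed <= free_slots
        harmless = now + job.runtime <= head_reservation
        if fits and (job is queue[0] or harmless):
            started.append(job)
            free_slots -= job.slots_needed
            queue.remove(job)
    return started

# 4 nodes x MPL = 8 schedulable slots; "big" cannot start and is reserved at t=20.
queue = [Job("big", 10, 50), Job("small-1", 3, 10), Job("small-2", 4, 60)]
print(backfill_pass(queue, free_slots=4 * MPL, head_reservation=20, now=0))
```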
Abstract:
This note describes ParallelKnoppix, a bootable CD that allows econometricians with average knowledge of computers to create and begin using a high performance computing cluster for parallel computing in very little time. The computers used may be heterogeneous machines, and clusters of up to 200 nodes are supported. When the cluster is shut down, all machines are in their original state, so their temporary use in the cluster does not interfere with their normal uses. An example shows how a Monte Carlo study of a bootstrap test procedure may be done in parallel. Using a cluster of 20 nodes, the example runs approximately 20 times faster than it does on a single computer.
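As a rough illustration of the kind of parallel Monte Carlo study mentioned above, here is a small mpi4py sketch that splits bootstrap-test replications across the ranks of a cluster and aggregates the rejection rate on rank 0. It is not the note's actual example (which targets econometric software); the sample size, replication counts, and the simple test of a zero mean are placeholders.

```python
# Hedged sketch: parallel Monte Carlo study of a bootstrap test with mpi4py.
import numpy as np
from mpi4py import MPI

def bootstrap_rejects(n=50, boot_reps=199, alpha=0.05, rng=None):
    """One Monte Carlo replication: bootstrap test of H0: mean == 0 on a
    sample that actually satisfies H0 (so rejections estimate the test size)."""
    rng = rng if rng is not None else np.random.default_rng()
    sample = rng.standard_normal(n)
    stat = abs(sample.mean())
    centered = sample - sample.mean()            # impose H0 before resampling
    boot = np.array([abs(rng.choice(centered, n, replace=True).mean())
                     for _ in range(boot_reps)])
    p_value = (1 + np.sum(boot >= stat)) / (boot_reps + 1)
    return p_value < alpha

comm = MPI.COMM_WORLD
total_reps = 1000
local_reps = total_reps // comm.size             # split replications across ranks
rng = np.random.default_rng(seed=comm.rank)      # independent stream per rank
local_rejections = sum(bootstrap_rejects(rng=rng) for _ in range(local_reps))

rejections = comm.reduce(local_rejections, op=MPI.SUM, root=0)
if comm.rank == 0:
    print("estimated test size:", rejections / (local_reps * comm.size))
```

On a 20-node cluster one would launch such a script with something like `mpirun -np 20 python mc_bootstrap.py` (a hypothetical filename), and the replications divide evenly across the ranks.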
Abstract:
This note describes ParallelKnoppix, a bootable CD that allows creation of a Linux cluster in very little time. An experienced user can create a cluster ready to execute MPI programs in less than 10 minutes. The computers used may be heterogeneous machines of the IA-32 architecture. When the cluster is shut down, all machines except one are in their original state, and that one can be returned to its original state by deleting a directory. The system thus provides a means of using non-dedicated computers to create a cluster. An example session is documented.
Abstract:
With the increasing capacity of processing nodes relative to their computing power, more and more data-intensive applications, such as bioinformatics applications, will come to be executed on non-dedicated clusters. Non-dedicated clusters are characterized by their ability to combine the execution of local users' applications with scientific or commercial applications executed in parallel. Knowing what effect data-intensive applications produce when mixed with other kinds of workload (batch, interactive, SRT, etc.) in non-dedicated environments enables the development of more efficient scheduling policies. Some of these I/O-intensive applications are based on the MapReduce paradigm; the environments that run them, such as Hadoop, automatically handle data locality and load balancing and work with distributed file systems. Hadoop performance can be improved without increasing hardware costs by tuning several key configuration parameters to the cluster specification, the input data size, and the complexity of the processing. Tuning these parameters can be too complex for the user and/or administrator, but it seeks to guarantee more adequate performance. This work proposes evaluating the impact of I/O-intensive applications on job scheduling in non-dedicated clusters under the MPI and MapReduce paradigms.
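Purely as an illustration of what "tuning key configuration parameters" means in practice, the snippet below prints a few stock Hadoop 2.x property names with placeholder values; the property names are standard Hadoop configuration keys, but the values are arbitrary and are not recommendations from this work.

```python
# Illustrative only: Hadoop property names of the kind such tuning would touch.
tuned = {
    "mapreduce.job.reduces": 8,            # match reducer count to the cluster
    "mapreduce.task.io.sort.mb": 256,      # map-side sort buffer for large inputs
    "mapreduce.map.memory.mb": 2048,       # container memory per map task
    "dfs.blocksize": 128 * 1024 * 1024,    # HDFS block size vs. input data size
}
for name, value in tuned.items():
    print(f"<property><name>{name}</name><value>{value}</value></property>")
```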
Abstract:
Our efforts are directed towards the understanding of the coscheduling mechanism in a NOW system when a parallel job is executed jointly with local workloads, balancing parallel performance against the local interactive response. Explicit and implicit coscheduling techniques in a PVM-Linux NOW (or cluster) have been implemented. Furthermore, dynamic coscheduling remains an open question when parallel jobs are executed in a non-dedicated cluster. A basic model for dynamic coscheduling in cluster systems is presented in this paper. Also, one dynamic coscheduling algorithm for this model is proposed. The applicability of this algorithm has been proved and its performance analyzed by simulation. Finally, a new tool (named Monito) for monitoring the different queues of messages in such environments is presented. The main aim of implementing this facility is to provide a means of capturing the bottlenecks and overheads of the communication system in a PVM-Linux cluster.
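The communication-driven (dynamic) coscheduling principle referred to above can be caricatured as follows: when a message arrives for a local parallel task, that task's priority is raised so that communicating tasks on different nodes tend to run at the same time. The sketch below is a toy user-space simulation with made-up task names and priorities; it is not the kernel-level mechanism nor the Monito tool.

```python
# Toy illustration of communication-driven coscheduling (lower = higher priority).
import heapq

class ToyScheduler:
    def __init__(self):
        self.ready = []                          # min-heap of (priority, task)

    def add(self, name, priority):
        heapq.heappush(self.ready, (priority, name))

    def message_arrived(self, name, boost=5):
        # Dynamic coscheduling step: the task that just received a message
        # gets its priority raised so it is dispatched promptly.
        self.ready = [(p - boost if n == name else p, n) for p, n in self.ready]
        heapq.heapify(self.ready)

    def next_to_run(self):
        return heapq.heappop(self.ready)[1]

s = ToyScheduler()
s.add("local-editor", 6)          # interactive local workload
s.add("pvm-task", 10)             # parallel task, normally lower priority
s.message_arrived("pvm-task")     # a message for the parallel task arrives
print(s.next_to_run())            # -> "pvm-task" now runs first
```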
Abstract:
Objectives: To compare simulated periodontal bone defect depth measured in digital radiographs with dedicated and non-dedicated software systems and to compare the depth measurements from each program with the measurements in dry mandibles. Methods: Forty periodontal bone defects were created at the proximal area of the first premolar in dry pig mandibles. Measurements of the defects were performed with a periodontal probe in the dry mandible. Periapical digital radiographs of the defects were recorded using the Schick sensor in a standardized exposure setting. All images were read using a Schick dedicated software system (CDR DICOM for Windows v.3.5) and three commonly available non-dedicated software systems (Vix Win 2000 v.1.2, Adobe Photoshop 7.0 and Image Tool 3.0). The defects were measured three times in each image and a consensus was reached among three examiners using the four software systems. The difference between the radiographic measurements was analysed using analysis of variance (ANOVA) and by comparing the measurements from each software system with the dry mandible measurements using Student's t-test. Results: The mean values of the bone defects measured in the radiographs were 5.07 mm, 5.06 mm, 5.01 mm and 5.11 mm for CDR Digital Image and Communication in Medicine (DICOM) for Windows, Vix Win, Adobe Photoshop, and Image Tool, respectively, and 6.67 mm for the dry mandible. The means of the measurements performed in the four software systems were not significantly different (ANOVA, P = 0.958). A significant underestimation of defect depth was obtained when we compared the mean depths from each software system with the dry mandible measurements (t-test; P ≈ 0.000). Conclusions: The periodontal bone defect measurements in the dedicated and in the three non-dedicated software systems were not significantly different, but all underestimated the depths measured in the dry mandibles.
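For readers unfamiliar with the analysis, the sketch below mimics its two steps with synthetic numbers (drawn here from normal distributions around the reported means, purely for illustration): a one-way ANOVA across the four software systems, then a paired t-test of each system against the dry-mandible gold standard. It is not the study's data or its statistical script.

```python
# Hedged sketch of the two-step comparison, with made-up measurement vectors.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
gold = rng.normal(6.67, 0.5, 40)                     # dry-mandible depths (mm)
software = {name: rng.normal(mean, 0.5, 40)          # radiographic depths (mm)
            for name, mean in [("CDR DICOM", 5.07), ("Vix Win", 5.06),
                               ("Photoshop", 5.01), ("Image Tool", 5.11)]}

# Step 1: do the four software systems differ among themselves?
print("ANOVA p =", stats.f_oneway(*software.values()).pvalue)

# Step 2: does each system underestimate the gold standard?
# (paired here, since the same 40 defects were measured by every method)
for name, values in software.items():
    print(name, "vs dry mandible p =", stats.ttest_rel(values, gold).pvalue)
```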
Abstract:
In this work, we present an integral scheduling system for non-dedicated clusters, termed CISNE-P, which ensures the performance required by the local applications while simultaneously allocating cluster resources to parallel jobs. Our approach solves the problem efficiently by using a social contract technique. This kind of technique is based on reserving computational resources, preserving a predetermined response time for local users. CISNE-P is a middleware which includes both a previously developed space-sharing job scheduler and a dynamic coscheduling system (a time-sharing scheduling component). The experimentation performed in a Linux cluster shows that these two scheduler components are complementary and that good coordination between them improves global performance significantly. We also compare two different CISNE-P implementations: one developed inside the kernel, and the other entirely implemented in user space.
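A minimal sketch, under assumed numbers, of the "social contract" admission idea described above: a node only accepts an additional parallel task while the CPU share reserved for the local user remains untouched. CISNE-P's actual reservation is based on preserving local response time and is more elaborate than this.

```python
# Toy admission check for the resource-reservation ("social contract") idea.
LOCAL_RESERVED = 0.25   # assumed fraction of CPU kept for the local user

def can_admit(parallel_load, new_task_load):
    """Admit the new parallel task only if the local reservation is preserved."""
    return parallel_load + new_task_load <= 1.0 - LOCAL_RESERVED

print(can_admit(0.50, 0.20))   # True: 0.70 used, the 0.25 reservation survives
print(can_admit(0.60, 0.20))   # False: would eat into the local user's share
```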
Abstract:
Diffuse radio emission in galaxy clusters has been observed with different sizes and properties. Giant radio halos (RH), Mpc-size sources found in merging clusters, and mini halos (MH), 0.1-0.5 Mpc-size sources located in relaxed cool-core clusters, are thought to be distinct classes of objects with different formation mechanisms. However, recent observations have revealed the unexpected presence of diffuse emission on Mpc scales in relaxed clusters that host a central MH and show no signs of major mergers. The study of these sources is still in its early stages, and the origin of their unusual emission is not yet clear. The main goal of this thesis is to test the occurrence of these peculiar sources and investigate their properties using low-frequency radio observations. This thesis consists of the study of a sample of 12 cool-core galaxy clusters which present some level of dynamical disturbance on large scales. The heterogeneity of sources in the sample allowed me to investigate under which conditions halo-type emission is present in MH clusters, and also to study the connection between AGN bubbles and the local environment. Using high-sensitivity LOFAR observations, I have detected large-scale emission in four non-merging clusters, in addition to the central MH. I have constrained for the first time the spectral properties of diffuse emission in these double-radio-component galaxy clusters, and I have investigated the connection between their thermal and non-thermal emission for a better understanding of the acceleration mechanism. Furthermore, I derived upper limits on the halo power for the other clusters in the sample, which could host large-scale diffuse emission below the detection threshold. Finally, I have reconstructed the duty cycle of one of the most powerful AGN known, located at the centre of a galaxy cluster of the sample.
Abstract:
Work presented within the scope of the Master's programme in Informatics Engineering, as a partial requirement for obtaining the degree of Master in Informatics Engineering.
Abstract:
One of the major problems when using non-dedicated volunteer resources in a distributed network is the high volatility of these hosts, since they can go offline or become unavailable at any time without control. Furthermore, the use of volunteer resources implies some security issues due to the fact that they are generally anonymous entities about which we know nothing. So, how can we trust someone we do not know? Over the last years an important number of reputation-based trust solutions have been designed to evaluate the participants' behavior in a system. However, most of these solutions are addressed to P2P and ad-hoc mobile networks and may not fit well with other kinds of distributed systems that could take advantage of volunteer resources, such as recent cloud computing infrastructures. In this paper we propose a first approach to the design of an anonymous reputation mechanism for CoDeS [1], a middleware for building fogs where services are deployed using volunteer resources. The participants are reputation clients (RC), a reputation authority (RA) and a certification authority (CA). Users need a valid public key certificate from the CA to register with the RA and obtain the data needed to participate in the system, namely an opaque identifier that we call here a pseudonym and an initial reputation value that users provide to other users when interacting with them. The mechanism prevents not only the manipulation of the provided reputation values but also any disclosure of the users' identities to other users or authorities, so anonymity is guaranteed.
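The registration flow described above might look roughly like the following sketch; it is an assumption-laden toy, not the CoDeS mechanism. Certificates are stand-in dictionaries, and the pseudonym is derived with an HMAC keyed by an RA secret so that it stays opaque to everyone except the RA itself.

```python
# Toy sketch: CA-certified user registers with the RA and receives an opaque
# pseudonym plus an initial reputation value.
import hmac, hashlib, secrets

class ReputationAuthority:
    def __init__(self, trusted_cas):
        self.trusted_cas = trusted_cas             # CAs whose certificates we accept
        self._secret = secrets.token_bytes(32)     # keeps the identity mapping private
        self.reputation = {}                       # pseudonym -> reputation value

    def register(self, certificate):
        if certificate["issuer"] not in self.trusted_cas:
            raise ValueError("certificate not issued by a trusted CA")
        # Opaque identifier: the RA can recognise repeat registrations, but
        # nobody else can link the pseudonym back to the user's identity.
        pseudonym = hmac.new(self._secret, certificate["subject"].encode(),
                             hashlib.sha256).hexdigest()[:16]
        self.reputation.setdefault(pseudonym, 0.5)  # assumed initial reputation
        return pseudonym, self.reputation[pseudonym]

ra = ReputationAuthority(trusted_cas={"ExampleCA"})
cert = {"issuer": "ExampleCA", "subject": "volunteer-node-42"}  # stand-in certificate
print(ra.register(cert))
```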
Abstract:
Switzerland has a complex human immunodeficiency virus (HIV) epidemic involving several populations. We examined transmission of HIV type 1 (HIV-1) in a national cohort study. Latent class analysis was used to identify socioeconomic and behavioral groups among 6,027 patients enrolled in the Swiss HIV Cohort Study between 2000 and 2011. Phylogenetic analysis of sequence data, available for 4,013 patients, was used to identify transmission clusters. Concordance between sociobehavioral groups and transmission clusters was assessed in correlation and multiple correspondence analyses. A total of 2,696 patients were infected with subtype B, 203 with subtype C, 196 with subtype A, and 733 with recombinant subtypes (mainly CRF02_AG and CRF01_AE). Latent class analysis identified 8 patient groups. Most transmission clusters of subtype B were shared between groups of gay men (groups 1-3) or between the heterosexual groups "heterosexual people of lower socioeconomic position" (group 4) and "injection drug users" (group 8). Clusters linking homosexual and heterosexual groups were associated with "older heterosexual and gay people on welfare" (group 5). "Migrant women in heterosexual partnerships" (group 6) and "heterosexual migrants on welfare" (group 7) shared non-B clusters with groups 4 and 5. Combining approaches from social and molecular epidemiology can provide insights into HIV-1 transmission and inform the design of prevention strategies.
Abstract:
The past few decades have seen a considerable increase in the number of parallel and distributed systems. With the development of more complex applications, the need for more powerful systems has emerged and various parallel and distributed environments have been designed and implemented. Each of the environments, including hardware and software, has unique strengths and weaknesses. There is no single parallel environment that can be identified as the best environment for all applications with respect to hardware and software properties. The main goal of this thesis is to provide a novel way of performing data-parallel computation in parallel and distributed environments by utilizing the best characteristics of different aspects of parallel computing. For the purpose of this thesis, three aspects of parallel computing were identified and studied. First, three parallel environments (shared memory, distributed memory, and a network of workstations) are evaluated to quantify their suitability for different parallel applications. Due to the parallel and distributed nature of the environments, networks connecting the processors in these environments were investigated with respect to their performance characteristics. Second, scheduling algorithms are studied in order to make them more efficient and effective. A concept of application-specific information scheduling is introduced. The application-specific information is data about the workload extracted from an application, which is provided to a scheduling algorithm. Three scheduling algorithms are enhanced to utilize the application-specific information to further refine their scheduling properties. A more accurate description of the workload is especially important in cases where the work units are heterogeneous and the parallel environment is heterogeneous and/or non-dedicated. The results obtained show that the additional information regarding the workload has a positive impact on the performance of applications. Third, a programming paradigm for networks of symmetric multiprocessor (SMP) workstations is introduced. The MPIT programming paradigm incorporates the Message Passing Interface (MPI) with threads to provide a methodology to write parallel applications that efficiently utilize the available resources and minimize the overhead. MPIT allows communication and computation to overlap by deploying a dedicated thread for communication. Furthermore, the programming paradigm implements an application-specific scheduling algorithm. The scheduling algorithm is executed by the communication thread. Thus, the scheduling does not affect the execution of the parallel application. Performance results achieved with MPIT show that considerable improvements over conventional MPI applications are achieved.
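The overlap idea described above (a dedicated communication thread alongside the computing thread) can be sketched with mpi4py and a Python thread, as below. This is not the MPIT API itself, and it assumes an MPI library initialised with full thread support (MPI_THREAD_MULTIPLE); the work units are trivial placeholders.

```python
# Hedged sketch: main thread "computes" while a dedicated thread communicates.
import queue, threading
from mpi4py import MPI

comm = MPI.COMM_WORLD
outbox = queue.Queue()

def communicator():
    """Dedicated communication thread: ships finished results to rank 0."""
    while True:
        item = outbox.get()
        if item is None:                  # sentinel: no more results to send
            break
        comm.send(item, dest=0, tag=1)

if comm.rank != 0:
    t = threading.Thread(target=communicator, daemon=True)
    t.start()
    for i in range(5):                    # "computation": trivial work units
        outbox.put((comm.rank, i, i * i)) # hand the result to the comm thread
    outbox.put(None)
    t.join()
else:
    expected = 5 * (comm.size - 1)
    for _ in range(expected):
        print("received", comm.recv(source=MPI.ANY_SOURCE, tag=1))
```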
Abstract:
In molecular biology, it is often desirable to find common properties in large numbers of drug candidates. One family of methods stems from the data mining community, where algorithms to find frequent graphs have received increasing attention over the past years. However, the computational complexity of the underlying problem and the large amount of data to be explored essentially render sequential algorithms useless. In this paper, we present a distributed approach to the frequent subgraph mining problem to discover interesting patterns in molecular compounds. This problem is characterized by a highly irregular search tree, for which no reliable workload prediction is available. We describe the three main aspects of the proposed distributed algorithm, namely, a dynamic partitioning of the search space, a distribution process based on a peer-to-peer communication framework, and a novel receiver-initiated load balancing algorithm. The effectiveness of the distributed method has been evaluated on the well-known National Cancer Institute's HIV-screening data set, where we were able to show close-to-linear speedup in a network of workstations. The proposed approach also allows for dynamic resource aggregation in a non-dedicated computational environment. These features make it suitable for large-scale, multi-domain, heterogeneous environments, such as computational grids.
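Receiver-initiated load balancing, as named above, means the idle worker initiates the transfer by asking a peer for part of its pending work, which suits irregular search trees where no reliable workload prediction exists. The toy sketch below illustrates just that step; worker names and task counts are made up, and there is no real network or search tree.

```python
# Toy illustration of the receiver-initiated (work-pulling) step.
import collections, random, threading

class Worker:
    def __init__(self, name, tasks=()):
        self.name = name
        self.tasks = collections.deque(tasks)   # pending search-tree nodes
        self.lock = threading.Lock()            # donors and thieves may race

    def request_work(self, peers):
        """Receiver-initiated step: the idle worker pulls half of a random
        peer's pending tasks instead of waiting for a donor to push work."""
        donor = random.choice([p for p in peers if p is not self])
        with donor.lock:
            share = [donor.tasks.pop() for _ in range(len(donor.tasks) // 2)]
        self.tasks.extend(share)
        return len(share)

workers = [Worker("w0", range(8)), Worker("w1")]   # w1 starts out idle
moved = workers[1].request_work(workers)
print(f"w1 received {moved} tasks:", list(workers[1].tasks))
```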