66 results for Cluster of workstations
in CentAUR: Central Archive, University of Reading - UK
Abstract:
This paper presents the results of the application of a parallel Genetic Algorithm (GA) to the design of a Fuzzy Proportional Integral (FPI) controller for active queue management on Internet routers. Active Queue Management (AQM) policies are router queue-management policies that allow the detection of network congestion, the notification of such occurrences to the hosts on the network borders, and the adoption of a suitable control policy. Two different parallel implementations of the genetic algorithm are adopted to determine an optimal configuration of the FPI controller parameters. Finally, the results of several experiments carried out on a forty-node cluster of workstations are presented.
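To make the master-worker idea concrete, here is a minimal sketch in Python; the gains Kp and Ki, the surrogate fitness function, and the population size are illustrative assumptions, not the paper's actual FPI/AQM simulation or its forty-node MPI setup.

```python
# Minimal sketch (not the paper's implementation): a master-worker parallel GA
# that tunes two hypothetical FPI controller gains by evaluating candidate
# fitnesses in parallel worker processes.
import random
from multiprocessing import Pool

def fitness(params):
    kp, ki = params
    # Hypothetical surrogate cost: penalise deviation from nominal gains;
    # a real evaluation would simulate the AQM control loop on the router.
    return -((kp - 0.8) ** 2 + (ki - 0.2) ** 2)

def evolve(pop, scores, mut=0.1):
    # Rank by fitness, keep the top half, refill with blended mutated children.
    ranked = [p for _, p in sorted(zip(scores, pop), reverse=True)]
    parents = ranked[: len(pop) // 2]
    children = []
    while len(children) < len(pop) - len(parents):
        a, b = random.sample(parents, 2)
        children.append([(x + y) / 2 + random.gauss(0, mut) for x, y in zip(a, b)])
    return parents + children

if __name__ == "__main__":
    pop = [[random.uniform(0, 2), random.uniform(0, 1)] for _ in range(40)]
    with Pool() as pool:                      # one worker per core (or node)
        for _ in range(25):
            scores = pool.map(fitness, pop)   # fitness evaluation in parallel
            pop = evolve(pop, scores)
    print("best (Kp, Ki):", max(pop, key=fitness))
```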
Abstract:
In many data mining applications, automated retrieval of text and image information is needed. This becomes essential with the growth of the Internet and digital libraries. Our approach is based on latent semantic indexing (LSI) and the corresponding term-by-document matrix suggested by Berry and his co-authors. Instead of using deterministic methods to find the required number of first "k" singular triplets, we propose a stochastic approach. First, we use a Monte Carlo method to sample and build a much smaller term-by-document matrix (e.g. a k x k matrix), from which we then find the first "k" triplets using standard deterministic methods. Second, we investigate how we can reduce the problem to finding the "k" largest eigenvalues using parallel Monte Carlo methods. We apply these methods to the initial matrix and also to the reduced one. The algorithms run on a cluster of workstations under MPI, and results of experiments arising in textual retrieval of Web documents, as well as a comparison of the proposed stochastic methods, are presented. (C) 2003 IMACS. Published by Elsevier Science B.V. All rights reserved.
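A rough illustration of the sampling step, under stated assumptions: the classic length-squared column sampling with numpy stands in for the paper's exact scheme, and the MPI parallelisation is omitted.

```python
# Illustrative sketch: approximate the top-k singular triplets of a
# term-by-document matrix by sampling columns with probability proportional
# to their squared norms, then running a deterministic SVD on the sketch.
import numpy as np

rng = np.random.default_rng(0)
A = rng.random((1000, 5000))          # term-by-document matrix (toy data)
k = 20                                 # number of singular triplets wanted
s = 200                                # number of sampled columns (s >> k)

# Length-squared sampling; each picked column is rescaled so the small
# matrix is an unbiased sketch of A.
norms = np.sum(A ** 2, axis=0)
probs = norms / norms.sum()
idx = rng.choice(A.shape[1], size=s, p=probs)
C = A[:, idx] / np.sqrt(s * probs[idx])

# Deterministic SVD on the much smaller sketch gives approximate triplets.
U, sigma, _ = np.linalg.svd(C, full_matrices=False)
print("leading singular values:", sigma[:k])
```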
Abstract:
Clusters of computers can be used together to provide a powerful computing resource. Large Monte Carlo simulations, such as those used to model particle growth, are computationally intensive and take considerable time to execute on conventional workstations. By spreading the work of the simulation across a cluster of computers, the elapsed execution time can be greatly reduced. Thus a user effectively has the performance of a supercomputer by using the spare cycles on other workstations.
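A minimal sketch of the idea, assuming mpi4py is available and the script is launched with mpiexec across the workstations; the particle-growth simulation itself is replaced here by a trivial Monte Carlo estimate.

```python
# Each workstation runs an independent slice of a Monte Carlo estimate
# (here, pi) and the partial results are reduced to rank 0.
import random
from mpi4py import MPI

comm = MPI.COMM_WORLD
trials_per_node = 1_000_000

# Independent trials: count random points falling inside the unit circle.
hits = sum(
    1
    for _ in range(trials_per_node)
    if random.random() ** 2 + random.random() ** 2 <= 1.0
)
total = comm.reduce(hits, op=MPI.SUM, root=0)

if comm.rank == 0:
    estimate = 4.0 * total / (trials_per_node * comm.size)
    print("pi estimate from", comm.size, "workers:", estimate)
```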
Abstract:
It is known that germin, which is a marker of the onset of growth in germinating wheat, is an oxalate oxidase, and also that germins possess sequence similarity with legumin and vicilin seed storage proteins. These two pieces of information have been combined in order to generate a 3D model of germin based on the structure of vicilin and to examine the model with regard to a potential oxalate oxidase active site. A cluster of three histidine residues has been located within the conserved beta-barrel structure. While there is a relatively low level of overall sequence similarity between the model and the vicilin structures, the conservation of amino acids important in maintaining the scaffold of the beta-barrel lends confidence to the juxtaposition of the histidine residues. The cluster is similar structurally to those found in copper amine oxidase and other proteins, leading to the suggestion that it defines a metal-binding location within the oxalate oxidase active site. It is also proposed that the structural elements involved in intermolecular interactions in vicilins may play a role in oligomer formation in germin/oxalate oxidase.
Abstract:
In molecular biology, it is often desirable to find common properties in large numbers of drug candidates. One family of methods stems from the data mining community, where algorithms to find frequent graphs have received increasing attention over the past years. However, the computational complexity of the underlying problem and the large amount of data to be explored essentially render sequential algorithms useless. In this paper, we present a distributed approach to the frequent subgraph mining problem to discover interesting patterns in molecular compounds. This problem is characterized by a highly irregular search tree, whereby no reliable workload prediction is available. We describe the three main aspects of the proposed distributed algorithm, namely, a dynamic partitioning of the search space, a distribution process based on a peer-to-peer communication framework, and a novel receiver-initiated load balancing algorithm. The effectiveness of the distributed method has been evaluated on the well-known National Cancer Institute's HIV-screening data set, where we were able to show close-to-linear speedup in a network of workstations. The proposed approach also allows for dynamic resource aggregation in a non-dedicated computational environment. These features make it suitable for large-scale, multi-domain, heterogeneous environments, such as computational grids.
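A deliberately simplified, single-machine sketch of the dynamic partitioning aspect: threads pulling from a shared work queue stand in for the paper's peer-to-peer framework and receiver-initiated protocol, so fast workers naturally take more of the irregular tree.

```python
# Workers pull one frontier node of an irregular search tree at a time and
# push newly discovered branches back; the hypothetical branching rule below
# stands in for subgraph extension.
import queue
import threading

work = queue.Queue()
results = []
lock = threading.Lock()

def children(node):
    depth, branch = node
    return [(depth + 1, b) for b in range(branch)] if depth < 4 else []

def worker():
    while True:
        try:
            node = work.get(timeout=0.5)   # idle worker asks for more work
        except queue.Empty:
            return                          # idle long enough: assume done
        with lock:
            results.append(node)            # "count" the visited pattern
        for child in children(node):
            work.put(child)                 # donate branches back to the pool
        work.task_done()

work.put((0, 3))                            # root of the search tree
threads = [threading.Thread(target=worker) for _ in range(4)]
for t in threads: t.start()
for t in threads: t.join()
print("nodes explored:", len(results))
```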
Abstract:
Satellite cells, originating in the embryonic dermomyotome, reside beneath the myofibre of mature adult skeletal muscle and constitute the tissue-specific stem cell population. Recent advances following the identification of markers for these cells (including Pax7, Myf5, c-Met and CD34) (CD, cluster of differentiation; c-Met, mesenchymal epithelial transition factor) have led to a greater understanding of the role played by satellite cells in the regeneration of new skeletal muscle during growth and following injury. In response to muscle damage, satellite cells harbour the ability both to form myogenic precursors and to self-renew to repopulate the stem cell niche following myofibre damage. More recently, other stem cell populations, including bone marrow stem cells, skeletal muscle side population cells and mesoangioblasts, have also been shown to have myogenic potential in culture and to be able to form skeletal muscle myofibres in vivo and engraft into the satellite cell niche. These cell types, along with satellite cells, have shown potential when used as a therapy for skeletal muscle wasting disorders in which the intrinsic stem cell population is genetically unable to repair non-functioning muscle tissue. An accurate understanding of the mechanisms controlling satellite cell lineage progression and self-renewal, as well as the recruitment of other stem cell types towards the myogenic lineage, is crucial if we are to exploit the power of these cells in combating myopathic conditions. Here we highlight the origin, molecular regulation and therapeutic potential of all the major cell types capable of undergoing myogenic differentiation and discuss their potential therapeutic application.
Abstract:
The Court of Justice has, over the years, often been vilified for exceeding the limits of its jurisdiction by interpreting the provisions of Community legislation in a way seemingly not originally envisaged by its drafters. A recent example of this approach was a cluster of cases in the context of the free movement of workers and the freedom of establishment (Ritter-Coulais and its progeny), where the Court included within the scope of those provisions situations which, arguably, did not present a sufficient link with their (economic) aim. In particular, in that case law the Court accepted that the mere exercise of free movement for the purpose of taking up residence in the territory of another Member State, whilst continuing to exercise an economic activity in the State of origin, suffices to bring a Member State national within the scope of Articles 39 and 43 EC. It is argued that the most plausible explanation for this approach is that the Court now wishes to re-read the economic fundamental freedoms in such a way as to include within their scope all economically active Union citizens, irrespective of whether their situation presents a sufficient link with the exercise of an economic activity in a cross-border context. It is suggested that this approach is problematic for a number of reasons. It is, therefore, concluded that the Court should revert to its orthodox approach, according to which only situations involving Union citizens who have moved between Member States for the purpose of taking up an economic activity should be included within the scope of the market freedoms.
Abstract:
Recently, major processor manufacturers have announced a dramatic shift in their paradigm to increase computing power over the coming years. Instead of focusing on faster clock speeds and more powerful single-core CPUs, the trend clearly goes towards multi-core systems. This will also result in a paradigm shift for the development of algorithms for computationally expensive tasks, such as data mining applications. Obviously, work on parallel algorithms is not new per se, but concentrated efforts in the many application domains are still missing. Multi-core systems, but also clusters of workstations and even large-scale distributed computing infrastructures, provide new opportunities and pose new challenges for the design of parallel and distributed algorithms. Since data mining and machine learning systems rely on high performance computing systems, research on the corresponding algorithms must be at the forefront of parallel algorithm research in order to keep pushing data mining and machine learning applications to be more powerful and, especially for the former, interactive. To bring together researchers and practitioners working in this exciting field, a workshop on parallel data mining was organized as part of PKDD/ECML 2006 (Berlin, Germany). The six contributions selected for the program describe various aspects of data mining and machine learning approaches featuring low to high degrees of parallelism: The first contribution addresses the classic problem of distributed association rule mining and focuses on communication efficiency to improve the state of the art. After this, a parallelization technique for speeding up decision tree construction by means of thread-level parallelism for shared memory systems is presented. The next paper discusses the design of a parallel approach to the frequent subgraph mining problem for distributed memory systems. This approach is based on a hierarchical communication topology to solve issues related to multi-domain computational environments. The fourth paper describes the combined use and customization of software packages to facilitate top-down parallelism in the tuning of Support Vector Machines (SVM), and the next contribution presents an interesting idea concerning parallel training of Conditional Random Fields (CRFs) and motivates their use in labeling sequential data. The last contribution focuses on very efficient feature selection: it describes a parallel algorithm for feature selection from random subsets. Selecting the papers included in this volume would not have been possible without the help of an international Program Committee that provided detailed reviews for each paper. We would also like to thank Matthew Otey, who helped with publicity for the workshop.
Abstract:
This dissertation deals with aspects of sequential data assimilation (in particular, ensemble Kalman filtering) and numerical weather forecasting. In the first part, the recently formulated Ensemble Kalman-Bucy filter (EnKBF) is revisited. It is shown that the previously used numerical integration scheme fails when the magnitude of the background error covariance grows beyond that of the observational error covariance in the forecast window. Therefore, we present a suitable integration scheme that handles the stiffening of the differential equations involved and does not incur further computational expense. Moreover, a transform-based alternative to the EnKBF is developed: under this scheme, the operations are performed in the ensemble space instead of in the state space. Advantages of this formulation are explained. For the first time, the EnKBF is implemented in an atmospheric model. The second part of this work deals with ensemble clustering, a phenomenon that arises when performing data assimilation using deterministic ensemble square root filters (EnSRFs) in highly nonlinear forecast models. Namely, an M-member ensemble detaches into an outlier and a cluster of M-1 members. Previous works may suggest that this issue represents a failure of EnSRFs; this work dispels that notion. It is shown that ensemble clustering can also be reverted by nonlinear processes, in particular the alternation between nonlinear expansion and compression of the ensemble in different regions of the attractor. Some EnSRFs that use random rotations have been developed to overcome this issue; these formulations are analyzed and their advantages and disadvantages with respect to common EnSRFs are discussed. The third and last part contains the implementation of the Robert-Asselin-Williams (RAW) filter in an atmospheric model. The RAW filter is an improvement to the widely popular Robert-Asselin filter that successfully suppresses spurious computational waves while avoiding any distortion in the mean value of the function. Using statistical significance tests at both the local and field level, it is shown that the climatology of the SPEEDY model is not modified by the changed time stepping scheme; hence, no retuning of the parameterizations is required. It is found that the accuracy of the medium-term forecasts is increased by using the RAW filter.
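For reference, a small sketch of the RAW filter inside a leapfrog loop, in the form published by Williams (setting alpha = 1 recovers the plain Robert-Asselin filter); this is illustrative and does not reproduce the SPEEDY implementation.

```python
# Leapfrog integration of the oscillation equation dx/dt = i*omega*x with
# the Robert-Asselin-Williams filter applied at every step.
import cmath

omega, dt = 1.0, 0.2
nu, alpha = 0.2, 0.53               # filter strength and RAW weight

def tendency(x):
    return 1j * omega * x

x_prev = 1.0 + 0j                   # time level n-1 (already filtered)
x_now = cmath.exp(1j * omega * dt)  # exact value at time level n
for _ in range(500):
    x_next = x_prev + 2 * dt * tendency(x_now)     # leapfrog step
    d = (nu / 2) * (x_prev - 2 * x_now + x_next)   # filter displacement
    x_now = x_now + alpha * d                      # filter the middle level
    x_next = x_next + (alpha - 1) * d              # and the newest level
    x_prev, x_now = x_now, x_next

print("final amplitude:", abs(x_now))  # stays near 1: mean value undistorted
```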
Abstract:
Whole-genome sequencing offers new insights into the evolution of bacterial pathogens and the etiology of bacterial disease. Staphylococcus aureus is a major cause of bacteria-associated mortality and invasive disease and is carried asymptomatically by 27% of adults. Eighty percent of bacteremias match the carried strain. However, the role of evolutionary change in the pathogen during the progression from carriage to disease is incompletely understood. Here we use high-throughput genome sequencing to discover the genetic changes that accompany the transition from nasal carriage to fatal bloodstream infection in an individual colonized with methicillin-sensitive S. aureus. We found a single, cohesive population exhibiting a repertoire of 30 single-nucleotide polymorphisms and four insertion/deletion variants. Mutations accumulated at a steady rate over a 13-mo period, except for a cluster of mutations preceding the transition to disease. Although bloodstream bacteria differed by just eight mutations from the original nasally carried bacteria, half of those mutations caused truncation of proteins, including a premature stop codon in an AraC-family transcriptional regulator that has been implicated in pathogenicity. Comparison with evolution in two asymptomatic carriers supported the conclusion that clusters of protein-truncating mutations are highly unusual. Our results demonstrate that bacterial diversity in vivo is limited but nonetheless detectable by whole-genome sequencing, enabling the study of evolutionary dynamics within the host. Regulatory or structural changes that occur during carriage may be functionally important for pathogenesis; therefore identifying those changes is a crucial step in understanding the biological causes of invasive bacterial disease.
Abstract:
The K-Means algorithm for cluster analysis is one of the most influential and popular data mining methods. Its straightforward parallel formulation is well suited for distributed memory systems with reliable interconnection networks, such as massively parallel processors and clusters of workstations. However, in large-scale geographically distributed systems the straightforward parallel algorithm can be rendered useless by a single communication failure or high latency in communication paths. The lack of scalable and fault tolerant global communication and synchronisation methods in large-scale systems has hindered the adoption of the K-Means algorithm for applications in large networked systems such as wireless sensor networks, peer-to-peer systems and mobile ad hoc networks. This work proposes a fully distributed K-Means algorithm (EpidemicK-Means) which does not require global communication and is intrinsically fault tolerant. The proposed distributed K-Means algorithm provides a clustering solution which can approximate the solution of an ideal centralised algorithm over the aggregated data as closely as desired. A comparative performance analysis is carried out against state-of-the-art sampling methods and shows that the proposed method overcomes the limitations of the sampling-based approaches for skewed cluster distributions. The experimental analysis confirms that the proposed algorithm is very accurate and fault tolerant under unreliable network conditions (message loss and node failures) and is suitable for asynchronous networks of very large and extreme scale.
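A conceptual, single-process simulation of the approach: push-pull gossip averaging of per-cluster statistics replaces global communication, so every node converges on the same centroids without a coordinator. Names and parameters are illustrative, not the paper's EpidemicK-Means code.

```python
# Each node computes local per-cluster sums/counts, then random pairs of
# nodes average their statistics (gossip); the averaged statistics yield
# the same centroid update a centralised K-Means would compute.
import numpy as np

rng = np.random.default_rng(1)
n_nodes, k = 8, 3
data = [rng.random((50, 2)) for _ in range(n_nodes)]    # local datasets
centroids = rng.random((k, 2))                           # shared initial guess

for _ in range(10):                                      # K-Means iterations
    stats = []
    for X in data:                                       # local statistics
        labels = np.argmin(((X[:, None] - centroids) ** 2).sum(-1), axis=1)
        sums = np.array([X[labels == j].sum(0) for j in range(k)])
        counts = np.array([(labels == j).sum() for j in range(k)], float)
        stats.append([sums, counts])
    for _ in range(30):                                  # gossip rounds
        a, b = rng.choice(n_nodes, 2, replace=False)
        for i in range(2):
            avg = (stats[a][i] + stats[b][i]) / 2
            stats[a][i] = stats[b][i] = avg
    # After gossip, every node holds (approximately) the global averages,
    # so each recomputes the same centroids independently.
    sums, counts = stats[0]
    centroids = sums / np.maximum(counts, 1)[:, None]

print(centroids)
```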
Abstract:
ESA's first multi-satellite mission Cluster is unique in its concept of four satellites orbiting in controlled formations. This will give an unprecedented opportunity to study the structure and dynamics of the magnetosphere. In this paper we discuss ways in which ground-based remote-sensing observations of the ionosphere can be used to support the multipoint in-situ satellite measurements. There are a very large number of potentially useful configurations between the satellites and any one ground-based observatory; however, the number of ideal occurrences of any one configuration is low. Many of the ground-based instruments cannot operate continuously, and Cluster will take data only for a part of each orbit, depending on how much high-resolution ('burst-mode') data are acquired. In addition, there are a great many instrument modes and the formation, size and shape of the cluster of the four satellites to consider. These circumstances create a clear and pressing need for careful planning to ensure that the scientific return from Cluster is maximised by additional coordinated ground-based observations. For this reason, ESA established a working group to coordinate the observations on the ground with Cluster. We give a number of examples of how the combined spacecraft and ground-based observations can address outstanding questions in magnetospheric physics. An online computer tool has been prepared to allow for the planning of conjunctions and advantageous constellations between the Cluster spacecraft and individual or combined ground-based systems. During the mission, a ground-based database containing index and summary data will help to identify interesting datasets and allow the selection of intervals for coordinated studies. We illustrate the philosophy of our approach using a few important examples of the many possible configurations between the satellites and the ground-based instruments.
Abstract:
Broccoli, a rich source of glucosinolates, is a commonly consumed vegetable of the Brassica family. The hydrolysis products of glucosinolates, isothiocyanates, have been associated with health benefits and contribute to the flavour of Brassica. However, boiling broccoli causes the myrosinase enzyme needed for hydrolysis to denature. To ensure hydrolysis, broccoli must either be mildly cooked, or active sources of myrosinase, such as mustard seed powder, can be added after cooking. In this study, samples of broccoli were prepared in six different ways: standard boiling with and without mustard seeds, sous-vide cooking at low temperature (70 °C), and sous-vide cooking at higher temperature (100 °C) without mustard and with mustard at two different concentrations. The majority of consumers disliked the mildly cooked broccoli samples (70 °C, 12 min, sous-vide), which had a hard and stringy texture. The highest mean consumer liking was for standard boiled samples (100 °C, 7 min). Addition of 1% mustard seed powder developed sensory attributes such as pungency, burning sensation, and mustard odour and flavour. One cluster of consumers (32%) found mustard seeds to be a good complement to cooked broccoli; however, the majority disliked the mustard-derived sensory attributes. Where the mustard seeds were partially processed, doubling the addition to 2% led to only the same level of mustard flavour and pungency as 1% unprocessed seeds, and mean consumer liking remained unaltered. This suggests that optimising the addition level of partially processed mustard seeds may be a route to enhancing the bioactivity of cooked broccoli without compromising consumer acceptability.
Abstract:
The hypertrophic agonist endothelin-1 rapidly but transiently activates the extracellular signal-regulated kinase 1/2 (ERK1/2) cascade (and other signalling pathways) in cardiac myocytes, but the events linking this to hypertrophy are not understood. Using Affymetrix rat U34A microarrays, we identified the short-term (2-4 h) changes in gene expression induced in neonatal myocytes by endothelin-1 alone or in combination with the ERK1/2 cascade inhibitor, U0126. Expression of 15 genes was significantly changed by U0126 alone, and expression of an additional 78 genes was significantly changed by endothelin-1. Of the genes upregulated by U0126, four are classically induced through the aryl hydrocarbon receptor (AhR) by dioxins, suggesting that U0126 activates the xenobiotic response element in cardiac myocytes, potentially independently of effects on ERK1/2 signalling. The 78 genes showing altered expression with endothelin-1 formed five clusters: (i) three clusters showing upregulation by endothelin-1 according to time course (4 h > 2 h; 2 h > 4 h; 2 h ≈ 4 h), with at least partial inhibition by U0126; (ii) a cluster of 11 genes upregulated by endothelin-1 but unaffected by U0126, suggesting regulation through signalling pathways other than ERK1/2; (iii) a cluster of six genes downregulated by endothelin-1 with attenuation by U0126. Thus, U0126 apparently activates the AhR in cardiac myocytes (which must be taken into account in protracted studies), but careful analysis allows identification of genes potentially regulated acutely via the ERK1/2 cascade. Our data suggest that the majority of changes in gene expression induced by endothelin-1 are mediated by the ERK1/2 cascade.