Biblioteca Digital

956 results for Cluster Counting Algorithm

An efficient algorithm to perform multiple testing in epistasis screening

Relevance:

30.00% 30.00%

Publisher:

Abstract:

Background: Research in epistasis or gene-gene interaction detection for human complex traits has grown over the last few years. It has been marked by promising methodological developments, improved efforts to translate statistical epistasis into biological epistasis, and attempts to integrate different omics information sources into epistasis screening to enhance power. The quest for gene-gene interactions poses severe multiple-testing problems. In this context, the maxT algorithm is one technique to control the false-positive rate. However, the memory needed by this algorithm rises linearly with the number of hypothesis tests. Gene-gene interaction studies will require memory proportional to the squared number of SNPs. A genome-wide epistasis search would therefore require terabytes of memory. Hence, cache problems are likely to occur, increasing the computation time. In this work we present a new version of maxT, requiring an amount of memory independent of the number of genetic effects to be investigated. This algorithm was implemented in C++ in our epistasis screening software MBMDR-3.0.3. We evaluate the new implementation in terms of memory efficiency and speed using simulated data. The software is illustrated on real-life data for Crohn’s disease.
Results: In the case of a binary (affected/unaffected) trait, the parallel workflow of MBMDR-3.0.3 analyzes all gene-gene interactions in a dataset of 100,000 SNPs typed on 1000 individuals within 4 days and 9 hours, using 999 permutations of the trait to assess statistical significance, on a cluster composed of 10 blades, each containing four Quad-Core AMD Opteron(tm) 2352 processors at 2.1 GHz. In the case of a continuous trait, a similar run takes 9 days. Our program found 14 SNP-SNP interactions with a multiple-testing corrected p-value of less than 0.05 on real-life Crohn’s disease (CD) data.
Conclusions: Our software is the first implementation of the MB-MDR methodology able to solve large-scale SNP-SNP interaction problems within a few days, without using much memory, while adequately controlling the type I error rates. A new implementation to reach genome-wide epistasis screening is under construction. In the context of Crohn’s disease, MBMDR-3.0.3 could identify epistasis involving regions that are well known in the field and could be explained from a biological point of view. This demonstrates the power of our software to find relevant phenotype-genotype higher-order associations.
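
The memory argument in this abstract can be illustrated with a minimal Python sketch of the single-step maxT adjustment, in which only the maximum statistic of each permutation is kept and the per-permutation statistics are consumed as a stream. This is not the MBMDR-3.0.3 implementation: the function name `max_t_adjusted_p` and the input layout are assumptions made for illustration, and the sketch still keeps one counter per observed statistic, which the method described in the abstract reportedly avoids in a way the abstract does not detail.

```python
def max_t_adjusted_p(observed, permuted_rows):
    """Single-step maxT adjusted p-values (illustrative sketch).

    observed      : list of test statistics on the real trait
    permuted_rows : iterable; each item is an iterable of the statistics
                    recomputed on one permutation of the trait
    """
    exceed = [0] * len(observed)   # one counter per observed statistic
    n_perm = 0
    for row in permuted_rows:
        row_max = max(row)         # only the maximum of the permutation is kept
        n_perm += 1
        for i, stat in enumerate(observed):
            if row_max >= stat:
                exceed[i] += 1
    # Standard permutation p-value with the +1 correction.
    return [(1 + e) / (1 + n_perm) for e in exceed]


if __name__ == "__main__":
    perms = ([1.0, 2.0], [0.5, 6.0], [3.0, 0.1])
    print(max_t_adjusted_p([5.0, 1.0], perms))
```

Because `permuted_rows` can be a generator, the statistics of a permutation never need to be stored once their maximum has been taken.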

Evaluation of performance analysis software for Linux computing cluster

Relevance:

30.00% 30.00%

Publisher:

Abstract:

Performance measurements of computer system components and software yield information that can be used to improve performance and to support hardware procurement decisions. This thesis examines performance measurement and measurement programs, so-called benchmark software. Various freely available benchmark programs suitable for analyzing the performance of a Linux computing cluster were sought out and evaluated. The benchmarks were grouped and evaluated by testing their features on a Linux cluster. The thesis also addresses the challenges of making measurements and of parallel computing. Benchmarks were found for many purposes, and they proved to vary in quality and scope. They have also been collected into software suites to give a broader picture of hardware performance than a single program can provide. It is essential to understand the speed at which data can be transferred to the processor from main memory, from disk systems, and from other compute nodes. A typical benchmark program contains a computation-intensive mathematical algorithm of the kind used in scientific software. Depending on the benchmark, understanding and making use of the results can be challenging.

Cluster-based active learning for compact image classification

Relevance:

30.00% 30.00%

Publisher:

Abstract:

In this paper, we consider active sampling to label pixels grouped with hierarchical clustering. The objective of the method is to match the data relationships discovered by the clustering algorithm with the user's desired class semantics. The first is represented as a complete tree to be pruned and the second is iteratively provided by the user. The proposed active learning algorithm searches for the pruning of the tree that best matches the labels of the sampled points. By choosing the part of the tree to sample from according to the current pruning's uncertainty, sampling is focused on the most uncertain clusters. This way, large clusters for which the class membership is already fixed are no longer queried, and sampling is focused on the division of clusters showing mixed labels. The model is tested on a very high resolution (VHR) image in a multiclass classification setting. The method clearly outperforms random sampling in a transductive setting, but cannot generalize to unseen data, since it aims at optimizing the classification of a given cluster structure.

Coscheduling techniques and monitoring tools for non-dedicated cluster computing

Relevance:

30.00% 30.00%

Publisher:

Abstract:

Our efforts are directed towards understanding the coscheduling mechanism in a network of workstations (NOW) when a parallel job is executed jointly with local workloads, balancing parallel performance against the local interactive response. Explicit and implicit coscheduling techniques have been implemented in a PVM-Linux NOW (or cluster). Furthermore, dynamic coscheduling remains an open question when parallel jobs are executed in a non-dedicated cluster. A basic model for dynamic coscheduling in cluster systems is presented in this paper, and one dynamic coscheduling algorithm for this model is proposed. The applicability of this algorithm has been demonstrated and its performance analyzed by simulation. Finally, a new tool (named Monito) for monitoring the different message queues in such an environment is presented. The main aim of implementing this facility is to provide a means of capturing the bottlenecks and overheads of the communication system in a PVM-Linux cluster.

Sherali-Adams Relaxations and Indistinguishability in Counting Logics

Relevance:

30.00% 30.00%

Publisher:

Abstract:

Two graphs with adjacency matrices $\mathbf{A}$ and $\mathbf{B}$ are isomorphic if there exists a permutation matrix $\mathbf{P}$ for which the identity $\mathbf{P}^{\mathrm{T}} \mathbf{A} \mathbf{P} = \mathbf{B}$ holds. Multiplying through by $\mathbf{P}$ and relaxing the permutation matrix to a doubly stochastic matrix leads to the linear programming relaxation known as fractional isomorphism. We show that the levels of the Sherali--Adams (SA) hierarchy of linear programming relaxations applied to fractional isomorphism interleave in power with the levels of a well-known color-refinement heuristic for graph isomorphism called the Weisfeiler--Lehman algorithm, or, equivalently, with the levels of indistinguishability in a logic with counting quantifiers and a bounded number of variables. This tight connection has quite striking consequences. For example, it follows immediately from a deep result of Grohe in the context of logics with counting quantifiers that a fixed number of levels of SA suffice to determine isomorphism of planar and minor-free graphs. We also offer applications in both finite model theory and polyhedral combinatorics. First, we show that certain properties of graphs, such as that of having a flow circulation of a prescribed value, are definable in the infinitary logic with counting with a bounded number of variables. Second, we exploit a lower bound construction due to Cai, Fürer, and Immerman in the context of counting logics to give simple explicit instances that show that the SA relaxations of the vertex-cover and cut polytopes do not reach their integer hulls for up to $\Omega(n)$ levels, where $n$ is the number of vertices in the graph.
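
As an illustration of the color-refinement heuristic named in this abstract, the following minimal Python sketch runs one-dimensional Weisfeiler-Lehman refinement on the disjoint union of two graphs and reports whether their stable color histograms differ. The function name `wl_distinguishes` and the adjacency-dictionary input format are assumptions made for illustration; the sketch covers only the lowest level of the hierarchy, not the Sherali-Adams relaxations studied in the paper.

```python
from collections import Counter


def wl_distinguishes(graph_a, graph_b):
    """Return True if 1-dimensional color refinement tells the graphs apart.

    Each graph is a dict mapping a vertex to a list of its neighbours.
    """
    # Refine both graphs together so that colors are comparable.
    union = {}
    for tag, graph in ((0, graph_a), (1, graph_b)):
        for v, nbrs in graph.items():
            union[(tag, v)] = [(tag, u) for u in nbrs]

    colors = {v: 0 for v in union}
    while True:
        # New signature: own color plus the sorted multiset of neighbour colors.
        sigs = {v: (colors[v], tuple(sorted(colors[u] for u in union[v])))
                for v in union}
        palette = {s: i for i, s in enumerate(sorted(set(sigs.values())))}
        if len(palette) == len(set(colors.values())):
            break  # no class was split, so the partition is stable
        colors = {v: palette[sigs[v]] for v in union}

    hist = [Counter(c for (tag, _), c in colors.items() if tag == t)
            for t in (0, 1)]
    return hist[0] != hist[1]


if __name__ == "__main__":
    six_cycle = {i: [(i - 1) % 6, (i + 1) % 6] for i in range(6)}
    two_triangles = {0: [1, 2], 1: [0, 2], 2: [0, 1],
                     3: [4, 5], 4: [3, 5], 5: [3, 4]}
    print(wl_distinguishes(six_cycle, two_triangles))
```

A 6-cycle and two disjoint triangles are both 2-regular, so this level of refinement cannot separate them even though they are not isomorphic; higher levels of the hierarchy are needed for such pairs.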

Development of a distributed simulation environment on a cluster of workstations

Relevance:

30.00% 30.00%

Publisher:

Abstract:

Simulation has traditionally been used for analyzing the behavior of complex real-world problems. Even though only some features of the problems are considered, simulation time tends to become quite high even for common simulation problems. Parallel and distributed simulation is a viable technique for accelerating the simulations. The success of parallel simulation depends heavily on the combination of the simulation application, algorithm and environment. In this thesis a conservative parallel simulation algorithm is applied to the simulation of a cellular network application in a distributed workstation environment. This thesis presents a distributed simulation environment, Diworse, which is based on the use of networked workstations. The distributed environment is considered especially hard for conservative simulation algorithms due to the high cost of communication. In this thesis, however, the distributed environment is shown to be a viable alternative if the amount of communication is kept reasonable. Novel ideas of multiple message simulation and channel reduction enable efficient use of this environment for the simulation of a cellular network application. The distribution of the simulation is based on a modification of the well-known Chandy-Misra deadlock avoidance algorithm with null messages. The basic Chandy-Misra algorithm is modified by using the null message cancellation and multiple message simulation techniques. The modifications reduce the number of null messages and the time required for their execution, thus reducing the simulation time required. The null message cancellation technique reduces the processing time of null messages, as an arriving null message cancels other unprocessed null messages. The multiple message simulation forms groups of messages, as it simulates several messages before it releases the newly created messages. If the message population in the simulation is sufficient, no additional delay is caused by this operation.
A new technique for considering the simulation application is also presented. The performance is improved by establishing a neighborhood for the simulation elements. The neighborhood concept is based on a channel reduction technique, where the properties of the application exclusively determine which connections are necessary when a certain accuracy for simulation results is required. Distributed simulation is also analyzed in order to find out the effect of the different elements in the implemented simulation environment. This analysis is performed by using critical path analysis, which allows determination of a lower bound for the simulation time. In this thesis critical times are computed for sequential and parallel traces. The analysis based on sequential traces reveals the parallel properties of the application, whereas the analysis based on parallel traces reveals the properties of the environment and the distribution.
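
The critical path idea mentioned above can be sketched in a few lines of Python: given a trace of events with processing times and dependencies, the longest dependency chain is a lower bound on the parallel simulation time. The trace format (`durations`, `deps`) and the function name are assumptions made for illustration, not the format used by Diworse.

```python
from functools import lru_cache


def critical_path_length(durations, deps):
    """Length of the longest dependency chain in an event trace.

    durations : dict mapping event -> processing time
    deps      : dict mapping event -> list of events that must finish first
                (assumed to be acyclic)
    """
    @lru_cache(maxsize=None)
    def finish(event):
        # Earliest finish time: own duration after the latest predecessor.
        latest = max((finish(p) for p in deps.get(event, ())), default=0)
        return durations[event] + latest

    return max(finish(e) for e in durations)


if __name__ == "__main__":
    durations = {"a": 2, "b": 3, "c": 1, "d": 4}
    deps = {"c": ["a", "b"], "d": ["c"]}
    print(critical_path_length(durations, deps))
```

The recursion is adequate for short traces; a long trace would call for an iterative traversal in topological order instead.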

An algorithm for identifying agent-k-linked allocations in economies with indivisibilities

Relevance:

30.00% 30.00%

Publisher:

Abstract:

We consider envy-free (and budget-balanced) rules that are least manipulable with respect to agents counting or with respect to utility gains. Recently it has been shown that for any profile of quasi-linear preferences, the outcome of any such least manipulable envy-free rule can be obtained via agent-k-linked allocations. This note provides an algorithm for identifying agent-k-linked allocations.

A heuristic algorithm for the Capacitated Vehicle Routing Problem with Synchronized Pick-ups and Drop-offs : a case study for medications delivery and supervision in DR Congo

Relevance:

30.00% 30.00%

Publisher:

Abstract:

In post-emergency contexts such as the one experienced by the western part of the Democratic Republic of the Congo (DRC), one of the crucial challenges facing rural hospitals is maintaining a stock of essential medicines in the pharmacy. Without these medicines to treat serious illnesses, the impact on the health of the population is significant. Hospitals also incur financial losses due to expiry when too many medicines are ordered. Moreover, the costs of transporting both the medicines and the supervisor are very high for isolated hospitals; transport costs alone can exceed the cost of the medicines. Using Bandundu province, DRC, as a case study, our research attempts to determine the feasibility (in terms of both the complexity of the problem and the potential savings) of a synchronized routing problem for medicine delivery and supervision visits. We propose a formulation of the capacitated vehicle routing problem that handles several new requirements, namely synchronization of activities, precedence, and two activity frequencies. We implement a "cluster first, route second" heuristic with a geospatial database to solve the problem. We also present a web tool for visualizing the solutions on maps. The preliminary results of our study suggest that a synchronized solution could allow rural hospitals to increase the accessibility of medical services for rural populations with a modest increase in the current transport cost.
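
The general shape of a "cluster first, route second" heuristic can be sketched as follows. This is a generic textbook-style illustration, not the authors' method: it groups customers by polar angle around the depot until a vehicle's capacity is reached, then orders each group with a nearest-neighbour rule. It ignores the synchronization, precedence, and frequency requirements described in the abstract, and all names and the input layout are assumptions.

```python
import math


def cluster_first_route_second(depot, customers, capacity):
    """Sweep-style clustering followed by nearest-neighbour routing.

    depot     : (x, y) tuple
    customers : dict mapping name -> (x, y, demand)
    capacity  : vehicle capacity; each demand is assumed not to exceed it
    """
    def angle(name):
        x, y, _ = customers[name]
        return math.atan2(y - depot[1], x - depot[0])

    # Cluster first: sweep by angle, closing a cluster when capacity is hit.
    clusters, current, load = [], [], 0
    for name in sorted(customers, key=angle):
        demand = customers[name][2]
        if current and load + demand > capacity:
            clusters.append(current)
            current, load = [], 0
        current.append(name)
        load += demand
    if current:
        clusters.append(current)

    # Route second: visit the nearest unvisited customer in each cluster.
    routes = []
    for cluster in clusters:
        position, remaining, route = depot, sorted(cluster), []
        while remaining:
            nearest = min(remaining,
                          key=lambda n: math.dist(position, customers[n][:2]))
            remaining.remove(nearest)
            route.append(nearest)
            position = customers[nearest][:2]
        routes.append(route)
    return routes


if __name__ == "__main__":
    hospitals = {"A": (1, 0, 1), "B": (2, 0.1, 1),
                 "C": (-1, 0.5, 1), "D": (-2, 0.5, 1)}
    print(cluster_first_route_second((0, 0), hospitals, capacity=2))
```

Separating the two stages keeps each subproblem small, which is the usual motivation for this family of heuristics.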

An adaptive cluster based routing scheme for mobile wireless sensor networks

Relevance:

30.00% 30.00%

Publisher:

Abstract:

Clustering schemes improve the energy efficiency of wireless sensor networks. The inclusion of mobility as a new criterion for cluster creation and maintenance adds new challenges for these clustering schemes. In most algorithms, cluster formation and cluster head selection are done on a stochastic basis. In this paper we introduce a cluster formation and routing algorithm based on a mobility factor. The proposed algorithm is compared with the LEACH-M protocol on the following metrics: number of cluster head transitions, average residual energy, number of alive nodes, and number of messages lost.

Linear hashtable motion estimation algorithm for distributed video processing

Relevance:

30.00% 30.00%

Publisher:

Abstract:

This paper presents a parallel Linear Hashtable Motion Estimation Algorithm (LHMEA). Most parallel video compression algorithms focus on the Group of Pictures (GOP). Based on the LHMEA we proposed earlier [1][2], we developed a parallel motion estimation algorithm that focuses on parallelism inside a frame. We divide each reference frame into equally sized regions, which are processed in parallel to increase the encoding speed significantly. The theoretical and practical speed-ups of parallel LHMEA as a function of the number of PCs in the cluster are compared and discussed. Motion Vectors (MV) are generated by the first-pass LHMEA and used as predictors for the second-pass Hexagonal Search (HEXBS) motion estimation, which searches only a small number of Macroblocks (MBs). We evaluated a distributed parallel implementation of LHMEA of TPA for real-time video compression.
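
The intra-frame partitioning described here can be illustrated with a small Python sketch that splits a frame into bands of rows and processes each band in a separate worker. It is only a structural illustration: the per-region work is a plain sum of absolute differences rather than LHMEA, the helper names are hypothetical, and Python threads stand in for the PCs of a cluster (they do not speed up CPU-bound work in CPython).

```python
from concurrent.futures import ThreadPoolExecutor


def split_rows(height, workers):
    """Split `height` rows into `workers` bands of near-equal size."""
    base, extra = divmod(height, workers)
    bounds, start = [], 0
    for i in range(workers):
        end = start + base + (1 if i < extra else 0)
        bounds.append((start, end))
        start = end
    return bounds


def region_sad(current, reference, rows):
    """Sum of absolute differences over one band of rows."""
    start, end = rows
    return sum(abs(a - b)
               for r in range(start, end)
               for a, b in zip(current[r], reference[r]))


def parallel_sad(current, reference, workers):
    """Process every band in its own worker and collect per-band results."""
    bounds = split_rows(len(current), workers)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda rows: region_sad(current, reference, rows),
                             bounds))


if __name__ == "__main__":
    cur = [[1, 2], [3, 4], [5, 6], [7, 8]]
    ref = [[0, 0]] * 4
    print(parallel_sad(cur, ref, workers=2))
```

With ideal load balance and no communication cost, the theoretical speed-up of such a split equals the number of workers; measured speed-ups are typically lower.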

Robust background model for pixel based people counting using a single uncalibrated camera

Relevance:

30.00% 30.00%

Publisher:

Abstract:

Several pixel-based people counting methods have been developed over the years. Among these, the product of scale-weighted pixel sums and a linear correlation coefficient is a popular people counting approach. However, most approaches have paid little attention to resolving the true background and instead take all foreground pixels into account. With large crowds moving at varying speeds and with the presence of other moving objects such as vehicles, this approach is prone to problems. In this paper we present a method which concentrates on determining the true foreground, i.e. human-image pixels only. To do this we have proposed, implemented and comparatively evaluated a human detection layer to make people counting more robust in the presence of noise and in the absence of empty background sequences. We show the effect of combining human detection with a pixel-map based algorithm to i) count only human-classified pixels and ii) prevent foreground pixels belonging to humans from being absorbed into the background model. We evaluate the performance of this approach on the PETS 2009 dataset using various configurations of the proposed methods. Our evaluation demonstrates that the basic benchmark method we implemented can achieve an accuracy of up to 87% on sequence "S1.L1 13-57 View 001", and our proposed approach can achieve up to 82% on sequence "S1.L3 14-33 View 001", where the crowd stops and the benchmark accuracy falls to 64%.

Intelligent agents for fault tolerance: from multi-agent simulation to cluster-based implementation

Relevance:

30.00% 30.00%

Publisher:

Abstract:

Recent research in multi-agent systems incorporates fault tolerance concepts, but does not explore the extension and implementation of such ideas for large-scale parallel computing systems. The work reported in this paper investigates a swarm array computing approach, namely 'Intelligent Agents'. A task to be executed on a parallel computing system is decomposed into sub-tasks and mapped onto agents that traverse an abstracted hardware layer. The agents intercommunicate across processors to share information in the event of a predicted core/processor failure and to complete the task successfully. The feasibility of the approach is validated by simulations on an FPGA using a multi-agent simulator, and by the implementation of a parallel reduction algorithm on a computer cluster using the Message Passing Interface.
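
For readers unfamiliar with parallel reduction, the following Python sketch simulates the usual binary-tree communication pattern sequentially: in each round, every surviving "rank" combines its value with that of a partner a fixed stride away, so n values are reduced in about log2(n) rounds. It does not use MPI and does not model the agents or fault tolerance described in the paper; the function name is an assumption.

```python
import operator


def tree_reduce(values, op):
    """Combine `values` pairwise in a binary-tree pattern.

    Index i plays the role of a process rank; within one round all
    combinations are independent and could run in parallel.
    Expects a non-empty sequence.
    """
    vals = list(values)
    stride = 1
    while stride < len(vals):
        # Rank i receives from rank i + stride and combines the two values.
        for i in range(0, len(vals) - stride, 2 * stride):
            vals[i] = op(vals[i], vals[i + stride])
        stride *= 2
    return vals[0]


if __name__ == "__main__":
    print(tree_reduce([1, 2, 3, 4, 5], operator.add))
```

The operator should be associative, as with sums or maxima, so that the tree order gives the same result as a left-to-right reduction.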

Multivariable cluster analysis for high-speed industrial machinery

Relevance:

30.00% 30.00%

Publisher:

Abstract:

The overall operation and internal complexity of a particular piece of production machinery can be depicted in terms of clusters of multidimensional points which describe the process states, with the value in each point dimension representing a measured variable from the machinery. The paper describes a new cluster analysis technique for use with manufacturing processes, to illustrate how machine behaviour can be categorised and how regions of good and poor machine behaviour can be identified. The cluster algorithm presented is the novel mean-tracking algorithm, capable of locating N-dimensional clusters in a large data space in which a considerable amount of noise is present. Implementation of the algorithm on a real-world high-speed machinery application is described, with clusters being formed from machinery data to indicate machinery error regions and error-free regions. This analysis is seen to provide a promising step ahead in the field of multivariable control of manufacturing systems.

On merging gradient estimation with mean-tracking techniques for cluster identification

Relevance:

30.00% 30.00%

Publisher:

Abstract:

This paper discusses how numerical gradient estimation methods may be used in order to reduce the computational demands on a class of multidimensional clustering algorithms. The study is motivated by the recognition that several current point-density based cluster identification algorithms could benefit from a reduction of computational demand if approximate a-priori estimates of the cluster centres present in a given data set could be supplied as starting conditions for these algorithms. In this particular presentation, the algorithm shown to benefit from the technique is the Mean-Tracking (M-T) cluster algorithm, but the results obtained from the gradient estimation approach may also be applied to other clustering algorithms and their related disciplines.

Neural network basis function center selection using cluster analysis

Relevance:

30.00% 30.00%

Publisher:

Abstract:

This paper deals with the selection of centres for radial basis function (RBF) networks. A novel mean-tracking clustering algorithm is described as a way in which centres can be chosen based on a batch of collected data. A direct comparison is made between the mean-tracking algorithm and k-means clustering, and it is shown that mean-tracking clustering is significantly better in terms of achieving an RBF network which performs accurate function modelling.
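
The k-means baseline against which the paper compares can be sketched as below, with the resulting cluster means serving as candidate RBF centres. The mean-tracking algorithm itself is not described in the abstract and is not reproduced here; the function name and the choice of the first k points as initial centres are assumptions made to keep the example deterministic.

```python
def kmeans_centres(points, k, iterations=100):
    """Plain k-means (Lloyd's algorithm) on a list of equal-length tuples.

    Initial centres are the first k points (an assumption for determinism).
    Returns the list of k centres.
    """
    centres = [tuple(float(c) for c in p) for p in points[:k]]
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centre.
        groups = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k),
                          key=lambda j: sum((a - b) ** 2
                                            for a, b in zip(p, centres[j])))
            groups[nearest].append(p)
        # Update step: move each centre to the mean of its group;
        # an empty group keeps its previous centre.
        updated = [tuple(sum(col) / len(g) for col in zip(*g)) if g else centres[j]
                   for j, g in enumerate(groups)]
        if updated == centres:
            break
        centres = updated
    return centres


if __name__ == "__main__":
    data = [(0, 0), (0, 1), (10, 10), (10, 11)]
    print(kmeans_centres(data, k=2))
```

Each returned centre would then be used as the centre of one radial basis function, with widths and output weights fitted separately.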
