Assefaw, Parallel Algorithms

Assefaw Gebremedhin, Papers on Parallel Graph Algorithms

Title : Balanced Coloring for Parallel Computing Applications
Authors: H. Lu, M. Halappanavar, D. Charri-a-Miranda, A.H. Gebremedhin and A. Kalyanaraman
Status: Proceedings of IPDPS 2015.

Abstract

Graph coloring is used to identify subsets of independent tasks in parallel scientific computing applications. Traditional coloring heuristics aim to reduce the number of colors used as that number also corresponds to the number of parallel steps in the application. However, if the color classes produced have a skew in their sizes, utilization of hardware resources becomes inefficient, especially for the smaller color classes. Equitable coloring is a theoretical formulation of coloring that guarantees a perfect balance among color classes, and its practical relaxation is referred to as balanced coloring. In this paper, we revisit the problem of balanced coloring in the context of parallel computing. The goal is to achieve a balanced coloring of an input graph without increasing the number of colors that an algorithm oblivious to balance would have used. We propose and study multiple heuristics that aim to achieve such a balanced coloring, present parallelization approaches for multi-core and manycore architectures, and cross-evaluate their effectiveness with respect to the quality of balance achieved and performance. Furthermore, we study the impact of the proposed balanced coloring heuristics on a concrete application - viz. parallel community detection, which is an example of an irregular application. The thorough treatment of balanced coloring presented in this paper from algorithms to application is expected to serve as a valuable resource to parallel application developers who seek to improve parallel performance of their applications using coloring.

Download paper in PDF

Title: Parallel Maximum Clique Algorithms with Applications to Network Analysis and Storage
Authors: R.A. Rossi, D.F. Gleich, A.H. Gebremedhin, M.M.A. Patwary
Status: Proceedings of WWW2014.

Abstract

We propose a fast, parallel maximum clique algorithm for large sparse graphs that is designed to exploit characteristics of social and information networks. Despite clique's status as an NP-hard problem with poor approximation guarantees, our method exhibits nearly linear runtime scaling over real-world networks ranging from 1000 to 100 million nodes. In a test on a social network with 1.8 billion edges, the algorithm finds the largest clique in about 20 minutes. Key to the efficiency of our algorithm are an initial heuristic procedure that finds a large clique quickly and a parallelized branch and bound strategy with aggressive pruning tnd ordering echniques. We use the algorithm to compute the largest temporal strong components of temporal contact networks.

Download paper in PDF

Title: Graph Coloring Algorithms for Multi-core and Massively Multithreaded Architectures
Authors: U. Catalyurek, J. Feo, A.H. Gebremedhin, M. Halappanavar, A. Pothen
Status: Parallel Computing 38 (2012), 576-594.

Abstract

We explore the interplay between architectures and algorithm design in the context of shared-memory platforms and a specific graph problem of central importance in scientific and high-performance computing, distance-1 graph coloring. We introduce two different kinds of multithreaded heuristic algorithms for the stated, NP-hard, problem. The first algorithm relies on speculation and iteration, and is suitable for any shared-memory system. The second algorithm uses dataflow principles, and is targeted at the non-conventional, massively multithreaded Cray XMT system. We study the performance of the algorithms on three different platforms---Cray XMT, Sun Niagara 2, and Intel Nehalem---representing varying degrees of multithreading capabilities. As testbed, we use synthetically generated massive graphs carefully designed to cover a wide spectrum of input types. The results show that the algorithms have scalable runtime performance and use nearly the same number of colors as the underlying serial algorithm, which in turn is effective in practice. The study provides insight into the design of high performance algorithms for irregular problems on many-core architectures.

Download paper in PDF

Title: New Multithreaded Ordering and Coloring Algorithms for Multicore Architectures
Authors: M.M.A. Patwary, A.H. Gebremedhin and A. Pothen
Status: In E. Jeannot, R. Namyst and J. Roman, editors, Euro-Par 2011, volume 6853 of Lecture Notes in Computer Science, pages 250 -- 262. Springer, 2011.

Abstract

We present new multithreaded vertex ordering and distance-k graph coloring algorithms that are well-suited for the emerging and rapidly growing multicore platforms. The vertex ordering techniques rely on various notions of ``degree", are known to be effective in reducing the number of colors used by a greedy coloring algorithm, and are generic enough to be applicable to contexts other than coloring. We employ approximate degree computation in the ordering algorithms and speculation and iteration in the coloring algorithms as our primary remedies for breaking sequentiality and achieving effective parallelization. The algorithms have been implemented using OpenMP, and experiments run on Intel Nehalem and other multi-core machines using a set of carefully designed synthetic graphs and real-world graphs attest that the algorithms provide scalable runtime performance. The number of colors the algorithms use is nearly the same as in the serial case, which in turn is often very close to optimal.

Download paper in PDF

Title: Distributed-memory Parallel Algorithms for Matching and Coloring
Authors: U. Catalyurek, F. Dobrian, A.H. Gebremedhin, M. Halappanavar, A. Pothen
Status: Proceedings of IEEE International Parallel and Distributed Processing Symposium, Workshops and PhD Forums (IPDPSW), Workshop on Parallel Computing and Optimization (PCO'11), pages 1966--1975, 2011

Abstract

We discuss the design and implementation of new highly-scalable distributed-memory parallel algorithms for two prototypical graph problems, edge-weighted matching and distance-1 vertex coloring. Graph algorithms in general have low concurrency, poor data locality, and high ratio of data access to computation costs, making it challenging to achieve scalability on massively parallel machines. We overcome this challenge by employing a variety of techniques, including speculation and iteration, optimized communication, and randomization. We present preliminary results on weak and strong scalability studies conducted on an IBM Blue Gene/P machine employing up to tens of thousands of processors. The results show that the algorithms hold strong potential for computing at petascale.

Download paper in PDF

Title: Distributed-memory Parallel Algorithms for Distance-2 Coloring and Related Problems in Derivative Computation
Authors: D. Bozdag, U. Catalyurek, A. Gebremedhin, F. Manne, E. Boman and F. Ozgunner
Status: SIAM Journal on Scientific Computing, Vol 32, Issue 4, pp 2418--2446, 2010.

Abstract

The distance-2 graph coloring problem aims at partitioning the vertex set of a graph into the fewest sets consisting of vertices pairwise at distance greater than two from each other. Its applications include derivative computation in numerical optimization and channel assignment in radio networks. We present efficient, distributed-memory, parallel heuristic algorithms for this NP-hard problem as well as for two related problems used in the computation of Jacobians and Hessians. Parallel speedup is achieved through graph partitioning, speculative (iterative) coloring, and a BSP-like organization of parallel computation. Results from experiments conducted on a PC cluster employing up to 96 processors and using large-size real-world as well as synthetically generated test graphs show that the algorithms are scalable. In terms of quality of solution, the algorithms perform remarkably well---the number of colors used by the parallel algorithms was observed to be very close to the number used by the sequential counterparts, which in turn are quite often near optimal. Moreover, the experimental results show that the parallel distance-2 coloring algorithm compares favorably with the alternative approach of solving the distance-2 coloring problem on a graph $G$ by first constructing the square graph $G^2$ and then applying a parallel distance-1 coloring algorithm on $G^2$. Implementations of the algorithms are made available via the Zoltan load-balancing library.

Download paper in PDF

Title: A framework for Scalable Greedy Coloring on Distributed Memory Parallel Computers
Authors: D. Bozdag, A. Gebremedhin, F. Manne, E. Boman and U. Catalyurek
Status: Journal of Parallel and Distributed Computing Vol 68, No 4, pp 515--535, 2008.

Abstract

We present a scalable framework for parallelizing greedy graph coloring algorithms on distributed-memory computers. The framework unifies several existing algorithms and blends a variety of techniques for creating or facilitating concurrency. The latter techniques include exploiting features of the initial data distribution, the use of speculative coloring and randomization, and a BSP-style organization of computation and communication. We experimentally study the performance of several specialized algorithms designed using the framework and implemented using MPI. The experiments are conducted on two different platforms and the test cases include large-size synthetic graphs as well as real graphs drawn from various application areas. Computational results show that implementations that yield good speedup while at the same time using about the same number of colors as a sequential greedy algorithm can be achieved by setting parameters of the framework in accordance with the size and structure of the graph being colored. Our implementation is freely available as part of the Zoltan parallel data management and load-balancing library.

Download paper in PDF

Title: A Parallel Distance-2 Graph Coloring Algorithm for Distributed Memory Computers
Authors: D. Bozdag, U. Catalyurek, A.H. Gebremedhin, F. Manne, E. G. Boman and F. Ozguner
Status: Lecture Notes in Computer Science, vol 3726, 2005, pages 796 - 806,Springer. Proc. of HPCC 2005, Sept 21 - 25, 2005, Sorrento, Italy.

Abstract

The distance-2 graph coloring problem aims at partitioning the vertex set of a graph into the fewest sets consisting of vertices pairwise at distance greater than two from each other. Application examples include numerical optimization and channel assignment. We present the first distributed-memory heuristic algorithm for this NP-hard problem. Parallel speedup is achieved through graph partitioning, speculative (iterative) coloring, and a BSP-like organization of computation. Experimental results show that the algorithm is scalable, and compares favorably with an alternative approach---solving the problem on a graph G by first constructing the square graph G^2 and then applying a parallel distance-1 coloring algorithm on G^2 .

Download paper in PDF

Title: A Scalable Parallel Graph Coloring Algorithm for Distributed Memory Computers
Authors: E.G. Boman, D. Bozdag, U. Catalyurek, A.H. Gebremedhin and F. Manne
Status: Lecture Notes in Computer Science, vol 3648 , 2005, pages 241 - 251,Springer. Proc. of EuroPar 2005, 30 Aug - 2 Sept, 2005, Lisboa, Portugal.

Abstract

In large-scale parallel applications a graph coloring is often carried out to schedule computational tasks. In this paper, we describe a new distributed-memory algorithm for doing the coloring itself in parallel. The algorithm operates in an iterati ve fashion; in each round vertices are speculatively colored based on limited information, and then a set of incorrectly colored vertices, to be recolored in the next round, is identified. Parallel speedup is achieved in part by reducing the frequency of communication among processors. Experimental results on a PC cluster using up to 16 processors show that the algorithm is scalable.

Download paper in PDF

Title: Speeding up Parallel Graph Coloring
Authors: A.H. Gebremedhin, F.Manne and T. Woods
Status: Lecture Notes in Computer Science, vol 3732, pp 1079-1088, 2005, Springer. Proc. of Para 2004, June 20 - 23, 2004, Lyngby, Denmark.

Abstract

This paper presents new efficient parallel algorithms for finding approximate solutions to graph coloring problems. We consider an existing shared memory parallel graph coloring algorithm and suggest several enhancements both in terms of ordering the vertices so as to minimize cache misses, and performing vertex-to- processor assignments based on graph partitioning instead of random allocation. We report experimental results that demonstrate the performance of our algorithms on an IBM Regatta supercomputer when up to 12 processors are used. Our implementations use OpenMP for parallelization and Metis for graph partitioning. The experiments show that we get up to a 70 % reduction in runtime compared to the previous algorithm.

Download paper in PDF

Title Graph Coloring on Coarse Grained Multicomputers
Authors: A. Gebremedhin, I.Guerrin-Lassous, J. Gustedt and J.A. Telle
Status: Discrete Applied Mathematics, Vol 131, No 1, pp 179--198, 2003

Abstract

We present an efficient and scalable Coarse Grained Multicomputer (CGM) coloring algorithm that colors a graph G with at most D + 1 colors where D is the maximum degree in G . This algorithm is given in two variants: a randomized and a deterministic . We show that on a p-processor CGM model the proposed algorithms require a parallel time of O(|G|/p) and a total work and overall communication cost of O(|G|) . These bounds correspond to the average case for the randomized version and to the worst case for the deterministic variant.

Download paper in PDF

Title: Parallel Distance-k Coloring Algorithms for Numerical Optimization
Authors: A.H. Gebremedhin, F. Manne and A. Pothen
Status: In B. Monien and R. Feldmann (Eds.): EuroPar 2002, Lecture Notes in Computer Science 2400, pp. 912-921, Springer-Verlag 2002.

Abstract

Matrix partitioning problems that arise in the efficient estimation of sparse Jacobians and Hessians can be modeled using variants of graph coloring problems. In a previous work, we argue that distance-2 coloring and distance-3/2 coloring [we now call this star coloring ] are robust and flexible formulations of the respective matrix estimation problems. The problem size in large-scale optimization contexts makes the matrix estimation phase an expensive part of the entire computation both in terms of execution time and memory space. Hence, there is a need for both shared- and distributed-memory parallel algorithms for the stated graph coloring problems. In the current work, we present the first practical shared address space parallel algorithms for these problems. The main idea in our algorithms is to randomly partition the vertex set equally among the available processors, let each processor speculatively color its vertices using information about already colored vertices, detect eventual conflicts in parallel, and finally re-color conflicting vertices sequentially. Randomization is also used in the coloring phases to further reduce conflicts. Our PRAM-analysis shows that the algorithms should give almost linear speedup for sparse graphs that are large relative to the number of processors. Experimental results from our OpenMP implementations on a Cray Origin2000 using various large graphs show that the algorithms indeed yield reasonable speedup for modest numbers of processors.

Download paper in PDF

Title: Scalable Parallel Graph Coloring Algorithms
Authors: A. Gebremedhin and F. Manne
Status: Concurrency: Practice and Expereince, Vol 12, pp1131--1146, 2000.

Abstract

Finding a good graph coloring quickly is often a crucial phase in the development of efficient, parallel algorithms for many scientific and engineering applications. In this paper we consider the problem of solving the graph coloring problem itself in parallel. We present a simple and fast parallel graph coloring heuristic that is well suited for shared memory programming and yields an almost linear speedup on the PRAM model. We also present a second heuristic that improves on the number of colors used. The heuristics have been implemented using OpenMP. Experiments conducted on an SGI Cray Origin 2000 super computer using very large graphs from finite element methods and eigenvalue computations validate the theoretical run-time analysis.

Download paper in PDF

Title: Graph Coloring on a Coarse Grained Multiprocessor
Authors: A.H. Gebremedhin, I.G. Lassous, J. Gustedt and J.A. Telle
Status: In Brandes, Ulrik, Wagner and Dorothea (Eds.): WG 2000, Lecture Notes in Computer Science 1928, pp. 184-195, 2000, Springer-Verlag.

Abstract

We present the first efficient parallel coloring algorithm for the Coarse Grain Multicomputer model. The algorithm uses at most D+1 colors, where D is the maximum degree in the graph.

Download paper in PDF

Title: Parallel Graph Coloring Algorithms using OpenMP
Authors: A.H. Gebremedhin and F. Manne
Status: In Proc. of EWOMP'99, First European Workshop on OpenMP, Sept.30 - Oct. 1, 1999, Lund, Sweden.

Download Extended Abstract in PDF

Go to

Research Areas

Coloring and Automatic Differentiation

Parallel Algorithms

Parallel Computation Models