Henri Casanova, Associate Professor and director of the CoRG Laboratory, discusses recent research on how to design network topologies for high performance computing.

What is this about?

Networks have been used to interconnect computers for decades. We use some of these on a daily basis (wired internet, wireless networks) for a wide range of tasks. Other networks are built for more specific purposes, such as interconnecting powerful compute servers, or nodes, in High Performance Computing (HPC) platforms. These platforms, sometimes still referred to as “Supercomputers”, are designed to run applications that simply cannot run on a single computer. For these applications to run fast, the network interconnecting the nodes must also be fast.

One question that has received a lot of attention is: what is the best topology for HPC interconnects? In other words, viewing the compute nodes as the vertices in a graph, how many edges should be used to connect these vertices and where should these edges be?   Many topologies have been proposed that have various advantages and drawbacks in terms of cost (e.g., number of edges) and performance (e.g., number of edges on the path between two nodes).

In a recent article referenced below, ICS professor Henri Casanova and colleagues from the National Institute of Informatics (Tokyo, Japan), Keio University (Yokohama, Japan), and Fordham University (New York, U.S.A.) have made a case for making HPC interconnects purposely random! This goes against decades of research during which non-random, sophisticated topologies have been proposed time and again.

Just how good are random networks?

A common metric to assess the goodness of a network topology in an HPC system is the diameter. Given two vertices in a graph, one can compute the length of the shortest path between them in number of hops, i.e., how many edges are traversed on that shortest path. The diameter is the largest such length over all possible vertex pairs. The degree of a network is the number of other vertices to which each vertex is directly connected. A common goal is to achieve a diameter as low as possible without having too high a degree, because a high degree means a higher cost.
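
As a concrete illustration (this code is mine, not from the article), the diameter can be computed by running a breadth-first search from every vertex and keeping the largest hop count found. A minimal Python sketch:

```python
from collections import deque

def diameter(adj):
    """Largest shortest-path length, in hops, over all vertex pairs.
    adj maps each vertex to the collection of its neighbors."""
    best = 0
    for src in adj:
        # BFS from src records the hop distance to every other vertex.
        dist = {src: 0}
        q = deque([src])
        while q:
            u = q.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    q.append(v)
        best = max(best, max(dist.values()))
    return best

# A 6-vertex ring: every vertex has degree 2, and the farthest
# vertex (the diametrically opposed one) is 3 hops away.
ring6 = {i: [(i - 1) % 6, (i + 1) % 6] for i in range(6)}
print(diameter(ring6))  # → 3
```

This brute-force approach is only practical for small graphs, but it is enough to experiment with the topologies discussed below.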

Consider a topology of 2^15 vertices that are interconnected in a simple ring with 2^15 edges. This topology is said to have degree 2, because each vertex is connected to 2 neighbors. The diameter is 2^14, as one must hop around half the ring to go from one vertex to the vertex that’s diametrically opposed to it. A simple procedure to reduce the diameter is to add shortcut edges to the network. One can make the degree 3 by adding an edge between each vertex and the vertex diametrically opposed to it in the ring, then one can make the degree 5 by adding an edge between each vertex and the two vertices that are one quarter of the way around the ring from it, and so on. Initially, each time the degree increases by 2 the diameter is roughly halved.
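
This shortcut procedure can be sketched in a few lines of Python. The helper below (`ring_with_shortcuts` is my illustrative name, not code from the article) uses a smaller ring of 256 vertices so the brute-force diameter computation stays fast; the halving pattern is the same:

```python
from collections import deque

def diameter(adj):
    # Brute force: BFS from every vertex, keep the largest hop count.
    best = 0
    for src in adj:
        dist = {src: 0}
        q = deque([src])
        while q:
            u = q.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    q.append(v)
        best = max(best, max(dist.values()))
    return best

def ring_with_shortcuts(n, steps):
    # Ring of n vertices; for each step s, also connect i to (i + s) mod n.
    adj = {i: {(i - 1) % n, (i + 1) % n} for i in range(n)}
    for s in steps:
        for i in range(n):
            adj[i].add((i + s) % n)
            adj[(i + s) % n].add(i)
    return adj

n = 256
print(diameter(ring_with_shortcuts(n, [])))                # degree 2 → 128
print(diameter(ring_with_shortcuts(n, [n // 2])))          # degree 3 → 64
print(diameter(ring_with_shortcuts(n, [n // 2, n // 4])))  # degree 5 → 33
```

Going from degree 2 to 3 halves the diameter (128 to 64), and going from 3 to 5 roughly halves it again (64 to 33), matching the pattern described above.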

Graph theoreticians have known for many years that random graphs have good diameter properties. The main idea behind our research article is to build on this knowledge. The advantage of random topologies over non-random topologies turns out to be absolutely shocking. Consider adding random shortcut edges to the same ring topology, simply ensuring that all vertices have the same degree. The graph below shows the improvement in diameter as the degree increases for both methods: non-random and random.

[Figure: diameter vs. degree for the non-random and random shortcut topologies; note the logarithmic vertical axis.]

The main observation is that adding random shortcuts leads to absolutely stunning improvements over adding non-random shortcuts (the vertical axis is logarithmic!). For instance, at degree 5, the random topology has a diameter of 10 while the non-random topology has a diameter above 4,097! In other words, adding a few random shortcuts works much better than adding a lot of non-random shortcuts. Given such results, shouldn’t we build HPC interconnects as random graphs?
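
You can see this effect for yourself with the following sketch (again my code, not the authors'; adding random perfect matchings is one simple way to keep degrees nearly uniform, and may differ from the paper's exact procedure). Starting from a ring, each matching pairs all vertices up at random and adds one shortcut per pair:

```python
import random
from collections import deque

def diameter(adj):
    # BFS from every vertex; return the largest hop count found.
    best = 0
    for src in adj:
        dist = {src: 0}
        q = deque([src])
        while q:
            u = q.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    q.append(v)
        best = max(best, max(dist.values()))
    return best

def ring_with_random_matchings(n, k, rng):
    # Ring of n vertices plus k random perfect matchings: each vertex
    # gains (at most) k random shortcuts, for a degree of roughly 2 + k.
    adj = {i: {(i - 1) % n, (i + 1) % n} for i in range(n)}
    for _ in range(k):
        order = list(range(n))
        rng.shuffle(order)
        for a, b in zip(order[::2], order[1::2]):
            adj[a].add(b)
            adj[b].add(a)
    return adj

n = 1 << 10  # 1024 vertices keeps the brute-force diameter check quick
d = diameter(ring_with_random_matchings(n, 3, random.Random(0)))
print(d)  # degree ~5: typically a single-digit diameter
```

At the same degree, the deterministic shortcut construction from the previous section leaves a diameter that grows linearly with the number of vertices, while the random shortcuts bring it down to roughly logarithmic size.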

What have you learned from this project?

  • Random HPC interconnects are better – The randomized ring topology described above is better than traditional non-random HPC interconnects. It is better in terms of graph metrics (diameter, average shortest path length, robustness to edge removal). It is also better in terms of network metrics (latency, throughput, throughput degradation when network links fail) as shown by discrete-event simulation of various workloads.
  • As much randomness as possible is good – It turns out that adding random shortcuts to a ring, the simplest connected non-random topology, is better than adding fewer random shortcuts to a higher degree non-random topology. In other words, the more randomness in the random topology, the better it fares. This is really an interesting result since for decades researchers have striven to create clever non-random topologies with higher degree than the ring. What these results are saying is that one shouldn’t bother: just use a ring, roll dice, and you’ll be better off.
  • Simple randomness is enough – Trying to add “good” random shortcuts, for instance by connecting vertices that are known to be far apart, brings only minor benefit. This is good news for random topologies, as it means a simple random procedure is effective. Furthermore, little variance was observed between different random instances. So it is not necessary to generate many random topologies to find a good one. Only a few suffice.

The above results and many others are detailed in the research article referenced below. The overall conclusion is that random topologies should be strong contenders for HPC interconnects, an area in which, to the best of our knowledge, they have been completely ignored until now.

For more information:

M. Koibuchi, H. Matsutani, H. Amano, D. F. Hsu, H. Casanova, A Case for Random Shortcut Topologies for HPC Interconnects. In Proceedings of the 39th International Symposium on Computer Architecture, June 2012.