My work in the STE||AR group focuses on the research and development of the Active Global Address Space (AGAS). AGAS is a set of addressing services that form a hierarchical namespace spanning all resources in a particular computation. AGAS aims to ease the difficulty of programming across local virtual memory boundaries by exposing a global addressing system that can be used to address both local and remote objects. AGAS is an extension of the PGAS model used by frameworks such as X10, Chapel, UPC, and Co-Array Fortran. Unlike PGAS, which statically partitions a global address space into logical blocks, AGAS supports the dynamic addition and removal of hardware resources and the migration of globally named objects.
This poster outlines major AGAS developments from 2011. In addition to expanding our understanding of the AGAS model, these developments have realized substantial usability and performance benefits for HPX applications.
AGAS is the oldest HPX subsystem; its original implementation, written more than five years ago, contained a number of design flaws that ultimately required a complete rewrite. We call the original implementation AGAS V1 and the new implementation AGAS V2.
AGAS V1’s central flaw was the use of a communication layer separate from HPX’s primary message protocol, the Parcel Transport Layer. This out-of-band AGAS transport supported only synchronous communication, preventing network latencies from being hidden behind other work when making AGAS queries. AGAS V2 uses the Parcel Transport Layer, enabling latency hiding and one-sided communication when making AGAS requests (see Figure 1), and has greatly improved the overall scalability of HPX.
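The difference between the two calling patterns can be sketched in plain standard C++. Here `std::async` stands in for HPX's parcel transport and a local table stands in for an AGAS server; the function names are illustrative assumptions, not HPX's actual API. The point is the calling pattern, not the transport.

```cpp
#include <cstdint>
#include <future>
#include <unordered_map>

// Stand-in for an AGAS server's lookup table (illustrative only).
inline std::unordered_map<std::uint64_t, std::uint64_t>& agas_table() {
    static std::unordered_map<std::uint64_t, std::uint64_t> t{{42, 0x1000}};
    return t;
}

// V1 style: the out-of-band transport forced a blocking call, so the
// caller stalls for the full network round trip.
inline std::uint64_t resolve_sync(std::uint64_t gid) {
    return agas_table().at(gid);
}

// V2 style: the request travels asynchronously; the caller receives a
// future, overlaps other work with the network latency, and collects
// the result only when it is actually needed.
inline std::future<std::uint64_t> resolve_async(std::uint64_t gid) {
    return std::async(std::launch::async,
                      [gid] { return agas_table().at(gid); });
}
```

In the V2 pattern, the time between issuing `resolve_async` and calling `.get()` on the returned future is available for useful computation, which is what makes latency hiding possible.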
AGAS’s primary function is to translate global identifiers (GIDs) to global addresses. A global address is the set of information required to remotely access an object; an object with a global address is called a global object. GIDs are unique identifiers that reference global objects.
Managing global addresses requires AGAS to maintain address translation tables. These tables are stored on a subset of the available localities (in conventional clusters, a compute node is a locality). The locality or localities hosting AGAS data are called AGAS servers; all other localities are called hosted localities.
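As a rough illustration (the type names and field layout below are assumptions made for this sketch, not HPX's actual definitions), a translation table maps unique GIDs to the information needed to reach an object remotely:

```cpp
#include <cstdint>
#include <map>

// Illustrative sketch of a GID: a 128-bit identifier that uniquely
// names a global object.
struct gid_type {
    std::uint64_t msb, lsb;
    bool operator<(gid_type const& o) const {
        return msb < o.msb || (msb == o.msb && lsb < o.lsb);
    }
};

// Illustrative sketch of a global address: everything needed to
// access the object remotely.
struct global_address {
    std::uint32_t locality;  // which locality owns the object
    std::uint64_t type;      // the object's component type
    std::uint64_t lva;       // local virtual address on that locality
};

// An AGAS server's translation table, in its simplest form.
using translation_table = std::map<gid_type, global_address>;
```

An AGAS server hosts such a table; a hosted locality holds none and must ask a server to perform the `gid_type` → `global_address` lookup on its behalf.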
Hosted localities resolve GIDs by querying AGAS servers. Fully resolving a GID from a hosted locality therefore requires cross-locality communication, which implies multiple network turnarounds. To reduce this network traffic, we have implemented software caches for AGAS services, and we are investigating hardware cache solutions via FPGAs.
Initially, HPX’s AGAS cache stored one entry per global address. In large computations that reference millions of global objects over their execution, this naïve caching method leads to cache thrashing. To alleviate this issue, we introduced range-based caching, which builds on HPX’s pre-existing support for the bulk allocation of GIDs and bulk registration of global objects. These bulk operations use contiguous blocks of GIDs and contiguous blocks of local virtual memory. Additionally, each bulk registration operates on a particular type of object (taken as a parameter to the operation) with a fixed data size. Range-based caching exploits this by storing bulk cache entries, each of which describes an entire block of global addresses.
Figure 2 depicts the information needed to resolve a block. The interval [a, b) in Figure 2 is the local virtual memory spanned by the block. A similar scheme stores the range of contiguous GIDs associated with the block.
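A minimal sketch of the scheme, assuming a flattened 64-bit GID space and illustrative names (this is not HPX's implementation): one cache entry covers an entire block, and any GID falling inside the block's range resolves by offset into the block's memory interval [a, b).

```cpp
#include <cstdint>
#include <map>
#include <optional>

// One cached bulk entry: a contiguous run of GIDs backed by a
// contiguous block of local virtual memory on one locality.
struct block_entry {
    std::uint64_t count;     // number of GIDs in the block
    std::uint32_t locality;  // locality hosting the block
    std::uint64_t base_lva;  // start of the memory range (the 'a' in [a, b))
    std::uint64_t obj_size;  // fixed size of each object in the block
};

class range_cache {
    // Keyed by the first GID of each cached block.
    std::map<std::uint64_t, block_entry> blocks_;

public:
    void insert(std::uint64_t first_gid, block_entry e) {
        blocks_[first_gid] = e;
    }

    // Resolve a single GID: find the cached block whose GID range
    // contains it, then compute the object's address by offset.
    std::optional<std::uint64_t> resolve(std::uint64_t gid) const {
        auto it = blocks_.upper_bound(gid);
        if (it == blocks_.begin())
            return std::nullopt;                 // no block starts at or below gid
        --it;
        std::uint64_t offset = gid - it->first;
        if (offset >= it->second.count)
            return std::nullopt;                 // gid falls past the block: miss
        return it->second.base_lva + offset * it->second.obj_size;
    }
};
```

With this layout a single entry can answer lookups for, say, 1024 consecutive GIDs, which is why a small cache can cover very large regions of the global address space.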
Range-based caching allows a relatively small cache to store very large regions of the global address space. This greatly reduces cache evictions and cache misses, reducing HPX’s overhead and improving HPX’s overall scalability. These performance improvements are demonstrated by results from a standard HPX benchmark, shown below in Figure 3 and Figure 4.
The benchmark used was the HPX Eager Future Overhead (EFO) test, which is described in more detail in one of our publications, Adaptive Mesh Refinement for Astrophysics Applications with ParalleX (arXiv:1110.1131, section V, subsection A). EFO is part of the main HPX codebase.