Solving Combination Puzzles – An example HPX application – Part 1

HPX is great for developing applications that run both in shared memory and distributed memory environments. This is accomplished by leveraging the Active Global Address Space (AGAS). By creating components in AGAS we gain the ability to write parallel object-oriented applications seamlessly, without having to manually pass messages between localities or explicitly create threads. While this idea sounds great, it is difficult to imagine an implementation that achieves exactly that. This blog post therefore walks you through the development of a recursive backtracking brute-force solver for combination puzzles, and along the way you will discover that recursion lends itself naturally to exploiting parallelism.

This is the first post in a series that will walk you through the complete lifecycle of an HPX application: from the first basic idea, which is covered in this post, to a full-fledged HPX application exploiting the unique features of HPX to write programs with a unified semantic for local and possibly remote access to objects. The idea to develop such an application came from Andreas Schäfer, who challenged me to beat his MPI implementation. We'll see how we fared in the last post of this series.

The basic idea and algorithm

Combination puzzles can be solved in a variety of ways. In order to demonstrate some functionalities of HPX, we decided to implement a naive recursive backtracking algorithm which essentially performs a depth-first search over all possible moves of a puzzle and terminates either when a solution is found or when the maximum search depth of the recursion is reached without finding one. The basic algorithm reads as follows:

template <typename Puzzle>
bool solve(Puzzle const & p, std::size_t depth)
{
    if(p.is_solution()) return true;
    // max_depth is assumed to be a globally defined search depth limit
    if(max_depth == depth) return false;

    std::vector<Puzzle> next_moves = p.next_moves();
    for(Puzzle const & next : next_moves)
    {
        if(solve(next, depth+1))
        {
            return true;
        }
    }
    return false;
}

What p.next_moves() is supposed to do is probably best shown with a picture:

The next moves of an example 15 puzzle configuration.

The image shows the possible next moves for the 15 puzzle, an example from the category of combination puzzles.
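
To make this concrete, here is a minimal sketch of the interface solve() expects from a puzzle type; the FifteenPuzzle name and its members are hypothetical and not taken from the full example linked below:

#include <array>
#include <vector>

// Hypothetical puzzle type: any type with this interface works with solve().
struct FifteenPuzzle
{
    // true if the tiles are in the goal configuration
    bool is_solution() const;

    // All configurations reachable by sliding one adjacent tile into the
    // blank: up to four, fewer when the blank sits on an edge or corner.
    std::vector<FifteenPuzzle> next_moves() const;

    std::array<int, 16> tiles; // 4x4 board, 0 denotes the blank
};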

Recursion is parallelism!

So far so good. Sounds easy enough! But where is the parallelism, you might ask? The answer is: recursion is parallelism!
Let me prove that statement by presenting the following code:


template <typename Puzzle>
bool solve(Puzzle const & p, std::size_t depth)
{
    if(p.is_solution()) return true;
    if(max_depth == depth) return false;

    std::vector<Puzzle> next_moves = p.next_moves();

    // Spawn one asynchronous recursion per possible move; each call to
    // async returns a future representing the result of that recursion.
    std::vector<hpx::future<bool> > solve_futures;
    solve_futures.reserve(next_moves.size());
    for(Puzzle const & next : next_moves)
    {
        // Select the concrete instantiation of the solve template so it
        // can be bound and passed to async.
        bool (*solve_fun)(Puzzle const &, std::size_t) = solve;
        solve_futures.push_back(
            hpx::async(
                hpx::util::bind(
                    solve_fun
                  , next
                  , depth+1
                )
            )
        );
    }

    // Collect the results in completion order rather than sequentially.
    while(!solve_futures.empty())
    {
        hpx::util::tuple<int, hpx::future<bool> >
            wait_result = hpx::wait_any(solve_futures);

        std::size_t idx = hpx::util::get<0>(wait_result);
        bool res = hpx::util::get<1>(wait_result).get();

        solve_futures.erase(solve_futures.begin() + idx);
        if(res)
        {
            // A solution was found: wait for the remaining recursions to
            // finish before returning.
            if(!solve_futures.empty())
            {
                hpx::wait(solve_futures);
            }
            return true;
        }
    }

    return false;
}

Alright, that was a lot! The simple, basic recursion just grew to almost three times its size! However, the gain is massive.
Let me walk you through the various parts that changed:

  1. Spawning the recursive calls (the first for loop):
    Instead of calling the function directly, we issue a call to hpx::async. This has the effect that a new HPX thread invoking our recursion will eventually get scheduled. We push the result of the async call, which is a future, into a vector. A future is an object representing a value that will be computed eventually. A future can have various representations; in our case, it represents the result of an asynchronously spawned function (a minimal illustration of this concept follows after this list).
  2. Collecting the results (the while loop):
    This is the new meat of our logic. Instead of getting the results back sequentially, we receive the results of completed recursions in any order. As such, we wait on any of the futures (the call to hpx::wait_any). As a result we get a tuple containing the index of the completed future in our vector and the completed future itself. We erase that future from our vector since we know it has completed (the call to erase). If its result was "true", we can bail out, wait on any remaining function invocations, and return true. Otherwise, if all futures returned "false", this particular function invocation was not successful in finding a solution.
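
To make the future concept above more tangible, here is a minimal sketch of hpx::async in isolation; the triple function is a hypothetical stand-in, not part of the solver:

#include <hpx/hpx.hpp>

int triple(int x) { return 3 * x; }

void example()
{
    // async schedules triple() on a new HPX thread and returns immediately
    hpx::future<int> f = hpx::async(triple, 14);
    // ... other work can proceed while triple() runs ...
    int result = f.get(); // get() blocks until the value (42) is ready
}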

A full and working example demonstrating this solver can be found here. You can compile it by following the instructions given in the HPX documentation.
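
For completeness, here is a hedged sketch of how one might drive the solver from main; the FifteenPuzzle type from the sketch above and the create_scrambled helper are hypothetical placeholders rather than the API of the linked example:

#include <hpx/hpx_main.hpp> // wraps main() so it runs as an HPX thread
#include <iostream>

int main()
{
    FifteenPuzzle start = create_scrambled(); // hypothetical: build a scrambled board
    bool const found = solve(start, 0);
    std::cout << (found ? "solved!" : "no solution within max_depth") << std::endl;
    return 0;
}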

We made it! We turned the inherently sequential execution into a fully parallel version! This version still has several problems, though. The first is that it only works on a single locality (locality is the HPX term for a node); that is, there is no distributed computing yet. This will be covered in part two. However, we are already able to exploit the full power of multi-core CPUs.
Another problem is that this algorithm, as is, searches every possible branch for a solution, which is not exactly what we want: we would like to stop the recursion as soon as any other thread has found a solution. This will be discussed in the third part.
