HPX 1.0.0

The STE||AR Group

Distributed under the Boost Software License, Version 1.0. (See accompanying file LICENSE_1_0.txt or copy at http://www.boost.org/LICENSE_1_0.txt)


Table of Contents

Preface
What's New
HPX V1.0 (Apr 24, 2017)
Previous HPX Releases
HPX V0.9.99 (Jul 15, 2016)
HPX V0.9.11 (Nov 11, 2015)
HPX V0.9.10 (Mar 24, 2015)
HPX V0.9.9 (Oct 31, 2014, codename Spooky)
HPX V0.9.8 (Mar 24, 2014)
HPX V0.9.7 (Nov 13, 2013)
HPX V0.9.6 (Jul 30, 2013)
HPX V0.9.5 (Jan 16, 2013)
HPX V0.9.0 (Jul 5, 2012)
HPX V0.8.1 (Apr 21, 2012)
HPX V0.8.0 (Mar 23, 2012)
HPX V0.7.0 (Dec 12, 2011)
Tutorial
Getting Started
How to Use HPX Applications with PBS
How to Use HPX Applications with SLURM
Introduction
What makes our Systems Slow?
Technology Demands New Response
Governing Principles applied while Developing HPX
Examples
Fibonacci
Hello World
Accumulator
Interest Calculator
Futurization Example
Manual
The HPX Build System
CMake Basics
Build Prerequisites
Installing Boost Libraries
Building HPX
CMake Variables used to configure HPX
CMake Toolchains shipped with HPX
Build recipes
Setting up the HPX Documentation Tool Chain
Building Projects using HPX
Using HPX with pkg-config
Using HPX with CMake based projects
Testing HPX
Running tests manually
Issue Tracker
Buildbot
Launching HPX
Configure HPX Applications
The HPX INI File Format
Built-in Default Configuration Settings
Loading INI Files
Loading Components
Logging
HPX Command Line Options
More Details about HPX Command Line Options
HPX System Components
The HPX I/O-streams Component
Writing HPX applications
Global Names
Applying Actions
Action Type Definition
Action Invocation
Applying an Action Asynchronously without any Synchronization
Applying an Action Asynchronously with Synchronization
Applying an Action Synchronously
Applying an Action with a Continuation but without any Synchronization
Applying an Action with a Continuation and with Synchronization
Action Error Handling
Writing Components
Defining Components
Defining Client Side Representation Classes
Creating Component Instances
Using Component Instances
Using LCOs
Extended Facilities for Futures
High Level Parallel Facilities
Using Parallel Algorithms
Executors and Executor Traits
Executor Parameters and Executor Parameter Traits
Using Task Blocks
Extensions for Task Blocks
Error Handling
Performance Counters
Performance Counter Names
Consuming Performance Counter Data
Consuming Performance Counter Data from the Command Line
Consuming Performance Counter Data using the HPX API
Providing Performance Counter Data
Exposing Performance Counter Data using a Simple Function
Implementing a Full Performance Counter
Existing HPX Performance Counters
HPX Thread Scheduling Policies
Index
Reference
Header <hpx/components/component_storage/migrate_from_storage.hpp>
Function template migrate_from_storage
Header <hpx/components/component_storage/migrate_to_storage.hpp>
Function template migrate_to_storage
Function template migrate_to_storage
Header <hpx/error.hpp>
Type error — Possible error conditions.
Header <hpx/error_code.hpp>
Class error_code — A hpx::error_code represents an arbitrary error condition.
Header <hpx/exception.hpp>
Class exception — A hpx::exception is the main exception type used by HPX to report errors.
Struct thread_interrupted — A hpx::thread_interrupted is the exception type used by HPX to interrupt a running HPX thread.
Function diagnostic_information — Extract the diagnostic information embedded in the given exception and return a string holding a formatted message.
Function get_error_what — Return the error message of the thrown exception.
Function get_error_locality_id — Return the locality id where the exception was thrown.
Function get_error — Return the error value of the thrown exception.
Function get_error_host_name — Return the hostname of the locality where the exception was thrown.
Function get_error_process_id — Return the (operating system) process id of the locality where the exception was thrown.
Function get_error_env — Return the environment of the OS-process at the point the exception was thrown.
Function get_error_function_name — Return the function name from which the exception was thrown.
Function get_error_backtrace — Return the stack backtrace from the point the exception was thrown.
Function get_error_file_name — Return the (source code) file name of the function from which the exception was thrown.
Function get_error_line_number — Return the line number in the (source code) file of the function from which the exception was thrown.
Function get_error_os_thread — Return the sequence number of the OS-thread used to execute HPX-threads from which the exception was thrown.
Function get_error_thread_id — Return the unique thread id of the HPX-thread from which the exception was thrown.
Function get_error_thread_description — Return any additionally available thread description of the HPX-thread from which the exception was thrown.
Function get_error_config — Return the HPX configuration information at the point from which the exception was thrown.
Function get_error_state — Return the HPX runtime state information at which the exception was thrown.
Header <hpx/exception_fwd.hpp>
Global throws — Predefined error_code object used as "throw on error" tag.
Header <hpx/exception_list.hpp>
Class exception_list
Header <hpx/hpx_finalize.hpp>
Function finalize — Main function to gracefully terminate the HPX runtime system.
Function finalize — Main function to gracefully terminate the HPX runtime system.
Function terminate — Terminate any application non-gracefully.
Function disconnect — Disconnect this locality from the application.
Function disconnect — Disconnect this locality from the application.
Function stop — Stop the runtime system.
Header <hpx/hpx_init.hpp>
Function init — Main entry point for launching the HPX runtime system.
Function init — Main entry point for launching the HPX runtime system.
Function init — Main entry point for launching the HPX runtime system.
Function init — Main entry point for launching the HPX runtime system.
Function init — Main entry point for launching the HPX runtime system.
Function init — Main entry point for launching the HPX runtime system.
Function init — Main entry point for launching the HPX runtime system.
Function init — Main entry point for launching the HPX runtime system.
Function init — Main entry point for launching the HPX runtime system.
Function init — Main entry point for launching the HPX runtime system.
Function init — Main entry point for launching the HPX runtime system.
Function init — Main entry point for launching the HPX runtime system.
Function init — Main entry point for launching the HPX runtime system.
Function init — Main entry point for launching the HPX runtime system.
Function init — Main entry point for launching the HPX runtime system.
Header <hpx/hpx_start.hpp>
Function start — Main non-blocking entry point for launching the HPX runtime system.
Function start — Main non-blocking entry point for launching the HPX runtime system.
Function start — Main non-blocking entry point for launching the HPX runtime system.
Function start — Main non-blocking entry point for launching the HPX runtime system.
Function start — Main non-blocking entry point for launching the HPX runtime system.
Function start — Main non-blocking entry point for launching the HPX runtime system.
Function start — Main non-blocking entry point for launching the HPX runtime system.
Function start — Main non-blocking entry point for launching the HPX runtime system.
Function start — Main non-blocking entry point for launching the HPX runtime system.
Function start — Main non-blocking entry point for launching the HPX runtime system.
Function start — Main non-blocking entry point for launching the HPX runtime system.
Function start — Main non-blocking entry point for launching the HPX runtime system.
Function start — Main non-blocking entry point for launching the HPX runtime system.
Function start — Main non-blocking entry point for launching the HPX runtime system.
Function start — Main non-blocking entry point for launching the HPX runtime system.
Header <hpx/lcos/barrier.hpp>
Class barrier
Header <hpx/lcos/broadcast.hpp>
Function template broadcast — Perform a distributed broadcast operation.
Function template broadcast_apply — Perform an asynchronous (fire&forget) distributed broadcast operation.
Function template broadcast_with_index — Perform a distributed broadcast operation.
Function template broadcast_apply_with_index — Perform an asynchronous (fire&forget) distributed broadcast operation.
Header <hpx/lcos/fold.hpp>
Function template fold — Perform a distributed fold operation.
Function template fold_with_index — Perform a distributed folding operation.
Function template inverse_fold — Perform a distributed inverse folding operation.
Function template inverse_fold_with_index — Perform a distributed inverse folding operation.
Header <hpx/lcos/gather.hpp>
Function template gather_here
Function template gather_there
Function template gather_here
Function template gather_there
Header <hpx/lcos/split_future.hpp>
Function template split_future
Header <hpx/lcos/wait_all.hpp>
Function template wait_all
Function template wait_all
Function template wait_all
Function template wait_all
Function template wait_all_n
Header <hpx/lcos/wait_any.hpp>
Function template wait_any
Function template wait_any
Function template wait_any
Function template wait_any
Function template wait_any
Function template wait_any_n
Header <hpx/lcos/wait_each.hpp>
Function template wait_each
Function template wait_each
Function template wait_each
Function template wait_each_n
Header <hpx/lcos/wait_some.hpp>
Function template wait_some
Function template wait_some
Function template wait_some
Function template wait_some
Function template wait_some_n
Header <hpx/lcos/when_all.hpp>
Function template when_all
Function template when_all
Function template when_all
Function template when_all_n
Header <hpx/lcos/when_any.hpp>
Struct template when_any_result
Function template when_any
Function template when_any
Function template when_any
Function template when_any_n
Header <hpx/lcos/when_each.hpp>
Function template when_each
Function template when_each
Function template when_each
Function template when_each_n
Header <hpx/lcos/when_some.hpp>
Struct template when_some_result
Function template when_some
Function template when_some
Function template when_some
Function template when_some
Function template when_some_n
Header <hpx/parallel/algorithms/adjacent_difference.hpp>
Function template adjacent_difference
Function template adjacent_difference
Header <hpx/parallel/algorithms/adjacent_find.hpp>
Function template adjacent_find
Header <hpx/parallel/algorithms/all_any_none.hpp>
Function template none_of
Function template any_of
Function template all_of
Header <hpx/parallel/algorithms/copy.hpp>
Function template copy
Function template copy_n
Function template copy_if
Header <hpx/parallel/container_algorithms/copy.hpp>
Function template copy
Function template copy_if
Header <hpx/parallel/algorithms/count.hpp>
Function template count
Function template count_if
Header <hpx/parallel/algorithms/equal.hpp>
Function template equal
Function template equal
Header <hpx/parallel/algorithms/exclusive_scan.hpp>
Function template exclusive_scan
Function template exclusive_scan
Header <hpx/parallel/algorithms/fill.hpp>
Function template fill
Function template fill_n
Header <hpx/parallel/algorithms/find.hpp>
Function template find
Function template find_if
Function template find_if_not
Function template find_end
Function template find_first_of
Header <hpx/parallel/algorithms/for_each.hpp>
Global F
Function template for_each_n
Header <hpx/parallel/container_algorithms/for_each.hpp>
Header <hpx/parallel/algorithms/for_loop.hpp>
Function template for_loop
Function template for_loop
Function template for_loop_strided
Function template for_loop_strided
Function template for_loop_n
Function template for_loop_n
Function template for_loop_n_strided
Function template for_loop_n_strided
Header <hpx/parallel/algorithms/for_loop_induction.hpp>
Function template induction
Header <hpx/parallel/algorithms/for_loop_reduction.hpp>
Function template reduction
Header <hpx/parallel/algorithms/generate.hpp>
Function template generate
Function template generate_n
Header <hpx/parallel/container_algorithms/generate.hpp>
Function template generate
Header <hpx/parallel/algorithms/includes.hpp>
Function template includes
Header <hpx/parallel/algorithms/inclusive_scan.hpp>
Function template inclusive_scan
Function template inclusive_scan
Function template inclusive_scan
Header <hpx/parallel/algorithms/is_partitioned.hpp>
Function template is_partitioned
Header <hpx/parallel/algorithms/is_sorted.hpp>
Function template is_sorted
Function template is_sorted_until
Header <hpx/parallel/algorithms/lexicographical_compare.hpp>
Function template lexicographical_compare
Header <hpx/parallel/algorithms/minmax.hpp>
Function template min_element
Function template max_element
Function template minmax_element
Header <hpx/parallel/container_algorithms/minmax.hpp>
Function template min_element
Function template max_element
Function template minmax_element
Header <hpx/parallel/algorithms/mismatch.hpp>
Function template mismatch
Function template mismatch
Header <hpx/parallel/algorithms/move.hpp>
Function template move
Header <hpx/parallel/algorithms/reduce.hpp>
Function template reduce
Function template reduce
Function template reduce
Header <hpx/lcos/reduce.hpp>
Function template reduce — Perform a distributed reduction operation.
Function template reduce_with_index — Perform a distributed reduction operation.
Header <hpx/parallel/algorithms/reduce_by_key.hpp>
Header <hpx/parallel/algorithms/remove_copy.hpp>
Function template remove_copy
Function template remove_copy_if
Header <hpx/parallel/container_algorithms/remove_copy.hpp>
Function template remove_copy
Function template remove_copy_if
Header <hpx/parallel/algorithms/replace.hpp>
Function template replace
Function template replace_if
Function template replace_copy
Function template replace_copy_if
Header <hpx/parallel/container_algorithms/replace.hpp>
Function template replace
Function template replace_if
Function template replace_copy
Function template replace_copy_if
Header <hpx/parallel/algorithms/reverse.hpp>
Function template reverse
Function template reverse_copy
Header <hpx/parallel/container_algorithms/reverse.hpp>
Function template reverse
Function template reverse_copy
Header <hpx/parallel/algorithms/rotate.hpp>
Function template rotate
Function template rotate_copy
Header <hpx/parallel/container_algorithms/rotate.hpp>
Function template rotate
Function template rotate_copy
Header <hpx/parallel/algorithms/search.hpp>
Function template search
Function template search_n
Header <hpx/parallel/algorithms/set_difference.hpp>
Function template set_difference
Header <hpx/parallel/algorithms/set_intersection.hpp>
Function template set_intersection
Header <hpx/parallel/algorithms/set_symmetric_difference.hpp>
Function template set_symmetric_difference
Header <hpx/parallel/algorithms/set_union.hpp>
Function template set_union
Header <hpx/parallel/algorithms/sort.hpp>
Function template sort
Header <hpx/parallel/container_algorithms/sort.hpp>
Function template sort
Header <hpx/parallel/algorithms/sort_by_key.hpp>
Function template sort_by_key
Header <hpx/parallel/algorithms/swap_ranges.hpp>
Function template swap_ranges
Header <hpx/parallel/algorithms/transform.hpp>
Function template transform
Function template transform
Function template transform
Header <hpx/parallel/container_algorithms/transform.hpp>
Function template transform
Function template transform
Function template transform
Header <hpx/parallel/algorithms/transform_exclusive_scan.hpp>
Function template transform_exclusive_scan
Header <hpx/parallel/algorithms/transform_inclusive_scan.hpp>
Function template transform_inclusive_scan
Function template transform_inclusive_scan
Header <hpx/parallel/algorithms/transform_reduce.hpp>
Function template transform_reduce
Header <hpx/parallel/algorithms/transform_reduce_binary.hpp>
Function template transform_reduce
Function template transform_reduce
Header <hpx/parallel/algorithms/uninitialized_copy.hpp>
Function template uninitialized_copy
Function template uninitialized_copy_n
Header <hpx/parallel/algorithms/uninitialized_fill.hpp>
Function template uninitialized_fill
Function template uninitialized_fill_n
Header <hpx/parallel/execution_policy.hpp>
Struct sequenced_task_policy
Struct template sequenced_task_policy_shim
Struct sequenced_policy
Struct template sequenced_policy_shim
Struct parallel_task_policy
Struct template parallel_task_policy_shim
Struct parallel_policy
Struct template parallel_policy_shim
Struct parallel_unsequenced_policy
Global seq — Default sequential execution policy object.
Global par — Default parallel execution policy object.
Global par_unseq — Default vector execution policy object.
Header <hpx/parallel/executors/auto_chunk_size.hpp>
Struct auto_chunk_size
Header <hpx/parallel/executors/dynamic_chunk_size.hpp>
Struct dynamic_chunk_size
Header <hpx/parallel/executors/executor_parameter_traits.hpp>
Struct sequential_executor_parameters
Struct template executor_parameter_traits
Struct template is_executor_parameters
Header <hpx/parallel/executors/executor_traits.hpp>
Struct sequential_execution_tag
Struct parallel_execution_tag
Struct vector_execution_tag
Struct template executor_traits
Struct template is_executor
Header <hpx/parallel/executors/guided_chunk_size.hpp>
Struct guided_chunk_size
Header <hpx/parallel/executors/parallel_executor.hpp>
Struct parallel_executor
Header <hpx/parallel/executors/persistent_auto_chunk_size.hpp>
Struct persistent_auto_chunk_size
Header <hpx/parallel/executors/sequential_executor.hpp>
Struct sequential_executor
Header <hpx/parallel/executors/service_executors.hpp>
Struct service_executor
Struct io_pool_executor
Struct parcel_pool_executor
Struct timer_pool_executor
Struct main_pool_executor
Header <hpx/parallel/executors/static_chunk_size.hpp>
Struct static_chunk_size
Header <hpx/parallel/executors/thread_pool_executors.hpp>
Type definition local_priority_queue_executor
Header <hpx/parallel/executors/timed_executor_traits.hpp>
Struct template timed_executor_traits
Struct template is_timed_executor
Header <hpx/parallel/task_block.hpp>
Class task_canceled_exception
Class template task_block
Function template define_task_block
Function template define_task_block
Function template define_task_block_restore_thread
Function template define_task_block_restore_thread
Header <hpx/performance_counters/manage_counter_type.hpp>
Function install_counter_type — Install a new generic performance counter type; it will be uninstalled automatically during shutdown.
Function install_counter_type — Install a new performance counter type; it will be uninstalled automatically during shutdown.
Function install_counter_type — Install a new performance counter type; it will be uninstalled automatically during shutdown.
Function install_counter_type — Install a new generic performance counter type; it will be uninstalled automatically during shutdown.
Header <hpx/runtime/actions/basic_action.hpp>
Macro HPX_REGISTER_ACTION_DECLARATION — Declare the necessary component action boilerplate code.
Macro HPX_REGISTER_ACTION — Define the necessary component action boilerplate code.
Macro HPX_REGISTER_ACTION_ID — Define the necessary component action boilerplate code and assign a predefined unique id to the action.
Header <hpx/runtime/actions/component_action.hpp>
Macro HPX_DEFINE_COMPONENT_ACTION — Registers a member function of a component as an action type with HPX.
Header <hpx/runtime/actions/plain_action.hpp>
Macro HPX_DEFINE_PLAIN_ACTION — Defines a plain action type.
Macro HPX_DECLARE_PLAIN_ACTION — Declares a plain action type.
Macro HPX_PLAIN_ACTION — Defines a plain action type based on the given function func and registers it with HPX.
Macro HPX_PLAIN_ACTION_ID — Defines a plain action type based on the given function func and registers it with HPX.
Header <hpx/runtime/applier_fwd.hpp>
Function get_applier
Function get_applier_ptr
Header <hpx/runtime/basename_registration.hpp>
Function find_all_from_basename
Function find_from_basename
Function find_from_basename — Return registered id from the given base name and sequence number.
Function template find_from_basename — Return registered id from the given base name and sequence number.
Function register_with_basename — Register the given id using the given base name.
Function register_with_basename
Function template register_with_basename
Function unregister_with_basename — Unregister the given id using the given base name.
Header <hpx/runtime/components/binpacking_distribution_policy.hpp>
Struct binpacking_distribution_policy
Global default_binpacking_counter_name
Global binpacked
Header <hpx/runtime/components/colocating_distribution_policy.hpp>
Struct colocating_distribution_policy
Global colocated
Header <hpx/runtime/components/component_factory.hpp>
Macro HPX_REGISTER_COMPONENT — Define a component factory for a component type.
Header <hpx/runtime/components/copy_component.hpp>
Function template copy — Copy given component to the specified target locality.
Function template copy — Copy given component to the specified target locality.
Function template copy — Copy given component to the specified target locality.
Header <hpx/runtime/components/default_distribution_policy.hpp>
Struct default_distribution_policy
Global default_layout
Header <hpx/runtime/components/migrate_component.hpp>
Function template migrate
Function template migrate
Function template migrate
Function template migrate
Header <hpx/runtime/components/new.hpp>
Function template new_ — Create one or more new instances of the given Component type on the specified locality.
Function template new_ — Create multiple new instances of the given Component type on the specified locality.
Function template new_ — Create one or more new instances of the given Component type based on the given distribution policy.
Function template new_ — Create multiple new instances of the given Component type on the localities as defined by the given distribution policy.
Header <hpx/runtime/find_here.hpp>
Function find_here — Return the global id representing this locality.
Header <hpx/runtime/find_localities.hpp>
Function find_root_locality — Return the global id representing the root locality.
Function find_all_localities — Return the list of global ids representing all localities available to this application.
Function find_all_localities — Return the list of global ids representing all localities available to this application which support the given component type.
Function find_remote_localities — Return the list of locality ids of remote localities supporting the given component type. By default this function will return the list of all remote localities (all but the current locality).
Function find_remote_localities — Return the list of locality ids of remote localities supporting the given component type. By default this function will return the list of all remote localities (all but the current locality).
Function find_locality — Return the global id representing an arbitrary locality which supports the given component type.
Header <hpx/runtime/get_colocation_id.hpp>
Function get_colocation_id — Return the id of the locality on which the object referenced by the given id is currently located.
Function get_colocation_id — Asynchronously return the id of the locality on which the object referenced by the given id is currently located.
Header <hpx/runtime/get_locality_id.hpp>
Function get_locality_id — Return the number of the locality this function is being called from.
Header <hpx/runtime/get_locality_name.hpp>
Function get_locality_name — Return the name of the locality this function is called on.
Function get_locality_name — Return the name of the referenced locality.
Header <hpx/runtime/get_num_localities.hpp>
Function get_initial_num_localities — Return the number of localities which were registered at startup for the running application.
Function get_num_localities — Asynchronously return the number of localities which are currently registered for the running application.
Function get_num_localities — Return the number of localities which are currently registered for the running application.
Function get_num_localities — Asynchronously return the number of localities which are currently registered for the running application.
Function get_num_localities — Synchronously return the number of localities which are currently registered for the running application.
Header <hpx/runtime/get_os_thread_count.hpp>
Function get_os_thread_count — Return the number of worker OS-threads used by the given executor to execute HPX threads.
Header <hpx/runtime/get_ptr.hpp>
Function template get_ptr — Returns a future referring to the pointer to the underlying memory of a component.
Function template get_ptr — Returns a future referring to the pointer to the underlying memory of a component.
Function template get_ptr — Returns the pointer to the underlying memory of a component.
Function template get_ptr — Returns the pointer to the underlying memory of a component.
Header <hpx/runtime/get_thread_name.hpp>
Function get_thread_name — Return the name of the calling thread.
Header <hpx/runtime/get_worker_thread_num.hpp>
Function get_worker_thread_num — Return the sequence number of the OS-thread that executes the current HPX-thread in this runtime instance.
Function get_worker_thread_num — Return the sequence number of the OS-thread that executes the current HPX-thread in this runtime instance.
Header <hpx/runtime/naming/unmanaged.hpp>
Function unmanaged
Header <hpx/runtime/report_error.hpp>
Header <hpx/runtime/runtime_mode.hpp>
Type runtime_mode
Function get_runtime_mode_name
Function get_runtime_mode_from_name
Header <hpx/runtime/set_parcel_write_handler.hpp>
Type definition parcel_write_handler_type
Function set_parcel_write_handler
Header <hpx/runtime/shutdown_function.hpp>
Type definition shutdown_function_type
Function register_pre_shutdown_function — Add a function to be executed by an HPX thread during hpx::finalize(), guaranteed to run before any shutdown function is executed (system-wide).
Function register_shutdown_function — Add a function to be executed by an HPX thread during hpx::finalize(), guaranteed to run after any pre-shutdown function is executed (system-wide).
Header <hpx/runtime/startup_function.hpp>
Type definition startup_function_type
Function register_pre_startup_function — Add a function to be executed by an HPX thread before hpx_main, guaranteed to run before any startup function is executed (system-wide).
Function register_startup_function — Add a function to be executed by an HPX thread before hpx_main, guaranteed to run after any pre-startup function is executed (system-wide).
Header <hpx/runtime/threads/thread_data_fwd.hpp>
Function get_self
Function get_self_ptr
Function get_ctx_ptr
Function get_self_ptr_checked
Function get_self_id
Function get_parent_id
Function get_parent_phase
Function get_self_stacksize
Function get_parent_locality_id
Function get_self_component_id
Function get_thread_count
Function get_thread_count
Function enumerate_threads
Header <hpx/runtime/threads/thread_enums.hpp>
Type thread_state_enum
Type thread_priority
Type thread_state_ex_enum
Type thread_stacksize
Function get_thread_state_name
Function get_thread_priority_name
Function get_thread_state_ex_name
Function get_thread_state_name
Function get_stack_size_name
Header <hpx/runtime/threads/thread_helpers.hpp>
Function suspend
Function suspend
Function suspend
Function suspend
Function suspend
Function suspend
Function suspend
Function get_executor
Function set_thread_state — Set the thread state of the thread referenced by the thread_id id.
Function set_thread_state — Set the thread state of the thread referenced by the thread_id id.
Function set_thread_state — Set the thread state of the thread referenced by the thread_id id.
Function get_thread_description
Function get_thread_state
Function get_thread_phase
Function get_thread_interruption_enabled
Function set_thread_interruption_enabled
Function get_thread_interruption_requested
Function interrupt_thread
Function interruption_point
Function get_thread_priority
Function get_stack_size
Function get_executor
Header <hpx/runtime/trigger_lco.hpp>
Function trigger_lco_event — Trigger the LCO referenced by the given id.
Function trigger_lco_event — Trigger the LCO referenced by the given id.
Function trigger_lco_event — Trigger the LCO referenced by the given id.
Function trigger_lco_event — Trigger the LCO referenced by the given id.
Function template set_lco_value — Set the result value for the LCO referenced by the given id.
Function template set_lco_value — Set the result value for the LCO referenced by the given id.
Function template set_lco_value — Set the result value for the LCO referenced by the given id.
Function template set_lco_value — Set the result value for the LCO referenced by the given id.
Function set_lco_error — Set the error state for the LCO referenced by the given id.
Function set_lco_error — Set the error state for the LCO referenced by the given id.
Function set_lco_error — Set the error state for the LCO referenced by the given id.
Function set_lco_error — Set the error state for the LCO referenced by the given id.
Function set_lco_error — Set the error state for the LCO referenced by the given id.
Function set_lco_error — Set the error state for the LCO referenced by the given id.
Function set_lco_error — Set the error state for the LCO referenced by the given id.
Function set_lco_error — Set the error state for the LCO referenced by the given id.
Header <hpx/runtime_fwd.hpp>
Function register_thread
Function unregister_thread
Function get_runtime_instance_number
Function is_starting — Test whether the runtime system is currently being started.
Function is_running — Test whether the runtime system is currently running.
Function is_stopped — Test whether the runtime system is currently stopped.
Function is_stopped_or_shutting_down — Test whether the runtime system is currently being shut down.
Function get_num_worker_threads — Return the number of worker OS-threads used to execute HPX threads.
Function get_system_uptime — Return the system uptime as measured on the thread executing this call.
Function start_active_counters — Start all active performance counters, optionally naming the section of code.
Function reset_active_counters — Reset all active performance counters.
Function stop_active_counters — Stop all active performance counters.
Function evaluate_active_counters — Evaluate and output all active performance counters, optionally naming the point in code marked by this function.
Function create_binary_filter — Create an instance of a binary filter plugin.
Header <hpx/throw_exception.hpp>
Macro HPX_THROW_EXCEPTION — Throw a hpx::exception initialized from the given parameters.
Macro HPX_THROWS_IF — Either throw a hpx::exception or initialize hpx::error_code from the given parameters.
Terminology
People

The STE||AR Group (Systems Technology, Emergent Parallelism, and Algorithm Research) is an international research group with the goal of promoting the development of scalable parallel applications by providing a community for ideas, a framework for collaboration, and a platform for communicating these concepts to the broader community. The main contributors to HPX in the STE||AR Group are researchers from Louisiana State University (LSU)'s Center for Computation and Technology (CCT) and the Friedrich-Alexander University Erlangen-Nuremberg (FAU)'s Department of Computer Science 3 - Computer Architecture. For a full list of people working in this group and participating in writing this documentation, see People.

This documentation is automatically generated for HPX V1.0.0 (from Git commit: da3fd176cd28e9d1fc9bfb47b70a6333d531f517) by the Boost QuickBook and AutoIndex documentation tools. QuickBook and AutoIndex can be found in the collection of Boost Tools.

History

The development of High Performance ParalleX (HPX) began in 2007. At that time, Hartmut Kaiser became interested in the work done by the ParalleX group at the Center for Computation and Technology (CCT), a multi-disciplinary research institute at Louisiana State University (LSU). The ParalleX group was working to develop a new and experimental execution model for future high performance computing architectures. This model was christened ParalleX. The first implementations of ParalleX were crude, and many of those designs had to be discarded entirely. However, over time the team learned quite a bit about how to design a parallel, distributed runtime system which implements the concepts of ParalleX.

From the very beginning, this endeavour has been a group effort. In addition to a handful of interested researchers, there have always been graduate and undergraduate students participating in the discussions, design, and implementation of HPX. In 2011 we decided to formalize our collective research efforts by creating the STE||AR group (Systems Technology, Emergent Parallelism, and Algorithm Research). Over time, the team grew to include researchers around the country and the world. In 2014, the STE||AR Group was reorganized to become the international community it is today. This consortium of researchers aims to develop stable, sustainable, and scalable tools which will enable application developers to exploit the parallelism latent in the machines of today and tomorrow. The goal of the HPX project is to create a high quality, freely available, open source implementation of ParalleX concepts for conventional and future systems by building a modular and standards conforming runtime system for SMP and distributed application environments. The API exposed by HPX is conformant to the interfaces defined by the C++11/14 ISO standard and adheres to the programming guidelines used by the Boost collection of C++ libraries. We steer the development of HPX with real world applications and aim to provide a smooth migration path for domain scientists.

To learn more about STE||AR and ParalleX, see People and Introduction.

How to use this manual

Some icons are used to mark certain topics and indicate their relevance. These icons precede the text they refer to:

Table 1. Icons

Icon

Name

Meaning

Note

Generally useful information (an aside that doesn't fit in the flow of the text)

Tip

Suggestion on how to do something (especially something that is not obvious)

Important

Important note on something to take particular notice of

Caution

Take special care with this - it may not be what you expect and may cause bad results


The following table describes the syntax that will be used to refer to functions and classes throughout the manual:

Table 2. Syntax for Code References

Syntax

Meaning

foo()

The function foo

foo<>()

The template function foo (used only for template functions that require explicit parameters)

foo

The class foo

foo<>

The class template foo


Support

Please feel free to direct questions to HPX's mailing list, hpx-users@stellar.cct.lsu.edu, or to join our IRC channel, #ste||ar on Freenode.

General Changes

Here are some of the main highlights and changes for this release (in no particular order):

  • Added the facility hpx::split_future which allows converting a future<tuple<Ts...>> into a tuple<future<Ts>...>. This functionality is not available when compiling HPX with VS2012.
  • Added a new type of performance counter which allows returning a list of values for each invocation. We also added a first counter of this type, which collects a histogram of the times between parcels being created.
  • Added new LCOs: hpx::lcos::channel and hpx::lcos::local::channel which are very similar to the well known channel constructs used in the Go language.
  • Added new performance counters reporting the amount of data handled by the networking layer on an action-by-action basis (please see PR#2289 for more details).
  • Added a new facility hpx::lcos::barrier, replacing the older facility of the same name. The new facility has a slightly changed API and is much more efficient. Most notably, the new facility exposes a (global) function hpx::lcos::barrier::synchronize() which represents a global barrier across all localities.
  • We have started to add support for vectorization to our parallel algorithm implementations. This support depends on using an external library, currently either Vc Library or Boost.SIMD. Please see IS#2333 for a list of currently supported algorithms. This is an experimental feature and its implementation and/or API might change in the future. Please see this blog-post for more information.
  • The parameter sequence for the hpx::parallel::transform_reduce overload taking one iterator range has changed to match the changes this algorithm has undergone while being moved to C++17. The old overload can still be enabled at configure time by specifying -DHPX_WITH_TRANSFORM_REDUCE_COMPATIBILITY=On to CMake.
  • The algorithm hpx::parallel::inner_product has been renamed to hpx::parallel::transform_reduce to match the changes this algorithm has undergone while being moved to C++17. The old inner_product names can still be enabled at configure time by specifying -DHPX_WITH_TRANSFORM_REDUCE_COMPATIBILITY=On to CMake.
  • Added versions of hpx::get_ptr taking client-side representations of component instances as their parameter (instead of a global id).
  • Added the helper utility hpx::performance_counters::performance_counter_set, which helps encapsulate a set of performance counters to be managed concurrently.
  • All execution policies and related classes have been renamed to be consistent with the naming changes applied for C++17. All policies now live in the namespace hpx::parallel::execution. The old names can still be enabled at configure time by specifying -DHPX_WITH_EXECUTION_POLICY_COMPATIBILITY=On to CMake.
  • The thread scheduling subsystem has undergone a major refactoring which results in significant performance improvements. We have also improved the performance of creating hpx::future and of various facilities handling those.
  • We have consolidated all of the code in HPX.Compute related to the integration of CUDA. hpx::partitioned_vector has been enabled to be usable with hpx::compute::vector, which allows the partitions to be placed on one or more GPU devices.
  • Added new performance counters exposing various internals of the thread scheduling subsystem, such as the current idle- and busy-loop counters and instantaneous scheduler utilization.
  • Extended and improved the use of the ITTNotify hooks, allowing performance counter data and function annotation information to be collected from within the Intel Amplifier tool.
  • For APEX, this release includes OTF2 support, updated TAU integration, enhanced support for HPX threads calling direct actions, updated policy support, updated HPX counter integration, HPX send/recv measurement, new scatterplot output, and performance optimizations and bug fixes.
Breaking Changes
  • We have dropped support for the gcc compiler versions 4.6 and 4.7. The minimal gcc version we now test on is 4.8.
  • We have removed (default) support for boost::chrono in interfaces; uses of it have been replaced with std::chrono. This facility can still be enabled at configure time by specifying -DHPX_WITH_BOOST_CHRONO_COMPATIBILITY=On to CMake.
  • The parameter sequence for the hpx::parallel::transform_reduce overload taking one iterator range has changed to match the changes this algorithm has undergone while being moved to C++17.
  • The algorithm hpx::parallel::inner_product has been renamed to hpx::parallel::transform_reduce to match the changes this algorithm has undergone while being moved to C++17.
  • The build options HPX_WITH_COLOCATED_BACKWARDS_COMPATIBILITY and HPX_WITH_COMPONENT_GET_GID_COMPATIBILITY are now disabled by default. Please update any code that still depends on the deprecated interfaces.
Bug Fixes (Closed Tickets)

Here is a list of the important tickets we closed for this release.

  • PR#2596 - Adding apex data
  • PR#2595 - Remove obsolete file
  • IS#2594 - FindOpenCL.cmake mismatch with the official cmake module
  • PR#2592 - First attempt to introduce spmd_block in hpx
  • IS#2591 - Feature request: continuation (then) which does not require the callable object to take a future<R> as parameter
  • PR#2588 - Daint fixes
  • PR#2587 - Fixing transfer_(continuation)_action::schedule
  • PR#2585 - Work around MSVC having an ICE when compiling with -Ob2
  • PR#2583 - chaning 7zip command to 7za in roll_release.sh
  • PR#2582 - First attempt to introduce spmd_block in hpx
  • PR#2581 - Enable annotated function for parallel algorithms
  • PR#2580 - First attempt to introduce spmd_block in hpx
  • PR#2579 - Make thread NICE level setting an option
  • PR#2578 - Implementing enqueue instead of busy wait when no sender is available
  • PR#2577 - Retrieve -std=c++11 consistent nvcc flag
  • PR#2576 - Add missing dependencies of cuda based tests
  • PR#2575 - Remove warnings due to some captured variables
  • PR#2573 - Attempt to resolve resolve_locality
  • PR#2572 - Adding APEX hooks to background thread
  • PR#2571 - Pick up hpx.ignore_batch_env from config map
  • PR#2570 - Add commandline options --hpx:print-counters-locally
  • PR#2569 - Fix computeapi unit tests
  • PR#2567 - This adds another barrier::synchronize before registering performance counters
  • PR#2564 - Cray static toolchain support
  • PR#2563 - Fixed unhandled exception during startup
  • PR#2562 - Remove partitioned_vector.cu from build tree when nvcc is used
  • IS#2561 - octo-tiger crash with commit 6e921495ff6c26f125d62629cbaad0525f14f7ab
  • PR#2560 - Prevent -Wundef warnings on Vc version checks
  • PR#2559 - Allowing CUDA callback to set the future directly from an OS thread
  • PR#2558 - Remove warnings due to float precisions
  • PR#2557 - Removing bogus handling of compile flags for CUDA
  • PR#2556 - Fixing scan partitioner
  • PR#2554 - Add more diagnostics to error thrown from find_appropriate_destination
  • IS#2555 - No valid parcelport configured
  • PR#2553 - Add cmake cuda_arch option
  • PR#2552 - Remove incomplete datapar bindings to libflatarray
  • PR#2551 - Rename hwloc_topology to hwloc_topology_info
  • PR#2550 - Apex api updates
  • PR#2549 - Pre-include defines.hpp to get the macro HPX_HAVE_CUDA value
  • PR#2548 - Fixing issue with disconnect
  • PR#2546 - Some fixes around cuda clang partitioned_vector example
  • PR#2545 - Fix uses of the Vc2 datapar flags; the value, not the type, should be passed to functions
  • PR#2542 - Make HPX_WITH_MALLOC easier to use
  • PR#2541 - avoid recompiles when enabling/disabling examples
  • PR#2540 - Fixing usage of target_link_libraries()
  • PR#2539 - fix RPATH behaviour
  • IS#2538 - HPX_WITH_CUDA corrupts compilation flags
  • PR#2537 - Add output of a Bazel Skylark extension for paths and compile options
  • PR#2536 - Add counter exposing total available memory to Windows as well
  • PR#2535 - Remove obsolete support for security
  • IS#2534 - Remove command line option --hpx:run-agas-server
  • PR#2533 - Pre-cache locality endpoints during bootstrap
  • PR#2532 - Fixing handling of GIDs during serialization preprocessing
  • PR#2531 - Amend uses of the term "functor"
  • PR#2529 - added counter for reading available memory
  • PR#2527 - Facilities to create actions from lambdas
  • PR#2526 - Updated docs: HPX_WITH_EXAMPLES
  • PR#2525 - Remove warnings related to unused captured variables
  • IS#2524 - CMAKE failed because it is missing: TCMALLOC_LIBRARY TCMALLOC_INCLUDE_DIR
  • PR#2523 - Fixing compose_cb stack overflow
  • PR#2522 - Instead of unlocking, ignore the lock while creating the message handler
  • PR#2521 - Create LPROGRESS_ logging macro to simplify progress tracking and timings
  • PR#2520 - Intel 17 support
  • PR#2519 - Fix components example
  • PR#2518 - Fixing parcel scheduling
  • IS#2517 - Race condition during Parcel Coalescing Handler creation
  • IS#2516 - HPX locks up when using at least 256 localities
  • IS#2515 - error: Install cannot find "/lib/hpx/libparcel_coalescing.so.0.9.99" but I can see that file
  • PR#2514 - Making sure that all continuations of a shared_future are invoked in order
  • PR#2513 - Fixing locks held during suspension
  • PR#2512 - MPI Parcelport improvements and fixes related to the background work changes
  • PR#2511 - Fixing bit-wise (zero-copy) serialization
  • IS#2509 - Linking errors in hwloc_topology
  • PR#2508 - Added documentation for debugging with core files
  • PR#2506 - Fixing background work invocations
  • PR#2505 - Fix tuple serialization
  • IS#2504 - Ensure continuations are called in the order they have been attached
  • PR#2503 - Adding serialization support for Vc v2 (datapar)
  • PR#2502 - Resolve various, minor compiler warnings
  • PR#2501 - Some other fixes around cuda examples
  • IS#2500 - nvcc / cuda clang issue due to a missing -DHPX_WITH_CUDA flag
  • PR#2499 - Adding support for std::array to wait_all and friends
  • PR#2498 - Execute background work as HPX thread
  • PR#2497 - Fixing configuration options for spinlock-deadlock detection
  • PR#2496 - Accounting for different compilers in CrayKNL toolchain file
  • PR#2494 - Adding component base class which ties a component instance to a given executor
  • PR#2493 - Enable controlling amount of pending threads which must be available to allow thread stealing
  • PR#2492 - Adding new command line option --hpx:print-counter-reset
  • PR#2491 - Resolve ambiguities when compiling with APEX
  • PR#2490 - Resuming threads waiting on future with higher priority
  • IS#2489 - nvcc issue because -std=c++11 appears twice
  • PR#2488 - Adding performance counters exposing the internal idle and busy-loop counters
  • PR#2487 - Allowing for plain suspend to reschedule thread right away
  • PR#2486 - Only flag HPX code for CUDA if HPX_WITH_CUDA is set
  • PR#2485 - Making thread-queue parameters runtime-configurable
  • PR#2484 - Added atomic counter for parcel-destinations
  • PR#2483 - Added priority-queue lifo scheduler
  • PR#2482 - Changing scheduler to steal only if more than a minimal number of tasks are available
  • PR#2481 - Extending command line option --hpx:print-counter-destination to support value 'none'
  • PR#2479 - Added option to disable signal handler
  • PR#2478 - Making sure the sine performance counter module gets loaded only for the corresponding example
  • IS#2477 - Breaking at a throw statement
  • PR#2476 - Annotated function
  • PR#2475 - Ensure that using %osthread% during logging will not throw for non-hpx threads
  • PR#2474 - Remove now superficial non_direct actions from base_lco and friends
  • PR#2473 - Refining support for ITTNotify
  • PR#2472 - Some fixes around hpx compute
  • IS#2470 - redefinition of boost::detail::spinlock
  • IS#2469 - Dataflow performance issue
  • PR#2468 - Perf docs update
  • PR#2466 - Guarantee to execute remote direct actions on HPX-thread
  • PR#2465 - Improve demo : Async copy and fixed device handling
  • PR#2464 - Adding performance counter exposing instantaneous scheduler utilization
  • PR#2463 - Downcast to future<void>
  • PR#2462 - Fixed usage of ITT-Notify API with Intel Amplifier
  • PR#2461 - Cublas demo
  • PR#2460 - Fixing thread bindings
  • PR#2459 - Make -std=c++11 nvcc flag consistent for in-build and installed versions
  • IS#2457 - Segmentation fault when registering a partitioned vector
  • PR#2452 - Properly releasing global barrier for unhandled exceptions
  • PR#2451 - Fixing long shutdown times
  • PR#2450 - Attempting to fix initialization errors on newer platforms (Boost V1.63)
  • PR#2449 - Replace BOOST_COMPILER_FENCE with an HPX version
  • PR#2448 - This fixes a possible race in the migration code
  • PR#2445 - Fixing dataflow et.al. for futures or future-ranges wrapped into ref()
  • PR#2444 - Fix segfaults
  • PR#2443 - Issue 2442
  • IS#2442 - Mismatch between #if/#endif and namespace scope brackets in this_thread_executers.hpp
  • IS#2441 - undeclared identifier BOOST_COMPILER_FENCE
  • PR#2440 - Knl build
  • PR#2438 - Datapar backend
  • PR#2437 - Adapt algorithm parameter sequence changes from C++17
  • PR#2436 - Adapt execution policy name changes from C++17
  • IS#2435 - Trunk broken, undefined reference to hpx::thread::interrupt(hpx::thread::id, bool)
  • PR#2434 - More fixes to resource manager
  • PR#2433 - Added versions of hpx::get_ptr taking client side representations
  • PR#2432 - Warning fixes
  • PR#2431 - Adding facility representing set of performance counters
  • PR#2430 - Fix parallel_executor thread spawning
  • PR#2429 - Fix attribute warning for gcc
  • IS#2427 - Seg fault running octo-tiger with latest HPX commit
  • IS#2426 - Bug in 9592f5c0bc29806fce0dbe73f35b6ca7e027edcb causes immediate crash in Octo-tiger
  • PR#2425 - Fix nvcc errors due to constexpr specifier
  • IS#2424 - Async action on component present on hpx::find_here is executing synchronously
  • PR#2423 - Fix nvcc errors due to constexpr specifier
  • PR#2422 - Implementing hpx::this_thread thread data functions
  • PR#2421 - Adding benchmark for wait_all
  • IS#2420 - Returning object of a component client from another component action fails
  • PR#2419 - Infiniband parcelport
  • IS#2418 - gcc + nvcc fails to compile code that uses partitioned_vector
  • PR#2417 - Fixing context switching
  • PR#2416 - Adding fixes and workarounds to allow compilation with nvcc/msvc (VS2015up3)
  • PR#2415 - Fix errors coming from hpx compute examples
  • PR#2414 - Fixing msvc12
  • PR#2413 - Enable cuda/nvcc or cuda/clang when using add_hpx_executable()
  • PR#2412 - Fix issue in HPX_SetupTarget.cmake when cuda is used
  • PR#2411 - This fixes the core compilation issues with MSVC12
  • IS#2410 - undefined reference to opal_hwloc191_hwloc_.....
  • PR#2409 - Fixing locking for channel and receive_buffer
  • PR#2407 - Solving #2402 and #2403
  • PR#2406 - Improve guards
  • PR#2405 - Enable parallel::for_each for iterators returning proxy types
  • PR#2404 - Forward the explicitly given result_type in the hpx invoke
  • IS#2403 - datapar_execution + zip iterator: lambda arguments aren't references
  • IS#2402 - datapar algorithm instantiated with wrong type #2402
  • PR#2401 - Added support for imported libraries to HPX_Libraries.cmake
  • PR#2400 - Use CMake policy CMP0060
  • IS#2399 - Error trying to push back vector of futures to vector
  • PR#2398 - Allow config #defines to be written out to custom config/defines.hpp
  • IS#2397 - CMake generated config defines can cause tedious rebuilds category
  • IS#2396 - BOOST_ROOT paths are not used at link time
  • PR#2395 - Fix target_link_libraries() issue when HPX Cuda is enabled
  • IS#2394 - Template compilation error using HPX_WITH_DATAPAR_LIBFLATARRAY
  • PR#2393 - Fixing lock registration for recursive mutex
  • PR#2392 - Add keywords in target_link_libraries in hpx_setup_target
  • PR#2391 - Clang goroutines
  • IS#2390 - Adapt execution policy name changes from C++17
  • PR#2389 - Chunk allocator and pool are not used and are obsolete
  • PR#2388 - Adding functionalities to datapar needed by octotiger
  • PR#2387 - Fixing race condition for early parcels
  • IS#2386 - Lock registration broken for recursive_mutex
  • PR#2385 - Datapar zip iterator
  • PR#2384 - Fixing race condition in for_loop_reduction
  • PR#2383 - Continuations
  • PR#2382 - add LibFlatArray-based backend for datapar
  • PR#2381 - remove unused typedef to get rid of compiler warnings
  • PR#2380 - Tau cleanup
  • PR#2379 - Can send immediate
  • PR#2378 - Renaming copy_helper/copy_n_helper/move_helper/move_n_helper
  • IS#2376 - Boost trunk's spinlock initializer fails to compile
  • PR#2375 - Add support for minimal thread local data
  • PR#2374 - Adding API functions set_config_entry_callback
  • PR#2373 - Add a simple utility for debugging that gives supended task backtraces
  • PR#2372 - Barrier Fixes
  • IS#2370 - Can't wait on a wrapped future
  • PR#2369 - Fixing stable_partition
  • PR#2367 - Fixing find_prefixes for Windows platforms
  • PR#2366 - Testing for experimental/optional only in C++14 mode
  • PR#2364 - Adding set_config_entry
  • PR#2363 - Fix papi
  • PR#2362 - Adding missing macros for new non-direct actions
  • PR#2361 - Improve cmake output to help debug compiler incompatibility check
  • PR#2360 - Fixing race condition in condition_variable
  • PR#2359 - Fixing shutdown when parcels are still in flight
  • IS#2357 - failed to insert console_print_action into typename_to_id_t registry
  • PR#2356 - Fixing return type of get_iterator_tuple
  • PR#2355 - Fixing compilation against Boost 1 62
  • PR#2354 - Adding serialization for mask_type if CPU_COUNT > 64
  • PR#2353 - Adding hooks to tie in APEX into the parcel layer
  • IS#2352 - Compile errors when using intel 17 beta (for KNL) on edison
  • PR#2351 - Fix function vtable get_function_address implementation
  • IS#2350 - Build failure - master branch (4de09f5) with Intel Compiler v17
  • PR#2349 - Enabling zero-copy serialization support for std::vector<>
  • PR#2348 - Adding test to verify #2334 is fixed
  • PR#2347 - Bug fixes for hpx.compute and hpx::lcos::channel
  • PR#2346 - Removing cmake "find" files that are in the APEX cmake Modules
  • PR#2345 - Implemented parallel::stable_partition
  • PR#2344 - Making hpx::lcos::channel usable with basename registration
  • PR#2343 - Fix a couple of examples that failed to compile after recent api changes
  • IS#2342 - Enabling APEX causes link errors
  • PR#2341 - Removing cmake "find" files that are in the APEX cmake Modules
  • PR#2340 - Implemented all existing datapar algorithms using Boost.SIMD
  • PR#2339 - Fixing 2338
  • PR#2338 - Possible race in sliding semaphore
  • PR#2337 - Adjust osu_latency test to measure window_size parcels in flight at once
  • PR#2336 - Allowing remote direct actions to be executed without spawning a task
  • PR#2335 - Making sure multiple components are properly initialized from arguments
  • IS#2334 - Cannot construct component with large vector on a remote locality
  • PR#2332 - Fixing hpx::lcos::local::barrier
  • PR#2331 - Updating APEX support to include OTF2
  • PR#2330 - Support for data-parallelism for parallel algorithms
  • IS#2329 - Coordinate settings in cmake
  • PR#2328 - fix LibGeoDecomp builds with HPX + GCC 5.3.0 + CUDA 8RC
  • PR#2326 - Making scan_partitioner work (for now)
  • IS#2323 - Constructing a vector of components only correctly initializes the first component
  • PR#2322 - Fix problems that bubbled up after merging #2278
  • PR#2321 - Scalable barrier
  • PR#2320 - Std flag fixes
  • IS#2319 - -std=c++14 and -std=c++1y with Intel can't build recent Boost builds due to insufficient C++14 support; don't enable these flags by default for Intel
  • PR#2318 - Improve handling of --hpx:bind=<bind-spec>
  • PR#2317 - Making sure command line warnings are printed once only
  • PR#2316 - Fixing command line handling for default bind mode
  • PR#2315 - Set id_retrieved if set_id is present
  • IS#2314 - Warning for requested/allocated thread discrepancy is printed twice
  • IS#2313 - --hpx:print-bind doesn't work with --hpx:pu-step
  • IS#2312 - --hpx:bind range specifier restrictions are overly restrictive
  • IS#2311 - hpx_0.9.99 out of project build fails
  • PR#2310 - Simplify function registration
  • PR#2309 - Spelling and grammar revisions in documentation (and some code)
  • PR#2306 - Correct minor typo in the documentation
  • PR#2305 - Cleaning up and fixing parcel coalescing
  • PR#2304 - Inspect checks for stream related includes
  • PR#2303 - Add functionality allowing to enumerate threads of given state
  • PR#2301 - Algorithm overloads fix for VS2013
  • PR#2300 - Use <cstdint>, add inspect checks
  • PR#2299 - Replace boost::[c]ref with std::[c]ref, add inspect checks
  • PR#2297 - Fixing compilation with no hw_loc
  • PR#2296 - Hpx compute
  • PR#2295 - Making sure for_loop(execution::par, 0, N, ...) is actually executed in parallel
  • PR#2294 - Throwing exceptions if the runtime is not up and running
  • PR#2293 - Removing unused parcel port code
  • PR#2292 - Refactor function vtables
  • PR#2291 - Fixing 2286
  • PR#2290 - Simplify algorithm overloads
  • PR#2289 - Adding performance counters reporting parcel related data on a per-action basis
  • IS#2288 - Remove dormant parcelports
  • IS#2286 - adjustments to parcel handling to support parcelports that do not need a connection cache
  • PR#2285 - add CMake option to disable package export
  • PR#2283 - Add more inspect checks for use of deprecated components
  • IS#2282 - Arithmetic exception in executor static chunker
  • IS#2281 - For loop doesn't parallelize
  • PR#2280 - Fixing 2277: build failure with PAPI
  • PR#2279 - Child vs parent stealing
  • IS#2277 - master branch build failure (53c5b4f) with papi
  • PR#2276 - Compile time launch policies
  • PR#2275 - Replace boost::chrono with std::chrono in interfaces
  • PR#2274 - Replace most uses of Boost.Assign with initializer list
  • PR#2273 - Fixed typos
  • PR#2272 - Inspect checks
  • PR#2270 - Adding test verifying -Ihpx.os_threads=all
  • PR#2269 - Added inspect check for now obsolete boost type traits
  • PR#2268 - Moving more code into source files
  • IS#2267 - Add inspect support to deprecate Boost.TypeTraits
  • PR#2265 - Adding channel LCO
  • PR#2264 - Make support for std::ref mandatory
  • PR#2263 - Constrain tuple_member forwarding constructor
  • IS#2262 - Test hpx.os_threads=all
  • IS#2261 - OS X: Error: no matching constructor for initialization of 'hpx::lcos::local::condition_variable_any'
  • IS#2260 - Make support for std::ref mandatory
  • PR#2259 - Remove most of Boost.MPL, Boost.EnableIf and Boost.TypeTraits
  • PR#2258 - Fixing #2256
  • PR#2257 - Fixing launch process
  • IS#2256 - Actions are not registered if not invoked
  • PR#2255 - Coalescing histogram
  • PR#2254 - Silence explicit initialization in copy-constructor warnings
  • PR#2253 - Drop support for GCC 4.6 and 4.7
  • PR#2252 - Prepare V1.0
  • PR#2251 - Convert to 0.9.99
  • PR#2249 - Adding iterator_facade and iterator_adaptor
  • IS#2248 - Need a feature to yield to a new task immediately
  • PR#2246 - Adding split_future
  • PR#2245 - Add an example for handing over a component instance to a dynamically launched locality
  • IS#2243 - Add example demonstrating AGAS symbolic name registration
  • IS#2242 - pkgconfig test broken on CentOS 7 / Boost 1.61
  • IS#2241 - Compilation error for partitioned vector in hpx_compute branch
  • PR#2240 - Fixing termination detection on one locality
  • IS#2239 - Create a new facility lcos::split_all
  • IS#2236 - hpx::cout vs. std::cout
  • PR#2232 - Implement local-only primary namespace service
  • IS#2147 - would like to know how much data is being routed by particular actions
  • IS#2109 - Warning while compiling hpx
  • IS#1973 - Setting INTERFACE_COMPILE_OPTIONS for hpx_init in CMake taints Fortran_FLAGS
  • IS#1864 - run_guarded using bound function ignores reference
  • IS#1754 - Running with TCP parcelport causes immediate crash or freeze
  • IS#1655 - Enable zip_iterator to be used with Boost traversal iterator categories
  • IS#1591 - Optimize AGAS for shared memory only operation
  • IS#1401 - Need an efficient infiniband parcelport
  • IS#1125 - Fix the IPC parcelport
  • IS#839 - Refactor ibverbs and shmem parcelport
  • IS#702 - Add instrumentation of parcel layer
  • IS#668 - Implement ispc task interface
  • IS#533 - Thread queue/deque internal parameters should be runtime configurable
  • IS#475 - Create a means of combining performance counters into querysets
General Changes

As the version number of this release hints, we consider this release to be a preview for the upcoming HPX V1.0. All of the functionalities we set out to implement for V1.0 are in place; all of the features we wanted to have exposed are ready. We are very happy with the stability and performance of HPX and we would like to present this release to the community in order for us to gather broad feedback before releasing V1.0. We still expect some minor details to change, but on the whole this release represents what we would like to have in a V1.0.

Overall, since the last release we have had almost 1600 commits while closing almost 400 tickets. These numbers reflect the incredible development activity we have seen over the last couple of months. We would like to express a big 'Thank you!' to all contributors and those who helped to make this release happen.

The most notable addition in terms of new functionality available with this release is the full implementation of object migration (i.e. the ability to transparently move HPX components to a different compute node). Additionally, this release of HPX cleans up many minor issues and some API inconsistencies.

Here are some of the main highlights and changes for this release (in no particular order):

  • We have fixed a couple of issues in AGAS and the parcel layer which have caused hangs, segmentation faults at exit, and a slowdown of applications over time. Fixing those has significantly increased the overall stability and performance of distributed runs.
  • We have started to add parallel algorithm overloads based on the C++ Extensions for Ranges (N4560) proposal. This also includes the addition of projections to the existing algorithms. Please see IS#1668 for a list of algorithms which have been adapted to N4560.
  • We have implemented index-based parallel for-loops based on a corresponding standardization proposal (P0075R1). Please see IS#2016 for a list of available algorithms.
  • We have added implementations for more parallel algorithms as proposed for the upcoming C++ 17 Standard. See IS#1141 for an overview of which algorithms are available by now.
  • We have started to implement a new prototypical functionality with HPX.Compute which uniformly exposes some of the higher level APIs to heterogeneous architectures (currently CUDA). This functionality is an early preview and should not be considered stable. It may change considerably in the future.
  • We have pervasively added (optional) executor arguments to all API functions which schedule new work. Executors are now used throughout the code base as the main means of executing tasks.
  • Added hpx::make_future<R>(future<T> &&), which allows converting a future of any type T into a future of any other type R, either based on default conversion rules of the embedded types or using a given explicit conversion function.
  • We finally finished the implementation of transparent migration of components to another locality. It is now possible to trigger a migration operation without 'stopping the world' for the object to migrate. HPX will make sure that no work is being performed on an object before it is migrated and that all subsequently scheduled work for the migrated object will be transparently forwarded to the new locality. Please note that the global id of the migrated object does not change, thus the application will not have to be changed in any way to support this new functionality. Note, however, that this feature is currently considered experimental. See IS#559 and PR#1966 for more details.
  • The hpx::dataflow facility is now usable with actions. Similarly to hpx::async, actions can be specified as an explicit template argument (hpx::dataflow<Action>(target, ...)) or as the first argument (hpx::dataflow(Action(), target, ...)). We have also enabled the use of distribution policies as the target for dataflow invocations. Please see IS#1265 and PR#1912 for more information.
  • Added overloads of gather_here and gather_there which accept the plain values of the data to gather (in addition to the existing overloads expecting futures).
  • We have cleaned up and refactored large parts of the code base. This helped reduce compile and link times of HPX itself and also of applications depending on it. We have further decreased the dependency of HPX on the Boost libraries by replacing part of those with facilities available from the standard libraries.
  • Wherever possible we have removed dependencies of our API on Boost by replacing those with the equivalent facility from the C++11 standard library.
  • We have added new performance counters for parcel coalescing, file-IO, the AGAS cache, and overall scheduler time. Resetting performance counters has been overhauled and fixed.
  • We have introduced a generic client type hpx::components::client<> and added support for using it with hpx::async. This removes the necessity of implementing a specific client type for every component type without losing type safety, and reduces the need to use the low-level hpx::id_type for referencing (possibly remote) component instances. We plan to deprecate the direct use of hpx::id_type in user code in the future.
  • We have added a special iterator which supports automatic prefetching of one or more arrays for speeding up loop-like code (see hpx::parallel::util::make_prefetcher_context()).
  • We have extended the interfaces exposed from executors (as proposed by N4406) to accept an arbitrary number of arguments.
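The hpx::make_future conversion facility mentioned above can be used as in the following sketch (this assumes an HPX build environment; the header path and values are illustrative):

```cpp
#include <hpx/include/lcos.hpp>
#include <utility>

void convert_futures()
{
    // Convert a future<int> into a future<double> using the
    // default conversion rules of the embedded types.
    hpx::future<int> fi = hpx::make_ready_future(42);
    hpx::future<double> fd = hpx::make_future<double>(std::move(fi));

    // Alternatively, supply an explicit conversion function.
    hpx::future<int> fi2 = hpx::make_ready_future(21);
    hpx::future<double> fd2 = hpx::make_future<double>(
        std::move(fi2), [](int i) { return 2.0 * i; });
}
```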
Breaking Changes
  • In order to move the dataflow facility to namespace hpx we added a definition of hpx::dataflow which might create ambiguities in existing codes. The previous definition of this facility (hpx::lcos::local::dataflow) has been deprecated and is available only if -DHPX_WITH_LOCAL_DATAFLOW_COMPATIBILITY=On is passed to CMake at configuration time. Please explicitly qualify all uses of the dataflow facility if you enable this compatibility setting and encounter ambiguities.
  • The adaptation of the C++ Extensions for Ranges (N4560) proposal imposes some breaking changes related to the return types of some of the parallel algorithms. Please see IS#1668 for a list of algorithms which have already been adapted.
  • The facility hpx::lcos::make_future_void() has been replaced by hpx::make_future<void>().
  • We have removed support for Intel V13 and gcc 4.4.x.
  • We have removed (default) support for the generic hpx::parallel::execution_policy because it was removed from the Parallelism TS (N4409) while being adopted into the upcoming C++17 Standard. This facility can still be enabled at configure time by specifying -DHPX_WITH_GENERIC_EXECUTION_POLICY=On to CMake.
  • Uses of boost::shared_ptr and related facilities have been replaced with std::shared_ptr and friends. Uses of boost::unique_lock, boost::lock_guard etc. have also been replaced by the equivalent (and equally named) tools available from the C++11 standard library.
  • Facilities that used to expect an explicit boost::unique_lock now take an std::unique_lock. Additionally, condition_variable no longer aliases condition_variable_any; its interface now only works with std::unique_lock<local::mutex>.
  • Uses of boost::function, boost::bind, boost::tuple have been replaced by the corresponding facilities in HPX (hpx::util::function, hpx::util::bind, and hpx::util::tuple, respectively).
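For codes affected by the lock-related breaking changes above, the required adjustment is usually mechanical, as in this sketch (header paths are illustrative; assumes an HPX build environment):

```cpp
#include <hpx/lcos/local/condition_variable.hpp>
#include <hpx/lcos/local/mutex.hpp>
#include <mutex>

hpx::lcos::local::mutex mtx;
hpx::lcos::local::condition_variable cv;
bool ready = false;

void wait_for_ready()
{
    // Previously: boost::unique_lock<hpx::lcos::local::mutex> l(mtx);
    std::unique_lock<hpx::lcos::local::mutex> l(mtx);
    cv.wait(l, [] { return ready; });  // now requires std::unique_lock
}
```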
Bug Fixes (Closed Tickets)

Here is a list of the important tickets we closed for this release.

  • PR#2250 - change default chunker of parallel executor to static one
  • PR#2247 - HPX on ppc64le
  • PR#2244 - Fixing MSVC problems
  • PR#2238 - Fixing small typos
  • PR#2237 - Fixing small typos
  • PR#2234 - Fix broken add test macro when extra args are passed in
  • PR#2231 - Fixing possible race during future awaiting in serialization
  • PR#2230 - Fix stream nvcc
  • PR#2229 - Fixed run_as_hpx_thread
  • PR#2228 - On prefetching_test branch : adding prefetching_iterator and related tests used for prefetching containers within lambda functions
  • PR#2227 - Support for HPXCL's opencl::event
  • PR#2226 - Preparing for release of V0.9.99
  • PR#2225 - fix issue when compiling components with hpxcxx
  • PR#2224 - Compute alloc fix
  • PR#2223 - Simplify promise
  • PR#2222 - Replace last uses of boost::function by util::function_nonser
  • PR#2221 - Fix config tests
  • PR#2220 - Fixing gcc 4.6 compilation issues
  • PR#2219 - nullptr support for [unique_]function
  • PR#2218 - Introducing clang tidy
  • PR#2216 - Replace NULL with nullptr
  • IS#2214 - Let inspect flag use of NULL, suggest nullptr instead
  • PR#2213 - Require support for nullptr
  • PR#2212 - Properly find jemalloc through pkg-config
  • PR#2211 - Disable a couple of warnings reported by Intel on Windows
  • PR#2210 - Fixed host::block_allocator::bulk_construct
  • PR#2209 - Started to clean up new sort algorithms, made things compile for sort_by_key
  • PR#2208 - A couple of fixes that were exposed by a new sort algorithm
  • PR#2207 - Adding missing includes in /hpx/include/serialization.hpp
  • PR#2206 - Call package_action::get_future before package_action::apply
  • PR#2205 - The indirect_packaged_task::operator() needs to be run on a HPX thread
  • PR#2204 - Variadic executor parameters
  • PR#2203 - Delay-initialize members of partitoned iterator
  • PR#2202 - Added segmented fill for hpx::vector
  • IS#2201 - Null Thread id encountered on partitioned_vector
  • PR#2200 - Fix hangs
  • PR#2199 - Deprecating hpx/traits.hpp
  • PR#2198 - Making explicit inclusion of external libraries into build
  • PR#2197 - Fix typo in QT CMakeLists
  • PR#2196 - Fixing a gcc warning about attributes being ignored
  • PR#2194 - Fixing partitioned_vector_spmd_foreach example
  • IS#2193 - partitioned_vector_spmd_foreach seg faults
  • PR#2192 - Support Boost.Thread v4
  • PR#2191 - HPX.Compute prototype
  • PR#2190 - Spawning operation on new thread if remaining stack space becomes too small
  • PR#2189 - Adding callback taking index and future to when_each
  • PR#2188 - Adding new example demonstrating receive_buffer
  • PR#2187 - Mask 128-bit ints if CUDA is being used
  • PR#2186 - Make startup & shutdown functions unique_function
  • PR#2185 - Fixing logging output not to cause hang on shutdown
  • PR#2184 - Allowing component clients as action return types
  • IS#2183 - Enabling logging output causes hang on shutdown
  • IS#2182 - 1d_stencil seg fault
  • IS#2181 - Setting small stack size does not change default
  • PR#2180 - Changing default bind mode to balanced
  • PR#2179 - adding prefetching_iterator and related tests used for prefetching containers within lambda functions
  • PR#2177 - Fixing 2176
  • IS#2176 - Launch process test fails on OSX
  • PR#2175 - Fix unbalanced config/warnings includes, add some new ones
  • PR#2174 - Fix test categorization : regression not unit
  • IS#2172 - Different performance results
  • IS#2171 - "negative entry in reference count table" running octotiger on 32 nodes on queenbee
  • IS#2170 - Error while compiling on Mac + boost 1.60
  • PR#2168 - Fixing problems with is_bitwise_serializable
  • IS#2167 - startup & shutdown function should accept unique_function
  • IS#2166 - Simple receive_buffer example
  • PR#2165 - Fix wait all
  • PR#2164 - Fix wait all
  • PR#2163 - Fix some typos in config tests
  • PR#2162 - Improve #includes
  • PR#2160 - Add inspect check for missing #include <list>
  • PR#2159 - Add missing finalize call to stop test hanging
  • PR#2158 - Algo fixes
  • PR#2157 - Stack check
  • IS#2156 - OSX reports stack space incorrectly (generic context coroutines)
  • IS#2155 - Race condition suspected in runtime
  • PR#2154 - Replace boost::detail::atomic_count with the new util::atomic_count
  • PR#2153 - Fix stack overflow on OSX
  • PR#2152 - Define is_bitwise_serializable as is_trivially_copyable when available
  • PR#2151 - Adding missing <cstring> for std::mem* functions
  • IS#2150 - Unable to use component clients as action return types
  • PR#2149 - std::memmove copies bytes, use bytes*sizeof(type) when copying larger types
  • PR#2146 - Adding customization point for parallel copy/move
  • PR#2145 - Applying changes to address warnings issued by latest version of PVS Studio
  • IS#2148 - hpx::parallel::copy is broken after trivially copyable changes
  • PR#2144 - Some minor tweaks to compute prototype
  • PR#2143 - Added Boost version support information over OSX platform
  • PR#2142 - Fixing memory leak in example
  • PR#2141 - Add missing specializations in execution policies
  • PR#2139 - This PR fixes a few problems reported by Clang's Undefined Behavior sanitizer
  • PR#2138 - Revert "Adding fedora docs"
  • PR#2136 - Removed double semicolon
  • PR#2135 - Add deprecated #include check for hpx_fwd.hpp
  • PR#2134 - Resolved memory leak in stencil_8
  • PR#2133 - Replace uses of boost pointer containers
  • PR#2132 - Removing unused typedef
  • PR#2131 - Add several include checks for std facilities
  • PR#2130 - Fixing parcel compression, adding test
  • PR#2129 - Fix invalid attribute warnings
  • IS#2128 - hpx::init seems to segfault
  • PR#2127 - Making executor_traits N-nary
  • PR#2126 - GCC 4.6 fails to deduce the correct type in lambda
  • PR#2125 - Making parcel coalescing test actually test something
  • IS#2124 - Make a testcase for parcel compression
  • IS#2123 - hpx/hpx/runtime/applier_fwd.hpp - Multiple defined types
  • IS#2122 - Exception in primary_namespace::resolve_free_list
  • IS#2121 - Possible memory leak in 1d_stencil_8
  • PR#2120 - Fixing 2119
  • IS#2119 - reduce_by_key compilation problems
  • IS#2118 - Premature unwrapping of boost::ref'ed arguments
  • PR#2117 - Added missing initializer on last constructor for thread_description
  • PR#2116 - Use a lightweight bind implementation when no placeholders are given
  • PR#2115 - Replace boost::shared_ptr with std::shared_ptr
  • PR#2114 - Adding hook functions for executor_parameter_traits supporting timers
  • IS#2113 - Compilation error with gcc version 4.9.3 (MacPorts gcc49 4.9.3_0)
  • PR#2112 - Replace uses of safe_bool with explicit operator bool
  • IS#2111 - Compilation error on QT example
  • IS#2110 - Compilation error when passing non-future argument to unwrapped continuation in dataflow
  • IS#2109 - Warning while compiling hpx
  • IS#2108 - Stack trace of last bug causing issues with octotiger
  • PR#2107 - Making sure that a missing parcel_coalescing module does not cause startup exceptions
  • PR#2106 - Stop using hpx_fwd.hpp
  • IS#2105 - coalescing plugin handler is not optional any more
  • IS#2104 - Make executor_traits N-nary
  • IS#2103 - Build error with octotiger and hpx commit e657426d
  • PR#2102 - Combining thread data storage
  • PR#2101 - Added repartition version of 1d stencil that uses any performance counter
  • PR#2100 - Drop obsolete TR1 result_of protocol
  • PR#2099 - Replace uses of boost::bind with util::bind
  • PR#2098 - Deprecated inspect checks
  • PR#2097 - Reduce by key, extends #1141
  • PR#2096 - Moving local cache from external to hpx/util
  • PR#2095 - Bump minimum required Boost to 1.50.0
  • PR#2094 - Add include checks for several Boost utilities
  • IS#2093 - /.../local_cache.hpp(89): error #303: explicit type is missing ("int" assumed)
  • PR#2091 - Fix for Raspberry pi build
  • PR#2090 - Fix storage size for util::function<>
  • PR#2089 - Fix #2088
  • IS#2088 - More verbose output from cmake configuration
  • PR#2087 - Making sure init_globally always executes hpx_main
  • IS#2086 - Race condition with recent HPX
  • PR#2085 - Adding #include checker
  • PR#2084 - Replace boost lock types with standard library ones
  • PR#2083 - Simplify packaged task
  • PR#2082 - Updating APEX version for testing
  • PR#2081 - Cleanup exception headers
  • PR#2080 - Make call_once variadic
  • IS#2079 - With GNU C++, line 85 of hpx/config/version.hpp causes link failure when linking application
  • IS#2078 - Simple test fails with _GLIBCXX_DEBUG defined
  • PR#2077 - Instantiate board in nqueen client
  • PR#2076 - Moving coalescing registration to TUs
  • PR#2075 - Fixed some documentation typos
  • PR#2074 - Adding flush-mode to message handler flush
  • PR#2073 - Fixing performance regression introduced lately
  • PR#2072 - Refactor local::condition_variable
  • PR#2071 - Timer based on boost::asio::deadline_timer
  • PR#2070 - Refactor tuple based functionality
  • PR#2069 - Fixed typos
  • IS#2068 - Seg fault with octotiger
  • PR#2067 - Algorithm cleanup
  • PR#2066 - Split credit fixes
  • PR#2065 - Rename HPX_MOVABLE_BUT_NOT_COPYABLE to HPX_MOVABLE_ONLY
  • PR#2064 - Fixed some typos in docs
  • PR#2063 - Adding example demonstrating template components
  • IS#2062 - Support component templates
  • PR#2061 - Replace some uses of lexical_cast<string> with C++11 std::to_string
  • PR#2060 - Replace uses of boost::noncopyable with HPX_NON_COPYABLE
  • PR#2059 - Adding missing for_loop algorithms
  • PR#2058 - Move several definitions to more appropriate headers
  • PR#2057 - Simplify assert_owns_lock and ignore_while_checking
  • PR#2056 - Replacing std::result_of with util::result_of
  • PR#2055 - Fix process launching/connecting back
  • PR#2054 - Add a forwarding coroutine header
  • PR#2053 - Replace uses of boost::unordered_map with std::unordered_map
  • PR#2052 - Rewrite tuple unwrap
  • PR#2050 - Replace uses of BOOST_SCOPED_ENUM with C++11 scoped enums
  • PR#2049 - Attempt to narrow down split_credit problem
  • PR#2048 - Fixing gcc startup hangs
  • PR#2047 - Fixing when_xxx and wait_xxx for MSVC12
  • PR#2046 - adding persistent_auto_chunk_size and related tests for for_each
  • PR#2045 - Fixing HPX_HAVE_THREAD_BACKTRACE_DEPTH build time configuration
  • PR#2044 - Adding missing service executor types
  • PR#2043 - Removing ambiguous definitions for is_future_range and future_range_traits
  • PR#2042 - Clarify that HPX builds can use (much) more than 2GB per process
  • PR#2041 - Changing future_iterator_traits to support pointers
  • IS#2040 - Improve documentation memory usage warning?
  • PR#2039 - Coroutine cleanup
  • PR#2038 - Fix cmake policy CMP0042 warning MACOSX_RPATH
  • PR#2037 - Avoid redundant specialization of [unique_]function_nonser
  • PR#2036 - nvcc dies with an internal error upon pushing/popping warnings inside templates
  • IS#2035 - Use a less restrictive iterator definition in hpx::lcos::detail::future_iterator_traits
  • PR#2034 - Fixing compilation error with thread queue wait time performance counter
  • IS#2033 - Compilation error when compiling with thread queue waittime performance counter
  • IS#2032 - Ambiguous template instantiation for is_future_range and future_range_traits.
  • PR#2031 - Don't restart timer on every incoming parcel
  • PR#2030 - Unify handling of execution policies in parallel algorithms
  • PR#2029 - Make pkg-config .pc files use .dylib on OSX
  • PR#2028 - Adding process component
  • PR#2027 - Making check for compiler compatibility independent on compiler path
  • PR#2025 - Fixing inspect tool
  • PR#2024 - Intel13 removal
  • PR#2023 - Fix errors related to older boost versions and parameter pack expansions in lambdas
  • IS#2022 - gmake fail: "No rule to make target /usr/lib46/libboost_context-mt.so"
  • PR#2021 - Added Sudoku example
  • IS#2020 - Make errors related to init_globally.cpp example while building HPX out of the box
  • PR#2019 - Fixed some compilation and cmake errors encountered in nqueen example
  • PR#2018 - For loop algorithms
  • PR#2017 - Non-recursive at_index implementation
  • IS#2016 - Add index-based for-loops
  • IS#2015 - Change default bind-mode to balanced
  • PR#2014 - Fixed dataflow if invoked action returns a future
  • PR#2013 - Fixing compilation issues with external example
  • PR#2012 - Added Sierpinski Triangle example
  • IS#2011 - Compilation error while running sample hello_world_component code
  • PR#2010 - Segmented move implemented for hpx::vector
  • IS#2009 - pkg-config order incorrect on 14.04 / GCC 4.8
  • IS#2008 - Compilation error in dataflow of action returning a future
  • PR#2007 - Adding new performance counter exposing overall scheduler time
  • PR#2006 - Function includes
  • PR#2005 - Adding an example demonstrating how to initialize HPX from a global object
  • PR#2004 - Fixing 2000
  • PR#2003 - Adding generation parameter to gather to enable using it more than once
  • PR#2002 - Turn on position independent code to solve link problem with hpx_init
  • IS#2001 - Gathering more than once segfaults
  • IS#2000 - Undefined reference to hpx::assertion_failed
  • IS#1999 - Seg fault in hpx::lcos::base_lco_with_value<*>::set_value_nonvirt() when running octo-tiger
  • PR#1998 - Detect unknown command line options
  • PR#1997 - Extending thread description
  • PR#1996 - Adding natvis files to solution (MSVC only)
  • IS#1995 - Command line handling does not produce error
  • PR#1994 - Possible missing include in test_utils.hpp
  • PR#1993 - Add missing LANGUAGES tag to a hpx_add_compile_flag_if_available() call in CMakeLists.txt
  • PR#1992 - Fixing shared_executor_test
  • PR#1991 - Making sure the winsock library is properly initialized
  • PR#1990 - Fixing bind_test placeholder ambiguity coming from boost-1.60
  • PR#1989 - Performance tuning
  • PR#1987 - Make configurable size of internal storage in util::function
  • PR#1986 - AGAS Refactoring+1753 Cache mods
  • PR#1985 - Adding missing task_block::run() overload taking an executor
  • PR#1984 - Adding an optimized LRU Cache implementation (for AGAS)
  • PR#1983 - Avoid invoking migration table look up for all objects
  • PR#1981 - Replacing uintptr_t (which is not defined everywhere) with std::size_t
  • PR#1980 - Optimizing LCO continuations
  • PR#1979 - Fixing Cori
  • PR#1978 - Fix test check that got broken in hasty fix to memory overflow
  • PR#1977 - Refactor action traits
  • PR#1976 - Fixes typo in README.rst
  • PR#1975 - Reduce size of benchmark timing arrays to fix test failures
  • PR#1974 - Add action to update data owned by the partitioned_vector component
  • PR#1972 - Adding partitioned_vector SPMD example
  • PR#1971 - Fixing 1965
  • PR#1970 - Papi fixes
  • PR#1969 - Fixing continuation recursions to not depend on fixed amount of recursions
  • PR#1968 - More segmented algorithms
  • IS#1967 - Simplify component implementations
  • PR#1966 - Migrate components
  • IS#1964 - fatal error: 'boost/lockfree/detail/branch_hints.hpp' file not found
  • IS#1962 - parallel:copy_if has race condition when used on in place arrays
  • PR#1963 - Fixing Static Parcelport initialization
  • PR#1961 - Fix function target
  • IS#1960 - Papi counters don't reset
  • PR#1959 - Fixing 1958
  • IS#1958 - inclusive_scan gives incorrect results with non-commutative operator
  • PR#1957 - Fixing #1950
  • PR#1956 - Sort by key example
  • PR#1955 - Adding regression test for #1946: Hang in wait_all() in distributed run
  • IS#1954 - HPX releases should not use -Werror
  • PR#1953 - Adding performance analysis for AGAS cache
  • PR#1952 - Adapting test for explicit variadics to fail for gcc 4.6
  • PR#1951 - Fixing memory leak
  • IS#1950 - Simplify external builds
  • PR#1949 - Fixing yet another lock that is being held during suspension
  • PR#1948 - Fixed container algorithms for Intel
  • PR#1947 - Adding workaround for tagged_tuple
  • IS#1946 - Hang in wait_all() in distributed run
  • PR#1945 - Fixed container algorithm tests
  • IS#1944 - assertion 'p.destination_locality() == hpx::get_locality()' failed
  • PR#1943 - Fix a couple of compile errors with clang
  • PR#1942 - Making parcel coalescing functional
  • IS#1941 - Re-enable parcel coalescing
  • PR#1940 - Touching up make_future
  • PR#1939 - Fixing problems in over-subscription management in the resource manager
  • PR#1938 - Removing use of unified Boost.Thread header
  • PR#1937 - Cleaning up the use of Boost.Accumulator headers
  • PR#1936 - Making sure interval timer is started for aggregating performance counters
  • PR#1935 - Tagged results
  • PR#1934 - Fix remote async with deferred launch policy
  • IS#1933 - Floating point exception in statistics_counter<boost::accumulators::tag::mean>::get_counter_value
  • PR#1932 - Removing superfluous includes of boost/lockfree/detail/branch_hints.hpp
  • PR#1931 - fix compilation with clang 3.8.0
  • IS#1930 - Missing online documentation for HPX 0.9.11
  • PR#1929 - LWG2485: get() should be overloaded for const tuple&&
  • PR#1928 - Revert "Using ninja for circle-ci builds"
  • PR#1927 - Using ninja for circle-ci builds
  • PR#1926 - Fixing serialization of std::array
  • IS#1925 - Issues with static HPX libraries
  • IS#1924 - Peformance degrading over time
  • IS#1923 - serialization of std::array appears broken in latest commit
  • PR#1922 - Container algorithms
  • PR#1921 - Tons of smaller quality improvements
  • IS#1920 - Seg fault in hpx::serialization::output_archive::add_gid when running octotiger
  • IS#1919 - Intel 15 compiler bug preventing HPX build
  • PR#1918 - Address sanitizer fixes
  • PR#1917 - Fixing compilation problems of parallel::sort with Intel compilers
  • PR#1916 - Making sure code compiles if HPX_WITH_HWLOC=Off
  • IS#1915 - max_cores undefined if HPX_WITH_HWLOC=Off
  • PR#1913 - Add utility member functions for partitioned_vector
  • PR#1912 - Adding support for invoking actions to dataflow
  • PR#1911 - Adding first batch of container algorithms
  • PR#1910 - Keep cmake_module_path
  • PR#1909 - Fix mpirun with pbs
  • PR#1908 - Changing parallel::sort to return the last iterator as proposed by N4560
  • PR#1907 - Adding a minimum version for Open MPI
  • PR#1906 - Updates to the Release Procedure
  • PR#1905 - Fixing #1903
  • PR#1904 - Making sure std containers are cleared before serialization loads data
  • IS#1903 - When running octotiger, I get: assertion '(*new_gids_)[gid].size() == 1' failed: HPX(assertion_failure)
  • IS#1902 - Immediate crash when running hpx/octotiger with _GLIBCXX_DEBUG defined.
  • PR#1901 - Making non-serializable classes non-serializable
  • IS#1900 - Two possible issues with std::list serialization
  • PR#1899 - Fixing a problem with credit splitting as revealed by #1898
  • IS#1898 - Accessing component from locality where it was not created segfaults
  • PR#1897 - Changing parallel::sort to return the last iterator as proposed by N4560
  • IS#1896 - version 1.0?
  • IS#1895 - Warning comment on numa_allocator is not very clear
  • PR#1894 - Add support for compilers that have thread_local
  • PR#1893 - Fixing 1890
  • PR#1892 - Adds typed future_type for executor_traits
  • PR#1891 - Fix wording in certain parallel algorithm docs
  • IS#1890 - Invoking papi counters give segfault
  • PR#1889 - Fixing problems as reported by clang-check
  • PR#1888 - WIP parallel is_heap
  • PR#1887 - Fixed resetting performance counters related to idle-rate, etc
  • IS#1886 - Run hpx with qsub does not work
  • PR#1885 - Warning cleaning pass
  • PR#1884 - Add missing parallel algorithm header
  • PR#1883 - Add feature test for thread_local on Clang for TLS
  • PR#1882 - Fix some redundant qualifiers
  • IS#1881 - Unable to compile Octotiger using HPX and Intel MPI on SuperMIC
  • IS#1880 - clang with libc++ on Linux needs TLS case
  • PR#1879 - Doc fixes for #1868
  • PR#1878 - Simplify functions
  • PR#1877 - Removing most usage of Boost.Config
  • PR#1876 - Add missing parallel algorithms to algorithm.hpp
  • PR#1875 - Simplify callables
  • PR#1874 - Address long standing FIXME on using std::unique_ptr with incomplete types
  • PR#1873 - Fixing 1871
  • PR#1872 - Making sure PBS environment uses specified node list even if no PBS_NODEFILE env is available
  • IS#1871 - Fortran checks should be optional
  • PR#1870 - Touch local::mutex
  • PR#1869 - Documentation refactoring based off #1868
  • PR#1867 - Embrace static_assert
  • PR#1866 - Fix #1803 with documentation refactoring
  • PR#1865 - Setting OUTPUT_NAME as target properties
  • PR#1863 - Use SYSTEM for boost includes
  • PR#1862 - Minor cleanups
  • PR#1861 - Minor Corrections for Release
  • PR#1860 - Fixing hpx gdb script
  • IS#1859 - reset_active_counters resets times and thread counts before some of the counters are evaluated
  • PR#1858 - Release V0.9.11
  • PR#1857 - removing diskperf example from 9.11 release
  • PR#1856 - fix return in packaged_task_base::reset()
  • IS#1842 - Install error: file INSTALL cannot find libhpx_parcel_coalescing.so.0.9.11
  • PR#1839 - Adding fedora docs
  • PR#1824 - Changing version on master to V0.9.12
  • PR#1818 - Fixing #1748
  • IS#1815 - seg fault in AGAS
  • IS#1803 - wait_all documentation
  • IS#1796 - Outdated documentation to be revised
  • IS#1759 - glibc munmap_chunk or free(): invalid pointer on SuperMIC
  • IS#1753 - HPX performance degrades with time since execution begins
  • IS#1748 - All public HPX headers need to be self contained
  • PR#1719 - How to build HPX with Visual Studio
  • IS#1684 - Race condition when using --hpx:connect?
  • PR#1658 - Add serialization for std::set (as there is for std::vector and std::map)
  • PR#1641 - Generic client
  • IS#1632 - heartbeat example fails on separate nodes
  • PR#1603 - Adds preferred namespace check to inspect tool
  • IS#1559 - Extend inspect tool
  • IS#1523 - Remote async with deferred launch policy never executes
  • IS#1472 - Serialization issues
  • IS#1457 - Implement N4392: C++ Latches and Barriers
  • PR#1444 - Enabling usage of moveonly types for component construction
  • IS#1407 - The Intel 13 compiler has failing unit tests
  • IS#1405 - Allow component constructors to take movable only types
  • IS#1265 - Enable dataflow() to be usable with actions
  • IS#1236 - NUMA aware allocators
  • IS#802 - Fix Broken Examples
  • IS#559 - Add hpx::migrate facility
  • IS#449 - Make actions with template arguments usable and add documentation
  • IS#279 - Refactor addressing_service into a base class and two derived classes
  • IS#224 - Changing thread state metadata is not thread safe
  • IS#55 - Uniform syntax for enums should be implemented

Our main focus for this release was the design and development of a coherent set of higher-level APIs exposing various types of parallelism to the application programmer. We introduced the concept of an executor, which can be used to customize the where and when of execution of tasks in the context of parallelizing codes. We extended all APIs related to managing parallel tasks to support executors, which gives the user the choice of either using one of the predefined executor types or providing their own, possibly application-specific, executor. We paid very close attention to aligning all of these changes with the existing C++ Standards documents or with the ongoing proposals for standardization.
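The executor-enabled APIs can be used as in this sketch (the executor shown is one of the predefined types; header paths and names assume the HPX headers of this release):

```cpp
#include <hpx/include/parallel_executors.hpp>
#include <hpx/include/parallel_for_each.hpp>
#include <vector>

void square_all(std::vector<int>& v)
{
    // Use the default parallel execution policy ...
    hpx::parallel::for_each(hpx::parallel::par,
        v.begin(), v.end(), [](int& i) { i *= i; });

    // ... or customize where/when the tasks run via an executor.
    hpx::parallel::parallel_executor exec;
    hpx::parallel::for_each(hpx::parallel::par.on(exec),
        v.begin(), v.end(), [](int& i) { ++i; });
}
```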

This release is the first after our change to a new development policy. We switched all development to be strictly performed on branches only; all direct commits to our main branch (master) are prohibited. Any change has to go through a peer review before it is merged to master. As a result, the overall stability of our code base has significantly increased, and the development process itself has been simplified. This change manifests itself in a large number of pull requests which have been merged (please see below for a full list of closed issues and pull requests). All in all for this release, we closed almost 100 issues and merged over 290 pull requests. There have been over 1600 commits to the master branch since the last release.

General Changes
  • We are moving in the direction of unifying managed and simple components. As such, the classes hpx::components::component and hpx::components::component_base have been added, which currently just forward to the existing simple component facilities. The examples have been converted to only use those two classes.
  • Added integration with the CircleCI hosted continuous integration service. This gives us constant and immediate feedback on the health of our master branch.
  • The compiler configuration subsystem in the build system has been reimplemented. Instead of using Boost.Config we now use our own lightweight set of CMake scripts to determine the available language and library features supported by the compiler in use.
  • The API for creating instances of components has been consolidated. All component instances should be created using hpx::new_<>() only. It allows instantiating both single component instances and multiple component instances at once. The placement of the created components can be controlled by special distribution policies. Please see the corresponding documentation outlining the use of hpx::new_<>().
  • Introduced four new distribution policies which can be used with many API functions that traditionally expected to be used with a locality id.
  • The new distribution policies can now also be used with hpx::async. This change also deprecates hpx::async_colocated(id, ...), which is now replaced by a distribution policy: hpx::async(hpx::colocated(id), ...).
  • The hpx::vector and hpx::unordered_map data structures can now be used with the new distribution policies as well.
  • The parallel facility hpx::parallel::task_region has been renamed to hpx::parallel::task_block based on the changes in the corresponding standardization proposal N4411.
  • Added extensions to the parallel facility hpx::parallel::task_block which allow combining a task_block with an execution policy. This implies a minor breaking change, as hpx::parallel::task_block is now a template.
  • Added new LCOs: hpx::lcos::latch and hpx::lcos::local::latch which semantically conform to the proposed std::latch (see N4399).
  • Added performance counters exposing data related to data transferred by input/output (filesystem) operations (thanks to Maciej Brodowicz).
  • Added performance counters that allow tracking the number of action invocations (both local and remote).
  • Added new command line options --hpx:print-counter-at and --hpx:reset-counters.
  • The hpx::vector component has been renamed to hpx::partitioned_vector to make it explicit that the underlying memory is not contiguous.
  • Introduced a completely new and uniform higher-level parallelism API which is based on executors. All existing parallelism APIs have been adapted to this. We have added a large number of different executor types, such as a NUMA-aware executor, a this-thread executor, etc.
  • Added support for the MingW toolchain on Windows (thanks to Eric Lemanissier).
  • HPX now includes support for APEX (Autonomic Performance Environment for eXascale). APEX is an instrumentation and software adaptation library that provides an interface to TAU profiling / tracing as well as runtime adaptation of HPX applications through policy definitions. For more information and documentation, please see https://github.com/khuck/xpress-apex. To enable APEX at configuration time, specify -DHPX_WITH_APEX=On. To also include support for TAU profiling, specify -DHPX_WITH_TAU=On and specify the -DTAU_ROOT, -DTAU_ARCH and -DTAU_OPTIONS CMake parameters.
  • We have implemented many more of the parallel algorithms. Please see IS#1141 for the list of all available parallel algorithms (thanks to Daniel Bourgeois and John Biddiscombe for contributing their work).
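The consolidated creation API and the distribution policies described above combine as in the following sketch ('my_component' and 'my_component_action' are hypothetical placeholders; header paths are illustrative):

```cpp
#include <hpx/include/async.hpp>
#include <hpx/include/components.hpp>

void create_instances()
{
    // Create a single instance on this locality.
    hpx::future<hpx::id_type> f1 =
        hpx::new_<my_component>(hpx::find_here());

    // Create ten instances distributed over all localities using
    // the default layout distribution policy.
    hpx::future<std::vector<hpx::id_type>> f2 =
        hpx::new_<my_component[]>(
            hpx::default_layout(hpx::find_all_localities()), 10);

    // Deprecated: hpx::async_colocated(id, ...). Instead, run an
    // action on whatever locality currently hosts the object:
    hpx::future<void> f3 = hpx::async(
        hpx::colocated(f1.get()), my_component_action());
}
```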
Breaking Changes
  • We are moving in the direction of unifying managed and simple components. In order to stop exposing the old facilities, all examples have been converted to use the new classes. The breaking change in this release is that performance counters now derive from hpx::components::component_base instead of hpx::components::managed_component_base.
  • We removed the support for stackless threads. It turned out that there was no performance benefit when using stackless threads. As such, we decided to clean up our codebase. This feature was not documented.
  • The CMake project name has changed from 'hpx' to 'HPX' for consistency and compatibility with naming conventions and other CMake projects. Generated config files go into <prefix>/lib/cmake/HPX and not <prefix>/lib/cmake/hpx.
  • The macro HPX_REGISTER_MINIMAL_COMPONENT_FACTORY has been deprecated. Please use HPX_REGISTER_COMPONENT instead. The old macro will be removed in the next release.
  • The obsolete distributing_factory and binpacking_factory components have been removed. The corresponding functionality is now provided by the hpx::new_<>() API function in conjunction with the hpx::default_layout and hpx::binpacking distribution policies (hpx::default_distribution_policy and hpx::binpacking_distribution_policy).
  • The API function hpx::new_colocated has been deprecated. Please use the consolidated API hpx::new_ in conjunction with the new hpx::colocated distribution policy (hpx::colocating_distribution_policy) instead. The old API function will still be available for at least one release of HPX if the configuration variable HPX_WITH_COLOCATED_BACKWARDS_COMPATIBILITY is enabled.
  • The API function hpx::async_colocated has been deprecated. Please use the consolidated API hpx::async in conjunction with the new hpx::colocated distribution policy (hpx::colocating_distribution_policy) instead. The old API function will still be available for at least one release of HPX if the configuration variable HPX_WITH_COLOCATED_BACKWARDS_COMPATIBILITY is enabled.
  • The obsolete remote_object component has been removed.
  • Replaced the use of Boost.Serialization with our own solution. While the new version is mostly compatible with Boost.Serialization, this change requires some minor code modifications in user code. For more information, please see the corresponding announcement on the hpx-users@stellar.cct.lsu.edu mailing list.
  • The names used by CMake to influence various configuration options have been unified. The new naming scheme requires all configuration constants to start with HPX_WITH_..., while the preprocessor constant which is used at build time starts with HPX_HAVE_.... For instance, the former CMake command line -DHPX_MALLOC=... now has to be specified as -DHPX_WITH_MALLOC=... and will cause the preprocessor constant HPX_HAVE_MALLOC to be defined. The actual name of the constant (i.e. MALLOC) has not changed. Please see the corresponding documentation for more details (CMake Variables used to configure HPX).
  • The get_gid() functions exposed by the component base classes hpx::components::server::simple_component_base, hpx::components::server::managed_component_base, and hpx::components::server::fixed_component_base have been replaced by two new functions: get_unmanaged_id() and get_id(). To enable the old function name for backwards compatibility, use the cmake configuration option HPX_WITH_COMPONENT_GET_GID_COMPATIBILITY=On.
  • All functions which were named get_gid() but were returning hpx::id_type have been renamed to get_id(). To enable the old function names for backwards compatibility, use the cmake configuration option HPX_WITH_COMPONENT_GET_GID_COMPATIBILITY=On.
Bug Fixes (Closed Tickets)

Here is a list of the important tickets we closed for this release.

  • PR#1855 - Completely removing external/endian
  • PR#1854 - Don't pollute CMAKE_CXX_FLAGS through find_package()
  • PR#1853 - Updating CMake configuration to get correct version of TAU library
  • PR#1852 - Fixing Performance Problems with MPI Parcelport
  • PR#1851 - Fixing hpx_add_link_flag() and hpx_remove_link_flag()
  • PR#1850 - Fixing 1836, adding parallel::sort
  • PR#1849 - Fixing configuration for use of more than 64 cores
  • PR#1848 - Change default APEX version for release
  • PR#1847 - Fix client_base::then on release
  • PR#1846 - Removing broken lcos::local::channel from release
  • PR#1845 - Adding example demonstrating a possible safe-object implementation to release
  • PR#1844 - Removing stubs from accumulator examples
  • PR#1843 - Don't pollute CMAKE_CXX_FLAGS through find_package()
  • PR#1841 - Fixing client_base<>::then
  • PR#1840 - Adding example demonstrating a possible safe-object implementation
  • PR#1838 - Update version rc1
  • PR#1837 - Removing broken lcos::local::channel
  • PR#1835 - Adding exlicit move constructor and assignment operator to hpx::lcos::promise
  • PR#1834 - Making hpx::lcos::promise move-only
  • PR#1833 - Adding fedora docs
  • IS#1832 - hpx::lcos::promise<> must be move-only
  • PR#1831 - Fixing resource manager gcc5.2
  • PR#1830 - Fix intel13
  • PR#1829 - Unbreaking thread test
  • PR#1828 - Fixing #1620
  • PR#1827 - Fixing a memory management issue for the Parquet application
  • IS#1826 - Memory management issue in hpx::lcos::promise
  • PR#1825 - Adding hpx::components::component and hpx::components::component_base
  • PR#1823 - Adding git commit id to circleci build
  • PR#1822 - applying fixes suggested by clang 3.7
  • PR#1821 - Hyperlink fixes
  • PR#1820 - added parallel multi-locality sanity test
  • PR#1819 - Fixing #1667
  • IS#1817 - Hyperlinks generated by inspect tool are wrong
  • PR#1816 - Support hpxrx
  • PR#1814 - Fix async to dispatch to the correct locality in all cases
  • IS#1813 - async(launch::..., action(), ...) always invokes locally
  • PR#1812 - fixed syntax error in CMakeLists.txt
  • PR#1811 - Agas optimizations
  • PR#1810 - drop superfluous typedefs
  • PR#1809 - Allow HPX to be used as an optional package in 3rd party code
  • PR#1808 - Fixing #1723
  • PR#1807 - Making sure resolve_localities does not hang during normal operation
  • IS#1806 - Spinlock no longer movable and deletes operator '=', breaks MiniGhost
  • IS#1804 - register_with_basename causes hangs
  • PR#1801 - Enhanced the inspect tool to take user directly to the problem with hyperlinks
  • IS#1800 - Problems compiling application on smic
  • PR#1799 - Fixing cv exceptions
  • PR#1798 - Documentation refactoring & updating
  • PR#1797 - Updating the activeharmony CMake module
  • PR#1795 - Fixing cv
  • PR#1794 - Fix connect with hpx::runtime_mode_connect
  • PR#1793 - fix a wrong use of HPX_MAX_CPU_COUNT instead of HPX_HAVE_MAX_CPU_COUNT
  • PR#1792 - Allow for default constructed parcel instances to be moved
  • PR#1791 - Fix connect with hpx::runtime_mode_connect
  • IS#1790 - assertion 'action_.get()' failed: HPX(assertion_failure) when running Octotiger with pull request 1786
  • PR#1789 - Fixing discover_counter_types API function
  • IS#1788 - connect with hpx::runtime_mode_connect
  • IS#1787 - discover_counter_types not working
  • PR#1786 - Changing addressing_service to use std::unordered_map instead of std::map
  • PR#1785 - Fix is_iterator for container algorithms
  • PR#1784 - Adding new command line options:
  • PR#1783 - Minor changes for APEX support
  • PR#1782 - Drop legacy forwarding action traits
  • PR#1781 - Attempt to resolve the race between cv::wait_xxx and cv::notify_all
  • PR#1780 - Removing serialize_sequence
  • PR#1779 - Fixed #1501: hwloc configuration options are wrong for MIC
  • PR#1778 - Removing ability to enable/disable parcel handling
  • PR#1777 - Completely removing stackless threads
  • PR#1776 - Cleaning up util/plugin
  • PR#1775 - Agas fixes
  • PR#1774 - Action invocation count
  • PR#1773 - replaced MSVC variable with WIN32
  • PR#1772 - Fixing Problems in MPI parcelport and future serialization.
  • PR#1771 - Fixing intel 13 compiler errors related to variadic template template parameters for lcos::when_ tests
  • PR#1770 - Forwarding decay to std::
  • PR#1769 - Add more characters with special regex meaning to the existing patch
  • PR#1768 - Adding test for receive_buffer
  • PR#1767 - Making sure that uptime counter throws exception on any attempt to be reset
  • PR#1766 - Cleaning up code related to throttling scheduler
  • PR#1765 - Restricting thread_data to creating only with intrusive_pointers
  • PR#1764 - Fixing 1763
  • IS#1763 - UB in thread_data::operator delete
  • PR#1762 - Making sure all serialization registries/factories are unique
  • PR#1761 - Fixed #1751: hpx::future::wait_for fails a simple test
  • PR#1758 - Fixing #1757
  • IS#1757 - pinning not correct using --hpx:bind
  • IS#1756 - compilation error with MinGW
  • PR#1755 - Making output serialization const-correct
  • IS#1753 - HPX performance degrades with time since execution begins
  • IS#1752 - Error in AGAS
  • IS#1751 - hpx::future::wait_for fails a simple test
  • PR#1750 - Removing hpx_fwd.hpp includes
  • PR#1749 - Simplify result_of and friends
  • PR#1747 - Removed superfluous code from message_buffer.hpp
  • PR#1746 - Tuple dependencies
  • IS#1745 - Broken when_some which takes iterators
  • PR#1744 - Refining archive interface
  • PR#1743 - Fixing when_all when only a single future is passed
  • PR#1742 - Config includes
  • PR#1741 - Os executors
  • IS#1740 - hpx::promise has some problems
  • PR#1739 - Parallel composition with generic containers
  • IS#1738 - After building program and successfully linking to a version of hpx DHPX_DIR seems to be ignored
  • IS#1737 - Uptime problems
  • PR#1736 - added convenience c-tor and begin()/end() to serialize_buffer
  • PR#1735 - Config includes
  • PR#1734 - Fixed #1688: Add timer counters for tfunc_total and exec_total
  • IS#1733 - Add unit test for hpx/lcos/local/receive_buffer.hpp
  • PR#1732 - Renaming get_os_thread_count
  • PR#1731 - Basename registration
  • IS#1730 - Use after move of thread_init_data
  • PR#1729 - Rewriting channel based on new gate component
  • PR#1728 - Fixing #1722
  • PR#1727 - Fixing compile problems with apply_colocated
  • PR#1726 - Apex integration
  • PR#1725 - fixed test timeouts
  • PR#1724 - Renaming vector
  • IS#1723 - Drop support for intel compilers and gcc 4.4. based standard libs
  • IS#1722 - Add support for detecting non-ready futures before serialization
  • PR#1721 - Unifying parallel executors, initializing from launch policy
  • PR#1720 - dropped superfluous typedef
  • IS#1718 - Windows 10 x64, VS 2015 - Unknown CMake command "add_hpx_pseudo_target".
  • PR#1717 - Timed executor traits for thread-executors
  • PR#1716 - serialization of arrays didn't work with non-pod types. fixed
  • PR#1715 - List serialization
  • PR#1714 - changing misspellings
  • PR#1713 - Fixed distribution policy executors
  • PR#1712 - Moving library detection to be executed after feature tests
  • PR#1711 - Simplify parcel
  • PR#1710 - Compile only tests
  • PR#1709 - Implemented timed executors
  • PR#1708 - Implement parallel::executor_traits for thread-executors
  • PR#1707 - Various fixes to threads::executors to make custom schedulers work
  • PR#1706 - Command line option --hpx:cores does not work as expected
  • IS#1705 - command line option --hpx:cores does not work as expected
  • PR#1704 - vector deserialization is speeded up a little
  • PR#1703 - Fixing shared_mutes
  • IS#1702 - Shared_mutex does not compile with no_mutex cond_var
  • PR#1701 - Add distribution_policy_executor
  • PR#1700 - Executor parameters
  • PR#1699 - Readers writer lock
  • PR#1698 - Remove leftovers
  • PR#1697 - Fixing held locks
  • PR#1696 - Modified Scan Partitioner for Algorithms
  • PR#1695 - This thread executors
  • PR#1694 - Fixed #1688: Add timer counters for tfunc_total and exec_total
  • PR#1693 - Fix #1691: is_executor template specification fails for inherited executors
  • PR#1692 - Fixed #1662: Possible exception source in coalescing_message_handler
  • IS#1691 - is_executor template specification fails for inherited executors
  • PR#1690 - added macro for non-intrusive serialization of classes without a default c-tor
  • PR#1689 - Replace value_or_error with custom storage, unify future_data state
  • IS#1688 - Add timer counters for tfunc_total and exec_total
  • PR#1687 - Fixed interval timer
  • PR#1686 - Fixing cmake warnings about not existing pseudo target dependencies
  • PR#1685 - Converting partitioners to use bulk async execute
  • PR#1683 - Adds a tool for inspect that checks for character limits
  • PR#1682 - Change project name to (uppercase) HPX
  • PR#1681 - Counter shortnames
  • PR#1680 - Extended Non-intrusive Serialization to Ease Usage for Library Developers
  • PR#1679 - Working on 1544: More executor changes
  • PR#1678 - Transpose fixes
  • PR#1677 - Improve Boost compatibility check
  • PR#1676 - 1d stencil fix
  • IS#1675 - hpx project name is not HPX
  • PR#1674 - Fixing the MPI parcelport
  • PR#1673 - added move semantics to map/vector deserialization
  • PR#1672 - Vs2015 await
  • PR#1671 - Adapt transform for #1668
  • PR#1670 - Started to work on #1668
  • PR#1669 - Add this_thread_executors
  • IS#1667 - Apple build instructions in docs are out of date
  • PR#1666 - Apex integration
  • PR#1665 - Fixes an error with the whitespace check that showed the incorrect location of the error
  • IS#1664 - Inspect tool found incorrect endline whitespace
  • PR#1663 - Improve use of locks
  • IS#1662 - Possible exception source in coalescing_message_handler
  • PR#1661 - Added support for 128bit number serialization
  • PR#1660 - Serialization 128bits
  • PR#1659 - Implemented inner_product and adjacent_diff algos
  • PR#1658 - Add serialization for std::set (as there is for std::vector and std::map)
  • PR#1657 - Use of shared_ptr in io_service_pool changed to unique_ptr
  • IS#1656 - 1d_stencil codes all have wrong factor
  • PR#1654 - When using runtime_mode_connect, find the correct localhost public ip address
  • PR#1653 - Fixing 1617
  • PR#1652 - Remove traits::action_may_require_id_splitting
  • PR#1651 - Fixed performance counters related to AGAS cache timings
  • PR#1650 - Remove leftovers of traits::type_size
  • PR#1649 - Shorten target names on Windows to shorten used path names
  • PR#1648 - Fixing problems introduced by merging #1623 for older compilers
  • PR#1647 - Simplify running automatic builds on Windows
  • IS#1646 - Cache insert and update performance counters are broken
  • IS#1644 - Remove leftovers of traits::type_size
  • IS#1643 - Remove traits::action_may_require_id_splitting
  • PR#1642 - Adds spell checker to the inspect tool for qbk and doxygen comments
  • PR#1640 - First step towards fixing 688
  • PR#1639 - Re-apply remaining changes from limit_dataflow_recursion branch
  • PR#1638 - This fixes possible deadlock in the test ignore_while_locked_1485
  • PR#1637 - Fixing hpx::wait_all() invoked with two vector<future<T>>
  • PR#1636 - Partially re-apply changes from limit_dataflow_recursion branch
  • PR#1635 - Adding missing test for #1572
  • PR#1634 - Revert "Limit recursion-depth in dataflow to a configurable constant"
  • PR#1633 - Add command line option to ignore batch environment
  • PR#1631 - hpx::lcos::queue exhibits strange behavior
  • PR#1630 - Fixed endline_whitespace_check.cpp to detect lines with only whitespace
  • IS#1629 - Inspect trailing whitespace checker problem
  • PR#1628 - Removed meaningless const qualifiers. Minor icpc fix.
  • PR#1627 - Fixing the queue LCO and add example demonstrating its use
  • PR#1626 - Deprecating get_gid(), add get_id() and get_unmanaged_id()
  • PR#1625 - Allowing to specify whether to send credits along with message
  • IS#1624 - Lifetime issue
  • IS#1623 - hpx::wait_all() invoked with two vector<future<T>> fails
  • PR#1622 - Executor partitioners
  • PR#1621 - Clean up coroutines implementation
  • IS#1620 - Revert #1535
  • PR#1619 - Fix result type calculation for hpx::make_continuation
  • PR#1618 - Fixing RDTSC on Xeon/Phi
  • IS#1617 - hpx cmake not working when run as a subproject
  • IS#1616 - cmake problem resulting in RDTSC not working correctly for Xeon Phi creates very strange results for duration counters
  • IS#1615 - hpx::make_continuation requires input and output to be the same
  • PR#1614 - Fixed remove copy test
  • IS#1613 - Dataflow causes stack overflow
  • PR#1612 - Modified foreach partitioner to use bulk execute
  • PR#1611 - Limit recursion-depth in dataflow to a configurable constant
  • PR#1610 - Increase timeout for CircleCI
  • PR#1609 - Refactoring thread manager, mainly extracting thread pool
  • PR#1608 - Fixed running multiple localities without localities parameter
  • PR#1607 - More algorithm fixes to adjacentfind
  • IS#1606 - Running without localities parameter binds to bogus port range
  • IS#1605 - Too many serializations
  • PR#1604 - Changes the HPX image into a hyperlink
  • PR#1601 - Fixing problems with remove_copy algorithm tests
  • PR#1600 - Actions with ids cleanup
  • PR#1599 - Duplicate binding of global ids should fail
  • PR#1598 - Fixing array access
  • PR#1597 - Improved the reliability of connecting/disconnecting localities
  • IS#1596 - Duplicate id binding should fail
  • PR#1595 - Fixing more cmake config constants
  • PR#1594 - Fixing preprocessor constant used to enable C++11 chrono
  • PR#1593 - Adding operator|() for hpx::launch
  • IS#1592 - Error (typo) in the docs
  • IS#1590 - CMake fails when CMAKE_BINARY_DIR contains '+'.
  • IS#1589 - Disconnecting a locality results in segfault using heartbeat example
  • PR#1588 - Fix doc string for config option HPX_WITH_EXAMPLES
  • PR#1586 - Fixing 1493
  • PR#1585 - Additional Check for Inspect Tool to detect Endline Whitespace
  • IS#1584 - Clean up coroutines implementation
  • PR#1583 - Adding a check for end line whitespace
  • PR#1582 - Attempt to fix assert firing after scheduling loop was exited
  • PR#1581 - Fixed adjacentfind_binary test
  • PR#1580 - Prevent some of the internal cmake lists from growing indefinitely
  • PR#1579 - Removing type_size trait, replacing it with special archive type
  • IS#1578 - Remove demangle_helper
  • PR#1577 - Get ptr problems
  • IS#1576 - Refactor async, dataflow, and future::then
  • PR#1575 - Fixing tests for parallel rotate
  • PR#1574 - Cleaning up schedulers
  • PR#1573 - Fixing thread pool executor
  • PR#1572 - Fixing number of configured localities
  • PR#1571 - Reimplement decay
  • PR#1570 - Refactoring async, apply, and dataflow APIs
  • PR#1569 - Changed range for mach-o library lookup
  • PR#1568 - Mark decltype support as required
  • PR#1567 - Removed const from algorithms
  • IS#1566 - CMAKE Configuration Test Failures for clang 3.5 on debian
  • PR#1565 - Dylib support
  • PR#1564 - Converted partitioners and some algorithms to use executors
  • PR#1563 - Fix several #includes for Boost.Preprocessor
  • PR#1562 - Adding configuration option disabling/enabling all message handlers
  • PR#1561 - Removed all occurrences of boost::move replacing it with std::move
  • IS#1560 - Leftover HPX_REGISTER_ACTION_DECLARATION_2
  • PR#1558 - Revisit async/apply SFINAE conditions
  • PR#1557 - Removing type_size trait, replacing it with special archive type
  • PR#1556 - Executor algorithms
  • PR#1555 - Remove the necessity to specify archive flags on the receiving end
  • PR#1554 - Removing obsolete Boost.Serialization macros
  • PR#1553 - Properly fix HPX_DEFINE_*_ACTION macros
  • PR#1552 - Fixed algorithms relying on copy_if implementation
  • PR#1551 - Pxfs - Modifying FindOrangeFS.cmake based on OrangeFS 2.9.X
  • IS#1550 - Passing plain identifier inside HPX_DEFINE_PLAIN_ACTION_1
  • PR#1549 - Fixing intel14/libstdc++4.4
  • PR#1548 - Moving raw_ptr to detail namespace
  • PR#1547 - Adding support for executors to future.then
  • PR#1546 - Executor traits result types
  • PR#1545 - Integrate executors with dataflow
  • PR#1543 - Fix potential zero-copy for primarynamespace::bulk_service_async et.al.
  • PR#1542 - Merging HPX0.9.10 into pxfs branch
  • PR#1541 - Removed stale cmake tests, unused since the great cmake refactoring
  • PR#1540 - Fix idle-rate on platforms without TSC
  • PR#1539 - Reporting situation if zero-copy-serialization was performed by a parcel generated from a plain apply/async
  • PR#1538 - Changed return type of bulk executors and added test
  • IS#1537 - Incorrect cpuid config tests
  • PR#1536 - Changed return type of bulk executors and added test
  • PR#1535 - Make sure promise::get_gid() can be called more than once
  • PR#1534 - Fixed async_callback with bound callback
  • PR#1533 - Updated the link in the documentation to a publically-accessible URL
  • PR#1532 - Make sure sync primitives are not copyable nor movable
  • PR#1531 - Fix unwrapped issue with future ranges of void type
  • PR#1530 - Serialization complex
  • IS#1528 - Unwrapped issue with future<void>
  • IS#1527 - HPX does not build with Boost 1.58.0
  • PR#1526 - Added support for boost.multi_array serialization
  • PR#1525 - Properly handle deferred futures, fixes #1506
  • PR#1524 - Making sure invalid action argument types generate clear error message
  • IS#1522 - Need serialization support for boost multi array
  • IS#1521 - Remote async and zero-copy serialization optimizations don't play well together
  • PR#1520 - Fixing UB whil registering polymorphic classes for serialization
  • PR#1519 - Making detail::condition_variable safe to use
  • PR#1518 - Fix when_some bug missing indices in its result
  • IS#1517 - Typo may affect CMake build system tests
  • PR#1516 - Fixing Posix context
  • PR#1515 - Fixing Posix context
  • PR#1514 - Correct problems with loading dynamic components
  • PR#1513 - Fixing intel glibc4 4
  • IS#1508 - memory and papi counters do not work
  • IS#1507 - Unrecognized Command Line Option Error causing exit status 0
  • IS#1506 - Properly handle deferred futures
  • PR#1505 - Adding #include - would not compile without this
  • IS#1502 - boost::filesystem::exists throws unexpected exception
  • IS#1501 - hwloc configuration options are wrong for MIC
  • PR#1504 - Making sure boost::filesystem::exists() does not throw
  • PR#1500 - Exit application on --hpx:version/-v and --hpx:info
  • PR#1498 - Extended task block
  • PR#1497 - Unique ptr serialization
  • PR#1496 - Unique ptr serialization (closed)
  • PR#1495 - Switching circleci build type to debug
  • IS#1494 - --hpx:version/-v does not exit after printing version information
  • IS#1493 - add an "hpx_" prefix to libraries and components to avoid name conflicts
  • IS#1492 - Define and ensure limitations for arguments to async/apply
  • PR#1489 - Enable idle rate counter on demand
  • PR#1488 - Made sure detail::condition_variable can be safely destroyed
  • PR#1487 - Introduced default (main) template implementation for ignore_while_checking
  • PR#1486 - Add HPX inspect tool
  • IS#1485 - ignore_while_locked doesn't support all Lockable types
  • PR#1484 - Docker image generation
  • PR#1483 - Move external endian library into HPX
  • PR#1482 - Actions with integer type ids
  • IS#1481 - Sync primitives safe destruction
  • IS#1480 - Move external/boost/endian into hpx/util
  • IS#1478 - Boost inspect violations
  • PR#1479 - Adds serialization for arrays; some futher/minor fixes
  • PR#1477 - Fixing problems with the Intel compiler using a GCC 4.4 std library
  • PR#1476 - Adding hpx::lcos::latch and hpx::lcos::local::latch
  • IS#1475 - Boost inspect violations
  • PR#1473 - Fixing action move tests
  • IS#1471 - Sync primitives should not be movable
  • PR#1470 - Removing hpx::util::polymorphic_factory
  • PR#1468 - Fixed container creation
  • IS#1467 - HPX application fail during finalization
  • IS#1466 - HPX doesn't pick up Torque's nodefile on SuperMIC
  • IS#1464 - HPX option for pre and post bootstrap performance counters
  • PR#1463 - Replacing async_colocated(id, ...) with async(colocated(id), ...)
  • PR#1462 - Consolidated task_region with N4411
  • PR#1461 - Consolidate inconsistent CMake option names
  • IS#1460 - Which malloc is actually used? or at least which one is HPX built with
  • IS#1459 - Make cmake configure step fail explicitly if compiler version is not supported
  • IS#1458 - Update parallel::task_region with N4411
  • PR#1456 - Consolidating new_<>()
  • IS#1455 - Replace async_colocated(id, ...) with async(colocated(id), ...)
  • PR#1454 - Removed harmful std::moves from return statements
  • PR#1453 - Use range-based for-loop instead of Boost.Foreach
  • PR#1452 - C++ feature tests
  • PR#1451 - When serializing, pass archive flags to traits::get_type_size
  • IS#1450 - traits:get_type_size needs archive flags to enable zero_copy optimizations
  • IS#1449 - "couldn't create performance counter" - AGAS
  • IS#1448 - Replace distributing factories with new_<T[]>(...)
  • PR#1447 - Removing obsolete remote_object component
  • PR#1446 - Hpx serialization
  • PR#1445 - Replacing travis with circleci
  • PR#1443 - Always stripping HPX command line arguments before executing start function
  • PR#1442 - Adding --hpx:bind=none to disable thread affinities
  • IS#1439 - Libraries get linked in multiple times, RPATH is not properly set
  • PR#1438 - Removed superfluous typedefs
  • IS#1437 - hpx::init() should strip HPX-related flags from argv
  • IS#1436 - Add strong scaling option to htts
  • PR#1435 - Adding async_cb, async_continue_cb, and async_colocated_cb
  • PR#1434 - Added missing install rule, removed some dead CMake code
  • PR#1433 - Add GitExternal and SubProject cmake scripts from eyescale/cmake repo
  • IS#1432 - Add command line flag to disable thread pinning
  • PR#1431 - Fix #1423
  • IS#1430 - Inconsistent CMake option names
  • IS#1429 - Configure setting HPX_HAVE_PARCELPORT_MPI is ignored
  • PR#1428 - Fixes #1419 (closed)
  • PR#1427 - Adding stencil_iterator and transform_iterator
  • PR#1426 - Fixes #1419
  • PR#1425 - During serialization memory allocation should honour allocator chunk size
  • IS#1424 - chunk allocation during serialization does not use memory pool/allocator chunk size
  • IS#1423 - Remove HPX_STD_UNIQUE_PTR
  • IS#1422 - hpx:threads=all allocates too many os threads
  • PR#1420 - added .travis.yml
  • IS#1419 - Unify enums: hpx::runtime::state and hpx::state
  • PR#1416 - Adding travis builder
  • IS#1414 - Correct directory for dispatch_gcc46.hpp iteration
  • IS#1410 - Set operation algorithms
  • IS#1389 - Parallel algorithms relying on scan partitioner break for small number of elements
  • IS#1325 - Exceptions thrown during parcel handling are not handled correctly
  • IS#1315 - Errors while running performance tests
  • IS#1309 - hpx::vector partitions are not easily extendable by applications
  • PR#1300 - Added serialization/de-serialization to examples.tuplespace
  • IS#1251 - hpx::threads::get_thread_count doesn't consider pending threads
  • IS#1008 - Decrease in application performance overtime; occasional spikes of major slowdown
  • IS#1001 - Zero copy serialization raises assert
  • IS#721 - Make HPX usable for Xeon Phi
  • IS#524 - Extend scheduler to support threads which can't be stolen
General Changes

This is the 12th official release of HPX. It coincides with the 7th anniversary of the first commit to our source code repository. Since then, we have seen over 12300 commits amounting to more than 220000 lines of C++ code.

The major focus of this release was to improve the reliability of large-scale runs. We believe we have achieved this goal, as we can now reliably run HPX applications on up to ~24k cores. We have also shown that HPX can be used successfully for symmetric runs (applications using both host cores and Intel Xeon Phi coprocessors). This is a huge step forward in terms of the usability of HPX. The main focus of this work involved isolating the causes of the segmentation faults at startup and shutdown. Many of these issues were discovered to be the result of the suspension of threads which hold locks.

A very important improvement introduced with this release is the refactoring of the code representing our parcel-port implementation. Parcel-ports can now be implemented by third parties as independent plugins which are dynamically loaded at runtime (static linking of parcel-ports is also supported). This refactoring also includes a massive improvement in the performance of our existing parcel-ports. We were able to significantly reduce networking latencies and to improve the available networking bandwidth. Please note that in this release we disabled the ibverbs and ipc parcel-ports as those have not been ported to the new plugin system yet (see IS#839).

Another cornerstone of this release is our work towards a complete implementation of N4409 (Working Draft, Technical Specification for C++ Extensions for Parallelism). This document defines a set of parallel algorithms to be added to the C++ standard library. We have now implemented about 75% of all specified parallel algorithms (see Parallel Algorithms for more details). We also implemented some extensions to N4409 allowing all of the algorithms to be invoked asynchronously.

This release adds a first implementation of hpx::vector, which is a distributed data structure closely aligned with the functionality of std::vector. The difference is that hpx::vector stores its data in partitions, where the partitions can be distributed over different localities. We have started working on enabling the use of the parallel algorithms with hpx::vector. At this point we have implemented only a few of the parallel algorithms with support for distributed data structures (like hpx::vector) for testing purposes (see IS#1338 for documentation of our progress).

Breaking Changes

With this release we put a lot of effort into changing the code base to be more compatible with C++11. These changes have caused the following issues for backward compatibility:

  • Move to Variadics- All of the API now uses variadic templates. However, this change required modifying the argument sequence of some of the existing API functions (hpx::async_continue, hpx::apply_continue, hpx::when_each, hpx::wait_each, and the synchronous invocation of actions).
  • Changes to Macros- We also removed the macros HPX_STD_FUNCTION and HPX_STD_TUPLE. This shouldn't affect any user code as we replaced HPX_STD_FUNCTION with hpx::util::function_nonser which was the default expansion used for this macro. All HPX API functions which expect a hpx::util::function_nonser (or a hpx::util::unique_function_nonser) can now be transparently called with a compatible std::function instead. Similarly, HPX_STD_TUPLE was replaced by its default expansion as well: hpx::util::tuple.
  • Changes to hpx::unique_future- hpx::unique_future, which was deprecated in the previous release in favor of hpx::future, has now been completely removed from HPX. This completes the transition to a fully standards-conforming implementation of hpx::future.
  • Changes to Supported Compilers- Finally, in order to utilize more C++11 semantics, we have officially dropped support for GCC 4.4 and MSVC 2012. Please see our Build Prerequisites page for more details.
Bug Fixes (Closed Tickets)

Here is a list of the important tickets we closed for this release.

  • IS#1402 - Internal shared_future serialization copies
  • IS#1399 - Build takes unusually long time...
  • IS#1398 - Tests using the scan partitioner are broken on at least gcc 4.7 and intel compiler
  • IS#1397 - Completely remove hpx::unique_future
  • IS#1396 - Parallel scan algorithms with different initial values
  • IS#1395 - Race Condition - 1d_stencil_8 - SuperMIC
  • IS#1394 - "suspending thread while at least one lock is being held" - 1d_stencil_8 - SuperMIC
  • IS#1393 - SEGFAULT in 1d_stencil_8 on SuperMIC
  • IS#1392 - Fixing #1168
  • IS#1391 - Parallel Algorithms for scan partitioner for small number of elements
  • IS#1387 - Failure with more than 4 localities
  • IS#1386 - Dispatching unhandled exceptions to outer user code
  • IS#1385 - Adding Copy algorithms, fixing parallel::copy_if
  • IS#1384 - Fixing 1325
  • IS#1383 - Fixed #504: Refactor Dataflow LCO to work with futures, this removes the dataflow component as it is obsolete
  • IS#1382 - is_sorted, is_sorted_until and is_partitioned algorithms
  • IS#1381 - fix for CMake versions prior to 3.1
  • IS#1380 - resolved warning in CMake 3.1 and newer
  • IS#1379 - Compilation error with papi
  • IS#1378 - Towards safer migration
  • IS#1377 - HPXConfig.cmake should include TCMALLOC_LIBRARY and TCMALLOC_INCLUDE_DIR
  • IS#1376 - Warning on uninitialized member
  • IS#1375 - Fixing 1163
  • IS#1374 - Fixing the MSVC 12 release builder
  • IS#1373 - Modifying parallel search algorithm for zero length searches
  • IS#1372 - Modifying parallel search algorithm for zero length searches
  • IS#1371 - Avoid holding a lock during agas::incref while doing a credit split
  • IS#1370 - --hpx:bind throws unexpected error
  • IS#1369 - Getting rid of (void) in loops
  • IS#1368 - Variadic templates support for tuple
  • IS#1367 - One last batch of variadic templates support
  • IS#1366 - Fixing symbolic namespace hang
  • IS#1365 - More held locks
  • IS#1364 - Add counters 1363
  • IS#1363 - Add thread overhead counters
  • IS#1362 - Std config removal
  • IS#1361 - Parcelport plugins
  • IS#1360 - Detuplify transfer_action
  • IS#1359 - Removed obsolete checks
  • IS#1358 - Fixing 1352
  • IS#1357 - Variadic templates support for runtime_support and components
  • IS#1356 - fixed coordinate test for intel13
  • IS#1355 - fixed coordinate.hpp
  • IS#1354 - Lexicographical Compare completed
  • IS#1353 - HPX should set Boost_ADDITIONAL_VERSIONS flags
  • IS#1352 - Error: Cannot find action '' in type registry: HPX(bad_action_code)
  • IS#1351 - Variadic templates support for appliers
  • IS#1350 - Actions simplification
  • IS#1349 - Variadic when and wait functions
  • IS#1348 - Added hpx_init header to test files
  • IS#1347 - Another batch of variadic templates support
  • IS#1346 - Segmented copy
  • IS#1345 - Attempting to fix hangs during shutdown
  • IS#1344 - Std config removal
  • IS#1343 - Removing various distribution policies for hpx::vector
  • IS#1342 - Inclusive scan
  • IS#1341 - Exclusive scan
  • IS#1340 - Adding parallel::count for distributed data structures, adding tests
  • IS#1339 - Update argument order for transform_reduce
  • IS#1337 - Fix dataflow to handle properly ranges of futures
  • IS#1336 - dataflow needs to hold onto futures passed to it
  • IS#1335 - Fails to compile with msvc14
  • IS#1334 - Examples build problem
  • IS#1333 - Distributed transform reduce
  • IS#1332 - Variadic templates support for actions
  • IS#1331 - Some ambiguous calls of map::erase have been prevented by adding additional check in locality constructor.
  • IS#1330 - Defining Plain Actions does not work as described in the documentation
  • IS#1329 - Distributed vector cleanup
  • IS#1328 - Sync docs and comments with code in hello_world example
  • IS#1327 - Typos in docs
  • IS#1326 - Documentation and code diverged in Fibonacci tutorial
  • IS#1325 - Exceptions thrown during parcel handling are not handled correctly
  • IS#1324 - fixed bandwidth calculation
  • IS#1323 - mmap() failed to allocate thread stack due to insufficient resources
  • IS#1322 - HPX fails to build aa182cf
  • IS#1321 - Limiting size of outgoing messages while coalescing parcels
  • IS#1320 - passing a future with launch::deferred in remote function call causes hang
  • IS#1319 - An exception when tries to specify number high priority threads with abp-priority
  • IS#1318 - Unable to run program with abp-priority and numa-sensitivity enabled
  • IS#1317 - N4071 Search/Search_n finished, minor changes
  • IS#1316 - Add config option to make -Ihpx.run_hpx_main!=1 the default
  • IS#1314 - Variadic support for async and apply
  • IS#1313 - Adjust when_any/some to the latest proposed interfaces
  • IS#1312 - Fixing #857: hpx::naming::locality leaks parcelport specific information into the public interface
  • IS#1311 - Distributed get'er/set'er_values for distributed vector
  • IS#1310 - Crashing in hpx::parcelset::policies::mpi::connection_handler::handle_messages() on SuperMIC
  • IS#1308 - Unable to execute an application with --hpx:threads
  • IS#1307 - merge_graph linking issue
  • IS#1306 - First batch of variadic templates support
  • IS#1305 - Create a compiler wrapper
  • IS#1304 - Provide a compiler wrapper for hpx
  • IS#1303 - Drop support for GCC44
  • IS#1302 - Fixing #1297
  • IS#1301 - Compilation error when tried to use boost range iterators with wait_all
  • IS#1298 - Distributed vector
  • IS#1297 - Unable to invoke component actions recursively
  • IS#1294 - HDF5 build error
  • IS#1275 - The parcelport implementation is non-optimal
  • IS#1267 - Added classes and unit tests for local_file, orangefs_file and pxfs_file
  • IS#1264 - Error "assertion '!m_fun' failed" randomly occurs when using TCP
  • IS#1254 - thread binding seems to not work properly
  • IS#1220 - parallel::copy_if is broken
  • IS#1217 - Find a better way of fixing the issue patched by #1216
  • IS#1168 - Starting HPX on Cray machines using aprun isn't working correctly
  • IS#1085 - Replace startup and shutdown barriers with broadcasts
  • IS#981 - With SLURM, --hpx:threads=8 should not be necessary
  • IS#857 - hpx::naming::locality leaks parcelport specific information into the public interface
  • IS#850 - "flush" not documented
  • IS#763 - Create buildbot instance that uses std::bind as HPX_STD_BIND
  • IS#680 - Convert parcel ports into a plugin system
  • IS#582 - Make exception thrown from HPX threads available from hpx::init
  • IS#504 - Refactor Dataflow LCO to work with futures
  • IS#196 - Don't store copies of the locality network metadata in the gva table
General Changes

We have had over 1500 commits since the last release and we have closed over 200 tickets (bugs, feature requests, pull requests, etc.). These are by far the largest numbers of commits and resolved issues for any of the HPX releases so far. We are especially happy about the large number of people who contributed to HPX for the first time.

  • We completed the transition from the older (non-conforming) implementation of hpx::future to the new and fully conforming version by removing the old code and by renaming the type hpx::unique_future to hpx::future. In order to maintain backwards compatibility with existing code that uses the type hpx::unique_future, we support the configuration variable HPX_UNIQUE_FUTURE_ALIAS. If this variable is set to ON while running cmake, it will additionally define a template alias for this type.
  • We rewrote and significantly changed our build system. Please have a look at the new (now generated) documentation here: HPX build system. Please revisit your build scripts to adapt to the changes. The most notable changes are:
    • HPX_NO_INSTALL is no longer necessary.
    • For external builds, you need to set HPX_DIR instead of HPX_ROOT as described here: Using CMake.
    • IDEs that support multiple configurations (Visual Studio and XCode) can now be used as intended; a separate build directory per configuration is no longer needed.
    • Building HPX statically (without dynamic libraries) is now supported (-DHPX_STATIC_LINKING=On).
    • Please note that many variables used to configure the build process have been renamed to unify the naming conventions (see the section CMake Variables used to configure HPX for more information).
    • This also fixes a long list of issues, for more information see IS#1204.
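For external builds under the new scheme, a minimal CMakeLists.txt might look like the following sketch (project and target names are illustrative, and the example assumes the hpx_setup_target helper described in the section Using HPX with CMake based projects; the exact HPX_DIR path depends on where HPX was installed):

```cmake
cmake_minimum_required(VERSION 2.8.10)
project(my_hpx_app CXX)

# Point CMake at the installed HPX package, e.g.:
#   cmake -DHPX_DIR=<hpx-install-prefix>/lib/cmake/HPX ..
# (HPX_DIR replaces the former HPX_ROOT variable.)
find_package(HPX REQUIRED)

add_executable(my_hpx_app main.cpp)
hpx_setup_target(my_hpx_app)
```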
  • We started to implement various proposals to the C++ Standardization committee related to parallelism and concurrency, most notably N4409 (Working Draft, Technical Specification for C++ Extensions for Parallelism), N4411 (Task Region Rev. 3), and N4313 (Working Draft, Technical Specification for C++ Extensions for Concurrency).
  • We completely remodeled our automatic build system to run builds and unit tests on various systems and compilers. This allows us to find most bugs right as they are introduced and helps maintain a high level of quality and compatibility. The newest build logs can be found at the HPX Buildbot Website.
Bug Fixes (Closed Tickets)

Here is a list of the important tickets we closed for this release.

  • IS#1296 - Rename make_error_future to make_exceptional_future, adjust to N4123
  • IS#1295 - building issue
  • IS#1293 - Transpose example
  • IS#1292 - Wrong abs() function used in example
  • IS#1291 - non-synchronized shift operators have been removed
  • IS#1290 - RDTSCP is defined as true for Xeon Phi build
  • IS#1289 - Fixing 1288
  • IS#1288 - Add new performance counters
  • IS#1287 - Hierarchy scheduler broken performance counters
  • IS#1286 - Algorithm cleanup
  • IS#1285 - Broken Links in Documentation
  • IS#1284 - Uninitialized copy
  • IS#1283 - missing boost::scoped_ptr includes
  • IS#1282 - Update documentation of build options for schedulers
  • IS#1281 - reset idle rate counter
  • IS#1280 - Bug when executing on Intel MIC
  • IS#1279 - Add improved when_all/wait_all
  • IS#1278 - Implement improved when_all/wait_all
  • IS#1277 - feature request: get access to argc argv and variables_map
  • IS#1276 - Remove merging map
  • IS#1274 - Weird (wrong) string code in papi.cpp
  • IS#1273 - Sequential task execution policy
  • IS#1272 - Avoid CMake name clash for Boost.Thread library
  • IS#1271 - Updates on HPX Test Units
  • IS#1270 - hpx/util/safe_lexical_cast.hpp is added
  • IS#1269 - Added default value for "LIB" cmake variable
  • IS#1268 - Memory Counters not working
  • IS#1266 - FindHPX.cmake is not installed
  • IS#1263 - apply_remote test takes too long
  • IS#1262 - Chrono cleanup
  • IS#1261 - Need make install for papi counters and this builds all the examples
  • IS#1260 - Documentation of Stencil example claims
  • IS#1259 - Avoid double-linking Boost on Windows
  • IS#1257 - Adding additional parameter to create_thread
  • IS#1256 - added buildbot changes to release notes
  • IS#1255 - Cannot build MiniGhost
  • IS#1253 - hpx::thread defects
  • IS#1252 - HPX_PREFIX is too fragile
  • IS#1250 - switch_to_fiber_emulation does not work properly
  • IS#1249 - Documentation is generated under Release folder
  • IS#1248 - Fix usage of hpx_generic_coroutine_context and get tests passing on powerpc
  • IS#1247 - Dynamic linking error
  • IS#1246 - Make cpuid.cpp C++11 compliant
  • IS#1245 - HPX fails on startup (setting thread affinity mask)
  • IS#1244 - HPX_WITH_RDTSC configure test fails, but should succeed
  • IS#1243 - CTest dashboard info for CSCS CDash drop location
  • IS#1242 - Mac fixes
  • IS#1241 - Failure in Distributed with Boost 1.56
  • IS#1240 - fix a race condition in examples.diskperf
  • IS#1239 - fix wait_each in examples.diskperf
  • IS#1238 - Fixed #1237: hpx::util::portable_binary_iarchive failed
  • IS#1237 - hpx::util::portable_binary_iarchive faileds
  • IS#1235 - Fixing clang warnings and errors
  • IS#1234 - TCP runs fail: Transport endpoint is not connected
  • IS#1233 - Making sure the correct number of threads is registered with AGAS
  • IS#1232 - Fixing race in wait_xxx
  • IS#1231 - Parallel minmax
  • IS#1230 - Distributed run of 1d_stencil_8 uses less threads than spec. & sometimes gives errors
  • IS#1229 - Unstable number of threads
  • IS#1228 - HPX link error (cmake / MPI)
  • IS#1226 - Warning about struct/class thread_counters
  • IS#1225 - Adding parallel::replace etc
  • IS#1224 - Extending dataflow to pass through non-future arguments
  • IS#1223 - Remaining find algorithms implemented, N4071
  • IS#1222 - Merging all the changes
  • IS#1221 - No error output when using mpirun with hpx
  • IS#1219 - Adding new AGAS cache performance counters
  • IS#1216 - Fixing using futures (clients) as arguments to actions
  • IS#1215 - Error compiling simple component
  • IS#1214 - Stencil docs
  • IS#1213 - Using more than a few dozen MPI processes on SuperMike results in a seg fault before getting to hpx_main
  • IS#1212 - Parallel rotate
  • IS#1211 - Direct actions cause the future's shared_state to be leaked
  • IS#1210 - Refactored local::promise to be standard conformant
  • IS#1209 - Improve command line handling
  • IS#1208 - Adding parallel::reverse and parallel::reverse_copy
  • IS#1207 - Add copy_backward and move_backward
  • IS#1206 - N4071 additional algorithms implemented
  • IS#1204 - Cmake simplification and various other minor changes
  • IS#1203 - Implementing new launch policy for (local) async: hpx::launch::fork.
  • IS#1202 - Failed assertion in connection_cache.hpp
  • IS#1201 - pkg-config doesn't add mpi link directories
  • IS#1200 - Error when querying time performance counters
  • IS#1199 - library path is now configurable (again)
  • IS#1198 - Error when querying performance counters
  • IS#1197 - tests fail with intel compiler
  • IS#1196 - Silence several warnings
  • IS#1195 - Rephrase initializers to work with VC++ 2012
  • IS#1194 - Simplify parallel algorithms
  • IS#1193 - Adding parallel::equal
  • IS#1192 - HPX(out_of_memory) on including <hpx/hpx.hpp>
  • IS#1191 - Fixing #1189
  • IS#1190 - Chrono cleanup
  • IS#1189 - Deadlock .. somewhere? (probably serialization)
  • IS#1188 - Removed future::get_status()
  • IS#1186 - Fixed FindOpenCL to find current AMD APP SDK
  • IS#1184 - Tweaking future unwrapping
  • IS#1183 - Extended parallel::reduce
  • IS#1182 - future::unwrap hangs for launch::deferred
  • IS#1181 - Adding all_of, any_of, and none_of and corresponding documentation
  • IS#1180 - hpx::cout defect
  • IS#1179 - hpx::async does not work for member function pointers when called on types with self-defined unary operator*
  • IS#1178 - Implemented variadic hpx::util::zip_iterator
  • IS#1177 - MPI parcelport defect
  • IS#1176 - HPX_DEFINE_COMPONENT_CONST_ACTION_TPL does not have a 2-argument version
  • IS#1175 - Create util::zip_iterator working with util::tuple<>
  • IS#1174 - Error Building HPX on linux, root_certificate_authority.cpp
  • IS#1173 - hpx::cout output lost
  • IS#1172 - HPX build error with Clang 3.4.2
  • IS#1171 - CMAKE_INSTALL_PREFIX ignored
  • IS#1170 - Close hpx_benchmarks repository on Github
  • IS#1169 - Buildbot emails have syntax error in url
  • IS#1167 - Merge partial implementation of standards proposal N3960
  • IS#1166 - Fixed several compiler warnings
  • IS#1165 - cmake warns: "tests.regressions.actions" does not exist
  • IS#1164 - Want my own serialization of hpx::future
  • IS#1162 - Segfault in hello_world example
  • IS#1161 - Use HPX_ASSERT to aid the compiler
  • IS#1160 - Do not put -DNDEBUG into hpx_application.pc
  • IS#1159 - Support Clang 3.4.2
  • IS#1158 - Fixed #1157: Rename when_n/wait_n, add when_xxx_n/wait_xxx_n
  • IS#1157 - Rename when_n/wait_n, add when_xxx_n/wait_xxx_n
  • IS#1156 - Force inlining fails
  • IS#1155 - changed header of printout to be compatible with python csv module
  • IS#1154 - Fixing iostreams
  • IS#1153 - Standard manipulators (like std::endl) do not work with hpx::ostream
  • IS#1152 - Functions revamp
  • IS#1151 - Supressing cmake 3.0 policy warning for CMP0026
  • IS#1150 - Client Serialization error
  • IS#1149 - Segfault on Stampede
  • IS#1148 - Refactoring mini-ghost
  • IS#1147 - N3960 copy_if and copy_n implemented and tested
  • IS#1146 - Stencil print
  • IS#1145 - N3960 hpx::parallel::copy implemented and tested
  • IS#1144 - OpenMP examples 1d_stencil do not build
  • IS#1143 - 1d_stencil OpenMP examples do not build
  • IS#1142 - Cannot build HPX with gcc 4.6 on OS X
  • IS#1140 - Fix OpenMP lookup, enable usage of config tests in external CMake projects.
  • IS#1139 - hpx/hpx/config/compiler_specific.hpp
  • IS#1138 - clean up pkg-config files
  • IS#1137 - Improvements to create binary packages
  • IS#1136 - HPX_GCC_VERSION not defined on all compilers
  • IS#1135 - Avoiding collision between winsock2.h and windows.h
  • IS#1134 - Making sure, that hpx::finalize can be called from any locality
  • IS#1133 - 1d stencil examples
  • IS#1131 - Refactor unique_function implementation
  • IS#1130 - Unique function
  • IS#1129 - Some fixes to the Build system on OS X
  • IS#1128 - Action future args
  • IS#1127 - Executor causes segmentation fault
  • IS#1124 - Adding new API functions: register_id_with_basename, unregister_id_with_basename, find_ids_from_basename; adding test
  • IS#1123 - Reduce nesting of try-catch construct in encode_parcels?
  • IS#1122 - Client base fixes
  • IS#1121 - Update hpxrun.py.in
  • IS#1120 - HTTS2 tests compile errors on v110 (VS2012)
  • IS#1119 - Remove references to boost::atomic in accumulator example
  • IS#1118 - Only build test thread_pool_executor_1114_test if HPX_LOCAL_SCHEDULER is set
  • IS#1117 - local_queue_executor linker error on vc110
  • IS#1116 - Disabled performance counter should give runtime errors, not invalid data
  • IS#1115 - Compile error with Intel C++ 13.1
  • IS#1114 - Default constructed executor is not usable
  • IS#1113 - Fast compilation of logging causes ABI incompatibilities between different NDEBUG values
  • IS#1112 - Using thread_pool_executors causes segfault
  • IS#1111 - hpx::threads::get_thread_data always returns zero
  • IS#1110 - Remove unnecessary null pointer checks
  • IS#1109 - More tests adjustments
  • IS#1108 - Clarify build rules for "libboost_atomic-mt.so"?
  • IS#1107 - Remove unnecessary null pointer checks
  • IS#1106 - network_storage benchmark imporvements, adding legends to plots and tidying layout
  • IS#1105 - Add more plot outputs and improve instructions doc
  • IS#1104 - Complete quoting for parameters of some CMake commands
  • IS#1103 - Work on test/scripts
  • IS#1102 - Changed minimum requirement of window install to 2012
  • IS#1101 - Changed minimum requirement of window install to 2012
  • IS#1100 - Changed readme to no longer specify using MSVC 2010 compiler
  • IS#1099 - Error returning futures from component actions
  • IS#1098 - Improve storage test
  • IS#1097 - data_actions quickstart example calls missing function decorate_action of data_get_action
  • IS#1096 - MPI parcelport broken with new zero copy optimization
  • IS#1095 - Warning C4005: _WIN32_WINNT: Macro redefinition
  • IS#1094 - Syntax error for -DHPX_UNIQUE_FUTURE_ALIAS in master
  • IS#1093 - Syntax error for -DHPX_UNIQUE_FUTURE_ALIAS
  • IS#1092 - Rename unique_future<> back to future<>
  • IS#1091 - Inconsistent error message
  • IS#1090 - On windows 8.1 the examples crashed if using more than one os thread
  • IS#1089 - Components should be allowed to have their own executor
  • IS#1088 - Add possibility to select a network interface for the ibverbs parcelport
  • IS#1087 - ibverbs and ipc parcelport uses zero copy optimization
  • IS#1083 - Make shell examples copyable in docs
  • IS#1082 - Implement proper termination detection during shutdown
  • IS#1081 - Implement thread_specific_ptr for hpx::threads
  • IS#1072 - make install not working properly
  • IS#1070 - Complete quoting for parameters of some CMake commands
  • IS#1059 - Fix more unused variable warnings
  • IS#1051 - Implement when_each
  • IS#973 - Would like option to report hwloc bindings
  • IS#970 - Bad flags for Fortran compiler
  • IS#941 - Create a proper user level context switching class for BG/Q
  • IS#935 - Build error with gcc 4.6 and Boost 1.54.0 on hpx trunk and 0.9.6
  • IS#934 - Want to build HPX without dynamic libraries
  • IS#927 - Make hpx/lcos/reduce.hpp accept futures of id_type
  • IS#926 - All unit tests that are run with more than one thread with CTest/hpx_run_test should configure hpx.os_threads
  • IS#925 - regression_dataflow_791 needs to be brought in line with HPX standards
  • IS#899 - Fix race conditions in regression tests
  • IS#879 - Hung test leads to cascading test failure; make tests should support the MPI parcelport
  • IS#865 - future<T> and friends shall work for movable only Ts
  • IS#847 - Dynamic libraries are not installed on OS X
  • IS#816 - First Program tutorial pull request
  • IS#799 - Wrap lexical_cast to avoid exceptions
  • IS#720 - broken configuration when using ccmake on Ubuntu
  • IS#622 - --hpx:hpx and --hpx:debug-hpx-log is nonsensical
  • IS#525 - Extend barrier LCO test to run in distributed
  • IS#515 - Multi-destination version of hpx::apply is broken
  • IS#509 - Push Boost.Atomic changes upstream
  • IS#503 - Running HPX applications on Windows should not require setting %PATH%
  • IS#461 - Add a compilation sanity test
  • IS#456 - hpx_run_tests.py should log output from tests that timeout
  • IS#454 - Investigate threadmanager performance
  • IS#345 - Add more versatile environmental/cmake variable support to hpx_find_* CMake macros
  • IS#209 - Support multiple configurations in generated build files
  • IS#190 - hpx::cout should be a std::ostream
  • IS#189 - iostreams component should use startup/shutdown functions
  • IS#183 - Use Boost.ICL for correctness in AGAS
  • IS#44 - Implement real futures

We have had over 800 commits since the last release and we have closed over 65 tickets (bugs, feature requests, etc.).

With the changes below, HPX is once again leading the charge into a whole new era of computation. By intrinsically breaking down and synchronizing the work to be done, HPX ensures that application developers no longer have to fret about where a segment of code executes. This allows coders to focus their time and energy on understanding the data dependencies of their algorithms, and thereby the core obstacles to an efficient code. Here are some of the advantages of using HPX:

  • HPX is solidly rooted in a sophisticated theoretical execution model -- ParalleX
  • HPX exposes an API fully conforming to the C++11 and the draft C++14 standards, extended and applied to distributed computing. Everything programmers know about the concurrency primitives of the standard C++ library is still valid in the context of HPX.
  • It provides a competitive, high performance implementation of modern, future-proof ideas, which gives a smooth migration path from today's mainstream techniques.
  • There is no need for the programmer to worry about lower level parallelization paradigms like threads or message passing; no need to understand pthreads, MPI, OpenMP, or Windows threads, etc.
  • There is no need to think about different types of parallelism such as tasks, pipelines, fork-join, or task and data parallelism.
  • The same program source compiles and runs on Linux, BlueGene/Q, Mac OS X, Windows, and Android.
  • The same code runs on shared memory multi-core systems and supercomputers, on handheld devices and Intel® Xeon Phi™ accelerators, or a heterogeneous mix of those.
General Changes
  • A major API breaking change for this release was introduced by implementing hpx::future and hpx::shared_future fully in conformance with the C++11 Standard. While hpx::shared_future is new and will not create any compatibility problems, we revised the interface and implementation of the existing hpx::future. For more details please see the mailing list archive. To avoid any incompatibilities for existing code we named the type which implements the std::future interface hpx::unique_future. For the next release this will be renamed to hpx::future, making it fully conforming to the C++11 Standard.
  • A large part of the code base of HPX has been refactored and partially re-implemented. The main changes were related to
    • The threading subsystem: these changes significantly reduce the amount of overheads caused by the schedulers, improve the modularity of the code base, and extend the variety of available scheduling algorithms.
    • The parcel subsystem: these changes improve the performance of the HPX networking layer, modularize the structure of the parcelports, and simplify the creation of new parcelports for other underlying networking libraries.
    • The API subsystem: these changes improve the conformance of the API to the C++11 Standard, extend and unify the available API functionality, and decrease the overheads created by various elements of the API.
    • The robustness of the component loading subsystem has been improved significantly, allowing the components needed by an application to be registered more portably and reliably at startup. This additionally speeds up general application initialization.
  • We added new API functionality like hpx::migrate and hpx::copy_component which are the basic building blocks necessary for implementing higher level abstractions for system-wide load balancing, runtime-adaptive resource management, and object-oriented checkpointing and state-management.
  • We removed the use of C++11 move emulation (using Boost.Move), replacing it with C++11 rvalue references. This is the first step towards using more and more native C++11 facilities which we plan to introduce in the future.
  • We improved the reference counting scheme used by HPX, which helps manage distributed objects and memory. This improves the overall stability of HPX and further simplifies writing real world applications.
  • The minimal Boost version required to use HPX is now V1.49.0.
  • This release coincides with the first release of HPXPI (V0.1.0), the first implementation of the XPI specification.
Bug Fixes (Closed Tickets)

Here is a list of the important tickets we closed for this release.

  • IS#1086 - Expose internal boost::shared_array to allow user management of array lifetime
  • IS#1083 - Make shell examples copyable in docs
  • IS#1080 - /threads{locality#*/total}/count/cumulative broken
  • IS#1079 - Build problems on OS X
  • IS#1078 - Improve robustness of component loading
  • IS#1077 - Fix a missing enum definition for 'take' mode
  • IS#1076 - Merge Jb master
  • IS#1075 - Unknown CMake command "add_hpx_pseudo_target"
  • IS#1074 - Implement apply_continue_callback and apply_colocated_callback
  • IS#1073 - The new apply_colocated and async_colocated functions lead to automatic registered functions
  • IS#1071 - Remove deferred_packaged_task
  • IS#1069 - serialize_buffer with allocator fails at destruction
  • IS#1068 - Coroutine include and forward declarations missing
  • IS#1067 - Add allocator support to util::serialize_buffer
  • IS#1066 - Allow for MPI_Init being called before HPX launches
  • IS#1065 - AGAS cache isn't used/populated on worker localities
  • IS#1064 - Reorder includes to ensure ws2 includes early
  • IS#1063 - Add hpx::runtime::suspend and hpx::runtime::resume
  • IS#1062 - Fix async_continue to propery handle return types
  • IS#1061 - Implement async_colocated and apply_colocated
  • IS#1060 - Implement minimal component migration
  • IS#1058 - Remove HPX_UTIL_TUPLE from code base
  • IS#1057 - Add performance counters for threading subsystem
  • IS#1055 - Thread allocation uses two memory pools
  • IS#1053 - Work stealing flawed
  • IS#1052 - Fix a number of warnings
  • IS#1049 - Fixes for TLS on OSX and more reliable test running
  • IS#1048 - Fixing after 588 hang
  • IS#1047 - Use port '0' for networking when using one locality
  • IS#1046 - composable_guard test is broken when having more than one thread
  • IS#1045 - Security missing headers
  • IS#1044 - Native TLS on FreeBSD via __thread
  • IS#1043 - async et.al. compute the wrong result type
  • IS#1042 - async et.al. implicitly unwrap reference_wrappers
  • IS#1041 - Remove redundant costly Kleene stars from regex searches
  • IS#1040 - CMake script regex match patterns has unnecessary kleenes
  • IS#1039 - Remove use of Boost.Move and replace with std::move and real rvalue refs
  • IS#1038 - Bump minimal required Boost to 1.49.0
  • IS#1037 - Implicit unwrapping of futures in async broken
  • IS#1036 - Scheduler hangs when user code attempts to "block" OS-threads
  • IS#1035 - Idle-rate counter always reports 100% idle rate
  • IS#1034 - Symbolic name registration causes application hangs
  • IS#1033 - Application options read in from an options file generate an error message
  • IS#1032 - hpx::id_type local reference counting is wrong
  • IS#1031 - Negative entry in reference count table
  • IS#1030 - Implement condition_variable
  • IS#1029 - Deadlock in thread scheduling subsystem
  • IS#1028 - HPX-thread cumulative count performance counters report incorrect value
  • IS#1027 - Expose hpx::thread_interrupted error code as a separate exception type
  • IS#1026 - Exceptions thrown in asynchronous calls can be lost if the value of the future is never queried
  • IS#1025 - future::wait_for/wait_until do not remove callback
  • IS#1024 - Remove dependence to boost assert and create hpx assert
  • IS#1023 - Segfaults with tcmalloc
  • IS#1022 - prerequisites link in readme is broken
  • IS#1020 - HPX Deadlock on external synchronization
  • IS#1019 - Convert using BOOST_ASSERT to HPX_ASSERT
  • IS#1018 - compiling bug with gcc 4.8.1
  • IS#1017 - Possible crash in io_pool executor
  • IS#1016 - Crash at startup
  • IS#1014 - Implement Increment/Decrement Merging
  • IS#1013 - Add more logging channels to enable greater control over logging granularity
  • IS#1012 - --hpx:debug-hpx-log and --hpx:debug-agas-log lead to non-thread safe writes
  • IS#1011 - After installation, running applications from the build/staging directory no longer works
  • IS#1010 - Mergable decrement requests are not being merged
  • IS#1009 - --hpx:list-symbolic-names crashes
  • IS#1007 - Components are not properly destroyed
  • IS#1006 - Segfault/hang in set_data
  • IS#1003 - Performance counter naming issue
  • IS#982 - Race condition during startup
  • IS#912 - OS X: component type not found in map
  • IS#663 - Create a buildbot slave based on Clang 3.2/OSX
  • IS#636 - Expose this_locality::apply<act>(p1, p2); for local execution
  • IS#197 - Add --console=address option for PBS runs
  • IS#175 - Asynchronous AGAS API

We have had over 1000 commits since the last release and we have closed over 180 tickets (bugs, feature requests, etc.).

General Changes
  • Ported HPX to BlueGene/Q
  • Improved HPX support for Xeon/Phi accelerators
  • Reimplemented hpx::bind, hpx::tuple, and hpx::function for better performance and better compliance with the C++11 Standard. Added hpx::mem_fn.
  • Reworked hpx::when_all and hpx::when_any for better compliance with the ongoing C++ standardization effort, and added heterogeneous versions of those functions. Added hpx::when_any_swapped.
  • Added hpx::copy as a precursor to component migration functionality
  • Added hpx::get_ptr, which allows direct access to the memory underlying a given component
  • Added the hpx::lcos::broadcast, hpx::lcos::reduce, and hpx::lcos::fold collective operations
  • Added hpx::get_locality_name, which allows retrieving the name of any of the localities of the application.
  • Added support for more flexible thread affinity control from the HPX command line, such as new modes for --hpx:bind (balanced, scattered, compact) and improved default settings when running multiple localities on the same node.
  • Added experimental executors for simpler thread pooling and scheduling. This API may change in the future as it will stay aligned with the ongoing C++ standardization efforts.
  • Massively improved the performance of the HPX serialization code. Added partial support for zero copy serialization of array and bitwise-copyable types.
  • General performance improvements of the code related to threads and futures.
Bug Fixes (Closed Tickets)

Here is a list of the important tickets we closed for this release.

  • IS#1005 - Allow to disable array optimizations and zero copy optimizations for each parcelport
  • IS#1004 - Generate new HPX logo image for the docs
  • IS#1002 - If MPI parcelport is not available, running HPX under mpirun should fail
  • IS#1001 - Zero copy serialization raises assert
  • IS#1000 - Can't connect to a HPX application running with the MPI parcelport from a non MPI parcelport locality
  • IS#999 - Optimize hpx::when_n
  • IS#998 - Fixed const-correctness
  • IS#997 - Making serialize_buffer::data() type save
  • IS#996 - Memory leak in hpx::lcos::promise
  • IS#995 - Race while registering pre-shutdown functions
  • IS#994 - thread_rescheduling regression test does not compile
  • IS#992 - Correct comments and messages
  • IS#991 - setcap cap_sys_rawio=ep for power profiling causes an HPX application to abort
  • IS#989 - Jacobi hangs during execution
  • IS#988 - multiple_init test is failing
  • IS#986 - Can't call a function called "init" from "main" when using <hpx/hpx_main.hpp>
  • IS#984 - Reference counting tests are failing
  • IS#983 - thread_suspension_executor test fails
  • IS#980 - Terminating HPX threads don't leave stack in virgin state
  • IS#979 - Static scheduler not in documents
  • IS#978 - Preprocessing limits are broken
  • IS#977 - Make tests.regressions.lcos.future_hang_on_get shorter
  • IS#976 - Wrong library order in pkgconfig
  • IS#975 - Please reopen #963
  • IS#974 - Option pu-offset ignored in fixing_588 branch
  • IS#972 - Cannot use MKL with HPX
  • IS#969 - Non-existent INI files requested on the command line via --hpx:config do not cause warnings or errors.
  • IS#968 - Cannot build examples in fixing_588 branch
  • IS#967 - Command line description of --hpx:queuing seems wrong
  • IS#966 - --hpx:print-bind physical core numbers are wrong
  • IS#965 - Deadlock when building in Release mode
  • IS#963 - Not all worker threads are working
  • IS#962 - Problem with SLURM integration
  • IS#961 - --hpx:print-bind outputs incorrect information
  • IS#960 - Fix cut and paste error in documentation of get_thread_priority
  • IS#959 - Change link to boost.atomic in documentation to point to boost.org
  • IS#958 - Undefined reference to intrusive_ptr_release
  • IS#957 - Make tuple standard compliant
  • IS#956 - Segfault with a3382fb
  • IS#955 - --hpx:nodes and --hpx:nodefiles do not work with foreign nodes
  • IS#954 - Make order of arguments for hpx::async and hpx::broadcast consistent
  • IS#953 - Cannot use MKL with HPX
  • IS#952 - register_[pre_]shutdown_function never throw
  • IS#951 - Assert when number of threads is greater than hardware concurrency
  • IS#948 - HPX_HAVE_GENERIC_CONTEXT_COROUTINES conflicts with HPX_HAVE_FIBER_BASED_COROUTINES
  • IS#947 - Need MPI_THREAD_MULTIPLE for backward compatibility
  • IS#946 - HPX does not call MPI_Finalize
  • IS#945 - Segfault with hpx::lcos::broadcast
  • IS#944 - OS X: assertion 'pu_offset_ < hardware_concurrency' failed
  • IS#943 - #include <hpx/hpx_main.hpp> does not work
  • IS#942 - Make the BG/Q work with -O3
  • IS#940 - Use separator when concatenating locality name
  • IS#939 - Refactor MPI parcelport to use MPI_Wait instead of multiple MPI_Test calls
  • IS#938 - Want to officially access client_base::gid_
  • IS#937 - client_base::gid_ should be private
  • IS#936 - Want doxygen-like source code index
  • IS#935 - Build error with gcc 4.6 and Boost 1.54.0 on hpx trunk and 0.9.6
  • IS#933 - Cannot build HPX with Boost 1.54.0
  • IS#932 - Components are destructed too early
  • IS#931 - Make HPX work on BG/Q
  • IS#930 - make git-docs is broken
  • IS#929 - Generating index in docs broken
  • IS#928 - Optimize hpx::util::static_ for C++11 compilers supporting magic statics
  • IS#924 - Make kill_process_tree (in process.py) more robust on Mac OSX
  • IS#923 - Correct BLAS and RNPL cmake tests
  • IS#922 - Cannot link against BLAS
  • IS#921 - Implement hpx::mem_fn
  • IS#920 - Output locality with --hpx:print-bind
  • IS#919 - Correct grammar; simplify boolean expressions
  • IS#918 - Link to hello_world.cpp is broken
  • IS#917 - adapt cmake file to new boostbook version
  • IS#916 - fix problem building documentation with xsltproc >= 1.1.27
  • IS#915 - Add another TBBMalloc library search path
  • IS#914 - Build problem with Intel compiler on Stampede (TACC)
  • IS#913 - fix error messages in fibonacci examples
  • IS#911 - Update OS X build instructions
  • IS#910 - Want like to specify MPI_ROOT instead of compiler wrapper script
  • IS#909 - Warning about void* arithmetic
  • IS#908 - Buildbot for MIC is broken
  • IS#906 - Can't use --hpx:bind=balanced with multiple MPI processes
  • IS#905 - --hpx:bind documentation should describe full grammar
  • IS#904 - Add hpx::lcos::fold and hpx::lcos::inverse_fold collective operation
  • IS#903 - Add hpx::when_any_swapped()
  • IS#902 - Add hpx::lcos::reduce collective operation
  • IS#901 - Web documentation is not searchable
  • IS#900 - Web documentation for trunk has no index
  • IS#898 - Some tests fail with GCC 4.8.1 and MPI parcel port
  • IS#897 - HWLOC causes failures on Mac
  • IS#896 - pu-offset leads to startup error
  • IS#895 - hpx::get_locality_name not defined
  • IS#894 - Race condition at shutdown
  • IS#893 - --hpx:print-bind switches std::cout to hexadecimal mode
  • IS#892 - hwloc_topology_load can be expensive -- don't call multiple times
  • IS#891 - The documentation for get_locality_name is wrong
  • IS#890 - --hpx:print-bind should not exit
  • IS#889 - --hpx:debug-hpx-log=FILE does not work
  • IS#888 - MPI parcelport does not exit cleanly for --hpx:print-bind
  • IS#887 - Choose thread affinities more cleverly
  • IS#886 - Logging documentation is confusing
  • IS#885 - Two threads are slower than one
  • IS#884 - is_callable failing with member pointers in C++11
  • IS#883 - Need help with is_callable_test
  • IS#882 - tests.regressions.lcos.future_hang_on_get does not terminate
  • IS#881 - tests/regressions/block_matrix/matrix.hh won't compile with GCC 4.8.1
  • IS#880 - HPX does not work on OS X
  • IS#878 - future::unwrap triggers assertion
  • IS#877 - "make tests" has build errors on Ubuntu 12.10
  • IS#876 - tcmalloc is used by default, even if it is not present
  • IS#875 - global_fixture is defined in a header file
  • IS#874 - Some tests take very long
  • IS#873 - Add block-matrix code as regression test
  • IS#872 - HPX documentation does not say how to run tests with detailed output
  • IS#871 - All tests fail with "make test"
  • IS#870 - Please explicitly disable serialization in classes that don't support it
  • IS#868 - boost_any test failing
  • IS#867 - Reduce the number of copies of hpx::function arguments
  • IS#863 - Futures should not require a default constructor
  • IS#862 - value_or_error shall not default construct its result
  • IS#861 - HPX_UNUSED macro
  • IS#860 - Add functionality to copy construct a component
  • IS#859 - hpx::endl should flush
  • IS#858 - Create hpx::get_ptr<> allowing to access component implementation
  • IS#855 - Implement hpx::INVOKE
  • IS#854 - hpx/hpx.hpp does not include hpx/include/iostreams.hpp
  • IS#853 - Feature request: null future
  • IS#852 - Feature request: Locality names
  • IS#851 - hpx::cout output does not appear on screen
  • IS#849 - All tests fail on OS X after installing
  • IS#848 - Update OS X build instructions
  • IS#846 - Update hpx_external_example
  • IS#845 - Issues with having both debug and release modules in the same directory
  • IS#844 - Create configuration header
  • IS#843 - Tests should use CTest
  • IS#842 - Remove buffer_pool from MPI parcelport
  • IS#841 - Add possibility to broadcast an index with hpx::lcos::broadcast
  • IS#838 - Simplify util::tuple
  • IS#837 - Adopt boost::tuple tests for util::tuple
  • IS#836 - Adopt boost::function tests for util::function
  • IS#835 - Tuple interface missing pieces
  • IS#833 - Partially preprocessing files not working
  • IS#832 - Native papi counters do not work with wild cards
  • IS#831 - Arithmetics counter fails if only one parameter is given
  • IS#830 - Convert hpx::util::function to use new scheme for serializing its base pointer
  • IS#829 - Consistently use decay<T> instead of remove_const< remove_reference<T>>
  • IS#828 - Update future implementation to N3721 and N3722
  • IS#827 - Enable MPI parcelport for bootstrapping whenever application was started using mpirun
  • IS#826 - Support command line option --hpx:print-bind even if --hpx::bind was not used
  • IS#825 - Memory counters give segfault when attempting to use thread wild cards or numbers only total works
  • IS#824 - Enable lambda functions to be used with hpx::async/hpx::apply
  • IS#823 - Using a hashing filter
  • IS#822 - Silence unused variable warning
  • IS#821 - Detect if a function object is callable with given arguments
  • IS#820 - Allow wildcards to be used for performance counter names
  • IS#819 - Make the AGAS symbolic name registry distributed
  • IS#818 - Add future::then() overload taking an executor
  • IS#817 - Fixed typo
  • IS#815 - Create an lco that is performing an efficient broadcast of actions
  • IS#814 - Papi counters cannot specify thread#* to get the counts for all threads
  • IS#813 - Scoped unlock
  • IS#811 - simple_central_tuplespace_client run error
  • IS#810 - ostream error when << any objects
  • IS#809 - Optimize parcel serialization
  • IS#808 - HPX applications throw exception when executed from the build directory
  • IS#807 - Create performance counters exposing overall AGAS statistics
  • IS#795 - Create timed make_ready_future
  • IS#794 - Create heterogeneous when_all/when_any/etc.
  • IS#721 - Make HPX usable for Xeon Phi
  • IS#694 - CMake should complain if you attempt to build an example without its dependencies
  • IS#692 - SLURM support broken
  • IS#683 - python/hpx/process.py imports epoll on all platforms
  • IS#619 - Automate the doc building process
  • IS#600 - GTC performance broken
  • IS#577 - Allow for zero copy serialization/networking
  • IS#551 - Change executable names to have debug postfix in Debug builds
  • IS#544 - Write a custom .lib file on Windows pulling in hpx_init and hpx.dll, phase out hpx_init
  • IS#534 - hpx::init should take functions by std::function and should accept all forms of hpx_main
  • IS#508 - FindPackage fails to set FOO_LIBRARY_DIR
  • IS#506 - Add cmake support to generate ini files for external applications
  • IS#470 - Changing build-type after configure does not update boost library names
  • IS#453 - Document hpx_run_tests.py
  • IS#445 - Significant performance mismatch between MPI and HPX in SMP for allgather example
  • IS#443 - Make docs viewable from build directory
  • IS#421 - Support multiple HPX instances per node in a batch environment like PBS or SLURM
  • IS#316 - Add message size limitation
  • IS#249 - Clean up locking code in big boot barrier
  • IS#136 - Persistent CMake variables need to be marked as cache variables

We have had over 1200 commits since the last release and we have closed roughly 140 tickets (bugs, feature requests, etc.).

General Changes

The major new features in this release are:

  • We further consolidated the API exposed by HPX. We aligned our APIs as much as possible with the existing C++11 Standard and related proposals to the C++ standardization committee (such as N3632 and N3857).
  • We implemented a first version of a distributed AGAS service which essentially eliminates all explicit AGAS network traffic.
  • We created a native ibverbs parcelport which takes advantage of the superior latency and bandwidth characteristics of InfiniBand networks.
  • We successfully ported HPX to the Xeon Phi platform.
  • Support for the SLURM scheduling system was implemented.
  • Major efforts have been dedicated to improving the performance counter framework, numerous new counters were implemented and new APIs were added.
  • We added a modular parcel compression system which improves bandwidth utilization (by reducing the overall size of the transferred data).
  • We added a modular parcel coalescing system which combines several parcels into larger messages. This reduces latencies introduced by the communication layer.
  • Added an experimental executors API which allows different scheduling policies to be used for different parts of the code. This API has been modelled after the Standards proposal N3562. This API is bound to change in the future, though.
  • Added minimal security support for localities which is enforced on the parcelport level. This support is preliminary and experimental and might change in the future.
  • We created a parcelport using low level MPI functions. This is in support of legacy applications which are to be gradually ported and to support platforms where MPI is the only available portable networking layer.
  • We added a preliminary and experimental implementation of a tuple-space object which exposes an interface similar to the tuple-space systems described in the literature (see, for instance, The Linda Coordination Language).
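
For readers unfamiliar with the Linda model referenced above: a tuple space is a shared store into which producers 'put' tuples and from which consumers 'take' matching ones, blocking until a match exists. The following is only a minimal local sketch of that idea (HPX's experimental component is distributed and its interface may differ); the class name and (key, int) tuple shape are illustrative assumptions:

```cpp
#include <condition_variable>
#include <mutex>
#include <string>
#include <utility>
#include <vector>

// Minimal single-process sketch of a Linda-style tuple space:
// 'put' stores a (key, value) tuple, 'take' blocks until a tuple with a
// matching key exists and removes it. Other Linda operations (e.g. a
// non-destructive 'read') are omitted for brevity.
class tuple_space
{
    std::mutex mtx_;
    std::condition_variable cv_;
    std::vector<std::pair<std::string, int>> data_;

public:
    void put(std::string key, int value)
    {
        {
            std::lock_guard<std::mutex> l(mtx_);
            data_.emplace_back(std::move(key), value);
        }
        cv_.notify_all();   // wake any consumer waiting for a match
    }

    // Blocks until a tuple with the given key exists, then removes and
    // returns its value.
    int take(std::string const& key)
    {
        std::unique_lock<std::mutex> l(mtx_);
        for (;;) {
            for (auto it = data_.begin(); it != data_.end(); ++it) {
                if (it->first == key) {
                    int v = it->second;
                    data_.erase(it);
                    return v;
                }
            }
            cv_.wait(l);    // no match yet: wait for the next put
        }
    }
};
```

The decoupling shown here (producers and consumers never reference each other, only the space) is what makes the model attractive for coordination in distributed settings.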
Bug Fixes (Closed Tickets)

Here is a list of the important tickets we closed for this release. This is again a very long list of newly implemented features and fixed issues.

  • IS#806 - make (all) in examples folder does nothing
  • IS#805 - Adding the introduction and fixing DOCBOOK dependencies for Windows use
  • IS#804 - Add stackless (non-suspendable) thread type
  • IS#803 - Create proper serialization support functions for util::tuple
  • IS#800 - Add possibility to disable array optimizations during serialization
  • IS#798 - HPX_LIMIT does not work for local dataflow
  • IS#797 - Create a parcelport which uses MPI
  • IS#796 - Problem with Large Numbers of Threads
  • IS#793 - Changing dataflow test case to hang consistently
  • IS#792 - CMake Error
  • IS#791 - Problems with local::dataflow
  • IS#790 - wait_for() doesn't compile
  • IS#789 - HPX with Intel compiler segfaults
  • IS#788 - Intel compiler support
  • IS#787 - Fixed SFINAEd specializations
  • IS#786 - Memory issues during benchmarking.
  • IS#785 - Create an API allowing to register external threads with HPX
  • IS#784 - util::plugin is throwing an error when a symbol is not found
  • IS#783 - How does hpx:bind work?
  • IS#782 - Added quotes around STRING REPLACE potentially empty arguments
  • IS#781 - Make sure no exceptions propagate into the thread manager
  • IS#780 - Allow arithmetics performance counters to expand its parameters
  • IS#779 - Test case for 778
  • IS#778 - Swapping futures segfaults
  • IS#777 - hpx::lcos::details::when_xxx don't restore completion handlers
  • IS#776 - Compiler chokes on dataflow overload with launch policy
  • IS#775 - Runtime error with local dataflow (copying futures?)
  • IS#774 - Using local dataflow without explicit namespace
  • IS#773 - Local dataflow with unwrap: functor operators need to be const
  • IS#772 - Allow (remote) actions to return a future
  • IS#771 - Setting HPX_LIMIT gives huge boost MPL errors
  • IS#770 - Add launch policy to (local) dataflow
  • IS#769 - Make compile time configuration information available
  • IS#768 - Const correctness problem in local dataflow
  • IS#767 - Add launch policies to async
  • IS#766 - Mark data structures for optimized (array based) serialization
  • IS#765 - Align hpx::any with N3508: Any Library Proposal (Revision 2)
  • IS#764 - Align hpx::future with newest N3558: A Standardized Representation of Asynchronous Operations
  • IS#762 - added a human readable output for the ping pong example
  • IS#761 - Ambiguous typename when constructing derived component
  • IS#760 - Simple components can not be derived
  • IS#759 - make install doesn't give a complete install
  • IS#758 - Stack overflow when using locking_hook<>
  • IS#757 - copy paste error; unsupported function overloading
  • IS#756 - GTCX runtime issue in Gordon
  • IS#755 - Papi counters don't work with reset and evaluate API's
  • IS#753 - cmake bugfix and improved component action docs
  • IS#752 - hpx simple component docs
  • IS#750 - Add hpx::util::any
  • IS#749 - Thread phase counter is not reset
  • IS#748 - Memory performance counter are not registered
  • IS#747 - Create performance counters exposing arithmetic operations
  • IS#745 - apply_callback needs to invoke callback when applied locally
  • IS#744 - CMake fixes
  • IS#743 - Problem Building github version of HPX
  • IS#742 - Remove HPX_STD_BIND
  • IS#741 - assertion 'px != 0' failed: HPX(assertion_failure) for low numbers of OS threads
  • IS#739 - Performance counters do not count to the end of the program or evalution
  • IS#738 - Dedicated AGAS server runs don't work; console ignores -a option.
  • IS#737 - Missing bind overloads
  • IS#736 - Performance counter wildcards do not always work
  • IS#735 - Create native ibverbs parcelport based on rdma operations
  • IS#734 - Threads stolen performance counter total is incorrect
  • IS#733 - Test benchmarks need to be checked and fixed
  • IS#732 - Build fails with Mac, using mac ports clang-3.3 on latest git branch
  • IS#731 - Add global start/stop API for performance counters
  • IS#730 - Performance counter values are apparently incorrect
  • IS#729 - Unhandled switch
  • IS#728 - Serialization of hpx::util::function between two localities causes seg faults
  • IS#727 - Memory counters on Mac OS X
  • IS#725 - Restore original thread priority on resume
  • IS#724 - Performance benchmarks do not depend on main HPX libraries
  • IS#723 - --hpx:nodes=cat $PBS_NODEFILE works; --hpx:nodefile=$PBS_NODEFILE does not.
  • IS#722 - Fix binding const member functions as actions
  • IS#719 - Create performance counter exposing compression ratio
  • IS#718 - Add possibility to compress parcel data
  • IS#717 - strip_credit_from_gid has misleading semantics
  • IS#716 - Non-option arguments to programs run using pbsdsh must be before --hpx:nodes, contrary to directions
  • IS#715 - Re-thrown exceptions should retain the original call site
  • IS#714 - failed assertion in debug mode
  • IS#713 - Add performance counters monitoring connection caches
  • IS#712 - Adjust parcel related performance counters to be connection type specific
  • IS#711 - configuration failure
  • IS#710 - Error "timed out while trying to find room in the connection cache" when trying to start multiple localities on a single computer
  • IS#709 - Add new thread state 'staged' referring to task descriptions
  • IS#708 - Detect/mitigate bad non-system installs of GCC on Redhat systems
  • IS#707 - Many examples do not link with Git HEAD version
  • IS#706 - hpx::init removes portions of non-option command line arguments before last = sign
  • IS#705 - Create rolling average and median aggregating performance counters
  • IS#704 - Create performance counter to expose thread queue waiting time
  • IS#703 - Add support to HPX build system to find librcrtool.a and related headers
  • IS#699 - Generalize instrumentation support
  • IS#698 - compilation failure with hwloc absent
  • IS#697 - Performance counter counts should be zero indexed
  • IS#696 - Distributed problem
  • IS#695 - Bad perf counter time printed
  • IS#693 - --help doesn't print component specific command line options
  • IS#692 - SLURM support broken
  • IS#691 - exception while executing any application linked with hwloc
  • IS#690 - thread_id_test and thread_launcher_test failing
  • IS#689 - Make the buildbots use hwloc
  • IS#687 - compilation error fix (hwloc_topology)
  • IS#686 - Linker Error for Applications
  • IS#684 - Pinning of service thread fails when number of worker threads equals the number of cores
  • IS#682 - Add performance counters exposing number of stolen threads
  • IS#681 - Add apply_continue for asynchronous chaining of actions
  • IS#679 - Remove obsolete async_callback API functions
  • IS#678 - Add new API for setting/triggering LCOs
  • IS#677 - Add async_continue for true continuation style actions
  • IS#676 - Buildbot for gcc 4.4 broken
  • IS#675 - Partial preprocessing broken
  • IS#674 - HPX segfaults when built with gcc 4.7
  • IS#673 - use_guard_pages has inconsistent preprocessor guards
  • IS#672 - External build breaks if library path has spaces
  • IS#671 - release tarballs are tarbombs
  • IS#670 - CMake won't find Boost headers in layout=versioned install
  • IS#669 - Links in docs to source files broken if not installed
  • IS#667 - Not reading ini file properly
  • IS#664 - Adapt new meanings of 'const' and 'mutable'
  • IS#661 - Implement BTL Parcel port
  • IS#655 - Make HPX work with the "decltype" result_of
  • IS#647 - documentation for specifying the number of high priority threads --hpx:high-priority-threads
  • IS#643 - Error parsing host file
  • IS#642 - HWLoc issue with TAU
  • IS#639 - Logging potentially suspends a running thread
  • IS#634 - Improve error reporting from parcel layer
  • IS#627 - Add tests for async and apply overloads that accept regular C++ functions
  • IS#626 - hpx/future.hpp header
  • IS#601 - Intel support
  • IS#557 - Remove action codes
  • IS#531 - AGAS request and response classes should use switch statements
  • IS#529 - Investigate the state of hwloc support
  • IS#526 - Make HPX aware of hyper-threading
  • IS#518 - Create facilities allowing to use plain arrays as action arguments
  • IS#473 - hwloc thread binding is broken on CPUs with hyperthreading
  • IS#383 - Change result type detection for hpx::util::bind to use result_of protocol
  • IS#341 - Consolidate route code
  • IS#219 - Only copy arguments into actions once
  • IS#177 - Implement distributed AGAS
  • IS#43 - Support for Darwin (Xcode + Clang)

We have had over 1000 commits since the last release and we have closed roughly 150 tickets (bugs, feature requests, etc.).

General Changes

This release continues along the lines of code and API consolidation and overall usability improvements. We dedicated much attention to performance and were able to significantly improve the threading and networking subsystems.

We successfully ported HPX to the Android platform. Not only can HPX applications now run on mobile devices, but we also support heterogeneous applications running across architecture boundaries. At the Supercomputing Conference 2012 we demonstrated connecting Android tablets to simulations running on a Linux cluster. The Android tablet was used to query performance counters from the Linux simulation and to steer its parameters.

We successfully ported HPX to Mac OS X (using the Clang compiler). Thanks to Pyry Jahkola for contributing the corresponding patches. Please see the section How to Install HPX on Mac OS for more details.

We made a special effort to make HPX usable in highly concurrent use cases. Many of the HPX API functions which can take longer than 100 microseconds to execute can now be invoked asynchronously. We added uniform support for composing futures, which simplifies writing asynchronous code. HPX actions (function objects encapsulating possibly concurrent remote function invocations) are now well integrated with all other API facilities, such as hpx::bind.

All of the API has been aligned as much as possible with established paradigms. HPX now mirrors many of the facilities as defined in the C++11 Standard, such as hpx::thread, hpx::function, hpx::future, etc.
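
Because HPX mirrors these C++11 facilities, code written against hpx::thread, hpx::function, and hpx::future has the same shape as its std:: counterpart. The following self-contained sketch uses the std:: equivalents only so that it compiles without HPX; substituting the hpx:: names would be the HPX form:

```cpp
#include <functional>
#include <future>
#include <thread>

// A thread fulfills a promise; the future synchronizes with it. With
// HPX, std::thread/std::function/std::promise/std::future would read
// hpx::thread/hpx::function/etc., with the same semantics locally.
int value_from_thread(int v)
{
    std::promise<int> p;
    std::future<int> f = p.get_future();

    // A type-erased callable, as hpx::function provides.
    std::function<void(int)> fulfill = [&p](int x) { p.set_value(x); };

    std::thread t(fulfill, v);   // with HPX: hpx::thread
    int result = f.get();        // blocks until the thread sets the value
    t.join();
    return result;
}
```

The difference in HPX is that the thread is a lightweight HPX thread and the future can also represent the result of a remote operation.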

A lot of work has been put into improving the documentation. Many of the API functions are documented now, concepts are explained in detail, and examples are better described than before. The new documentation index enables finding information with less effort.

This is the first release of HPX since the move to GitHub. This step has enabled wider participation from the community and further encourages us in our decision to release HPX as a true open source library (HPX is licensed under the very liberal Boost Software License).

Bug Fixes (Closed Tickets)

Here is a list of the important tickets we closed for this release. This is by far the longest list of newly implemented features and fixed issues for any of HPX's releases so far.

  • IS#666 - Segfault on calling hpx::finalize twice
  • IS#665 - Adding declaration num_of_cores
  • IS#662 - pkgconfig is building wrong
  • IS#660 - Need uninterrupt function
  • IS#659 - Move our logging library into a different namespace
  • IS#658 - Dynamic performance counter types are broken
  • IS#657 - HPX v0.9.5 (RC1) hello_world example segfaulting
  • IS#656 - Define the affinity of parcel-pool, io-pool, and timer-pool threads
  • IS#654 - Integrate the Boost auto_index tool with documentation
  • IS#653 - Make HPX build on OS X + Clang + libc++
  • IS#651 - Add fine-grained control for thread pinning
  • IS#650 - Command line no error message when using -hpx:(anything)
  • IS#645 - Command line aliases don't work in @file
  • IS#644 - Terminated threads are not always properly cleaned up
  • IS#640 - future_data<T>::set_on_completed_ used without locks
  • IS#638 - hpx build with intel compilers fails on linux
  • IS#637 - --copy-dt-needed-entries breaks with gold
  • IS#635 - Boost V1.53 will add Boost.Lockfree and Boost.Atomic
  • IS#633 - Re-add examples to final 0.9.5 release
  • IS#632 - Example thread_aware_timer is broken
  • IS#631 - FFT application throws error in parcellayer
  • IS#630 - Event synchronization example is broken
  • IS#629 - Waiting on futures hangs
  • IS#628 - Add an HPX_ALWAYS_ASSERT macro
  • IS#625 - Port coroutines context switch benchmark
  • IS#621 - New INI section for stack sizes
  • IS#618 - pkg_config support does not work with a HPX debug build
  • IS#617 - hpx/external/logging/boost/logging/detail/cache_before_init.hpp:139:67: error: 'get_thread_id' was not declared in this scope
  • IS#616 - Change wait_xxx not to use locking
  • IS#615 - Revert visibility 'fix' (fb0b6b8245dad1127b0c25ebafd9386b3945cca9)
  • IS#614 - Fix Dataflow linker error
  • IS#613 - find_here should throw an exception on failure
  • IS#612 - Thread phase doesn't show up in debug mode
  • IS#611 - Make stack guard pages configurable at runtime (initialization time)
  • IS#610 - Co-Locate Components
  • IS#609 - future_overhead
  • IS#608 - --hpx:list-counter-infos problem
  • IS#607 - Update Boost.Context based backend for coroutines
  • IS#606 - 1d_wave_equation is not working
  • IS#605 - Any C++ function that has serializable arguments and a serializable return type should be remotable
  • IS#604 - Connecting localities isn't working anymore
  • IS#603 - Do not verify any ini entries read from a file
  • IS#602 - Rename argument_size to type_size/ added implementation to get parcel size
  • IS#599 - Enable locality specific command line options
  • IS#598 - Need an API that accesses the performance counter reporting the system uptime
  • IS#597 - compiling on ranger
  • IS#595 - I need a place to store data in a thread self pointer
  • IS#594 - 32/64 interoperability
  • IS#593 - Warn if logging is disabled at compile time but requested at runtime
  • IS#592 - Add optional argument value to --hpx:list-counters and --hpx:list-counter-infos
  • IS#591 - Allow for wildcards in performance counter names specified with --hpx:print-counter
  • IS#590 - Local promise semantic differences
  • IS#589 - Create API to query performance counter names
  • IS#587 - Add get_num_localities and get_num_threads to AGAS API
  • IS#586 - Adjust local AGAS cache size based on number of localities
  • IS#585 - Error while using counters in HPX
  • IS#584 - counting argument size of actions, initial pass.
  • IS#581 - Remove RemoteResult template parameter for future<>
  • IS#580 - Add possibility to hook into actions
  • IS#578 - Use angle brackets in HPX error dumps
  • IS#576 - Exception incorrectly thrown when --help is used
  • IS#575 - HPX(bad_component_type) with gcc 4.7.2 and boost 1.51
  • IS#574 - --hpx:connect command line parameter not working correctly
  • IS#571 - hpx::wait() (callback version) should pass the future to the callback function
  • IS#570 - hpx::wait should operate on boost::arrays and std::lists
  • IS#569 - Add a logging sink for Android
  • IS#568 - 2-argument version of HPX_DEFINE_COMPONENT_ACTION
  • IS#567 - Connecting to a running HPX application works only once
  • IS#565 - HPX doesn't shutdown properly
  • IS#564 - Partial preprocessing of new component creation interface
  • IS#563 - Add hpx::start/hpx::stop to avoid blocking main thread
  • IS#562 - All command line arguments swallowed by hpx
  • IS#561 - Boost.Tuple is not move aware
  • IS#558 - boost::shared_ptr<> style semantics/syntax for client classes
  • IS#556 - Creation of partially preprocessed headers should be enabled for Boost newer than V1.50
  • IS#555 - BOOST_FORCEINLINE does not name a type
  • IS#554 - Possible race condition in thread get_id()
  • IS#552 - Move enable client_base
  • IS#550 - Add stack size category 'huge'
  • IS#549 - ShenEOS run seg-faults on single or distributed runs
  • IS#545 - AUTOGLOB broken for add_hpx_component
  • IS#542 - FindHPX_HDF5 still searches multiple times
  • IS#541 - Quotes around application name in hpx::init
  • IS#539 - Race conditition occuring with new lightweight threads
  • IS#535 - hpx_run_tests.py exits with no error code when tests are missing
  • IS#530 - Thread description(<unknown>) in logs
  • IS#523 - Make thread objects more lightweight
  • IS#521 - hpx::error_code is not usable for lightweight error handling
  • IS#520 - Add full user environment to HPX logs
  • IS#519 - Build succeeds, running fails
  • IS#517 - Add a guard page to linux coroutine stacks
  • IS#516 - hpx::thread::detach suspends while holding locks, leads to hang in debug
  • IS#514 - Preprocessed headers for <hpx/apply.hpp> don't compile
  • IS#513 - Buildbot configuration problem
  • IS#512 - Implement action based stack size customization
  • IS#511 - Move action priority into a separate type trait
  • IS#510 - trunk broken
  • IS#507 - no matching function for call to boost::scoped_ptr<hpx::threads::topology>::scoped_ptr(hpx::threads::linux_topology*)
  • IS#505 - undefined_symbol regression test currently failing
  • IS#502 - Adding OpenCL and OCLM support to HPX for Windows and Linux
  • IS#501 - find_package(HPX) sets cmake output variables
  • IS#500 - wait_any/wait_all are badly named
  • IS#499 - Add support for disabling pbs support in pbs runs
  • IS#498 - Error during no-cache runs
  • IS#496 - Add partial preprocessing support to cmake
  • IS#495 - Support HPX modules exporting startup/shutdown functions only
  • IS#494 - Allow modules to specify when to run startup/shutdown functions
  • IS#493 - Avoid constructing a string in make_success_code
  • IS#492 - Performance counter creation is no longer synchronized at startup
  • IS#491 - Performance counter creation is no longer synchronized at startup
  • IS#490 - Sheneos on_completed_bulk seg fault in distributed
  • IS#489 - compiling issue with g++44
  • IS#488 - Adding OpenCL and OCLM support to HPX for the MSVC platform
  • IS#487 - FindHPX.cmake problems
  • IS#485 - Change distributing_factory and binpacking_factory to use bulk creation
  • IS#484 - Change HPX_DONT_USE_PREPROCESSED_FILES to HPX_USE_PREPROCESSED_FILES
  • IS#483 - Memory counter for Windows
  • IS#479 - strange errors appear when requesting performance counters on multiple nodes
  • IS#477 - Create (global) timer for multi-threaded measurements
  • IS#472 - Add partial preprocessing using Wave
  • IS#471 - Segfault stack traces don't show up in release
  • IS#468 - External projects need to link with internal components
  • IS#462 - Startup/shutdown functions are called more than once
  • IS#458 - Consolidate hpx::util::high_resolution_timer and hpx::util::high_resolution_clock
  • IS#457 - index out of bounds in allgather_and_gate on 4 cores or more
  • IS#448 - Make HPX compile with clang
  • IS#447 - 'make tests' should execute tests on local installation
  • IS#446 - Remove SVN-related code from the codebase
  • IS#444 - race condition in smp
  • IS#441 - Patched Boost.Serialization headers should only be installed if needed
  • IS#439 - Components using HPX_REGISTER_STARTUP_MODULE fail to compile with MSVC
  • IS#436 - Verify that no locks are being held while threads are suspended
  • IS#435 - Installing HPX should not clobber existing Boost installation
  • IS#434 - Logging external component failed (Boost 1.50)
  • IS#433 - Runtime crash when building all examples
  • IS#432 - Dataflow hangs on 512 cores/64 nodes
  • IS#430 - Problem with distributing factory
  • IS#424 - File paths referring to XSL-files need to be properly escaped
  • IS#417 - Make dataflow LCOs work out of the box by using partial preprocessing
  • IS#413 - hpx_svnversion.py fails on Windows
  • IS#412 - Make hpx::error_code equivalent to hpx::exception
  • IS#398 - HPX clobbers out-of-tree application specific CMake variables (specifically CMAKE_BUILD_TYPE)
  • IS#394 - Remove code generating random port numbers for network
  • IS#378 - ShenEOS scaling issues
  • IS#354 - Create a coroutines wrapper for Boost.Context
  • IS#349 - Commandline option --localities=N/-lN should be necessary only on AGAS locality
  • IS#334 - Add auto_index support to cmake based documentation toolchain
  • IS#318 - Network benchmarks
  • IS#317 - Implement network performance counters
  • IS#310 - Duplicate logging entries
  • IS#230 - Add compile time option to disable thread debugging info
  • IS#171 - Add an INI option to turn off deadlock detection independently of logging
  • IS#170 - OSHL internal counters are incorrect
  • IS#103 - Better diagnostics for multiple component/action registrations under the same name
  • IS#48 - Support for Darwin (Xcode + Clang)
  • IS#21 - Build fails with GCC 4.6

We have had roughly 800 commits since the last release and we have closed approximately 80 tickets (bugs, feature requests, etc.).

General Changes
  • Significant improvements made to the usability of HPX in large-scale, distributed environments.
  • Renamed hpx::lcos::packaged_task<> to hpx::lcos::packaged_action<> to reflect the semantic differences to a packaged_task as defined by the C++11 Standard.
  • HPX now exposes hpx::thread which is compliant to the C++11 std::thread type except that it (purely locally) represents an HPX thread. This new type does not expose any of the remote capabilities of the underlying HPX-thread implementation.
  • The type hpx::lcos::future<> is now compliant to the C++11 std::future<> type. This type can be used to synchronize both local and remote operations. In both cases the control flow will 'return' to the future in order to trigger any continuation.
  • The types hpx::lcos::local::promise<> and hpx::lcos::local::packaged_task<> are now compliant to the C++11 std::promise<> and std::packaged_task<> types. These can be used to create a future representing local work only. Use the types hpx::lcos::promise<> and hpx::lcos::packaged_action<> to wrap any (possibly remote) action into a future.
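    Since these local types mirror the standard ones, their semantics can be illustrated with the plain C++11 std:: facilities; this is only a sketch of the shared interface, not HPX-specific code (substitute hpx::lcos::local::promise<> and hpx::lcos::local::packaged_task<> for the std:: types when using HPX):

    ```cpp
    #include <future>
    #include <iostream>
    #include <thread>

    int main()
    {
        // promise/future pair: one thread fulfills the promise,
        // another waits on the corresponding future.
        std::promise<int> p;                    // hpx::lcos::local::promise<int> has the same interface
        std::future<int> f = p.get_future();
        std::thread t([&p] { p.set_value(42); });
        int a = f.get();                        // blocks until set_value() has run
        t.join();

        // packaged_task wraps a callable and exposes its result as a future.
        std::packaged_task<int()> task([] { return 7; });  // cf. hpx::lcos::local::packaged_task
        std::future<int> r = task.get_future();
        task();                                 // run the task; the future becomes ready
        int b = r.get();

        std::cout << a << ' ' << b << '\n';
        return 0;
    }
    ```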
  • hpx::thread and hpx::lcos::future<> are now cancelable.
  • Added support for sequential and logical composition of hpx::lcos::future<>'s. The member function hpx::lcos::future::when() permits futures to be sequentially composed. The helper functions hpx::wait_all, hpx::wait_any, and hpx::wait_n can be used to wait for more than one future at a time.
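    The behavior of hpx::wait_all — block until every future in a set is ready — can be sketched locally with std::future. The helper wait_all_sketch below is a hypothetical stand-in written for illustration, not the HPX API:

    ```cpp
    #include <future>
    #include <iostream>
    #include <vector>

    // Hypothetical local stand-in for hpx::wait_all: block until
    // every future in the vector is ready.
    void wait_all_sketch(std::vector<std::future<int>>& futures)
    {
        for (auto& f : futures)
            f.wait();
    }

    int main()
    {
        std::vector<std::future<int>> futures;
        for (int i = 0; i < 3; ++i)
            futures.push_back(std::async(std::launch::async, [i] { return i * i; }));

        wait_all_sketch(futures);   // corresponds to hpx::wait_all(futures)

        int sum = 0;
        for (auto& f : futures)
            sum += f.get();         // all futures are ready; get() does not block
        std::cout << sum << '\n';   // 0 + 1 + 4
        return 0;
    }
    ```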
  • HPX now exposes hpx::apply() and hpx::async() as the preferred way of creating (or invoking) any deferred work. These functions are usable with various types of functions, function objects, and actions and provide a uniform way to spawn deferred tasks.
  • HPX now utilizes hpx::util::bind to (partially) bind local functions and function objects, and also actions. Remote bound actions can have placeholders as well.
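    hpx::util::bind follows the interface of std::bind, so the local case can be shown with the standard facility; the action case works the same way, with an action instance passed as the first argument:

    ```cpp
    #include <functional>
    #include <iostream>

    int subtract(int a, int b) { return a - b; }

    int main()
    {
        using std::placeholders::_1;

        // Partially bind the first argument; the placeholder _1 forwards
        // the argument supplied at the call site. hpx::util::bind works alike.
        auto minus_from_ten = std::bind(&subtract, 10, _1);
        std::cout << minus_from_ten(3) << '\n';   // subtract(10, 3)
        return 0;
    }
    ```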
  • HPX continuations are now fully polymorphic. The class hpx::actions::forwarding_continuation is an example of how users can write their own types of continuations. It can be used to execute any function as a continuation of a particular action.
  • Reworked the action invocation API to be fully conformant to normal functions. Actions can now be invoked using hpx::apply(), hpx::async(), or using the operator() implemented on actions. Actions themselves can now be cheaply instantiated as they do not have any members anymore.
  • Reworked the lazy action invocation API. Actions can now be directly bound using hpx::util::bind() by passing an action instance as the first argument.
  • A minimal HPX program now looks like this:

    #include <hpx/hpx_init.hpp>
    
    int hpx_main()
    {
        return hpx::finalize();
    }
    
    int main()
    {
        return hpx::init();
    }
    

    This removes the immediate dependency on the Boost.Program Options library.

[Note]Note

This minimal version of an HPX program does not support any of the default command line arguments (such as --help, or command line options related to PBS). It is suggested to always pass argc and argv to HPX as shown in the example below.

  • In order to support those, but still not to depend on Boost.Program Options, the minimal program can be written as:

    #include <hpx/hpx_init.hpp>
    
    // The arguments for hpx_main can be left off, which is very similar to
    // the behavior of `main()` as defined by C++.
    int hpx_main(int argc, char* argv[])
    {
        return hpx::finalize();
    }
    
    int main(int argc, char* argv[])
    {
        return hpx::init(argc, argv);
    }
    
  • Added performance counters exposing the number of component instances which are alive on a given locality.
  • Added performance counters exposing the number of messages sent and received, the number of parcels sent and received, the number of bytes sent and received, the overall time required to send and receive data, and the overall time required to serialize and deserialize the data.
  • Added a new component: hpx::components::binpacking_factory which is equivalent to the existing hpx::components::distributing_factory component, except that it equalizes the overall population of the components to create. It exposes two factory methods, one based on the number of existing instances of the component type to create, and one based on an arbitrary performance counter which will be queried for all relevant localities.
  • Added API functions allowing access to elements of the diagnostic information embedded in a given exception: hpx::get_locality_id(), hpx::get_host_name(), hpx::get_process_id(), hpx::get_function_name(), hpx::get_file_name(), hpx::get_line_number(), hpx::get_os_thread(), hpx::get_thread_id(), and hpx::get_thread_description().
Bug Fixes (Closed Tickets)

Here is a list of the important tickets we closed for this release:

  • IS#71 - GIDs that are not serialized via handle_gid<> should raise an exception
  • IS#105 - Allow for hpx::util::functions to be registered in the AGAS symbolic namespace
  • IS#107 - Nasty threadmanger race condition (reproducible in sheneos_test)
  • IS#108 - Add millisecond resolution to HPX logs on Linux
  • IS#110 - Shutdown hang in distributed with release build
  • IS#116 - Don't use TSS for the applier and runtime pointers
  • IS#162 - Move local synchronous execution shortcut from hpx::function to the applier
  • IS#172 - Cache sources in CMake and check if they change manually
  • IS#178 - Add an INI option to turn off ranged-based AGAS caching
  • IS#187 - Support for disabling performance counter deployment
  • IS#202 - Support for sending performance counter data to a specific file
  • IS#218 - boost.coroutines allows different stack sizes, but stack pool is unaware of this
  • IS#231 - Implement movable boost::bind
  • IS#232 - Implement movable boost::function
  • IS#236 - Allow binding hpx::util::function to actions
  • IS#239 - Replace hpx::function with hpx::util::function
  • IS#240 - Can't specify RemoteResult with lcos::async
  • IS#242 - REGISTER_TEMPLATE support for plain actions
  • IS#243 - handle_gid<> support for hpx::util::function
  • IS#245 - *_c_cache code throws an exception if the queried GID is not in the local cache
  • IS#246 - Undefined references in dataflow/adaptive1d example
  • IS#252 - Problems configuring sheneos with CMake
  • IS#254 - Lifetime of components doesn't end when client goes out of scope
  • IS#259 - CMake does not detect that MSVC10 has lambdas
  • IS#260 - io_service_pool segfault
  • IS#261 - Late parcel executed outside of pxthread
  • IS#263 - Cannot select allocator with CMake
  • IS#264 - Fix allocator select
  • IS#267 - Runtime error for hello_world
  • IS#269 - pthread_affinity_np test fails to compile
  • IS#270 - Compiler noise due to -Wcast-qual
  • IS#275 - Problem with configuration tests/include paths on Gentoo
  • IS#325 - Sheneos is 200-400 times slower than the fortran equivalent
  • IS#331 - hpx::init() and hpx_main() should not depend on program_options
  • IS#333 - Add doxygen support to CMake for doc toolchain
  • IS#340 - Performance counters for parcels
  • IS#346 - Component loading error when running hello_world in distributed on MSVC2010
  • IS#362 - Missing initializer error
  • IS#363 - Parcel port serialization error
  • IS#366 - Parcel buffering leads to types incompatible exception
  • IS#368 - Scalable alternative to rand() needed for HPX
  • IS#369 - IB over IP is substantially slower than just using standard TCP/IP
  • IS#374 - hpx::lcos::wait should work with dataflows and arbitrary classes meeting the future interface
  • IS#375 - Conflicting/ambiguous overloads of hpx::lcos::wait
  • IS#376 - Find_HPX.cmake should set CMake variable HPX_FOUND for out of tree builds
  • IS#377 - ShenEOS interpolate bulk and interpolate_one_bulk are broken
  • IS#379 - Add support for distributed runs under SLURM
  • IS#382 - _Unwind_Word not declared in boost.backtrace
  • IS#387 - Doxygen should look only at list of specified files
  • IS#388 - Running make install on an out-of-tree application is broken
  • IS#391 - Out-of-tree application segfaults when running in qsub
  • IS#392 - Remove HPX_NO_INSTALL option from cmake build system
  • IS#396 - Pragma related warnings when compiling with older gcc versions
  • IS#399 - Out of tree component build problems
  • IS#400 - Out of source builds on Windows: linker should not receive compiler flags
  • IS#401 - Out of source builds on Windows: components need to be linked with hpx_serialization
  • IS#404 - gfortran fails to link automatically when fortran files are present
  • IS#405 - Inability to specify linking order for external libraries
  • IS#406 - Adapt action limits such that dataflow applications work without additional defines
  • IS#415 - locality_results is not a member of hpx::components::server
  • IS#425 - Breaking changes to traits::*result wrt std::vector<id_type>
  • IS#426 - AUTOGLOB needs to be updated to support fortran

This is a point release including important bug fixes for V0.8.0.

General Changes
  • HPX does not need to be installed anymore to be functional.
Bug Fixes (Closed Tickets)

Here is a list of the important tickets we closed for this point release:

  • IS#295 - Don't require install path to be known at compile time.
  • IS#371 - Add hpx iostreams to standard build.
  • IS#384 - Fix compilation with GCC 4.7.
  • IS#390 - Remove keep_factory_alive startup call from ShenEOS; add shutdown call to H5close.
  • IS#393 - Thread affinity control is broken.
Bug Fixes (Commits)

Here is a list of the important commits included in this point release:

  • r7642 - External: Fix backtrace memory violation.
  • r7775 - Components: Fix symbol visibility bug with component startup providers. This prevents one component's providers from overriding another component's.
  • r7778 - Components: Fix startup/shutdown provider shadowing issues.

We have had roughly 1000 commits since the last release and we have closed approximately 70 tickets (bugs, feature requests, etc.).

General Changes
  • Improved PBS support, allowing for arbitrary naming schemes of node-hostnames.
  • Finished verification of the reference counting framework.
  • Implemented decrement merging logic to optimize the distributed reference counting system.
  • Restructured the LCO framework. Renamed hpx::lcos::eager_future<> and hpx::lcos::lazy_future<> into hpx::lcos::packaged_task<> and hpx::lcos::deferred_packaged_task<>. Split hpx::lcos::promise<> into hpx::lcos::packaged_task<> and hpx::lcos::future<>. Added 'local' futures (in namespace hpx::lcos::local).
  • Improved the general performance of local and remote action invocations. This (under certain circumstances) drastically reduces the number of copies created for each of the parameters and return values.
  • Reworked the performance counter framework. Performance counters are now created only when needed, which reduces the overall resource requirements. The new framework allows for much more flexible creation and management of performance counters. The new sine example application demonstrates some of the capabilities of the new infrastructure.
  • Added a buildbot-based continuous build system which gives instant, automated feedback on each commit to SVN.
  • Added more automated tests to verify proper functioning of HPX.
  • Started to create documentation for HPX and its API.
  • Added documentation toolchain to the build system.
  • Added dataflow LCO.
  • Changed default HPX command line options to have hpx: prefix. For instance, the former option --threads is now --hpx:threads. This has been done to minimize ambiguities with application specific command line options. See the section HPX Command Line Options for a full list of available options.
  • Added the possibility to define command line aliases. The former short (one-letter) command line options have been predefined as aliases for backwards compatibility. See the section HPX Command Line Options for a detailed description of command line option aliasing.
  • Network connections are now cached based on the connected host. The number of simultaneous connections to a particular host is now limited. Parcels are buffered and bundled if all connections are in use.
  • Added more refined thread affinity control. This is based on the external library Portable Hardware Locality (HWLOC).
  • Improved support for Windows builds with CMake.
  • Added support for components to register their own command line options.
  • Added the possibility to register custom startup/shutdown functions for any component. These functions are guaranteed to be executed by an HPX thread.
  • Added two new experimental thread schedulers: hierarchy_scheduler and periodic_priority_scheduler. These can be activated by using the command line options --hpx:queueing=hierarchy or --hpx:queueing=periodic.
Example Applications
  • Graph500 performance benchmark (thanks to Matthew Anderson for contributing this application).
  • GTC (Gyrokinetic Toroidal Code): a skeleton for particle in cell type codes.
  • Random Memory Access: an example demonstrating random memory accesses in a large array
  • ShenEOS example, demonstrating partitioning of large read-only data structures and exposing an interpolation API.
  • Sine performance counter demo.
  • Accumulator examples demonstrating how to write and use HPX components.
  • Quickstart examples (like hello_world, fibonacci, quicksort, factorial, etc.) demonstrating some of the basic concepts in HPX.
  • Load balancing and work stealing demos.
API Changes
  • Moved all local LCOs into a separate namespace hpx::lcos::local (for instance, hpx::lcos::local_mutex is now hpx::lcos::local::mutex).
  • Replaced hpx::actions::function with hpx::util::function. Cleaned up related code.
  • Removed hpx::traits::handle_gid and moved handling of global reference counts into the corresponding serialization code.
  • Changed terminology: prefix is now called locality_id, renamed the corresponding API functions (such as hpx::get_prefix, which is now called hpx::get_locality_id).
  • Added hpx::find_remote_localities() and hpx::get_num_localities().
  • Changed performance counter naming scheme to make it more bash friendly. The new performance counter naming scheme is now
/object{parentname#parentindex/instance#index}/counter#parameters
  • Added hpx::get_worker_thread_num replacing hpx::threadmanager_base::get_thread_num.
  • Renamed hpx::get_num_os_threads to hpx::get_os_threads_count.
  • Added hpx::threads::get_thread_count.
  • Restructured the Futures sub-system, renaming types in accordance with the terminology used by the C++11 ISO standard.
Bug Fixes (Closed Tickets)

Here is a list of the important tickets we closed for this release:

  • IS#31 - Specialize handle_gid<> for examples and tests
  • IS#72 - Fix AGAS reference counting
  • IS#104 - heartbeat throws an exception when decrefing the performance counter it's watching
  • IS#111 - throttle causes an exception on the target application
  • IS#142 - One failed component loading causes an unrelated component to fail
  • IS#165 - Remote exception propagation bug in AGAS reference counting test
  • IS#186 - Test credit exhaustion/splitting (e.g. prepare_gid and symbol NS)
  • IS#188 - Implement remaining AGAS reference counting test cases
  • IS#258 - No type checking of GIDs in stubs classes
  • IS#271 - Seg fault/shared pointer assertion in distributed code
  • IS#281 - CMake options need descriptive text
  • IS#283 - AGAS caching broken (gva_cache needs to be rewritten with ICL)
  • IS#285 - HPX_INSTALL root directory not the same as CMAKE_INSTALL_PREFIX
  • IS#286 - New segfault in dataflow applications
  • IS#289 - Exceptions should only be logged if not handled
  • IS#290 - c++11 tests failure
  • IS#293 - Build target for component libraries
  • IS#296 - Compilation error with Boost V1.49rc1
  • IS#298 - Illegal instructions on termination
  • IS#299 - gravity aborts with multiple threads
  • IS#301 - Build error with Boost trunk
  • IS#303 - Logging assertion failure in distributed runs
  • IS#304 - Exception 'what' strings are lost when exceptions from decode_parcel are reported
  • IS#306 - Performance counter user interface issues
  • IS#307 - Logging exception in distributed runs
  • IS#308 - Logging deadlocks in distributed
  • IS#309 - Reference counting test failures and exceptions
  • IS#311 - Merge AGAS remote_interface with the runtime_support object
  • IS#314 - Object tracking for id_types
  • IS#315 - Remove handle_gid and handle credit splitting in id_type serialization
  • IS#320 - applier::get_locality_id() should return an error value (or throw an exception)
  • IS#321 - Optimization for id_types which are never split should be restored
  • IS#322 - Command line processing ignored with Boost 1.47.0
  • IS#323 - Credit exhaustion causes object to stay alive
  • IS#324 - Duplicate exception messages
  • IS#326 - Integrate Quickbook with CMake
  • IS#329 - --help and --version should still work
  • IS#330 - Create pkg-config files
  • IS#337 - Improve usability of performance counter timestamps
  • IS#338 - Non-std exceptions deriving from std::exceptions in tfunc may be sliced
  • IS#339 - Decrease the number of send_pending_parcels threads
  • IS#343 - Dynamically setting the stack size doesn't work
  • IS#351 - 'make install' does not update documents
  • IS#353 - Disable FIXMEs in the docs by default; add a doc developer CMake option to enable FIXMEs
  • IS#355 - 'make' doesn't do anything after correct configuration
  • IS#356 - Don't use hpx::util::static_ in topology code
  • IS#359 - Infinite recursion in hpx::tuple serialization
  • IS#361 - Add compile time option to disable logging completely
  • IS#364 - Installation seriously broken in r7443

We have had roughly 1000 commits since the last release and we have closed approximately 120 tickets (bugs, feature requests, etc.).

General Changes
  • Completely removed code related to deprecated AGAS V1, started to work on AGAS V2.1.
  • Started to clean up and streamline the exposed APIs (see 'API changes' below for more details).
  • Revamped and unified performance counter framework, added a lot of new performance counter instances for monitoring of a diverse set of internal HPX parameters (queue lengths, access statistics, etc.).
  • Improved general error handling and logging support.
  • Fixed several race conditions, improved overall stability, decreased memory footprint, improved overall performance (major optimizations include native TLS support and ranged-based AGAS caching).
  • Added support for running HPX applications with PBS.
  • Many updates to the build system, added support for gcc 4.5.x and 4.6.x, added C++11 support.
  • Many updates to default command line options.
  • Added many tests, set up buildbot for continuous integration testing.
  • Better shutdown handling of distributed applications.
Example Applications
  • quickstart/factorial and quickstart/fibonacci, future-recursive parallel algorithms.
  • quickstart/hello_world, distributed hello world example.
  • quickstart/rma, simple remote memory access example
  • quickstart/quicksort, parallel quicksort implementation.
  • gtc, gyrokinetic toroidal code.
  • bfs, breadth-first-search, example code for a graph application.
  • sheneos, partitioning of large data sets.
  • accumulator, simple component example.
  • balancing/os_thread_num, balancing/px_thread_phase, examples demonstrating load balancing and work stealing.
API Changes
  • Added hpx::find_all_localities.
  • Added hpx::terminate for non-graceful termination of applications.
  • Added hpx::lcos::async functions for simpler asynchronous programming.
  • Added new AGAS interface for handling of symbolic namespace (hpx::agas::*).
  • Renamed hpx::components::wait to hpx::lcos::wait.
  • Renamed hpx::lcos::future_value to hpx::lcos::promise.
  • Renamed hpx::lcos::recursive_mutex to hpx::lcos::local_recursive_mutex, hpx::lcos::mutex to hpx::lcos::local_mutex
  • Removed support for Boost versions older than V1.38, recommended Boost version is now V1.47 and newer.
  • Removed hpx::process (this will be replaced by a real process implementation in the future).
  • Removed non-functional LCO code (hpx::lcos::dataflow, hpx::lcos::thunk, hpx::lcos::dataflow_variable).
  • Removed deprecated hpx::naming::full_address.
Bug Fixes (Closed Tickets)

Here is a list of the important tickets we closed for this release:

  • IS#28 - Integrate Windows/Linux CMake code for HPX core
  • IS#32 - hpx::cout() should be hpx::cout
  • IS#33 - AGAS V2 legacy client does not properly handle error_code
  • IS#60 - AGAS: allow for registerid to optionally take ownership of the gid
  • IS#62 - adaptive1d compilation failure in Fusion
  • IS#64 - Parcel subsystem doesn't resolve domain names
  • IS#83 - No error handling if no console is available
  • IS#84 - No error handling if a hosted locality is treated as the bootstrap server
  • IS#90 - Add general commandline option -N
  • IS#91 - Add possibility to read command line arguments from file
  • IS#92 - Always log exceptions/errors to the log file
  • IS#93 - Log the command line/program name
  • IS#95 - Support for distributed launches
  • IS#97 - Attempt to create a bad component type in AMR examples
  • IS#100 - factorial and factorial_get examples trigger AGAS component type assertions
  • IS#101 - Segfault when hpx::process::here() is called in fibonacci2
  • IS#102 - unknown_component_address in int_object_semaphore_client
  • IS#114 - marduk raises assertion with default parameters
  • IS#115 - Logging messages for SMP runs (on the console) shouldn't be buffered
  • IS#119 - marduk linking strategy breaks other applications
  • IS#121 - pbsdsh problem
  • IS#123 - marduk, dataflow and adaptive1d fail to build
  • IS#124 - Lower default preprocessing arity
  • IS#125 - Move hpx::detail::diagnostic_information out of the detail namespace
  • IS#126 - Test definitions for AGAS reference counting
  • IS#128 - Add averaging performance counter
  • IS#129 - Error with endian.hpp while building adaptive1d
  • IS#130 - Bad initialization of performance counters
  • IS#131 - Add global startup/shutdown functions to component modules
  • IS#132 - Avoid using auto_ptr
  • IS#133 - On Windows hpx.dll doesn't get installed
  • IS#134 - HPX_LIBRARY does not reflect real library name (on Windows)
  • IS#135 - Add detection of unique_ptr to build system
  • IS#137 - Add command line option allowing to repeatedly evaluate performance counters
  • IS#139 - Logging is broken
  • IS#140 - CMake problem on windows
  • IS#141 - Move all non-component libraries into $PREFIX/lib/hpx
  • IS#143 - adaptive1d throws an exception with the default command line options
  • IS#146 - Early exception handling is broken
  • IS#147 - Sheneos doesn't link on Linux
  • IS#149 - sheneos_test hangs
  • IS#154 - Compilation fails for r5661
  • IS#155 - Sine performance counters example chokes on chrono headers
  • IS#156 - Add build type to --version
  • IS#157 - Extend AGAS caching to store gid ranges
  • IS#158 - r5691 doesn't compile
  • IS#160 - Re-add AGAS function for resolving a locality to its prefix
  • IS#168 - Managed components should be able to access their own GID
  • IS#169 - Rewrite AGAS future pool
  • IS#179 - Complete switch to request class for AGAS server interface
  • IS#182 - Sine performance counter is loaded by other examples
  • IS#185 - Write tests for symbol namespace reference counting
  • IS#191 - Assignment of read-only variable in point_geometry
  • IS#200 - Seg faults when querying performance counters
  • IS#204 - --ifnames and suffix stripping needs to be more generic
  • IS#205 - --list-* and --print-counter-* options do not work together and produce no warning
  • IS#207 - Implement decrement entry merging
  • IS#208 - Replace the spinlocks in AGAS with hpx::lcos::local_mutexes
  • IS#210 - Add an --ifprefix option
  • IS#214 - Performance test for PX-thread creation
  • IS#216 - VS2010 compilation
  • IS#222 - r6045 context_linux_x86.hpp
  • IS#223 - fibonacci hangs when changing the state of an active thread
  • IS#225 - Active threads end up in the FEB wait queue
  • IS#226 - VS Build Error for Accumulator Client
  • IS#228 - Move all traits into namespace hpx::traits
  • IS#229 - Invalid initialization of reference in thread_init_data
  • IS#235 - Invalid GID in iostreams
  • IS#238 - Demangle type names for the default implementation of get_action_name
  • IS#241 - C++11 support breaks GCC 4.5
  • IS#247 - Reference to temporary with GCC 4.4
  • IS#248 - Seg fault at shutdown with GCC 4.4
  • IS#253 - Default component action registration kills compiler
  • IS#272 - G++ unrecognized command line option
  • IS#273 - quicksort example doesn't compile
  • IS#277 - Invalid CMake logic for Windows
Welcome

Welcome to the HPX runtime system libraries! By the time you've completed this tutorial, you'll be at least somewhat comfortable with HPX and how to go about using it.

What's Here

This document is designed to be an extremely gentle introduction, so we included a fair amount of material that may already be very familiar to you. To keep things simple, we also left out some information intermediate and advanced users will probably want. At the end of this document, we'll refer you to resources that can help you pursue these topics further.

Most HPX applications are executed on parallel computers. These platforms typically provide integrated job management services that facilitate the allocation of computing resources for each parallel program. HPX includes out of the box support for one of the most common job management systems, the Portable Batch System (PBS).

All PBS jobs require a script to specify the resource requirements and other parameters associated with a parallel job. The PBS script is basically a shell script with PBS directives placed within commented sections at the beginning of the file. The remaining (not commented-out) portions of the file execute just like any other regular shell script. While the description of all available PBS options is outside the scope of this tutorial (the interested reader may refer to in-depth documentation for more information), below is a minimal example to illustrate the approach. As a test application we will use the multithreaded hello_world program, explained in the section Hello World Example.

#!/bin/bash
#
#PBS -l nodes=2:ppn=4

APP_PATH=~/packages/hpx/bin/hello_world
APP_OPTIONS=

pbsdsh -u $APP_PATH $APP_OPTIONS --hpx:nodes=`cat $PBS_NODEFILE`
[Caution]Caution

If the first application specific argument (inside $APP_OPTIONS) is a non-option (i.e. it does not start with a '-' or '--'), then it has to be placed before the option --hpx:nodes, which in this case should be the last option on the command line.

Alternatively, use the option --hpx:endnodes to explicitly mark the end of the list of node names:

pbsdsh -u $APP_PATH --hpx:nodes=`cat $PBS_NODEFILE` --hpx:endnodes $APP_OPTIONS

The #PBS -l nodes=2:ppn=4 directive will cause two compute nodes to be allocated for the application, as specified in the option nodes. Each of the nodes will dedicate four cores to the program, as per the option ppn, short for "processors per node" (PBS does not distinguish between processors and cores). Note that requesting more cores per node than physically available is pointless and may prevent PBS from accepting the script.

On newer PBS versions the PBS command syntax might be different. For instance, the PBS script above would look like:

#!/bin/bash
#
#PBS -l select=2:ncpus=4

APP_PATH=~/packages/hpx/bin/hello_world
APP_OPTIONS=

pbsdsh -u $APP_PATH $APP_OPTIONS --hpx:nodes=`cat $PBS_NODEFILE`

APP_PATH and APP_OPTIONS are shell variables that respectively specify the correct path to the executable (hello_world in this case) and the command line options. Since the hello_world application doesn't need any command line options, APP_OPTIONS has been left empty. Unlike in other execution environments, there is no need to use the --hpx:threads option to indicate the required number of OS threads per node; the HPX library will derive this parameter automatically from PBS.

Finally, pbsdsh is a PBS command that distributes tasks to the resources allocated to the current job. It is recommended to leave this line as shown and modify only the PBS options and shell variables as needed for a specific application.

[Important]Important

A script invoked by pbsdsh starts in a very basic environment: the user's $HOME directory is defined and is the current directory, the LANG variable is set to C, and the PATH is set to the basic /usr/local/bin:/usr/bin:/bin as defined in a system-wide file pbs_environment. Nothing that would normally be set up by a system shell profile or user shell profile is defined, unlike the environment for the main job script.

Another choice is to have the pbsdsh command in your main job script invoke your program via a shell, like sh or bash, so that each instance receives an initialized environment. We create a small script runme.sh which is used to invoke the program:

#!/bin/bash
# Small script which invokes the program based on what was passed on its
# command line.
#
# This script is executed by the bash shell which will initialize all
# environment variables as usual.
$@

Now, we invoke this script using the pbsdsh tool:

#!/bin/bash
#
#PBS -l nodes=2:ppn=4

APP_PATH=~/packages/hpx/bin/hello_world
APP_OPTIONS=

pbsdsh -u runme.sh $APP_PATH $APP_OPTIONS --hpx:nodes=`cat $PBS_NODEFILE`

All that remains now is submitting the job to the queuing system. Assuming that the contents of the PBS script were saved in file pbs_hello_world.sh in the current directory, this is accomplished by typing:

qsub ./pbs_hello_world.sh

If the job is accepted, qsub will print out the assigned job ID, which may look like:

$ 42.supercomputer.some.university.edu

To check the status of your job, issue the following command:

qstat 42.supercomputer.some.university.edu

and look for a single-letter job status symbol. The common cases include:

  • Q - signifies that the job is queued and awaiting its turn to be executed.
  • R - indicates that the job is currently running.
  • C - means that the job has completed.

The example qstat output below shows a job waiting for execution resources to become available:

Job id                    Name             User            Time Use S Queue
------------------------- ---------------- --------------- -------- - -----
42.supercomputer          ...ello_world.sh joe_user               0 Q batch

After the job completes, PBS will place two files, pbs_hello_world.sh.o42 and pbs_hello_world.sh.e42, in the directory where the job was submitted. The first contains the standard output and the second contains the standard error from all the nodes on which the application executed. In our example, the error output file should be empty and the standard output file should contain something similar to:

hello world from OS-thread 3 on locality 0
hello world from OS-thread 2 on locality 0
hello world from OS-thread 1 on locality 1
hello world from OS-thread 0 on locality 0
hello world from OS-thread 3 on locality 1
hello world from OS-thread 2 on locality 1
hello world from OS-thread 1 on locality 0
hello world from OS-thread 0 on locality 1

Congratulations! You have just run your first distributed HPX application!

Just like PBS (described in section Using PBS), SLURM is a job management system which is widely used on large supercomputing systems. Any HPX application can easily be run using SLURM. This section describes how this can be done.

The easiest way to run an HPX application using SLURM is to utilize the command line tool srun which interacts with the SLURM batch scheduling system.

srun -p <partition> -N <number-of-nodes> hpx-application <application-arguments>

Here, <partition> is one of the node partitions existing on the target machine (consult the machine's documentation for a list of existing partitions) and <number-of-nodes> is the number of compute nodes you want to use. By default, the HPX application is started with one locality per node and uses all available cores on a node. You can change the number of localities started per node (for example, to account for NUMA effects) by specifying the -n option of srun. The number of cores per locality can be set by -c. The <application-arguments> are any application-specific arguments which need to be passed on to the application.

[Note]Note

There is no need to use any of the HPX command line options related to the number of localities, number of threads, or related to networking ports. All of this information is automatically extracted from the SLURM environment by the HPX startup code.

[Important]Important

The srun documentation explicitly states: "If -c is specified without -n, as many tasks will be allocated per node as possible while satisfying the -c restriction. For instance on a cluster with 8 CPUs per node, a job request for 4 nodes and 3 CPUs per task may be allocated 3 or 6 CPUs per node (1 or 2 tasks per node) depending upon resource consumption by other jobs." For this reason, we suggest always specifying -n <number-of-instances>, even if <number-of-instances> is equal to one (1).

Interactive Shells

To get an interactive development shell on one of the nodes you can issue the following command:

srun -p <node-type> -N <number-of-nodes> --pty /bin/bash -l

After the shell has been opened, you can run your HPX application. By default, it uses all available cores. Note that if you requested one node, you don't need to do srun again. However, if you requested more than one node, and want to run your distributed application, you can use srun again to start up the distributed HPX application. It will use the resources that have been requested for the interactive shell.

Scheduling Batch Jobs

The above-mentioned method of running HPX applications is fine for development purposes. The disadvantage of srun is that it only returns once the application has finished. This might not be appropriate for longer-running applications (for example, benchmarks or larger-scale simulations). In order to cope with that limitation, you can use the sbatch command.

The sbatch command expects a script that it can run once the requested resources are available. In order to request resources you need to add #SBATCH comments in your script or provide the necessary parameters to sbatch directly. The parameters are the same as with srun. The commands you need to execute are the same you would need to start your application as if you were in an interactive shell.
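Putting this together, a minimal batch script might look as follows. This is a sketch, not taken from the original text: the job name, partition name compute, time limit, and the hello_world path are placeholder assumptions to adapt to your machine, and the #SBATCH directives could equally be passed as command line parameters to sbatch.

```shell
#!/bin/bash
# Hypothetical sbatch script; partition, node count, time limit, and
# application path are placeholders - adjust them for your cluster.
#SBATCH --job-name=hello_world
#SBATCH --partition=compute
#SBATCH --nodes=2
#SBATCH --ntasks=2
#SBATCH --time=00:05:00

# Same command as in an interactive shell; SLURM provides the resources.
srun ~/packages/hpx/bin/hello_world
```

Submit it with sbatch <script-name>; by default SLURM writes the job's output to a file named slurm-<jobid>.out in the submission directory.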

Current advances in high performance computing (HPC) continue to suffer from the issues plaguing parallel computation. These issues include, but are not limited to, ease of programming, inability to handle dynamically changing workloads, scalability, and efficient utilization of system resources. Emerging technological trends such as multi-core processors further highlight limitations of existing parallel computation models. To mitigate the aforementioned problems, it is necessary to rethink the approach to parallelization models. ParalleX contains mechanisms such as multi-threading, parcels, global name space support, percolation and local control objects (LCO). By design, ParalleX overcomes limitations of current models of parallelism by alleviating contention, latency, overhead and starvation. With ParalleX, it is further possible to increase performance by at least an order of magnitude on challenging parallel algorithms, e.g., dynamic directed graph algorithms and adaptive mesh refinement methods for astrophysics. An additional benefit of ParalleX is fine-grained control of power usage, enabling reductions in power consumption.

ParalleX - a new Execution Model for Future Architectures

ParalleX is a new parallel execution model that offers an alternative to the conventional computation models, such as message passing. ParalleX distinguishes itself by:

  • Split-phase transaction model
  • Message-driven
  • Distributed shared memory (not cache coherent)
  • Multi-threaded
  • Futures synchronization
  • Local Control Objects (LCOs)
  • Synchronization for anonymous producer-consumer scenarios
  • Percolation (pre-staging of task data)

The ParalleX model is intrinsically latency hiding, delivering an abundance of variable-grained parallelism within a hierarchical namespace environment. The goal of this innovative strategy is to enable future systems delivering very high efficiency, increased scalability and ease of programming. ParalleX can contribute to significant improvements in the design of all levels of computing systems and their usage from application algorithms and their programming languages to system architecture and hardware design together with their supporting compilers and operating system software.

What is HPX

High Performance ParalleX (HPX) is the first runtime system implementation of the ParalleX execution model. The HPX runtime software package is a modular, feature-complete, and performance oriented representation of the ParalleX execution model targeted at conventional parallel computing architectures such as SMP nodes and commodity clusters. It is academically developed and freely available under an open source license. We provide HPX to the community for experimentation and application to achieve high efficiency and scalability for dynamic adaptive and irregular computational problems. HPX is a C++ library that supports a set of critical mechanisms for dynamic adaptive resource management and lightweight task scheduling within the context of a global address space. It is solidly based on many years of experience in writing highly parallel applications for HPC systems.

The two-decade success of the communicating sequential processes (CSP) execution model and its message passing interface (MPI) programming model has been seriously eroded by challenges of power, processor core complexity, multi-core sockets, and heterogeneous structures of GPUs. Both efficiency and scalability for some current (strong scaled) applications and future Exascale applications demand new techniques to expose new sources of algorithm parallelism and exploit unused resources through adaptive use of runtime information.

The ParalleX execution model replaces CSP to provide a new computing paradigm embodying the governing principles for organizing and conducting highly efficient scalable computations greatly exceeding the capabilities of today's problems. HPX is the first practical, reliable, and performance-oriented runtime system incorporating the principal concepts of the ParalleX model publicly provided in open source release form.

HPX is designed by the STE||AR Group (Systems Technology, Emergent Parallelism, and Algorithm Research) at Louisiana State University (LSU)'s Center for Computation and Technology (CCT) to enable developers to exploit the full processing power of many-core systems with an unprecedented degree of parallelism. STE||AR is a research group focusing on system software solutions and scientific application development for hybrid and many-core hardware architectures.

For more information about the STE||AR Group, see People.

Estimates say that we currently run our computers at way below 100% efficiency. The theoretical peak performance (usually measured in FLOPS - floating point operations per second) is much higher than any practical peak performance reached by any application. This is particularly true for highly parallel hardware. The more hardware parallelism we provide to an application, the better the application must scale in order to efficiently use all the resources of the machine. Roughly speaking, we distinguish two forms of scalability: strong scaling (see Amdahl's Law) and weak scaling (see Gustafson's Law). Strong scaling is defined as how the solution time varies with the number of processors for a fixed total problem size. It gives an estimate of how much faster can we solve a particular problem by throwing more resources at it. Weak scaling is defined as how the solution time varies with the number of processors for a fixed problem size per processor. In other words, it defines how much more data can we process by using more hardware resources.

In order to utilize as much hardware parallelism as possible an application must exhibit excellent strong and weak scaling characteristics, which requires a high percentage of work executed in parallel, i.e. using multiple threads of execution. Optimally, if you execute an application on a hardware resource with N processors it either runs N times faster or it can handle N times more data. Both cases imply 100% of the work is executed on all available processors in parallel. However, this is just a theoretical limit. Unfortunately, there are more things which limit scalability, mostly inherent to the hardware architectures and the programming models we use. We break these limitations into four fundamental factors which make our systems SLOW:

  • Starvation occurs when there is insufficient concurrent work available to maintain high utilization of all resources.
  • Latencies are imposed by the time-distance delay intrinsic to accessing remote resources and services.
  • Overhead is work required for the management of parallel actions and resources on the critical execution path which is not necessary in a sequential variant.
  • Waiting for contention resolution is the delay due to the lack of availability of oversubscribed shared resources.

Each of those four factors manifests itself in multiple and different ways; each of the hardware architectures and programming models exposes specific forms. However, the interesting part is that all of them limit the scalability of applications no matter what part of the hardware jungle we look at. Hand-helds, PCs, supercomputers, or the cloud: all suffer from the reign of the four horsemen: Starvation, Latency, Overhead, and Contention. This realization is very important as it allows us to derive the criteria for solutions to the scalability problem from first principles; it allows us to focus our analysis on very concrete patterns and measurable metrics. Moreover, any derived results will be applicable to a wide variety of targets.

Today's computer systems are designed based on the initial ideas of John von Neumann, as published back in 1945, and later extended by the Harvard architecture. These ideas form the foundation, the execution model, of the computer systems we use currently. But apparently a new response is required in light of the demands created by today's technology.

So, what are the overarching objectives for designing systems allowing for applications to scale as they should? In our opinion, the main objectives are:

  • Performance: as mentioned, scalability and efficiency are the main criteria people are interested in
  • Fault tolerance: the low expected mean time between failures (MTBF) of future systems requires embracing faults, not trying to avoid them
  • Power: minimizing energy consumption is a must as it is one of the major cost factors today, even more so in the future
  • Generality: any system should be usable for a broad set of use cases
  • Programmability: for us as programmers this is a very important objective, ensuring long-term platform stability and portability

What needs to be done to meet those objectives, to make applications scale better on tomorrow's architectures? The answer is almost obvious: we need to devise a new execution model - a set of governing principles for the holistic design of future systems - targeted at minimizing the effect of the outlined SLOW factors. Everything we create for future systems, every design decision we make, every criterion we apply, has to be validated against this single, uniform metric. This includes changes in the hardware architecture we prevalently use today, and it certainly involves new ways of writing software, starting from the operating system, runtime system, and compilers, up to the application level. However, the key point is that all those layers have to be co-designed; they are interdependent and cannot be seen as separate facets. The systems we have today have been evolving for over 50 years now. All layers function in a certain way, relying on the other layers to do so as well. However, we do not have the time to wait for a coherent system to evolve for another 50 years. The new paradigms are needed now - therefore, co-design is the key.

As it turns out, we do not have to start from scratch. Not everything has to be invented and designed anew. Many of the ideas needed to combat the four horsemen have been around for a long time, some for more than 30 years. All it takes is to gather them into a coherent approach. The following highlights some of the derived principles we consider crucial for defeating SLOW. Some of those are focused on high-performance computing, others are more general.

Focus on Latency Hiding instead of Latency Avoidance

It is impossible to design a system exposing zero latencies. In an effort to come as close as possible to this goal, many optimizations are mainly targeted towards minimizing latencies. Examples of this can be seen everywhere: low-latency network technologies like InfiniBand, caching memory hierarchies in all modern processors, the constant optimization of existing MPI implementations to reduce related latencies, or the data transfer latencies intrinsic to the way we use GPGPUs today. It is important to note that existing latencies are often tightly related to some resource having to wait for an operation to complete. At the same time it would be perfectly fine to do some other, unrelated work in the meantime, allowing the system to hide the latencies by filling the idle time with useful work. Modern systems already employ similar techniques (pipelined instruction execution in the processor cores, asynchronous input/output operations, and many more). What we propose is to go beyond anything we know today and to make latency hiding an intrinsic concept of the operation of the whole system stack.

Embrace Fine-grained Parallelism instead of Heavyweight Threads

If we plan to hide latencies even for very short operations, such as fetching the contents of a memory cell from main memory (if it is not already cached), we need to have very lightweight threads with extremely short context switching times, optimally executable within one cycle. Granted, for mainstream architectures this is not possible today (even if we already have special machines supporting this mode of operation, such as the Cray XMT). For conventional systems, however, the smaller the overhead of a context switch and the finer the granularity of the threading system, the better the overall system utilization and efficiency will be. For today's architectures we already see a flurry of libraries providing exactly this type of functionality: non-pre-emptive, task-queue based parallelization solutions, such as Intel Threading Building Blocks (TBB), Microsoft Parallel Patterns Library (PPL), Cilk++, and many others. The possibility to suspend a current task if some preconditions for its execution are not met (such as waiting for I/O or the result of a different task), to seamlessly switch to any other task which can continue, and to reschedule the initial task after the required result has been calculated makes the implementation of latency hiding almost trivial.

Rediscover Constraint-Based Synchronization to replace Global Barriers

The code we write today is riddled with implicit (and explicit) global barriers. By a global barrier we mean the synchronization of the control flow between several (very often all) threads (when using OpenMP) or processes (MPI). For instance, an implicit global barrier is inserted after each loop parallelized using OpenMP, as the system synchronizes the threads used to execute the different iterations in parallel. In MPI each of the communication steps imposes an explicit barrier onto the execution flow as (often all) nodes have to be synchronized. Each of those barriers acts as the eye of a needle the overall execution is forced to be squeezed through. Even minimal fluctuations in the execution times of the parallel threads (jobs) cause them to wait. Additionally, it is often only one of the threads that performs the actual reduce operation, which further impedes parallelism. A closer analysis of a couple of key algorithms used in science applications reveals that these global barriers are not always necessary. In many cases it is sufficient to synchronize a small subset of the threads. Any operation should proceed whenever the preconditions for its execution are met, and only those. Usually there is no need to wait for all iterations of a loop to finish before continuing to calculate other things; all you need is to have those iterations done which produce the results required for a particular next operation. Goodbye global barriers, hello constraint-based synchronization! People were already trying to build this type of computing (and even computers) back in the 1970s. The theory behind what they did is based on ideas around static and dynamic dataflow. There are certain attempts today to get back to those ideas and to incorporate them with modern architectures. For instance, a lot of work is being done in the area of constructing dataflow-oriented execution trees. Our results show that employing dataflow techniques in combination with the other ideas, as outlined herein, considerably improves scalability for many problems.

Adaptive Locality Control instead of Static Data Distribution

While this principle seems to be a given for single desktop or laptop computers (the operating system is your friend), it is everything but ubiquitous on modern supercomputers, which are usually built from a large number of separate nodes (e.g., Beowulf clusters), tightly interconnected by a high bandwidth, low latency network. Today's prevalent programming model for those is MPI, which does not directly help with proper data distribution, leaving it to the programmer to decompose the data onto all of the nodes the application is running on. There are a couple of specialized languages and programming environments based on PGAS (Partitioned Global Address Space) designed to overcome this limitation, such as Chapel, X10, UPC, or Fortress. However, all systems based on PGAS rely on static data distribution. This works fine as long as such a static data distribution does not result in inhomogeneous workload distributions or other resource utilization imbalances. In a distributed system these imbalances can be mitigated by migrating part of the application data to different localities (nodes). The only framework supporting (limited) migration today is Charm++. The first attempts towards solving related problems go back decades as well; a good example is the Linda coordination language. Nevertheless, none of the other mentioned systems supports data migration today, which forces users either to rely on static data distribution and live with the related performance hits or to implement everything themselves, which is very tedious and difficult. We believe that the only viable way to flexibly support dynamic and adaptive locality control is to provide a global, uniform address space to the applications, even on distributed systems.

Prefer Moving Work to the Data over Moving Data to the Work

For best performance it seems obvious to minimize the number of bytes transferred from one part of the system to another. This is true on all levels. At the lowest level we try to take advantage of processor memory caches, thus minimizing memory latencies. Similarly, we try to amortize the data transfer time to and from GPGPUs as much as possible. At higher levels we try to minimize data transfer between different nodes of a cluster or between different virtual machines in the cloud. Our experience (well, it's almost common wisdom) shows that the number of bytes necessary to encode a certain operation is very often much smaller than the number of bytes encoding the data the operation is performed upon. Nevertheless, we still often transfer the data to a particular place where we execute the operation, just to bring the data back to where it came from afterwards. As an example, consider the way we usually write our applications for clusters using MPI. This programming model is all about data transfer between nodes. MPI is the prevalent programming model for clusters; it is fairly straightforward to understand and to use. Therefore, we often write our applications in a way that accommodates this model, centered around data transfer. These applications usually work well for smaller problem sizes and for regular data structures. The larger the amount of data we have to churn and the more irregular the problem domain becomes, the worse the overall machine utilization and the (strong) scaling characteristics get. While it is not impossible to implement more dynamic, data driven, and asynchronous applications using MPI, it is overly difficult to do so. At the same time, if we look at applications that prefer to execute the code close to the locality where the data was placed, i.e. utilizing active messages (for instance based on Charm++), we see better asynchrony, simpler application codes, and improved scaling.

Favor Message Driven Computation over Message Passing

Today's prevalently used programming model on parallel (multi-node) systems is MPI. It is based on message passing (as the name implies), which means that the receiver has to be aware of a message about to come in. Both codes, the sender and the receiver, have to synchronize in order to perform the communication step. Even the newer, asynchronous interfaces require explicitly coding the algorithms around the required communication scheme. As a result, any nontrivial MPI application spends a considerable amount of time waiting for incoming messages, thus causing starvation and latencies to impede full resource utilization. The more complex and dynamic the data structures and algorithms become, the larger the adverse effects. The community discovered message-driven (and data-driven) methods of implementing algorithms a long time ago, and systems such as Charm++ have already integrated active messages, demonstrating the validity of the concept. Message-driven computation allows sending messages without requiring the receiver to actively wait for them. Any incoming message is handled asynchronously and triggers the encoded action by passing along arguments and - possibly - continuations. HPX combines this scheme with work queue-based scheduling as described above, which allows the system to almost completely overlap any communication with useful work, thereby minimizing latencies.

The following sections of our tutorial analyze some examples to help you get familiar with the HPX style of programming. We start off with simple examples that utilize basic HPX elements and then begin to expose the reader to the more complex, yet powerful, HPX concepts.

[Note]Note

The instructions for building and running the examples currently only cover Unix variants.

The Fibonacci sequence is a sequence of numbers starting with 0 and 1 where every subsequent number is the sum of the previous two numbers. In this example, we will use HPX to calculate the value of the n-th element of the Fibonacci sequence. In order to compute this problem in parallel, we will use a facility known as a Future.

As shown in the figure below, a Future encapsulates a delayed computation. It acts as a proxy for a result initially not known, most of the time because the computation of the result has not completed yet. The Future synchronizes the access of this value by optionally suspending any HPX-threads requesting the result until the value is available. When a Future is created, it spawns a new HPX-thread (either remotely with a parcel or locally by placing it into the thread queue) which, when run, will execute the action associated with the Future. The arguments of the action are bound when the Future is created.

Figure 1. Schematic of a Future execution

Schematic of a Future execution


Once the action has finished executing, a write operation is performed on the Future. The write operation marks the Future as completed, and optionally stores data returned by the action. When the result of the delayed computation is needed, a read operation is performed on the Future. If the Future's action hasn't completed when a read operation is performed on it, the reader HPX-thread is suspended until the Future is ready. The Future facility allows HPX to schedule work early in a program so that when the function value is needed it will already be calculated and available. We use this property in our Fibonacci example below to enable its parallel execution.

Setup

The source code for this example can be found here: fibonacci.cpp.

To compile this program, go to your HPX build directory (see Getting Started for information on configuring and building HPX) and enter:

make examples.quickstart.fibonacci

To run the program type:

./bin/fibonacci

This should print (time should be approximate):

fibonacci(10) == 55
elapsed time: 0.00186288 [s]

This run used the default settings, which calculate the tenth element of the Fibonacci sequence. To declare which Fibonacci value you want to calculate, use the --n-value option. Additionally you can use the --hpx:threads option to declare how many OS-threads you wish to use when running the program. For instance, running:

./bin/fibonacci --n-value 20 --hpx:threads 4

will yield:

fibonacci(20) == 6765
elapsed time: 0.233827 [s]

Walkthrough

Now that you have compiled and run the code, let's look at how the code works. Since this code is written in C++, we will begin with the main() function. Here you can see that in HPX, main() is only used to initialize the runtime system. It is important to note that application-specific command line options are defined here. HPX uses Boost.Program_options for command line processing. You can see that our program's --n-value option is set by calling the add_options() method on an instance of boost::program_options::options_description. The default value of the variable is set to 10. This is why, when we ran the program for the first time without using the --n-value option, the program returned the 10th value of the Fibonacci sequence. The constructor argument of the description is the text that appears when a user uses the --help option to see what command line options are available. HPX_APPLICATION_STRING is a macro that expands to a string constant containing the name of the HPX application currently being compiled.

In HPX main() is used to initialize the runtime system and pass the command line arguments to the program. If you wish to add command line options to your program you would add them here using the instance of the Boost class options_description, and invoking the public member function .add_options() (see Boost Documentation or the Fibonacci Example for more details). hpx::init() calls hpx_main() after setting up HPX, which is where the logic of our program is encoded.

int main(int argc, char* argv[])
{
    // Configure application-specific options
    boost::program_options::options_description
       desc_commandline("Usage: " HPX_APPLICATION_STRING " [options]");

    desc_commandline.add_options()
        ( "n-value",
          boost::program_options::value<std::uint64_t>()->default_value(10),
          "n value for the Fibonacci function")
        ;

    // Initialize and run HPX
    return hpx::init(desc_commandline, argc, argv);
}

The hpx::init() function in main() starts the runtime system, and invokes hpx_main() as the first HPX-thread. Below we can see that the basic program is simple. The command line option --n-value is read in, a timer (hpx::util::high_resolution_timer) is set up to record the time it takes to do the computation, the fibonacci action is invoked synchronously, and the answer is printed out.

int hpx_main(boost::program_options::variables_map& vm)
{
    // extract command line argument, i.e. fib(N)
    std::uint64_t n = vm["n-value"].as<std::uint64_t>();

    {
        // Keep track of the time required to execute.
        hpx::util::high_resolution_timer t;

        // Wait for fib() to return the value
        fibonacci_action fib;
        std::uint64_t r = fib(hpx::find_here(), n);

        char const* fmt = "fibonacci(%1%) == %2%\nelapsed time: %3% [s]\n";
        std::cout << (boost::format(fmt) % n % r % t.elapsed());
    }

    return hpx::finalize(); // Handles HPX shutdown
}

Upon closer look we see that we've created a std::uint64_t to store the result of invoking our fibonacci_action fib. This action will launch synchronously (as the work done inside of the action will itself be asynchronous) and return the result of the Fibonacci sequence. But wait, what is an action? And what is this fibonacci_action? For starters, an action is a wrapper for a function. By wrapping functions, HPX can send packets of work to different processing units. These vehicles allow users to calculate work now, later, or on certain nodes. The first argument to our action is the location where the action should be run. In this case, we just want to run the action on the machine that we are currently on, so we use hpx::find_here(). The second parameter simply forwards the value n of the Fibonacci element that we wish to calculate. To further understand this we turn to the code to find where fibonacci_action was defined:

// forward declaration of the Fibonacci function
std::uint64_t fibonacci(std::uint64_t n);

// This is to generate the required boilerplate we need for the remote
// invocation to work.
HPX_PLAIN_ACTION(fibonacci, fibonacci_action);

A plain action is the most basic form of action. Plain actions wrap simple global functions which are not associated with any particular object (we will discuss other types of actions in the Accumulator Example). In this block of code the function fibonacci() is declared. After the declaration, the function is wrapped in an action via the macro HPX_PLAIN_ACTION. This macro takes two arguments: the name of the function that is to be wrapped and the name of the action that you are creating.

This picture should now start making sense. The function fibonacci() is wrapped in an action fibonacci_action, which is invoked synchronously but creates asynchronous work, and returns a std::uint64_t representing the result of the function fibonacci(). Now, let's look at the function fibonacci():

std::uint64_t fibonacci(std::uint64_t n)
{
    if (n < 2)
        return n;

    // We restrict ourselves to execute the Fibonacci function locally.
    hpx::naming::id_type const locality_id = hpx::find_here();

    // Invoking the Fibonacci algorithm twice is inefficient.
    // However, we intentionally demonstrate it this way to create some
    // heavy workload.

    fibonacci_action fib;
    hpx::future<std::uint64_t> n1 =
        hpx::async(fib, locality_id, n - 1);
    hpx::future<std::uint64_t> n2 =
        hpx::async(fib, locality_id, n - 2);

    return n1.get() + n2.get();   // wait for the Futures to return their values
}

This block of code is much more straightforward. First, if (n < 2), meaning n is 0 or 1, we simply return n (recall the first element of the Fibonacci sequence is 0 and the second is 1). If n is larger than 1, we spawn two futures, n1 and n2. Each of these futures represents an asynchronous, recursive call to fibonacci(). After we've created both futures, we wait for both of them to finish computing, add their values together, and return that sum as our result. The recursive call tree will continue until n is equal to 0 or 1, at which point the value can be returned because it is implicitly known. When this termination condition is reached, the futures can then be added up, producing the n-th value of the Fibonacci sequence.

Hello World

This program will print out a hello world message on every OS-thread on every locality. The output will look something like this:

hello world from OS-thread 1 on locality 0
hello world from OS-thread 1 on locality 1
hello world from OS-thread 0 on locality 0
hello world from OS-thread 0 on locality 1
Setup

The source code for this example can be found here: hello_world.cpp.

To compile this program, go to your HPX build directory (see Getting Started for information on configuring and building HPX) and enter:

make examples.quickstart.hello_world

To run the program type:

./bin/hello_world

This should print:

hello world from OS-thread 0 on locality 0

To use more OS-threads use the command line option --hpx:threads and type the number of threads that you wish to use. For example, typing:

./bin/hello_world --hpx:threads 2

will yield:

hello world from OS-thread 1 on locality 0
hello world from OS-thread 0 on locality 0

Notice how the ordering of the two print statements will change with subsequent runs. To run this program on multiple localities please see the section How to Use HPX Applications with PBS.

Walkthrough

Now that you have compiled and run the code, let's look at how the code works, beginning with main():

Here is the main entry point. By including 'hpx/hpx_main.hpp', HPX will invoke the plain old C main() as its first HPX thread.

int main()
{
    // Get a list of all available localities.
    std::vector<hpx::naming::id_type> localities =
        hpx::find_all_localities();

    // Reserve storage space for futures, one for each locality.
    std::vector<hpx::lcos::future<void> > futures;
    futures.reserve(localities.size());

    for (hpx::naming::id_type const& node : localities)
    {
        // Asynchronously start a new task. The task is encapsulated in a
        // future, which we can query to determine if the task has
        // completed.
        typedef hello_world_foreman_action action_type;
        futures.push_back(hpx::async<action_type>(node));
    }

    // The non-callback version of hpx::lcos::wait_all takes a single parameter,
    // a vector of futures to wait on. hpx::wait_all only returns when
    // all of the futures have finished.
    hpx::wait_all(futures);
    return 0;
}

In this excerpt of the code we again see the use of futures. This time the futures are stored in a vector so that they can easily be accessed. hpx::lcos::wait_all() is a family of functions that wait for a std::vector<> of futures to become ready. In this piece of code, we are using the synchronous version of hpx::lcos::wait_all(), which takes one argument (the std::vector<> of futures to wait on). This function will not return until all the futures in the vector have become ready.

In the Fibonacci Example, we used hpx::find_here() to specify the target of our actions. Here, we instead use hpx::find_all_localities(), which returns a std::vector<> containing the identifiers of all the machines in the system, including the one that we are on.

As in the Fibonacci Example our futures are set using hpx::async<>(). The hello_world_foreman_action is declared here:

// Define the boilerplate code necessary for the function 'hello_world_foreman'
// to be invoked as an HPX action.
HPX_PLAIN_ACTION(hello_world_foreman, hello_world_foreman_action);

Another way of thinking about this wrapping technique is as follows: functions (the work to be done) are wrapped in actions, and actions can be executed locally or remotely (e.g. on another machine participating in the computation).

Now it is time to look at the hello_world_foreman() function which was wrapped in the action above:

void hello_world_foreman()
{
    // Get the number of worker OS-threads in use by this locality.
    std::size_t const os_threads = hpx::get_os_thread_count();

    // Find the global name of the current locality.
    hpx::naming::id_type const here = hpx::find_here();

    // Populate a set with the OS-thread numbers of all OS-threads on this
    // locality. When the hello world message has been printed on a particular
    // OS-thread, we will remove it from the set.
    std::set<std::size_t> attendance;
    for (std::size_t os_thread = 0; os_thread < os_threads; ++os_thread)
        attendance.insert(os_thread);

    // As long as there are still elements in the set, we must keep scheduling
    // HPX-threads. Because HPX features work-stealing task schedulers, we have
    // no way of enforcing which worker OS-thread will actually execute
    // each HPX-thread.
    while (!attendance.empty())
    {
        // Each iteration, we create a task for each element in the set of
        // OS-threads that have not said "Hello world". Each of these tasks
        // is encapsulated in a future.
        std::vector<hpx::lcos::future<std::size_t> > futures;
        futures.reserve(attendance.size());

        for (std::size_t worker : attendance)
        {
            // Asynchronously start a new task. The task is encapsulated in a
            // future, which we can query to determine if the task has
            // completed.
            typedef hello_world_worker_action action_type;
            futures.push_back(hpx::async<action_type>(here, worker));
        }

        // Wait for all of the futures to finish. The callback version of the
        // hpx::lcos::wait_each function takes two arguments: a vector of futures,
        // and a binary callback.  The callback takes two arguments; the first
        // is the index of the future in the vector, and the second is the