Improving C++ concurrency features

Document number: P2643R0
Date: 2022-09-15
Reply to: Gonzalo Brito Gadeschi <gonzalob _at_ nvidia.com>
Authors: Gonzalo Brito Gadeschi, Olivier Giroux, Thomas Rodgers
Audience: Concurrency

Revisions

This is the initial revision.

Introduction

When we applied P1135R6 to C++20, we introduced several new concurrency constructs to the C++ concurrency library.

Though each element included was a long time coming and had much implementation experience behind it, fresh user feedback tells us that some improvements could still be made.

Proposed direction

The following is a roughly priority-ordered list of requests that users and implementers have voiced over the last year:

  1. Add timed versions of atomic::wait.

The primary purpose of this facility is to make it easier to implement other concurrency facilities, but often these other facilities expose timed waiting facilities themselves. Without timed versions of wait, the programmer is left to ad-hoc solutions for timed waiting facilities, and perhaps even all waiting facilities. Anecdotally, at least two implementations of C++20 have added internal timed versions of this facility to implement <semaphore>.

Adding timed versions of atomic::wait removes hurdles to adoption of this facility for its intended purpose.

Adding timed versions of atomic::wait will require a discussion of what facilities from <chrono> need to be present in <atomic> for freestanding implementations.
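
To illustrate the kind of ad-hoc solution described above, here is a minimal sketch of a timed acquire for a semaphore-style counter written against C++20 only. The name try_acquire_for and the 50-microsecond sleep are assumptions made for this example; they are not taken from any implementation of <semaphore>.

```cpp
#include <atomic>
#include <chrono>
#include <thread>

// Hypothetical helper (not a standard API): try to take one unit from a
// semaphore-style counter, giving up once `timeout` has elapsed.
template <class Rep, class Period>
bool try_acquire_for(std::atomic<int>& count,
                     std::chrono::duration<Rep, Period> timeout) {
  const auto deadline = std::chrono::steady_clock::now() + timeout;
  int observed = count.load(std::memory_order_relaxed);
  for (;;) {
    // Fast path: decrement the counter if a unit is available.
    while (observed > 0) {
      if (count.compare_exchange_weak(observed, observed - 1,
                                      std::memory_order_acquire,
                                      std::memory_order_relaxed)) {
        return true;
      }
    }
    if (std::chrono::steady_clock::now() >= deadline) return false;
    // count.wait(0) could block past the deadline, so it cannot be used
    // here; the sketch falls back to an ad-hoc sleep-and-poll loop.
    std::this_thread::sleep_for(std::chrono::microseconds(50));
    observed = count.load(std::memory_order_relaxed);
  }
}
```

The sleep interval is an arbitrary trade-off between latency and CPU usage, which is exactly the kind of tuning a timed atomic::wait would move into the implementation.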

  2. Return the last observed value from atomic::wait.

After the return from wait, it is common for programs to reload the value of the atomic object. By necessity, the implementation of wait already loaded this value, to compare it with the operand supplied and return non-spuriously. This is duplicate work which, in principle, could be optimized away by the compiler but conservatively isn’t.

Returning the value from atomic::wait is a straightforward way to recover performance lost from the duplicate work.

  3. Avoid spurious polling in atomic::wait with at least one of:
    a. Add an overload of wait taking a predicate instead of a value.

    When the program is waiting for a condition different from “not equal to”, there is an added re-try loop around the wait operation in the program. This loop causes each call to wait to be performed as if it were the first call to wait, oblivious to the fact that the program has already been waiting for some time. This leads to re-executing the short-term polling strategy.

    Taking a predicate instead of a value lets us push the program-defined condition inside of atomic::wait and delete the outer loop, and it lets the implementation track the time spent waiting.

    At least two implementations currently implement atomic::wait in terms of a wait taking a predicate.

    b. Add a hint operand to wait to steer the internal strategy.

    By default, the short-term strategy inside of wait is to poll the atomic object’s value for some time, so as to avoid limiting the responsiveness of the program to that of the operating system kernel’s scheduler. Sometimes, however, it is known that either (a) an event cannot occur or is not expected to occur within such a short window of time, or (b) the program has already applied its own polling strategy before the call to wait, or (c) this call to wait is not the first and should be considered a long-term wait.

    Taking a hint would let the program indicate whether the short-term strategy of atomic::wait should execute or not.

  4. Add timed versions of barrier::wait and latch::wait as well.

Since every other waiting facility in the concurrency library has timed wait functions at this point, it makes sense to add timed versions of these as well.

Although this is a very weak reason to do anything, there is also no clear reason why we should not do it.

Design

The design of the features above is mostly orthogonal, and this section explores them independently.

  1. Return last observed value from atomic::wait APIs: solved by changing the return type from void to T, i.e. T wait(…);
  2. Fallible and timed versions of wait APIs:
    • Solved by adding:

      • optional<T> try_wait(...),
      • optional<T> try_wait_for(..., chrono::duration<Rep, Period> const&), and
      • optional<T> try_wait_until(..., chrono::time_point<Clock, Duration> const&)

      methods that return nullopt if the wait operation did not synchronize, and an optional<T> containing the T value observed if it did synchronize.
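
To make the optional<T> return convention concrete, here is a rough emulation of try_wait_for as a free function over std::atomic<T>. The name try_wait_for_sketch is hypothetical, and the body is a plain poll-until-deadline loop, not the blocking strategy a real implementation would use. It also compares with operator!=, whereas atomic::wait compares value representations.

```cpp
#include <atomic>
#include <chrono>
#include <optional>
#include <thread>

// Hypothetical emulation of the proposed interface shape: returns the value
// observed if it differs from `old` before the timeout, nullopt otherwise.
template <class T, class Rep, class Period>
std::optional<T> try_wait_for_sketch(
    const std::atomic<T>& a, T old,
    std::chrono::duration<Rep, Period> rel_time,
    std::memory_order order = std::memory_order_seq_cst) {
  const auto deadline = std::chrono::steady_clock::now() + rel_time;
  for (;;) {
    T observed = a.load(order);
    if (observed != old) return observed;  // changed: hand back the value
    if (std::chrono::steady_clock::now() >= deadline) return std::nullopt;
    std::this_thread::yield();             // simple polling, for illustration
  }
}
```

A caller can then write if (auto v = try_wait_for_sketch(a, old, timeout)) { /* use *v */ } and needs no second load on success.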

Wording

Return last observed value from atomic::wait

To [atomics.ref.generic.general]:

namespace std {
  template<class T> struct atomic_ref {  // [atomics.ref.generic.general]
    T wait(T, memory_order = memory_order::seq_cst) const noexcept;  // was: void
  };
}

UNRESOLVED QUESTION: are all atomic_ref types missing volatile wait overloads?

To [atomics.ref.ops]:

T wait(T old, memory_order order = memory_order::seq_cst) const noexcept;  // was: void

To [atomics.ref.int]:

namespace std {
  template<> struct atomic_ref<integral> {
    integral wait(integral, memory_order = memory_order::seq_cst) const noexcept;  // was: void
  };
}

To [atomics.ref.float]:

namespace std {
  template<> struct atomic_ref<floating-point> {
    floating-point wait(floating-point, memory_order = memory_order::seq_cst) const noexcept;  // was: void
  };
}

To [atomics.ref.pointer]:

namespace std {
  template<class T> struct atomic_ref<T*> {
    T* wait(T*, memory_order = memory_order::seq_cst) const noexcept;  // was: void
  };
}

To [atomics.types.generic.general]:

namespace std {
  template<class T> struct atomic {
    T wait(T, memory_order = memory_order::seq_cst) const volatile noexcept;  // was: void
    T wait(T, memory_order = memory_order::seq_cst) const noexcept;           // was: void
  };
}

To [atomics.types.operations]:

T wait(T old, memory_order order = memory_order::seq_cst) const volatile noexcept;  // was: void
T wait(T old, memory_order order = memory_order::seq_cst) const noexcept;           // was: void

To [atomics.types.int]:

namespace std {
  template<> struct atomic<integral> {
    integral wait(integral, memory_order = memory_order::seq_cst) const volatile noexcept;  // was: void
    integral wait(integral, memory_order = memory_order::seq_cst) const noexcept;           // was: void
  };
}

To [atomics.types.float]:

namespace std {
  template<> struct atomic<floating-point> {
    floating-point wait(floating-point, memory_order = memory_order::seq_cst) const volatile noexcept;  // was: void
    floating-point wait(floating-point, memory_order = memory_order::seq_cst) const noexcept;           // was: void
  };
}

To [atomics.types.pointer]:

namespace std {
  template<class T> struct atomic<T*> {
    T* wait(T*, memory_order = memory_order::seq_cst) const volatile noexcept;  // was: void
    T* wait(T*, memory_order = memory_order::seq_cst) const noexcept;           // was: void
  };
}

To [util.smartptr.atomic.shared]:

namespace std {
  template<class T> struct atomic<shared_ptr<T>> {
    shared_ptr<T> wait(shared_ptr<T> old, memory_order = memory_order::seq_cst) const noexcept;  // was: void
  };
}

and

shared_ptr<T> wait(shared_ptr<T> old, memory_order order = memory_order::seq_cst) const noexcept;  // was: void

To [util.smartptr.atomic.weak]:

namespace std {
  template<class T> struct atomic<weak_ptr<T>> {
    weak_ptr<T> wait(weak_ptr<T> old, memory_order = memory_order::seq_cst) const noexcept;  // was: void
  };
}

and

weak_ptr<T> wait(weak_ptr<T> old, memory_order order = memory_order::seq_cst) const noexcept;  // was: void

No changes to [atomics.nonmembers] are needed.

No changes to [atomic.flag]'s wait APIs are needed.

Fallible and timed versions of wait APIs

To [atomics.ref.generic.general]:

namespace std {
  template<class T> struct atomic_ref {  // [atomics.ref.generic.general]
    
    optional<T> try_wait(T, memory_order = memory_order::seq_cst) const noexcept;
    template <class Rep, class Period>
    optional<T> try_wait_for(
      T, chrono::duration<Rep, Period> const& rel_time, 
      memory_order = memory_order::seq_cst
     ) const noexcept;
    template <class Clock, class Duration>
    optional<T> try_wait_until(
      T, chrono::time_point<Clock, Duration> const& abs_time, 
      memory_order = memory_order::seq_cst
     ) const noexcept;
  };
}

UNRESOLVED QUESTION: are all atomic_ref types missing volatile wait overloads?

To [atomics.ref.ops]:

optional<T> try_wait(T old, memory_order order = memory_order::seq_cst) const noexcept;
template <class Rep, class Period>
optional<T> try_wait_for(T old, 
    chrono::duration<Rep, Period> const& rel_time,
    memory_order order = memory_order::seq_cst
) const noexcept;
template <class Clock, class Duration>
optional<T> try_wait_until(T old, 
    chrono::time_point<Clock, Duration> const& abs_time,
    memory_order order = memory_order::seq_cst
) const noexcept;

To [atomics.ref.int]:

namespace std {
  template<> struct atomic_ref<integral> {
    optional<integral> try_wait(integral, memory_order = memory_order::seq_cst) const noexcept;
    template <class Rep, class Period>
    optional<integral> try_wait_for(
      integral, chrono::duration<Rep, Period> const& rel_time, 
      memory_order = memory_order::seq_cst
     ) const noexcept;
    template <class Clock, class Duration>
    optional<integral> try_wait_until(
      integral, chrono::time_point<Clock, Duration> const& abs_time, 
      memory_order = memory_order::seq_cst
     ) const noexcept;
  };
}

To [atomics.ref.float]:

namespace std {
  template<> struct atomic_ref<floating-point> {
    optional<floating-point> try_wait(floating-point, memory_order = memory_order::seq_cst) const noexcept;
    template <class Rep, class Period>
    optional<floating-point> try_wait_for(
      floating-point, chrono::duration<Rep, Period> const& rel_time, 
      memory_order = memory_order::seq_cst
     ) const noexcept;
    template <class Clock, class Duration>
    optional<floating-point> try_wait_until(
      floating-point, chrono::time_point<Clock, Duration> const& abs_time, 
      memory_order = memory_order::seq_cst
     ) const noexcept;
  };
}

To [atomics.ref.pointer]:

namespace std {
  template<class T> struct atomic_ref<T*> {
    optional<T*> try_wait(T*, memory_order = memory_order::seq_cst) const noexcept;
    template <class Rep, class Period>
    optional<T*> try_wait_for(
      T*, chrono::duration<Rep, Period> const& rel_time, 
      memory_order = memory_order::seq_cst
     ) const noexcept;
    template <class Clock, class Duration>
    optional<T*> try_wait_until(
      T*, chrono::time_point<Clock, Duration> const& abs_time, 
      memory_order = memory_order::seq_cst
     ) const noexcept;
  };
}

To [atomics.types.generic.general]:

namespace std {
  template<class T> struct atomic {
    optional<T> try_wait(T, memory_order = memory_order::seq_cst) const noexcept;
    optional<T> try_wait(T, memory_order = memory_order::seq_cst) const volatile noexcept;
    template <class Rep, class Period>
    optional<T> try_wait_for(
      T, chrono::duration<Rep, Period> const& rel_time,
      memory_order = memory_order::seq_cst
     ) const noexcept;
    template <class Rep, class Period>
    optional<T> try_wait_for(
      T, chrono::duration<Rep, Period> const& rel_time,
      memory_order = memory_order::seq_cst
     ) const volatile noexcept;
    template <class Clock, class Duration>
    optional<T> try_wait_until(
      T, chrono::time_point<Clock, Duration> const& abs_time,
      memory_order = memory_order::seq_cst
     ) const noexcept;
    template <class Clock, class Duration>
    optional<T> try_wait_until(
      T, chrono::time_point<Clock, Duration> const& abs_time,
      memory_order = memory_order::seq_cst
     ) const volatile noexcept;
     
  };
}

To [atomics.types.operations]:

optional<T> try_wait(T, memory_order = memory_order::seq_cst) const noexcept;
optional<T> try_wait(T, memory_order = memory_order::seq_cst) const volatile noexcept;
template <class Rep, class Period>
optional<T> try_wait_for(T old, 
                         chrono::duration<Rep, Period> const& rel_time,
                         memory_order order = memory_order::seq_cst
                        ) const noexcept;
template <class Rep, class Period>
optional<T> try_wait_for(T old, 
                         chrono::duration<Rep, Period> const& rel_time,
                         memory_order order = memory_order::seq_cst
                        ) const volatile noexcept;
template <class Clock, class Duration>
optional<T> try_wait_until(T old, 
                           chrono::time_point<Clock, Duration> const& abs_time,
                           memory_order order = memory_order::seq_cst
                          ) const noexcept;
template <class Clock, class Duration>
optional<T> try_wait_until(T old, 
                           chrono::time_point<Clock, Duration> const& abs_time,
                           memory_order order = memory_order::seq_cst
                          ) const volatile noexcept;

EDITORIAL: analogous to atomic_ref. Intentionally left out from the current revision of this paper.

To [atomics.types.int]:

namespace std {
  template<> struct atomic<integral> {
    optional<integral> try_wait(integral, memory_order = memory_order::seq_cst) const noexcept;
    optional<integral> try_wait(integral, memory_order = memory_order::seq_cst) const volatile noexcept;
    template <class Rep, class Period>
    optional<integral> try_wait_for(
      integral, chrono::duration<Rep, Period> const& rel_time, 
      memory_order = memory_order::seq_cst
     ) const noexcept;
    template <class Rep, class Period>
    optional<integral> try_wait_for(
      integral, chrono::duration<Rep, Period> const& rel_time, 
      memory_order = memory_order::seq_cst
     ) const volatile noexcept;
    template <class Clock, class Duration>
    optional<integral> try_wait_until(
      integral, chrono::time_point<Clock, Duration> const& abs_time, 
      memory_order = memory_order::seq_cst
     ) const noexcept;
    template <class Clock, class Duration>
    optional<integral> try_wait_until(
      integral, chrono::time_point<Clock, Duration> const& abs_time, 
      memory_order = memory_order::seq_cst
     ) const volatile noexcept;
  };
}

To [atomics.types.float]:

namespace std {
  template<> struct atomic<floating-point> {
    optional<floating-point> try_wait(floating-point, memory_order = memory_order::seq_cst) const noexcept;
    optional<floating-point> try_wait(floating-point, memory_order = memory_order::seq_cst) const volatile noexcept;
    template <class Rep, class Period>
    optional<floating-point> try_wait_for(
      floating-point, chrono::duration<Rep, Period> const& rel_time, 
      memory_order = memory_order::seq_cst
     ) const noexcept;
    template <class Rep, class Period>
    optional<floating-point> try_wait_for(
      floating-point, chrono::duration<Rep, Period> const& rel_time, 
      memory_order = memory_order::seq_cst
     ) const volatile noexcept;
    template <class Clock, class Duration>
    optional<floating-point> try_wait_until(
      floating-point, chrono::time_point<Clock, Duration> const& abs_time, 
      memory_order = memory_order::seq_cst
     ) const noexcept;
    template <class Clock, class Duration>
    optional<floating-point> try_wait_until(
      floating-point, chrono::time_point<Clock, Duration> const& abs_time, 
      memory_order = memory_order::seq_cst
     ) const volatile noexcept;
  };
}

To [atomics.types.pointer]:

namespace std {
  template<class T> struct atomic<T*> {
    optional<T*> try_wait(T*, memory_order = memory_order::seq_cst) const noexcept;
    optional<T*> try_wait(T*, memory_order = memory_order::seq_cst) const volatile noexcept;
    template <class Rep, class Period>
    optional<T*> try_wait_for(
      T*, chrono::duration<Rep, Period> const& rel_time, 
      memory_order = memory_order::seq_cst
     ) const noexcept;
    template <class Rep, class Period>
    optional<T*> try_wait_for(
      T*, chrono::duration<Rep, Period> const& rel_time, 
      memory_order = memory_order::seq_cst
     ) const volatile noexcept;
    template <class Clock, class Duration>
    optional<T*> try_wait_until(
      T*, chrono::time_point<Clock, Duration> const& abs_time, 
      memory_order = memory_order::seq_cst
     ) const noexcept;
    template <class Clock, class Duration>
    optional<T*> try_wait_until(
      T*, chrono::time_point<Clock, Duration> const& abs_time, 
      memory_order = memory_order::seq_cst
     ) const volatile noexcept;
  };
}

To [util.smartptr.atomic.shared]:

namespace std {
  template<class T> struct atomic<shared_ptr<T>> {
    optional<shared_ptr<T>> try_wait(shared_ptr<T>, memory_order = memory_order::seq_cst) const noexcept;
    template <class Rep, class Period>
    optional<shared_ptr<T>> try_wait_for(
      shared_ptr<T>, chrono::duration<Rep, Period> const& rel_time,
      memory_order = memory_order::seq_cst
     ) const noexcept;
    template <class Clock, class Duration>
    optional<shared_ptr<T>> try_wait_until(
      shared_ptr<T>, chrono::time_point<Clock, Duration> const& abs_time, 
      memory_order = memory_order::seq_cst
     ) const noexcept;
  };
}

EDITORIAL: analogous to the try_wait APIs of atomic_ref, with shared_ptr/weak_ptr tweaks. Intentionally left out of the current revision of this paper.

To [util.smartptr.atomic.weak]:

namespace std {
  template<class T> struct atomic<weak_ptr<T>> {
    optional<weak_ptr<T>> try_wait(weak_ptr<T>, memory_order = memory_order::seq_cst) const noexcept;
    template <class Rep, class Period>
    optional<weak_ptr<T>> try_wait_for(
      weak_ptr<T>, chrono::duration<Rep, Period> const& rel_time,
      memory_order = memory_order::seq_cst
     ) const noexcept;
    template <class Clock, class Duration>
    optional<weak_ptr<T>> try_wait_until(
      weak_ptr<T>, chrono::time_point<Clock, Duration> const& abs_time, 
      memory_order = memory_order::seq_cst
     ) const noexcept;
  };
}

EDITORIAL: analogous to the try_wait APIs of atomic_ref, with shared_ptr/weak_ptr tweaks. Intentionally left out of the current revision of this paper.

EDITORIAL: No changes to [atomics.nonmembers] are needed.

To [atomic.flag]:

namespace std {
  struct atomic_flag {
    bool try_wait(bool, memory_order = memory_order::seq_cst) const noexcept;
    bool try_wait(bool, memory_order = memory_order::seq_cst) const volatile noexcept;
    template <class Rep, class Period>
    bool try_wait_for(
      bool, chrono::duration<Rep, Period> const& rel_time, 
      memory_order = memory_order::seq_cst
    ) const noexcept;
    template <class Rep, class Period>
    bool try_wait_for(
      bool, chrono::duration<Rep, Period> const& rel_time, 
      memory_order = memory_order::seq_cst
    ) const volatile noexcept;
    template <class Clock, class Duration>
    bool try_wait_until(
      bool, chrono::time_point<Clock, Duration> const& abs_time,
      memory_order = memory_order::seq_cst
    ) const noexcept;
    template <class Clock, class Duration>
    bool try_wait_until(
      bool, chrono::time_point<Clock, Duration> const& abs_time,
      memory_order = memory_order::seq_cst
    ) const volatile noexcept;
  };
}
bool atomic_flag_try_wait(const atomic_flag* object, bool old) noexcept;
bool atomic_flag_try_wait(const volatile atomic_flag* object, bool old) noexcept;
bool atomic_flag_try_wait_explicit(const atomic_flag* object, bool old, memory_order order = memory_order::seq_cst) noexcept;
bool atomic_flag_try_wait_explicit(const volatile atomic_flag* object, bool old, memory_order order = memory_order::seq_cst) noexcept;
bool atomic_flag::try_wait(bool old, memory_order order = memory_order::seq_cst) const noexcept;
bool atomic_flag::try_wait(bool old, memory_order order = memory_order::seq_cst) const volatile noexcept;

For atomic_flag_try_wait, let order be memory_order::seq_cst. Let flag be object for the non-member functions, and this for the member functions.

EDITORIAL: analogous for the atomic_flag_try_wait_for/_until APIs. Intentionally omitted from the current revision of this paper.

UNRESOLVED QUESTION: do we need to change something else for the non-member versions of try_wait, try_wait_for, and try_wait_until operations?

UNRESOLVED QUESTION: do we need to define a “try-wait” atomic operation in atomics.wait?

To [thread.barrier]:

namespace std {
  template <class CompletionFunction>
  class barrier {
  
  public:
    bool try_wait(arrival_token&& tok) const;
    template <class Rep, class Period>
    bool try_wait_for(arrival_token&& tok, chrono::duration<Rep, Period> const& rel_time) const;
    template <class Clock, class Duration>
    bool try_wait_until(arrival_token&& tok, chrono::time_point<Clock, Duration> const& abs_time) const;
  };
}

UNRESOLVED QUESTION: should we remove const qualification from the new APIs if P2588 is accepted?

EDITORIAL: these changes are compatible with both adding try_wait overloads that accept a memory_order (P2628) and try_wait overloads that accept a bool parity instead of an arrival_token (P2629).

bool try_wait(arrival_token&& arrival) const;

UNRESOLVED QUESTION: if P2588 is accepted, then try_wait is able to complete the phase and the Effects clause needs updating, e.g., as follows: “[…] Otherwise, if all threads have arrived try_wait may complete the phase and return true, or the call has no effects and returns false.”.

template <class Rep, class Period>
bool try_wait_for(arrival_token&& tok, chrono::duration<Rep, Period> const& rel_time) const;
template <class Clock, class Duration>
bool try_wait_until(arrival_token&& tok, chrono::time_point<Clock, Duration> const& abs_time) const;

EDITORIAL: try_wait_for and try_wait_until shall have analogous semantics.

To [thread.latch]:

namespace std {
  class latch {
  public:
    template <class Rep, class Period>
    bool try_wait_for(chrono::duration<Rep, Period> const& rel_time) const;
    template <class Clock, class Duration>
    bool try_wait_until(chrono::time_point<Clock, Duration> const& abs_time) const;
  };
}
template <class Rep, class Period>
bool try_wait_for(chrono::duration<Rep, Period> const& rel_time) const;
template <class Clock, class Duration>
bool try_wait_until(chrono::time_point<Clock, Duration> const& abs_time) const;

EDITORIAL: semantics intentionally omitted from the current revision of this paper.