Document number: P2643R2.
Date: 2024-01-11.
Reply to: Gonzalo Brito Gadeschi <gonzalob _at_ nvidia.com>.
Authors: Gonzalo Brito Gadeschi, Olivier Giroux, Thomas Rodgers.
Audience: LEWG.

Improving C++ concurrency features

Revisions

P2 - (pre-Tokyo submitted)

D2 - (post-Varna draft)

P1 - (Varna submitted)

D1 - (post-Kona draft)

Introduction

P1135R6 introduced several new concurrency primitives to the C++20 concurrency library:

Although each element was a long time coming and had much implementation experience behind it, fresh user feedback tells us that some improvements could still be made:

  1. Return last observed value from atomic/atomic_ref::wait; this value is lost otherwise.
  2. Add timed versions of atomic/atomic_ref/atomic_flag::wait APIs and other concurrency primitives like barrier and latch, to make it easier to implement concurrency primitives that expose timed waiting facilities themselves by reusing these (e.g., to enable implementing <semaphore>, which already exposes try_acquire/try_acquire_for/try_acquire_until, on top of atomic).
  3. Avoid spurious polling in atomic/atomic_ref/atomic_flag::wait by accepting a predicate.

This paper proposes extensions to address these shortcomings. This branch demonstrates its implementability in libstdc++.

Design

The design of the features above is mostly orthogonal, and this section explores them independently.

Return last observed value on wait success

The design to return the last observed value on wait success adds a new API that returns the old value:

template <class T>
T atomic<T>::wait_value(
    T old,
    memory_order order = memory_order::seq_cst
) const noexcept;

A new member function is added, rather than changing the return type of atomic::wait, to respect the WG21 policy of avoiding ABI breaks.

Example 0: wait-value

Before:

std::atomic<int> a(42);
a.wait(42);
auto o = a.load();
assert(o != 42); // MAY FAIL!

After:

std::atomic<int> a(42);
auto o = a.wait_value(42);
assert(o != 42); // OK!

The atomic<T>::wait_value method guarantees that the thread is unblocked only if the value changed.

Before this paper, the new atomic<T> value that unblocked the wait is not returned to the caller. This has the following two shortcomings:

After this paper, the value observed by wait_value is returned to the caller, eliminating the need for the subsequent load.
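
The semantics described above can be approximated with today's library. The following is a minimal sketch, not the proposed implementation: wait_value_sketch is a hypothetical free function, it assumes T is comparable with !=, and it falls back to plain polling when the C++20 atomic::wait facility is unavailable. It re-checks the loaded value so that a value that changes and then changes back to old does not unblock the caller.

```cpp
#include <atomic>
#include <thread>

// Hypothetical stand-in for the proposed atomic<T>::wait_value.
// Blocks until the observed value differs from `old`, then returns it.
template <class T>
T wait_value_sketch(const std::atomic<T>& a, T old,
                    std::memory_order order = std::memory_order_seq_cst) {
  for (;;) {
#if defined(__cpp_lib_atomic_wait)
    a.wait(old, order);         // C++20: blocks while the value equals `old`
#else
    std::this_thread::yield();  // assumption: pre-C++20 fallback, plain polling
#endif
    T observed = a.load(order);
    if (observed != old) return observed;  // changed: hand the value back
    // The value changed and then changed back to `old`: wait again.
  }
}
```

Unlike the proposed member function, this sketch still performs the extra load after every wake-up; the proposal lets the implementation return the value it already observed.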

API naming

This proposal names this new API wait_value. Some other options are:

Fallible and timed waiting APIs

The design of the fallible and timed versions of the wait APIs adds three new APIs to atomic, atomic_ref, atomic_flag, barrier, and latch (semaphore already has try_acquire, try_acquire_for, and try_acquire_until). For atomic, these are:

template <class T>
optional<T> atomic<T>::try_wait(
    T value,
    memory_order order = memory_order::seq_cst
) const noexcept;

template <class T, class Rep, class Period>
optional<T> atomic<T>::wait_for(
    T value,
    duration<Rep, Period> const& rel_time,
    memory_order order = memory_order::seq_cst
) const;

template <class T, class Clock, class Duration>
optional<T> atomic<T>::wait_until(
    T value,
    time_point<Clock, Duration> const& abs_time,
    memory_order order = memory_order::seq_cst
) const;

They are non-blocking, i.e., they eventually return to the caller in a finite number of steps, even if the value did not change. This enables the application to “do something else” before attempting to wait again.

On failure, i.e., if the value did not change, they return nullopt and the operation has no effects (it does not synchronize). On success, they return an optional<T> containing the last observed value, which is guaranteed to be different from the one the call site waited on.

The untimed try_wait overload waits for a finite unspecified duration. The implementation may pick a different duration every time, which is why assigning implementation-specific default arguments to the timed wait APIs (wait_for and wait_until) does not suffice. This overload enables the implementation to attempt to wait for a dynamic system-specific amount of time (e.g., depending on system latencies, load, etc.). Furthermore, try_wait is noexcept, but wait_for and wait_until may throw timeout-related exceptions.
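
To make the return-value contract concrete, here is a minimal polling sketch of the timed APIs built only on atomic::load. The names wait_until_sketch and wait_for_sketch are hypothetical, T is assumed to be comparable with !=, and the yield-based loop is a deliberately naive stand-in for whatever strategy an implementation would use. It does not model the "no synchronization on failure" guarantee.

```cpp
#include <atomic>
#include <chrono>
#include <optional>
#include <thread>

// Hypothetical stand-in for the proposed atomic<T>::wait_until.
// Returns the new value on success, nullopt if the deadline passes first.
template <class T, class Clock, class Duration>
std::optional<T> wait_until_sketch(
    const std::atomic<T>& a, T old,
    const std::chrono::time_point<Clock, Duration>& abs_time,
    std::memory_order order = std::memory_order_seq_cst) {
  for (;;) {
    T observed = a.load(order);
    if (observed != old) return observed;               // value changed
    if (Clock::now() >= abs_time) return std::nullopt;  // timed out
    std::this_thread::yield();                          // naive back-off
  }
}

// Hypothetical stand-in for the proposed atomic<T>::wait_for.
template <class T, class Rep, class Period>
std::optional<T> wait_for_sketch(
    const std::atomic<T>& a, T old,
    const std::chrono::duration<Rep, Period>& rel_time,
    std::memory_order order = std::memory_order_seq_cst) {
  // Convert the relative timeout into a deadline on a monotonic clock.
  return wait_until_sketch(a, old,
                           std::chrono::steady_clock::now() + rel_time, order);
}
```

A caller can keep its previous value on timeout with value_or, e.g. rem = wait_for_sketch(t, rem, 1s).value_or(rem).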

Since <chrono> and <optional> are not freestanding, these APIs will not be available in freestanding implementations. C++23 and later have mechanisms to partially support these in freestanding. We should attempt to support a subset of these new concurrency APIs in freestanding by:

In the following Example 1, the atomic variable t tracks how many tasks need to be processed. As tasks are processed, this counter is decremented. In the example, the application reports progress by printing the number of remaining tasks every second:

Example 1: Print remaining tasks every 1s.

Before:

std::atomic<int> t;
int rem = t.load();
auto b = clock::now();
while (rem != 0) {
  rem = t.load();
  auto e = clock::now();
  if ((e - b) > 1s) {
    cout << rem;
    b = e;
  }
}

After:

std::atomic<int> t;
int rem = t.load();
while (rem != 0) {
  auto o = t.wait_for(rem, 1s);
  rem = o.value_or(rem);
  cout << rem;
}

Before this proposal, applications need to re-implement atomic<T>::wait logic, since it may block for a duration that exceeds the 1s reporting time. Doing this properly is non-trivial and error-prone, e.g., this example accidentally calls atomic<T>::load in a loop without any back-off.

After this proposal, the application uses wait_for to efficiently and correctly wait for at most 1s.

For barrier and latch, the proposed fallible wait APIs accept arrival_token&, since the token is re-used across multiple API calls. Since C++23, the wait APIs may modify the barrier value and advance the phase, but implementations that do so use mutable internally, and this proposal keeps them as const methods for consistency with the current wait APIs.

The proposed fallible APIs are the following:

template <class CF>
bool barrier<CF>::try_wait(
    arrival_token& tok
) const;

template <class CF, class Rep, class Period>
bool barrier<CF>::wait_for(
    arrival_token& tok,
    duration<Rep, Period> const& rel_time
) const;

template <class CF, class Clock, class Duration>
bool barrier<CF>::wait_until(
    arrival_token& tok,
    time_point<Clock, Duration> const& abs_time
) const;

// bool latch::try_wait() const noexcept; // Available since C++20

template <class Rep, class Period>
bool latch::wait_for(
    duration<Rep, Period> const& rel_time
) const;

template <class Clock, class Duration>
bool latch::wait_until(
    time_point<Clock, Duration> const& abs_time
) const;

In the following Example 2, an application uses a barrier to track the global number of tasks to be processed. Once all tasks have been processed, the barrier completes. The processing thread processes its thread-local tasks first, marking the completion of its tasks by arriving at the barrier with the processed task count. Instead of blocking and idling until all tasks have been processed, the processing thread gives other threads 1 ms to complete their tasks, and on failure, it attempts to help other threads by stealing some of their tasks, until all tasks have been completed. In the same way that arrive and wait enable overlapping independent work between arriving and waiting at a barrier, fallible wait methods enable overlapping independent work while waiting on a barrier:

// Example 2
std::barrier b(task_count);

// Processing thread:
auto processed_task_count = process_thread_local_tasks();
auto t = b.arrive(processed_task_count);
while (!b.wait_for(t, 1ms)) {
  auto stolen_task_count = steal_and_process_tasks();
  b.arrive(stolen_task_count);
}
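
The "wait briefly, then help" pattern can be approximated today for latch-like types, because latch already exposes try_wait. The sketch below is an illustration under stated assumptions, not the proposed wording: latch_wait_until_sketch and latch_wait_for_sketch are hypothetical helpers that poll try_wait until a deadline, toy_latch is a hypothetical minimal latch used so the example also builds before C++20, and help_until_done stands in for the stealing loop of Example 2.

```cpp
#include <atomic>
#include <chrono>
#include <thread>

// Hypothetical timed wait for any type exposing `bool try_wait() const`
// (std::latch does in C++20). Polls until success or the deadline.
template <class Latch, class Clock, class Duration>
bool latch_wait_until_sketch(
    const Latch& l, const std::chrono::time_point<Clock, Duration>& abs_time) {
  while (!l.try_wait()) {
    if (Clock::now() >= abs_time) return false;  // timed out
    std::this_thread::yield();
  }
  return true;
}

template <class Latch, class Rep, class Period>
bool latch_wait_for_sketch(const Latch& l,
                           const std::chrono::duration<Rep, Period>& rel_time) {
  return latch_wait_until_sketch(l,
                                 std::chrono::steady_clock::now() + rel_time);
}

// Hypothetical minimal latch: counts down to zero, never resets.
struct toy_latch {
  explicit toy_latch(int n) : count(n) {}
  void count_down() { count.fetch_sub(1); }
  bool try_wait() const { return count.load() == 0; }
  std::atomic<int> count;
};

// Each time the timed wait fails, do one unit of "stolen" work.
// Returns how many units this caller completed while waiting.
int help_until_done(toy_latch& l) {
  int helped = 0;
  while (!latch_wait_for_sketch(l, std::chrono::milliseconds(1))) {
    l.count_down();  // stand-in for stealing and finishing one task
    ++helped;
  }
  return helped;
}
```

Polling try_wait gives the right return values but not the efficient waiting that a native wait_for could provide.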

Predicated waiting APIs

The wait APIs of C++ concurrency primitives wait for a value to change from x to some other value. It is very common for applications to need waiting on a more complex condition, e.g., “wait for the value to change to precisely 42”, i.e., “wait until x == 42”.

With the current waiting APIs, the application is notified every time the value changes. This is very flexible, since it enables implementing any desired logic on top. The following Example 3 shows how to wait until x == 42:

// Example 3: wait until x == 42.
std::atomic<int> x;
int last = x.load();
while (last != 42) {
  // Wait on 'x != last':
  last = x.wait_value(last);
}
assert(last == 42);

Unfortunately, this is a forward progress, performance, and energy efficiency “gotcha”. Programs that wait for a condition different from “not equal to” (e.g. “wait for x == 42” above) using atomic::wait APIs include a re-try loop around the wait operation as shown in Example 3. The implementation is oblivious to the fact that the program has already been waiting for some time on a more complex condition, and each call to wait in this re-try loop looks to the implementation as the first call to wait.

This is problematic, because it leads to re-executing the implementation's short-term polling strategy. Implementations do not implement waiting as simple busy-polling (loading the value in a loop). Instead, they use concurrent algorithms that depend on “how long has this thread been waiting” to schedule system threads appropriately. If a thread is waiting for the first time, it gets many resources to provide low latency in case the condition is met quickly. As the waiting time increases, threads get fewer resources, to enable other threads in the system to run. This is crucial for ensuring forward progress of the whole system, since if a waiting thread prevents other threads from running, the condition it is waiting on may never be met, causing the application to hang.

A waiting API that accepts a predicate instead of a value enables the application to push the program-defined condition into atomic::wait, avoiding the outer re-try loop, and enabling the implementation to track time spent. At least two C++ standard library implementations already implement atomic::wait internally in terms of a wait taking a predicate.

The proposed design for the predicated atomic::wait API is analogous to the condition_variable::wait APIs, which take a stop_waiting predicate. None of the APIs is noexcept, since the predicate is allowed to throw. The design picks an argument order that differs from condition_variable: the order of arguments for condition_variable is “(lock, chrono duration/time point, predicate)”, but for the proposed APIs, just like for atomic::wait_for/_until, the condition (old value or stop_waiting predicate) comes before the chrono types, which come before the memory_order argument, which has a default value.

The proposed design for the predicated atomic::wait and atomic_ref::wait APIs is:

// Untimed: blocks.
template <class T, class P>
  requires predicate<P, T>
T atomic<T>::wait_with_predicate(
    P&& stop_waiting,
    memory_order = memory_order::seq_cst
) const;

// Timed, unspecified duration.
template <class T, class P>
  requires predicate<P, T>
optional<T> atomic<T>::try_wait_with_predicate(
    P&& stop_waiting,
    memory_order = memory_order::seq_cst
) const;

// Timed duration
template <class T, class P, class Rep, class Period>
  requires predicate<P, T>
optional<T> atomic<T>::wait_for_with_predicate(
    P&& stop_waiting,
    duration<Rep, Period> const& rel_time,
    memory_order = memory_order::seq_cst
) const;

// Time point
template <class T, class P, class Clock, class Duration>
  requires predicate<P, T>
optional<T> atomic<T>::wait_until_with_predicate(
    P&& stop_waiting,
    time_point<Clock, Duration> const& abs_time,
    memory_order = memory_order::seq_cst
) const;

Example 4: before/after vs Example 3.

Before:

std::atomic<int> x;
int last = x.load();
while (last != 42) {
  // Wait on 'x != last':
  last = x.wait_value(last);
}
assert(last == 42);

After:

std::atomic<int> x;
int last =
  x.wait_with_predicate([](int v) {
    return v == 42;
  });
assert(last == 42);

Before this proposal, the application that needs to wait on x == 42 needs a re-try loop that causes the implementation to pick the short-term polling strategy every time x changes.

After this proposal, the application passes a predicate to wait on x == 42. While x may change many times until this predicate is satisfied, the implementation is aware that x changing is not the condition the application is waiting on.
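
The benefit of moving the condition into the wait can be sketched with a single loop whose back-off depends on the total time waited rather than on the time since the value last changed. This is an illustrative emulation only: wait_until_with_predicate_sketch is a hypothetical free function, and the 50 microsecond and 1 millisecond thresholds are arbitrary assumptions, not values taken from any implementation.

```cpp
#include <atomic>
#include <chrono>
#include <optional>
#include <thread>

// Hypothetical stand-in for the proposed wait_until_with_predicate.
// Returns the value that satisfied the predicate, or nullopt on timeout.
template <class T, class P, class Clock, class Duration>
std::optional<T> wait_until_with_predicate_sketch(
    const std::atomic<T>& a, P&& stop_waiting,
    const std::chrono::time_point<Clock, Duration>& abs_time,
    std::memory_order order = std::memory_order_seq_cst) {
  using namespace std::chrono;
  const auto start = steady_clock::now();
  for (;;) {
    T observed = a.load(order);
    if (stop_waiting(observed)) return observed;        // condition met
    if (Clock::now() >= abs_time) return std::nullopt;  // timed out
    // Back-off is keyed on the total time waited, so intermediate value
    // changes that do not satisfy the predicate do not restart it.
    const auto waited = steady_clock::now() - start;
    if (waited < microseconds(50)) {
      // Short wait so far: keep spinning for low latency.
    } else if (waited < milliseconds(1)) {
      std::this_thread::yield();
    } else {
      std::this_thread::sleep_for(microseconds(100));
    }
  }
}
```

With the re-try loop of Example 3, each call to wait would start again at the spinning stage; here the loop stays in the sleeping stage once it has reached it.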

Wording

Return last observed value from atomic::wait

Add new APIs to the list of atomic waiting operations in the Note at [atomics.wait#2]:

[Note 2: The following functions are atomic waiting operations:

  1. atomic<T>::wait and atomic<T>::wait_value,
  2. atomic_flag::wait,
  3. atomic_wait, atomic_wait_explicit, atomic_wait_value, and atomic_wait_value_explicit,
  4. atomic_flag_wait and atomic_flag_wait_explicit, and
  5. atomic_ref<T>::wait and atomic_ref<T>::wait_value.
    — end note]

To [atomics.syn]:

namespace std {
 // [atomics.nonmembers], non-member functions
 template<class T>
 void atomic_wait(const volatile atomic<T>*,                                   // freestanding
                  typename atomic<T>::value_type) noexcept;
 template<class T>
 void atomic_wait(const atomic<T>*, typename atomic<T>::value_type) noexcept;  // freestanding
 template<class T>
 void atomic_wait_explicit(const volatile atomic<T>*,                          // freestanding
                           typename atomic<T>::value_type,
                           memory_order) noexcept;
 template<class T>
 void atomic_wait_explicit(const atomic<T>*, typename atomic<T>::value_type,   // freestanding
                           memory_order) noexcept;
 template<class T>
 typename atomic<T>::value_type
 atomic_wait_value(const volatile atomic<T>*,                 // freestanding
                   typename atomic<T>::value_type) noexcept;
 template<class T>
 typename atomic<T>::value_type 
 atomic_wait_value(const atomic<T>*,                          // freestanding
                   typename atomic<T>::value_type) noexcept;
 template<class T>
 typename atomic<T>::value_type
 atomic_wait_value_explicit(const volatile atomic<T>*,        // freestanding
                           typename atomic<T>::value_type,
                           memory_order) noexcept;
 template<class T>
 typename atomic<T>::value_type
 atomic_wait_value_explicit(const atomic<T>*,                 // freestanding
                            typename atomic<T>::value_type,
                            memory_order) noexcept;
}

To [atomics.ref.generic.general]:

namespace std {
  template<class T> struct atomic_ref {  // [atomics.ref.generic.general]
    T wait_value(T, memory_order = memory_order::seq_cst) const noexcept;
  };
}

To [atomics.ref.ops]:

void wait(T old, memory_order order = memory_order::seq_cst) const noexcept;
T wait_value(T old, memory_order order = memory_order::seq_cst) const noexcept;

To [atomics.ref.int]:

namespace std {
  template<> struct atomic_ref<integral> {
    integral wait_value(integral, memory_order = memory_order::seq_cst) const noexcept;
  };
}

To [atomics.ref.float]:

namespace std {
  template<> struct atomic_ref<floating-point> {
    floating-point wait_value(floating-point, memory_order = memory_order::seq_cst) const noexcept;
  };
}

To [atomics.ref.pointer]:

namespace std {
  template<class T> struct atomic_ref<T*> {
    T* wait_value(T*, memory_order = memory_order::seq_cst) const noexcept;
  };
}

To [atomics.types.generic.general]:

namespace std {
  template<class T> struct atomic {
    T wait_value(T, memory_order = memory_order::seq_cst) const volatile noexcept;
    T wait_value(T, memory_order = memory_order::seq_cst) const noexcept;
  };
}

To [atomics.types.operations]:

void wait(T old, memory_order order = memory_order::seq_cst) const volatile noexcept;
void wait(T old, memory_order order = memory_order::seq_cst) const noexcept;
T wait_value(T old, memory_order order = memory_order::seq_cst) const volatile noexcept;
T wait_value(T old, memory_order order = memory_order::seq_cst) const noexcept;

To [atomics.types.int]:

namespace std {
  template<> struct atomic<integral> {
    integral wait_value(integral, memory_order = memory_order::seq_cst) const volatile noexcept;
    integral wait_value(integral, memory_order = memory_order::seq_cst) const noexcept;
  };
}

To [atomics.types.float]:

namespace std {
  template<> struct atomic<floating-point> {
    floating-point wait_value(floating-point, memory_order = memory_order::seq_cst) const volatile noexcept;
    floating-point wait_value(floating-point, memory_order = memory_order::seq_cst) const noexcept;
  };
}

To [atomics.types.pointer]:

namespace std {
  template<class T> struct atomic<T*> {
    T* wait_value(T*, memory_order = memory_order::seq_cst) const volatile noexcept;
    T* wait_value(T*, memory_order = memory_order::seq_cst) const noexcept;
  };
}

To [util.smartptr.atomic.shared]:

namespace std {
  template<class T> struct atomic<shared_ptr<T>> {
    shared_ptr<T> wait_value(shared_ptr<T> old, memory_order = memory_order::seq_cst) const noexcept;
  };
}

and

void wait(shared_ptr<T> old, memory_order order = memory_order::seq_cst) const noexcept;
shared_ptr<T> wait_value(shared_ptr<T> old, memory_order order = memory_order::seq_cst) const noexcept;

To [util.smartptr.atomic.weak]:

namespace std {
  template<class T> struct atomic<weak_ptr<T>> {
    weak_ptr<T> wait_value(weak_ptr<T> old, memory_order = memory_order::seq_cst) const noexcept;
  };
}
void wait(weak_ptr<T> old, memory_order order = memory_order::seq_cst) const noexcept;
weak_ptr<T> wait_value(weak_ptr<T> old, memory_order order = memory_order::seq_cst) const noexcept;

No changes to [atomics.nonmembers] are needed.

No changes to [atomic.flag]'s wait APIs are needed.

Fallible and timed versions of ::wait APIs

Add new APIs to the list of atomic waiting operations in the Note at [atomics.wait#2]:

[Note 2: The following functions are atomic waiting operations:

  1. atomic<T>::wait, atomic<T>::try_wait, atomic<T>::wait_for, atomic<T>::wait_until,
  2. atomic_flag::wait, atomic_flag::try_wait, atomic_flag::wait_for, atomic_flag::wait_until,
  3. atomic_wait, atomic_wait_explicit, atomic_try_wait, and atomic_try_wait_explicit,
  4. atomic_flag_wait, atomic_flag_wait_explicit, atomic_flag_try_wait, and atomic_flag_try_wait_explicit, and
  5. atomic_ref<T>::wait, atomic_ref<T>::try_wait, atomic_ref<T>::wait_for, atomic_ref<T>::wait_until.
    end note]

To [atomics.syn]:

EDITORIAL: only APIs that do not use <optional> or <chrono> are added for C compatibility. That is, only try_wait is added for C compatibility; wait_for and wait_until are not added here.

namespace std {
 // [atomics.flag], flag type and operations
 
 void atomic_flag_wait(const volatile atomic_flag*, bool) noexcept;  // freestanding
 void atomic_flag_wait(const atomic_flag*, bool) noexcept;           // freestanding
 void atomic_flag_wait_explicit(const volatile atomic_flag*,         // freestanding
                                 bool, memory_order) noexcept;
 void atomic_flag_wait_explicit(const atomic_flag*,                  // freestanding
                                 bool, memory_order) noexcept;
                                 
 bool atomic_flag_try_wait(const volatile atomic_flag*, bool) noexcept;  // freestanding
 bool atomic_flag_try_wait(const atomic_flag*, bool) noexcept;           // freestanding
 bool atomic_flag_try_wait_explicit(const volatile atomic_flag*,         // freestanding
                                    bool, memory_order) noexcept;
 bool atomic_flag_try_wait_explicit(const atomic_flag*,                  // freestanding
                                    bool, memory_order) noexcept;
}

To [atomics.ref.generic.general]:

namespace std {
  template<class T> struct atomic_ref {  // [atomics.ref.generic.general]
    optional<T> try_wait(T, memory_order = memory_order::seq_cst) const noexcept;
    template <class Rep, class Period>
    optional<T> wait_for(
      T, chrono::duration<Rep, Period> const& rel_time, 
      memory_order = memory_order::seq_cst
     ) const;
    template <class Clock, class Duration>
    optional<T> wait_until(
      T, chrono::time_point<Clock, Duration> const& abs_time, 
      memory_order = memory_order::seq_cst
     ) const;
  };
}

To [atomics.ref.ops]:

optional<T> try_wait(T old, memory_order = memory_order::seq_cst) const noexcept;
template <class Rep, class Period>
optional<T> wait_for(T old, 
    chrono::duration<Rep, Period> const& rel_time,
    memory_order order = memory_order::seq_cst
) const;
template <class Clock, class Duration>
optional<T> wait_until(T old, 
    chrono::time_point<Clock, Duration> const& abs_time,
    memory_order order = memory_order::seq_cst
) const;

To [atomics.ref.int]:

namespace std {
  template<> struct atomic_ref<integral> {
    optional<integral> try_wait(integral, memory_order = memory_order::seq_cst) const noexcept;
    template <class Rep, class Period>
    optional<integral> wait_for(
      integral, chrono::duration<Rep, Period> const& rel_time, 
      memory_order = memory_order::seq_cst
     ) const;
    template <class Clock, class Duration>
    optional<integral> wait_until(
      integral, chrono::time_point<Clock, Duration> const& abs_time, 
      memory_order = memory_order::seq_cst
     ) const;
  };
}

To [atomics.ref.float]:

namespace std {
  template<> struct atomic_ref<floating-point> {
    optional<floating-point> try_wait(
      floating-point, 
      memory_order = memory_order::seq_cst
     ) const noexcept;
    template <class Rep, class Period>
    optional<floating-point> wait_for(
      floating-point, chrono::duration<Rep, Period> const& rel_time, 
      memory_order = memory_order::seq_cst
     ) const;
    template <class Clock, class Duration>
    optional<floating-point> wait_until(
      floating-point, chrono::time_point<Clock, Duration> const& abs_time, 
      memory_order = memory_order::seq_cst
     ) const;
  };
}

To [atomics.ref.pointer]:

namespace std {
  template<class T> struct atomic_ref<T*> {
    optional<T*> try_wait(T* old, memory_order = memory_order::seq_cst) const noexcept;
    template <class Rep, class Period>
    optional<T*> wait_for(
      T*, chrono::duration<Rep, Period> const& rel_time, 
      memory_order = memory_order::seq_cst
     ) const;
    template <class Clock, class Duration>
    optional<T*> wait_until(
      T*, chrono::time_point<Clock, Duration> const& abs_time, 
      memory_order = memory_order::seq_cst
     ) const;
  };
}

To [atomics.types.generic.general]:

namespace std {
  template<class T> struct atomic {
    optional<T> try_wait(T old, memory_order = memory_order::seq_cst) const noexcept;
    optional<T> try_wait(T old, memory_order = memory_order::seq_cst) const volatile noexcept;
    template <class Rep, class Period>
    optional<T> wait_for(
      T, chrono::duration<Rep, Period> const& rel_time, 
      memory_order = memory_order::seq_cst
     ) const;
    template <class Rep, class Period>
    optional<T> wait_for(
      T, chrono::duration<Rep, Period> const& rel_time, 
      memory_order = memory_order::seq_cst
     ) const volatile;
    template <class Clock, class Duration>
    optional<T> wait_until(
      T, chrono::time_point<Clock, Duration> const& abs_time, 
      memory_order = memory_order::seq_cst
     ) const;
    template <class Clock, class Duration>
    optional<T> wait_until(
      T, chrono::time_point<Clock, Duration> const& abs_time, 
      memory_order = memory_order::seq_cst
     ) const volatile;
  };
}

To [atomics.types.operations]:

optional<T> try_wait(T old, memory_order = memory_order::seq_cst) const noexcept;
optional<T> try_wait(T old, memory_order = memory_order::seq_cst) const volatile noexcept;
template <class Rep, class Period>
optional<T> wait_for(T old, 
                     chrono::duration<Rep, Period> const& rel_time,
                     memory_order order = memory_order::seq_cst
                    ) const;
template <class Rep, class Period>
optional<T> wait_for(T old, 
                     chrono::duration<Rep, Period> const& rel_time,
                     memory_order order = memory_order::seq_cst
                    ) const volatile;
template <class Clock, class Duration>
optional<T> wait_until(T old, 
                       chrono::time_point<Clock, Duration> const& abs_time,
                       memory_order order = memory_order::seq_cst
                      ) const;
template <class Clock, class Duration>
optional<T> wait_until(T old, 
                       chrono::time_point<Clock, Duration> const& abs_time,
                       memory_order order = memory_order::seq_cst
                      ) const volatile;

To [atomics.types.int]:

namespace std {
  template<> struct atomic<integral> {
    optional<integral> try_wait(integral, memory_order = memory_order::seq_cst) const noexcept;
    optional<integral> try_wait(integral, memory_order = memory_order::seq_cst) const volatile noexcept;
    template <class Rep, class Period>
    optional<integral> wait_for(
      integral, chrono::duration<Rep, Period> const& rel_time, 
      memory_order = memory_order::seq_cst
     ) const;
    template <class Rep, class Period>
    optional<integral> wait_for(
      integral, chrono::duration<Rep, Period> const& rel_time, 
      memory_order = memory_order::seq_cst
     ) const volatile;
    template <class Clock, class Duration>
    optional<integral> wait_until(
      integral, chrono::time_point<Clock, Duration> const& abs_time, 
      memory_order = memory_order::seq_cst
     ) const;
    template <class Clock, class Duration>
    optional<integral> wait_until(
      integral, chrono::time_point<Clock, Duration> const& abs_time, 
      memory_order = memory_order::seq_cst
     ) const volatile;
  };
}

To [atomics.types.float]:

namespace std {
  template<> struct atomic<floating-point> {
    optional<floating-point> try_wait(floating-point, memory_order = memory_order::seq_cst) const noexcept;
    optional<floating-point> try_wait(floating-point, memory_order = memory_order::seq_cst) const volatile noexcept;
    template <class Rep, class Period>
    optional<floating-point> wait_for(
      floating-point, chrono::duration<Rep, Period> const& rel_time, 
      memory_order = memory_order::seq_cst
     ) const;
    template <class Rep, class Period>
    optional<floating-point> wait_for(
      floating-point, chrono::duration<Rep, Period> const& rel_time, 
      memory_order = memory_order::seq_cst
     ) const volatile;
    template <class Clock, class Duration>
    optional<floating-point> wait_until(
      floating-point, chrono::time_point<Clock, Duration> const& abs_time, 
      memory_order = memory_order::seq_cst
     ) const;
    template <class Clock, class Duration>
    optional<floating-point> wait_until(
      floating-point, chrono::time_point<Clock, Duration> const& abs_time, 
      memory_order = memory_order::seq_cst
     ) const volatile;
  };
}

To [atomics.types.pointer]:

namespace std {
  template<class T> struct atomic<T*> {
    optional<T*> try_wait(T*, memory_order = memory_order::seq_cst) const noexcept;
    optional<T*> try_wait(T*, memory_order = memory_order::seq_cst) const volatile noexcept;
    template <class Rep, class Period>
    optional<T*> wait_for(
      T*, chrono::duration<Rep, Period> const& rel_time, 
      memory_order = memory_order::seq_cst
     ) const;
    template <class Rep, class Period>
    optional<T*> wait_for(
      T*, chrono::duration<Rep, Period> const& rel_time, 
      memory_order = memory_order::seq_cst
     ) const volatile;
    template <class Clock, class Duration>
    optional<T*> wait_until(
      T*, chrono::time_point<Clock, Duration> const& abs_time, 
      memory_order = memory_order::seq_cst
     ) const;
    template <class Clock, class Duration>
    optional<T*> wait_until(
      T*, chrono::time_point<Clock, Duration> const& abs_time, 
      memory_order = memory_order::seq_cst
     ) const volatile;
  };
}

To [util.smartptr.atomic.shared]:

optional<shared_ptr<T>> try_wait(
    shared_ptr<T> old, 
    memory_order order = memory_order::seq_cst
) const noexcept;
template <class Rep, class Period>
optional<shared_ptr<T>> wait_for(
    shared_ptr<T> old, 
    chrono::duration<Rep, Period> const& rel_time, 
    memory_order order = memory_order::seq_cst
) const;
template <class Clock, class Duration>
optional<shared_ptr<T>> wait_until(
    shared_ptr<T> old, 
    chrono::time_point<Clock, Duration> const& abs_time, 
    memory_order order = memory_order::seq_cst
) const;

To [util.smartptr.atomic.weak]:

namespace std {
  template<class T> struct atomic<weak_ptr<T>> {
    optional<weak_ptr<T>> try_wait(
      weak_ptr<T>, 
      memory_order = memory_order::seq_cst
     ) const noexcept;
    template <class Rep, class Period>
    optional<weak_ptr<T>> wait_for(
      weak_ptr<T>, chrono::duration<Rep, Period> const& rel_time, 
      memory_order = memory_order::seq_cst
     ) const;
    template <class Clock, class Duration>
    optional<weak_ptr<T>> wait_until(
      weak_ptr<T>, chrono::time_point<Clock, Duration> const& abs_time, 
      memory_order = memory_order::seq_cst
     ) const;
  };
}
optional<weak_ptr<T>> try_wait(
    weak_ptr<T> old, 
    memory_order order = memory_order::seq_cst
) const noexcept;
template <class Rep, class Period>
optional<weak_ptr<T>> wait_for(
    weak_ptr<T> old, 
    chrono::duration<Rep, Period> const& rel_time, 
    memory_order order = memory_order::seq_cst
) const;
template <class Clock, class Duration>
optional<weak_ptr<T>> wait_until(
    weak_ptr<T> old, 
    chrono::time_point<Clock, Duration> const& abs_time, 
    memory_order order = memory_order::seq_cst
) const;

To [atomic.flag]:

namespace std {
  struct atomic_flag {
    bool try_wait(bool, memory_order = memory_order::seq_cst) const noexcept;
    bool try_wait(bool, memory_order = memory_order::seq_cst) const volatile noexcept;
    template <class Rep, class Period>
    bool wait_for(
      bool, chrono::duration<Rep, Period> const& rel_time, 
      memory_order = memory_order::seq_cst
    ) const;
    template <class Rep, class Period>
    bool wait_for(
      bool, chrono::duration<Rep, Period> const& rel_time, 
      memory_order = memory_order::seq_cst
    ) const volatile;
    template <class Clock, class Duration>
    bool wait_until(
      bool, chrono::time_point<Clock, Duration> const& abs_time, 
      memory_order = memory_order::seq_cst
     ) const;
     template <class Clock, class Duration>
     bool wait_until(
      bool, chrono::time_point<Clock, Duration> const& abs_time, 
      memory_order = memory_order::seq_cst
     ) const volatile;
  };
}
bool atomic_flag_try_wait(const atomic_flag* object, bool old) noexcept;
bool atomic_flag_try_wait(const volatile atomic_flag* object, bool old) noexcept;
bool atomic_flag_try_wait_explicit(const atomic_flag* object, bool old, memory_order order) noexcept;
bool atomic_flag_try_wait_explicit(const volatile atomic_flag* object, bool old, memory_order order) noexcept;
bool atomic_flag::try_wait(bool old, memory_order order = memory_order::seq_cst) const noexcept;
bool atomic_flag::try_wait(bool old, memory_order order = memory_order::seq_cst) const volatile noexcept;

For atomic_flag_try_wait let order be memory_order::seq_cst. Let flag be object for the non-member functions, and this for the member functions.

To [thread.barrier]:

namespace std {
  template <class CompletionFunction>
  class barrier {
  
  public:
    bool try_wait(arrival_token& tok) const;
    template <class Rep, class Period>
    bool wait_for(arrival_token& tok, chrono::duration<Rep, Period> const& rel_time) const;
    template <class Clock, class Duration>
    bool wait_until(arrival_token& tok, chrono::time_point<Clock, Duration> const& abs_time) const;
  };
}
bool try_wait(arrival_token& tok) const;
template <class Rep, class Period>
bool wait_for(arrival_token& tok, chrono::duration<Rep, Period> const& rel_time) const;
template <class Clock, class Duration>
bool wait_until(arrival_token& tok, chrono::time_point<Clock, Duration> const& abs_time) const;

To [thread.latch]:

namespace std {
  class latch {
  public:
    bool try_wait() const noexcept;
    template <class Rep, class Period>
    bool wait_for(chrono::duration<Rep, Period> const& rel_time) const;
    template <class Clock, class Duration>
    bool wait_until(chrono::time_point<Clock, Duration> const& abs_time) const;
  };
}
bool try_wait() const noexcept;

SG1: the change below reformulates try_wait in terms of a timeout. This seems equivalent to the current formulation, but may be a breaking change.

template <class Rep, class Period>
bool wait_for(chrono::duration<Rep, Period> const& rel_time) const;
template <class Clock, class Duration>
bool wait_until(chrono::time_point<Clock, Duration> const& abs_time) const;

Fallible, timed, and predicated versions of ::wait APIs

Add new APIs to the list of atomic waiting operations in the Note at [atomics.wait#2]:

[Note 2: The following functions are atomic waiting operations:

  1. atomic<T>::wait, atomic<T>::wait_with_predicate, atomic<T>::try_wait_with_predicate, atomic<T>::wait_for_with_predicate, atomic<T>::wait_until_with_predicate,
  2. atomic_flag::wait, atomic_flag::wait_with_predicate, atomic_flag::try_wait_with_predicate, atomic_flag::wait_for_with_predicate, atomic_flag::wait_until_with_predicate,
  3. atomic_wait and atomic_wait_explicit,
  4. atomic_flag_wait and atomic_flag_wait_explicit, and
  5. atomic_ref<T>::wait, atomic_ref<T>::wait_with_predicate, atomic_ref<T>::try_wait_with_predicate, atomic_ref<T>::wait_for_with_predicate, atomic_ref<T>::wait_until_with_predicate.
    end note]

To [atomics.ref.generic.general]:

namespace std {
  template<class T> struct atomic_ref {  // [atomics.ref.generic.general]
    template <class P>
      requires predicate<P, T>
    T wait_with_predicate(
        P&& stop_waiting, 
        memory_order = memory_order::seq_cst) const;
    template <class P>
      requires predicate<P, T>
    optional<T> 
    try_wait_with_predicate(
        P&& stop_waiting, 
        memory_order = memory_order::seq_cst) const;
    template <class P, class Rep, class Period>
      requires predicate<P, T>
    optional<T> wait_for_with_predicate(
        P&& stop_waiting, 
        chrono::duration<Rep, Period> const& rel_time,
        memory_order = memory_order::seq_cst) const;
    template <class P, class Clock, class Duration>
      requires predicate<P, T>
    optional<T> wait_until_with_predicate(
        P&& stop_waiting, 
        chrono::time_point<Clock, Duration> const& abs_time, 
        memory_order = memory_order::seq_cst) const;
  };
}

To [atomics.ref.ops]:

template <class P>
    requires predicate<P, T>
T wait_with_predicate(
    P&& stop_waiting, 
    memory_order = memory_order::seq_cst) const;
template <class P>
    requires predicate<P, T>
optional<T> try_wait_with_predicate(
    P&& stop_waiting, 
    memory_order = memory_order::seq_cst) const;
template <class P, class Rep, class Period>
    requires predicate<P, T>
optional<T> wait_for_with_predicate(
    P&& stop_waiting, 
    chrono::duration<Rep, Period> const& rel_time,
    memory_order = memory_order::seq_cst) const;
template <class P, class Clock, class Duration>
    requires predicate<P, T>
optional<T> wait_until_with_predicate(
    P&& stop_waiting, 
    chrono::time_point<Clock, Duration> const& abs_time, 
    memory_order = memory_order::seq_cst) const;

EDITORIAL: all other modifications required for the predicated APIs are intentionally omitted until LEWG provides initial design feedback. They are intended to be analogous for all atomic_ref and atomic specializations, and for atomic_flag.