Padded mdspan layouts

Document #: P2642
Date: 2023-12-05
Project: Programming Language C++
LEWG
Reply-to: Christian Trott (Sandia National Laboratories)
Mark Hoemmen (NVIDIA)
Damien Lebrun-Grandie (Oak Ridge National Laboratory)
Nicolas Morales (Sandia National Laboratories)
Malte Förster (NVIDIA)
Jiaming Yuan (NVIDIA)

Contents

1 Authors

2 Revision history

2.1 Revision 0

Revision 0 submitted 2022-09-14.

2.2 Revision 1

Revision 1 submitted 2022-10-15.

2.3 Revision 2

Revision 2 to be submitted 2023-01-15.

2.4 Revision 3

Revision 3 to be submitted sometime after 2023-07-09.

2.5 Revision 4

2.6 Revision 5

3 Proposed changes and justification

3.1 Summary of proposed changes

We propose two new mdspan layouts, layout_left_padded and layout_right_padded. These layouts support two use cases:

  1. array layouts that are contiguous in one dimension, as supported by commonly used libraries like the BLAS (Basic Linear Algebra Subroutines; see P1417, P1673, and P1674 for historical overview and references) and LAPACK (Linear Algebra PACKage); and

  2. “padded” storage for overaligned access of the start of every contiguous segment of the array.

We also propose changing submdspan of a layout_left resp. layout_right mdspan to return layout_left_padded resp. layout_right_padded instead of layout_stride, when the slice arguments permit it.

3.2 Two new mdspan layouts

The two new mdspan layouts layout_left_padded and layout_right_padded are strided, unique layouts. If the rank is zero or one, then the layouts behave exactly like layout_left resp. layout_right. If the rank is two or more, then the layouts implement a special case of layout_stride where only one stride may differ from the extent that in layout_left resp. layout_right would completely define the stride. We call that stride the padding stride, and the extent that in layout_left resp. layout_right would define it the extent to pad. The padding stride of layout_left_padded is stride(1), and the extent to pad is extent(0). The padding stride of layout_right_padded is stride(rank() - 2), and the extent to pad is extent(rank() - 1). All other strides of layout_left_padded are the same as in layout_left, and all other strides of layout_right_padded are the same as in layout_right.
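
As an illustration of the stride rules above, the following is a minimal sketch rather than proposed wording. The function names left_padded_strides and right_padded_strides are hypothetical, and the padding_stride argument is assumed to be the actual padding stride (already greater than or equal to the extent to pad).

```cpp
#include <array>
#include <cstddef>

// Strides as in layout_left, except that stride(1) is the padding
// stride instead of extent(0). (Hypothetical helper, not proposed API.)
template<std::size_t Rank>
constexpr std::array<std::size_t, Rank>
left_padded_strides(const std::array<std::size_t, Rank>& extents,
                    std::size_t padding_stride)
{
  std::array<std::size_t, Rank> strides{};
  std::size_t s = 1;
  for (std::size_t r = 0; r < Rank; ++r) {
    strides[r] = s;
    // The extent to pad, extent(0), is replaced by the padding stride.
    s *= (r == 0 && Rank > 1) ? padding_stride : extents[r];
  }
  return strides;
}

// Strides as in layout_right, except that stride(rank() - 2) is the
// padding stride instead of extent(rank() - 1).
template<std::size_t Rank>
constexpr std::array<std::size_t, Rank>
right_padded_strides(const std::array<std::size_t, Rank>& extents,
                     std::size_t padding_stride)
{
  std::array<std::size_t, Rank> strides{};
  std::size_t s = 1;
  for (std::size_t r = Rank; r-- > 0; ) {
    strides[r] = s;
    // The extent to pad, extent(rank() - 1), is replaced by the padding stride.
    s *= (r == Rank - 1 && Rank > 1) ? padding_stride : extents[r];
  }
  return strides;
}

// Extents 3 x 5 x 7 with a padding stride of 4 (left) or 8 (right).
static_assert(left_padded_strides<3>({3, 5, 7}, 4)[1] == 4);
static_assert(left_padded_strides<3>({3, 5, 7}, 4)[2] == 20);
static_assert(right_padded_strides<3>({3, 5, 7}, 8)[1] == 8);
static_assert(right_padded_strides<3>({3, 5, 7}, 8)[0] == 40);
```

For rank one, both helpers return a single stride of 1, matching the statement that the padded layouts then behave like layout_left resp. layout_right.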

3.2.1 Optimizations over layout_stride

The two new layouts offer the following optimizations over layout_stride.

  1. They guarantee at compile time that one extent always has stride-1 access. While layout_stride’s member functions are all constexpr, its mapping constructor takes the strides as a std::array with rank() size.

  2. They do not need to store any strides if the padding stride is known at compile time. Even if the padding stride is a run-time value, these layouts only need to store the one stride value (as index_type). The layout_stride::mapping class must store all rank() stride values.

3.2.2 New layouts unify two use cases

The proposed layouts unify two different use cases:

  1. overaligned access to the beginning of each contiguous segment of elements, and

  2. representing exactly the data layout assumed by the General (GE) matrix type in the BLAS’ C binding.

Regarding (1), an appropriate choice of padding can ensure any desired overalignment of the beginning of each contiguous segment of elements in an mdspan, as long as the entire memory allocation has the same overalignment. This is useful for hardware features that require or perform better with overaligned access, such as SIMD (Single Instruction Multiple Data) instructions.

Regarding (2), the padding stride is the same as BLAS’ “leading dimension” of the matrix (LDA) argument. Unlike layout_left and layout_right, any subview of a contiguous subset of rows and columns of a rank-2 layout_left_padded or layout_right_padded mdspan preserves the layout. For example, if A is a rank-2 mdspan whose layout is layout_left_padded<padding_stride>, then submdspan(A, tuple{r1, r2}, tuple{c1, c2}) also has layout layout_left_padded<padding_stride> with the same padding stride as before. The BLAS and algorithms that use it (such as the blocked algorithms in LAPACK) depend on this ability to operate on contiguous submatrices with the same layout as their parent. For this reason, we have replaced the layout_blas_general layout in earlier versions of our P1673 proposal with layout_left_padded and layout_right_padded. Making most effective use of the new layouts in code that uses P1673 calls for integrating them with submdspan. This is why we include submdspan integration in this proposal.
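
The following sketch shows, in plain BLAS-style terms, why a contiguous submatrix keeps the leading dimension of its parent. The colmajor_view struct and the submatrix function are hypothetical names used only for illustration; they are not part of this proposal or of any BLAS interface.

```cpp
#include <cstddef>

// Hypothetical minimal column-major matrix view in the BLAS style:
// element (i, j) lives at data[i + j * lda].
struct colmajor_view {
  float* data;
  std::size_t rows;
  std::size_t cols;
  std::size_t lda;  // leading dimension; plays the role of stride(1)

  float& operator()(std::size_t i, std::size_t j) const {
    return data[i + j * lda];
  }
};

// View of rows [r1, r2) and columns [c1, c2) of A.
// Only the base pointer and the extents change; lda is preserved.
inline colmajor_view submatrix(colmajor_view A,
  std::size_t r1, std::size_t r2, std::size_t c1, std::size_t c2)
{
  return {A.data + r1 + c1 * A.lda, r2 - r1, c2 - c1, A.lda};
}
```

Because only the pointer and the extents change, the result can be passed to the same BLAS routine as the parent, with the same LDA argument.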

3.2.3 Consider rank-1 case as submdspan of rank-2

One review asked why required_span_size() of layout_right_padded<4>::mapping<extents<size_t, 1, 3>> is 4 instead of 3. We made that choice for the following reasons.

  1. Overalignment should imply correct SIMD access as well as pointer alignment

  2. Consistency of the rank-1 case with submdspan of a rank-2 mdspan

Regarding (1), an important design goal is use with explicit SIMD instructions. This means that we need to be able to access groups of 4 elements at a time. This is also consistent with assume_aligned<N, T>. That doesn’t just return a pointer p such that reinterpret_cast<uintptr_t>(p) is divisible by N; it returns a pointer to an object of type T whose alignment is at least N bytes. layout_right_padded<4>::mapping<extents<size_t, M, 3>> for M in [1, 4] means “assume that each row is a T[4] with byte alignment 4 * sizeof(T).”

Regarding (2), the idea is that the rank-2 or more case (with more than one row, column, etc.) controls the behavior of the rank-1 case. The rank-1 (or rank-2 but single row or column) case should act like a submdspan of the rank-2 case. It helps to understand that we intend to support the BLAS and LAPACK. For example, layout_left_padded<4>::mapping<extents<size_t, 3, 3>> means “a view of the top 3 rows of a 4 x 3 matrix” (LDA = 4, M = 3, N = 3, where LDA is a BLAS abbreviation meaning “leading dimension [of the matrix] A”). (This example switches to layout_left_padded just because the Fortran BLAS only supports column-major order, but the analogous idea applies to the layout_right_padded case that the C BLAS also supports.) Taking a submdspan of the leftmost column results in a rank-1 mdspan with a required_span_size() of 4 elements.

3.2.4 Design change from R0 to R1

A design change from R0 to R1 of this paper makes this overalignment case easier to use and more like the existing std::assume_aligned interface. In R0 of this paper, the user’s padding input parameter (either a compile-time padding_stride or a run-time value) was exactly the padding stride. As such, it had to be greater than or equal to the extent to pad. For example, if users had an extent(0) of 13 and wanted to overalign the corresponding stride(1) to a multiple of 4, they would have had to specify layout_left_padded<16>. This was inconsistent with std::assume_aligned, whose template argument (the byte alignment) would need to be 4 * sizeof(element_type). Also, users who wanted a compile-time padding stride would have needed to compute it themselves from the corresponding compile-time extent, rather than prespecifying a fixed overalignment factor that could be used for any extent. This was not only harder to use, but it made the layout itself (not just the layout mapping) depend on the extent. That was inconsistent with the existing mdspan layouts, where the layout type itself (e.g., layout_left) is always a function from extents specialization to layout mapping.

In R1 and subsequent revisions of this paper, we interpret the case where the input padding stride is less than the extent to pad as an “overalignment factor” instead of a stride. To revisit the above example, layout_left_padded<4> would take an extent(0) of 13 and round up the corresponding stride(1) to 16. However, as before, layout_left_padded<17> would take an extent(0) of 13 and round up the corresponding stride(1) to 17. The rule is consistent: the actual padding stride is always the next multiple of the input padding stride greater than or equal to the extent-to-pad.
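
The rounding rule can be written as a one-line function. This is a sketch only: the name actual_padding_stride is hypothetical, and it assumes a nonzero input padding stride and no arithmetic overflow.

```cpp
#include <cstddef>

// Smallest multiple of padding_value that is greater than or equal to
// extent_to_pad. Assumes padding_value > 0 and no overflow.
constexpr std::size_t
actual_padding_stride(std::size_t padding_value, std::size_t extent_to_pad)
{
  return (extent_to_pad + padding_value - 1) / padding_value * padding_value;
}

// The two examples from the text, plus an extent that is already aligned.
static_assert(actual_padding_stride(4, 13) == 16);
static_assert(actual_padding_stride(17, 13) == 17);
static_assert(actual_padding_stride(4, 16) == 16);
```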

In R0 of this paper, the following alias

using overaligned_matrix_t =
  mdspan<float, dextents<size_t, 2>, layout_right_padded<4>>;

would only be meaningful if the run-time extents are less than or equal to 4. In R1 and subsequent revisions, this alias would always mean “the padding stride rounds up the rightmost extent to a multiple of 4, whatever the extent may be.” R0 had no way to express that use case with a compile-time input padding stride. This is important for hardware features and compiler optimizations that require overalignment of multidimensional arrays.

3.2.5 Padding stride equality for layout mapping conversions

layout_left_padded<padding_stride>::mapping<Extents> has a converting constructor from layout_left_padded<other_padding_stride>::mapping<OtherExtents>. Similarly, layout_right_padded<padding_stride>::mapping<Extents> has a converting constructor from layout_right_padded<other_padding_stride>::mapping<OtherExtents>. These constructors require, among other conditions, that if padding_stride and other_padding_stride do not equal dynamic_extent, then padding_stride equals other_padding_stride.

Users may ask why they can’t convert a more overaligned mapping, such as layout_left_padded<4>::mapping, to a less overaligned mapping, such as layout_left_padded<2>::mapping. The problem is that this may not be correct for all extents. For example, the following code would be incorrect if it were well formed (it is not, in this proposal).

layout_left_padded<4>::mapping m_orig{extents{9, 2}};
layout_left_padded<2>::mapping m_new(m_orig);

The issue is that m_orig has an underlying (“physical”) layout of extents{12, 2}, but layout_left_padded<2>::mapping{extents{9, 2}} would have an underlying layout of extents{10, 2}. That is, layout_left_padded<4>::mapping{extents{9, 2}}.stride(1) is 12, but layout_left_padded<2>::mapping{extents{9, 2}}.stride(1) is 10.
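
To make the mismatch concrete, the sketch below computes where element (0, 1) would live under each mapping. The helper names round_up and padded_offset are hypothetical.

```cpp
#include <cstddef>

// Round extent up to the next multiple of `multiple` (assumed nonzero).
constexpr std::size_t round_up(std::size_t extent, std::size_t multiple)
{
  return (extent + multiple - 1) / multiple * multiple;
}

// Offset of element (i, j) in a rank-2 left-padded layout whose
// stride(1) is extent0 rounded up to a multiple of padding_value.
constexpr std::size_t padded_offset(std::size_t padding_value,
  std::size_t extent0, std::size_t i, std::size_t j)
{
  return i + j * round_up(extent0, padding_value);
}

// The first element of the second column sits at different offsets,
// so the two mappings cannot describe the same storage.
static_assert(padded_offset(4, 9, 0, 1) == 12);
static_assert(padded_offset(2, 9, 0, 1) == 10);
```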

In case one is tempted to permit assigning dynamic padding stride to static padding stride, the following code would also be incorrect if it were well formed (it is not, in this proposal). Again, m_orig.stride(1) is 12.

layout_left_padded<dynamic_extent>::mapping m_orig{extents{9, 2}, 4};
layout_left_padded<2>::mapping m_new(m_orig);

The following code is well formed in this proposal, and it gives m_new the expected original padding stride of 12.

layout_left_padded<dynamic_extent>::mapping m_orig{extents{9, 2}, 4};
layout_left_padded<dynamic_extent>::mapping m_new(m_orig);

Similarly, the following code is well formed in this proposal, and it gives m_new the expected original padding stride of 12.

layout_left_padded<4>::mapping m_orig{extents{9, 2}};
layout_left_padded<dynamic_extent>::mapping m_new(m_orig);

3.2.6 New layout mapping constructors in R2

R2 of this proposal adds new constructors to layout_{left,right}_padded::mapping. First, it adds default constructors that default-construct the extents_type object, but otherwise behave like the mapping(const extents_type&) constructor. That is, they fill in the correct run-time padding stride value, if this is possible given the padding_stride template argument. Second, R2 adds more converting constructors. For layout_left_padded::mapping, R2 adds a converting constructor from each of the following.

For layout_right_padded::mapping, R2 adds a converting constructor from each of the following.

3.2.7 Conversion from layout_left to layout_left_padded

The converting constructor from layout_left::mapping to layout_left_padded::mapping exists by analogy with the existing constructor layout_stride::mapping(const StridedLayoutMapping& other) ([mdspan.layout.stride.cons]) that can convert from layout_left::mapping to layout_stride::mapping. layout_left expresses a special case of layout_left_padded, just as layout_left expresses a special case of layout_stride. Thus, this is an implicit conversion as long as the conversion from the input’s extents_type to the result’s extents_type would be implicit.

This conversion is useful for C++ wrappers for the BLAS or LAPACK. layout_left_padded<dynamic_extent>::mapping<dextents<int, 2>> expresses in C++ exactly the 2-D array layout that the BLAS and LAPACK accept, including their requirement that the extents and stride(1) all be run-time values. Thus, a C++ wrapper for the BLAS (see P1673) or LAPACK might reasonably have a specialization for mdspan with layout mapping layout_left_padded<dynamic_extent>::mapping<dextents<int, 2>> that can call directly into an existing C or Fortran BLAS or LAPACK library, with very few error checks or layout conversions. However, users would reasonably want to create their 2-D arrays as layout_left, since it’s a simpler layout that doesn’t need to store the column stride. The converting constructor from layout_left::mapping to layout_left_padded::mapping would let users or libraries easily convert from the less general layout_left to the slightly more general layout_left_padded that a C++ BLAS or LAPACK wrapper would naturally use.

3.2.8 Conversion from layout_stride to layout_left_padded

The converting constructor from layout_stride::mapping to layout_left_padded::mapping exists by analogy with the existing converting constructor from layout_stride::mapping to layout_left::mapping. This constructor is explicit for rank() > 0, because it always converts from a more general case to a more specific case.

Explicit conversions to layout_stride::mapping are useful because layout_stride::mapping can express all the layout mappings in the Standard and this proposal. It’s like a “type-erased” version of all of them. For example, a library of mdspan algorithms might reasonably convert to layout_stride::mapping for some less performance-critical algorithms, as a way to minimize algorithm instantiations for different layouts.

3.2.9 Design change from R2 to R3: extents() return type

In revisions of this proposal up to and including R2, the new layout mappings’ extents() member functions both had return type extents_type. That is, they both returned by value. We did this deliberately, so that we could specify the layout mappings in terms of the behavior of layout_left::mapping resp. layout_right::mapping with a padded extents object, without needing to store the “original” extents. However, we realized after the publication of R2 that this does not respect the existing layout mapping requirements in paragraph 6 of [mdspan.layout.reqmnts]. This specifies the return type of m.extents() for every layout mapping as const extents_type&. That is, extents() must always return by const reference.

We considered changing the layout mapping requirements to permit layout mappings to return either extents_type or const extents_type&. However, we realized that [mdspan.mdspan] specifies that mdspan’s extent(r) member function returns map_.extents().extent(r). Letting a layout mapping’s extents() create and return a temporary could make mdspan’s extent(r) unexpectedly expensive. It should always be cheap to get a single extent from an mdspan, because it’s a common multidimensional array idiom to write nested for loops over each extent.

Our specification in [mdspan.mdspan] that mdspan’s extent(r) returns map_.extents().extent(r) was also deliberate. It expresses two design choices. First, requiring mdspan to get its extents from its layout mapping (that is, specifying mdspan’s extents() to return map_.extents()) ensures that an mdspan is nothing more than the composition of its data handle, layout mapping, and accessor. The layout mapping controls the extents; an mdspan cannot have “its own extents” that differ from those in its layout mapping. Second, not including extent(r) in the layout mapping means that a layout mapping also cannot have “its own extents” that differ from what extents() returns. Those two choices mean that the following code is well formed and does not trigger an assert for any mdspan x.

// An mdspan's extents are its mapping's extents.
using mapping_type = decltype(x)::mapping_type;
using extents_type = mapping_type::extents_type;
static_assert(std::is_same_v<decltype(x)::extents_type, extents_type>);
assert(x.extents() == x.mapping().extents());

// A mapping's extent(r) must agree with its extents().
auto e = [&] <size_t... Indices> (std::index_sequence<Indices...>) {
    using index_type = extents_type::index_type;
    return extents<index_type, x.static_extent(Indices)...>{
      x.mapping().extent(Indices)...
    };
  } (std::make_index_sequence<x.rank()>());
static_assert(std::is_same_v<decltype(e), extents_type>);
assert(e == x.mapping().extents());

All these design choices add up to the padded layout mappings needing to return const extents_type& from extents(). This means that we cannot use R2’s wording approach of having extents() return a temporary extents object. (Lifetime extension does not apply to a temporary created in and returned from a return statement.) Our wording fix in subsequent revisions is minimal: we add a new exposition-only actual-extents member of type extents_type to both of the padded mappings. However, this is not meant to suggest that implementations should take this approach. Instead of following the wording by using a nested layout_left::mapping resp. layout_right::mapping with a padded extents object, they could just reimplement the padded mappings as special cases of layout_stride. That way, each mapping would only store one extents_type object, and extents() would return a const reference to that object.
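
The following sketch illustrates the second implementation strategy for the rank-2 case. It is not the proposed wording: extents2d and left_padded_mapping_2d are hypothetical stand-ins that store the unpadded extents and one stride, so that extents() can return a reference to a stored object.

```cpp
#include <cstddef>
#include <type_traits>
#include <utility>

// Hypothetical stand-in for a rank-2 extents object.
struct extents2d {
  std::size_t e0;
  std::size_t e1;
  constexpr std::size_t extent(std::size_t r) const noexcept {
    return r == 0 ? e0 : e1;
  }
};

class left_padded_mapping_2d {
public:
  using extents_type = extents2d;

  constexpr left_padded_mapping_2d(extents_type e, std::size_t stride1)
    : extents_(e), stride1_(stride1) {}

  // Returns a reference to the stored object, so that callers of
  // extents().extent(r) never create or copy a temporary.
  constexpr const extents_type& extents() const noexcept { return extents_; }

  constexpr std::size_t stride(std::size_t r) const noexcept {
    return r == 0 ? 1 : stride1_;
  }

  constexpr std::size_t
  operator()(std::size_t i, std::size_t j) const noexcept {
    return i + j * stride1_;
  }

private:
  extents_type extents_;  // the unpadded extents
  std::size_t stride1_;   // the one stored stride
};

static_assert(std::is_same_v<
  decltype(std::declval<const left_padded_mapping_2d&>().extents()),
  const extents2d&>);
```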

3.3 Integration with submdspan

We propose changing submdspan (see P2630, which was accepted into the C++ Working Draft for C++26) of a layout_left resp. layout_right mdspan to return layout_left_padded resp. layout_right_padded instead of layout_stride, if the slice arguments permit it. Taking the submdspan of a layout_left_padded resp. layout_right_padded mdspan will preserve the layout, again if the slice arguments permit it.

The phrase “if the slice arguments permit it” means the following.

3.3.1 layout_left_padded and layout_left cases

In what follows, let left_submatrix be the following function,

template<class Elt, class Extents, class Layout,
  class Accessor, class S0, class S1>
requires(
  is_convertible_v<S0,
    tuple<typename Extents::index_type,
      typename Extents::index_type>> and
  is_convertible_v<S1,
    tuple<typename Extents::index_type,
      typename Extents::index_type>>
)
auto left_submatrix(
  mdspan<Elt, Extents, Layout, Accessor> X, S0 s0, S1 s1)
{
  auto full_extents =
    []<size_t ... Indices>(index_sequence<Indices...>) {
      return tuple{ (Indices, full_extent)... };
    }(make_index_sequence<X.rank() - 2>());
  return apply([&](full_extent_t ... fe) {
      return submdspan(X, s0, s1, fe...);
    }, full_extents);
}

let index_type be an integral type, let s0 be an object of a type S0 such that is_convertible_v<S0, tuple<index_type, index_type>> is true, and let s1 be an object of a type S1 such that is_convertible_v<S1, tuple<index_type, index_type>> is true.

Let X be an mdspan with rank at least two with decltype(X)::index_type naming the same type as index_type, whose layout is layout_left_padded<padding_stride_X> for some constexpr size_t padding_stride_X. Let X_sub be the object returned from left_submatrix(X, s0, s1). Then, X_sub is an mdspan of rank X.rank() with layout layout_left_padded<padding_stride_X>, and X_sub.stride(1) equals X.stride(1).

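Let Z be an mdspan with rank at least two with decltype(Z)::index_type naming the same type as index_type, whose layout is layout_left. Let Z_sub be the object returned from left_submatrix(Z, s0, s1). Then, Z_sub is an mdspan of rank Z.rank() with layout layout_left_padded<padding_stride_Z>, where padding_stride_Z is s0_val1 - s0_val0 if s0 is convertible to tuple<integral_constant<index_type, s0_val0>, integral_constant<index_type, s0_val1>> with s0_val1 greater than or equal to s0_val0.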

Also, Z_sub.stride(1) equals Z.stride(1).

3.3.2 layout_right_padded and layout_right cases

In what follows, let right_submatrix be the following function,

template<class Elt, class Extents, class Layout,
  class Accessor, class Srm2, class Srm1>
requires(
  is_convertible_v<Srm2,
    tuple<typename Extents::index_type,
      typename Extents::index_type>> and
  is_convertible_v<Srm1,
    tuple<typename Extents::index_type,
      typename Extents::index_type>>
)
auto right_submatrix(
  mdspan<Elt, Extents, Layout, Accessor> X, Srm2 srm2, Srm1 srm1)
{
  auto full_extents =
    []<size_t ... Indices>(index_sequence<Indices...>) {
      return tuple{ (Indices, full_extent)... };
    }(make_index_sequence<X.rank() - 2>());
  return apply([&](full_extent_t ... fe) {
      return submdspan(X, fe..., srm2, srm1);
    }, full_extents);
}

let index_type be an integral type, let srm2 (“s of rank minus 2”) be an object of a type Srm2 such that is_convertible_v<Srm2, tuple<index_type, index_type>> is true, and let srm1 (“s of rank minus 1”) be an object of a type Srm1 such that is_convertible_v<Srm1, tuple<index_type, index_type>> is true.

Similarly, let Y be an mdspan with rank at least two whose layout is layout_right_padded<padding_stride_Y> for some constexpr size_t padding_stride_Y. Let index_type_Y name the type decltype(Y)::index_type. Let srm2 (“S of rank minus 2”) be an object of a type Srm2 such that is_convertible_v<Srm2, tuple<index_type_Y, index_type_Y>> is true, and let srm1 (“S of rank minus 1”) be an object of a type Srm1 such that is_convertible_v<Srm1, tuple<index_type_Y, index_type_Y>> is true. In the following code fragment,

auto full_extents =
  []<size_t ... Indices>(index_sequence<Indices...>) {
    return tuple{(Indices, full_extent)...};
  }(make_index_sequence<Y.rank() - 2>());

auto Y_sub = apply([&](full_extent_t... fe) {
    return submdspan(Y, fe..., srm2, srm1);
  }, full_extents);

Y_sub is an mdspan of rank Y.rank() with layout layout_right_padded<padding_stride_Y>, and Y_sub.stride(Y.rank() - 2) equals Y.stride(Y.rank() - 2).

Let Z be an mdspan with rank at least two whose layout is layout_left. Let index_type_Z name the type decltype(Z)::index_type. Let s0 be an object of a type S0 such that is_convertible_v<S0, tuple<index_type_Z, index_type_Z>> is true, and let s1 be an object of a type S1 such that is_convertible_v<S1, tuple<index_type_Z, index_type_Z>> is true. In the following code fragment,

auto full_extents =
  []<size_t ... Indices>(index_sequence<Indices...>) {
    return tuple{(Indices, full_extent)...};
  }(make_index_sequence<Z.rank() - 2>());

auto Z_sub = apply( [&](full_extent_t... fe) {
    return submdspan(Z, s0, s1, fe...);
  }, full_extents );

Z_sub is an mdspan of rank Z.rank() with layout layout_left_padded<padding_stride_Z>, where padding_stride_Z is s0_val1 - s0_val0 if s0 is convertible to tuple<integral_constant<index_type_Z, s0_val0>, integral_constant<index_type_Z, s0_val1>> with s0_val1 greater than or equal to s0_val0. Also, Z_sub.stride(1) equals Z.stride(1).

Similarly, let W be an mdspan with rank at least two whose layout is layout_right. Let index_type_W name the type decltype(W)::index_type. Let srm2 (“S of rank minus 2”) be an object of a type Srm2 such that is_convertible_v<Srm2, tuple<index_type_W, index_type_W>> is true, and let srm1 (“S of rank minus 1”) be an object of a type Srm1 such that is_convertible_v<Srm1, tuple<index_type_W, index_type_W>> is true. In the following code fragment,

auto full_extents =
  []<size_t ... Indices>(index_sequence<Indices...>) {
    return tuple{(Indices, full_extent)...};
  }(make_index_sequence<W.rank() - 2>());

auto W_sub = apply( [&](full_extent_t... fe) {
    return submdspan(W, fe..., srm2, srm1);
  }, full_extents);

W_sub is an mdspan of rank W.rank() with layout layout_right_padded<padding_stride_W>, where padding_stride_W is srm1_val1 - srm1_val0 if srm1 is convertible to tuple<integral_constant<index_type_W, srm1_val0>, integral_constant<index_type_W, srm1_val1>> with srm1_val1 greater than or equal to srm1_val0. Also, W_sub.stride(W.rank() - 2) equals W.stride(W.rank() - 2).

Preservation of these layouts under submdspan is an important feature for our linear algebra library proposal P1673 (which was accepted into the C++ Working Draft for C++26). It means that for existing BLAS and LAPACK use cases, if we start with one of these layouts, we know that we can implement fast linear algebra algorithms by calling directly into an optimized C or Fortran BLAS.

3.4 Examples

3.4.1 Directly call C BLAS without checks

We show examples before and after this proposal of functions that compute the matrix-matrix product C += AB. The recursive_matrix_product function computes this product recursively, by partitioning each of the three matrices into a 2 x 2 block matrix using the partition function. When the C matrix is small enough, recursive_matrix_product stops recursing and instead calls a base_case_matrix_product function with different overloads for different matrix layouts. If the matrix layouts support it, base_case_matrix_product can call the C BLAS function cblas_sgemm directly on the mdspans’ data. This is fast if the C BLAS is optimized. Otherwise, base_case_matrix_product falls back to a slow generic implementation.

This example is far from ideally optimized, but it hints at the kind of optimizations that linear algebra computations do in practice.

Common code:

template<class Layout>
using out_matrix_view = mdspan<float, dextents<int, 2>, Layout>;

template<class Layout>
using in_matrix_view = mdspan<const float, dextents<int, 2>, Layout>;

// Before this proposal, if Layout is layout_left or layout_right,
// the returned mdspans would all be layout_stride.
// After this proposal, the returned mdspan would be
// layout_left_padded resp. layout_right_padded.
template<class ElementType, class Layout>
auto partition(mdspan<ElementType, dextents<int, 2>, Layout> A)
{
  auto M = A.extent(0);
  auto N = A.extent(1);
  auto A00 = submdspan(A, tuple{0, M / 2}, tuple{0, N / 2});
  auto A01 = submdspan(A, tuple{0, M / 2}, tuple{N / 2, N});
  auto A10 = submdspan(A, tuple{M / 2, M}, tuple{0, N / 2});
  auto A11 = submdspan(A, tuple{M / 2, M}, tuple{N / 2, N});
  return tuple{
    A00, A01,
    A10, A11
  };
}

template<class Layout>
void recursive_matrix_product(in_matrix_view<Layout> A,
  in_matrix_view<Layout> B, out_matrix_view<Layout> C)
{
  // Some hardware-dependent constant
  constexpr int recursion_threshold = 16;
  if(std::max(C.extent(0), C.extent(1)) <= recursion_threshold) {
    base_case_matrix_product(A, B, C);
  } else {
    auto [C00, C01,
          C10, C11] = partition(C);  
    auto [A00, A01,
          A10, A11] = partition(A);  
    auto [B00, B01,
          B10, B11] = partition(B);
    recursive_matrix_product(A00, B00, C00);
    recursive_matrix_product(A01, B10, C00);
    recursive_matrix_product(A10, B00, C10);
    recursive_matrix_product(A11, B10, C10);
    recursive_matrix_product(A00, B01, C01);
    recursive_matrix_product(A01, B11, C01);
    recursive_matrix_product(A10, B01, C11);
    recursive_matrix_product(A11, B11, C11);
  }
}

// Slow generic implementation
template<class Layout>
void base_case_matrix_product(in_matrix_view<Layout> A,
  in_matrix_view<Layout> B, out_matrix_view<Layout> C)
{
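  // Loop indices use int to match the mdspans' index_type (dextents<int, 2>).
  for(int j = 0; j < C.extent(1); ++j) {
    for(int i = 0; i < C.extent(0); ++i) {
      typename out_matrix_view<Layout>::value_type C_ij{};
      for(int k = 0; k < A.extent(1); ++k) {
        // mdspan element access uses the multidimensional operator[].
        C_ij += A[i, k] * B[k, j];
      }
      C[i, j] += C_ij;
    }
  }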
}

A user might interpret layout_left as “column major,” and therefore “the natural layout to pass into the BLAS.”

void base_case_matrix_product(in_matrix_view<layout_left> A,
  in_matrix_view<layout_left> B, out_matrix_view<layout_left> C)
{
  cblas_sgemm(CblasColMajor, CblasNoTrans, CblasNoTrans,
    C.extent(0), C.extent(1), A.extent(1), 1.0f,
    A.data_handle(), A.stride(1), B.data_handle(), B.stride(1),
    1.0f, C.data_handle(), C.stride(1));
}

However, recursive_matrix_product never gets to use the layout_left overload of base_case_matrix_product, because the base case matrices are always layout_stride.

On discovering this, the author of these functions might be tempted to write a custom layout for “BLAS-compatible” matrices. However, submdspan as currently specified in the C++ Working Draft forces partition to return four layout_stride mdspans if given a layout_left (or layout_right) input mdspan. This would, in turn, force users of recursive_matrix_product to commit to a custom layout, if they want to use the BLAS.

Alternately, the author of these functions could specialize base_case_matrix_product for layout_stride, and check whether A.stride(0), B.stride(0), and C.stride(0) are all equal to one before calling cblas_sgemm. However, that would force extra run-time checks for a use case that most users might never encounter, because most users are starting with layout_left matrices or contiguous submatrices thereof.

After our proposal, the author can specialize base_case_matrix_product for exactly the layout supported by the BLAS. They could even get rid of the fall-back implementation if users never exercise it.

template<size_t p>
void base_case_matrix_product(in_matrix_view<layout_left_padded<p>> A,
  in_matrix_view<layout_left_padded<p>> B,
  out_matrix_view<layout_left_padded<p>> C)
{ // same code as above
  cblas_sgemm(CblasColMajor, CblasNoTrans, CblasNoTrans,
    C.extent(0), C.extent(1), A.extent(1), 1.0f,
    A.data_handle(), A.stride(1), B.data_handle(), B.stride(1),
    1.0f, C.data_handle(), C.stride(1));
}

This optimization and simplification would also apply to implementations of P1673 that use a C or Fortran BLAS library where permitted by the mdspan layout(s) and accessor(s).
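
For readers less familiar with the BLAS calling convention, the sketch below is an unoptimized reference for what the cblas_sgemm calls above compute in the column-major, no-transpose case. The function name reference_sgemm_colmajor is hypothetical; its parameters follow the order of the corresponding cblas_sgemm arguments, with lda, ldb, and ldc receiving the mdspans’ stride(1).

```cpp
#include <cstddef>

// Reference (unoptimized) C = alpha * A * B + beta * C for column-major
// matrices without transposes. A is M x K, B is K x N, C is M x N.
// Element (i, j) of A is A[i + j * lda], and likewise for B and C.
inline void reference_sgemm_colmajor(
  int M, int N, int K, float alpha,
  const float* A, int lda, const float* B, int ldb,
  float beta, float* C, int ldc)
{
  for (int j = 0; j < N; ++j) {
    for (int i = 0; i < M; ++i) {
      float sum = 0.0f;
      for (int k = 0; k < K; ++k) {
        sum += A[i + k * lda] * B[k + j * ldb];
      }
      C[i + j * ldc] = alpha * sum + beta * C[i + j * ldc];
    }
  }
}
```

Elements between the end of a column and the next multiple of the leading dimension are never read or written, which is why padded storage and contiguous submatrices can be passed through unchanged.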

3.4.2 Overaligned access

By combining these new layouts with an accessor that ensures overaligned access, we can create an mdspan for which the beginning of every contiguous segment of elements is overaligned by some given factor. This can enable use of hardware features that require overaligned memory access.

The following aligned_accessor class template (proposed in our separate proposal P2897, which is currently in LEWG review as of the time of publication) uses the C++ Standard Library function assume_aligned to decorate pointer access.

template<class ElementType, size_t byte_alignment>
struct aligned_accessor {
  using offset_policy = default_accessor<ElementType>;

  using element_type = ElementType;
  using reference = ElementType&;
  using data_handle_type = ElementType*;

  constexpr aligned_accessor() noexcept = default;

  template<class OtherElementType, size_t other_byte_alignment>
  requires (
    std::is_convertible_v<OtherElementType(*)[], element_type(*)[]> &&
    other_byte_alignment == byte_alignment)
  constexpr aligned_accessor(
    aligned_accessor<OtherElementType, other_byte_alignment>) noexcept
  {}

  constexpr reference
  access(data_handle_type p, size_t i) const noexcept {
    return std::assume_aligned< byte_alignment >(p)[i];
  }

  constexpr typename offset_policy::data_handle_type
  offset(data_handle_type p, size_t i) const noexcept {
    return p + i;
  }
};

We include some helper functions for making overaligned array allocations.

template<class ElementType>
struct delete_raw {
  void operator()(ElementType* p) const {
    std::free(p);
  }
};

template<class ElementType>
using allocation_t =
  std::unique_ptr<ElementType[], delete_raw<ElementType>>;

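template<class ElementType, std::size_t byte_alignment>
allocation_t<ElementType>
allocate_raw(const std::size_t num_elements)
{
  // Note: std::aligned_alloc may require num_bytes to be
  // an integral multiple of byte_alignment.
  const std::size_t num_bytes = num_elements * sizeof(ElementType);
  void* ptr = std::aligned_alloc(byte_alignment, num_bytes);
  // void* does not convert implicitly to ElementType*.
  return {static_cast<ElementType*>(ptr), delete_raw<ElementType>{}};
}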

Now we can show our example. This 15 x 17 matrix of float will have extra padding so that every column is aligned to 8 * sizeof(float) bytes. We can use the layout mapping to determine the required storage size (including padding). Users can then prove at compile time that they can use special hardware features that require overaligned access and/or assume that the padding element at the end of each column is accessible memory.

constexpr size_t element_alignment = 8;
constexpr size_t byte_alignment = element_alignment * sizeof(float);

using layout_type = layout_left_padded<element_alignment>;
layout_type::mapping mapping{dextents<int, 2>{15, 17}};
auto allocation =
  allocate_raw<float, byte_alignment>(mapping.required_span_size());

using accessor_type = aligned_accessor<float, byte_alignment>;
mdspan m{allocation.get(), mapping, accessor_type{}};

// m_sub has the same layout as m,
// and each column of m_sub has the same overalignment.
auto m_sub = submdspan(m, tuple{0, 11}, tuple{1, 13}); 
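The arithmetic behind this example can be checked without mdspan. The following is a minimal sketch, assuming that the padding stride is the least multiple of the padding value that is greater than or equal to extent(0), and that the required span size is one more than the offset of the last element (as for the existing layout mappings). The function names are hypothetical.

```cpp
#include <cstddef>

// Least multiple of `padding` that is >= `extent` (assumes padding > 0).
constexpr std::size_t least_multiple_at_least(std::size_t padding,
                                              std::size_t extent) {
  return ((extent + padding - 1) / padding) * padding;
}

// Offset of element (i, j) in a rank-2 layout_left_padded-like mapping:
// stride(0) is 1 and stride(1) is the padding stride.
constexpr std::size_t offset(std::size_t i, std::size_t j,
                             std::size_t padding_stride) {
  return i + j * padding_stride;
}

// Span size as one plus the offset of the last element
// (assumes rows > 0 and cols > 0).
constexpr std::size_t span_size(std::size_t rows, std::size_t cols,
                                std::size_t padding_stride) {
  return offset(rows - 1, cols - 1, padding_stride) + 1;
}
```

Under these assumptions, the 15 x 17 matrix with a padding value of 8 has a padding stride of 16 elements, so each column starts at a multiple of 16 * sizeof(float) bytes, which is also a multiple of the 8 * sizeof(float) byte alignment.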

3.5 Design alternatives

3.5.1 Strided layout with compile-time strides

We considered a variant of layout_stride that could encode any combination of compile-time or run-time strides in the layout type. This could, for example, use the same mechanism that extents uses. (The reference implementation calls this mechanism a “partially static array.”) However, we rejected this approach as overly complex for our design goals.

First, the goal of layout_{left,right}_padded isn’t to insist even harder that the compiler bake constants into mapping::operator() evaluation. The goal is to communicate compile-time information to users. The most benefit comes not just from knowing the padding stride at compile time, but also from knowing that one dimension always uses stride-one (contiguous) storage. Putting these two pieces of information together lets users apply compiler annotations like assume_aligned, as in aligned_accessor (P2897). Knowing that one dimension always uses contiguous storage also tells users that they can pass the mdspan’s data directly into C or Fortran libraries like the BLAS or LAPACK. Users can benefit from this even if the padding stride is a run-time value.

Second, the constexpr annotations in the existing layout mappings mean that users might be evaluating layout_stride::mapping::operator() fully at compile time. The reference mdspan implementation has several tests that demonstrate this by using the result of a layout mapping evaluation in a context where it needs to be known at compile time.

Third, the performance benefit of storing some strides as compile-time constants goes down as the rank increases, because most of the strides would end up depending on run-time values anyway. Strided mdspan generally come from a subview of an existing layout_left or layout_right mdspan. In that case, the representation of the strides that preserves the most compile-time information would be just the original mdspan’s extents_type object. (Compare to the exposition-only inner-mapping which we use in the wording for layout_{left,right}_padded.) Computing each stride would then call for a forward (for layout_left) or reverse (for layout_right) product of the original mdspan’s extents. As a result, any stride to the right resp. left of a run-time extent would end up depending on that run-time extent anyway. The larger the rank, the more strides get “touched” by run-time information.
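The forward product for layout_left can be sketched as follows. This is an illustration only; the function name layout_left_strides and the use of std::array to hold extents are our assumptions, not part of the proposal.

```cpp
#include <array>
#include <cstddef>

// layout_left strides as a forward (left-to-right) running product of
// extents: stride(r) = extent(0) * ... * extent(r - 1).
template<std::size_t Rank>
constexpr std::array<std::size_t, Rank>
layout_left_strides(const std::array<std::size_t, Rank>& extents) {
  std::array<std::size_t, Rank> strides{};
  std::size_t product = 1;
  for (std::size_t r = 0; r < Rank; ++r) {
    strides[r] = product;
    product *= extents[r];
  }
  return strides;
}
```

For extents (2, 3, 5, 7), the strides are (1, 2, 6, 30). If the second extent were a run-time value n instead of 3, the strides would be (1, 2, 2n, 10n), so every stride to the right of that extent would depend on run-time information.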

Fourth, a strided mdspan that can represent layouts as general as layout_stride, but has entirely compile-time extents and strides, could be useful for supporting features of a specific computer architecture. However, these hardware features would probably have limitations that would prevent them from supporting general strided layouts anyway. For example, they might require strides to be a power of two, or they might be limited to specific ranges of extents or strides. These limitations would call for custom implementation-specific layouts, not something as general as a “compile-time layout_stride.”

3.5.2 LEWG R2 polls discussion

LEWG took two polls in its review of Revision 2 of this proposal on 2023/03/28. Both polls resulted in the status quo design, but LEWG asked us to add to the next revision a discussion of the questions they posed. We do so in the following sections.

3.5.3 Nest the new policies in corresponding existing ones

LEWG polled on the following question, with no votes in favor and thus no consensus for change. All coauthors present voted against.

The proposed tagged type (layout_left_padded) should be a nested type (layout_left::padded).

The following will explain the context and why the authors oppose this change.

The suggestion was that we should change layout_left_padded<padding_stride> and layout_right_padded<padding_stride> from separate layout policies (the status quo) to nested types layout_left::padded<padding_stride> resp. layout_right::padded<padding_stride>.

The issue with this change is that it is “morphologically confused,” to borrow the words of one LEWG reviewer. The layout mapping policy requirements [mdspan.layout.policy.reqmts] specify a shape (morphology) of two levels of types. The outer type MP is the layout mapping policy, which represents a family of layout mappings parameterized by extents type E. The inner type is the layout mapping MP::mapping<E>. Nesting a policy inside another policy, as in layout_left::padded, would break this rule that “the policy is on the outside, and the mapping is on the inside.”

Note also that the padding_stride template parameter must live outside the mapping. Otherwise, it wouldn’t be possible to construct the mapping from just an extents object.
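The two-level shape can be sketched with stand-in types. Everything named here (extents2, padded_left_policy, stride1) is hypothetical and greatly simplified; the sketch only shows why the padding parameter belongs to the outer policy.

```cpp
#include <cstddef>

// Stand-in for an extents type (hypothetical; rank 2 only).
struct extents2 {
  std::size_t e0;
  std::size_t e1;
};

// Policy on the outside: carries the padding as a template parameter.
template<std::size_t PaddingValue>
struct padded_left_policy {
  // Mapping on the inside: parameterized only by the extents type.
  template<class Extents>
  struct mapping {
    Extents ext;

    // Constructible from just an extents object, because the padding
    // is already known from the enclosing policy.
    constexpr explicit mapping(const Extents& e) : ext(e) {}

    // Least multiple of PaddingValue that is >= ext.e0.
    constexpr std::size_t stride1() const {
      return ((ext.e0 + PaddingValue - 1) / PaddingValue) * PaddingValue;
    }
  };
};
```

Generic code that only knows a policy MP and an extents type E can then form MP::mapping<E> and construct it from an extents object, without knowing that MP has a padding parameter at all.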

3.5.4 Layout mapping conversion customization point

LEWG polled on the following question, with no votes in favor and thus no consensus for change. One coauthor voted neutral and two coauthors voted weakly against.

An assume_layout customization point should be provided for layout conversions.

The following will explain the context and why the authors oppose this change.

The status quo design includes converting constructors between some mappings, e.g., from layout_left_padded::mapping<E1> to layout_left::mapping<E2>. These constructors are conditionally explicit if there are nontrivial preconditions. This design matches the existing mdspan mapping conversions, e.g., from layout_stride::mapping<E1> to layout_left::mapping<E2>. The intent is that implicit conversions express and permit “type erasure,” that is, going from information expressed in a compile-time type to information expressed in a member variable or some other way. Type erasure here includes three kinds of conversions.

  1. From a more restrictive mapping to a less restrictive mapping (e.g., from layout_left::mapping<E> to layout_stride::mapping<E>)

  2. From a mapping with static extents to a mapping with dynamic extents (e.g., from layout_left::mapping<extents<int, 2, 3>> to layout_left::mapping<extents<int, dynamic_extent, 3>>)

  3. Both (changing the mapping and the extents)

Conversion in the opposite direction of type erasure generally imposes nontrivial preconditions, so it is explicit. We permit explicit conversions because they let users potentially improve performance by expressing their assumptions in the type system. If users didn’t have explicit conversions, they would likely end up reimplementing them in a possibly less safe way.

LEWG reflector discussion suggested a more general approach to conversions. Instead of conditionally explicit constructors, conversions with nontrivial preconditions (e.g., from layout_stride to layout_left_padded or to layout_left) would use an assume_layout customization point. This would have the following advantages.

  1. Users could implement conversion from their custom layout mapping to a Standard layout mapping.

  2. Adding a new layout to the Standard would not require adding converting constructors to all the existing mappings.

However, introducing a customization point would complicate the design. The benefit of this complication would be low, since most custom or new layout mappings likely could not be converted to existing layout mappings. For example, tiled layouts or space-filling (Hilbert) curve layouts are not strided in general, so they could not be converted to anything in the current Standard or this proposal. In the words of one LEWG reviewer, most layouts are “on [their] own little planet.” LEWG discussion expressed a strong preference for not overengineering the design by offering a conversion customization point when most conversions don’t make sense.
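To see why a tiled layout is not strided in general, consider the following minimal sketch of a rank-2 tiled mapping. The function tiled_offset and its storage order (tiles stored one after another in row-major order, elements within a tile also row-major) are our assumptions for illustration.

```cpp
#include <cstddef>

// Hypothetical rank-2 tiled mapping with T x T tiles.
// num_tile_cols is the number of tiles in each row of tiles.
template<std::size_t T>
constexpr std::size_t tiled_offset(std::size_t i, std::size_t j,
                                   std::size_t num_tile_cols) {
  const std::size_t tile = (i / T) * num_tile_cols + (j / T);  // which tile
  const std::size_t within = (i % T) * T + (j % T);            // offset in tile
  return tile * (T * T) + within;
}
```

A strided mapping needs the offset difference between (i, j) and (i, j + 1) to be the same for all valid j. With 2 x 2 tiles and two tiles per row, the offsets along row 0 are 0, 1, 4, 5, so the differences are 1, 3, 1. No single stride describes this mapping, and therefore no conversion to layout_stride or to any of the padded layouts exists.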

3.6 Implementation experience

The stable (main) branch of the reference mdspan implementation implements all of this proposal except submdspan support.

3.7 Desired ship vehicle

C++26 / IS.

4 Wording

Text in blockquotes is not proposed wording, but rather instructions for generating proposed wording. The � character is used to denote a placeholder section number which the editor shall determine.

Make the following changes to the latest C++ Working Draft, which at the time of writing is N4964. All wording is relative to the latest C++ Working Draft.

In [version.syn], increase the value of the __cpp_lib_submdspan macro by replacing YYYYMML below with the integer literal encoding the appropriate year (YYYY) and month (MM).

#define __cpp_lib_submdspan YYYYMML // also in <mdspan>

In Section � [mdspan.syn], in the synopsis, after struct layout_stride;, add the following:

template<size_t PaddingValue = dynamic_extent> 
struct layout_left_padded;
template<size_t PaddingValue = dynamic_extent> 
struct layout_right_padded;

In [mdspan.layout.policy.overview], add the following to the code block after the layout_stride definition:

template<size_t PaddingValue>
struct layout_left_padded {
  template<class Extents>
  class mapping;
};
template<size_t PaddingValue>
struct layout_right_padded {
  template<class Extents>
  class mapping;
};

After paragraph 1 of [mdspan.layout.policy.overview], add the following paragraph 2:

2 Each specialization of layout_left_padded and layout_right_padded meets the layout mapping policy requirements and is a trivial type.

In Section � [mdspan.layout.left.overview] (“Overview”), add the following constructor to the layout_left::mapping class declaration, between the constructor converting from layout_right::mapping<OtherExtents> and the constructor converting from layout_stride::mapping<OtherExtents>:

template<class LayoutLeftPaddedMapping>
  constexpr explicit(! is_convertible_v<typename LayoutLeftPaddedMapping::extents_type, extents_type>)
    mapping(const LayoutLeftPaddedMapping&) noexcept;

In Section � [mdspan.layout.left.cons] (“Constructors”), add the following between the constructor converting from layout_right::mapping<OtherExtents> (ending paragraph 8) and the constructor converting from layout_stride::mapping<OtherExtents> (starting paragraph 9 before this proposal), then renumber the following paragraphs in that section accordingly.

template<class LayoutLeftPaddedMapping>
  constexpr explicit(! is_convertible_v<typename LayoutLeftPaddedMapping::extents_type, extents_type>)
    mapping(const LayoutLeftPaddedMapping& other) noexcept;

9 Constraints:

10 Mandates: If

then Extents::static_extent(0) is a multiple of LayoutLeftPaddedMapping::padding_value.

11 Preconditions:

12 Effects: Direct-non-list-initializes extents_ with other.extents().

In Section � [mdspan.layout.right.overview] (“Overview”), add the following constructor to the layout_right::mapping class declaration, between the constructor converting from layout_left::mapping<OtherExtents> and the constructor converting from layout_stride::mapping<OtherExtents>.

template<class LayoutRightPaddedMapping>
  constexpr explicit(! is_convertible_v<typename LayoutRightPaddedMapping::extents_type, extents_type>)
    mapping(const LayoutRightPaddedMapping&) noexcept;

In Section � [mdspan.layout.right.cons] (“Constructors”), add the following between the constructor converting from layout_left::mapping<OtherExtents> (ending paragraph 8) and the constructor converting from layout_stride::mapping<OtherExtents> (starting paragraph 9 before this proposal), then renumber the following paragraphs in that section accordingly.

template<class LayoutRightPaddedMapping>
  constexpr explicit(! is_convertible_v<typename LayoutRightPaddedMapping::extents_type, extents_type>)
    mapping(const LayoutRightPaddedMapping& other) noexcept;

9 Constraints:

10 Mandates: If

then Extents::static_extent(Extents::rank() - 1) is a multiple of LayoutRightPaddedMapping::padding_value.

11 Preconditions:

12 Effects: Direct-non-list-initializes extents_ with other.extents().

In Section � [mdspan.layout.stride.cons], in paragraph 7 (Remarks for the constructor layout_stride::mapping(const StridedLayoutMapping&)), right after the word Remarks, add the following text.

Let is-layout-left-padded-mapping-of be the exposition-only variable template defined as follows.

template<class Layout>
struct is-layout-left-padded : // exposition only
  false_type {};

template<size_t padding_stride>
struct is-layout-left-padded<layout_left_padded<padding_stride>> : // exposition only
  true_type {};

template<class Mapping>
constexpr bool is-layout-left-padded-mapping-of = // exposition only
  is-layout-left-padded<typename Mapping::layout_type>::value;

Let is-layout-right-padded-mapping-of be the exposition-only variable template defined as follows.

template<class Layout>
struct is-layout-right-padded : // exposition only
  false_type {};

template<size_t padding_stride>
struct is-layout-right-padded<layout_right_padded<padding_stride>> : // exposition only
  true_type {};

template<class Mapping>
constexpr bool is-layout-right-padded-mapping-of = // exposition only
  is-layout-right-padded<typename Mapping::layout_type>::value;

In Section � [mdspan.layout.stride.cons], in paragraph 7 (Remarks for the constructor layout_stride::mapping(const StridedLayoutMapping&)), add the following two lines immediately below is-mapping-of<layout_right, StridedLayoutMapping> || and above is-mapping-of<layout_stride, StridedLayoutMapping>:

is-layout-left-padded-mapping-of<StridedLayoutMapping> ||
is-layout-right-padded-mapping-of<StridedLayoutMapping> ||

After the end of Section � [mdspan.layout.stride], add the following:

4.1 Class template layout_left_padded::mapping [mdspan.layout.leftpadded]

4.1.1 Overview [mdspan.layout.leftpadded.overview]

1 layout_left_padded provides a layout mapping that behaves like layout_left::mapping, except that the padding stride stride(1) can be greater than or equal to extent(0).

template<size_t PaddingValue>
template<class Extents>
class layout_left_padded<PaddingValue>::mapping {
public:
  static constexpr size_t padding_value = PaddingValue;

  using extents_type = Extents;
  using index_type = typename extents_type::index_type;
  using size_type = typename extents_type::size_type;
  using rank_type = typename extents_type::rank_type;
  using layout_type = layout_left_padded<PaddingValue>;

private:
  static constexpr size_t static-padding-stride = /* see-below */; // exposition only

public:
  // [mdspan.layout.leftpadded.cons], constructors
  constexpr mapping() noexcept
    requires(static-padding-stride != dynamic_extent) = default;
  constexpr mapping() noexcept
    requires(static-padding-stride == dynamic_extent)
      : mapping(extents_type{}) {}
  constexpr mapping(const mapping&) noexcept = default;
  constexpr mapping(const extents_type& ext);
  template<class OtherIndexType>
    constexpr mapping(const extents_type& ext, OtherIndexType pad);

  template<class OtherExtents>
    constexpr explicit(! is_convertible_v<OtherExtents, extents_type>)
      mapping(const layout_left::mapping<OtherExtents>&);
  template<class OtherExtents>
    constexpr explicit(extents_type::rank() > 0)
      mapping(const layout_stride::mapping<OtherExtents>&);
  template<class LayoutLeftPaddedMapping>
    constexpr explicit( /* see below */ )
      mapping(const LayoutLeftPaddedMapping&);
  template<class LayoutRightPaddedMapping>
    constexpr explicit( /* see below */ )
      mapping(const LayoutRightPaddedMapping&) noexcept;

  constexpr mapping& operator=(const mapping&) noexcept = default;

  // [mdspan.layout.leftpadded.obs], observers
  constexpr const extents_type& extents() const noexcept { return extents_; }
  constexpr array<index_type, extents_type::rank()> strides() const noexcept;

  constexpr index_type required_span_size() const noexcept;

  template<class... Indices>
    constexpr index_type operator()(Indices... idxs) const noexcept;

  static constexpr bool is_always_unique() noexcept { return true; }
  static constexpr bool is_always_exhaustive() noexcept;
  static constexpr bool is_always_strided() noexcept { return true; }

  static constexpr bool is_unique() noexcept { return true; }
  constexpr bool is_exhaustive() const noexcept;
  static constexpr bool is_strided() noexcept { return true; }

  constexpr index_type stride(rank_type r) const noexcept;

  template<class LayoutLeftPaddedMapping>
    friend constexpr bool operator==(
      const mapping&,
      const LayoutLeftPaddedMapping&) noexcept;

private:
  extents<index_type, static-padding-stride> stride-1{}; // exposition only
  extents_type extents_{}; // exposition only

  // [mdspan.submdspan.mapping], submdspan mapping specialization
  template<class... SliceSpecifiers>
    constexpr auto submdspan-mapping-impl(                    // exposition only
      SliceSpecifiers... slices) const -> see below;

  template<class... SliceSpecifiers>
    friend constexpr auto submdspan_mapping(
      const mapping& src, SliceSpecifiers... slices) {
        return src.submdspan-mapping-impl(slices...);
  }
};

2 Throughout [mdspan.layout.leftpadded], let P_rank be the following size extents_type::rank() parameter pack of size_t values: 0, 1, ..., extents_type::rank() - 1.

3 Mandates: If

then the least multiple of padding_value that is greater than or equal to extents_type::static_extent(0) is representable as a value of type size_t, and is representable as a value of type index_type.

static constexpr size_t static-padding-stride = /* see-below */; // exposition only

4 The value is

4.1.2 Constructors [mdspan.layout.leftpadded.cons]

constexpr mapping() noexcept
  requires(static-padding-stride == dynamic_extent);

1 Effects: Equivalent to mapping(extents_type{});.

constexpr mapping(const extents_type& ext);

2 Preconditions: If extents_type::rank() is greater than one and padding_value does not equal dynamic_extent, then the least multiple of padding_value greater than or equal to ext.extent(0) is representable as a value of type index_type.

3 Effects:

template<class OtherIndexType>
constexpr mapping(const extents_type& ext, OtherIndexType pad);

4 Constraints:

5 Preconditions:

6 Effects:

template<class OtherExtents>
  constexpr explicit(! is_convertible_v<OtherExtents, extents_type>)
    mapping(const layout_left::mapping<OtherExtents>& other);

7 Constraints: is_constructible_v<extents_type, OtherExtents> is true.

8 Mandates: If OtherExtents::rank() > 1, static-padding-stride does not equal dynamic_extent, and OtherExtents::static_extent(0) does not equal dynamic_extent, then static-padding-stride equals OtherExtents::static_extent(0).

9 Preconditions:

10 Effects: Equivalent to mapping(other.extents());

template<class OtherExtents>
  constexpr explicit(extents_type::rank() > 0)
    mapping(const layout_stride::mapping<OtherExtents>& other);

11 Constraints: is_constructible_v<extents_type, OtherExtents> is true.

12 Preconditions:

13 Effects:

template<class LayoutLeftPaddedMapping>
  constexpr explicit( /* see below */ )
    mapping(const LayoutLeftPaddedMapping& other);

14 Constraints:

15 Mandates: padding_value == dynamic_extent || LayoutLeftPaddedMapping::padding_value == dynamic_extent || padding_value == LayoutLeftPaddedMapping::padding_value is true.

16 Preconditions:

17 Effects:

18 Remarks: The expression inside explicit is equivalent to: extents_type::rank() > 1 && (padding_value != dynamic_extent || LayoutLeftPaddedMapping::padding_value == dynamic_extent).
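As a non-normative illustration, the expression in these Remarks can be transcribed into a small constexpr function and evaluated for a few cases. The function name and the stand-in constant dyn (for dynamic_extent) are hypothetical.

```cpp
#include <cstddef>

// Stand-in for dynamic_extent (assumption for this sketch).
inline constexpr std::size_t dyn = static_cast<std::size_t>(-1);

// Direct transcription of the explicit-expression:
//   rank > 1 && (padding_value != dynamic_extent ||
//                LayoutLeftPaddedMapping::padding_value == dynamic_extent)
// to_padding is the padding_value of the mapping being constructed;
// from_padding is the padding_value of the source mapping.
constexpr bool padded_conversion_is_explicit(std::size_t rank,
                                             std::size_t to_padding,
                                             std::size_t from_padding) {
  return rank > 1 && (to_padding != dyn || from_padding == dyn);
}
```

For rank 2, converting from a compile-time padding value of 4 to a run-time padding value evaluates to false (implicit), while converting from a run-time padding value to a compile-time padding value of 4 evaluates to true (explicit). For rank 0 or 1, the expression is always false.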

template<class LayoutRightPaddedMapping>
  constexpr explicit( /* see below */ )
    mapping(const LayoutRightPaddedMapping& other) noexcept;

19 Constraints:

20 Preconditions: other.required_span_size() is representable as a value of type index_type ([basic.fundamental]).

21 Effects: Direct-non-list-initializes extents_ with other.extents().

22 Remarks: The expression inside explicit is equivalent to: ! is_convertible_v<typename LayoutRightPaddedMapping::extents_type, extents_type>.

[Note: Neither mapping uses the padding stride in the rank-0 or rank-1 case, so the padding stride does not affect either the constraints or the preconditions. – end note]

4.1.3 Observers [mdspan.layout.leftpadded.obs]

constexpr array<index_type, extents_type::rank()>
  strides() const noexcept;

1 Returns: array<index_type, extents_type::rank()>({stride(P_rank)...}).

constexpr index_type required_span_size() const noexcept;

2 Returns:

template<class... Indices>
constexpr index_type operator()(Indices... idxs) const noexcept;

3 Constraints:

4 Preconditions: extents_type::index-cast(idxs) is a multidimensional index in extents() ([mdspan.overview]).

5 Returns: ((static_cast<index_type>(idxs) * stride(P_rank)) + ... + 0).
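As a non-normative illustration, the fold expression in this Returns element can be sketched in ordinary C++ by expanding an index pack alongside the argument pack. The names offset and offset_impl, and passing the strides as a std::array, are our assumptions for this sketch.

```cpp
#include <array>
#include <cstddef>
#include <utility>

template<std::size_t Rank, std::size_t... P, class... Indices>
constexpr std::size_t offset_impl(const std::array<std::size_t, Rank>& strides,
                                  std::index_sequence<P...>,
                                  Indices... idxs) {
  // Binary fold: the k-th index is multiplied by strides[k].
  // The trailing "+ 0" makes the expression well-formed for rank 0.
  return ((static_cast<std::size_t>(idxs) * strides[P]) + ... + 0);
}

template<std::size_t Rank, class... Indices>
constexpr std::size_t offset(const std::array<std::size_t, Rank>& strides,
                             Indices... idxs) {
  static_assert(sizeof...(Indices) == Rank, "one index per dimension");
  return offset_impl(strides, std::make_index_sequence<Rank>{}, idxs...);
}
```

For a rank-2 mapping with strides (1, 16), the index (3, 2) maps to 3 * 1 + 2 * 16 = 35.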

static constexpr bool is_always_exhaustive() noexcept;

6 Returns:

constexpr bool is_exhaustive() const noexcept;

7 Returns:

constexpr index_type stride(rank_type r) const noexcept;

8 Preconditions: r is smaller than extents_type::rank().

9 Returns:

template<class LayoutLeftPaddedMapping>
  friend constexpr bool operator==(
    const mapping& x,
    const LayoutLeftPaddedMapping& y) noexcept;

10 Constraints:

11 Returns: true if

4.2 Class template layout_right_padded::mapping [mdspan.layout.rightpadded]

4.2.1 Overview [mdspan.layout.rightpadded.overview]

1 layout_right_padded provides a layout mapping that behaves like layout_right::mapping, except that the padding stride stride(extents_type::rank()-2) can be greater than or equal to extents_type::extent(extents_type::rank()-1).

template<size_t PaddingValue>
template<class Extents>
class layout_right_padded<PaddingValue>::mapping {
public:
  static constexpr size_t padding_value = PaddingValue;

  using extents_type = Extents;
  using index_type = typename extents_type::index_type;
  using size_type = typename extents_type::size_type;
  using rank_type = typename extents_type::rank_type;
  using layout_type = layout_right_padded<PaddingValue>;

private:
  static constexpr size_t rank_ = extents_type::rank(); // exposition only
  static constexpr size_t static-padding-stride = /* see-below */; // exposition only
  static constexpr size_t last-static-extent = // exposition only
    extents_type::static_extent(rank_ - 1);

public:
  // [mdspan.layout.rightpadded.cons], constructors
  constexpr mapping() noexcept
    requires(static-padding-stride != dynamic_extent) = default;
  constexpr mapping() noexcept
    requires(static-padding-stride == dynamic_extent)
      : mapping(extents_type{}) {}
  constexpr mapping(const mapping&) noexcept = default;
  constexpr mapping(const extents_type& ext);
  template<class OtherIndexType>
    constexpr mapping(const extents_type& ext, OtherIndexType pad);

  template<class OtherExtents>
    constexpr explicit(! is_convertible_v<OtherExtents, extents_type>)
      mapping(const layout_right::mapping<OtherExtents>&);
  template<class OtherExtents>
    constexpr explicit(rank_ > 0)
      mapping(const layout_stride::mapping<OtherExtents>&);
  template<class LayoutRightPaddedMapping>
    constexpr explicit( /* see below */ )
      mapping(const LayoutRightPaddedMapping&);
  template<class LayoutLeftPaddedMapping>
    constexpr explicit( /* see below */ )
      mapping(const LayoutLeftPaddedMapping&) noexcept;

  constexpr mapping& operator=(const mapping&) noexcept = default;

  // [mdspan.layout.rightpadded.obs], observers
  constexpr const extents_type& extents() const noexcept { return extents_; }
  constexpr array<index_type, rank_> strides() const noexcept;

  constexpr index_type required_span_size() const noexcept;

  template<class... Indices>
    constexpr index_type operator()(Indices... idxs) const noexcept;

  static constexpr bool is_always_unique() noexcept { return true; }
  static constexpr bool is_always_exhaustive() noexcept;
  static constexpr bool is_always_strided() noexcept { return true; }

  static constexpr bool is_unique() noexcept { return true; }
  constexpr bool is_exhaustive() const noexcept;
  static constexpr bool is_strided() noexcept { return true; }

  constexpr index_type stride(rank_type r) const noexcept;

  template<class LayoutRightPaddedMapping>
    friend constexpr bool operator==(
      const mapping&,
      const LayoutRightPaddedMapping&) noexcept;

private:
  extents<index_type, static-padding-stride> stride-rm2{}; // exposition only
  extents_type extents_{}; // exposition only

  // [mdspan.submdspan.mapping], submdspan mapping specialization
  template<class... SliceSpecifiers>
    constexpr auto submdspan-mapping-impl(                    // exposition only
      SliceSpecifiers... slices) const -> see below;

  template<class... SliceSpecifiers>
    friend constexpr auto submdspan_mapping(
      const mapping& src, SliceSpecifiers... slices) {
        return src.submdspan-mapping-impl(slices...);
  }
};
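As a non-normative illustration of how the padding stride enters the strides of a layout_right_padded-like mapping, the following minimal sketch computes all strides for rank two or more. It assumes that the padding stride is the least multiple of the padding value that is greater than or equal to the rightmost extent; the function name and the use of std::array are hypothetical.

```cpp
#include <array>
#include <cstddef>

// stride(Rank-1) is 1, stride(Rank-2) is the padding stride, and each
// earlier stride is the next stride times the next extent, as in
// layout_right.
template<std::size_t Rank>
constexpr std::array<std::size_t, Rank>
right_padded_strides(const std::array<std::size_t, Rank>& extents,
                     std::size_t padding) {
  static_assert(Rank >= 2, "the padding stride only exists for rank >= 2");
  std::array<std::size_t, Rank> strides{};
  strides[Rank - 1] = 1;
  // Least multiple of padding that is >= the rightmost extent
  // (assumes padding > 0).
  strides[Rank - 2] =
    ((extents[Rank - 1] + padding - 1) / padding) * padding;
  for (std::size_t r = Rank - 2; r > 0; --r) {
    strides[r - 1] = strides[r] * extents[r];
  }
  return strides;
}
```

For extents (3, 4, 5) and a padding value of 8, the strides are (32, 8, 1): only stride(1) differs from what the extent 5 alone would give, and stride(0) follows from it.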

2 Throughout [mdspan.layout.rightpadded], let P_rank be the following size rank_ parameter pack of size_t values: 0, 1, ..., rank_ - 1.

3 Mandates: If

then the least multiple of padding_value that is greater than or equal to last-static-extent is representable as a value of type size_t, and is representable as a value of type index_type.

static constexpr size_t static-padding-stride = /* see-below */; // exposition only

4 The value is

4.2.2 Constructors [mdspan.layout.rightpadded.cons]

constexpr mapping() noexcept
  requires(static-padding-stride == dynamic_extent);

1 Effects: Equivalent to mapping(extents_type{});.

constexpr mapping(const extents_type& ext);

2 Preconditions: If rank_ is greater than one and padding_value does not equal dynamic_extent, then the least multiple of padding_value greater than or equal to ext.extent(rank_ - 1) is representable as a value of type index_type.

3 Effects:

template<class OtherIndexType>
constexpr mapping(const extents_type& ext, OtherIndexType pad);

4 Constraints:

5 Preconditions:

6 Effects:

template<class OtherExtents>
  constexpr explicit(! is_convertible_v<OtherExtents, extents_type>)
    mapping(const layout_right::mapping<OtherExtents>& other);

7 Constraints: is_constructible_v<extents_type, OtherExtents> is true.

8 Mandates: If OtherExtents::rank() > 1, static-padding-stride does not equal dynamic_extent, and OtherExtents::static_extent(rank_ - 1) does not equal dynamic_extent, then static-padding-stride equals OtherExtents::static_extent(rank_ - 1).

9 Preconditions:

10 Effects: Equivalent to mapping(other.extents());

template<class OtherExtents>
  constexpr explicit(rank_ > 0)
    mapping(const layout_stride::mapping<OtherExtents>& other);

11 Constraints: is_constructible_v<extents_type, OtherExtents> is true.

12 Preconditions:

13 Effects:

template<class LayoutRightPaddedMapping>
  constexpr explicit( /* see below */ )
    mapping(const LayoutRightPaddedMapping& other);

14 Constraints:

15 Mandates: padding_value == dynamic_extent || LayoutRightPaddedMapping::padding_value == dynamic_extent || padding_value == LayoutRightPaddedMapping::padding_value is true.

16 Preconditions:

17 Effects:

18 Remarks: The expression inside explicit is equivalent to: rank_ > 1 && (padding_value != dynamic_extent || LayoutRightPaddedMapping::padding_value == dynamic_extent).

template<class LayoutLeftPaddedMapping>
  constexpr explicit( /* see below */ )
    mapping(const LayoutLeftPaddedMapping& other) noexcept;

19 Constraints:

20 Preconditions: other.required_span_size() is representable as a value of type index_type ([basic.fundamental]).

21 Effects: Direct-non-list-initializes extents_ with other.extents().

22 Remarks: The expression inside explicit is equivalent to: ! is_convertible_v<typename LayoutLeftPaddedMapping::extents_type, extents_type>.

[Note: Neither mapping uses the padding stride in the rank-0 or rank-1 case, so the padding stride does not affect either the constraints or the preconditions. – end note]

4.2.3 Observers [mdspan.layout.rightpadded.obs]

constexpr array<index_type, rank_>
  strides() const noexcept;

1 Returns: array<index_type, rank_>({stride(P_rank)...}).

constexpr index_type required_span_size() const noexcept;

2 Returns:

template<class... Indices>
constexpr index_type operator()(Indices... idxs) const noexcept;

3 Constraints:

4 Preconditions: extents_type::index-cast(idxs) is a multidimensional index in extents() ([mdspan.overview]).

5 Returns: ((static_cast<index_type>(idxs) * stride(P_rank)) + ... + 0).

static constexpr bool is_always_exhaustive() noexcept;

6 Returns:

constexpr bool is_exhaustive() const noexcept;

7 Returns:

constexpr index_type stride(rank_type r) const noexcept;

8 Preconditions: r is smaller than rank_.

9 Returns:

template<class LayoutRightPaddedMapping>
  friend constexpr bool operator==(
    const mapping& x,
    const LayoutRightPaddedMapping& y) noexcept;

10 Constraints:

11 Returns: true if

4.3 Layout specializations of submdspan_mapping [mdspan.submdspan.mapping]

Replace Section � [mdspan.submdspan.mapping] (“Layout specializations of submdspan_mapping”), with:

24.7.3.7.6 Specialization of submdspan_mapping [mdspan.submdspan.mapping]

24.7.3.7.6.1 Common [mdspan.submdspan.mapping.common]

1 The following elements apply to all functions in [mdspan.submdspan.mapping].

2 Constraints: sizeof...(slices) equals extents_type::rank().

3 Mandates: For each rank index k of extents(), exactly one of the following is true:

4 Preconditions: For each rank index k of extents(), all of the following are true:

5 Let sub_ext be the result of submdspan_extents(extents(), slices...) and let SubExtents be decltype(sub_ext).

6 Let sub_strides be an array<SubExtents::index_type, SubExtents::rank()> such that for each rank index k of extents() for which map-rank[k] is not dynamic_extent, sub_strides[map-rank[k]] equals:

7 Let P be a parameter pack such that is_same_v<make_index_sequence<rank()>, index_sequence<P...>> is true.

8 Let offset be a value of type size_t equal to (*this)(first_<index_type, P>(slices...)...).

24.7.3.7.6.2 layout_left specialization of submdspan_mapping [mdspan.submdspan.mapping.left]

  template<class Extents>
  template<class... SliceSpecifiers>
  constexpr auto layout_left::mapping<Extents>::submdspan-mapping-impl(    // exposition only
    SliceSpecifiers ... slices) const -> see below;

1 Returns:

24.7.3.7.6.3 layout_right specialization of submdspan_mapping [mdspan.submdspan.mapping.right]

  template<class Extents>
  template<class... SliceSpecifiers>
  constexpr auto layout_right::mapping<Extents>::submdspan-mapping-impl(   // exposition only
    SliceSpecifiers ... slices) const -> see below;

1 Returns:

24.7.3.7.6.4 layout_stride specialization of submdspan_mapping [mdspan.submdspan.mapping.stride]

  template<class Extents>
  template<class... SliceSpecifiers>
  constexpr auto layout_stride::mapping<Extents>::submdspan-mapping-impl(  // exposition only
    SliceSpecifiers ... slices) const -> see below;

1 Returns:

24.7.3.7.6.5 layout_left_padded specialization of submdspan_mapping [mdspan.submdspan.mapping.leftpadded]

  template<class Extents>
  template<class... SliceSpecifiers>
  constexpr auto layout_left_padded::mapping<Extents>::submdspan-mapping-impl(    // exposition only
    SliceSpecifiers ... slices) const -> see below;

1 Returns:

24.7.3.7.6.6 layout_right_padded specialization of submdspan_mapping [mdspan.submdspan.mapping.rightpadded]

  template<class Extents>
  template<class... SliceSpecifiers>
  constexpr auto layout_right_padded::mapping<Extents>::submdspan-mapping-impl(    // exposition only
    SliceSpecifiers ... slices) const -> see below;

1 Returns: