Document number: P0893R0
Date: 2018-01-14
Audience: Evolution Working Group
Reply-To: Barry Revzin <barry.revzin@gmail.com>
Herb Sutter <hsutter@microsoft.com>

Chaining Comparisons

Introduction

The idea of chaining comparisons was first put forth in P0515R0, in section 3.3, reproduced here in its entirety.

C++17 has added fold expressions, which are very useful. However, as Voutilainen and others have reported, fold expressions do not currently work with comparisons. For example:
if (args <...) // not possible with correct semantics in C++17
We can permit two-way comparisons to be chained with the usual pairwise mathematical meaning when the mathematical meaning preserves transitivity (which also always means they have equal precedence). The valid chains are: all ==, such as a == b == c == d; all {<, <=}, such as a < b <= c < d; and all {>, >=}, such as a >= b > c > d.
For example, this:
if (a < b <= c < d)
would be rewritten by the compiler as-if as follows except with single evaluation of b and c:
if ((a < b) && (b <= c) && (c < d)) // but no multiple eval of b and c
To illustrate how the compiler would implement this, here is one valid implementation that would satisfy the requirements including single evaluation, by just defining and invoking a lambda:
if ([](const auto& a, const auto& b, const auto& c, const auto& d)
 { return a<b && b<=c && c<d; } (a,b,c,d))

Chaining support was one alternative suggested by Ville Voutilainen to permit natural use of comparisons in C++17 fold expressions, such as if (args <...). However, chaining is also broadly useful throughout people’s code, so instead of baking the feature into fold expressions only, it’s better to provide general-purpose support that can also express concepts like first <= iter < last. Providing general chaining also enables fold expressions as a special case (and with the “transitive” restriction above avoids the design pitfall of just providing chaining “for all comparison fold expressions,” when they should correctly be supported “for all comparison fold expressions except !=” because != is not transitive).

Without chaining, today we either perform double evaluation or introduce a temporary variable. I’ve many times wanted to write code like 0 <= expr < max without either evaluating expr twice or else having to invent a temporary variable (and usually a new scope) to store the evaluated value. A number of times, I’ve actually written the code without thinking, forgetting it wasn’t supported, and of course it either didn’t compile or did the wrong thing. As an example of “did the wrong thing,” this proposal does change the meaning of some code like the following that is legal today, but that is dubious because it probably doesn’t do what the programmer intended:

int x = 1, y = 3, z = 2;
assert (x < y < z); // today, means “if (true < 2)” – succeeds

In this proposal, the meaning of the condition would be if ((1 < 3) && (3 < 2)) and the assertion will fire. To use Stroustrup’s term, I consider this “code that deserves to be broken;” the change in meaning is probably fixing a bug. (Unless of course we do a code search and find examples that are actually intended.)

Non-chained uses such as (a<b == c<d) keep their existing meaning.
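
The change in meaning described in the quoted section can be checked with today's compilers by spelling out both interpretations by hand. The following is a minimal sketch; today_meaning and chained_meaning are hypothetical helper names introduced only for illustration and are not part of P0515R0 or of this proposal.

```cpp
// Today: x < y < z parses as (x < y) < z, so the bool result of x < y
// is promoted to int and then compared against z.
constexpr bool today_meaning(int x, int y, int z) {
    return static_cast<int>(x < y) < z;
}

// Chained reading: pairwise comparisons joined by &&.
constexpr bool chained_meaning(int x, int y, int z) {
    return (x < y) && (y < z);
}

// The values from the quoted example: x = 1, y = 3, z = 2.
static_assert(today_meaning(1, 3, 2), "true < 2 holds, so the assert passes today");
static_assert(!chained_meaning(1, 3, 2), "3 < 2 is false, so the assert would fire");
```

For values that really are ordered, such as 1, 2, 3, both readings agree, which is part of why such bugs can go unnoticed.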

Existing Code in C++

The first question we sought to answer is the last question implied above: How much code exists today that uses chained comparison whose meaning would change in this proposal, and of those cases, how many were intentional (wanted the current semantics and so would be broken by this proposal) or unintentional (compile today, but are bugs and would be silently fixed by this proposal)? Many instances of the latter can be found in questions on StackOverflow [1] [2] [3] [4] [5] [6] [7] [8] [9] [10] [11] ....

To that end, we created a clang-tidy check for all uses of chained comparison operators, ran it on many open source code bases, and solicited help from the C++ community to run it on their own code bases. The check itself casts an intentionally wide net, matching any instance of a @ b @ c for any of the six comparison operators, regardless of the types of these underlying expressions.

Overall, what we found was:

Finding zero instances in many large code bases where the current behavior is intended means this proposal has low negative danger (not a significant breaking change). However, a converse search shows this proposal has existing demand and high positive value: we searched for expressions that would benefit from chaining if it were available (such as idx >= 0 && idx < max) and found a few thousand instances over just a few code bases. That means that this proposal would allow broad improvements across existing code bases, where linter/tidying tools would be able to suggest rewriting a large number of cases of existing code to be clearer, less brittle, and potentially more efficient (such as suggesting rewriting idx >= 0 && idx < max to 0 <= idx < max, where the former is easy to write incorrectly now or under maintenance, and the latter is both clearer and potentially more efficient because it avoids multiple evaluation of idx). It also adds strong justification to pursuing this proposal, because the data show the feature is already needed and its lack is frequently being worked around today by forcing programmers to write more brittle code that is easier to write incorrectly.

Existing Code in Python

While we have no experience with this feature in C++, Python has always supported chaining comparisons:

Unlike C, all comparison operations in Python have the same priority, which is lower than that of any arithmetic, shifting or bitwise operation. Also unlike C, expressions like a < b < c have the interpretation that is conventional in mathematics [...]

Comparisons can be chained arbitrarily, e.g., x < y <= z is equivalent to x < y and y <= z, except that y is evaluated only once (but in both cases z is not evaluated at all when x < y is found to be false).

Formally, if a, b, c, …, y, z are expressions and op1, op2, …, opN are comparison operators, then a op1 b op2 c ... y opN z is equivalent to a op1 b and b op2 c and ... y opN z, except that each expression is evaluated at most once.

Note that a op1 b op2 c doesn’t imply any kind of comparison between a and c, so that, e.g., x < y > z is perfectly legal (though perhaps not pretty).

The result is the ability to write natural comparison chains, without having to pairwise break them up with ands.
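
The two properties the Python documentation calls out, single evaluation of the middle operand and short-circuiting of the last one, can be modeled in C++17 by passing each operand as a callable. This is only a sketch of the semantics, not an implementation strategy: chain_lt_le, Counts, and run are hypothetical names, and the callables stand in for unevaluated operand expressions.

```cpp
// Hypothetical helper: evaluates x() < y() <= z() with Python's chaining
// rules. Each operand is a callable so that evaluation can be deferred.
template <typename X, typename Y, typename Z>
constexpr bool chain_lt_le(X&& x, Y&& y, Z&& z) {
    const auto xv = x();
    const auto yv = y();             // y is evaluated exactly once
    if (!(xv < yv)) return false;    // short-circuit: z is never evaluated
    return yv <= z();
}

struct Counts {
    int y_evals;
    int z_evals;
    bool result;
};

// Runs the chain on three ints and records how often y and z were evaluated.
constexpr Counts run(int x, int y, int z) {
    int y_evals = 0;
    int z_evals = 0;
    const bool r = chain_lt_le([&] { return x; },
                               [&] { ++y_evals; return y; },
                               [&] { ++z_evals; return z; });
    return Counts{y_evals, z_evals, r};
}

static_assert(run(1, 2, 2).result, "1 < 2 <= 2 holds");
static_assert(run(1, 2, 2).y_evals == 1, "y is evaluated once, not twice");
static_assert(!run(3, 2, 5).result, "3 < 2 fails");
static_assert(run(3, 2, 5).z_evals == 0, "z is skipped when x < y is false");
```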

However, as the Python documentation itself points out, C++ has higher precedence for the operators {>, >=, <, <=} than for the operators {==, !=}. As a result, the expression a < b == c < d today is parsed as the possibly-meaningful (a < b) == (c < d), and not the likely meaningless ((a < b) == c) < d. To interpret it as Python does would involve changing the underlying grammar of C++ and break such code (though we did not find any instances of this kind of mixed comparison, i.e. a < b == c, in our search).

Issues at Hand

There are several questions that need to be answered about how comparison chaining would work in C++.

Regardless of the choice of options, it should be noted that parentheses are significant here. Operator chaining would only apply to unparenthesized expressions. Adding parentheses would be one way of expressing intent. This is the same way that Python behaves today, where 5 > 4 > 3 evaluates to True (due to its evaluation as 5 > 4 and 4 > 3) while (5 > 4) > 3 evaluates as False (due to its evaluation as True > 3). If those situations arise where a programmer deliberately wants an unchained comparison, that is available to them with the use of parentheses.

We will take each option separately.

Which operators can chain?

We would prefer to see this apply to all operators, built-in and overloaded. This is different from && and ||, which change behavior when overloaded because then they don't short-circuit. However, there are many user-defined types for which comparison chaining would have desirable, well-defined behavior (e.g. std::pair).

Which expressions can chain?

Why do we need a restriction at all? If we decide to only allow for chaining of builtin-operators, then this question is effectively moot. But once we get into the realm of overloaded operators, there are instances of chaining comparisons on objects where the behavior is decidedly not related to comparisons. Examples include Boost.MultiArray:

range r6 = -3 <= range().stride(2) < 7; // not intended to be a chained comparison 
or Boost.Process:
bp::spawn(
    master_test_suite().argv[1],
    "test", "--prefix-once", "test",
    bp::std_in  < in_buf > fut_in, // not a chained comparison
    bp::std_out > fut,
    io_service,
    ec
);
or Boost.Spirit:
rule<char const*> r;
r = '(' > int_ > ',' > int_ > ')'; // not a chained comparison
or even the less obvious Catch2:
std::vector<int> v;
REQUIRE(v.size() == 0); // macro expands to Catch::Decomposer() <= v.size() == 0, not a chained comparison

Simply stating that all comparison chains get transformed into pairwise comparisons &&-ed together would definitely break code. We cannot cast a net that wide.

The simplest approach would be just to accept strictly boolean sub-expressions as candidates. That is, the expression a @ b @ c is transformed into a @ b && b @ c only if both a @ b and b @ c have type cv bool. This would allow the most typical expected usage of range checking on arithmetic types or equality checking amongst many objects, while also avoiding changing the meanings of any of the above examples. If we allow overloaded operators as well, and those overloaded operators return bool (as is typical, and as would be implicitly generated if using the new operator<=>), then this would already allow for a wide variety of uses.
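
The strict rule can be approximated today as a type trait, which makes it easy to see which expressions would and would not be candidates. The sketch below checks only operator< and assumes the operator exists for the given types; less_result_t, chains_as_bool, and RangeExpr are hypothetical names, with RangeExpr standing in for a DSL type whose comparisons build expression objects.

```cpp
#include <type_traits>
#include <utility>

// Type of the expression l < r, with cv-qualifiers and references removed.
template <typename L, typename R>
using less_result_t = std::remove_cv_t<std::remove_reference_t<
    decltype(std::declval<L>() < std::declval<R>())>>;

// True when both a < b and b < c have type (cv) bool, i.e. when
// a < b < c would be a chaining candidate under the strict rule.
template <typename A, typename B, typename C>
constexpr bool chains_as_bool =
    std::is_same<less_result_t<A, B>, bool>::value &&
    std::is_same<less_result_t<B, C>, bool>::value;

// Hypothetical DSL type: comparisons return another expression object.
// Declarations suffice because they are used only in unevaluated contexts.
struct RangeExpr {};
RangeExpr operator<(int, RangeExpr);
RangeExpr operator<(RangeExpr, int);

static_assert(chains_as_bool<int, long, double>, "arithmetic range checks qualify");
static_assert(!chains_as_bool<int, RangeExpr, int>, "DSL expressions keep their meaning");
```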

However, some existing code has comparison operators that, rather than returning bool, return std::true_type or std::false_type. Such return types are common in metaprogramming libraries, where the result is encoded in the type of the returned object instead of just its value. These types satisfy the Boolean concept without being strictly bool, so it seems safe to include them, and metaprogramming code could benefit from the improved readability as well.

For overloaded comparison operators that do not return a Boolean type, chaining can still be supported but just is not automatic: it is the responsibility of the overloaded operator author to make chaining work correctly for their comparison if that is what they want. We observe that these already exist, where overloaded operators like the Boost.MultiArray example already implement a flavor of chaining behavior even in the absence of precedents in the language.
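
As a rough illustration of what author-provided chaining can look like, the sketch below has the comparison return a small object that remembers the right-hand operand so the next comparison can continue from it. Val and Chain are hypothetical types invented for this example; this is not how Boost.MultiArray is implemented. Because the operands are ordinary function arguments, all of them are evaluated, so this approach gives pairwise meaning but not short-circuit evaluation.

```cpp
// Hypothetical value type of some library.
struct Val {
    int v;
};

// Result of a comparison: carries the last operand and the running result.
struct Chain {
    int last;
    bool ok;
    constexpr explicit operator bool() const { return ok; }
};

// First link of a chain: a < b.
constexpr Chain operator<(Val a, Val b) {
    return Chain{b.v, a.v < b.v};
}

// Later links: (a < b) < c continues from the remembered operand b.
constexpr Chain operator<(Chain c, Val b) {
    return Chain{b.v, c.ok && c.last < b.v};
}

static_assert(static_cast<bool>(Val{1} < Val{2} < Val{3}), "1 < 2 and 2 < 3");
static_assert(!static_cast<bool>(Val{1} < Val{3} < Val{2}), "3 < 2 fails");
```

Since Chain is not a Boolean type, such an expression would not be rewritten under the rule discussed above and would keep working as its author intended.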

Which operator sequences can chain?

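In its original presentation in P0515R0, only a specific subset of comparison operator sequences leads to chaining. Those operator sequences were precisely those that maintain transitivity: all ==, such as a == b == c == d; all {<, <=}, such as a < b <= c < d; and all {>, >=}, such as a >= b > c > d.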

The ability to chain these operator sequences offers clear improvement to readability in real-world code, including major commercial projects:
Source: clang
Today:
return Success((CR_r == APFloat::cmpEqual &&
    CR_i == APFloat::cmpEqual), E);
Proposed:
return Success((CR_r == CR_i == APFloat::cmpEqual), E);

Source: LLVM.Demangle
Today:
} else if ('1' <= first[1] && first[1] <= '9') {
Proposed:
} else if ('1' <= first[1] <= '9') {

Source: Boost.Numeric
Today:
return x.upper() >= y && y >= x.lower();
Proposed:
return x.upper() >= y >= x.lower();

Source: Boost.Regex
Today:
if(sub < (int)m_subs.size() && (sub > 0))
Proposed:
if(0 < sub < (int)m_subs.size())

The Python language, on the other hand, has no such restrictions. Any comparison operator sequence chains, but this appears to permit mainly pitfalls, not new good uses. In particular, it allows for some reasonable-appearing chains like a < b == c < d, but also allows some less likely chains like a < b > c and a != b != c, which are known pitfalls: the Python documentation has to emphasize that these do not actually imply any relationship between a and c. We believe that further investigation of C++ code bases and of languages like Python supports the position that all of the chains initially recommended in P0515R0 are useful and should be supported, and that all of the chains not recommended in P0515R0 are either not useful or actively harmful, and so should not be interpreted as chained (any code that writes such chains almost certainly will get something unintended).

We were able to find several expressions of the unrestricted variety that might theoretically be shortened by chaining, but (a) the following rewrites could never actually be made to work without changing the precedence of == and != with respect to <, <=, >, and >=, which would be an impossibly large breaking change to consider, and (b) even if we did that, the resulting code is not actually better. In our opinion, it is visually ambiguous and unclear in all cases:
Source: Boost.Math
Today:
if((floor(z) == z) && (z < max_factorial::value))
Python-like chaining (not proposed):
if((floor(z) == z < max_factorial::value))

Source: LLVM.Transforms
Today:
if (ObjectSize == Later.Size &&
    ObjectSize >= Earlier.Size)
Python-like chaining (not proposed):
if (Later.Size == ObjectSize >= Earlier.Size)

Source: LLVM.Support
Today:
assert(count != 0 &&
    count <= APFloatBase::integerPartWidth / 4);
Python-like chaining (not proposed):
assert(0 != count <=
    APFloatBase::integerPartWidth / 4);

Source: LLVM.CodeGen
Today:
assert((LCM >= A && LCM >= B)
    && "LCM overflow");
Python-like chaining (not proposed):
assert((A <= LCM >= B) && "LCM overflow");

Source: Boost.Intrusive
Today:
if(n != p && i != p)
Python-like chaining (not proposed):
if(n != p != i)

What about fold expressions?

Today, all six of the comparison operators are valid binary operators to use in a fold expression, but the expansion rules always produce parenthesized sub-expressions. That is:

template <typename... Ts>
bool ordered(Ts... vals) {
    return (... <= vals);
}

ordered(4, 3, 2, 1); // instantiated as ((4 <= 3) <= 2) <= 1, evaluates to true, even with this proposal

As mentioned earlier, parentheses are significant, so having fold expressions expand as parenthesized would continue to inhibit comparison chaining. This makes today's fold expressions for comparisons not useful and actually buggy. We therefore propose that fold expressions for the comparison operators instantiate as unparenthesized, to make them both useful and correct:

template <typename... Ts>
bool ordered(Ts... vals) {
    return (... <= vals);
}

ordered(4, 3, 2, 1); // proposed: instantiated as 4 <= 3 <= 2 <= 1, evaluates to false

The implications of this change are that left and right folds are exactly equivalent for the comparison operators, and there would be no way to get the current behavior. The alternative would be that fold expressions on the comparison operators continue to have questionable use.
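
The gap between the two instantiations can be observed in C++17 today by comparing the existing fold with a hand-written pairwise check. In this sketch, fold_today and ordered_pairwise are hypothetical names; ordered_pairwise is one possible workaround that mimics the proposed result by converting all arguments to their common type, which is an assumption made to keep the example short and is not something the proposal requires.

```cpp
#include <type_traits>

// Today's unary left fold: instantiates as ((v0 <= v1) <= v2) <= ...
template <typename... Ts>
constexpr bool fold_today(Ts... vals) {
    return (... <= vals);
}

// Hand-written pairwise check: v0 <= v1 && v1 <= v2 && ...
template <typename T, typename... Ts>
constexpr bool ordered_pairwise(T first, Ts... rest) {
    using C = std::common_type_t<T, Ts...>;
    C prev = first;
    bool ok = true;
    // Comma fold: compare each element with its predecessor, left to right.
    ((ok = ok && (prev <= static_cast<C>(rest)), prev = static_cast<C>(rest)), ...);
    return ok;
}

static_assert(fold_today(4, 3, 2, 1), "((4 <= 3) <= 2) <= 1 is true today");
static_assert(!ordered_pairwise(4, 3, 2, 1), "4 <= 3 fails pairwise");
static_assert(ordered_pairwise(1, 2, 2, 3), "a genuinely ordered sequence");
```

Some compilers warn about the fold in fold_today, which is consistent with the observation that its current meaning is rarely what was intended.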

Proposal

We propose that expressions of the form a @ b @ c @ ... @ n, where each @ is one of the six comparison operators (whether builtin or overloaded), be evaluated as (a @ b) && (b @ c) && ... && (m @ n) with each expression being evaluated at most once, under the following circumstances:

Any potential chaining candidate that does not meet these restrictions would retain its current meaning (including expressions such as a != b != c). This allows all the DSLs currently existing to continue working properly, since the results of each sub-expression would not model Boolean. Each of the examples presented earlier would continue to have the same behavior they do today. No code would be broken.

We additionally propose that fold expressions over the comparison operators be instantiated as unparenthesized to allow them to be interpreted under these new chaining rules. This rule would apply to both unary and binary folds, and both left and right folds. While (xs != ...) is likely not a meaningful expression (due to != not being transitive) we do not propose making this fold ill-formed at this time.

Optional Extension

With or without this proposal, the only sequence that in our opinion does appear to be nearly always a bug is a sequence of all != (e.g., a != b != c). The proposal, as presented above, neither recommends nor blocks a change to a sequence of !=, though we do suggest that lint tools and style guides should recommend against successive unparenthesized !=.
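
A small check illustrates why neither reading of a != b != c is likely to match what the programmer meant. The three helpers below are hypothetical names written out by hand for illustration: today_ne is the current parse, pairwise_ne is the pairwise reading, and all_distinct is what such code most plausibly intends.

```cpp
// Current parse: (a != b) != c, comparing a bool result against c.
constexpr bool today_ne(int a, int b, int c) {
    return static_cast<int>(a != b) != c;
}

// Pairwise reading: says nothing about a versus c.
constexpr bool pairwise_ne(int a, int b, int c) {
    return (a != b) && (b != c);
}

// Likely intent: all three values differ.
constexpr bool all_distinct(int a, int b, int c) {
    return a != b && b != c && a != c;
}

// With a = 1, b = 2, c = 1 the pairwise reading holds although a == c.
static_assert(pairwise_ne(1, 2, 1), "1 != 2 and 2 != 1");
static_assert(!all_distinct(1, 2, 1), "but a and c are equal");
// The current parse compares true (1) with c, which fails here.
static_assert(!today_ne(1, 2, 1), "(1 != 2) != 1 is false today");
```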

A possible extension would be to take those expressions of the form a != b != ... != z, which consist of at least three "operands" and where each pairwise comparison has a type that models Boolean, and make them ill-formed. This would, as a consequence, make Boolean-like folds over != ill-formed unless the underlying pack was of size 1 or 2 (the former of which is not even a comparison). It would make sense at that point to suggest removing != as a fold operator entirely.

Acknowledgements

Thanks to T.C. for help in properly specifying the search for chained comparisons. Thanks to Nicolas Lesser and Titus Winters for contributing data for live use of comparisons.