Document #: ISO/IEC/JTC1/SC22/WG21/P2717R5

Date: 2023-11-10

Audience: SG15

Authors: René Ferdinand Rivera Morell

Reply-to: grafikrobot@gmail.com

Copyright: Copyright 2022-2023 René Ferdinand Rivera Morell, Creative Commons Attribution 4.0 International License (CC BY 4.0)

1. Abstract

We propose to add a mechanism for C++ tools to communicate what capabilities a tool implements from the Ecosystem IS. [1]

2. Revision History

2.1. Revision 5 (November 2023)

Change capability name delimiter to . (FULL STOP) per merge poll. And merge wording into draft Ecosystem IS. [1]

2.2. Revision 4 (November 2023)

Add explanation for backward compatibility in regards to what it means for tools.

2.3. Revision 3 (October 2023)

Added poll results from June 2023 WG21 meeting.

Adds the use of an "introspection file" as an alternate way to provide the introspection information for situations where it is not desirable or possible to execute a program for the information.

Adds support for two option syntax styles to widen the set of applications that can support implementing this specification without having them change their option syntax parsing. The result is that we now support all major C++ compilers and many more other ecosystem tools.

Adopt the semantic versioning specification to both define the syntax and semantics of compatibility for capabilities.

2.4. Revision 2 (June 2023)

Changed wording to match text in draft IS. Use Unicode for literals as needed. Changed the version numbers to follow the Semantic Versioning and JSON specifications in disallowing leading zeros. Add array of version ranges to full level support to allow for disjoint support reporting. Updated the JSON schema to correct it not properly excluding extra unknown properties, to correctly restrict the syntax of capability names and values, and to add array of version ranges for capability fields. Change capability names to allow for numbers. Add exposition on command option handling and specification files. Add some possible user interface alternatives for obtaining the specification JSON text. And describe the currently preferred choice.

2.5. Revision 1 (May 2023)

Addition of scope, functionality levels, use cases, and wording. Simplified introspection and declaration interfaces to make implementing introspection trivial and declaration straightforward. The simplification removes the bounded introspection interface as superfluous.

2.6. Revision 0 (December 2022)

Initial text.

3. Motivation

C++ tools will implement the aspects of the Ecosystem IS [1] that are relevant to the particular tool. And when they implement those aspects they may implement a particular edition of them. In order to allow other tools to adjust their behavior to accommodate such differences we need a mechanism of introspection for all tools. Additionally when one tool requests to use another tool’s Ecosystem IS aspect it’s desirable to consistently communicate which edition(s) of that aspect it can use.

4. Scope

This proposal aims to specify a method for tools to communicate to consumers (either other tools or users) which specific aspects of the Ecosystem IS they support and adhere to. It does not prescribe which aspects of the Ecosystem IS the tools must support or adhere to, except to prescribe that a tool supporting any capability of the Ecosystem IS must also support this aspect. Ultimately it wants to make it possible to address two cases:

  • What does the tool support and adhere to?

  • The tool should adhere to what the consumer asks if possible.

5. Design

There are two aspects that this proposal covers:

Introspection

A tool reporting its capabilities to a consumer.

Declaration

A consumer specifying the capability edition and version.

Introspection would allow a consumer to ask the target tool what versions of capabilities it supports. The target tool would respond with the range of capabilities, or nothing, that it supports. With that information the consumer can go ahead and follow the defined standard, in the Ecosystem IS [1], to further interact with the target tool.

For declaration a consumer can specify a particular capability and a version to interact with. And if the target tool recognizes the specification it can continue to process the consumer’s use of that capability.

Even though these are two separate functions they are by necessity tied to each other. In order for this pairing to work, and generally for tool interoperability to work, the tool consumers and target tools must operate on this minimal pair of functions to bootstrap their interactions. To make that possible, this design follows some basic tenets:

Minimal

The interface of the target tool is a single universal command line argument for each of the two operations.

Concise

The information communicated to and from the target tool and consumer is as brief as needed to convey the required information.

Robust

The interface and information should not result in failure conditions for either the consumer or target tool. Both ends of the interactions need to rely on the stability of the interface to then be able to interoperate.

5.1. Introspection

We used to include a bounded introspection option. But it turned out to be not worth the added complexity in the consumer and tool.

The consumer can use a single method to query the target tool and obtain all the capabilities that are available or specifically requested. The use case supported is for unbounded introspection of the available capabilities with a single valueless --std-info option.

And unbounded introspection simply returns everything the tool is capable of doing. The tool has the option to respond with either all minimal single (aka bare) versions or full version ranges. Either can be trivially implemented by tools as most of the time it can be a hard-wired response text.

Running a tool with the option would look like the following:

$ tool --std-info

And could produce this as a minimal JSON output to indicate the single version of the capabilities it supports:

{
  "$schema": "https://raw.githubusercontent.com/cplusplus/ecosystem-is/release/schema/std_info-1.0.0.json",
  "std.info": "1.0.0"
}

Or could produce this as a JSON output in the case of full version ranges:

{
  "$schema": "https://raw.githubusercontent.com/cplusplus/ecosystem-is/release/schema/std_info-1.0.0.json",
  "std.info": "[1,2.5]"
}

Which would minimally indicate that the tool only supports the introspection capability at versions "1.0.0" through "2.5.0".
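As a rough illustration of the consumer side, the following Python sketch runs a tool with the valueless --std-info option and splits the reply into the schema reference and the capability map. It assumes the tool prints the JSON on standard output and exits successfully; the function names parse_std_info and query_std_info are hypothetical and not part of the proposal.

```python
import json
import subprocess


def parse_std_info(text):
    """Split introspection JSON into (schema reference, capability map)."""
    document = json.loads(text)
    if not isinstance(document, dict):
        raise ValueError("introspection output must be a JSON object")
    schema = document.get("$schema")
    # Every member other than "$schema" is a capability / version pair.
    capabilities = {k: v for k, v in document.items() if k != "$schema"}
    return schema, capabilities


def query_std_info(tool):
    """Run `tool --std-info`; assumes the JSON arrives on standard output."""
    result = subprocess.run(
        [tool, "--std-info"], capture_output=True, text=True, check=True
    )
    return parse_std_info(result.stdout)
```

A consumer would call query_std_info("tool") once and keep the returned capability map for the rest of its interaction with that tool.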

Per the findings of the User Interface research and the consensus of the SG15: P2717R2 (2023-06-16) polling, a tool can additionally provide the introspection information in a file accompanying the tool. There is a challenge when providing such an introspection file though: It is not practical to specify an absolute location, or locations, across the variety of operating systems and tools in the C++ ecosystem. As such we provide some possibilities:

  1. The name of an introspection file will be the name of the top level invoked tool executable (or script, or equivalent) with any type extension (e.g. “.exe”) removed if it exists. That base name will be appended with the .stdinfo text. For example: cl.exe → cl.stdinfo, or g++ → g++.stdinfo.

  2. The introspection file can be found: in the same filesystem location as the tool executable (or script, or equivalent), in an implementation defined location relative to the tool location, or in an implementation defined global location (i.e. an absolute path location).
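The naming rule can be sketched in a few lines of Python, so that cl.exe maps to cl.stdinfo and g++ maps to g++.stdinfo. The set of extensions treated as "type extensions" below is an assumption made for illustration (the text only names ".exe"), and both function names are hypothetical. Only the sibling location is computed; the implementation defined locations are left out.

```python
from pathlib import PurePath

# Assumed set of "type extensions" to strip; the proposal only names ".exe".
TYPE_EXTENSIONS = {".exe", ".bat", ".cmd", ".sh"}


def stdinfo_name(tool_path):
    """Return the introspection file name for the invoked tool."""
    name = PurePath(tool_path).name
    suffix = PurePath(name).suffix
    # Strip the suffix only when it is a recognized type extension, so that
    # names such as "clang-15.0" keep their version number.
    if suffix.lower() in TYPE_EXTENSIONS:
        name = name[: -len(suffix)]
    return name + ".stdinfo"


def sibling_stdinfo_path(tool_path):
    """Candidate location next to the tool executable."""
    return PurePath(tool_path).with_name(stdinfo_name(tool_path))
```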

The choice of the .stdinfo extension is subject to a naming discussion.

5.2. Declaration

The consumer can inform, i.e. declare, to the target tool that specific capabilities should use particular versions when responding with information using one or more --std-info=<VersionSpec> options. The declarations can only exist in tandem with options for the mentioned capabilities. It’s expected that a consumer will first introspect a target tool to discover what it supports. Followed by the consumer declaring to the target tool what version(s) of the capabilities it is willing to consume. The target tool can then either accept the declared capability versions or indicate an error.

An exchange between a consumer and target tool would begin with the introspection:

tool "--std-info"

With a target tool response:

{
  "$schema": "https://raw.githubusercontent.com/cplusplus/ecosystem-is/release/schema/std_info-1.0.0.json",
  "std.info": "[1,2)",
  "gcc.extra": "2.1"
}

Which the consumer can use to declare the specific capability versions:

tool "--std-decl=std.info=1.0.0" "--std-decl=gcc.extra=2.1.0" ...

5.3. Levels

For some use cases it helps to simplify the extent of information the introspection understands. While it would be reasonable to expect a tool written in a modern general purpose programming language to fully implement all aspects of the introspection, it would not be practical to have a shell script handle the more challenging aspects of parsing version number ranges and matching them together. To support such use cases the introspection has to support levels "min" and "full".

Obviously the "full" level equates to the tool understanding all the arguments and values. The "min" level only understands these:

  • Only introspection --std-info option.

  • Single version number in the responses for --std-info.

This has the effect that a tool which only supports the "min" level can only support specific versions of the capabilities it implements. But it also means that consumers will need to adjust their behavior to the tool instead of being able to ask the tool to adjust to the consumer. Consequently the consumer will likely have the more complex logic to do that adjustment.

5.4. Capabilities

For this proposal capabilities refers to any published coherent target tool interface. This can include any single interface, like a single target tool option. Or it can include a collective interface of the target tool that covers many options. A capability is specified as a series of "scoped" identifiers separated by full stops ("."). The capability must match this regular expression: [2]

^[a-z0-9_]+([.][a-z0-9_]+)+$
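A minimal Python sketch of checking a capability name follows. It assumes the "." (FULL STOP) separator adopted in Revision 5 and used by the std.info examples throughout; the helper names is_capability and is_standard are hypothetical.

```python
import re

# Capability name pattern, assuming the "." separator from Revision 5.
CAPABILITY = re.compile(r"^[a-z0-9_]+([.][a-z0-9_]+)+$")


def is_capability(name):
    """True when the name has at least two lowercase scoped components."""
    return CAPABILITY.fullmatch(name) is not None


def is_standard(name):
    """True when the first component is the IS scope, i.e. std."""
    return is_capability(name) and name.split(".")[0] == "std"
```

Names such as gcc.extra or b2.feature pass is_capability but not is_standard, which matches the vendor scope described next.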

At minimum a capability has two components. The first component is a general scope that identifies if the capability is one in the IS, or if it’s a tool vendor capability.

Standard

A capability with a scope of std indicates that it’s defined in the IS. [1]

Vendor

Any other capability, i.e. other than std, is available for vendors to use as extensions outside the IS. [1]

There was a question on "Why not allow 0-9 in the name?". Considering this brings up the question as to the utility of having numbers in the name. An obvious use case is to add versioning to the name, for example std2. That is a case we want to avoid, as it bypasses the version numbers themselves, which subverts the spirit of the introspection. Another use case is to cover vendor specific names for tools that use a number in their names, for example b2, build2. Because that is a currently existing use case, and because forcing such applications that want custom capabilities to create alternate names has various drawbacks, yes, we should accept numbers in the names.

5.5. Version Specification

When indicating the version, or versions, to the target tool or the consumer the version information is specified in two possible forms: a single version, or a single version range.

5.5.1. Semantic Versioning

We use the base (pre-release and build labels are not allowed) specification of Semantic Versioning 2.0.0 [3] to define the syntax and semantics of compatibility.

We define a tool (producer or consumer) to be backward compatible, for semantic versioning, with another tool (consumer or producer) when the consumer that implements an older version of the API can operate, with the same semantics, when interacting with a producer that implements a newer version of the API, and vice versa.

For example: A producer generates JSON structured data. In a newer, compatible, version it may decide to introduce a new field. If such a field can be ignored by the consumer such that ignoring it does not change the operational semantics of the consumer, the API would be considered backward compatible. And hence could be indicated with a MINOR or PATCH version difference per semantic versioning.

The specifics of how the API behaves to achieve backward compatible changes is up to the individual specification of the capabilities. As the ability to be backward compatible varies with the specifics of many factors, like tool options, data formats, and so on.

5.5.2. Single Version

A single version in this proposal is composed of one to three dotted whole numbers. The numbers are expected to be strictly increasing. Following SemVer [3] a change to the MAJOR version indicates a backward incompatible change. And changes to the MINOR and PATCH versions indicate backward compatible changes. The format for the version must match the regular expression: [2]

^[0-9]+([.][0-9]+){0,2}$
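For illustration, a single version can be read into a three-number tuple in Python as sketched below. Padding missing components with zero follows the reading used earlier in this paper, where "[1,2.5]" stands for versions 1.0.0 through 2.5.0; the name parse_version is hypothetical.

```python
import re

VERSION = re.compile(r"^[0-9]+([.][0-9]+){0,2}$")


def parse_version(text):
    """Parse "1", "1.2" or "1.2.3" into a (major, minor, patch) tuple."""
    if VERSION.fullmatch(text) is None:
        raise ValueError(f"not a single version: {text!r}")
    parts = [int(part) for part in text.split(".")]
    # Missing trailing components are taken as zero.
    parts += [0] * (3 - len(parts))
    return tuple(parts)
```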

5.5.3. Version Range

A version range in this proposal indicates a lower and upper bound of versions. It is composed of a pair of versions, separated by a comma, and bracketed by either an inclusive or exclusive symbol. This matches the intuition of a mathematical interval, but with the use of the version triplet number line. [4] Like the interval notation the () brackets indicate an exclusive point. And the [] brackets indicate an inclusive point. As versions are decidedly not single integers we use a , (comma) to separate the start and end of the range instead of using ... Hence the format for the version range must match the regular expression: [2]

^[[(][0-9]+([.][0-9]+){0,2},[0-9]+([.][0-9]+){0,2}[)\]]$
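A small Python sketch of reading a version range into its two bounds and their inclusiveness is shown here. The tuple layout (low, low_inclusive, high, high_inclusive) and the helper names are choices made for this illustration only, and the pattern is written with Python's escaping for the square brackets.

```python
import re

# Same shape as the expression above, with capture groups for each part.
RANGE = re.compile(
    r"^([\[(])([0-9]+(?:[.][0-9]+){0,2}),([0-9]+(?:[.][0-9]+){0,2})([)\]])$"
)


def to_triplet(text):
    """Pad a one to three component version with zeros."""
    parts = [int(part) for part in text.split(".")]
    return tuple(parts + [0] * (3 - len(parts)))


def parse_range(text):
    """Return (low, low_inclusive, high, high_inclusive) for a range."""
    match = RANGE.fullmatch(text)
    if match is None:
        raise ValueError(f"not a version range: {text!r}")
    opening, low, high, closing = match.groups()
    return (to_triplet(low), opening == "[", to_triplet(high), closing == "]")
```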

5.5.4. Multiple Ranges

There are situations where specifying only one version range for what the application supports is not sufficient. For example an application may decide that they add support for a 2.0.0 version but not support further 1.x.y versions. In that case it’s important to be precise in informing consumers of this fact. To allow for that situation one can specify a JSON array instead of the single JSON string for the version range. For example:

{
  "$schema": "https://raw.githubusercontent.com/cplusplus/ecosystem-is/release/schema/std_info-1.0.0.json",
  "std.info": "[1.0.0,2.0.0)",
  "gcc.extra": [
    "1.0.0",
    "[2,3)"
  ]
}

Would indicate that gcc.extra supports version 1.0.0, and versions 2.0.0 up to but not including 3.0.0.
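To show how a consumer might test a version against such a value, the Python sketch below accepts either a single JSON string or an array of strings, as in the gcc.extra member above. It assumes that a bare version in the list matches only that exact version; the function names are hypothetical.

```python
import re

RANGE = re.compile(
    r"^([\[(])([0-9]+(?:[.][0-9]+){0,2}),([0-9]+(?:[.][0-9]+){0,2})([)\]])$"
)


def triplet(text):
    """Pad a one to three component version with zeros."""
    parts = [int(part) for part in text.split(".")]
    return tuple(parts + [0] * (3 - len(parts)))


def in_entry(entry, version):
    """Check one entry, which is either a bare version or a range."""
    match = RANGE.fullmatch(entry)
    if match is None:
        # Assumption: a bare version matches only that exact version.
        return triplet(entry) == version
    opening, low, high, closing = match.groups()
    low, high = triplet(low), triplet(high)
    above = version >= low if opening == "[" else version > low
    below = version <= high if closing == "]" else version < high
    return above and below


def supports(spec, version_text):
    """spec is the JSON value: one string or an array of strings."""
    entries = spec if isinstance(spec, list) else [spec]
    version = triplet(version_text)
    return any(in_entry(entry, version) for entry in entries)
```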

5.6. Version Matching

When given two version specifications tools will need to match the two to determine the sub-range that is compatible with both. There are two aspects to doing that matching: comparing the two single versions, and evaluating the sub-range interval.

5.6.1. Single Version Comparison

Comparing two single versions equates to three-way comparing each of the components of both, a and b, as:

  1. If the whole numbers of the first components, i and j, are not equal the comparison is either a < b or a > b if i < j or i > j respectively. Otherwise,

  2. If the whole numbers of the second components, k and l, are not equal the comparison is either a < b or a > b if k < l or k > l respectively. Otherwise,

  3. If the whole numbers of the third components, m and n, are not equal the comparison is either a < b or a > b if m < n or m > n respectively. Otherwise,

  4. The versions are equal, i.e. a == b.
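The four steps above can be sketched in Python as a three-way comparison over version triplets. The result convention (-1, 0, 1) and the helper names are choices made for this illustration.

```python
def triplet(text):
    """Pad a one to three component version with zeros."""
    parts = [int(part) for part in text.split(".")]
    return tuple(parts + [0] * (3 - len(parts)))


def compare_versions(a, b):
    """Return -1 if a < b, 1 if a > b, and 0 if the versions are equal."""
    # Walk the components in order; the first difference decides.
    for x, y in zip(a, b):
        if x != y:
            return -1 if x < y else 1
    return 0
```

Components are compared as whole numbers, so 1.2 orders before 1.10, which a plain string comparison would get wrong.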

5.6.2. Range Comparison

Tools will need to compare either a single version to a version range, or a version range to another range to determine the overlapping version sub-range. The single version to a version range comparison can be reformulated to a range-to-range comparison. I.e. a comparison of a single version a to a range b is equivalent to a comparison of range [a,a] to range b. Hence we only need to consider the range-to-range comparison. Although implementations may use special cases for comparing single-to-range and range-to-single. Range-to-range should follow something like the following to compare a range a,b to m,n, with some varied inclusive or exclusive ends:

  1. If b < m or n < a the range is empty.

  2. Otherwise, assign a partial range x,y = max(a,m), min(b,n).

  3. If a or m are inclusive, then:

    1. If b or n are inclusive, then the range is [x,y].

    2. Otherwise, the range is [x,y).

  4. Otherwise, if b or n are inclusive, then the range is (x,y].

  5. Otherwise, the range is (x,y).
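One possible reading of the range-to-range comparison is sketched below in Python. It is a variant of the numbered steps rather than a literal transcription: the inclusive or exclusive flag is carried along with whichever bound is selected, so that a version excluded by either input range stays excluded from the result. The tuple layout (low, low_inclusive, high, high_inclusive) and the name intersect are assumptions of this sketch.

```python
def intersect(r1, r2):
    """Overlap of two ranges, or None when they share no version.

    Each range is (low, low_inclusive, high, high_inclusive), with the
    bounds given as (major, minor, patch) tuples.
    """
    a, a_inc, b, b_inc = r1
    m, m_inc, n, n_inc = r2
    # Lower bound: the larger start. On a tie it is inclusive only if
    # both ranges include it.
    if a > m:
        x, x_inc = a, a_inc
    elif m > a:
        x, x_inc = m, m_inc
    else:
        x, x_inc = a, a_inc and m_inc
    # Upper bound: the smaller end, with the same tie rule.
    if b < n:
        y, y_inc = b, b_inc
    elif n < b:
        y, y_inc = n, n_inc
    else:
        y, y_inc = b, b_inc and n_inc
    # Empty when the bounds cross, or meet at an excluded point.
    if x > y or (x == y and not (x_inc and y_inc)):
        return None
    return (x, x_inc, y, y_inc)
```

A single version a is passed as the range (a, True, a, True), matching the [a,a] reformulation above.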

5.7. Format

The information reported by introspection is a JSON [5] format document. Some advantages to using JSON:

  • It is widely used and available either natively or through libraries in many programming languages. Which is particularly important as C++ tools are written in an array of differing programming languages.

  • It is a simple format to understand by both programs and humans.

In maintaining our goals of the interface being minimal, concise, and robust, the format for communicating the capabilities is a single key/value collection, i.e. a JSON object. [5]

Capability Identifier

The key is a string with the capability identifier. The format of the key is as described in the Capabilities section.

Version Specification

The value indicates the versions supported by the tool for the capability. The versions follow the format described in the Version Specification section.

In addition to the capability identifier / version specification members, there are additional special members:

Schema

The document can also specify a reference to a JSON Schema. [6] For this the key would be $schema, and the value would be a URI to a published stable schema (https://raw.githubusercontent.com/cplusplus/ecosystem-is/release/schema/std_info-1.0.0.json).

There is one designated capability that is required to appear in the document: The std.info capability with a corresponding version specification. This requirement allows a consumer to identify the format of the rest of the document at all times.

This is a minimal conforming document:

{
  "std.info": "1.0.0"
}

This is also a minimal conforming document. But specifies a range of versions supported for the std.info capability:

{
  "std.info": "[1.0.0,2.0.0)"
}

This example adds a custom vendor capability and the schema reference:

{
  "$schema": "https://raw.githubusercontent.com/cplusplus/ecosystem-is/release/schema/std_info-1.0.0.json",
  "std.info": "[1.0.0,2.0.0)",
  "gcc.extra": "1.5.0"
}

See the Wording for a JSON Schema for this format.
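As a lightweight illustration, and not a substitute for the JSON Schema, the Python sketch below checks the rules stated in this section: the document is a JSON object, std.info is present, keys are capability identifiers, and values are a version, a range, or an array of those. It assumes the "." separator for capability names; check_document is a hypothetical name.

```python
import json
import re

CAPABILITY = re.compile(r"^[a-z0-9_]+([.][a-z0-9_]+)+$")
VERSION = r"[0-9]+([.][0-9]+){0,2}"
# A specification entry is a bare version or a bracketed pair of versions.
SPEC = re.compile("^(" + VERSION + r"|[\[(]" + VERSION + "," + VERSION + r"[)\]])$")


def check_document(text):
    """Return a list of problems found; an empty list means none."""
    document = json.loads(text)
    if not isinstance(document, dict):
        return ["document is not a JSON object"]
    problems = []
    if "std.info" not in document:
        problems.append("missing required std.info capability")
    for key, value in document.items():
        if key == "$schema":
            if not isinstance(value, str):
                problems.append("$schema is not a string")
            continue
        if CAPABILITY.fullmatch(key) is None:
            problems.append(f"bad capability name: {key}")
        entries = value if isinstance(value, list) else [value]
        if not entries or not all(
            isinstance(entry, str) and SPEC.fullmatch(entry) for entry in entries
        ):
            problems.append(f"bad version specification for {key}")
    return problems
```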

5.8. Capability Versions

Capabilities and their versions are expected to work similarly to C++ feature-test macro versions ([version.syn]), in that they specify whether a feature of a standard is implemented and at what version. Although the meaning of the capability version is not defined, it’s recommended that it follow some simple rules:

  • The major-number should only change for large changes.

  • The minor-number should only change for fixes that are significant, but not large.

  • The patch-number should only change for fixes that are simple and small.

That is, it follows the industry understanding of semantic versioning. [3]

  • Each part of the version number should always increment, but;

  • The minor-number should reset to zero when the major-number increases, equivalently for the patch-number and minor-number.

These rules set it apart from the C++ feature macros in that they impart some meaning to a version relative to other versions.

5.9. User Interface

This proposal currently suggests adding some application command line (CLI) options as the user interface for obtaining the introspection information. In particular adding --std-info=X and --std-info-out=X options for any conforming tool. Some compiler vendors expressed some concerns regarding this choice:

  • Launching the application to get this information can be expensive, particularly in "performance sensitive scenarios".

  • It increases the binary size of applications. Which can impact deployment time in some environments, like continuous integration.

One alternative to adding command line options, in this case, and as suggested, is to have an external fixed file with the content. This alternative hinges on being able to find that file through some reasonably stable method.

We explore the pros and cons of both choices herein. Note, as this feature has not yet been implemented the analysis below is an informed best guess.

First some assumptions:

  1. We are only going to consider the logic for adding the minimal conforming interface and introspection information result. I.e. minimum level (intspct.min) functionality.

  2. We will make some best effort prospective optimizations to an expected implementation. I.e. try to think of minimal code and data that reuses existing functionality in an application.

It is important to understand in these implementation considerations that tools can be both an application and consumer in this. Where an application is a tool producing the introspection information. And a consumer ingests that information. But either can be a compiler driver, linker, assembler, analyzer, build system, package manager, IDE, and so on. For example a package manager will invoke a build system for introspection. But also a build system will invoke a package manager for introspection.

5.9.1. Command Line Options

Adding command line options to an application is a well known practice that has a long history. As such it’s relatively easy to estimate its impact.

(A) Application: Size of the introspection string in the application "binary".

The absolute minimum level conforming introspection string in this would be {"std.info":"1"}. But that’s not particularly useful as we would expect some other items represented. Being generous we can make a guess of having 10 items: {"std.info":"1","std.first":"1","std.second":"1","std.third":"1","std.fourth":"1","std.fifth":"1","std.sixth":"1","std.seventh":"1","std.eighth":"1","std.ninth":"1"}. Which gives a total of 165 UTF-8 code points, or the same byte count, plus a null terminator. We can round that up to 200 bytes total.
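The size estimate can be reproduced with a few lines of Python that build the same ten-item document in compact form and measure its UTF-8 encoding. The capability names are the placeholder ones from the text above.

```python
import json

names = ["info", "first", "second", "third", "fourth",
         "fifth", "sixth", "seventh", "eighth", "ninth"]

# Compact separators give the same text as the hard-wired string above.
document = {f"std.{name}": "1" for name in names}
text = json.dumps(document, separators=(",", ":"))

size = len(text.encode("utf-8"))
print(size)      # 165
print(size + 1)  # 166, including a null terminator
```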

(B) Application: Additional code to handle the options.

This cost is harder to estimate as the collection of application implementations is varied in both method and programming languages. For this we can roughly estimate an implementation difficulty for some of the most used tools in the C++ ecosystem. Below is a survey of the difficulty of adding various command line option syntax in three categories, compiler drivers, build systems, and package managers:

Tool (platforms) | Current syntax | Difficulty of adding --opt=val, -opt=val, --opt:val, -opt:val

Compiler Driver

cl.exe (Windows, macOS) | /opt:val, -opt:val | unknown (note 1)

clang (many) | -opt val, -opt=val, --opt=val | Easy; use Joined<["--"], "foo:"> in clang/include/clang/Driver/Options.td (note 2)

gcc (many) | -opt val, -opt=val, --opt=val | Trivial; just use : instead of = to spell the option in *.opt (note 3)

Build System

CMake (many) | -opt val, -opt=val, --opt val, --opt=val | Easy; add test for ':' in cmCommandLineArgument.h (note 4)

MSBuild (many) | /opt:val, -opt:val | unknown (note 1)

Ninja (many) | -opt val, --opt=val | Very Hard; requires changing getopt_long (note 5)

QMake (many) | -opt val | Medium; it’s custom C++ (note 6)

GNU Make (many) | -opt val, --opt=val | Very Hard; requires changing getopt_long (note 5)

autotools (Unix-like) | -opt val | Very Hard

Gradle (Java) | -opt val, -opt=val, --opt val, --opt=val | Easy; it’s a single custom parser: CommandLineParser.java (note 7)

Bazel (Unix, macOS, Windows) | --opt=val, --opt val, -opt val | Very Hard; mostly Starlark code.

nmake (Windows) | /opt val, -opt val | Easy; it’s a simple C arg parser.

Meson (Python) | -opt=val, --opt=val, --opt val | Hard; uses Python argparse (note 8)

SCons (Python) | -o val, --opt=val, --opt val | Hard; uses Python argparse (note 8)

B2 (Boost Build) | -oval, -o val, --opt=val, --opt val | Medium; custom C code, conflicts with -oval. / Easy; uses Jam+regex matching. / Medium; custom C code, conflicts with -oval.

Package Manager

Conan (Python) | -opt=val, --opt=val, --opt val | Hard; uses Python argparse (note 8)

vcpkg (Many) | --opt=val | Medium; custom C++ code.

NuGet (Many) | -opt val | unknown (note 1)

Hunter (CMake) | -Dopt=val | Impossible; it’s written in CMake.

Spack (Unix, macOS) | -opt val, --opt val | Easy; may already be supported from use of Python argparse. / Hard; uses Python argparse (note 8)

Build2 (Many) | -opt val, --opt val | Hard; seems to use a custom language and compiler for argument definition and parsing.

  1. Unable to estimate as it’s closed source.

  2. llvm-project has a few utilities that use LLVMOption to parse command line options. See fdOpts.td.

  3. Would prefer not to depart from existing POSIX conventions.

  4. https://github.com/Kitware/CMake/blob/master/Source/cmCommandLineArgument.h#L102

  5. Uses getopt_long in gnulib/lib/getopt.c. Which has a global effect on the ecosystem of tools that use getopt_long across many systems.

  6. https://github.com/qt/qtbase/blob/55aee8697512af105dfefabc1e2ec41d4df1e45e/qmake/option.cpp#L173

  7. https://github.com/gradle/gradle/blob/master/subprojects/cli/src/main/java/org/gradle/cli/CommandLineParser.java

  8. Choosing to change the Python argparse as a solution for this results in a global effect on all Python programs that use argparse and would prevent backward compatibility.

Of the above set of possible option syntaxes and within the set of applications the most widely accepted option syntax is the --opt=val variation. Hence, it currently appears, that the least cost avenue is to use the --opt=val syntax globally for the Ecosystem IS.

Although the cost of using --opt=val varies across the range of applications in aggregate we can estimate the cost as "medium". As most applications already support this option syntax. And it’s possible for some other applications to add limited support for this syntax.

(C) Consumer: Executing the application.

The cost of executing an application comes in different parts:

  1. There’s the basic cost of the execution itself, which varies between environments. But is a well known cost and easy to account for.

  2. There’s the cost of, at best, one more execution of the application to gather the introspection information.

5.9.2. Specification File

Having an additional specification file can support some additional use cases that using command line options can’t. The idea for this alternative is to have the JSON information in a file that is easily findable by consumers. Some possible locations are: as a specially named sibling to the application, in some standard location in the system with a special file name, manually specified by the user (for example through an environment variable or other consumer specific configuration). There are a couple of differing costs involved in having introspection files:

(A) Application: Deployment of extra file with application "binary".

Most applications already deploy extra files that support the main application. Hence adding another file is of negligible cost. Where the file is located is a concern. As finding a single consistent location for such a file across many environments is very difficult, at best, or impossible at worst. For example, while it’s natural to have a sibling to the executable information file on Windows, it’s not usual on Unix when installing to the system directories (e.g. /bin).

An aspect of having the extra file is both the extra on-disk storage and time to install the file. For many uses this is not a concern. But there are classes of cases where the install is done repeatedly as would be seen in CI testing systems that require fresh installs. This is a concern regardless of where the data lives though. As it’s the same data if it’s an extra file or embedded in the application.

(B) Consumer: Deployment of extra file with application "binary".

A common method of distributing computation, especially C++ compiles, is to transport the tools from one machine to many, for example Incredibuild. The cost of transporting this extra file is minimal though. As the data is small, as shown above, and such systems are already dealing with transporting and caching such information.

(C) Consumer: Additional code to find the application "binary".

If the extra file is available from some location relative to the application consumers will need to implement search methods to first find the application before attempting to find the extra file. This search can be challenging for a variety of reasons like: needing to interpret PATH searching (in the case of not having an absolute file path), accounting for following symbolic links (or equivalents), avoiding user permission restrictions, and so on. The difficulty of this will also differ based on the utilities available in the language the application is written in and what the system provides.

(D) Consumer: Additional code to find the introspection file.

Assuming we have a path to the application, per above, and/or that we have known locations it is relatively straightforward to find a specially named extra file. But the more choices one has to account for the more implementation there is that can run into problems. Additionally tools like Incredibuild would need to learn about the extra file and consumers might need to use special logic to account for both the usual location of the file and the transported file location.

5.9.3. Alternatives

Given all that we can try and evaluate some alternative user interface possibilities. Note, that these are not exhaustive. But they are, currently, the most likely to work in the widest set of use cases.

Single Option Style
--std-info=X, --std-info-file=X

Pros:

  • Low implementation cost.

  • Uniform handling for consumers.

Cons:

  • Some applications will need to implement a new option style.

  • Running the application may not be possible by the consumer.

Two Option Styles
--std-info=X, --std-info-file=X and/or -std-info:X, -std-info-file:X

Pros:

  • Low implementation cost.

  • Limited set of option handling for consumers.

  • Avoids changing Microsoft tools option handling.

Cons:

  • Adds an extra check, and context, for consumers.

  • Running the application may not be possible by the consumer.

Implementation Defined Option Style
(i.e. current status quo)

Pros:

  • Low implementation cost.

  • No changes to option handling for producers.

Cons:

  • Adds extra checks, and contexts, for consumers.

  • Running the application may not be possible by the consumer.

Specification File

Pros:

  • Avoids cost of adding options for producers.

  • Allows use when the application can’t be executed.

Cons:

  • Adds complexity of finding the file for consumers.

  • Adds cost of transporting file along with the application where needed.

Specification File and "Two Option Styles"

Pros:

  • Low implementation cost.

  • Limited set of option handling for consumers.

  • Avoids changing Microsoft tools option handling.

  • Allows use when the application can’t be executed.

Cons:

  • Some applications will need to implement a new option style.

As we can see, no alternative is a perfect choice. But hopefully we can see that the last one, Specification File and "Two Option Styles" is the most advantageous. But what is it? Other than the obvious of mashing the Specification File and "Two Option Style" alternatives together. The characteristics and requirements would be:

  1. A producer would be required to implement one or both of the two option styles: --opt=val or -opt:val.

  2. A producer would be required to indicate an error for an option style it does not accept.

  3. A producer could implement the std-info-file request as they wish, including reading from a file, reading from internal fixed text, dynamically generating the information, or any other method it deems appropriate.

  4. A consumer that wants to execute the producer directly would be required to try both the --opt=val and -opt:val styles in an order of its choosing to find the style that works for the producer.

  5. A consumer can save the produced information, using the std-info-file option, or other method of its choosing to a file that it can read directly afterwards.

  6. A consumer that does not want to execute the producer directly can use a previously saved information file.

  7. A consumer that does not want to execute the producer directly is required to search a small, defined set of locations, either relative to the producer or absolute, for a specially named file.

The key differences from the previous specification of only the Single Option Style alternative are:

  • The addition of the -opt:val style.

  • Item (4) on consumers to try both option styles.

  • Item (7) specifying some search location for the information file.

That combination of features and requirements avoids most of the problems one can encounter without creating additional ones.
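
Items (1), (2), and (4) can be illustrated from the consumer side with a short Python sketch. It is not part of the proposal. The names `probe_std_info` and `run_tool` are hypothetical, and the sketch assumes that a producer indicates an error through a non-zero exit code and writes the JSON text to standard output on success.

```python
import json
import subprocess

# The two option styles; the order to try them in is the consumer's choice.
STYLES = ("--std-info", "-std-info")


def run_tool(argv):
    """Run argv and return (exit_code, stdout_text)."""
    done = subprocess.run(argv, capture_output=True, text=True)
    return done.returncode, done.stdout


def probe_std_info(app, run=run_tool):
    """Try each option style until the producer accepts one.

    Returns (option, capabilities), or None when neither style works.
    """
    for option in STYLES:
        code, text = run([app, option])
        if code != 0:
            continue  # the producer indicated an error for this style
        try:
            return option, json.loads(text)
        except json.JSONDecodeError:
            continue  # no usable JSON text; try the other style
    return None
```

The `run` parameter exists only so that the probing logic can be exercised without a real tool. A consumer would likely cache the returned option style together with the capabilities.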

5.10. Impact On The Standard

This specification adds new functionality that is partly required for programs. Other specifications that define program behavior will need to follow this specification for conformance to the Ecosystem IS.

6. Implementation Experience

None yet.

7. Polls

7.1. SG15: P2717R0 (2023-01-27)

SG15 wants to pursue defining in the Tooling IS a way for tools to provide portable information about which parts of the Tooling IS and vendor extensions they support.

SF  F  N  A  SA
5   3  0  0  0

7.2. SG15: P2717R2 (2023-06-16)

Require a specification file and a procedure how it is found.

SF  F  N  A  SA
1   6  4  1  0

Result: Consensus

Require tools to support at least one of the two styles of command line options (if the tool can be run at all).

SF  F  N  A  SA
4   5  1  2  0

Result: Consensus

The capabilities documented by the IS should be versioned in a way that supports backwards incompatible changes.

SF  F  N  A  SA
4   6  1  0  0

Result: Consensus

7.3. SG15: P2717R4 (2023-11-09)

Change the capability identifier separator from : to . and merge P2717 into the draft Tooling Ecosystem IS.

SF  F  N  A  SA
7   4  1  0  0

8. Wording

Wording is relative to ecosystem-is/5c7ecf7. [7]

8.1. Normative references

In [intro.refs] add:

POSIX

ISO/IEC 9945:2009, Information technology — Portable Operating System Interface (POSIX®) Base Specifications, Issue 7

JSON

ISO/IEC 21778:2017, Information technology — The JSON data interchange syntax

Mathematics

ISO 80000-2:2019, Quantities and units — Part 2: Mathematics

SemVer

The SemVer Team. Semantic Versioning 2.0.0. June 18 2013. Available at: https://semver.org/spec/v2.0.0.html

8.2. Specification: Conformance

Insert clause before Terms and definitions [intro.defs].

8.2.1. Conformance [cnf]

A conforming implementation shall meet the following criteria for conformance to this standard:

— An application shall support the minimum level functionality of introspection (intspct.min).

8.3. Definitions

Add the following to Terms and definitions [intro.defs].

8.3.1. application [defns.application]

a computer program that performs some desired function.

NOTE 1: From POSIX.

8.3.2. capability [defns.capability]

an aspect of an overall specification that defines a subset of the entire specification.

8.3.3. directory [defns.directory]

a file that contains directory entries.

NOTE 1: From POSIX.

8.3.4. directory entry [defns.direntry]

an object that associates a filename with a file.

NOTE 1: From POSIX.

8.3.5. file [defns.file]

an object that can be written to, or read from, or both.

NOTE 1: From POSIX.

8.3.6. filename [defns.filename]

a sequence of bytes used to name a file.

NOTE 1: From POSIX.

8.3.7. parent directory [defns.parentdir]

a directory containing a directory entry for the file under discussion.

NOTE 1: From POSIX.

8.3.8. pathname [defns.pathname]

a string that is used to identify a file.

NOTE 1: From POSIX.

8.4. Specification: Introspection

Insert clause after Terms and definitions [intro.defs].

8.4.1. Introspection [intspct]

8.4.1.1. Preamble [intspct.pre]

This clause describes options, output, and formats that describe what capabilities of this standard an application supports. An application shall support the minimum level functionality (intspct.min). An application can support the full level functionality (intspct.full).

This clause specifies the std.info capability (intspct.cap).

8.4.1.2. Overview [intspct.overview]

application [ std-info-opt [declaration] ] [ std-info-out-opt file ]

8.4.1.3. Options [intspct.options]

Applications shall accept one of two option syntax variations: --name=value (--name without a value) or -name:value (-name without a value).

An application shall indicate an error if invoked with an option syntax variation that it does not support.

NOTE 1: An application will report the error in the manner that is conventional for the platform it runs on. On POSIX and Windows it would return an error code, and optionally output to the error stream.

NOTE 2: It is up to a program that interacts with an application implementing introspection to determine what option syntax variation the application supports. One method to accomplish that is to execute the application with one of the two syntax styles and use the error indication to conclude which syntax works. Another is to have a priori knowledge of which syntax variation works.

8.4.1.4. Information Option [intspct.opt.info]

This option shall be supported.

std-info-opt

Outputs the version information of the capabilities supported by the application. The option is specified as --std-info or -std-info. The option can be specified zero or one time. The application shall support the option for minimum level (intspct.min) functionality.

8.4.1.5. Information Output Option [intspct.opt.out]

This option shall be supported.

std-info-out-opt file

The pathname of a file to output the information to. The option is specified as --std-info-out=file or -std-info-out:file. If file is ‘-’, the standard output shall be used. The application shall support the option for minimum level (intspct.min) functionality. Not specifying this option while specifying the std-info-opt option (intspct.opt.info) shall be equivalent to also specifying a std-info-out-opt file option where file is ‘-’.

8.4.1.6. Declaration Option [intspct.opt.decl]

This option should be supported.

std-info-opt declaration

Declares the required capability version of the application. The option is specified as --std-info=declaration or -std-info:declaration. The option can be specified any number of times. The application shall support the option for full level (intspct.full) functionality.

8.4.1.7. Output [intspct.output]

An application shall output a valid JSON text file that conforms to the introspection schema (intspct.schema) to the file specified in the options (intspct.opt.out).

8.4.1.8. Files [intspct.file]

An application can provide an introspection file that contains valid JSON that conforms to the introspection schema (intspct.schema).

An introspection file shall contain the same information as that produced from the std-info-opt information option (intspct.opt.info).

An introspection file shall be named the same as the application with any filename extension replaced with the stdinfo filename extension. It is implementation defined how the filename of the introspection file replaces the application filename extension with the new stdinfo filename extension.

NOTE 1: For Windows, POSIX, and other platforms replacing the filename extension would remove any filename bytes after the last period (U+002E FULL STOP) and append the stdinfo sequence of bytes.

An introspection file shall either have the same parent directory as the application, have an implementation defined parent directory that is relative to the parent directory of the application, or have an implementation defined parent directory.
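
As an informal illustration (not proposed wording), the filename replacement described in NOTE 1 could be sketched in Python as follows. The function name `introspection_filename` is hypothetical, and appending `.stdinfo` to a filename that has no extension is an assumption of this sketch, since the wording leaves the replacement implementation defined.

```python
import os


def introspection_filename(app_pathname):
    """Replace the filename extension of app_pathname with 'stdinfo'."""
    directory, name = os.path.split(app_pathname)
    # Split at the last period (U+002E FULL STOP) of the filename.
    stem, period, _ = name.rpartition(".")
    if not period or not stem:
        # Assumption: with no extension to replace, append one instead.
        stem = name
    return os.path.join(directory, stem + ".stdinfo")
```

The sketch keeps the introspection file in the same parent directory as the application, which is only one of the three placements the wording allows.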

8.4.1.9. Schema [intspct.schema]

An introspection JSON text file shall contain one introspection JSON object (intspct.schema.obj).

8.4.1.9.1. Introspection Object [intspct.schema.obj]

The introspection object is the root JSON object of the introspection JSON text.

An introspection object can have the following fields.

8.4.1.9.2. JSON Schema Field [intspct.schema.schema]

Name: $schema
Type: string
Value: The value shall be a reference to a JSON Schema specification.
Description: An introspection object can contain this field. If an introspection object does not contain this field the value shall be a reference to the JSON Schema corresponding to the current edition of this standard.

8.4.1.9.3. Capability Field [intspct.schema.cap]

Name: capability-identifier (intspct.cap)
Type: string or array
Value (for string): The value shall be a version-number for minimum level functionality, or a version-range for full level functionality.
Value (for array): The value can be a JSON array for full level functionality. If the value is a JSON array the items in the array shall be a version-number or version-range.
Description: An introspection object can contain this field one or more times. When the field appears more than one time the names of the fields shall be unique within the introspection object.

8.4.1.10. Capabilities [intspct.cap]
capability-identifier:

name scope-designator name sub-capability-identifier opt

sub-capability-identifier:

scope-designator name sub-capability-identifier opt

name:

one or more of:
U+0061 .. U+007A LATIN SMALL LETTER A .. Z
U+0030 .. U+0039 DIGIT ZERO .. NINE
U+005F LOW LINE

scope-designator:

U+002E FULL STOP

A capability-identifier is composed of two or more scope-designator delimited name parts.

The name std in a capability-identifier is reserved for capabilities defined in this standard.

Applications can specify vendor designated name parts defined outside of this standard.
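
As an informal illustration (not proposed wording), the capability-identifier grammar can be checked with a regular expression. The sketch below assumes the scope-designator is U+002E FULL STOP, as adopted in Revision 5 and used in the std.info examples. The name `is_capability_identifier` is hypothetical.

```python
import re

# name: one or more of a-z, 0-9, and LOW LINE.
NAME = r"[a-z0-9_]+"
# Two or more name parts delimited by the scope-designator (assumed FULL STOP).
CAPABILITY_ID = re.compile(rf"{NAME}(?:\.{NAME})+")


def is_capability_identifier(text):
    """True when text matches the capability-identifier grammar."""
    return CAPABILITY_ID.fullmatch(text) is not None
```

For instance, `is_capability_identifier("std.info")` holds, while a lone `"std"` is rejected because at least two name parts are needed.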

8.4.1.11. Versions [intspct.vers]

A version shall be either a single version number (intspct.vers.num) or a version range (intspct.vers.range).

A single version number shall be equivalent to the inclusive version range spanning solely that single version number.

NOTE 1: That is the version number i.j.k is equivalent to version range [i.j.k,i.j.k].

8.4.1.11.1. Version Number [intspct.vers.num]

A version number shall conform to the SemVer <version core> syntax.

A version number can be truncated to only <major> or <major>.<minor> syntax.

A version number composed of only <major> is equivalent to <major>.0.0.

A version number composed of only <major>.<minor> is equivalent to <major>.<minor>.0.

Version numbers define a total ordering where version number a is ordered before a version number b when a has a lower SemVer precedence than b.
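
As an informal illustration (not proposed wording), truncated version numbers can be normalized to a three-part tuple whose natural ordering matches SemVer precedence for the version core. The name `parse_version` is hypothetical.

```python
import re

# SemVer numeric identifier: no leading zeros.
_NUM = r"(0|[1-9][0-9]*)"
VERSION_NUMBER = re.compile(rf"{_NUM}(?:\.{_NUM})?(?:\.{_NUM})?")


def parse_version(text):
    """Return (major, minor, patch), filling truncated parts with 0."""
    match = VERSION_NUMBER.fullmatch(text)
    if match is None:
        raise ValueError(f"not a version number: {text!r}")
    # Missing <minor> or <patch> groups are None and become 0.
    return tuple(int(part or 0) for part in match.groups())
```

Python compares tuples element by element, so `parse_version("1.10") > parse_version("1.9.3")` follows the numeric ordering rather than a textual one.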

8.4.1.11.2. Version Range [intspct.vers.range]
version-range:

version-range-min-bracket version-min-number version-range-max-partopt version-range-max-bracket

version-range-max-part:

U+002C COMMA version-max-number

version-min-number:

version-number

version-max-number:

version-number

version-range-min-bracket

one of: U+005B LEFT SQUARE BRACKET U+0028 LEFT PARENTHESIS

version-range-max-bracket

one of: U+005D RIGHT SQUARE BRACKET U+0029 RIGHT PARENTHESIS

A version range is composed of either one version number bracketed, or two version numbers separated by a U+002C COMMA and bracketed.

EXAMPLE 1: A version range with a single version number “[1.0.0]”.

EXAMPLE 2: A version range with two version numbers “[1.0.0,2.0.0]”.

A version range a that is [i,j] makes i and j inclusive version range numbers, defining a Mathematics closed interval.

A version range a that is (i,j) makes i and j exclusive version range numbers, defining a Mathematics open interval.

A version range a that is (i,j] makes i an exclusive version number and j an inclusive version number, defining a Mathematics half-open interval.

A version range a that is [i,j) makes j an exclusive version number and i an inclusive version number, defining a Mathematics half-open interval.

A version range with a single inclusive version number x is equivalent to the version range [x,x].

A version range with a single exclusive version number x is invalid.

An exclusive version number x does not include the version number x when compared to another version number y.
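
As an informal illustration (not proposed wording), a version range can be parsed and tested for membership of a version number as sketched below. The names `parse_range` and `contains` are hypothetical. Rejecting a single version number with mixed brackets, such as `[1)`, is an assumption of this sketch.

```python
import re

_NUM = r"(?:0|[1-9][0-9]*)"
_VERSION = rf"{_NUM}(?:\.{_NUM}){{0,2}}"
_RANGE = re.compile(rf"([\[(])({_VERSION})(?:,({_VERSION}))?([\])])")


def _version(text):
    """Convert a possibly truncated version number to (major, minor, patch)."""
    parts = [int(p) for p in text.split(".")]
    return tuple(parts + [0] * (3 - len(parts)))


def parse_range(text):
    """Return (low, low_inclusive, high, high_inclusive) for a version-range."""
    match = _RANGE.fullmatch(text)
    if match is None:
        raise ValueError(f"not a version range: {text!r}")
    open_bracket, low, high, close_bracket = match.groups()
    low_inclusive = open_bracket == "["
    high_inclusive = close_bracket == "]"
    if high is None:
        # A single version number x is accepted only as [x], meaning [x,x].
        if not (low_inclusive and high_inclusive):
            raise ValueError(f"single exclusive version number: {text!r}")
        high = low
    return _version(low), low_inclusive, _version(high), high_inclusive


def contains(range_text, version_text):
    """True when the version number lies inside the version range."""
    low, low_inclusive, high, high_inclusive = parse_range(range_text)
    v = _version(version_text)
    above = v >= low if low_inclusive else v > low
    below = v <= high if high_inclusive else v < high
    return above and below
```

With these definitions `contains("[1.0.0,2.0.0)", "1.5")` holds, while `contains("[1.0.0,2.0.0)", "2.0.0")` does not because the upper version number is exclusive.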

A version range a with version numbers i and j when compared to a version range b with version number m and n will result in an empty version range when: j < m or n < i.

Otherwise if i or m are inclusive version numbers and if j or n are inclusive version numbers the resulting range when a is compared to b is the inclusive version numbers "lesser of i and m" and "lesser of j and n".

Otherwise if i or m are inclusive version numbers the resulting range when a is compared to b is the inclusive version number "lesser of i and m" and the exclusive version number "lesser of j and n".

Otherwise if j or n are inclusive version numbers the resulting range when a is compared to b is the exclusive version number "lesser of i and m" and the inclusive version number "lesser of j and n".

Otherwise the resulting range when a is compared to b is the exclusive version numbers "lesser of i and m" and "lesser of j and n".

8.4.1.12. Minimum Level [intspct.min]

An application that supports the minimum level functionality indicates it by specifying a single version (intspct.vers.num) as the value of the std.info capability (intspct.cap).

EXAMPLE 1:
{ "std.info": "1.0.0" }

8.4.1.13. Full Level [intspct.full]

An application can support the full level functionality as defined in this section. An application that reports supporting the full level functionality shall support all of the functionality in this section.

An application that supports the full level functionality indicates it by specifying a version range (intspct.vers.range) or an array of version range items as the value of the std.info capability (intspct.cap).

EXAMPLE 1:
{ "std.info": "[1.0.0]" }

An application that responds with an array of version range items as the value of a capability field shall support the union of the range items indicated.
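
As an informal illustration (not proposed wording), a consumer could tell the two levels apart by inspecting the form of the std.info value. The name `std_info_level` is hypothetical, and the sketch assumes the input is already valid introspection JSON text.

```python
import json


def std_info_level(introspection_text):
    """Classify the std.info value as 'minimum' or 'full' level."""
    value = json.loads(introspection_text)["std.info"]
    if isinstance(value, list):
        return "full"  # an array of version range items
    if value[:1] in ("[", "("):
        return "full"  # a bracketed version range
    return "minimum"  # a bare version number
```

Applied to the two examples in this subclause and the previous one, `{ "std.info": "1.0.0" }` is classified as minimum level and `{ "std.info": "[1.0.0]" }` as full level.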

8.4.1.14. Introspection Information [intspct.info]

An application shall output an introspection schema (intspct.schema) that contains one capability field for each capability that the application supports when given the std-info-opt option (intspct.opt.info).

An application shall indicate the single version (intspct.vers.num) or version range (intspct.vers.range) of each capability it supports as the value of the capability field.

8.4.1.15. Introspection Declaration [intspct.dcl]

An application that supports the full level functionality when given one or more std-info-opt declaration options shall conform its functionality to the indicated edition of this standard in the given declaration version-number for the given capability.

declaration:

capability-identifier U+003D EQUALS SIGN version-number

An application, when not given a std-info-opt declaration option for a capability it supports, should conform its functionality to the most recent version of the standard it supports for that capability.

An application, when given a capability declaration option and the given version is outside of the version range that the application supports, should indicate an error.
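
As an informal illustration (not proposed wording), an application could split a declaration option into its capability and version parts as sketched below. The name `parse_declaration_option` is hypothetical, and the sketch does not validate the two parts against their grammars.

```python
def parse_declaration_option(argument):
    """Split a std-info-opt declaration into (capability, version), or None."""
    for prefix in ("--std-info=", "-std-info:"):
        if argument.startswith(prefix):
            declaration = argument[len(prefix):]
            # declaration: capability-identifier U+003D EQUALS SIGN version-number
            capability, equals, version = declaration.partition("=")
            if equals and capability and version:
                return capability, version
    return None
```

An application supporting the full level would then check the returned version against the range it supports for that capability and indicate an error when it falls outside.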

8.4.1.16. Compatibility [intspct.compat]

An application shall indicate, per SemVer specification, that version n of the interface it implements is backward compatible with another version p of the interface that another application implements when the <major> number is the same in version n and p and version n follows version p.
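
As an informal illustration (not proposed wording), the backward compatibility condition can be sketched as a comparison of version numbers. The name `backward_compatible` is hypothetical, and reading "follows" as strictly greater precedence is an assumption of this sketch.

```python
def _version(text):
    """Convert a possibly truncated version number to (major, minor, patch)."""
    parts = [int(p) for p in text.split(".")]
    return tuple(parts + [0] * (3 - len(parts)))


def backward_compatible(n, p):
    """True when version n has the same <major> as p and follows p."""
    n, p = _version(n), _version(p)
    return n[0] == p[0] and n > p
```

For example, `backward_compatible("1.2", "1.1.5")` holds, while `backward_compatible("2.0.0", "1.9")` does not because the <major> numbers differ.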

8.5. JSON Schema

Insert clause before Bibliography.

8.5.1. Annex A (informative) Tool Introspection JSON Schema [intsjschm]

8.5.1.1. General [intsjschm.general]

This Annex defines the introspection capability schema (intspct.schema) in terms of a JSON Schema. A JSON Schema refers to the IETF RFC draft "JSON Schema: A Media Type for Describing JSON Documents" as specified in https://json-schema.org/draft/2020-12/json-schema-core.html.

This JSON Schema can be referenced as the $schema field with URI value of "https://raw.githubusercontent.com/cplusplus/ecosystem-is/release/schema/std_info-1.0.0.json".

8.5.1.2. JSON Schema Specification [intsjschm.spec]
{
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "$id":
    "https://raw.githubusercontent.com/cplusplus/ecosystem-is/release/schema/std_info-1.0.0.json",
  "title": "Tool Introspection Version 1.0.0 JSON Schema",
  "$defs": {
    "VersionMin": {
      "type": "string",
      "pattern": "^[0-9]+([.][0-9]+){0,2}$"
    },
    "VersionFull": {
      "type": "string",
      "pattern": "^[[(][0-9]+([.][0-9]+){0,2}[)\\]]$"
    },
    "VersionRange": {
      "type": "string",
      "pattern": "^[[(][0-9]+([.][0-9]+){0,2},[0-9]+([.][0-9]+){0,2}[)\\]]$"
    },
    "Version": {
      "oneOf": [
        {
          "$ref": "#/$defs/VersionMin"
        },
        {
          "$ref": "#/$defs/VersionFull"
        },
        {
          "$ref": "#/$defs/VersionRange"
        }
      ]
    },
    "Versions": {
      "type": "array",
      "items": {
        "$ref": "#/$defs/Version"
      }
    },
    "VersionSpec": {
      "oneOf": [
        {
          "$ref": "#/$defs/Version"
        },
        {
          "$ref": "#/$defs/Versions"
        }
      ]
    }
  },
  "anyOf": [
    {
      "type": "object",
      "properties": {
        "$schema": {
          "description":
            "JSON Schema URI for the version of the tool introspection format.",
          "type": "string",
          "format": "uri"
        },
        "std.info": {
          "description": "The Tool Introspection format version.",
          "$ref": "#/$defs/VersionSpec"
        }
      },
      "patternProperties": {
        "^[a-z0-9_]+([.][a-z0-9_]+)+$": {
          "$ref": "#/$defs/VersionSpec"
        }
      },
      "additionalProperties": false
    }
  ],
  "required": [
    "std.info"
  ]
}

9. Examples

9.1. Portable Command Lines

Assuming that the Ecosystem IS specifies a common set of portable command line compiler options, an interaction between a build system (or user at a command prompt) and a compiler could look like:

The build system asks the compiler for supported capabilities:

$ c++ --std-info
{ "std.info": "[1]", "std.cli.c++": "[1]" }

The build system would likely want to cache that information as it’s likely to be static for the release of the compiler.

The build system could then declare and use any such portable compiler options:

$ c++ --std-info=std.cli.c++=1 -std=c++26 -I /home/user/boost -o myapp main.cpp

The interaction when the compiler tool only supports the minimum level would be:

$ c++ --std-info
{ "std.info": "1", "std.cli.c++": "1" }
$ c++ -std=c++26 -I /home/user/boost -o myapp main.cpp

This example anticipates that the mechanism might be useful outside of the C++ ecosystem by using std.cli.c++ to indicate a C++ specific command line. Fortran, for instance, could use a different, but possibly overlapping, CLI:

$ gcc --std-info
{ "std.info": "1", "std.cli.c++": "1", "gcc.cli.fortran": "1" }
$ gcc -std=f2018 -o myapp main.fpp

10. Acknowledgements

Thanks to Jens Maurer for some initial wording review.

Thanks to Charles-Henri Gros for feedback on version ranges.

Thanks to Fangrui Song for feedback on command line option syntax and assessment of changes to clang and for the applications that use getopt_long.

Thanks to Jason Merrill and Fangrui Song for assessment of changes to GCC.

Thanks to Olga Arkhipova for suggesting the use of files as the user interface.

11. License

This work is licensed under the Creative Commons Attribution 4.0 International License. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/ or send a letter to Creative Commons, PO Box 1866, Mountain View, CA 94042, USA.


1. C++ Ecosystem International Standard (https://wg21.link/P2656)
2. ECMAScript® 2022 language specification, 13th edition, June 2022 (https://www.ecma-international.org/publications-and-standards/standards/ecma-262/)
3. Semantic Versioning 2.0.0 (https://semver.org/spec/v2.0.0.html)
4. Wikipedia: Interval (mathematics) (https://en.wikipedia.org/wiki/Interval_(mathematics))
5. ISO/IEC 21778:2017 Information technology — The JSON data interchange syntax, (https://www.iso.org/standard/71616.html)
6. JSON Schema: A Media Type for Describing JSON Documents (http://json-schema.org/latest/json-schema-core.html)
7. Working Draft, C++ Ecosystem International Standard 2023-04-01 (https://github.com/cplusplus/ecosystem-is/tree/5c7ecf79235488bb9aa05505cbfe01ff2b9281e0)