[fs.path.generic.obs] and [fs.path.modifiers]p2, wording of make_preferred and generic_* is ambiguous #5473

Open
strega-nil opened this issue May 17, 2022 · 3 comments

@strega-nil

strega-nil commented May 17, 2022

See STL issue microsoft/STL#2082.

make_preferred

The wording of make_preferred is:

Effects: Each directory-separator of the pathname in the generic format is converted to preferred-separator.

And the definition of directory-separator is:

directory-separator:

  • preferred-separator directory-separatorₒₚₜ
  • fallback-separator directory-separatorₒₚₜ

In other words, / and \ are both directory separators on Windows, but so are //, ///, ////, and \\\\.

Thus, an implementation, given path p{"a//b"}; p.make_preferred(); could either say:

  • / is a directory-separator, and / is a directory-separator, so replace each of them with \ -> p.native() == LR"(a\\b)"
  • // is a directory-separator, so replace it with \ -> p.native() == LR"(a\b)"

All implementations currently take the first approach, including MSVC STL, libstdc++, libc++, and boost::filesystem.
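To make the two readings concrete, here is a minimal sketch that models them on plain strings, so it behaves the same on any platform. It assumes Windows-like separators (`\` preferred, `/` fallback); the helper names `replace_each_character` and `replace_each_run` are hypothetical and do not come from any implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Assumption: Windows-like separators, modelled on narrow strings.
constexpr char preferred = '\\';
constexpr char fallback  = '/';

inline bool is_sep(char c) { return c == preferred || c == fallback; }

// Reading 1: every separator character is its own directory-separator.
std::string replace_each_character(std::string s) {
    for (char& c : s)
        if (is_sep(c)) c = preferred;
    return s;
}

// Reading 2: a maximal run of separator characters is one directory-separator.
std::string replace_each_run(const std::string& s) {
    std::string out;
    for (std::size_t i = 0; i < s.size();) {
        if (is_sep(s[i])) {
            out += preferred;                           // emit one separator
            while (i < s.size() && is_sep(s[i])) ++i;   // skip the rest of the run
        } else {
            out += s[i++];
        }
    }
    return out;
}
```

For the input `a//b`, the first function yields `a\\b` (two backslashes) and the second yields `a\b` (one backslash).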

generic_string (and other generic_* functions)

The wording of the generic format observer functions is:

Generic format observer functions return strings formatted according to the generic pathname format. A single slash ('/') character is used as the directory-separator.

(emphasis mine)

This implies to me that path{"a//b"}.generic_string() == LR"(a/b)", since // is considered "one directory-separator" for purposes of the generic format. All implementations, however, just replace every singular preferred-separator with a fallback-separator if it's defined, otherwise they return the same thing as p.string().

@CaseyCarter
Contributor

@jwakely Would you agree that changing this wording to clearly require the behavior exhibited by all surveyed implementations is editorial, or do you think LWG needs to discuss the question?

@jensmaurer
Member

I take it that, on Windows, the directory-separator \\\ is functionally equivalent to \. Do we have any facility in the filesystem library that would strip all the extra \s, to get me a short, nice-looking string?

Beyond that, this issue feels like it's possibly taking away implementation freedom, so handling this via an LWG issue appears to be preferable. Unless there is additional context elsewhere (e.g. examples) that makes the proposed interpretation the only plausible one.

@jwakely
Member

jwakely commented May 18, 2022

For make_preferred I think this needs an LWG issue. I don't think this is the only plausible interpretation, and I have more questions.

If we did want to change the make_preferred spec as suggested, we should change the example to show a path like foo/bar//baz so we show what happens with multiple successive directory-separator characters.

The wording for make_preferred says "each directory-separator character" which is the same as [fs.path.generic] p6 step 3, which even has a note clarifying that multiple successive '/' or preferred-separator characters count as a single directory-separator. That contradicts the suggested interpretation for make_preferred.

What should path("//foo/bar").make_preferred() do? Depending on the OS, that either has two directory-separator characters at the start, or //foo is a root-name. Should / characters in a root-name be replaced with \\ ? The normalization algo in [fs.path.generic] says yes, but technically those aren't directory-separator characters, so make_preferred shouldn't change them.
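How a path such as `//foo/bar` decomposes is implementation- and OS-dependent, so one way to see what a given platform does is to print the pieces. The sketch below uses only standard `std::filesystem::path` observers; the helper name `describe_root` is hypothetical, and no particular output is claimed for `//foo/bar` because it differs between platforms.

```cpp
#include <filesystem>
#include <string>

namespace fs = std::filesystem;

// Hypothetical helper: reports how this implementation splits a path
// into root-name, root-directory, and relative-path.
std::string describe_root(const fs::path& p) {
    return "root-name=[" + p.root_name().string() +
           "] root-directory=[" + p.root_directory().string() +
           "] relative-path=[" + p.relative_path().string() + "]";
}
```

Calling `describe_root("//foo/bar")` on the platform of interest shows whether `//foo` is reported as a root-name there, which is the case where the question above arises.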

If the intent for make_preferred is just something like: if constexpr (is_same_v<char_type, wchar_t>) ranges::replace(native(), L'/', preferred_separator) we should say so more clearly.

For generic_string I disagree with the description of the issue:

This implies to me that path{"a//b"}.generic_string() == LR"(a/b)", since // is considered "one directory-separator" for purposes of the generic format.

I agree (except that generic_string() returns a narrow string, not a wide one).

All implementations, however, just replace every singular preferred-separator with a fallback-separator if it's defined, otherwise they return the same thing as p.string().

That's not what libstdc++ does on POSIX or Windows. It produces "a/b" in all cases. The return value is built up in stages by looking at each component of [path.begin(),path.end()) and adding explicit '/' characters between them. The actual directory separators in the native format are never examined, so there is no way for // to be preserved in the output.

The code to do that is in <bits/fs_path.h>.
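The component-wise construction described above can be sketched roughly as follows. This is not the libstdc++ source, only an illustration of the idea. The helper name `join_components` is hypothetical, and the sketch assumes a relative path with no root-name or root-directory (a root-directory element would need special handling to avoid a doubled `/`).

```cpp
#include <filesystem>
#include <string>

namespace fs = std::filesystem;

// Sketch: build a generic-format string by joining the path's elements
// with a single '/'. The separators in the stored native string are never
// inspected, so runs such as "//" cannot survive into the result.
// Assumption: p is relative (no root-name, no root-directory).
std::string join_components(const fs::path& p) {
    std::string out;
    bool first = true;
    for (const fs::path& elem : p) {
        if (!first) out += '/';   // exactly one separator between elements
        out += elem.string();
        first = false;
    }
    return out;
}
```

With this approach `join_components(fs::path("a//b"))` gives `a/b`, because iteration yields the elements `a` and `b` regardless of how many separators lie between them.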

I agree that Boost returns "a//b" here, but that seems like a bug. I checked the libc++ code and it is definitely wrong, so that's a bug too. I think the spec is clear, but everybody got it wrong except me 😜
