[text] Create "Text processing library" clause #5226

Open
jensmaurer opened this issue Jan 21, 2022 · 10 comments

Comments

@jensmaurer
Member

jensmaurer commented Jan 21, 2022

This issue is further to #5124.

The proposal is to create a top-level clause entitled "Text processing library" [text] in the Working Draft at the current location of [localization] that contains the following, in order:

  • [charconv]
  • [localization]
  • [format]
  • [re]
  • C library
    • [cctype.syn]
    • [cwctype.syn]
    • [cwchar.syn]
    • [cuchar.syn]
    • [c.mb.wcs]

About 90 pages in total.

@jensmaurer jensmaurer changed the title Create "Text processing library" clause [text] Create "Text processing library" clause Jan 24, 2022
@jensmaurer jensmaurer pinned this issue Feb 6, 2022
@jensmaurer jensmaurer added this to the C++23 milestone Feb 6, 2022
@Mick235711
Contributor

Mick235711 commented Feb 10, 2022

After this change, [strings] will have only ~40 pages and will be the third-smallest library clause (just above [concepts] and [diagnostics]). Did string classes really fit in text formatting? At least they are tightly related.

I think we should either merge [strings] with [text] or at least place these two clauses next to each other (I lean towards keeping them separate but adjacent, since strings are fairly self-contained). The currently proposed [localization] position seems too far from [strings], I'd say... (but this is just my personal opinion)

@jwakely
Member

jwakely commented Feb 10, 2022

> Did string classes really fit in text formatting?

No. They are containers of characters, not necessarily text. I can have a std::string containing invalid UTF-8, for example.

@tahonermann
Contributor

> No. They are containers of characters, not necessarily text. I can have a std::string containing invalid UTF-8, for example.

I don't find that a persuasive argument for separating strings and text. The string-related features are clearly intended to hold and work with text, despite the lack of invariants ensuring well-formedness with respect to any particular encoding. I imagine that if we introduce additional text-related containers in the future that do have strong encoding associations, they will likewise eschew enforcement of well-formedness for performance reasons. In those cases, we'll probably relegate violations to library UB. Error handling (or lack thereof) doesn't strike me as a good basis for library separation.

That being said, I don't have strong opinions regarding this organization so long as it continues not to impact header or module naming.

@jensmaurer jensmaurer modified the milestones: C++23, C++26 Feb 23, 2022
@jensmaurer
Member Author

Postponed to C++26.

@jensmaurer jensmaurer unpinned this issue Feb 23, 2022
@tkoeppe
Contributor

tkoeppe commented Feb 23, 2022

Given that we'll only need to send the new draft in the March mailing, I wouldn't be entirely opposed to still doing this now, but we should feel unreservedly that we're making an improvement, without any caveats or regrets.

I'd be happy to hear alternative suggestions (e.g. how about merging strings and text), and also positions on the status quo (@tahonermann?).

@jensmaurer
Member Author

I'm ok with moving all of [strings] (~50 pages) into [text], with the idea that they are intended to represent text. This gets the total to ~140 pages, with plans to grow further (e.g. regex v2, encoding conversion facilities, possibly more from the scope of ICU).

However, I do like the general idea of having [text] cover everything text-related, even if that means we're heading for a fairly large clause.

We had suggestions to make [filesystem] top-level; this appears to be a reasonable idea, too, but does not seem urgent. Maybe future network facilities also fit under an input/output umbrella, or at least fit together with [filesystem] into a fresh clause.

@cor3ntin
Contributor

I really like this direction. But I agree with @jwakely: strings can remain their own section; that wouldn't be terrible. Otherwise they could fit in containers, but they are certainly orthogonal to Unicode / text.

@tkoeppe
Contributor

tkoeppe commented Jul 14, 2022

Maybe we can talk about this again at a future editorial meeting. Last time I asked there wasn't a lot of interest, but we can certainly try this again for C++26.

@AlisdairM
Contributor

Another vote to move [strings] into [text] if we go in this direction, given that the vocabulary type for much of [text] would be defined in [strings]. Also, basic_string is no longer the only container not defined in the [containers] clause, so it would no longer be surprising to find a container elsewhere.

Failing that, I would hope to at least see [strings] and [text] as adjacent clauses in any such reorganization.

Agree with moving [filesystem] to a top level clause as part of such a restructuring.

@cor3ntin
Contributor

@tkoeppe Following discussion in Kona, [text.encoding] should also move there, but the rest of the organization outlined by Jens still looks good to me.
Let me know how we can move forward with that.

Labels
None yet
Projects
Development

No branches or pull requests

7 participants