Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[styles] Improve automatic page breaking #1764

Merged
merged 3 commits into from Oct 16, 2017
Merged

Conversation

jensmaurer
Copy link
Member

Strongly discourage breaking between itemdecl and itemdescr.
Mildly discourage breaking before an itemized list.
Add stretchable vertical space between paragraphs and before listings.
Allow substantial whitespace at the end of a page.

Fixes #1752.

@@ -58,3 +58,6 @@

% Leave more room for section numbers in TOC
\cftsetindents{section}{1.5em}{3.0em}

% Allow for substantial empty space at the bottom of a page.
\addtolength{\topskip}{0pt plus 25pt}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we perhaps look for a memoir way of doing this, rather than modifying the lengths directly? I worry about hard-to-trace interactions if we go behind memoir's back like this.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have reviewed the memoir documentation, section 2 "Laying out the page", and some of the calculations do consider \topskip (called "t" there), but there is no other explicit mention of \topskip.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There doesn't seem to be a memoir way of doing this, and we need it when the page otherwise has no stretchable vertical space.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Section 3.5 of the memoir documentation might be helpful (e.g., the memoir-specific \sloppybottom macro and discussion surrounding it).

@tkoeppe
Copy link
Contributor

tkoeppe commented Oct 9, 2017

Hm, this doesn't yet look entirely robust to me. For example, consider this change at the beginning of the Strings clause:

image

Now we have an orphaned heading.

@jensmaurer
Copy link
Member Author

@tkoeppe: Hm... That's a heading, a \libraryindex and a codeblock. Nothing that I directly touched, so this must be a fallout of the \topskip change. Plus, headings should really stick with whatever follows, so this is rather mysterious to me.

@godbyk , @tkoeppe: Does any of you know a way to show in the Latex log the result badness (= sum of penalties / demerits) of each page break? That would allow us to find out whether this particular page break is a case of "several infinitely bad options".

@jensmaurer
Copy link
Member Author

After some experimentation, that orphaned heading for [char.traits.specializations.char] is a myth to me. Going to as much as "\addtolength{\topskip}{0pt plus 35pt}" (note 35pt vs. 25pt) doesn't make that go away, although we should now have given enough allowance to move the heading to the next page.

@jensmaurer
Copy link
Member Author

This stuff in styles.tex looks suspicious:

\newcommand{\beforeskip}{-.7\onelineskip plus -1ex}
\newcommand{\afterskip}{.3\onelineskip minus .2ex}

"plus -1ex" seems broken; we're defining stretchable space with "plus" and I fully expect a positive number afterwards.

@jensmaurer
Copy link
Member Author

Ah, the negative numbers "prevent indentation of the line that follows". But we never indent anything (except explicitly via an environment), so this is totally pointless and confusing.

@zygoloid
Copy link
Member

zygoloid commented Oct 9, 2017

I suspect the negative sign there is a legacy from before we fully turned off memoir's "indent the first line of each paragraph" behavior. If there are no visual diffs, just remove it :)

@jensmaurer
Copy link
Member Author

Conclusion: [char.traits.specializations.char] will need a \newpage in the final preparations for C++20, unless we can find out why the page break happens there. Still much better than the tens of \newpages we needed for C++17.

@zygoloid
Copy link
Member

zygoloid commented Oct 9, 2017

While we're considering such issues, is there some way we could discourage linebreaks within a codeblock / itemdecl environment, except after semicolons or at blank lines? A lot of the C++17 vertical whitespace fixes I applied were fixing that kind of issue.

@tkoeppe
Copy link
Contributor

tkoeppe commented Oct 10, 2017

@zygoloid: And I'd also want a "magic blank line" feature that then doesn't emit the blank line at the top of the next page! But that seems fairly advanced.

@jensmaurer
Copy link
Member Author

@zygoloid: "discourage linebreak"? I guess you mean "pagebreaks".

I have found zero mention of "page break" in the listings documentation. That said, it seems with my changes (in particular the added stretchability), we get far less / close to none pagebreaks within a short itemdecl anyway, which goes a long way.

@godbyk
Copy link
Contributor

godbyk commented Oct 10, 2017

@jensmaurer You can use \tracingpages=1 to enable output of the penalties involved in the page-breaking algorithms. Here's a description of the output from The TeXBook:

Fortunately, there is a convenient way to watch it; you can set \tracingpages=1, thereby instructing TeX to put its page-cost calculations into your log file. For example, here is what appeared on the log file when the author used \tracingpages=1 at the beginning of the present chapter:

%% goal height=528.0, max depth=2.2
% t=10.0 g=528.0 b=10000 p=150 c=100000#
% t=22.0 g=528.0 b=10000 p=0 c=100000#
% t=34.0 g=528.0 b=10000 p=0 c=100000#
⋮ (25 similar lines are being omitted here)
% t=346.0 plus 2.0 g=528.0 b=10000 p=0 c=100000#
% t=358.0 plus 2.0 g=528.0 b=10000 p=150 c=100000#
% t=370.02223 plus 2.0 g=528.0 b=10000 p=-100 c=100000#
% t=398.0 plus 5.0 minus 2.0 g=528.0 b=10000 p=0 c=100000#
% t=409.0 plus 5.0 minus 2.0 g=528.0 b=10000 p=0 c=100000#
% t=420.0 plus 5.0 minus 2.0 g=528.0 b=10000 p=150 c=100000#
% t=431.0 plus 5.0 minus 2.0 g=528.0 b=10000 p=-100 c=100000#
% t=459.0 plus 8.0 minus 4.0 g=528.0 b=10000 p=0 c=100000#
% t=470.0 plus 8.0 minus 4.0 g=528.0 b=10000 p=0 c=100000#
% t=481.0 plus 8.0 minus 4.0 g=528.0 b=10000 p=0 c=100000#
% t=492.0 plus 8.0 minus 4.0 g=528.0 b=10000 p=0 c=100000#
% t=503.0 plus 8.0 minus 4.0 g=528.0 b=3049 p=0 c=3049#
% t=514.0 plus 8.0 minus 4.0 g=528.0 b=533 p=150 c=683#
% t=525.0 plus 8.0 minus 4.0 g=528.0 b=5 p=-100 c=-95#
% t=553.0 plus 11.0 minus 6.0 g=528.0 b=* p=0 c=*

This trace output is admittedly not “user-friendly” in appearance, but after all it comes from deep inside TeX‘s bowels where things have been reduced to numeric calculations. You can learn to read it with a little practice, but you won‘t need to do so very often unless you need to plunge into page-breaking for special applications. Here‘s what it means: The first line, which starts with ‘%%’, is written when the first box or insertion enters the current page list; it shows the “goal height” and the “max depth” that will be used for that page (namely, the current values of \vsize and \maxdepth). In the present manual [referring to The TeXBook —Kevin] we have \vsize=44pc and \maxdepth=2.2pt; dimensions in the log file are always displayed in points. The subsequent lines, which start with a single ‘%’, are written whenever a legal breakpoint is being moved from the list of recent contributions to the current page list. Every % line shows t, which is the total height so far if a page break were to occur, and g, which is the goal height; in this example g stays fixed at 528 pt, but g would have decreased if insertions such as footnotes had occurred on the page. The values of t are steadily increasing from 10 to 22 to 34, etc.; baselines are 12pt apart at the top of the page and 11pt apart at the bottom (where material is set in nine-point type). We are essentially seeing one % line per hbox of text being placed on the current page. However, the % lines are generated by the penalty or glue items that follow the hboxes, not by the boxes themselves. Each % line shows also the badness b, the penalty p, and the cost c associated with a breakpoint; if this cost is the best so far, it is marked with a ‘#’ sign, meaning that “this breakpoint will be used for the current page if nothing better comes along.” Notice that the first 40 or so breaks all have b = 10000, since they are so bad that TeX considers them indistinguishable; in such cases c = 100000, so TeX simply accumulates material until the page is full enough to have b < 10000. A penalty of 150 reflects the \clubpenalty or the \widowpenalty that was inserted as described in Chapter 14. The three lines that say p=-100 are the breakpoints between “dangerous bend” paragraphs; these came from \medbreak commands. The notation b=* and c=* on the final line means that b and c are infinite; the total height of 553 pt cannot be reduced to 528 pt by shrinking the available glue. Therefore the page is ejected at the best previous place, which turns out to be a pretty good break: b=5 and p=-100 yield a net cost of −95.

@jensmaurer
Copy link
Member Author

jensmaurer commented Oct 10, 2017

@godbyk: Thanks a lot, that's cool!

% t=634.67545 plus 56.60901 minus 13.72177 g=684.49998 b=68 p=0 c=68# (just before the heading; great place to break)
% t=669.07999 plus 57.60901 minus 14.58266 g=684.49998 b=2 p=-50 c=-48# (just after the heading
% t=681.88449 plus 58.60901 minus 14.58266 g=684.49998 b=0 p=0 c=0
% t=692.88449 plus 58.60901 minus 14.58266 g=684.49998 b=19 p=0 c=19
% t=703.88449 plus 58.60901 minus 14.58266 g=684.49998 b=* p=0 c=*

It appears that a "listings" environment has a page break penalty of -50 before and after it, which is (in principle) a good idea: Breaking before a listing is certainly better than breaking in the middle of it. However, that somehow seems to override / cancel the general setting "do not break after a heading".

@jensmaurer
Copy link
Member Author

Ok, I've now had a look at the source of the "listings" package. It has a non-configurable page break penalty of -50 before and after each (inline) listing environment. Which, absent other changes, totally runs counter to the idea of keeping an itemdecl together with the itemdescr, and keeping a heading with any listing that might immediately follow.

The correct approach is to make these penalties configurable (and/or conditionally absent), similar to "beginpenalty" / "endpenalty" in the enumitem package.

@jensmaurer
Copy link
Member Author

Update on progress: As an experiment, I've removed listings' \penalty-50, expecting that the "regular" penalties would kick in (i.e. plain page break opportunity with penalty=0). That nearly works as expected, except that there's a \nobreak somewhere in
\indexlibrary{\idxcode{bad_array_new_length}!constructor}
(gasp). And that \nobreak is, in this situation, in vertical mode and effectively prevents a page break at that point. Not good. Found via \tracingmacros=1.

@jensmaurer
Copy link
Member Author

jensmaurer commented Oct 10, 2017

Here is what happens:

  • A section heading sets \if@nobreak to \iftrue and sets \everypar to something that indicates an infinite \clubpenalty. Also, \everypar is defined to reset itself to empty and to set \if@nobreak back to \iffalse. This prevents a lone line of text after a heading for sure, because \everypar is executed for each paragraph (due to the reset, extra stuff is only effective for the first paragraph after a heading).
  • A listing immediately after a section heading seems to reset \everypar, but does not reset \if@nobreak, leaving the setting from the heading turned on indefinitely, although it was supposed to be once-only for the next paragraph (i.e. the first line of the listing). And when the index machinery (via \indexlibrary) asks about the setting some time later, it yields a vertical \nobreak in the middle of plain text. Presumably, the index package wants to retain \nobreak after section headings if people use index commands between the section heading and the first text paragraph of a section.

Now, I have to have an even deeper look at the listings package to find out where it kills \everypar without resetting \if@nobreak.

@jensmaurer
Copy link
Member Author

Ok, the listings package and \everypar need to get a divorce.

@jensmaurer
Copy link
Member Author

I've updated this. The bad macros from the listings package have been redefined / overridden in styles.tex.

@godbyk
Copy link
Contributor

godbyk commented Oct 10, 2017

@jensmaurer There are a bunch of hooks provided by the listings package that we can use. If you point me the places where you'd like to modify the \everypar stuff or other settings, I can see if there's a hook we can use to inject our own code there. The hooks are documented in the listings.dtx source—search for \subsection{Hooks}.

@zygoloid
Copy link
Member

I've been chatting to one of the maintainers of the listings package. One possible option would be to set the EveryPar hook to perform the actions that memoir's section command sets \everypar to do. It looks like that's this part:

    \@nobreakfalse
    \global\@noskipsectrue
    \everypar{%
      \if@noskipsec
        \global\@noskipsecfalse
       {\setbox\z@\lastbox}%
        \clubpenalty\@M
        \begingroup \@svsechd \endgroup
        \unskip
        \@tempskipa #1\relax
        \hskip -\@tempskipa
      \else
        \clubpenalty \@clubpenalty
        \everypar{}%
      \fi}%

@jensmaurer
Copy link
Member Author

@godbyk: I've looked at the hooks, and they don't seem to help avoiding / configuring \everypar or \par\penalty. Please see the large green diff in styles.tex for the specific changes vs. the stock package, each marked by a TeX comment.

To summarize the issue with \everypar: The section heading sets \everypar to "something", but the listings package cavalierly ignores the previous value of \everypar and sets its own value. That's not good. (LaTeX doesn't seem to have infrastructure for an extensible and flexible mechanism for \everypar that could be used by several clients independently. Poster-child global variable problem, I'd say.)

@jensmaurer
Copy link
Member Author

@zygoloid: Ok, that's one option. Except that there's only "add-to-hook", not "remove-from-hook" to perform the \everypar reset. (Nothing an additional level of indirection can't fix.) However, that approach means copying some details of "memoir" into our code, which is probably as bad as copying some details of "listings" into our code. And we need to massage "listings" even more to detect empty lines and semicolons and introduce an attractive page break there.

@jensmaurer
Copy link
Member Author

jensmaurer commented Oct 11, 2017

Here's the next update, with the following additional features:

  • In listings, represent empty lines as vertical space, causing them to vanish across a page break (should make @tkoeppe happy).
  • Inside a codeblock, encourage page breaks at empty lines.
  • Remove line counting from all listings: We never use linecounts, and the \refstepcounter macro creates a PDF node for every line of code. The size of the PDF shrinks from 6.3 MB to 5.5 MB for me.

@jensmaurer
Copy link
Member Author

jensmaurer commented Oct 12, 2017

And here's the final update, with the following additional features:

  • In listings, detect semicolons and closing braces and prefer breaking after these. Don't break before a lone closing brace in a line (should make @zygoloid happy).
    This has some good overall effects, because it keeps "struct X {", "namespace N {", and "private:" together with the stuff that follows. Also, I haven't yet found a header synopsis where a single declaration would be torn across pages.

At this time, we still prefer breaking inside an itemdecl (even when not after a semicolon) vs. breaking between the itemdecl and the itemdescr. (See [alg.remove] for an example.) We can fine-tune this by increasing the "midpenalty" in itemdecl.

@zygoloid
Copy link
Member

At this time, we still prefer breaking inside an itemdecl (even when not after a semicolon) vs. breaking between the itemdecl and the itemdescr.

If this means that we prefer breaking between two declarations in an itemdecl over breaking between itemdecl and itemdescr, then I think that's the right thing, even if in bad cases we end up breaking between decl and descr.

I don't expect this work to remove the need for manual vertical whitespace fixes for C++20, but it sounds like it should at least make them substantially easier, and make the working drafts more readable.

How confident are you that this change doesn't introduce significant formatting regressions (very ugly pagebreaks, text outside the bounds of the textblock, etc)? Given that we only have a couple of days before the mailing deadline, I'm trying to determine whether we should merge this now, or merge it after we cut the pre-meeting working draft.

@tkoeppe Are you OK with the approach in use here? I'm not happy with copy-pasting chunks of listings, but if only the alternative is copy-pasting chunks of memoir, I don't have a strong preference. (Ideally I'd like to minimize the amount of delta we introduce in this way, though.)

@jensmaurer Can you check what the licensing terms on the listings source are? If we're not permitted to copy and modify the code like this, we may need to get more creative...

@godbyk
Copy link
Contributor

godbyk commented Oct 12, 2017

@zygoloid Both listings and memoir are licensed under the LPPL and you can copy the code locally to modify it.

I haven't compared the copied code to the original yet, but if it's possible to patch the macros without explicitly overwriting them, that's sometimes a better option (as it allows for other packages or future updates of those packages to modify the code without us overwriting their modifications).

@jensmaurer
Copy link
Member Author

jensmaurer commented Oct 12, 2017

@zygoloid: Quote:

% The listings package and its drivers may be distributed and/or modified
% under the conditions of the LaTeX Project Public License, either version
% 1.3 of this license or (at your option) any later version.
% The latest version of this license is in
% http://www.latex-project.org/lppl.txt

Since we're not claiming to distribute a modified "listings" package (in fact, you need the original package for my modifications to work), I think we simply make a "Derivative Work" for our own purposes, which is fine.

@jensmaurer
Copy link
Member Author

@zygoloid: I did a "diffpdf" on the core part (few changes, all for the better --- good testbed for tweaking and finding+fixing a few bugs) and a "finger repeatedly hitting the page down key, eyes focused on the bottom end of the page" review of all 1200 or so substantive pages yesterday, and I noticed these things:

  • Orphaned headings before requirement tables in library section (pre-existing problem; maybe it pops up now because the pagination is different). Can be fixed using \needspace as a separate effort, if desired.
  • There's a noticeable amount of whitespace at the end of some pages, e.g. before [support.initlist.access], because we have a section heading, a one-line itemdecl, and a two-line itemdescr, which can't be broken anywhere (breaking after a heading ("orphaned heading") is never even considered; we seriously penalize breaking after an itemdecl; and breaking the two-line itemdescr would create a club and a widow). This seems "as expected", though.
  • The way we tend to break between an itemdescr and the following itemdecl is really nice and helps a lot in the library section.
  • I have not seen text running over the bottom or similar. We can certainly discuss about 10 individual page breaks where best to break if done manually, but that's better left to just before C++20 ships. Looking at the \newpage counts on the C++17 branch, we seem to get an order of magnitude better.

Regarding copy-pasting listings vs. memoir, the changes I did are such that there's no way around copy-pasting parts listings at this time. I've used the provided hooks as much as I could.

@jensmaurer
Copy link
Member Author

@zygoloid: In short, I'm in favor of merging before the mailing. We might consider sending an e-mail to -all so that people stay alert for bad pagination.

@jensmaurer
Copy link
Member Author

@godbyk: I agree with the general approach, but that utterly fails if there's a \par\penalty-50 in the package's macro that you definitely don't want to have (at least not after a section heading). I understand TeX's macro engine is quite powerful, so it might be possible to play advanced tricks with it to somehow excise that part of a particular predefined macro, but the result would likely be rather intractable and probably be as fragile regarding upstream changes.

@godbyk
Copy link
Contributor

godbyk commented Oct 12, 2017

@jensmaurer Yes, it's unfortunately not always possible. It's a known problem with LaTeX. Packages are always stomping on each other, redefining the same macros (sometimes politely, sometimes less so), and vying for control over what hooks are available.

One thing I would recommend is to use \CheckCommand to alert us if the macro we copied has changed from when we copied it (e.g., was updated when the package or class was upgraded or when a new package has been loaded that also modifies the macro). That we don't end up perpetuating bugs or preventing newer features of the package/class from working indefinitely.

@jensmaurer
Copy link
Member Author

@godbyk: I've contacted the "listings" maintainer; maybe the next version of the package already contains the features we need.

@zygoloid
Copy link
Member

I've done some review of the result of this change too, and I like it a lot.

In fact, I've only found one page break I don't like so far: in [util.sharedptr.shared.const]'s itemdecl between p8 and p9, we put the first three declarations on the bottom of one page (with a large section of empty whitespace below them), and put the fourth declaration and the itemdescr at the top of the next page. The net result looks pretty silly.

I think it would be preferable to keep the itemdecl together here, and break the 3-line p9 across the page. (Leaving an orphaned single line is bad, but the current formatting is worse.) I wonder if we can increase the penalty for breaking within an itemdecl to cause that to happen.

@zygoloid
Copy link
Member

@jensmaurer Can you apply @godbyk's suggestion of using \CheckCommand? Then I think this is good to merge, and we can discuss any further tweaks afterwards.

Add keywords beginpenalty, endpenality, midpenalty, emptylinepenalty,
semicolonpenalty.
Also avoid redefining \everypar, because it interferes with section
headings.
@jensmaurer
Copy link
Member Author

@zygoloid: Breaking a 3-line paragraph means working against a \clubpenalty = windowpenalty = 1000. The current listings midpenalty = 500 doesn't compete here, obviously.

@jensmaurer
Copy link
Member Author

@zygoloid: Instead of \CheckCommand (which requires lots of cut&paste), I'm now using \lst@CheckVersion and check that the listings package is at exactly v1.6. It's likely I need to adapt my changes in some way for the next revision of the listings package anyway.

 - Strongly discourage breaking between itemdecl and itemdescr.
 - Mildly discourage breaking inside an itemdecl.
 - Encourage breaking after a codeblock.
 - Encourage breaking at empty lines in a codeblock.
 - Encourage breaking after a semicolon or closing brace.
 - Mildly discourage breaking before an itemized list.
 - Allow substantial whitespace at the end of a page.
@zygoloid zygoloid merged commit 1eb7c1f into cplusplus:master Oct 16, 2017
@jensmaurer jensmaurer deleted the vlayout branch October 18, 2017 21:34
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants