Avoid tracking PDF result files in the main github draft repository #3318

Open
jensmaurer opened this issue Oct 22, 2019 · 17 comments
Labels
decision-required A decision of the editorial group (or the Project Editor) is required.

Comments

@jensmaurer
Member

From #3313 :

Separately, we should consider not tracking all the papers in the primary git repository; source checkouts of the draft are huge and growing rapidly due to this.

@jensmaurer jensmaurer added the decision-required A decision of the editorial group (or the Project Editor) is required. label Oct 22, 2019
@jensmaurer
Member Author

@nemequ said:

The "normal" flow would be to just keep the LaTeX in the repo and attach the PDFs (and Markdown, HTML, etc.) to the releases, preferably automatically generated and deployed by Travis.

@jensmaurer
Member Author

Editorial meeting: The purpose of the git repository is for the convenience of the editors. The suggestions do not provide additional convenience. For copies of the resulting PDF draft documents, see the public committee mailings.

@jensmaurer jensmaurer removed the decision-required A decision of the editorial group (or the Project Editor) is required. label Nov 4, 2019
@tkoeppe
Contributor

tkoeppe commented Nov 8, 2019

Editorial meeting on Friday: We should keep this open, and consider actually purging the PDFs from the repository. This might require something like git filter-branch. To be investigated.

@tkoeppe tkoeppe reopened this Nov 8, 2019
@tkoeppe
Contributor

tkoeppe commented Nov 8, 2019

Editorial meeting cont'd: If we do a filter-branch, i.e. rewrite the entire history, we must (in the sense of "shall") keep a record of all the old commit hashes and their corresponding new hashes. This is because we refer to commit hashes in various places, such as commit messages and GitHub issues. (We should also incrementally update any affected commit messages as we go along.)
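
As a rough illustration of what such a record could look like (this is not something the meeting decided): a minimal sketch, assuming the mapping is kept as a plain text file with one "old new" pair per line. The file layout, the function name lookup_new_hash, and the hashes are all made up for this example.

```shell
set -eu

# Hypothetical helper: print the new hash recorded for an old hash
# (or a unique prefix of it). Fails if there is no match or more than one.
lookup_new_hash() {
    # $1 = mapping file, $2 = old hash or prefix
    awk -v want="$2" '
        index($1, want) == 1 { print $2; found++ }
        END { exit (found == 1 ? 0 : 1) }
    ' "$1"
}

# Made-up mapping data, one "old new" pair per line.
map=$(mktemp)
cat > "$map" <<'EOF'
1111111111111111111111111111111111111111 aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
2222222222222222222222222222222222222222 bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb
EOF

# Resolve an abbreviated old hash, as it might appear in an old commit message.
new=$(lookup_new_hash "$map" 2222222)
echo "$new"
rm -f "$map"
```

A lookup like this would let an old hash quoted in an issue or a paper be translated by hand, even if nothing else in the tooling knows about the rewrite.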

@tkoeppe
Contributor

tkoeppe commented Nov 8, 2019

A problem with rewriting history is that we have published commit hashes in N-numbered papers (as part of Editor's Reports).

@tkoeppe
Contributor

tkoeppe commented Nov 8, 2019

Idea: clone the repository into a frozen, archival location, so that the commit hashes remain in some sense valid.

zygoloid added a commit that referenced this issue Nov 8, 2019
@jwakely
Member

jwakely commented Nov 9, 2019

Isn't a branch from current master enough to preserve the hashes?

@jensmaurer
Member Author

@jwakely: Yes, but then the total repo size won't shrink (because the PDFs are still present on that other branch).

git filter-branch --index-filter 'git rm --cached --ignore-unmatch papers/*.pdf' filtered
removes the PDFs on a branch "filtered". Seems to work; preserves all merge commits (which is something we might not want).
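
To try that command without touching the real repository, here is a minimal sketch on a throwaway repository. The directory layout, file names and commit identity are made up; only the filter-branch invocation comes from the comment above (with -q added to git rm to cut the noise). It assumes a git installation that still ships filter-branch.

```shell
set -eu
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
# Newer git versions print a deprecation notice and pause; this skips that.
export FILTER_BRANCH_SQUELCH_WARNING=1

tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
cd "$tmp"

# Throwaway repository with one fake PDF and one fake source file.
git init -q demo
cd demo
mkdir papers source
printf 'not a real pdf\n' > papers/n0001.pdf
printf 'placeholder\n' > source/std.tex
git add papers source
git commit -q -m "initial"
original=$(git rev-parse HEAD)

# Rewrite only the branch "filtered"; the current branch keeps the old history.
git branch filtered
git filter-branch --index-filter \
    'git rm -q --cached --ignore-unmatch papers/*.pdf' filtered >/dev/null

filtered_files=$(git ls-tree -r --name-only filtered)
original_files=$(git ls-tree -r --name-only "$original")
echo "filtered branch: $filtered_files"
echo "original commit: $original_files"
```

The original commit still lists the PDF, which is the point made above: as long as any ref keeps the old history reachable, the repository does not shrink.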

@zygoloid
Member

I wonder if there's anything specific to github that we can do here to preserve the old hashes. Unlike with a stock git repository, git fetch from github does not fetch all history: in particular, it doesn't fetch pull requests. Perhaps we could preserve the history by tracking it as a pull request?

@jensmaurer
Member Author

Usual pull requests are branches on a different repo clone (e.g. mine), so they don't show up for a "git fetch" on the main repo. For motions branches (which are, by convention, pushed to the main repo), I do see them when I do a "git fetch".

@zygoloid
Member

zygoloid commented Dec 2, 2019

Usual pull requests are branches on a different repo clone (e.g. mine), so they don't show up for a "git fetch" on the main repo. For motions branches (which are, by convention, pushed to the main repo), I do see them when I do a "git fetch".

That's not the whole story. When you create a pull request, the commits of the pull request become visible in the destination repository too. For example, consider unmerged pull request #3496 (merging a commit from your clone); the head commit of that (0095e0b) is visible in the main repository too.

You can also see this from the command line: git fetch origin pull/3496/head will fetch commit 0095e0b from the cplusplus/draft repository, despite it never having been pushed to any branch there.

So the idea here would be that we'd fork the repository as-is, then create a pull request to merge that as-is copy back in, and that would "pin" all the historical commits to this repository, despite them no longer being in the repository that you see if you use git fetch.
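
The "pinning" effect can be reproduced locally. A minimal sketch, assuming a throwaway repository in which a ref under refs/pull/1/head stands in for GitHub's pull request refs; the repository names and the ref number are made up, and nothing here contacts GitHub.

```shell
set -eu
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
cd "$tmp"

# "upstream" plays the role of the main repository.
git init -q upstream
cd upstream
git commit -q --allow-empty -m "on the default branch"

# Create a commit that is on no branch and pin it under a pull-style ref.
tree=$(git rev-parse 'HEAD^{tree}')
pinned=$(git commit-tree -p HEAD -m "only reachable from the pull ref" "$tree")
git update-ref refs/pull/1/head "$pinned"
cd ..

# --no-local forces the normal transport, so only requested objects are copied.
git clone -q --no-local upstream clone
cd clone

# A default clone fetches branches and tags only, so the pinned commit is missing.
if git cat-file -e "$pinned" 2>/dev/null; then before=present; else before=absent; fi

# Asking for the ref explicitly brings the commit in (it lands in FETCH_HEAD).
git fetch -q origin refs/pull/1/head
if git cat-file -e "$pinned" 2>/dev/null; then after=present; else after=absent; fi

echo "before explicit fetch: $before"
echo "after explicit fetch:  $after"
```

Whether GitHub keeps such commits around indefinitely is a separate question that this local experiment cannot answer.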

@zygoloid
Member

zygoloid commented Dec 2, 2019

In fact, it looks like the github model is that internally there is only one repository for an original repository plus all of its forks, irrespective of pull requests, but git fetch by default only sees the refs of the fork named in the specified remote. For example, consider b1b8d32, which was committed to my fork of cplusplus/draft, and has never been part of any pull request, and is not fetched by a git fetch of this repository. Nonetheless, it is visible in this repository via the github UI, and is available on the command line via git fetch origin b1b8d3296d04fb6e751db86e09e34e54a0c262f5.

If we can rely on that, then this becomes even easier: we just need to make sure that a fork of the old repository continues to exist somewhere on github.

@jensmaurer
Member Author

Looks like https://github.com/newren/git-filter-repo/ is the tool for the job, including keeping old hash references valid and transparently rewriting commit messages.

@jensmaurer jensmaurer added the decision-required A decision of the editorial group (or the Project Editor) is required. label Feb 22, 2020
@jensmaurer
Member Author

This looks pretty close to what we want to achieve:

git-filter-repo \
        --replace-refs update-and-add  \
        --invert-paths --path-glob "papers/n*.pdf" --path-glob "papers/N*.pdf" \
        --path-rename papers/N3338.html:papers/n3338.html \
        --path-rename papers/N3377.html:papers/n3377.html \
        --path-rename papers/N3486.html:papers/n3486.html \
        --path-rename papers/N3798.html:papers/n3798.html \
        --path-rename papers/N3938.html:papers/n3938.html

It prunes the repo from (currently) 432 MB to 48 MB, adds replace refs pointing from the old commit hashes to the new ones (so that references in editor's reports stay valid) and updates commit hashes in commit messages (e.g. "Revert abcdef").
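
For readers unfamiliar with replace refs, here is a minimal sketch of the underlying git mechanism on a throwaway repository. It does not run git-filter-repo; it creates one replace ref by hand with git replace, and the commits and messages are made up.

```shell
set -eu
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
cd "$tmp"
git init -q demo
cd demo

# "old" stands in for a pre-rewrite commit, "new" for its rewritten counterpart.
git commit -q --allow-empty -m "old commit"
old=$(git rev-parse HEAD)
git commit -q --amend --allow-empty -m "new commit"
new=$(git rev-parse HEAD)

# Record that lookups of the old hash should be answered with the new commit.
git replace "$old" "$new"

# With replace refs honored (the default), the old hash shows the new content.
via_replace=$(git cat-file commit "$old" | tail -n 1)
# With replace refs disabled, the old object itself is read.
raw=$(git --no-replace-objects cat-file commit "$old" | tail -n 1)

echo "old hash with replace refs:    $via_replace"
echo "old hash without replace refs: $raw"
```

Replace refs live under refs/replace/, which a plain clone does not fetch by default, so each clone that wants old hashes to resolve would presumably need to fetch that namespace as well.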

@jensmaurer
Member Author

jensmaurer commented May 28, 2021

Editorial meeting 2021-05-28: Create a clickable list with links to all working drafts we ever had.
Go with my suggestion from 2020-02-22.
Rename master to main after checking that pull requests are automatically retargeted.

@jensmaurer jensmaurer removed the decision-required A decision of the editorial group (or the Project Editor) is required. label May 28, 2021
@tkoeppe
Contributor

tkoeppe commented May 29, 2021

@Eelis We will probably have to force-push the main branch to do the above filtering, and we'll rename the default branch. Please do let us know if that poses any problems for you!

@Eelis
Contributor

Eelis commented May 29, 2021

@Eelis We will probably have to force-push the main branch to do the above filtering, and we'll rename the default branch. Please do let us know if that poses any problems for you!

No problems, do what needs to be done.

JohelEGP referenced this issue in hsutter/cppfront May 7, 2023
And check in a Windows x64 build of `cppfront.exe`
@jensmaurer jensmaurer added the decision-required A decision of the editorial group (or the Project Editor) is required. label Oct 22, 2023
6 participants
@Eelis @zygoloid @jwakely @tkoeppe @jensmaurer and others