Prerequisite packages to install
================================

- dev-vcs/cvs
- dev-vcs/cvs-fast-export
- dev-vcs/git
- dev-libs/libxslt (for userinfo.xml conversion)

Create the author map
=====================

Extract userinfo.xml from LDAP on dev.gentoo.org::

    $ perl_ldap -U

Create authormap.txt from userinfo.xml::

    $ ./make-authormap.sh >authormap.txt

Fetch and unpack the CVS repository
===================================

Fetch a copy of the archived gentoo-x86 CVS repository from:
https://projects.gentoo.org/vcs-history/gentoo-x86.tar.gz

Run cvs-fast-export
===================

::

    $ cd var/cvsroot/gentoo-x86
    $ find . | cvs-fast-export -A /path/to/authormap.txt -l /path/to/gentoo-x86-export.log -p >/path/to/gentoo-x86-export.out

This will run for some time (8 hours on i7-8700), mostly as a single
thread, and produce a 21 GiB output file.

The CVS repository contains a package app-backup/Attic, which confuses
cvs-fast-export: "Files in CVS Attic and RCS directories are treated
as though the 'Attic/' or 'RCS/' portion of the path were absent."
This can be seen in the output file (note that the ``Attic`` path
component is missing)::

    ----------------------------------------------------------------------
    commit refs/heads/master
    mark :5149424
    committer Hanno Böck 1431281161 +0000
    data 118
    Initial commit of Attic

    (Portage version: 2.2.18/cvs/Linux x86_64, signed Manifest commit with key A5880072BBB51E42)
    from :5149420
    M 100644 :5149421 app-backup/Attic-0.15.ebuild
    M 100644 :5149422 app-backup/ChangeLog
    M 100644 :5149423 app-backup/metadata.xml

    ----------------------------------------------------------------------

    ----------------------------------------------------------------------
    commit refs/heads/master
    mark :5149426
    committer Hanno Böck 1431281167 +0000
    data 118
    Initial commit of Attic

    (Portage version: 2.2.18/cvs/Linux x86_64, signed Manifest commit with key A5880072BBB51E42)
    from :5149424
    M 100644 :5149425 app-backup/Manifest

    ----------------------------------------------------------------------

This is fixed by an additional sed filter in the following step.

Import into Git
===============

::

    $ mkdir gentoo-x86-git
    $ cd gentoo-x86-git
    $ git init
    $ LC_ALL=C sed '/^Initial commit of Attic$/,/^M [0-7]\{6\} .* app-backup\/Manifest/{s:^\(M [0-7]\{6\} .* app-backup/\)\(.*\):\1Attic/\2:}' \
        /path/to/gentoo-x86-export.out | git fast-import

Differences to the old conversion
=================================

- cvs-fast-export(1) says: "A set of file operations is coalesced into
  a changeset if either (a) they all share the same commitid, or (b)
  all have no commitid but identical change comments, authors, and
  modification dates within the window defined by the time-fuzz
  parameter."

  For our case this means that for commits after 2006-03-04T10:23:03Z
  (commit 531f1a00a131) the commitid has been used to group them
  together, while earlier ones have been grouped by authors and commit
  messages, within a 5-minute time window (which is the default for
  the fuzz parameter).
  This results in a total of 1688447 commits in the master branch,
  while the old conversion has only 788893 commits. Most of the
  difference can be explained by the fact that ``repoman commit``
  actually did two CVS commits, the second one for the Manifest to
  catch up with the updated ``$Header$`` keywords. Since this reflects
  the actual workflow, no attempts have been made to squash these
  pairs of commits.

- The new conversion has a complete author map; previously, users
  cbrannon, jerrya, luke-jr, and uid2214 (darkside) were missing.

- Commit messages have been left alone. For example, no conversion to
  Git footer lines has taken place. Conversion of character sets
  wasn't attempted either. (There are 310 commit messages with
  non-UTF-8 characters. About 80% of them appear to be latin-1, but
  the rest is something else, or just contains some garbage
  characters.)

- Category app-backup is now present.

- File sci-libs/qfits/Manifest in HEAD differs. The new conversion
  agrees with the last CVS checkout.

- The new conversion has a .gitignore file in its top-level
  directory. Also, metadata/.cvsignore was renamed to
  metadata/.gitignore (cvs-fast-export does this automatically).

- Output of ``diff -qr --exclude=.git`` between tips of old and new
  repo::

      Only in gentoo-x86-git: .gitignore
      Only in gentoo-x86-git: app-backup
      Files historical/header.txt and gentoo-x86-git/header.txt differ
      Only in historical/metadata: .cvsignore
      Only in gentoo-x86-git/metadata: .gitignore
      Files historical/sci-libs/qfits/Manifest and gentoo-x86-git/sci-libs/qfits/Manifest differ

Notes
=====

Keyword expansion
-----------------

Although the man page of cvs-fast-export (version 1.57) says that the
program "does the equivalent of cvs -kb when checking out masters, not
performing any $-keyword expansion at all", it actually does expand
$-keywords.

For the tip of the trunk, expanded keywords appear to be correct, as
can be verified with Manifest checksums. This is not always true
earlier in history.
For example, the CVS repository was located in /home/cvsroot and moved
to /var/cvsroot later (``$Header$`` lines suggest that this move
happened in early 2004). It is also known that some files were moved
in the raw repository. Expanded keywords from before such a move won't
match.

Branch points
-------------

cvs-fast-export-1.57 gets confused about branch points if a file
doesn't have any commits on the trunk that are newer than those on the
branch. This triggers some warnings during conversion::

    cvs-fast-export: warning - non-vendor ./app-admin/analog/files/analog.cfg,v branch RELEASE-1_4 has no parent
    [and many more of the same type]
    cvs-fast-export: warning - branch point import-1.1.1 -> master later than branch
    cvs-fast-export: trunk(85563): 2005-11-30T09:36:17Z en.txt 1.1
    cvs-fast-export: branch(85563): 2005-11-30T09:38:30Z app-accessibility/SphinxTrain/files/digest-SphinxTrain-0.9.1-r1 1.1

It also results in commits from the branch showing up in the converted
Git master branch. The problem has been `reported upstream`__.

For the time being, this is worked around by adding an extra commit to
the trunk (and removing it from the converted repository later)::

    $ export CVSROOT=/var/cvsroot
    $ cvs checkout gentoo-x86
    $ cd gentoo-x86
    $ for file in $(find . -type d -name CVS -prune -o -type f -print); do echo >>${file}; done
    $ cvs commit -m "extra commit in trunk"

__ https://gitlab.com/esr/cvs-fast-export/-/issues/57

Missing app-games category
--------------------------

It is known that some files and directories have been moved, copied or
even deleted in the (server-side) RCS directory. This was advocated__
as late as 2005. For example, the whole ``app-games`` category was
deleted__ server-side at some time in late 2003 or early 2004, after
its packages had been moved to ``games-*`` categories. Obviously, the
history of these files is lost and there is no way for the conversion
to recover it.


__ https://archives.gentoo.org/gentoo-dev/message/029e91bdc515ddc5ae205b4694e00e91
__ https://archives.gentoo.org/gentoo-dev/message/ad7fa1ecae70e59d43ac70548076afcd

.. This work is licensed under the Creative Commons
   Attribution-ShareAlike 4.0 International License.
   https://creativecommons.org/licenses/by-sa/4.0/

.. Local Variables:
.. mode: rst
.. indent-tabs-mode: nil
.. End: