
Re: How could you load only once a Linux utility without a batch --input-files kind of option and repeatedly use it on many files? . . .



On 2020-05-21 16:57, davidson wrote:
> On Thu, 21 May 2020, David Christensen wrote:
>
>> On 2020-05-21 08:52, davidson wrote:
>>> On Thu, 14 May 2020, Albretch Mueller wrote:
>>>
>>>> The thing is that I have to call, say sha256sum, on millions of files
>>>>
>>>> Probably debian admin people dealing with packaging have to deal with
>>>> the same kinds of issues.
>>>
>>> For checksums, mtree(8) from package mtree-netbsd might be worth a look.
>>
>> Been there, done that; I do not recommend it:
>>
>>    https://lists.debian.org/debian-user/2020/01/msg00488.html
>
> The thread you refer to reports problems with the mtree-à-la-FreeBSD
> ("fmtree(8)" [1]) in debian package freebsd-buildutils.
>
> mtree-netbsd is a different debian package, providing
> mtree-à-la-NetBSD ("mtree(8)" [2]). It does not seem to suffer from
> the deficiency you encountered with fmtree.
>
> 1. https://manpages.debian.org/buster/freebsd-buildutils/fmtree.8.en.html
> 2. https://manpages.debian.org/buster/mtree-netbsd/mtree.8.en.html

Thanks for the tip.


[2] is older than [1].  Both are older than the version on FreeBSD:

    https://www.freebsd.org/cgi/man.cgi?mtree(8)


I cannot remember if I found them both when I went looking for mtree(8) on Debian, but I would have picked the newer of the two.


I was trying to validate migration of ~0.9 TB of content from a Debian Samba server to a FreeBSD Samba server. I know I failed. I seem to recall it was due to a missing feature in the Debian version of mtree(8).


Also, I do not believe the input/output format of mtree(8) is compatible with the I/O format of sha256sum(1). Using mtree(8) output as sha256sum(1) input, or vice versa, requires a translation command or script.
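
To make the translation idea concrete, here is a rough Python sketch of the mtree-to-sha256sum direction. It is not the Perl script mentioned in this message. It assumes a specification with one entry per line in the form "./path keyword=value ...", assumes the digest keyword is spelled either sha256 or sha256digest, and assumes special characters in paths appear as backslash-octal escapes; check all three against mtree(8) on your system. The function names are made up for illustration, and only ASCII escapes are handled.

```python
import re

# Assumed spellings of the SHA-256 keyword; verify against your mtree(8).
DIGEST_KEYS = ("sha256", "sha256digest")


def unescape_octal(name):
    """Decode backslash-octal escapes (assumed path encoding, ASCII only)."""
    return re.sub(r"\\([0-7]{3})", lambda m: chr(int(m.group(1), 8)), name)


def mtree_to_sha256sum(lines):
    """Turn assumed one-entry-per-line mtree output into sha256sum lines."""
    out = []
    for line in lines:
        line = line.strip()
        # Skip blanks, comments, and /set or /unset directives.
        if not line or line.startswith("#") or line.startswith("/"):
            continue
        fields = line.split()
        path, pairs = fields[0], fields[1:]
        kv = dict(p.split("=", 1) for p in pairs if "=" in p)
        digest = next((kv[k] for k in DIGEST_KEYS if k in kv), None)
        # Only regular files with a digest have a sha256sum equivalent.
        if kv.get("type", "file") == "file" and digest:
            out.append(f"{digest}  {unescape_octal(path)}")
    return out
```

The output uses the "digest, two spaces, path" layout that sha256sum -c reads. All other mtree metadata (owner, mode, size, time) is dropped, which is the incompatibility described above.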


I do seem to recall writing a Perl script to parse mtree(8) output. The mtree(8) convert option '-C' was the key. The Debian version I tried lacked it. The other version seems to have it. So, maybe...


I think the simplest answer on Debian is to use find(1), xargs(1) (with the -P option), and sha256sum(1) to generate an SHA256SUMS file.
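
For comparison, the same job can be sketched in Python, which also addresses the original question of loading the tool once: one process hashes every file instead of starting sha256sum over and over. This is an analogue of the pipeline, not the pipeline itself. The function names and the worker count are made up for illustration, and file names containing newlines or backslashes are not escaped the way sha256sum(1) would escape them.

```python
import hashlib
import os
from concurrent.futures import ThreadPoolExecutor


def sha256_of(path, bufsize=1 << 20):
    """Return the hex SHA-256 digest of one file, read in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(bufsize), b""):
            h.update(chunk)
    return h.hexdigest()


def regular_files(root):
    """Yield paths of regular files under root, roughly like find -type f."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.isfile(path) and not os.path.islink(path):
                yield path


def write_sha256sums(root, out_path, workers=4):
    """Write one '<digest>  <path>' line per file, sorted by path."""
    paths = sorted(regular_files(root))
    # Hash several files at once, similar in spirit to xargs -P.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        digests = list(pool.map(sha256_of, paths))
    with open(out_path, "w") as out:
        for digest, path in zip(digests, paths):
            out.write(f"{digest}  {path}\n")
    return len(paths)
```

Something like write_sha256sums("/srv/share", "SHA256SUMS") would produce a file in the layout sha256sum -c reads; the path here is only an example. Collecting the results before writing keeps the lines in path order.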


However, before I learned of mtree(8), I wrote a Perl script to perform essentially the same function -- compare metadata and checksums of two directory trees, or the same tree at two different points in time. I soon discovered how wasteful it is to recompute checksums for 0.9 TB of files (hours) when only a tiny fraction have changed (seconds or minutes). So, I added an update feature to the Perl script. This made the script far more efficient, and therefore usable. AFAIK no version of mtree(8) has this feature. A find(1), xargs(1), and sha256sum(1) pipeline would also lack this feature, and an SHA256SUMS file lacks the metadata fields required to implement it.
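
The update idea can be sketched in a few lines of Python. This is not the Perl script, only an illustration of the principle: keep size and modification time next to each checksum, and recompute a checksum only when one of those has changed. Using size plus mtime as the change test is an assumption made here; the function names and the manifest layout are made up.

```python
import hashlib
import os


def sha256_of(path, bufsize=1 << 20):
    """Return the hex SHA-256 digest of one file, read in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(bufsize), b""):
            h.update(chunk)
    return h.hexdigest()


def update_manifest(root, manifest):
    """Return (new_manifest, rehashed_paths) for the tree at root.

    manifest maps relative path -> {"size", "mtime_ns", "sha256"}.
    Files whose size and mtime match the old entry keep their old digest.
    """
    new, rehashed = {}, []
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            full = os.path.join(dirpath, name)
            if os.path.islink(full) or not os.path.isfile(full):
                continue
            st = os.stat(full)
            rel = os.path.relpath(full, root)
            old = manifest.get(rel)
            if (old and old["size"] == st.st_size
                    and old["mtime_ns"] == st.st_mtime_ns):
                digest = old["sha256"]  # unchanged metadata: reuse
            else:
                digest = sha256_of(full)  # new or changed: recompute
                rehashed.append(rel)
            new[rel] = {"size": st.st_size,
                        "mtime_ns": st.st_mtime_ns,
                        "sha256": digest}
    return new, sorted(rehashed)
```

The first call with an empty manifest hashes everything; later calls read only the files whose metadata changed. The trade-off is that a file modified without changing its size or mtime goes unnoticed, so an occasional full recompute is still needed. The manifest is a plain dictionary and could be saved between runs in any convenient format.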


David

