Your message dated Fri, 06 Oct 2023 20:47:23 +0100
with message-id <87fs2nbkj8.fsf@melete.silentflame.com>
and subject line Re: Bug#1052697: tech-ctte: proposed change to apt-listchanges algorithm needs expert consideration
has caused the Debian Bug report #1052697,
regarding tech-ctte: proposed change to apt-listchanges algorithm needs expert consideration
to be marked as done.

This means that you claim that the problem has been dealt with.
If this is not the case it is now your responsibility to reopen the
Bug report if necessary, and/or fix the problem forthwith.

(NB: If you are a system administrator and have no idea what this
message is talking about, this may indicate a serious mail system
misconfiguration somewhere. Please contact owner@bugs.debian.org
immediately.)

-- 
1052697: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1052697
Debian Bug Tracking System
Contact owner@bugs.debian.org with problems
--- Begin Message ---
- To: Debian Bug Tracking System <submit@bugs.debian.org>
- Subject: tech-ctte: proposed change to apt-listchanges algorithm needs expert consideration
- From: Jonathan Kamens <jik@kamens.us>
- Date: Tue, 26 Sep 2023 06:11:01 -0400
- Message-id: <169572306158.817419.18156493633538125418.reportbug@jik-x1>
Package: tech-ctte
Severity: normal

Hello all,

I am hoping that this is an acceptable request to make of the technical committee, for reasons which I explain below, but if not, then I apologize for bothering you and you should kindly tell me to buzz off. Thanks.

I'm in the process of adopting apt-listchanges, and as part of doing that I am reviewing all the outstanding bugs. Bug #748631 is particularly concerning to me, because it involves apt-listchanges failing to display changelog entries that it should have.

That bug prompted me to do a deep dive into the algorithm that apt-listchanges uses to determine which changelog and NEWS entries to display to the user. After doing that, I don't think the current algorithm is accurate enough, and I want to revamp it. I have put a lot of thought into what to replace it with and come up with a plan that I think is solid, but I've been in this business for long enough to know that I certainly may be missing something important.

I believe -- and maybe I'm wrong about this! -- that it's important enough to accurately tell users what's changing during upgrades that significantly changing the apt-listchanges algorithm is not something I should do unilaterally, hence this request for y'all to review my plan and let me know if I'm missing anything or if you think I'm entirely off-base.

If there's a group I should take this to instead of the Technical Committee, please let me know; I'm still learning my way around here.

My proposal is appended below. I wrote this from the point of view of a fait accompli, i.e., as if the change has already been implemented, because if I do end up implementing this, then I intend to commit the proposal into the source tree as a design document to help whoever comes after me understand why the program works the way it does.

Please let me know your thoughts.

Thank you,

Jonathan Kamens

---

# Determining which entries the user has already seen

## Historical perspective

Earlier versions of this program used the following approach to determine which changelog or NEWS entries (hereafter "entries") are new and should be displayed to the user:

- Group packages by source package.
- Keep track of the highest version number of any of the packages in the group and use that as the threshold for identifying new entries.
- Display any entries with version numbers not less than the previously determined version number.

This approach was based on two assumptions, neither of which is always true:

1. Assume that the version numbers of all packages that come from the same source package are in the same series.
2. Assume that the version numbering of entries always matches the aforementioned version numbering.

For an example of where these assumptions break down, look at the dmsetup package:

- The source package for dmsetup is lvm2.
- The version number of the dmsetup package has a higher epoch, but is otherwise lower, than the version number of the lvm2 package.
- The entries in changelog.Debian.gz use lvm2 version numbers, while the ones in changelog.Debian.devmapper.gz use dmsetup version numbers.

This approach was also limited in that it only looked at NEWS.Debian[.gz], changelog.Debian[.gz], changelog.Debian.arch[.gz], and changelog[.gz]. For an example of where this fails, again look at dmsetup, which has changelog.Debian.devmapper.gz.

Another technique used in earlier versions of this program was to attempt heuristically to ignore version number suffixes which should not be considered when evaluating whether a particular entry was new. The heuristics employed were brittle, potentially leading to missed entries or to entries displayed multiple times.

## Current approach

The current approach abandons the dependency on version numbers and relies instead on entry checksums.
The program maintains a persistent database of previously seen changelog entries containing the following data:

- The checksum of the most recently seen entry in each changelog file, including the special file "/network/[package]" for changelog entries fetched over the network for the package named [package].
- The checksums of all entries seen in the past three years, _with the header line of each entry removed_, for each source package.

We index content by source package because the same changelog entries frequently appear in multiple binary packages built from the same source package, and we only want the user to see those once.

We remove the header line of each entry in the second set of checksums because sometimes a package version uploaded to stable and a different version uploaded to unstable use different header lines for the same changelog entry.

Given this stored data, the filtering algorithm is simple: Ignore any entry whose content checksum is in the database, and stop reading a file when the checksum for the most recently seen entry is reached.

The database used by the current approach is significantly larger than the database required for the historical approach -- a few megabytes vs. a few kilobytes -- but it is still relatively small and we consider this an acceptable amount of space to use for a significantly better-performing algorithm.

Because this approach uses entry checksums, it is able to include entries from files like changelog.Debian.devmapper that the historical approach ignored.

### Edge case: no database, or no data for a file in the database

When the persistent database is not being used in a particular invocation of the program, or when there is no data for a particular file in the database, then the above approach requires modification. In this case, we first read the file at the same path on disk and calculate checksums for its entries to seed the database, and only then parse the file in the package.
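
The filtering rule described under "Current approach" can be sketched roughly as follows. This is an illustration only, not the code in apt-listchanges: the function names, the choice of SHA-256, and the simplified header regular expression are all assumptions made for the example.

```python
import hashlib
import re

# Assumption: an entry starts with a header line in column 0 such as
# "pkg (1.2-3) unstable; urgency=medium". Real parsing is more careful.
HEADER_RE = re.compile(r"^\S+ \([^)]+\) .*;", re.MULTILINE)


def split_entries(changelog_text):
    """Split changelog text into entries, in file order (newest first)."""
    starts = [m.start() for m in HEADER_RE.finditer(changelog_text)]
    ends = starts[1:] + [len(changelog_text)]
    return [changelog_text[s:e].strip() for s, e in zip(starts, ends)]


def checksum(text):
    """Checksum of a whole entry (hash algorithm is an assumption)."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def content_checksum(entry):
    """Checksum of an entry with its header line removed."""
    _header, _sep, body = entry.partition("\n")
    return checksum(body.strip())


def filter_new_entries(entries, last_seen, seen_content):
    """Return the entries that have not been shown before.

    last_seen: checksum of the most recently seen entry in this file,
        or None if the file has no record.
    seen_content: set of header-less checksums recorded for the
        source package.
    """
    result = []
    for entry in entries:
        if checksum(entry) == last_seen:
            break  # everything from here on was seen in this file
        if content_checksum(entry) in seen_content:
            continue  # same content already shown via another file
        result.append(entry)
    return result
```

After displaying the returned entries, a caller would record the full checksum of the first entry in the file as the new "most recently seen" value and add the header-less checksum of each displayed entry to the set for the source package.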

### Edge case: no database, changelog data from network

When the persistent database is not being used in a particular invocation of the program, and the changelog data for a package is being fetched over the network because it is not present in the package, there is no reliable way to determine which changelog entries have been displayed already, so the program displays all of them.

This is sufficiently rare, both because the program is usually used with a persistent database and because there are relatively few packages without embedded changelogs, that it is considered an acceptable performance degradation to exchange for better overall performance. It is also preferable to the historical approach because it errs by displaying extra information to the user rather than by failing to display data that it should have.
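
The two edge cases can be sketched together as below. Again, this is a hypothetical illustration and not the program's actual code: `select_entries`, its parameters, and the use of `None` to mean "no data available" are assumptions for the example, and it only models the header-less content checksums.

```python
import hashlib


def content_checksum(entry):
    """Checksum of an entry with its header line removed (SHA-256 assumed)."""
    _header, _sep, body = entry.partition("\n")
    return hashlib.sha256(body.strip().encode("utf-8")).hexdigest()


def select_entries(incoming, db_seen, installed):
    """Choose which incoming entries to display.

    incoming: entries from the new package, or fetched over the network.
    db_seen: set of content checksums from the database, or None when
        no database (or no data for this file) is available.
    installed: entries read from the same path on disk, or None when
        there is no such file (for example, network-fetched changelogs).
    """
    if db_seen is None:
        if installed is None:
            # Nothing to compare against: err on the side of showing
            # too much rather than too little.
            return list(incoming)
        # Seed the "seen" set from the copy already on disk.
        db_seen = {content_checksum(e) for e in installed}
    return [e for e in incoming if content_checksum(e) not in db_seen]
```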
--- End Message ---
--- Begin Message ---
- To: Jonathan Kamens <jik@kamens.us>
- Cc: 1052697-close@bugs.debian.org
- Subject: Re: Bug#1052697: tech-ctte: proposed change to apt-listchanges algorithm needs expert consideration
- From: Sean Whitton <spwhitton@spwhitton.name>
- Date: Fri, 06 Oct 2023 20:47:23 +0100
- Message-id: <87fs2nbkj8.fsf@melete.silentflame.com>
- In-reply-to: <169572306158.817419.18156493633538125418.reportbug@jik-x1> (Jonathan Kamens's message of "Tue, 26 Sep 2023 06:11:01 -0400")
- References: <169572306158.817419.18156493633538125418.reportbug@jik-x1>
Hello,

On Tue 26 Sep 2023 at 06:11am -04, Jonathan Kamens wrote:

> I am hoping that this is an acceptable request to make of the
> technical committee, for reasons which I explain below, but if not,
> then I apologize for bothering you and you should kindly tell me to
> buzz off. Thanks.

I'm going to close the bug because we wouldn't want to apply ourselves to it as a whole committee before the topic is first discussed in a venue like debian-devel@lists.debian.org.

It's fine to continue discussion here on the bug despite it being closed, but you might move it to d-devel, with reference to this bug.

Thank you for these efforts.

-- 
Sean Whitton

Attachment: signature.asc
Description: PGP signature
--- End Message ---