[Date Prev][Date Next] [Thread Prev][Thread Next] [Date Index] [Thread Index]

Bug#1056344: libefivar1: sleeps for 10ms twice for every variable read



Control: tags -1 + upstream patch

A sequence of two simple patches fixes this, making the reads instant:
-- >8 --
$ time efibootmgr
BootCurrent: 0006
Timeout: 1 seconds
BootOrder: 0006,0005,0002,0003,0004
Boot0002* UEFI:CD/DVD Drive     BBS(129,,0x0)
Boot0003* UEFI:Removable Device BBS(130,,0x0)
Boot0004* UEFI:Network Device   BBS(131,,0x0)
Boot0005* Debian GNU/Linux trixie/sid with Linux 6.4.0-1-amd64
HD(1,GPT,2823eeb4-962f-cb4d-86ea-c0a4e460bc9c,0x800,0xfa000)/File(\KLAPKI\A2B398A10F5F4AF99258999E14093599\6.4.0-1-AMD64\VMLINUZ-6.4.0-1-AMD64)69006e0069007400720064003d005c006b006c00610070006b0069005c00610032006200330039003800610031003000660035006600340061006600390039003200350038003900390039006500310034003000390033003500390039005c0036002e0034002e0030002d0031002d0061006d006400360034005c0069006e0069007400720064002e0069006d0067002d0036002e0034002e0030002d0031002d0061006d00640036003400200072006f006f0074003d007a00660073003a004100550054004f00200072006500730075006d0065003d0050004100520054004c004100420045004c003d0072006f007a006200690061006e002d007300770061007000200071007500690065007400200069006f006d006d0075003d006f006e00
Boot0006* Debian GNU/Linux trixie/sid with Linux 6.4.0-1-amd64
HD(1,GPT,2823eeb4-962f-cb4d-86ea-c0a4e460bc9c,0x800,0xfa000)/File(\KLAPKI\A2B398A10F5F4AF99258999E14093599\6.4.0-1-AMD64\VMLINUZ-6.4.0-1-AMD64)69006e0069007400720064003d005c006b006c00610070006b0069005c00610032006200330039003800610031003000660035006600340061006600390039003200350038003900390039006500310034003000390033003500390039005c0036002e0034002e0030002d0031002d0061006d006400360034005c0069006e0069007400720064002e0069006d0067002d0036002e0034002e0030002d0031002d0061006d00640036003400200072006f006f0074003d007a00660073003a004100550054004f00200072006500730075006d0065003d0050004100520054004c004100420045004c003d0072006f007a006200690061006e002d007300770061007000200071007500690065007400200069006f006d006d0075003d006f006e00

real    0m0.002s
user    0m0.001s
sys     0m0.000s
-- >8 --
(some would say this is a .168/.002=84x speed improvement).

And bumps the actual max read rate by 50%:
-- >8 --
fcntl(4, F_SETFD, FD_CLOEXEC)           = 0
getdents64(4, 0x55cd1adb8420 /* 104 entries */, 32768) = 7568
getdents64(4, 0x55cd1adb8420 /* 0 entries */, 32768) = 0
close(4)                                = 0
openat(AT_FDCWD, "/sys/firmware/efi/efivars/Boot0002-8be4df61-93ca-11d2-aa0d-00e098032b8c", O_RDONLY) = 4
read(4, "\7\0\0\0\1\0\0\0\r\0U\0E\0F\0I\0:\0C\0D\0/\0D\0V\0D\0"..., 4096) = 59
read(4, "", 4037)                       = 0
close(4)                                = 0
openat(AT_FDCWD, "/sys/firmware/efi/efivars/Boot0003-8be4df61-93ca-11d2-aa0d-00e098032b8c", O_RDONLY) = 4
read(4, "\7\0\0\0\1\0\0\0\r\0U\0E\0F\0I\0:\0R\0e\0m\0o\0v\0a\0"..., 4096) = 67
read(4, "", 4029)                       = 0
close(4)                                = 0
openat(AT_FDCWD, "/sys/firmware/efi/efivars/Boot0004-8be4df61-93ca-11d2-aa0d-00e098032b8c", O_RDONLY) = 4
read(4, "\7\0\0\0\1\0\0\0\r\0U\0E\0F\0I\0:\0N\0e\0t\0w\0o\0r\0"..., 4096) = 63
read(4, "", 4033)                       = 0
close(4)                                = 0
openat(AT_FDCWD, "/sys/firmware/efi/efivars/Boot0005-8be4df61-93ca-11d2-aa0d-00e098032b8c", O_RDONLY) = 4
read(4, "\7\0\0\0\7\0\0\0\314\0D\0e\0b\0i\0a\0n\0 \0G\0N\0U\0/\0"..., 4096) = 610
read(4, "", 3486)                       = 0
close(4)                                = 0
openat(AT_FDCWD, "/sys/firmware/efi/efivars/Boot0006-8be4df61-93ca-11d2-aa0d-00e098032b8c", O_RDONLY) = 4
read(4, "\7\0\0\0\7\0\0\0\314\0D\0e\0b\0i\0a\0n\0 \0G\0N\0U\0/\0"..., 4096) = 610
read(4, "", 3486)                       = 0
close(4)                                = 0
openat(AT_FDCWD, "/sys/firmware/efi/efivars/BootNext-8be4df61-93ca-11d2-aa0d-00e098032b8c", O_RDONLY) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/sys/firmware/efi/efivars/BootCurrent-8be4df61-93ca-11d2-aa0d-00e098032b8c", O_RDONLY) = 4
read(4, "\6\0\0\0\6\0", 4096)           = 6
read(4, "", 4090)                       = 0
close(4)                                = 0
newfstatat(1, "", {st_mode=S_IFCHR|0620, st_rdev=makedev(0x88, 0x2), ...}, AT_EMPTY_PATH) = 0
write(1, "BootCurrent: 0006\n", 18BootCurrent: 0006
)     = 18
openat(AT_FDCWD, "/sys/firmware/efi/efivars/Timeout-8be4df61-93ca-11d2-aa0d-00e098032b8c", O_RDONLY) = 4
read(4, "\7\0\0\0\1\0", 4096)           = 6
read(4, "", 4090)                       = 0
close(4)                                = 0
write(1, "Timeout: 1 seconds\n", 19Timeout: 1 seconds
)    = 19
openat(AT_FDCWD, "/sys/firmware/efi/efivars/BootOrder-8be4df61-93ca-11d2-aa0d-00e098032b8c", O_RDONLY) = 4
read(4, "\7\0\0\0\6\0\5\0\2\0\3\0\4\0", 4096) = 14
read(4, "", 4082)                       = 0
close(4)                                = 0
-- >8 --

Patches attached, tested against 38-2.
From 3e2a0142e4b19e45f455542994b2b0d6d3f86886 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?=D0=BD=D0=B0=D0=B1?= <nabijaczleweli@nabijaczleweli.xyz>
Date: Tue, 21 Nov 2023 14:44:53 +0100
Subject: [PATCH 1/2] Remove extraneous and broken self-imposed 20ms sleep
 (nominally-50/s-rate limit) for each EFI variable read
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
X-Mutt-PGP: OS

Nominally this rate limit is defined to avoid... getting rate-limited?

But it already severely limits the rate to unusable
‒ on two of my real systems this makes efibootmgr take 168ms/194ms,
  which accounts for 95%/82% of the run-time
  (and this is under strace, so it's 100% of the run-time) ‒
for klapki 0.2, this accounts for 36% and a large (140ms!)
start-up delay, and for klapki 0.3 it's well over 50%.
Well before you'd ever run afoul of the real limit.

Discounting "20ms" as "The user is not going to notice." is baffling.
efibootmgr is infuriatingly slow. 20ms is ping-to-america level.

Worse yet, the entire kernel rate-limiter amounts to fs/efivarfs/file.c
-- >8 --
  static ssize_t efivarfs_file_read(struct file *file, char __user *userbuf,
                  size_t count, loff_t *ppos)
  {
          struct efivar_entry *var = file->private_data;
          unsigned long datasize = 0;
          u32 attributes;
          void *data;
          ssize_t size = 0;
          int err;

          while (!__ratelimit(&file->f_cred->user->ratelimit))
                  msleep(50);
-- >8 --
this is the alloc_uid() ratelimit with 1s interval + 100 burst.

This means that we can (best-case) read 50 variables
(read(...), read(0)) instantly, then do so again the next second.

Best-case because the current implementation is broken anyway:
it sleeps for 10ms after the attribute read (sure),
and then for 10ms after the /two/ reads to read the rest of the
variable (bad).

This limits libefivar to 33⅓ variables per second.

Most systems have roughly this many variables. Most programs only
care about a very thin subset of them, and scarcely come close to
reading enough to run afoul of the kernel limit. But even if they
did, this limit is /significantly harsher/ than the kernel limit ‒
it doesn't increase it (how could it? the limit's already there!),
but severely increases latency for /every single read/, instead of
just those over the rate.
It's strictly worse than just not doing it.

This was confirmed experimentally with strace -TTTT /bin/wc * * * * *
(note the many every-variable expansions so it's noticeable):
there is both visually a very obvious "big burst, little slowdown"
oscillation, but also (non-efivarfs reads filtered out)
  $ awk '/^read/ {print $NF}' ss | tr -d '<>' | sort -n | cut -c -6 | uniq -c | sort -n
        1 0.8998
        1 0.9015
        1 0.9581
        1 0.9585
        1 0.9586
        5 0.0013
        9 0.0005
        9 0.0012
       46 0.0011
       70 0.0010
       84 0.0008
       85 0.0009
      102 0.0006
      115 0.0007
or indeed
 $ awk '/^read/ {print $NF}' ss | tr -d '<>' | sort -n | cut -c -5 | uniq -c | sort -n
       1 0.899
       1 0.901
       3 0.958
     130 0.001
     395 0.000
(130+395)/2=262½ variables read in under a millisecond,
and 4½ got limited.

But, much more importantly, the first screenful was free:
99% of programs that don't read every variable
over and over and over, but fit well within the 33
(klapki's 7 and efibootmgr's 8,
 this is with the firmware's base boot entries + two additional ones;
 there isn't a non-hypothetical system in existence with 25 more boot entries).

Fixes: https://bugs.debian.org/1056344
---
 src/efivarfs.c | 12 ------------
 src/vars.c     | 11 -----------
 2 files changed, 23 deletions(-)

diff --git a/src/efivarfs.c b/src/efivarfs.c
index d4cee45..6483004 100644
--- a/src/efivarfs.c
+++ b/src/efivarfs.c
@@ -246,16 +246,6 @@ efivarfs_get_variable(efi_guid_t guid, const char *name, uint8_t **data,
 	int fd = -1;
 	char *path = NULL;
 	int rc;
-	int ratelimit;
-
-	/*
-	 * The kernel rate limiter hits us if we go faster than 100 efi
-	 * variable reads per second as non-root.  So if we're not root, just
-	 * delay this long after each read.  The user is not going to notice.
-	 *
-	 * 1s / 100 = 10000us.
-	 */
-	ratelimit = geteuid() == 0 ? 0 : 10000;
 
 	rc = make_efivarfs_path(&path, guid, name);
 	if (rc < 0) {
@@ -269,14 +259,12 @@ efivarfs_get_variable(efi_guid_t guid, const char *name, uint8_t **data,
 		goto err;
 	}
 
-	usleep(ratelimit);
 	rc = read(fd, &ret_attributes, sizeof (ret_attributes));
 	if (rc < 0) {
 		efi_error("read failed");
 		goto err;
 	}
 
-	usleep(ratelimit);
 	rc = read_file(fd, &ret_data, &size);
 	if (rc < 0) {
 		efi_error("read_file failed");
diff --git a/src/vars.c b/src/vars.c
index ea37f85..1df6a66 100644
--- a/src/vars.c
+++ b/src/vars.c
@@ -290,16 +290,6 @@ vars_get_variable(efi_guid_t guid, const char *name, uint8_t **data,
 	char *path = NULL;
 	int rc;
 	int fd = -1;
-	int ratelimit;
-
-	/*
-	 * The kernel rate limiter hits us if we go faster than 100 efi
-	 * variable reads per second as non-root.  So if we're not root, just
-	 * delay this long after each read.  The user is not going to notice.
-	 *
-	 * 1s / 100 = 10000us.
-	 */
-	ratelimit = geteuid() == 0 ? 0 : 10000;
 
 	rc = asprintf(&path, "%s%s-" GUID_FORMAT "/raw_var", get_vars_path(),
 		      name, GUID_FORMAT_ARGS(&guid));
@@ -314,7 +304,6 @@ vars_get_variable(efi_guid_t guid, const char *name, uint8_t **data,
 		goto err;
 	}
 
-	usleep(ratelimit);
 	rc = read_file(fd, &buf, &bufsize);
 	if (rc < 0) {
 		efi_error("read_file(%s) failed", path);
-- 
2.39.2

From ec81a1f5fe6c188cd3fb37c2694a401f6b04a650 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?=D0=BD=D0=B0=D0=B1?= <nabijaczleweli@nabijaczleweli.xyz>
Date: Tue, 21 Nov 2023 16:12:07 +0100
Subject: [PATCH 2/2] =?UTF-8?q?Bump=20efivarfs=5Fget=5Fvariable=20throughp?=
 =?UTF-8?q?ut=20to=2050=20variables/second=20from=2033=E2=85=93?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
X-Mutt-PGP: OS

Read the whole variable at once, then copy out the attributes
(for a description of efivarfs rate limiting, see #258)
---
 src/efivarfs.c | 17 +++++++++--------
 1 file changed, 9 insertions(+), 8 deletions(-)

diff --git a/src/efivarfs.c b/src/efivarfs.c
index 6483004..3b6154b 100644
--- a/src/efivarfs.c
+++ b/src/efivarfs.c
@@ -241,7 +241,6 @@ efivarfs_get_variable(efi_guid_t guid, const char *name, uint8_t **data,
 	__typeof__(errno) errno_value;
 	int ret = -1;
 	size_t size = 0;
-	uint32_t ret_attributes = 0;
 	uint8_t *ret_data;
 	int fd = -1;
 	char *path = NULL;
@@ -259,21 +258,23 @@ efivarfs_get_variable(efi_guid_t guid, const char *name, uint8_t **data,
 		goto err;
 	}
 
-	rc = read(fd, &ret_attributes, sizeof (ret_attributes));
+	rc = read_file(fd, &ret_data, &size);
 	if (rc < 0) {
-		efi_error("read failed");
+		efi_error("read_file failed");
 		goto err;
 	}
+	--size; // read_file pads out 1 extra byte to NUL
 
-	rc = read_file(fd, &ret_data, &size);
-	if (rc < 0) {
-		efi_error("read_file failed");
+	if (size < sizeof (*attributes)) {
+		efi_error("no attributes");
 		goto err;
 	}
 
-	*attributes = ret_attributes;
+	memcpy(attributes, ret_data, sizeof (*attributes));
+	memmove(ret_data, ret_data + sizeof (*attributes), size - sizeof (*attributes));
+
 	*data = ret_data;
-	*data_size = size - 1; // read_file pads out 1 extra byte to NUL it */
+	*data_size = size - sizeof (*attributes);
 
 	ret = 0;
 err:
-- 
2.39.2

Attachment: signature.asc
Description: PGP signature


Reply to: