Re: [PATCH 07/17] nbd: call blk_mark_disk_dead in nbd_clear_sock_ioctl



On Mon, Sep 25, 2023 at 09:48:38AM +0200, Christoph Hellwig wrote:
> On Wed, Sep 20, 2023 at 03:41:11PM -0500, Samuel Holland wrote:
> > [   14.619101] Buffer I/O error on dev nbd0, logical block 0, async page read
> > 
> > [   14.630490]  nbd0: unable to read partition table
> > 
> > I wonder if disk_force_media_change() is the right thing to call here instead.
> 
> So what are the semantics of clearing the socket?
> 
> The <= 6.5 behavior of invalidating fs caches, but not actually marking
> the fs shutdown is pretty broken, especially if this expects to resurrect
> the device and thus the file system later on.

nbd-client -d calls

ioctl(nbd, NBD_DISCONNECT);
ioctl(nbd, NBD_CLEAR_SOCK);

(error handling removed for clarity)

where "nbd" is the file descriptor for the nbd device. This expects that
the device is cleared and can then be reused for a different connection,
much like "losetup -d". Expecting that the next connection would talk to
the same file system is wrong.
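
For illustration, the two calls with error handling put back might look
roughly like the sketch below. The function name nbd_teardown() is
hypothetical, and stopping at the first failing ioctl() is an assumption
made for this sketch; it is not necessarily what nbd-client itself does.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/ioctl.h>
#include <linux/nbd.h>

/*
 * Hypothetical helper: tear down the connection on an already-open
 * nbd device descriptor, using the same two ioctl()s as above.
 * Returns 0 on success, -1 on failure with errno preserved.
 */
int nbd_teardown(int nbd)
{
    int saved;

    /* Ask the kernel to send a disconnect request to the server. */
    if (ioctl(nbd, NBD_DISCONNECT) < 0) {
        saved = errno;
        fprintf(stderr, "NBD_DISCONNECT: %s\n", strerror(saved));
        errno = saved;
        return -1;
    }

    /* Drop the socket so the device can take a new connection. */
    if (ioctl(nbd, NBD_CLEAR_SOCK) < 0) {
        saved = errno;
        fprintf(stderr, "NBD_CLEAR_SOCK: %s\n", strerror(saved));
        errno = saved;
        return -1;
    }

    return 0;
}
```

A caller would open the device (for example /dev/nbd0) with open(2),
pass the descriptor to nbd_teardown(), and close it afterwards.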

In netlink mode, it doesn't use the ioctl()s, but instead sends an
NBD_CMD_DISCONNECT command, without any NBD_CLEAR_SOCK, for which no
equivalent message exists. Userspace expects the same result at this
point, i.e., the device should now be available for the next
connection, which may or may not be the same one.

nbd-client also has a "-persist" option that used to work. This does
expect to resurrect the device and file system. It depends on semantics
where the kernel blocks I/O to the device until the nbd-client process
that initiated the connection exits, thus allowing it to re-establish
the connection if possible. When doing this, we don't issue a DISCONNECT
or CLEAR_SOCK message, and the client is expected to re-establish a
connection to the same device, so some state should be retained.

These semantics were, however, broken at some point over the past decade
or so. I didn't notice at the time, so I didn't complain, and it's
therefore probably not relevant anymore. We should perhaps rethink
whether this is still a good idea, given that in netlink mode there is
no client waiting for an ioctl() call to return, and if so, how to
implement a replacement.

Kind regards,

-- 
     w@uter.{be,co.za}
wouter@{grep.be,fosdem.org,debian.org}

I will have a Tin-Actinium-Potassium mixture, thanks.

