Re: [Nbd] NBD: Disconnect connection/kill NBD server cause kernel bug even kernel hang
- To: Sheng Yang <sheng@...2115...>
- Cc: Paul Clements <paul.clements@...856...>, "nbd-general@lists.sourceforge.net" <nbd-general@lists.sourceforge.net>, kernel list <linux-kernel@...25...>
- Subject: Re: [Nbd] NBD: Disconnect connection/kill NBD server cause kernel bug even kernel hang
- From: Pavel Machek <pavel@...28...>
- Date: Wed, 7 Oct 2015 10:19:36 +0200
- Message-id: <20151007081936.GB3431@...2124...>
- In-reply-to: <CA+2rt40LExtmuzBknJ5Pai6zVifAEUrkRUVUnGCRPuJVvAJvYQ@...18...>
- References: <CA+2rt42MwRUdHzx6-bcGARJuKTwZcvzhTDgS8VytajXtSBpnYg@...18...> <CAECXXi6pHBSTYnE4rS2NEH-ZZKmDD+XdqeSMKPt3oKu7pM+bRg@...18...> <CA+2rt40LExtmuzBknJ5Pai6zVifAEUrkRUVUnGCRPuJVvAJvYQ@...18...>
On Mon 2015-09-21 17:33:21, Sheng Yang wrote:
> Thank you, Paul! That's exactly the issue I hit. I've read the whole
> thread and have a general idea of the problem.
>
> Let me try to summarize it; please correct me if I'm wrong:
>
> 1. The issue is caused by kill_bdev() being called when the connection
> is cut while IO is still in flight.
> 2. Other block device drivers don't have this issue because they
> normally keep the buffers and the device, deny all new requests, and
> only kill the device once all requests have settled and the device has
> been unmounted.
> 3. NBD cannot handle it the same way because it depends on the
> userspace nbd-client, which handles reconnection/retry. If the DO_IT
> ioctl doesn't return, userspace has no way to reconnect. Coordinating
> kernel and userspace on reconnection, while leaving the device open
> until it's safe to close, can be done in several ways, and each would
> require a considerable amount of change.
>
> Goswin's suggestion really seems to make sense (
> http://sourceforge.net/p/nbd/mailman/message/31678340/ ). This issue
> has been around for years and has hurt the stability of NBD a lot. I
> think we should get it fixed.
You can fix this one, but getting memory management right so that
client & server on the same machine is safe is not going to be easy.
Think "nbd-server swapped out, because memory was needed by dirty buffers".
Think "nbd-server can't allocate memory because it is all in the dirty buffers".
Pavel