Re: [Nbd] nbd-server segfault on x86_64
- To: JaniD++ <djani22@...60...>
- Cc: nbd-general@lists.sourceforge.net
- Subject: Re: [Nbd] nbd-server segfault on x86_64
- From: Wouter Verhelst <wouter@...3...>
- Date: Fri, 24 Feb 2006 08:50:48 +0100
- Message-id: <20060224075048.GA27155@...39...>
- In-reply-to: <016401c638b5$bd2914e0$1600a8c0@...74...>
- References: <01f001c62f39$1298c140$9d00a8c0@...74...> <20060213164317.GA7269@...39...> <015801c630e7$fb7957d0$9d00a8c0@...74...> <20060213225347.GB20017@...39...> <016b01c630f9$5d1c17f0$9d00a8c0@...74...> <20060214062547.GA30497@...39...> <016401c638b5$bd2914e0$1600a8c0@...74...>
On Thu, Feb 23, 2006 at 09:13:24PM +0100, JaniD++ wrote:
> ----- Original Message -----
> From: "Wouter Verhelst" <wouter@...3...>
> To: "JaniD++" <djani22@...60...>
> Cc: <nbd-general@lists.sourceforge.net>
> Sent: Tuesday, February 14, 2006 7:25 AM
> Subject: Re: [Nbd] nbd-server segfault on x86_64
>
>
> > On Tue, Feb 14, 2006 at 12:58:12AM +0100, JaniD++ wrote:
> > > > On Mon, Feb 13, 2006 at 10:53:32PM +0100, JaniD++ wrote:
> > > > [...]
> > > > > > > The system:
> > > > > > > > P4 Dual core (64 bit), Fedora Core 4 x86_64, Kernel 2.6.16-rc1-git4,
> > > > > > > > nbd 2.8.2, compiled on this system.
> > > > > >
> > > > > > > I believe these problems have been fixed in 2.8.3, though I'm not
> > > > > > > entirely sure. Could you try with 2.8.3? If that does not work,
> > > > > > > we'll need to debug a bit more.
> > > > >
> > > > > > Not needed.
> > > > > > It looks like it is fixed in 2.8.3.
> > > >
> > > > Right, I thought as much.
> >
> > Hmm. Forgot this: there's also a pretty nasty bug in nbd-server 2.8.3
> > involving the incorrect killing of child processes, which will fill up
> > your syslog in no time.
> >
> > It's fixed for the Debian packages, but I still need to do a release for
> > the source.
> >
> > I'll do that once I've checked whether the insanely huge devices work.
>
> I'd like to ask: is there any news on the big devices?
Sorry, I forgot; I did some work on it, found out that my approach was
wrong, and had to leave; after that, I didn't look into it anymore.
I also don't have much time at the moment (with FOSDEM and all).
I'll make sure to have some reasonable answer by the weekend.
> I have almost run out of disk space, and need to grow the xfs...
Yes, that sounds rather... important.
> [...]
>
> I have another problem at this time.
> Some nodes are occasionally and randomly disconnected.
> The nbd-client exits, and my big RAID crashes.
>
> The dmesg messages look like this:
> nbd7: Attempted send on closed socket
> end_request: I/O error, dev nbd7, sector 0
> Buffer I/O error on device nbd7, logical block 0
You should be able to reconnect it at that point -- but I agree, this
disconnect shouldn't happen at all.
> And this causes another problem!
> If the traffic is high enough, these messages slow down or completely
> stop the system.
>
> I tried to write a script to check the nbd-client's PID, but the
> response time is too slow. :(
> Would it be too hard to implement an option for nbd-client like --nodaemon
> or something similar?
> I mean staying in the foreground and, with a -v option, printing useful
> verbose and debug information?
The way nbd-client is implemented, this is impossible: the only thing
the nbd-client process does is to perform a handshake with nbd-server,
and set up a socket. After that, it runs an ioctl(), which does not
return until the device is disconnected.
I could perhaps make it not fork() before doing that; but it's not as if
I can make it output any useful information without going into kernel
space.
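
For illustration only: if nbd-client did stay in the foreground (it does
not at the time of this mail), a supervising script would not need to poll
for the PID. It could block until the process exits, much as nbd-client
itself blocks in its ioctl(). A minimal Python sketch of that idea follows.
The function name wait_for_exit and the stand-in command are hypothetical,
and no existing nbd-client option is assumed.

```python
import subprocess
import sys
import time


def wait_for_exit(cmd):
    """Run cmd in the foreground and block until it exits.

    Returns (exit_code, seconds_running). No polling is involved:
    Popen.wait() sleeps until the child process terminates.
    """
    started = time.monotonic()
    proc = subprocess.Popen(cmd)
    exit_code = proc.wait()  # returns only once the child has exited
    return exit_code, time.monotonic() - started


if __name__ == "__main__":
    # Stand-in for a hypothetical non-forking nbd-client invocation:
    # a short-lived Python child that exits with status 3.
    stand_in = [sys.executable, "-c", "import sys; sys.exit(3)"]
    code, elapsed = wait_for_exit(stand_in)
    print(f"child exited with status {code} after {elapsed:.2f}s")
```

This only helps if the supervised process does not fork: with a client
that forks before its blocking ioctl(), the parent returns immediately and
the wait above says nothing about the connection.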
--
Fun will now commence
-- Seven Of Nine, "Ashes to Ashes", stardate 53679.4