
Re: Can't list root directory




On January 31, 2024 1:28:37 PM PST, hw <hw@adminart.net> wrote:
>On Wed, 2024-01-31 at 09:27 -0500, Gary Dale wrote:
>> On 2024-01-30 15:54, hw wrote:
>> > On Mon, 2024-01-29 at 11:42 -0500, Gary Dale wrote:
>> > > I'm running Debian/Trixie on an AMD64 workstation. I've lost the ability
>> > > to see the root directory even when I am logged in as root (su -).
>> > > 
>> > > This has been happening intermittently for several months. I initially
>> > > thought it might be related to a failing NVME drive that was part of a
>> > > RAID1 array that is mounted as "/" but I replaced the device and the
>> > > problem is still happening.
>> > > [...]
>> > What happens when you put the device you replaced back?
>> > 
>> How could putting a known-failing device back in help? The problem 
>> existed before I replaced it and continues to exist after the replacement.
>
>It sounded like you were able to list the root directory (at least
>sometimes) before you did the replacement.  Manually failing the
>device (perhaps after adding it back first) could make a difference.
>
>I've seen such indefinite hangs only when an NFS share has become
>unreachable after it had been mounted.  You could use clonezilla to
>make a copy and then perhaps convert the file system to btrfs.
>
>Do you still have the problem when you remove one of the NVME storage
>things?  Perhaps you have the equivalent of a bad SATA cable or the
>mainboard doesn't like it when you access two of those at the same
>time, or something like that.  Even simple network cables can behave
>very strangely, and NVME may be a bit more complicated than that.
>
>Running fsck on every boot to work around an issue like this is
>certainly a bad idea.  Doesn't fsck report anything?  If it really
>makes a difference in itself rather than creating some side effect
>that leads to the root directory being readable, it should report
>something.  Perhaps you need to increase its verbosity.
>
>If there's no report then it would look like a side effect and raise
>the question what side effect it might be.  Does fsck run before the
>RAID has been brought up or after?  Is the RAID up when booting is
>completed?  What does mdadm say about the device(s)?  Can you still
>list the root directory when you manually fail either drive?  What
>exactly are the circumstances under which you can and cannot list the
>root directory?
>
>You need to do some investigating and ask questions like those ...
>

Also, instead of doing "ls -l /", which will stat() every entry under root, try "/bin/ls -f /" and see whether that succeeds. That will only do a readdir() on root itself, so a hang caused by a single unresponsive child (a stale mount point, for example) would not show up. It might also be interesting to get a log of "strace ls -l /" to confirm exactly where the hang happens.
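
If it helps, here is a rough Python sketch of the same idea: do the readdir() step first, then lstat() each entry one at a time, printing the name before touching it. If one call blocks, the last name printed is the culprit. The function names (list_names, stat_entries) are made up for illustration, and this only approximates what ls does; it is not a substitute for the strace log.

```python
import os
import sys
import time


def list_names(path):
    """Read directory entries only (roughly `ls -f`): no stat() on the children."""
    return os.listdir(path)


def stat_entries(path, names, log=sys.stderr):
    """lstat() each entry in turn, logging the name first so a hang is attributable."""
    results = {}
    for name in names:
        full = os.path.join(path, name)
        # Announce the entry before touching it: if lstat() blocks forever,
        # the last name printed is the one that hung.
        print(f"lstat {full} ...", end=" ", file=log, flush=True)
        start = time.monotonic()
        try:
            os.lstat(full)
            results[name] = "ok"
        except OSError as exc:
            results[name] = f"error: {exc.strerror}"
        elapsed = time.monotonic() - start
        print(f"{results[name]} ({elapsed:.3f}s)", file=log, flush=True)
    return results


if __name__ == "__main__":
    # Hypothetical usage: pass a directory as the first argument, default "/".
    target = sys.argv[1] if len(sys.argv) > 1 else "/"
    names = list_names(target)
    print(f"readdir on {target} returned {len(names)} entries", file=sys.stderr)
    stat_entries(target, names)
```

If the readdir step itself never returns, the problem is with the root directory proper. If it returns and one particular entry stalls, look at what is mounted there.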

-Loren 

-- 
Sent from my Nexus 4 with K-9 Mail. Please excuse my brevity.

