[ale] lsof and a hung system
- Subject: [ale] lsof and a hung system
- From: DJPfulio at jdpfu.com (DJ-Pfulio)
- Date: Tue, 20 Oct 2015 13:09:12 -0400
- In-reply-to: <[email protected]>
- References: <[email protected]> <[email protected]> <CAEo=5Pz0OU_Pf4NcadPxXWTV0zqN+bG7JtXyTSfTcS467n0mtA@mail.gmail.com> <CADvA-d=VhZ36PnQNJSS0APfgkQWdreEuaDL+_RcRUp_T9eQGOg@mail.gmail.com> <[email protected]>
But isn't systemd supposed to solve these issues?
BTW, I had to add a similar delay to the startup of a Raspberry Pi box
that got systemd along with the 4.1 kernel in a Debian install.
On 10/20/2015 12:25 PM, Jim Kinney wrote:
> Yep. The 10G card driver had oopsed all over itself and wouldn't keep a
> connection up. I initially tried to stop network, unload the module,
> load the module, start the network but even that failed to reset the
> card completely. I needed to add a sleep 20 before loading the module
> again. Once the connection was actually working the system was cleanly
> rebooted to lop off the zombies and things were happily OK.
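
For anyone wanting to script the sequence described above, here is a minimal sketch in Python. It is not the script that was actually used. The module name `example_10g_driver` is a made-up placeholder (the real driver is not named in this thread), and the `service network stop/start` commands are an assumption about how the network gets stopped on that box. Only the order of the steps and the 20-second pause come from the message.

```python
import subprocess
import time


def reload_steps(module, settle_seconds=20):
    """Return the ordered steps for a driver reload with a settle delay."""
    return [
        # Assumption: a SysV-style "network" service; adjust for your distro.
        ("run", ["service", "network", "stop"]),
        ("run", ["modprobe", "-r", module]),      # unload the driver module
        ("sleep", settle_seconds),                # give the card time to reset
        ("run", ["modprobe", module]),            # load the driver again
        ("run", ["service", "network", "start"]),
    ]


def reload_driver(module, settle_seconds=20, run=None, sleep=time.sleep):
    """Execute the steps in order; `run` and `sleep` can be swapped for a dry run."""
    if run is None:
        # Default runner: execute the command and raise if it fails.
        def run(cmd):
            subprocess.run(cmd, check=True)
    for kind, arg in reload_steps(module, settle_seconds):
        if kind == "sleep":
            sleep(arg)
        else:
            run(arg)
```

Passing `run=print` and `sleep=print` prints the steps without touching the system, which is a cheap way to check the order before running it as root.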
> On Tue, 2015-10-20 at 11:32 -0400, Ed Cashin wrote:
>> On Mon, Oct 19, 2015 at 10:58 PM, Jim Kinney <jim.kinney at gmail.com>
>> wrote:
>> ...
>>> Other system with same nfs mounted storage is fine. Storage server
>>> is connected to both number crunchers by dedicated, unswitched
>>> 10Gbps fiber ethernet.
>>
>> You mean with direct connections? In that case, the other number
>> cruncher's connection could be fine while the affected system is
>> unable to reach the NFS server (for some as-yet-undetermined
>> reason), which could produce the behavior you describe if the NFS
>> mount is "hard".
>
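
On the "hard" mount point quoted above: a hard NFS mount retries indefinitely when the server is unreachable, so processes touching it block instead of getting an error. A rough way to see which mounts are affected is to look at the options in /proc/mounts. The sketch below assumes the usual Linux /proc/mounts layout (device, mount point, type, options) and treats any NFS mount without the `soft` option as hard. The function name and the sample host and paths are made up for illustration.

```python
def hard_nfs_mounts(mounts_text):
    """Return mount points of NFS mounts that do not carry the 'soft' option."""
    result = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        mount_point, fstype, options = fields[1], fields[2], fields[3]
        if fstype not in ("nfs", "nfs4"):
            continue  # only NFS client mounts are of interest
        if "soft" not in options.split(","):
            result.append(mount_point)
    return result


# Hypothetical sample in /proc/mounts format.
sample = (
    "storage:/data /mnt/data nfs rw,relatime,vers=3,hard,proto=tcp 0 0\n"
    "storage:/scratch /mnt/scratch nfs4 rw,soft,timeo=100 0 0\n"
    "/dev/sda1 / ext4 rw,relatime 0 0\n"
)
print(hard_nfs_mounts(sample))
```

On a live box the input would be the contents of /proc/mounts, read with `open("/proc/mounts").read()`.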