[ale] Lab Workstation Mystery
- Subject: [ale] Lab Workstation Mystery
- From: jim.kinney at gmail.com (Jim Kinney)
- Date: Mon, 28 Mar 2016 12:34:36 -0400
- In-reply-to: <[email protected]>
- References: <[email protected]>
The root dir is NOT NFS mounted, so the fact that you can't mount /home
afterward is a red herring. If /var is not writeable, the system will hang
because it can't log any more. Mounting requires a log entry.
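
One way to spot the read-only state early is to scan the mount table. The following is only a minimal sketch, assuming a Linux client where /proc/mounts is readable; the function names are made up for illustration. Note that some pseudo filesystems are normally mounted read-only, so expect a few harmless entries besides / or /var.

```python
def read_only_mounts(mounts_text):
    """Return the mount points whose option list contains 'ro'.

    mounts_text is expected in /proc/mounts format:
    device mount_point fs_type options dump pass
    """
    result = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        mount_point = fields[1]
        options = fields[3].split(",")
        if "ro" in options:
            result.append(mount_point)
    return result


def read_only_mounts_on_this_host(path="/proc/mounts"):
    """Convenience wrapper (hypothetical): read the live mount table."""
    with open(path) as handle:
        return read_only_mounts(handle.read())
```

Calling read_only_mounts_on_this_host() from cron every few minutes and mailing the result would at least narrow down when the remount happens.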
Since it's not happening all at once to all the machines, it really
smells like a local machine problem. Verify that the drive is not full.
Check whether the affected machines are on the same power circuit.
Is it the same 2-3 each time? If so, run memtest and badblocks. If swap
gets corrupted, Linux systems lock up.
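
The "drive is not full" check can be scripted across the lab machines. This is a rough sketch using Python's standard shutil.disk_usage; the 90% threshold and the function names are assumptions for illustration, not anything from the thread.

```python
import shutil


def percent_used(path):
    """Return how full the filesystem holding `path` is, as a percentage."""
    usage = shutil.disk_usage(path)
    if usage.total == 0:
        return 0.0  # some pseudo filesystems report a zero size
    return 100.0 * usage.used / usage.total


def nearly_full(paths, threshold=90.0):
    """Return the paths whose filesystem is at or above `threshold` percent."""
    return [p for p in paths if percent_used(p) >= threshold]
```

For example, nearly_full(["/", "/var"]) returns whichever of those two filesystems is at least 90% used.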
On Mon, 2016-03-28 at 10:54 -0500, Todor Fassl wrote:
> I have a mysterious problem with workstations in a shared use
> environment. There are 2 labs in different buildings, one with 6
> workstations and one with 8. These workstations are used by a group of
> about 30 grad student TAs. All are running ubuntu 15.10. Authentication
> is via ldap and home directories are mounted via nfs. Every day, 2 or
> 3 of the machines go down. The earliest symptom I can find is that the
> root filesystem is remounted read-only. Soon they stop responding to
> ssh and snmp and they are essentially locked up. They still respond to
> pings though.
>
> I've caught the machines in the period where the root system is
> read-only but I can still ssh to them. I've found that I cannot nfs
> mount home directories on our file server. I can mount nfs shares on
> other servers. And I can mount the same home directories if I go to
> another workstation. Restarting nfs on the file server has no effect.
>
> When I try to mount a home directory on an affected machine, the mount
> just hangs. I ran it with strace and it just showed it was waiting --
> for what, I'm not sure and I don't have a screen cap available at the
> moment. I put a packet sniffer on the server and it showed it received a
> single packet from the client and that's it.
>
> There is nothing in the logs on the client. In fact, they simply stop at
> some point in the process. At first I attributed this to the root
> filesystem being read-only but it continues after I move /var to a
> separate file system. At some point it just stops writing records to the
> syslog but I don't know if it's before or after the root filesystem is
> remounted read-only.
>
> Many of the TAs also have identical workstations in their offices. None
> of those machines seem to have this problem. The TAs do tend to walk
> away from the workstations w/o logging out. But I wrote a script to kill
> off their sessions and it didn't help. I had it send me an email
> whenever it killed somebody's session and it doesn't seem to be
> correlated with that. In other words, sometimes machines go down even if
> everyone who has used it has remembered to log out.
>
> I'm pretty desperate. Any ideas?
--
James P. Kinney III
Every time you stop a school, you will have to build a jail. What you
gain at one end you lose at the other. It's like feeding a dog on his
own tail. It won't fatten the dog.
- Speech 11/23/1900 Mark Twain
http://heretothereideas.blogspot.com/