OS, Hardware, Network - Logging, Monitoring, and Alerting
- Subject: OS, Hardware, Network - Logging, Monitoring, and Alerting
- From: agirling at denetron.com (Andrew Girling)
- Date: Thu, 26 Jun 2008 05:30:43 -0400
On Jun 26, 2008, at 5:22 AM, Rev. Jeffrey Paul wrote:
> Hi. I've a (theoretically) simple problem and I'm wondering how others
> solve it.
>
> I've recently deployed ~40 Linux instances on ~20 different Dell blades
> and PowerEdges (we're big on virtualization), a few Cisco 7204s and
> 3560s, and assorted switchable PDUs and whatnot.
>
> We need to monitor standard things like CPU, memory, and disk usage on
> all OSes. This is straightforward with net-snmp. It would also be cool
> if I could monitor more esoteric things, like NTP synchronization
> status, I/O statistics, etc.
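
For the basic CPU, memory, and disk figures, here is a minimal Python sketch of reading them locally on a Linux host, assuming /proc is mounted. The function names are hypothetical; net-snmp exposes similar numbers remotely, so this only illustrates where they come from.

```python
import shutil

def parse_loadavg(text):
    """Return the 1-, 5- and 15-minute load averages from /proc/loadavg text."""
    return tuple(float(field) for field in text.split()[:3])

def parse_meminfo(text):
    """Return /proc/meminfo fields as {name: value}; sizes are in kB."""
    info = {}
    for line in text.splitlines():
        name, _, rest = line.partition(":")
        parts = rest.split()
        if parts and parts[0].isdigit():
            info[name.strip()] = int(parts[0])
    return info

def memory_used_percent(info):
    """Percentage of RAM in use, not counting reclaimable buffers/cache."""
    total = info["MemTotal"]
    # MemAvailable only exists on Linux 3.14 and later; on older kernels fall
    # back to the traditional MemFree + Buffers + Cached approximation.
    available = info.get("MemAvailable")
    if available is None:
        available = info.get("MemFree", 0) + info.get("Buffers", 0) + info.get("Cached", 0)
    return 100.0 * (total - available) / total

def disk_used_percent(path="/"):
    """Used space as a percentage of the filesystem's total size."""
    usage = shutil.disk_usage(path)
    return 100.0 * usage.used / usage.total

def collect_local_metrics():
    """Gather the figures on a Linux host (requires /proc)."""
    with open("/proc/loadavg") as f:
        load1, load5, load15 = parse_loadavg(f.read())
    with open("/proc/meminfo") as f:
        mem_pct = memory_used_percent(parse_meminfo(f.read()))
    return {
        "load1": load1,
        "load5": load5,
        "load15": load15,
        "mem_used_pct": round(mem_pct, 1),
        "disk_root_used_pct": round(disk_used_percent("/"), 1),
    }
```

Calling collect_local_metrics() returns a flat dict, which is easy to ship to a central collector in whatever format is chosen. Note that df reports usage relative to space available to non-root users, so its percentage can differ slightly from used/total.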
>
> Other stuff we really need to keep an eye on is hardware: redundant PSU
> status in our 7204s and Dells, temperatures and voltages (one of our
> colos in New York peaked at over 40 °C a few weeks ago, for instance),
> and disk array status (I'd like to know of a failed disk in a hardware
> RAID5 before I get calls about performance issues). Our blade chassis
> have DRACs in them and I think they export this data via SNMP (I'm
> trying to avoid the use of SNMP traps), but not all of our other
> PowerEdges have DRACs, so some of this information may need to be
> pulled via IPMI from within the host OS. Presumably the Cisco gear
> makes the temperature available via SNMP.
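
However the hardware readings are gathered (SNMP from the DRACs or Cisco gear, IPMI from inside the OS), they end up being compared against limits. A minimal Python sketch of that comparison follows; the sensor names and every threshold in EXAMPLE_CHECKS are hypothetical placeholders, not vendor values.

```python
OK, WARNING, CRITICAL = "OK", "WARNING", "CRITICAL"

def classify(value, warn_low=None, warn_high=None, crit_low=None, crit_high=None):
    """Map a sensor reading to OK/WARNING/CRITICAL using optional low/high limits."""
    def outside(low, high):
        return (low is not None and value < low) or (high is not None and value > high)

    if outside(crit_low, crit_high):
        return CRITICAL
    if outside(warn_low, warn_high):
        return WARNING
    return OK

# Hypothetical limits for illustration only; take real values from the vendor's
# hardware documentation and from what each site considers normal.
EXAMPLE_CHECKS = {
    "inlet_temp_c": {"warn_high": 35.0, "crit_high": 40.0},
    "psu_12v_volts": {"warn_low": 11.4, "warn_high": 12.6, "crit_low": 11.0, "crit_high": 13.0},
    "working_psus": {"crit_low": 2},        # fewer than 2 means redundancy is lost
    "raid_failed_disks": {"crit_high": 0},  # any failed member disk is critical
}

def evaluate(readings, checks=EXAMPLE_CHECKS):
    """Return {sensor: status} for every reading that has limits defined."""
    return {name: classify(value, **checks[name])
            for name, value in readings.items() if name in checks}
```

Using both low and high bounds lets one function cover temperatures (high is bad), PSU counts (low is bad), and voltages (either direction is bad).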
>
> Finally, service checks: standard stuff (DNS, HTTP, HTTPS, SSH, SMTP).
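
For the TCP-based services, a minimal Python sketch of a service check is shown below. It assumes a plain TCP connect is enough for HTTP/HTTPS "is it listening" checks and uses the greeting banner for SSH (servers send a line starting with "SSH-") and SMTP (servers greet with "220"). The check_tcp name is hypothetical. A real DNS check needs to send a query (DNS mostly runs over UDP), which this sketch does not do.

```python
import socket

def check_tcp(host, port, timeout=5.0, expect_banner=None):
    """Return (ok, detail) for a TCP service.

    Opens a connection; if expect_banner is given, reads the server's first
    bytes and checks that they start with that prefix.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            if expect_banner is None:
                return True, "connected"
            banner = sock.recv(256).decode("ascii", errors="replace")
            if banner.startswith(expect_banner):
                return True, banner.strip()
            return False, "unexpected banner: " + banner.strip()
    except OSError as exc:  # refused, unreachable, or timed out
        return False, str(exc)
```

For example, check_tcp("mail.example.net", 25, expect_banner="220") or check_tcp("host.example.net", 22, expect_banner="SSH-"), where the hostnames are placeholders.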
>
> Now, to the questions.
>
> 1) Is SNMP the best way to do this? Obviously some of the data (service
> checks) will need to be collected other ways.
>
> 2) Is there any good solution that does both logging/trending of this
> data and notification/monitoring/alerting? I've used both Nagios and
> Cacti in the past, and, due to the number of individual things being
> monitored (3-5 items per OS instance, 5-10 items per physical server,
> 10-50 things per network device), setting them both up independently
> seems like a huge pain. Also, I've never really liked Nagios that much.
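
The appeal of a single tool is that every polled sample feeds both the graphs and the alerts. A minimal Python sketch of that idea, assuming one threshold per metric: each sample is appended to a bounded history (for trending), and an alert is emitted only when a metric crosses its limit, with a recovery message when it comes back, so a colo sitting at 42 °C does not page on every poll. The Monitor class and metric names are hypothetical.

```python
import time
from collections import defaultdict, deque

class Monitor:
    """Store every sample for trending; alert only on threshold transitions."""

    def __init__(self, history_len=288):  # 288 samples = 24 h at 5-minute polls
        self.history = defaultdict(lambda: deque(maxlen=history_len))
        self.thresholds = {}
        self.alerting = set()  # metrics currently above their threshold

    def set_threshold(self, metric, limit):
        self.thresholds[metric] = limit

    def record(self, metric, value, ts=None):
        """Append a sample; return an alert/recovery message or None."""
        ts = time.time() if ts is None else ts
        self.history[metric].append((ts, value))
        limit = self.thresholds.get(metric)
        if limit is None:
            return None
        over = value > limit
        if over and metric not in self.alerting:
            self.alerting.add(metric)
            return f"ALERT {metric}={value} > {limit}"
        if not over and metric in self.alerting:
            self.alerting.discard(metric)
            return f"RECOVERED {metric}={value}"
        return None
```

In a real deployment the history would go to RRD files or a database rather than memory; the point of the sketch is only that collection is done once and both uses read from it.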
>
> I recently entertained the idea of writing a CGI script that would output
> all of this information in a standard format (CSV?), distributing and
> installing it, then collecting it periodically at a central location and
> doing all the RRD/notification work myself, but then realized that this
> problem must've been solved a million times already.
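
The CSV-over-CGI idea is simple to sketch, which is partly why it is tempting. A minimal Python version of the data format, assuming one row per metric with the hypothetical columns timestamp, host, metric, value; the CGI side only has to print a Content-Type header and a blank line before the body.

```python
import csv
import io
import time

FIELDS = ["timestamp", "host", "metric", "value"]

def to_csv(host, readings, ts=None):
    """Serialize {metric: value} readings for one host as CSV with a header row."""
    ts = int(time.time()) if ts is None else ts
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(FIELDS)
    for metric, value in sorted(readings.items()):
        writer.writerow([ts, host, metric, value])
    return buf.getvalue()

def from_csv(text):
    """Parse CSV produced by to_csv into a list of typed dicts (collector side)."""
    return [
        {"timestamp": int(row["timestamp"]), "host": row["host"],
         "metric": row["metric"], "value": float(row["value"])}
        for row in csv.DictReader(io.StringIO(text))
    ]

def cgi_response(body):
    """A CGI program writes its headers, a blank line, then the body to stdout."""
    return "Content-Type: text/csv\r\n\r\n" + body
```

The format itself is the easy part; scheduling the polls, storing the history, and handling alert escalation is the work that existing tools already do.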
>
> There's got to be a better way. What do you guys use?
>
> (I'm not opposed to non-free solutions, provided they work better.)
You may want to have a look at Zenoss, http://www.zenoss.com/
Cheers,
Andrew