The Best-Laid Plans of SysAdmins, or Trading Off Capital Vs. Labor

Our primary production web server went down a few Mondays ago. It’s 20 miles away in our colocation facility, so whatever our Senior SysAdmin was planning to do that Monday didn’t happen; he got to drive up there to troubleshoot instead. The server is on a RAID array and has redundant power and networking, but the motherboard itself failed, so he had to swap the whole server. An occasional server failure is not unusual in the enterprise, but in a small shop you often have only one person qualified to fix it, and you can’t just keep a multi-thousand-dollar hot spare sitting in the rack below. So he had to truck back to the main office, grab a server he was using for testing, and take it back up.

We’ve managed to avoid scrambling like this as much as possible, partly because we overbuild the network infrastructure when we consider it prudent. We always have RAID, redundant power and networking, and remote console on all our physical servers. Most of our servers are now virtual, and we run redundant VMware servers on-site and another one off-site. They’re tied in with our high-end Isilon network-attached storage cluster, which also replicates at the block level off-site. We run switched Gigabit Ethernet to the desktop and have for years. We bought a separate switch for our wireless access points, so we’re more likely to have some means of connecting if the main client switch fails. We have a backup microwave Internet connection on the roof of our building. Why all this for an entity our size?

Our labor budget and our equipment budget come from separate sources, and while we’re always a little tight, we’re a lot tighter in overhead (labor) than in equipment. So one of my principles is to pay more for an equipment setup that’s more reliable, easier to manage, or requires less maintenance. Servers are less likely to go down, and we’re not tracking who has a fast connection and who doesn’t and constantly swapping things around. Our IS staff can plan their time and projects better instead of fighting fires every day. If we only have one or two people qualified to work on something, they’re not pulled in 20 directions at once. They also get to take vacations without worrying so much about being called, since we can get by until they return.

Ironically, the server pressed into service as the spare was the one being used to test clustering for our web environment, so that we’d no longer be at risk of exactly what happened. Guess that moves up the priority list now…
