
Why You Shouldn't Reboot Unix Servers

GMGruman writes "It's a persistent myth: reboot your Unix box when something goes wrong or to clean it out. Paul Venezia explains why you should almost never reboot a Unix server, unlike, say, Windows."

  • by Anonymous Coward on Monday February 21, 2011 @01:48PM (#35269476)

    I'm really tired of this semi-technical stuff on Slashdot that seems aimed at semi-competent manager-types.

  • by Syncerus ( 213609 ) on Monday February 21, 2011 @01:49PM (#35269486)

    One minor point of disagreement: I'm a fan of the pre-emptive reboot at specific intervals; whether the interval is 30, 60, or 90 days is up to you. In the past, I've found that the pre-emptive reboot will trigger hidden system problems, but at a time when you're actually ready for them, rather than at a time when they happen spontaneously (2:30 in the morning).
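
    If you do go the scheduled-reboot route, the mechanics are trivial; here's a minimal sketch, assuming a Linux box where a monthly 3 a.m. window is acceptable (the schedule is purely illustrative):

    # hypothetical root crontab entry: reboot at 03:00 on the first day of each month
    0 3 1 * * /sbin/shutdown -r now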

  • by pipatron ( 966506 ) <pipatron@gmail.com> on Monday February 21, 2011 @01:51PM (#35269498) Homepage

    FTFA:

    Some argued that other risks arise if you don't reboot, such as the possibility certain critical services aren't set to start at boot, which can cause problems. This is true, but it shouldn't be an issue if you're a good admin. Forgetting to set service startup parameters is a rookie mistake.

    This is retarded. A good admin will test that everything works before it gets a chance to actually break. Anyone can fuck up or forget something, no matter how experienced you are. Murphy's law. The only way to test whether it will come up correctly during unplanned downtime is to actually reboot while you have everything fresh in memory and while you're still around and can fix it. Rebooting in that case is not a bad thing; it's the responsible thing to do.
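
    As a rough first check (a sketch only; it doesn't replace an actual test reboot), compare what's running now with what's set to start at boot:

    # SysV-style systems (e.g. RHEL/CentOS of that era): services enabled in runlevel 3
    chkconfig --list | grep '3:on'
    # systemd systems: services enabled to start at boot
    systemctl list-unit-files --type=service --state=enabled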

  • What a load of BS (Score:5, Insightful)

    by kju ( 327 ) * on Monday February 21, 2011 @01:52PM (#35269508)

    I RTFA (shame on me) and it is in my opinion absolutely stupid.

    There is actually only one real reason given, and that is that if you reboot after some services have stopped working, you might end up with an unbootable machine.

    In my opinion this outcome is absolutely great. OK, maybe not great, but it is important and right: it forces you to fix the problem properly instead of ignoring the known problems and missing the yet unknown problems which might bite you in the .... shortly after.

    Also: when services start being flaky on my system, I usually want to run an fsck. In 16 years of Linux/Unix administration I have found, quite a few times, that the filesystem was corrupted without any apparent reason and had gone unnoticed before. So an fsck is usually a good thing to run when strange things happen, and to be able to run it, I nearly always need to reboot.

    I can't grasp what kind of thinking it must be to continue running a server where some services fail or behave strangely. You could end up with more damage than an outage would cause if the reboot does not go through. You just might want to do the reboot at off-peak hours.
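
    If the objection is just "I can't take the filesystem offline right now," there are period-appropriate ways to fold the check into the reboot itself; a sketch, assuming a sysvinit-era Linux distribution (details vary by distro):

    # flag file honoured by many sysvinit boot scripts: force fsck on the next boot
    touch /forcefsck
    # or, where the shutdown binary supports it, force fsck as part of the reboot
    shutdown -rF now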

  • by Sycraft-fu ( 314770 ) on Monday February 21, 2011 @01:53PM (#35269520)

    More or less it is "You shouldn't reboot UNIX servers because UNIX admins are tough guys, and we'd rather spend days looking for a solution than ruin our precious uptime!"

    That is NOT a reason not to reboot a UNIX server. In fact, it sounds like if you have a properly designed environment with redundant servers, a reboot might be just the thing. Who cares about uptime? You don't win awards for big uptime numbers; it is all about your systems working well, providing what's needed, and not blowing up in a crisis.

    Now, there may well be technical reasons why a reboot is a bad idea, but this article doesn't present any. If you want to claim "You shouldn't reboot," then you need to present technical reasons why not. Just having more uptime or being somehow "better" than Windows admins is not a reason; it is silly posturing.

  • by afabbro ( 33948 ) on Monday February 21, 2011 @01:54PM (#35269548) Homepage

    This is not a myth I had heard before.

    +1. This article should be held up as a perfect example of building a strawman.

    "It's a persistent myth that some natural phenomena travel faster than the speed of light, but at least one physicist says it's impossible..."

    "It's a persistent myth that calling free() after malloc() is unnecessary, but some software engineers disagree..."

    "It's a persistent myth that only the beating of tom-toms restores the sun after an eclipse. But is that really true?"

  • Re:Uh.. no (Score:5, Insightful)

    by Anrego ( 830717 ) * on Monday February 21, 2011 @02:03PM (#35269662)

    Maybe true if the box is set up and then never touched. If anything new has been installed on it, or updated, I think it's a good idea to verify that it still boots while the change is still fresh in your head. Yes, you have changelogs (or should), but all the time spent reading various documentation and experimenting on your proto box (if you have one) is long gone. There's lots of stuff you can install and start using that could easily not come up properly on boot.

    And why are reboots bad? If downtime is that big a deal, you should have a redundant setup. If you have a redundant setup, rebooting should be no issue. I've seen a very common trend where people get some "out of the box" redundancy solution running, then check off "redundancy" on the "list of shit we need" and forget about it. Actually verifying from time to time that your system can handle the loss of a box without issue is important (in my view).
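
    One crude way to do that verification (a sketch with a hypothetical virtual IP and health endpoint, not a real monitoring setup): hammer the service's VIP from a client while you reboot one node, and watch for gaps.

    # any failed or slow responses show up as something other than 200
    while true; do
        printf '%s ' "$(date '+%H:%M:%S')"
        curl -s -o /dev/null -w '%{http_code}\n' --max-time 2 http://10.0.0.100/healthz
        sleep 1
    done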

  • by aztektum ( 170569 ) on Monday February 21, 2011 @02:04PM (#35269672)

    /. editors: I propose a new rule. Submissions with links to PCWorld, InfoWorld, PCMagazine, Computerworld, CNet, or any other technology periodical you'd see in the checkout line of a Walgreens should be immediately deleted with prejudice.

    They're the Oprah Magazine of the tech world. They exist to sell ads by writing articles with grabby headlines and little substance.

  • by arth1 ( 260657 ) on Monday February 21, 2011 @02:04PM (#35269680) Homepage Journal

    Don't forget 777 and 666 permissions all over the place, and SELinux and iptables disabled.

    As for "ALL(ALL) ALL" entries in sudoers, Ubuntu, I hate you for ruining an entire generation of linux users by aping Windows privacy escalations by abusing sudo. Learn to use groups, setfattr and setuid/setgid properly, leave admin commands to administrators, and you won't need sudo.

    find /home/* -user 0 -print

    If this returns ANY files, you've almost certainly abused sudo and run root commands in the context of a user - a serious security blunder in itself.
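
    A companion check for the "777 and 666 all over the place" problem (a sketch; adjust the prune list to taste): look for world-writable regular files outside the usual scratch directories.

    # world-writable files on the root filesystem, skipping the expected tmp locations
    find / -xdev -type f -perm -0002 ! -path '/tmp/*' ! -path '/var/tmp/*' -print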

  • Re:Uh.. no (Score:4, Insightful)

    by OzPeter ( 195038 ) on Monday February 21, 2011 @02:14PM (#35269798)

    (wishing that /. would allow edits)

    To add to my previous comment. The general consensus of disaster recovery best practice is that you do not test a backup strategy, you test a restore strategy. Rebooting a server is testing a system restore process.

  • by BlueBlade ( 123303 ) <mafortier&gmail,com> on Monday February 21, 2011 @10:52PM (#35274846)

    - iptables rulesets that allow all outbound from all systems. Allow ICMP everywhere, etc.

    As a network admin, I have violent fantasies of driving hot nails through the privates of the "let's block all ICMP by default" admins whenever I show up at a new client's site to troubleshoot some complex networking issue. If you block ICMP echo, you'd better have an extremely good reason for it. If it's on a public WAN link facing the Internet, then *maybe* you have a case (but most often not). If it's on a web server or other public-facing service, you PROBABLY DON'T HAVE A VALID REASON. If you block traceroutes anywhere except at edge firewalls, you are a clueless idiot. And even then, requests coming from inside interfaces should be let through. THIS IS ESPECIALLY TRUE OVER MPLS AND SITE-TO-SITE VPN LINKS!

    Whew, that felt good. Seriously, blocking ICMP doesn't do *anything* for security. If you are getting flooded by ICMP packets, just configure a flood threshold. These days, any ICMP DoS flood bad enough to actually interrupt services very likely doesn't need the extra "reply" traffic to work. And your clever "security" of not replying to pings is pointless on anything that has ports open, since a simple port scan will reveal the host anyway.

    Please, for the sake of every network admin's sanity, leave ICMP alone. Thank you.
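
    For what it's worth, a rate limit is the saner knob if echo-request floods are the actual worry; a minimal iptables sketch (the thresholds are arbitrary examples):

    # accept echo-requests up to a modest rate, then drop the excess
    iptables -A INPUT -p icmp --icmp-type echo-request -m limit --limit 5/second --limit-burst 20 -j ACCEPT
    iptables -A INPUT -p icmp --icmp-type echo-request -j DROP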
