Unix IT

Seven Habits of Highly Effective Unix Admins 136

jfruh writes: "Being a Unix or Linux admin tends to be an odd kind of job: you often spend much of your workday on your own, with lots of time when you don't have a specific pressing task, punctuated by moments of panic where you need to do something very important right away. Sandra Henry-Stocker, a veteran sysadmin, offers suggestions on how to structure your professional life if you're in this job. Her advice includes setting priorities, knowing your tools, and providing explanations to the co-workers whom you help." What habits have you found effective for system administration?
This discussion has been archived. No new comments can be posted.

  • Number 6 Problem (Score:5, Insightful)

    by magamiako1 ( 1026318 ) on Friday April 11, 2014 @12:21PM (#46726373)
    The issue with #6 is that users almost never accept the answer. And a lot of the time it may be something you can't adequately explain, which they like even less, especially if you know the problem wasn't the result of something you did.
    • Re: (Score:2, Informative)

      by zacherynuk ( 2782105 )
      Indeed - not only that, but even if you are really good at keeping docs, an intranet log or similar - it still won't be read, understood or appreciated. Later on, with even the best of everyone's interests at heart the worst thing you could ever say is - "I documented this here, and explained it here and asked for feedback here and you said you read it..." Nothing like a few reference facts and common sense to drive a wedge between admins and users.
      • ...habits have you found effective for system administration?

        I like to shut all the ports in the firewall. The sense of calm that descends on the servers is downright pleasant. Of course, then the phone begins to ring...

    • by Anonymous Coward

      With Linux/Unix just say, "Well it's an antiquated operating system." - and if Linux add "and with this F/OSS operating system, well, you get what you paid for and with the addition of [insert package name like systemd] 'blah blah blah blah' caused our problem. I need a raise to work with this shit!"

      If we were on Windows, this wouldn't happen!

      It works the other way around if you are a Windows admin, too. Just replace F/OSS with "closed undocumented source and [insert money hungry profit driven dribble scre

      • by Anonymous Coward

        If you are in a shop that chains together MacMini servers, when there's a problem, start crying, ask for a hug and just say "I don't know what happened?! *sobbing* Mac is Go...Job's gift to mankind kind! *sob* I'm such a failure! *snot runs from nose*"

        You'll get a couple of days off, they'll power down everything, turn it back on, and it'll all be working fine when you get back.

  • Tmux (Score:5, Informative)

    by matthiasvegh ( 1800634 ) on Friday April 11, 2014 @12:21PM (#46726377)
    I discovered tmux (terminal multiplexer) a while back, and it is a very potent replacement for screen: it supports splitting windows, having multiple sessions, sharing windows between sessions, customizable status bars, etc. Try it out!
    • Try it out!

      Make me!

    • screen also has split windows, multiple sessions, customizable status bar. For my use cases, I could not find a compelling reason to use tmux

      • Re:Tmux (Score:4, Informative)

        by evilviper ( 135110 ) on Friday April 11, 2014 @01:37PM (#46727249) Journal

        For my use cases, I could not find a compelling reason to use tmux

        Obviously if you've been limiting yourself to the features of "screen" for many years, you're not going to think you need the added features of "tmux"...

        A big one is sharing:
        "window can be linked to an arbitrary number of sessions". If you or somebody else has a screen session open, you don't have to detach it from their terminal to see what's on it. You can just attach it to your terminal as well. Works great when you've got a session attached to your desktop, then want to access it on your laptop/tablet/phone/etc. The tmux session will even change geometry to match the smallest terminal window.

        Being more lightweight and responsive is good. Saner keys for some functions, like ctrl-a pg-up to access scrollback. And just the fact that it's still getting active development is an important feature.

        • by Chalex ( 71702 ) on Friday April 11, 2014 @01:40PM (#46727265) Homepage

          screen -x shares the screen just fine for me.

          • I oversimplified the explanation a bit...

            Here it is in nicm@'s words:
            "In particular, being able to share a single window between multiple terminals, with other windows in the same session but entirely separate. Adding this to screen was implausible"

            http://undeadly.org/cgi?action... [undeadly.org]

            • Re:Tmux (Score:4, Informative)

              by dissy ( 172727 ) on Friday April 11, 2014 @03:01PM (#46728169)

              Here it is in nicm@'s words:
              "In particular, being able to share a single window between multiple terminals, with other windows in the same session but entirely separate. Adding this to screen was implausible"

              Perhaps I am still misunderstanding the features of tmux (most likely in fact), but to say that is implausible to add to screen is misleading to say the least, since I have been doing exactly that in screen for nearly a decade.

              On one terminal, either start a new screen session or -r to a detached session.
              If starting a new one, try: screen -S LetsShare

              On a second terminal, run: screen -list
              You should see a list of screen sessions and their status (attached, detached, multi, etc)
              If you used -S on start that will be the name, otherwise it's some tty.host.number string.

              Now on that second terminal run: screen -x

              Try to adjust both terminal sessions so you can see them at the same time. Type in either, watch in either. They are shared seemingly matching your tmux description.

              You can change permissions per user so others can't type but will see everything you do (aka tutorial mode) using screen's multiuser ACL commands (for example :aclchg); ^a * lists the attached displays

              Also for split/multiple windows showing on the same terminal, use ^a S (control-a capital-S)
              To switch between split windows use ^a tab
              Close the current section of a split window with ^a X, or keep only the current one with ^a Q

              The status bar problem is true and pretty annoying. I fixed it myself with a line in ~/.screenrc but of course I have to pretty much install that user config file on every new system I use which can get annoying.
              If you want an always-on status bar showing window numbers and titles (^a A to change the title), add this to .screenrc (and hope slashdot doesn't munge it!)

              hardstatus alwayslastline "%{= wk}%-Lw%{= BW}%n%f* %t%{-}%+Lw %-=%{= BW}%H%{-}%{-}"

              Note the two "BW" bits? That's background blue and foreground white, and applies to the window with focus. Change B to R for red for example (production vs not-production in my case)

              Here is my whole .screenrc file for copy/paste purposes: http://pastebin.com/kMkuFXi9 [pastebin.com]
              No splash screen, always on status bar, 10k line scrollback history for copy/paste (^a [ and ^a ] ), and auto-open three windows with preset titles and commands running in them.

              I don't mean to knock tmux in any way at all, having not used it (and I do plan to check it out now) - but hopefully these screen tips help out others here.

              • by David Jao ( 2759 )
                I'm not a tmux user, so I may be completely wrong, but I think what they are talking about is that in tmux you can share one window in a session without also sharing all your other windows in that session. You can also easily move tmux windows between sessions, which you can't do in screen. In addition, sharing a tmux window to another user with a different login account is a lot easier in tmux than in screen. There are also forks of tmux that allow two people to use one window with two independent cursors.
        • at least I tried and used tmux for two weeks before reaching my conclusion, as it is the preferred ware over screen in my favorite BSD server distro

          "Saner keys" really doesn't mean much to me, I learn the dozen or so list of keystrokes it takes to do the job, whatever they are. Most of the commonly used "screen" ones do have sensible mnemonics for those that find such things helpful.

          As others have pointed out, screen can do sharing.

    • by emag ( 4640 )

      Does tmux support connecting to serial consoles yet?

    • by antdude ( 79039 )

      I still like plain old screen. I don't need the fancy splits, status bars, etc. If I want more than one screen, I run separate commands. I have had screen crash before, even though it is very rare. I would hate to see tmux crash all my sessions.

    • You should tell Microsoft. I hear they are looking to upgrade Metro.

    • I too prefer tmux to screen, but I would like to warn you about 1 danger of using it. Do NOT disconnect from a tmux session that is being used to upgrade tmux. If tmux happens to upgrade to a version with a newer protocol, you will not be able to reconnect to the tmux session. I did this once and had to build a static version of tmux from the previous version and use that to reconnect and continue the upgrade. Screen is theoretically susceptible to the same problem, but the protocol almost never changes.

      I s

    • I need *WUFF* *WUFF* !
  • by Neruocomp ( 513658 ) on Friday April 11, 2014 @12:24PM (#46726403)
    When working on a problem, I usually have two or more shells open. I don't mean multitasking, but with more than one open, I can issue commands from one and use the others to monitor logs, etc.
  • by zlives ( 2009072 ) on Friday April 11, 2014 @12:25PM (#46726417)

    i thought they were
    sloth, gluttony, pride,...

  • Habits ... (Score:3, Informative)

    by Xaemyl ( 88001 ) on Friday April 11, 2014 @12:33PM (#46726489)

    What habits have I found effective for system administration? BOFH springs to mind ...

  • by Rosco P. Coltrane ( 209368 ) on Friday April 11, 2014 @12:34PM (#46726497)

    I know them all. They all work in Marketing.

    • by rcamans ( 252182 ) on Friday April 11, 2014 @12:54PM (#46726707)

      Apparently you have not interacted with management much, or you would not have restricted your answer to marketing...

    • I know them all. They all work in Marketing.

      No, a couple are in HR as well, and there is at least one in the Finance department. Some days I'm not so sure about IT.

      Have you ever been told you need to submit accurate time sheets for the week on Wednesdays? How the hell do you expect me to give you accurate timesheets for the entire week on a Wednesday when I usually work Wednesday and Friday evenings for an unknown period of time??? And if I had to submit it on Wednesday, don't grumble that I had to submi

  • by tiberus ( 258517 ) on Friday April 11, 2014 @12:38PM (#46726541)

    The first time a task comes up, deal with it manually; it may or may not be related to a problem.

    The second time this task occurs deal with it manually.

    The third time this task occurs, it's time to start scripting.

    It may take you a day or more to write the script, test, debug, etc., or even longer for complex tasks, but this behavior tends to be a winner. The script is already some degree of documentation: it records the steps, etc. If it's robust enough it can be used by your support techs to resolve issues, expanding the number of people who can resolve an issue and freeing the admin for other tasks. Scripts tend not to make typos (yes, I know your command line skills are legendary) and can save a lot of time and effort in the long run.

  • If you are not doing active improvements, planning for failover, and using good configuration management techniques, then your slow time is adding to the number of hurry-up-and-fix-all-the-things times. There are always external matters like Heartbleed that will come along, as a sysadmin's job is not to review the memory allocator in the SSL library regularly. However, if your web services or mail services are down because a single system went offline, then you have only yourself to blame.

  • Did you try turning it off then on again?

  • Everyone knows real programmers code in C, and in C you count from zero. Counting from one? that is so FORTRAN. Retire already, old chap.
  • by rs79 ( 71822 ) <hostmaster@open-rsc.org> on Friday April 11, 2014 @01:13PM (#46726945) Homepage

    "What habits have you found effective for system administration?"

    Carrying an Uzi.

  • by hawguy ( 1600213 ) on Friday April 11, 2014 @01:14PM (#46726957)

    As someone who's managed a team of sysadmins that moved to the Linux world from Windows, I have this tip: "Reboot does not fix anything, it just hides things".

    For some reason, Windows admins have been trained to reboot immediately when things don't work well rather than to figure out why something is failing. I'm sure this was a valid "fix" in older versions of Windows, but Windows has been stable for quite some time, and things shouldn't mysteriously stop working for no reason. Take a bit of time to figure out *why* the CPU is suddenly spiking on the database server, since if you reboot it, you will have lost most of the evidence for why it's happening, and it's likely to happen again. If it's a production server and you can't spend much time, run a few diagnostics (ps, "top", lsof, etc) and save to a file for the postmortem, but don't just go in and reboot before looking around.
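The "save to a file for the postmortem" step can be sketched roughly as follows. This is only an illustration, not anything from the post: the command list, the `snapshot` helper, and the output file name are assumptions, and the `top -b -n 1` flags assume the procps version of top found on most Linux systems.

```python
import datetime
import pathlib
import subprocess

# Assumed command list; adjust to whatever is installed on the host.
DEFAULT_COMMANDS = [
    ["ps", "auxww"],
    ["top", "-b", "-n", "1"],  # batch mode, single iteration (procps top)
    ["lsof", "-n"],
]


def snapshot(commands, out_dir="."):
    """Run each command and write its output to one timestamped file."""
    stamp = datetime.datetime.now().strftime("%Y%m%d-%H%M%S")
    path = pathlib.Path(out_dir) / f"postmortem-{stamp}.txt"
    with path.open("w") as fh:
        for cmd in commands:
            fh.write(f"### {' '.join(cmd)}\n")
            try:
                result = subprocess.run(
                    cmd, capture_output=True, text=True,
                    errors="replace", timeout=30,
                )
                fh.write(result.stdout)
                if result.stderr:
                    fh.write(f"[stderr]\n{result.stderr}")
            except (OSError, subprocess.TimeoutExpired) as exc:
                # A missing or hung tool should not stop the other captures.
                fh.write(f"[failed: {exc}]\n")
            fh.write("\n")
    return path
```

Calling `snapshot(DEFAULT_COMMANDS, "/var/tmp")` before the reboot would leave one file to read afterwards; the directory is just an example.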

    • by evilviper ( 135110 ) on Friday April 11, 2014 @01:44PM (#46727321) Journal

      "Reboot does not fix anything, it just hides things".

      That's not specific to rebooting... It's more a question of doing root-cause analysis, versus quick bandaids. I'm firmly in the RCA camp, but sometimes it's the companies that are to blame, rather than the individual admins. Some companies are heavily slanted towards always getting the quickest possible workaround, rather than ever actually finding and fixing the problem. It's one of those false-economies, like counting lines of code and similar.

    • On the flip side, spending six weeks fixing an issue on a single server running a non-critical, non-time-sensitive service which occurs once or twice a year and is 100% worked around by a reboot probably isn't an efficient use of your time.

      • On the flip side, spending six weeks fixing an issue on a single server running a non-critical, non-time-sensitive service which occurs once or twice a year and is 100% worked around by a reboot probably isn't an efficient use of your time.

        In the long-term, it is. If you let issues like that continue to exist, then you'll get stuck with an unnecessary proliferation of servers, with each running just one service, so rebooting one doesn't take the others down.

        Not to mention that you'll find that you get stuc

    • by pla ( 258480 ) on Friday April 11, 2014 @03:54PM (#46728695) Journal
      For some reason, Windows admins have been trained to reboot immediately when things don't work well rather than to figure out why something is failing.

      Because in the Windows world, I usually don't have the luxury of digging into the kernel's or driver's source code to figure out exactly why it has stopped behaving correctly. If it doesn't log any errors, doesn't export any useful diagnostic messages, doesn't outright crash on reproducible conditions, and just stops working "right", your avenues of further inquiry get very very ugly, very fast.

      I can reboot a VM in well under a minute. For any nontrivial problem that happens roughly twice a month and a reboot makes it go away, it would take twenty years of rebooting to justify spending an entire eight hour day diagnosing the root cause.

      And I say that as someone who (in the Linux world) has written his own kernel patches to work around buggy hardware. In Windows, just not worth the time; because even if you do successfully diagnose the problem, you may well have no ability to correct it.
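The twenty-year figure above checks out as back-of-envelope arithmetic, assuming a reboot costs a full minute (the post says "well under") and counting only the admin's time, not any impact on users:

```python
# Figures taken from the comment; rounding the reboot up to one minute is an assumption.
REBOOT_MINUTES = 1            # "well under a minute", rounded up
INCIDENTS_PER_YEAR = 2 * 12   # "roughly twice a month"
DIAGNOSIS_MINUTES = 8 * 60    # "an entire eight hour day"

minutes_lost_per_year = REBOOT_MINUTES * INCIDENTS_PER_YEAR
break_even_years = DIAGNOSIS_MINUTES / minutes_lost_per_year

print(minutes_lost_per_year)  # 24
print(break_even_years)       # 20.0
```

The break-even point moves quickly if the outage itself is expensive, which this calculation deliberately leaves out.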
      • This person knows what's up.
      • For what it's worth, even if you do have access to dive into the code/kernel memory to find what the problem is, you must first know how to read what you're looking at. A lot of good this stuff does for you if you have no idea how it works in the first place. That's not a uniquely Windows problem, though; because very little in the Linux Admin world over the years strictly enforces that you should know this stuff. The technical information on it out there is about as good as the Technet articles on Windows
      • The good news is that with the modern desire to 'web all the things' using stuff like ROR, PHP, Tomcat, etc., you can generally find in the code where something is an issue without necessarily having to trace system calls. You don't have quite that luxury on compiled applications. Though occasionally you could run into issues with the interpreted languages that just don't compile properly and cause problems--then you're back to the same problem...
    • Tell that to all the Linux-based copiers around here. Even the dollar bill changer in one of our Coke machines stops working until it is rebooted. Granted, those items aren't "fixed", but replacing everything that a reboot resolves would be rather expensive.
  • by Anonymous Coward on Friday April 11, 2014 @01:16PM (#46726991)

    Only three things are necessary for a highly effective unix admin:

    To crush your userbase
    To see their accounts deleted before you
    To hear the lamentations of the salesmen

  • ... works like a charm for me.

    rgb

  • Don't waste time reading slashdot.
  • 90% of the job: "Have you tried turning it on and off again?" https://www.youtube.com/watch?... [youtube.com]
  • Using anything like puppet [puppetlabs.com] or chef [getchef.com] under version control to do all server ops will not only leave you with a full timestamped documentation, but will allow you to easily horizontally scale servers, rebuild them should disaster strike and protect you from stupid upstream package updates that b0rk your config files.

    Have a staging and production environment? Pushing your chef/puppet scripts to production after they're proven to work ensures you have the same changes applied on both sides, and avoid manual oper

    • by mlts ( 1038732 )

      Don't forget Splunk, so the servers that you are managing have a place to dump logs, and where you can do syslog searches from one place. Splunk isn't a magic bullet, but it does a lot of useful functions and can scale up, and it is a very useful troubleshooting tool.

  • I think the most useful talent I've developed is the ability to go to sleep fast and to wake up fast and alert. When the phone rings or pager goes off, the faster you can reach "full on", find and fix the problem, and get back to sleep, the more sleep you get in the long run. Cohorts who have trouble getting to sleep after a late night emergency tend to be seriously dragging by the end of their oncall time.

    • by fl!ptop ( 902193 )

      the most useful talent I've developed is the ability to go to sleep fast and to wake up fast and alert

      You're not fooling anyone, we can all hear snoring coming from your cubicle.

      • the most useful talent I've developed is the ability to go to sleep fast and to wake up fast and alert

        You're not fooling anyone, we can all hear snoring coming from your cubicle.

        I didn't say *where* I was going to sleep...

  • by petes_PoV ( 912422 ) on Friday April 11, 2014 @02:30PM (#46727855)

    Rule #8 would be not to fix problems too quickly (and let some that you can see coming, happen).

    If you fix every problem before it gets serious and avert the other 90%, your bosses will think they have a highly reliable IT infrastructure. They will then cast their eyes about for cost savings - and the biggest target will be the most highly paid admins - the most senior ones - YOU!!!

    So keep the problems coming, as all that management have to assess you on are the number of fixes and the time to fix. Nobody ever got promoted for solving problems that never happened.

    Finally: 60 hours a week? Don't be daft. If you're really an effective administrator you should have your work finished well inside 30 hours and/or 4 working days.

    • This is so true! I worked at a company where I set up nagios with event handlers that would fix a lot of issues when they happened, and when it could not fix one, the system would txt me to come fix it. Problems and downtime went to almost 0. It is amazing what happens when you have a system that can catch java leaks and restart the tomcat server.

      When layoffs came around my boss called me in and told me that I was being laid off because there had not been a major issue in 6 months and they could not justify h
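For anyone who hasn't used them, a Nagios event handler of the kind described above might look roughly like this sketch. The poster's actual handlers aren't shown in the thread, so everything here is an assumption: the restart command is hypothetical, `should_restart` and `handle` are illustrative names, and the three arguments are assumed to be passed in the command definition as $SERVICESTATE$ $SERVICESTATETYPE$ $SERVICEATTEMPT$.

```python
import subprocess

# Hypothetical restart command; the original setup is not described in the thread.
RESTART_CMD = ["service", "tomcat", "restart"]


def should_restart(state, state_type, attempt, soft_attempt=3):
    """Restart on the Nth soft CRITICAL check, or once the state goes HARD."""
    if state != "CRITICAL":
        return False
    if state_type == "HARD":
        return True
    return state_type == "SOFT" and attempt == soft_attempt


def handle(args, run=subprocess.call):
    """args mirrors $SERVICESTATE$ $SERVICESTATETYPE$ $SERVICEATTEMPT$."""
    state, state_type, attempt = args[0], args[1], int(args[2])
    if should_restart(state, state_type, attempt):
        return run(RESTART_CMD)
    return 0
```

A small wrapper script would call `handle(sys.argv[1:])`. Trying the fix on a late soft check, before the state goes hard, gives the service a chance to recover before anyone gets paged.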

    • 60 hours a week? Don't be daft. If you're really an effective administrator you should have your work finished well inside 30 hours

      I half agree. When the system is up and running you can go home at 3PM. But if the system is down, you don't go home until it comes back up. That's the job; on call 24/7. Love it or leave it.

    • by Mozai ( 3547 )

      "60 hours a week? Don't be daft."

      All my technical problems are fixed in less than 30 hours a week. The other 30+ hours a week are fixing problems caused by users; either because QA is toothless, or people not following instructions, or employees who need to be bailed out.

    • I concur. There are 2 mistakes I have made in the past. One is to fix something that people thought was impossible to fix. That sets you up as a godlike figure. People start to expect the impossible as a matter of course. The second is about not pacing myself. You have to establish an understanding with your employers/userbase that a request takes X amount of time, be it 2 days or a week. Once you have established that, you are giving yourself time to fix the ones that really take a week. The rest of the
    • by MrKaos ( 858439 )

      If you fix every problem before it gets serious and avert the other 90%, your bosses will think they have a highly reliable IT infrastructure. They will then cast their eyes about for cost savings - and the biggest target will be the most highly paid admins - the most senior ones - YOU!!! .

      A big part of that effectiveness is being able to identify trends, classes and root causes of issues. The amount of symptomatic issues is a measurement for the impact the issue causes and the metric by which to demonst

  • Find the people on your team who can be trusted to do the job well. Encourage them to do it. Work with them to build their skills as well as yours.

    Find the people on your team who can not be trusted to do the job well, and replace them with shell scripts.

  • For habit #2 Nagios comes in really handy (could watch MRTG et al as well).

    Set up all hosts in Nagios, sending alerts to an email address for a couple of weeks. Figure out which hosts have certain patterns.
  • For example, I have some processes that involve visual basic scripts that run on a windows virtual server and send data files to a Unix server that reformats the files using Perl, preparing them to be ingested into an Oracle database.

    I guess that answers the question of how many times one can curse in one sentence.

    • Why not have Perl running on the Windows box and just send the data out that way?
      ActivePerl, anyone?
  • Hello IT,
    Have you tried turning it off and back on again?
    No problem mate.
  • These days all that face time in front of a PC (DOS, Unix, OS/390, Linux, years and years of Windows from 2.0 to 8.1) is worthless. It used to bring 100.00 per hour, when 100.00 per hour meant something. Now you're lucky if you get 10.00. Let them Google it themselves. Heck, we learned with no Google.
    • Automate everything
    • Do not hide behind forms, user-filled requirements waste even more time than collecting the requirements yourself
    • Get involved in projects as far upstream as you can
    • Communicate
  • Be honest and candid with your teammates. If you tripped over a power cable, let the other admins know so they don't waste time analyzing the unscheduled reboot. (And, of course, secure that cable.)

    But never, ever let it outside your team. While your fellow techies will generally appreciate your ability to admit fault, it'll only come back to bite you later if you admit fault to anyone outside your group.
  • You should try to become replaceable. Make most of your tasks automatic or trivial, so that systems try to heal themselves when known problems arise, and so that anyone else can understand exactly how the systems work from your documentation, or see that a problem is about to happen from your monitoring.

    That will make your work easier, let you take proper vacations, and make you irreplaceable when (not if) things change.
