Seven Habits of Highly Effective Unix Admins
jfruh writes: "Being a Unix or Linux admin tends to be an odd kind of job: you often spend much of your workday on your own, with lots of time when you don't have a specific pressing task, punctuated by moments of panic where you need to do something very important right away. Sandra Henry-Stocker, a veteran sysadmin, offers suggestions on how to structure your professional life if you're in this job. Her advice includes setting priorities, knowing your tools, and providing explanations to the co-workers whom you help."
What habits have you found effective for system administration?
Number 6 Problem (Score:5, Insightful)
Re: (Score:2, Informative)
What... (Score:2)
I like to shut all the ports in the firewall. The sense of calm that descends on the servers is downright pleasant. Of course, then the phone begins to ring...
Re: (Score:1)
You have the ringer on?
Silly boy! (Score:1)
With Linux/Unix just say, "Well it's an antiquated operating system." - and if Linux add "and with this F/OSS operating system, well, you get what you paid for and with the addition of [insert package name like systemd] 'blah blah blah blah' caused our problem. I need a raise to work with this shit!"
If we were on Windows, this wouldn't happen!
It works the other way around if you are a Windows admin, too. Just replace F/OSS with "closed undocumented source and [insert money hungry profit driven dribble scre
I forgot - MacMini shops (Score:1)
If you are in a shop that chains together MacMini servers, when there's a problem, start crying, ask for a hug and just say "I don't know what happened?! *sobbing* Mac is Go...Jobs' gift to mankind! *sob* I'm such a failure! *snot runs from nose*"
You'll get a couple of days off, they'll power down everything, turn it back on, and it'll all be working fine when you get back.
Tmux (Score:5, Informative)
Re: (Score:3)
Try it out!
Make me!
Re:Tmux (Score:5, Funny)
Re: (Score:1)
ok....
Re: (Score:2)
sudo: try: command not found
Re: (Score:2)
Re: (Score:1)
oodaloop is not in the sudoers file. This incident will be reported.
Re: (Score:2)
make: *** No rule to make target `me!'. Stop.
Re: (Score:2)
screen also has split windows, multiple sessions, and a customizable status bar. For my use cases, I could not find a compelling reason to use tmux.
Re:Tmux (Score:4, Informative)
Obviously if you've been limiting yourself to the features of "screen" for many years, you're not going to think you need the added features of "tmux"...
A big one is sharing:
"window can be linked to an arbitrary number of sessions". If you or somebody else has a screen session open, you don't have to detach it from their terminal to see what's on it. You can just attach it to your terminal as well. Works great when you've got a session attached to your desktop, then want to access it on your laptop/tablet/phone/etc. The tmux session will even change geometry to match the smallest terminal window.
Being more lightweight and responsive is good. Saner keys for some functions, like ctrl-a pg-up to access scrollback. And just the fact that it's still getting active development is an important feature.
Re:Tmux (Score:4)
screen -x shares the screen just fine for me.
Re: (Score:2)
I oversimplified the explanation a bit...
Here it is in nicm@'s words:
"In particular, being able to share a single window between multiple terminals, with other windows in the same session but entirely separate. Adding this to screen was implausible"
http://undeadly.org/cgi?action... [undeadly.org]
Re:Tmux (Score:4, Informative)
Here it is in nicm@'s words:
"In particular, being able to share a single window between multiple terminals, with other windows in the same session but entirely separate. Adding this to screen was implausible"
Perhaps I am still misunderstanding the features of tmux (most likely, in fact), but to say that it is implausible to add to screen is misleading, to say the least, since I have been doing exactly that in screen for nearly a decade.
On one terminal, either start a new screen session or reattach to a detached session with -r.
If starting a new one, try: screen -S LetsShare
On a second terminal, run: screen -list
You should see a list of screen sessions and their status (attached, detached, multi, etc)
If you used -S on start, that will be the name; otherwise it's a pid.tty.host string.
Now on that second terminal run: screen -x (or screen -x LetsShare if more than one session is running)
Try to arrange both terminals so you can see them at the same time. Type in either, watch in either. They are shared, which seems to match your tmux description.
You can change permissions per terminal so others can't type but will see everything you do (aka tutorial mode) using ^a *
Also for split/multiple windows showing on the same terminal, use ^a S (control-a capital-S)
To switch between split windows use ^a tab
Close the current region of a split window with ^a X, or close all regions except the current one with ^a Q
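Since the names and states from `screen -list` are what you end up passing to `screen -x`, here is a minimal Python sketch that pulls them out of the listing. It assumes the GNU screen 4.x layout noted in the code comment (one tab-indented `pid.name` line per session, with the state in the last parentheses); other versions may format the listing differently, and `parse_screen_ls` is a hypothetical helper, not part of screen.

```python
import re
from typing import List, NamedTuple


class ScreenSession(NamedTuple):
    pid: int
    name: str   # the -S name, or the default tty.host suffix
    state: str  # e.g. "Attached", "Detached", "Multi, attached"


# Assumed shape of a session line in `screen -ls` output:
#   <TAB>12345.LetsShare<TAB>(optional date)<TAB>(Attached)
_SESSION_LINE = re.compile(r"^\s+(\d+)\.(\S+)\s.*\(([^()]*)\)\s*$")


def parse_screen_ls(output: str) -> List[ScreenSession]:
    """Extract (pid, name, state) from `screen -ls` text; header/footer lines are ignored."""
    sessions = []
    for line in output.splitlines():
        m = _SESSION_LINE.match(line)
        if m:
            # The greedy ".*" makes the last parenthesised field the state.
            sessions.append(ScreenSession(int(m.group(1)), m.group(2), m.group(3)))
    return sessions
```

You could feed it the captured stdout of `screen -ls` and pick the session you want to attach to by name.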
The status bar problem is real and pretty annoying. I fixed it myself with a line in ~/.screenrc, but of course I have to install that user config file on pretty much every new system I use, which can get annoying.
If you want an always-on status bar showing window numbers and titles (^a A to change the title), add this to .screenrc (and hope slashdot doesn't munge it!):
hardstatus alwayslastline "%{= wk}%-Lw%{= BW}%n%f* %t%{-}%+Lw %-=%{= BW}%H%{-}%{-}"
Note the two "BW" bits? That's background blue and foreground white, and applies to the window with focus. Change B to R for red, for example (production vs. non-production in my case).
Here is my whole .screenrc file for copy/paste purposes: http://pastebin.com/kMkuFXi9 [pastebin.com]
No splash screen, an always-on status bar, 10k-line scrollback history for copy/paste (^a [ and ^a ]), and three windows auto-opened with preset titles and commands running in them.
I don't mean to knock tmux in any way at all, having not used it (and I do plan to check it out now) - but hopefully these screen tips help out others here.
Re: (Score:2)
Re: (Score:2)
At least I tried and used tmux for two weeks before reaching my conclusion, as it is preferred over screen in my favorite BSD server distro.
"Saner keys" really doesn't mean much to me; I learn the dozen or so keystrokes it takes to do the job, whatever they are. Most of the commonly used "screen" ones do have sensible mnemonics for those who find such things helpful.
As others have pointed out, screen can do sharing.
Re: (Score:2)
Does tmux support connecting to serial consoles yet?
Re: (Score:2)
I still like plain old screen. I don't need the fancy splits, status bars, etc. If I want more than one screen, I run separate commands. I have had screen crash before, though very rarely. I would hate to see tmux crash all my sessions.
Re: (Score:2)
Or do you even use monitors on your Linux?
I'm doing just fine with my ASR-33, thank you!
Re: (Score:2)
You should tell Microsoft. I hear they are looking to upgrade Metro.
Re: (Score:2)
I too prefer tmux to screen, but I would like to warn you about one danger of using it. Do NOT disconnect from a tmux session that is being used to upgrade tmux. If tmux is upgraded to a version with a newer protocol, you will not be able to reconnect to the tmux session. I did this once and had to build a static binary of the previous tmux version and use that to reconnect and continue the upgrade. Screen is theoretically susceptible to the same problem, but its protocol almost never changes.
I s
Re: (Score:2)
Using multiple shells (Score:3, Informative)
i was so wrong (Score:5, Funny)
i thought they were
sloth, gluttony, pride,...
Re:i was so wrong (Score:4, Funny)
That's "How to be a BOFH"
Seven Deadly Sins and Enneagram Re:i was so wrong (Score:1)
The Seven Deadly Sins
Most sysadmins are 6-wing 5
Type 5 on the enneagram: Greed
Type 4: Envy
Type 2: Pride
Type 7: Gluttony
Type 8: Lust
Type 9: Sloth
Type 1: Anger
Notice that the core types 3 and 6 do not map directly. Modern mappings add Cowardice for Type 6 and Deceit for Type 3, and these can be seen as variants of the Sloth at point 9, since they are all sins of omission: not being available, not committing to action, and not supporting truth.
Habits ... (Score:3, Informative)
What habits have I found effective for system administration? BOFH springs to mind...
Re: (Score:1)
Knowing your tools (Score:5, Funny)
I know them all. They all work in Marketing.
Re:Knowing your tools (Score:5, Funny)
Apparently you have not interacted with management much, or you would not have restricted your answer to marketing...
Re: (Score:2)
Or engineering or development.
[John]
Re: (Score:2)
Or engineering or development.
[John]
Oh hey, watch it, buster. This is /.
Re: (Score:3)
No, a couple are in HR as well, and there is at least one in the Finance department. Some days I'm not so sure about IT.
Have you ever been told you need to submit accurate time sheets for the week on Wednesdays? How the hell do you expect me to give you accurate timesheets for the entire week on a Wednesday when I usually work Wednesday and Friday evenings for an unknown period of time??? And if I had to submit it on Wednesday, don't grumble that I had to submi
#7 Be Appropriately Lazy (Score:5, Insightful)
The first time a task comes up, deal with it manually; it may or may not be related to a problem.
The second time this task occurs, deal with it manually.
The third time this task occurs, it's time to start scripting.
It may take you a day or more to write, test, and debug the script, or even longer for complex tasks, but this behavior tends to be a winner. The script is already a degree of documentation: it records the steps, etc. If it's robust enough, your support techs can use it to resolve issues, expanding the number of people who can resolve an issue and freeing the admin for other tasks. Scripts tend not to make typos (yes, I know your command line skills are legendary) and can save a lot of time and effort in the long run.
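A minimal Python sketch of that idea: the task becomes a list of commands that is logged as it runs and stops at the first failure, so the script doubles as a record of the steps. The `run_steps` name, the dry-run flag, and the stop-on-first-failure policy are illustrative choices, not something the comment prescribes.

```python
import logging
import subprocess
import sys
from typing import List, Sequence

log = logging.getLogger("runbook")


def run_steps(steps: Sequence[Sequence[str]], dry_run: bool = False) -> List[int]:
    """Run each command in order, logging it first; stop at the first non-zero exit.

    Returns the exit codes of the steps actually run (empty in dry-run mode).
    """
    codes: List[int] = []
    for argv in steps:
        log.info("step: %s", " ".join(argv))
        if dry_run:
            continue  # only record what would have been run
        result = subprocess.run(list(argv))
        codes.append(result.returncode)
        if result.returncode != 0:
            log.error("step failed with exit code %d, stopping", result.returncode)
            break
    return codes


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO, format="%(asctime)s %(message)s")
    # Hypothetical two-step task; replace with the commands you kept typing by hand.
    demo = [[sys.executable, "-c", "print('step one')"],
            [sys.executable, "-c", "print('step two')"]]
    run_steps(demo)
```

Passing argument lists rather than a shell string avoids quoting surprises, and the log output gives support techs a trail of what ran.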
Re: (Score:2)
Re:#8 Be Appropriately Lazy (Score:4, Funny)
Re: (Score:2)
Also, cron is your best friend.
Re: (Score:2)
Just watch out for this:
https://xkcd.com/1319/ [xkcd.com]
Re: (Score:2)
Just watch out for this:
https://xkcd.com/1319/ [xkcd.com]
Either way, you'll have a lot more fun maintaining the script than you would doing the same boring task over and over :)
Re: (Score:1)
It may take you a day or more to write the script, test debug, etc. or even longer for complex tasks but, this behavior tends to be a winner.
Step #7.1: Prepare excuse for mgmt why it is taking you 10 times longer to complete this task than it did the first two times.
Re: (Score:2)
Step #7.1: Prepare excuse for mgmt [...]
#1 - It's not an excuse, it's a reason. Get in the proper mindset.
#2 - You already know the reason, and as a bonus, bean counters love this: you're gonna save the company long-term dollars with a short-term expenditure.
I hate to break it to you... (Score:2)
If you are not doing active improvements, planning for failover, and using good configuration management techniques, then your slow time is adding to the number of hurry-up-and-fix-all-the-things times. There will always be external matters like Heartbleed, since it is not a sysadmin's job to regularly review the memory allocator in the SSL library. However, if your web services or mail services are down because a single system went offline, then you have only yourself to blame.
#0 (Score:2)
Did you try turning it off then on again?
The columnist must be a FORTRAN programmer. (Score:3)
Re:The columnist must be a FORTRAN programmer. (Score:4, Funny)
>And I'm not even a fucking programmer by trade.
Yup. I can tell.
uh... (Score:3)
"What habits have you found effective for system administration?"
Carrying an Uzi.
Rebooting is not a fix (Score:5, Insightful)
As someone who's managed a team of sysadmins that moved to the Linux world from Windows, I have this tip: "Reboot does not fix anything, it just hides things".
For some reason, Windows admins have been trained to reboot immediately when things don't work well rather than to figure out why something is failing. I'm sure this was a valid "fix" in older versions of Windows, but Windows has been stable for quite some time, and things shouldn't mysteriously stop working for no reason. Take a bit of time to figure out *why* the CPU is suddenly spiking on the database server: if you reboot it, you will have lost most of the evidence for why it's happening, and it's likely to happen again. If it's a production server and you can't spend much time, run a few diagnostics (ps, top, lsof, etc.) and save the output to a file for the postmortem, but don't just go in and reboot before looking around.
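A minimal Python sketch of "run a few diagnostics and save to a file" before anyone reboots. The default command list (`ps auxww`, procps `top -b -n 1`, `lsof -n -P`) is an assumption aimed at a typical Linux box; BSD and macOS `top` take different flags, and `capture_diagnostics` is a hypothetical helper name.

```python
import datetime
import pathlib
import subprocess
from typing import Dict, Sequence

# Assumed defaults for a Linux host; each argv is run without a shell.
DEFAULT_DIAGNOSTICS: Dict[str, Sequence[str]] = {
    "ps": ["ps", "auxww"],
    "top": ["top", "-b", "-n", "1"],  # batch mode, one iteration (procps top)
    "lsof": ["lsof", "-n", "-P"],     # skip DNS and port-name lookups
}


def capture_diagnostics(outdir: str,
                        commands: Dict[str, Sequence[str]] = DEFAULT_DIAGNOSTICS,
                        timeout: float = 30.0) -> pathlib.Path:
    """Run each command and write its output to one timestamped file for the postmortem.

    A missing or hung command is noted in the file instead of aborting the capture.
    """
    stamp = datetime.datetime.now().strftime("%Y%m%d-%H%M%S")
    path = pathlib.Path(outdir) / f"diag-{stamp}.txt"
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("w") as out:
        for label, argv in commands.items():
            out.write(f"===== {label}: {' '.join(argv)} =====\n")
            try:
                result = subprocess.run(list(argv), capture_output=True,
                                        text=True, timeout=timeout)
                out.write(result.stdout)
                out.write(result.stderr)
            except (FileNotFoundError, subprocess.TimeoutExpired) as exc:
                out.write(f"[could not run: {exc}]\n")
    return path
```

The timeout matters on a struggling server: a `lsof` that hangs should not be the reason you never get to look at the rest of the evidence. Two captures within the same second would share a file name, so add more precision to the stamp if you capture repeatedly.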
Re:Rebooting is not a fix (Score:4, Insightful)
That's not specific to rebooting... It's more a question of doing root-cause analysis versus quick band-aids. I'm firmly in the RCA camp, but sometimes it's the companies that are to blame, rather than the individual admins. Some companies are heavily slanted towards always getting the quickest possible workaround, rather than ever actually finding and fixing the problem. It's one of those false economies, like counting lines of code and similar.
Re: (Score:2)
Slashdot spin-offs for beta-haters:
http://pipedot.org/ [pipedot.org]
http://soylentnews.org/ [soylentnews.org]
Try: http://slashdot.org/?nobeta=1 [slashdot.org]
Re: (Score:2)
There are many more reasons than just the random redirects to the beta site, to prefer Soylent News to /. You should check it out.
Re: (Score:2)
On the flip side, spending six weeks fixing an issue on a single server running a non-critical, non-time-sensitive service which occurs once or twice a year and is 100% worked around by a reboot probably isn't an efficient use of your time.
Re: (Score:2)
In the long-term, it is. If you let issues like that continue to exist, then you'll get stuck with an unnecessary proliferation of servers, with each running just one service, so rebooting one doesn't take the others down.
Not to mention that you'll find that you get stuc
Re:Rebooting is not a fix (Score:5, Informative)
Because in the Windows world, I usually don't have the luxury of digging into the kernel's or driver's source code to figure out exactly why it has stopped behaving correctly. If it doesn't log any errors, doesn't export any useful diagnostic messages, doesn't outright crash under reproducible conditions, and just stops working "right", your avenues of further inquiry get very, very ugly, very fast.
I can reboot a VM in well under a minute. For any nontrivial problem that happens roughly twice a month and a reboot makes it go away, it would take twenty years of rebooting to justify spending an entire eight hour day diagnosing the root cause.
And I say that as someone who (in the Linux world) has written his own kernel patches to work around buggy hardware. In Windows, just not worth the time; because even if you do successfully diagnose the problem, you may well have no ability to correct it.
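The twenty-year figure checks out if a reboot takes about one minute: 8 hours × 60 = 480 minutes of diagnosis, versus 1 minute × 2 per month × 12 = 24 minutes of rebooting per year, and 480 / 24 = 20 years. A small Python helper for running the same break-even numbers on other problems; it counts only the admin's hands-on time, not the cost of the outage itself or of a problem that gets worse over time, and `breakeven_years` is a hypothetical name.

```python
def breakeven_years(fix_hours: float, workaround_minutes: float,
                    occurrences_per_month: float) -> float:
    """Years of applying the workaround before it has cost as much time as the one-off fix."""
    minutes_per_year = workaround_minutes * occurrences_per_month * 12
    if minutes_per_year <= 0:
        raise ValueError("workaround time and frequency must be positive")
    return (fix_hours * 60) / minutes_per_year


# The comment's numbers: an 8-hour day vs. a 1-minute reboot twice a month.
print(breakeven_years(8, 1, 2))  # 20.0
```

With a reboot well under a minute the break-even stretches further: `breakeven_years(8, 0.5, 2)` gives 40.0.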
Re: (Score:2)
Re: (Score:2)
Re: (Score:2)
Re:Reboot does not fix anything, it just hides thi (Score:2)
Re: (Score:2)
Bullshit. Windows admins are not trained to reboot when there is a problem
It's amusing that in the post right before yours (and not an AC like you), a Windows Admin explained why he does reboot first:
Because in the Windows world, I usually don't have the luxury of digging into the kernel's or driver's source code to figure out exactly why it has stopped behaving correctly
Only three habits are necessary (Score:5, Funny)
Only three things are necessary for a highly effective unix admin:
To crush your userbase
To see their accounts deleted before you
To hear the lamentations of the salesmen
Re: (Score:1)
Keep a sucker rod handy... (Score:2)
... works like a charm for me.
rgb
Habit #1 (Score:2)
90% of the job (Score:2)
Re: (Score:2)
Automate everything using chef/puppet (Score:2)
Using anything like puppet [puppetlabs.com] or chef [getchef.com] under version control to do all server ops will not only leave you with full, timestamped documentation, but will also allow you to easily scale servers horizontally, rebuild them should disaster strike, and protect you from stupid upstream package updates that b0rk your config files.
Have a staging and a production environment? Pushing your chef/puppet scripts to production after they're proven to work ensures you have the same changes applied on both sides, and avoid manual oper
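Part of what makes pushing the same scripts to staging and then production safe is idempotence: applying a change twice leaves the system exactly as it was after the first time. A tiny Python sketch of that property, loosely in the spirit of the `file_line` resource from Puppet's stdlib module but not its API; `ensure_line` is a hypothetical helper, and a real tool would also need to handle ownership, permissions, and atomic writes.

```python
from pathlib import Path


def ensure_line(path: Path, line: str) -> bool:
    """Append `line` to the file unless an identical line is already present.

    Returns True if the file changed. A second run is a no-op, which is what
    lets the same change be applied repeatedly to any number of hosts.
    """
    text = path.read_text() if path.exists() else ""
    if line in text.splitlines():
        return False
    # Don't glue the new line onto a final line that lacks a trailing newline.
    prefix = "" if text == "" or text.endswith("\n") else "\n"
    with path.open("a") as f:
        f.write(prefix + line + "\n")
    return True
```

The boolean return mirrors how config management tools report "changed" vs. "unchanged", which is what you audit after a run.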
Re: (Score:2)
Don't forget Splunk, so the servers that you are managing have a place to dump logs, and where you can do syslog searches from one place. Splunk isn't a magic bullet, but it does a lot of useful functions and can scale up, and it is a very useful troubleshooting tool.
the most useful talent (Score:2)
I think the most useful talent I've developed is the ability to go to sleep fast and to wake up fast and alert. When the phone rings or pager goes off, the faster you can reach "full on", find and fix the problem, and get back to sleep, the more sleep you get in the long run. Cohorts who have trouble getting to sleep after a late night emergency tend to be seriously dragging by the end of their oncall time.
Re: (Score:2)
You're not fooling anyone, we can all hear snoring coming from your cubicle.
Re: (Score:2)
Your not fooling anyone, we can all hear snoring coming from your cubicle.
I didn't say *where* I was going to sleep...
To be an effective admin AND stay in a job (Score:5, Interesting)
Rule #8 would be not to fix problems too quickly (and to let some that you can see coming happen).
If you fix every problem before it gets serious and avert the other 90%, your bosses will think they have a highly reliable IT infrastructure. They will then cast their eyes about for cost savings - and the biggest target will be the most highly paid admins - the most senior ones - YOU!!!
So keep the problems coming, as all that management have to assess you on are the number of fixes and the time to fix. Nobody ever got promoted for solving problems that never happened.
Finally: 60 hours a week? Don't be daft. If you're really an effective administrator you should have your work finished well inside 30 hours and/or 4 working days.
Re: (Score:3)
This is so true! I worked at a company where I set up Nagios with event handlers that would fix a lot of issues when they happened, and when they could not fix something, the system would text me to come fix it. Problems and downtime went to almost zero. It is amazing what happens when you have a system that can catch Java leaks and restart the Tomcat server.
When layoffs came around my boss called me in and told me that I was being laid off because there had not been a major issue in 6 months and they could not justify h
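The comment doesn't say how those handlers were written, so this is only a sketch of the decision logic, modelled on the sample event handler in the Nagios Core documentation: the handler receives `$SERVICESTATE$ $SERVICESTATETYPE$ $SERVICEATTEMPT$` as arguments and acts only on CRITICAL. The restart command, the `tomcat` unit name, and acting on the third SOFT attempt are assumptions.

```python
import subprocess
import sys
from typing import List, Sequence

# Hypothetical restart command: the unit name varies (tomcat, tomcat9, ...).
RESTART_CMD = ["systemctl", "restart", "tomcat"]


def should_restart(state: str, state_type: str, attempt: int, soft_attempt: int = 3) -> bool:
    """Act only on CRITICAL: on one chosen SOFT retry, or once the state is HARD.

    Nagios runs event handlers on SOFT problem states and on changes into and
    out of HARD states, so this filter avoids restarting on every SOFT retry.
    """
    if state != "CRITICAL":
        return False
    if state_type == "SOFT":
        return attempt == soft_attempt
    return state_type == "HARD"


def handler_action(args: Sequence[str]) -> List[str]:
    """Map the three handler arguments to a command to run, or [] for no action."""
    state, state_type, attempt = args[0], args[1], int(args[2])
    return list(RESTART_CMD) if should_restart(state, state_type, attempt) else []


if __name__ == "__main__" and len(sys.argv) == 4:
    cmd = handler_action(sys.argv[1:])
    sys.exit(subprocess.run(cmd).returncode if cmd else 0)
```

Keeping the decision in a pure function makes it easy to test without actually bouncing a service.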
Re: (Score:2)
60 hours a week? Don't be daft. If you're really an effective administrator you should have your work finished well inside 30 hours
I half agree. When the system is up and running you can go home at 3PM. But if the system is down, you don't go home until it comes back up. That's the job; on call 24/7. Love it or leave it.
Re: (Score:2)
"60 hours a week? Don't be daft."
All my technical problems are fixed in less than 30 hours a week. The other 30+ hours a week are fixing problems caused by users; either because QA is toothless, or people not following instructions, or employees who need to be bailed out.
Re: (Score:1)
Re: (Score:2)
A big part of that effectiveness is being able to identify trends, classes and root causes of issues. The number of symptomatic issues is a measure of the impact the issue causes and the metric by which to demonst
Delegate and Automate. (Score:2)
Find the people on your team who can be trusted to do the job well. Encourage them to do it. Work with them to build their skills as well as yours.
Find the people on your team who can not be trusted to do the job well, and replace them with shell scripts.
To "Know your systems" (Score:1)
Set up all hosts in Nagios, sending alerts to an email address for a couple of weeks. Figure out which hosts show recurring patterns.
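One way to find patterns in a couple of weeks of alert mail is to tally PROBLEM notifications per host and service. This Python sketch assumes the subject format of the sample `notify-service-by-email` command shipped with Nagios Core (`** $NOTIFICATIONTYPE$ Service Alert: $HOSTALIAS$/$SERVICEDESC$ is $SERVICESTATE$ **`); check your own commands.cfg and adjust the regex if it differs. `noisiest` is a hypothetical helper.

```python
import re
from collections import Counter
from typing import Iterable, List, Tuple

# Assumed subject layout from the sample notification command; adjust if customised.
SERVICE_SUBJECT = re.compile(r"\*\* (\w+) Service Alert: ([^/]+)/(.+) is (\w+) \*\*")


def noisiest(subjects: Iterable[str], top: int = 10) -> List[Tuple[Tuple[str, str], int]]:
    """Count PROBLEM notifications per (host, service) and return the most frequent."""
    counts: Counter = Counter()
    for subject in subjects:
        m = SERVICE_SUBJECT.search(subject)
        if m and m.group(1) == "PROBLEM":  # skip RECOVERY, ACKNOWLEDGEMENT, etc.
            counts[(m.group(2), m.group(3))] += 1
    return counts.most_common(top)
```

Feeding it the subject lines from the alert mailbox gives a ranked list of which host/service pairs page most often, which is a reasonable place to start looking for root causes.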
From TFA (Score:2)
For example, I have some processes that involve visual basic scripts that run on a windows virtual server and send data files to a Unix server that reformats the files using Perl, preparing them to be ingested into an Oracle database.
I guess that answers the question of how many times one can curse in one sentence.
Re: (Score:2)
ActivePerl anyone?
What habits have you found effective for system ad (Score:2)
Have you tried turning it off and back on again?
No problem mate.
Cut your own throat some more. (Score:1)
"Easy" (Score:2)
Carefully regulate your honesty (Score:1)
But never, ever let it outside your team. While your fellow techies will generally appreciate your ability to admit fault, it'll only come back to bite you later if you admit fault to anyone outside your group.
Path to obsolescence (Score:2)
You should try to become replaceable. Make most of your tasks automatic or trivial, so that systems try to heal themselves when known problems arise, anyone else can understand exactly how the systems work from your documentation, and anyone can see from your monitoring that a problem is about to happen.
That will make your work easier, let you take proper vacations, and make you irreplaceable when (not if) things change.
Re:One habit is ... (Score:5, Funny)
Re: (Score:1)
OK fatass.
Re: (Score:2)
The reason there are more fat people in IT isn't because we want to be. It is because the GOOD IT people get fat because they know that the best IT people never need to leave their seats. If you have to leave your seat to do something as an admin, you are doing something wrong and not using the technology that is available to you to be able to fix everything but physical hardware failure or installation from your seat.
This is why my office chair is a toilet. Actually my entire desk is in a toilet cubicle with the rest of the IT Team 'just in case of emergencies'. Curiously though the sound of urination is no different from the sound of people pissing on things to make their territory but they can't because we are already pissing on everything.
It's sometimes very odd when someone urgently bursts in during one of our meetings, but they usually leave feeling relieved.
Re: (Score:1)
Re: (Score:2)
Have a mini-fridge under your cubicle desk for constant snacking. The constant snacking would be the habit. Really though, there are too many fat bastards in IT.
Obligatory video: Valve Snack Bar [youtube.com].
Re:Bait (Score:4, Interesting)
From TFS, I really don't get why that applies only to Unix admins. That describes the years I've spent as a Windows admin as well.
Re: (Score:2)
Basic time management, inter-personal skills and some grasp of hygiene are pretty much must-haves.
Knowledge of the tools required to perform your duties and save the planet are a gimme, surely, once you are in that position. I am sure a 'nix admin can't avoid other disciplines the same as a wintel admin can't avoid *nix. Difference perhaps is that a decent multi-disciplines adm
Re:Bait (Score:5, Funny)
You really need to have a beard to get it. Do you have a beard? You don't sound like you have a proper beard.
Beard (Score:1)
You really need to have a beard to get it. Do you have a beard? You don't sound like you have a proper beard.
Ehhh, of course you need a beard. But the article also says, to be successful you should remove spaghetti once in a while:
Habit 7: Make time for yourself
"[...] Taking care of yourself is an important part of doing a good job."
Re: (Score:2)
And as a z/OS Systems Programmer too.
Re: (Score:2)
Yes... getting out of "the tedious low end job that sysadmin is" just so that you can sit in painfully dull meetings all day. Great plan that.