Want to read Slashdot from your mobile device? Point it at m.slashdot.org and keep reading!

 



Forgot your password?
typodupeerror
Programming IT Technology

Software Logging Schemes? 225

Posted by kdawson
from the ripsaw-or-chainsaw dept.
MySkippy writes "I've been a software engineer for just over 10 years, and I've seen a lot of different styles of logging in the applications I've worked on. Some were extremely verbose — about 1 logging line for every 2 lines of code. Others were very lacking, with maybe 1 line in 200 devoted to logging. I personally find that writing debug and informational messages about every 2 to 5 lines works well for debugging an issue, but can become cumbersome when reading through a log for analysis. I like to write warning messages when thresholds or limits are being approached — these tend to be infrequent. I log errors whenever I catch one (but I've never put a 'fatal' message in my code, because if it's truly a fatal error I probably didn't catch it). Recently I came across log4j and log4net and have begun using them both. That brings me to my question: how do the coders on Slashdot handle logging in their code?"
This discussion has been archived. No new comments can be posted.

Software Logging Schemes?

Comments Filter:
  • by AaronLawrence (600990) * on Saturday August 16, 2008 @07:09PM (#24630435)

    As usual, "it depends" on what you are trying to achieve. Nobody can give you a blanket recommendation. But I guess in general: the log files need to give you enough information, that can't be got in other ways, to solve any problem that comes up.

    We have a realtime product that goes all over the world and talks to hardware that we can't always get access to ourselves. Therefore, we sometimes must debug our code remotely. Obviously, logging is critical to this. We keep sometimes hundreds of MB of logs and have archiving rules and a tool for users to collect them. Every layer of the system keeps it's own logs, and all logs have timestamps to milliseconds.

    In our case we log all the data back and forth, and then every important decision the code makes. For example if it decides there is something wrong with incoming data, it must log that. Any action it decides to take must be logged. Any data that will be passed on to other layers/the outside world must be shown. Generally, whenever we forget to log some of this data we will later regret it ("why the hell is it ignoring that device state..."). We also log at startup, basically the whole system configuration so that we can reproduce it.

    Callstacks when there is an exception can be very useful. However, a lot of "errors" (at least in our case) are not exceptions but rather unexpected data or behaviour. We rarely have a crash and in state-based systems a callstack doesn't tell you much about what's going on. So a callstack is not useful for all situations.

    Other times, you just want logging to give you a clue where in the code it was so you can run up the debugger and step through it (you do know how to step through code in the debugger, right?). In that case, too much logging can just get in the way. It might be sufficient in a GUI or web app to say which screen/page and which button was clicked.

    You'd hope users could report this kind of details, but not always: if the user is working in another language, in another country, with two layers of helpdesk between you and them, and they are busy doing other things when the problem occurs and only call in the issue an hour later, and the helpdesk takes a week to report it to you - you may just get a vague or even misleading report that no-one can remember when you ask questions. In those cases log files may be all you have to go on.

    There is also a tradeoff between log detail and manageability. Besides the difficulty in reading long log files, having a lot of detail means maintaining a lot of extra code. It also means that log files can become unmanagably large. In our case those hundreds of MB of logs can be a huge problem for customers to send to us because they have low quality internet connections (small companies in Mexico for example).

    • by ericfitz (59316) on Saturday August 16, 2008 @07:45PM (#24630709)

      +1 for parent.

      If you want good logging, then define requirements for it, just as you would for any other feature of the program. You also need to define the audience for the log. The comments thread has focused on debug logging for developers (Linus "no debuggers" Torvalds would be proud) but there are a number of reasons why the users who are stuck^h^h^h blessed with your software might want logging. For instance:

      - audit trails (often required by organizational security requirements or regulatory requirements)
      - accounting/billing (you'd be amazed at the odd ways people come up with to bill for things)
      - health monitoring (the admin might not want to watch your program 24x7 to see if it is running; they might want to program automation to be alerted when it is not working properly)
      - troubleshooting (believe it or not, your software might actually break when running in the wild)

      Anyway, think about your use cases, and then think about what to instrument for each use case, and what to put in the events.

      For instance, if you want to make your daemon monitorable for health, then think about all its dependencies. Does it read config from a file? The file is a dependency. What happens if a value is invalid? Does it fail or use a default? If it fails, reading the value is a dependency. Need a network socket? Dependency. Connection to remote machine? Dependency (actually multiple- name resolution, network connectivity, authentication, app-level connectivity, etc.). After you've enumerated all your dependencies, then add instrumentation in your code to log events when the dependency is unsatisfied (==unhealthy/broken), and when it is satisfied (==healthy). Make sure to log BOTH states, so that the monitoring app can decide which state you are in. Make sure to log only once per state transition. In each event, try to put as much information about the situation as you can- why you are in the state ("the value foo from daemon-config was invalid"), status codes, etc.- give your user a fighting chance of being able to use your log to diagnose and resolve the issue.

      If you want to instrument for audit, then I suggest reading the Orange Book or the Common Criteria documents for suggestions on what needs to be audited and what information to put in the events.

      For accounting, examine the RADIUS RFCs.

      Hope this helps.

      • by Machtyn (759119)
        And for the love of Pete (whomever he may be) be consistent. If your program archives your log, make sure to use a consistent naming convention, and one that makes sense, throughout all the log files.

        Make sure you date/time stamps are consistent. If you are going to AM/PM designations, do this for all logs. 24-hour timestamp, all logs! Using an mm-dd-yyyy, mm-dd-yy, or yyyy-mm-dd style of date stamping, use one and don't vary.

        Being a young whippersnapper in the field, I wasn't sure if I should have
      • by jhol13 (1087781)

        Aspect oriented programming would seem to solve quite a few of your requirements.

        Unfortunately I have not (yet?) had time to evaluate AspectJ to know how it really performs.

  • Three levels (Score:3, Interesting)

    by spaceyhackerlady (462530) on Saturday August 16, 2008 @07:10PM (#24630443)

    I find I usually end up with 3 levels of logging:

    Normal operation, often with some notion of "Yes, I'm still running even though I haven't done anything else lately".

    Details. Usually corresponding to processing steps.

    Algorithm tracing. This includes things like logging SQL queries. This is usually only of interest to me.

    ...laura

  • by MichaelSmith (789609) on Saturday August 16, 2008 @07:11PM (#24630453) Homepage Journal
    I work on a large air traffic control system. Logging is a huge issue. Log files are collected centrally by a separate application. One important issue IMO is making the contents of your various log files meaningful to people who are not familiar with them.

    If your system has objects of type A B and C which can be handled by different components of your system then your should make the logging system in those components print information about those objects in exactly the same way.

    While you are at it, make the log format easily parsable by software. You don't want to be looking for a needle in a gigabyte size haystack of trace information without help from a tool which understands what it is looking for.
    • Enjoy all the fun of ADS-B =) As an IT professional and a private pilot, I hope if you're working on a project related to that, it works flawlessly.
      • Re: (Score:3, Funny)

        by MichaelSmith (789609)

        Enjoy all the fun of ADS-B =) As an IT professional and a private pilot, I hope if you're working on a project related to that, it works flawlessly.

        Cripes if it is working flawlessly we had better stop changing stuff ;)

  • It varies (Score:3, Interesting)

    by pjwhite (18503) on Saturday August 16, 2008 @07:13PM (#24630469) Homepage

    I will add lots of logging to debug a specific problem and then rip it out when the problem is fixed. Permanent logging includes run time problems like serial communication errors, file not found, etc. I like to make various logging functions switchable, so user input can be logged for example, but only when needed. Once a program is running well, it should only log data for dire exceptions, unless regular accounting logs are needed.

    • by Zadaz (950521)

      Yeah, it does really vary. I admit to doing this myself, though when I'm coding with others they despise me for crufting up the code. (Though coding with others often leads to conditions where logs are needed more often anyway...)

      During development and testing I log all non looping function calls, object creation/destroy and memory management, though I cull it as development proceeds (or just switch it all off).

      This is overkill but I got in the habit when maintaining someone else's code on a Project of Mad

  • by Animats (122034) on Saturday August 16, 2008 @07:16PM (#24630481) Homepage

    My online applications log to a database, not a text file. Multiple applications on different machines can log to the same database table. There's no need for "log rotation"; old entries can be summarized and purged by date on the live database. With appropriate indexed fields, you can find key log entries in huge log files very rapidly.

    Even program faults are logged to the database. If the program crashes, the top-level exception handler catches the event, does a traceback, opens a fresh database connection, and logs the traceback.

    • by ericfitz (59316)

      Logging to a database is generally a poor strategy. I see this over and over and cringe every time.

      Databases generally increase the overhead of logging significantly, and they don't add significant value. Sure, you can "select * from ... where ...". But do you REALLY need this? Most of the time when you need something from the log you can just grep /error/ or something comparable.

      Databases are great for reporting but are just unnecessary overhead for logging.

      • Re: (Score:3, Interesting)

        by plierhead (570797)
        IMO database logging has good points and bad points: On the good side, its easy to manipulate (query, purge, transform, summarise) the log entries. Also you can access the log entries remotely using the database tools you already know. On the bad side, its undoubtedly slower and more resource-intensive. Also, unless you have multiple DB connections (which itself raises complexity and overhead), then committing a log entry to the database will also commit your unit of work. It seems to work well for "use
      • by AaronLawrence (600990) * on Saturday August 16, 2008 @07:41PM (#24630655)

        Not to mention the added complexity and failure modes. All but the most trivial databases can go wrong in interesting ways, and when that happens where will you put your logging? It's precisely when things go wrong that you need logging the most. So you want the least possible dependencies. Right now, that's appending text to a file - file systems are simpler and tested more thoroughly than even the best databases can be.

        Like the user (or the system, or the virus...) shutdown the database server in the middle of operation. How do you prove that after the fact if the logs were going into the database?

      • by marxmarv (30295)

        What if your web server and programming language are forbidden from creating or writing files? (I am easily persuaded that this is a good idea from a system security point of view if you're a hosting provider.)

        • It's not a benefit to system security; just run the script engine (PHP, whatever) as the user who owns the account.

        • by CastrTroy (595695)
          What kind of hosting provider are you talking about? Every hosting provider from the $4/month plans all the way up let you write to files in your own personal directory. Most that I've seen give you SSH access to your directory so you can do whatever you want. Come to think of it, I've never seen a hosting environment in any situation where you could write server side code, and couldn't write to a file on that server.
  • I set a "logging level" in my configuration file so that when I need a program log to be verbose I can make it so but for normal use I can keep the logs fairly light.

    A log is only useful if it records information that is used. The problem is that sometimes everything is running fine and super verbose logging is a waste but then something in the environment changes and suddenly temporarily logging a lot of debugging information is useful. Setting a "logging level" in a configuration file lets me do light l

  • by Sir_Real (179104) on Saturday August 16, 2008 @07:18PM (#24630491)

    If you're using log4j, don't use multiple hosts to write to the same nfs filesystem file. You'll run into blocking issues and log4j doesn't handle them correctly. The nirvana of clustered app logging is an async JMS queue. You fire off the message and forget it. You don't wait for file handles.

    • Holy shiat. I just thought of a kick ass way to use the messaging queue system Amazon's cloud computing provides.

      I'm off to prototype some code. Thanks!

  • It depends on the app you're writing -- is it a web app, a database app, a userspace program written in C, a Perl/Ruby script, or.. At work we created our own logging library in C to emit log events for different levels, e.g. informational, debugging, warnings, errors, fatal messages. We then have wrappers around that library so that languages like Ruby can access that logging library. But on hindsight I think I would've just used syslog if I had to start over. :)
  • Are we talking about production environment or a development environment? In production, we log our exceptions or when major changes happen, such as something being deleted. The goal there is to help our support team and customers nail down the problems when they arise. I don't know if LoC is the right measure for these things. Our UI (it's a web app) has little to no logging since errors just appear when the UI has a problem. So we can have hundreds of lines of code with no logging there. Our backend

  • I Don't! (Score:5, Funny)

    by Vectronic (1221470) on Saturday August 16, 2008 @07:27PM (#24630563)

    I don't, save the rain forest, hug a tree, prevent deforestation, stop logging now!

  • by ILongForDarkness (1134931) on Saturday August 16, 2008 @07:29PM (#24630573)
    Segmentation fault: core dumped
  • I wrote up a two-part piece on logging best and worst practices a while back. See Part I [distancesoftware.com] and Part II [distancesoftware.com] if you are interested.

  • TOO MUCH! (Score:3, Interesting)

    by imp (7585) on Saturday August 16, 2008 @07:42PM (#24630675) Homepage

    1 line of logging per 200 lines of code is way too much. 2 in 5 lines is absolutely insane. I've seen way too many systems where the logging made it totally unusable. People just don't realize the costs of logging everything.

    There's absolutely no way to document anything this verbose. Without documentation, the logging is useless.

    • People just don't realize the costs of logging everything.

      Are you operating on embedded systems? Because anything faster than about 500MHz can easily do a few million "if (logging_level >= LOGGING_LEVEL_CONSTANT) { ... log_something ... }" type checks per second.

  • with guru meditation errors of course!
  • Filters (Score:3, Interesting)

    by zarthrag (650912) on Saturday August 16, 2008 @08:08PM (#24630863)
    My logging is setup so I can quickly filter down to the type of data I want. It's more than just "information", "warning", and "error" - but by (cpp)file, module, etc. That way, if an issue arises, I can eliminate the cruft and see just what I need. Just takes planning.
  • by Pr0Hak (2504)

    Syslog, of course.

  • by Restil (31903) on Saturday August 16, 2008 @08:24PM (#24630951) Homepage

    I tend to do my debugging by inserting a lot of printf statements to indicate where in the program I currently am and the value of any critical variables at that time. As the output information is no longer needed (i.e. I fixed the bug it was attached to), I tend to cull out whatever isn't useful anymore. However, I tend to keep starting messages in function calls related to a routine I'm working on or making more than a trivial change to... since chances are, knowing me, I'm going to end up putting them back in there anyway once I create a new bug... and lets face it, it WILL happen.

    Once I'm done, I go back, remove or comment out (usually just comment out) all the messages that have no redeeming value for a properly functioning program, and turn the rest into debug statements which print based on the debug level provided at execution time... or sometimes I use a mask to select which types of messages to display.

    -Restil

    • by Shados (741919)

      I tend to do my debugging by inserting a lot of printf statements to indicate where in the program I currently am and the value of any critical variables at that time.

      Wouldn't using a...you know... -debugger- be more efficient at doing that? Breakpoints, watching variables, etc?

      • Re: (Score:3, Insightful)

        by CastrTroy (595695)
        Disappointingly enough, this is one of the things that isn't covered very well in a lot of courses. I didn't get any exposure to debuggers in any classes I took throughout university. I learned about it myself. Same goes for a lot of other useful tools like source control systems. While I learned a lot while taking my degree, very little of what I learned dealt directly with the process of how you actually sit down and write code. Seriously, some people think that printf really is the best/only way to
      • by Zadaz (950521)

        Wouldn't using a...you know... -debugger- be more efficient at doing that? Breakpoints, watching variables, etc?

        All you get on some embedded systems is serial output. No watch variables or breakpoints, just whatever text you can manage to dump out the serial port.

        In addition logging function calls can quickly narrow down -where- to put your breakpoint and what variables to watch.

  • Log4j supports selective logging. That means you can have info/debug/trace priority messages in place, but never see them in the log unless you explicitly enable extra logging for that class or package. You can do this at runtime, e.g., via something like 'chainsaw' (which attaches to a running process) or hooks in your UI.

    Our policy is that logs are usually very quiet. Application startup/shutdown and not much more. But if there's a problem the debugging messages are already in place to let us peak int

  • Lotsa logging (Score:4, Informative)

    by SpinyNorman (33776) on Saturday August 16, 2008 @08:30PM (#24630983)

    I write code for Telecom test systems that need to run 24x7 processing highly varying requests from dozens of different client systems. Our system consists of dozens of different processes/components per host, with multiple hosts all invoking components on each other as needed (all via CORBA). There are very many paths that any request can take through our system.

    In this environment we log VERY heavily since each request is close to unique and we need to be able to determine the path it took through the system, and why it did, and what happened, in the event of any bug report. Some of the haviest used modules can produce close to 1GB of log per day per host - upto a couple hundred lines of logging information per request per process that it passes thru. We use a custom printf-like log library written in C++ (that auto rotates the log files based on various criteria), a custom tail utility for dealing with the large log files (tail a log file from a given timestamp - done instantly via binary search on the timestamps) and a daily cron job to compress the older log files and move any older than 5 days off the production servers to someplace with more storage.

  • First, a well functioning app should really not be generating any log file except what the application itself needs (e.g. usage logs for a web server).

    Second, instead of wading through reams of log output, use exceptions and assert statements so that you only generate log output when something has gone wrong.

    Finally, use your IDE. Instead of trying to infer state by combing through logs, set break points where you catch an exception. This lets you traceback to see why you ended up with the exception

  • by sprior (249994) on Saturday August 16, 2008 @08:32PM (#24630991) Homepage
    That should be about right...
  • Sometimes with printf style debugging, so much data comes out that it is not practical to use the logging in normal code. That said, I dump all data I want to stdout, and use AWK to massage the results. If you put in some kind of tags or markers, this can help simplify the log processing code significantly.

    You don't need databases or anything structured, simply one stream per process. The process will run sequentially, so a sequential storage structure (ie a file) is the best matched to the data. Also, I av

  • I work on set top boxes, and not every platform we port to has a good debugger (hell, its been years since I've seen a good debugger). Our logging system is all in house; multiple "levels" for each log statment, (noise,information,warnings,fatals,etc), with each module creating its own log id and setting its "preference level". It works well, but:

    1) Useless logs.
    Engineers not taking the time to write logs that are useful. "Got to here", "Value=1", etc. A few of us write enum-to-string functions and pass t

    • Re: (Score:2, Informative)

      by Perf (14203)

      Engineers not taking the time to write logs that are useful. "Got to here", "Value=1", etc. A few of us write enum-to-string functions and pass them to the logging system for cleaner output.

      An Engineer does something that stupid?!!!
      Who told them they were engineers? The HR Dork?

  • Having spent many years as a Systems Administrator, I would argue that the most important part of logging is to make sure that it is in a format and location that other people can use. People won't use the logs if they don't know where they are. And, if you're developing for Windows, I would go further and say that the only place you should be considering logging data to is the EventVwr application. I'm not sure how the java stuff goes, but if you're developing in .NET, ditch the log4net application, and
    • by Forbman (794277)

      Well, Log4Net can dump to the system events stuff, too...

      • Huh. Who knew?

        Okay, I'll amend my statement. If you're developing in .NET, and going to use log4net, just write to the EventVwr, where all the Windows System Admins of the world will be monitoring anyhow.
  • First - let me give you my own perspective. I recommend having each subsystem log in such a fashion that you can easily grep to include or ignore that subsystem. For example for one package the following LVLx messages were the first four characters as follows:

    LVL1 - basic startup and shutdown info ( a few lines per run)
    LVL2 - Interactions with the database
    LVL3 - Interactions with the file system
    LVL4 - Detailed database interactions including each sql statement
    LVL5 - amazingly verbose debug inform
  • I've seen at least one example where it looked like the programmer added logging when trying to debug a problem. Then, apparently on the assumption that it was "useful once, so it might be useful again", the logging was left in. There was a compile-time AND a runtime switch to toggle this logging. It tended to be ON all the time, cluttering the logs with needless information, and making the code look ugly. In theory, compile-time switches can eliminate the performance issue; but some of the logs depende

  • The CPAN module Log::Log4perl [cpan.org] is a great tool for logging -- it means you can stick in plenty of debug statements, and dial them up for debugging, then dial them down in Production.

    This module uses several message levels; in descending order of importance they are FATAL, ERROR, WARN, INFO, DEBUG and TRACE. It's possible to log messages to files, a screen, or even to an E-Mail message.

    The real strength of this method is that you don't need a 'debug' version of the code -- all configuration is done externally

  • (1) Log lots of information when you are debugging. Then, when you are done debugging that particular problem, TAKE THAT LOGGING CODE OUT.
    If you find it tedious to find your debug log lines while debugging, it is a sign that you are logging too much everyday stuff! In the meantime, surround your debug log lines with multiple lines of "===================" or the like. Then, of course, take those out too.

    (2) On an everyday basis, log only UNEXPECTED conditions. After all: if it was expected, then your c
  • operational errors (Score:3, Informative)

    by Spazmania (174582) on Saturday August 16, 2008 @10:39PM (#24631675) Homepage

    1. Distinguish between serious -operational- issues and other issues. The sysadmin doesn't need to know that you had a pointer problem; there's nothing he can do about it. He does, however, want to know that a message was received and the appropriate action taken. Or that a connection was attempted but failed.

    2. Be grep freindly. The first log entry related to a particular activity should have an ID of some sort in the log message which is then included in every additional log entry associated with that activity.

  • Good general advice in logging is to support different verbosity levels. Provide the ability to change the log verbosity as a command line switch or as a signal the process can receive.

    Then if your log is too cluttered, reduce verbosity and try again. If it isn't telling you what you need to know, increase verbosity.
  • One of the coolest things I ever saw was a tape recorder that starts recording 30 seconds before you press "record". Simple concept -- just continuously record to a 30-second buffer.

    So if it's truly disk/network that's at a premium, why not start logging just before you need it? Log at level "EVERYTHING" up into a circular buffer, and only dump to disk or network when there's an error?

  • To me, it's important to cover both the internal and external cases in any scheme.

    For the internal, it's important that your organization is capable of finding owners for failures/bugs quickly. So when a developer hits an error they don't understand or your BVTs break, an owner can be identified. Nine times out of ten, they can get you a solution in an hour or two once they've seen the error. In addition, an automation framework should work closely with your logging to bubble up the relevant errors. That ca

  • by gillbates (106458) on Saturday August 16, 2008 @11:14PM (#24631861) Homepage Journal

    It might come back to haunt you later.

    • If it could be useful for invading someone's privacy, law enforcement will expect to have access to it. Consider, for example, that the RIAA lawsuits were enabled by the fact that ISPs logged (and kept the logs) of the IP addresses and to whom they were assigned.
    • Consider how an attacker could use the logs as a means of figuring out how your software functions, and how to defeat it. "Buffer overflow on line __LINE__" - while well-intentioned - would be a bad message to log. If you must log defects, sanitize the output so that it doesn't reveal more information than is necessary for the reader to know.
    • In general, don't log normal operation beyond the startup and initialization messages. Users have this uncanny habit of looking to the logs for ways to blame the software for their own ineptitude. A verbose tirade of log messages will often provide them with ample ammunition for blame deflection and result in unnecessary service calls and potential misunderstanding. Especially if the log messages are cryptic and could be misconstrued to indicate that there is a bug with your software.
    • With respect to the above point, make sure that any errors which could be caused by misconfiguration or user error clearly indicate that the problem lies with the user or configuration.
    • When you do have to log a bug, include only the technical information necessary for the person doing the debugging. The point is to make it sufficiently technical that you have a starting point for debugging without giving the end user something with which they can hang you. Reporting a seg fault won't do you any good if you don't have the means to determine where and why it happened.

The tree of research must from time to time be refreshed with the blood of bean counters. -- Alan Kay

Working...