
Don't Overlook Efficient C/C++ Cmd Line Processing

An anonymous reader writes "Command-line processing is historically one of the most ignored areas in software development. Just about any relatively complicated software has dozens of available command-line options. The GNU tool gperf is a "perfect" hash function generator: for a given set of user-provided strings, it generates C/C++ code for a hash table, a hash function, and a lookup function. This article provides a good discussion of how to use gperf for efficient command-line processing in your C/C++ code."
  • by tot ( 30740 ) on Sunday July 29, 2007 @11:40AM (#20032263)
    I would not consider speed of command line option processing to be a bottleneck in any application; the overhead of starting the program is far greater.
  • Too much (Score:4, Insightful)

    by bytesex ( 112972 ) on Sunday July 29, 2007 @11:41AM (#20032269) Homepage
    I'm not sure that for the usually simple task of command line processing, I'd like to learn a whole new lex/yacc syntax thingy.
  • by ScrewMaster ( 602015 ) on Sunday July 29, 2007 @11:42AM (#20032279)
    I'd say the speed of human motor activity is an even greater limiting factor.
  • by Anonymous Coward on Sunday July 29, 2007 @11:44AM (#20032291)
    It's still handy to have a fairly comfortable way of generating code that does things needed every time (or at least very, very often) in an easily applicable and very optimized way. I like it.
  • by V. Mole ( 9567 ) on Sunday July 29, 2007 @11:50AM (#20032331) Homepage
    Does the phrase "reinvent the wheel" strike a chord with anyone?
  • Re:C++ I get (Score:4, Insightful)

    by Anonymous Coward on Sunday July 29, 2007 @12:01PM (#20032393)
    I do. On MIPS, ARM, PPC, x86, and all the other embedded stuff. I don't think C will ever die - it's the universal assembler language.
  • by Anonymous Coward on Sunday July 29, 2007 @12:02PM (#20032399)
    Good grief. What a strawman of an example.
    Anyone writing or maintaining command-line programs knows that they should be using getopt() or getopt_long(). There are standards for how command-line options and arguments are to be processed; they should be followed for portability and code maintenance.
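    For reference, a minimal getopt_long() sketch might look like the following; the --verbose/--output options and the variable names are just illustrative, not taken from any particular program:

        /* Minimal getopt_long() sketch; options are illustrative. */
        #include <getopt.h>
        #include <stdio.h>
        #include <stdlib.h>

        int main(int argc, char *argv[])
        {
            int verbose = 0;
            const char *output = NULL;

            static const struct option long_opts[] = {
                { "verbose", no_argument,       NULL, 'v' },
                { "output",  required_argument, NULL, 'o' },
                { NULL, 0, NULL, 0 }
            };

            int c;
            while ((c = getopt_long(argc, argv, "vo:", long_opts, NULL)) != -1) {
                switch (c) {
                case 'v': verbose = 1;      break;
                case 'o': output  = optarg; break;
                default:  /* getopt_long has already printed a diagnostic */
                    return EXIT_FAILURE;
                }
            }

            if (verbose)
                printf("output: %s\n", output ? output : "(stdout)");
            return 0;
        }

    Remaining non-option arguments start at argv[optind] once the loop finishes.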
  • by canuck57 ( 662392 ) on Sunday July 29, 2007 @12:11PM (#20032479)

    I would not consider speed of command line option processing to be a bottleneck in any application; the overhead of starting the program is far greater.

    You're just experiencing this with Java, Perl or some other high-overhead bloated program. When people pull out a heavyweight needing a 90MB VM or a 5-10MB base library calling a cat's breakfast of shared libraries, I would agree, but let's take a look at C-based awk, for example: it is only an 80KB draw. It runs fast, is nice and general-purpose, and does a good job of what it was designed to do. It can be pipelined in and out and used directly on the command line, as it has proper support for stdin, stdout and stderr. On my system, it's only 10 disk blocks to load.

    While fewer people are proficient at it, C/C++ will outlast us all as a language. Virtually every commodity computer today uses it in its core. Many others have come and gone, yet all our OSes and scripting tools rely on it. So any doomsday predictions would be premature, and if you want fast, efficient and lean code, you do C/C++....

  • Re:C++ I get (Score:3, Insightful)

    by iangoldby ( 552781 ) on Sunday July 29, 2007 @12:11PM (#20032483) Homepage
    I use C for any low-level programming project that doesn't warrant an object-oriented approach.

    The trick is to identify the best tool for the job.
  • by Anonymous Coward on Sunday July 29, 2007 @12:12PM (#20032485)
    Indeed. The applications of perfect hashing (and minimal perfect hashing) are quite limited. Basically, it only makes sense if you need to quickly identify strings from a fixed, finite set of strings known at compile time. And as with all optimizations, only if that part of your program is a bottleneck or you are prepared to optimize all other aspects of your program as well.

    The traditional example application for perfect hashing was identifying keyword tokens when building a compiler, but for complex modern languages like C++ parsing source code is just a very tiny fraction of the compilation process. And even that scenario makes more sense than parsing command line options.

    I doubt there is a single application that significantly benefits from hashed lookup of command line options. Suggesting that it makes sense to spend your time increasing the complexity of your application for a practically immeasurable improvement in performance is insanity.
  • Correction... (Score:2, Insightful)

    by Pedrito ( 94783 ) on Sunday July 29, 2007 @12:19PM (#20032529)
    Just about any relatively complicated software has dozens of available command-line options.

    That should probably be rephrased to "Just about any relatively complicated software that inflicts command-lines on its users..."

    This is clearly a very Unix-oriented post, as there are relatively few command-line Windows apps and few Windows GUI apps that accept command lines. But this is also a topic that's about as old as programming itself, and clearly something that takes the "new" out of "news".
  • Re:Joke? (Score:5, Insightful)

    by iangoldby ( 552781 ) on Sunday July 29, 2007 @12:20PM (#20032539) Homepage

    Someone found a "new" toy?
    Well I for one won't be using this to process command-line arguments (that's what getopt() and getopt_long() are for), but it is certainly useful to know of a tool that I can use to generate a perfect hash. The next time I need some simple but efficient code to quickly discriminate between a fixed set of strings, I'll know to Google for gperf. (Before I read this article I didn't even know it existed.)
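    For the curious, the calling side of a gperf-generated lookup is tiny. A rough sketch, assuming the keyword list lives in a hypothetical keywords.gperf that was run through "gperf keywords.gperf > keywords.c" with the default settings (gperf's default lookup function is called in_word_set; its exact prototype varies a little between gperf versions and options):

        /* Rough sketch of calling gperf's generated lookup; link this
         * against the generated keywords.c. */
        #include <stdio.h>
        #include <string.h>

        /* Generated by gperf under its default name; the prototype here is
         * illustrative and may differ slightly by gperf version/options. */
        const char *in_word_set(const char *str, size_t len);

        int main(void)
        {
            const char *candidates[] = { "--help", "--version", "--frobnicate" };

            for (size_t i = 0; i < sizeof candidates / sizeof candidates[0]; i++) {
                if (in_word_set(candidates[i], strlen(candidates[i])))
                    printf("%s is in the fixed set\n", candidates[i]);
                else
                    printf("%s is not\n", candidates[i]);
            }
            return 0;
        }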
  • by Anonymous Coward on Sunday July 29, 2007 @12:26PM (#20032573)

    C/C++ will outlast us all as a language.
    There's no such language as C/C++.
  • I agree... (Score:3, Insightful)

    by SuperKendall ( 25149 ) on Sunday July 29, 2007 @12:31PM (#20032603)
    There's a time and place for gperf - command-line argument processing is not it!

    Actually, I've never really come across a case where I knew ahead of time the whole universe of strings I would be accepting, and so never ended up using it - gperf is a great idea, but this seems to be a case of someone really looking hard to figure out where they could shoehorn gperf in, just for the sake of using it.
  • by tepples ( 727027 ) <tepplesNO@SPAMgmail.com> on Sunday July 29, 2007 @12:34PM (#20032623) Homepage Journal

    HOLY SHIT! 194KB BIGGER?! HOW WILL YOU EVER FIND THE SPACE FOR SUCH A HUGE EXECUTABLE?!?!
    I develop for a battery-powered computer with 384 KiB of RAM. In such an environment, what you appear to sarcastically call a "mere couple hundred kilobytes" is a bigger deal than it is on a personal computer manufactured in 2007.
  • by geophile ( 16995 ) <(jao) (at) (geophile.com)> on Sunday July 29, 2007 @01:05PM (#20032819) Homepage
    Perfect hash functions are curiosities. If you have a static set of keys, then with enough work you can generate a perfect (i.e. collision-free) hash function. This has been known for many years. The applicability is highly limited, because you don't usually have a static set of keys, and because the cost of generating the perfect hash is usually not worth it.

    Gperf might be reasonable as a perfect hash generator for those incredibly rare situations when the extra work due to a hash collision is really the one thing standing between you and acceptable performance of your application.

    I thought maybe we were seeing a bad writeup, but no, it's the authors themselves who talk about the need for high-performance command-line processing, and give the performance of processing N arguments as O(N)*[N*O(1)]. I cannot conceive of a situation in which command-line processing is a bottleneck. And their use of O() notation is wrong: they are effectively claiming O(N**2), which is surely not what they intend, not least because it isn't true. O() notation shows how performance grows with input size, and unless they are worrying about thousands or millions of command-line arguments, O() notation in this context is just ludicrous.

    I don't know why I'm going on at such length -- the extreme dumbness of this article just set me off.
  • Historically? (Score:4, Insightful)

    by ClosedSource ( 238333 ) on Sunday July 29, 2007 @01:07PM (#20032839)
    "Command-line processing is historically one of the most ignored areas in software development."

    This is like saying that walking is historically one of the most ignored areas in human transportation.
  • is this a joke? (Score:3, Insightful)

    by oohshiny ( 998054 ) on Sunday July 29, 2007 @01:14PM (#20032879)
    If it's not, the author of that article should be kept as far away from writing software as possible; he epitomizes the attitude that so frequently gets C++ programmers into trouble.
  • by Anonymous Coward on Sunday July 29, 2007 @02:13PM (#20033283)
    I've probably used more time typing this message than every program I've ever run has used parsing command line arguments.
  • It is not surprising in that case that the C++ standard library brings in much more code than the C standard library, but it should be made clear that it is not relevant to desktop developers, pretty much all of whom dynamically link with glibc.
    On MinGW, the port of GCC to Windows, my programs dynamically link with msvcrt, not glibc. Also on MinGW, libstdc++ is static, just like in the embedded toolchain. Are you implying that one of the C++ toolchains for Windows uses a dynamic libstdc++? Which toolchain, for which operating system that is widely deployed on home desktop computers, are you talking about?
  • by ultranova ( 717540 ) on Sunday July 29, 2007 @03:36PM (#20033941)

    While fewer people are proficient at it, C/C++ will outlast us all as a language. Virtually every commodity computer today uses it in its core.

    Which is why they are so crash-prone. With C/C++, any mistake whatsoever will likely crash the program/machine, and possibly also allow crackers to make the program execute arbitrary code.

    Many others have come and gone, yet all our OSes and scripting tools rely on it. So any doomsday predictions would be premature, and if you want fast, efficient and lean code, you do C/C++....

    If you want fast, efficient and lean code, write it. Simply picking C/C++ doesn't make your code so, nor does not picking them make the code slow. What C/C++ does is make programs hard to port due to the ambiguous definitions of some critical parts (such as type lengths), prone to crashing due to manual memory management, and dependent on external systems such as Gtk, Gnome or KDE for their graphical user interface, due to C predating widespread adoption of computer graphics.

    As long as our computers keep depending on C, C++ or any other language with such horrendous features, expect to see a new buffer overflow or other such exploit every week. I, for one, welcome our new garbage-collected, bounds-checked Java overlords, which don't crash randomly as C programs do.

  • by ucblockhead ( 63650 ) on Sunday July 29, 2007 @03:49PM (#20034061) Homepage Journal
    When faced with this issue, I simply wrote a Windows version of getopt. Took about a day.

    Even when reinventing the wheel, it is important to reinvent as little as possible. If you need functionality that isn't there, at least keep the same interface.
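    For what it's worth, here is a stripped-down sketch of what keeping the getopt interface looks like. It handles only "-x" and "-x value" forms (no bundling or attached arguments, which is where most of that day goes), and the my_ prefixes are made up to avoid colliding with a real getopt:

        /* Toy getopt-alike that preserves the familiar interface
         * (return value, optind, optarg); short options only. */
        #include <string.h>

        static int   my_optind = 1;
        static char *my_optarg = NULL;

        static int my_getopt(int argc, char *argv[], const char *optstring)
        {
            if (my_optind >= argc || argv[my_optind][0] != '-'
                                  || argv[my_optind][1] == '\0')
                return -1;                       /* no more options */

            int opt = argv[my_optind][1];
            const char *spec = strchr(optstring, opt);
            my_optind++;

            if (spec == NULL)
                return '?';                      /* unknown option */

            if (spec[1] == ':') {                /* option takes an argument */
                if (my_optind >= argc)
                    return '?';                  /* argument missing */
                my_optarg = argv[my_optind++];
            }
            return opt;
        }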
  • by Anonymous Coward on Sunday July 29, 2007 @03:55PM (#20034115)
    Type lengths in C/C++ can be specified with uint32_t, int8_t, etc. If they are not available for a certain platform, they are just a typedef away.
      'dependent on external systems such as Gtk, Gnome or KDE' - you don't know crap about programming, do you? Those are libraries, not 'systems', and Gnome and KDE are not libraries (although there are Gnome and KDE libraries). Most 'Gnome' programs are actually Gtk programs (KDE programs are usually /true/ KDE programs, not just Qt). I'm not sure what you are suggesting, but I'm sure it's stupid. Even MS Windows no longer includes all the graphical system in the kernel.
      If you think Java solves all the programming problems, you're nuts. It doesn't solve most of them, it just hides them, and it creates a whole host of new ones. And by the way, Java doesn't include a 'graphical system'; it just has a couple of libraries that can be used for that (and AWT sucks majorly; Swing is not so crappy, but hardly a panacea).
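      On the type-length point above, a quick illustration; the fallback typedef is only a sketch and assumes a platform where int happens to be 32 bits:

          #include <stdint.h>   /* C99 fixed-width types */

          /* On a pre-C99 toolchain without <stdint.h>, the same name is one
           * typedef away -- but only on a platform where int is 32 bits:
           *     typedef unsigned int uint32_t;
           */

          uint32_t crc  = 0xFFFFFFFFu;  /* exactly 32 bits everywhere */
          int8_t   step = -1;           /* exactly 8 bits, signed */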
  • Re:is this a joke? (Score:3, Insightful)

    by turgid ( 580780 ) on Sunday July 29, 2007 @04:32PM (#20034435) Journal

    Well, what do you expect from IBM? It's just another one of their look-Ethel-it's-open-source-and-look-at-us-helping-the-community content-free PR fluff pieces. Ignore them and they'll crawl back into their mainframe cave.

  • by timeOday ( 582209 ) on Sunday July 29, 2007 @06:50PM (#20035717)

    I'd say the speed of human motor activity is an even greater limiting factor.


    I wouldn't bet on that. The command line is not just a human/computer interface, but also a computer/computer interface. It's very common for one script to fire off many others.


    That said, I agree with the grandparent that it's hard to imagine a program where command line processing is a significant runtime expense.

  • by HeroreV ( 869368 ) on Sunday July 29, 2007 @07:10PM (#20035899) Homepage
    "Mixing tabs and spaces for indenting is bad. It causes many problems that you don't encounter when using only spaces. Therefore, tabs are bad, so only use spaces."

    That is the only significant argument against tabs I've ever read, and I've probably read it a hundred times. Only a moron wouldn't realize that it's the mixing that is bad, not the tabs or the spaces, but apparently there are a lot of morons out there.

    tabs: good
    spaces: good
    mixing tabs and spaces: bad

    I personally prefer tabs. Why?
    • Different code uses a different number of spaces for indenting, which makes copying between them more time-consuming.
    • Easier to change indentation width. It's a simple change of an IDE preference instead of a risky text replace.
    • Tabs are more semantic. That might sound stupid, but it gives me a warm fuzzy feeling.
  • by moosesocks ( 264553 ) on Sunday July 29, 2007 @07:14PM (#20035941) Homepage
    The weird bit is that, despite being a somewhat silly article, it launched one of the most intelligent discussions I've seen on /. in a while.
  • Re:Too much (Score:5, Insightful)

    by hackstraw ( 262471 ) on Sunday July 29, 2007 @07:26PM (#20036061)
    I'm not sure that for the usually simple task of command line processing, I'd like to learn a whole new lex/yacc syntax thingy.

    The syntax for gperf is not that bad, but it's simply the wrong tool for the job as far as command-line processing goes.

    gperf simply makes a "perfect" hash function for searching a predetermined static lookup table. It provides no mechanism for arbitrary arguments like input filenames or modifiers (like a filter for including/excluding things, or increasing/decreasing something), nor does it check for conflicting options or missing options.

    gperf would give you nothing besides a match of input to a state. gperf would provide nothing for a common command line like: --include="*.txt" --exclude="*.backup" --with-match="some text|or this text" --limit-input=5megabytes

    getopt, or just rolling your own if/else-if ladder or switch statement, would provide much more flexibility over gperf.

    Now, for parsing a configuration file, gperf might help, but for processing command-line arguments, gperf is simply the wrong tool for the job.
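    To make the "roll your own ladder" point concrete, here is roughly what that looks like for the --opt=value style above (the option names and variables are just illustrative):

        /* Hand-rolled if/else ladder for --opt=value style options;
         * names are illustrative. */
        #include <stdio.h>
        #include <string.h>

        int main(int argc, char *argv[])
        {
            const char *include_pat = NULL;
            const char *exclude_pat = NULL;

            for (int i = 1; i < argc; i++) {
                if (strncmp(argv[i], "--include=", 10) == 0)
                    include_pat = argv[i] + 10;
                else if (strncmp(argv[i], "--exclude=", 10) == 0)
                    exclude_pat = argv[i] + 10;
                else
                    fprintf(stderr, "unknown option: %s\n", argv[i]);
            }

            printf("include: %s, exclude: %s\n",
                   include_pat ? include_pat : "(none)",
                   exclude_pat ? exclude_pat : "(none)");
            return 0;
        }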

    This is like the second or third Slashdot posting from IBM's developerWorks that is simply well-formatted nonsense. Past examples are http://developers.slashdot.org/article.pl?sid=07/04/09/1539255 and http://developers.slashdot.org/article.pl?sid=07/04/09/1539255

    This is silly on both Slashdot's and IBM's part.

  • by VGPowerlord ( 621254 ) on Sunday July 29, 2007 @10:17PM (#20037453)

    Not everyone uses the same tab stops.

    I see that as a good reason to use tabs. Don't like how far it's indented? Change how wide your editor displays tabs.
