Don't Overlook Efficient C/C++ Cmd Line Processing
Posted by CmdrTaco on Sun Jul 29, 2007 10:33 AM
from the also-don't-eat-yellow-snow dept.
An anonymous reader writes "Command-line processing is historically one of the most ignored areas in software development. Just about any relatively complicated software has dozens of available command-line options. The GNU tool gperf is a "perfect" hash function generator that, for a given set of user-provided strings, generates C/C++ code for a hash table, a hash function, and a lookup function. This article provides a reference for a good discussion on how to use gperf for effective command-line processing in your C/C++ code."
This discussion has been archived.
No new comments can be posted.
The Fine Print: The following comments are owned by whoever posted them. We are not responsible for them in any way.
Speed in options parsing? (Score:5, Insightful)
Re:Speed in options parsing? (Score:4, Insightful)
Re: (Score:3, Informative)
-Peter
Re: (Score:3, Insightful)
I wouldn't bet on that. The command line is not just a human/computer interface, but also a computer/computer interface. It's very common for one script to fire off many others.
That said, I agree with the grandparent that it's hard to imagine a program where command line processing is a significant runtime expense.
Re:Speed in options parsing? (Score:5, Informative)
If you don't like the nasty nested ifs, make the keys in your dictionary the command line options and the values delegates, then just loop through your list of options passed on the command-line, invoking the delegate as appropriate. Eliminates the if, there are no switch statements either, and each of your command line arguments is now handled by a function dedicated to it, bringing all of the benefits of compartmentalizing your code rather than stringing it out in a huge processing function.
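A minimal sketch of that dispatch idea in C, where function pointers play the role of delegates. The option names, the settings struct, and the handler names are all made up for illustration, and a plain linear scan stands in for a real dictionary lookup:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical program settings that the handlers fill in. */
struct settings {
    int verbose;
    int force;
};

/* Every option gets its own small handler function. */
typedef void (*option_handler)(struct settings *s);

static void handle_verbose(struct settings *s) { s->verbose = 1; }
static void handle_force(struct settings *s)   { s->force = 1; }

/* The "dictionary": option string -> handler. Names are made up. */
static const struct {
    const char *name;
    option_handler handler;
} option_table[] = {
    { "--verbose", handle_verbose },
    { "--force",   handle_force   },
};

/* Walk argv and invoke the handler for each recognised option.
 * Returns the number of arguments that matched no table entry. */
int dispatch_options(int argc, char **argv, struct settings *s)
{
    const size_t n = sizeof option_table / sizeof option_table[0];
    int unknown = 0;

    assert(s != NULL);
    for (int i = 1; i < argc; i++) {
        size_t j;
        for (j = 0; j < n; j++) {
            if (strcmp(argv[i], option_table[j].name) == 0) {
                option_table[j].handler(s);
                break;
            }
        }
        if (j == n)
            unknown++;   /* no handler registered for this argument */
    }
    return unknown;
}
```

Adding an option then means adding one handler and one table row; the loop itself never changes.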
Broken handling of vtables in linkers (Score:5, Informative)
only relevant to static linking (Score:5, Informative)
Again, to be clear, dynamically linking with the C++ standard library is not going to increase your executable size. Please don't try to roll your own code that exists in the standard library. It is a real nuisance when people do that.
I should qualify that by saying that template instantiations do (of course) increase executable size, but that they do so no more than if you had rolled your own.
Which platform uses dynamic libstdc++? (Score:3, Insightful)
It is not surprising in that case that the C++ standard library brings in much more code than the C standard library, but it should be made clear that it is not relevant to desktop developers, pretty much all of whom dynamically link with glibc.
On MinGW, the port of GCC to Windows OS, my programs dynamically link with msvcrt, not glibc. Also on MinGW, libstdc++ is static, just like in the embedded toolchain. Are you implying that one of the C++ toolchains for Windows uses a dynamic libstdc++? Which toolchain for which operating system that is widely deployed on home desktop computers are you talking about?
All the world is not a PC (Score:5, Insightful)
devkitARM (Score:3, Informative)
Re:Byte counts when compiled with devkitARM (Score:4, Funny)
Please don't tell the poor thing it's running on MIPS, the ARMv5TE kernel might just freak out and collapse the universe.
Character encoding conversion (Score:3, Informative)
How many of these embedded tools you write actually _do_ command line processing?
None yet, but they do handle other things that involve dictionaries, such as character encoding conversion. A program designed to move items back and forth between a town in Animal Crossing (for Nintendo GameCube) and a town in Animal Crossing: Wild World (for Nintendo DS) needs to be able to understand the encodings of character names and town names that these games use, possibly by converting between their proprietary 8-bit codecs and UTF-8.
why don't you invest in more (both memory- and time-) efficient ways to do IPC than the command line?
Because the command line, pipes, and sockets are the most obvio
Re:Speed in options parsing? (Score:5, Funny)
Re:Speed in options parsing? (Score:4, Funny)
Re:Speed in options parsing? (Score:4, Insightful)
I would not consider speed of command line option processing to be a bottleneck in any application; the overhead of starting the program is far greater.
You're just experiencing this with Java, Perl or some other high-overhead bloated program. People often pull out a heavyweight needing a 90MB VM or a 5-10MB basis library calling the cat's breakfast of shared libraries, I would agree, but let's take a look at C-based awk, for example: it is only an 80KB draw. It runs fast, is nice and general purpose, and does a good job of what it was designed to do. It can be pipelined in and out and used directly on the command line, as it has proper support for stdin, stdout and stderr. On my system, only 10 disk blocks to load.
While fewer people are proficient at it, C/C++ will outlast us all as a language. Virtually every commodity computer today uses it at its core. Many others have come and gone, yet all our OSes and scripting tools rely on it. So any doomsday predictions would be premature, and if you want fast, efficient and lean code you do C/C++....
Re: (Score:3, Insightful)
Which is why they are so crash-prone. With C/C++, any mistake whatsoever will likely crash the program/machine, and possibly also allow crackers to make the program execute arbitrary code.
Re:Speed in options parsing? (Score:4, Funny)
Re: (Score:3)
Re: (Score:3, Interesting)
The problem is that people set their tab breaks at all sorts of places (eg: every 4 characters), and then use tabs to space things in the middle of lines, or they'll mix tabs and spaces at the beginnings of lines. When somebody with different settings opens the same file, the indentation looks really screwed. That happens even after you've gotten everybody to agree on a common number of columns for indentation.
I only know of two solutions:
Re: (Score:3, Insightful)
That is the only significant argument against tabs I've ever read, and I've probably read it a hundred times. Only a moron wouldn't realize that it's the mixing that is bad, not the tabs or the spaces, but apparently there are a lot of morons out there.
tabs: good
spaces: good
mixing tabs and spaces: bad
I personally prefer tabs. Why?
Re:Speed in options parsing? (Score:4, Insightful)
I see that as a good reason to use tabs. Don't like how far it's indented? Change how wide your editor displays tabs.
Re: (Score:3, Funny)
Writing code that writes code--now we're thinking!
Re:Speed in options parsing? (Score:4, Informative)
How about "macro"? [jhu.edu]
Too much (Score:4, Insightful)
Re:Too much (Score:5, Insightful)
The syntax for gperf is not that bad, but it's simply the wrong tool for the job as far as command-line processing goes.
gperf simply makes a "perfect" hash function for searching a predetermined static lookup table. It provides no mechanism for arbitrary arguments like input filenames or modifiers (like a filter for including/excluding things, or increasing/decreasing something), nor does it check for conflicting options or missing options.
gperf would give you nothing besides a match of input to a state. gperf would provide nothing for a common commandline like: --include="*.txt" --exclude="*.backup" --with-match="some text|or this text" --limit-input=5megabytes
getopt or just rolling your own if/else if ladder or switch statement would provide much more flexibility over gperf.
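For comparison, here is a rough sketch of how getopt_long() (a GNU extension that is also available on the BSDs and macOS) could handle the command line quoted above. The struct, the function name, and the short-option letters are assumptions made for the example, not anything from the article:

```c
#include <getopt.h>
#include <stddef.h>

/* Hypothetical settings matching the options in the example command line. */
struct filter_opts {
    const char *include;
    const char *exclude;
    const char *match;
    const char *limit;
};

/* Returns 0 on success, -1 on an unknown option or a missing value. */
int parse_filter_opts(int argc, char **argv, struct filter_opts *out)
{
    static const struct option longopts[] = {
        { "include",     required_argument, NULL, 'i' },
        { "exclude",     required_argument, NULL, 'e' },
        { "with-match",  required_argument, NULL, 'm' },
        { "limit-input", required_argument, NULL, 'l' },
        { NULL, 0, NULL, 0 }
    };
    int c;

    out->include = out->exclude = out->match = out->limit = NULL;
    optind = 1;   /* restart scanning so the function can be called again */
    opterr = 0;   /* stay quiet; the caller decides how to report errors */

    while ((c = getopt_long(argc, argv, "i:e:m:l:", longopts, NULL)) != -1) {
        switch (c) {
        case 'i': out->include = optarg; break;
        case 'e': out->exclude = optarg; break;
        case 'm': out->match   = optarg; break;
        case 'l': out->limit   = optarg; break;
        default:  return -1;   /* '?': unknown option or missing value */
        }
    }
    return 0;
}
```

The values come back as raw strings; turning "5megabytes" into a number, or checking for conflicting options, is still up to the program, but that is validation work gperf would not do either.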
Now, with parsing a configuration file, gperf might help, but for processing commandline arguments, gperf is simply the wrong tool for the job.
This is like the second or third Slashdot posting from IBM's developerWorks that is simply well-formatted nonsense. Past examples are http://developers.slashdot.org/article.pl?sid=07/
This is silly on both Slashdot's and IBM's part.
Yeah, because getopt(3) is a real bottleneck (Score:5, Insightful)
It is if the linker complains about not finding it (Score:5, Informative)
Re: (Score:3, Interesting)
Are you seriously trying to argue that gperf is more portable than getopt?
Re: (Score:3, Informative)
Re: (Score:3, Informative)
Again, on the off chance that this helps anyone reading this pitifully long and silly thread: it is trivial to make getopt work on Win32, just like it was trivial to make strsep work on Linux when it only had strtok. I object to the argument that "portability" has anything whatsoever to do with whether you'd use getopt to parse arguments.
Like most of the other comments on this post, I find the idea of using gperf for "high performance argument parsing" superfluous and convoluted. In fact, I find the idea o
Re:It is if the linker complains about not finding (Score:3, Insightful)
Even when reinventing the wheel, it is important to reinvent as little as possible. If you need functionality that isn't there, at least keep the same interface.
And the standard says... (Score:5, Insightful)
Anyone writing or maintaining command line programs knows that they
should be using the API getopt() or getopt_long().
There are standards on how command line options and arguments are to be
processed. They should be followed for portability and code maintenance.
I agree... (Score:3, Insightful)
Actually, I've never really come across a case where I knew ahead of time the whole universe of strings I would be accepting, and so never ended up using it - gperf is a great idea, but this seems to be a case of someone really looking hard to figure out where they could shoehorn gperf into just for the sake of using it.
Wrong in so many ways (Score:5, Insightful)
Gperf might be reasonable as a perfect hash generator for those incredibly rare situations when the extra work due to a hash collision is really the one thing standing between you and acceptable performance of your application.
I thought maybe we were seeing a bad writeup, but no, it's the authors themselves who talk about the need for high-performance command-line processing, and give the performance of processing N arguments as O(N)*[N*O(1)]. I cannot conceive of a situation in which command-line processing is a bottleneck. And their use of O() notation is wrong (they are claiming O(N**2) -- which they really don't want to do, not least because it's wrong). O() notation shows how performance grows with input size. Unless they are worrying about thousands or millions of command-line arguments, O() notation in this context is just ludicrous.
I don't know why I'm going on at such length -- the extreme dumbness of this article just set me off.
Re: (Score:3, Interesting)
I challenge: cite as an example any fixed set of strings (such as would be applicable for perfect hashing) for which a realistic perfect hashing scheme of any sort outperforms a statically-sized conventional chaining table using a trivial 33/37-style [google.com] string hash. I don't think you can. Gperf languishes in obscurity for a reason.
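For anyone who hasn't seen one, a statically-sized chaining table with a multiply-by-33 hash is about this much code. This is only a sketch: the bucket count, the 5381 seed (the well-known djb2 variant), and every name below are choices made for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define NBUCKETS 64   /* fixed table size, picked arbitrarily for the sketch */

struct entry {
    const char *key;
    int value;
    struct entry *next;   /* collision chain */
};

struct table {
    struct entry *bucket[NBUCKETS];
};

/* Multiply-by-33 string hash (the "times 33" family). */
static unsigned long hash33(const char *s)
{
    unsigned long h = 5381;
    while (*s)
        h = h * 33 + (unsigned char)*s++;
    return h;
}

/* Link a caller-owned entry into the table; nothing is allocated here. */
void table_insert(struct table *t, struct entry *e)
{
    unsigned long i;

    assert(t != NULL && e != NULL && e->key != NULL);
    i = hash33(e->key) % NBUCKETS;
    e->next = t->bucket[i];
    t->bucket[i] = e;
}

/* Return the entry for key, or NULL if it is absent. */
struct entry *table_lookup(const struct table *t, const char *key)
{
    struct entry *e = t->bucket[hash33(key) % NBUCKETS];

    while (e != NULL && strcmp(e->key, key) != 0)
        e = e->next;
    return e;
}
```

A lookup costs one hash, one modulo, and usually a single strcmp; whether a gperf-generated function beats that in practice would have to be measured.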
Historically? (Score:4, Insightful)
This is like saying that walking is historically one of the most ignored areas in human transportation.
is this a joke? (Score:3, Insightful)
Re: (Score:3, Insightful)
Well, what do you expect from IBM? It's just another one of their look-Ethel-it's-open-source-and-look-at-us-helping-the-community content-free PR fluff pieces. Ignore them and they'll crawl back into their mainframe cave.
Is this a fucking joke? (Score:3, Funny)
Re: (Score:3, Insightful)
Another approach - parseargs (Score:3, Interesting)
The following two directories should bring it up to the latest version I know of.
This is not efficient, mind you. Command line parsing doesn't generally need to be efficient, even by my miserly standards, honed when a PDP-11 was something you hoped to upgrade to... some day...
ftp://ftp.uu.net/usenet/comp.sources.misc/volume2
ftp://ftp.uu.net/usenet/comp.sources.misc/volume3
http://www.cmcrossroads.com/bradapp/ftp/src/libs/
http://www.cmcrossroads.com/bradapp/ftp/src/libs/
This tool is much easier (Score:3, Interesting)
http://www.ibiblio.org/pub/Linux/devel/sugerget-1
With this code, you simply specify command-line strings and variables in a printf()
style format.
E.g. supergetopt( argc, argv,
"string1", "%d %d", function1,
"string2", "%s", function2 )
will call function1( int a, int b ) when string1 is on the command line,
and will call function2( char *s ) when string2 is used on the command line.
A whole lot easier than gperf, IMHO.
Re:C++ I get (Score:4, Insightful)
Re:C++ I get (Score:5, Funny)
Re: (Score:3, Insightful)
The trick is to identify the best tool for the job.
Re:C++ I get (Score:5, Interesting)
You, whenever you compile C++ code, as it is compiled to C before machine code (unless you are using an exotic compiler such as the Compaq AXP C++ compiler for TRU64).
Excuse me???? That was not even true anymore when I started using C++, back in 1992. There are features in the C++ standard that are so extremely difficult to implement correctly in standard-compliant C that it's a complete waste of effort trying to pass via C while compiling. Exception handling comes to mind as the prime example. A failed attempt to support exceptions was the reason why Cfront 4.0 was abandoned. Note that 3.0 was released as early as 1991. The last Cfront-based compiler I had the horror of using was HP's CC. It was superseded by the new native aCC by 1994 at the latest.
By the way, I used to write C/C++ compilation/optimisation stuff for a living, so I guess I know something about the topic.... :-)
Re:C++ I get (Score:4, Informative)
Of course C++ exceptions are what I meant. What else would I mean when using the word "exceptions" in this context?
And yes, C++ exceptions can be expressed in C. After all, C is a glorified assembler and the resulting code from C++ translation is assembler as well. It all depends on the level of abstraction at which the C code is written and on the amount of ugliness/inefficiency you're willing to take on board (and also the trade-off between the two). But that's not the point. The point of this thread is that nowadays it makes no sense to make use of this capability in a C++ compiler. Especially not when considering that a user of a C++ compiler wants more than just a compiler. He also wants a debugger that is able to meaningfully link up the binary and the original C++ source. If you're a C++ compiler vendor, using C as an IL does nothing but complicate your own life. Twice.
Re:C++ I get (Score:4, Informative)
The main problem (but not the only one) is called "object destructors". You have to make sure they are called. All of them, and in the correct order, at all the nested scopes of execution you are in when the exception occurs. And you need to make sure not to call them on any object not yet constructed (always remember that constructors can throw exceptions too) and never to call a destructor twice (I've seen this kind of bug multiple times in multiple compilers). And then there is the fun of exceptions thrown by destructors, not to mention the possibility that it all happens in the middle of constructing or destructing an array of objects.
All that is why setjmp()/longjmp(), also known as C's non-local goto, don't cut it, which in turn means that you need to complicate function return mechanisms. And just when you think you got that problem sorted out, you need to be aware that C++ functions can call (library) C functions that were never compiled to even know about exceptions but that in turn can call C++ functions that may again throw an exception. The entire construction needs to be able to handle this.
As I wrote in another post [slashdot.org] in this thread, it can be done. But it is not easy. Note that the entire object destructor issue also applies within a single scope, which is why life is not as easy as replacing every "throw" statement by "goto end;".
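To make the setjmp()/longjmp() point concrete, here is a small C sketch (all names invented for the demo) in which a counter stands in for a destructor. The jump lands back at setjmp() and the cleanup in the intermediate scope is simply never executed:

```c
#include <setjmp.h>

static jmp_buf handler;        /* where the "catch" lives */
static int cleanups_run = 0;   /* counts "destructors" that actually ran */

/* Stands in for a function that throws. */
static void fail(void)
{
    longjmp(handler, 1);       /* jumps straight back to setjmp() */
}

/* Stands in for a scope that owns an object needing destruction. */
static void scope_with_resource(void)
{
    /* ... a resource would be acquired here ... */
    fail();
    cleanups_run++;            /* the "destructor": never reached */
}

/* Returns 1 if the non-local jump was taken, 0 otherwise. */
int run_demo(void)
{
    cleanups_run = 0;
    if (setjmp(handler) == 0) {
        scope_with_resource();
        return 0;              /* normal return path, not taken here */
    }
    return 1;                  /* reached via longjmp() */
}

/* Lets a caller check how many cleanups ran during run_demo(). */
int demo_cleanups_run(void)
{
    return cleanups_run;
}
```

After run_demo() the counter is still zero, so a translator targeting C would have to generate its own bookkeeping to run the skipped destructors, which is the extra machinery described above.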
Re:C++ I get (Score:5, Informative)
You are wrong about 3):
Source: http://archive.gamespy.com/e32002/pc/carmack/ [gamespy.com]
And 4) as well:
Source: http://gcc.gnu.org/onlinedocs/gcc-4.2.1/gcc/G_002b_002b-and-GCC.html [gnu.org]
Re:Joke? (Score:5, Insightful)