Follow Slashdot stories on Twitter

 



Forgot your password?
typodupeerror
×
Programming IT Technology

Protothreads and Other Wicked C Tricks 229

lwb writes "For those of you interested in interesting hard-core C programming tricks: Adam Dunkels' protothreads library implements an unusually lightweight type of threads. Protothreads are not real threads, but rather something in between an event-driven state machine and regular threads. But they are implemented in 100% portable ANSI C and with an interesting but quite unintuitive use of the switch/case construct. The same trick has previously been used by Simon Tatham to implement coroutines in C. The trick was originally invented by Tom Duff and dubbed Duff's device. You either love it or you hate it!"
This discussion has been archived. No new comments can be posted.

Protothreads and Other Wicked C Tricks

Comments Filter:
  • Looks pretty cool (Score:5, Interesting)

    by bobalu ( 1921 ) on Thursday October 06, 2005 @08:29PM (#13735824)
    I used a Lifeboat lib back in the late 80's that this reminds me of. Cooperative multitasking. Eventually ported the whole thing to OS/2 and used that threading instead. All the code pretyy much worked as-is.
  • by mc6809e ( 214243 ) on Thursday October 06, 2005 @08:37PM (#13735872)

    Duff's device is a way of forcing C to do a form of loop unrolling. It has nothing to do with coroutines.

  • by Anonymous Coward on Thursday October 06, 2005 @08:57PM (#13735957)
    Protothreads are quite limited in what they can do - they are just a way of expressing a state machine using co-routines.

    Real threads are much better for things like interpreters. JIT compilation with native threads is even better.
  • Loop Abuse (Score:5, Interesting)

    by wildsurf ( 535389 ) on Thursday October 06, 2005 @09:11PM (#13736028) Homepage
    Reminded me of a function I once wrote...

    The PPC architecture has a special-purpose count register with specialized branch instructions relating to it; e.g., the assembly mnemonic 'bdnz' means "decrement the count register by one, and branch if it has not reached zero." I've used this in some pretty weird loops, including this one that broke the Codewarrior 9.3 compiler (fixed in 9.4.) This computes the location of the n'th trailing one in a 32-bit integer. Pardon my weak attempt at formatting this in HTML:

    static uint32 nth_trailing_one(register uint32 p, register uint32 n) {
    register uint32 pd;
    asm {
    mtctr n; bdz end
    top: subi pd, p, 1; and p, p, pd; bdz end
    subi pd, p, 1; and p, p, pd; bdz end
    subi pd, p, 1; and p, p, pd; bdz end
    subi pd, p, 1; and p, p, pd; bdz end
    subi pd, p, 1; and p, p, pd; bdz end
    subi pd, p, 1; and p, p, pd; bdz end
    subi pd, p, 1; and p, p, pd; bdz end
    subi pd, p, 1; and p, p, pd; bdnz top
    end: }

    return __cntlzw(p ^ (p - 1));
    }

    The idea was that the instruction stream should stay as linear as possible; most of the time the branches are not taken, and execution falls through to the next line of code. Ironically (siliconically?), the entire function could probably be implemented in a single cycle in silicon; shoehorning bitwise functions like this into standard instructions tends to be extremely wasteful. Perhaps FPGA's will make an end run around this at some point. I've also tried this function with a dynamically-calculated jump at the beginning, similar to the case statement logic in the article.

    Hmm, I had a point I was trying to make with this post, but now it's escaped my mind... :-)
  • by achurch ( 201270 ) on Thursday October 06, 2005 @09:28PM (#13736099) Homepage

    I got to this little gem:

    The advantage of this approach is that blocking is explicit: the programmer knows exactly which functions that block that which functions the never blocks.

    My English parser thread shut down at that point . . .

    Seriously, this looks like a handy little thing for low-memory systems, though I'd be a bit hesitant about pushing at the C standard like that--the last thing you need is a little compiler bug eating your program because the compiler writers never thought you'd do crazy things to switch blocks like that.

  • Stackless Python (Score:2, Interesting)

    by alucinor ( 849600 ) on Thursday October 06, 2005 @09:36PM (#13736139) Journal
    Is this similar to Stackless Python [sjsoft.com] and green threads? I spend most of my time with interpreted languages, so my C is very lacking.
  • Python (Score:5, Interesting)

    by meowsqueak ( 599208 ) on Thursday October 06, 2005 @09:42PM (#13736166)
    Weightless threads in Python:

    http://www-128.ibm.com/developerworks/library/l-py thrd.html [ibm.com]

    They are cooperative but far more efficient than Python's own threading model. You can easily create hundreds of thousands of concurrent threads.
  • by abb3w ( 696381 ) on Thursday October 06, 2005 @09:57PM (#13736240) Journal
    From the summary: Protothreads are not real threads, but rather something in between an event-driven state machine and regular threads.

    From the above Duff on Duff's Device: I have another revolting way to use switches to implement interrupt driven state machines but it's too horrid to go into.

    Perhaps this is the Duff's Device equivalent of a proof of Fermat's Last Theorem? Or is my ignorance of the history of Evil Computing showing?

  • by Dr. Manhattan ( 29720 ) <(moc.liamg) (ta) (171rorecros)> on Thursday October 06, 2005 @10:08PM (#13736295) Homepage
    Get the book Obfiscated C and Other Mysteries [amazon.com] by Don Libes. Explanations of various Obfuscated C contest entries, and alternate chapters illustrate neat corners of C, including a few things similar to this little library. Occupies a place of honor on my shelf.
  • Dijkstra says... (Score:2, Interesting)

    by Jeff85 ( 710722 ) on Thursday October 06, 2005 @10:12PM (#13736315) Homepage
    "The competent programmer is fully aware of the limited size of his own skull. He therefore approaches his task with full humility, and avoids clever tricks like the plague." -Edsgar Dijkstra
  • Re:Loop Abuse (Score:3, Interesting)

    by addaon ( 41825 ) <addaon+slashdot.gmail@com> on Thursday October 06, 2005 @10:43PM (#13736458)
    Unless you know that n is usually large, wouldn't this be more efficiently implemented with cntlzw?

    Adam
  • by ihavnoid ( 749312 ) on Thursday October 06, 2005 @11:01PM (#13736533)
    As the prothread homepage says, it's for extremely small embedded systems, where there are no operating systems, with tiny amount of memory (You can't use DRAMs on systems that cost something less than $1). Want to use threads on those kind of systems, you have no choice.

    Another advantage is its portability. Small embedded systems, whether they have operating systems or not, usually can't support some fully-blown threading standard. Those operating systems seem to implement some kind of 'specially tuned' thread APIs.

    Using these kind of threads on a full-blown PC (or servers) would have almost no benefit. However, in the embedded software engineer's perspective, it's great to see a ultra-lightweight thread library without any platform-dependent code.
  • by raytracer ( 51035 ) on Thursday October 06, 2005 @11:25PM (#13736648)
    Tom clarified this on my blog. [brainwagon.org]
    Yeah. I knew this. In my piece about The Device, I mention something about a similar trick for interrupt-driven state machines that is too horrible to go into. This is what I was referring to. I never thought it was an adequate general-purpose coroutine implementation because it's not easy to have multiple simultaneous activations of a coroutine and it's not possible using this method to have coroutines give up control anywhere but in their top-level routine. A simple assembly-language stack-switching library lets you do both of those.
    Still, a pretty cute thing.
  • Re:Loop Abuse (Score:4, Interesting)

    by wildsurf ( 535389 ) on Thursday October 06, 2005 @11:53PM (#13736809) Homepage
    Unless you know that n is usually large, wouldn't this be more efficiently implemented with cntlzw?

    It would be if I were looking for the n'th leading one, but this code is looking for the n'th trailing one. (e.g. for 0b0010011001011100, the 3rd trailing one is in the fifth-lowest bit.) The equivalent code sequence for leading ones is in fact more complicated, requiring three arithmetic instructions and a branch per iteration. (cntlzw, shift, xor, branch).

    I actually use this code as part of an algorithm where I have a very large (e.g. 65k-element) packed single-bit histogram array, and need to find the position of (say) the 1000th set bit. Vector instructions can do a coarse population-count from one end fairly efficiently, but once it's narrowed down to a 32-bit region, it comes down to slicing and dicing. My code operates by clearing the rightmost set bit in each iteration (x & (x - 1)), then at the end, isolating the needed bit (x ^ (x - 1)) and using cntlzw to find its position. To clear the leftmost set bit, you need three instructions: first get its position with cntlzw, then shift 0x80000000 right by that number of bits, and finally XOR to clear the bit. (If there's a shorter sequence, I haven't found it.)

    (oh, and for the troll responder-- you are quite spectacularly wrong. But thanks for the giggle.)
  • by TomRitchford ( 177931 ) on Thursday October 06, 2005 @11:57PM (#13736828) Homepage

    The absence of local variables is because the stack is not preserved between invocations, and multiple invocations give the appearance of scheduled threads. There is nothing here that the compiler can break with its optimizer, unless it is breaking valid C code: the control flow expressed using the macros is completely legit C control flow. It's worth running the examples through just the C pre-processor in order to understand what gets built.


    I downloaded it. But the version that is there does not, in fact, compile: there's an obvious typo at example-buffer.c, line 137:
    PT_WAIT_THREAD(pt, producer(&pt_producer) &
                      consumer(p&t_consumer));
    // should be consumer(&pt_consumer));
    That aside, I believe your claim is wrong now that I've read the code.
    int x = 23;
    // Some blocking code here.
    // A case statement leads us to here.
    if ( x != 23 ) {
    // Now x could be anything.
    }
    right? The reason is that you are simply case-statementing into the code and bypassing the whole subroutine calling mechanism entirely -- so the variable initialization just isn't called.

    Now consider the following code:
    double x = sin(y*(2*pi)/360)*cos(y*(2*pi)/360);
    // Some blocking code here.
    // A case statement gets us to here.
    double z = tan(y*(2*pi)/360;
    Suppose the compiler decided to extract out the common y*(2*pi)/360 term as an optimization. The assignment to that common term will not occur when you jump into the middle of that routine -- so your code will not work as it appears.
  • wtf? (Score:3, Interesting)

    by DeafByBeheading ( 881815 ) on Friday October 07, 2005 @12:48AM (#13737096) Journal
    Okay, I'll play the n00b. I understand most of this, but my coding background is not that great, and mostly in C++, Java, and PHP, and I'm having problems with the switch from Duff's Device...


      switch (count % 8)
      {
      case 0: do { *to = *from++;
      case 7: *to = *from++;
      case 6: *to = *from++;
      case 5: *to = *from++;
      case 4: *to = *from++;
      case 3: *to = *from++;
      case 2: *to = *from++;
      case 1: *to = *from++;
            } while (--n > 0);
      }


    What the hell is up with that do { applying only in case zero? It's in several places on the net just like that and Visual Studio compiles this just fine, so it's not an error. I checked K&R, and they don't even hint at what could be going on there... I'm lost. Help?
  • by skaller ( 265521 ) on Friday October 07, 2005 @01:32AM (#13737265)
    FYI this technique is heavily exploited in the programming language Felix:

    http://felix.sf.net/ [sf.net]

    to provide user space threading. The main difference is that all the 'C tricks' are generated automatically by the language translator. If you're using gcc then the switch is replaced by a computed jump (a gcc language extension). On my AMD64/2800 time for creating 500,000 threads and sending each a message is 2 seconds, most of the time probably being consumed by calls to malloc, so the real thread creation and context switch rate is probably greater than Meg/sec order .. just a tad faster than Linux. Both MLton and Haskell also support this style of threading with high thread counts and switch rates (although the underlying technology is different).
  • by GlassHeart ( 579618 ) on Friday October 07, 2005 @02:34AM (#13737469) Journal
    There isn't a program that can't be written as a state machine; but most programs expressed this way are difficult to understand and maintain.

    My experiences contradict your statement. State machines are both easy to implement, and easy to debug, if you do it the right way. I have seen many entirely wrong implementations, including one where you can go from any of about two dozen "states" to any other. I have seen some that just switch states when they feel like it, or switch states based on complex decisions, which makes debugging difficult. Put another way, you can make a "state machine" degenerate into something else, and nullify its benefits, if you refuse to follow the rules.

    A well-implemented state machine has an important characteristic: it is clear to see why you are where you are. This means that state transitions are checked (against unexpected events) and traced, so debugging the machine is literally a matter of reading a log that looks like this:

    in state 0, received event A so went to state 2
    in state 2, received event B so went to state 1
    in state 1, received event C so went to state 5
    in state 5, ignoring event A
    in state 5, received unexpected event C

    and so on. In this particular example, the question to answer is why we're not handling event C properly in state 5, or why we went to state 5 in the first place. Either should be pretty obvious when you consult the original design. The fix is likewise obvious. Figuring out state machines, in my experience, has always been easier than figuring out multi-threaded code.

    This isn't to say that all programs should be implemented as a state machine. Simple Unix-style pipe programs, for instance, are generally unsuitable. If you don't know how to design a state machine properly, it's also going to be unsuitable.

  • Re:wtf? (Score:2, Interesting)

    by Jeremy Singer ( 717636 ) on Friday October 07, 2005 @08:05AM (#13738326)
    I think the other replies to this are good, but beg the question of why to do this. Tricks like this might save, at most, microseconds of execution time, but they will inevitably waste hours of human time, even when they haven't broken. The right attitude of a programmer when using such a trick in a program is to consider such tricks as a kind of lethal bomb just waiting to blow up at an inconvenient time. Even if you are a brilliant programmer and can guarantee that nobody except yourself will ever look at or touch this code again, there will come a time when the intermediate value theorem of programming will get you. The theorem goes like this: No matter how brilliant you are at time n, there is always a time n + m such that you are at a different level of brilliance. Murphy's corollary says that this level of brilliance will be less than the level at point n at least half the time, and that time n will be just before its time to go on vacation, go home to your significant other, etc. I know, Dijkstra said it better.
  • by Anonymous Coward on Friday October 07, 2005 @02:51PM (#13741621)
    This is bad, lame, faux cooperative threads.

            Local variables are not preserved.

            A protothread runs within a single C function and cannot span over other functions. A protothread may call normal C functions, but cannot block inside a called function.


    Well,
    I think it it is a simple idea but very well implemented and especially useful for microcontroller programs. Mr. Dunkels other project is uIP, which I do find very useful in this area, too.
    I.e. this is neat if you don't have the resources at hand to run a even a small multitasking kernel! Maybe 2kB flash for program storage or similar.

    As one of my hobby projects, I'm implementing a software DCC encoder (a protocol used for digital model railroads) that has to talk concurrently to the PC and to another bus with various connected gadgets. It has to do buffering and status displays and whatnot. Going the traditional way of
    a chain of 'if's is hard and an explicit state-machine non-obvious. Protothreads work, are easy and small (few bytes).

    Cheers,

    Onno

I've noticed several design suggestions in your code.

Working...