Become a fan of Slashdot on Facebook

 



Forgot your password?
typodupeerror
Programming IT Technology

Protothreads and Other Wicked C Tricks 229

lwb writes "For those of you interested in interesting hard-core C programming tricks: Adam Dunkels' protothreads library implements an unusually lightweight type of threads. Protothreads are not real threads, but rather something in between an event-driven state machine and regular threads. But they are implemented in 100% portable ANSI C and with an interesting but quite unintuitive use of the switch/case construct. The same trick has previously been used by Simon Tatham to implement coroutines in C. The trick was originally invented by Tom Duff and dubbed Duff's device. You either love it or you hate it!"
This discussion has been archived. No new comments can be posted.

Protothreads and Other Wicked C Tricks

Comments Filter:
  • From the source: (Score:5, Informative)

    by MythMoth ( 73648 ) on Thursday October 06, 2005 @08:33PM (#13735847) Homepage
    Duff on Duff's Device:
    http://www.lysator.liu.se/c/duffs-device.html [lysator.liu.se]
  • by dmoen ( 88623 ) on Thursday October 06, 2005 @08:38PM (#13735876) Homepage
    This looks very similar to the implementation technique used for the Squeak programming language (not the Smalltalk Squeak). Squeak is a preprocessor for C that makes it very easy to use this technique.

    http://citeseer.ist.psu.edu/cardelli85squeak.html [psu.edu]

    Doug Moen
  • by bani ( 467531 ) on Thursday October 06, 2005 @08:42PM (#13735886)
    While it is probably the first time the copy technique had been expressed in C, it's certainly not the first time the actual technique had been expressed in code.

    I recall seeing the same trick implemented in assembler somewhat earlier, I think the technique was called towers?
  • by LLuthor ( 909583 ) <lexington.luthor@gmail.com> on Thursday October 06, 2005 @08:48PM (#13735911)
    Duff's device was the first convoluted form of a switch() statement which became well known.
    All these C "tricks" employ the same technique (though more elegantly) for different goals. Nonetheless, Duff's device can be said to have inspired such code.
  • Not new (Score:4, Informative)

    by Anonymous Coward on Thursday October 06, 2005 @08:51PM (#13735928)
    SGI had state threads library since long http://oss.sgi.com/state-threads [sgi.com]
  • by skids ( 119237 ) on Thursday October 06, 2005 @09:08PM (#13736017) Homepage
    ...not bound to any particular OS.

    If that's what folks are looking for, another option is the tasks added to LibGG a while back. Tradeoffs either way -- LibGG's requires at least C signals (but will use pthreads or windows threads if detected during compile time), whereas this can be used in OS-less firmware. But on the positive side you can use switch() in LibGG tasks -- what
    you can't use are a lot of non-MT-safe system calls. It's an OK abstraction but of course there are so very many ways to accidentally ruin portability that it is far from foolproof.

    http://www.ggi-project.org/documentation/libgg/1.0 .x/ggAddTask.3.html [ggi-project.org]
  • by Anonymous Coward on Thursday October 06, 2005 @09:34PM (#13736123)
    He couldn't have invented it in 1985 since Duff sent his original mail in 1983:

    From research!ucbvax!dagobah!td Sun Nov 13 07:35:46 1983
    Received: by ucbvax.ARPA (4.16/4.13) id AA18997; Sun, 13 Nov 83 07:35:46 pst
    Received: by dagobah.LFL (4.6/4.6b) id AA01034; Thu, 10 Nov 83 17:57:56 PST
    Date: Thu, 10 Nov 83 17:57:56 PST
    From: ucbvax!dagobah!td (Tom Duff)
    Message-Id:
    To: ucbvax!decvax!hcr!rrg, ucbvax!ihnp4!hcr!rrg, ucbvax!research!dmr, ucbvax!research!rob
  • by nothings ( 597917 ) on Thursday October 06, 2005 @09:43PM (#13736169) Homepage
    Please note that this isn't interesting unless you work in, as, the FA says, a severely memory constrained system. No normal embedded system needs to do this, much less the systems most programmers on Slashdot probably work with.

    This is bad, lame, faux cooperative threads.

    Local variables are not preserved.

    A protothread runs within a single C function and cannot span over other functions. A protothread may call normal C functions, but cannot block inside a called function.

    It's also not even particlarly new [mine-control.com] [1998].

    Unless memory is at an absolute premium, just use cooperative threading instead. If you try to use prototheads, you'll quickly discover how unlike "real" programming it is. Even just a 4K stack in your cooperative threads will get you way more than protothreads does.

  • by xant ( 99438 ) on Thursday October 06, 2005 @10:06PM (#13736287) Homepage
    Actually, I don't know if this is exactly the same feature, but it sounds like it can be used that way. Check out the What's new in Python [python.org] entry. This is currently implemented in Python CVS, to be available in Python 2.5.

    The actual Python Enhancement Proposal [python.org] gives more detail and several badass use-cases.
  • by shalunov ( 149369 ) on Thursday October 06, 2005 @10:17PM (#13736336) Homepage
    It most certainly is the Duff's device, or at least is very close to it. Duff's device is, indeed, a way to unroll loops; specifically, a way to unroll loops that uses a peculiarity in switch statement syntax that allows case to point inside a loop body. Now, take a look at lc-switch.h in the Protothreads tarball. It contains macros that use the same peculiarity to jump inside functions instead of loops.
  • by plalonde2 ( 527372 ) on Thursday October 06, 2005 @10:29PM (#13736397)
    The challenge is making the design maintainable. There isn't a program that can't be written as a state machine; but most programs expressed this way are difficult to understand and maintain.

    The argument that Rob Pike makes in A Concurrent Window System [swtch.com] and with Luca Cardelli in Squeak: a Language for Communicating with Mice [microsoft.com] is that many of the event systems and associated state machines that we write can be much simplified by treating input multiplexing, and thus coroutine-like structures, as language primitives.

    This work follows directly from Hoare's Communicating Sequential Processes - a good summary can be found here [swtch.com]. Working with CSP only a little has convinced me of how much easier so many systems tasks are in this framework than in the world of the massive state-system/event loop world.

  • by plalonde2 ( 527372 ) on Thursday October 06, 2005 @10:39PM (#13736442)
    The absence of local variables is because the stack is not preserved between invocations, and multiple invocations give the appearance of scheduled threads. There is nothing here that the compiler can break with its optimizer, unless it is breaking valid C code: the control flow expressed using the macros is completely legit C control flow. It's worth running the examples through just the C pre-processor in order to understand what gets built.

    It's ugly as sin, but your compiler had better get it right, or else it's not a C compiler.

  • by betterthancats ( 660632 ) on Thursday October 06, 2005 @11:12PM (#13736582) Homepage
    Well, there is always erlang [erlang.org].

    I'm kind of surprised it hasn't been mentioned yet.
  • by dmadole ( 528015 ) on Thursday October 06, 2005 @11:40PM (#13736741)

    It's too clever to be really useful unfortunately. The big issue is of course the no "local variables". Trouble is, if you are writing in C, the compiler may well be creating local variables for you behind your back. In C++ for example there are many cases where this will certainly happen, like

    void DoSomething(const string&);
    DoSomething("hollow, whirled");

    where a local variable of type string will be temporarily created to pass to routine DoSomething.

    You need to read the article.

    It only says you can't use local variables across functions that block. Actually, it doesn't even say that you can't use them, it only says don't expect their value to be preserved.

    In your example, even if the compiler does create a local variable to call DoSomething, and even if DoSomething does block, who cares if the value of that local variable is preserved, since it's impossible to reference it again after that statement?

    But that was an awfully long time ago. Now it's hard to find memory chips below 1Mbit.

    I can help you with this problem! Is 16 bytes small enough [microchip.com]?

    And since you can't use local variables, you can't use things like the C libraries or pretty well any library ever written, which is teh sux0r.

    But you can use the C libraries. Just don't use local variables across functions that block. Only a very few C library functions block.

  • by S. Traaken ( 28509 ) on Friday October 07, 2005 @12:19AM (#13736962)
    Suppose the compiler decided to extract out the common y*(2*pi)/360 term as an optimization.

    But the compiler won't decide to do this because it won't be able to establish that y (or pi) can not be changed between instances of this code.
  • by plalonde2 ( 527372 ) on Friday October 07, 2005 @12:24AM (#13736982)
    But that's precisely the reason you musn't use locals: they are not preserved. The switch in the "thread" setup sets up a new scope in which you might declare locals, but that scope need not be entered cleanly, leading to (in the case of your example) an uninitialized local variable.

    In your second example, the compiler *cannot* remove the sub-expression because the case statement that gets you there crosses a basic block boundary; the return statement from the blocking code, and the jump in through the switch are caught as valid C constructs that prevent the common sub-expression optimization.

    The macros as given give you a semblance of variable continuity and scoping, but the compiler, after the macro substitutions, just sees a return on the end of the blocking section, and a switch into the code following, something like:

    char YourFunc(char foo) {
    switch (foo) {
    case 'A': /*your blocking code*/
    return 'B';
    case 'B': /* the stuff after your blocking code */
    break;
    }
    }
    The ugliness in the macros is about letting other control constructs live around your switch in, but still generates legit code, but, for example, you can't use a local varible as an index to your for loop. Yuck.
  • Re:wtf? (Score:5, Informative)

    by ggvaidya ( 747058 ) on Friday October 07, 2005 @02:09AM (#13737402) Homepage Journal
    Okay, I'll try and see if I can figure this thing out (you have to admit, it screws with your mind just looking at it ...):

    You can implement a simple memcpy function like this:
    void copy(char *from, char *to, int count) {
      do {
          *from++ = *to++;
          count--;
      } while(count > 0);
    }
    So far, so good. Now Duff's problem was that this was too slow for his needs. He wanted to do loop unrolling [wikipedia.org], where each iteration in the loop does more operations, so that the entire loop has to iterate less. This means the 'is count > 0? if so, go back, otherwise go on' part of the loop has to execute fewer times.

    Now, the obvious problem with this is that you don't know how much you can unwind this particular loop. If it has 2 elements, you can't unwind it to three elements, for instance.

    This is where Duff's Device turns up:
    int n = (count + 7) / 8; /* count > 0 assumed */
     
      switch (count % 8)
      {
      case 0: do { *to = *from++;
      case 7: *to++ = *from++;
      case 6: *to++ = *from++;
      case 5: *to++ = *from++;
      case 4: *to++ = *from++;
      case 3: *to++ = *from++;
      case 2: *to++ = *from++;
      case 1: *to++ = *from++;
            } while (--n > 0);
      }
    First, we check to see how much we can unroll the loop - for instance, if count is perfectly divisible by 5, but not 6, 7, or 8, in which case we can safely have 5 copies inside our loop without worry that the copy is going to move past the end of the array. Then - and here's the magic trick - we use switch to jump into a do loop. It's a perfectly ordinary do loop; the trick is entirely in the fact that if count==6, for instance, then C considers the do-loop to begin at 'case 6:', causing 6 copies of '*to++ = *from++' to be executed before the 'while' returns the loop position to the 'case 6:' point which is where, as far as C is concerned, the do-loop began.

    Thus, the loop is unwound to a level that it can handle.

    I think.

    Feel free to correct/amplify/mock. :)

    cheers,
    Gaurav
  • by Punto ( 100573 ) <puntob.gmail@com> on Friday October 07, 2005 @03:31AM (#13737633) Homepage
    The idea of the Duff's device is to use switch to jump to a label depending on the value of a variable (in a portable way). You could use 'goto' with a bunch of conditions, but using switch will make cleaner code (as far as jumping around to arbitrary places in your code goes), and the compiler will probably make a jump table in most cases, which is faster.

    That's what Duff 'discovered', and it's the trick they're using here.

  • by thogard ( 43403 ) on Friday October 07, 2005 @04:02AM (#13737715) Homepage
    Ada didn't exist before 1977 and C was taking shape in 1972 and in use in 73. Ada's specs were 1st published in June of 79 and the C Programming Language book was published in 78.
  • Re:wtf? (Score:3, Informative)

    by Brane ( 210649 ) on Friday October 07, 2005 @04:35AM (#13737791)
    [...]the 'while' returns the loop position to the 'case 6:' point which is where, as far as C is concerned, the do-loop began.

    No, it returns to the 'case 0:' point where the 'do {' is. (Otherwise the loop wouldn't be executed count times, and somehow I think this Duff guy would have thought of that...)
  • Re:wtf? (Score:5, Informative)

    by ChadN ( 21033 ) on Friday October 07, 2005 @04:53AM (#13737831)
    I disagree with your assessment, although you were on the right track; The loop doesn't return back to the case label where the loop was entered, it always jumps back to the 'do' statement (synonomous with the case 0:).

    The way you describe it is that the loop is unrolled to a size that is safely divisible into the 'count' value, which is an interesting idea, but would not be as efficient (large prime number counts would not get unrolled, for example, and a more complex computed got would be required at the loop end).

    My take is this: with loop unrolling, one always has to take care of the 'remainder'. In the above example, the loop is unrolled to be a fixed size (8 repeated copy instructions, instead of one), and any count not divisible by 8 has to handle the remainder of the count after dividing by 8. Conceptually, you could imagine handling this remainder with a separate case section after the unrolled loop. In Duff's device, the remainder is actually dealt with first, by intially jumping into the loop somewhere other than the beginning, then letting the fully unrolled loop finish up.

    In answer to the previous poster's question, the 'do' could (probably) be put on it's own line, before case 0:, but that wouldn't look nearly as bizarre. :)

    Of course, maybe I'm wrong too. I hope not.
  • by Cwaig ( 152883 ) on Friday October 07, 2005 @05:45AM (#13737960)
    I suspect Adam knows all about FSM's - he was also the original author of the LIWP TCP/IP stack.

    Your point about only working on a particular kind of OS isn't a valid one. Why would it need to be the highest priority native thread?

    I've actually used the Protothread library in implementing the playback code of a PVR - and what it actually provides is explicit scheduling between a set of tasks. For example - playing back an MPEG2 Transport stream requires you to do perform several distinct tasks:
    1) Demultiplex the Transport stream
    2) Feed the MPEG video decoder hardware
    3) Feed the MPEG audio decoder
    ie. 1 producer, 2 consumers.

    You can implement this using normal threads. Or you can cut down on overheads and use protothreads, given that you only have a single instance of the MPEG hardware blocks, and can only play a single TS anyway.

    The system level thread for playback can be thought of as a container for the conceptual Protothreads that schedule cooperatively within the system thread in a producer/consumer type relationship. Kind of like a process/thread separation on a larger OS (the code was running on Nucleus).

    Using protothreads provides a deterministic task swap behaviour that removes the need for any locking primitives on the shared data structures between the producer (in this case the Demux thread) and the consumers (hardware feed threads). You can have a task swap occur based on your own complex conditions (for instance, threshold levels in stream buffers vs time until next frame decode is required), rather than the much more simplistic time slice scheduling or message blocking you'd see in a typical "real" threaded system.

    The priority give to the thread which contains the Protothread scheduled tasks doesn't have to be the highest priority on the system at all. All that priority signifies is how important the actual process of playing the MPEG stream is relative to the other functions going on in the system in parallel - eg. it'd be lower priority than a flash update that was going on in parallel, or any interrupt service threads, or threads that respond to user input. But it'd be more important than the thread that's just doing the nightly scan for new DTT channels in the background.

    I know - I do go on a bit.......
  • by TwistedSquare ( 650445 ) on Friday October 07, 2005 @06:22AM (#13738030) Homepage
    It's worth pointing out that CSP can be implemented by using co-operative multi-tasking that the Protothreads idea is similar to. However at first glance it seems Protothreads are too light-weight to be useful so you must use things like GNU pth, or Microsoft's Fibers. All this can be found in C++CSP [cppcsp.net], as well as other CSP implementations.
  • Re:wtf? (Score:4, Informative)

    by Tarwn ( 458323 ) on Friday October 07, 2005 @06:35AM (#13738072) Homepage
    So in a shorter description, what happens is that:
    1) you determine how many groups of 8 you will need, rounding up to count the remainder block as well (if there is one)
    2) code enters switch statement based on the remainder value, hits the correct case and falls through (note that if there was no remainder we start at the top of the cases and fall through, consuming an entire 8 block)
    3) code hits the while, decrements the number of 8 blocks (as we just finished off the partial "remainder block")
    4) return to do, fall through to finish this 8 group
    5) loop back to 3

    Took me a few minutes of staring at it (and I admit, some tme looking at above descriptions) to get over 4 years of no C in my diet, but now I have to admit that is beautiful.

Waste not, get your budget cut next year.

Working...