Reverse Engineering of a Graphics Format?

Jimbo God of Unix asks: "I recently purchased a color laser (Samsung CLP-500) because it claimed to have Linux compatibility. It does, mostly. However, I was irritated to find that the drivers are proprietary (splc, Samsung Printer Language - Color) and somewhat cranky to get working. I was hoping to find some good resources on reverse engineering the graphics format used to drive the printer. I've managed to mostly dissect the file format, so I think I can get the graphics data out, but I don't really know how to proceed to the next step. Are there any good resources for figuring out how to reverse engineer the graphics format? Are there any tools out there that will help me analyze the format (other than hexdump) or tell me if it's close to something else so I don't have to do as much work?"
"I have something of an advantage since I can compare the output from the Windows driver to the Linux driver, and I was able to dissect the Windows output file from the info gleaned by dissecting the Linux output file. But I'm kind of stuck at the moment and there don't seem to be too many documents or tools out there for dissecting graphics data.

I thought this might be useful for reverse engineering some of the proprietary image compression formats for web cams as well, but that's a project for another day."
  • A few general hints? (Score:5, Informative)

    by Myself ( 57572 ) on Saturday February 12, 2005 @11:06AM (#11651856) Journal
    If there's a chance it's a plain bitmap, simple visualization can reveal lots of patterns not evident in a hex dump.

    I believe the old wardialing tool Toneloc had a mode, or a companion program, to display logs this way. It was easy to see things like "numbers ending in -0100 never get answered" as a vertical red line, for instance.

    The important things would be an adjustable margin, to "wrap" the pixels at varying widths, and adjustable bit depth, so you can discover odd packings that might not otherwise be apparent.

    If the data might be compressed, have a look at the article Hacking Data Compression [fadden.com] for a great, if slightly dated, conceptual overview.

    ERANAI (I am not a reverse engineer), but I hope this helps. Let us know if you have any luck!
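
    A minimal sketch of the kind of viewer described above, assuming the capture is an uncompressed bitmap packed MSB first (nothing here is confirmed for the CLP-500). The function names are made up for illustration; the idea is to vary `width` and `bits_per_pixel` until some structure shows up.

```python
def unpack_pixels(data: bytes, bits_per_pixel: int) -> list[int]:
    """Split bytes into pixel values of 1, 2, 4 or 8 bits, most significant bits first."""
    if bits_per_pixel not in (1, 2, 4, 8):
        raise ValueError("bits_per_pixel must be 1, 2, 4 or 8")
    mask = (1 << bits_per_pixel) - 1
    pixels = []
    for byte in data:
        for shift in range(8 - bits_per_pixel, -1, -bits_per_pixel):
            pixels.append((byte >> shift) & mask)
    return pixels


def wrap_rows(pixels: list[int], width: int) -> list[list[int]]:
    """Wrap the pixel stream at `width` pixels; a trailing partial row is dropped."""
    return [pixels[i:i + width] for i in range(0, len(pixels) - width + 1, width)]


def render_ascii(rows: list[list[int]], bits_per_pixel: int) -> str:
    """Draw each pixel as one character, darker values as denser glyphs."""
    shades = " .:-=+*#%@"
    top = (1 << bits_per_pixel) - 1
    return "\n".join(
        "".join(shades[p * (len(shades) - 1) // top] for p in row) for row in rows
    )
```

    Printing `render_ascii(wrap_rows(unpack_pixels(data, 1), width), 1)` for a handful of candidate widths is usually enough to tell whether a dump is a plain bitmap at all.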

  • and reside in the USA?

    My next step would be to get a good lawyer to find out if what you are doing will open yourself up to potential legal action.
  • by Scarblac ( 122480 ) <slashdot@gerlich.nl> on Saturday February 12, 2005 @11:13AM (#11651913) Homepage

    I can't help you with your question, I have no experience with reverse engineering.

    But for others who don't want to have the same problem: you should have checked www.linuxprinting.org [linuxprinting.org], which says of the Samsung CLP-500:

    Samsung supports this printer with proprietary drivers which come with the printer on its driver CD or can be downloaded on the web sites of Samsung. Unfortunately, these drivers do not work necessarily with all Linux distributions and there are no free drivers available. As it is also not sure whether Samsung will update their drivers for future Linux versions, this printer cannot be recommended.

    I would try to get the proprietary driver to work, basically by getting the distro it was made for, or at least finding out why it works there but not on your distro - probably it needs some specific kernel image that it was compiled with, which would suck...

    • Yes, it is annoying when drivers or dev kits are built for very specific distros. For this I have found a semi-decent solution. Set up a computer with User-mode Linux, VMware, or similar. For each thing you need a specific distro for, install that distro on this box. So for your case, you would make a standard print server with the distro the driver wants. Then all your other computers can use it as a normal, perfectly working print server. Hooray.

      This is a problem if you only have one computer since you pro
  • Lesson Learned (Score:4, Interesting)

    by Solder Fumes ( 797270 ) on Saturday February 12, 2005 @11:13AM (#11651914)
    Never buy anything that claims to work with Linux. Buy things that Linux supports.

    Unless you're just adventurous that way, and want to write drivers.
  • by xenephon ( 572595 ) on Saturday February 12, 2005 @11:19AM (#11651957)
    You should have bought the CLP-550. I avoided the CLP-500 for just this reason (and the fact that I heard bad things about its OS X support, as well). The CLP-550 supports Postscript, and works fine from my Mac, my Linux box, and my Windows box. Another advantage to the CLP-550 over the 500 is that the 550 comes with full toner cartridges; the cartridges which ship with the 500 are only half full.

    I don't understand how companies can sell printers that don't support Postscript. On the other hand, this seems to be a case where a company heard complaints from its customers, and corrected their bad practices (the toner issue, and Postscript support).

    • I just ordered the CLP-550N for exactly the same reason you mentioned. This was after reading the test report in c't magazine, where they gave exactly the same reason for getting the 550 over the 500. The series also comes with a built-in duplex unit, which is another good reason for getting the printer.
    • I don't understand how companies can sell printers that don't support Postscript

      Actually, Brother doesn't support Postscript per se, but does support Brotherscript (which seems to work fine for me).

      I have a Brother 5170DN, which is a wonderful network printer. Just plug it into your network and it works as an independent network client.
      • Brother printers typically work ok with Linux. I just finally retired my HL-630 and replaced it with a HP LaserJet 4M+. The Brother printer served me faithfully for about 5 years. Not bad considering I originally bought it for $15 at a Salvation Army Thrift Shop. I replaced the drum and it worked perfectly. Just this winter the drum failed again and rather than replacing it I upgraded to PostScript with the industrial quality HP printer.
    • I don't understand how companies can sell printers that don't support Postscript

      For the exact reason why companies can sell software modems (Winmodems) rather than the real thing. It is just something to watch out for as a *nix user.

    • by Anonymous Coward
      > I don't understand how companies can sell printers that don't support Postscript

      Because Adobe charges rip-off rates for the right to call it PostScript. We stopped making printers when over half of the cost of our printer was the stupid fees to Adobe.
    • by SSpade ( 549608 ) on Sunday February 13, 2005 @09:08PM (#11664085) Homepage

      I don't understand how companies can sell printers that don't support Postscript.

      Easy. It costs money to develop, license and ship a postscript based printer, and your typical home user doing nothing fancy and running windows doesn't need it - but they're very sensitive to up-front price.

      The real question is why do people buy non-postscript printers when they know that their operating system will work trivially with a postscript printer, but will require a lot of effort to work (often badly) with a non-postscript printer?

      The same line of reasoning explains the "demo" toner cartridges shipped with low-end printers. Your typical home user is very sensitive to up-front price (and probably never looks at the per-page cost). If the population of people buying printers wanted manufacturers to behave this way they just need to, en masse, be less stupid.


    • I don't understand how companies can sell printers that don't support Postscript. On the other hand, this seems to be a case where a company heard complaints from its customers, and corrected their bad practices (the toner issue, and Postscript support).

      Consumer-grade lasers are something of a new market, one that manufacturers are being very careful about wading into. The more expensive business-class laser printers are big money and manufacturers don't want to see their business-class customers downgra
      • I've got a 2300DL, FWIW. The "weird language" is ZjScript.

        BTW, the 2430DL, which claims Linux support, is actually a ZjScript printer. It's just Minolta wrote a binary-only, RedShat 8+, SuSE 8+ compatible driver for it. My guess is that it would work on a 2300DL, unless it's got code to check for it being used with a 2300DL, too.
    • It doesn't take much to write the software that accepts a bitmap stream. Use ghostscript on the computer and you save the expense of Postscript hardware and licensing on the printer. Use the money saved to lower price and sell more printers. The customer can use the money saved for a dinner at a nice restaurant. Everyone wins.
  • by bluGill ( 862 ) on Saturday February 12, 2005 @11:20AM (#11651959)

    If you cannot get specs you really only have one choice: trial, error, and compare. Print a blank page, then print a page with one pixel. Then print with two pixels. Start simple and make things more complex.

    It helps greatly if you buy (or build if you can) some sort of hardware trace tool. I've used this for SCSI devices before; good ones will give you all the data that is transferred to/from the device in question.

    If this was simple everyone would do it. However it is complex, and generally boring. A half-functioning driver is worthless.

    P.S. a better idea would be to return this printer now while you still can. Buy a printer that supports postscript. That hits the bottom line of companies who pull these tricks and in the end is worth more to the Linux community.
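
    The "compare" half of trial, error, and compare can be sketched roughly like this. It assumes the two captures line up byte for byte, which only holds for uncompressed output or unchanged headers; `diff_ranges` is a made-up helper name, not part of any existing tool.

```python
def diff_ranges(a: bytes, b: bytes) -> list[tuple[int, int]]:
    """Return half-open (start, end) offset ranges where two captures differ.

    Only the common prefix length is compared; a length difference is a
    finding in its own right and should be checked separately.
    """
    limit = min(len(a), len(b))
    ranges = []
    start = None
    for i in range(limit):
        if a[i] != b[i]:
            if start is None:
                start = i          # a differing run begins here
        elif start is not None:
            ranges.append((start, i))  # the run ended just before i
            start = None
    if start is not None:
        ranges.append((start, limit))
    return ranges
```

    Running this on the blank-page dump against the one-pixel dump should point at the few offsets that changed, if the format is as simple as hoped.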


    • P.S. a better idea would be to return this printer now while you still can. Buy a printer that supports postscript. That hits the bottom line of companies who pull these tricks and in the end is worth more to the Linux community.

      Except, of course, that it doesn't hit their bottom line because most printers with postscript support are priced higher than the new consumer-level lasers that are starting to come out.
      • So it hits the bottom line harder. They still have to pay money to develop that cheap printer, but they don't get many sales because people buy postscript anyway, which means the postscript printers have a better return on investment.

  • PGM (Score:3, Informative)

    by Knights who say 'INT ( 708612 ) on Saturday February 12, 2005 @11:21AM (#11651960) Journal
    PGM is about the easiest format out there to work with; in its plain variant it's an ASCII file with gray values and a short header (its sibling PPM holds RGB values).

    Useful for those wanting to muck with images directly from code. I learned about that last week, and I'm having fun with neural nets :-D
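
    For illustration, a plain ("P2") PGM can be written with a few lines of Python. This is a sketch: `to_plain_pgm` is a hypothetical helper, and it does not wrap long rows, although the format description recommends keeping lines under 70 characters.

```python
def to_plain_pgm(rows: list[list[int]], maxval: int = 255) -> str:
    """Serialize rows of gray values as a plain (ASCII) PGM image."""
    height = len(rows)
    width = len(rows[0]) if rows else 0
    if any(len(row) != width for row in rows):
        raise ValueError("all rows must have the same width")
    if any(not 0 <= v <= maxval for row in rows for v in row):
        raise ValueError("gray values must lie in 0..maxval")
    lines = ["P2", f"{width} {height}", str(maxval)]
    lines += [" ".join(str(v) for v in row) for row in rows]
    return "\n".join(lines) + "\n"
```

    Writing the returned string to a `.pgm` file gives something most image viewers will open, which makes it a handy target for dumping whatever data you extract.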
  • by torpor ( 458 ) <ibisumNO@SPAMgmail.com> on Saturday February 12, 2005 @11:37AM (#11652049) Homepage Journal
    .. such as a page full of checkerboard or something, and work from there.

    if you can see the 'obvious' change in pattern in the file, you've got a lead. but the important thing is to start from the very beginning with something you know .. and look for that pattern at each stage through the pipeline ...
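
    One way to look for a known repeating pattern such as a checkerboard is to search for the offset at which the data best matches itself. The sketch below assumes the raster is stored uncompressed, so that the row length shows up as a strong period; `best_period` is an illustrative name and the brute-force scan is only meant for small samples.

```python
def best_period(data: bytes, min_period: int, max_period: int):
    """Return (period, score) for the shift at which `data` best matches itself.

    The score is the fraction of positions i with data[i] == data[i + period].
    On ties the smallest period wins. Returns (None, -1.0) if no period fits.
    """
    best, best_score = None, -1.0
    for period in range(min_period, min(max_period, len(data) - 1) + 1):
        pairs = len(data) - period
        matches = sum(1 for i in range(pairs) if data[i] == data[i + period])
        score = matches / pairs
        if score > best_score:
            best, best_score = period, score
    return best, best_score
```

    A score close to 1.0 at some period is a strong hint about the bytes per row (or per tile) of the test pattern.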
  • The truth is, you're probably never going to reverse engineer a decent driver.

    If the linux driver is flakey, it's probably because the printer's firmware is itself flakey, and the Windows driver just contains innumerable hacks to get around the problems that keep cropping up.

    Take the thing back, complain that you can't get it working under Linux, and buy a different one.
  • Be Methodical (Score:5, Informative)

    by maeglin ( 23145 ) on Saturday February 12, 2005 @12:40PM (#11652497)
    If it's a head control language or something you might be in trouble, but if it's simply an image being sent you should be able to figure it out eventually.

    The best way to reverse engineer a graphics format is to use a collection of sample images to get a high level idea of what is going on. Choose the images in a way that will give you the most information.

    Make sure the printouts are always the same size, layout, color depth, margins, etc. It does no good to compare an A4 grayscale image to a color letter-sized one.

    If you're operating under the assumption that it's a simple bitmap, the following may work.

    1. Is it compressed?

    Print out a page with some dots on a colored background.

    Print out a page with more dots on it.

    Are they the same size?

    If so, it's most likely a bitmap.

    If not, it's probably compressed.

    What type of compression is it?

    Print out a page which is half white, half another color.

    Print out another page which is checkered (with *very* small squares) half white, half the other color.

    Is one smaller than the other? If so, it may be compressed. If it is, it *could* be Jim-Bob's compression algorithm, but programmers are lazy so it's most likely something off-the-shelf.

    If it's the half-and-half print that is smaller, it's either RLE or something like JPG (most likely RLE as JPG is lossy -- compare a gradient print to find out if it's RLE or not).

    If it's the checkered print then it's probably LZW.

    If neither is smaller, re-evaluate your compression assessment.
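
    To see why the half-and-half page comes out smaller under RLE, here is a toy run-length encoder. It uses simple (count, value) byte pairs, which is an assumption for the sake of the example and not the printer's actual scheme.

```python
def rle_encode(data: bytes) -> bytes:
    """Encode data as (count, value) byte pairs with runs capped at 255."""
    out = bytearray()
    i = 0
    while i < len(data):
        run = 1
        while i + run < len(data) and data[i + run] == data[i] and run < 255:
            run += 1
        out += bytes([run, data[i]])
        i += run
    return bytes(out)


# Two 128-byte test rows: one split down the middle, one finely checkered.
half_and_half = b"\x00" * 64 + b"\xff" * 64
checkered = b"\x00\xff" * 64

print(len(rle_encode(half_and_half)))  # 4
print(len(rle_encode(checkered)))      # 256, twice the input size
```

    Long runs collapse to almost nothing while the fine checkerboard actually grows, which is the size signature the test above is looking for.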

    2. Create a decompressor to test your decompression theory.

    Print a colored page.

    Print a second colored page with a couple of changes.

    If you can't create two data dumps of (relatively) equal size from the input data, you're probably wrong about the compression algorithm.

    If they are the same size you may be going in the right direction. (If they're exactly the same size be very happy).
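
    The size check in this step might look like the following, again assuming a hypothetical (count, value) RLE scheme; swap in whatever decoder matches your current theory. Both function names are invented for the example.

```python
def rle_decode(encoded: bytes) -> bytes:
    """Expand (count, value) byte pairs back into raw bytes."""
    if len(encoded) % 2:
        raise ValueError("expected an even number of bytes (count, value pairs)")
    out = bytearray()
    for i in range(0, len(encoded), 2):
        count, value = encoded[i], encoded[i + 1]
        out += bytes([value]) * count
    return bytes(out)


def sizes_agree(dump_a: bytes, dump_b: bytes, tolerance: int = 0) -> bool:
    """True if both dumps decode to the same length, within `tolerance` bytes."""
    return abs(len(rle_decode(dump_a)) - len(rle_decode(dump_b))) <= tolerance
```

    If two same-sized pages decode to wildly different lengths, the decoder (or the guess about where the image data starts) is probably wrong.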

    3. Guesstimate packing.

    Print a cyan* page, a yellow page, a magenta page and a white page. Take a look at the first four bytes or 16 bit words. If you've got clearly observable patterns (ff 0 0 0; 0 ff 0 0; 0 0 ff 0; etc.) you're in luck. If not try to work out the packing order. Just keep in mind, if it's a bitmap, and you've got the decompression down, and the page is one color *eventually* you will find a repeating pattern that represents that color.
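
    Finding that repeating pattern can be automated. The sketch below assumes you already have the decompressed data of a solid-color page with any header stripped off; `shortest_repeating_unit` is a hypothetical helper.

```python
def shortest_repeating_unit(data: bytes, max_len: int = 16):
    """Return the shortest prefix that, repeated, reproduces `data`.

    A trailing partial repetition is allowed. Returns None if no unit of
    at most `max_len` bytes fits.
    """
    for n in range(1, min(max_len, len(data)) + 1):
        unit = data[:n]
        if all(data[i] == unit[i % n] for i in range(len(data))):
            return unit
    return None
```

    Comparing the unit found for the cyan, magenta, yellow and white pages should show which byte (or bit field) belongs to which ink.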

    4. Visualize the decompressed data.

    The best way from this point is to find a way to visualize what you've got. In the past I had stock BMP code that I would use to generate a new displayable image, but I've also created custom apps to display it.

    If the resulting image looks right but is a funky color, it's packing.

    If the resulting image looks like it *could* be close but has a lot of shear, play with your assumed width and height.

    If it looks like static and you've previously determined that you're dealing with 16 bit values, try changing the byte order and try again.
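
    Trying both byte orders for 16-bit samples is cheap with the standard `struct` module. `read_u16` is just an illustrative wrapper name.

```python
import struct


def read_u16(data: bytes, big_endian: bool) -> list[int]:
    """Interpret data as unsigned 16-bit values in the chosen byte order."""
    if len(data) % 2:
        raise ValueError("data length must be a multiple of 2")
    count = len(data) // 2
    prefix = ">" if big_endian else "<"
    return list(struct.unpack(f"{prefix}{count}H", data))
```

    Render the image once with each setting; whichever one stops looking like static is the byte order to keep.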

    5. Lather, rinse and repeat.

    Despite what the nay-sayers want you to do, don't give up. Figuring out someone's attempt to hide data from you is a reward you give yourself. Even if it takes days or weeks, when the light goes on and you think, "Ah ha! I've got you now you bastard!", it makes the time worth it -- at least for me it always did.

    Besides, if you do get it working, you can release it and make Open Source better by your efforts.

    * Remember, it's cmyk, not rgb.
    • Great advice!

      If it's a head control language or something you might be in trouble, but if it's simply an image being sent you should be able to figure it out eventually.

      I decoded a head control language once. Eventually I had great black and white, but color was really tough, since you had to interleave the data based on when each head was going to pass over an area. I gave up when I bought another really nice printer for ~$200. Still, for years later, I got e-mail thanking me for my black-and-white GP

  • by ComputerSlicer23 ( 516509 ) on Saturday February 12, 2005 @12:53PM (#11652594)
    If you really want this to work under Linux, give the information you have to the Ghostscript guys (that used to be GNU Ghostscript, but I believe they've broken away, Aladdin didn't like it for some reason).

    They know printers. They know lots about printers, and printer languages. My guess is that they'll be thrilled to get an opportunity to hack another printer into working. I know when I bought a printer that has "PS Support", it had a postscript driver in software that talked a proprietary protocol to the printer. They would have gladly written the output driver for it, but they didn't know how it worked.

    Maybe if you know how it works, you'll be able to get them to do something with it.

    Kirby

  • by martyb ( 196687 ) on Saturday February 12, 2005 @01:02PM (#11652667)

    There's a bunch of info on the CLP-500 here that might help. There are lots and lots of comments from users with both good and bad results and the distros they used.

    Check this out:

    Good Luck!

  • return it... (Score:2, Insightful)

    by zonker ( 1158 )
    and buy a printer that has proper postscript 3 support. your life will be much easier as will your prints. it looks like the 550 has postscript 3 for instance (yes it's more expensive but the lack of headaches will be worth the cost)...

    really folks, when you buy a printer don't just look at features and speed. look at the printer languages it features. if it only features a proprietary language (like yours does), be prepared for what you are getting into. pcl5 is okay, but postscript 3 is where it's at
  • According to linuxprinting.org [linuxprinting.org]:

    Color laser printer, max. 1200x1200 dpi, this is a Paperweight

    Doh!

  • an fft is usually quite useful in trying to deconstruct binary formats, all of the fixed-length parts of the encoding show up as frequency spikes. as someone else mentioned, if it employs compression at any level then you're pretty much sol.

    seriously though, it's not worth it to write a driver for a single instance of a device. and if you don't have adequate documentation, the bar has to be even higher to make it worthwhile. if it's really that trivial for you to do, you should get a real job doing it, throw
  • Your Samsung Printer (Score:4, Informative)

    by ratboy666 ( 104074 ) <fred_weigel@[ ]mail.com ['hot' in gap]> on Saturday February 12, 2005 @08:15PM (#11655645) Journal
    Probably JBIG bi-level compression (load jbigkit from the web -- it will give you a starting point). Probably separate maps for each of the colours. Embedded command language for the rest -- look on the cable for details.

    'K? [I think that covers most of the current crop of printers]. Next time, buy a PostScript device.

    Ratboy
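
    If the printer really does want a separate map per colour, a driver would have to split an interleaved raster into planes before compressing each one. A rough sketch, assuming one byte per channel and CMYK order (both are guesses, not known facts about this printer):

```python
def split_planes(pixels: bytes, channels: int = 4) -> list[bytes]:
    """Split interleaved samples (e.g. CMYKCMYK...) into one byte string per channel."""
    if channels < 1 or len(pixels) % channels:
        raise ValueError("data length must be a multiple of the channel count")
    # Every `channels`-th byte, starting at offset c, belongs to channel c.
    return [pixels[c::channels] for c in range(channels)]
```

    Each plane would then be thresholded to one bit per pixel and handed to the bi-level compressor, for example the JBIG-KIT library mentioned above.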
  • by crazney ( 194622 ) on Saturday February 12, 2005 @11:42PM (#11656866) Homepage Journal
    I've RE'd a bunch of stuff, from DRM protection (http://crazney.net/programs/itunes/), Audio Codecs, network protocols and file formats. I use all sorts of nifty tools, most of which I wrote myself.

    For a graphics format, however, I'd be inclined to go for disassembly of the proprietary driver. Perhaps you could try various test cases (scan a white sheet of paper, what's the data look like? Try a black, red, green, blue.. etc). But if it's compressed with some unknown algorithm (like the Audio codec that I've reversed) I don't like your chances of getting it that way.

    There are a bunch of disassemblers around, I have written my own (which isn't available publicly cause it's still too shit) but I would highly recommend Datarescue's IDA. Old versions work fine in wine.

    However, something to be mindful of: Just rewriting their binary driver in C is copyright violation, make sure you properly document the spec and then do a cleanroom implementation.

    David.
    • Just to pick a few nits..

      However, something to be mindful of: Just rewriting their binary driver in C is copyright violation,

      Well, that may depend on the EULA. But assuming either that the EULA doesn't forbid reverse engineering, or that you're willing to bet that the purpose of your work will qualify as 'intercompatibility' in court. (Which it should, but not everyone wants to take that risk.)

      Anyway, if you rewrite it without duplicating their code, you're not infringing their copyright.

      make sure you p
      • Well, that may depend on the EULA. But assuming either that the EULA doesn't forbid reverse engineering, or that you're willing to bet that the purpose of your work will qualify as 'intercompatibility' in court. (Which it should, but not everyone wants to take that risk.)

        Not entirely correct, it is going to depend on what country you are in, and whether click-wrap EULAs actually have any legal credence. I'm not in the US of A.

        That is not clean-room, since the developer is already 'tainted' by having seen t
  • This might be irrelevant to graphics, but I think the French cafe [samba.org] analogy written by Andrew Tridgell, who developed Samba, is a good reference on how to do reverse engineering (or in his terms: network analysis or protocol analysis [groklaw.net]) in general.
  • If you need high-quality WYSIWYG printing, and are not afraid of a long wait, there is a "big crowbar" approach. Render your pages to a raster image via a decent rendering engine. Ghostscript does well on some fonts, not on others. Batik does XSL-FO well, though slowly. Render to a hi-res bitmap at the given resolution, and send that to the printer. Hopefully it can be sent as a stream, and not require buffering. It is slow and inefficient, but it can quite often work around difficult situations.
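
    The rendering half of the "big crowbar" could be driven from a script along these lines. The sketch only builds a Ghostscript command line; the `ppmraw` device and 600 dpi are example choices, `build_gs_command` is a made-up helper, and getting the resulting raster into the printer's own format is the part that still has to be reverse engineered.

```python
def build_gs_command(input_path: str, output_pattern: str,
                     dpi: int = 600, device: str = "ppmraw") -> list[str]:
    """Build a Ghostscript invocation that rasterizes each page to its own file."""
    return [
        "gs",
        "-dSAFER",                        # restrict file access while rendering
        "-dBATCH",                        # exit after the last page
        "-dNOPAUSE",                      # do not wait for a key press per page
        f"-sDEVICE={device}",             # raster output device
        f"-r{dpi}",                       # resolution in dots per inch
        f"-sOutputFile={output_pattern}", # %03d expands to the page number
        input_path,
    ]


# To actually render (requires Ghostscript to be installed):
#   import subprocess
#   subprocess.run(build_gs_command("doc.ps", "page-%03d.ppm"), check=True)
```

    Keeping the command in a list avoids shell quoting problems when file names contain spaces.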
  • Wine [winehq.com] is apparently able to use Windows printer drivers. I've never used this feature myself and there doesn't seem to be that much info about it but this may be worth examining.
