Microsoft and Intel Project Converts Malware Into Images Before Analyzing It (zdnet.com) 45

Microsoft and Intel have collaborated on a research project that explores a new approach to detecting and classifying malware. From a report: Called STAMINA (STAtic Malware-as-Image Network Analysis), the project relies on a technique that converts malware samples into grayscale images and then scans the images for textural and structural patterns specific to malware samples. The Intel-Microsoft research team said the entire process followed a few simple steps. The first consisted of taking an input file and converting its binary form into a stream of raw pixel data. Researchers then took this one-dimensional (1D) pixel stream and converted it into a two-dimensional (2D) image so that standard image analysis algorithms could analyze it.
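
The report doesn't include code, but the conversion it describes amounts to something like the following sketch (the row width of 256 and the zero-padding are illustrative assumptions; the article specifies neither):

    import numpy as np
    from PIL import Image

    def binary_to_image(path, width=256):
        data = np.fromfile(path, dtype=np.uint8)         # binary file -> 1D stream of raw pixel values
        rows = -(-len(data) // width)                    # ceiling division: number of image rows
        padded = np.zeros(rows * width, dtype=np.uint8)  # zero-pad so the last row is full
        padded[:len(data)] = data
        return Image.fromarray(padded.reshape(rows, width), mode="L")  # 2D grayscale image

    binary_to_image("sample.exe").save("sample.png")     # "sample.exe" is a placeholder input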
  • by jellomizer ( 103300 ) on Monday May 11, 2020 @09:14AM (#60047660)

    We often take data and put it in graphs to see trends that would be difficult to calculate. We use heat maps to visualize how densely things are distributed.

    Humans are still much better than computers at catching trends and patterns, especially visually. I have actually taken data, converted it to an image, and run an algorithm on it to get a graphical representation of the data. I can spot patterns much quicker that way. So malware embedded in another program will look about the same, even if it is spread across some extra static.

    • by gtall ( 79522 )

      Really? How many images can you scan in an hour?

    • by PPH ( 736903 )

      Yeah. But the technique they appear to be using here just slices the 1D byte array of the code into equal-length lines and then stacks them as the second dimension of a picture. There is no guarantee that bytes in adjacent lines have any correlation. There are methods for representing program and data flow as multidimensional maps, which have graphical representations, but that processing is non-trivial and I don't see it mentioned in TFA.

  • Ummm... (Score:4, Insightful)

    by reanjr ( 588767 ) on Monday May 11, 2020 @09:18AM (#60047670) Homepage

    What's the difference between a grayscale image and an array of bytes? This summary sounds like the worst of science reporting, where the reporter doesn't have the first clue what they're talking about...

    • Magic.

      Computers, you see, are magic.

      Once we convert from executable Evil Bit Format to Microsoft(r)/Intel(r) Safe Image AI Render(tm) format then we can apply our special machine learning AI algorithms to it.

      The standard process I'm sure you are familiar with:

      1: claim to use AI
      2: ...
      3: profit!
      • You just load the malware image into the computer and repeatedly say "Enhance! Enhance! Enhance!"
    • by guruevi ( 827432 )

      A grayscale image is an array of bytes. Basically this is a bit-matching analysis to see whether the code contains fragments that are known to be bad, or at least closely related to known-bad code.

      • And that is different from the usual pattern-matching algorithm used so far to fingerprint sections of malware in what way exactly?

        Oh. Right. We have converted it to an abstract greyscale picture first.

        • The advantage of treating it as an image is that they are able to use pre-existing training rather than training from scratch. Existing image processing models can already detect edges, basic shapes, etc.

          For example, if a given malware family looks like an oval, the new model only has to determine that oval=Cryptolocker. It doesn't have to first learn how to identify ovals.
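
          If the training works the way image transfer learning usually does (an assumption; neither TFA nor the summary names the model or framework), the setup is roughly this sketch:

              import torch
              import torchvision

              num_families = 10  # hypothetical number of malware families to distinguish

              # Reuse ImageNet-pretrained features (it already "knows ovals") and
              # retrain only the final classifier on malware-family labels.
              model = torchvision.models.resnet50(weights="IMAGENET1K_V2")
              for p in model.parameters():
                  p.requires_grad = False  # freeze the pretrained feature layers
              model.fc = torch.nn.Linear(model.fc.in_features, num_families)  # new trainable head
              optimizer = torch.optim.Adam(model.fc.parameters(), lr=1e-3)
              # Grayscale malware images would be replicated to 3 channels to fit the RGB input.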

          • That's awfully convoluted. You do know that image processing is mostly about teaching a computer how to interpret the binary representation of shapes in pictures, right? And that computers are still not really good at it (that's why CAPTCHAs are still a thing)?

            Why not cut out the middle man and just teach it to interpret the binary representation of an executable binary? I can see why you want to use what picture interpretation software did as a starting point for your development, but turning a binary into an image to then use an algorithm to interpret it sounds a bit like pushing a signal through a DAC only to feed that into an ADC.

            • > That's awfully convoluted, you do know that the image processing is mostly dependent on teaching a computer how to interpret the binary representation of shapes in pictures, right?

              And that part is already done. You don't need a samples of the new malware to train the ML on because they are using ML that has already been trained on millions of "shapes in pictures".

              Kinda like you learning a new word vs starting with a baby who doesn't know the letters. Starting with a pre-existing knowledge of letters is much faster.

              • I missed a word in what I wrote. It should be:
                --
                You don't need a *million* samples of the new malware to train the ML on because they are using ML that has already been trained on millions of "shapes in pictures".
                --

                Sufficient samples for training is an issue when trying to use ML and AI to recognize new malware. They are polymorphic, meaning they aren't the same each time. You're looking for subtle patterns. It's awfully hard to reliably find the subtle patterns when you have a total of five examples.

            • > sounds a bit like pushing a signal through a DAC only to feed that into an ADC.

              Yes, and Shazam exists. Consider: you have a large collection of mp3 files, each a live recording of a song. Your task is to identify the songs. You also have the Shazam app, which recognizes songs exactly as you want to do. It takes analog input through the microphone.

              A reasonable plan is to play the mp3s and let Shazam recognize them. It's inefficient to build Shazam yourself for this purpose if you're starting at Genesis 1:1, but Shazam already exists.

            • There are two reasons. The first is that while the underlying binary data between two similar malware files may change considerably, the image-equivalent may end up being extremely similar. This happens in actual images: you can take the same image and produce two files with completely different binary values (a jpeg vs a png, for instance). Your binary-matching tool won't notice they're the same image, while an image-scanning tool will. (See the sketch at the end of this comment.)

              I can see why you want to use what picture interpretation software did as a starting point for your development, but turning a binary into an image to then use an algorithm to interpret it sounds a bit like pushing a signal through a DAC only to feed that into an ADC.

              Actually, yes, that is in fact a very good way to perform song identification.
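
              The jpeg-vs-png point above is easy to demonstrate (a sketch, not anything from TFA's pipeline):

                  import io
                  import numpy as np
                  from PIL import Image

                  # The same smooth gradient image, saved in two formats.
                  img = Image.fromarray(np.tile(np.arange(64, dtype=np.uint8), (64, 1)), mode="L")
                  png, jpg = io.BytesIO(), io.BytesIO()
                  img.save(png, format="PNG")
                  img.save(jpg, format="JPEG")
                  print(png.getvalue()[:16] == jpg.getvalue()[:16])  # False: the byte streams share nothing
                  png.seek(0); jpg.seek(0)
                  a = np.asarray(Image.open(png), dtype=int)
                  b = np.asarray(Image.open(jpg), dtype=int)
                  print(np.abs(a - b).mean())  # near 0: the decoded pixels are almost identical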

    • Re:Ummm... (Score:5, Interesting)

      by aardvarkjoe ( 156801 ) on Monday May 11, 2020 @09:29AM (#60047726)

      The interesting thing would be that existing image analysis algorithms would be relevant to identifying malware vs. non-malware, which isn't obvious -- although probably not totally unexpected, because presumably there are patterns in malware code related to their similar behavior that are not present in non-malware. The mechanics of converting it into image data suitable for the existing algorithms isn't very interesting, as it sounds like they pretty much just use the bytes as pixel intensity values.

      The thing which seems weird to me is that from a quick glance at the (very superficial) article, it sounds like they just naively chop up the stream of bytes into equal-sized lengths to create a 2D array ("picture"). In a real picture, there will generally be a high amount of correlation between pixels directly above or below one another, and that seems like it wouldn't apply in this case.

      • Re:Ummm... (Score:5, Insightful)

        by DavenH ( 1065780 ) on Monday May 11, 2020 @09:39AM (#60047760)
        It is surprising enough that I feel like there must be some BS involved. The convolutional kernel maps that have been fitted to natural images are going to swing and completely miss when convolved against the high-entropy chaos of compiled binary code. Perhaps they could hit on the low-entropy stuff, like data regions and string constants, but given how little 2D local connectivity there is in a binary (it's sequential), I don't see how this is anywhere close to being the right model for the job.
        • by tlhIngan ( 30335 )

          when convolved against high-entropy chaos of compiled binary code

          It's binary code. It's actually highly structured and not as random as you think. It looks random to you and me, but that's because we're not trained to spot the patterns (plus, hexadecimal isn't always the best way to view opcodes, especially if it doesn't line up with the fields).

          In fact, experienced disassemblers can easily recognize patterns in the code; it comes with experience, and you learn to recognize what compilers generate.

          • by DavenH ( 1065780 )
            Interesting points. Isn't it pertinent that you and I can't spot the patterns? If they had distributions similar to natural images, we'd be able to spot them without disassembly experience.

            My skepticism is mostly aimed at the validity of using a convolutional net trained on natural-image distributions. I suspect that a well-designed neural net model could be trained to find similar patterns in run-of-the-mill binaries, but I'd expect it to use attention rather than 2D connectivity, and I'd expect it to need a lot of training data.

      • by Junta ( 36770 )

        Except that, generally speaking, the machine vision treatment of pictures tends to process the image in very low precision. Machine vision would have a lot of problems if it fixated on high variation among very nearby pixels, which in the real world would be noise to discard.

        As far as I've been told, the AI field has had a lot of interest in image and voice samples because they are complex, unstructured data sources that are hard to process traditionally, but they can and do apply their methods to other kinds of data.

      • > The interesting thing would be that existing image analysis algorithms would be relevant to identifying malware vs. non-malware, which isn't obvious

        The technique doesn't really distinguish between malware and non-malware per se. The system can be trained to detect the Clop ransomware. It can be trained to detect the Maze ransomware, etc. It detects each specific malware family that it was trained on.

    • by XXongo ( 3986865 )

      What's the difference between a grayscale image and an array of bytes?

      A grayscale image is an array of numbers, of course.

      The point is that there exists a large body of software to do image analysis, and the proposal is that some of this image analysis tech might be usable to analyze that array.

      I'm not sure if I believe it, but it's an interesting approach.

    • by ceoyoyo ( 59147 )

      An image encodes associations between values in two dimensions instead of just one. I don't see any reason why those associations would exist in a binary, though, which might mean that some intern at MS or Intel wanted to apply a pretrained 2D convnet used for image analysis instead of just training a 1D one.

      Or perhaps they don't realize 1D convolution exists. I know that sounds strange, but I've met "data scientists" who think convolution is specifically an image analysis technique.
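
      For what it's worth, the 1D version is a one-liner in any modern framework (a sketch; random bytes stand in for a real binary):

          import torch

          # 1D convolution straight over the byte stream; no reshaping into a fake picture.
          data = torch.randint(0, 256, (1, 1, 4096)).float() / 255.0  # (batch, channels, bytes)
          conv = torch.nn.Conv1d(in_channels=1, out_channels=16, kernel_size=9, padding=4)
          print(conv(data).shape)  # torch.Size([1, 16, 4096])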

      • The problem is that the second dimension is completely arbitrary. Unless they found something rather spectacular that allows you to find malware by "diagonal reading".

        • Re:Ummm... (Score:5, Insightful)

          by ceoyoyo ( 59147 ) on Monday May 11, 2020 @10:10AM (#60047850)

          Exactly. I've seen people try this before. The idea is that we've got all these powerful tools for image analysis, and encoding data as a visualization is really useful for humans, so it should be a good idea for computers too, right? People seem to like that style of superficial reasoning.

          If the description of what they've done is accurate, I'd think you'd be able to fool it just by sticking a few random bytes in your binary to throw off the way the lines are stacked. Like loading an image with the wrong width for the fastest-varying dimension.
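
          That's straightforward to check against the naive stacking (a sketch assuming the fixed-width reshape from the summary; the width is arbitrary):

              import numpy as np

              width = 256
              data = np.random.randint(0, 256, 10 * width, dtype=np.uint8)
              shifted = np.insert(data, 5, 0)[:-1]  # insert a single byte near the start, keep the length
              a = data.reshape(-1, width)
              b = shifted.reshape(-1, width)
              print((a == b).mean())  # close to 1/256: almost no pixel of the "picture" survives in place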

    • Reading the article: they're turning binary files into blurry images and building the hash from the blurry image.

    • I was thinking the same thing. All the image processing code I have ever written takes the pixel data and converts it to numbers, then maths the numbers to either form a new image or find something in the image.

      If I do make some pixel data at some step of the process, it's so I can display it on screen for dramatic effect. I might add some high-tech-looking green targeting reticles to make it really look like it's doing something for the user. Then I'll intentionally slow my processing down, because if the result appears instantly, nobody believes the computer actually did anything.
    • You can quickly look at two images and compare them. This is an old trick but not often used. You can convert any data into an image to get a basic sense of its overall shape. You can also cat any file you like into a frame buffer or audio buffer.
    • by ljw1004 ( 764174 )

      What's the difference between a grayscale image and an array of bytes? This summary sounds like the worst of science reporting, where the reporter doesn't have the first clue what they're talking about...

      In the first, you look at correlations between arr[X] and arr[X-Width] and arr[X+Width] and arr[X-1] and arr[X+1]. In the second, you only look at correlations with arr[X-1] and arr[X+1].

  • Microsoft has never known how to deal with viruses in any of its software products. I'm not surprised they're trying to turn them into animated GIFs this time around...

  • I'm not sure I get the point of this beyond proving it can be done?

    Isn't it much more computationally intensive to analyze 2D images (that you have to render the raw data into, in the first place) than to just directly look for matching pieces of code?

  • by enriquevagu ( 1026480 ) on Monday May 11, 2020 @10:46AM (#60048012)

    Their proposal is analogous to the early attempts at using GPUs for general-purpose programming. Before CUDA and OpenCL existed, some people noticed that the shaders and texture pipelines in programmable GPUs (once they received FP support) were, essentially, high-performance matrix operators. Once the GPU frame buffer could be read back to the CPU, they adapted their algebra code so that a matrix would be treated as a texture, and the pipeline applied the required operation by treating data matrices as image components. This was circa the early 2000s.

    Today, they are doing the exact same conversion to exploit their already-existent HW-supported image analysis AI pipelines to identify malware samples.

  • Using images to store byte arrays and then referencing them with linear algebra and other GPU-optimizable math is quite clever and a wonderful optimization. Support Intelligence [support-intelligence.com] has done this for several years. Using a CNN built for photos (they're using a pre-trained ImageNet model) is new though, and I can't say it makes any sense to me; why would the features be at all related? (SI uses SVMs.)

    Take a look at Malware Detection by Eating a Whole EXE [arxiv.org] by researchers at UMD, nVidia, UMBC, et al. In section 7

  • Right, so you take data that is fundamentally 1D and you cut it into slices of an arbitrary length because you need to fit it into a rectangle; presumably you also pad it with some extra 0s or something.
    So the relation between adjacent pixels on the Y axis represents very little and is based on your arbitrary choices.
    Then, to scale the image to a more manageable size, you average these unrelated values.
    And then you run it through a neural net that has been trained on data that has completely different characteristics.
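
    Concretely, that averaging step would look something like this (a sketch; the 2x2 mean pooling is an assumed stand-in for whatever resize they actually use):

        import numpy as np

        width = 256
        img = np.random.randint(0, 256, (512, width)).astype(float)    # the reshaped "picture"
        pooled = img.reshape(256, 2, width // 2, 2).mean(axis=(1, 3))  # 2x2 average pooling
        # Each pooled pixel blends bytes at file offsets i and i+1 with
        # bytes a full row (width bytes) away in the original stream.
        print(img.shape, "->", pooled.shape)  # (512, 256) -> (256, 128)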

    • Oh wait, I get it, it's computational homeopathy, you keep diluting the data until it's indistinguishable from random noise.

    • by ceoyoyo ( 59147 )

      I read the article, such as it is. They wanted to use a pretrained network. Most of the pretrained ones are for 2D images, so well, let's format our binary as a 2D image!

      I imagine it works just as well as looking at a bitmap you've rendered with an arbitrary width.

      Neural networks are extremely flexible though, so I'm sure they get something just by doing a bit of "fine-tuning" training. I expect they'd get better results training from scratch using a sensible model though.

  • This is the dumbest of dumb ideas. Really dumb. Incredibly dumb. Opcode patterns are relevant. 2D representations of opcodes are not. Anyone who had the faintest idea of what they're doing would know this simply would not work.

  • Because this is how we get Snow Crash!!!

"The whole problem with the world is that fools and fanatics are always so certain of themselves, but wiser people so full of doubts." -- Bertrand Russell

Working...