
51% of Internet Traffic Is "Non-Human" 125

Posted by samzenpus
from the it's-all-bots dept.
hypnosec writes "Cloud-based service Incapsula has revealed research indicating 51 per cent of website traffic is through automated software programs, with many programmed for malicious activity. The breakdown of an average site's traffic is as follows: 5% is due to hacking tools looking for an unpatched or new vulnerability within a site, 5% is scrapers, 2% is from automated comment spammers, 19% is the result of 'spies' collating competitive intelligence, 20% is derived from search engines (non-human traffic but benign), and only 49% is from people browsing the Internet."
    • Re:Obligatory (Score:5, Insightful)

      by cosm (1072588) <(moc.liamg) (ta) (3msoceht)> on Wednesday March 14, 2012 @10:43PM (#39360435)
      When the singularity arrives, it won't be the T-900 we have to fear, but incessant little gnat-bots that swarm anything with a wallet. It seems the internet is degenerating into just another platform for the one true age-old human behavior--scheming and conning to get the most precious things you have: your personal information and your money. And this does not surprise me--a technological system/network created by humans will be just as full of our flaws and intrinsic 'mental' malfunctions as any non-silicon process our species oversees. Evolutionary, my dear Watson.
      • Re:Obligatory (Score:4, Interesting)

        by Anonymous Coward on Wednesday March 14, 2012 @10:49PM (#39360477)
        That sounds a lot like the singularity in Accelerando [wikipedia.org]. Basically the computer programs end up being so much richer than the humans that the story follows some humans that run away from the solar system because otherwise they can't afford to live.
        • by sackbut (1922510)
          They are uploaded humans, though, that 'run away' in a small (100 gram) 'spaceship'. One of the AIs that wants its 'rights' is an amalgam of a Russian KGB/botnet and a network modeled on lobster neural function.
      • Perhaps. I look forward to the creation of an AI, as it will, no doubt, provide some insight that the human race has been lacking. However, that is merely a possibility--who knows what the AI will actually be? Perhaps it will want to attend art school... ;-)

        • by Anonymous Coward

          Or even worse... become a financial programmer.

          • by Anonymous Coward

            It can't be worse than a lawyer? Can it?

            • The question is, in a contest to see who could be more evil, who would win, the AI or the humans?

              • Re:Obligatory (Score:4, Insightful)

                by DarkOx (621550) on Thursday March 15, 2012 @05:50AM (#39362109) Journal

                My guess is humans will be more evil; we are innovative in a way it's hard to imagine an AI will be. It won't matter, though. The AI will adapt, adopt, and iteratively improve on our ideas, using them against us far more effectively than we could ever hope to.

                • Be careful. You do NOT want an anti-Asimovian robot/AI being evil. Where we can only sometimes manage to be evil, the bot will ALWAYS be efficiently evil.

                • by Anonymous Coward

                  Being properly evil would be an NP-Complete problem, and we humans would kick the AI's butts!

                  Also, we could always just send wave after wave of our own men to try to overload the killbot limit.

                • Re: (Score:2, Interesting)

                  by Anonymous Coward

                  Just have to remember that corporations are psychopathic artificial beings created by us, so that's how evil our creations can be.
                  If you create an AI to trade, merge, and "generate wealth", it would end up just like a cold-blooded corporation.

      • by flyneye (84093)

        Let's not forget how keen the U.N. is on being internet cops. Maybe they'll be internet traffic cops and start issuing citations.

    • by Anonymous Coward

      the original [wikimedia.org]

    • by Polo (30659) *

      I was thinking it was a good thing [wikipedia.org]

  • Hmmm (Score:4, Funny)

    by koan (80826) on Wednesday March 14, 2012 @10:36PM (#39360375)

    "only 49% from people browsing the Internet." I wonder how much of that 49% is porn.

    • by Mashiki (184564)

      48.93%; the other 0.06% is Facebook, with 0.01% made up of Twitter spam.

    • Re:Hmmm (Score:5, Informative)

      by Anonymous Coward on Wednesday March 14, 2012 @11:17PM (#39360609)

      Hate to bring sources into a slashdot conversation, but Sandvine's 2011 report has 53.6% as "real-time entertainment". 29 percentage points are Netflix, 10 are YouTube.
      So if those numbers are correct, roughly 15% of the Internet is porn.

      • by GNious (953874)

        Hate to bring sources into a slashdot conversation, but Sandvine's 2011 report has 53.6% as "real-time entertainment"
        [...]
        So if those numbers are correct, roughly 15% of the Internet is porn.

        how much of the 53.6% is real-time porn?

      • by pjt33 (739471)

        That sounds like US Internet traffic rather than Internet traffic as a whole. Netflix don't operate in most of the world, whereas YouTube does.

      • by Anonymous Coward

        You are conflating two different statistics. TFA is about website traffic. Since I just repeated what the article said, I will explain. This statistic is only hits to websites and where they come from. You are talking about traffic on the internet. This is a totally different number.

        Duh.

    • 69% of that.
  • I knew it!
  • Any webmaster should already know this, probably way more than 51% for websites in existence for several years.
    • by mooingyak (720677)

      Any webmaster should already know this, probably way more than 51% for websites in existence for several years.

      Agreed. I was thinking only 51%? I currently toss roughly 65% of my logs out when I'm calculating how much human traffic we've received.

      • Agreed. I was thinking only 51%? I currently toss roughly 65% of my logs out when I'm calculating how much human traffic we've received.

        The interesting thing is that 51% is identifiable as bots. What about bots that are designed to emulate real users?

        I mention that because I have written some bots that are designed to emulate users as closely as possible, so as to not be noticed by paranoid webmasters. Mine follow valid workflow scenarios, and even pause appropriate amounts of time between post backs, so I am fairly certain that they have gone unnoticed.

        I don't think that I am more clever than the average hacker, so I am sure that others are doing this sort of thing, too.

        • Don't forget to randomize intervals on a bell curve depending on content size and the bot's likes (tag mesh you designate and is compared to each page's dictionary). With some fairly basic data mining you can find bots if their jumps are regular and, if you really want to (or if you have been employed to find out) which of the human clients is an executable, you can always build your own uberbots and train ML algos to match them (and in turn the malicious visitors). Tried it once, it can get complicated, es
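The interval-randomization trick discussed above can be sketched in Python. This is a minimal illustration, not anyone's actual bot: the bytes-per-word ratio and the reading speed are made-up placeholder values.

```python
import random

def humanlike_delay(content_bytes, mean_wpm=250):
    """Return a plausible, jittered 'reading time' in seconds.

    Assumes roughly 6 bytes per readable word; both that ratio and
    the words-per-minute figure are illustrative guesses.
    """
    words = content_bytes / 6
    base = (words / mean_wpm) * 60             # seconds a reader might linger
    jittered = random.gauss(base, base * 0.3)  # bell-curve spread around it
    return max(1.0, jittered)                  # never fire back instantly

# A crawler would then sleep between requests, e.g.:
#   time.sleep(humanlike_delay(len(page_html)))
```

As the comment notes, regular jumps are exactly what basic data mining catches, which is why the spread is keyed to content size rather than being a fixed pause.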

        • by mooingyak (720677)

          Agreed. I was thinking only 51%? I currently toss roughly 65% of my logs out when I'm calculating how much human traffic we've received.

          The interesting thing is that 51% is identifiable as bots. What about bots that are designed to emulate real users?

          I mention that because I have written some bots that are designed to emulate users as closely as possible, so as to not be noticed by paranoid webmasters. Mine follow valid workflow scenarios, and even pause appropriate amounts of time between post backs, so I am fairly certain that they have gone unnoticed.

          I don't think that I am more clever than the average hacker, so I am sure that others are doing this sort of thing, too.

          It depends on a few things.

          First, how much do I actually care? If your bot is pulling down less than a thousand pages in a day, it's not going to be noticed by me, and it would have to get higher still for me to make an effort to filter it out.
          Second, did you fool my advertisers? I keep track of page views mostly to keep them honest. If there's a significant discrepancy between their numbers and mine, I'm going to find out why.
          I only block access in fairly extreme cases. I mostly just don't count bot re
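The kind of log filtering the webmasters above describe might look like the following sketch, assuming Apache combined log format and a toy list of user-agent markers (both are assumptions, not anyone's production setup):

```python
import re

# Substrings often present in self-identifying crawler user agents;
# an illustrative list, nowhere near exhaustive.
BOT_MARKERS = ("bot", "crawler", "spider", "slurp", "curl", "wget")

def is_probable_bot(user_agent: str) -> bool:
    ua = user_agent.lower()
    return any(marker in ua for marker in BOT_MARKERS)

def human_hits(log_lines):
    """Yield Apache combined-log lines whose user agent looks human."""
    ua_pattern = re.compile(r'"([^"]*)"\s*$')  # UA is the last quoted field
    for line in log_lines:
        m = ua_pattern.search(line)
        if m and not is_probable_bot(m.group(1)):
            yield line
```

This only catches bots that self-identify, which is the thread's whole point: a bot that sends a browser user agent and paces itself sails right through.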

  • by stoborrobots (577882) on Wednesday March 14, 2012 @10:41PM (#39360417)

    Which of those categories do data analysis and aggregation tools fall into?

    I'm thinking of user-focused tools like RSS Readers, Stock Quote graphers, etc... They're automated non-human tools which access websites, but it's not clear how they are being categorised...

  • web !=internet (Score:5, Insightful)

    by Anonymous Coward on Wednesday March 14, 2012 @10:41PM (#39360419)

    the article seems to be about websites, not the internet

  • by Anonymous Coward

    Is such a huge load of advertising.

    It doesn't link to any research; it's simply in-house "research", after which they also suggest that you really should use their service. So it's to be taken with a grain of salt.
    There is no research method described, or anything else.

    • Is such a huge load of advertising.

      It doesn't link to any research; it's simply in-house "research", after which they also suggest that you really should use their service. So it's to be taken with a grain of salt. There is no research method described, or anything else.

      Yeah, there is nothing in that article that tells who the fuck Incapsula is. Do they have people with doctorates and PhDs doing their "research", or are they just pulling numbers out of their butt? Smells like a fly-by-night scam.

      • Do they have people with doctorates and PhDs doing their "research", or are they just pulling numbers out of their butt?

        Could be both. This allegedly happened in some areas where researchers felt that a "controlled publication" of scientific evidence could bring more exposure to what they considered important issues... (climategate, peppered moths, Libby half-life, etc.)

  • by Rie Beam (632299) on Wednesday March 14, 2012 @10:44PM (#39360447) Journal

    Hey, now, I know the United States isn't exactly the only game in town anymore, but you guys could be a little more sensitive.

    • Re: (Score:3, Funny)

      Don't worry, you're probably part of the 49%. The 51% is primarily comprised of furries, Klingons, cat videos, our robot overlords, our reptilian overlords, our reptilian robot overlords, and the Welsh.

      • by mjwx (966435)

        Don't worry, you're probably part of the 49%. The 51% is primarily comprised of furries, Klingons, cat videos, our robot overlords, our reptilian overlords, our reptilian robot overlords, and the Welsh.

        Our insect overlords wish to know why you've excluded them.

  • how much of that 49% is Reddit and 4chan?

    • by Anonymous Coward on Wednesday March 14, 2012 @10:56PM (#39360517)

      4chan -- where the men are men, the women are men, and the children are FBI agents.

      • by Mashiki (184564)

        And the FBI agents are trrrroooollllssss.

      • by xenobyte (446878)

        Awesome! - Maybe slightly expanded:

        4chan - where the men are men, the women are men, the trolls are infantile, the children are FBI agents and the pedophiles are rampant - and soon on their way to jail.

        Anonymous has left the building.

      • by Iskender (1040286)

        I'm more familiar with this being said about IRC.

        Of course, it was probably said about Usenet before my time...

    • add to that 10% of the 49% are nerds and those barely count as human
    • More than you might think. Not that those two are huge, but the same people are also everywhere else, including slashdot. Probably not so much on Facebook. Some people spend an inordinate amount of time online and have multiple personas. They spend so much more time online than most people they skew the statistics. You can't really say X% are doing A and Y% are doing B because mostly the same people are doing both.

      There are fewer people online than is apparent.

  • by Anonymous Coward

    Title says 51% of Internet Traffic.
    Summary says 51% of Website Traffic.

    Internet != Website.

    • by Dahamma (304068) on Wednesday March 14, 2012 @10:54PM (#39360505)

      The summary is OK, but the title is completely wrong. It could well be 51% of HTTP requests, but as far as "Internet traffic" goes, it's probably a tiny fraction of a percent.

      In fact, why is it even surprising or newsworthy that 50% of HTTP requests are malicious? Anyone who runs a public web server will see that pretty quickly (though as long as it's configured correctly, the actual traffic will be tiny, consisting of a whole bunch of 404s).

      • by mooingyak (720677)

        They're not saying 50% are malicious, just non-human. I get a fairly large chunk of traffic from google's bots, which I don't consider malicious.

      • by ceoyoyo (59147)

        You'd think a Slashdot poster would know the difference between the web and the Internet. Sigh.

      • by wvmarle (1070040)

        If it's indeed HTTP requests, then the numbers start to make a little more sense. Especially the 20% from search engine crawlers is a very high number, I'd say--considering that there are just a few serious crawlers around, and they won't visit a site every 10 minutes.

        • by Dahamma (304068)

          Actually if you think about it 20% from search engine crawlers would mean either the crawlers are ridiculously overcrawling, or there are just too many damn crawlers. How the hell can 1 out of 5 accesses to web sites be involved in trying to help people find web sites!? That's insane. So the real answer is not found in analyzing the data, it's analyzing the source.

          Basically, some random bullshit hosting company saw a trend with its low-traffic customer websites and is now extrapolating that to the Intern

  • by powerspike (729889) on Wednesday March 14, 2012 @10:50PM (#39360489)
    If you run WordPress for your site... it's more like 50% bots (search engines), 40% comment spam, 8% content scanners, and 2% visitors....
    • by Anonymous Coward
      You think that's bad, try running a small wiki.
    • by fenix849 (1009013)

      Wordpress gets a bad rap because bad sysadmins/developers don't keep it up to date, or enable comments but don't enable Akismet.

      But yeah, visitors will often be a fraction of overall web traffic to a given blog, regardless of the platform that runs it.

      • While I agree with you: regardless of whether you use Akismet or keep it up to date, if your site has any decent SE rankings you are going to get hit big time by comment spam and search engines. The more content you have, the more you are going to get. From 7 WordPress sites, in the last 24 hours I have received over 400 blog comments (including ones automatically marked as spam). It's pretty bad, and getting worse.
  • Web != Internet (Score:5, Insightful)

    by wiredlogic (135348) on Wednesday March 14, 2012 @11:18PM (#39360617)

    Seriously. Do they have liberal arts majors writing the headlines at /. now?

  • Did not realize there were that many furries out there. Though, it makes sense, we make the internets go [moonbuggy.org].
  • Someone should invent porn that appeals to screen scrapers, then we'd REALLY see web traffic go wild!

    • by Kozz (7764)

      Someone should invent porn that appeals to screen scrapers, then we'd REALLY see web traffic go wild!

      Scrapers Gone Wild!

      On second thought... maybe not.

  • by SuperCharlie (1068072) on Wednesday March 14, 2012 @11:43PM (#39360741)
    Try using a calendar which has next-month and next-year links (along with every day therein) and doesn't know Googlebot is coming... gigs. Seriously.
    • You should restrict it in robots.txt. I had to do it on my site because Google kept appending a character over and over to a variable in the URL. It would just add another character on and request again. It was a pretty weird bug and was generating gigs of traffic as well. You can also restrict Google's crawl rate in Webmaster Tools.
      • That's what I did once I figured out what was going on. I had a PHP event calendar, and I'm pretty sure the bot took it from Unix day zero through whatever the upper time limit is. It ended up being about 2 gigs per visit.
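For reference, the robots.txt fix described in this thread might look like the fragment below; the /calendar/ path is a placeholder for wherever the calendar actually lives. Note that Googlebot does not honor Crawl-delay (its rate had to be set in Webmaster Tools, as the comment above says), so that line only helps with other crawlers:

```
User-agent: *
Disallow: /calendar/
Crawl-delay: 10
```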
  • by Anonymous Coward

    The internet is dangerous, buy our security product.

  • by FrootLoops (1817694) on Thursday March 15, 2012 @12:04AM (#39360831)

    Here's the original ZDNet blog post [zdnet.com]. It's a longer article with more detail; it's also linked at the bottom of TFA, which seems to have plagiarized it. Compare the first paragraphs:

    [TFA] Cloud-based service, Incapsula, has revealed research indicating that 51 per cent of website traffic is through automated software programs; with many programmed for the intent of malicious activity.

    [ZDNet] Incapsula, a provider of cloud-based security for web sites, released a study today showing that 51% of web site traffic is automated software programs, and the majority is potentially damaging, — automated exploits from hackers, spies, scrapers, and spammers.

    The sentence structure and order of ideas is identical, and many phrases are the same or nearly the same. A high schooler should do better. Minor rephrasing is not sufficient.

    That said, both articles are pretty much advertisements. The study doesn't appear to have attempted to actually be comprehensive (so it only used data from this one company). The point was apparently to give this cloud service provider some selling points for businesses to use their service to "secure" their sites. This story is yet another that shouldn't even have appeared on /.; shame on the editors who let it through.

    • 50% human traffic is too much, even for the HTTP protocol.

      It means the semantic web concept is far from penetrating the web enough. Human ability to perceive information is constant; it does not change with time. Some may have the illusion that we improve it, but we do not.

      All those automated HTTP robots are doing us a service by reducing this overload, even the evil ones.

      "5% is due to hacking tools looking for an unpatched or new vulnerability within a site"--those are wolves weeding out

    • The sentence structure and order of ideas is identical, and many phrases are the same or nearly the same. A high schooler should do better. Minor rephrasing is not sufficient.

      It could be that ZDNet was copied by TFA, but...

      That said, both articles are pretty much advertisements.

      ...suggests that it may just be that both articles are slightly rewritten versions of an Incapsula press release posted as "news", with one of them using more of the release.

      • Good point. Maybe so.

        I was curious and found this blog post [incapsula.com] from Incapsula which contains the statistics both articles used. The details are different enough that I wouldn't call either article "plagiarized" from that post, though the articles could have provided more accurate citations. The ZDNet post has some details like

        I spoke with Marc Gaffan, co-founder of Incapsula. “Few people realize how much of their traffic is non-human, and that much of it is potentially harmful.”

        which make me think it's probably an original work, despite being advertisement-heavy.

  • by Gravis Zero (934156) on Thursday March 15, 2012 @12:21AM (#39360913)

    Incapsula, a provider of cloud-based security for web sites, released a study today showing that 51% of web site traffic is automated software programs, and the majority is potentially damaging, — automated exploits from hackers, spies, scrapers, and spammers.

    and it just so happens that Incapsula has the perfect solution to save you from all this... for a price. [wikipedia.org]

  • I worked for an anti-spam provider; >90% of emails were spam, for some customers >99%. That said, spam emails tend to be short, almost to the point of ridiculous. I don't remember the exact numbers, but say the average legitimate email is about 50k (attachments skew that, but even legitimate email tends to be 5+ sentences). Along comes a dubious spammer. Not only are they shooting off 10k emails per bot per hour, but they are all one-sentence emails with a tinyurl link in them

    • Bots send short emails usually for throughput reasons. Why waste bandwidth when you're trying to use little enough to avoid getting caught, and your peak email send rate is inversely proportional to content size?

      Another tidbit that I'm sure a bunch of people know but is worth throwing out there: spam with images--there's a reason for that. The images round-trip to the spammers' servers. Usually they set it up so that your email account is tagged somehow in the URL that your viewer sends to their server.

  • Guilty as charged. I admit, I've been known to check out the competition from other sites to ensure I'm not falling behind the curve. My guess is that they perform a reverse DNS lookup of their IP logs and determine that the company's network I'm behind belongs in the same industry as theirs.

    • This caught my eye, too. My question is, why are they counting the spies in with the "non-human" traffic??
  • However, based solely on the title, my reaction was "No shit, Sherlock". Or, to introduce the younger crowd to an "old saw"... See http://en.wikipedia.org/wiki/Occam's_razor [wikipedia.org], and ponder it well.

    Then, GET OFF MY LAWN!
  • by Osgeld (1900440) on Thursday March 15, 2012 @01:09AM (#39361153)

    Last time I checked whenever I sent any data across the net, it was not human, but rather data.

  • IPv6 to the rescue (Score:5, Interesting)

    by WaffleMonster (969671) on Thursday March 15, 2012 @01:13AM (#39361171)

    With IPv6, no more wholesale scanning of the entire global address space in minutes looking for exploitable hosts. No more 5 minutes to ownage of unpatched PCs, and the associated waste of bandwidth.

    No more self-propagating worms using simple algorithms to divide and conquer the global network.

    In the grand scheme of things it won't help much, but it's better than nothing.
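The scale argument here is easy to check: a single standard /64 subnet alone holds 2^64 addresses, so even a deliberately generous scanning rate gets nowhere:

```python
addresses_per_subnet = 2 ** 64       # hosts in one standard /64 subnet
probes_per_second = 1_000_000_000    # a very generous billion probes/sec
seconds = addresses_per_subnet / probes_per_second
years = seconds / (60 * 60 * 24 * 365)
print(f"~{years:.0f} years to sweep one /64")  # ~585 years
```

And that is one subnet out of the 2^64 possible /64s, which is why worms would have to switch from brute-force sweeps to harvesting addresses from DNS, logs, and neighbor caches instead.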

  • Last time I had a personal website up, 60% of its traffic was buffer-overflow bots, 20% was old IIS exploit bots, and 10% was Slashdot scans whenever I made a post.

    Really though, firewalls in the US should come with the entire Chinese, Russian, and Indian IP ranges blocked for incoming connections by default.

  • When it gets down to the mythical 10% that humans supposedly use of their own information-processing machines (their brains), will the net mind achieve sentience?

  • Consider, for instance, lolcats and pedobears. Two distinct mammals that have perplexed the likes of Sir Attenborough for many office hours.

    Testing the waters, we also have dramatic animals (it all began with a hamster), and the turtle kid. The latter a new breed of furry, that may prove more nuisance than entertainment.

  • Skynet has become self-aware.

  • HAT, ALUMINUM HAT NOW PLEASE. Who are these non humans filling up the pipes that lead into my house? ripping wire out in 3.2.1...
  • Not surprising.

    I bet Jane [wikipedia.org] makes up a large percentage of internet traffic.
