51% of Internet Traffic Is "Non-Human"
hypnosec writes "Cloud-based service Incapsula has revealed research indicating 51 per cent of website traffic is through automated software programs, with many programmed for malicious activity. The breakdown of an average site's traffic is as follows: 5% is due to hacking tools looking for an unpatched or new vulnerability within a site, 5% is scrapers, 2% is from automated comment spammers, 19% is the result of 'spies' collating competitive intelligence, 20% is derived from search engines (non-human traffic but benign), and only 49% is from people browsing the Internet."
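The summary's categories can be checked with a little arithmetic. The five automated categories should add up to the headline 51%, and together with human browsing they should reach 100%. Below is a minimal Python sketch that uses only the percentages quoted above. The dictionary labels are just names chosen here.

```python
# Percentages for "an average site", as quoted in the summary.
BREAKDOWN = {
    "hacking tools": 5,
    "scrapers": 5,
    "comment spammers": 2,
    "competitive-intelligence spies": 19,
    "search engines": 20,
    "humans": 49,
}


def non_human_share(breakdown: dict[str, int]) -> int:
    """Sum every category except human browsing."""
    return sum(pct for label, pct in breakdown.items() if label != "humans")


def potentially_malicious_share(breakdown: dict[str, int]) -> int:
    """Non-human traffic minus the search engines the summary calls benign."""
    return non_human_share(breakdown) - breakdown["search engines"]


print(non_human_share(BREAKDOWN))              # 51
print(sum(BREAKDOWN.values()))                 # 100
print(potentially_malicious_share(BREAKDOWN))  # 31
```

The 31% figure treats every non-search bot as potentially harmful. That follows the article's framing; it is not a separately measured number.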
Obligatory (Score:5, Funny)
Re:Obligatory (Score:5, Insightful)
Re:Obligatory (Score:4, Interesting)
Re: (Score:1)
Re: (Score:2)
Perhaps. I look forward to the creation of an AI, as it will, no doubt, provide some insight that the human race has been lacking. However, that is merely a possibility -> who knows what the AI will actually be? Perhaps it will want to attend Art School...;-)
Re: (Score:1)
Or even worse... become a financial programmer.
Re: (Score:1)
It can't be worse than a lawyer? Can it?
Re: (Score:3)
The question is, in a contest to see who could be more evil, who would win, the AI or the humans?
Re:Obligatory (Score:4, Insightful)
My guess is humans will be more evil; we are innovative in a way it's hard to imagine an AI will be. It won't matter, though. The AI will adapt, adopt, and iteratively improve on our ideas, using them against us far more effectively than we could ever hope to.
Re:Evil (Score:2)
Be careful. You do NOT want an Anti-Asimovian robot/AI being evil. Because we can sometimes be evil, the bot will ALWAYS be efficiently evil.
Re: (Score:1)
Being properly evil would be an NP-Complete problem, and we humans would kick the AI's butts!
Also, we could always just send wave after wave of our own men to try to overload the killbot limit.
Re: (Score:2, Interesting)
Just have to remember that Corporations are psychopathic artificial beings that are created by us, so that's how evil our creations can be.
If you create an AI to trade, merge and "generate wealth", then it would end up being just like a cold-blooded Corporation.
Re: (Score:2)
Let's not forget how keen the U.N. is on being internet cops. Maybe they'll be internet traffic cops and start issuing citations.
Re: (Score:1)
the original [wikimedia.org]
Re: (Score:2)
I was thinking it was a good thing [wikipedia.org]
Re: (Score:3)
The remainder is kids in Cambodia and Mexico seeking out places to spam and sending messages manually for $0.01 per 100 spams.
Which is not automated traffic.
Hmmm (Score:4, Funny)
"only 49% from people browsing the Internet." I wonder how much of that 49% is porn.
Re: (Score:3)
48.93%; the other 0.06% is Facebook, with 0.01% made up of Twitter spam.
Re: (Score:3)
I am the 0%.
Re: (Score:1)
I am the 0%.
With that post, not anymore :-)
Re:Hmmm (Score:5, Informative)
Hate to bring sources into a slashdot conversation, but Sandvine's 2011 report has 53.6% as "real-time entertainment". 29 percentage points are Netflix, 10 are YouTube.
So if those numbers are correct, roughly 15% of the Internet is porn.
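The poster's estimate is a subtraction. Start from the 53.6% "real-time entertainment" share and remove the Netflix and YouTube points, assuming (as the joke does) that everything left over is porn. Sketched in Python:

```python
def remainder_share(total_pct: float, *known_pct: float) -> float:
    """Percentage points left after subtracting the known components."""
    return round(total_pct - sum(known_pct), 1)


leftover = remainder_share(53.6, 29, 10)  # Sandvine figures quoted above
print(leftover)  # 14.6, i.e. "roughly 15%"
```

The remainder also includes every other streaming service, so 14.6 points is at best an upper bound under that assumption.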
Re: (Score:1)
Hate to bring sources into a slashdot conversation, but Sandvine's 2011 report has 53.6% as "real-time entertainment"
[...]
So if those numbers are correct, roughly 15% of the Internet is porn.
how much of the 53.6% is real-time porn?
Re: (Score:3)
That sounds like US Internet traffic rather than Internet traffic as a whole. Netflix don't operate in most of the world, whereas YouTube does.
Re: (Score:1)
You are conflating two different statistics. TFA is about website traffic. Since I just repeated what the article said, I will explain. This statistic is only hits to websites and where they come from. You are talking about traffic on the internet. This is a totally different number.
Duh.
Re: (Score:1)
But what about... (Score:1)
Re: (Score:2)
bots love porn.
Re: (Score:2)
Re:But what about... (Score:5, Funny)
... PORN?!?
It says right there in the summary: "only 49% from people browsing the Internet." Although you could argue that it's higher than that since spiders must crawl through porn too. Adding the 20% for the search spiders, we have that 69% of web traffic is porn related. A fitting number, I dare say.
Aliens! (Score:2)
Weak figure. (Score:1)
Re: (Score:3)
Any webmaster should already know this, probably way more than 51% for websites in existence for several years.
Agreed. I was thinking only 51%? I currently toss roughly 65% of my logs out when I'm calculating how much human traffic we've received.
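One simple way to do this kind of log cleanup is to drop requests whose User-Agent string identifies itself as a crawler. Here is a minimal sketch. The marker list is a hypothetical, deliberately short example, and this approach only catches bots that announce themselves:

```python
# Hypothetical, deliberately short list of User-Agent substrings treated as
# bots; a real filter would need a much longer, maintained list.
BOT_MARKERS = ("bot", "crawler", "spider", "slurp")


def is_self_identified_bot(user_agent: str) -> bool:
    """True if the User-Agent contains any bot marker (case-insensitive)."""
    ua = user_agent.lower()
    return any(marker in ua for marker in BOT_MARKERS)


def human_share(user_agents: list[str]) -> float:
    """Fraction of requests whose User-Agent does not look like a bot."""
    if not user_agents:
        return 0.0
    humans = sum(1 for ua in user_agents if not is_self_identified_bot(ua))
    return humans / len(user_agents)


sample = [
    "Mozilla/5.0 (Windows NT 6.1; rv:10.0) Gecko/20100101 Firefox/10.0",
    "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)",
    "msnbot/2.0b (+http://search.msn.com/msnbot.htm)",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_7_3) AppleWebKit/534.53.11",
]
print(human_share(sample))  # 0.5
```

Bots that send an ordinary browser User-Agent pass straight through this filter.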
It's worse than that, he's a bot, Jim... (Score:3)
Agreed. I was thinking only 51%? I currently toss roughly 65% of my logs out when I'm calculating how much human traffic we've received.
The interesting thing is that 51% is identifiable as bots. What about bots that are designed to emulate real users?
I mention that because I have written some bots that are designed to emulate users as closely as possible, so as to not be noticed by paranoid webmasters. Mine follow valid workflow scenarios, and even pause appropriate amounts of time between post backs, so I am fairly certain that they have gone unnoticed.
I don't think that I am more clever than the average hacker, so I am sure that others are doing this sort of thing, too.
Re: (Score:2)
Don't forget to randomize intervals on a bell curve depending on content size and the bot's likes (a tag mesh you designate, compared against each page's dictionary). With some fairly basic data mining you can find bots if their jumps are regular, and if you really want to (or have been employed to) find out which of the human clients is an executable, you can always build your own uberbots and train ML algos to match them (and in turn the malicious visitors). Tried it once, it can get complicated, es
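The "fairly basic data mining" check for regular jumps can be sketched by measuring how much the gaps between one client's requests vary. This version uses the coefficient of variation (standard deviation divided by mean) of the inter-request gaps. The 0.1 threshold is a hypothetical value, and, as the comment notes, bots that randomize their delays will get past it.

```python
from statistics import mean, pstdev


def interval_cv(timestamps: list[float]) -> float:
    """Coefficient of variation of the gaps between sorted request times."""
    ts = sorted(timestamps)
    gaps = [later - earlier for earlier, later in zip(ts, ts[1:])]
    if len(gaps) < 2:
        raise ValueError("need at least three requests")
    avg = mean(gaps)
    if avg == 0:
        return 0.0  # all requests at the same instant: maximally regular
    return pstdev(gaps) / avg


def looks_scripted(timestamps: list[float], threshold: float = 0.1) -> bool:
    """Flag clients whose request spacing is suspiciously uniform."""
    return interval_cv(timestamps) < threshold


print(looks_scripted([0, 10, 20, 30, 40]))  # True: exactly 10 s apart
print(looks_scripted([0, 3, 20, 24, 61]))   # False: irregular, human-like gaps
```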
Re: (Score:3)
Agreed. I was thinking only 51%? I currently toss roughly 65% of my logs out when I'm calculating how much human traffic we've received.
The interesting thing is that 51% is identifiable as bots. What about bots that are designed to emulate real users?
I mention that because I have written some bots that are designed to emulate users as closely as possible, so as to not be noticed by paranoid webmasters. Mine follow valid workflow scenarios, and even pause appropriate amounts of time between post backs, so I am fairly certain that they have gone unnoticed.
I don't think that I am more clever than the average hacker, so I am sure that others are doing this sort of thing, too.
It depends on a few things.
First, how much do I actually care? If your bot is pulling down less than a thousand pages in a day, it's not going to be noticed by me, and it would have to get higher still for me to make an effort to filter it out.
Second, did you fool my advertisers? I keep track of page views mostly to keep them honest. If there's a significant discrepancy between their numbers and mine, I'm going to find out why.
I only block access in fairly extreme cases. I mostly just don't count bot re
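The two checks described above, a per-client volume threshold and a comparison against advertisers' page-view numbers, might look roughly like this. The sketch makes simplifying assumptions: one day's log is represented as a plain list of client IPs, and the 1000-page cutoff is taken loosely from the comment. The addresses come from reserved documentation ranges.

```python
from collections import Counter

DAILY_LIMIT = 1000  # rough cutoff taken from the comment above


def heavy_clients(client_ips: list[str], daily_limit: int = DAILY_LIMIT) -> dict[str, int]:
    """Clients that fetched more than `daily_limit` pages in one day's log."""
    counts = Counter(client_ips)
    return {ip: n for ip, n in counts.items() if n > daily_limit}


def relative_discrepancy(my_count: int, their_count: int) -> float:
    """Gap between two page-view counts, relative to the larger one."""
    larger = max(my_count, their_count)
    return abs(my_count - their_count) / larger if larger else 0.0


log = ["203.0.113.5"] * 1200 + ["198.51.100.7"] * 40
print(heavy_clients(log))                    # {'203.0.113.5': 1200}
print(relative_discrepancy(10_000, 12_500))  # 0.2
```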
Reading tools? (Score:3)
Which of those categories do data analysis and aggregation tools fall into?
I'm thinking of user-focused tools like RSS Readers, Stock Quote graphers, etc... They're automated non-human tools which access websites, but it's not clear how they are being categorised...
web !=internet (Score:5, Insightful)
the article seems to be about websites, not the Internet
Re:web !=internet (Score:4, Informative)
Yeap, and that's the title of the ZDNet article, which was then copied by ITProPortal, which not only didn't add anything worthwhile, but also managed to fuck up the title.
That article. (Score:1)
Is such a huge load of advertising.
It doesn't link to any research; it's simply an in-house "research" piece, after which they also suggest that you really should use their service. So it's to be taken with a grain of salt.
There is no research method described, or anything else.
Re: (Score:2)
Is such a huge load of advertising.
It doesn't link to any research; it's simply an in-house "research" piece, after which they also suggest that you really should use their service. So it's to be taken with a grain of salt. There is no research method described, or anything else.
Yeah, there is nothing in that article that tells who the fuck Incapsula is. Do they have people with doctorates and PhDs doing their "research"? Or are they just pulling numbers out of their butt? Smells like a fly-by-night scam.
PhDs are not under oath (Score:2)
Do they have people with doctorates and PhDs doing their "research"? Or are they just pulling numbers out of their butt?
Could be both. This allegedly happened in some areas where researchers felt that a "controlled publication" of scientific evidence could bring more exposure to what they considered important issues... (climategate, peppered moths, Libby half-life, etc.)
Arrogance (Score:3)
Hey, now, I know the United States isn't exactly the only game in town anymore, but you guys could be a little more sensitive.
Re: (Score:3, Funny)
Don't worry, you're probably part of the 49%. The 51% is primarily comprised of furries, klingons, cat videos, our robot overlords, our reptilian overlords, our reptilian robot overlords, and the welsh.
Re: (Score:2)
Don't worry, you're probably part of the 49%. The 51% is primarily comprised of furries, klingons, cat videos, our robot overlords, our reptilian overlords, our reptilian robot overlords, and the welsh.
Our insect overlords wish to know why you've excluded them.
95% of the 49% are missing a chromosome or two (Score:2)
how much of that 49% is Reddit and 4chan?
Re:95% of the 49% are missing a chromosome or two (Score:5, Funny)
4chan -- where the men are men, the women are men, and the children are FBI agents.
Re: (Score:2)
And the FBI agents are trrrroooollllssss.
Re: (Score:2)
Awesome! - Maybe slightly expanded:
4chan - where the men are men, the women are men, the trolls are infantile, the children are FBI agents and the pedophiles are rampant - and soon on their way to jail.
Anonymous has left the building.
Re: (Score:2)
I'm more familiar with this being said about IRC.
Of course, it was probably said about Usenet before my time...
Re: (Score:1)
More than you might think (Score:1)
More than you might think. Not that those two are huge, but the same people are also everywhere else, including slashdot. Probably not so much on Facebook. Some people spend an inordinate amount of time online and have multiple personas. They spend so much more time online than most people they skew the statistics. You can't really say X% are doing A and Y% are doing B because mostly the same people are doing both.
There are fewer people online that is apparent.
Re: (Score:1)
*than is apparent
Re: (Score:2)
So do you take a breath after "ever" or something when you say that?
Bad Title / Summary (Score:1)
Title says 51% of Internet Traffic.
Summary says 51% of Website Traffic.
Internet != Website.
Re:Bad Title / Summary (Score:5, Insightful)
The summary is OK, but the title is completely wrong. It could well be 51% of HTTP requests, but as far as "Internet traffic" goes, it's probably a tiny fraction of a percent.
In fact, why is it even surprising or newsworthy that 50% of HTTP requests are malicious? Anyone who runs a public web server will see that pretty quickly (though, as long as the server is configured correctly, the actual traffic will be tiny, consisting mostly of a whole bunch of 404s).
Re: (Score:3)
They're not saying 50% are malicious, just non-human. I get a fairly large chunk of traffic from Google's bots, which I don't consider malicious.
Re: (Score:2)
Re:Bad Title / Summary (Score:4, Insightful)
If you get a fairly large chunk of traffic from Google's bots, then you must have almost no *actual* daily traffic :)
Re: (Score:2)
Last time I checked (which wasn't recent) googlebot only accounted for about 5-6 million a day of my total traffic.
Re: (Score:2)
5-6 million of what? Unit missing.
Re: (Score:1)
square inches..?
Re: (Score:2)
I don't claim to know why google does what they do. Maybe it's a matter of unique pages hosted -- we have a very large number. Maybe it's something else. But I just ran a spot check and I'm north of 3 mil in yesterday's logs.
FWIW, we're in the same ballpark in terms of total traffic.
Re: (Score:3)
You'd think a Slashdot poster would know the difference between the web and the Internet. Sigh.
Re: (Score:2)
If it's indeed HTTP requests, then the numbers start to make a little more sense. The 20% from search engine crawlers in particular is a very high number, I'd say, considering that there are just a few serious crawlers around and they won't visit a site every 10 minutes.
Re: (Score:2)
Actually if you think about it 20% from search engine crawlers would mean either the crawlers are ridiculously overcrawling, or there are just too many damn crawlers. How the hell can 1 out of 5 accesses to web sites be involved in trying to help people find web sites!? That's insane. So the real answer is not found in analyzing the data, it's analyzing the source.
Basically, some random bullshit hosting company saw a trend with its low-traffic customer websites and is now extrapolating that to the Intern
if you run wordpress... (Score:5, Insightful)
Re: (Score:1)
Re: (Score:2)
WordPress gets a bad rap because bad sysadmins/developers don't keep it up to date, or enable comments but don't enable Akismet.
But yeah, visitors will often be a fraction of overall web traffic to a given blog, regardless of the platform that runs it.
Re: (Score:3)
Web != Internet (Score:5, Insightful)
Seriously. Do they have liberal arts majors writing the headlines at /. now?
Things I learn... (Score:1)
Scraper Porn (Score:2)
Someone should invent porn that appeals to screen scrapers, then we'd REALLY see web traffic go wild!
Re: (Score:2)
Someone should invent porn that appeals to screen scrapers, then we'd REALLY see web traffic go wild!
Scrapers Gone Wild!
On second thought... maybe not.
Want some bot traffic? (Score:3)
Re: (Score:2)
Re: (Score:2)
Executive summary (Score:2)
The internet is dangerous, buy our security product.
Better link, crappy story (Score:5, Informative)
Here's the original ZDNet blog post [zdnet.com]. It's a longer article with more detail; it's also linked at the bottom of TFA, which seems to have plagiarized it. Compare the first paragraphs:
[TFA] Cloud-based service, Incapsula, has revealed research indicating that 51 per cent of website traffic is through automated software programs; with many programmed for the intent of malicious activity.
[ZDNet] Incapsula, a provider of cloud-based security for web sites, released a study today showing that 51% of web site traffic is automated software programs, and the majority is potentially damaging, — automated exploits from hackers, spies, scrapers, and spammers.
The sentence structure and order of ideas are identical, and many phrases are the same or nearly the same. A high schooler should do better. Minor rephrasing is not sufficient.
That said, both articles are pretty much advertisements. The study doesn't appear to have attempted to actually be comprehensive (so it only used data from this one company). The point was apparently to give this cloud service provider some selling points for businesses to use their service to "secure" their sites. This story is yet another that shouldn't even have appeared on /.; shame on the editors who let it through.
50% human traffic is too much (Score:2)
50% human traffic is too much, even for the HTTP protocol.
It means the Semantic Web concept is far from penetrating the web enough. Human ability to perceive information is constant; it does not change with time. Some might have the illusion that we are improving it, but we are not.
All those automated HTTP robots do us a service by reducing this overload and making things easier, even the evil ones.
"5% is due to hacking tools looking for an unpatched or new vulnerability within a site, " those are wolves weeding out
Advertisement-as-news (Score:2)
It could be that ZDNet was copied by TFA, but...
Re: (Score:2)
Good point. Maybe so.
I was curious and found this blog post [incapsula.com] from Incapsula which contains the statistics both articles used. The details are different enough that I wouldn't call either article "plagiarized" from that post, though the articles could have provided more accurate citations. The ZDNet post has some details like
I spoke with Marc Gaffan, co-founder of Incapsula. “Few people realize how much of their traffic is non-human, and that much of it is potentially harmful.”
which make me think it's probably an original work, despite being advertisement-heavy.
Consider the source (Score:5, Insightful)
Incapsula, a provider of cloud-based security for web sites, released a study today showing that 51% of web site traffic is automated software programs, and the majority is potentially damaging, — automated exploits from hackers, spies, scrapers, and spammers.
and it just so happens that Incapsula has the perfect solution to save you from all this... for a price. [wikipedia.org]
not really surprising I guess (Score:2)
I worked for an anti-spam provider: over 90% of emails were spam, and for some customers over 99%. That said, spam emails tend to be short, almost to the point of ridiculousness. I don't remember the exact numbers, but say the average legitimate email is about 50k (attachments skew it, but even legitimate email tends to be 5+ sentences). Along comes duffious spammer. Not only are they shooting off 10k emails per bot per hour, but they are all one-sentence emails with a tinyurl link in them
oh and in case you are wondering (Score:3)
Bots usually send short emails for throughput reasons. Why waste bandwidth? You are trying to use little enough that you don't get caught, and your peak send rate is inversely proportional to content size.
Another tidbit that I'm sure a bunch of people know but is worth throwing out there: spam with images. There is a reason for that. The images round-trip to the spammers' servers. Usually they set it up so that your email account is tagged somehow in the URL that your viewer sends to their server.
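The throughput argument above is simple division. If bandwidth is the only limit, the messages sent per hour equal the bytes available per hour divided by the message size. The sketch below assumes a hypothetical 1 Mbit/s uplink and a hypothetical 1 kB one-line message, and it ignores SMTP and header overhead.

```python
def max_messages_per_hour(uplink_bits_per_s: float, message_bytes: int) -> int:
    """Upper bound on messages sent per hour if bandwidth is the only limit."""
    bytes_per_hour = uplink_bits_per_s / 8 * 3600
    return int(bytes_per_hour // message_bytes)


UPLINK = 1_000_000  # hypothetical 1 Mbit/s upstream
print(max_messages_per_hour(UPLINK, 50_000))  # 9000 (50 kB "legitimate" mail)
print(max_messages_per_hour(UPLINK, 1_000))   # 450000 (hypothetical 1 kB one-liner)
```

Shrinking the message 50-fold raises the ceiling 50-fold, which is the inverse proportionality the comment describes.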
Re: (Score:2)
Re: (Score:2)
I did not read the article (Score:2)
Then, GET OFF MY LAWN!
Isn't Human Trafficking Wrong? (Score:3)
Last time I checked whenever I sent any data across the net, it was not human, but rather data.
IPv6 to the rescue (Score:5, Interesting)
With IPv6, no more wholesale scanning of the entire global address space in minutes looking for exploitable hosts. No more 5 minutes to ownage of unpatched PCs and the associated waste of bandwidth.
No more self-propagating worms using simple algorithms to divide and conquer the global network.
In the grand scheme of things it won't help much but better than nothing.
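A rough comparison shows why address-space size matters for blind scanning. The sketch assumes a hypothetical scanner that sends ten million probes per second. It also assumes that hosts inside an IPv6 /64 sit at random, unguessable addresses. Hosts at predictable addresses weaken that advantage.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600


def sweep_seconds(address_bits: int, probes_per_second: float) -> float:
    """Time to probe every address in a space of 2**address_bits addresses."""
    return 2 ** address_bits / probes_per_second


RATE = 10_000_000  # hypothetical probes per second

ipv4_minutes = sweep_seconds(32, RATE) / 60
ipv6_subnet_years = sweep_seconds(64, RATE) / SECONDS_PER_YEAR  # one /64 subnet

print(f"whole IPv4 space: {ipv4_minutes:.1f} minutes")
print(f"one IPv6 /64:     {ipv6_subnet_years:,.0f} years")
```

At that rate the whole IPv4 space takes roughly seven minutes, while a single /64 takes tens of thousands of years.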
my breakdown (Score:2)
last time I had a personal website up, 60% of it was buffer overflow bots, 20% were old IIS exploit bots and 10% were slashdot scans whenever I made a post.
Really though, firewalls in the US should come with the entire Chinese, Russian, and Indian IP range blocked for incoming connections by default.
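Mechanically, a default block like this is a prefix-membership test on the source address. Below is a minimal sketch using Python's standard `ipaddress` module. The networks are reserved documentation prefixes standing in for real country allocations, which would have to come from registry or GeoIP data not shown here.

```python
import ipaddress

# Hypothetical blocklist: documentation prefixes standing in for the
# ranges a real GeoIP or registry data source would supply.
BLOCKED_NETWORKS = [
    ipaddress.ip_network("198.51.100.0/24"),
    ipaddress.ip_network("203.0.113.0/24"),
]


def is_blocked(address: str) -> bool:
    """True if the address falls inside any blocked network."""
    ip = ipaddress.ip_address(address)
    return any(ip in network for network in BLOCKED_NETWORKS)


print(is_blocked("203.0.113.42"))  # True
print(is_blocked("192.0.2.1"))     # False
```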
Sentience Imminent? (Score:2)
When it gets down to the mythical 10% that humans supposedly use of their own information-processing machine (their brains), will the net mind achieve sentience?
You'd think non-human share was higher (Score:1)
Consider, for instance, lolcats and pedo bears. Two distinct mammals that have perplexed the likes of Sir Attenborough for many office hours.
Testing the waters, we also have dramatic animals (it all began with a hamster), and the turtle kid. The latter a new breed of furry, that may prove more nuisance than entertainment.
I have a theory. (Score:1)
Skynet has become self-aware.
Re: (Score:1)
HAT NOW. (Score:2)
Jane (Score:2)
Not surprising.
I bet Jane [wikipedia.org] makes up a large percentage of internet traffic.
Re: (Score:2)
Perhaps you're just considering a specific country. According to Wikipedia, the overall world sex ratio is 101 males to 100 females [wikipedia.org]. At birth, the ratio is more like 106 males to 100 females, though males die earlier than females, especially in later years. (An aunt who used to be a delivery room nurse told me that female babies are generally stronger than males, so e.g. a premature female has a higher chance of surviving.) Some cultures don't like girl babies, leading to infanticide or abortions, so the ratio can get artificially skewed; it also just seems to naturally vary a bit.
Re: (Score:2)
Perhaps you're just considering a specific country. According to Wikipedia, the overall world sex ratio is 101 males to 100 females [wikipedia.org]. At birth, the ratio is more like 106 males to 100 females, though males die earlier than females, especially in their last years. (An aunt who used to be a delivery room nurse told me that female babies are generally stronger than males, so e.g. a premature female has a higher chance of surviving.) Some cultures don't like girl babies, leading to infanticide or abortions, so the ratio can get artificially skewed; it also just seems to naturally vary a bit.
Re:49% of population is male... (Score:4, Funny)
Thanks, but I prefer "This." ;)
Re: (Score:2)
Perhaps you're just considering a specific country. According to Wikipedia, the overall world sex ratio is 101 males to 100 females [wikipedia.org]. At birth, the ratio is more like 106 males to 100 females, though males die earlier than females, especially in their last years. (An aunt who used to be a delivery room nurse told me that female babies are generally stronger than males, so e.g. a premature female has a higher chance of surviving.) Some cultures don't like girl babies, leading to infanticide or abortions, so the ratio can get artificially skewed; it also just seems to naturally vary a bit.
Almost everyone dies in their last year....