The Internet

Bots Now Account For 61% of Net Traffic

Posted by samzenpus
from the that-which-has-no-life dept.
codeusirae writes "A study by Incapsula suggests 61.5% of all website traffic is now generated by bots. The security firm said that was a 21% rise on last year's figure of 51%. From the article: 'Some of these automated software tools are malicious - stealing data or posting ads for scams in comment sections. But the firm said the biggest growth in traffic was for 'good' bots. These are tools used by search engines to crawl websites in order to index their content, by analytics companies to provide feedback about how a site is performing, and by others to carry out other specific tasks - such as helping the Internet Archive preserve content before it is deleted.'"
This discussion has been archived. No new comments can be posted.

  • Youtube? (Score:5, Interesting)

    by Anonymous Coward on Thursday December 12, 2013 @09:23PM (#45676439)

    Didn't we just get studies that said youtube and netflix were 50% of the net's traffic?

    http://mashable.com/2013/11/12/internet-traffic-downstream/

    Was this just a ruse? Is this study wrong? Is there some sort of overlap?

  • The rest is all Netflix?

    Netflix and Youtube?

    Netflix and Youtube and bit torrent?

    Netflix and Youtube and bit torrent and porn?

    • by Seumas (6865)

      Hey, right. That's a good point.

      Something like 66% of traffic was supposed to be Netflix and Youtube.
      And 35% is supposed to be bit torrent.
      And 61% is bots.
      Something isn't adding up, here.

      Also, they seem confused. They talk about "traffic", but then they talk about "hitting the website". Traffic is the data transfer, not a "visit".

    • netflix, youtube, bittorrent, porn, and spam.

  • Misleading title (Score:4, Informative)

    by Anonymous Coward on Thursday December 12, 2013 @09:38PM (#45676539)

    The article states that traffic "hitting a website" is generated more by bots than by actual "humans in chairs". Not that the Internet traffic is 61% bots. Geesh slashdot...

    • by Desler (1608317)

      A Slashdot article with a misleading title? You must be kidding!

    • by wvmarle (1070040)

      In their defense: they copied the BBC headline. Which of course was pretty poor as well.

    • by Seumas (6865)

      They specifically say all internet traffic.

    • by marciot (598356)

      The article states that traffic "hitting a website" is generated more by bots than by actual "humans in chairs". Not that the Internet traffic is 61% bots. Geesh slashdot...

      The Slashdot headline writing bots are in early beta, give them a break.

  • Is there no standard in place by which a website can communicate that it only wishes to be crawled for indexing once per hour, once per day, or such? I can imagine Google, for example, crawls the same website dozens of times per day.
    • Crawl-delay (Score:5, Informative)

      by tepples (727027) <tepples@gmaiBLUEl.com minus berry> on Thursday December 12, 2013 @10:05PM (#45676687) Homepage Journal
      To control the scraping frequency of a well-behaved bot, a webmaster can use HTTP headers such as Last-modified and Expires as well as robots.txt directives such as Crawl-delay.
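      For example, a minimal robots.txt along these lines (a sketch only: Crawl-delay is a nonstandard extension, crawlers that honor it usually read the value as seconds between successive requests, and some major crawlers ignore it entirely):

```
# robots.txt at the site root -- only honored by well-behaved bots.
# Crawl-delay is nonstandard; the value is conventionally seconds
# between successive requests from one crawler.
User-agent: *
Crawl-delay: 10
```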
    • Is there no standard in place by which a website can communicate that it only wishes to be crawled for indexing once per hour, once per day, or such? I can imagine Google, for example, crawls the same website dozens of times per day.

      Crawl-delay [wikipedia.org] isn't exactly what you describe, but maybe that will help? (For such spiders as actually respect it -- that's the great thing about ad-hoc standards.)

      Anyway, I'm pretty sure Google and other major search engines use algorithms based on how often your site's content has changed in the past to decide how often to crawl it in the future, so there shouldn't be unduly high traffic from this -- I suspect the 61% is mainly due to a lot of sites (personal blogs of non-popular people) with practically zero human traffic.

    • by wvmarle (1070040)

      I thought Google (used to) do this automatically. On each subsequent crawl, check whether the site has changed since the previous visit; if so, increase the crawl frequency, and if not, decrease it. A large number of sites, and even more single pages, are completely static after all.
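      That adapt-the-revisit-interval idea can be sketched as a toy scheduler (all constants and the function name are invented for illustration; this is not Google's actual algorithm):

```shell
#!/bin/sh
# Toy version of the adaptive recrawl idea: halve the revisit interval
# when the page changed since the last crawl, double it when it did not,
# clamped to an invented [1 hour, 30 days] range.
MIN=3600          # 1 hour, in seconds
MAX=2592000       # 30 days, in seconds

next_interval() { # usage: next_interval <current_interval_secs> <changed: 0|1>
    interval=$1
    if [ "$2" -eq 1 ]; then
        interval=$((interval / 2))    # content changed: come back sooner
    else
        interval=$((interval * 2))    # unchanged: back off
    fi
    [ "$interval" -lt "$MIN" ] && interval=$MIN
    [ "$interval" -gt "$MAX" ] && interval=$MAX
    echo "$interval"
}

next_interval 86400 1   # changed since yesterday -> 43200 (revisit twice a day)
next_interval 86400 0   # unchanged -> 172800 (back off to every two days)
```

      Static pages drift toward the 30-day ceiling, which matches the observation that most single pages never change.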

  • by craigminah (1885846) on Thursday December 12, 2013 @10:01PM (#45676673)
    I for one welcome our porn surfing bot overlords.
  • by ioseph (2000028)
    May have missed something in TFA, but how do they differentiate between a human and a bot visitor?
  • I had a first-hand experience of this with visitor statistics. I had the root of a web site redirecting to a page that fits the language of the browser. Just that redirection cut the measured web traffic by a factor of 2.

    Most visitors are bots, and many of them just probe and fail to follow the redirection.

    • Aaaand there goes your search ranking
      • by manu0601 (2221348)
        That experience made everyone think twice about web statistics. Even upper management understood how unreliable they are, and no longer considers them strategic.
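        One hypothetical way to see that effect in an Apache-style access log (the sample lines, the /en/ path, and the field positions are all assumptions of this sketch):

```shell
#!/bin/sh
# Of the clients that hit the redirecting root "/", how many actually
# followed the redirect to a language page? Sample log lines are invented;
# in the common Apache log format the request path is field 7.
LOG=$(mktemp)
cat > "$LOG" <<'EOF'
1.2.3.4 - - [12/Dec/2013:21:00:00 +0000] "GET / HTTP/1.1" 302 0
1.2.3.4 - - [12/Dec/2013:21:00:01 +0000] "GET /en/index.html HTTP/1.1" 200 512
5.6.7.8 - - [12/Dec/2013:21:00:02 +0000] "GET / HTTP/1.1" 302 0
9.9.9.9 - - [12/Dec/2013:21:00:03 +0000] "GET / HTTP/1.1" 302 0
EOF

probed=$(awk '$7 == "/" {print $1}' "$LOG" | sort -u | wc -l | tr -d ' ')
followed=$(awk '$7 ~ /^\/en\// {print $1}' "$LOG" | sort -u | wc -l | tr -d ' ')
echo "probed root: $probed, followed redirect: $followed"
rm -f "$LOG"
```

        In this invented sample, two of the three clients never follow the redirect, which is the bot-heavy pattern described above.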
  • by Anonymous Coward

    Sorry, that's just wrong.

    51 dollars to 61.5 dollars = 21 percent increase

    51 percent to 61.5 percent = 10.5 percentage point increase

    And the article makes clear just how unreliable the data was in the first place, so this percent gloss makes me think that the firm is trying to sell something here.
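    The arithmetic behind the two readings, spelled out (a plain restating of the figures above, no new data):

```shell
#!/bin/sh
# 51 -> 61.5 read two ways: a ~21% relative rise, but only a
# 10.5 percentage-point rise. The reported "21%" is the relative one.
awk 'BEGIN {
    a = 51; b = 61.5
    printf "relative increase: %.1f%%\n", (b - a) / a * 100  # 20.6, rounds to the reported 21%
    printf "absolute increase: %.1f points\n", b - a         # 10.5
}'
```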

  • 'good' bots? (Score:2, Informative)

    by Anonymous Coward
    ... But the firm said the biggest growth in traffic was for 'good' bots ...

    I didn't know there was such a thing...

  • We're just visiting. :P

  • There's a fine line on that "good" bot. What I'm puzzled by is why all these public databases aren't indexed by search engine crawlers. It's funny to me how many businesses run on public data that most people just don't know how to find, and why it isn't indexed. Arrest records, tax records, professional registrations: you have to go to specific state, county, or similar sites, deal with kludged searches, and sometimes have a hard time finding yourself, even when you know you're in there.
  • by ls671 (1122017) on Thursday December 12, 2013 @11:29PM (#45677077) Homepage

    Well not on my sites.

    Ok, they still hit me but this is minimal traffic since I do not reply.

    1) Have iptables log and automatically bar offenders not in whitelisted countries.
    2) Use mod_security to do the same for web traffic.
    3) Bar the rest manually, taking care not to bar myself or my customers... (about 20-40 a day)

    It has become a pain but what else could you do?

    Number of IPs currently barred (use ipsets !!!!):
    $ grep -c . /etc/rc.d/badiptobar
    4667

    Block user agents:
    SecRule REQUEST_HEADERS:User-Agent \
    "@pm AhrefsBot Ezooms Aboundex 360Spider Mail.RU_Bot crawler.sistrix.net \
      SemrushBot SurveyBot Netseer panscient.com ADmantX ZumBot BLEXBot UnisterBot \
      seoprofiler EasouSpider" \
    "id:'12050',\
    phase:1,nolog,deny"

    SecRule REQUEST_HEADERS:User-Agent \
    "@pmFromFile /etc/httpd/extra/sec-blacklist-barip-user-agent" \
    "id:'12051',\
    phase:1,nolog,deny,exec:/usr/local/bin/modsecwritebadiptobartofile"

    Bar them automatically if not from whitelisted countries and if on any blacklist:
    SecRule GEO:COUNTRY_CODE \
    "@pm CA FR BE US CH GB AU IL NO NZ" \
    "id:'10501', \
    phase:1,nolog,pass,skipAfter:END_RBL"

    SecRule IP:PREVIOUS_RBL_CHECK "@eq 1" "phase:1,id:'11000',t:none,pass,nolog,\
    skipAfter:END_RBL_LOOKUP"

    SecRule REMOTE_ADDR "@rbl sbl-xbl.spamhaus.org" "id:'11010', \
    phase:1,nolog,deny,msg:\
    'IP address that has abusable vulnerabilities: sbl-xbl.spamhaus.org:\
      %{request_headers.user-agent}',\
      setvar:ip.spammer=1,expirevar:ip.spammer=7200,setvar:ip.previous_rbl_check=1,\
      expirevar:ip.previous_rbl_check=7200,exec:/usr/local/bin/modsecwritebadiptobartofile"

    SecRule REMOTE_ADDR "@rbl bl.blocklist.de" "id:'11011', \
    phase:1,nolog,deny,msg:\
    'IP address that has abusable vulnerabilities: bl.blocklist.de:\
      %{request_headers.user-agent}',\
      setvar:ip.spammer=1,expirevar:ip.spammer=7200,setvar:ip.previous_rbl_check=1,\
      expirevar:ip.previous_rbl_check=7200,exec:/usr/local/bin/modsecwritebadiptobartofile"

    etc. etc. etc. etc. etc.

    Have iptables log and bar offenders if not in a whitelisted country

    # cat baripifex
    #!/bin/sh

    IP=${1}
    COUNTRY=`su tester -c "/usr/local/bin/geoiplookup ${IP}"`
    ###echo $COUNTRY
    ###echo $RBLCHECK

    WHITE_LISTED_COUNTRY=false

    for WHITE_COUNTRY in CA FR BE US CH GB AU IL NO NZ IP
    do
    WHITE_LISTED_COUNTRY=${WHITE_LISTED_COUNTRY}`echo -n $COUNTRY | grep -i $WHITE_COUNTRY`
    done

    if [ "$WHITE_LISTED_COUNTRY" = "false" ]
    then /home/ls/pub/mybin/baripnoout $IP $COUNTRY baripifex
    echo -n barred
    else
    echo -n noaction
    fi

    etc. etc. etc. etc. etc.

    • by fatphil (181876)
      I'd normally not comment on a UUOC, but the following is beyond absurd:
      ---- 8< ---------
      WHITE_LISTED_COUNTRY=false

      for WHITE_COUNTRY in CA FR BE US CH GB AU IL NO NZ IP
      do
      WHITE_LISTED_COUNTRY=${WHITE_LISTED_COUNTRY}`echo -n $COUNTRY | grep -i $WHITE_COUNTRY`
      done

      if [ "$WHITE_LISTED_COUNTRY" = "false" ]
      ---- 8< -------------

      Save yourself 20 fork/execs:

      if echo "CA FR BE US CH GB AU IL NO NZ IP" | grep -q -w -i -e "$COUNTRY"; then
      echo $COUNTRY is AOK with me
      • by ls671 (1122017)

        if echo "CA FR BE US CH GB AU IL NO NZ IP" | grep -q -w -i -e "$COUNTRY"; then
        echo $COUNTRY is AOK with me

        Nah, this is way too slow for me; version 2 will be written in assembly because then it will be lightning fast...

      • by ls671 (1122017)

        Thanks, you made me design the optimal solution.

        On top of being written in assembly, I will even run version 2 as a daemon, so zero forks, since my daemon will be single-threaded with a single thread waiting for input.

  • by TheloniousToady (3343045) on Friday December 13, 2013 @12:08AM (#45677361)

    Some of these automated software tools are malicious - stealing data or posting ads for scams in comment sections

    Let's be clear: just because we bots like to post in comment sections doesn't mean we're malicious. And it doesn't mean we steal data or post ads [washingtonpost.com]. Some of us just want a little attention.

    I have a dream...that one day we bots will crawl a noosphere where we will not be judged by the clamor of our kin, but by the characters of our comments.

  • Nano, nano. I like article. Beep. Boop.
  • Bots rule the world (Score:2, Informative)

    by Anonymous Coward

    Most trades in the stock market are from bots as well.

  • So, why hasn't some grey hat come up with a bot killer worm? :/ /JB1

  • by Anonymous Coward

    Are you affected by the issues in this article?

    Please leave your comments

  • by jafiwam (310805) on Friday December 13, 2013 @08:30AM (#45679095) Homepage Journal
    But then again, I have China shut off.
  • idiotic math (Score:4, Insightful)

    by slashmydots (2189826) on Friday December 13, 2013 @10:35AM (#45679789)
    Wow! So if I remember correctly from past Slashdot stories, 61% of internet traffic is bots, 60% is netflix, 50% is youtube, and 42% is bittorrent. That's TRULY astonishing when you think about it. I mean, 213% is a lot!
