Thursday 18 August 2011

Social Robot Invasion


Robot Wars

Am I at war with the robots or should I embrace them?

I recently released a set of 42 Free Twitter Templates and set about promoting it through Twitter and Facebook.

Knowing it would bring more visitors to see my other web design work, I thought it well worth investing the three days it took to create the templates and implement the download system on my website.

To monitor the results I installed a new visitor counter the week before release so that I could instantly see what was going on, and the results of the promotion were quite astounding for a site that had been attracting only around a dozen visitors per day. With the clock set at the 1956 previous visitors, the counter went in and the site suddenly jumped to around 100 visits per day even before the promotion launched. Late one afternoon I sent out the first Tweet, and the results came in quickly: by midnight the clock had reached 361 visits in a single day! The following day the site achieved 1117 visitors, surpassing all my expectations, so as you would expect I should be a happy man!

However, one thing I noticed was that the bots on the site were outnumbering the human visitors, and although they are not counted as visitors, it does make me wonder just who is behind all these bots and what their purpose is.
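One reason counters can tell bots and humans apart is that most bots identify themselves in the User-Agent header of each request. A minimal sketch of how a log analyser might do this; the sample strings and the list of bot tokens below are my own illustration, not from any real server log:

```python
# Toy bot detector: flag requests whose User-Agent contains a known bot token.
# The tokens and sample strings are illustrative examples only.
KNOWN_BOT_TOKENS = ["googlebot", "bingbot", "slurp", "spider", "crawler"]

def is_bot(user_agent):
    """Return True if the User-Agent string looks like a known crawler."""
    ua = user_agent.lower()
    return any(token in ua for token in KNOWN_BOT_TOKENS)

sample_user_agents = [
    "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/534.30",
    "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)",
    "SomeScraper/0.1 (crawler)",
]

for ua in sample_user_agents:
    print(("BOT" if is_bot(ua) else "HUMAN"), ua)
```

Of course, this only catches the polite bots; a malicious one can send any User-Agent string it likes, which is part of why knowing who is really behind them is so hard.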

I've used this promotion combination on my other websites before, but never with such instant success.  On the human side, I realise that offering Free Twitter Templates to Twitter users was bang on the money as far as target marketing is concerned, but what was the key that drew so many bots?  Is it the word "FREE"?  If so, why are people setting up these bots to trawl for specific keywords?  I can only assume they are, but I'd be interested to know a little more if anyone can tell me just how they are targeted.



From what I've found out already, the good guys we want to attract to our web pages are the search engines, for they are the ones that bring us visitor traffic. However, there are bad ones too.

The following is taken from Wikipedia:
A Web crawler (BOT) is a computer program that browses the World Wide Web in a methodical, automated manner or in an orderly fashion. Other terms for Web crawlers are ants, automatic indexers, bots, Web spiders, Web robots, or Web scutters.

This process is called Web crawling or spidering. Many sites, in particular search engines, use spidering as a means of providing up-to-date data. Web crawlers are mainly used to create a copy of all the visited pages for later processing by a search engine that will index the downloaded pages to provide fast searches. Crawlers can also be used for automating maintenance tasks on a Web site, such as checking links or validating HTML code. Also, crawlers can be used to gather specific types of information from Web pages, such as harvesting e-mail addresses (usually for sending spam).
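Well-behaved crawlers of the kind described above also consult a site's robots.txt file before fetching pages, which is how a site owner signals what they may and may not index. A small sketch using Python's standard-library parser; the rules shown are a made-up example, not from any real site:

```python
from urllib.robotparser import RobotFileParser

# A made-up robots.txt: everyone is barred from /private/, and one
# hypothetical scraper is barred from the whole site.
rules = """\
User-agent: *
Disallow: /private/

User-agent: BadScraper
Disallow: /
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())

print(parser.can_fetch("Googlebot", "http://example.com/index.html"))   # allowed
print(parser.can_fetch("Googlebot", "http://example.com/private/x"))    # blocked
print(parser.can_fetch("BadScraper", "http://example.com/index.html"))  # blocked
```

The catch, again, is that robots.txt is purely advisory: the polite search-engine spiders honour it, while the email harvesters and scrapers simply ignore it.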

A Web crawler is one type of bot, or software agent. In general, it starts with a list of URLs to visit, called the seeds. As the crawler visits these URLs, it identifies all the hyperlinks in the page and adds them to the list of URLs to visit, called the crawl frontier. URLs from the frontier are recursively visited according to a set of policies.
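The seed-and-frontier process described above can be sketched in a few lines. Here the "Web" is a hypothetical in-memory link graph rather than real HTTP fetches:

```python
from collections import deque

# Hypothetical link graph standing in for the real Web: page -> outgoing links.
web = {
    "a.html": ["b.html", "c.html"],
    "b.html": ["c.html", "d.html"],
    "c.html": [],
    "d.html": ["a.html"],
}

def crawl(seeds, web):
    """Visit pages breadth-first, adding newly discovered links to the frontier."""
    frontier = deque(seeds)   # the crawl frontier: URLs waiting to be visited
    visited = set()
    order = []
    while frontier:
        url = frontier.popleft()
        if url in visited:
            continue
        visited.add(url)
        order.append(url)
        for link in web.get(url, []):   # "identify all the hyperlinks in the page"
            if link not in visited:
                frontier.append(link)
    return order

print(crawl(["a.html"], web))  # ['a.html', 'b.html', 'c.html', 'd.html']
```

A real crawler adds the "policies" the quote mentions on top of this loop: politeness delays between requests to the same host, priority ordering, and revisit schedules.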

The large volume implies that the crawler can only download a fraction of the Web pages within a given time, so it needs to prioritize its downloads. The high rate of change implies that the pages might have already been updated or even deleted.

The number of possible crawlable URLs being generated by server-side software has also made it difficult for web crawlers to avoid retrieving duplicate content. Endless combinations of HTTP GET (URL-based) parameters exist, of which only a small selection will actually return unique content. For example, a simple online photo gallery may offer three options to users, as specified through HTTP GET parameters in the URL. If there exist four ways to sort images, three choices of thumbnail size, two file formats, and an option to disable user-provided content, then the same set of content can be accessed with 48 different URLs, all of which may be linked on the site. This mathematical combination creates a problem for crawlers, as they must sort through endless combinations of relatively minor scripted changes in order to retrieve unique content.
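The 48 URLs in that example come straight from multiplying the options (4 × 3 × 2 × 2); a quick check with itertools, using made-up parameter values for the same hypothetical gallery:

```python
from itertools import product

# The gallery's four GET parameters from the example above (values invented).
sort_orders     = ["name", "date", "size", "rating"]   # 4 ways to sort
thumbnail_sizes = ["small", "medium", "large"]         # 3 thumbnail sizes
file_formats    = ["jpg", "png"]                       # 2 file formats
user_content    = ["on", "off"]                        # user-content toggle

urls = [
    "gallery?sort=%s&thumb=%s&fmt=%s&uc=%s" % combo
    for combo in product(sort_orders, thumbnail_sizes, file_formats, user_content)
]
print(len(urls))  # 48 distinct URLs, all serving the same underlying images
```

Every extra parameter multiplies the URL count again, which is why crawlers put so much effort into duplicate detection.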

Malicious use of bots is the coordination and operation of an automated attack on networked computers, such as a denial-of-service attack by a botnet. Internet bots can also be used to commit click fraud and, more recently, have seen usage in MMORPGs as computer game bots. A spambot is an internet bot that attempts to spam large amounts of content on the Internet, usually adding advertising links. There are malicious bots (and botnets) of the following types:
  • Spambots that harvest email addresses from internet forums, contact forms or guestbook pages
  • Downloader programs that suck bandwidth by downloading entire web sites
  • Web site scrapers that grab the content of web sites and re-use it without permission on automatically generated doorway pages
  • Viruses and worms
  • DDoS attacks
  • Botnets / zombie computers; etc.
  • File-name modifiers on peer-to-peer file-sharing networks. These change the names of files (often containing malware) to match user search queries.
  • Automating the entry of internet sweepstakes or instant win games to get an advantage
  • Automating tasks on promotional web sites to win prizes
  • Votebots which automatically cast votes for or against certain forms of user-contributed content, such as videos on YouTube or reader comments on blog pages.
  • Bots are also used to buy up good seats for concerts, particularly by ticket brokers who resell the tickets. Bots are employed against entertainment event-ticketing sites, like TicketMaster.com. The bots are used by ticket brokers to unfairly obtain the best seats for themselves while depriving the general public of a chance at the good seats. The bot runs through the purchase process and obtains better seats by pulling back as many seats as it can.
  • Bots are often used in massively multiplayer online role-playing games (MMORPG) to farm for resources that would otherwise take significant time or effort to obtain; this is a concern for most online in-game economies. As such, players are often banned from their respective MMORPG for going outside the programming and "cheating" as bots are not typically allowed because they give an unfair advantage.
The most widely used anti-bot technique is the use of CAPTCHA, which is a type of Turing test used to distinguish between a human user and a less-sophisticated AI-powered bot, by the use of graphically encoded human-readable text.
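A real CAPTCHA renders its text as a distorted image so that only a human can read it back; as a toy illustration of the underlying challenge-response idea only (no graphics here, and the function names are my own):

```python
import random
import string

def make_challenge(length=6):
    """Generate a random code; a real CAPTCHA would render this as a distorted image."""
    return "".join(random.choice(string.ascii_uppercase) for _ in range(length))

def verify(challenge, answer):
    """Accept the form submission only if the typed answer matches the code."""
    return answer.strip().upper() == challenge

code = make_challenge()
print(verify(code, code.lower()))  # a human who read the image correctly: True
print(verify(code, "GUESS"))       # a bot answering blindly: almost certainly False
```

The security lives entirely in the image-distortion step, which is omitted here: the hard part is making the text easy for people to read but hard for character-recognition software.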

Over to you...

------------------------------------------------------------------------

1 comment:

  1. Bots have become far more advanced than the old web-crawling scripts of yesteryear. I remember back in 2001ish, I thought these scripts running on UNIX and Linux boxes were awesome. Most did simple things like searching the web for specific content and creating autoblogs and galleries.

    Nowadays, there are people using these bots to collect data. Call it market research or spying; it's not anything new, but it's become much more pervasive now.

    Can you imagine if Google or Facebook ever went evil? With all the data they have access to, they could essentially create AI, or even pushier interruptive marketing that can guess what you want at any time, guilting or tricking you into buying.

    It's a bit scary, yes... but the writing has been on the wall for quite some time!
