One of my web sites has been spidered by the e-SocietyRobot spider. Its web site is at http://www.yama.info.waseda.ac.jp/~yamana/es/, slightly more legible when run through Babelfish. e-SocietyRobot is not a search engine. It hit 4,549 pages on my site — no MP3 files, luckily — but still used 51MB of traffic. It appears to be some unknown research project attempting to spider the web, and they have no plans to make their indexed pages available. So should I try to block that robot? I of course want the search engines to spider my sites, but I don’t want to help some anonymous “research” project. Maybe they are spammers. Maybe they are going to use my site to feed some splogs.
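If I do decide to block it, the standard approach is a robots.txt rule targeting that user-agent. This is just a sketch — it assumes the spider actually honors robots.txt and identifies itself with the user-agent string "e-SocietyRobot", which I haven’t verified from its request logs:

```
# robots.txt — block only the e-SocietyRobot spider
# (assumes it sends "e-SocietyRobot" as its User-agent and obeys robots.txt)
User-agent: e-SocietyRobot
Disallow: /

# Everyone else (Google, Yahoo, etc.) can keep crawling
User-agent: *
Disallow:
```

Of course, robots.txt is purely voluntary; if the robot ignores it, the fallback would be blocking it server-side by user-agent or IP range.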
On a related issue, I wish that spiders would give an accurate referrer. Even if the referrer was another page in my own site, it would be useful to know where they are coming from. Does anyone know why they don’t?