# Surprisingly, this robots.txt only confuses a few of the bots that come by.
# I guess that's a good thing that comes from Nutch, larbin, et al. becoming
# popular: everyone and their cats are running spiders, but at least they're
# intelligent spiders with good robots.txt parsing.

# msnbot-products comes in because msnbot is allowed, but I don't want it to.
User-agent: msnbot-products
Disallow: /

User-agent: Exabot
#User-agent: Gigabot
User-agent: Googlebot
User-agent: Googlebot-Image
User-agent: Googlebot-Mobile
User-agent: ia_archiver
User-agent: Krugle
#User-agent: MJ12bot
User-agent: MojeekBot
User-agent: MSIECrawler
User-agent: msnbot
User-agent: msnbot-media
User-agent: msnbot-NewsBlogs # Yeah, right, news or blogging here.
User-agent: Slurp
User-agent: SurveyBot # http://www.domaintools.com/'s bot.
User-agent: Teoma
User-agent: Yahoo-MMCrawler
Disallow: /%23 # I hope this works to disallow /#.
Disallow: /cgi-bin/cwcount
Disallow: /disptext/
Disallow: /junk/99bottlesofbeer.html
#Disallow: /junk/omgwtf2007/?
Disallow: /junk/omgwtf2007/omgwtf.
#Disallow: /mozilla/searchplugins/?
#Disallow: /mozilla/xpi/?
Disallow: /parts/h/banninate
Disallow: /parts/h/banninator.fcgi
Disallow: /parts/h/tg/
Disallow: /parts/h/tg2/ # I should probably just use "/parts/h/tg", huh?
Disallow: /parts/h/tg3/
Disallow: /parts/h/tg5/
Disallow: /parts/h/tgt/
#Disallow: /parts/words/?
Disallow: /xxx_robots.txt
# Do data from ia_archiver still get sold? I don't like that... I think Exalead
# might sell their data, too. Hmm.
# I commented out Gigabot due to IncrediBILL not liking some of their clients:
# http://incredibill.blogspot.com/2007/02/gigablast-to-google-content-connection.html

User-agent: *
Allow: /fistdecade.php
Disallow: /

# Hopefully nobody's going to barf on this, especially since there's no
# User-Agent. Says someone using Allow and comments.
Sitemap: http://www.mattnordhoff.com/sitemap.xml