How to Block Bad Bots from Accessing a Site

Generally speaking, not all bots are bad. However, there are so many crawlers, scrapers, and other kinds of bots out there that it is almost impossible to stop all of them from accessing your site. In most cases these bots add no value to your website; on the contrary, they create more problems than they solve.

Their blatant disregard for robots.txt is well known among web developers and anyone who pays serious attention to the issue. There are, however, ways to stop these unwanted visitors. If you are running the Apache web server, you can get help from .htaccess. In this post I will share a simple snippet that lets you put an end to these bots. You can get creative with the snippet afterwards. So, let's get started.

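<IfModule mod_rewrite.c>
RewriteEngine on
# Match requests whose User-Agent starts with any of these names.
RewriteCond %{HTTP_USER_AGENT} ^Go!Zilla [OR]
RewriteCond %{HTTP_USER_AGENT} ^Octopus [OR]
RewriteCond %{HTTP_USER_AGENT} ^Widow
# Answer every matching request with 403 Forbidden.
RewriteRule ^.* - [F,L]
</IfModule>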

Now copy this snippet and paste it into your site's .htaccess file, then upload the file to your server. Each RewriteCond matches a User-Agent header that starts with the given name, and the RewriteRule answers any matching request with 403 Forbidden. Keep in mind that I am blocking only a few user agents here to demonstrate the snippet; the list can get really, really big. Below you will find a few sites that maintain databases of bots. Many of their entries are outdated, so use this snippet wisely. Rather than blocking thousands of bots, open your server or visitor log and find the ones that actually visit your site. This is time-consuming, but you will definitely benefit from it.
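If you would rather not read the log by eye, a small script can count how often each user agent appears. The following is only a rough Python sketch, not part of the original snippet. It assumes your log uses Apache's "combined" format, where the User-Agent is the last double-quoted field on each line, and the function names are made up for illustration.

```python
import re
from collections import Counter

# Assumption: Apache "combined" log format, in which the User-Agent
# is the last double-quoted field on each line.
USER_AGENT_AT_END = re.compile(r'"([^"]*)"\s*$')


def count_user_agents(lines):
    """Return a Counter mapping each User-Agent string to its request count."""
    counts = Counter()
    for line in lines:
        match = USER_AGENT_AT_END.search(line)
        if match:  # lines without a trailing quoted field are skipped
            counts[match.group(1)] += 1
    return counts


def top_user_agents(path, limit=20):
    """Read the log file at `path` and return the `limit` busiest user agents."""
    with open(path, encoding="utf-8", errors="replace") as log:
        return count_user_agents(log).most_common(limit)
```

Calling top_user_agents with the path to your own access log returns (user agent, hits) pairs, busiest first. Anything that makes a lot of requests and does not look like a browser or a search engine you care about is a candidate for the block list. User agents that themselves contain double quotes will not be parsed correctly by this simple pattern.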

Note: There are thousands of bots out there, and a number of websites maintain databases dedicated to listing newly found bots. Sites like BotReports, RobotsTXT, User-Agents, UserAgentString, and Udger list a large number of user agents. However, I would suggest checking your own server or visitor log and adding the bots you find there to your .htaccess file. You get the idea.
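Once you have your own list of names, typing the rules by hand gets tedious and it is easy to forget an [OR]. Here is a minimal Python sketch that prints rules in the same shape as the snippet above. The function name build_block_rules is hypothetical, and it assumes that names escaped by Python's re.escape are also valid patterns for Apache, so check the output before you upload it.

```python
import re


def build_block_rules(agent_prefixes):
    """Return .htaccess text that answers 403 to the given User-Agent prefixes."""
    # Escape regex metacharacters so each name is matched literally.
    # Assumption: the escaped form is also accepted by Apache's regex engine.
    patterns = [re.escape(prefix) for prefix in agent_prefixes if prefix]
    if not patterns:
        return ""  # nothing to block, so emit no rules at all

    lines = ["<IfModule mod_rewrite.c>", "RewriteEngine on"]
    for index, pattern in enumerate(patterns):
        # Every condition except the last one is chained with [OR].
        suffix = "" if index == len(patterns) - 1 else " [OR]"
        lines.append(f"RewriteCond %{{HTTP_USER_AGENT}} ^{pattern}{suffix}")
    lines.append("RewriteRule ^.* - [F,L]")
    lines.append("</IfModule>")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Example names taken from the snippet in this post.
    print(build_block_rules(["Octopus", "Widow"]), end="")
```

Leaving [OR] off the last condition matters: with it, the final condition would be OR-ed with nothing and the rule would not behave as intended.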

Resources: mod_rewrite, Rewrite rules
