An AI/LLM bot blocker: a web server, firewall, and robots.txt config generator used in production by the Ichido Search Engine. These configs block known large AI and LLM bots from accessing your site content while still letting classical search engines and legitimate users through. The following web servers, firewalls, and standards are supported:
Server/Firewall/Standard | Blocks by |
---|---|
iptables | IP address |
Apache | User-Agent |
Nginx | User-Agent |
Lighttpd | User-Agent |
Caddy | User-Agent |
IIS | User-Agent |
robots.txt | User-Agent |
In total there are 6 variants of config files, but you only need 2 for the minimal config (1 web server config and 1 robots.txt) or 3 for the full config (1 web server config, 1 robots.txt, and 1 firewall config). The minimal config blocks most AI bots with a low false positive rate. The full config aggressively blocks AI bots and site scrapers, but will likely produce many more false positives, because it blocks all IP addresses belonging to the large cloud vendors that most LLM bots operate from. The minimal config is recommended for most use cases.
The config files can be built manually from source, or prebuilt files can be downloaded from Ichido's file server. Prebuilt minimal config files are prefixed with `minimal-` and full config files with `full-`. Below are instructions for applying the configurations.
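The prebuilt file names used in the download commands in this README follow a `<variant>-<target>-block-ai-bots.conf` pattern. A minimal sketch of a helper that builds a download URL from that pattern; `config_url` is hypothetical (not part of this repo), it assumes the pattern holds for every prebuilt file, and not every variant/target pair is necessarily published (the firewall config is only described for the full variant):

```sh
# Hypothetical helper: print the prebuilt download URL for a config.
# Assumes the "<variant>-<target>-block-ai-bots.conf" naming seen in this README.
config_url() {
    variant="$1"   # "minimal" or "full"
    target="$2"    # e.g. robots, apache, htaccess, nginx, lighttpd, caddy, iptables
    case "$variant" in
        minimal|full) ;;
        *) echo "unknown variant: $variant" >&2; return 1 ;;
    esac
    [ -n "$target" ] || { echo "missing target" >&2; return 1; }
    printf 'https://files.ichi.do/%s-%s-block-ai-bots.conf\n' "$variant" "$target"
}
```

For example, `sudo wget "$(config_url minimal nginx)" -O /etc/nginx/modules-available/11-block-ai-bots.conf` fetches the same file as the Nginx step.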
- Download the robots.txt file and add it to the root of your web content (it should be reachable at `https://<your_site>/robots.txt`).
wget https://files.ichi.do/minimal-robots-block-ai-bots.conf -O /var/www/html/<web_root>/robots.txt
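To confirm that the downloaded file actually disallows a particular crawler, you can parse its groups. A minimal sketch in POSIX sh and awk; `robots_blocks_agent` is a hypothetical helper (not shipped with this repo) that only recognizes exact `User-agent` names and a full-site `Disallow: /`, and `GPTBot` below is just an example name, not a claim about the file's contents:

```sh
# Hypothetical helper: exit 0 if the robots.txt FILE has a group naming
# AGENT (case-insensitive) that contains "Disallow: /" (whole site blocked).
robots_blocks_agent() {
    awk -v agent="$2" '
        BEGIN { agent = tolower(agent); in_agents = 0; matched = 0; found = 0 }
        {
            line = $0
            sub(/#.*/, "", line)          # drop comments
            sub(/\r$/, "", line)          # tolerate CRLF line endings
            n = index(line, ":")
            if (n == 0) next              # blank or malformed line
            key = tolower(substr(line, 1, n - 1))
            gsub(/[ \t]/, "", key)
            val = substr(line, n + 1)
            gsub(/^[ \t]+|[ \t]+$/, "", val)
            if (key == "user-agent") {
                # A User-agent line that follows rules starts a new group.
                if (!in_agents) { matched = 0; in_agents = 1 }
                if (tolower(val) == agent) matched = 1
            } else {
                in_agents = 0
                if (key == "disallow" && val == "/" && matched) found = 1
            }
        }
        END { exit (found ? 0 : 1) }
    ' "$1"
}
```

Usage: `robots_blocks_agent /var/www/html/<web_root>/robots.txt GPTBot && echo "GPTBot is disallowed"`.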
For shared hosting, use the `.htaccess` file instructions below.
- Enable the rewrite module.
sudo a2enmod rewrite
- Download the config file into Apache's `conf-available` directory:
sudo wget https://files.ichi.do/minimal-apache-block-ai-bots.conf -O /etc/apache2/conf-available/block-ai-bots.conf
- Create a symbolic link to the config in `/etc/apache2/conf-enabled/`:
sudo ln -s /etc/apache2/conf-available/block-ai-bots.conf /etc/apache2/conf-enabled/
- Restart Apache.
sudo service apache2 restart
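After restarting, you can check the result by requesting a page with a crawler's User-Agent. A minimal sketch using `curl`; `agent_gets_status` is a hypothetical helper, the 403 default assumes the blocker answers blocked requests with Forbidden (pass another code if yours differs), and `GPTBot` is only an example of a User-Agent you expect to be blocked. The same check works for any of the web servers in this README:

```sh
# Hypothetical helper: succeed if SITE answers a request sent with
# User-Agent AGENT with HTTP status EXPECTED (default 403, an assumption).
agent_gets_status() {
    site="$1"
    agent="$2"
    expected="${3:-403}"
    # -s: no progress output, -o /dev/null: discard the body,
    # -w: print only the HTTP status code ("000" if the request failed).
    code=$(curl -s -o /dev/null -w '%{http_code}' -A "$agent" "$site")
    [ "$code" = "$expected" ]
}
```

For example, `agent_gets_status http://localhost/ GPTBot && echo blocked` and `agent_gets_status http://localhost/ "Mozilla/5.0" 200 && echo allowed` check both sides of the filter.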
- Download the config file into the directory that contains your `.htaccess` file.
wget https://files.ichi.do/minimal-htaccess-block-ai-bots.conf
- Merge the config into your existing `.htaccess` file, either manually with your hosting provider's tools or with these commands:
cat .htaccess minimal-htaccess-block-ai-bots.conf > temp.conf
mv temp.conf .htaccess
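The `cat`/`mv` pair above joins the last line of `.htaccess` to the first line of the snippet if `.htaccess` does not end with a newline, and running it twice appends the rules twice. A minimal sketch of a safer merge; `merge_htaccess` is a hypothetical helper that keeps a `.htaccess.bak` backup, and the `# BEGIN/# END Ichido AI Bot Blocker` marker comments are an assumed convention of this sketch, not part of the upstream file:

```sh
# Hypothetical helper: append SNIPPET to HTACCESS once, keeping a backup.
# The BEGIN/END marker comments are our own convention, used to detect
# an earlier merge so repeated runs do not duplicate the rules.
merge_htaccess() {
    htaccess="$1"
    snippet="$2"
    begin="# BEGIN Ichido AI Bot Blocker"
    end="# END Ichido AI Bot Blocker"
    touch "$htaccess"                        # create it if it does not exist
    if grep -qF "$begin" "$htaccess"; then
        return 0                             # already merged; nothing to do
    fi
    cp "$htaccess" "$htaccess.bak"           # keep the original
    {
        cat "$htaccess.bak"
        printf '\n%s\n' "$begin"             # leading newline avoids joining lines
        cat "$snippet"
        printf '\n%s\n' "$end"               # same guard after the snippet
    } > "$htaccess.tmp" && mv "$htaccess.tmp" "$htaccess"
}
```

Usage: `merge_htaccess .htaccess minimal-htaccess-block-ai-bots.conf`.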
- Download the config file into nginx's `modules-available` directory:
sudo wget https://files.ichi.do/minimal-nginx-block-ai-bots.conf -O /etc/nginx/modules-available/11-block-ai-bots.conf
- Include the config in your `server` blocks, after the `listen` directives:
# Ichido AI Bot Blocker.
include /etc/nginx/modules-available/11-block-ai-bots.conf;
- Restart nginx.
sudo service nginx restart
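If you host several sites, it is easy to miss a `server` block. A minimal sketch that lists site files lacking the include line; `missing_blocker_include` is hypothetical, assumes the Debian-style `/etc/nginx/sites-enabled/` layout, and does a plain text match, so a commented-out include still counts as present:

```sh
# Hypothetical helper: list nginx site files in DIR that never mention the
# blocker include. A file with several server blocks passes if any one of
# them has the include.
missing_blocker_include() {
    dir="${1:-/etc/nginx/sites-enabled}"
    needle="include /etc/nginx/modules-available/11-block-ai-bots.conf;"
    for f in "$dir"/*; do
        [ -f "$f" ] || continue              # skip non-files and empty globs
        # -q: quiet, -F: match the include line as a fixed string.
        grep -qF "$needle" "$f" || printf '%s\n' "$f"
    done
}
```

No output means every site file has the include. Running `sudo nginx -t` before the next restart also catches syntax errors in the edited files.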
- Download the config file into lighttpd's `conf-available` directory:
sudo wget https://files.ichi.do/minimal-lighttpd-block-ai-bots.conf -O /etc/lighttpd/conf-available/11-block-ai-bots.conf
- Create a symbolic link to the config in `/etc/lighttpd/conf-enabled/`:
sudo ln -s /etc/lighttpd/conf-available/11-block-ai-bots.conf /etc/lighttpd/conf-enabled/
- Restart lighttpd.
sudo service lighttpd restart
- Make directories to store caddy config files.
sudo mkdir -p /etc/caddy/conf-available/
sudo mkdir -p /etc/caddy/conf-enabled/
- Download the config file into `/etc/caddy/conf-available/`:
sudo wget https://files.ichi.do/minimal-caddy-block-ai-bots.conf -O /etc/caddy/conf-available/11-block-ai-bots.conf
- Create a symbolic link to the config in `/etc/caddy/conf-enabled/`:
sudo ln -s /etc/caddy/conf-available/11-block-ai-bots.conf /etc/caddy/conf-enabled/
- Import the config file in your site blocks. For example:
# Ichido AI Bot Blocker.
:80 {
import /etc/caddy/conf-enabled/11-block-ai-bots.conf
}
- Restart caddy.
sudo service caddy restart
TODO
- Install `iptables-persistent`.
sudo apt-get install -y iptables-persistent
- Download the config file to `/etc/iptables/rules.v4` (this replaces any rules already saved in that file):
sudo wget https://files.ichi.do/full-iptables-block-ai-bots.conf -O /etc/iptables/rules.v4
- Restart `netfilter-persistent`, the service `iptables-persistent` uses to load the saved rules:
sudo service netfilter-persistent restart
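Because the full config blocks whole cloud vendor ranges, a legitimate client can get caught. A minimal sketch for checking which saved rules cover a given IPv4 address; `rules_matching_ip` is hypothetical and assumes rules in `iptables-save` format that name sources with `-s`/`--source` as an address or CIDR (reading `/etc/iptables/rules.v4` may require root):

```sh
# Hypothetical helper: print the rules in FILE (iptables-save format) whose
# source address or CIDR (-s / --source) contains the IPv4 address IP.
# Negated sources ("! -s ...") are not treated specially; IPv6 is ignored.
rules_matching_ip() {
    awk -v ip="$2" '
        # Convert dotted-quad "a.b.c.d" to a 32-bit integer.
        function to_int(addr,    p) {
            split(addr, p, ".")
            return ((p[1] * 256 + p[2]) * 256 + p[3]) * 256 + p[4]
        }
        BEGIN { target = to_int(ip) }
        {
            for (i = 1; i < NF; i++) {
                if ($i != "-s" && $i != "--source") continue
                split($(i + 1), c, "/")
                if (c[1] !~ /^[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+$/) continue
                bits = (c[2] == "") ? 32 : c[2] + 0
                size = 2 ^ (32 - bits)       # number of addresses in the block
                # Same block if both addresses agree on the top "bits" bits.
                if (int(to_int(c[1]) / size) == int(target / size)) { print; break }
            }
        }
    ' "$1"
}
```

Usage: `rules_matching_ip /etc/iptables/rules.v4 203.0.113.42` prints any rule that would drop that client; no output means none of the `-s` rules cover it.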
For ease of contribution, this repo is hosted on GitHub and mirrored on Ichido's Software Forge. If you have a GitHub account, you can contribute using GitHub's standard workflow; if you do not, you can still contribute by emailing patches using the workflow below:
- Clone this repo:
git clone https://git.ichi.do/anthony/ai-bot-blocker
cd ai-bot-blocker
- Set your name and email address in the local clone:
git config user.name "<name>"
git config user.email "<email>"
- Make changes to the source code.
- Add those changes and commit:
git add .
git commit -m "ADD: new commit."
- Create a patch file from the new commit:
# Use HEAD~1 for 1 commit, HEAD~2 for 2 commits, etc.
git diff HEAD~1 > diff.patch
- Email the patch file to <anthony.m.mancini@protonmail.com>.
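A plain `git diff` carries only the code changes. If you would rather send your commit message and authorship along with them, `git format-patch` produces a mail-ready patch that can be applied with `git am`; this is an optional alternative, not a requirement of this project. A minimal sketch, where `make_patch` is a hypothetical wrapper:

```sh
# Alternative to "git diff": export the last COUNT commits as mail-ready
# patches. Unlike a plain diff, the output keeps each commit's author,
# date, and message. make_patch is a hypothetical helper name.
make_patch() {
    count="${1:-1}"                  # same meaning as N in HEAD~N
    git format-patch -"$count" --stdout > diff.patch
}
```

Usage: run `make_patch 1` in the repo, then email the resulting `diff.patch` as described above.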
(C) Anthony Mancini 2024. Licensed under the AGPL-3.0 (see LICENSE.txt).
- Anthony Mancini <anthony.m.mancini@protonmail.com>