anthmn/ai-bot-blocker

Ichido AI And LLM Bot Blocker

The AI/LLM bot blocker web server, firewall, and robots.txt config generator used in production by the Ichido Search Engine. These configs block known large AI and LLM bots from accessing your site content, while still allowing classical search engines and legitimate users to access content. Supports the following web servers, firewalls, and standards:

| Server/Firewall | Blocked      |
|-----------------|--------------|
| Iptables        | IP Addresses |
| Apache          | User-Agent   |
| Nginx           | User-Agent   |
| Lighttpd        | User-Agent   |
| Caddy           | User-Agent   |
| IIS             | User-Agent   |
| Robots.txt      | User-Agent   |

In total there are 6 variants of config files, of which you'll need only 2 for the minimal config (1 web server config and 1 robots.txt) or 3 for the full config (1 web server config, 1 robots.txt, and 1 firewall config). The minimal config blocks most AI bots with a low false positive rate. The full config blocks AI bots and site scrapers more aggressively, but will likely produce many more false positives, because it blocks all IP addresses from the large cloud vendors that most LLMs operate on. The minimal config is recommended for most use cases.

The config files can be built manually from source, or prebuilt files can be downloaded from Ichido's file server. Minimal config prebuilt files are prefixed with minimal- and full files with full-. Below are instructions for applying the configurations.
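The naming scheme above can be sketched as a small helper that builds a download URL from a variant and a target. This is only an illustration: the function name `config_url` is hypothetical, and it assumes every prebuilt file follows the `<variant>-<target>-block-ai-bots.conf` pattern seen in the download commands in this README.

```shell
#!/bin/sh
# Build the download URL for a prebuilt config file.
# Assumption: every file is named <variant>-<target>-block-ai-bots.conf.
config_url() {
    variant="$1"   # minimal or full
    target="$2"    # e.g. robots, apache, htaccess, nginx, lighttpd, caddy, iptables
    case "$variant" in
        minimal|full) ;;
        *) echo "unknown variant: $variant" >&2; return 1 ;;
    esac
    printf 'https://files.ichi.do/%s-%s-block-ai-bots.conf\n' "$variant" "$target"
}

# Prints https://files.ichi.do/minimal-nginx-block-ai-bots.conf
config_url minimal nginx
```

Not every variant/target combination is necessarily published, so check that a URL exists before relying on it.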

Usage

Step 1. Download An AI Bot Blocker Robots.txt.

  1. Download the robots.txt file and add it to the root of your web content (it should be reachable at https://<your_site>/robots.txt).
wget https://files.ichi.do/minimal-robots-block-ai-bots.conf -O /var/www/html/<web_root>/robots.txt
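After the download, it can be useful to confirm that the file really disallows a crawler you care about. The sketch below is a rough check, not part of the project: the function name `robots_blocks_agent` is hypothetical, `GPTBot` is only an illustrative user agent, and the check assumes the usual layout where a `User-agent:` line is followed by `Disallow: /` in the same group.

```shell
#!/bin/sh
# Succeed if robots.txt contains a group naming the given user agent
# together with "Disallow: /". Only groups that name the agent explicitly
# are considered; a wildcard "*" group is not treated as a match.
robots_blocks_agent() {
    file="$1"
    agent="$2"
    awk -v agent="$agent" '
        { sub(/\r$/, ""); line = tolower($0) }
        line ~ /^[ \t]*user-agent[ \t]*:/ {
            if (seen_rule) { in_group = 0; seen_rule = 0 }   # a new group starts
            name = line
            sub(/^[^:]*:[ \t]*/, "", name)
            sub(/[ \t]+$/, "", name)
            if (name == tolower(agent)) in_group = 1
            next
        }
        line ~ /^[ \t]*(allow|disallow)[ \t]*:/ {
            seen_rule = 1
            if (in_group && line ~ /^[ \t]*disallow[ \t]*:[ \t]*\/[ \t]*$/) found = 1
        }
        END { exit(found ? 0 : 1) }
    ' "$file"
}
```

For example, `robots_blocks_agent /var/www/html/<web_root>/robots.txt GPTBot && echo blocked` prints `blocked` only when such a group is present.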

Step 2. Download An AI Bot Blocker Web Server Config.

Apache

For shared hosting, use the .htaccess file instructions below.

  1. Enable the rewrite module.
sudo a2enmod rewrite
  2. Download the config file into Apache's conf-available directory:
sudo wget https://files.ichi.do/minimal-apache-block-ai-bots.conf -O /etc/apache2/conf-available/block-ai-bots.conf
  3. Create a symbolic link to the config in /etc/apache2/conf-enabled/:
sudo ln -s /etc/apache2/conf-available/block-ai-bots.conf /etc/apache2/conf-enabled/
  4. Restart Apache.
sudo service apache2 restart

.htaccess

  1. Download the config file.
sudo wget https://files.ichi.do/minimal-htaccess-block-ai-bots.conf
  2. Merge the config with your existing .htaccess file, either manually using your hosting provider's tools or with these commands:
cat .htaccess minimal-htaccess-block-ai-bots.conf > temp.conf
mv temp.conf .htaccess
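Running the merge commands twice would append the rules twice. A more defensive variant, sketched below, keeps a backup and skips the merge if it has already been done. The function name `merge_htaccess` and the `# BEGIN ai-bot-blocker` / `# END ai-bot-blocker` marker comments are hypothetical and not part of the downloaded file.

```shell
#!/bin/sh
# Append a downloaded rules file to .htaccess once, keeping a backup.
merge_htaccess() {
    rules="$1"                     # e.g. minimal-htaccess-block-ai-bots.conf
    target="${2:-.htaccess}"
    marker="# BEGIN ai-bot-blocker"   # hypothetical marker used to detect earlier merges

    [ -f "$rules" ] || { echo "missing rules file: $rules" >&2; return 1; }
    [ -f "$target" ] || : > "$target"   # start from an empty file if none exists

    # Skip if an earlier run already merged the rules.
    if grep -qxF "$marker" "$target"; then
        echo "already merged"
        return 0
    fi

    cp "$target" "$target.bak"          # keep a copy to roll back to
    {
        printf '\n%s\n' "$marker"
        cat "$rules"
        printf '%s\n' "# END ai-bot-blocker"
    } >> "$target"
    echo "merged"
}
```

Called as `merge_htaccess minimal-htaccess-block-ai-bots.conf`, it prints `merged` the first time and `already merged` on later runs, and the previous file stays available as `.htaccess.bak`.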

Nginx

  1. Download the config file into nginx's modules-available directory:
sudo wget https://files.ichi.do/minimal-nginx-block-ai-bots.conf -O /etc/nginx/modules-available/11-block-ai-bots.conf
  2. Include the config in your server blocks after the listen directives:
# Ichido AI Bot Blocker.
include /etc/nginx/modules-available/11-block-ai-bots.conf;
  3. Restart nginx.
sudo service nginx restart

Lighttpd

  1. Download the config file into lighttpd's conf-available directory:
sudo wget https://files.ichi.do/minimal-lighttpd-block-ai-bots.conf -O /etc/lighttpd/conf-available/11-block-ai-bots.conf
  2. Create a symbolic link to the config in /etc/lighttpd/conf-enabled/:
sudo ln -s /etc/lighttpd/conf-available/11-block-ai-bots.conf /etc/lighttpd/conf-enabled/
  3. Restart lighttpd.
sudo service lighttpd restart

Caddy

  1. Make directories to store Caddy config files.
sudo mkdir -p /etc/caddy/conf-available/
sudo mkdir -p /etc/caddy/conf-enabled/
  2. Download the config file into /etc/caddy/conf-available/:
sudo wget https://files.ichi.do/minimal-caddy-block-ai-bots.conf -O /etc/caddy/conf-available/11-block-ai-bots.conf
  3. Create a symbolic link to the config in /etc/caddy/conf-enabled/:
sudo ln -s /etc/caddy/conf-available/11-block-ai-bots.conf /etc/caddy/conf-enabled/
  4. Import the config file in your site blocks. For example:
# Ichido AI Bot Blocker.
:80 {
    import /etc/caddy/conf-enabled/11-block-ai-bots.conf
}
  5. Restart Caddy.
sudo service caddy restart

IIS

TODO
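Once any of the Step 2 web server configs above is in place, one way to check it is to request a page with a blocked User-Agent and look at the HTTP status. The sketch below is only an illustration: the function names are hypothetical, `GPTBot` is an example user agent, and treating any 4xx status as "blocked" is an assumption, since this README does not state which status the configs return.

```shell
#!/bin/sh
# Print the HTTP status a URL returns for a given User-Agent string.
http_status_for_agent() {
    url="$1"
    agent="$2"
    curl -s -o /dev/null -w '%{http_code}' -A "$agent" "$url"
}

# Succeed when the request was refused. Treating any 4xx as "blocked" is an
# assumption; check which status your chosen config actually returns.
is_blocked() {
    status=$(http_status_for_agent "$1" "$2")
    case "$status" in
        4??) return 0 ;;
        *)   return 1 ;;
    esac
}
```

For example, `is_blocked https://<your_site>/ GPTBot && echo blocked` should print `blocked`, while the same call with an ordinary browser User-Agent should not.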

(Optional) Step 3. Download An AI Bot Blocker Firewall Config.

Iptables

  1. Install iptables-persistent.
sudo apt-get install -y iptables-persistent
  2. Download the config file into /etc/iptables/rules.v4:
sudo wget https://files.ichi.do/full-iptables-block-ai-bots.conf -O /etc/iptables/rules.v4
  3. Reload the saved rules (on Debian and Ubuntu, iptables-persistent is managed through the netfilter-persistent service).
sudo service netfilter-persistent restart
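Because the download command writes directly to /etc/iptables/rules.v4, it replaces any rules already saved there, so it may be worth inspecting the file before loading it. The sketch below lists the source addresses a rules file would drop. It assumes the file uses iptables-save syntax with rules of the form `-A INPUT -s <address> -j DROP`, which this README does not confirm, and the function name `dropped_sources` is hypothetical.

```shell
#!/bin/sh
# List the source addresses a rules file would drop.
# Assumes iptables-save syntax, e.g.: -A INPUT -s 203.0.113.0/24 -j DROP
dropped_sources() {
    awk '
        $1 == "-A" {
            src = ""; drop = 0
            for (i = 2; i <= NF; i++) {
                if ($i == "-s" && i < NF) src = $(i + 1)
                if ($i == "-j" && i < NF && $(i + 1) == "DROP") drop = 1
            }
            if (drop && src != "") print src
        }
    ' "$1"
}
```

Running `dropped_sources /etc/iptables/rules.v4 | wc -l` gives a quick count of blocked ranges, and piping the output through `grep` lets you check whether an address range you depend on is affected.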

Contributing

For ease of contribution, this repo is hosted on GitHub and mirrored on Ichido's Software Forge. If you have a GitHub account, you can contribute using GitHub's standard workflow. If you do not, you can still contribute via email patches using the workflow below:

  1. Clone this repo:
git clone https://git.ichi.do/anthony/ai-bot-blocker
cd ai-bot-blocker
  2. Add your name and an email address to the locally cloned repo:
git config user.name "<name>"
git config user.email "<email>"
  3. Make changes to the source code.
  4. Add those changes and commit:
git add .
git commit -m "ADD: new commit."
  5. Create a patch file from the new commit:
# Use HEAD~1 for 1 commit, HEAD~2 for 2 commits, etc.
git diff HEAD~1 > diff.patch
  6. Send the patch file to <anthony.m.mancini@protonmail.com> through email.

License

(C) Anthony Mancini 2024. Licensed under the AGPL-3.0 (see LICENSE.txt).

Contact
