#42074 How to block unwanted bots

Posted in ‘Admin Tools for Joomla! 4 & 5’

Environment Information

Joomla! version
5.3.1
PHP version
8.2
Admin Tools version
7.8.0

Latest post by nicholas on Wednesday, 02 July 2025 07:11 CDT

esperando

Hello, I recently noticed unjustifiably high consumption of the bandwidth provided by my hosting server. Yesterday, July 1st, 186,874 MB were consumed (that is 186 thousand 874 MB, roughly 187 GB).

I don’t think this traffic could be due to human visits. From the statistics I see that most of it comes from the United States and Belgium, while I am in Greece. My guess is that either someone is scraping my site or it’s caused by AI bots collecting training data, or something similar. I’m wondering if you’ve experienced this as well and if you have any solutions to suggest.

One possible solution might be to use the Cloudflare free plan, but I’m also looking for other alternatives. Perhaps some rules in the robots.txt file like these? https://github.com/ai-robots-txt

Thank you

esperando

Update.

This is from the webserver statistics from yesterday.

The first IP is from AWS-ANTHROPIC, the second is from Microsoft, and the rest are from RIPE Network Coordination Centre.

#  Hits              Visitors   Tx. Amount        Country           Data (IP address)
1  121.382 (29.41%)  1 (0.02%)  5.45 GiB (5.31%)  US United States  216.73.216.248
2  60.793 (14.73%)   1 (0.02%)  5.4 GiB (5.26%)   US United States  20.171.207.46
3  4.298 (1.04%)     1 (0.02%)  3.73 GiB (3.63%)  BE Belgium        57.141.0.2
4  4.097 (0.99%)     1 (0.02%)  3.59 GiB (3.50%)  BE Belgium        57.141.0.18
5  4.032 (0.98%)     1 (0.02%)  3.52 GiB (3.43%)  BE Belgium        57.141.0.4
6  3.899 (0.94%)     1 (0.02%)  3.41 GiB (3.32%)  BE Belgium        57.141.0.13
7  3.750 (0.91%)     1 (0.02%)  3.29 GiB (3.20%)  BE Belgium        57.141.0.25
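To see how much of this traffic comes from the same network, the rows can be grouped by address prefix. The following is only a rough sketch: it assumes the dots in the Hits column are thousands separators (so 121.382 means 121,382 hits), it treats the first three octets of an address as a stand-in for "same network", and the helper names network_of and totals_by_network are made up for this illustration.

```python
from collections import defaultdict

# Rows copied from the statistics table: (hits, transferred GiB, IP address).
# Assumption: "121.382" in the Hits column means 121,382 hits.
ROWS = [
    (121382, 5.45, "216.73.216.248"),
    (60793, 5.40, "20.171.207.46"),
    (4298, 3.73, "57.141.0.2"),
    (4097, 3.59, "57.141.0.18"),
    (4032, 3.52, "57.141.0.4"),
    (3899, 3.41, "57.141.0.13"),
    (3750, 3.29, "57.141.0.25"),
]


def network_of(ip: str) -> str:
    """Return the /24 prefix of an IPv4 address, e.g. 57.141.0.0/24."""
    return ".".join(ip.split(".")[:3]) + ".0/24"


def totals_by_network(rows):
    """Sum hits and transferred GiB per /24 prefix."""
    totals = defaultdict(lambda: [0, 0.0])
    for hits, gib, ip in rows:
        bucket = totals[network_of(ip)]
        bucket[0] += hits
        bucket[1] += gib
    return {net: (hits, round(gib, 2)) for net, (hits, gib) in totals.items()}


if __name__ == "__main__":
    by_traffic = sorted(totals_by_network(ROWS).items(), key=lambda kv: -kv[1][1])
    for net, (hits, gib) in by_traffic:
        print(f"{net:18} {hits:>8} hits  {gib:6.2f} GiB")
```

Grouped this way, the five Belgian addresses form a single 57.141.0.x block that together transferred 17.54 GiB, more than either of the two US addresses on its own.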

nicholas
Akeeba Staff
Manager

I think you've pretty much answered your question. These are all search engine bots.

Technically, you CAN add the user agents to the .htaccess Maker's list of user agents to block (enabling the feature to block specific user agents first, of course).

I do not recommend it, though. This will make your site disappear. If the AI services cannot access your site, any queries made to them will not include results from your site – which is a big problem, as people have started moving away from traditional search and into AI-assisted searching. If you block the indexing bots of legitimate search engines it will be even worse, as you will essentially be telling search engines not to look into your site, so they will be unable to return any search results which include your site. Essentially, your site will be invisible to the Internet unless someone explicitly types a URL to your site into their browser's address bar, or follows a link on another page. Since you have an e-shop, that would be tantamount to business suicide.

Yes, the bandwidth all these services consume is annoyingly high. The best solution I could recommend is putting a CDN like BunnyCDN (EU-based) or CloudFlare (US-based) in front of your site AND enabling Joomla's System - Page Cache plugin AND Joomla's cache in the Global Configuration. This will make the public pages of your site behave like a more or less static site which is only updated every so often, as defined in the Joomla cache settings. Set the cache time to 1440 minutes to cache your site's public pages for an entire day. The upside is that most of the traffic coming from these services will be served from the CDN's cache, without hitting your server. The downside is that every time you change something on your site you'll have to go to your CDN to invalidate the cache if you want the changes to be shown to real users immediately.

Nicholas K. Dionysopoulos

Lead Developer and Director

🇬🇷Greek: native 🇬🇧English: excellent 🇫🇷French: basic • 🕐 My time zone is Europe / Athens
Please keep in mind my timezone and cultural differences when reading my replies. Thank you!

esperando

I changed my robots.txt like this, to allow the Google and Bing search bots and disallow the AI bots. So I will not appear in ChatGPT's results, but I will appear in Google and Bing search. Am I right?

# Block Anthropic's Claude bot
User-agent: anthropic-ai
Disallow: /

# Block Common Crawl (used by many LLMs)
User-agent: CCBot
Disallow: /

# Block Amazonbot (often associated with AI data collection)
User-agent: Amazonbot
Disallow: /

# Block GPTBot (OpenAI's web crawler)
User-agent: GPTBot
Disallow: /

# Block facebookexternalhit (Meta's link-preview fetcher for shared links, not an AI training crawler)
User-agent: facebookexternalhit
Disallow: /

# Block Bytespider (used by ByteDance/TikTok AI)
User-agent: Bytespider
Disallow: /

 

# Allow Google search
User-agent: Googlebot
Allow: /*.js*
Allow: /*.css*
Allow: /*.png*
Allow: /*.jpg*
Allow: /*.gif*
Disallow: /administrator/
Disallow: /api/
Disallow: /bin/
Disallow: /cache/
Disallow: /cli/
Disallow: /includes/
Disallow: /installation/
Disallow: /language/
Disallow: /layouts/
Disallow: /libraries/
Disallow: /logs/
Disallow: /tmp/

# Allow Bing search
User-agent: Bingbot
Allow: /*.js*
Allow: /*.css*
Allow: /*.png*
Allow: /*.jpg*
Allow: /*.gif*
Disallow: /administrator/
Disallow: /api/
Disallow: /bin/
Disallow: /cache/
Disallow: /cli/
Disallow: /includes/
Disallow: /installation/
Disallow: /language/
Disallow: /layouts/
Disallow: /libraries/
Disallow: /logs/
Disallow: /tmp/
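
Rules like these can be sanity-checked offline before uploading them. Below is a minimal sketch using Python's standard urllib.robotparser on an abbreviated copy of the file. Some caveats: can_fetch_map is a hypothetical helper written for this illustration; the name "ClaudeBot" is an assumption about how Anthropic's crawler identifies itself and should be checked against your own access log; and Python's parser treats a blank line as the end of a group and, as far as I know, does not evaluate wildcard patterns such as /*.js* the way Google does, so those Allow lines are left out here.

```python
from urllib.robotparser import RobotFileParser

# Abbreviated copy of the rules, with no blank lines inside a group
# because Python's parser ends a group at the first blank line.
ROBOTS_TXT = """\
User-agent: anthropic-ai
Disallow: /

User-agent: GPTBot
Disallow: /

User-agent: Googlebot
Disallow: /administrator/
Disallow: /tmp/

User-agent: Bingbot
Disallow: /administrator/
Disallow: /tmp/
"""


def can_fetch_map(robots_txt: str, agents, path: str) -> dict:
    """Map each user-agent name to whether the rules let it fetch `path`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {agent: parser.can_fetch(agent, path) for agent in agents}


if __name__ == "__main__":
    # "ClaudeBot" is an assumed user-agent name, not taken from the file.
    agents = ["GPTBot", "anthropic-ai", "ClaudeBot", "Googlebot", "Bingbot"]
    for path in ("/", "/administrator/"):
        print(path, can_fetch_map(ROBOTS_TXT, agents, path))
```

With these rules, a crawler announcing itself under a name that is not listed (the assumed "ClaudeBot" in this sketch) falls through every group and is allowed, because the file has no catch-all "User-agent: *" group. If the goal is to block a specific crawler, its exact user-agent token has to appear in the file.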

nicholas
Akeeba Staff
Manager

I believe that is correct.
