#10169 DFIShield blocking Googlebot!

Posted in ‘Admin Tools for Joomla! 4 & 5’

Latest post by nicholas on Friday, 23 December 2011 03:24 CST

TurnTex
Nicholas,

Upon reviewing my WAF Security Exceptions log, I noticed that the Googlebot was blocked by DFIShield. Should this be happening? If not, what do I need to do to prevent this? I sure don't want Google getting pissed off at me!

Here is a screen shot of the exception log.

nicholas
Akeeba Staff
Manager
Hi Curtis,

First, add Googlebot's IP address to the "Never block these IPs" field in Admin Tools' Configure WAF page. This will prevent Googlebot from being accidentally blocked. Do NOT add this IP to any other whitelist.

Next, please show me the URLs! One uncommon and very tricky hacking method is for the attacker to fill a page with hack URLs and have Google index that page. The hack URLs point to your site and carry a known exploit, e.g. SQLi, DFI and so on. Google will index the page and try to follow the links. The idea is that most sites give Googlebot a "free pass" without any security check, so when Google accesses the hack URL the site could be caught off guard and hacked. The only way to know whether this is the case here is for you to show me the URLs in question.

Nicholas K. Dionysopoulos

Lead Developer and Director

Greek: native • English: excellent • French: basic • My time zone is Europe / Athens
Please keep in mind my timezone and cultural differences when reading my replies. Thank you!

TurnTex
Nicholas,

Can you tell me which URLs you are talking about and how to find them? If you mean the URL that Googlebot was accessing, it is shown in the screenshot I attached. If you mean something else, please let me know.

Also, does Googlebot always use the same IP address? It seems like I have seen it come in on different IP addresses in JoomlaWatch. If so, is there a range I should be whitelisting?

nicholas
Akeeba Staff
Manager
Yes, that's the URL I was talking about. I thought it was truncated. OK, if this is the URL, it's definitely a hacking attempt. The flypage parameter is supposed to point to a PHP file holding the product presentation page, but here it's set to .. (the parent directory). So, just do what I said above and let Google see a few 403 errors for this URL so that it stops trying to index it.
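
To illustrate why a flypage value of .. is suspicious, here is a minimal sketch in Python. It is not how DFIShield is implemented; the function name has_traversal and the example URLs are made up for illustration. It only shows the general idea of flagging a query parameter that contains a parent-directory segment.

```python
from urllib.parse import urlsplit, parse_qs


def has_traversal(url, param="flypage"):
    """Return True if the named query parameter contains a '..' path segment.

    Hypothetical helper for illustration only; not Admin Tools code.
    """
    # parse_qs also decodes percent-encoding, so %2E%2E is seen as '..'.
    query = parse_qs(urlsplit(url).query)
    for value in query.get(param, []):
        # Treat backslashes like forward slashes before splitting.
        segments = value.replace("\\", "/").split("/")
        if ".." in segments:
            return True
    return False


# Example URLs are placeholders, not the URL from the screenshot.
print(has_traversal("http://example.com/index.php?flypage=.."))
print(has_traversal("http://example.com/index.php?flypage=shop.flypage.tpl"))
```

A file name that merely contains dots, such as shop.flypage.tpl, is not flagged, because the check looks at whole path segments rather than substrings.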

nicholas
Akeeba Staff
Manager
Ah, regarding your question about Googlebot's IPs, look here: http://chceme.info/ips/
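
Since the crawler's addresses change, an alternative to a static list is the reverse-then-forward DNS check that Google's crawler documentation describes. The sketch below is a rough Python illustration of that check, not an Admin Tools feature. The function name is_googlebot and the injectable reverse/forward resolvers are assumptions made so the logic can be tried without network access.

```python
import socket

# Host name suffixes used for Google's crawlers in this sketch.
GOOGLE_SUFFIXES = (".googlebot.com", ".google.com")


def is_googlebot(ip, reverse=None, forward=None):
    """Reverse-resolve the IP, check the host name, then forward-confirm it.

    reverse and forward are hypothetical hooks so the check can be tested
    with fake resolvers; by default the system resolver is used.
    """
    reverse = reverse or (lambda addr: socket.gethostbyaddr(addr)[0])
    forward = forward or (lambda host: socket.gethostbyname_ex(host)[2])
    try:
        host = reverse(ip).lower().rstrip(".")
        if not host.endswith(GOOGLE_SUFFIXES):
            return False
        # The host name must resolve back to the same IP, otherwise the
        # reverse record could simply have been forged.
        return ip in forward(host)
    except OSError:
        # Lookup failures (socket.herror, socket.gaierror) mean "not verified".
        return False
```

The forward lookup matters because anyone who controls the reverse DNS for their own address block can make it return a googlebot.com name.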

TurnTex
Thank you, sir! Just for information, can you explain more about how this hacking attempt works? Is the hacker making googlebot carry something to my site somehow? How do they benefit by having googlebot follow a link to my site?

Also, if I put googlebot's ip address in the never block field, won't that prevent them from getting the 403 error? I guess I am just a little confused but ABSOLUTELY trust your experience and will do what you say!

nicholas
Akeeba Staff
Manager
The attack is simple. I create a page which looks like legitimate content and links to your site, only the link includes some nefarious query string, designed to (hopefully) hack your site. Then I let Google index my page, e.g. by submitting a sitemap. Google will scan my page and try to follow all links, including the ones with the nefarious parts. The benefit of this method is twofold:
- You don't know who tried to hack you, or who succeeded, so you can't block them or easily trace the attack back to the attacker. Tougher forensics mean less risk of being caught.
- Many sites give Googlebot a "free pass", not enforcing any security checks on traffic generated by Googlebot. This means that the possibility of the attack being blocked is significantly lower.

Regarding the solution I proposed, here's what's going on now. If Googlebot repeatedly triggers exceptions on your site, its IP will be automatically blocked by Admin Tools. You don't want that. You want Googlebot to continue seeing the 403s for the nefarious URLs (so that it will stop trying to index them after a while), but still be able to "see" the rest of your site. That's exactly what my proposed solution does.
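
The difference between "still gets a 403 for a bad URL" and "never gets banned outright" can be sketched as a toy model in Python. This is not Admin Tools' code; the class Firewall, the ban_threshold value and the placeholder IPs are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Firewall:
    """Toy model: per-request filtering plus automatic IP banning."""
    never_block: set = field(default_factory=set)  # exempt from auto-ban only
    ban_threshold: int = 3                         # assumed value
    strikes: dict = field(default_factory=dict)
    banned: set = field(default_factory=set)

    def handle(self, ip, is_malicious):
        """Return the HTTP status this request would receive."""
        if ip in self.banned:
            return 403  # a banned IP sees nothing, good URL or bad
        if not is_malicious:
            return 200
        # A malicious request is always refused, whoever sends it.
        if ip not in self.never_block:
            self.strikes[ip] = self.strikes.get(ip, 0) + 1
            if self.strikes[ip] >= self.ban_threshold:
                self.banned.add(ip)
        return 403


# 192.0.2.10 stands in for a crawler IP; it is a documentation address.
fw = Firewall(never_block={"192.0.2.10"}, ban_threshold=2)
print([fw.handle("192.0.2.10", True) for _ in range(5)])  # bad URLs refused
print(fw.handle("192.0.2.10", False))                     # normal pages still served
```

In this model an address on the never-block list keeps receiving 403 for every nefarious URL, but its strikes are not counted, so its requests for normal pages keep succeeding.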

TurnTex
It is probably that same sonofabitch that has been trying to hack me with the kiddie scripts for the last few weeks. I finally got sick of seeing his attempts and blacklisted his ip address ranges. He was using a couple of different ranges but they all went back to the same city and state here in the US. Right after I blocked his ip addresses, I started getting these googlebot blocks. I wonder why they want to hack my little old site so bad?!

Anyway, thanks again for your most excellent support and work!!

Oh yeah...on the googlebot ip list you linked above...how would I enter all of those ranges so I make sure to not block google? Can I use a dash between ranges such as 64.233.160.0-64.233.191.255 then a comma for the next range?

nicholas
Akeeba Staff
Manager
Hm, the "Never block these IPs" field doesn't support ranges yet. OK, I will be fixing that for the next release. In the meantime, you can't do much. Just wait for these URLs to not be indexed any more :(
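
For reference, matching an IP against the dash-and-comma notation asked about above could look roughly like the Python sketch below. It says nothing about how the next release of Admin Tools will implement ranges; parse_ranges and is_listed are hypothetical names and the syntax is just the one from the question.

```python
import ipaddress


def parse_ranges(text):
    """Parse 'a.b.c.d-e.f.g.h, x.y.z.w, ...' into a list of networks."""
    networks = []
    for item in text.split(","):
        item = item.strip()
        if not item:
            continue
        if "-" in item:
            first, last = (ipaddress.ip_address(p.strip())
                           for p in item.split("-", 1))
            # Collapse the first-last span into the CIDR blocks covering it.
            networks.extend(ipaddress.summarize_address_range(first, last))
        else:
            # A single address becomes a one-host network.
            networks.append(ipaddress.ip_network(item, strict=False))
    return networks


def is_listed(ip, networks):
    """Return True if the IP falls inside any of the parsed networks."""
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in networks)


# The first range is the one from the question; 192.0.2.10 is a placeholder.
nets = parse_ranges("64.233.160.0-64.233.191.255, 192.0.2.10")
print(is_listed("64.233.175.10", nets))
print(is_listed("64.233.192.1", nets))
```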
