#37219 Failed backups causing PHP Max_Children issue

Posted in ‘Akeeba Backup for Joomla!’

Environment Information

Joomla! version: 3.10.8
PHP version: 7.4.29
Akeeba Backup version: 8.1.3

sdlsites

We have been seeing an occasional issue where MySites.Guru will notify us of a failed backup. When the backup fails, the site goes offline and upon investigation, a PHP Max_Children error is logged on the server. 

This happens across multiple websites. Is there a recommended setting for Max_Children to avoid such an issue, or is there something that we should be doing differently? It seems like this issue tends to happen after the backup is completed and the files are transferred to Google Drive.

nicholas
Akeeba Staff
Manager

The PHP FastCGI Process Manager (PHP-FPM) works by maintaining a pool of PHP worker processes (referred to as threads or children below) which are normally sitting idle. When a request comes in, PHP-FPM assigns one of those threads to handle it and, when it's done, communicates the results back to the web server. Normally, only a few threads (between 3 and 10 on most servers) are kept in memory.

If requests keep coming in and there are no available threads in the PHP-FPM pool, it will create new threads (children). If it reaches the max_children limit it will refuse to handle the request and log the error about max_children.
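To see how close a pool is to that limit, one option is to count its worker processes while the site is busy. The sketch below assumes the usual PHP-FPM process title, "php-fpm: pool <name>", and reads `ps` command lines from standard input. The function name is made up for illustration. It counts idle and busy workers alike, so it shows pool size rather than load.

```shell
#!/usr/bin/env bash
# Count PHP-FPM worker processes per pool from `ps` command lines on stdin.
# Assumption: workers carry the process title "php-fpm: pool <name>".
# The master process ("php-fpm: master process ...") is ignored.
count_fpm_workers() {
    awk '$1 == "php-fpm:" && $2 == "pool" { n[$3]++ }
         END { for (p in n) print p, n[p] }' | sort
}

# Typical use on the server, for example while a backup is running:
#   ps -eo args | count_fpm_workers
```

A pool whose count sits at its max_children value for any length of time is the one to look at first.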

While the backup requests are technically "slow" requests (they take several seconds each) they DO NOT occur concurrently — at least per site. They come one after another. This means that we are only occupying one (1) PHP-FPM child thread during the backup.

This means that you have one of two problems:

1) You are taking backups of multiple sites at the exact same time and they are all handled by the same PHP-FPM pool. If the number of concurrent backups and web visitors (including search engines and other automations!) exceeds the max_children setting of the PHP-FPM pool, some requests are dropped.

The solution to that is to space out your backups or use the native CLI backup script (cli/akeeba-backup.php) with a regular CRON job to take a backup. The first solution means that backups are not concurrent, therefore you don't have persistent occupation of several PHP-FPM child threads, but it also means that they may stretch over such a long time that it might become impractical for you. The second solution just doesn't use any PHP-FPM threads at all. The CLI script runs under the PHP CLI binary, which is completely unrelated to PHP-FPM. It is also much faster as you don't waste time going through a web server and everything that entails. The downside is that it's slightly harder to set up, but that's true only for the first backup you set up; after that you just copy over the CRON command line and adjust the path to the CLI script (helpfully printed in Akeeba Backup's Schedule Automatic Backups page!) and the backup profile, if you're not using the default.

2) PHP-FPM does not release the thread into the pool once it finishes executing. Frankly, I had only seen that happening well over 10 years ago, when PHP-FPM was still in a teething stage. If this is the case you need to talk to your host to see what's going on with the server.

If I were to make an educated guess, I'd say that the former problem I described is far more likely than the latter.
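The two remedies for the first problem (spacing backups out and running them through the CLI script) can be combined. The sketch below prints one crontab line per site with staggered start times. The PHP binary path, the site roots, the 45-minute gap and the function name are all placeholders. The `--profile=1` argument is my assumption of how the default profile is selected, so copy the exact command line from the Schedule Automatic Backups page.

```shell
#!/usr/bin/env bash
# Print one crontab line per site, spacing the start times so that
# backups on the same server do not start together.
# Arguments: start hour, gap in minutes, then one site root per argument.
# PHP_BIN and the site roots are placeholders; cli/akeeba-backup.php is
# the CLI script path relative to each site's root.
PHP_BIN="${PHP_BIN:-/usr/local/bin/php}"

stagger_backups() {
    start_hour=$1
    gap=$2
    shift 2
    offset=0
    for root in "$@"; do
        hour=$(( (start_hour + offset / 60) % 24 ))
        minute=$(( offset % 60 ))
        # --profile=1 is assumed to mean the default backup profile.
        printf '%d %d * * * %s %s/cli/akeeba-backup.php --profile=1\n' \
            "$minute" "$hour" "$PHP_BIN" "$root"
        offset=$(( offset + gap ))
    done
}

stagger_backups 23 45 /home/site_a/public_html /home/site_b/public_html
```

The gap should be longer than the slowest backup, including its upload to remote storage, otherwise consecutive runs still overlap.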

Nicholas K. Dionysopoulos

Lead Developer and Director

🇬🇷Greek: native 🇬🇧English: excellent 🇫🇷French: basic • 🕐 My time zone is Europe / Athens
Please keep in mind my timezone and cultural differences when reading my replies. Thank you!

sdlsites

Nicholas,

Thank you for the extensive reply. We actually had another site whose backup started at 23:30 and had been running for around 30 minutes when the problems began. At the time, it was the only backup that was running on the server.

I passed along your second possibility to the hosting provider and they are exploring it, but they came back with the following report while it is still being investigated. 

Please let me know if anything stands out that may be causing an issue. 

 

 

-------

 

Step 1.)  PHP-FPM Max Children limit of 10 reached 1646 times for lambertvillenj.org on June 1st, 2022

Log path:   /opt/cpanel/ea-php74/root/usr/var/log/php-fpm/error.log


* Example error:
 
[root@cloudvpsserver ~]# grep -i max_children /opt/cpanel/ea-php74/root/usr/var/log/php-fpm/error.log|grep lambertvillenj|grep 01-Jun-2022|tail -1
[01-Jun-2022 23:30:02] WARNING: [pool lambertvillenj_org] server reached max_children setting (10), consider raising it


* Number of times that error was logged on June 1st 2022:
 
[root@cloudvpsserver ~]# grep -i max_children /opt/cpanel/ea-php74/root/usr/var/log/php-fpm/error.log|grep lambertvillenj|grep 01-Jun-2022|wc -l
1646


TO RESOLVE THIS ISSUE:

adjusted the PHP-FPM settings for lambertvillenj.org

* WHM => MultiPHP Manager => User domain Settings => lambertvillenj.org => PHP-FPM Settings
 
Max Requests:  20  =>  200
Max Children:  10  =>  20
Process Idle Timeout:  10  =>  5

1A.)  I raised the "Max Requests" to increase how many requests can be served by an existing child process before it is recycled.

1B.)  I raised the "Max Children" to allow for more concurrent connections / requests 

1C.)  I lowered the "Process Idle Timeout" to make idle processes cycle faster, rather than sitting unused.

==========

Step 2.)  Checked the PHP-FPM PHP error log, but no recent errors were recorded:

PHP-FPM PHP error log:   /home/lambertvillenj/logs/lambertvillenj_org.php.error.log 
 

[root@cloudvpsserver ~]# tail -1 /home/lambertvillenj/logs/lambertvillenj_org.php.error.log 
[21-May-2022 09:34:43 UTC] PHP Warning:  session_start(): Failed to read session data: user (path: /var/cpanel/php/sessions/ea-php73) in /home/lambertvillenj/public_html/libraries/joomla/session/handler/native.php on line 260
[root@cloudvpsserver ~]# 


NO ISSUES TO ADDRESS
==========


Step 3.)  Several Apache Timeout errors for lambertvillenj.org

Apache error log:  /var/log/apache2/error_log 
 

[root@cloudvpsserver ~]# grep lambertvillenj /var/log/apache2/error_log 
...etc...

[Thu Jun 02 00:01:21.672176 2022] [proxy_fcgi:error] [pid 14824:tid 46962644735744] (70007)The timeout specified has expired: [client 74.105.2.160:36570] AH01075: Error dispatching request to : (polling), referer: https://lambertvillenj.org/
[Thu Jun 02 00:01:41.360365 2022] [proxy_fcgi:error] [pid 1271:tid 46962604812032] (70007)The timeout specified has expired: [client 74.105.2.160:36624] AH01075: Error dispatching request to : (polling), referer: https://lambertvillenj.org/
[Thu Jun 02 00:02:07.468748 2022] [proxy_fcgi:error] [pid 1381:tid 46962611115776] (70007)The timeout specified has expired: [client 52.91.90.57:17008] AH01075: Error dispatching request to : (polling), referer: https://lambertvillenj.org/resident/calendar/2570
[Thu Jun 02 00:02:49.621652 2022] [proxy_fcgi:error] [pid 1380:tid 46962604812032] (70007)The timeout specified has expired: [client 74.105.2.160:37638] AH01075: Error dispatching request to : (polling), referer: https://lambertvillenj.org/
[Thu Jun 02 00:02:52.198788 2022] [proxy_fcgi:error] [pid 1381:tid 46962596407040] (70007)The timeout specified has expired: [client 74.105.2.160:37646] AH01075: Error dispatching request to : (polling), referer: https://lambertvillenj.org/
[Thu Jun 02 00:03:30.000546 2022] [proxy_fcgi:error] [pid 14824:tid 46962598508288] (70007)The timeout specified has expired: [client 74.105.2.160:36932] AH01075: Error dispatching request to : (polling), referer: https://lambertvillenj.org/
[Thu Jun 02 00:04:13.246408 2022] [proxy_fcgi:error] [pid 14824:tid 46962499417856] (70007)The timeout specified has expired: [client 74.105.2.160:37034] AH01075: Error dispatching request to : (polling), referer: https://lambertvillenj.org/


TO RESOLVE THIS ISSUE:

I checked the current Apache configuration and found no Timeout value set.  When a value is not defined, the default value is used; in this case the default Timeout is 60 seconds.
 

* WHM => Apache Configuration => Include Editor => PreVirtual Host Include => All Versions:

Original settings:  
KeepAlive On
KeepAliveTimeout 2
MaxKeepAliveRequests 1500

<IfModule mpm_event_module>
    StartServers             6
    MinSpareThreads        150
    MaxSpareThreads        250
    ServerLimit            32
    ThreadsPerChild         50
    MaxRequestWorkers      1600
    MaxConnectionsPerChild   10000
</IfModule>


Adjusted Settings:  
KeepAlive On
KeepAliveTimeout 2
MaxKeepAliveRequests 1500
Timeout 300

<IfModule mpm_event_module>
    StartServers             6
    MinSpareThreads        150
    MaxSpareThreads        250
    ServerLimit            32
    ThreadsPerChild         50
    MaxRequestWorkers      1600
    MaxConnectionsPerChild   10000
</IfModule>

I increased the Apache Timeout from 60 to 300 (5 minutes) which is fairly standard, especially when you see a lot of those "(70007)The timeout specified has expired" errors.
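Before raising a timeout it can help to know who was waiting. In the seven error lines quoted above, six come from client 74.105.2.160. The sketch below tallies AH01075 timeouts per client from an Apache error log on standard input. The function name is made up, and the pattern assumes IPv4 client addresses in the format shown in the excerpt.

```shell
#!/usr/bin/env bash
# Tally proxy_fcgi timeout errors (AH01075) per client IP, busiest first.
# Reads Apache error log lines on stdin.
# Assumption: clients are logged as "[client <IPv4>:<port>]".
timeouts_per_client() {
    grep 'AH01075' |
        sed -n 's/.*\[client \([^]:]*\):[0-9]*].*/\1/p' |
        sort | uniq -c | sort -rn |
        awk '{ print $2, $1 }'
}

# Typical use, mirroring the grep from the report:
#   grep lambertvillenj /var/log/apache2/error_log | timeouts_per_client
```

If a single address accounts for most of the timeouts, it is worth checking whether that is the remote backup trigger or an ordinary visitor.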

 

 

nicholas
Akeeba Staff
Manager

Based on the report you shared with me, it looks like your server was already overloaded and hitting the max_children limit nearly 1700 times in a day — that's around once every 52 seconds on average! No wonder your backup didn't run. If your server was so overloaded it was a literal roll of the dice whether the backup step request coming from mySites would be handled by your server or result in an error.

I suggest waiting for another backup AFTER these changes have been made. If the backup fails again, please try to figure out why it failed. Do not assume it's the same problem as this seems to be addressed now.


sdlsites

Nicholas,

He didn't properly word that in his report. It was a lifetime hit of 1646 that was reached yesterday. All 1646 didn't occur yesterday. Still not great. We will monitor.

We have seen a failed backup every few months, but we had a few this week across different websites so it was a reason to reach out. I will double-check the timing of backups to make sure that they aren't running concurrently. This may have been the issue for other problems that we have experienced. 
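A per-day breakdown of the PHP-FPM log would settle whether the 1646 warnings fell on a single day or accumulated over the life of the log. The sketch below assumes the line format shown in the host's report, "[DD-Mon-YYYY HH:MM:SS] WARNING: [pool <name>] server reached max_children ...". The function name is made up for illustration.

```shell
#!/usr/bin/env bash
# Count "server reached max_children" warnings per day for one pool.
# Reads the PHP-FPM error log on stdin; the pool name is the first argument.
# Days are printed in the order they first appear in the log.
max_children_per_day() {
    awk -v pool="$1" '
        index($0, "[pool " pool "] server reached max_children") {
            day = substr($1, 2)          # "[01-Jun-2022" -> "01-Jun-2022"
            if (!(day in n)) order[++k] = day
            n[day]++
        }
        END { for (i = 1; i <= k; i++) print order[i], n[order[i]] }'
}

# Typical use with the log path from the report:
#   max_children_per_day lambertvillenj_org \
#       < /opt/cpanel/ea-php74/root/usr/var/log/php-fpm/error.log
```

Lining up the busiest days with the backup schedule should show whether the warnings cluster around backup times or are spread across normal traffic.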

 
