Support

Akeeba Backup for Joomla!

#33397 Akeeba not uploading to Google Drive when using Watchfulli

Posted in ‘Akeeba Backup for Joomla! 4 & 5’
This is a public ticket

Everybody will be able to see its contents. Do not include usernames, passwords or any other sensitive information.

Environment Information

Joomla! version
n/a
PHP version
n/a
Akeeba Backup version
n/a

Latest post by on Wednesday, 19 August 2020 17:17 CDT

maestroc

Not sure if this is an Akeeba issue or a Watchful.li issue, but please see the video below to see the problem. Briefly, when using Google Drive for remote storage, Akeeba is not uploading the file to Google when I run a remote backup via Watchful.li, even though I know the Google Drive keys and my Download ID in Watchful.li are correct.

https://drive.google.com/file/d/1Lx4qGA8T0AW9DmRpzbK_MHYIUvGBYs6h/view

 

Note the error message seen at 1:15 in the video when I try to manually force Akeeba to upload to Google Drive. I'm wondering if something similar is happening between Akeeba and Watchful.li? Like the master Download ID not being accepted when passed from Watchful.li?

 

nicholas
Akeeba Staff
Manager

The video is not useful for determining the root cause. I need the log file of the failed backup attempt. Please ZIP and attach it to your next post, as requested when filing the ticket. Thank you!

Nicholas K. Dionysopoulos

Lead Developer and Director

🇬🇷Greek: native 🇬🇧English: excellent 🇫🇷French: basic • 🕐 My time zone is Europe / Athens
Please keep in mind my timezone and cultural differences when reading my replies. Thank you!

maestroc

My apologies.  I had downloaded the log but forgot to include it.  Here it is.

nicholas
Akeeba Staff
Manager

The log file reads:

You must enter your Download ID in the application configuration before using the “Upload to Google Drive” feature.

Have you entered a valid Download ID in Akeeba Backup on that site?

Nicholas K. Dionysopoulos

Lead Developer and Director

🇬🇷Greek: native 🇬🇧English: excellent 🇫🇷French: basic • 🕐 My time zone is Europe / Athens
Please keep in mind my timezone and cultural differences when reading my replies. Thank you!

maestroc

Let me try to clarify. I normally don't create a child Download ID for each site I run; instead I have been putting my master Download ID in Watchful.li. It has always worked fine until very recently.

When I run a backup on a site, it has dutifully uploaded the archive to S3 or Google Drive even if the local Download ID was not entered in the site itself.

If I need to start using a child Download ID on each site then I certainly can do that. However, the fact that Akeeba is marking backups as completed when the transfer did not complete is a bigger concern for me at the moment.

S3 uploads seem to work, but they don't always transfer all the parts despite being marked as completed in the backup manager. Parts of the archive are left on the local server even though Akeeba reports that they have all been transferred. On sites that use Google Drive the uploads also do not transfer, despite being marked as complete.

Please see this quick video where I try to show you these confusing issues on a different site (this one using S3).  I will also attach the relevant log file.

https://drive.google.com/file/d/1VAtYK07wJwMUJsWYqKdc62hOAKa0z3Ns/view

 

maestroc

I also went ahead and made an add-on Download ID, put it in the site, and tried again. It did the same thing: it marked the backup as completed but only uploaded a few parts of the archive to S3. The log file for this latest attempt is attached as well.

maestroc

Lastly, tonight I tried running a backup from the back end of the site instead of remotely. That one failed to complete as well, but it did give me an error message on the screen when it failed. However, that attempt was also marked as completed in the manager.

 

nicholas
Akeeba Staff
Manager

Starting with Akeeba Backup 7 some things changed for every storage engine which uses OAuth2: Box, Dropbox, Google Drive, Google Storage, OneDrive (both the legacy implementation and the unified OneDrive / OneDrive for Business implementation) and pCloud.

You cannot connect to these services without an access token. Their APIs do not allow you to create that token directly. They require a public, known-in-advance authentication endpoint which exchanges a secret key known to the client, together with the initial response from the storage provider's authentication server, for an access token. Furthermore, each and every time you take a backup you need to use the secret key to refresh the access token (exchange the expired one for a new one). The two problems with OAuth2 and mass-distributed software are that a. the endpoint needs to be known in advance and b. the secret cannot be disseminated with the software, because that would jeopardise everyone's security.
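
For illustration only, a generic OAuth2 refresh-token exchange looks roughly like the sketch below. The endpoint URL and client credentials are placeholders, not Akeeba Backup's actual configuration; the whole point is that the client secret has to stay on the mediation endpoint and can never ship with the software.

    import requests

    # Generic OAuth2 refresh-token exchange (illustrative sketch only).
    # TOKEN_ENDPOINT, CLIENT_ID and CLIENT_SECRET are hypothetical placeholders;
    # in practice the secret lives on the authentication mediation endpoint,
    # never inside the mass-distributed software.
    TOKEN_ENDPOINT = "https://oauth2.example.com/token"  # hypothetical
    CLIENT_ID = "public-client-id"                       # hypothetical
    CLIENT_SECRET = "server-side-secret"                 # hypothetical, must stay private

    def refresh_access_token(refresh_token: str) -> dict:
        """Exchange a refresh token (plus the client secret) for a fresh access token."""
        response = requests.post(
            TOKEN_ENDPOINT,
            data={
                "grant_type": "refresh_token",
                "refresh_token": refresh_token,
                "client_id": CLIENT_ID,
                "client_secret": CLIENT_SECRET,
            },
            timeout=30,
        )
        response.raise_for_status()
        # Typically contains access_token, expires_in and sometimes a new refresh_token.
        return response.json()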

There are two ways to go about a solution.

One way is to ask you, our clients, to host that endpoint on your site. You'd need to sign up for a developer account with the storage provider you want to use; some of them require payment to do so. You'd need to host a special file that runs outside Joomla on your site and make it web accessible. You'd need to navigate the very unfriendly and ever-changing interface of the storage provider, create an OAuth2 login screen and configure the endpoints correctly. You'd need to write and publish Privacy Policy and Terms of Service pages for the OAuth2 integration on your site, even if you're the only one using it. You'd need to go through a verification process. Then, and only then, would you be able to start transferring backups to the remote storage. This is impractical, to say the least.

The other way to go about it is for us to go through all of that process and host the authentication mediation endpoint on our own server. The downside is that it costs us real money to develop and maintain this endpoint. Do remember that each and every backup you take uses it, which costs us money.

Between Akeeba Backup 3 and 6 inclusive we made these endpoints available free of charge. This led to a significant number of clients who let their subscription lapse but were still using our services to facilitate backups on their sites. We reached a point where this was no longer sustainable so we enforced a Download ID check starting with Akeeba Backup 7 -- and put the code to convey the Download ID in the last Akeeba Backup 6 release as well. That's why you need an active Download ID to upload a backup to these storage providers, as documented.

So, that explains the Google Drive problem.

Amazon S3 does not need a Download ID because it does not go through our server. The same applies to every other storage provider that does not need to go through our server. If you don't need to consume our resources to take a backup, we don't need to ask you for rent on those resources, to put it very bluntly.

Looking at your logs, your problem with Amazon S3 is different. The log says that the archive part file disappeared while we were trying to upload it. This means that something outside the backup process removed it. If you start another backup, or visit Akeeba Backup's Control Panel page, while the upload to any remote storage is in progress, you will cause the archive part files which have not yet been uploaded to disappear. The same goes if an external service tries to access Akeeba Backup's API at that point. Or maybe you have a CRON job, yours or your host's, which deletes these files automatically, or a plugin which does the same. The problem is not coming from the backup process; it is external to it.
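
If you want to confirm that something external is removing the part files, one rough way is to watch the backup output directory while a backup and upload run. A minimal Python sketch, assuming you can run scripts on the server; the directory path is a placeholder and this is not part of Akeeba Backup:

    import os
    import time

    # Point this at the site's configured Akeeba Backup output directory.
    OUTPUT_DIR = "/path/to/administrator/components/com_akeeba/backup"  # hypothetical

    def watch_output_dir(interval_seconds: int = 10) -> None:
        """Print a message whenever a file seen in the previous pass has vanished."""
        seen = set()
        while True:
            current = set(os.listdir(OUTPUT_DIR))
            for missing in sorted(seen - current):
                print(f"File disappeared from the output directory: {missing}")
            seen = current
            time.sleep(interval_seconds)

    watch_output_dir()

Note that legitimate removals, such as parts deleted after a successful upload, will show up too; the point is to see whether files vanish before they were ever uploaded.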

I hope this helps.

Nicholas K. Dionysopoulos

Lead Developer and Director

🇬🇷Greek: native 🇬🇧English: excellent 🇫🇷French: basic • 🕐 My time zone is Europe / Athens
Please keep in mind my timezone and cultural differences when reading my replies. Thank you!

maestroc

That helps a lot and solves the Google Drive issue.  Thank you!

If I could continue on the last paragraph you wrote, though: while that may be the cause for the specific backups whose log files I sent you, I think I have a server issue related to this post that may also be affecting my uploads to S3 and Backblaze, as when running a backup from the back end of the site I get the same errors that the other customer mentions:

Backblaze B2 API Error bad_request: Missing header: Content-Length
Post-processing interrupted -- no more files will be transferred

In that post you mention that it is something for the host to fix. In my case, I am the host, running several sites on a VPS with cPanel. I have no proxy like Cloudflare set up, but I do have the CSF firewall running for security, so I'm not sure what to do or tweak to troubleshoot this issue. Do you have any suggestions for what I should be looking for on the server that might cause that interference with the uploads to S3, Backblaze, etc.?

nicholas
Akeeba Staff
Manager

I was not talking about a web proxy or CDN in the issue you've linked to. I am talking about the server proxying its outbound requests through something, typically Squid or Varnish. This is usually done in the belief that by proxying requests FROM the site TO the outside world (the exact opposite of what CloudFlare does) any requests to third-party services will be cached and your site will be faster. However, this falls flat when we are trying to send data to the remote service or read up-to-date content to make decisions, i.e. what Akeeba Backup does.
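
A quick way to see what your outbound requests look like once they leave the server is to send one to a request-echoing service and inspect what arrived at the far end. A rough sketch, assuming outbound access to httpbin.org (a public echo service used purely for illustration, unrelated to Akeeba Backup or Backblaze):

    import requests

    # Send a request with a known body and look at the headers the far end received.
    resp = requests.post("https://httpbin.org/post", data=b"x" * 1024, timeout=30)
    echoed = resp.json()["headers"]

    # requests always sends Content-Length for a bytes body. If it arrives missing
    # or altered, something between this server and the internet is rewriting
    # outbound traffic. A Via header, if present, is another proxy fingerprint.
    print("Content-Length as received:", echoed.get("Content-Length"))
    print("Via header (often added by proxies):", echoed.get("Via"))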

Also note that this might be outside your VPS. A VPS nowadays is just a virtual machine. Proxying would very likely take place at the VM host level which is outside your control. Therefore you need to talk to your web host about it.

Nicholas K. Dionysopoulos

Lead Developer and Director

🇬🇷Greek: native 🇬🇧English: excellent 🇫🇷French: basic • 🕐 My time zone is Europe / Athens
Please keep in mind my timezone and cultural differences when reading my replies. Thank you!

maestroc

Excellent.  Thank you!  I will talk to my VPS provider about that.

One last thing (sorry), but getting back to the original problem now: the backups were being marked as complete when in truth they did not actually transfer all of the files. Is there anything that can be done about that? In my mind it shouldn't say Manage Remote Files and look as though it is a completed backup when in truth not all of the files were actually transferred. Perhaps some kind of final test should be done to make sure all the files were truly uploaded, and inform the user if they were not?

nicholas
Akeeba Staff
Manager

One last thing (sorry), but getting back to the original problem now: the backups were being marked as complete when in truth they did not actually transfer all of the files. Is there anything that can be done about that?

Not in any practical way.

When we send data to the remote server we expect to receive a reply back telling us if the upload succeeded or failed. When we receive a successful reply we have to assume that it's legitimate. The problem here is that the server undermines the communication, spitting out lies back to us. This situation is essentially a man-in-the-middle attack.

In theory we could re-download all the files we uploaded and check them for integrity. Merely checking that they exist is not enough; in many cases you cannot differentiate between a partially uploaded and a fully uploaded backup archive file that way.

However, this comes with two problems. First and foremost, the files are not guaranteed to show up immediately after uploading. For example, if you uploaded a 5GB backup archive in 5MB chunks to S3, it might take several seconds for the archive file to appear as available: S3 needs to first stitch together all the parts (which are possibly on different physical machines). The way to implement such a check is with an exponential back-off strategy, which is unworkable in the context of a mass-distributed application running on diverse server environments, many of which have very tight time limits.
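
For illustration only, the kind of existence-and-size check with exponential back-off described above would look roughly like the sketch below. The bucket and key names are placeholders and this is not how Akeeba Backup is implemented; it merely shows why the waiting adds up.

    import time

    import boto3
    from botocore.exceptions import ClientError

    s3 = boto3.client("s3")

    def wait_for_object(bucket: str, key: str, expected_size: int, max_attempts: int = 6) -> bool:
        """Poll S3 with exponential back-off until the object shows up with the expected size."""
        delay = 1.0
        for _ in range(max_attempts):
            try:
                head = s3.head_object(Bucket=bucket, Key=key)
                if head["ContentLength"] == expected_size:
                    return True
            except ClientError:
                pass  # not visible yet (or a transient error); retry after a delay
            time.sleep(delay)
            delay *= 2  # back-off: 1s, 2s, 4s, 8s, ...
        return False

    # Hypothetical usage: wait_for_object("my-backup-bucket", "site-example.jpa", 5242880)

With six attempts the sleeps alone add up to more than a minute (1+2+4+8+16+32 = 63 seconds), which is exactly the kind of delay that blows past typical PHP and CRON time limits.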

Even if that weren't an issue, we would still be downloading the entire backup back to the server. This is a slow process which might trigger a backup script failure because of CPU time limits on CRON jobs. It will also eat up your server's bandwidth, because each backup is transferred twice. Furthermore, not all remote storage engines support chunked downloads, which would most definitely cause a timeout during the transfer back to the server (that's why the Fetch Back to Server button is not available for all remote storage engines). Last, but not least, we'd have to assume that the local server has enough disk space to transfer the files back, which is not a given; that was the reason behind the immediate archive part transfer option implemented in all remote storage engines.

So while this is possible in theory, in practice it would only work if you were using specific storage engines, taking CLI backups without any CPU usage limits on your account, on a server with practically unlimited bandwidth and disk space. That's not a realistic set of requirements given who is using our software, on which servers, and for what.

Nicholas K. Dionysopoulos

Lead Developer and Director

🇬🇷Greek: native 🇬🇧English: excellent 🇫🇷French: basic • 🕐 My time zone is Europe / Athens
Please keep in mind my timezone and cultural differences when reading my replies. Thank you!

System Task
system
This ticket has been automatically closed. All tickets which have been inactive for a long time are automatically closed. If you believe that this ticket was closed in error, please contact us.

Support Information

Working hours: We are open Monday to Friday, 9am to 7pm Cyprus timezone (EET / EEST). Support is provided by the same developers writing the software, all of which live in Europe. You can still file tickets outside of our working hours, but we cannot respond to them until we're back at the office.

Support policy: We would like to kindly inform you that when using our support you have already agreed to the Support Policy which is part of our Terms of Service. Thank you for your understanding and for helping us help you!