#40203 Periodically refreshed incremental backup

Posted in ‘Akeeba Backup for Joomla! 4 & 5’

Environment Information

Joomla! version: 4 & 5
PHP version: 8.1
Akeeba Backup version: 9.8.4

Latest post by nicholas on Tuesday, 30 January 2024 06:22 CST

rkonik

I want to perform incremental backups of files. I've conducted a few tests and it works well. It's not perfect, but it's sufficient. However, I've noticed an inconvenience. When I create a profile for this backup, Akeeba first performs a full backup of the site's files (I only want files). Each subsequent backup contains only the incremental changes.

Up to this point, it's great. However, a problem has arisen because I've set it to store only 50 copies on the remote server. Here's the issue: when I performed another backup, the part of the backup that was initially created was simply deleted. My question to you: Is there any way for Akeeba to perform a full backup again after a certain number of copies?

nicholas
Akeeba Staff
Manager

You need two backup profiles. One that takes a full backup, and one that takes incremental file backups.

For example, you could have the full backup profile run every week, and the incremental one run every day. When you want to restore your site you'd restore the latest full backup, then each and every one of the incremental file backups taken after it.
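
The restore order can be sketched in a few lines of Python. This is only an illustration, not Akeeba Backup code: the `BackupRecord` type, its fields, and `restore_chain` are hypothetical names, and the sketch assumes the incremental archives are applied oldest first.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class BackupRecord:
    """Hypothetical record of one backup archive (not an Akeeba Backup type)."""
    taken_at: datetime
    is_full: bool


def restore_chain(records):
    """Return the latest full backup followed by every later incremental one.

    Raises ValueError when no full backup exists, because incremental
    archives alone cannot rebuild the site.
    """
    ordered = sorted(records, key=lambda r: r.taken_at)
    full_backups = [r for r in ordered if r.is_full]
    if not full_backups:
        raise ValueError("no full backup available to start the restore from")
    latest_full = full_backups[-1]
    # Only incrementals taken after the latest full backup are needed.
    later_incrementals = [
        r for r in ordered
        if not r.is_full and r.taken_at > latest_full.taken_at
    ]
    return [latest_full] + later_incrementals
```

If the only full backup has been deleted, the function raises instead of returning a partial chain, which mirrors the situation described in the first post.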

Nicholas K. Dionysopoulos

Lead Developer and Director

🇬🇷Greek: native 🇬🇧English: excellent 🇫🇷French: basic • 🕐 My time zone is Europe / Athens
Please keep in mind my timezone and cultural differences when reading my replies. Thank you!

rkonik

I expected something more advanced. Still, this solution is better than no solution at all.
Thank you very much for the tip.

nicholas
Akeeba Staff
Manager

I think you conflate “intuitive” with “advanced”. They are almost always polar opposites.

If you were thinking “intuitive”, sure, it's very easy to do. In fact, that's the direction I was originally going with that. The original idea was to give you two options: a full backup at least every X hours, and a full backup every X backup runs. Sounds great, right? You only need to set up one CRON job.

Well, that idea works great if your use case is something like “I want to take a full backup every Sunday and an incremental backup Monday to Saturday”. Yes, this is a valid use case… but it's really the only one supported.

A very simple use case where this fails: I want to take a full backup once a month, and an incremental backup every other day. This now becomes more convoluted because you need another set of options which are mutually exclusive with the previous options. That was the point where I realised that “intuitive” was going to be problematic for the users. I can add options all day long and test them fairly easily with Unit Tests… but a. people would not understand why some options are mutually exclusive (because nobody tries to think what happens if they combine two options whose intersection is the null set), and b. there are always use cases which cannot be covered by existing options, no matter how advanced they are.

Here is another valid, real world use case a business client shared with me: I want to take a full backup on the first Thursday of a month and a partial backup every other Thursday. You cannot express that as every X time, or every X runs, or using month and day of the month constraints. Months differ in their number of days and weeks, and start on different days of the week. This has a real world business policy reasoning behind it. Site backups were stored on on-site storage. The on-site storage was backed up every Friday at midnight to tape. Tapes were stored in an on-site safe, and moved to an off-site location the first Friday of every month. I don't know of any kind of general purpose options which could express that. I would have to write a special rule for this and any other oddball use case. This neither scales, nor makes for an intuitive user interface after all.
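
The rule itself is simple to state in code, even though fixed-interval options cannot express it. A minimal Python sketch, with the hypothetical helper names `is_first_thursday` and `backup_kind`: a Thursday is the first one of its month exactly when its day of the month is 7 or less.

```python
from datetime import date

THURSDAY = 3  # date.weekday() counts Monday as 0


def is_first_thursday(day: date) -> bool:
    """True when the date is the first Thursday of its month."""
    return day.weekday() == THURSDAY and day.day <= 7


def backup_kind(day: date) -> str:
    """Hypothetical policy: 'full', 'incremental', or 'none' for a date."""
    if day.weekday() != THURSDAY:
        return "none"
    return "full" if is_first_thursday(day) else "incremental"
```

The check depends on the calendar position of the date rather than on elapsed time or a run counter, which is why "every X hours" and "every X runs" options cannot reproduce it.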

    rkonik

    You're probably right. However, from the user's perspective, which is mine, it's not intuitive. In my specific case, over 90% of the Archive is kept on external servers, provided by a different provider than the website for security reasons. That's why I choose the allocation amount each time. It's often referred to as the number of backup copies.

    This solution has worked well for me so far. However, recently, there have been sites that take up much more than just a few gigabytes. The issue is that creating such a backup takes time and space, and additionally, it needs to be transferred to an external server, which also requires time. Finally, there's the cost of the storage.

    Creating a full backup each time is counterproductive because the files don't change that quickly, unlike database entries. So, in search of a solution, I opted for incremental backups. Here's how I envisioned it:

    When selecting this type of backup in the profile configuration, I would set how many backups should be performed before the next full backup is executed. That is what I would call a more advanced feature.

    With the solution you provided, I as a user would have to remember that both profiles are (implicitly) connected, which invites typically human problems such as accidentally deleting backups.

    In the scenario I imagined, a full backup every X backups seemed reasonable to me. This solution partly eliminates the issue of the number of backups on remote servers because if I set the number of backups to 50 and every 20 backups a full backup is made, it simplifies matters.

    During tests of how incremental backup works on your end, I set a limit of 5 copies on the remote server and tested it. Copy after copy, I waited for the result. Initially, the system executed a full backup with the first copy. Every subsequent one was incremental. These were small files because essentially nothing changed. The problem arose when I made backup number 6, 7, etc. The system continued to make copies of only what changed, and the full backup from the remote server was deleted. If I didn't have the information I gained from you, I would have to create a new profile or figure out another solution.

    However, you could add an option so that, when there is a limit on the number of backup copies on the remote server (let's say 50 for easier conversation), the system automatically performs a full backup whenever it reaches that limit. This solution would remove the need for me to think about it additionally because I would always have a full copy after completing a certain number or reaching the limit of copies on the remote server. Consider whether this can be implemented.

    nicholas
    Akeeba Staff
    Manager

    Yes, I get it, but I told you why it had to be set up this way. What you envision is what I originally had in mind. However, it would only cover the one use case you and I can see in our heads. Unfortunately, this is not how 90% of the users want to use the software.

    Trying to reconcile all those different use cases ends up requiring a re-implementation of the CRON system. I know because I put a lot of work into dissecting use cases, figuring out how to implement this, until I realised that I am basically re-inventing CRON.

    Yes, you definitely need a bit of documentation for the site. But… you should have that already. Do you really remember how everything is set up on every site? Like, do you really have a mindmap of all the category and user group choices you made on each site, just two things which require far more structure and thought than your backup plan? You can't. Anything beyond an absolutely trivial site with 2–4 categories and just the basic Joomla! user groups requires documentation. Most trivial sites don't need incremental backups. So, what exactly would be the problem I am solving by introducing a restrictive interface?

    Besides, your backup strategy is rather simplistic. I am pretty sure that your site does not grow massively in multiple folders. It only does so within a few specific folders, right? Let's say that it only grows by adding files into the images folder. It also grows by adding database data to certain tables, let's say #__content (which also means #__assets, as these two go hand-in-hand). Here's a good backup strategy:

    • Profile 1: Full site backup.
    • Profile 2: Full site backup with incremental files. Exclude all top-level files and folders except images. Exclude all tables except #__content and #__assets.

    Now your incremental backup is even more precise, and smaller.

    As for storage limits, each backup profile has Quota settings. Enable remote quotas and set up your limits. For example, you could have the full site backup have a maximum of 2 backups kept, and the incremental backup have a maximum of 12 backups kept. Assuming a once-a-week full backup and a daily incremental backup (except for the full backup days) that would give you two weeks of rollback.

    “The problem arose when I made backup number 6, 7, etc. The system continued to make copies of only what changed, and the full backup from the remote server was deleted. If I didn't have the information I gained from you, I would have to create a new profile or figure out another solution.”

    I am sorry to tell you, but a. you failed to use it correctly and b. this is another reason why your idea would fail in the real world.

    You should have marked the full site backup as a Frozen backup. Frozen records are never deleted automatically. Since you didn't freeze it, it got reaped by the quota settings you set up. You did not think it through.

    And this is why your idea would fail in the real world. Let's say a user wants to take a full backup every Sunday and incremental backups every day. Let's say we magically let them do that through the interface. They would also be able to set up quotas. What if they set a count quota of 3? What if they set a size quota which is met on a Thursday? In both cases they delete their full backup and now they don't actually have a backup of their site. Are they going to blame themselves for their own choices? No. As you demonstrated, they will blame the software for doing exactly what the user told it to do.
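
The failure mode can be reproduced with a toy model. The Python sketch below is a simulation under stated assumptions, not Akeeba Backup's quota code: `Record` and `apply_count_quota` are hypothetical names, and the sketch assumes that a count quota removes the oldest records first and that frozen records neither get removed nor count towards the limit.

```python
from dataclasses import dataclass


@dataclass
class Record:
    """Hypothetical backup record: a label plus a frozen flag."""
    label: str
    frozen: bool = False


def apply_count_quota(records, max_count):
    """Keep at most max_count non-frozen records, dropping the oldest first.

    `records` is ordered oldest to newest. Frozen records are always kept
    and, by assumption, do not count towards the quota.
    """
    unfrozen = [r for r in records if not r.frozen]
    excess = max(0, len(unfrozen) - max_count)
    doomed = {id(r) for r in unfrozen[:excess]}
    return [r for r in records if id(r) not in doomed]


# A Sunday full backup followed by three daily incrementals, count quota of 3.
week = [Record("Sun full")] + [Record(f"{d} incremental") for d in ("Mon", "Tue", "Wed")]
kept = apply_count_quota(week, 3)  # "Sun full" is the oldest record, so it is dropped
```

Setting `frozen=True` on the full backup record keeps it in the list under the same quota, which is the effect of freezing described above.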

    FWIW, a few years ago we thought about creating a "wizard" for incremental backups. The wizard would create the two backup profiles and tell you how to set up their CRON jobs (or even create Joomla! Scheduled Tasks for you). However, we decided against it because it ultimately requires a modicum of responsibility from the user which appears to be incompatible with the target audience of such a feature.

    The best we can do is document how incremental backups are supposed to work. This is already on my to-do list.

    rkonik

    Documentation explaining this would be very useful. The fact you mentioned about freezing slipped my mind. I know this mechanism and use it, but I didn't think of it here.

    I understand that when creating software, you have to reach the largest number of users. I also understand that every user error is blamed on you. I don't blame you, I just want to understand the mechanism.

    At the same time, I don't think many people use incremental backup, but I could be wrong. I don't have data on this matter.

    As you mentioned, documentation describing the approach would be helpful. Theory is one thing, but life shows something else.

    Thank you very much for the exchange of thoughts as it taught me a lot, and I'm looking forward to documentation from you.

    nicholas
    Akeeba Staff
    Manager

    I know that you didn't blame me, sorry if I gave you the wrong impression. What I was trying to say is that even you, who are one of the most focused and understanding users, could not figure out that your configuration was wrong. This tells me that the less focused and understanding people would also miss that and then come back to blame the software. I hope it's a bit more clear now.

    You are right in that not a lot of users use incremental backups. However, those who do tend to have very different use cases. Only half of the people who have reached out to me about incremental backups have a straightforward use case. The other half have use cases which are quite convoluted. I often have to sit down with pen and paper (okay, iPad and Apple Pencil) to come up with a plan which will work for them. The use case I told you about yesterday, where the first Thursday of a month is a full backup, is convoluted. The incremental backups require two CRON jobs, one running Friday to Wednesday, and another one running on the second, third, fourth and (when there is one) fifth Thursday of the month, because of limitations in what CRON expressions can express.

    After replying to you this morning I did write the first draft of the documentation. I will go through it again before publishing it in a couple of weeks :)

    Have a wonderful day!
