Support

Akeeba Solo

#38322 Database Engine question

Posted in ‘Akeeba Solo (standalone)’
This is a public ticket

Everybody will be able to see its contents. Do not include usernames, passwords or any other sensitive information.

Environment Information

PHP version
n/a
Akeeba Solo version
n/a

Latest post by nicholas on Thursday, 12 January 2023 07:12 CST

RakataTech

Good afternoon,

 

I am currently working with a fairly large database, and the question is  not specific to this site but just wanted to get a better understanding on the backup engine.

 

If let's say the database only backup start at 1105 and finish at 1120, in the mean time new records have been added to the database, how does akeeba deal with this ? Is it writing the new records to a temporary file and Insert them at the end of the backup process ?

 

Kind regards,

Max

nicholas
Akeeba Staff
Manager

No, it doesn't work like that as we cannot change the database implementation on the server (what you describe is more or less the way binary logging works in a MySQL master-slave cluster when you stop replication to a slave and you're using RBL — but I think I am getting a bit too technical for the subject at hand).

We get a batch of new records (by default: 1000 at a time; it can be configured in the Configuration page).

If the table has a single field, numeric primary key we do that by selecting from the table sorted by its primary key, ascending and every batch (except the first) starts by referencing the last primary key read and having a limit clause. So, if the last primary key we read was 1234 the next batch we ask for has where clause in its query that looks like primary_key_field_name > 1234. The table's backup stops when MySQL stops returning rows for it. This means that any rows added before we reach that point will be backed up. Any rows added after that point are not included in the backup.

If the table has no primary key we just use a limit query, increasing the start by the batch size until we get no more rows. This is a bit problematic because a table with no primary key or a composite primary key does not have a canonical way of listing its records. This means that if a record is inserted while we're taking a backup, MySQL may return a duplicate record in a future select query or include the new record and skip another one which should've been backed up.

There's really no solution to that. The way mysqldump addresses it is by either doing exactly what we're doing OR (if you tell it to do so) by locking either all tables or just the table being dumped. However, if you lock the tables you are essentially preventing your site from working. This is why virtually nobody uses tables locking for backing up their site.

The strategy we are following provides very good data consistency in the vast majority of cases. If you have a really peculiar site which requires absolute database consistency there are two things. One is taking the site offline to take a backup. Another thing, far more complicated, is to have a master-slave database setup and an rsync to a second server. When you need to take a backup sever the connection to the master server and take the backup on the static replica. In the past 17 years I have seen exactly TWO (2) sites which each necessitated using either of these drastic approaches. In most practical use cases moving the backup so that the database backup takes place during the valley of activity, typically in the middle of the night, works great.

FWIW having an outnumber column even on tables which don't need one for backup consistency, or using a temporarily read-only slave database, on systems which cannot be turned off is not an unheard-of idea. I know for a fact that this was very typical 20-odd years ago in ERPs used in factories; you can't stop the factory floor to take backups, nor can you take inconsistent backups.

Since you know your use case and resources best you can decide which approach is best for you.

Nicholas K. Dionysopoulos

Lead Developer and Director

🇬🇷Greek: native 🇬🇧English: excellent 🇫🇷French: basic • 🕐 My time zone is Europe / Athens
Please keep in mind my timezone and cultural differences when reading my replies. Thank you!

Support Information

Working hours: We are open Monday to Friday, 9am to 7pm Cyprus timezone (EET / EEST). Support is provided by the same developers writing the software, all of which live in Europe. You can still file tickets outside of our working hours, but we cannot respond to them until we're back at the office.

Support policy: We would like to kindly inform you that when using our support you have already agreed to the Support Policy which is part of our Terms of Service. Thank you for your understanding and for helping us help you!