Akeeba Solo

No, it doesn't work like that as we cannot change the database implementation on the server (what you describe is more or less the way binary logging works in a MySQL master-slave cluster when you stop replication to a slave and you're using RBL — but I think I am getting a bit too technical for the subject at hand).

We get a batch of new records (by default: 1000 at a time; it can be configured in the Configuration page).

If the table has a single field, numeric primary key we do that by selecting from the table sorted by its primary key, ascending and every batch (except the first) starts by referencing the last primary key read and having a limit clause. So, if the last primary key we read was 1234 the next batch we ask for has where clause in its query that looks like primary_key_field_name > 1234. The table's backup stops when MySQL stops returning rows for it. This means that any rows added before we reach that point will be backed up. Any rows added after that point are not included in the backup.

If the table has no primary key we just use a limit query, increasing the start by the batch size until we get no more rows. This is a bit problematic because a table with no primary key or a composite primary key does not have a canonical way of listing its records. This means that if a record is inserted while we're taking a backup, MySQL may return a duplicate record in a future select query or include the new record and skip another one which should've been backed up.

There's really no solution to that. The way mysqldump addresses it is by either doing exactly what we're doing OR (if you tell it to do so) by locking either all tables or just the table being dumped. However, if you lock the tables you are essentially preventing your site from working. This is why virtually nobody uses tables locking for backing up their site.

The strategy we are following provides very good data consistency in the vast majority of cases. If you have a really peculiar site which requires absolute database consistency there are two things. One is taking the site offline to take a backup. Another thing, far more complicated, is to have a master-slave database setup and an rsync to a second server. When you need to take a backup sever the connection to the master server and take the backup on the static replica. In the past 17 years I have seen exactly TWO (2) sites which each necessitated using either of these drastic approaches. In most practical use cases moving the backup so that the database backup takes place during the valley of activity, typically in the middle of the night, works great.

FWIW having an outnumber column even on tables which don't need one for backup consistency, or using a temporarily read-only slave database, on systems which cannot be turned off is not an unheard-of idea. I know for a fact that this was very typical 20-odd years ago in ERPs used in factories; you can't stop the factory floor to take backups, nor can you take inconsistent backups.

Since you know your use case and resources best you can decide which approach is best for you.

Nicholas K. Dionysopoulos

Lead Developer and Director

🇬🇷Greek: native 🇬🇧English: excellent 🇫🇷French: basic • 🕐 My time zone is Europe / Athens
Please keep in mind my timezone and cultural differences when reading my replies. Thank you!

Support

#38322 Database Engine question

This is a public ticket

Environment Information

Thursday, 12 January 2023 06:39 CST

Thursday, 12 January 2023 07:12 CST

Please wait

Support Information