Akeeba Backup for Joomla

Appendix B. The JPS archive format, v.2.0
Prev	Part III. Appendices	Next

Appendix B. The JPS archive format, v.2.0

Design goals

The JPS format strives to be a compressed archive format designed specifically for efficiency of creation by a PHP script, while providing secure AES-128 encryption of the file descriptor and file contents. It is similar in design to the JPA, with a few notable differences:

Both the file descriptor and the file data are split to 64Kb blocks encrypted using Rijndael-128 in CBC mode (that's the same as AES-128)
All files are compressed using Deflate (ZLib)

Even though JPS is designed for use by PHP scripts, creating a command-line utility, a programming library or even a GUI program in any other language is still possible. JPS is supposed to have low to medium compression rations, and be secure. However it is not as error-tolerant as other archive formats.

This is an open format. You may use it in any commercial or non-commercial application royalty-free. Even though the PHP implementation is GPL-licensed, we can provide it under commercial-friendly licenses, e.g. LGPL v3. Please ask us if you want to use it on your own software.

Important

When the password is blank, no encryption takes place. Archivers should take this into account when creating files. Unarchivers should also take this into account when the user passes an empty string as their password.

When a non-blank password is used, all files are encrypted using the same password. More specifically, all data blocks are encrypted using the same password.

Security

The security of the format largely hinges on the assumption that Rijndael-128 in CBC mode with randomized IVs in each encrypted stream is not susceptible to KPA (known plain-text attacks). Should a KPA be found against the encryption algorithm the obvious crib would be the encrypted file header of the first file in an Akeeba Backup archive which is very predictable. Even if the order of files were randomized, there are well-known files (part of the installer) with known contents, making them relatively easy to identify by their relative size in the archive. However, as we said above, the encryption algorithm is not known to be susceptible to KPA, nullifying this threat.

Another defense you can use when creating the archive is the use of a non-static salt for PBKDF2 key expansion. This means that the cryptographic key which could theoretically be brute forced by means of a KPA would only apply to a specific encrypted block. It would then take another, more computationally expensive, brute force attack against the password to decrypt the entire archive. The downside is that this is a much slower encryption method since a key needs be derived for every encrypted block of data. Counter-intuitively this could lead to worse security since the practical considerations of the implementation lead to using a much smaller number of iterations with a weaker hashing algorithm which may end up being easier to brute-force, especially for the shorter passwords.

Our recommendation for v2.0 archives is using key expansion with a static salt, a high number of iterations (e.g. 64000) and a strong hashing algorithm (e.g. SHA512).

Key Expansion

JPS v.2.0 (PBKDF2)

JPS v.2.0 is using a different, more secure, key expansion scheme that JPS v.1.x. PBKDF2 is used on the user-supplied password to generate the encryption key. PBKDF2 was selected over memory-hard algorithms (like bcrypt, scrypt, Argon2 etc) for performance reasons, considering that encryption has to also take place on shared hosts with limited resources and old versions of PHP which don't even support these newer hashing algorithms. As processors get faster and old PHP versions become increasingly obsolete we might revise the key expansion algorithm in the future.

The supported PBKDF2 algorithms at this time are SHA-1 (used by default), SHA-256 and SHA-512. The algorithm used throughout the archive is specified in the archive header. Even though SHA-1 is not collision-resistant, the high number of iterations mitigates that risk.

The number of iterations used throughout the archive is also specified in the archive header. By default it's 100,000. This is a moderately high number of iterations while still being practical on resource-limited shared hosting.

There are two possibilities for the salt used for PBKDF2. One possibility is using a static salt, found in the archive's header. In this case you only perform key expansion once and use the expanded key for all encrypted blocks in the archive. The other possibility is having a different salt per encrypted block. In this case a key expansion is executed per encrypted block, therefore using a different encryption key for each block.

	Important
	We STRONGLY recommend using long (64 or more characters), completely random passwords which make use of lowercase and uppercase Latin letters, numbers and special characters (top row on US-format keyboards). Use a password manager to generate and store these passwords. Taking these precautions make password brute forcing with conventional technology highly impractical in the foreseeable future.

JPS v.1.x (Rijndael-128 CTR)

All JPS v.1.x format use a very naive key expansion, based on Rijndael-128 running in CTR (counter) mode. The implementation details can be found in the Encrypt class' expandKey method. The obvious downside is that only up to 16 bytes of the password (which may be as little as 5.3 characters in UTF-8 encoding) are taken into account. The other obvious downside is that the key is simply the password being encrypted with a version of itself in CTR mode which is not very cryptographically safe. The shortcomings of this approach were exacerbated in the first public version of the JPS format (1.9) which used the key as an IV for all encrypted blocks, weakening the security of the format.

This key expansion is not supported since JPS v.2.0.

Structure of an archive

An archive consists of exactly one Standard Header and one or more Entity Blocks . Each Entity Block consists of exactly one Entity Description Block and at most one File Data Block. Each FIle Data Block consist of one or several Data Chunk Blocks. All values are stored in little-endian byte order, unless otherwise specified.

All textual data, e.g. file names and symlink targets, must be written as little-endian UTF-8, non null terminated strings, for the widest compatibility possible.

Standard Header

The function of the Standard Header is to allow identification of the archive format and supply the client with general information regarding the archive at hand. It is a binary block appearing at the beginning of the archive file and there alone. It consists of the following data (in order of appearance):

Signature, 3 bytes: The bytes 0x4A 0x50 0x54 (uppercase ASCII string “JPS”) used for identification purposes.
Major version, 1 byte: Unsigned integer represented as single byte, holding the archive format major version, e.g. 0x02 for version 2.0.
Minor version, 1 byte: Unsigned integer represented as single byte, holding the archive format minor version, e.g. 0x00 for version 2.0.
Spanned archive, 1 byte: When set to 1, the archive spans multiple files
Extra header length, 2 bytes: The total length of extra headers. In version 2.0 of the format it is always 76.

The total size of this header is 8 bytes, plus the size of the extra headers (if any).

Key Expansion Extra Header

The function of the Key Expansion Extra Header is to let you know of the PBKDF2 key expansion algorithm's configuration parameters used throughout this backup archive. It consists of the following data:

Identification Header, 4 bytes

The bytes 0x4A 0x48 0x00 0x01 used for identification purposes

Extra Header Size, 2 bytes

Unsigned short integer, little endian, holding the total size of this extra header (including the 4 bytes of the identification header), i.e 76 for a version 2.0 header

Algorithm, 1 byte

Unsigned byte holding the ID of the hash algorithm used for PBKDF2. The valid algorithms are:

0 = SHA-1
1 = SHA-256
2 = SHA-512

Values up to and including 127 are reserved for future use.

Iterations, 4 bytes

Unsigned long integer, little endian, with the number of iterations to use in PBKDF2

Use Static Salt, 1 byte

Unsigned byte. When it is 1 use the Static Salt below with PBKDF2 unless otherwise specified in the encryption block. This allows you to cache the expanded key for encryption / decryption purposes. This is only recommended if you are using SHA-256 or SHA-512 with a high number of iterations.

If this is 0 we recommend setting the Static Salt to all null bytes.

Static Salt, 64 bytes

The rest of the extra header (64 bytes in v.2.0) is the Static Salt mentioned above.

Entity Block

An Entity Block is merely the aggregation of exactly one Entity Description Block, followed by the encrypted contents of exactly one Entity Description Block Data and zero or one instances of a File Data Block. An Entity can be at present a File, Symbolic Link or Directory. If the entity is a File of zero length or if it is a Directory the File Data Block is omitted. In any other case, the File Data Block must exist.

Entity Description Block Header

The function of the Entity Description Block Header is to allow a client to read the encrypted Entity Description Block Data. It is a binary block consisting of the following data (in order of appearance):

Signature, 3 bytes: The bytes 0x4A, 0x50, 0x46 (uppercase ASCII string “JPF”) used for identification purposes.
Encrypted size, 2 bytes: The encrypted size of the following Entity Description Block Data
Decrypted size, 2 bytes: The decrypted size of the following Entity Description Block Data

Entity Description Block Data

it purpose is to provide the client information about an Entity included in the archive. The client can then use this information in order to reconstruct a copy of the Entity on the client's file system. The data is written to the archive encrypted with Rijndael-128 in CBC mode. The Entity Description Block Data consists of the following information before it is encrypted:

Length of entity path, 2 bytes.

Unsigned short integer, represented as 2 bytes, holding the size of the entity path data below.

Entity path data, variable length.

Holds the complete (relative) path of the Entity as a UTF16 encoded string, without trailing null. The path separator must be a forward slash (“/”), even on systems which use a different path separator, e.g. Windows.

Entity type, 1 byte.

0x00 for directories (instructs the client to recursively create the directory specified in Entity path data). When the entity type is 0x00 the Compression Type MUST be 0x00 as well.
0x01 for files (instructs the client to reconstruct the file specified in Entity path data)
0x02 for symbolic links (instructs the client to create a symbolic link whose target is stored, uncompressed, as the entity's File Data Block). When the type is 0x00 the Compression Type MUST be 0x00 as well.

Compression type, 1 byte.

0x00 for no compression; the data contained in File Data Block should be written as-is to the file. Also used for directories, symbolic links and zero-sized files.
0x01 for deflate (Gzip) compression; the data contained in File Data Block must be deflated using Gzip before written to the file.
0x02 for Bzip2 compression; the data contained in File Data Block must be uncompressed using BZip2 before written to the file. This is generally discouraged, as both the archiving and unarchiving scripts must be ran in a PHP environment which supports the bzip2 library.

Uncompressed size, 4 bytes

An unsigned long integer representing the size of the resulting file in bytes. For directories, symlinks and zero-sized files it is zero (0x00000000).

Entity permissions, 4 bytes

UNIX-style permissions of the stored entity.

File Modification Time, 4 bytes

The UNIX timestamp of the file's last modification time. For directories and symlinks it must be ignored and set to 0x00000000.

File Data Block

The File Data Block is only present if the Entity is a file with a non-zero file size. It consists of one or more Data Chunk Blocks. Do note that the File Data Block has no header. The collection of one or several Data Chunk Blocks is called the "File Data Block".

Data Chunk Block

Each Data Chunk Block consists of the following information:

Encrypted size, 4 bytes

Unsigned long containing the size, in bytes, of the encrypted data.

Decrypted size, 4 bytes

Unsigned long containing the size, in bytes, of the decrypted data. If the decryption yields more bytes, the extraneous bytes must be trimmed off.

Encrypted data, variable length

The decrypted data is compressed, depending on the Compression Type, and then encrypted using AES-128 in CBC mode. The compression format used may be:

Binary dump of file contents or textual representation of the symlink's target, for CT=0x00
Gzip compression output, without a trailing Adler32 checksum, for CT=0x01
Bzip2 compression output, for CT=0x02

In split archives, the first 8 bytes must appear within the same part. They may or may not be in the same part as the Entity Description Block Data. The Encrypted Data can span multiple parts. Since the minimum part size is 64Kb and the maximum Decrypted Size can't be over 64Kb, the Encrypted Data will either be in the same part in its entirety, or span exactly two parts.

Encrypted data block format

The encrypted blocks have one of the following possible formats. You can detect the data format in two ways.

First, the legacy format is only used with JPS version 1.9 and below. If the file header claims that the archive is JPS 1.10 then the current format MUST be used.

If you do not or cannot trust the file header you can do a simple heuristics. Read the last 24 bytes of the encrypted block. If the first four bytes are JPIV you definitely have a current format block. Otherwise you most likely have a legacy format block (there's 1 in 4,228,250,625 chance of false detection).

JPS 2.0 (Current)

In this format the IV for each encryption block is always different, produced by a crypto safe PRNG (either OpenSSL or mcrypt). Therefore the encryption is safe against cryptanalysis (as far as an attack against Rijndael-128 itself is not discovered). Moreover, it allows for the inclusion of a per-block salt for PBKDF2 key expansion.

Encrypted data, variable length

This data is encrypted with Rijndael-128 using the IV described below.

Per-Block Salt, 68 bytes (OPTIONAL)

The literal string JPST followed by the 64 bytes of the per-block salt. Discard the JPST marker and use the rest as the salt for the PBKDF2 algorithm.

This section MUST be present when the Use Static Salt flag in the archive header is 0.

This section MAY be present when the Use Static Salt flag in the archive header is 1. This means that you shouldn't simply skip checking the existence of this section just because Use Static Salt is 1. If it's present, use it and derive a new, per-block encryption key.

Initialization Vector (IV) data block, 20 bytes

The literal string JPIV followed by the 16 bytes (128-bit) Initialization Vector data. Discard the JPIV marker and use the rest of the block as the IV for your Rijndael-128 decryption engine.

Decrypted data length, 4 bytes

The size of the decrypted data in bytes. Since Rijndael-128 in CBC mode encrypts data in 16-byte (128-bit) blocks it needs to pad data with length not exactly divisible by 16 with zero bytes. These zero bytes are not part of the input data and need to be discarded. For example, if your decrypted data length is 24 and the Rijndael-128 decryption result is 32 bytes you need to throw away the last 8 bytes which are just the zero (null) padding bytes.

JPS 1.10 (Previous)

	Note
	Only compatible with JPS 1.10 archive files. Not compatible with JPS 1.9 archive files. Obsolete since JPS 2.0.

Encrypted data, variable length: This data is encrypted with Rijndael-128 using the IV described below.
Initialization Vector (IV) data block, 20 bytes: The literal string JPIV followed by the 16 bytes (128-bit) Initialization Vector data. Discard the JPIV marker and use the rest of the block as the IV for your Rijndael-128 decryption engine.
Decrypted data length, 4 bytes: The size of the decrypted data in bytes. Since Rijndael-128 in CBC mode encrypts data in 16-byte (128-bit) blocks it needs to pad data with length not exactly divisible by 16 with zero bytes. These zero bytes are not part of the input data and need to be discarded. For example, if your decrypted data length is 24 and the Rijndael-128 decryption result is 32 bytes you need to throw away the last 8 bytes which are just the zero (null) padding bytes.

JPS 1.9 and below (Legacy)

	Note
	Only compatible with JPS 1.9 and 1.10 archive files. Obsolete since JPS 2.0.

In this format the IV is always the same and derived from the encryption key. For this reason the encryption is NOT safe against some methods of cryptanalysis which could compromise the encryption key.

Encrypted data, variable length: This data is encrypted with Rijndael-128.
Decrypted data length, 4 bytes: The size of the decrypted data in bytes. Since Rijndael-128 in CBC mode encrypts data in 16-byte (128-bit) blocks it needs to pad data with length not exactly divisible by 16 with zero bytes. These zero bytes are not part of the input data and need to be discarded. For example, if your decrypted data length is 24 and the Rijndael-128 decryption result is 32 bytes you need to throw away the last 8 bytes which are just the zero (null) padding bytes.

End-of-archive header

This header is written after the end of the archive data, at the end of the last part of the archive.

When creating spanned archives, the first file (part) of the archive set has an extension of .j01, the next part has an extension of .j02 and so on. The last file of the archive set has the extension .jps. You must also ensure that the Entity Description Block is within the limits of a single part, i.e. the contents of the Entity Description Block must not cross part boundaries. The File Data Block data can cross one or multiple part blocks, but the header of each Data Chunk Block must both be inside the same part.

This header is written after the end of the archive data, at the end of the last part of the archive. Its structure is:

Signature, 3 bytes: The bytes 0x4A, 0x50, 0x45 ("JPE")
Number of parts, 2 bytes: The total number of parts this archive consists of. Non-spanned archives should set this to 1.
File count, 4 bytes: Unsigned long integer represented as four bytes, holding the number of files present in the archive.
Uncompressed size, 4 bytes: Unsigned long integer represented as four bytes, holding the total size of the archive's files when uncompressed.
Compressed size, 4 bytes: Unsigned long integer represented as four bytes, holding the total size of the archive's files in their stored (compressed) form

The size of the EOA header is 17 bytes for version 1.9 of the format.

Change Log

Revision History
	July 2010	NKD,
Described version 1.9

Revision History
	January 2017	NKD,
Described version 2.0

Prev	Up	Next
Appendix A. The JPA archive format, v.1.3	Home	Appendix C. GNU Free Documentation License