Skip to content
/ server Public

Conversation

@dr-m
Copy link
Contributor

@dr-m dr-m commented Oct 28, 2025

  • The Jira issue number for this PR is: MDEV-37949

Description

TODO: fill description here

Release Notes

TODO: What should the release notes say about this change?

How can this PR be tested?

mysql-test/mtr --parallel=auto --force --big-test --mysqld=--loose-innodb-log-archive --skip-test=mariabackup

The mariabackup suite must be skipped when setting innodb_log_archive=ON, because the mariadb-backup tool will only support the old innodb_log_archive=OFF format (copying from ib_logfile0).

Unfortunately, all --suite=encryption tests that use innodb_encrypt_log must be skipped when using innodb_log_archive. This is because the server would have to be reinitialized; we do not allow changing the format of an archived log on startup (such as adding or removing encryption). This combination is covered by the test innodb.log_file_size_online,encrypted.

Basing the PR against the correct MariaDB version

  • This is a new feature or a refactoring, and the PR is based against the main branch.
  • This is a bug fix, and the PR is based against the earliest maintained branch in which the bug can be reproduced.

This is a new feature, but for now based on the 11.4 branch so that any unrelated errors that may be found during testing can be fixed rather quickly. Merges to the main branch may be blocked for weeks at a time.

PR quality check

  • I checked the CODING_STANDARDS.md file and my PR conforms to this where appropriate.
  • For any trivial modifications to the PR, I am ok with the reviewer making the changes themselves.

@dr-m dr-m self-assigned this Oct 28, 2025
@CLAassistant
Copy link

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.
You have signed the CLA already but the status is still pending? Let us recheck it.

The InnoDB write-ahead log file (ib_logfile0) is pre-allocated to
innodb_log_file_size and written as a ring buffer. This is good for
write performance and space management, but unsuitable for arbitrary
point-in-time recovery or for facilitating incremental backup.

TODO: Fix multi-file recovery in the non-mmap code path, as well as
fix some crashes in
--mysqld=--loose-innodb-log-{archive,recovery-start=12288}

TODO: Test and fix the crash-safety and recovery of SET GLOBAL
innodb_log_archive. Implement proper recovery when the file header of
ib_%016x.log is in the innodb_log_archive=OFF format. Ensure that
the start LSN of the ib_logfile0 will be written in a way that
is compatible with the already written log_sys.get_sequence_bit().

innodb_log_archive=ON: A new format where InnoDB will create and
preallocate files ib_%016x.log, instead of writing a circular file
ib_logfile0. Each file will be pre-allocated to
innodb_log_file_size<=4G. Once a log fills up, we will create and
pre-allocate another log file, to which log records will be
written. Upon the completion of the first log checkpoint in a recently
created log file, the old log file will be marked read-only, signaling
that there will be no further writes to that file, and that the file
may safely be moved to long-term storage.

The file name includes the log sequence number (LSN) at file offset
12288 (log_t::START_OFFSET). Limiting the file size to 4 GiB allows us
to identify each checkpoint by storing a 32-bit big-endian offset into
the optional FILE_MODIFY and the mandatory FILE_CHECKPOINT records,
between 12288 and the end of the file.

The innodb_encrypt_log format is identified by storing the encryption
information at the start of the log file. The first 32-bit value will
be 1, which is an invalid checkpoint offset. Each
innodb_log_archive=ON log must use the same encryption parameters.
Changing innodb_encrypt_log or related parameters is only possible by
setting innodb_log_archive=OFF and restarting the server, which will
permanently lose the history of the archived log.

The maximum number of log checkpoints that the innodb_log_archive=ON
file header can represent is limited to 12288/4=3072 when using
innodb_encrypt_log=OFF. If we run out of slots in a log file, each
subsequently completed checkpoint in that log file will overwrite the
last slot in the checkpoint header, until we switch to the next log.

innodb_log_recovery_start: The checkpoint LSN to start recovery from.
This will be useful when recovering from an archived log. This is useful
for restoring an incremental backup (applying InnoDB log files that were
copied since the previous restore).

innodb_log_recovery_target: The requested LSN to end recovery at.
When this is set, all persistent InnoDB tables will be read-only, and
no writes to the log are allowed. The intended purpose of this setting
is to prepare an incremental backup, as well as to allow data
retrieval as of a particular logical point of time.

Setting innodb_log_recovery_target>0 is much like setting
innodb_read_only=ON, with the exception that the data files may be
written to by crash recovery, and locking reads will conflict with any
incomplete transactions as necessary, and all transaction isolation
levels will work normally (not hard-wired to READ UNCOMMITTED).

srv_read_only_mode: When this is set (innodb_read_only=ON), also
recv_sys.rpo (innodb_log_recovery_target) will be set to the current LSN.
This ensures that it will suffice to check only one of these variables
when blocking writes to persistent tables.

The status variable innodb_lsn_archived will reflect the LSN
since when a complete InnoDB log archive is available. Its initial
value will be that of the new parameter innodb_log_archive_start.
If that variable is 0 (the default), the innodb_lsn_archived will
be recovered from the available log files. If innodb_log_archive=OFF,
innodb_lsn_archived will be adjusted to the latest checkpoint every
time a log checkpoint is executed. If innodb_log_archive=ON, the value
should not change.

The new setting SET GLOBAL innodb_log_archive=ON will enable log
archiving as soon as the current ib_logfile0 is about to wrap around.

SET GLOBAL innodb_log_archive=OFF will immediately rewrite the
checkpoint header in the latest ib_%016x.log and rename the file
to ib_logfile0.

When innodb_log_archive=ON, the setting SET GLOBAL innodb_log_file_size
will affect subsequently created log files when the file that is being
currently written is running out.

no_checkpoint_prepare.inc: A new file, to prepare for subsequent
inclusion of no_checkpoint_end.inc. We will invoke the server to
parse the log and to determine the latest checkpoint.

All --suite=encryption tests that use innodb_encrypt_log
will be skipped for innodb_log_encrypt=ON, because enabling
or disabling encryption on the log is not possible without
temporarily setting innodb_log_archive=OFF and restarting
the server. The idea is to add the following arguments to an
invocation of mysql-test/mtr:

--mysqld=--loose-innodb-log-archive --skip-test=mariabackup

The mariabackup test suite must be skipped when using the
innodb_log_archive=ON format, because mariadb-backup will only
support the old ib_logfile0 format (innodb_log_archive=OFF).

log_sys.first_lsn: The start of the current log file, to be consulted
in log_t::write_checkpoint() when renaming files.

log_sys.archived_lsn: New field: The value of innodb_lsn_archived.

log_sys.end_lsn: New field: The log_sys.get_lsn() when the latest
checkpoint was initiated. That is, the start LSN of a possibly empty
sequence of FILE_MODIFY records followed by FILE_CHECKPOINT.

log_sys.resize_target: The value of innodb_log_file_size that will be
used for creating the next archive log file once the current file (of
log_sys.file_size) fills up.

log_sys.archive: New field: The value of innodb_log_archive.

log_sys.next_checkpoint_no: Widen to uint16_t. There may be up to
12288/4=3072 checkpoints in the header.

log_sys.log: If innodb_log_archive=ON, this file handle will be kept
open also in the PMEM code path.

log_sys.resize_log: If innodb_log_archive=ON, we may have two log
files open both during normal operation and when parsing the log. This
will store the other handle (old or new file).

log_sys.resize_buf: In the memory-mapped code path, this will point
to the file resize_log when innodb_log_archive=ON.

recv_sys.log_archive: All innodb_log_archive=ON files that will be
considered in recovery.

recv_sys_t::find_checkpoint(): Find and remember the checkpoint position
in the last file when innodb_log_recovery_start points to an older file.

log_t::archive_new_write(): Create and allocate a new log file, and
write the outstanding data to both the current and the new file.

log_t::archived_mmap_switch_prepare(): Create and memory-map a new log
file, and update file_size to resize_target. Remember the file handle
of the current log in resize_log, so that write_checkpoint() will be
able to make it read-only.

log_t::archived_mmap_switch_complete(): Switch to the buffer that was
created in archived_mmap_switch_prepare().

log_t::write_checkpoint(): Allow an old checkpoint to be completed in
the old log file even after a new one has been created. If we are
writing the first checkpoint in a new log file, we will mark the old
log file read-only. We will also update log_sys.first_lsn unless it
was already updated in ARCHIVED_MMAP code path. In that code path,
there is the special case where log_sys.resize_buf == nullptr and
log_sys.checkpoint_buf points to log_sys.resize_log (the old log file
that is about to be made read-only). In this case, log_sys.first_lsn
will already point to the start of the current log_sys.log, even
though the switch has not been fully completed yet.

log_t::header_rewrite(my_bool): Rewrite the log file header before or
after renaming the log file. The recovery of the last ib_%016%.log
file must tolerate also the ib_logfile0 format.

log_t::set_archive(my_bool): Implement SET GLOBAL innodb_log_archive.
An error will be returned if non-archived SET GLOBAL innodb_log_file_size
(log file resizing) is in progress. The current log file will be renamed
to either ib_logfile0 or ib_%016x.log, as appropriate.

log_t::archive_set_size(): A new function, to ensure that
log_sys.resize_target is set on startup.

log_checkpoint_low(): Do not prevent a checkpoint at the start of a file.
We want the first innodb_log_archive=ON file to start with a checkpoint.

log_t::create(lsn_t): Initialize last_checkpoint_lsn. Initialize the
log header as specified by log_sys.archive (innodb_log_archive).

log_write_buf(): Add the parameter max_length, the file wrap limit.

mtr_t::finish_writer(): Specialize for innodb_log_archive=ON

log_t::append_prepare<log_t::ARCHIVED_MMAP>(): Special case.

log_t::get_path(): Get the name of the current log file.

log_t::get_circular_path(size_t): Get the path name of a circular file.
Replaces get_log_file_path().

log_t::get_archive_path(lsn_t): Return a name of an archived log file.

log_t::get_next_archive_path(): Return the name of the next archived log.

log_t::append_archive_name(): Append the archive log file name
to a path string.

mtr_t::finish_writer(): Invoke log_close() only if innodb_log_archive=OFF.
In the innodb_log_archive=ON, we only force log checkpoints after creating
a new archive file, to ensure that the first checkpoint will be written
as soon as possible.

log_t::checkpoint_margin(): Replaces log_checkpoint_margin().
If a new archived log file has been created, wait for the
first checkpoint in that file.

srv_log_rebuild_if_needed(): Never rebuild if innodb_log_archive=ON.
The setting innodb_log_file_size will affect the creation of
subsequent log files. The parameter innodb_encrypt_log cannot be
changed while the log is in the innodb_log_archive=ON format.

log_t::attach(), log_mmap(): Add the parameter bool read_only.

recv_sys_t::find_checkpoint(): If the circular ib_logfile0 is missing,
determine the oldest archived log file with contiguous LSN.
If innodb_log_archive=ON, refuse to start if ib_logfile0 exists.
Open non-last archived log files in read-only mode.

recv_sys_t::find_checkpoint_archived(): Validate each checkpoint in
the current file header, and by default aim to recover from the last
valid one. Terminate the search if the last validated checkpoint
spanned two files. If innodb_log_recovery_start has been specified,
attempt to validate it even if there is no such information stored
in the checkpoint header.

log_parse_file(): Do not invoke fil_name_process() during
recv_sys_t::find_checkpoint_archived(), when we tolerate FILE_MODIFY
records while looking for a FILE_CHECKPOINT record.

recv_scan_log(): Invoke log_t::archived_switch_recovery() upon
reaching the end of the current archived log file.

log_t::archived_switch_recovery_prepare(): Make use of
recv_sys.log_archive and open all but the last file read-only.

log_t::archived_switch_recovery(): Switch files in the pread() code path.

log_t::archived_mmap_switch_recovery_complete(): Switch files in the
memory-mapped code path.

recv_warp: A pointer wrapper for memory-mapped parsing that spans two
archive log files.

recv_sys_t::parse_mmap(): Use recv_warp for innodb_log_archive=ON.

recv_sys_t::parse(): Tweak some logic for innodb_log_archive=ON.

log_t::set_recovered_checkpoint(): Set the checkpoint on recovery.
Updates also the end_lsn.

log_t::clear_mmap(): Clean up the logic.

log_t::persist(): Even if the flushed_to_disk_lsn does not change,
we may want to reset the write_lsn_offset.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Development

Successfully merging this pull request may close these issues.

2 participants