I wish it were not so, but the GNU tar manual is not always easy to parse, nor is it necessarily complete on all topics.
A common impediment to effectively using GNU tar with modern tape drives is understanding how to use the blocking factor and checkpoint features.
First, let's define some terms as they are used with GNU tar, in increasing order of abstraction:
A block in GNU tar is a unit of data that is 512 bytes long, regardless of what "block size" might mean for any underlying storage devices.
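A record is a group of consecutive blocks that tar reads or writes in one operation; the blocking factor is the number of blocks in each record.
For a file to be stored by GNU tar in an archive, it is converted to a sequence of one or more records. The number of records needed depends on how large the file is, but there will always be at least one whole record for every file.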
The current default blocking factor of GNU tar is 20, though you can verify this with your own installation:
$ tar --show-defaults
--format=gnu -f- -b20 --quoting-style=escape ...
By multiplying the default blocking factor of 20 by the size of one block in bytes (512), we can tell that tar will write to our tape 10240 bytes, or 10 KiB, at a time.
This may have made sense in the distant past, with tape drives that expected to be sent data in specific record sizes. However, modern LTO-7 tape drives should be written to at around 300 MiB per second for maximum efficiency. At the default blocking factor, this means tar would have to finish a record and start a new one roughly every 0.03 milliseconds.
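To make that arithmetic easy to repeat for other blocking factors and data rates, here is a minimal sketch in POSIX shell. The function names are my own, and the 300 MiB per second figure is simply the target rate quoted above, not something measured. It assumes a shell with 64-bit integer arithmetic, which is what current bash and dash provide.

```shell
#!/bin/sh
# Size of one tar block in bytes (fixed by the tar format).
BLOCK_SIZE=512

# record_size_bytes BLOCKING_FACTOR
# Prints the number of bytes in one record.
record_size_bytes() {
    echo $(( $1 * BLOCK_SIZE ))
}

# record_interval_ns BLOCKING_FACTOR RATE_MIB_PER_S
# Prints the nanoseconds available per record at the given sustained
# rate. Integer division, so the result is rounded down.
record_interval_ns() {
    record=$(record_size_bytes "$1")
    rate_bytes=$(( $2 * 1024 * 1024 ))
    echo $(( record * 1000000000 / rate_bytes ))
}

record_size_bytes 20        # prints 10240
record_interval_ns 20 300   # prints 32552 (about 0.03 ms)
```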
Modern drives also accept a wide range of "block sizes" for writes; presumably these no longer correspond strongly to the physical layout of the tape media:
# dmesg -t | grep "Block limits"
st 6:0:1:0: [st0] Block limits 1 - 8388608 bytes.
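The upper limit in that log line can be turned into a ceiling for the blocking factor by dividing it by the 512-byte block size. The sketch below does this for a line in the format shown above. The function name is my own, and the result is only the limit the drive reports; I am assuming nothing else in the I/O path (driver or controller) imposes a lower one, which may not hold on your system.

```shell
#!/bin/sh
# max_blocking_factor
# Reads a "Block limits" kernel log line on stdin and prints the upper
# limit divided by the 512-byte tar block size.
max_blocking_factor() {
    max_bytes=$(sed -n 's/.*Block limits [0-9]* - \([0-9]*\) bytes.*/\1/p' | head -n 1)
    # Fail if no matching line was found on stdin.
    [ -n "$max_bytes" ] || return 1
    echo $(( max_bytes / 512 ))
}

# Sample line copied from the dmesg output above.
echo 'st 6:0:1:0: [st0] Block limits 1 - 8388608 bytes.' | max_blocking_factor   # prints 16384
```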
Increasing the blocking factor (option -b) to 1024 or 2048 when creating tape archives with GNU tar and modern tape drives results in a dramatic improvement in throughput and makes it much easier to hit the desired 300 MiB per second data rate. These values correspond to record sizes of 512 KiB and 1 MiB respectively.
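One way to see why the larger values help is to count how many records per second tar has to issue to sustain the target rate. This is a rough sketch with a function name of my own choosing; it only does the division and says nothing about what a particular drive will actually achieve.

```shell
#!/bin/sh
# records_per_second BLOCKING_FACTOR RATE_MIB_PER_S
# Prints how many records must be written each second to sustain the
# given rate (integer division, rounded down).
records_per_second() {
    record_bytes=$(( $1 * 512 ))
    rate_bytes=$(( $2 * 1024 * 1024 ))
    echo $(( rate_bytes / record_bytes ))
}

# Compare the default blocking factor with the two larger values.
for bf in 20 1024 2048; do
    printf '%5d blocks -> %7d bytes per record, %5d records/s\n' \
        "$bf" "$(( bf * 512 ))" "$(records_per_second "$bf" 300)"
done
```

At 300 MiB per second this works out to 30720 records per second for a blocking factor of 20, against 600 for 1024 and 300 for 2048.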
As with any storage throughput optimisation, it is important to benchmark all changes to ensure they result in a real-world improvement, though a blocking factor of 1024 is probably a good place to start with tests.
Because tar operations may be long running, it can be useful to have feedback about progress during the operation.
GNU tar has a few methods to provide this feedback; however, for integrating tar into large operations (e.g. a backup management system), the exec checkpoint action (--checkpoint-action=exec) provides a lot of flexibility.
This method will cause tar to execute an external program at periodic intervals while operating, and set certain environment variables with information about the state of progress.
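As an illustration, a handler for the exec action can be as small as the sketch below, which prints one line per checkpoint from the variables the GNU tar manual lists for this action (TAR_SUBCOMMAND, TAR_CHECKPOINT, TAR_ARCHIVE and TAR_BLOCKING_FACTOR). The function name, the script path and the tape device in the comment are placeholders of my own, not anything tar requires.

```shell
#!/bin/sh
# log_checkpoint
# Prints one progress line from the variables GNU tar exports to the
# command given in --checkpoint-action=exec=COMMAND. Fallback values
# are used when the script is run outside of tar.
log_checkpoint() {
    printf '%s: checkpoint %s on %s (blocking factor %s)\n' \
        "${TAR_SUBCOMMAND:-unknown}" "${TAR_CHECKPOINT:-?}" \
        "${TAR_ARCHIVE:-?}" "${TAR_BLOCKING_FACTOR:-?}"
}

log_checkpoint

# Hypothetical invocation, assuming this file is saved as
# /usr/local/bin/tar-progress.sh and the tape drive is /dev/nst0:
#   tar -c -b 1024 --checkpoint=100 \
#       --checkpoint-action=exec=/usr/local/bin/tar-progress.sh \
#       -f /dev/nst0 /data
```

A backup management system could replace the printf with whatever reporting mechanism it already uses.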
The TAR_CHECKPOINT environment variable is documented only as being the "number of the checkpoint", the exact meaning of which is ambiguous.
As far as I can tell from reading the source code (the invocations of checkpoint_run(), which increments the counter), TAR_CHECKPOINT is incremented for every record written, so to convert from a checkpoint number to bytes written or read:
With the default blocking factor of 20, and checkpoints configured for every 100 records, the first checkpoint will be checkpoint 100, at which time 1013760 bytes (99 records of 20 blocks each) have been written. The second checkpoint will be checkpoint 200, at which time 2037760 bytes will have been written, and so on.
Or, more succinctly:
bytes_written_or_read = (checkpoint_number - 1) * blocking_factor * 512
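The same formula as a small shell function, in case it is useful inside a checkpoint handler. The function name is my own, and the off-by-one reflects my reading of the source described above rather than anything the manual documents.

```shell
#!/bin/sh
# checkpoint_bytes CHECKPOINT_NUMBER BLOCKING_FACTOR
# Prints the bytes written or read when the given checkpoint fires,
# using (checkpoint_number - 1) * blocking_factor * 512.
checkpoint_bytes() {
    echo $(( ($1 - 1) * $2 * 512 ))
}

checkpoint_bytes 100 20   # prints 1013760
checkpoint_bytes 200 20   # prints 2037760

# Inside an exec handler both values come from tar's environment:
#   checkpoint_bytes "$TAR_CHECKPOINT" "$TAR_BLOCKING_FACTOR"
```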