How to append to a zlib stream?

I have been trying to append raw bytes to an existing zlib stream using the deflatePrime function provided by the zlib library. However, I cannot understand what the second and third parameters of deflatePrime(z_streamp strm, int bits, int value); are, other than a small note in the docs saying that bits must be less than or equal to 16, and some hints in the gzlog.c example. In that file, it is called like this:

deflatePrime(&strm, (8 - log->back) & 7, *buf);

and the definition of log->back is:

 int back;       /* location of first block id in bits back from first */

which is initialized like this:

log->back = 3 + (buf[HEAD + 34] & 7);

  1. What is the block id?
  2. What is buf[HEAD + 34] supposed to point to, given that it is outside the bounds of HEAD?
  3. Why is an arbitrary 3 added to log->back?


Solution 1:[1]

A deflate block is a sequence of bits, not bytes, and can start and end at any bit position in the stream. So a block can start somewhere in the middle of a byte, and can end somewhere in the middle of another byte.

You've started a new raw deflate stream with deflateInit2() (raw deflate is selected with a negative windowBits). Once you start feeding bytes to deflate(), it will, in time, emit a stream of bits as a sequence of bytes. However, you have a few bits left over after the last full byte from a previous deflate block. You would like to insert those before the bits that are about to be emitted. That's what deflatePrime(strm, bits, value) lets you do. It takes the number of bits given by the bits argument from the least significant end of value and inserts them right at the start of the soon-to-be-emitted stream of bits. You need to call it before the first deflate() call.
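To make the bit insertion concrete, here is a minimal sketch of an LSB-first bit writer, which is the order deflate uses to pack bits into bytes. The names bitwriter and put_bits are hypothetical and exist only for this illustration. This is a toy model, not zlib's internal state or API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy LSB-first bit writer (hypothetical, for illustration only).
   It models how deflate packs a stream of bits into bytes. */
typedef struct {
    uint8_t  out[16];  /* completed bytes */
    size_t   len;      /* number of completed bytes in out[] */
    uint32_t hold;     /* pending bits; bit 0 is the oldest */
    int      count;    /* number of pending bits (0..7 between calls) */
} bitwriter;

/* Append the low `bits` bits of `value`, least significant bit first. */
void put_bits(bitwriter *w, int bits, unsigned value)
{
    assert(bits >= 0 && bits <= 16);
    w->hold |= (uint32_t)(value & ((1u << bits) - 1)) << w->count;
    w->count += bits;
    while (w->count >= 8) {            /* flush every completed byte */
        assert(w->len < sizeof(w->out));
        w->out[w->len++] = (uint8_t)(w->hold & 0xff);
        w->hold >>= 8;
        w->count -= 8;
    }
}
```

In this model, calling put_bits(&w, 2, 0x3) on a zeroed bitwriter before anything else is written plays the role that deflatePrime() plays before the first deflate() call: the two saved bits land in the low end of the first output byte, and the next block's header bits follow them in the same byte.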

In gzlog, this is used to append new deflate blocks after previous deflate blocks. You first chop off the final block, leaving the last deflate block with data in it. That last deflate block likely (with probability 0.875) ended somewhere in the middle of a byte. In that case, you save those last few bits of that previous block to be inserted before the next block. You will do that insertion with deflatePrime().

The block id is the first three bits of every deflate block. So the location of the block id is the location of the start of the block. log->back is the number of bits that the block start is back from the first length in a stored block, which itself is always on a byte boundary. A stored block is the three header bits starting wherever in a byte, then just enough padding zero bits to get to a byte boundary, followed by the stored block lengths and data, all on byte boundaries. The number of bits back is at least three, since the header is three bits. It could have been as far back as ten bits, which is the three bits plus seven pad bits. It can't be any further, since one bit further back and it ends on a byte boundary, so no pad bits. log->back is therefore in 3..10.
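The 3..10 range can be checked with a few lines of arithmetic. The sketch below uses a hypothetical helper, stored_back, that takes the bit offset within a byte at which the three-bit header starts and returns the distance back from the first length byte. It is not taken from gzlog.c.

```c
#include <assert.h>

/* Given the bit offset (0..7) within a byte at which a stored block's
   three-bit header starts, return how many bits back that start is from
   the first length byte of the stored block. Hypothetical helper. */
int stored_back(int start)
{
    int end, pad;

    assert(start >= 0 && start <= 7);
    end = start + 3;            /* bit position just past the header */
    pad = (8 - (end & 7)) & 7;  /* zero bits needed to reach a byte boundary */
    return 3 + pad;             /* header bits plus pad bits */
}
```

A header starting at bit offset 5 ends exactly on a byte boundary, so there is no padding and the result is 3. A header starting at bit offset 6 spills one bit into the next byte, needs seven pad bits, and gives 10. A byte-aligned header (offset 0) gives 8.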

Then the number of bits remaining in the previous block after the last full byte is eight minus log->back, modulo eight. So if log->back is six, then there are two bits before that from the previous deflate block. If log->back is ten, then the first two bits of the header are in the second byte back, with the last bit of the header in the last byte, followed by seven pad bits. Then the number of bits from the previous deflate block is six.
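The same calculation as a small sketch, using the expression from the gzlog.c call quoted in the question. The helper names leftover_bits and leftover_value are hypothetical. The explicit mask in leftover_value only makes visible that the low bits of the byte are the ones used.

```c
#include <assert.h>

/* Number of bits from the previous deflate block that share a byte with
   the start of the stored block header, for back in 3..10. */
int leftover_bits(int back)
{
    assert(back >= 3 && back <= 10);
    return (8 - back) & 7;
}

/* The leftover bits themselves: the low bits of the byte in which the
   header starts, since deflate fills each byte from the least
   significant bit upward. */
unsigned leftover_value(unsigned char byte, int back)
{
    return byte & ((1u << leftover_bits(back)) - 1);
}
```

With back equal to six this yields two bits, and with back equal to ten it yields six bits, matching the two cases worked through above. With back equal to eight the header is byte-aligned and there is nothing to carry over.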

To get the bits back, three is added to what was saved in the extra block, because three was subtracted when it was saved: ext[34] = log->back - 3 + (op << 3); (the operation code is also saved in other bits in that byte). So why is three subtracted and then added? So that the value of 3..10 can be stored in three bits in the extra field as 0..7. Instead of explaining what it is again, I'll copy from the source code comments:

   - First stored block start as the number of bits back from the final stored
     block first length byte.  This value is in the range of 3..10, and is
     stored as the low three bits of the final byte of the extra field after
     subtracting three (0..7).  This allows the last-block bit of the stored
     block header to be updated when a new stored block is added, for the case
     when the first stored block and the last stored block are the same.  (When
     they are different, the numbers of bits back is known to be eight.)  This
     also allows for new compressed data to be appended to the old compressed
     data in the compress operation, overwriting the previous first stored
     block, or for the compressed data to be terminated and a valid gzip file
     reconstructed on the off chance that a compression operation was
     interrupted and the data to compress in the foo.add file was deleted.
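The subtract-three, add-three round trip described in that comment can be sketched as follows. The function names are hypothetical, and the assumption that the operation code occupies exactly two bits directly above the low three is mine, made for illustration. Only the low-three-bits layout for the bit count is stated in the quoted comment.

```c
#include <assert.h>

/* Pack back (3..10) and an operation code into one byte, following the
   expression back - 3 + (op << 3). */
unsigned char pack_state(int back, int op)
{
    assert(back >= 3 && back <= 10);
    assert(op >= 0 && op <= 3);  /* assumption: op fits in two bits */
    return (unsigned char)(back - 3 + (op << 3));
}

/* Recover back: the low three bits hold 0..7, so add three back. */
int unpack_back(unsigned char b)
{
    return 3 + (b & 7);
}

/* Recover op from the bits above the low three (assumed two bits wide). */
int unpack_op(unsigned char b)
{
    return (b >> 3) & 3;
}
```

Storing back - 3 rather than back is what lets the eight possible values 3..10 fit in three bits, leaving the rest of the byte free for the operation code.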

I don't know what you mean by "out of the HEAD bounds". That is the offset of a byte in the extra block.

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1