
LZ4 Streaming API Example : Line by Line Text Compression

by Takayuki Matsuoka

blockStreaming_lineByLine.c is an LZ4 Streaming API example that implements line-by-line incremental (de)compression.

Please note the following restrictions:

  • Read "LZ4 Streaming API Basics" first.
  • This is a relatively advanced application example.
  • The output file is not compatible with lz4frame and is platform dependent.

What's the point of this example?

  • Line-by-line incremental (de)compression.
  • Handling huge files with a small amount of memory.
  • Generally better compression ratio than the Block API.
  • Non-uniform block sizes.

How the compression works

First, allocate a ring buffer for the input and a buffer for the LZ4 compressed output.

(1)
    Ring Buffer

    +--------+
    | Line#1 |
    +---+----+
        |
        v
     {Out#1}


(2)
    Prefix Mode Dependency
          +----+
          |    |
          v    |
    +--------+-+------+
    | Line#1 | Line#2 |
    +--------+---+----+
                 |
                 v
              {Out#2}


(3)
          Prefix   Prefix
          +----+   +----+
          |    |   |    |
          v    |   v    |
    +--------+-+------+-+------+
    | Line#1 | Line#2 | Line#3 |
    +--------+--------+---+----+
                          |
                          v
                       {Out#3}


(4)
                        External Dictionary Mode
                +----+   +----+
                |    |   |    |
                v    |   v    |
    ------+--------+-+------+-+--------+
          |  ....  | Line#X | Line#X+1 |
    ------+--------+--------+-----+----+
                            ^     |
                            |     v
                            |  {Out#X+1}
                            |
                          Reset


(5)
                                    Prefix
                                    +-----+
                                    |     |
                                    v     |
    ------+--------+--------+----------+--+-------+
          |  ....  | Line#X | Line#X+1 | Line#X+2 |
    ------+--------+--------+----------+-----+----+
                            ^                |
                            |                v
                            |            {Out#X+2}
                            |
                          Reset

Next (see (1)), read the first line into the ring buffer and compress it with LZ4_compress_continue() (recent LZ4 releases deprecate this name in favor of LZ4_compress_fast_continue()). On this first call LZ4 has no previous data to depend on, so it compresses the line on its own and writes the compressed line {Out#1} to the LZ4 compressed data buffer. After that, write {Out#1} to the file and advance the ring buffer offset.

Do the same for the second line (see (2)). This time, LZ4 can use Line#1 as a dependency to improve the compression ratio, because Line#1 sits immediately before Line#2 in memory. This kind of dependency is called "Prefix mode".

Eventually, we reach the end of the ring buffer at Line#X (see (4)). At this point, we reset the ring buffer offset. After the reset, Line#X+1 is no longer adjacent in memory to the previous line, but LZ4 still keeps its memory of the earlier data. This is called "External Dictionary Mode".

At Line#X+2 (see (5)), LZ4 finally forgets almost all of the earlier data but still remembers Line#X+1. This is the same situation as Line#2.

Continue this procedure until the end of the text file.
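The offset handling described above can be sketched in plain C without calling LZ4 at all. This is only an illustration: the function name ring_advance and its parameters are hypothetical, and the exact reset condition used in blockStreaming_lineByLine.c may differ. The sketch assumes every line is at most max_line_bytes long and resets the offset as soon as a full-size line would no longer fit contiguously.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: return the offset at which the next line should be
 * stored in the ring buffer.
 *
 * Precondition (kept true by this function itself): on entry there is room
 * for one maximum-size line, i.e. offset + max_line_bytes <= ring_bytes. */
size_t ring_advance(size_t offset, size_t line_bytes,
                    size_t ring_bytes, size_t max_line_bytes)
{
    assert(line_bytes <= max_line_bytes);
    assert(offset + max_line_bytes <= ring_bytes);

    /* Move past the line that was just compressed (or decompressed). */
    offset += line_bytes;

    /* If a maximum-size line no longer fits before the end of the buffer,
     * start again at the beginning: the "Reset" arrow in diagrams (4), (5). */
    if (offset + max_line_bytes > ring_bytes) {
        offset = 0;
    }
    return offset;
}
```

With a 32-byte ring buffer and 8-byte maximum lines, ring_advance(16, 3, 32, 8) yields 19 (the next line is adjacent, so LZ4 would see it in prefix mode), while ring_advance(24, 3, 32, 8) yields 0 (the next line is not adjacent, which corresponds to the external dictionary case).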

How the decompression works

Decompression mirrors the compression steps:

  • Read a compressed line from the file into a buffer.
  • Decompress it into the ring buffer.
  • Write the decompressed plain text line to the output file.
  • Advance the ring buffer offset. If the offset exceeds the end of the ring buffer, reset it.

Continue this procedure until the end of the compressed file.