
Redundancy of Lossless Data Compression for Known Sources by Analytic Methods



Author(s): Michael Drmota; Wojciech Szpankowski

Source:
    Journal: Foundations and Trends® in Communications and Information Theory
    ISSN Print: 1567-2190, ISSN Online: 1567-2328
    Publisher: Now Publishers
    Volume 13, Number 4
Pages: 144 (277-417)
DOI: 10.1561/0100000090

Abstract:

Lossless data compression is a facet of source coding and a well-studied problem of information theory. Its goal is to find the shortest possible code that can be unambiguously recovered. Here, we focus on rigorous analysis of code redundancy for known sources. The redundancy rate problem determines by how much the actual code length exceeds the optimal code length. We present precise analyses of three types of lossless data compression schemes, namely fixed-to-variable (FV) length codes, variable-to-fixed (VF) length codes, and variable-to-variable (VV) length codes. In particular, we investigate the average redundancy of Shannon, Huffman, Tunstall, Khodak, and Boncelet codes. These codes have succinct representations as trees, either as coding or parsing trees, and we analyze here some of their parameters (e.g., the average path from the root to a leaf). Such trees are precisely analyzed by analytic methods, also known as analytic combinatorics, in which complex analysis plays a decisive role. These tools include generating functions, Mellin transform, Fourier series, saddle point method, analytic poissonization and depoissonization, Tauberian theorems, and singularity analysis. The term analytic information theory has been coined to describe problems of information theory studied by analytic tools. This approach lies at the crossroads of information theory, analysis of algorithms, and combinatorics.
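
As a rough illustration of the redundancy notion described above (not code from the monograph), the following Python sketch builds a Huffman code for a small hypothetical known source and compares its average code length with the source entropy; the symbol probabilities and helper names are illustrative assumptions, not taken from the text.

    # Illustrative sketch: average redundancy of a Huffman (FV) code for a
    # known memoryless source. Redundancy = average code length - entropy.
    import heapq
    import itertools
    import math


    def huffman_code_lengths(probs):
        """Return codeword lengths of an optimal (Huffman) prefix code."""
        counter = itertools.count()  # unique tie-breaker for the heap
        heap = [(p, next(counter), [i]) for i, p in enumerate(probs)]
        heapq.heapify(heap)
        lengths = [0] * len(probs)
        while len(heap) > 1:
            p1, _, s1 = heapq.heappop(heap)
            p2, _, s2 = heapq.heappop(heap)
            for i in s1 + s2:        # each merge adds one bit to these symbols
                lengths[i] += 1
            heapq.heappush(heap, (p1 + p2, next(counter), s1 + s2))
        return lengths


    def entropy(probs):
        return -sum(p * math.log2(p) for p in probs if p > 0)


    if __name__ == "__main__":
        probs = [0.4, 0.3, 0.2, 0.1]   # hypothetical known source
        lengths = huffman_code_lengths(probs)
        avg_len = sum(p * l for p, l in zip(probs, lengths))
        h = entropy(probs)
        print(f"average code length = {avg_len:.4f} bits/symbol")
        print(f"source entropy      = {h:.4f} bits/symbol")
        print(f"average redundancy  = {avg_len - h:.4f} bits/symbol")

For this non-dyadic source the average redundancy is strictly positive (about 0.054 bits per symbol); the monograph's subject is the precise asymptotic behavior of such quantities for Shannon, Huffman, Tunstall, Khodak, and Boncelet codes.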