------------------

Proc. 18th Symposium on Combinatorial Pattern Matching (CPM '07)

Move-to-Front, Distance Coding, and Inversion Frequencies Revisited.

Travis Gagie (Università del Piemonte Orientale)
Giovanni Manzini (Università del Piemonte Orientale)

Abstract. Move-to-Front, Distance Coding and Inversion Frequencies are three somewhat related techniques used to process the output of the Burrows-Wheeler Transform. In this paper we analyze these techniques from the point of view of how effective they are in the task of compressing low-entropy strings, that is, strings which have many regularities and are therefore highly compressible. This is a non-trivial task since many compressors have non-constant overheads that become non-negligible when the input string is highly compressible.

Because of the properties of the Burrows-Wheeler transform, being locally optimal ensures an algorithm compresses low-entropy strings effectively. Informally, local optimality implies that an algorithm is able to effectively compress an arbitrary partition of the input string. We show that in their original formulation neither Move-to-Front, nor Distance Coding, nor Inversion Frequencies is locally optimal. Then, we describe simple variants of the above algorithms which are locally optimal. To achieve local optimality with Move-to-Front it suffices to combine it with Run Length Encoding. To achieve local optimality with Distance Coding and Inversion Frequencies we use a novel escape and re-enter strategy.

NOTE: a couple of subtle mistakes are present in the version published in the CPM 07 proceedings. In this extended version we fix these mistakes, improve some of the bounds, and give the complete proof of all theorems and lemmas.

Retrieve the extended version in PDF

------------------

Viewable With Any Browser Valid HTML 3.2!

------------------