Difference between revisions of "LZMA2 compression"

From openZIM
(One intermediate revision by the same user not shown)
Latest revision as of 16:08, 17 October 2010

LZMA2 is the standard, and only, compression algorithm supported in ZIM. In the standard implementation, (de)compression is done with the xz-utils library.

A problem with LZMA2 is that the memory needed for decompression grows with the compression level. At the highest level, 9, decompression needs 65 MB of RAM. The Nanonote, which we want to support, has only 32 MB installed. Tests showed that level 4 is too much, but level 3 is OK. xz-utils has an additional extreme flag, which adjusts the LZMA2 parameters so that the compression ratio at level 3 is almost identical to that of bzip2 (deprecated). The big advantage is that LZMA2 decompression is much faster (by a factor of 3 to 4) than bzip2. The downsides are that creating ZIM files with LZMA2 is much slower than with bzip2, and that support for xz-utils is not yet widespread.
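
The level and memory trade-off above can be tried out with Python's standard lzma module, which wraps the same liblzma library that xz-utils provides. This is only a minimal sketch: the helper names and the sample data are made up for illustration, the .xz container is an assumption, and the 32 MB figure is simply the Nanonote's total RAM from the text, not a measured budget.

```python
import lzma

# Assumption: use the Nanonote's full 32 MB as the decoder limit. A real
# reader needs memory for other things too, so this is optimistic.
NANONOTE_RAM = 32 * 1024 * 1024


def compress_for_small_devices(data: bytes) -> bytes:
    """Compress with level 3 plus the extreme flag (hypothetical helper)."""
    return lzma.compress(data, format=lzma.FORMAT_XZ,
                         preset=3 | lzma.PRESET_EXTREME)


def decompresses_within(blob: bytes, memlimit: int) -> bool:
    """Return True if blob can be decoded without exceeding memlimit bytes.

    LZMAError is also raised for corrupt input, so False means
    "could not decode under this limit", whatever the reason.
    """
    decoder = lzma.LZMADecompressor(format=lzma.FORMAT_XZ, memlimit=memlimit)
    try:
        decoder.decompress(blob)
    except lzma.LZMAError:
        return False
    return True


# Made-up sample data standing in for article text.
original = b"example article text " * 1000
blob = compress_for_small_devices(original)
print(decompresses_within(blob, NANONOTE_RAM))
```

The memory a decoder needs is determined by the dictionary size recorded in the compressed stream, not by the amount of data, so a check like this can be run on a small sample before creating a large file.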

Here are some test results:

Creating a file with 55498 index entries with 28936 articles.

bzip2 (deprecated)
   size: 90207329
   creating: 0:02:18
   reading random access: 29 #/s
   creating full text index: 0:03:01
   size of full text index: 92184996
   reading random access on Nanonote: 0.7 #/s

lzma2
   size: 90286916
   creating: 0:12:01
   reading random access: 120 #/s
   creating full text index: 0:03:03
   size of full text index: 87282408
   reading random access on Nanonote: 2.3 #/s

The tests (except the benchmark on the Nanonote) were run on our test machine, a dual-core AMD at 2.6 GHz.
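
To put the numbers above side by side, the ratios can be worked out with a few lines of Python. The figures are copied from the test results; the dictionary layout and variable names are only for illustration.

```python
from datetime import timedelta


def seconds(hms: str) -> float:
    """Convert an h:mm:ss string from the results into seconds."""
    h, m, s = (int(part) for part in hms.split(":"))
    return timedelta(hours=h, minutes=m, seconds=s).total_seconds()


# Values copied from the test results in this article.
results = {
    "bzip2": {"size": 90207329, "creating": "0:02:18",
              "reads": 29, "reads_nanonote": 0.7},
    "lzma2": {"size": 90286916, "creating": "0:12:01",
              "reads": 120, "reads_nanonote": 2.3},
}
bz, xz = results["bzip2"], results["lzma2"]

# File size: lzma2 is larger by well under 0.1 %.
size_ratio = xz["size"] / bz["size"]

# Creation time: lzma2 takes roughly 5.2 times as long.
create_ratio = seconds(xz["creating"]) / seconds(bz["creating"])

# Random access: roughly 4.1 times faster on the test machine
# and roughly 3.3 times faster on the Nanonote.
read_speedup = xz["reads"] / bz["reads"]
read_speedup_nanonote = xz["reads_nanonote"] / bz["reads_nanonote"]

print(f"size ratio:              {size_ratio:.4f}")
print(f"creation time ratio:     {create_ratio:.2f}")
print(f"read speedup:            {read_speedup:.2f}")
print(f"read speedup (Nanonote): {read_speedup_nanonote:.2f}")
```

The two read speedups are where the "factor of 3 to 4" for decompression comes from, while the file sizes differ by less than a tenth of a percent.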