JB2Image.h

Coding bilevel images with JB2.

o JB2Blit
Blit data structure.
o JB2Shape
Shape data structure.
o JB2Image
Main JB2 data structure.
Files "JB2Image.h" and "JB2Image.cpp" address the compression of bilevel images using the JB2 soft pattern matching scheme. These files provide the complete decoder and the encoder back-end. The JB2 scheme is optimized for images containing a large number of self-similar small components such as characters. Typical text images can be compressed into files 3 to 5 times smaller than with G4/MMR and 2 to 4 times smaller than with JBIG1.

JB2 and JBIG2 --- JB2 has strong similarities with the forthcoming JBIG2 standard developed by the "ISO/IEC JTC1 SC29 Working Group 1", which is responsible for both the JPEG and JBIG standards. This is hardly surprising since JB2 was our own proposal for the JBIG2 standard and remained the only proposal for years. The full JBIG2 standard, however, is significantly more complex and slightly less efficient than JB2 because it addresses a broader range of applications. Full JBIG2 compliance may be implemented in the future.

JB2 Images --- Class JB2Image is the central data structure implemented here. A JB2Image is composed of an array of shapes and an array of blits. Each shape contains a small bitmap representing an elementary blob of ink, such as a character or a segment of line art. Each blit instructs the decoder to render a particular shape at a specified position in the image. Some compression is already achieved because several blits can refer to the same shape. A shape can also contain a pointer to a parent shape. Additional compression is achieved when both shapes are similar because each shape is encoded using the parent shape as a model. A "O" shape for instance could be a parent for both a "C" shape and a "Q" shape.
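The relationship between shapes and blits can be illustrated with a small self-contained model. This is only a sketch: the names ToyShape, ToyBlit, ToyImage and render are hypothetical and do not reproduce the actual JB2Shape, JB2Blit or JB2Image declarations. Bitmaps are stored as rows of characters, and positions are measured from the top-left corner for simplicity.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical, simplified stand-ins; not the actual JB2Shape/JB2Blit layout.
struct ToyShape {
    int parent;                      // index of the parent shape, or -1 if none
    std::vector<std::string> rows;   // '#' = ink, '.' = background
};

struct ToyBlit {
    int left;      // column of the shape's left edge in the page
    int top;       // row of the shape's top edge in the page
    int shapeno;   // index into the shape array
};

struct ToyImage {
    int width = 0;
    int height = 0;
    std::vector<ToyShape> shapes;
    std::vector<ToyBlit> blits;
};

// Paint every blit onto a blank page, clipping at the page borders.
std::vector<std::string> render(const ToyImage& img) {
    std::vector<std::string> page(img.height, std::string(img.width, '.'));
    for (const ToyBlit& b : img.blits) {
        assert(b.shapeno >= 0 && b.shapeno < (int)img.shapes.size());
        const ToyShape& s = img.shapes[b.shapeno];
        for (std::size_t r = 0; r < s.rows.size(); ++r) {
            for (std::size_t c = 0; c < s.rows[r].size(); ++c) {
                int y = b.top + (int)r;
                int x = b.left + (int)c;
                bool inside = y >= 0 && y < img.height && x >= 0 && x < img.width;
                if (inside && s.rows[r][c] == '#')
                    page[y][x] = '#';
            }
        }
    }
    return page;
}
```

In this model, two blits that carry the same shapeno draw the same bitmap at two positions, so the bitmap is stored only once. The parent field is recorded but not used by render; it only matters when shapes are encoded.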

Decoding JB2 data --- The first step for decoding JB2 data consists of creating an empty JB2Image object. Function decode then reads the data and populates the JB2Image with the shapes and the blits. Function get_bitmap finally produces an anti-aliased image.

Encoding JB2 data --- The first step for encoding JB2 data also consists of creating an empty JB2Image object. You must then use functions add_shape and add_blit to populate the JB2Image object. Function encode finally produces the JB2 data: it encodes the blits and the necessary shapes. The compression ratio depends on several factors, notably how often blits reuse shapes, the order of the blits, and how closely each shape matches its parent shape.
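The populate step can be sketched with a toy container. The class ToyDictionary and its member place are hypothetical; its add_shape and add_blit members only mimic the roles of the functions named above and do not reproduce their actual signatures. The sketch assumes that an exact bitmap comparison is enough to decide that a shape can be reused.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical toy types; they only mimic the roles of add_shape and add_blit.
struct DictBlit {
    int left;
    int top;
    int shapeno;
};

class ToyDictionary {
public:
    // Append a shape unconditionally and return its index.
    int add_shape(const std::vector<std::string>& bits) {
        shapes_.push_back(bits);
        return (int)shapes_.size() - 1;
    }

    // Append a blit that refers to an existing shape and return its index.
    int add_blit(int left, int top, int shapeno) {
        blits_.push_back(DictBlit{left, top, shapeno});
        return (int)blits_.size() - 1;
    }

    // Place a blob of ink, reusing an identical shape when one is already stored.
    // Returns the index of the shape that the new blit refers to.
    int place(const std::vector<std::string>& bits, int left, int top) {
        int shapeno = -1;
        for (std::size_t i = 0; i < shapes_.size(); ++i) {
            if (shapes_[i] == bits) {
                shapeno = (int)i;
                break;
            }
        }
        if (shapeno < 0)
            shapeno = add_shape(bits);
        add_blit(left, top, shapeno);
        return shapeno;
    }

    std::size_t shape_count() const { return shapes_.size(); }
    std::size_t blit_count() const { return blits_.size(); }

private:
    std::vector<std::vector<std::string> > shapes_;
    std::vector<DictBlit> blits_;
};
```

Placing the same bitmap three times through place yields three blits but a single shape. A real encoder front-end would typically tolerate small pixel differences instead of requiring an exact match.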

All this is quite easy to achieve in the case of an electronically produced document such as a DVI file or a PS file: we know what the characters are and where they are located. If you only have a scanned image, however, you must first locate the characters (connected component analysis) and cut the remaining pieces of ink into smaller blobs. Ordering the blits and matching the shapes is then an essentially heuristic process. Although the quality of the heuristics substantially affects the file size, misordering blits or mismatching shapes never affects the quality of the image. The last refinement consists of smoothing the shapes in order to reduce the noise and maximize the similarities between shapes.
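One possible shape-matching heuristic is sketched below. It is not the heuristic used by any particular encoder: the names Glyph, pixel_difference and pick_parent are hypothetical, and the sketch assumes that a parent must have exactly the same dimensions as the shape and must appear earlier in the shape array. The candidate with the fewest differing pixels is chosen, provided the difference does not exceed max_diff.

```cpp
#include <cstddef>
#include <string>
#include <vector>

typedef std::vector<std::string> Glyph;   // '#' = ink, '.' = background

// Count differing pixels; returns -1 when the two bitmaps differ in size.
int pixel_difference(const Glyph& a, const Glyph& b) {
    if (a.size() != b.size())
        return -1;
    int diff = 0;
    for (std::size_t r = 0; r < a.size(); ++r) {
        if (a[r].size() != b[r].size())
            return -1;
        for (std::size_t c = 0; c < a[r].size(); ++c)
            if (a[r][c] != b[r][c])
                ++diff;
    }
    return diff;
}

// Return the index of the closest earlier shape, or -1 when no earlier
// shape of the same size differs by at most max_diff pixels.
int pick_parent(const std::vector<Glyph>& shapes, std::size_t current, int max_diff) {
    if (current >= shapes.size())
        return -1;
    int best = -1;
    int best_diff = max_diff + 1;
    for (std::size_t i = 0; i < current; ++i) {
        int d = pixel_difference(shapes[i], shapes[current]);
        if (d >= 0 && d < best_diff) {
            best = (int)i;
            best_diff = d;
        }
    }
    return best;
}
```

A poor choice of parent in such a scheme only costs compression, since the shape bitmap itself is still encoded exactly; this is consistent with the remark above that mismatching shapes never affects the quality of the image.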

ToDo --- Some improvements have been planned for a long time: (a) Shapes eventually will contain information about the baseline: this could improve the handling of the character descenders and also will provide a more understandable way to superpose matching shapes. (b) JB2 files eventually will be able to reference external shape dictionaries: common characters will be shared between document pages. (c) There will be a way to specify a color for each shape: this is good for encoding electronically produced documents.

Author:
Paul Howard <pgh@research.att.com> -- JB2 design
Léon Bottou <leonb@research.att.com> -- this implementation
Version:
$Id: JB2Image.h.html,v 1.2 2000/08/26 00:09:29 bcr Exp $