DjVu Technology --- Over 90 percent of the information in the world is still on paper. Many of those paper documents include color graphics and/or photographs that represent significant invested value. And almost none of that rich content is on the Internet. That's because scanning such documents and getting them onto a Web site has been problematic at best. At the high resolution necessary to preserve the quality of images and to keep type readable, file sizes become far too bulky for acceptable download speed. Reducing resolution to achieve satisfactory download speed means forfeiting quality and legibility. Conventional JPEG and GIF compression techniques only begin to solve the problem. As a result, Web site content developers have been unable to leverage existing printed materials.
The DjVu image compression technology http://www.djvu.att.com/ addresses this problem. Content developers can scan high-resolution color pages of books, magazines, catalogs, manuals, and historical or ancient documents. The DjVu document format provides the means to compress these bulky images to a size comparable to that of an HTML page. Content providers and consumers around the world can leverage the incredible wealth of information and images that were previously trapped in hard copy form. DjVu is the enabling technology that will deliver on the promise of the Internet as the world's universal library.
There are in fact three kinds of DjVu image files:
- Photo DjVu Image. Photo DjVu Image files are best used for encoding photographic images in color or in shades of gray. The images are coded using the IW44 wavelet representation, which is optimized for fast progressive rendering.
- Bilevel DjVu Image. Bilevel DjVu Image files are best used to compress black-and-white images representing text and simple drawings. The JB2 data compression model uses the "soft pattern matching" technique, which essentially consists of encoding each character by describing how it differs from a well-chosen, previously encoded character.
- Compound DjVu Image. Compound DjVu Image files are an extremely efficient way to compress high-resolution color document images containing both pictures and text, such as a magazine page. Compound DjVu Images represent document images using two layers: the background layer encodes the pictures and the paper texture, and the foreground layer encodes the text and the drawings.
In addition, we often use IW44 Image files. This is the native format for the IW44 wavelet representation. These files serve the same purposes as Photo DjVu Images but use a simpler file format. There are two variants of IW44 files: one for gray-level images and one for color images.
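The "soft pattern matching" idea behind the JB2 model can be illustrated with a small C++ sketch. It is a simplified stand-in, not the library's code: `Bitmap`, `match_against_library` and `reconstruct` are hypothetical names (the real interface lives in JB2Image.h). The sketch only considers previously encoded shapes of exactly the same size, picks the one with the fewest differing pixels, and describes the new shape as an XOR difference map. Storing that raw map is only a placeholder for the compression JB2 actually performs on the new shape given its reference.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <vector>

// Hypothetical bilevel bitmap, row-major: 1 = black, 0 = white.
struct Bitmap {
  int width;
  int height;
  std::vector<std::uint8_t> bits;
};

// Count the pixels where two same-sized bitmaps disagree.
int mismatch(const Bitmap& a, const Bitmap& b) {
  assert(a.width == b.width && a.height == b.height);
  int diff = 0;
  for (std::size_t i = 0; i < a.bits.size(); ++i) diff += (a.bits[i] != b.bits[i]);
  return diff;
}

struct MatchResult {
  int reference = -1;               // index of the chosen prototype, -1 if none fits
  std::vector<std::uint8_t> delta;  // 1 where the new shape differs from the prototype
};

// Choose the already-encoded shape (same size only, in this sketch) with the
// fewest differing pixels, and describe the new shape as its difference from it.
MatchResult match_against_library(const Bitmap& shape, const std::vector<Bitmap>& library) {
  MatchResult result;
  int best = std::numeric_limits<int>::max();
  for (std::size_t i = 0; i < library.size(); ++i) {
    const Bitmap& cand = library[i];
    if (cand.width != shape.width || cand.height != shape.height) continue;
    int d = mismatch(shape, cand);
    if (d < best) {
      best = d;
      result.reference = static_cast<int>(i);
    }
  }
  if (result.reference >= 0) {
    const Bitmap& ref = library[result.reference];
    result.delta.resize(shape.bits.size());
    for (std::size_t i = 0; i < shape.bits.size(); ++i)
      result.delta[i] = shape.bits[i] ^ ref.bits[i];
  }
  return result;
}

// Decoder side: rebuild the shape from its prototype and the difference map.
Bitmap reconstruct(const Bitmap& ref, const std::vector<std::uint8_t>& delta) {
  Bitmap out = ref;
  for (std::size_t i = 0; i < out.bits.size(); ++i) out.bits[i] ^= delta[i];
  return out;
}
```

Because the chosen prototype is usually a near-duplicate (the same letter printed elsewhere on the page), the difference map is mostly zeros, which is what makes the new character cheap to describe.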
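On the decoding side, the two-layer representation of Compound DjVu Images amounts to a per-pixel selection. The following C++ sketch assumes all planes share one resolution (real compound images need not) and models the foreground as a bilevel mask, giving the shapes of text and drawings, plus a color per pixel. `ColorImage` and `composite` are hypothetical names, not the library's rendering code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

struct RGB {
  std::uint8_t r, g, b;
};

// Hypothetical flat color image, row-major.
struct ColorImage {
  int width;
  int height;
  std::vector<RGB> pixels;
};

// Where the bilevel mask is set, take the foreground color (text, drawings);
// elsewhere keep the background layer (pictures, paper texture).
// Simplifying assumption: mask, foreground and background share one resolution.
ColorImage composite(const std::vector<std::uint8_t>& mask,
                     const ColorImage& foreground,
                     const ColorImage& background) {
  assert(foreground.width == background.width && foreground.height == background.height);
  assert(mask.size() == background.pixels.size());
  ColorImage page = background;
  for (std::size_t i = 0; i < mask.size(); ++i)
    if (mask[i]) page.pixels[i] = foreground.pixels[i];
  return page;
}
```

Splitting the page this way lets each layer use a compression method suited to it: a bilevel coder for the sharp mask and a wavelet coder for the smooth background.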
DjVu Reference Library --- The DjVu Reference Library implements the components of the DjVu technology that are essential for the definition of the file formats. AT&T Labs http://www.att.com/attlabs releases the full source code of the DjVu Reference Library under the terms of the AT&T Source Code License.
To understand the meaning of this release, it is important to realize that the separation of AT&T, NCR and Lucent Technologies has significantly changed the business equation. AT&T is not in the business of selling software. The DjVu technology is interesting for AT&T because it provides the means to offer enhanced services to its customers. This can only happen if we ensure that the DjVu technology is widely disseminated.
This source code release serves three purposes:
- It demonstrates our intention to create a lasting technology. Regardless of AT&T's involvement, developers around the world now have permanent access to critical parts of DjVu technology.
- It allows developers around the world to work with the DjVu technology, to create new advances using our basic building blocks, to create bridges between DjVu and other document representation formats, to create viewers for various platforms, etc.
- It provides an authoritative implementation of the DjVu format for standardization purposes.
The DjVu Reference Library is the first component of DjVu released in source code form. The DjVu Reference Library completely defines the DjVu and IW44 image formats. It does not, however, implement the sophisticated encoding strategies that achieve the highest compression ratios. These strategies are very application-dependent. We have developed encoders for scanned documents (see http://www.djvu.att.com/djvu). We also know that completely different (and probably much easier) encoding strategies are needed for electronically created documents.
We intend to keep releasing new versions of this library and additional software such as viewers and encoders. Using the currently released code, you can easily do the following:
- Decoding DjVu and IW44 images.
- Rendering DjVu or IW44 image fragments at any resolution.
- Encoding IW44 wavelet images.
The following tasks require increasing levels of effort. The documentation outlines the basic ideas, but they are not currently implemented:
- Creating Photo DjVu Images, which are essentially embedded IW44 images, according to the instructions in djvumake.
- Creating Bilevel DjVu Images using lossless encoding. Instructions are provided in JB2Image.h and djvumake. Lossless encoding should be very efficient with electronically produced documents, since the character shapes are perfectly defined. A DVI-to-DjVu converter comes to mind...
- Creating Compound DjVu Images from electronically produced documents. Just combine the previous task with the masking technique described in djvumake. A GIMP-to-DjVu filter comes to mind...
- Creating DjVu Images from scans. The most difficult part is separating the foreground from the background. Simple thresholding works on half of the documents and fails miserably on the other half. Creating a robust solution is not easy at all.
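The items about rendering at any resolution and encoding IW44 wavelet images rest on a general property of wavelet representations: the coarse coefficients alone already describe a lower-resolution image. The C++ sketch below illustrates this with the simplest reversible integer wavelet, the Haar S-transform, substituted for the IW44 wavelet (which is different and is paired with a much more elaborate coefficient coder). `haar_forward` and `haar_inverse` are hypothetical helpers working on a 1-D signal whose length is divisible by 2^levels. A 2-D image would apply the same step along rows and columns.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Floor division by 2, well defined for negative values.
int floor_half(int v) { return (v - (v < 0 ? 1 : 0)) / 2; }

// Forward integer Haar (S-transform) over `levels` octaves, in place.
// Afterwards data[0, n >> levels) holds the coarsest approximation and the
// rest holds detail coefficients, coarsest details first.
void haar_forward(std::vector<int>& data, int levels) {
  std::size_t n = data.size();
  std::vector<int> tmp(n);
  for (int l = 0; l < levels; ++l, n /= 2) {
    assert(n % 2 == 0);
    std::size_t half = n / 2;
    for (std::size_t i = 0; i < half; ++i) {
      int d = data[2 * i + 1] - data[2 * i];
      tmp[i] = data[2 * i] + floor_half(d);  // floored average of the pair
      tmp[half + i] = d;                     // detail coefficient
    }
    for (std::size_t i = 0; i < n; ++i) data[i] = tmp[i];
  }
}

// Undo only the coarsest (levels - skip) octaves. skip = 0 restores the exact
// input; skip = k yields a 1/2^k resolution version (each value approximates
// the average of 2^k neighbouring samples) using only the first n >> k coefficients.
std::vector<int> haar_inverse(const std::vector<int>& coeffs, int levels, int skip) {
  assert(skip >= 0 && skip <= levels);
  std::vector<int> data(coeffs);
  std::vector<int> tmp(coeffs.size());
  std::size_t n = coeffs.size() >> levels;  // length of the current approximation
  for (int l = levels; l > skip; --l, n *= 2) {
    for (std::size_t i = 0; i < n; ++i) {
      int s = data[i];
      int d = data[n + i];
      tmp[2 * i] = s - floor_half(d);
      tmp[2 * i + 1] = tmp[2 * i] + d;
    }
    for (std::size_t i = 0; i < 2 * n; ++i) data[i] = tmp[i];
  }
  data.resize(n);
  return data;
}
```

A viewer holding only the first few coefficients can therefore show a coarse preview immediately and refine it as the remaining detail coefficients arrive, which is the essence of progressive rendering.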
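The simple thresholding mentioned under "Creating DjVu Images from scans" is easy to write down, which is exactly why its limitations matter. Here is a minimal C++ sketch of a global-threshold separation on a grayscale page. `Separation` and `threshold_separate` are hypothetical names, the threshold is a caller-chosen assumption, and this is not the encoder we use for scanned documents. Pixels darker than the threshold form the foreground mask, the foreground gets a single average ink level, and masked pixels in the background are filled with the average paper level.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Output of a naive foreground/background split of a grayscale page.
struct Separation {
  std::vector<std::uint8_t> mask;        // 1 = foreground (ink), 0 = background
  std::vector<std::uint8_t> background;  // page with ink pixels filled in
  std::uint8_t foreground_gray;          // single ink level: mean of masked pixels
};

// Global thresholding: every pixel darker than `threshold` is called foreground.
// Ink pixels are replaced in the background by the mean background level so the
// background layer stays smooth and compresses well.
Separation threshold_separate(const std::vector<std::uint8_t>& gray, std::uint8_t threshold) {
  Separation s;
  s.mask.resize(gray.size());
  long fg_sum = 0, bg_sum = 0, fg_n = 0, bg_n = 0;
  for (std::size_t i = 0; i < gray.size(); ++i) {
    bool ink = gray[i] < threshold;
    s.mask[i] = ink ? 1 : 0;
    if (ink) {
      fg_sum += gray[i];
      ++fg_n;
    } else {
      bg_sum += gray[i];
      ++bg_n;
    }
  }
  std::uint8_t bg_mean = bg_n ? static_cast<std::uint8_t>(bg_sum / bg_n) : 255;
  s.foreground_gray = fg_n ? static_cast<std::uint8_t>(fg_sum / fg_n) : 0;
  s.background = gray;
  for (std::size_t i = 0; i < gray.size(); ++i)
    if (s.mask[i]) s.background[i] = bg_mean;
  return s;
}
```

On a clean page of dark text on light paper this works. A dark region of a photograph, however, falls entirely below the threshold and is misclassified as text, which illustrates why a single global threshold fails on so many real pages.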
We do not plan to publish the source code for DjVu compression of scanned images in the near future. We may provide the source code of older versions. We also provide executables for personal use. We license this technology to business partners who commit to supporting the DjVu format. Releasing this code today would hurt our partners' business and ultimately reduce the dissemination of the DjVu technology.
We think that this is preferable to the Java alternative, where the license mandates that you cannot use the software unless your work fully complies with the Java specifications. We do not think we should prevent people from using our code for completely different purposes. In other words, we prefer to release less software but make it truly free (in the sense of free speech, not free lunch, as you probably know).