The Free Lossless Image Format (FLIF) is a lossless image compression format. It supports grayscale, RGB and RGBA, with a color depth from 1 to 16 bits per channel. Both still images and animations are supported. FLIF can handle sparse-color images (e.g. 256 color palettes) effectively. It has an interlaced and a non-interlaced mode; unlike PNG, interlacing usually does not come with worse compression. In terms of compression, FLIF usually outperforms other lossless compression formats like PNG, lossless WebP, BPG, JPEG 2000, JPEG XR, JPEG-LS, APNG, MNG, or GIF.
This document specifies the FLIF16 bitstream, as implemented by version 0.2 of the reference encoder (https://github.com/FLIF-hub/FLIF), which was released in September 2016.
Overview of the format
A FLIF file consists of 4 parts, with increasingly complicated encoding methods:

The main header, containing basic image metadata: width, height, color depth, number of frames. This information is encoded in a straightforward way, to make it easy to quickly identify a file.

Optional metadata chunks. These chunks contain non-pixel data: Exif/XMP metadata, an ICC color profile, … This information is encoded using DEFLATE compression.

Second header, containing further information about how the actual pixel data is encoded. This header also describes a chain of transformations that were applied before encoding, which have to be reversed after decoding (e.g. the YCoCg color transform). This information is encoded using CABAC entropy coding.

The actual pixel data. This information is encoded using MANIAC entropy coding.
Notational conventions
We use '/' to denote integer division (rounding towards zero) and '>>' to denote arithmetic right shift. This corresponds to the usual operator semantics in e.g. the C programming language.
>> 1 denotes arithmetic right shift by one, which corresponds to dividing by two rounding down. For positive integers this is the same as / 2, but for negative integers it is not, e.g. -1 / 2 = 0 while -1 >> 1 = -1.
Part 1: main header
Type | Description | Value
4 bytes | Magic | "FLIF"
4 bits | Interlacing, animation | 3 = non-interlaced still; 4 = interlaced still; 5 = non-interlaced animation; 6 = interlaced animation
4 bits | Number of channels (nb_channels) | 1 = Grayscale; 3 = RGB; 4 = RGBA
1 byte | Bytes per channel (Bpc) | '1' = 8 bit, '2' = 16 bit, '0' = custom
varint | Width | width-1
varint | Height | height-1
varint | Number of frames (nb_frames) | nb_frames-2 (only if animation)
The fifth byte is an ASCII character that can be interpreted as follows: for non-interlaced still images, '1' = Grayscale, '3' = RGB, '4' = RGBA; for interlaced images add 0x10 (so 'A' = Grayscale, 'C' = RGB, 'D' = RGBA); for animations add 0x20 (so 'Q', 'S', 'T' for non-interlaced, 'a', 'c', 'd' for interlaced).
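As a sketch, the two nibbles of this byte can be unpacked as described above (the struct and helper names below are illustrative, not part of the format):

```c
#include <stdbool.h>

/* Illustrative helper: split the fifth header byte into its two nibbles. */
typedef struct {
    bool interlaced;
    bool animation;
    int nb_channels; /* 1 = Grayscale, 3 = RGB, 4 = RGBA */
} header_info;

header_info parse_fifth_byte(unsigned char b) {
    header_info h;
    int hi = b >> 4;            /* 3 = ni still; 4 = i still; 5 = ni anim; 6 = i anim */
    h.interlaced  = (hi == 4 || hi == 6);
    h.animation   = (hi == 5 || hi == 6);
    h.nb_channels = b & 0x0F;   /* low nibble: number of channels */
    return h;
}
```

For example, 'C' (0x43) yields an interlaced still RGB image, and 'd' (0x64) an interlaced RGBA animation.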
Variable-size integer encoding (varint):

An unsigned integer (big-endian, MSB first) stored on a variable number of bytes.

All the bytes except the last one have a '1' as their first bit.

The unsigned integer is represented as the concatenation of the remaining 7-bit codewords.
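A minimal decoding sketch of this varint scheme (the function name is illustrative; bounds checking is omitted for brevity):

```c
#include <stdint.h>
#include <stddef.h>

/* Decode one varint: big-endian groups of 7 bits, continuation flag in the
   top bit of every byte except the last. */
uint64_t read_varint(const unsigned char *buf, size_t *pos) {
    uint64_t value = 0;
    unsigned char b;
    do {
        b = buf[(*pos)++];
        value = (value << 7) | (b & 0x7F);
    } while (b & 0x80);
    return value;
}
```

For example, the two bytes 0x81 0x00 decode to 128.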
nb_frames is 1 for a still image, > 1 for an animation.
Part 2: metadata
Chunks are defined similarly to PNG chunks, except the chunk size is encoded with a variable number of bytes, and there is no chunk CRC. Also, the first chunk (which is always "FLIF") has no explicit chunk size and is not DEFLATE-compressed.
Chunk names are either 4 letters (4 bytes), indicating an optional chunk, or 1 byte with a value below 32, indicating a non-optional chunk.
The convention of using upper and lower case letters is kept, but the meaning of the bits is slightly different.

First letter: uppercase=critical, lowercase=non-critical → non-critical chunks can be stripped while keeping a valid file (it might be rendered differently though)

Second letter: uppercase=public, lowercase=private

Third letter: uppercase=needed to correctly display the image (e.g. a color profile), lowercase=can be stripped safely without changing anything visually

Fourth letter: uppercase=safe to copy blindly (when editing the actual image data), lowercase=may depend on image data (e.g. contain a thumbnail)
This optional part is a concatenation of chunks that each look like this:
Type | Description
4 bytes | Chunk name
varint | Chunk length (size)
size bytes | DEFLATE-compressed chunk content
The optional chunks are followed by one non-optional chunk with a single-byte chunk name. Non-optional chunks do not have an explicit chunk size and their content is not DEFLATE-compressed. Currently the only non-optional chunk has name 0x00 (a NUL byte), which indicates a FLIF16 bitstream. Future (non-backwards-compatible) revisions of the FLIF format will use a different name for this non-optional chunk.
If a decoder encounters an unknown optional chunk, it should proceed decoding (optionally it can produce a warning), as long as the chunk is non-critical. If a decoder encounters an unknown non-optional chunk, it should fail and produce an error message.
The following optional chunks are defined:
Chunk name | Description | Content (after DEFLATE-decompression)
iCCP | ICC color profile | raw ICC color profile data
eXif | Exif metadata |
eXmp | XMP metadata | XMP contained within a read-only xpacket with no padding
Symbol encoding
From here on, all values are encoded using arithmetic coding.
uni_int(min,max) denotes an integer in the interval min..max (inclusive), encoded using Uniform symbol coding (see below) and a 24-bit RAC. This encoding is not context-adaptive, and every number in the interval gets a fixed chance (close to a uniform distribution, hence the name). uni_int(b) denotes an integer of b bits, i.e. it’s a synonym for uni_int(0,2^b - 1).
nz_int_CONTEXT(min,max) denotes an integer in the interval min..max (inclusive), encoded using Near-zero symbol coding (see below) and a 24-bit RAC, where the chance table is given by CONTEXT. This encoding is context-adaptive, i.e. the chance table gets updated when encoding or decoding a number. Numbers are encoded using a zero-sign-exponent-mantissa representation, which favors distributions in which near-zero values have a higher chance (hence the name).
gnz_int_CONTEXT(min,max) denotes a simple generalization of Near-zero symbol coding that can be used in case the interval does not necessarily include zero. It is defined as follows:
gnz_int_CONTEXT(min,max) =
nz_int_CONTEXT(0, max - min) + min    if min > 0
nz_int_CONTEXT(min - max, 0) + max    if max < 0
nz_int_CONTEXT(min, max)              otherwise
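This definition translates directly into code. In the sketch below, nz_int_stub is a placeholder standing in for actual Near-zero decoding (it simply hands back a caller-supplied value), so only the interval shifting of gnz_int is exercised:

```c
/* Placeholder for real Near-zero symbol decoding: pretend the RAC
   produced the value 'decoded' within the interval [min, max]. */
int nz_int_stub(int min, int max, int decoded) {
    (void)min; (void)max;
    return decoded;
}

/* The gnz_int generalization: shift the interval so that it includes zero,
   decode, then shift back. */
int gnz_int(int min, int max, int decoded) {
    if (min > 0) return nz_int_stub(0, max - min, decoded) + min;
    if (max < 0) return nz_int_stub(min - max, 0, decoded) + max;
    return nz_int_stub(min, max, decoded);
}
```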
Range encoder
TODO: describe the 24bit RAC and the 12bit chances.
Chance table update
TODO: describe how cutoff and alpha influence this.
Uniform Symbol Coding
An integer encoded with Uniform symbol coding uses a series of 50% chance bits (chance is 2048/4096, non-adaptive) to recursively narrow down the input interval until it is a singleton. The following code describes how to decode such an integer:
// assumption: min <= max
int read_uniform_int(int min, int max) {
    if (min == max) return min;
    // split in [min..mid] [mid+1..max]
    int mid = min + (max-min)/2;
    if (rac.read_bit()) {
        return read_uniform_int(mid+1, max);
    } else {
        return read_uniform_int(min, mid);
    }
}
There is no compression advantage of using a RAC to encode these 50% chance bits (just regular bits would be just as efficient), but it is convenient to use the RAC anyway to avoid having to mix the RAC bitstream with regular bits.
For intervals of the form 0..2^k - 1 this representation boils down to writing down the number as k binary bits; for other intervals it can be slightly more concise than binary notation.
This representation is used mostly for simple signaling, where context adaptation would be of little use.
Near-zero symbol coding
This is the representation used for the bulk of the data, including the pixel values. It is very efficient for values that are close to zero (e.g. well-predicted pixels).
An integer encoded with Near-zero symbol coding uses a zero-sign-exponent-mantissa representation, where the exponent is represented in unary notation. Every bit position of this representation has its own RAC context (so their chances adapt separately); the exponent bits use different chances not just depending on their position but also depending on the sign. Chances are represented as 12-bit numbers (1..4095), where 1 means that the bit is very likely to be zero (it has 1/4096 chance of being one), while 4095 means that the bit is very likely to be one (the chance is 4095/4096).
The following code describes how to decode an integer in this representation:
// assumption: min <= 0 <= max
int read_nearzero_int(int min, int max) {
    if (min == max) return min;
    if (rac.read(ZERO)) return 0;
    bool sign; // true = positive, false = negative
    if (min < 0) {
        if (max > 0) {
            // only actually read the sign bit if both signs are possible
            sign = rac.read(SIGN);
        } else {
            // max <= 0 and the number is not zero, so it is negative
            sign = false;
        }
    } else {
        // min >= 0 and the number is not zero, so it is positive
        sign = true;
    }
    // highest possible absolute value
    const int amax = (sign ? max : -min);
    // highest possible exponent
    const int emax = ilog2(amax);
    // exponent is represented in unary: a series of 1s followed by a 0
    // (the end 0 is dropped if e == emax)
    int e;
    for (e = 0; e < emax; e++) {
        if (rac.read(EXP(e,sign)) == 0) break;
    }
    // the first mantissa bit is always 1
    int have = (1 << e);
    // if all other mantissa bits are 1, then the total is have+left
    int left = have - 1;
    // read mantissa bits from most-significant to least-significant
    for (int pos = e; pos > 0;) {
        left >>= 1; pos--;
        // if the bit is 1, then the value will be at least minabs1
        int minabs1 = have | (1 << pos);
        // if the bit is 0, then the value will be at most maxabs0
        int maxabs0 = have | left;
        if (minabs1 > amax) {
            // 1-bit is impossible (would bump value above maximum),
            // so assume the bit is 0 without reading it
        } else if (maxabs0 >= 1) {
            // 0-bit and 1-bit are both possible,
            // so we read the bit and adjust what we have if it is a 1
            if (rac.read(MANT(pos))) have = minabs1;
        } else {
            // 0-bit is impossible (would make the value zero),
            // so assume the bit is 1 without reading it
            have = minabs1;
        }
    }
    return (sign ? have : -have);
}
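The pseudocode above relies on an ilog2 helper (the position of the highest set bit, so that 1 << ilog2(amax) <= amax); a minimal definition for x >= 1:

```c
/* Position of the highest set bit: ilog2(1) = 0, ilog2(255) = 7. */
int ilog2(int x) {
    int r = 0;
    while (x > 1) { x >>= 1; r++; }
    return r;
}
```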
The chances are initialized as follows:
bit | initial chance
ZERO | 1000
SIGN | 2048
EXP(0,sign) | 1000
EXP(1,sign) | 1200
EXP(2,sign) | 1500
EXP(3,sign) | 1750
EXP(4,sign) | 2000
EXP(5,sign) | 2300
EXP(6,sign) | 2800
EXP(7,sign) | 2400
EXP(8,sign) | 2300
EXP(9,sign) | 2048
EXP(k,sign), k > 9 | 2048
MANT(0) | 1900
MANT(1) | 1850
MANT(2) | 1800
MANT(3) | 1750
MANT(4) | 1650
MANT(5) | 1600
MANT(6) | 1600
MANT(7) | 2048
MANT(k), k > 7 | 2048
Part 3: second header
Type | Description | Condition | Default value
1 byte | NUL byte (0x00), chunk name of a FLIF16 bitstream | |
uni_int(1,16) | Bits per pixel of the channels | Bpc == '0': repeat(nb_channels) | 8 if Bpc == '1', 16 if Bpc == '2'
uni_int(0,1) | Flag: alpha_zero | nb_channels > 3 | 0
uni_int(0,100) | Number of loops | nb_frames > 1 |
uni_int(0,60_000) | Frame delay in ms | nb_frames > 1: repeat(nb_frames) |
uni_int(0,1) | Flag: has_custom_cutoff_and_alpha | |
uni_int(1,128) | cutoff | has_custom_cutoff_and_alpha | 2
uni_int(2,128) | alpha divisor | has_custom_cutoff_and_alpha | 19
uni_int(0,1) | Flag: has_custom_bitchance | has_custom_cutoff_and_alpha | 0
? | Bitchance | has_custom_bitchance |
variable | Transformations (see below) | |
uni_int(1) = 0 | Indicator bit: done with transformations | |
uni_int(0,2) | Invisible pixel predictor | alpha_zero && interlaced && alpha range includes zero |
Channels are ordered as follows:
Channel number | Description
0 | Red or Gray
1 | Green
2 | Blue
3 | Alpha
Transformations
For each transformation:
Type | Description
uni_int(1) = 1 | Indicator bit: not done yet
uni_int(0,13) | Transformation identifier
variable | Transformation data (depends on transformation)
Transformations have to be encoded in ascending order of transformation identifier. All transformations are optional.
Transformations serve two main purposes:

to modify the pixel data (in a reversible way) to make it compress better, and

to keep track of the range of actually occurring pixel values, in order to narrow it down.
Initially, pixel values are assumed to be in the range 0..2^(bit_depth) - 1; this range can be modified by transformations. We’ll use range(channel).min and range(channel).max to denote the global minimum and maximum value of a particular channel.
We also use a potentially more accurate (narrow) conditional range crange(channel,values) to denote the range of a pixel value in channel channel, given that the pixel values in previously encoded channels are values. Initially, the conditional ranges are simply equal to the global range, but transformations might change that.
Finally, we define a function snap(channel,values,x) which given a pixel value x for channel channel and pixel values values in previously encoded channels, returns a 'valid' value as close as possible to x. Usually, snap(channel,values,x) simply clamps x to the conditional range crange(channel,values), but the ColorBuckets transformation changes that behavior.
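In the common case, the default snap behavior is a plain clamp; a minimal sketch with the conditional range endpoints passed in explicitly (the function name and parameters are illustrative):

```c
/* Default snap: clamp x to the conditional range [cmin, cmax]. */
int snap_clamp(int x, int cmin, int cmax) {
    if (x < cmin) return cmin;
    if (x > cmax) return cmax;
    return x;
}
```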
In the following descriptions of transformations, we use orig_range, orig_crange, orig_snap to denote the original ranges and snap function (the initial ones, or the ones resulting from the previous transformation in the chain). We use new_range, new_crange, new_snap to denote the updated ranges and snap function.
In part 4 we will use range, crange and snap to denote the final ranges and snap functions, i.e. after applying all transformations.
We will now describe the transformations and their encoding in more detail.
Transformation 0: ChannelCompact
The ChannelCompact transformation looks at each channel independently, and reduces its range by eliminating values that do not actually occur in the image.
To be able to reconstruct the original values, the mapping from the reduced range to the original range is encoded. Near-zero symbol coding is used, with a single context which we’ll call A.
The information is encoded as follows:

For each channel c :

    new_range(c).max = nz_int_A(0, orig_range(c).max - orig_range(c).min)

    min = orig_range(c).min

    For i = 0..new_range(c).max :

        decompacted(i) = min + nz_int_A(0, orig_range(c).max - min - new_range(c).max + i)

        min = decompacted(i) + 1


The effect of this transformation is as follows:

new_range(c).min = new_crange(c,…).min = 0

new_range(c).max = new_crange(c,…).max is explicitly encoded

new_snap is the default snap function (simple clamping)
To reverse the transformation (after decoding the pixel values) :

For each channel c :

For every pixel value v :

Set v = decompacted(v)
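The reversal loop above can be sketched as follows, with the decoded mapping passed in as an array (names are illustrative):

```c
/* Reverse ChannelCompact for one channel: every decoded pixel value is an
   index into the decompacted[] mapping read from the bitstream. */
void channel_compact_reverse(int *pixels, int n, const int *decompacted) {
    for (int i = 0; i < n; i++)
        pixels[i] = decompacted[pixels[i]];
}
```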


Transformation 1: YCoCg
The YCoCg transformation converts the colorspace from RGB to YCoCg. No information has to be encoded for this (besides the identifier of the transformation).
The transformation only affects the first three channels (0,1,2). It is not allowed to be used if nb_channels = 1.
Pixel transformation
Channel | Original meaning | New meaning | Forward transform | Reverse transform
0 | Red (R) | Luma (Y) | Y = (((R+B)>>1) + G)>>1 | R = Co + Y + ((1-Cg)>>1) - (Co>>1)
1 | Green (G) | Chroma orange (Co) | Co = R - B | G = Y - ((-Cg)>>1)
2 | Blue (B) | Chroma green (Cg) | Cg = G - ((R+B)>>1) | B = Y + ((1-Cg)>>1) - (Co>>1)
Luma (Y) corresponds to roughly 50% green, 25% red, 25% blue. Chroma orange (Co) is positive for colors near orange (red, orange, yellow), and negative for colors near blue. Chroma green (Cg) is positive for colors near green and negative for colors near purple.
The YCoCg transformation tends to decorrelate the channels, which helps to improve compression. For example, for grayscale images, i.e. images where R=G=B, the result of the YCoCg transform is Y=R (=G=B), Co=Cg=0. Human perception is more sensitive to luma than it is to chroma. For this reason, in interlaced mode, the default zoomlevel/channel ordering (see below) gives priority to the luma channel; this results in partial decodes (progressive previews) which are effectively chroma subsampled. 
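The forward and reverse transforms can be checked to round-trip exactly for any integer RGB triple; a direct transcription of the table:

```c
/* Forward YCoCg transform as specified above. */
void rgb_to_ycocg(int R, int G, int B, int *Y, int *Co, int *Cg) {
    *Y  = (((R + B) >> 1) + G) >> 1;
    *Co = R - B;
    *Cg = G - ((R + B) >> 1);
}

/* Reverse YCoCg transform; exactly inverts rgb_to_ycocg. */
void ycocg_to_rgb(int Y, int Co, int Cg, int *R, int *G, int *B) {
    *R = Co + Y + ((1 - Cg) >> 1) - (Co >> 1);
    *G = Y - ((-Cg) >> 1);
    *B = Y + ((1 - Cg) >> 1) - (Co >> 1);
}
```

Note that the shifts are arithmetic, so the rounding of the lifting steps is reproduced exactly and no information is lost.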
New ranges
Define origmax4 to be equal to max(orig_range(0).max, orig_range(1).max, orig_range(2).max)/4 + 1 and newmax to be equal to 4 * origmax4 - 1. In the most common case where the three channels have the range 0..255, this evaluates to origmax4 = 64 and newmax = 255.
Channel number c | Original meaning | New meaning | new_range(c)
0 | Red (R) | Luma (Y) | 0..newmax
1 | Green (G) | Chroma orange (Co) | -newmax..newmax
2 | Blue (B) | Chroma green (Cg) | -newmax..newmax
Unlike the RGB color space, not every coordinate in the YCoCg color space corresponds to an actual color. In particular, the range for Co and Cg is much smaller for nearblack and nearwhite colors than for intermediate luma values:
The conditional range function crange is updated to reflect this. It is updated as follows:

new_crange(0) = new_range(0)

new_crange(1,yval).min =
-(3 + 4 * yval)    if yval < origmax4 - 1
4 * (yval - newmax)    if yval > 3 * origmax4 - 1
-newmax    otherwise

new_crange(1,yval).max =
3 + 4 * yval    if yval < origmax4 - 1
4 * (newmax - yval)    if yval > 3 * origmax4 - 1
newmax    otherwise

new_crange(2,yval,coval).min =
-2 - 2 * yval    if yval < origmax4 - 1
-2 * (newmax - yval) + 2 * ((abs(coval)+1)/2)    if yval > 3 * origmax4 - 1
-min(2 * yval + 1, 2 * newmax - 2 * yval - 2 * ((abs(coval)+1)/2))    otherwise

new_crange(2,yval,coval).max =
1 + 2 * yval - 2 * (abs(coval)/2)    if yval < origmax4 - 1
2 * (newmax - yval)    if yval > 3 * origmax4 - 1
-min(2 * (yval - newmax), -2 * yval - 1 + 2 * (abs(coval)/2))    otherwise
Transformation 2: reserved (unused)
Transformation identifier 2 is not used. It is reserved for future extensions that support transformations to other color spaces like YCbCr.
Transformation 3: PermutePlanes
The PermutePlanes transformation reorders (permutes) the channels; optionally it also subtracts the values of the new channel 0 from the values of channels 1 and 2. This transformation is useful if for some reason the YCoCg transformation is not used: it can e.g. be used to transform RGB to G, (R-G), (B-G).
This transformation is not allowed to be used in conjunction with the YCoCg transformation; it is also not allowed to be used if nb_channels = 1. Also, if alpha_zero is true, then channel 3 (Alpha) is not allowed to be permuted to a different channel number.
There are two main reasons to do a channel reordering: better compression (the order matters for compression since the values of previously encoded channels are used in the MANIAC properties, see below), and better progressive previews (e.g. Green is perceptually more important than Red and Blue, so it makes sense to encode it first). Additionally, subtracting channel 0 from the other channels is a simple form of channel decorrelation; usually not as good as the YCoCg transformation though.
We denote the permutation used by PermutePlanes with p, where p(nc)=oc means that the new channel number nc corresponds to the old channel number oc.
Without subtraction, the forward transformation looks as follows:
Channel number c | Original pixel value | New pixel value (no Subtract) | new_range(c) (no Subtract)
0 | v_{0} | v_{p(0)} | orig_range(p(0))
1 | v_{1} | v_{p(1)} | orig_range(p(1))
2 | v_{2} | v_{p(2)} | orig_range(p(2))
3 | v_{3} | v_{p(3)} | orig_range(p(3))
With subtraction, the forward transformation looks as follows:
Channel number c | Original pixel value | New pixel value (with Subtract) | new_range(c) (with Subtract)
0 | v_{0} | v_{p(0)} | orig_range(p(0))
1 | v_{1} | v_{p(1)} - v_{p(0)} | orig_range(p(1)).min - orig_range(p(0)).max to orig_range(p(1)).max - orig_range(p(0)).min
2 | v_{2} | v_{p(2)} - v_{p(0)} | orig_range(p(2)).min - orig_range(p(0)).max to orig_range(p(2)).max - orig_range(p(0)).min
3 | v_{3} | v_{p(3)} | orig_range(p(3))
The reverse transformation can easily be derived from this: given input values (in_{0},in_{1},in_{2},in_{3}), the output values are given by out_{p(c)} = in_{c} if there is no Subtract or c is 0 or 3, and by out_{p(c)} = in_{c} + in_{0} if there is Subtract and c is 1 or 2.
To encode the parameters of this transformation, near-zero symbol coding is used, with a single context which we will call A.
Type | Description | Condition
nz_int_A(0,1) | Boolean: Subtract |
nz_int_A(0,nb_channels-1) | p(c) | repeat: c from 0 to nb_channels-1
The decoder has to check that p actually describes a permutation, i.e. it is a bijection (no two input channels map to the same output channel).
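A sketch of that validity check (the function name is illustrative):

```c
#include <stdbool.h>

/* Returns true iff p[0..nb_channels-1] is a bijection on 0..nb_channels-1,
   i.e. no two entries map to the same channel and all are in range. */
bool is_permutation(const int *p, int nb_channels) {
    bool seen[4] = {false, false, false, false}; /* at most 4 channels */
    for (int c = 0; c < nb_channels; c++) {
        if (p[c] < 0 || p[c] >= nb_channels || seen[p[c]]) return false;
        seen[p[c]] = true;
    }
    return true;
}
```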
Transformation 4: Bounds
Transformation 5: PaletteAlpha
Transformation 6: Palette
Transformation 7: ColorBuckets
The ColorBuckets transformation is an alternative to the Palette transformations; it is useful for sparse-color images, especially if the number of colors is relatively small but still too large for effective palette encoding.
Unlike the Palette transformations, ColorBuckets does not modify the actual pixel values. As a result, the reverse transformation is trivial: nothing has to be done. However, the transformation does change the crange and the snap functions. By reducing the range of valid pixel values (sometimes drastically), compression improves.
A 'Color Bucket' is a (possibly empty) set of pixel values. For channel 0, there is a single Color Bucket b_{0}. For channel 1, there is one Color Bucket for each pixel value in orig_range(0); we’ll denote these Color Buckets with b_{1}(v_{0}). For channel 2, there is one Color Bucket for each combination of values (v_{0},Q(v_{1})) where v_{0} is in orig_range(0), v_{1} is in orig_range(1), and the quantization function Q maps x to (x - orig_range(1).min) / 4. Finally, for channel 3, there is a single Color Bucket b_{3}.
The new ranges are identical to the original ranges: new_range(c) = orig_range(c).
The new conditional ranges are given by the minimum and maximum of the corresponding Color Buckets:

new_crange(0) = min(b_{0}) .. max(b_{0})

new_crange(1,v_{0}) = min(b_{1}(v_{0})) .. max(b_{1}(v_{0}))

new_crange(2,v_{0},v_{1}) = min(b_{2}(v_{0},Q(v_{1}))) .. max(b_{2}(v_{0},Q(v_{1})))

new_crange(3) = min(b_{3}) .. max(b_{3})
The new snap function returns the value in the corresponding Color Bucket that is closest to the input value; if there are two such values, it returns the lowest one. For example, if Color Bucket b_{1}(20) is the set {1,3,4,6,8}, then new_snap(1,20,x) returns 1 for x=-100, 4 for x=5, 6 for x=6, and 8 for x=100.
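A sketch of this bucket-based snap over a sorted value list (names are illustrative; the strict comparison makes ties resolve to the lower value):

```c
#include <stdlib.h>

/* Return the value in the sorted array bucket[0..n-1] closest to x,
   preferring the lower value on ties. */
int bucket_snap(const int *bucket, int n, int x) {
    int best = bucket[0];
    for (int i = 1; i < n; i++) {
        /* strict '<': on a tie, the earlier (lower) value is kept */
        if (abs(bucket[i] - x) < abs(best - x)) best = bucket[i];
    }
    return best;
}
```

On the example bucket {1,3,4,6,8} this reproduces the behavior described above, e.g. x=5 snaps to 4 rather than 6.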
The ColorBuckets transformation is not allowed in the following circumstances:

Palette or PaletteAlpha is used, or in general, both channel 0 and 2 contain only zeroes: orig_range(0) = orig_range(2) = 0 .. 0

The image is a grayscale image

Channel 1 is trivial: orig_range(1) is a singleton

Channel 0, 1 or 2 requires more than 10 bits: orig_range(c).max - orig_range(c).min > 1023
To encode the Color Buckets, (generalized) near-zero symbol coding is used with 6 different contexts, which we will call A, B, C, D, E, and F. The encoding is quite complicated.
To decode, all Color Buckets are initialized to empty sets.
First b_{0} is encoded:
Type | Description | Condition | Effect
nz_int_A(0,1) = 1 | Boolean: nonempty | |
gnz_int_B(orig_range(0).min, orig_range(0).max) | min | |
gnz_int_C(min, orig_range(0).max) | max | | if max - min < 2, then b_{0} = { min, max }
nz_int_D(0,1) | discrete | max - min > 1 | if discrete = 0, then b_{0} = min .. max
gnz_int_E(2, min(255, max - min)) | n = size of b_{0} | discrete | b_{0}[0] = min, b_{0}[n-1] = max
gnz_int_F(b_{0}[i-1]+1, max + 1 + i - n) | b_{0}[i] | discrete, repeat: i from 1 to n-2 |
Next, for all values v_{0} in b_{0}, Color Bucket b_{1}(v_{0}) is encoded:
Type | Description | Condition | Effect
nz_int_A(0,1) | Boolean: nonempty | | if false, b_{1}(v_{0}) is the empty set
gnz_int_B(orig_crange(1,v_{0}).min, orig_crange(1,v_{0}).max) | min | nonempty |
gnz_int_C(min, orig_crange(1,v_{0}).max) | max | nonempty | if max - min < 2, then b_{1}(v_{0}) = { min, max }
nz_int_D(0,1) | discrete | max - min > 1 | if discrete = 0, then b_{1}(v_{0}) = min .. max
gnz_int_E(2, min(510, max - min)) | n = size of b_{1}(v_{0}) | discrete | b_{1}(v_{0})[0] = min, b_{1}(v_{0})[n-1] = max
gnz_int_F(b_{1}(v_{0})[i-1]+1, max + 1 + i - n) | b_{1}(v_{0})[i] | discrete, repeat: i from 1 to n-2 |
Next, for all values v_{0} in b_{0}, for all values qv_{1} from 0 to (orig_range(1).max - orig_range(1).min) / 4, Color Bucket b_{2}(v_{0},qv_{1}) is encoded if for some k in 0..3, it is the case that v_{1} = qv_{1} * 4 + orig_range(1).min + k is in the set b_{1}(v_{0}):
Type | Description | Condition | Effect
nz_int_A(0,1) | Boolean: nonempty | | if false, remove v_{1} from b_{1}(v_{0})
gnz_int_B(min_{k=0..3}(orig_crange(2,v_{0},v_{1}+k).min), max_{k=0..3}(orig_crange(2,v_{0},v_{1}+k).max)) | min | nonempty |
gnz_int_C(min, max_{k=0..3}(orig_crange(2,v_{0},v_{1}+k).max)) | max | nonempty | if max - min < 2, then b_{2}(v_{0},qv_{1}) = { min, max }
nz_int_D(0,1) | discrete | max - min > 1 | if discrete = 0, then b_{2}(v_{0},qv_{1}) = min .. max
gnz_int_E(2, min(5, max - min)) | n = size of b_{2}(v_{0},qv_{1}) | discrete | b_{2}(v_{0},qv_{1})[0] = min, b_{2}(v_{0},qv_{1})[n-1] = max
gnz_int_F(b_{2}(v_{0},qv_{1})[i-1]+1, max + 1 + i - n) | b_{2}(v_{0},qv_{1})[i] | discrete, repeat: i from 1 to n-2 |
Finally, if there is an Alpha channel (i.e. nb_channels > 3), then b_{3} is encoded in exactly the same way as b_{0}.
Transformation 8: reserved (unused)
Transformation 9: reserved (unused)
Transformation 10: DuplicateFrame
Transformation 11: FrameShape
Transformation 12: FrameLookback
Transformation 13: reserved (unused)
MANIAC entropy coding
Meta-Adaptive Near-zero Integer Arithmetic Coding (MANIAC) is a variant of context-adaptive binary arithmetic coding (CABAC) in which not just the probability model is adaptive (based on local context), but also the context model itself is adaptive. In this sense, it is meta-adaptive.
The symbol encoding itself corresponds to the Near-zero symbol coding method described above. The chance table (the chances for ZERO, SIGN, EXP(p,s) and MANT(p)) depends on the local context; these chances adapt each time a symbol is encoded or decoded.
The local context is described as a vector of integer numbers, which are called properties. To determine which chance table to use, a decision tree is used. This tree is called a MANIAC tree. The full structure of the tree is known in advance by the decoder (it is encoded in part 4 of the bitstream, see below). However, at decode time, the tree is initially only partially used (only the root node at the beginning), and it 'grows' to eventually become the full tree.
The inner nodes of a MANIAC tree contain a test of the form property[k] > value. If this test evaluates to true, then the left branch is taken, otherwise the right branch is taken. Eventually a leaf node is reached.
The leaf nodes of a MANIAC tree contain a counter and a chance table. This chance table is used in the Nearzero symbol coding. The counter gets decremented each time the leaf node is reached. When it reaches zero, the tree 'grows' and the leaf node becomes an inner node (a decision node). Its chance table gets duplicated and becomes the chance table of its two branch nodes.
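The tree walk described above can be sketched as follows; the array-based node layout is purely illustrative and is not the bitstream representation of the tree:

```c
/* Walk an array-encoded MANIAC tree: inner node i tests
   properties[prop[i]] > value[i]; on true go to left[i], else right[i].
   prop[i] == -1 marks a leaf; the leaf's index identifies its chance table. */
int maniac_lookup(const int *prop, const int *value,
                  const int *left, const int *right,
                  int node, const int *properties) {
    while (prop[node] >= 0)
        node = (properties[prop[node]] > value[node]) ? left[node] : right[node];
    return node;
}
```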
Part 4: pixel data
In this final part of the FLIF bitstream, the actual pixel values are encoded. There are two methods (or modes) of pixel encoding: non-interlaced and interlaced. The main header indicates which method is used (see Part 1 above).
Non-interlaced method
If this encode method is used, then we start immediately with the encoding of the MANIAC trees (see below), followed by the encoding of the pixels. The order in which the pixels are encoded is described by the following nested loops:

For all channels c : (in the order 4,3,0,1,2, skipping those that don’t exist or have a singleton range)
    For all rows y, i.e. for y from 0 to height-1 :
        For all frames f, i.e. for f from 0 to nb_frames-1 :
            For all columns x, i.e. for x from 0 to width-1 :
                Encode the pixel at location (x,y) in channel c of frame f
How the pixel is actually encoded is described below.
Interlaced method
For interlacing, we define the notion of zoomlevels. Zoomlevel 0 is the full image. Zoomlevel 1 are all the even-numbered rows of the image (counting from 0). Zoomlevel 2 are all the even-numbered columns of zoomlevel 1. In general: zoomlevel 2k+1 are all the even-numbered rows of zoomlevel 2k, and zoomlevel 2k+2 are all the even-numbered columns of zoomlevel 2k+1.
In other words, every even-numbered zoomlevel 2k is a downsampled version of the image, at scale 1:2^k.
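Under this definition, the dimensions of a zoomlevel follow directly (a sketch; the function names are illustrative):

```c
/* Number of rows/columns of zoomlevel z of a height x width image:
   odd levels halve the rows, even levels (other than 0) halve the columns. */
int zoomlevel_rows(int height, int z) { return ((height - 1) >> ((z + 1) / 2)) + 1; }
int zoomlevel_cols(int width,  int z) { return ((width  - 1) >> (z / 2)) + 1; }
```

For example, a 5x5 image has 3 rows and 5 columns at zoomlevel 1, and 3 rows and 3 columns at zoomlevel 2.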
We define the 'maximum zoomlevel' max_zl of an image as the zoomlevel with the lowest number that consists of a single pixel. This is always the pixel in the top-left corner of the image (row 0, column 0). This pixel is always encoded first, as a special case.
The zoomlevels are encoded from highest (most zoomed out) to lowest; in each zoomlevel, obviously only those pixels are encoded that haven’t been encoded previously. So in an even-numbered zoomlevel, the odd-numbered rows are encoded, while in an odd-numbered zoomlevel, the odd-numbered columns are encoded.
If the interlaced encode method is used, we do not encode the MANIAC trees right away. Instead, we initialize the trees to a single root node per channel, and start encoding a 'rough preview' of the image (a few of the highest zoomlevels). This allows a rough thumbnail extraction without needing to decode the MANIAC tree. Then the MANIAC tree is encoded, and then the rest of the zoomlevels are encoded.
Type | Description | Condition
uni_int(0,max_zl) | Number of the first MANIAC-encoded zoomlevel: first_zl |
uni_int(range(c).min, range(c).max) | pixel value of the top-left (0,0) pixel of channel c | repeat(c in 0..nb_channels-1)
Encode_zoomlevels(max_zl-1, first_zl+1) | encoding of zoomlevels max_zl-1 until first_zl+1 |
variable | encoding of MANIAC trees | see further below
Encode_zoomlevels(first_zl, 0) | encoding of zoomlevels first_zl until 0 |
The encoding of a series of zoomlevels happens by interleaving the channels in some way. This interleaving is either in the 'default order', or in a custom order. In any case, the following invariants must hold:

Zoomlevel k of a channel can only be encoded after zoomlevel k+1 of that channel has already been encoded;

If channel 3 exists and alpha_zero is true, then zoomlevel k of channel 0 (usually Luma) can only be encoded after zoomlevel k of channel 3 (Alpha) has already been encoded;

Zoomlevel k of channel 1 (usually Co) can only be encoded after zoomlevel k of channel 0 (usually Luma) has been encoded;

Zoomlevel k of channel 2 (usually Cg) can only be encoded after zoomlevel k of channels 0 and 1 (Luma and Co) have been encoded;

If channel 4 (FrameLookback) exists: zoomlevel k of any other channel (0,1,2, or 3) can only be encoded after zoomlevel k of channel 4 has already been encoded.
TODO: describe default order
Encode_zoomlevels(h,l)
There are three different pixel predictors in interlaced mode (numbered 0, 1, 2). For each channel, either the same predictor is used in all zoomlevels h to l (in that case the predictor number is encoded only once), or a different predictor can be picked for each zoomlevel (in that case the number -1 is encoded, and the actual predictor number is encoded at the beginning of each zoomlevel).
Type | Description | Condition
uni_int(0,1) | Boolean: default_order |
uni_int(-1,2) | Pixel predictors pred[channel] | repeat(nb_channels)
Repeat nb_channels * (h-l+1) times: (once for every channel/zoomlevel in the range from h to l)
Type | Description | Condition | Default value
uni_int(0,nb_channels-1) | Channel c to be encoded | not default_order | given by default order
| Zoomlevel z is implicit | |
uni_int(0,2) | Pixel predictor p to use | pred[c] == -1 | pred[c]
Encode_zl(c,z,p) | Encoding of the next zoomlevel of channel c | range(c).min < range(c).max |
The encoding of a single zoomlevel/channel combination is defined as follows:
Encode_zl(c,z,p) :

If z is even (a 'horizontal' zoomlevel):

  For all odd rows y of zoomlevel z, i.e. for y from 1 to (height-1)/(2^(z/2)) in steps of 2:

    For all frames f, i.e. for f from 0 to nb_frames-1:

      For all columns x of zoomlevel z, i.e. for x from 0 to (width-1)/(2^(z/2)):

        Encode the pixel at location (x,y) in zoomlevel z of channel c of frame f, using predictor p

Else (if z is odd, that is, a 'vertical' zoomlevel):

  For all rows y of zoomlevel z, i.e. for y from 0 to (height-1)/(2^((z+1)/2)):

    For all frames f, i.e. for f from 0 to nb_frames-1:

      For all odd columns x of zoomlevel z, i.e. for x from 1 to (width-1)/(2^(z/2)) in steps of 2:

        Encode the pixel at location (x,y) in zoomlevel z of channel c of frame f, using predictor p
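The traversal order can be summarized as a coordinate generator. The following Python sketch (names are ours; the pixel encoding itself is elided) yields (frame, row, column) triples in the order the pixels of a zoomlevel are visited, using integer division as in the formulas above:

```python
def zoomlevel_coords(z, width, height, nb_frames):
    """Yield (f, y, x) in the order Encode_zl visits pixels of zoomlevel z."""
    if z % 2 == 0:                      # horizontal zoomlevel: new odd rows
        for y in range(1, (height - 1) // 2**(z // 2) + 1, 2):
            for f in range(nb_frames):
                for x in range((width - 1) // 2**(z // 2) + 1):
                    yield f, y, x
    else:                               # vertical zoomlevel: new odd columns
        for y in range((height - 1) // 2**((z + 1) // 2) + 1):
            for f in range(nb_frames):
                for x in range(1, (width - 1) // 2**(z // 2) + 1, 2):
                    yield f, y, x
```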
Pixel encoding
The overall encoding mechanism of a pixel value v is as follows:

The value is predicted, resulting in a predicted value guess

The range of possible values is computed as accurately as possible using crange(c,…), resulting in min and max

guess is adjusted to be a valid value in this range using snap(c,…)


The local context of the pixel is computed as a MANIAC property vector pvec

The number that actually gets encoded is the difference between the actual and predicted value: v - guess

If the prediction is any good, these numbers tend to be close to zero

To decode the value: nz_int_MANIAC_{c,pvec}(min-guess, max-guess) + guess

The above mechanism is the same in interlaced and noninterlaced mode and for all channels; however, the predictor (guess) and the layout of the property vector (pvec) depend on the mode and the channel.
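As an illustration of this round trip, here is a Python sketch in which the default snap is modeled as a simple clamp (transforms such as ColorBuckets install a more involved snap) and the MANIAC coder itself is elided; all names are ours:

```python
def snap(guess, vmin, vmax):
    """Default snap: clamp the prediction into the valid range [vmin, vmax]."""
    return min(max(guess, vmin), vmax)

def encode_pixel(v, raw_guess, vmin, vmax):
    """Return (residual, guess): the residual is the symbol handed to the
    MANIAC coder, constrained to [vmin - guess, vmax - guess]."""
    guess = snap(raw_guess, vmin, vmax)
    return v - guess, guess

def decode_pixel(residual, raw_guess, vmin, vmax):
    """Inverse: nz_int_MANIAC(min - guess, max - guess) + guess."""
    guess = snap(raw_guess, vmin, vmax)
    return residual + guess
```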
Skipped pixels
Some pixels are not encoded at all. There are three such cases:

If the alpha_zero flag is true, then 'invisible pixels' (pixels with Alpha value zero) semantically have undefined RGB values. So if the pixel value in channel 3 (Alpha) is equal to 0, then the pixel values in channels 0, 1, 2 (RGB, or e.g. YCoCg after transformations) are not encoded. However, the decoder does need to set the values of these pixels to something, since those 'invisible' values might very well be used in predictors or properties of neighboring visible pixels. It sets these values simply to the predicted value. In interlaced mode, the predictor to use for that is called the 'invisible pixel predictor' (see part 3 above).

In animations which use the FrameShape transformation, from the second frame onwards, all rows have a 'begin' and 'end' column. Pixels before the begin and after the end are not encoded; their value is equal to the corresponding pixel value from the previous frame.

In animations which use the FrameLookback transformation, channel 4 (Lookback) refers to past frames. If the value in channel 4 is not zero but some number k > 0, then all other channels are not encoded for that pixel. The pixel value for channels 0,1,2,3 is equal to the corresponding pixel value from k frames ago.
As an optimization, it is safe to simply skip pixels from a constant channel, that is, a channel which has a singleton range. Technically, one could argue that they are actually encoded, but for each pixel value v, we have v = min = max = guess, which means the symbol encoding doesn’t use any bits. 
Pixel predictors
The different pixel predictors are described below. In each case, the actual predicted value is the value returned by the snap function. Usually the snap function simply returns its input value, but not always. Noteworthy exceptions are Predictor 1 of interlaced mode, which as defined below could even be outside the channel range, and any predictor in combination with the ColorBuckets transformation, which can have a nontrivial snap function.
Noninterlaced mode
Assume we have to predict pixel X in channel c given its previously decoded neighbors:
  TL  T
  L   X
The predicted value for X is the median of the following three values:

L + T - TL

L

T
Border conditions:

If X is at column 0 (so it has no left neighbor L), then we set L = range(c).min, except when alpha_zero is true, c < 3 and the alpha value at the position of pixel X is zero; in that case we set L = (range(c).min+range(c).max)/2.

If X is at row 0 (so it has no top neighbor T), then we set T = TL = L.

If X is at row > 0 and column 0, then TL = T.
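Putting the median predictor and its border rules together gives the following Python sketch (names are ours; `img[y][x]` holds the already-decoded values of the channel, and `invisible` marks an alpha-zero pixel whose left border value uses the range midpoint):

```python
def median3(a, b, c):
    return sorted((a, b, c))[1]

def predict_noninterlaced(img, x, y, rmin, rmax, invisible=False):
    """Median predictor for noninterlaced mode, with border substitutions."""
    if x > 0:
        L = img[y][x - 1]
    elif invisible:
        L = int((rmin + rmax) / 2)  # the spec's '/' truncates towards zero
    else:
        L = rmin
    T = img[y - 1][x] if y > 0 else L            # no top neighbor: T = L
    TL = img[y - 1][x - 1] if (x > 0 and y > 0) else (T if y > 0 else L)
    return median3(L + T - TL, L, T)
```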
Interlaced mode
In interlaced mode, there are three different predictors. Moreover, the predictors are slightly different in even-numbered zoomlevels compared to odd-numbered zoomlevels, since slightly different neighboring pixels are available (already decoded): in horizontal zoomlevels the pixel below the current pixel is already known (since it is part of the previously decoded zoomlevel), while in vertical zoomlevels the pixel to the right of the current pixel is already known.
Horizontal zoomlevel (even-numbered)
Assume we have to predict pixel X in channel c given its previously decoded neighbors.
These neighbors are defined with respect to the current zoomlevel. In the full image, these 'neighbors' are not directly adjacent (except in zoomlevel 0). 
      TT
  TL  T   TR
  LL  L   X
  BL  B   BR
The following predictors are defined:

Predictor 0: (T + B)>>1

Predictor 1: the median of the following three values: (T + B)>>1, L + T - TL, L + B - BL

Predictor 2: the median of T, B and L
Border conditions:

If X is at column 0 (so it has no left neighbor L), then we set L = TL = BL = T

If X is at the rightmost column, then we set TR = T.

If X is at the last row, then BL = B = L.

If X is at the rightmost column or last row, then BR = B.
Vertical zoomlevel (odd-numbered)
Assume we have to predict pixel X in channel c given its previously decoded neighbors.
      TT
  TL  T   TR
  LL  L   X   R
  BL      BR
The following predictors are defined:

Predictor 0 (P_{0}): (L + R)>>1

Predictor 1 (P_{1}): the median of the following three values: (L + R)>>1, L + T - TL, R + T - TR

Predictor 2 (P_{2}): the median of T, L and R
Border conditions:

If X is at row 0 (so it has no top neighbor T), then we set T = TL = TR = L

If X is at the rightmost column, then we set TR = R = T.

If X is at the last row, then BL = L.

If X is at the rightmost column or last row, then BR = R.
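Both predictor families can be sketched in a single Python function (names are ours; the caller is assumed to have already applied the border substitutions listed above):

```python
def median3(a, b, c):
    return sorted((a, b, c))[1]

def predict_interlaced(p, z, T, B, L, R, TL, TR, BL):
    """Interlaced predictors 0-2; Python's >> is arithmetic, as in the spec."""
    if z % 2 == 0:                       # horizontal zoomlevel: B is known
        avg = (T + B) >> 1
        if p == 0:
            return avg
        if p == 1:
            return median3(avg, L + T - TL, L + B - BL)
        return median3(T, B, L)
    else:                                # vertical zoomlevel: R is known
        avg = (L + R) >> 1
        if p == 0:
            return avg
        if p == 1:
            return median3(avg, L + T - TL, R + T - TR)
        return median3(T, L, R)
```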
MANIAC properties
The MANIAC property vector pvec which is used to determine the context (i.e. the MANIAC tree leaf node) is constructed in the following way. Assume the pixel currently being decoded is pixel X in channel number c and the pixel values of previously decoded neighboring pixels (of the same channel) are named as in the diagram below; it depends on the interlacing mode and zoomlevel which of these are known at decode time.
      TT
  TL  T   TR
  LL  L   X   R
  BL  B
Moreover, we use X_{pc} to denote the pixel value at the position of X in a previously decoded channel pc, and P to denote the predicted value for X (as described above).
Finally, we define the range maxdiff(c) as the range of a difference between two pixel values in channel c, that is: maxdiff(c).min = range(c).min - range(c).max and maxdiff(c).max = range(c).max - range(c).min.
Noninterlaced mode
In noninterlaced mode, the following MANIAC properties are defined for a pixel in channel c:
| Property | Description | Condition | Property range (prange) |
| --- | --- | --- | --- |
| X_{0} | Luma / channel 0 value | 0 < c < 3 | range(0) |
| X_{1} | Co / channel 1 value | 1 < c < 3 | range(1) |
| X_{3} | Alpha value | c < 3 < nb_channels | range(3) |
| P | Predicted value snap(c,…,median(L+T-TL,L,T)) | | range(c) |
| M_{ni} | Median-index (see below) | | 0..2 |
| L - TL | Left - Top-left | | maxdiff(c) |
| TL - T | Top-left - Top | | maxdiff(c) |
| T - TR | Top - Top-right | | maxdiff(c) |
| TT - T | Top-top - Top | | maxdiff(c) |
| LL - L | Left-left - Left | | maxdiff(c) |
For images without an alpha channel, channel 0 has 7 properties (numbered 0 to 6), channel 1 has 8 properties, and channel 2 has 9 properties. For images with an alpha channel, channel 0 has 8 properties, channel 1 has 9 properties, channel 2 has 10 properties, and channel 3 has 7 properties. In animations with FrameLookback, channel 4 has 7 properties. 
Border conditions: if, for one of the last five properties (which are differences between two neighboring pixels), one of those pixels does not exist, the property is defined to be zero.
The median-index M_{ni} indicates which of the three input values for the median predictor became the prediction; more precisely, it is defined as follows:

M_{ni} =

| Value | Condition |
| --- | --- |
| 0 | if P = L+T-TL |
| 1 | if P = L (and the above condition does not hold) |
| 2 | if P = T (and neither of the above conditions holds) |
| 0 | otherwise |
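A direct Python transcription of this definition (the function name is ours):

```python
def median_index_noninterlaced(P, L, T, TL):
    """M_ni: which median input produced the prediction P (snap may have
    moved P away from all three inputs, in which case the index is 0)."""
    if P == L + T - TL:
        return 0
    if P == L:
        return 1
    if P == T:
        return 2
    return 0
```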
Interlaced mode
In interlaced mode, the following MANIAC properties are defined for a pixel in zoomlevel z of channel c using predictor p:
| Property | Description | Condition | Property range (prange) |
| --- | --- | --- | --- |
| X_{0} | Luma / channel 0 value | 0 < c < 3 | range(0) |
| X_{1} | Co / channel 1 value | 1 < c < 3 | range(1) |
| X_{3} | Alpha value | c < 3 < nb_channels | range(3) |
| M_{i} | Median-index (see below) | | 0..2 |
| X_{0} - (T_{0}+B_{0})>>1 | Luma 'prediction miss' | 0 < c < 3 and z is even | maxdiff(0) |
| X_{0} - (L_{0}+R_{0})>>1 | Luma 'prediction miss' | 0 < c < 3 and z is odd | maxdiff(0) |
| T - B | Top - Bottom | z is even | maxdiff(c) |
| T - (TL + TR)>>1 | Top 'prediction miss' | z is even | maxdiff(c) |
| L - (BL + TL)>>1 | Left 'prediction miss' | z is even | maxdiff(c) |
| B - (BL + BR)>>1 | Bottom 'prediction miss' | z is even | maxdiff(c) |
| L - R | Left - Right | z is odd | maxdiff(c) |
| L - (BL + TL)>>1 | Left 'prediction miss' | z is odd | maxdiff(c) |
| T - (TL + TR)>>1 | Top 'prediction miss' | z is odd | maxdiff(c) |
| R - (BR + TR)>>1 | Right 'prediction miss' | z is odd | maxdiff(c) |
| P_{p} | Predicted value (see above) | | range(c) |
| TT - T | Top-top - Top | c is not 2 | maxdiff(c) |
| LL - L | Left-left - Left | c is not 2 | maxdiff(c) |
For images without an alpha channel, channel 0 has 8 properties (numbered 0 to 7), channel 1 has 10 properties, and channel 2 has 9 properties. For images with an alpha channel, channel 0 has 9 properties, channel 1 has 11 properties, channel 2 has 10 properties, and channel 3 has 8 properties. In animations with FrameLookback, channel 4 has 8 properties. 
Border conditions: If TL, T, TR, L, BL, B, BR or R are missing, they are defined as described in the interlaced pixel predictor definition above. If TT or LL are missing, the property they occur in gets value zero. If B_{0} is missing, B_{0} = T_{0}; if R_{0} is missing, R_{0} = L_{0}.
The median-index M_{i} indicates which of the three input values for the median in Predictor 1 is the median; more precisely, it is defined as follows:

M_{i} =

| Value | Condition |
| --- | --- |
| 0 | if z is even and P_{1} = (T + B)>>1 |
| 0 | if z is odd and P_{1} = (L + R)>>1 |
| 1 | if P_{1} = L + T - TL (and the above conditions do not hold) |
| 2 | otherwise |
MANIAC tree encoding
There is one tree per non-trivial channel (a channel is trivial if its range is a singleton or if it doesn’t exist). The trees are encoded one after another, in a recursive (depth-first) way, as follows:
nb_properties depends on the channel, the number of channels, and the encoding method (interlaced or noninterlaced), as specified above. The range of each property is maintained during tree traversal. The initial property ranges prange[property] are defined above; these are narrowed down when going to a deeper node in the tree.
For each channel c, three different contexts are used: we’ll just call them A_{c}, B_{c} and C_{c}.
| Type | Description | Condition |
| --- | --- | --- |
| nz_int_A_{c}(0,nb_properties) | 0 = leaf node, > 0: property+1 | |
| nz_int_B_{c}(1,512) | node counter | not a leaf node |
| nz_int_C_{c}(prange[property].min, prange[property].max-1) | test_value | not a leaf node |
| recursive encoding of left branch | where prange[property].min = test_value+1 | not a leaf node |
| recursive encoding of right branch | where prange[property].max = test_value | not a leaf node |
From this description, the MANIAC trees can be reconstructed. Leaf nodes have a counter value that is effectively infinity (they can never be turned into a decision node).
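A decoder-side sketch of this recursion in Python, with the three entropy contexts abstracted as a flat list of already-decoded symbols (all names and the dict representation are ours):

```python
def decode_maniac_tree(symbols, prange):
    """Rebuild one MANIAC tree. `symbols` stands in for the decoded outputs
    of contexts A, B, C in stream order; `prange` is a list of (min, max)
    property ranges, narrowed as we descend."""
    prop_plus1 = symbols.pop(0)                 # nz_int_A(0, nb_properties)
    if prop_plus1 == 0:
        return {'leaf': True, 'count': float('inf')}  # never becomes a node
    prop = prop_plus1 - 1
    count = symbols.pop(0)                      # nz_int_B(1, 512)
    lo, hi = prange[prop]
    test = symbols.pop(0)                       # nz_int_C(lo, hi - 1)
    left_range = prange.copy()
    left_range[prop] = (test + 1, hi)           # left branch: property > test
    right_range = prange.copy()
    right_range[prop] = (lo, test)              # right branch: property <= test
    return {'leaf': False, 'property': prop, 'count': count,
            'test_value': test,
            'left': decode_maniac_tree(symbols, left_range),
            'right': decode_maniac_tree(symbols, right_range)}
```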
Checksum
At the very end of the bitstream, there is an optional 32-bit checksum to verify the integrity of the file.
| Type | Description | Condition |
| --- | --- | --- |
| uni_int(1) | Boolean: have_checksum | |
| uni_int(16) | Most significant 16 bits of checksum | have_checksum |
| uni_int(16) | Least significant 16 bits of checksum | have_checksum |
This checksum is a standard CRC32, computed from the original (or decoded) pixel data in the following way:
The CRC is initialized to the value (width << 16) + height, interpreted as an unsigned 32-bit number. Then the pixel values of all channels are processed, in scanline order (top to bottom, left to right), where each value is represented as a little-endian integer in 1, 2 or 4 bytes, depending on the bit depth of the image. The following sizes are used:
| Channel | All channels use at most 8 bits per pixel | At least one channel uses 9 or more bits per pixel |
| --- | --- | --- |
| 0 | 1 byte | 2 bytes |
| 1 | 2 bytes | 4 bytes |
| 2 | 2 bytes | 4 bytes |
| 3 | 1 byte | 2 bytes |
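The checksum computation can be sketched with Python's zlib. Two assumptions in this sketch are ours: that zlib.crc32's running-value parameter matches the spec's 'initialized to (width << 16) + height', and that channels are processed one after the other, each in scanline order:

```python
import struct
import zlib

def flif_checksum(channels, width, height, bytes_per_value):
    """CRC32 of the decoded pixel data (sketch). `channels` is a list of
    flat, row-major value lists; `bytes_per_value` gives 1, 2 or 4 per
    channel, per the size table above."""
    crc = (width << 16) + height            # assumed initial running value
    fmt = {1: '<B', 2: '<H', 4: '<I'}       # little-endian, per the spec
    for values, size in zip(channels, bytes_per_value):
        for v in values:
            crc = zlib.crc32(struct.pack(fmt[size], v), crc)
    return crc & 0xFFFFFFFF
```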