MSC1 is a compressor specifically designed for 8-bit computer text/colour screens. It can efficiently compress small memory areas (up to 1024 bytes). This specification is intended for developers who want to produce MSC1-compatible compressed data blocks in any programming language.
Note that this document describes only the block format, not how the compressor or decompressor actually works. The correctness of the decompressor should not depend on implementation details of the compressor, and vice versa.
An MSC1 compressed data stream is composed of “blocks”. Each block is one of three kinds: a literal block, a duplication (“dupes”) block, or an end-of-stream block.
The last block must be the “end of stream” block.
Each block starts with a control token (`CTR`). The token is a one-byte value split into bit fields. The first (most significant) bit defines whether the block is a “literal” (`0`) or a “duplication” (`1`) block, and the meaning of the remaining bits depends on it. The end-of-stream block is special: its entire control token is zero (`00000000`). Decoding a block always consumes it completely, so the next byte is the start of another block.
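As an illustration, the token classification above can be sketched in Python. `block_kind` is a hypothetical helper name, and the sketch assumes that the “first bit” is bit 7, which is consistent with the dupes layout (count in bits 6 to 2, offset high bits in bits 1 to 0).

```python
def block_kind(ctr: int) -> str:
    """Classify an MSC1 control token (hypothetical helper, not part of the spec)."""
    if not 0 <= ctr <= 0xFF:
        raise ValueError("a control token is a single byte")
    if ctr == 0x00:
        return "end"  # all eight bits zero: end-of-stream block
    # Assumption: the "first bit" is the most significant bit (bit 7).
    return "dupes" if ctr & 0x80 else "literal"
```

For instance, `block_kind(0x05)` gives `"literal"` and `block_kind(0x90)` gives `"dupes"`.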
If the CTR defines the block as “literal”, its lower 7 bits hold the count of literals. The literals themselves follow the token: from 1 to 127 bytes, exactly as many as the decoded count. A count of zero is *not* possible, since that token would be identical to the “END OF STREAM” block.
As an example, the following bytes define a literal of 5 letters (“HELLO”):
0x05, H, E, L, L, O
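A minimal sketch of building one literal block, assuming the payload is given as Python `bytes`; `encode_literal_block` is a hypothetical name. It enforces the 1 to 127 range stated above.

```python
def encode_literal_block(chunk: bytes) -> bytes:
    """Build one literal block: a CTR holding the count, then the literal bytes."""
    if not 1 <= len(chunk) <= 127:
        # 0 would collide with the end-of-stream token; 128+ does not fit in 7 bits.
        raise ValueError("a literal block carries 1 to 127 bytes")
    return bytes([len(chunk)]) + chunk
```

`encode_literal_block(b"HELLO")` reproduces the example bytes `0x05, H, E, L, L, O`.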
If the CTR defines the block as “dupes”, the token contains:
- the `count` field, in bits 6 to 2 (5 bits);
- the high bits of the `offset` field, in the low 2 bits of the CTR.

The following byte contains the low 8 bits of the `offset` field.
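The bit layout above can be packed and unpacked as follows. This is a sketch; `pack_dupes` and `unpack_dupes` are hypothetical names, and the range checks simply mirror the field widths (5 bits for `count`, 10 bits for `offset`).

```python
from typing import Tuple


def pack_dupes(count: int, offset: int) -> bytes:
    """Encode the two bytes of a dupes block (CTR + offset low byte)."""
    if not 0 <= count <= 0x1F:
        raise ValueError("count must fit in 5 bits")
    if not 0 <= offset <= 0x3FF:
        raise ValueError("offset must fit in 10 bits")
    # Bit 7 = 1 (dupes), bits 6-2 = count, bits 1-0 = offset high bits.
    ctr = 0x80 | (count << 2) | (offset >> 8)
    return bytes([ctr, offset & 0xFF])


def unpack_dupes(ctr: int, low: int) -> Tuple[int, int]:
    """Decode (count, offset) from a dupes CTR and the byte that follows it."""
    count = (ctr >> 2) & 0x1F
    offset = ((ctr & 0x03) << 8) | low
    return count, offset
```

For example, `unpack_dupes(0x90, 0x06)` yields a count of 4 and an offset of 6.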
When a dupes block is encountered, the decoder must copy 4 bytes from the current address of the source memory minus `offset`, and repeat this copy `count` times. The `offset` is a 10-bit value, so it ranges from 0 to 1023. It represents the position of the match to be copied from; `0` means the start position.
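Example: the following bytes define the literal “HELLO” followed by the repetition of the string “ELLO” 4 times:
0x05, H, E, L, L, O, 0x90, 0x06, ...
Here `0x90` is a dupes token with `count` = 4 and offset high bits 0, and `0x06` is the offset low byte. The offset of 6 bytes is counted back from the end of the dupes block, so it spans `E, L, L, O, 0x90, 0x06`, and the 4 bytes copied are `E, L, L, O`.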
If the CTR defines an “end of stream” block (a byte of value `0x00`), no more data follows.
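Putting the three block types together, a decoder might look like the sketch below. It is not a reference implementation. It assumes, from the “ELLO” example, that `offset` is counted back from the read position in the compressed input just after the offset byte (8 − 6 = 2, which lands on `E`), and that each of the `count` repetitions copies the same 4 bytes. `msc1_decompress` is a hypothetical name; the bounds checks reflect the goal of never reading outside the provided buffer.

```python
def msc1_decompress(src: bytes) -> bytes:
    """Decode an MSC1 stream (sketch under the assumptions stated above)."""
    out = bytearray()
    pos = 0  # read position in the compressed input
    while True:
        if pos >= len(src):
            raise ValueError("missing end-of-stream block")
        ctr = src[pos]
        pos += 1
        if ctr == 0x00:
            # End-of-stream block: nothing follows.
            return bytes(out)
        if not (ctr & 0x80):
            # Literal block: the low 7 bits give the number of literal bytes.
            n = ctr & 0x7F
            if pos + n > len(src):
                raise ValueError("literal block runs past the input")
            out += src[pos:pos + n]
            pos += n
        else:
            # Dupes block: count in bits 6-2, offset high bits in bits 1-0,
            # offset low bits in the next byte.
            if pos >= len(src):
                raise ValueError("dupes block is missing its offset byte")
            count = (ctr >> 2) & 0x1F
            offset = ((ctr & 0x03) << 8) | src[pos]
            pos += 1
            # ASSUMPTION: offset is counted back from the current position
            # in the compressed input (just after the offset byte).
            start = pos - offset
            if start < 0 or start + 4 > pos:
                raise ValueError("dupes offset points outside the input")
            # Copy the same 4 bytes `count` times.
            out += src[start:start + 4] * count
```

Under these assumptions, the example stream `0x05, H, E, L, L, O, 0x90, 0x06` followed by `0x00` decodes to `HELLO` and then `ELLO` four times.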
There are specific rules for generating the blocks of a compressed stream.
These rules ensure that a conformant decoder can be designed for speed, issuing instructions speculatively, while never reading or writing beyond the provided I/O buffers.
The format makes no assumptions about, and places no limits on, the way the compressor searches for and selects matches within the source data block. Multiple techniques can be considered, with distinct time/performance trade-offs. As long as the format is respected, the result will be compatible with, and decodable by, any compliant decoder.
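As one possible strategy among many, here is a naive greedy compressor sketch. It relies on the same unconfirmed reading of the dupes block as above: the 4-byte pattern is looked up in the compressed stream produced so far, and the offset is measured from the end of the 2-byte dupes block. At each input position it counts how often the next 4 bytes repeat back to back, emits a dupes block if that pattern already exists within reach, and otherwise accumulates literals (flushing at 127). `msc1_compress_greedy`, `_with_literals` and the constants are hypothetical names.

```python
MAX_LITERALS = 127  # 7-bit literal count; 0 is reserved for end of stream
MAX_COUNT = 31      # largest value of the 5-bit count field
MAX_OFFSET = 1023   # largest value of the 10-bit offset field
UNIT = 4            # each dupes repetition copies 4 bytes


def _with_literals(out: bytes, pending: bytes) -> bytes:
    """Return `out` with `pending` appended as one literal block, if non-empty."""
    return out + bytes([len(pending)]) + pending if pending else out


def msc1_compress_greedy(data: bytes) -> bytes:
    out = b""      # compressed bytes emitted so far
    pending = b""  # literals not yet written
    i = 0
    while i < len(data):
        pattern = data[i:i + UNIT]
        if len(pattern) == UNIT:
            # How many times does the 4-byte pattern repeat back to back here?
            count = 1
            while (count < MAX_COUNT
                   and data[i + count * UNIT:i + (count + 1) * UNIT] == pattern):
                count += 1
            # Search the stream as it would look after flushing the pending
            # literals. ASSUMPTION: the offset is measured from the end of
            # the 2-byte dupes block, inside the compressed stream.
            candidate = _with_literals(out, pending)
            end = len(candidate) + 2
            p = candidate.rfind(pattern, max(0, end - MAX_OFFSET))
            if p != -1:
                offset = end - p
                out = candidate + bytes([0x80 | (count << 2) | (offset >> 8),
                                         offset & 0xFF])
                pending = b""
                i += count * UNIT
                continue
        # No usable match: keep this byte as a literal.
        pending += data[i:i + 1]
        if len(pending) == MAX_LITERALS:
            out = _with_literals(out, pending)
            pending = b""
        i += 1
    # Flush the remaining literals and terminate the stream.
    return _with_literals(out, pending) + b"\x00"
```

Fed `HELLO` followed by `ELLO` four times, this sketch emits exactly the example bytes `0x05, H, E, L, L, O, 0x90, 0x06` plus the `0x00` terminator. A real compressor would likely weigh whether a short match is worth breaking a literal run.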
Please see: