MSC1 STREAM DESCRIPTION

MSC1 is a compressor specifically designed to compress 8-bit computer text/colour screens. It can efficiently compress small memory areas (up to 1024 bytes). This specification is intended for developers who want to produce MSC1-compatible compressed data blocks in any programming language.

Note that this document describes only the block format, not how the compressor or decompressor actually works. The correctness of the decompressor should not depend on implementation details of the compressor, and vice versa.

COMPRESSED DATA STREAM

An MSC1 compressed data stream is composed of “blocks”. A block is one of the following:

  • a block of “literal” data;
  • a duplication (“dupes”) block.

The last block must be the “end of stream” block.

Each block starts with a control token (CTR). The token is a one-byte value divided into bit fields. The first (most significant) bit defines whether the block is “literal” (0) or “duplication” (1), and the meaning of the remaining bits depends on this first bit. The end of stream block is special: its entire control token is zero (00000000). Decoding a block consumes it completely, so the next byte is the start of the following block.
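The following minimal C sketch makes the bit layout concrete by classifying a control token from its first (most significant) bit. The enum and function names (`msc1_block_kind`, `msc1_classify`) are hypothetical, because the specification does not define a C API.

```c
#include <stdint.h>

/* Hypothetical names: the specification does not define a C API. */
typedef enum {
    MSC1_END_OF_STREAM, /* CTR == 0x00 */
    MSC1_LITERAL,       /* bit 7 clear, lower 7 bits = literal count (1..127) */
    MSC1_DUPES          /* bit 7 set */
} msc1_block_kind;

/* Classify a control token by its most significant bit. */
msc1_block_kind msc1_classify(uint8_t ctr)
{
    if (ctr == 0x00)
        return MSC1_END_OF_STREAM;
    return (ctr & 0x80) ? MSC1_DUPES : MSC1_LITERAL;
}
```

The zero test comes first because 0x00 also has bit 7 clear and would otherwise be taken for an (invalid) empty literal block.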

"LITERAL" BLOCK

If the CTR defines the block as “literal”, its lower 7 bits contain the count of literals, from 1 to 127. The literals themselves follow the token, exactly as many as the decoded count. A count of zero is *not* possible, because a token of 0x00 is the “END OF STREAM” block.

As an example, the following bytes define a literal of 5 letters (“HELLO”):

   0x05, H, E, L, L, O
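Here is a sketch of how an encoder might emit one literal block. It assumes the caller provides an output buffer with room for `n + 1` bytes. `msc1_put_literals` is a hypothetical helper and not part of any existing library.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Write one literal block (CTR followed by the bytes) into out.
   n must be between 1 and 127: a CTR of 0x00 would be END OF STREAM.
   Returns the number of bytes written (n + 1), or 0 if n is out of range.
   msc1_put_literals is a hypothetical helper name. */
size_t msc1_put_literals(uint8_t *out, const uint8_t *lit, size_t n)
{
    if (n < 1 || n > 127)
        return 0;
    out[0] = (uint8_t)n;      /* bit 7 = 0: literal block, low 7 bits = count */
    memcpy(out + 1, lit, n);  /* the literals follow the token unchanged */
    return n + 1;
}
```

Called with “HELLO”, it writes exactly the six bytes shown in the example.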

"DUPES" BLOCK

If the CTR defines the block as “dupes”, the token contains:

  • a count field in bits 6 to 2 (5 bits);
  • the high bits of the offset field in bits 1 to 0 (2 bits).

The byte following the token contains the low 8 bits of the offset field.
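The two bytes of a dupes block can be packed and unpacked as shown below. This sketch assumes callers pass a count from 1 to 32 and an offset from 0 to 1023. `msc1_pack_dupes` and `msc1_unpack_dupes` are hypothetical names.

```c
#include <stdint.h>

/* Hypothetical helpers. count is 1..32 (32 is stored as 0), offset is 0..1023. */
void msc1_pack_dupes(unsigned count, unsigned offset, uint8_t out[2])
{
    out[0] = (uint8_t)(0x80                     /* bit 7: dupes block         */
                     | ((count & 0x1F) << 2)    /* bits 6..2: count, 32 -> 0  */
                     | ((offset >> 8) & 0x03)); /* bits 1..0: offset high     */
    out[1] = (uint8_t)(offset & 0xFF);          /* second byte: offset low    */
}

void msc1_unpack_dupes(const uint8_t in[2], unsigned *count, unsigned *offset)
{
    unsigned c = (in[0] >> 2) & 0x1F;
    *count = c ? c : 32;                        /* a count of 0 means 32 */
    *offset = ((unsigned)(in[0] & 0x03) << 8) | in[1];
}
```

Packing a count of 4 and an offset of 6 yields 0x90, 0x06, which is the pair used in the example below.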

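When a DUPES BLOCK is encountered, the decoder copies 4 bytes starting at the current address in the source (compressed) data minus the offset, and repeats that copy count times. The offset is a 10-bit value (0 to 1023) made of the 2 high bits from the CTR and the 8 low bits from the following byte. It is the distance of the 4 bytes to be copied, counted backwards from the current source address. The current source address is the byte immediately after the offset byte, as the example below shows. “0” means “start position”.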

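Example: the following bytes define the literal “HELLO” followed by a repetition of the string “ELLO” 4 times. The token 0x90 (binary 1 00100 00) marks a dupes block with a count of 4 and offset high bits of 0. With the low byte 0x06, the offset is 6, counted back from the byte that follows 0x06, which lands on E: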

   5, H, E, L, L, O, 0x90, 0x06, ...
         |<------ 0x06 bytes-----|
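Putting the pieces together, here is a minimal decoder sketch that checks every read and write against the buffer bounds. It assumes, as the example above illustrates, that the offset is counted backwards in the compressed input from the byte following the 2-byte dupes block. It also assumes each dupes block writes the same 4 bytes `count` times. `msc1_decode` and its error convention (returning `(size_t)-1`) are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Minimal MSC1 decoder sketch (hypothetical API).
   Assumption: dupes offsets are counted backwards in the COMPRESSED stream
   from the byte after the 2-byte dupes block; the 4 bytes found there are
   written count times. Returns bytes written, or (size_t)-1 on malformed
   input or when dst is too small. */
size_t msc1_decode(const uint8_t *src, size_t src_len,
                   uint8_t *dst, size_t dst_cap)
{
    size_t ip = 0, op = 0;
    for (;;) {
        if (ip >= src_len)
            return (size_t)-1;               /* missing END OF STREAM */
        uint8_t ctr = src[ip++];
        if (ctr == 0x00)
            return op;                       /* END OF STREAM */
        if (!(ctr & 0x80)) {                 /* literal block: 1..127 bytes */
            size_t n = ctr;
            if (n > src_len - ip || n > dst_cap - op)
                return (size_t)-1;
            memcpy(dst + op, src + ip, n);
            ip += n;
            op += n;
        } else {                             /* dupes block */
            if (ip >= src_len)
                return (size_t)-1;           /* offset byte missing */
            size_t count = (ctr >> 2) & 0x1F;
            if (count == 0)
                count = 32;                  /* a count of 0 means 32 */
            size_t offset = ((size_t)(ctr & 0x03) << 8) | src[ip++];
            /* ip now points just past the offset byte */
            if (offset > ip || ip - offset + 4 > src_len)
                return (size_t)-1;           /* pattern outside the input */
            if (4 * count > dst_cap - op)
                return (size_t)-1;           /* output buffer too small */
            const uint8_t *pat = src + (ip - offset);
            for (size_t i = 0; i < count; i++) {
                memcpy(dst + op, pat, 4);
                op += 4;
            }
        }
    }
}
```

If you feed it the example stream followed by a 0x00 terminator, it writes “HELLO” followed by four copies of “ELLO”, 21 bytes in total.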

END OF STREAM

If the CTR defines an “end of stream” (a byte of value 0x00), no more data follows.

SPECIFIC RULES

There are specific rules for generating the blocks of a compressed stream.

  • The last block MUST be an END OF STREAM block. The data stream ends right after it.
  • If the input is smaller than 5 bytes, there is only one block besides the END OF STREAM, and it contains the whole input as literals.
  • An empty input can be represented by a single END OF STREAM block, interpreted as a final block with no literals and no dupes.
  • If a DUPES BLOCK has a count of zero (0), it is equivalent to a count of 32.
  • Streams of fewer than 8 bytes cannot be compressed.
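An encoder that never emits dupes blocks still respects every rule above. An empty input becomes a lone END OF STREAM, and an input shorter than 5 bytes becomes a single literal block. The sketch below (`msc1_store`, a hypothetical name) splits longer inputs into literal blocks of at most 127 bytes. It produces valid but uncompressed MSC1 data.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* "Stored" encoder sketch: emits only literal blocks (at most 127 bytes
   each) followed by END OF STREAM, i.e. it performs no compression.
   out must hold at least n + n / 127 + 2 bytes.
   Returns the number of bytes written. msc1_store is a hypothetical name. */
size_t msc1_store(const uint8_t *in, size_t n, uint8_t *out)
{
    size_t op = 0;
    while (n > 0) {
        size_t chunk = n > 127 ? 127 : n;
        out[op++] = (uint8_t)chunk;   /* literal CTR: bit 7 clear, count 1..127 */
        memcpy(out + op, in, chunk);
        op += chunk;
        in += chunk;
        n -= chunk;
    }
    out[op++] = 0x00;                 /* END OF STREAM */
    return op;
}
```

A 300-byte input, for example, becomes blocks of 127, 127 and 46 literals plus the terminator, 304 bytes in all.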

These rules are in place to ensure that a conformant decoder can be designed for speed, issuing instructions speculatively, while never reading or writing beyond the provided I/O buffers.
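A decoder can avoid writing past its output buffer by running a sizing pass first. The pass walks the blocks and totals the output length so the caller can allocate an exactly sized buffer. This sketch (`msc1_decoded_size`, a hypothetical name) checks that every block lies inside the input but does not validate dupes offsets.

```c
#include <stddef.h>
#include <stdint.h>

/* Sizing pass sketch: walk the blocks without writing anything and return
   the decompressed size. Returns (size_t)-1 if a block runs past the end
   of src or END OF STREAM is missing. Dupes offsets are NOT validated.
   msc1_decoded_size is a hypothetical name. */
size_t msc1_decoded_size(const uint8_t *src, size_t src_len)
{
    size_t ip = 0, total = 0;
    while (ip < src_len) {
        uint8_t ctr = src[ip++];
        if (ctr == 0x00)
            return total;                    /* END OF STREAM */
        if (!(ctr & 0x80)) {                 /* literal: skip ctr bytes */
            if (ctr > src_len - ip)
                return (size_t)-1;
            ip += ctr;
            total += ctr;
        } else {                             /* dupes: 4 * count bytes */
            size_t count = (ctr >> 2) & 0x1F;
            if (ip >= src_len)
                return (size_t)-1;           /* offset byte missing */
            ip++;
            total += 4 * (count ? count : 32);
        }
    }
    return (size_t)-1;                       /* no END OF STREAM */
}
```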

ADDITIONAL NOTES

The format makes no assumptions about, and places no limits on, how the compressor searches for and selects matches within the source data block. Multiple techniques can be considered, with distinct time/performance trade-offs. As long as the format is respected, the result will be compatible with and decodable by any compliant decoder.
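One of the many possible match-search techniques is for a compressor to brute-force search its own output for the nearest earlier copy of a 4-byte pattern. The sketch below assumes offsets are measured in the compressed stream from the byte after the dupes block, following the example in the "DUPES" BLOCK section. `msc1_find_pattern` is a hypothetical name, and its cost grows linearly with the 1023-byte window.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Naive match search sketch (one of many possible techniques).
   out[0..written) is the compressed data produced so far; a dupes block
   would occupy out[written] and out[written + 1], so offsets are measured
   from written + 2 (assumption taken from the DUPES BLOCK example).
   Finds the nearest copy of pat lying entirely in out[0..written) with an
   offset of at most 1023. Returns 1 and stores the offset, 0 if none.
   msc1_find_pattern is a hypothetical name. */
int msc1_find_pattern(const uint8_t *out, size_t written,
                      const uint8_t pat[4], unsigned *offset)
{
    if (written < 4)
        return 0;
    size_t after = written + 2;                     /* byte after the dupes block */
    size_t lowest = after > 1023 ? after - 1023 : 0; /* farthest reachable start */
    size_t src = written - 4;                       /* nearest candidate */
    for (;;) {
        if (memcmp(out + src, pat, 4) == 0) {
            *offset = (unsigned)(after - src);
            return 1;
        }
        if (src == lowest)
            return 0;
        src--;
    }
}
```

With “HELLO” already written as a literal block (6 bytes), searching for “ELLO” gives an offset of 6, the same value as in the dupes example.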

IMPLEMENTATION EXAMPLES

C language

Please see:
