MIO0 Compressed Data Format
Users browsing this thread: 1 Guest(s)

Attached are two documents that summarize my understanding of the MIO0 compressed data format that is used in several N64 games including Super Mario 64.  For convenience, I am also including a formatted description of MIO0 in this post, but the documents also contain examples and specifics about the data in the SM64 ROM.

​MIO0 compressed block structure
               
OffsetLenDescription
0x00
0x00
0x04
0x08
0x0C
0x10
0x04
0x04
0x04
0x04
Header:
* Signature: "MIO0"
* Decompressed length
* Compressed offset (CO)
* Uncompressed offset (UO)
0x10Layout bits:
* 0 = compressed
* 1 = uncompressed data
Padding to align compressed
(CO)Compressed Data:
* 6-bit length/offset
(UO)Uncompressed Data:
* individual data bytes

Header
In all games, the header is 16-bytes and aligned to 4-byte boundaries.
In SM64, it is aligned to a 16-byte boundary (last 4 address bits are 0).
The header contains the "MIO0" signature followed by three 32-bit values:
decompressed length, compressed offset, and uncompressed offset.

Layout Bits
The layout bits section begins immediately after the header, at offset 0x10.
These bits identify if the next group of output data are described by a
compressed group (0) or if the next byte is to be pulled from the
uncompressed data (1).  The bits are packed starting at most significant first
moving down to least significant.  Additional padding 0x00 values may be after
the layout bits to align the compressed data section to a 4-byte boundary.

Compressed Data
The compressed data are located at "Compressed offset" described in the header
and is always aligned to a 4-byte boundary (last 2 address bits are 0).
They are composed of 16-bit values that describe where to copy the next
sequence of bytes from.  Each 16-bit value consists of a 4-bit length, and a
12-bit look-back offset:
 length = upper 4 bits of first byte + 3 [range: 3-18]
 offset = lower 4 bits of first and all second byte + 1 [range: 1-4096]
​Note: The length can be greater than the offset.  This just means that it will
copy from other data that was already copied during this compressed block.
This often occurs when there are large sections of the same data byte.

Uncompressed Data
The uncompressed data immediately follows the compressed data and thus is
aligned to a 2-byte boundary since the compressed data is aligned to a 4-byte
and the compressed data is 16-bit chunks.  The uncompressed data are individual
bytes of data, one for each '1' in the layout bits.

​MIO0 Decompression Procedure
  1. ​read header to determine decompressed length and offsets
  2. ​read next bit from layout bits
    1. ​if bit is 1, output next byte from uncompressed data
    2. ​if bit is 0, read next 2-bytes from compressed data
      • length = upper 4 bits + 3
      • offset = next 12 bits + 1
      • ​copy length bytes from current output buffer index - offset
  3. ​if (bytes output < decompressed length), go back to step 2.

​Software Libraries
Thanks to BGNG who did the initial leg work for understanding MIO0 and creating M0CK. I am sure there were others that helped along the way, but many of the old links are dead. If anyone has any more info on previous work, let me know and I'll update the post here.
(This post was last modified: 11-04-2015, 07:07 AM by queueRAM. Edit Reason: thanks )


Attached Files
Size: 5.98 KB / Downloads: 249 .txt   mio0_format.txt

Size: 5.27 KB / Downloads: 198 .txt   sm64_mio0_stats.txt


Well written explanation. Now I have a good understanding of MIO0 data compression format. It's the first time that I'm approaching to understand data compression, but it seems more that MIO0 is not only similar to LZ77(wikipedia), but an actual implementation of the algorithm.
I've found the attached example very useful: I've followed it passage by passage, with the help of pencil and paper. LZ77 is a good algorithm to compress tongue twisters Tongue (There's only some errors at point 8: you have forgotten to add +1 to the offset values)

I know that MIO0 files are used in SM64 to compress textures (haha, no PNG at the time), but is it supposed to also contain other data?

Also, I understand the +1 bias on the offset to optimize the sliding window's target (an offset of 0 would be out of the buffer, so useless), but I don't get why Nintendo decided to add +3 to the length. Personally I would have used +2 to even get 2 byte repetitions. Maybe it is so to catch bigger granularity up to 18 bytes instead of 17...
(This post was last modified: 19-10-2015, 05:00 PM by Fendoroid. Edit Reason: Clarified text )

(19-10-2015, 12:32 PM)Fendoroid Wrote: I've found the attached example very useful: I've followed it passage by passage, with the help of pencil and paper. LZ77 is a good algorithm to compress tongue twisters Tongue (There's only some errors at point 8: you have forgotten to add +1 to the offset values)


I'm glad to hear that you found it useful.  Thank you for pointing out my mistake in step 8 of the example.  You are correct, I need to add +1 there.  I will correct this in the next version

(19-10-2015, 12:32 PM)Fendoroid Wrote:
I know that MIO0 files are used in SM64 to compress textures (haha, no PNG at the time), but is it supposed to also contain other data?


While PNG wasn't around, other image compression was available, but those are complex and slow to decompress.  I think this MIO0 format was chosen because it was reasonably high compression ratio and it was very simple and fast to decompress.  MIO0 can certainly be used to compress any type of data, but it works best with data containing many repetitions.  I think the Mario Kart 64 ROM compresses some vertex data with it.  See shygoo's MK64 notes.

(19-10-2015, 12:32 PM)Fendoroid Wrote:
Also, I understand the +1 bias on the offset to optimize the sliding window's target (an offset of 0 would be out of the buffer, so useless), but I don't get why Nintendo decided to add +3 to the length. Personally I would have used +2 to even get 2 byte repetitions. Maybe it is so to catch bigger granularity up to 18 bytes instead of 17...


A 2-byte repetition sounds useful at first, but it costs 2 bytes (for length and offset) to store the compressed data anyway, so you would only be saving 1-bit of space (1-bit layout + 2-bytes compressed vs. 2-bit layout + 2-bytes uncompressed).  In the long run, I think you'd be better off catching the one extra byte in longer repetitions.

MIO0 Compressed Data Format
Users browsing this thread: 1 Guest(s)