I'll put a small summary here but I'll try to keep it short, since it's somewhat complicated and I'm down to one last thing before I have a functional understanding of the whole format. I'll write up a full report later on how it all works together, probably on the wiki. Thankfully, as assembly tends to make things, it's a lot less impossible than it first looks. And no worries about derailing development, I do romhacking because it's fun, and figuring out this compression format has been an absolute blast. It's my form of escapism.
There are three sections to the data. I've labelled them temporarily as Header, Master Commands, and Channel Commands. Header has already been fully explained, and the Master/Channel Commands are similar with a minor difference in functionality. The Master Command section starts directly after the header, at 0x2C. The Channel Commands are the 8 different pointers in the header, that point to different groups of data.
The command sections are all variable-length groups of bits. Often single bits will be pulled from them, though a read can switch to a larger number of bits when necessary. The Master Commands handle things like deciding if a previous or already calculated RGBA value will be used, and whether to copy the current RGBA value down a row, among other tasks. The Channel Command sections are a bit more complicated. The first one holds all the data that is used to create the Huffman tree that holds 5-bit values that represent one color from the RGB spectrum. The rest all hold data that describes what RGB values to grab from the Huffman tree. There's also some code for the Channel Commands that can define repeated colors, which is probably where the compression comes in.
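To make the "variable-length groups of bits" idea concrete, here is a minimal bit reader sketch. The class name, the MSB-first bit order, and reading byte by byte are all assumptions for illustration; the real routine may well pull bits from larger words or in a different order.

```python
class BitReader:
    """Pulls 1..n bits at a time from a byte string.

    Assumption: bits are consumed most-significant-bit first within
    each byte. The actual format may differ.
    """

    def __init__(self, data: bytes, start_byte: int = 0):
        self.data = data
        self.pos = start_byte * 8  # current position, counted in bits

    def read(self, count: int = 1) -> int:
        """Return the next `count` bits as an unsigned integer."""
        value = 0
        for _ in range(count):
            byte = self.data[self.pos >> 3]
            bit = (byte >> (7 - (self.pos & 7))) & 1
            value = (value << 1) | bit
            self.pos += 1
        return value
```

Under those assumptions, a reader for the Master Commands would be constructed with start_byte=0x2C, and each Channel Command reader would start at the offset given by its header pointer.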
So this brings us to the A1 buffer. For clarity, the A1 buffer is NOT an output, it's a processing aid to determine pixels that are in areas with lots of changing colors. Every time that a pixel's color changes from a previous pixel, in the A1 buffer it adds 1 to the pixel to the right, the pixel that's 2 pixels right, the pixel down, the pixel down & left, the pixel down & right, and the pixel down twice. 6 in total. Effectively this map shows you areas that are heavy in changing colors, and places that are effectively the same color.
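The six neighbor updates can be sketched like this. The function name, the list-of-rows layout, and skipping neighbors that fall outside the image are assumptions on my part; the original code may handle edges differently (padding the buffer, for instance).

```python
# (dx, dy) offsets of the six neighbors listed above:
# right, 2 right, down, down-left, down-right, 2 down.
A1_OFFSETS = [(1, 0), (2, 0), (0, 1), (-1, 1), (1, 1), (0, 2)]


def mark_color_change(a1, width, height, x, y):
    """Add 1 to each of the six neighbors of (x, y) in the A1 buffer.

    `a1` is a list of rows of ints. Out-of-bounds neighbors are
    skipped, which is an assumption about edge handling.
    """
    for dx, dy in A1_OFFSETS:
        nx, ny = x + dx, y + dy
        if 0 <= nx < width and 0 <= ny < height:
            a1[ny][nx] += 1
```

This would be called once for every pixel whose color differs from the previous pixel, so by the time the decoder reaches a given pixel, its A1 entry counts how many of the already-decoded pixels around it were color changes.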
The trick to this map is that the values will vary between 0 (no nearby color changes) and 6 (tons of color changes around it). These values then have 1 added to them and are used to determine which channel (1-7) is used when referencing the next color information. This means that each Channel Command grouping is based on how much color changing is going on around the specified pixel.
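As a tiny sketch of that mapping (the function name is made up, and the range check is just a sanity guard I added):

```python
def channel_for_pixel(a1_value: int) -> int:
    """Map an A1 buffer value (0..6) to a Channel Command index (1..7)."""
    if not 0 <= a1_value <= 6:
        raise ValueError("A1 values are expected to stay within 0..6")
    return a1_value + 1
```

So a pixel in a flat region (A1 value 0) reads its next color information from channel 1, while a pixel surrounded by color changes (A1 value 6) reads from channel 7.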
This has a strong advantage in that the early channels (1, 2, possibly 3) have a HIGH chance of repeating colors (and repeating colors often means that you can save somewhere around 5 bits per pixel). It's a pretty killer idea.
I've replaced 4 out of 5 of the proc_XXXX functions with my own handwritten ones, and they still function, so I'm close to getting the format down. The last thing I have is that when calculating a new pixel color, it grabs a 5-bit value from the Huffman tree (which I understand), but then it takes the average of the pixel up one and the pixel left one, and then does some special comparison between that and the new pixel in proc_80040C94. It seems sometimes it combines the two, and others it just uses the new one. I need to do some use-cases to understand the reasoning behind it, but it's the last big thing.
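The prediction half of that step, averaging the pixel above and the pixel to the left, might look like the sketch below. It assumes the average is taken per color component and rounds down, neither of which I have confirmed, and the function names are made up. The comparison done in proc_80040C94 is deliberately left out since it isn't understood at this point.

```python
def predict_component(up: int, left: int) -> int:
    """Average two 5-bit component values (0..31), rounding down.

    Rounding direction is an assumption.
    """
    return (up + left) // 2


def predict_pixel(up_rgb, left_rgb):
    """Per-component average of the pixel above and the pixel to the left."""
    return tuple(predict_component(u, l) for u, l in zip(up_rgb, left_rgb))
```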
That being said, making a compressor sounds pretty complex. Deciding how to make the Huffman tree from the start sounds like an interesting challenge, and it'd probably be hard to make a compressor that'd out-perform the one made for this. I'm having fun though.
EDIT: Okay, I figured out how the new pixel color is calculated from the predicted values & the output from the Channel Commands. I think I have it all figured out. I'll write up my own non-assembly-style decompressor, comment the heck out of it & probably upload it here, and then maybe take a hack at a compressor. But not today. I need a break, I lost my whole weekend to TKMK.
EDIT EDIT: Working C# code for a decoder:
http://pastebin.com/Zyx0xCdz