Reading Code That Lies

This is Part 2 of a four-part series. Part 1 covered the hardware context, WOZ format, and the dual-format track discovery.

The P6 Boot ROM loads the 6-and-2 boot sector to $0800 and jumps to $0801. What happens next is where the interesting engineering lives. The boot code is self-modifying, the GCR decode table gets deliberately corrupted before use, the post-decode algorithm has been swapped for a custom variant, and the per-track address markers are patched at runtime in a way that leaves tracks 6-13 unreadable until tracks 1-5 have already been loaded. Each technique is precise, layered on the others, and designed to defeat a specific class of copy tool.

Disassembling this code means reading instructions that don’t reflect runtime behavior. The bytes on disk are not the bytes that execute. You have to trace both what’s stored and what gets changed, and you have to do it across multiple separate memory regions that interact with each other.

The Boot Sector

The boot sector is 256 bytes of tight RWTS code. Running disasm_rwts.py against the WOZ file reveals what it does:

$0801: LDX #$00
$0803: LDA $0800,X   ; read from current location
$0806: STA $0200,X   ; write to new location
       ...
$080C: JMP $020F     ; enter relocated code

Step one is relocation. The code copies all 256 bytes of itself from $0800 to $0200, then jumps to $020F within the copy. The reason is straightforward: the next thing the code needs to do is build a 256-byte GCR decode table, and the most convenient place for that table is $0800 — which is currently occupied by the boot sector itself. Move the code first, then use the space.

From $020F, the relocated code builds the 5-and-3 GCR decode table. The algorithm is elegant: it scans nibble values from $AB through $FF, testing each one for the 5-and-3 validity rule (no two consecutive zero bits), excluding $D5 (the reserved prolog marker), and assigning sequential 5-bit values $00-$1F to the 32 valid nibbles. Building the table programmatically rather than storing it as static data saves 32 bytes of boot sector space — bytes that were worth saving in 1981.
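The table-building rule is easy to check in a few lines of Python. This is a minimal sketch of the rule as described above, not a transcription of the 6502 routine: the function names are mine, and representing invalid nibbles as None is an assumption made for readability.

```python
from typing import List, Optional


def is_valid_53_nibble(value: int) -> bool:
    """Apply the validity rule described above to one candidate disk nibble."""
    if value == 0xD5:  # reserved prolog marker, excluded explicitly
        return False
    bits = f"{value:08b}"
    # High bit must be set and no two adjacent bits may both be zero.
    return bits[0] == "1" and "00" not in bits


def build_decode_table() -> List[Optional[int]]:
    """Return a 256-entry table mapping nibble -> 5-bit value (None if invalid)."""
    table: List[Optional[int]] = [None] * 256
    next_value = 0
    for nibble in range(0xAB, 0x100):  # scan $AB through $FF
        if is_valid_53_nibble(nibble):
            table[nibble] = next_value
            next_value += 1
    return table


decode_table = build_decode_table()
valid_nibbles = [n for n, v in enumerate(decode_table) if v is not None]
```

Starting the scan at $AB matters: $AA also satisfies the bit rule, and beginning one value above it keeps the count at exactly 32.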

After building the table, the RWTS reads the first 5-and-3 sector from track 0 — sector 0, which contains the stage 2 loader code. It decodes the sector using the standard $02D1 post-decode routine, loading 256 bytes to $0300-$03FF.

Then, before jumping into that freshly-loaded code, it does something that will become the theme of this entire analysis:

$0237: LDA #$A9       ; opcode for LDA immediate
$0239: STA $031F      ; write to stage 2 code
$023C: LDA #$02
$023E: STA $0320      ; write to stage 2 code
$0241: JMP $0301      ; enter stage 2

It patches the stage 2 code before running it.

Self-Modifying Code

On disk, the bytes at $031F/$0320 in sector 0 contain $09 $C0 — the encoding for ORA #$C0. At runtime, the boot code writes $A9 $02 over those two bytes — LDA #$02 — before jumping to $0301.

In context, $031A-$031E computes a Disk II I/O page address from the current slot number: TXA; LSR; LSR; LSR; LSR extracts the slot from X (which holds slot × 16). The unpatched ORA #$C0 would combine this with $C0 to form $Cn, the high byte of the Disk II controller page for the active slot n, which is the address needed for the first disk read. After the patch, LDA #$02 discards the whole computation and hardcodes $02, pointing to the fixed stage 2 entry point.
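The arithmetic on either side of the patch can be modelled directly. This is an illustrative sketch with hypothetical function names; it mirrors only the instruction sequence quoted above, not the surrounding code.

```python
def slot_page_high_byte(x_register: int) -> int:
    """Model TXA; LSR x4; ORA #$C0 (the unpatched, on-disk sequence)."""
    accumulator = x_register & 0xFF   # TXA
    for _ in range(4):                # four LSRs move the slot into the low nibble
        accumulator >>= 1
    return accumulator | 0xC0         # ORA #$C0


def patched_value(x_register: int) -> int:
    """Model the runtime sequence: LDA #$02 overwrites whatever TXA/LSR produced."""
    return 0x02


# Slot 6 is the usual Disk II slot, so X would hold $60.
unpatched = slot_page_high_byte(0x60)
patched = patched_value(0x60)
```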

Why does this matter? During the initial boot read, the RWTS needs to know the slot-based address to access the disk hardware. After stage 2 takes over and establishes its own I/O handling, the RWTS uses a fixed vector. The patch switches behavior between these two modes at exactly the right moment.

From a static analysis standpoint, this is adversarial. If you disassemble the raw sector data — the bytes as they exist on disk — you see ORA #$C0 at $031F. That instruction is never executed. The actual runtime instruction is LDA #$02. You would have to trace execution across the boundary between the boot sector (at $0200 after relocation) and the stage 2 code (at $0300) to understand what actually runs. A tool that disassembles the stage 2 sector in isolation shows you a lie.
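One way to keep a disassembler honest is to apply known runtime patches to a memory image before decoding it. The sketch below assumes a flat 64K bytearray and a two-entry opcode map that covers only the instructions discussed here; the helper names are hypothetical and are not taken from disasm_rwts.py.

```python
# Only the two immediate-mode opcodes needed for this example.
OPCODES = {0x09: "ORA", 0xA9: "LDA"}


def decode_immediate(memory: bytearray, address: int) -> str:
    """Decode a two-byte immediate instruction at the given address."""
    opcode, operand = memory[address], memory[address + 1]
    return f"{OPCODES[opcode]} #${operand:02X}"


def apply_boot_patch(memory: bytearray) -> bytearray:
    """Return a copy with the two stores from $0237-$023E applied."""
    patched = bytearray(memory)
    patched[0x031F] = 0xA9   # LDA #$A9 / STA $031F
    patched[0x0320] = 0x02   # LDA #$02 / STA $0320
    return patched


on_disk = bytearray(0x10000)
on_disk[0x031F:0x0321] = bytes([0x09, 0xC0])   # bytes as stored in sector 0
at_runtime = apply_boot_patch(on_disk)

static_view = decode_immediate(on_disk, 0x031F)
runtime_view = decode_immediate(at_runtime, 0x031F)
```

Here static_view is the instruction a sector-only disassembly reports, and runtime_view is the one the CPU actually fetches.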

This is the fundamental challenge of copy-protected boot code: the disk is not the computer. What’s stored on disk is not the same as what gets executed in RAM.

The GCR Table Corruption

Stage 2 begins at $0301. The first thing it does is modify the GCR decode table that was just built:

$0301: LDA $0800,Y    ; load GCR table entry (Y enters at $99)
$0304: ASL            ; shift left 1 bit
$0305: ASL            ; shift left 1 bit
$0306: ASL            ; shift left 1 bit
$0307: STA $0800,Y    ; store back
$030A: INY
$030B: BNE $0301      ; loop: Y = $99 through $FF, wrapping to $00

The Y register enters this loop at $99 — carried over from the preceding boot code — and runs until it wraps from $FF back to $00. That means the loop applies ASL three times to every byte in the GCR table from offset $0899 through $08FF: the upper 103 entries.

Three left shifts multiply a value by 8 and zero out the low 3 bits. The upper portion of the table, which should contain 5-bit values $00 through $1F, now contains $00, $08, $10, $18, and so on — multiples of 8 only.
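The loop is small enough to model exactly. A minimal sketch, assuming the table is held as a 256-byte buffer indexed the same way as $0800,Y; the function name and the all-ones test table are mine.

```python
def corrupt_decode_table(table: bytes, start_y: int = 0x99) -> bytearray:
    """Model the loop at $0301: three ASLs on every entry from Y through $FF."""
    corrupted = bytearray(table)
    y = start_y
    while True:
        # Three ASLs on an 8-bit accumulator: bits shifted past bit 7 are lost.
        corrupted[y] = (corrupted[y] << 3) & 0xFF
        y = (y + 1) & 0xFF   # INY
        if y == 0:           # BNE falls through once Y wraps to $00
            break
    return corrupted


original = bytearray([0x01] * 256)
corrupted = corrupt_decode_table(original)
changed = sum(1 for before, after in zip(original, corrupted) if before != after)
largest = (0x1F << 3) & 0xFF   # the biggest 5-bit value after shifting
```

Because the largest valid table value is $1F, the shifted result ($F8) still fits in a byte and no information is lost.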

At first glance this seems to break everything. The GCR decode table is how the RWTS converts raw disk nibbles back to 5-bit data values. Corrupt the table and the RWTS can’t decode anything correctly. How does the disk read at all after this point?

The answer is a mathematical property of XOR:

(A << 3) XOR (B << 3) = (A XOR B) << 3

Bit shifts distribute over XOR. The 5-and-3 data checksum is computed by XOR-accumulating all decoded values together: if the XOR of all values is zero, the sector data is valid. If every value in the affected table range is multiplied by 8, their XOR is also multiplied by 8, and zero multiplied by anything is still zero. The checksums still pass.
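The property is easy to confirm numerically. The sketch below uses a simplified checksum model (a plain XOR over all values, as described above) and randomly generated 5-bit values; it is not real sector data, and the seed is arbitrary.

```python
import random
from functools import reduce
from operator import xor


def xor_all(values):
    """XOR-accumulate a sequence of integers, starting from zero."""
    return reduce(xor, values, 0)


rng = random.Random(1981)                      # fixed seed for repeatability
data = [rng.randrange(0x20) for _ in range(410)]
checksum = xor_all(data)                       # value that makes the total XOR zero
sector = data + [checksum]

# What the corrupted table would hand back for the same nibbles.
shifted_sector = [value << 3 for value in sector]

plain_result = xor_all(sector)
shifted_result = xor_all(shifted_sector)
```

Both results are zero, so a checksum test cannot tell the two tables apart.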

So the sectors on the disk were written with this corrupted table as the assumed decode table. The data on disk is encoded to produce correct results only when decoded with the corrupted table. A nibble copier that reads the raw disk nibbles perfectly — capturing every bit correctly — but decodes them with a freshly-built, uncorrupted GCR table produces garbage for every sector past the first one. The checksum passes (because the mathematical property holds), but the decoded bytes are wrong in a subtle, systematic way that won’t obviously announce itself.

This is the most elegant technique on the disk. It doesn’t break the copy in an obvious way. It doesn’t cause an error. It produces plausible-looking data that’s just wrong enough to crash when executed.

The Custom Post-Decode Permutation

Standard 5-and-3 encoding stores 256 data bytes as 411 GCR nibbles: 154 “secondary” nibbles carrying the low 3 bits, 256 “primary” nibbles carrying the high 5 bits, and one checksum nibble. The standard post-decode routine at $02D1, documented by Apple and used by DOS 3.2, reassembles these into 256 output bytes in a specific order.
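The nibble counts follow from simple bit arithmetic, sketched below. The constant names are mine, and treating the final nibble of the 411 as the data checksum reflects the standard DOS 3.2 data field layout.

```python
DATA_BYTES = 256
HIGH_BITS_PER_BYTE = 5    # carried by one primary nibble per byte
LOW_BITS_PER_BYTE = 3     # packed five bits at a time into secondary nibbles
BITS_PER_NIBBLE = 5

primary_nibbles = DATA_BYTES
low_bits_total = DATA_BYTES * LOW_BITS_PER_BYTE              # 768 bits
secondary_nibbles = -(-low_bits_total // BITS_PER_NIBBLE)    # ceiling division
data_nibbles = primary_nibbles + secondary_nibbles
total_with_checksum = data_nibbles + 1
```

768 low bits do not divide evenly by 5, which is why the last secondary nibble is only partly used and byte 255 needs special handling.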

Apple Panic uses $02D1 exactly once: for that very first sector read, which loads stage 2 code to $0300. After that, the self-modification at $0237 redirects all subsequent sector reads to a custom post-decode routine at $0346. This routine processes the 154 secondary nibbles in reverse order (X counting down from $32) and interleaves them across five groups per iteration — producing 256 correct byte values, but in a completely different output order than $02D1 would.

Working out the permutation mapping took significant analysis and the compare_decoders.py verification script. The relationship is:

$0346 output[5k + n] = $02D1 output[offset - k]
  where offsets = [50, 101, 152, 203, 254] for n = [0, 1, 2, 3, 4]

Byte 255 is a special case: $0346 reconstructs all 8 bits from secondary[153] and primary[255], while $02D1 loses the high bits. In practice, I implemented the standard $02D1 decode first (well-documented and easier to verify) and then applied the permutation mapping to reorder bytes into the $0346 output order. compare_decoders.py confirmed byte-for-byte equivalence.
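The reordering step can be sketched as follows. This is an illustration of the mapping stated above, not the code from compare_decoders.py; the function name is hypothetical, and byte 255 is passed in separately because the mapping does not cover it.

```python
OFFSETS = (50, 101, 152, 203, 254)   # one offset per group n = 0..4


def reorder_to_0346(standard: bytes, last_byte: int) -> bytearray:
    """Reorder $02D1-style output into $0346 order.

    `last_byte` is byte 255, which must be reconstructed separately.
    """
    if len(standard) != 256:
        raise ValueError("expected a 256-byte sector")
    out = bytearray(256)
    for k in range(51):                       # 51 iterations x 5 groups = 255 bytes
        for n, offset in enumerate(OFFSETS):
            out[5 * k + n] = standard[offset - k]
    out[255] = last_byte
    return out


# Use the identity sector so each output byte shows which input index it came from.
identity = bytes(range(256))
reordered = reorder_to_0346(identity, last_byte=0xEE)
```

Feeding in the identity sector is a convenient check that the first 255 outputs are a true permutation: every input index 0 through 254 should appear exactly once.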

The protection value here is similar to the GCR corruption: a copier that reads the raw nibbles correctly and applies the standard $02D1 post-decode produces 256 incorrect bytes in the wrong order, with a valid checksum. The output doesn’t look obviously wrong. The bytes are all plausible 6502 code values. They just aren’t the right bytes.

Loading $B600-$BFFF

With the corrupted GCR table in place and the $0346 post-decode active, stage 2 runs a sector loading loop: sectors 0 through 9 from track 0 are loaded into $B600-$BFFF.

There’s a small puzzle here. The loop re-reads sector 0, the same sector that was already decoded to $0300, this time using the corrupted GCR table and the $0346 post-decode. Sector 0 was written for the standard table: it is the one sector that must decode with the freshly built, uncorrupted table, because the corruption loop has not yet run when that first read happens. Decoding it with the corrupted table therefore produces garbled data. Of the 256 bytes, 227 will be wrong.

The result at $B600 is never used. The game loader enters at $B700 (sector 1), and no subsequent code references the $B600 page. The write routines in the RWTS at $B800-$BFFF do reference $B600 as a GCR encode table, but the game never writes to disk after booting — it’s purely RAM-resident from JMP $4000 onwards. The loop starts at sector 0 because that’s the simplest counter; skipping it would have required extra code for no practical benefit.

After the loading loop completes, stage 2 jumps to $B700 — the game loader.

The Game Loader and Per-Track Markers

The game loader at $B700 is where the remaining protection layers become visible.

The loader reads tracks 1-5 into $0800-$48FF. These 65 sectors contain the title screen code, sprite data, HGR rendering routines, sound effects, and a collection of lookup tables. They also contain a small block of code at $1000-$1027 that runs once after tracks 1-5 are loaded, before the loader proceeds to tracks 6-13:
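The address range follows from the sector count. A small sketch, assuming 13 sectors per track and that sectors land in consecutive 256-byte pages (the contiguous layout is my assumption; the function name is hypothetical):

```python
SECTORS_PER_TRACK = 13   # 5-and-3 format
PAGE_SIZE = 256


def load_range(first_track: int, last_track: int, base: int):
    """Return (sector_count, first_address, last_address) for a contiguous load."""
    sectors = (last_track - first_track + 1) * SECTORS_PER_TRACK
    return sectors, base, base + sectors * PAGE_SIZE - 1


sectors, first_address, last_address = load_range(1, 5, 0x0800)
```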

$1000: LDA $C057       ; HGR mode on
$1003: LDA $C054       ; page 1
$1006: LDA $C052       ; full screen
$1009: LDA $C050       ; graphics mode
       ...
$1015: LDA #$DE
$1017: STA $B8F6,Y    ; patches $B976: CMP #$D5 → CMP #$DE
$101A: STA $BE75,Y    ; patches $BEF5: LDA #$D5 → LDA #$DE
       ...
$1027: JMP $1200       ; title screen display

This code patches the RWTS’s address field search routine. Before the patch, the RWTS searches for $D5 as the first byte of an address prolog — the standard marker byte. After the patch, it searches for $DE. Two separate patch sites, two bytes changed.

The timing is critical: this patch runs after tracks 1-5 are loaded, but before tracks 6-13 are read. Tracks 1-5 use $D5 as their first prolog byte. Tracks 6-13 use $DE. The RWTS cannot read tracks 6-13 until the patch has been applied — but the code that applies the patch lives on tracks 1-5, which can only be read with the unpatched RWTS. Each stage depends on the previous one.
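The two stores can be modelled against a flat memory image. In this sketch the Y value of $80 is inferred from the listing ($B8F6 + Y = $B976), and the memory contents other than the two marker bytes are placeholders; the names are mine.

```python
MARKER_OLD, MARKER_NEW = 0xD5, 0xDE
PATCH_BASES = (0xB8F6, 0xBE75)   # base operands of the two STA abs,Y instructions
Y_INDEX = 0x80                   # inferred: $B976 - $B8F6


def apply_marker_patch(memory: bytearray, y: int = Y_INDEX) -> list:
    """Write the new first prolog byte at both patch sites; return the sites."""
    sites = [base + y for base in PATCH_BASES]
    for address in sites:
        memory[address] = MARKER_NEW
    return sites


memory = bytearray(0x10000)
memory[0xB976] = MARKER_OLD      # operand of CMP #$D5
memory[0xBEF5] = MARKER_OLD      # operand of LDA #$D5
patched_sites = apply_marker_patch(memory)
```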

The Per-Track Second Byte

That’s the $DE layer. There’s another layer beneath it.

Across all 14 tracks, the second byte of the address prolog also varies. Track 0 uses the standard D5 AA B5 prolog. Tracks 1-5 keep D5 but replace the second byte with a non-standard value that changes from track to track. Tracks 6-13 use DE, again with a second byte chosen per track. The complete picture:

Track:   0   1   2   3   4   5   6   7   8   9  10  11  12  13
First:  D5  D5  D5  D5  D5  D5  DE  DE  DE  DE  DE  DE  DE  DE
Second: AA  BE  BE  AB  BF  EB  FB  AA  FA  AA  AB  EA  EF  BB
Third:  B5  B5  B5  B5  B5  B5  B5  B5  B5  B5  B5  B5  B5  B5

The second-byte values come from a lookup table originally stored at $B7E2 and copied to $0400 at boot. Before reading each track, the game loader calls a routine at $B7CB that patches the RWTS’s second-byte comparison operand at $B980 using the table entry for the target track.

The protection value of this combined scheme is layered. A copier scanning for D5 AA B5 address fields finds nothing on tracks 1-5 (wrong second byte) and nothing on tracks 6-13 (wrong first byte). A copier that knows to look for $DE still can’t read tracks 6-13 without the second-byte table: the second byte changes from track to track (with a few repeats, as the table above shows), so brute-forcing it means trying up to 256 candidate values on each track. And the per-track table only exists in RAM during execution. It was copied from $B7E2 in the game loader code, which itself can only be read if you’ve already worked through the earlier protection layers.

Defeating this layer requires reading the per-track table out of the running loader, which means you need a working emulator that can boot the disk.
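Once the table is known, locating address fields is a plain byte search. The sketch below hardcodes the second-byte values from the table above and scans an already byte-aligned nibble stream; real WOZ data is a bit stream that needs nibble framing first, and the function names are hypothetical.

```python
# Second prolog byte for tracks 0..13, copied from the table above.
SECOND_BYTES = [0xAA, 0xBE, 0xBE, 0xAB, 0xBF, 0xEB, 0xFB,
                0xAA, 0xFA, 0xAA, 0xAB, 0xEA, 0xEF, 0xBB]


def address_prolog(track: int) -> bytes:
    """Return the three-byte address prolog expected on the given track."""
    first = 0xD5 if track <= 5 else 0xDE
    return bytes([first, SECOND_BYTES[track], 0xB5])


def find_address_fields(nibbles: bytes, track: int) -> list:
    """Return every offset where the track's prolog appears in the stream."""
    prolog = address_prolog(track)
    positions = []
    start = nibbles.find(prolog)
    while start != -1:
        positions.append(start)
        start = nibbles.find(prolog, start + 1)
    return positions


# A synthetic track 6 stream: two prologs separated by filler nibbles.
stream = bytes([0xFF, 0xDE, 0xFB, 0xB5, 0xFF, 0xDE, 0xFB, 0xB5])
with_table = find_address_fields(stream, track=6)
standard_scan = stream.find(bytes([0xD5, 0xAA, 0xB5]))
```

The standard D5 AA B5 search returns -1 on this stream, which is what a stock copier would see on tracks 6-13.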

The Nine Layers Together

By this point, we have the full taxonomy of protection:

Layer	Mechanism	What it defeats
1	Dual-format track 0 (6+2 and 5+3 on same track)	Standard DOS 3.3 copy utilities
2	Invalid address field checksums on all 13 sectors	Nibble copiers that validate address headers
3	GCR table corruption (ASL ×3 on upper entries)	Raw nibble copies decoded with fresh tables
4	Intentionally bad data checksum on sector 11	Copiers that validate all sector data
5	Custom post-decode permutation (`$0346` vs `$02D1`)	Manual analysis using standard decode assumptions
6	Self-modifying code	Static disassembly
7	Per-track second-byte variations in address prologs	Copiers expecting standard `D5 AA B5` markers
8	Non-standard `$DE` first byte on tracks 6-13	Copiers scanning for `$D5` as prolog start
9	Non-standard sector/track numbers in address fields	Copiers expecting sector numbers 0-12

Layer 9 deserves a note. On most tracks, the address fields contain deliberately non-standard values — sector numbers like 215, 253, and 255, and track numbers that don’t correspond to the physical track position. This is a consequence of the GCR corruption: the address field data was written using the corrupted table, so the encoded sector and track numbers, when decoded with that table, produce the expected values in the game loader’s RWTS. A standard copier expecting sector numbers 0-12 would reject all of these sectors as invalid even if it handled every other protection layer.

Emulation as the Only Way Through

At this point in the analysis, static disassembly had reached its limits. The combination of self-modifying code, runtime patches, and the corrupted GCR table made hand-tracing too error-prone. To verify the analysis and work through the remaining unknowns — particularly the bit-stream issue with sector 1, which we’ll cover in Part 3 — I needed to build an emulator.

emu6502.py is a full NMOS 6502 emulator in Python: all 256 opcodes including the 105 undocumented ones, cycle-accurate timing, and a simulated Disk II controller that feeds nibbles from the WOZ bit stream to the emulated soft switches. It turned out that Apple Panic’s boot code uses no undocumented opcodes, but I couldn’t have known that without implementing all of them first. Running arbitrary boot code from a disk of unknown behavior requires correctness as the baseline assumption.

The emulator is what revealed the details I’ve described here: the GCR corruption loop’s exact starting Y value ($99, carried over from the table builder), the sector 0 re-read producing garbled data at $B600, the exact runtime values of the per-track table at $0400. Static analysis gives you the structure; emulation gives you the execution.

But the first thing the emulator revealed was a problem I hadn’t anticipated: sector 1 wouldn’t decode correctly no matter what I tried. That turns out to be the most interesting bug in the entire project — and it has nothing to do with copy protection.

Next: The Bug Was in Our Own Code — the sector 1 crisis, three plausible but wrong theories, the discovery that the hardest problem in the whole analysis was in our own bit-stream reader, and what it took to boot the disk all the way to JMP $4000.