The Bug Was in Our Own Code

This is Part 3 of a four-part series. Part 2 covered the boot code analysis: self-modifying instructions, GCR table corruption, the custom post-decode permutation, and the per-track marker scheme.

After implementing the corrupted GCR table and the custom $0346 post-decode permutation, I could decode 12 of the 13 five-and-three sectors on track 0. Twelve sectors of working, valid 6502 machine code. Checksums passing. Bytes confirmed against the running emulator.

Sector 1 failed every time.

This was the lowest point of the project. Sector 1 is the critical one — it loads to $B700, the game loader entry point. Without it, the entire boot sequence stops. And despite all the work on the decode pipeline, sector 1 produced a bad checksum on every attempt.

What followed was hours of debugging with three completely plausible but entirely wrong theories, before the real cause turned out to be something I should have considered much earlier.

The Symptom

The 5-and-3 data checksum is simple: XOR all 256 decoded bytes together. If the result is zero, the sector is valid. For sectors 0 and 2-12, this worked perfectly after applying the GCR corruption and $0346 permutation. Every checksum passed except sector 11’s, and sector 11 was expected to fail because it’s the deliberate decoy with a bad checksum.
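As a minimal sketch, the check as described here fits in a few lines of Python. It follows this article's description (XOR the 256 decoded bytes, zero means valid). The function name `checksum_ok` is hypothetical and does not come from the project's scripts. One property is immediately visible: starting from a valid sector, any single-bit error in any byte makes the XOR non-zero, so even a small decoding mistake cannot pass.

```python
from functools import reduce
from operator import xor


def checksum_ok(sector: bytes) -> bool:
    """Return True if the XOR of all 256 decoded bytes is zero (hypothetical helper)."""
    if len(sector) != 256:
        raise ValueError(f"expected 256 decoded bytes, got {len(sector)}")
    return reduce(xor, sector, 0) == 0


good = bytes(range(256))                   # XOR of 0..255 happens to be 0
bad = bytes([good[0] ^ 0x01]) + good[1:]   # flip a single bit in the first byte
print(checksum_ok(good), checksum_ok(bad))  # True False
```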

Sector 1 produced a non-zero checksum. The decoded bytes looked plausible — they had the right shape for 6502 code, with the right distribution of high/low bytes — but something was subtly wrong. The checksum wouldn’t budge regardless of what I tried.

Theory 1: Wrong GCR Table

The first theory was the most obvious one: maybe the GCR table corruption was implemented incorrectly.

The corruption loop starts at Y = $99 and runs through Y = $FF, stopping when Y wraps to $00. That affects 103 entries at positions $0899-$08FF in the table. Maybe the starting offset was wrong, or the loop didn’t terminate where I thought, or the Y register value entering the loop wasn’t actually $99.
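The loop can be modeled in Python as a sketch. It assumes the table is the 256-byte decode table at $0800 indexed by Y, and that the loop has the usual INY/BNE shape, so it stops as soon as Y wraps to $00. `corrupt_gcr_table` is a hypothetical name. Each affected entry gets three ASLs, which in 8-bit arithmetic is a left shift by 3 with the high bits discarded.

```python
def corrupt_gcr_table(table: bytes, start_y: int = 0x99) -> bytes:
    """Apply ASL x3 to entries start_y..$FF (assumed INY/BNE loop shape)."""
    out = bytearray(table)
    y = start_y
    while True:
        out[y] = (out[y] << 3) & 0xFF  # three ASLs: bits shifted past bit 7 are lost
        y = (y + 1) & 0xFF             # INY wraps $FF -> $00
        if y == 0:                     # BNE falls through once Y reaches $00
            break
    return bytes(out)


original = bytes(range(256))  # stand-in contents, not the real decode table
corrupted = corrupt_gcr_table(original)
changed = sum(a != b for a, b in zip(original, corrupted))
print(changed)  # 103 entries: $99 through $FF
```

Sweeping `start_y` or the shift count over nearby values is the kind of experiment the following paragraph describes.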

I spent a significant amount of time on this. I reviewed the assembly for the corruption loop. I double-checked the Y register value by tracing the code from the GCR table builder (which sets up Y) through the first sector read (which modifies Y during read) to the corruption loop entry. I tried different loop ranges. I tried different starting Y values. I tried applying the corruption to different subsets of the table.

None of it helped. All other sectors continued to pass. Sector 1 continued to fail.

Theory 2: Deliberate Bad Checksum

The second theory: maybe sector 1 intentionally has a bad checksum, like sector 11.

I knew the stage 2 loader doesn’t verify data checksums: it reads and post-decodes sector data without checking whether the result is valid. So a bad checksum on a real sector wouldn’t prevent booting. The loader simply doesn’t care whether the data is right, as long as the RWTS can find and read the sector.

But this felt deeply wrong. Sector 1 contains the game loader entry point at $B700. It has to be correct, executable machine code. A bad-checksum sector with valid-looking but incorrect bytes would crash immediately on execution.

I looked at the decoded bytes and they appeared to be real code — subroutine preambles, zero page accesses, loop structures. Deliberately corrupted code doesn’t look like that. I abandoned this theory but kept it in mind as a fallback.

Theory 3: Undiscovered GCR Variant

The third theory was the most elaborate: maybe there was an additional GCR table modification I hadn’t found.

The disk already had one deliberate table modification — the ASL ×3 corruption on the upper 103 entries. What if there was another? What if certain sectors had their own additional table modification, or a key applied per-sector?

I wrote woz_53brute.py to test this. It tries thousands of GCR table variants against sector 1: different subsets of entries, different shift amounts, XOR keys applied before or after the standard corruption, and combinations of these. The search space was large but tractable, and every candidate was cheap to check: the correct variant would have to produce a valid checksum.
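The shape of that search can be sketched as follows. This is not woz_53brute.py itself. The variant family here (start index, shift amount, XOR key applied after the shift) is one illustrative subset of what the paragraph describes, and `decode` is a hypothetical caller-supplied function that decodes sector 1 with a given table and returns its 256 bytes. Each candidate costs one decode and one XOR, which is why thousands of variants are cheap to test.

```python
from functools import reduce
from itertools import product
from operator import xor


def search_variants(base_table, decode, starts, shifts, keys):
    """Yield (start, shift, key) for each table variant whose decoded sector
    has a zero XOR checksum. `decode(table) -> bytes` is supplied by the caller."""
    for start, shift, key in product(starts, shifts, keys):
        table = bytearray(base_table)
        for y in range(start, 256):
            table[y] = ((table[y] << shift) & 0xFF) ^ key
        if reduce(xor, decode(bytes(table)), 0) == 0:
            yield start, shift, key


# Toy check: with an identity "decoder", only the unmodified variant passes.
identity = lambda table: table
hits = list(search_variants(bytes(range(256)), identity, [0x99], [0], [0x00, 0x01]))
print(hits)  # [(153, 0, 0)]
```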

Nothing worked. The brute force found variants that improved the checksum, but none that passed it. After thousands of iterations, it was clear that no simple GCR table modification would fix sector 1’s checksum.

At this point I had exhausted every theory that involved the copy protection being the source of the problem. I had to consider the possibility that the issue was in my own code.

The Real Problem: Bit-Stream Wrap

A WOZ file stores each track as a linear array of bits with a defined bit count — typically around 51,000 bits for a 5.25-inch disk at standard rotation speed. The problem is that a physical floppy disk track is a circle, not a line. There is no inherent start or end point. The Applesauce hardware picks an arbitrary position — based on the index pulse — as “bit 0” and captures one revolution of data. The resulting bit stream is a linearization of something that is physically circular.

When converting the bit stream to nibbles, you scan through the bits looking for the valid disk byte patterns: sequences where the high bit is set and no two consecutive bits are both zero. When you reach the end of the linear bit stream, you’ve consumed one revolution of data. Some implementations stop there, start a new revolution from the beginning, and scan for sector boundaries in the joined stream.

Here’s the subtle failure mode: the last nibble in the stream might not end exactly on the stream’s final bit.

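Disk nibbles don’t sit on fixed 8-bit boundaries. The read logic skips zero bits until a 1 arrives, which becomes bit 7 of the nibble, then collects seven more bits. Self-sync bytes take 10 bits on the disk (an $FF followed by two zeros), so where each nibble starts depends on the framing of everything before it. After processing the last complete nibble in the stream, there will typically be some leftover bits that didn’t form a complete nibble. A naive implementation discards these leftovers when it transitions from one revolution to the next and starts scanning fresh.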

But those leftover bits are part of a nibble that spans the circular boundary. In the physical disk, the revolution doesn’t end — the head keeps reading and those bits are followed by the bits from “the next revolution” (which is actually the same track, just looping around). The nibble that started near the end of the linear stream ends near the beginning of the next pass.

Sector 1’s 411 data nibbles happen to start near the end of the track’s bit stream and wrap past the index pulse position. With the naive approach, there’s one leftover bit at the wrap point. Discarding that bit shifts every subsequent nibble by one bit position, so every data nibble of sector 1 that lies past the wrap is decoded with the wrong bit alignment, producing wrong values and a bad checksum.

The fix is straightforward once you see it: bit-doubling. Before converting the bit stream to nibbles, concatenate the bit stream with itself:

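```python
# bits_to_nibbles() is the reader's bit-to-nibble converter, track_bits is one
# captured revolution as a list of 0/1 values, and extra is how many nibbles
# past the end of the track a sector read might need.

# WRONG: convert one revolution, then try to handle the wrap at nibble level
nibbles = bits_to_nibbles(track_bits)
wrapped = nibbles + nibbles[:extra]   # nibble boundaries are already wrong

# RIGHT: double the bits before converting
doubled = track_bits + track_bits
nibbles = bits_to_nibbles(doubled)    # nibbles spanning the wrap point are framed correctly
```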

With the doubled bit stream, the nibble converter sees a continuous stream with no artificial boundary. Nibbles that span the original wrap point are decoded correctly because their constituent bits are now contiguous in the input.
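Here is a self-contained sketch of the effect on a toy track. The framing rule assumed here is the Disk II latch behavior: zero bits are skipped until a 1 arrives, which becomes bit 7, and seven more bits complete the nibble. `bits_to_nibbles` is a hypothetical stand-in for the project's converter, and the field bytes are toy values rather than a real sector. The capture’s “bit 0” is placed 20 bits into the field, so one nibble straddles the wrap.

```python
def bits_to_nibbles(bits):
    """Frame bits into nibbles like the Disk II latch: skip 0 bits until a 1
    (which becomes bit 7), then take seven more. A trailing partial nibble is
    dropped, which is exactly the naive wrap behavior."""
    nibbles, i, n = [], 0, len(bits)
    while i < n:
        if bits[i] == 0:
            i += 1
            continue
        if i + 8 > n:
            break  # leftover bits that belong to a nibble spanning the wrap
        value = 0
        for bit in bits[i:i + 8]:
            value = (value << 1) | bit
        nibbles.append(value)
        i += 8
    return nibbles


def nibbles_to_bits(nibbles):
    """Serialize nibbles MSB-first, 8 bits each."""
    return [(nib >> (7 - k)) & 1 for nib in nibbles for k in range(8)]


def find_sequence(haystack, needle):
    """Index of the first occurrence of needle in haystack, or -1."""
    for start in range(len(haystack) - len(needle) + 1):
        if haystack[start:start + len(needle)] == needle:
            return start
    return -1


# Toy circular track: eight 10-bit self-sync bytes, then a short field.
SYNC = [1, 1, 1, 1, 1, 1, 1, 1, 0, 0]
FIELD = [0xD5, 0xAA, 0xB5, 0xAB, 0xAD, 0xB6, 0xDE, 0xAA, 0xEB]
circular = SYNC * 8 + nibbles_to_bits(FIELD)

# Pretend the capture's "bit 0" landed 20 bits into the field.
captured = circular[100:] + circular[:100]

single = bits_to_nibbles(captured)
doubled = bits_to_nibbles(captured + captured)
print(find_sequence(single, FIELD))   # -1: the field is split by the wrap
print(find_sequence(doubled, FIELD))  # 14: intact in the doubled stream
```

In the single pass, the field’s first two nibbles appear at the end of the output and the four leftover bits of $B5 are dropped. In the doubled stream those bits join the start of the second copy and the field comes out intact. Nibbles at the very start of either output can be misframed until the first run of self-sync bytes, which is why a reader scans for prologues rather than trusting position 0.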

The check_wrap.py diagnostic script proved the fix definitively: 189 out of 256 bytes in sector 1’s decoded output differ between the naive and bit-doubled approaches. With bit-doubling, the checksum passes and the decoded data is valid 6502 machine code — real instructions at real addresses, consistent with the disassembly of the other sectors.

The bit-stream wrap issue is not copy protection. The position of sector 1 near the track boundary may not even be deliberate: it depends on which physical position happened to be the Applesauce’s “bit 0” during the capture. But any disk reader that treats the bit stream as a line rather than a loop can fail to decode a sector whose data nibbles span the linearization boundary. Bit-doubling is one fix; reading bit positions modulo the track length is another. The protection made this failure mode catastrophically hard to debug, because when all your other sectors decode correctly and one doesn’t, you assume the problem is in the protection handling, not in the geometry of how you’re reading the disk.

Building the Emulator

With bit-doubling in place, sector 1 decoded correctly and the full track 0 image was clean. But I was already building the emulator before this breakthrough — it was the emulator traces that ultimately confirmed the correct decodes by showing me what the stage 2 loader was actually doing at runtime.

emu6502.py implements the full NMOS 6502 instruction set: all 151 documented opcodes and all 105 undocumented ones. I treated the undocumented opcodes as mandatory rather than optional. Any opcode that shows up in real software has to be handled, because an emulator that quietly skipped one would diverge from the real hardware’s behavior with no error message. I implemented them all up front, and then confirmed that this particular disk never needed them.
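One way to make the “no silent divergence” rule concrete is a 256-entry dispatch table in which any opcode without a handler raises immediately. This is a toy sketch, not emu6502.py: it implements only LDA immediate ($A9) and NOP ($EA), and the class and exception names are hypothetical.

```python
class UnimplementedOpcode(Exception):
    """Raised instead of silently skipping an opcode the emulator lacks."""


class CPU:
    def __init__(self, memory):
        self.mem = memory
        self.pc = 0
        self.a = 0
        self.flag_n = self.flag_z = False
        self.ops = [None] * 256        # one slot per possible opcode byte
        self.ops[0xA9] = self.lda_imm  # LDA #imm
        self.ops[0xEA] = self.nop      # NOP

    def step(self):
        opcode = self.mem[self.pc]
        handler = self.ops[opcode]
        if handler is None:
            raise UnimplementedOpcode(f"${opcode:02X} at ${self.pc:04X}")
        self.pc = (self.pc + 1) & 0xFFFF
        handler()

    def lda_imm(self):
        self.a = self.mem[self.pc]
        self.pc = (self.pc + 1) & 0xFFFF
        self.flag_z = self.a == 0
        self.flag_n = bool(self.a & 0x80)

    def nop(self):
        pass


mem = bytearray(0x10000)
mem[0:4] = bytes([0xA9, 0x80, 0xEA, 0x02])  # LDA #$80, NOP, then undocumented $02
cpu = CPU(mem)
cpu.step()
cpu.step()
```

In a complete emulator all 256 slots are filled, so the exception becomes a safety net that should never fire.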

The more interesting engineering challenge was the disk interface. The Disk II controller communicates with the CPU through a set of memory-mapped “soft switches” at $C080-$C08F plus $10 times the slot number, which puts them at $C0E0-$C0EF (the $C0Ex range) for a controller in slot 6. Reading $C08C plus that slot offset ($C0EC in slot 6) reads the current nibble from the data latch. The emulated CPU will execute hundreds of thousands of read instructions against that address while the RWTS waits for nibbles: it polls in a tight loop, checking the high bit of each byte and looping while it’s zero.

The emulated disk controller translates WOZ bit streams into nibbles on demand: it maintains a bit position in the track, scans forward until it finds a valid nibble (bit 7 set, valid 5-and-3 or 6-and-2 pattern), and returns that nibble when the emulated CPU reads the soft switch address. The bit-doubling fix applies here too — the controller needs the doubled bit stream to correctly produce nibbles that span the physical track boundary.
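A minimal sketch of that on-demand controller, under stated assumptions: the drive is in slot 6, only the data-latch read at $C0EC is modeled, and every read returns a complete nibble (real hardware can be read mid-byte with bit 7 still clear, which is why the RWTS polls). The class and method names are hypothetical. Keeping the position in [0, n) while reading from the doubled stream means the eight bits of a nibble are always contiguous, even when they straddle the original end of the track.

```python
class DiskController:
    """Hypothetical on-demand nibble source for one track (slot 6 assumed)."""

    DATA_LATCH = 0xC08C + 6 * 0x10  # $C08C offset by slot 6 -> $C0EC

    def __init__(self, track_bits, start_pos=0):
        if len(track_bits) < 8 or 1 not in track_bits:
            raise ValueError("track needs at least 8 bits and one 1 bit")
        self.n = len(track_bits)
        self.bits = track_bits + track_bits  # bit-doubled stream
        self.pos = start_pos % self.n        # always kept in [0, n)

    def next_nibble(self):
        # Skip zero bits until a 1 arrives; it becomes bit 7 of the nibble.
        while self.bits[self.pos] == 0:
            self.pos = (self.pos + 1) % self.n
        # pos < n, so pos + 8 <= 2n: the 8 bits are contiguous in the doubled
        # stream even when the nibble spans the original end of the track.
        value = 0
        for bit in self.bits[self.pos:self.pos + 8]:
            value = (value << 1) | bit
        self.pos = (self.pos + 8) % self.n
        return value

    def read(self, address):
        if address == self.DATA_LATCH:
            return self.next_nibble()
        return 0  # other soft switches are not modeled in this sketch


# Toy track: $D5 $AA $96, captured so that "bit 0" lands 4 bits into $D5.
bits = [(nib >> (7 - k)) & 1 for nib in (0xD5, 0xAA, 0x96) for k in range(8)]
rotated = bits[4:] + bits[:4]
ctl = DiskController(rotated, start_pos=20)  # start of $D5 in rotated coordinates
reads = [hex(ctl.read(0xC0EC)) for _ in range(4)]
print(reads)  # ['0xd5', '0xaa', '0x96', '0xd5']
```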

The Full Boot

With the emulator working and sector 1 clean, boot_emulate_full.py could trace the entire boot sequence from power-on to JMP $4000. That takes approximately 69.8 million emulated 6502 instructions.

The instruction count isn’t surprising when you think about it. The RWTS is polling a memory-mapped register in a tight loop, waiting for each nibble to arrive as the disk spins under the read head. The Apple II ran at about 1 MHz and the disk turned once every 200 milliseconds, so each disk byte gives the CPU roughly 32 cycles to burn, several trips around the polling loop. That cost applies to every nibble that passes under the head, not just the 411 data nibbles of each wanted sector: gaps, sync bytes, and the fields of sectors the RWTS is skipping over all get read too, across 14 tracks. The emulator faithfully reproduces all of that spinning.
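The per-nibble budget can be worked out from standard Disk II and Apple II figures: 4 µs bit cells, 300 rpm (200 ms per revolution), and a CPU clock of about 1.023 MHz. The polling loop is assumed to be the two-instruction form LDA $C08C,X / BPL (4 cycles plus 3 when the branch is taken); the actual loop on this disk may differ.

```python
BIT_CELL_US = 4           # Disk II bit cell time, microseconds
REV_MS = 200              # 300 rpm -> 200 ms per revolution
CPU_HZ = 1_023_000        # approximate Apple II CPU clock
POLL_LOOP_CYCLES = 4 + 3  # assumed loop: LDA $C08C,X (4) + taken BPL (3)

bits_per_rev = REV_MS * 1000 // BIT_CELL_US             # 50,000 bit cells
nibble_slots_per_rev = bits_per_rev // 8                # 6,250 (upper bound; 10-bit sync bytes lower it)
cycles_per_nibble = CPU_HZ * 8 * BIT_CELL_US / 1e6      # about 32.7 CPU cycles per disk byte
polls_per_nibble = cycles_per_nibble / POLL_LOOP_CYCLES  # about 4.7 trips around the loop
data_nibbles_per_track = 13 * 411                       # 5,343 data-field nibbles

print(bits_per_rev, nibble_slots_per_rev, round(cycles_per_nibble, 1),
      round(polls_per_nibble, 1), data_nibbles_per_track)
```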

Stage 1 — Track 0 to $B700:

The Disk II boot ROM loads the 6-and-2 boot sector to $0800. The boot code relocates to $0200, builds the GCR table, reads sector 0 to $0300 using the standard $02D1 post-decode, self-modifies the stage 2 code, and jumps to $0301. The stage 2 loader then corrupts the GCR table, reads sectors 0-9 to $B600-$BFFF using the corrupted table and $0346 post-decode, and jumps to $B700.

Memory after stage 1:

$0200-$03FF  Boot RWTS (relocated boot sector + stage 2 loader)
$0800-$08FF  5-and-3 GCR decode table (corrupted — ASL ×3 on upper entries)
$B600-$B6FF  Sector 0 garbled re-read (unused)
$B700-$BFFF  Game loader + RWTS (sectors 1-9)

Stage 2 — Tracks 1-5 to $0800-$48FF:

The game loader reads 65 sectors from tracks 1-5 — 13 sectors per track, 5 tracks — into the $0800-$48FF region. This loads the complete title screen system: the HGR sprite blitter, sound routines, the animation sequence player, pre-shifted shape data, the title screen bitmap, and the $1000 RWTS patcher.

After tracks 1-5 are loaded, the game loader calls $1000. This routine sets the Apple II soft switches for hi-res graphics mode, patches the RWTS’s address prolog search from $D5 to $DE, and jumps to $1200 — the title screen animation engine. The title screen displays.
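The effect of that one-byte patch can be sketched as a prologue search. The sketch assumes the standard DOS 3.2 address prologue $D5 $AA $B5 and that the patch changes only the first byte, as described above. `find_address_fields` is a hypothetical helper that works on an already-framed nibble list.

```python
STANDARD_PROLOGUE = (0xD5, 0xAA, 0xB5)              # DOS 3.2 address-field prologue
PATCHED_PROLOGUE = (0xDE,) + STANDARD_PROLOGUE[1:]  # first byte patched to $DE


def find_address_fields(nibbles, prologue=STANDARD_PROLOGUE):
    """Return the index just past each occurrence of `prologue` in `nibbles`."""
    k = len(prologue)
    return [i + k for i in range(len(nibbles) - k + 1)
            if tuple(nibbles[i:i + k]) == prologue]


# Toy nibble stream with one field of each kind.
track = [0xFF, 0xFF, 0xDE, 0xAA, 0xB5, 0x01, 0xFF, 0xD5, 0xAA, 0xB5, 0x02]
print(find_address_fields(track))                    # [10]
print(find_address_fields(track, PATCHED_PROLOGUE))  # [5]
```

The unpatched search sees only the $D5 field and the patched one only the $DE field, so fields written with a $DE marker are invisible to a reader looking for the standard prologue.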

Stage 3 — Tracks 6-13 to $4000-$A7FF:

After the player presses a key, the game loader reads tracks 6-13 into $4000-$A7FF: 104 sectors, 26,624 bytes (about 27KB). During this read, the loader calls the title screen animation routine three times between each track’s sector reads. The title screen stays animated while the disk loads.
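The load addresses in the three stages follow from simple arithmetic, sketched below under the assumption that each stage fills its destination with consecutive 256-byte sectors (13 per track). `load_range` is a hypothetical helper.

```python
SECTOR_BYTES = 0x100
SECTORS_PER_TRACK = 13


def load_range(base, sectors):
    """Return (first, last) address covered by `sectors` consecutive 256-byte sectors."""
    return base, base + sectors * SECTOR_BYTES - 1


stage1 = load_range(0xB600, 10)                     # track 0, sectors 0-9
stage2 = load_range(0x0800, 5 * SECTORS_PER_TRACK)  # tracks 1-5, 65 sectors
stage3 = load_range(0x4000, 8 * SECTORS_PER_TRACK)  # tracks 6-13, 104 sectors

for name, (first, last) in (("stage 1", stage1), ("stage 2", stage2), ("stage 3", stage3)):
    print(f"{name}: ${first:04X}-${last:04X} ({last - first + 1} bytes)")
```

The arithmetic also shows that the stage 3 range begins below the end of the stage 2 range, so the last $900 bytes of the stage 2 load ($4000-$48FF) are overwritten when the game loads.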

Then: JMP $4000. The game begins.

What the Emulator Confirmed

Running the emulator with verbose tracing produced several confirmations and one additional detail I hadn’t anticipated.

The confirmations were straightforward: the GCR corruption loop starting at Y = $99 was correct, the sector load destinations matched the analysis, the $DE patch was applied exactly when expected.

The surprise was the stage 3 animation interleaving. Between reading each of tracks 6-13, the loader executes JSR $B7DA (the animation dispatcher) three times. The title screen apple is actually animating while the game loads: the loader is carefully interleaving disk I/O with the display code to keep the screen alive. This is a polish detail that only becomes visible when you trace the exact execution order of the loading loop.

The full memory image at JMP $4000 confirmed the layout: the game code is at $4000-$A7FF, the sprite and tile data loaded from tracks 1-5 remains at $0800-$3FFF (the stage 3 load overwrote the top of that region, $4000-$48FF), and the RWTS remains live at $B700-$BFFF. The game’s own relocation routine at $4000 will copy sprite data from $4800-$5FFF down to $0800-$1FFF before starting, reorganizing memory into its final runtime layout.

What “Hardest Bug” Means

In retrospect, the sector 1 crisis is a good case study in where bugs actually live. The symptom — bad checksum — was real. The three theories that followed from it were each plausible in isolation. The GCR table could have had additional modifications I’d missed. The sector could have been an intentional decoy. There could have been a per-sector key. Any of these explanations was consistent with the observed facts.

The real cause was in a completely different part of the system: not in the copy protection at all, but in the geometry of how WOZ files linearize circular disk tracks. I spent hours looking at copy protection mechanisms when the actual problem was in my own bit-stream reader.

This is a general principle: when debugging, the assumption that the bug is in the thing you’re analyzing — the external system, the copy protection, the unknown disk format — is often wrong. Sometimes the bug is in the thing you wrote yourself and assumed was correct. check_wrap.py was written specifically to test this assumption, and it took three failed theories before I was willing to test it.

Next: Forty-Five Kilobytes — extracting and disassembling the game binary, what a recursive-descent disassembler finds in 27KB of 1981 6502 code, the HGR sprite engine, the game loop, and what it means that all of this fit on 14 tracks of an obsolete disk format.