Forty-Five Kilobytes

This is Part 4 of a four-part series. Part 3 covered the sector 1 crisis, the bit-doubling breakthrough, building the 6502 emulator, and the full boot trace to JMP $4000.

After 69.8 million emulated instructions, the CPU arrives at $4000. The disk has done its work. Everything is in RAM — boot RWTS at $0200, game loader and disk RWTS at $B700, title screen code at $1000, and the game itself at $4000-$A7FF. Now comes the second phase of the analysis: understanding what the code actually does.

The copy protection story ends at JMP $4000. What follows is game archaeology.

Building the Runtime Image

The memory state at JMP $4000 isn’t the final runtime layout. The first thing the code at $4000 does is relocate sprite and tile data from $4800-$5FFF down to $0800-$1FFF, overwriting the title screen code loaded from tracks 1-5. After this copy, the title screen routines are gone — replaced by sprite frames, tile bitmaps, and lookup tables the game engine needs at runtime.

build_runtime.py simulates these relocation copy loops and produces the canonical post-relocation memory image: 43,008 bytes covering $0000-$A7FF, with everything in its final runtime position.
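The relocation step can be sketched in a few lines of Python. This is a minimal illustration, not the contents of build_runtime.py: the function name `relocate` and the flat 64KB `bytearray` are assumptions, and only the source and destination ranges come from the description above.

```python
SRC_START, SRC_END = 0x4800, 0x5FFF   # source range named in the text
DST_START = 0x0800                    # destination start named in the text

def relocate(mem: bytearray) -> bytearray:
    """Return a copy of mem with $4800-$5FFF copied down to $0800-$1FFF."""
    out = bytearray(mem)
    length = SRC_END - SRC_START + 1  # $1800 = 6,144 bytes
    out[DST_START:DST_START + length] = mem[SRC_START:SRC_END + 1]
    return out

# Hypothetical memory state with two marker bytes in the source region.
mem = bytearray(0x10000)
mem[0x4800] = 0xAA
mem[0x5FFF] = 0x55

# Keep only $0000-$A7FF, the 43,008-byte runtime image.
image = relocate(mem)[:0xA800]
```

The first and last source bytes land at $0800 and $1FFF, which is why everything previously loaded in that range is lost after the copy.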

The relocation reveals one small artifact: embedded in the source data at $4980 is the string EDASM.OBJ. EDASM was Apple’s own Editor/Assembler, shipped with the DOS Tool Kit, and EDASM.OBJ is the name of one of its program files, so the fragment most likely leaked into the binary from a buffer left over in the development environment. It sits in a region that gets overwritten during relocation and never executes. A small, harmless window into Ben Serki’s 1981 toolchain.

The Recursive Descent Disassembler

Linear disassembly — decoding bytes sequentially from start to end — is unreliable for real programs. Data regions decode as code and produce nonsense; function preambles appear to contain branches into the middle of prior instructions. A sequential disassembler gets confused within the first few hundred bytes of any real executable.

Recursive descent solves this by tracing execution paths. Start from a known entry point, decode each instruction, classify its bytes as code, and follow every branch and jump. A conditional branch puts both the taken target and the fall-through address on a work queue. A JSR queues the subroutine entry as well as the return address (JSR+3). An RTS or an unconditional JMP terminates the current trace path, and the algorithm moves on to the next queued target.
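The worklist idea can be sketched as follows. This is a simplified illustration rather than the code in disassemble.py: the opcode table covers only eight instructions, and the names `OPCODES` and `trace` are assumptions made for the example.

```python
from collections import deque

# Tiny opcode subset: opcode -> (mnemonic, length in bytes).
# A real disassembler needs the full 6502 table.
OPCODES = {
    0xA2: ("LDX", 2), 0xA9: ("LDA", 2), 0xCA: ("DEX", 1), 0xEA: ("NOP", 1),
    0xD0: ("BNE", 2), 0x20: ("JSR", 3), 0x4C: ("JMP", 3), 0x60: ("RTS", 1),
}

def trace(mem, base, seeds):
    """Return the set of addresses classified as code, starting from seeds."""
    code = set()
    queue = deque(seeds)
    end = base + len(mem)
    while queue:
        pc = queue.popleft()
        while base <= pc < end and pc not in code:
            op = mem[pc - base]
            if op not in OPCODES:
                break                       # not a known opcode: abandon this path
            name, size = OPCODES[op]
            if pc + size > end:
                break                       # operand would run off the image
            code.update(range(pc, pc + size))
            if name == "RTS":
                break                       # path ends; take the next queued target
            if name == "BNE":               # queue the taken target, then fall through
                offset = mem[pc + 1 - base]
                if offset >= 0x80:
                    offset -= 0x100         # signed 8-bit displacement
                queue.append(pc + 2 + offset)
            elif name in ("JSR", "JMP"):
                queue.append(mem[pc + 1 - base] | (mem[pc + 2 - base] << 8))
                if name == "JMP":
                    break                   # unconditional: no fall-through
            pc += size                      # for JSR this is the return address
    return code

# Hypothetical 13-byte program at $4000 with one data byte at $4009.
program = bytes([0xA2, 0x03, 0xCA, 0xD0, 0xFD, 0x20, 0x0A, 0x40,
                 0x60, 0xFF, 0xA9, 0x00, 0x60])
code = trace(program, 0x4000, [0x4000])
```

The byte at $4009 is never reached by any path, so it stays unclassified, which is the behavior a linear sweep cannot provide.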

The initial seeds for disassemble.py are $4000 and $7000; every JSR target discovered during tracing is added as a further seed. The main complication is indirect jumps: JMP ($xxxx) instructions where the target address is in a pointer table. The disassembler can’t follow these statically, because it would need to know the runtime pointer value.

The solution is multi-pass gap-filling. After the initial trace, any byte not classified as code or data is a candidate. The gap-filler scans unclassified regions for sequences that look like valid subroutine entries: an instruction plausible as a function start, followed by a chain of valid instructions ending at RTS. These isolated code islands are the subroutines reached only via indirect jumps — found by structural pattern-matching rather than execution tracing.

The result: 8,800 lines of Merlin32-compatible assembly, 104 identified subroutines, consistent branch labels throughout, hardware register comments on every soft switch access. Every byte in $4000-$A7FF is accounted for.

The Memory Map

The final runtime layout tells the story of what the game needs and where it lives.

Boot infrastructure — still in RAM, never called again:

$0200-$02FF: The complete boot RWTS
$0300-$03FF: Stage 2 loader — dead code after boot
$B700-$BFFF: Game loader + full RWTS with all per-track marker tables

Relocated sprite and tile data (copied from $4800-$5FFF by the relocation routine at $4000):

$0800-$097F: Platform tile bitmaps — 8 pre-shifted horizontal variants
$0A00-$0BFF: Player sprite source frames — 7 animation frames, 64 bytes each
$0C00-$0D7C: Background theme tile patterns — 4 themes with 4 duplicates
$1000-$15FF: Enemy sprite data — Apple, Butterfly, Mask × 8 shift variants × 2 animation frames
$1600-$16FF: Sprite transparency masks
$1700-$17FF: Solid fill pattern
$1800-$18FF: Identity table (table[n] = n for all 256 values)
$1900-$1FFF: Game state variables

Game code (from tracks 6-13):

$4000-$4024: Relocation routine — 37 bytes, runs once, becomes dead code
$402B-$43FF: Core game subroutines
$6000-$6EFF: Pre-shifted player sprites — 5 animations × 7 shift positions × ~110 bytes
$6F00-$6FFF: Platform tile patterns
$7000-$A7FF: Main game code — initialization, loop, graphics engine, AI, collision, scoring, sound

One overlap is intentional. HGR page 2 ($4000-$5FFF) is the background snapshot buffer for sprite restoration and physically occupies the same addresses as the relocation source region. The relocation routine evacuates all sprite data from that region before the game begins drawing — same physical 8KB, two purposes at two different points in program lifetime.

HGR Graphics: Seven Bits, Two Colors, One Peculiarity

The Apple II hi-res screen is a genuinely unusual piece of hardware. Each display byte encodes 7 pixels — bit 7 is not a pixel but a palette selector, shifting which of two color pairs is used for the 7 data bits. The screen is 280 columns × 192 rows, but the memory addressing is non-linear: rows are interleaved across three banks in a pattern that saved cost in the display circuitry at the expense of any row-address calculation in software.
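The byte layout described above can be made concrete with a small decoder. This is an illustrative sketch; the function name `decode_hgr_byte` is an assumption. It relies on the hardware convention that bit 0 is the leftmost of the 7 pixels.

```python
def decode_hgr_byte(value):
    """Split one hi-res display byte into (palette_bit, list of 7 pixel bits)."""
    palette = (value >> 7) & 1
    # Bit 0 is the leftmost pixel on screen, bit 6 the rightmost.
    pixels = [(value >> bit) & 1 for bit in range(7)]
    return palette, pixels

BYTES_PER_ROW = 40                      # 40 bytes x 7 pixels = 280 columns
palette, pixels = decode_hgr_byte(0x81)
```

With 7 pixels per byte, the 280-column row takes 40 bytes, and bit 7 never contributes a pixel of its own.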

Apple Panic handles HGR row addressing through SUB_706B, a dedicated computation routine that converts a row number to HGR screen address pairs for both page 1 and page 2, storing results in zero-page temporaries at $02-$05. Every draw call goes through this routine. The non-linear row address formula is computed fresh each time rather than pre-tabulated, a choice that spends a few extra cycles per draw call to save the table storage: a full table of 192 two-byte row addresses would take 384 bytes. (The $1800 region holds something different: an identity table, table[n] = n for all 256 values, used in memory operations that need a no-op mapping.)
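For reference, the standard Apple II row-address calculation looks like this in Python. It is the well-known hardware layout formula, not a transcription of SUB_706B, and the function name `hgr_row_address` is an assumption.

```python
def hgr_row_address(row, page_base=0x2000):
    """Address of the first byte of a hi-res row (0-191).

    page_base is $2000 for HGR page 1 and $4000 for page 2.
    """
    if not 0 <= row < 192:
        raise ValueError("row must be 0-191")
    return (page_base
            + (row & 0x07) * 0x400          # line within a group of 8
            + ((row >> 3) & 0x07) * 0x80    # group within a third of the screen
            + (row >> 6) * 0x28)            # which third (top, middle, bottom)

page1 = hgr_row_address(100)
page2 = hgr_row_address(100, page_base=0x4000)
```

The two pages differ only in the base, so the page 2 address for a row is always the page 1 address plus $2000.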

The sprite engine uses XOR blitting. XOR a sprite’s pixel data onto the screen: the sprite appears. XOR it again at the same position: the original content is restored, because A XOR B XOR B = A. The engine needs no per-sprite saved-background buffer — erasing a sprite is always just drawing it again.
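The draw-twice-to-erase property is easy to check in Python. This is a toy demonstration of the identity only; the screen contents, the sprite bytes, and the name `xor_blit` are made up for the example.

```python
def xor_blit(screen, offset, sprite):
    """XOR sprite bytes into the screen buffer in place."""
    for i, value in enumerate(sprite):
        screen[offset + i] ^= value

screen = bytearray([0x2A, 0x55, 0x00, 0x7F])   # arbitrary background bytes
before = bytes(screen)
sprite = bytes([0x1C, 0x3E])                   # arbitrary sprite bytes

xor_blit(screen, 1, sprite)                    # first pass draws the sprite
drawn = bytes(screen)
xor_blit(screen, 1, sprite)                    # second pass erases it
```

After the second call the buffer matches `before` byte for byte, regardless of what the background held.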

In practice, XOR blitting on Apple II hardware requires care because of the palette bit. XORing arbitrary sprite data against arbitrary screen content can flip bit 7 in ways that corrupt colors in pixels adjacent to the sprite: not just within the bounding box but in every byte the sprite data touches. Apple Panic handles this with pre-computed transparency masks at $1600-$16FF. Before XORing sprite pixels, the code ANDs the relevant screen bytes against the mask to clear the bits the sprite will write, preventing unintended palette corruption.

Pre-shifted sprites are the other key optimization. A sprite that starts at horizontal pixel position 3 within a byte needs its data shifted 3 bits left before XORing. Computing that shift at draw time costs a byte-loop plus shift instructions — expensive at 60 frames per second on a 1 MHz CPU. Instead, the sprites are stored pre-shifted at all 7 possible offsets within a byte column. render_sprites.py renders all 7 shift variants of all frames as PNGs; the shift-0 variant shows the canonical shape, shifts 1-6 show progressively offset versions with carry bits occupying adjacent bytes.
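Generating the shifted variants offline can be sketched like this. It is an illustration of the technique, not the game's build process or the code in render_sprites.py: the function name `preshift_row` is an assumption, and the palette bit is ignored for simplicity.

```python
def preshift_row(row, shift):
    """Shift one sprite row right on screen by `shift` pixels (0-6).

    `row` holds bytes whose low 7 bits are pixels, bit 0 leftmost.
    The result is one byte longer so shifted-out pixels carry into it.
    """
    if not 0 <= shift <= 6:
        raise ValueError("shift must be 0-6")
    bits = 0
    for i, value in enumerate(row):
        bits |= (value & 0x7F) << (7 * i)   # pack the 7-bit groups into one integer
    bits <<= shift                          # right on screen means toward higher bits
    return [(bits >> (7 * i)) & 0x7F for i in range(len(row) + 1)]

# All 7 variants of a one-byte row with every pixel lit.
variants = [preshift_row([0x7F], s) for s in range(7)]
```

Each variant costs one extra byte per row for the carry, which is the storage price paid for skipping the shift loop at draw time.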

Five animation frames × 7 shift variants × ~110 bytes per frame accounts for the 3,840 bytes of player sprite data at $6000-$6EFF. Enemy sprites — Apple, Butterfly, Mask × 2 frames × 8 shifts × pixel and mask bytes — fill most of $1000-$15FF.

The Sound Engine

The Apple II base configuration has no dedicated sound hardware. There is one bit of I/O: reading or writing the soft switch at $C030 toggles a speaker diaphragm. One toggle, one click. Rapid repeated toggles produce a tone whose frequency depends on the gap between them — a tight inner loop produces a high pitch, a slow loop produces a low one.

Apple Panic implements four distinct sounds, all through $C030 toggle-and-wait loops.

The two-phase sweep (SUB_71E8) plays a descending pitch immediately followed by an ascending one — a falling-then-rising tone used for scoring and for losing a life. Two sequential delay-change loops in one routine produce the paired arc.

The long buzz (SUB_720F) is a sustained tone from nested countdown loops — plays when an extra life is earned, and runs notably longer than the other effects.

The dig/stomp click (SUB_848E) is five iterations of a maximum-delay toggle — very slow toggling, producing a short percussive thud for dig and stomp impacts.

The walking sound is a side effect of the movement delay routine SUB_7FD3. The delay loop toggles $C030 on alternating frames (a phase toggle at $E1), so the player produces rhythmic clicking while moving — no separate sound routine needed, just the delay loop’s incidental toggling.

All four sounds work by cycle counting. At 1 MHz with a predictable loop body, toggle frequency is determined precisely by loop structure. No timer hardware, no interrupts.
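The relationship between loop length and pitch is simple arithmetic. The sketch below assumes the nominal 1 MHz clock used in the text; the function name `tone_hz` and the cycle counts are illustrative, not measured from the game's routines.

```python
def tone_hz(cycles_per_toggle, clock_hz=1_000_000):
    """Approximate tone frequency for a loop that toggles $C030 once per pass.

    One full speaker cycle (out and back) takes two toggles.
    """
    return clock_hz / (2 * cycles_per_toggle)

high = tone_hz(250)   # tight loop
low = tone_hz(500)    # slower loop
```

Halving the cycles between toggles doubles the frequency, which is how a loop that changes its own delay count produces a sweep.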

Collision Detection

Collision detection has two conceptual layers in a sprite-based game: logical and visual.

The logical layer is what Apple Panic uses. Enemy and player positions are tracked as (row, column) integer pairs in tile-grid coordinates. Each frame, the main loop compares each of the 10 enemy slots’ coordinates against the player’s coordinates. If any enemy is within one tile unit, horizontally or vertically, the hit logic triggers. At roughly 20 cycles per comparison, ten slot checks cost about 200 cycles per frame, which is negligible.
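A grid-coordinate check of this kind might look like the following. This is a sketch under stated assumptions: "within one tile unit" is read as a distance of at most 1 on both axes, empty slots are represented as `None`, and the names `collides` and `first_hit` are hypothetical.

```python
def collides(player, enemy):
    """True when enemy is within one tile of player on both axes (assumed reading)."""
    return abs(player[0] - enemy[0]) <= 1 and abs(player[1] - enemy[1]) <= 1

def first_hit(player, slots):
    """Index of the first occupied slot that collides with the player, else None."""
    for index, enemy in enumerate(slots):
        if enemy is not None and collides(player, enemy):
            return index
    return None

# Ten slots of (row, column) pairs; None marks an inactive slot.
slots = [(2, 9), None, (5, 4), None, None, None, None, None, None, None]
hit = first_hit((5, 5), slots)
```

Because the test is a handful of integer comparisons per slot, it never needs to look at pixel data.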

The visual layer is a side effect of XOR blitting. When a player sprite is XOR’d onto screen bytes that already contain an enemy sprite, the overlapping pixels produce incorrect colors — the XOR of two bitmaps. The game doesn’t use this visual corruption for collision detection, but it’s visible when sprites physically overlap during a frame: a brief flash of “rainbow” pixels before the death sequence clears both entities.

Hole mechanics are handled differently from entities. Holes are permanent modifications to HGR page 2 — the background snapshot — not sprites drawn over it. When the player digs, the code writes the hole bitmap directly to page 2 at the tile position. All subsequent frames XOR sprites against this modified background, so the hole appears beneath all moving objects without any special treatment in the sprite engine. Filling a hole restores the original platform tile bitmap at that page 2 position.

Enemy fall-into-hole is a multi-step state transition: WALKING → FALLING → TRAPPED → DYING → RESPAWNING. In FALLING, the enemy’s Y position increments each frame. In TRAPPED, Y is fixed and a timer decrements. When the timer reaches zero, the enemy either transitions to DYING (if the player struck them while trapped) or back to WALKING from the hole’s position. The timer length comes from the difficulty table.
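The FALLING and TRAPPED transitions can be sketched as a small state machine. This is an interpretation of the description above, not the game's code: the `Enemy` fields, the `step` function, and the rule that FALLING ends when Y reaches a given `hole_bottom_y` are all assumptions for illustration.

```python
from dataclasses import dataclass
from enum import Enum, auto

class State(Enum):
    WALKING = auto()
    FALLING = auto()
    TRAPPED = auto()
    DYING = auto()
    RESPAWNING = auto()

@dataclass
class Enemy:
    state: State = State.WALKING
    y: int = 0
    timer: int = 0
    struck: bool = False          # set when the player hits a trapped enemy

def step(enemy, hole_bottom_y, trap_frames):
    """Advance one frame of the fall/trap logic."""
    if enemy.state is State.FALLING:
        enemy.y += 1                              # Y increments each frame
        if enemy.y >= hole_bottom_y:              # assumed landing condition
            enemy.state = State.TRAPPED
            enemy.timer = trap_frames             # length from the difficulty table
    elif enemy.state is State.TRAPPED:
        enemy.timer -= 1                          # Y stays fixed while trapped
        if enemy.timer <= 0:
            enemy.state = State.DYING if enemy.struck else State.WALKING

escaped = Enemy(state=State.FALLING)
for _ in range(5):
    step(escaped, hole_bottom_y=2, trap_frames=3)

killed = Enemy(state=State.FALLING)
for _ in range(2):
    step(killed, hole_bottom_y=2, trap_frames=3)
killed.struck = True
for _ in range(3):
    step(killed, hole_bottom_y=2, trap_frames=3)
```

An enemy that is not struck before the timer expires climbs out and resumes walking, so a shorter timer directly shortens the player's window to act.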

Enemy AI: Three Behavioral Trees

The three enemy types share a state machine structure but have meaningfully different behavioral parameters within the WALKING state.

Apple enemies move strictly horizontally. On reaching a platform edge, they check for a platform below and either fall to it or reverse direction. They cannot use ladders. They’re the simplest enemy: predictable, fast on later levels, and the type the difficulty table introduces first.

Butterfly enemies move both horizontally and vertically. Their movement decisions go through SUB_890B, a master AI direction chooser that evaluates platform edges, nearby ladders, and the player’s current position — but also draws a pseudo-random value from ROM at $F800. The combination produces behavior that tracks the player directionally while remaining somewhat unpredictable: butterflies will navigate ladders to close vertical distance, which is threatening enough to require active evasion, but the pseudo-random element means their paths aren’t fully deterministic.

Mask enemies share butterfly mobility but operate with a shorter behavioral timer — they reverse or change direction more frequently, producing an erratic, less predictable movement pattern. On higher difficulty levels, the spawn distribution shifts toward masks, making later levels substantially harder in ways that the difficulty table’s speed multiplier alone wouldn’t produce.

The difficulty table lives across three data regions read by SUB_758B — $7780, $7617, and $0E00 — providing per-level enemy count, ladder configuration, and spawn positions based on the level number in $7464. Entries exist for levels 1-7. The level-clamp code is straightforward:

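LDX  current_level        ; X = current level number
CPX  #$07                 ; carry set when X >= 7
BCS  :use_max             ; level 7 or higher: clamp
BCC  :use_current         ; below 7: use X unchanged
:use_max
LDX  #$07                 ; clamp to the last table entry
:use_current
LDA  level_enemy_count,X  ; fetch the enemy count for this level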

At level 8 and above, X is clamped to 7. The game continues indefinitely at maximum difficulty. The victory condition at level 49 is a single comparison in the level-advance routine: CMP #49; BEQ victory_branch. Reaching level 49 on the same difficulty parameters as level 7 is genuinely hard.
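The clamp and the victory check reduce to two one-line functions in Python. The names `difficulty_index` and `is_victory` are hypothetical; the constants 7 and 49 come from the text.

```python
MAX_TABLE_LEVEL = 7    # last level with its own difficulty table entry
VICTORY_LEVEL = 49     # level compared against in the level-advance routine

def difficulty_index(level):
    """Table index for a level: levels above 7 reuse the level 7 entry."""
    return min(level, MAX_TABLE_LEVEL)

def is_victory(level):
    """True only when the level number is exactly the victory level."""
    return level == VICTORY_LEVEL

indices = [difficulty_index(level) for level in (1, 7, 8, 48)]
```

Levels 8 through 48 all read the same table entry, so the stretch before the victory check runs entirely at maximum difficulty.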

The 10-enemy slot limit is enforced by the spawn routine: it scans the slot table for an inactive slot and skips spawning if all 10 are filled. On levels 5+, the game targets 8-10 active enemies simultaneously, so most spawn calls return early.
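A slot scan of this kind can be sketched as follows. The representation (a list with `None` for inactive slots) and the name `try_spawn` are assumptions; only the 10-slot limit and the skip-when-full behavior come from the text.

```python
MAX_ENEMIES = 10

def try_spawn(slots, enemy):
    """Place enemy in the first inactive slot; return its index, or None if full."""
    for index, slot in enumerate(slots):
        if slot is None:
            slots[index] = enemy
            return index
    return None          # all slots active: skip this spawn

slots = [None] * MAX_ENEMIES
placed = [try_spawn(slots, ("apple", n)) for n in range(12)]
```

The last two calls return `None` because the table is already full, which mirrors the early return described above.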

BCD Scoring and Display

Score is stored as 6 BCD (Binary Coded Decimal) digits at $70BB-$70C0. Each byte holds two decimal digits: $47 means 47 points, not 71 (the value of $47 read as an ordinary binary number). The 6502 supports BCD arithmetic through the D flag: SED; ADC #points_bcd; CLD adds in BCD mode, handling decimal carries automatically. Multi-digit carries require separate BCD adds per byte pair, so the scoring routine runs the add three times (once for digits 5-6, once for 3-4, once for 1-2), propagating carry between them.
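The three-step add with carry propagation can be modeled in Python. This is a sketch of decimal-mode addition for valid BCD inputs, not the game's routine: the helper names are hypothetical, and the byte order (most significant pair first) is an assumption made for readability.

```python
def bcd_adc(a, b, carry):
    """Add two packed-BCD bytes plus carry, as ADC does with the D flag set.

    Assumes both inputs are valid BCD (each nybble 0-9).
    Returns (result_byte, carry_out).
    """
    lo = (a & 0x0F) + (b & 0x0F) + carry
    hi = (a >> 4) + (b >> 4)
    if lo > 9:
        lo -= 10
        hi += 1
    carry_out = 0
    if hi > 9:
        hi -= 10
        carry_out = 1
    return (hi << 4) | lo, carry_out

def add_score(score, points):
    """Add two 6-digit scores held as 3 packed-BCD bytes, most significant first."""
    result = list(score)
    carry = 0
    for i in (2, 1, 0):                    # lowest digit pair first
        result[i], carry = bcd_adc(score[i], points[i], carry)
    return result

total = add_score([0x00, 0x09, 0x50], [0x00, 0x01, 0x00])   # 000950 + 000100
```

Read as hex digits, the result bytes spell the decimal score directly, which is what makes BCD convenient for display.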

Scoring events: trapping an enemy (50 points), killing a trapped enemy (100 points). These are small enough to never overflow 6 BCD digits in practice, though the code handles the overflow case anyway by capping at $999999.

The display routine unpacks each BCD byte into its two nybbles — LSR;LSR;LSR;LSR for the upper, AND #$0F for the lower — and indexes each into the score font bitmap table. The font is stored as 14-byte HGR bitmaps for digits 0-9 plus a life-icon glyph. Each digit renders as a 7×14-pixel character at its fixed screen position; updating the score means writing 84 bytes (6 digits × 14 bytes) to HGR page 1 at the score display row.
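The nybble split and the font lookup can be sketched like this. The names `bcd_digits` and `glyph_offset` are hypothetical, and the assumption that the glyphs are stored contiguously in digit order is not confirmed by the text.

```python
GLYPH_BYTES = 14    # bytes per digit glyph, from the text

def bcd_digits(score_bytes):
    """Unpack packed-BCD bytes into a flat list of decimal digits."""
    digits = []
    for value in score_bytes:
        digits.append(value >> 4)       # upper nybble: the LSR x4 step
        digits.append(value & 0x0F)     # lower nybble: the AND #$0F step
    return digits

def glyph_offset(digit):
    """Offset of a digit's glyph, assuming glyphs are contiguous in digit order."""
    return digit * GLYPH_BYTES

digits = bcd_digits(bytes([0x00, 0x10, 0x50]))
bytes_written = len(digits) * GLYPH_BYTES
```

Three score bytes yield six digits, and six 14-byte glyphs give the 84 bytes written per score update.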

The Title Screen as a Loader

The title screen deserves more than a mention, because the fact that it runs during disk loading is unusual engineering for 1981.

The display sequence player at $1262 reads 3-byte command entries from a table at $1367-$17FF. Each entry is (row, shape_index, shift_offset). The player iterates through the full command table, drawing each shape at the specified position via the XOR blitter. Between commands, a delay routine controls animation timing. The table encodes a scripted sequence: apple logo appears, title text slides in, character sprites walk across the screen.

The interleaving with disk I/O works because the RWTS loading loop has a defined yield structure. After each complete sector read — at the point where the RWTS returns to the game loader — the loader calls JSR $B7DA three times before starting the next sector. Each call advances the animation by one command. Over 104 sectors (tracks 6-13), the title sequence gets 312 animation steps during loading.
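The interleaving structure amounts to a loop with a fixed number of animation calls per sector. The sketch below models only that structure: `read_sector` and `advance_animation` are hypothetical callbacks standing in for the RWTS read and the JSR $B7DA call.

```python
SECTORS = 104            # tracks 6-13, from the text
STEPS_PER_SECTOR = 3     # animation calls between sector reads

def load_with_animation(read_sector, advance_animation, sectors=SECTORS):
    """Read every sector, advancing the animation after each one.

    Returns the total number of animation steps taken.
    """
    steps = 0
    for index in range(sectors):
        read_sector(index)                 # takes a variable amount of time
        for _ in range(STEPS_PER_SECTOR):
            advance_animation()            # one command from the display table
            steps += 1
    return steps

loaded = []
frames = []
total_steps = load_with_animation(loaded.append, lambda: frames.append(1))
```

The step count depends only on the number of sectors, not on how long each read takes, which is what makes the animation timing-independent.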

Timing isn’t precisely controlled — some sectors take longer to locate — but the animation is designed to be timing-independent: each step advances by one command regardless of wall-clock time. The visual result is a smooth-enough animation over the ~10 seconds of disk loading, indistinguishable from a frame-locked title screen to anyone watching it run.

The alternative — load all sectors, then display the title screen — would be simpler to implement. The interleaved approach requires knowing the RWTS yield points and designing the animation to tolerate variable-length pauses. The result is a polished product: the player sees a live animated screen for the entire loading period, not a frozen display or a blank wait.

What the Subroutine Catalog Shows

subroutine_analysis.txt catalogs all 104 subroutines: entry address, byte count, call sites, and JSR cross-references. Reading through it gives the game’s structure at a glance.

The largest subroutine by bytes is the enemy state machine dispatcher ($8000-$80FF): 256 bytes managing the complete WALKING/FALLING/TRAPPED/DYING cycle for one enemy slot. The second largest is the platform tile renderer ($74E9-$75C0), handling both initial level draw and per-tile redraws when platforms are dug or restored.

The smallest are one-instruction thunks: several JMP $XXXX stubs at fixed addresses that indirect to real handlers. These exist because the dispatch table hardcodes addresses, and the thunks provide stable targets while allowing implementation routines to be placed anywhere in the binary.

The call graph has one hub: the main loop at $74E9 directly calls 12 subroutines per frame. Every other subroutine is downstream of one of those 12. Maximum call depth is 4 levels — shallow by modern standards, but reflecting a real constraint: on a 1 MHz 6502, each JSR/RTS pair costs 12 cycles. A 4-level call chain wastes 48 cycles per call site, which matters when you’re targeting 60 frames per second.

The nibbler Toolkit

The investigation produced approximately 39 Python scripts. Each represents a question asked and answered — or asked, exhausted, and abandoned. woz_reader.py asked “what format is this file?”; woz_53brute.py asked “is there a GCR variant I’m missing?” and found nothing; check_wrap.py asked “is the problem in my own bit-stream reader?” and found everything. The scripts trace the full arc of the investigation: where confidence was right, and where it was completely wrong.

nibbler is what the working scripts became after the investigation was complete: a reusable Python package for WOZ file analysis, built from the verified core of the collection.

Twelve modules, nine CLI subcommands:

nibbler info      — WOZ2 structure and track map
nibbler scan      — nibble-level sector discovery and address field decode
nibbler protect   — copy protection analysis report (8 protection classes)
nibbler nibbles   — raw nibble stream dump with byte highlighting
nibbler decode    — extract and decode specific sectors
nibbler boot      — 6502 emulation to a stop address with memory dump
nibbler dsk       — convert between WOZ and .dsk format
nibbler flux      — render magnetic flux visualization as PNG
nibbler disasm    — recursive descent disassembly

The protect command detects dual-format tracks, non-standard prologs, invalid checksums, per-track prolog variations, GCR corruption signatures, self-modifying code markers, custom post-decode patterns, and non-standard address field values. It produces a markdown report. It works on any WOZ file.

The boot command is the most generally useful: it runs the 6502 emulator from the P6 ROM boot sequence to a caller-specified stop address, saves the full 64KB memory state, and can dump specific address ranges. Any custom-loader Apple II disk — copy-protected or not — can be booted this way to extract its runtime memory without running it on real hardware.

Everything is in the Orchard repository: the full nibbler package, all 39 investigation scripts with their categorized README, and the extracted 8,800-line assembly source.

Forty-Five Kilobytes

Total: ~39 Python scripts, 69.8 million emulated instructions, 9 protection layers, 104 game subroutines.

The copy protection is sophisticated because it’s targeted. The dual-format track defeats copiers that only understand 6-and-2. The invalid address checksums defeat copiers that validate headers before reading. The GCR table corruption exploits (A<<3) XOR (B<<3) = (A XOR B)<<3 to maintain passing checksums while corrupting decode values — it targets copiers that read raw nibbles and decode with fresh tables. The $DE marker patch defeats copiers scanning for $D5. The per-track second-byte variations defeat copiers that handle $DE but assume a fixed second byte. Each layer targets a specific attack class; the combination defeats them simultaneously and in depth.
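The identity behind the GCR table corruption can be checked exhaustively. This verifies only the arithmetic; the 8-bit truncation in `shift3` is an assumption about register width, and the identity holds with or without it.

```python
def shift3(value):
    """Shift left by 3 within an 8-bit register."""
    return (value << 3) & 0xFF

# (A << 3) XOR (B << 3) == (A XOR B) << 3 for every pair of byte values.
identity_holds = all(
    shift3(a) ^ shift3(b) == shift3(a ^ b)
    for a in range(256)
    for b in range(256)
)
```

Because shifting distributes over XOR, a running XOR checksum computed over shifted values can still pass while the decoded values themselves differ from what a fresh table would produce.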

The game fits in 14 tracks because every structure serves double duty. The same 8KB is first relocation source data, then the HGR page 2 background buffer. The Y register value that ends the GCR table builder starts the corruption loop. The title screen animation engine runs during disk loading because the RWTS yield points are accessible and the animation is timing-independent. Nothing is wasted.

This is engineering from an era when constraints were absolute. There was no option to add more memory or use a second disk side. The game occupies 14 tracks of a 5.25-inch floppy, and the entire product (boot loader, protection, title screen with animation, complete game logic, three enemy types with distinct AI, full sprite animation, BCD scoring, level progression, sound effects) had to fit inside them.

Every byte earned its place.

The full investigation — all 39 scripts, the nibbler toolkit, assembly source, and extracted runtime binary — is in the Orchard repository. For a step-by-step walkthrough reproducing the analysis using nibbler, see Walkthrough.md.
