The Cooperative-CPU Round-Trip

Part 8 of the cpm-videx series. Part 4 traced the one-time handoff from 6502 to Z-80 at boot. This article covers the runtime handoffs that happen every time CP/M needs disk: the cooperative-CPU round-trip where two physical processors take turns driving the same Apple ][. The article also traces the actual failure path when 2.20 boots with a Videx — what hangs and where.

Update (2026-05-01): The Stage-3 emulator reproduces the 2.20 hang end-to-end and settles the open question this article carried about which failure mode actually fires. It’s the Z-80 stack overflow (Case B from the byte-trace devlog) — $DFBE is $E5 fill, the Z-80 PUSH-HLs through its own stack, SP wraps past $0000, and high memory gets corrupted with whatever HL contained on entry. Not the “6502 stuck inside the Videx ROM” framing this article uses; that’s directionally suggestive but the actual byte-level behavior is the Z-80 corrupting itself first. The article’s narrative stands as the architectural framing; the runtime mechanism is corrected to Case B.

The architectural problem

The Apple ][ has one CPU, one bus, one set of I/O ports. The Disk II controller is a 6502-driven device. Its data latch is at $C0EC (slot 6, sector data port); its sequencer state machine expects 6502 timing.

The SoftCard adds a Z-80 to a system that doesn’t otherwise have one. CP/M needs disk I/O to load programs, write files, and run any practical workflow. But CP/M’s BIOS is Z-80 code; the disk I/O routines (RWTS) are 6502 code. There’s no Z-80-native way to talk to the Disk II.

Microsoft’s solution: the two CPUs cooperate. When CP/M needs disk, the Z-80 doesn’t talk to the Disk II directly. Instead, the Z-80 sets up the request in shared memory, signals the 6502 via a sync flag, and the SoftCard switches the bus from Z-80 to 6502. The 6502 wakes up, reads the request, runs RWTS, deposits the result, signals back, and the SoftCard switches the bus to Z-80. From CP/M’s perspective, disk I/O is a synchronous BIOS call that returns data; under the hood, two processors took turns.

This article walks the actual mechanism.

The sync flags

The cooperative-CPU model uses a pair of memory addresses in shared Apple RAM as sync flags:

$E000: Set by the 6502 when it has finished servicing whatever the Z-80 requested. Z-80 reads this to detect “6502 is done.”
$E010: Set by the Z-80 to acknowledge the response. 6502 watches this to detect “Z-80 has read the result.”

These addresses are in the language card RAM area on the Apple. Both CPUs see them at the same physical bytes (the SoftCard’s memory mapping is mostly transparent at high addresses).

The Z-80 polling loop

The Z-80 side of the sync is a short routine at Z-80 $1E39 (seven instructions counting the final RET, twelve bytes):

$1E39: 3A 00 E0        LD A,($E000)            ; read sync flag
$1E3C: 17              RLA                     ; high bit → carry
$1E3D: 30 FA           JR NC,$1E39             ; loop until carry set
$1E3F: 32 10 E0        LD ($E010),A            ; write to acknowledge
$1E42: 3F              CCF                     ; complement carry
$1E43: 1F              RRA                     ; restore A
$1E44: C9              RET

The polling loop proper is LD A,($E000) / RLA / JR NC,$1E39: three instructions, six bytes total. It reads $E000, rotates the high bit into the carry flag, and loops back if the carry is clear. Once the carry is set (i.e., $E000 has its high bit set), the Z-80 falls through, writes the rotated value to $E010 to acknowledge, undoes the rotation via CCF / RRA (which leaves the value it read in A, with the high bit cleared), and returns.
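The flag arithmetic in that routine is easy to misread, so here is a minimal Python sketch of it. It models only the documented Z-80 semantics of RLA, CCF and RRA; the function names and the carry_in parameter are illustrative and are not taken from the BIOS.

```python
def rla(a, carry):
    """Z-80 RLA: rotate A left through the carry flag."""
    new_carry = (a >> 7) & 1
    return ((a << 1) | carry) & 0xFF, new_carry


def rra(a, carry):
    """Z-80 RRA: rotate A right through the carry flag."""
    new_carry = a & 1
    return (a >> 1) | (carry << 7), new_carry


def poll_once(flag_value, carry_in=0):
    """One pass through the routine for a byte read from $E000.

    Returns None when JR NC would loop back (high bit clear).
    Otherwise returns (byte written to $E010, A on return, carry on return).
    """
    a, carry = rla(flag_value, carry_in)
    if not carry:                 # JR NC,$1E39
        return None
    written = a                   # LD ($E010),A stores the rotated value
    carry ^= 1                    # CCF: carry was 1, becomes 0
    a, carry = rra(a, carry)      # RRA shifts the cleared carry into bit 7
    return written, a, carry
```

For an input of $C1 with carry clear, the sketch reports $82 written to $E010 and $41 left in A: the byte that was read, minus its high bit.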

(The $1E39 address is where the Z-80 sees these bytes through its TPA-area view of Apple memory. The same bytes also live in the BIOS first 1 KB at Z-80 $FAB8+offset, accessible via the SoftCard’s high-RAM mapping. Z-80 BIOS code can call either address — they’re the same physical bytes.)

The hidden mechanism: SoftCard hardware

Here’s the thing the polling loop doesn’t show, but has to be true: between the Z-80’s first read of $E000 and its successful read of the high-bit-set value, the SoftCard switched the bus to 6502 and back.

The 6502 was asleep when the Z-80 wrote the request and called the polling loop. For the 6502 to set $E000’s high bit, it has to have woken up, run code, and written to $E000 — all while the Z-80 was waiting on the polling loop.

There are a few ways the SoftCard hardware could implement this:

(a) The SoftCard monitors Z-80 reads of $E000. When the Z-80 reads $E000 and the value’s high bit isn’t set, the SoftCard pauses the Z-80’s clock (or denies the read) and switches the bus to 6502. The 6502 runs until it sets $E000’s high bit, then the SoftCard switches back, the Z-80’s pending read completes with the new value, and the polling loop falls through.

(b) The SoftCard runs both CPUs in alternating cycles when both are active. The Z-80 polls; the 6502 services; both CPUs see consistent shared memory.

(c) The first LD A,($E000) itself is the trigger. The SoftCard’s address-bus monitoring detects an access to a specific watched address ($E000) and uses that as the CPU-switch trigger.

I’m not certain which (or which combination). Option (a) is the most parsimonious and matches how some other dual-CPU coprocessor cards from the era worked. The polling loop’s tightness (three instructions, 29 T-states per iteration when the branch is taken: 13 + 4 + 12) suggests the architecture expects the wait to be short, consistent with the SoftCard hardware doing real work to make the 6502 run during the wait.
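To make option (a) concrete, here is a toy Python model of it. Everything in it is hypothetical: the SwitchingBus class, the run_6502 callback and the iteration limit are inventions for illustration, and nothing here is derived from SoftCard schematics.

```python
SYNC_FLAG = 0xE000   # address the polling loop reads
SYNC_ACK = 0xE010    # address the polling loop writes on the way out


class SwitchingBus:
    """Hypothetical option (a): a Z-80 read of SYNC_FLAG that would see
    bit 7 clear first hands the bus to the 6502, then completes."""

    def __init__(self, run_6502):
        self.mem = bytearray(0x10000)
        self.run_6502 = run_6502      # callable standing in for the 6502
        self.switches = 0

    def z80_read(self, addr):
        if addr == SYNC_FLAG and not (self.mem[addr] & 0x80):
            self.switches += 1
            self.run_6502(self.mem)   # the Z-80 is stalled while this runs
        return self.mem[addr]

    def z80_write(self, addr, value):
        self.mem[addr] = value & 0xFF


def z80_poll(bus, max_iterations=1000):
    """Poll SYNC_FLAG; return how many reads it took to see bit 7 set."""
    for iteration in range(1, max_iterations + 1):
        value = bus.z80_read(SYNC_FLAG)
        if value & 0x80:
            bus.z80_write(SYNC_ACK, value)   # exact byte written not modeled
            return iteration
    raise TimeoutError("6502 never set the sync flag")


def cooperative_6502(mem):
    mem[SYNC_FLAG] |= 0x80            # services the request, signals done


def stuck_6502(mem):
    pass                              # never signals: the hang scenario
```

With cooperative_6502 the poll returns on its first read after one bus switch. With stuck_6502 it never sees the flag, which is the shape of a hang where the Z-80 polls forever.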

The 6502 side

What the 6502 does during its wake interval isn’t fully documented from static analysis (its code lives in the preserved RWTS at Apple $BA00-$BFFF, which I haven’t extracted). But conceptually it has to:

Read the disk parameters the Z-80 set up in BIOS state (track, sector, DMA address).
Run the standard Apple ][ RWTS to read or write the requested sector.
For a read: deposit the sector data at the DMA address (typically Z-80-readable memory).
Set $E000’s high bit to signal completion.
Wait for the Z-80 to acknowledge via $E010.
Yield back to the Z-80.

The 6502’s last action before yielding is to set $E000. Its first action when woken is to read shared state and run RWTS. Whatever code runs between those bookends lives in the preserved RWTS area and is the 6502’s part of the cooperative loop.
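Since the real 6502 code has not been extracted, the steps above can only be sketched. The Python below is a hypothetical stand-in: the request dictionary, the disk mapping and the service_read name are all assumptions, and the acknowledge wait and the yield back to the Z-80 are not modeled.

```python
SECTOR_SIZE = 256      # bytes per sector as RWTS handles them
SYNC_FLAG = 0xE000     # completion flag described in this article


def service_read(mem, disk, request):
    """Hypothetical 6502-side handler: fetch one sector, signal done.

    mem     -- bytearray standing in for shared Apple RAM
    disk    -- dict mapping (track, sector) to 256-byte sector images
    request -- dict with 'track', 'sector' and 'dma' keys (assumed layout)
    """
    data = disk[(request["track"], request["sector"])]  # stand-in for RWTS
    if len(data) != SECTOR_SIZE:
        raise ValueError("sector image must be exactly 256 bytes")
    dma = request["dma"]
    mem[dma:dma + SECTOR_SIZE] = data    # deposit at the DMA address
    mem[SYNC_FLAG] |= 0x80               # set the high bit: work is done
```

The point of the sketch is the ordering: the data lands at the DMA address before the flag is set, so the Z-80 never sees a completed flag with stale data behind it.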

What this looks like for a real disk operation

Concretely, when CP/M wants to read a sector:

Z-80 BIOS READ entry (jump-table offset 39 — JP $FEBD in 2.23) runs.
The READ routine sets up state: current track at $FECB or similar, sector at $FED2, DMA address at $FED4, etc. The exact addresses are the BIOS state slots in the runtime-generator zone.
READ calls into a Z-80 disk-callback at Apple $0A00-$0BFF (Z-80 $1A00-$1BFF). The callback does final parameter setup.
The callback calls the polling loop at $1E39.
The polling loop’s first LD A,($E000) triggers the SoftCard CPU switch.
6502 wakes, reads parameters from shared state, runs RWTS to load the sector from disk, deposits data at the DMA address.
6502 sets $E000’s high bit, signaling done.
SoftCard switches back to Z-80. The Z-80’s pending LD A,($E000) read sees the new value.
Z-80 polling loop falls through, writes to $E010 to acknowledge, returns.
Disk-callback returns to BIOS READ.
BIOS READ returns to BDOS, which returns to whatever CP/M code requested the disk operation.
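The jump-table offset quoted in the first step follows from the standard CP/M 2.2 BIOS layout: seventeen three-byte JP instructions in a fixed order. A small Python sketch of that arithmetic (the helper name is illustrative):

```python
# Standard CP/M 2.2 BIOS jump-table order; each entry is a 3-byte JP.
BIOS_ENTRIES = [
    "BOOT", "WBOOT", "CONST", "CONIN", "CONOUT", "LIST", "PUNCH",
    "READER", "HOME", "SELDSK", "SETTRK", "SETSEC", "SETDMA",
    "READ", "WRITE", "LISTST", "SECTRAN",
]
JP_SIZE = 3


def jump_table_offset(name):
    """Byte offset of a BIOS entry from the start of the jump table."""
    return BIOS_ENTRIES.index(name) * JP_SIZE
```

READ is the fourteenth entry (index 13), which gives offset 39; CONOUT, which matters for the Videx path later in this article, sits at offset 12.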

The whole thing looks synchronous from the Z-80’s perspective. From the 6502’s perspective, the system was asleep for most of the time, then briefly woke up to do disk I/O and went back to sleep. The SoftCard hardware mediates which CPU is alive at any given moment.

Why this design

A simpler design would be: emulate or rewrite RWTS in Z-80, and have the Z-80 talk to the Disk II directly. This is what some later CP/M implementations on Apple-compatible hardware did. Microsoft didn’t.

The reason is timing. RWTS is a tightly-timed routine — it has to read GCR-encoded data from the Disk II at a rate of one bit every 4 microseconds, and the entire decode has to keep up with the disk’s rotation. The 6502 at 1 MHz is exactly fast enough; deviation breaks the read. A Z-80 reimplementation would have had to match this timing precisely, on Z-80’s own clock and instruction set, while keeping up with the same disk speed.
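The budget behind that claim is simple arithmetic. A sketch in Python, using the 4-microsecond bit time above and a nominal 1 MHz clock (the real Apple ][ clock runs slightly faster, which is ignored here):

```python
BIT_TIME_US = 4            # microseconds per bit cell on the Disk II
BITS_PER_DISK_BYTE = 8     # one encoded byte as it arrives at the latch
CPU_HZ = 1_000_000         # nominal 6502 clock

byte_time_us = BIT_TIME_US * BITS_PER_DISK_BYTE        # 32 microseconds
cycles_per_disk_byte = byte_time_us * CPU_HZ // 1_000_000

print(f"{byte_time_us} us per disk byte, "
      f"{cycles_per_disk_byte} CPU cycles to handle it")
```

Thirty-two cycles per disk byte has to cover polling the latch, storing or decoding the byte, and looping, which is why the read loop leaves so little room for a port to a different instruction set.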

Microsoft’s choice was to leave RWTS alone — it works, Apple shipped it, every Apple ][ owner has its bytes on disk anyway — and add a sync mechanism to invoke it from Z-80 code. The cooperative-CPU model is what made that practical. Two CPUs sharing memory, taking turns at the bus, mediated by hardware that knows which one is allowed to drive the disk controller.

It’s also a beautiful example of why hardware-software co-design works. The polling loop at $1E39 is six Z-80 instructions. The hardware that turns those instructions into a controlled CPU switch is presumably more complex (latches, address-bus comparators, clock-pause logic), but the Z-80 code doesn’t need to know about any of it. From software’s view, the SoftCard provides a synchronous “ask 6502 to do something” primitive; the hardware handles the rest.

What’s still open

The 6502 side. Until I extract the bytes from the preserved RWTS at Apple $BA00-$BFFF and trace what they do at the sync points, the 6502 half of the round-trip is conceptual. The Z-80 side is concrete (the six instructions of the polling loop are right there in newdisk_223.bin at offset 1081-1093).

The exact SoftCard CPU-switch trigger. Hypothesis (a) above (Z-80 reads of $E000 block the Z-80 until 6502 has run) is most plausible but unverified.

Both questions need either a hardware document for the SoftCard (Microsoft’s internal docs, if findable) or a working SoftCard plus instrumentation to observe the bus.

The Videx, the cooperative model, and the hang

Returning to the project’s original question: why does 2.20 hang on a Videx?

The cooperative-CPU model means all expansion-slot I/O goes through the same general path: BIOS sets up parameters, calls into a per-device handler, the handler talks to the slot ROM (typically in the 6502’s address space). For disk, the handler is the disk-callback path. For console output via a Pascal-protocol slot (like a Videx), the handler is a CONOUT-style routine that does JSR $Cn07 (Pascal 1.0 entry point) on the 6502 side.

In 2.20, the slot scanner sees the Videx’s $Cn05=$38 and $Cn07=$18 and tags slot 3 with device code 4 (Pascal 1.0). The BIOS dispatches CONOUT through a routine that does JSR $Cn07 to invoke output. For slot 3 that address is $C307, where the Videx’s byte is $18, but the Videx’s actual code expects to be entered through Pascal 1.1’s vector table at $Cn0D-$Cn10, not via direct JSR to $Cn07. The Videx’s behavior with a naive JSR $C307 is undefined; in practice it hangs the system.

The cooperative-CPU model doesn’t cause the hang. The hang would happen on a single-CPU Apple ][ trying the same call. What the cooperative model does is route Microsoft’s BIOS dispatch through the 6502 (because the 6502 is what can actually do JSR $Cn07), so the hang manifests on the 6502 side rather than the Z-80 side. The Z-80 is still spinning on $1E39 waiting for $E000 — but the 6502 will never set $E000 because it’s stuck inside the Videx ROM.

So the hang is, technically, a 6502 stuck inside an expansion-slot ROM, with the Z-80 polling forever for a sync flag the 6502 will never set. Both CPUs are alive; neither makes progress. From the user’s perspective, the screen stops updating after the cold-boot banner, no prompt appears, and the only recovery is power-cycle.

2.23’s fix changes which slot ROM entry point the 6502 calls into. With device code $06 for Pascal 1.1 cards, the BIOS dispatches through the Pascal 1.1 vector table — $Cn0D for INIT, $Cn0E for READ, $Cn0F for WRITE, $Cn10 for STATUS. Those are the entry points the Videx actually expects. Output works; the system reaches the A> prompt.
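As a sketch of what dispatching through that table involves: the Python below assumes the four table bytes hold low-byte offsets into the slot’s $Cn00 page, as in Apple’s Pascal 1.1 firmware protocol. This part of the series does not show the BIOS’s actual dispatch code, so the function names and structure are illustrative only.

```python
# Table positions within the slot ROM page, as listed in this article.
PASCAL11_TABLE = {"INIT": 0x0D, "READ": 0x0E, "WRITE": 0x0F, "STATUS": 0x10}


def looks_like_pascal_card(rom_page):
    """Signature bytes the slot scanner checks: $Cn05=$38 and $Cn07=$18."""
    return rom_page[0x05] == 0x38 and rom_page[0x07] == 0x18


def pascal11_entry(rom_page, slot, operation):
    """Resolve an entry address from the offset table (assumed semantics).

    rom_page  -- the 256 bytes at $Cn00-$CnFF for the given slot
    operation -- one of 'INIT', 'READ', 'WRITE', 'STATUS'
    """
    page_base = 0xC000 + slot * 0x100
    return page_base + rom_page[PASCAL11_TABLE[operation]]
```

Both a Pascal 1.0 and a Pascal 1.1 card pass the two-byte signature check, which is why the signature alone cannot tell the BIOS which entry convention to use.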

Update (2026-04-29): A byte-level trace of the 2.20 hang path is in the v2.20 hang byte-trace devlog. The dispatch path lands at handler $DFBE (12 bytes ending in CALL $DAC5, no RET), and $DAC5 is in the runtime-generator zone whose final contents depend on the cold-boot generator output. Two failure modes are statically consistent: (Case A) the 6502 hangs inside Videx ROM at BASOUT1 with uninitialized 6845/VRAM state, while the Z-80 polls $E000 forever, or (Case B) the Z-80 stack overflows on $E5 (PUSH HL) spam from the runtime-generator slot. Distinguishing them requires booting in a Z-80 emulator. Either way, no progress.
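Case B is mechanical enough to sketch. The Python below models only the documented behavior of PUSH HL (opcode $E5) with a 16-bit stack pointer; the starting SP and HL values in the usage note are made up for illustration and are not taken from the trace.

```python
def push_hl(mem, sp, hl):
    """Z-80 PUSH HL: store H at SP-1, L at SP-2, leave SP lowered by 2."""
    sp = (sp - 1) & 0xFFFF
    mem[sp] = (hl >> 8) & 0xFF    # high byte (H)
    sp = (sp - 1) & 0xFFFF
    mem[sp] = hl & 0xFF           # low byte (L)
    return sp


def run_e5_fill(mem, sp, hl, count):
    """Execute `count` consecutive $E5 bytes; return the final SP."""
    for _ in range(count):
        sp = push_hl(mem, sp, hl)
    return sp
```

Starting from SP=$0004 with HL=$1234, three pushes are enough: the third wraps SP to $FFFE and writes $12 and $34 into the top two bytes of memory. A long run of $E5 keeps marching downward through high memory from there.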

That’s the working 2.23 chain, end to end: 11 bytes in the slot scanner → device code $06 → BIOS factory’s device-6 dispatch → runtime-installed Pascal 1.1 handler → 6502-side JSR $Cn0F (the WRITE entry, the right one for output) → Videx services the output → 6502 sets $E000 → Z-80 polling falls through → BDOS gets a successful return.

The cooperative-CPU round-trip is the substrate that all of that runs on top of.

Part 9 closes the series with the categorical inventory: every byte that differs between 2.20 and 2.23, sorted by what’s driving the change. The Videx fix turns out to be ~21 bytes inside ~8 KB of total churn — most of which has nothing to do with Videx and everything to do with Microsoft taking the opportunity to ship CP/M 2.2 as the new base.

Deep dives

The inter-CPU sync polling loop at Z-80 $1E39 — the six-instruction sync routine, byte-by-byte. The Z-80 side of the cooperative-CPU bridge.