The Ghost in the Source File

Part 5 of 7. Part 4 covered the bytecode interpreter. This part covers what happened when the interpreter ran real Smalltalk-80 initialization code — and what was missing from the source file that the code assumed would be there.

By late October 2025, I had a heap builder and an interpreter. The heap was correctly linked — zero garbage collected at cold start. The interpreter passed 188 block context tests. It was time to run the system’s own initialization code and see what happened.

The Smalltalk-80 source file contains several hundred free-standing expressions scattered throughout the class definitions. These aren’t method definitions — they’re imperative statements. From the actual file:

Character initialize
Symbol initialize
Cursor initialize
Date initialize
CompiledMethod initialize

These are class-side initialize calls — each one sets up global tables, pool dictionaries, and class variables that the rest of the system depends on. There are also direct global assignments like Processor := ProcessorScheduler new and class comment registrations that set up the SystemOrganization.

In a running Smalltalk system, these execute as they’re encountered during file-in, setting up global variables, configuring class variables, and registering objects in system dictionaries.

I had compiled all of them into the JSON alongside the class definitions, preserving source order. The plan was to convert each one into a Smalltalk doIt expression at the end of cold start and execute it. Running 300-odd real Smalltalk expressions against the live heap would validate the interpreter against something more complex than unit tests: actual system initialization code.

This was the right plan. The execution was painful.

The Ordering Problem

The first category of failure was ordering. The source file was written to be filed into a running Smalltalk system, which processes expressions as they appear and has a live class hierarchy already in memory. The expressions are not ordered by dependency — they’re ordered by appearance in the file, which reflects how the file was assembled, not which expressions need to run before which.

A concrete example: the expression that initializes TextConstants (a pool dictionary used by the Text class) appears after the class definitions that reference it. But some of those class definitions, when their initialize methods run, try to access TextConstants. If TextConstants initialize hasn’t run yet, the accessor finds nil and crashes.

The solution was a dependency analysis pass: identify which global variables each expression reads and writes, topologically sort the expressions so writes precede reads, and execute in the resulting order. This worked for most expressions.
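The pass described above can be sketched roughly as follows. This is an illustration in Python, not the loader's actual code: the function name order_expressions, the (source, reads, writes) tuple shape, and the placeholder expression text are all assumptions made for the example. It uses Kahn's algorithm, with a heap so that expressions with no ordering constraint between them keep their original file order.

```python
import heapq
from collections import defaultdict


def order_expressions(exprs):
    """exprs: list of (source_text, reads, writes) tuples in file order.

    Returns (ordered, stuck): source texts in a valid execution order,
    and source texts that could not be placed because they sit in or
    behind a dependency cycle.
    """
    # Map each global name to the indices of the expressions that write it.
    writers = defaultdict(list)
    for index, (_, _, writes) in enumerate(exprs):
        for name in writes:
            writers[name].append(index)

    # Add an edge writer -> reader for every global an expression reads.
    successors = defaultdict(set)
    indegree = [0] * len(exprs)
    for index, (_, reads, _) in enumerate(exprs):
        for name in reads:
            for writer in writers.get(name, ()):
                if writer != index and index not in successors[writer]:
                    successors[writer].add(index)
                    indegree[index] += 1

    # Kahn's algorithm; the heap always releases the earliest file position.
    ready = [i for i, degree in enumerate(indegree) if degree == 0]
    heapq.heapify(ready)
    ordered = []
    while ready:
        index = heapq.heappop(ready)
        ordered.append(index)
        for later in successors[index]:
            indegree[later] -= 1
            if indegree[later] == 0:
                heapq.heappush(ready, later)

    placed = set(ordered)
    stuck = [i for i in range(len(exprs)) if i not in placed]
    return [exprs[i][0] for i in ordered], [exprs[i][0] for i in stuck]


# Hand-written read/write sets, for illustration only.
example = [
    ("Paragraph initialize", {"TextConstants"}, set()),
    ("Character initialize", set(), set()),
    ("<expression that initializes TextConstants>", set(), {"TextConstants"}),
]
ordered, stuck = order_expressions(example)
```

Anything that comes back in stuck is part of a cycle, or waits on one, and needs a human decision.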

But some dependencies were circular: expression A creates an object that expression B needs, but expression B also produces something expression A needs. These required manual intervention. I examined each cycle, worked out which direction the dependency actually ran at runtime (often one of the apparent dependencies was really on a default value that existed without initialization), and adjusted the ordering accordingly.
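One way to support that kind of manual work is to have the tool report a concrete cycle and accept a list of dependencies that a human has judged to be false. The sketch below is a hypothetical helper in Python, not taken from the project: find_cycle, the prereqs mapping, and the ignored set are all names invented for the example. It is a plain depth-first search that returns the first cycle it meets.

```python
def find_cycle(prereqs, ignored=frozenset()):
    """Return one dependency cycle as a list of names, or None.

    prereqs maps each expression to the expressions that must run first.
    ignored holds (expression, prerequisite) pairs that were reviewed by
    hand and judged not to be real runtime dependencies.
    """
    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on the current path, finished
    colour = {name: WHITE for name in prereqs}
    path = []

    def visit(node):
        colour[node] = GREY
        path.append(node)
        for dep in sorted(prereqs.get(node, ())):
            if (node, dep) in ignored:
                continue  # edge removed by manual review
            state = colour.get(dep, WHITE)
            if state == GREY:
                # dep is already on the current path, so we closed a loop.
                return path[path.index(dep):] + [dep]
            if state == WHITE:
                found = visit(dep)
                if found:
                    return found
        path.pop()
        colour[node] = BLACK
        return None

    for name in sorted(prereqs):
        if colour[name] == WHITE:
            found = visit(name)
            if found:
                return found
    return None


# A and B appear to need each other; C only needs A.
prereqs = {"A": {"B"}, "B": {"A"}, "C": {"A"}}
cycle_before = find_cycle(prereqs)
# Suppose review shows A only needed a default value, not B's result.
cycle_after = find_cycle(prereqs, ignored={("A", "B")})
```

Keeping the overrides in an explicit set means each manual decision is recorded next to the analysis instead of being buried in a hand-edited ordering.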

The more significant problem was simpler to describe and harder to solve.

The Incomplete Record

A Smalltalk-80 file-out is a snapshot of source code. It contains the class definitions and methods that were explicitly filed out from a running system. But a living Smalltalk system accumulates state through interactive development: a programmer opens a Workspace, evaluates an expression, creates a global variable, assigns it a value, and continues working. The system’s binary image checkpoint (the normal shutdown mechanism) preserves this state. The file-out does not.

There are globals in the Smalltalk-80 system that have no initialization expression anywhere in the source file. They exist in every binary image of the running system. They were set interactively — during system development at PARC — and never transcribed into source. The file-out was produced from a live system that had those variables in the expected state. The code that uses them was written assuming they exist. But the source file says nothing about how to create them.

When cold start builds the heap from the file-out, these variables are absent. The SystemDictionary has no entry for them. When initialization code runs and tries to access them, it gets nil — or more commonly, it tries to send a message to nil and crashes with a doesNotUnderstand: error, because nil doesn’t respond to whatever the code expected.

Each of these failures was an investigation. The error pointed to a specific line of Smalltalk code that had failed. Reading that code revealed what variable it expected and what type of object it expected to find there. Reading the Blue Book indicated what the variable’s intended purpose was. Cross-referencing with the class definitions that used it showed what it should contain. Then a fix was added to the cold-start loader to pre-create that object and install it in the SystemDictionary before the init expressions ran.
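Part of that investigation can be front-loaded with a static check: list every global name that some method mentions but that the SystemDictionary does not define. The sketch below is an assumption about how such a check might look, written in Python; missing_globals, the references mapping, and the method labels in the example are hypothetical and are not taken from the Smalltalk-80 sources. It does not replace reading the code, since it says nothing about what the missing object should contain.

```python
def missing_globals(references, defined):
    """Report globals that are referenced but not defined.

    references maps a method label to the set of global names it mentions.
    defined is the set of names present in the SystemDictionary.
    Returns {global_name: [method labels that mention it]}, sorted.
    """
    report = {}
    for method, names in references.items():
        for name in names:
            if name not in defined:
                report.setdefault(name, []).append(method)
    return {name: sorted(users) for name, users in sorted(report.items())}


# Illustrative input only; the method labels are made up.
references = {
    "SomeView>>open": {"ScheduledControllers", "Display"},
    "SomeText class>>initialize": {"TextConstants"},
}
defined = {"Display", "Smalltalk"}
report = missing_globals(references, defined)
```

Each key in the report is a candidate ghost, and the list beside it shows where to start reading.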

The Ghosts

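Here are some of the ghost variables I found, and what they turned out to be. Not all of them were missing from the source outright; some had initialization expressions that simply ran too late to be useful: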

TextConstants is a pool dictionary — a shared namespace of constants used by the text system. The Text class, the Paragraph class, and several others all reference variables like Bold, Italic, and Underlined that should be found in TextConstants. In a running system, TextConstants is a Dictionary containing these values. In the file-out, the expressions that initialize TextConstants exist, but they run late and some classes that reference it run their own initializers earlier. The fix was to pre-allocate TextConstants as an empty Dictionary during cold start so that early accesses found a valid object.

ScheduledControllers is the global that manages the list of active windows. The MVC framework’s controller loop checks this variable to decide which window is currently active. In the source file, there is no expression that creates ScheduledControllers — it was created interactively when the original system first set up its window management infrastructure. The fix was to create it during cold start as a ControlManager instance and install it in the SystemDictionary.

SingleCharSymbols is a class variable of the Symbol class that holds pre-allocated Symbol instances for every single-character string. The Symbol class has an initialize method that’s supposed to create this table, but the method assumes certain other initialization has already happened. In the source file ordering, this initialize ran too early. The fix was to create the character symbol table explicitly during a cold-start phase that ran after all the prerequisite initialization.

The ProcessorScheduler singleton is arguably the most important ghost. The Processor global should be an instance of ProcessorScheduler — the object that manages Smalltalk’s cooperative multi-process scheduling. In the source file there is an expression Processor := ProcessorScheduler new, but ProcessorScheduler new requires a working process scheduler to already exist. This circular dependency was resolved by creating a minimal ProcessorScheduler instance during cold start and using it to bootstrap the full scheduler initialization.

Pool dictionary class variables followed the same pattern across several classes — shared namespaces that existed in the running system but were created by init expressions that ran too late. Each required pre-allocation during cold start.
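The common shape of these fixes is a pre-allocation step that runs before any init expression. A minimal sketch in Python, with a plain dict standing in for the SystemDictionary: preallocate_globals, HeapObject, and the factory table are hypothetical names for illustration, and the real loader builds heap objects rather than Python values.

```python
from dataclasses import dataclass


@dataclass
class HeapObject:
    """Stand-in for an object the cold-start loader would allocate."""
    class_name: str


def preallocate_globals(system_dictionary, factories):
    """Install a placeholder for each named global that is still absent.

    Existing entries are left alone, so anything the heap builder has
    already created is not overwritten. Returns the names installed.
    """
    installed = []
    for name, make in factories.items():
        if name not in system_dictionary:
            system_dictionary[name] = make()
            installed.append(name)
    return installed


# Two of the ghosts described above, as placeholders.
factories = {
    "TextConstants": dict,  # empty Dictionary for early accesses
    "ScheduledControllers": lambda: HeapObject("ControlManager"),
}
system_dictionary = {"Smalltalk": HeapObject("SystemDictionary")}
installed = preallocate_globals(system_dictionary, factories)
```

Using factories instead of ready-made objects keeps the table declarative: nothing is allocated for a global that already exists.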

When It Ran Clean

The hardest part was not finding the ghosts — it was determining the order in which to run every init expression. Each expression had direct and indirect dependencies: globals it read, class variables it expected to be populated, pool dictionaries it needed to exist. Some dependencies were obvious from the code. Others were buried three method calls deep — an init expression calls initialize, which sends defaultFont, which accesses TextConstants, which hasn’t been created yet. Mapping these transitive dependencies and finding a valid execution order was the most time-consuming work of this entire phase.
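The "three method calls deep" problem is a reachability question over the graph of message sends. The sketch below shows the idea in Python under simplifying assumptions: sends and direct_reads are hypothetical hand-built tables keyed by selector, and the function name is invented for the example. Because Smalltalk sends are dispatched dynamically, a table keyed by selector alone over-approximates what actually runs, which is the safe direction for ordering purposes.

```python
def transitive_global_reads(start, sends, direct_reads):
    """Collect every global read by start or by anything it can send.

    sends maps a selector to the selectors its method body sends.
    direct_reads maps a selector to the globals its body reads directly.
    """
    seen = set()
    stack = [start]
    globals_read = set()
    while stack:
        current = stack.pop()
        if current in seen:
            continue  # already explored; also guards against recursion
        seen.add(current)
        globals_read |= direct_reads.get(current, set())
        stack.extend(sends.get(current, ()))
    return globals_read


# The chain from the paragraph above, written out by hand.
sends = {
    "init expression": ["initialize"],
    "initialize": ["defaultFont"],
    "defaultFont": [],
}
direct_reads = {"defaultFont": {"TextConstants"}}
needed = transitive_global_reads("init expression", sends, direct_reads)
```

The resulting set is what belongs in each expression's read set before any topological sort is attempted.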

One bootstrapping constraint worth noting: some init expressions set up exception handling, but exception handling isn’t available until those expressions have run. The sequencing has to be handled carefully. The modified source file includes EventDispatcher initialize and EventDispatcher startEventProcess as top-level expressions — these set up the event loop that the entire UI depends on. Those calls appear after all the class definitions because they must: the class needs to exist before its initialize method can run.

The moment when init expressions ran clean was not dramatic. No single expression’s success meant the whole system worked. It was a gradual shrinkage of the failure list: ten failures, then seven, then three, then one, then zero.

The zero happened in late October. All init expressions executed without errors. The SystemDictionary contained all the globals the system expected. Class variables were initialized. Pool dictionaries existed and contained the right values.

Running the GC after the init expressions was still clean: nothing was collected, which meant every newly created object was reachable.
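That check amounts to a mark pass from the roots followed by a count of whatever was not marked. The collector's actual design is not described in this part, so the following Python sketch is only a model of the idea: the heap is represented as a mapping from object id to the ids it references, and unreachable_objects is a hypothetical name.

```python
def unreachable_objects(heap, roots):
    """Return the ids of heap objects that no root can reach.

    heap maps an object id to the list of object ids it references.
    roots is an iterable of object ids treated as always live.
    """
    marked = set()
    stack = list(roots)
    while stack:
        oid = stack.pop()
        if oid in marked:
            continue
        marked.add(oid)
        stack.extend(heap.get(oid, ()))
    return sorted(set(heap) - marked)


# Tiny model heap: object 4 is not referenced from anywhere.
heap = {1: [2, 3], 2: [3], 3: [], 4: [1]}
garbage = unreachable_objects(heap, roots=[1])
```

An empty result after the init expressions corresponds to the clean run described above: everything the expressions created is linked into the live graph.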

The system was now, for the first time, in a state that resembled a real Smalltalk-80 environment. The class hierarchy was present and linked. Methods were compiled and installed. Global state was initialized. The interpreter could execute arbitrary Smalltalk code.

What This Reveals About Historical Source

The ghost variable problem is not unique to Smalltalk-80. Any software system reconstructed from source code that was filed out of a live system, rather than built from scratch, is likely to have this characteristic. The source reflects the code that was written; it doesn’t reflect the interactive history.

The canonical Smalltalk-80 file-out was produced from a system that had been developed interactively at Xerox PARC over years. The developers set globals, ran experiments, changed things, and checkpointed the binary image. Some of what they did was captured as source code. Some wasn’t. The file-out is a transcript of the source-code parts only.

This is, in a way, the most human aspect of the whole project. The system has archaeological layers: things that were written down, and things that were just done and preserved in the binary checkpoint but never transcribed. Rebuilding the system from source means reconstructing those undocumented layers by reading the code that depended on them and reasoning about what they must have been.

With the heap fully initialized and all init expressions running clean, the system existed. It just couldn’t show you anything yet.