smalltalk-from-scratch — Part 5
The Ghost in the Source File
The Smalltalk-80 file-out is not a complete description of a running system. Some globals only ever existed interactively — they're not in any source file, but the code that uses them was written assuming they exist.
Part 5 of 7. Part 4 covered the bytecode interpreter. This part covers what happened when the interpreter ran real Smalltalk-80 initialization code — and what was missing from the source file that the code assumed would be there.
By late October 2025, I had a heap builder and an interpreter. The heap was correctly linked — zero garbage collected at cold start. The interpreter passed 188 block context tests. It was time to run the system’s own initialization code and see what happened.
The Smalltalk-80 source file contains several hundred free-standing expressions scattered throughout the class definitions. These aren’t method definitions — they’re imperative statements. From the actual file:
Character initialize
Symbol initialize
Cursor initialize
Date initialize
CompiledMethod initialize
These are class-side initialize calls — each one sets up global tables, pool dictionaries, and class variables that the rest of the system depends on. There are also direct global assignments like Processor := ProcessorScheduler new and class comment registrations that set up the SystemOrganization.
In a running Smalltalk system, these execute as they’re encountered during file-in, setting up global variables, configuring class variables, and registering objects in system dictionaries.
I had compiled all of them into the JSON alongside the class definitions, preserving source order. The plan was to convert each one into a Smalltalk doIt expression at the end of cold start and execute it. Running 300-odd real Smalltalk expressions against the live heap would validate the interpreter against something more complex than unit tests: actual system initialization code.
This was the right plan. The execution was painful.
The Ordering Problem
The first category of failure was ordering. The source file was written to be filed into a running Smalltalk system, which processes expressions as they appear and has a live class hierarchy already in memory. The expressions are not ordered by dependency — they’re ordered by appearance in the file, which reflects how the file was assembled, not which expressions need to run before which.
A concrete example: the expression that initializes TextConstants (a pool dictionary used by the Text class) appears after the class definitions that reference it. But some of those classes, when their initialize methods run, try to access TextConstants. If the expression that populates TextConstants hasn’t run yet, the access finds nil and the initializer crashes.
The solution was a dependency analysis pass: identify which global variables each expression reads and writes, topologically sort the expressions so writes precede reads, and execute in the resulting order. This worked for most expressions.
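Here is a minimal sketch of that kind of pass in Python. It is illustrative rather than the loader’s actual code: the InitExpr record, its field names, and the assumption that each expression has already been annotated with the globals it reads and writes are all hypothetical. The sort itself uses graphlib.TopologicalSorter from the standard library.

```python
from dataclasses import dataclass, field
from graphlib import TopologicalSorter


@dataclass(frozen=True)
class InitExpr:
    """Hypothetical record for one top-level expression from the file-out."""
    ident: str                      # stable identifier, e.g. position in the file
    source: str                     # the Smalltalk text of the expression
    reads: frozenset = field(default_factory=frozenset)    # globals it reads
    writes: frozenset = field(default_factory=frozenset)   # globals it writes


def order_by_dependency(exprs):
    """Return exprs ordered so every writer of a global precedes its readers."""
    by_ident = {e.ident: e for e in exprs}

    # Map each global name to the expressions that write it.
    writers = {}
    for e in exprs:
        for name in e.writes:
            writers.setdefault(name, []).append(e.ident)

    sorter = TopologicalSorter()
    for e in exprs:
        sorter.add(e.ident)         # register even if it has no dependencies
        for name in e.reads:
            for w in writers.get(name, []):
                if w != e.ident:    # ignore an expression depending on itself
                    sorter.add(e.ident, w)   # e must run after w

    # static_order() raises graphlib.CycleError if the graph has a cycle.
    return [by_ident[i] for i in sorter.static_order()]
```

One design note: a plain topological sort makes no promise about the relative order of expressions that don’t depend on each other, so a real loader that wants to stay close to source order would need an explicit tie-break.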
But some dependencies were circular: expression A creates an object that expression B needs, but expression B also produces something expression A needs. For these, manual intervention was required: examining the specific dependency, figuring out which direction the dependency actually ran at runtime (often one of the apparent dependencies was actually on a default value that existed without initialization), and adjusting the ordering accordingly.
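A sketch of how that manual step might be supported, again in Python and again hypothetical: deps is an assumed mapping from an expression identifier to the identifiers it appears to depend on, and ignored_edges is a hand-maintained set of apparent dependencies that turned out not to matter at runtime. graphlib reports the offending cycle in the second element of the CycleError arguments.

```python
from graphlib import CycleError, TopologicalSorter


def sort_with_overrides(deps, ignored_edges=frozenset()):
    """Topologically sort deps, dropping edges a human has judged spurious.

    deps maps an expression id to the ids it appears to depend on.
    ignored_edges holds (dependent, dependency) pairs to leave out.
    """
    sorter = TopologicalSorter()
    for node, preds in deps.items():
        sorter.add(node)
        for p in preds:
            if (node, p) not in ignored_edges:
                sorter.add(node, p)
    return list(sorter.static_order())


def find_cycle(deps):
    """Return the ids forming one cycle, or None if the graph is acyclic."""
    try:
        sort_with_overrides(deps)
    except CycleError as err:
        return err.args[1]          # list of nodes; first and last are the same
    return None
```

The workflow this supports is: call find_cycle, read the two or three expressions it names, decide which apparent dependency is really on a default value, add that pair to ignored_edges, and sort again.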
The more significant problem was simpler to describe and harder to solve.
The Incomplete Record
A Smalltalk-80 file-out is a snapshot of source code. It contains the class definitions and methods that were explicitly filed out from a running system. But a living Smalltalk system accumulates state through interactive development: a programmer opens a Workspace, evaluates an expression, creates a global variable, assigns it a value, and continues working. The system’s binary image checkpoint (the normal shutdown mechanism) preserves this state. The file-out does not.
There are globals in the Smalltalk-80 system that have no initialization expression anywhere in the source file. They exist in every binary image of the running system. They were set interactively, probably by the original Xerox PARC developers during the system’s development, and the interactions were never turned into source code. The file-out was produced from a live system that had those variables in the expected state. The code that uses them was written assuming they exist. But the source file says nothing about how to create them.
When cold start builds the heap from the file-out, these variables are absent. The SystemDictionary has no entry for them. When initialization code runs and tries to access them, it gets nil — or more commonly, it tries to send a message to nil and crashes with a doesNotUnderstand: error, because nil doesn’t respond to whatever the code expected.
Each of these failures was an investigation. The error points to a specific line of Smalltalk code that failed. Reading that code tells you what variable it was expecting and what type of object it expected to find there. Reading the Blue Book tells you what the variable’s intended purpose is. Cross-referencing with the class definitions that use it tells you what it should contain. Then a fix is added to the cold-start loader to pre-create that object and install it in the SystemDictionary before the init expressions run.
The Ghosts
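Here are some of the ghost variables I found, and what they turned out to be. Not all of them are ghosts in the strictest sense: a few do have an initializer in the source file, but it runs too late or depends on state that doesn’t exist yet, so at the point of use they are just as absent.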
TextConstants is a pool dictionary — a shared namespace of constants used by the text system. The Text class, the Paragraph class, and several others all reference variables like Bold, Italic, and Underlined that should be found in TextConstants. In a running system, TextConstants is a Dictionary containing these values. In the file-out, the expressions that initialize TextConstants exist, but they run late and some classes that reference it run their own initializers earlier. The fix was to pre-allocate TextConstants as an empty Dictionary during cold start so that early accesses found a valid object.
ScheduledControllers is the global that manages the list of active windows. The MVC framework’s controller loop checks this variable to decide which window is currently active. In the source file, there is no expression that creates ScheduledControllers — it was created interactively when the original system first set up its window management infrastructure. The fix was to create it during cold start as a ControlManager instance and install it in the SystemDictionary.
SingleCharSymbols is a class variable of the Symbol class that holds pre-allocated Symbol instances for every single-character string. The Symbol class has an initialize method that’s supposed to create this table, but the method assumes certain other initialization has already happened. In the source file ordering, this initialize ran too early. The fix was to create the character symbol table explicitly during a cold-start phase that ran after all the prerequisite initialization.
The ProcessorScheduler singleton is arguably the most important ghost. The Processor global should be an instance of ProcessorScheduler — the object that manages Smalltalk’s cooperative multi-process scheduling. In the source file there is an expression Processor := ProcessorScheduler new, but ProcessorScheduler new requires a working process scheduler to already exist. This circular dependency was resolved by creating a minimal ProcessorScheduler instance during cold start and using it to bootstrap the full scheduler initialization.
Pool dictionary variables showed the same pattern in several classes. Pool dictionaries are shared namespaces: a group of classes can declare that they share a pool dictionary, and variables in that dictionary are accessible to all of them without being instance or class variables of any one class. In a running system, pool dictionaries are created when a class is first loaded. In cold start, the classes exist but the pool dictionaries are created by init expressions, and some class initialization methods run before those expressions.
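The pre-creation fixes in this section share one shape, sketched here in Python. Everything in it is a stand-in: the SystemDictionary is modeled as a plain dict, HeapObject is a hypothetical placeholder for a heap allocation, and the GHOSTS table lists only the two cases above whose fix is stated outright (an empty dictionary for TextConstants, a ControlManager instance for ScheduledControllers).

```python
class HeapObject:
    """Hypothetical stand-in for an object allocated on the Smalltalk heap."""

    def __init__(self, class_name):
        self.class_name = class_name


# Name of each ghost global -> zero-argument factory for its placeholder.
GHOSTS = {
    # Empty pool dictionary; the real init expressions fill it in later.
    "TextConstants": lambda: {},
    # The global the MVC controller loop consults for the active window.
    "ScheduledControllers": lambda: HeapObject("ControlManager"),
}


def install_ghosts(system_dictionary, ghost_factories):
    """Pre-create missing globals before init expressions run.

    Existing entries are left alone, so a global that some earlier phase
    already created is never clobbered. Returns the names installed.
    """
    installed = []
    for name, make in ghost_factories.items():
        if name not in system_dictionary:
            system_dictionary[name] = make()
            installed.append(name)
    return installed
```

Keeping the ghosts in one table, rather than scattering them through the loader, also gives a single place to record why each one exists.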
The Skip List
Not every init expression could be made to run cleanly. Some required an interactive GUI to be running. Some assumed a working file system at a specific path. Some initialized Smalltalk’s change-tracking mechanism, which requires a writable changes file.
For these, I maintained a skip list: a set of expression identifiers that would be omitted during cold start. The skipped expressions were either ones whose functionality wasn’t needed for the core system to run (the change-tracking system, the graphical object inspector, certain benchmark utilities) or ones whose state the runtime would set up differently (window management, which depended on the display system being initialized first).
The skip list grew over several weeks as each new class of failure was diagnosed. At its peak it contained around thirty expressions. As more cold-start initialization was added, the skip list shrank — some expressions could be removed from it when their prerequisite state was correctly established by earlier cold-start phases.
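In outline, a runner with a skip list looks something like the following Python sketch. The function name, the (ident, source) pair format, and the evaluate callback (standing in for “hand this doIt to the interpreter”) are assumptions made for illustration, not the project’s actual interfaces.

```python
def run_init_expressions(exprs, skip_ids, evaluate):
    """Run each (ident, source) pair in order unless its ident is skipped.

    Failures are collected rather than aborting the run, so one pass
    produces the complete list of what still needs diagnosing.
    Returns (failures, skipped).
    """
    failures = []
    skipped = []
    for ident, source in exprs:
        if ident in skip_ids:
            skipped.append(ident)
            continue
        try:
            evaluate(source)        # hypothetical hook into the interpreter
        except Exception as err:    # broad on purpose: any failure is data here
            failures.append((ident, source, repr(err)))
    return failures, skipped
```

Collecting failures instead of stopping at the first one is what makes the failure list something you can watch shrink from run to run.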
One broader implication: the init expression system has to work before event handling can work, because event handling is itself set up by init expressions. The modified source file includes EventDispatcher initialize and EventDispatcher startEventProcess as top-level expressions; these set up the event loop that the entire UI depends on. Those calls appear after all the class definitions because they have to: the class needs to exist before its initialize method can run. The sequencing is always the constraint.
When It Ran Clean
The moment when the init expressions ran clean was not dramatic. No single expression, once it passed, meant the whole system worked. The failure list just shrank gradually: ten failures, then seven, then three, then one, then zero.
The zero happened in late October. All init expressions that weren’t on the skip list executed without errors. The SystemDictionary contained all the globals the system expected. Class variables were initialized. Pool dictionaries existed and contained the right values.
Running the GC after the init expressions was still clean: zero objects collected. Everything the init expressions had created was reachable.
The system was now, for the first time, in a state that resembled a real Smalltalk-80 environment. The class hierarchy was present and linked. Methods were compiled and installed. Global state was initialized. The interpreter could execute arbitrary Smalltalk code.
What This Reveals About Historical Source
The ghost variable problem is not unique to Smalltalk-80. Any system whose source was produced by filing out from a live image, rather than written to build the system from scratch, will show it when someone tries to reconstruct the system from that source. The source reflects the code that was written; it doesn’t reflect the interactive history.
The canonical Smalltalk-80 file-out was produced from a system that had been developed interactively at Xerox PARC over years. The developers set globals, ran experiments, changed things, and checkpointed the binary image. Some of what they did was captured as source code. Some wasn’t. The file-out is a transcript of the source-code parts only.
This is, in a way, the most human aspect of the whole project. The system has archaeological layers: things that were written down, and things that were just done and preserved in the binary checkpoint but never transcribed. Rebuilding the system from source means reconstructing those undocumented layers by reading the code that depended on them and reasoning about what they must have been.
With the init expressions running clean and the heap fully initialized, it was finally time to make the system visible.