Undeclared Variable Reparation, An Epic Journey In a Compiler – Part I

In this series of posts, I present how the current implementation of
Pharo handles compilation errors on undeclared variables and the
interactive reparation to fix them. Targeted readers are people
interested in compilers or object-oriented programming. Non-Pharo
developers are welcome since knowledge of the language or the development
environment is not required. Some parts of Pharo are explained when
needed in the article.

We illustrate with a small and specific corner case of the code
editing and compilation subsystems of Pharo. It shows how complex
software has to deal with complex situations, requirements, usage and
history, and why design choices matter.

Disclaimer: some parts of the presented code can be qualified as
“awesome”, where “awe” still means “terror”. Maybe I should rename the
article “The Code of Cthulhu” or something, but I’m often bad at
names.

The first and the second parts are a deep-down journey. We start from the GUI and go down (go up?) in the call stack, with very few shortcuts or branching. Explanations, comments, and discussion are given during the visit.

Note also that the presented code is the one of Pharo11 and that most issues should be solved (or are being worked on) for Pharo12. The meta-issue that tracks my work in progress is available at https://github.com/pharo-project/pharo/issues/12883 (warning, it contains spoilers).

Special thanks go to Hugo Leblanc for his thorough review.

Undeclared Variables

Compiling a method in Calypso (the current class browser), in
StDebugger (the current debugger) or in any place that accepts the
edition and installation of methods is an everyday task of Pharo
developers, and most of the time an everyminute task. It’s something
Pharoers do naturally without thinking much about it (possibly to
preserve their own sanity).

One specific picturesque experience is having a menu window pop up
when trying to compile code that contains an undefined variable. The
presented menu contains various options depending on the variable name
and the context: new temporary variable (Pharo name for “local
variable”), new instance variable (Pharo name for “attribute” or
“field”), new class if the name starts with an uppercase letter, and some
proposals of existing variables (local, global or other) with a similar
name in case of an obvious typing error. Selecting one or the other of
these options updates the code in the editor and resumes the compilation
(or pops up a similar menu if some other undefined variable
remains).

Note that in Pharo, variables can also remain undeclared, for a lot
of good reasons, but it is a story for another day.

Let us illustrate with a single concrete scenario used in this
article’s first parts. You are in a Calypso editor, on the instance
side, on a class Foo trying to implement a new method
bar.

bar
    baz := 42

The method might not be finished yet and baz is not even
declared, but let’s install it with a classic Ctrl-S
(accept). We get the menu window “Unknown variable: baz please
correct, or cancel:” with some choices:

“Declare new temporary variable”;
“Declare new instance variable”;
“Cancel”;
and also an additional “Cancel” button.

We select the first option (temporary variable) and the code is
automatically repaired as

bar
    | baz |
    baz := 42

The method is also compiled, installed in the class Foo,
and fully usable.

Note: the | thing is the Pharo syntax to declare
temporary variables (i.e. local variables).

Part I – Falling Down the Rabbit Hole

Let’s try to understand what just happened. Is the whole thing
(black) magic or simple object-oriented (black) design?

This first post goes down from the compilation request to the menu. The
next post will be about code repair.

We have the Calypso window and its nested text editor component. I
skip the complex graphical UI sequence of calls — there are some
observer design patterns and even a sub-process forked (Pharo processes
are, in fact, green threads) — and for the sake of simplicity and
without loss of generality, I start the story at
ClyMethodCodeEditorToolMorph>>#applyChanges.

`ClyMethodCodeEditorToolMorph>>#applyChanges`

Note: ClyMethodCodeEditorToolMorph>>#applyChanges
means the method applyChanges of the class
ClyMethodCodeEditorToolMorph, where Cly stands
for Calypso, the name of the tool, and Morph
is the name of the low-level graphical toolkit currently used by Pharo.
So, basically, the current receiver of the method (self) is
a graphical window.

I do not show the full code of the method. The interesting statement
is:

selector := methodClass
    compile: self pendingText
    classified: editingMethod protocol
    notifying: textMorph.

that is a message send (method invocation) of the selector (method
name) compile:classified:notifying: because, in Pharo, and
in most other Smalltalk dialects, arguments can be syntactically placed
inside the name of the method to invoke.
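
As a small illustration of keyword messages, the following playground snippet is a sketch that is not taken from the compiler code: the dictionary and its content are only there for the example. It sends a two-keyword message and asks two selectors for their number of arguments.

```smalltalk
| dict |
dict := Dictionary new.
"One single message whose selector is at:put: and that takes two arguments."
dict at: #baz put: 42.
"A selector is a Symbol; each colon marks the place of one argument."
#at:put: numArgs.                          "2"
#compile:classified:notifying: numArgs.    "3"
```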

The method asks the class to compile and install a new method.
Receiver and arguments are:

methodClass is, here, the class Foo (an instance
of Foo class, which inherits from ClassDescription, the class that
implements the called method
compile:classified:notifying:).
self pendingText is the full source code (an instance
of the Text class).
editingMethod protocol is the selected protocol (group
of methods) in which to put the new method. It is nil here, so the
method might remain unclassified, not a big deal.
textMorph is the graphical component (widget) that
corresponds to the part of the tool that contains the source code
editor. Here, we have an instance of RubScrolledTextMorph,
which is the common morph widget to represent an editable text area.

Now, why would the compiler need to know about some internal UI
component? Well, we shall see.

`ClassDescription>>#compile`

ClassDescription>>#compile:classified:notifying:
eventually calls
ClassDescription>>#compile:classified:withStamp:notifying:logSource:
that adds two new parameters:

changeStamp that is the current time and date (as a
String, not a DateAndTime)
logSource a Boolean flag set to true.

The important statement of this method is:

    method := self compiler
        source: text;
        requestor: requestor;
        failBlock:  [ ^nil ];
        compile.

Where

self compiler returns a new compiler instance, already
configured to compile a method of the class Foo and with
the default environment (Smalltalk globals, the big
dictionary of global variables and constants of the system that,
especially, contains all the class names and their associated class
objects).
text is the source code of the method to compile.
requestor is the RubScrolledTextMorph
instance (the UI component).
[ ^nil ] is the on-error block, which the
compiler (or one of its minions) might use in case of a fatal error.
Note: passing blocks (somewhat equivalent to lambdas in other languages)
is a popular Pharo way to deal with error management. Here, evaluating
the block might unwind many methods in the call stack and force the
method
ClassDescription>>#compile:classified:withStamp:notifying:logSource:
to return nil because ^ means
“return” (this one is called a “non-local return” in Pharo
parlance).
Finally, compile starts the real compilation
work.
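
The non-local return can be sketched with two methods of a hypothetical class FailBlockDemo (the class and both selectors are made up for this illustration; they are not part of Pharo). The block is created in run, so evaluating it anywhere deeper in the call stack makes run itself return nil.

```smalltalk
"Hypothetical class FailBlockDemo, a plain subclass of Object."

FailBlockDemo >> run
    "The fail block is created here, so its ^ returns from this method."
    self worker: [ ^ nil ].
    ^ #notReached

FailBlockDemo >> worker: aFailBlock
    "Plays the role of the compiler: on a fatal error, evaluate the fail block."
    aFailBlock value.
    ^ #notReachedEither
```

With these two methods installed, FailBlockDemo new run answers nil: neither #notReached nor #notReachedEither is ever returned.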

`OpalCompiler>>#compile`

The Pharo compiler class is named OpalCompiler and the
invoked method is simply OpalCompiler>>#compile. Here
is the full body of the method:

compile
    ^[
        self parse.
        self semanticScope compileMethodFromASTBy: self
    ] on: SyntaxErrorNotification do: [ :exception |
            self compilationContext requestor
                ifNotNil: [
                        self compilationContext requestor
                            notify: exception errorMessage , ' ->'
                            at: exception location
                            in: exception errorCode.
                    ^ self compilationContext failBlock value ]
                ifNil: [ exception pass ]]

Wow. It looks scarier than it is.

^[ aaaa ] on: SyntaxErrorNotification do: [ :exception | bbbb ]
means return (^) the result of aaaa but if an
exception SyntaxErrorNotification occurs, return the result
of bbbb (where exception is the exception
object; : and | are simply the Pharo syntax
for block parameters). Exceptions are another popular Pharo way to deal
with error management.

Note: the name SyntaxErrorNotification hints that this
exception is special; it is a Notification. We discuss them
in a few sections. The management of syntax errors in Pharo also
deserves its own story (involving adventures, characters and plot
development).

The job of self parse is simple; it calls the
parser, does the semantic analysis and tries to produce a valid
annotated AST of the given source code, or might fail trying if there is
a syntax or a semantic error in the provided code.
self semanticScope compileMethodFromASTBy: self is
more straightforward than the statement suggests. It transforms the AST
into Pharo bytecode (maybe a story for another day) and produces the
result of the compilation as an instance of CompiledMethod.
CompiledMethod is a very important class, as its instances
are natively executable by the Pharo Virtual Machine.
self compilationContext requestor ifNotNil: is a
simple if that checks (when a
SyntaxErrorNotification occurs, since we are in the
do: block of the exception syntax) if the requestor is not
nil. Here the requestor is the
RubScrolledTextMorph object, so not nil. The method
RubScrolledTextMorph>>#notify:at:in: is called and is
used to present the error to the user.
Then self compilationContext failBlock value invokes
the failBlock (it is [ ^nil ] from the
previous section) that terminates the method invocation.
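
To see the on:do: construct in isolation, here is a minimal playground sketch. It uses ZeroDivide, a standard error class, instead of SyntaxErrorNotification, and the variable names are only for the example.

```smalltalk
| normal recovered |
"No exception: the value of the protected block is answered."
normal := [ 6 * 7 ] on: ZeroDivide do: [ :exception | -1 ].
"An exception is signaled: the value of the handler block is answered instead."
recovered := [ 1 / 0 ] on: ZeroDivide do: [ :exception | -1 ].
normal.      "42"
recovered.   "-1"
```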

Here, we get part of the answer to our design question: The compiler
has the responsibility to explicitly call the text editor (if any) to
present an error message. It might not be the best design decision,
since it is difficult to argue that the compiler’s responsibility is to
notify UI components in case of errors. This is especially true here, since there are
two levels of error management (an exception and a fail block) that
Calypso could have used to manage errors and decide for
itself how to report them to the user.

We can also notice the string '->' that is
systematically concatenated at the end of the error message associated
with the caught exception. Why? Because Calypso, for historical reasons,
presents the error message as an insertion directly in the text area in
the editor, in front of the location of the error. For instance, the
syntax error in the code 1 + + 3 (we assume the 2 was
fumbled) appears as
1 + Variable or expression expected ->+ 3 in the
editor.

It’s a second bad design decision, as not only is the compiler
responsible for calling the editor, but it also makes some presentation
decisions. In fact, the alternative code editor component, provided in
the Spec2-Code package, strips the ->
string before presenting the error in its own and less intrusive way.
See SpCodeInteractionModel>>#notify:at:in:.

`OCASTSemanticAnalyzer>>#undeclaredVariable:`

Now we enter the classical compilation frontend work: scanning
(lexical analysis, done by RBScanner), parsing (syntactic
analysis, done by RBParser) and finally the semantic
analysis (done by OCASTSemanticAnalyzer, the Opal Compiler
AST Semantic Analyzer).

Our input, the source code of the bar method, is quite
simple and everything is fine until the semantic
analysis. There, the variable name baz is analyzed by
OCASTSemanticAnalyzer>>#visitAssignmentNode: (as a
nice compiler, it processes its AST with visitors), which calls
OCASTSemanticAnalyzer>>#resolveVariableNode:. That method
cannot resolve baz and thus calls
OCASTSemanticAnalyzer>>#undeclaredVariable:, whose
responsibility is to deal with undeclared
variables.

Note: resolving variables can be a complex task because, in Pharo,
methods and expressions can be used in various contexts with, sometimes,
particular rules. For instance, the playground (workspace) has some
specific variables lazily declared; and the debugger has to deal with
methods currently executed, thus runtime contexts (frames) that require
a non-trivial binding process. Under the hood, the requestor can also be
involved in such symbol resolution. However, I chose to skip this
complexity in this article.

Here is the source code of
OCASTSemanticAnalyzer>>#undeclaredVariable:

undeclaredVariable: variableNode
    compilationContext optionSkipSemanticWarnings
        ifTrue: [ ^UndeclaredVariable named: variableNode name asSymbol ].
    ^ OCUndeclaredVariableWarning new
        node: variableNode;
        compilationContext: compilationContext;
        signal

If we are in the specific mode optionSkipSemanticWarnings,
the method just resolves the name as a special undefined variable. Since it’s not the
case currently, I won’t give more detail (yet).

What follows is more interesting.

OCUndeclaredVariableWarning is a subclass of
Notification, a basic class of the kernel of the Pharo
language that is a subclass of Exception (the same kind of
exception we discussed in the previous section). Exceptions in Pharo
work more or less like what you get in many other programming languages.
You catch them with the on:do: method of blocks (that we
have already explained) and throw them with the signal
method.

What is noticeable here is the ^ (a return) in
front of the exception signaling. Notification is a
special kind of Exception that has the ability to be
resumed. Once resumed, the execution of the program continues after the
signal message send. The second special feature of
Notification is that when unhandled (no on:do:
catches it and the notification “goes through” the whole call stack)
then signal has no particular effect and just returns
nil. This is explicit in the method
Notification>>#defaultAction:

defaultAction
    "No action is taken. The value nil is returned as the value of
    the message that signaled the exception."

    ^nil

In summary, Notification instances are just
notifications; if nothing cares, then signal has no
effect.
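
These two features can be checked in a playground. The snippet below is a sketch that uses the plain Notification class rather than OCUndeclaredVariableWarning, and the values 41 and 42 are arbitrary.

```smalltalk
| unhandled resumed |
"Nobody catches the notification: signal just answers nil."
unhandled := Notification new signal.
"A handler resumes the notification: signal answers the resumption value
and the protected block continues after the signal."
resumed := [ Notification new signal + 1 ]
    on: Notification
    do: [ :notification | notification resume: 41 ].
unhandled.   "nil"
resumed.     "42"
```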

Let’s go back to
OCASTSemanticAnalyzer>>#undeclaredVariable:. A
notification OCUndeclaredVariableWarning is signaled, and
if some method in the call stack cares and catches the notification, it
can choose to do something and possibly resume the execution with a
Variable object that shall be used to bind
baz.

Is this design decision sound? Let’s discuss this.

There are some drawbacks in the use of such notifications. First, the
link between the signaler
(OCASTSemanticAnalyzer>>#undeclaredVariable:) and the
potential catchers is indirect in the code: it is circumstantial.
Second, a given catcher might unwarily catch a notification it did not
expect (from another compiler, for instance), especially with
Notification because they are silent by default. But the
advantage is that some grandparent callers have more latitude to set up
the kind of execution environment they require and deal with potential
notifications. We shall explore this possibility later.

An alternative design could be callback based: give the compiler some
objects to call when such decisions have to be made. It could be a block
(lambda) or, for instance, the requestor since we already have one. This
design has the advantage of making the subordination relationship more
obvious in the code, but it might require more management (to store and
pass objects around).
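
A toy version of this callback-based design can be sketched with plain blocks. Everything here is hypothetical (the declared dictionary and the resolve and onUndeclared blocks do not exist in Opal); the point is only that the resolver explicitly calls an object it was given when a name is unknown.

```smalltalk
| declared resolve onUndeclared |
"Toy symbol table: variable names and their bindings."
declared := Dictionary new.
declared at: #foo put: 'binding of foo'.
"Callback supplied by the client: here it declares the missing variable."
onUndeclared := [ :name | declared at: name put: 'new binding of ' , name ].
"Toy resolver: it asks the callback only when the name is unknown."
resolve := [ :name |
    declared at: name ifAbsent: [ onUndeclared value: name ] ].
resolve value: #foo.   "existing binding, the callback is not called"
resolve value: #baz.   "the callback is called and declares baz"
```

The subordination is visible in the code: the resolver knows exactly which object it delegates to, at the price of having to receive and keep that object.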

Another approach could be to have a set of alternative
behaviors in the compiler that can be activated or configured by the
client (with boolean flags, for instance). This offers a certain control
by the client (that sets up the configuration) and gives the
responsibility of implementing them to the compiler. The drawbacks are
that the effect of flags is limited and that the space of available
combinations on configuration can become large with possible complex
interactions or conflicts.

Yet another approach could be to silently use a placeholder for the
variable baz (let’s call it
UndeclaredVariable), then continue the compilation and
produce a CompiledMethod instance as the result of the
compile method. The caller is then free to inspect this
CompiledMethod instance, detect the presence of undeclared
variables, then choose to act. The obvious issue is that maybe the
compilation (including bytecode generation) was just done for nothing,
wasting precious CPU time and watts. The advantage is that the compiler
is simpler (no need to try to repair or even report errors) and that the
caller can easily manage multiple error conditions at the same time,
whereas the two other approaches basically force the caller to solve
each error situation one by one.

Readers might look again at the
optionSkipSemanticWarnings at the beginning of the method
and realize that it feels like these last two alternatives are
implemented here. UndeclaredVariable instances are a real thing and,
for instance, are used when source code is analysed for highlighting.
They are also used in two other cases:
package loading (because cycles in dependencies are hard) and code
invalidation (because you can always remove classes or instance
variables).

`OCUndeclaredVariableWarning>>#defaultAction`

So, since baz is not declared,
OCASTSemanticAnalyzer signals an
OCUndeclaredVariableWarning hoping that something can catch
it with the task to provide a Variable object to be bound
to the name baz.

But in the scenario, the notification is not caught by anyone. Is
nil associated with baz? This is not what we
need, nor what
OCASTSemanticAnalyzer>>#resolveVariableNode: needs, by the
way.

The answer is in
OCUndeclaredVariableWarning>>#defaultAction (see code
below) which overrides the default
Notification>>#defaultAction that is shown in the
previous section.

defaultAction
    | className selector |
    className := self methodClass name.
    selector := self methodNode selector.

    NewUndeclaredWarning signal: node name in: (selector
        ifNotNil: [className, '>>', selector]
            ifNil: ['<unknown>']).

    ^super defaultAction ifNil: [ self declareUndefined ]

The first part just creates a system notification. You can see them
in the Transcript (basically the system console of Pharo),
or in the standard output in command line mode (search them in the build
log produced by Jenkins, they are numerous, thus hard to miss).

The second part delegates to the superclass and, if the superclass
does not care, falls back to
OCUndeclaredVariableWarning>>#declareUndefined, that
is:

declareUndefined
    ^UndeclaredVariable registeredWithName: node name

So we get an UndeclaredVariable object, which shall make
OCASTSemanticAnalyzer happy since it is a very acceptable
thing to bind baz to.

`OCSemanticWarning>>#defaultAction`

The superclass of OCUndeclaredVariableWarning is
OCSemanticWarning. What does it offer?

defaultAction

    compilationContext interactive ifFalse: [ ^nil ].
    ^self openMenuIn:
        [:labels :lines :caption |
        UIManager default chooseFrom: labels lines: lines title: caption]

compilationContext interactive is true if
there is a requestor and it is interactive, false otherwise.
Our requestor is still the instance of RubScrolledTextMorph
and is interactive, so we continue.
UIManager>>#chooseFrom:lines:title: is a standard
abstract UI method to pop up a selection window according to the current
system UI (here MorphicUIManager), or launch a
command-line menu when in command line mode, or even produce a warning
and select the default when in non-interactive mode (asking for things
in non-interactive mode deserves a warning).

What is openMenuIn:? There are 3 implementations:

OCSemanticWarning>>#openMenuIn: (the method
introduction), that just calls self subclassResponsibility.
This is the Pharo way to declare the method abstract (and it signals an
error if executed).
OCShadowVariableWarning>>#openMenuIn: (a subclass
that is not part of the scenario), that just calls
self error: 'should not be called', which also just signals an
error.
OCUndeclaredVariableWarning>>#openMenuIn:, a
large Pharo method of 55 lines that is discussed in the next
section.
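
Both error idioms can be observed from a playground. This is a sketch on a bare Object instance, not on the warning classes themselves, and the variable names and the answered symbols are only for the example.

```smalltalk
| abstractCall explicitError |
"What an abstract method body does: it signals an error when executed."
abstractCall := [ Object new subclassResponsibility. #completed ]
    on: Error
    do: [ :error | #signaledAnError ].
"What self error: does: it signals an error carrying the given message."
explicitError := [ Object new error: 'should not be called'. #completed ]
    on: Error
    do: [ :error | error messageText ].
abstractCall.    "#signaledAnError"
explicitError.   "'should not be called'"
```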

What uses openMenuIn:? There are 2 senders:

OCSemanticWarning>>#defaultAction
(obviously),
OCUndeclaredVariableWarning>>#openMenuIn:. A
recursive call? We shall see.

This leads to some more questions:

Is it reasonable that the compiler cares about the interactiveness
of the requestor? Note that it could have been a recent addition since
most requestors are not aware of that part of the API. See
CompilationContext>>#interactive that uses the
questionable message respondsTo:.
Why such polymorphism if there is only one effective implementation?
Code leftover? Future-proofing?
Why pass a block as an argument if no other sender exists? It seems
superfluous.
Is it the responsibility of a Notification object to
call UI with a menu?

In the next post, we will present the menu, do the reparation and try
to get out of here (the compiler is far away in the call stack) to
finish the compilation successfully.


Published by Jean Privat