Undeclared Variable Reparation, An Epic Journey In a Compiler – Part II

Part II – The Return Journey

undeclvar

Welcome to the next step in the compiler journey.

As a simple recap, we were compiling baz := 42 in a
method bar, except that baz is not declared.
We are currently in
OCSemanticWarning>>#defaultAction, the default action
of an uncaught Notification, that is ready to open a graphical menu by
calling openMenuIn:.

OCUndeclaredVariableWarning>>#openMenuIn:

The method is long; let us review it in small pieces.

openMenuIn: aBlock
    | alternatives labels actions lines caption choice name interval requestor |

A bunch of temporary variables.

    "Turn off suggestions when in RubSmalltalkCommentMode
    This is a workaround, the plan is to not do this as part of the exception"
    requestor := compilationContext requestor.
    ((requestor class name = #RubEditingArea) and: [
        requestor editingMode class name = #RubSmalltalkCommentMode])
                    ifTrue: [ ^UndeclaredVariable named: node name ].

These are some type checks. Type checks are usually bad. Those are
bad.

They prevent the menu thing if the requestor is a
RubEditingArea in a RubSmalltalkCommentMode
“mode”. Where RubSmalltalkCommentMode is used?

  • By ClyRichTextClassCommentEditorToolMorph, which seems
    never used in the system (dead class?)
  • By RubEditingArea>>#beForSmalltalkComment, that
    is called only by FileList, a basic file explorer, but it’s
    not clear when or why the compiler is called by the file explorer, nor
    why a “workaround” is needed here (quite deep) in the compiler
    especially since it’s the only workaround of this type in the whole
    source code for RubEditingArea or
    RubSmalltalkCommentMode.

It could be just dead code, so less problematic: a dead workaround is
a less technical dept than a live one. It also illustrates why type
checks can be bad; it reverses the responsibility:
RubEditingArea and RubSmalltalkCommentMode
here are not involved at all in the workaround, so code evolution
related to one of these two classes might likely miss the present
hack.

Moreover, such a workaround is fragile. The compiler should not care
about specific clients, and especially not care about their names, and
should behave equitably. E.g. imagine renaming classes or using
subclasses of the blacklisted ones, they could likely pass the check and
cause really subtle bugs.

Anyway, let us continue with the method:

    interval := node sourceInterval.
    name := node name.
    alternatives := self possibleVariablesFor: name.
    labels := OrderedCollection new.
    actions := OrderedCollection new.
    lines := OrderedCollection new.

All those are the initialization of the temporary variables.

OCUndeclaredVariableWarning>>#possibleVariablesFor:
provides a list of existing names usable as a replacement (sorted from
the best match to the worst match). See
String>>#correctAgainst:continuedFrom: for the
details and the scoring system.

    name first isLowercase
        ifTrue: [
            labels add: 'Declare new temporary variable'.
            actions add: [ self declareTempAndPaste: name ].
            labels add: 'Declare new instance variable'.
            actions add: [ self declareInstVar: name ] ]

The two first items of the menu. If the name looks like a temporary
or an instance variable, because it starts with a lowercase letter, then
maybe the programmer wants a new temporary or instance variable?

Note that there are two parallel lists, one of the labels (shown in
the menu) and the other of actions (here blocks), that is evaluated if
the user chose the corresponding label.

In the scenario, the option
Declare new temporary variable is selected, so
self declareTempAndPaste: name is eventually called. We
detail it in the next section. For now, we continue to read the
method.

        ifFalse: [
            labels add: 'Leave variable undeclared'.
            actions add: [ self declareUndefined ].
            lines add: labels size.
            labels add: 'Define new class'.
            actions
                add: [
                    [ self defineClass: name ]
                        on: Abort
                        do: [ self openMenuIn: aBlock ] ].
            labels add: 'Declare new global'.
            actions add: [ self declareGlobal ].
            compilationContext requestor isScripting ifFalse:
                [labels add: 'Declare new class variable'.
                actions add: [ self declareClassVar ]].
            labels add: 'Define new trait'.
            actions
                add: [
                    [ self defineTrait: name ]
                        on: Abort
                        do: [ self openMenuIn: aBlock ] ] ].

For names that start with an uppercase, they look like global
variables, and that includes all the named classes, so the proposed
items in the menu are different. A first curiosity, there is the choice
to “leave variable undeclared” that is absent in the previous code
snippet. Another curiosity, defining a new class (or a new trait) opens
a new window, but if, for some reason, the entity creation fails or is
canceled, then a recursive call is used to open the same menu again.

    lines add: labels size.
    alternatives
        do: [ :each |
            labels add: each.
            actions
                add: [
                    ^self substituteVariable: each atInterval: interval ] ].
    lines add: labels size.
    labels add: 'Cancel'.
    caption := 'Unknown variable: ' , name , ' please correct, or cancel:'.

We have the addition of the possible variables (computed at the
beginning of the method), a cancel item, and the window title. The next
last two lines are the fun ones.

    choice := aBlock value: labels value: lines value: caption.
    ^choice ifNotNil: [ self resume: (actions at: choice ifAbsent: [ compilationContext failBlock value ]) value ]
  • aBlock is the parameter of the method, it was more
    than 50 lines ago, so we almost forgot about it. It is always
    [:labels :lines :caption | UIManager default chooseFrom: labels lines: lines title: caption]
    that just calls the UI and returns the index number of the selected item
    starting at 1 (or 0 if the cancel button is used).
  • The selector value:value:value: is used to evaluate
    the block with 3 supplied arguments (it is a Pharo thing, do not
    judge).
  • ^choice ifNotNil: ... returns nil if the choice is
    nil (unlikely according to the API of chooseFrom, but
    better safe than sorry). In the scenario, the first choice is selected
    (declare new temporary variable). Therefore choice is 1, which is not
    nil, so we look at the ifNotNil: part.
  • self resume: cause the signal to finish
    is execution with the given value. Hopefully, a Variable
    object to bind to baz (look back at the section
    OCASTSemanticAnalyzer>>#undeclaredVariable: if you
    need to see the original signal method invocation). Here,
    the call to resume feels superfluous as the result of the
    current method is used as the result of defaultAction that
    is used as the value of the automatic resume call performed
    on Notification objects (see
    UndefinedObject>>#handleSignal:).
  • actions at: choice return the action (the block)
    associated to the corresponding choice number. Ordered collections in
    Pharo are 1-based; therefore 1 is the first block action. Here, the
    block [ self declareTempAndPaste: name ].
  • ifAbsent: is for what to do when there is no
    corresponding action for the given choice. This happens when the user
    chooses the cancel button (no action for 0) or chooses the cancel item
    (no action for 3 in our scenario).
  • compilationContext failBlock value is therefore
    executed on a “cancel”. It evaluates the failBlock that, in the
    scenario, comes from the ClassDescription>>#compile
    method and contains [ ^ nil ] (a non-local return).
    Evaluating this failBlock cause the unwinding of many methods in the
    call stack (something around 30 or 40 frames) and the return of the
    ClassDescription>>#compile method with nil.

    Note that there is a potential weakness here if the failBlock does
    not perform a non-local return, then the result of the block evaluation
    is used as the return of openMenuIn: and eventually used as
    a Variable object to bind baz to. Callers of
    the compiler might forget to do that and just provide
    [nil], for instance (without a ^).

  • value evaluates the action block (since it exists in
    the list), that has the responsibility to provide a
    Variable instance.

OCUndeclaredVariableWarning>>#declareTempAndPaste:

We selected “declare new temporary variable” in the menu, thus
executing this method. We’ll cover this large method (35 lines) piece by
piece.

declareTempAndPaste: name
    | insertion delta theTextString characterBeforeMark tempsMark newMethodNode |

Some temporary variables.

    "Below we are getting the text that is actually seen in the morph.
     This is rather ugly. Maybe there is a better way to do this."
    theTextString := self requestor textMorph editor paragraph text.

Indeed, this is rather ugly. This leads to many
questions:

  • Why is the text (source code) of the method bar
    needed?
  • Why does it assume that the requestor has a textMorph
    method?
  • Why ask for something so deep? Demeter is likely rolling over in its
    grave (it’s a joke on the Law of Demeter. Demeter is not dead
    and is not even a person, it was a project named after the Greek goddess
    of Agriculture).
  • Why? We are still in a (deep) part of the compiler,
    self should have a better way to get the source code
    currently compiled.
    "We parse again the method displayed in the morph.
     The variable methodNode has the first version of the method,
     without temporary declarations. "
    newMethodNode := RBParser parseMethod: theTextString.

Let us take a breath.

We are doing a semantic analysis on an already parsed source code of
a method bar trying to get a variable to bind to
baz. And we parse the full source code again? Don’t we have
it? Just call self node methodNode or something?

The hint might be “without temporary declarations” from the
comment. Does that mean we do not trust the current AST to be genuine?
Why? Maybe the previous interactive code error reparation changed the
current AST? Is this actually true in some possible scenarios? Is this
just leftover code?

Let us just continue… we must continue…

    "We check if there is a declaration of temporary variables"
    tempsMark :=  newMethodNode body  rightBar ifNil: [ self methodNode body start ].

It’s getting warm here, isn’t it?

  • newMethodNode body rightBar gets the position (an
    integer) of the closing | character of the temporary
    variable declaration syntax, or nil if there is no temporary variable
    declared (like in the current scenario) an AST is useful for this task
    since it knows which part of the source code is really a block of
    temporary variable declarations.
  • self methodNode body start is the position (an integer)
    of the beginning of the main body of the method, that position is
    therefore used when there are no declarations of temporary
    variables.
    characterBeforeMark := theTextString at: tempsMark-1 ifAbsent: [$ ].

gets the character before the closing | or before the
main body. The ifAbsent might only occur if the source code
is empty, and the compiler let us progress until here because an empty
method is a syntax error, a method name is minimally needed (the name
(selector) and parameters are called the “method pattern” in Pharo
parlance). But better safe than sorry.

    (theTextString at: tempsMark) = $| ifTrue:  [
        "Paste it before the second vertical bar"
        insertion := name, ' '.

        characterBeforeMark isSeparator ifFalse: [ insertion := ' ', insertion].
        delta := 0.
    ] ifFalse: [

Some temporary variables are declared, and we want to add the new
variable after the last one. The code mainly manages spacing to avoid
concatenating the new variable and a previous one, or injecting
superfluous spaces.

In our scenario, there is no temporary variable (yet), so the
ifFalse: part interests us more.

        "No bars - insert some with CR, tab"
        insertion := '| ' , name , ' |',String cr.
        delta := 2. "the bar and CR"
        characterBeforeMark = Character tab ifTrue: [
            insertion := insertion , String tab.
            delta := delta + 1. "the tab" ]
        ].

Here we prepare the text to insert in the source code and compute a
delta thing, we’ll discuss that later. The code tries to
care about preserving the indentation, if any.

    tempsMark := tempsMark +
        (self substituteWord: insertion
            wordInterval: (tempsMark to: tempsMark-1)
            offset: 0) - delta.

Err… it’s getting cold here, isn’t it?

self substituteWord: insertion wordInterval: (tempsMark to: tempsMark-1) offset: 0
asks to insert the new string in the source code (because the interval
is empty, it is an insertion and not a replacement).

How does
OCUndeclaredVariableWarning>>#substituteWord:wordInterval:offset:
do that? By simply calling #correctFrom:to:with: on the
requestor and doing some math, then doing more math to update
tempsMark.

    " we can not guess at this point where the tempvar should be stored,
    tempvars vs. tempvector therefore -> reparse"
    (ReparseAfterSourceEditing new newSource: self requestor text) signal

And it is the end of the method. I think it’s getting humid here,
isn’t it?

The code is altered. We have no idea what really happened, there is a
new source code in town. The full AST might need to be rebuilt as there
are new potential AST nodes. The semantic analysis might need to be
redone, as the new temporary variable might conflict with other
variables declared further in the code. So at this point, it seems
better to just call it a day and run the compilation again.

It’s the point of the ReparseAfterSourceEditing class
that is a subclass of Notification (we are now experts in
notifications and no afraid no more of them!).

There are still some questions about the behavior of the program and
some of its design decisions:

  • The math thing about offset, delta, and
    tempsMark update is completely unused. Possible leftover of
    previously removed code.
  • Where does the signal on ReparseAfterSourceEditing
    go?
  • Why the new source code should be passed around in the notification?
    We did already update it in the requester.
  • Why does the ugly (indeed)
    self requestor textMorph editor paragraph text at the
    beginning of the method exists, since apparently
    self requestor text give the same damn source code (while
    not ideal, it is still better).
  • What happens when the notification is resumed? The point of a
    notification is to be resumable. Here it clearly appears that such an
    endeavor is not supposed to happen.
  • Why so much coupling?
  • Why so little cohesion?
  • And more specifically, why it is the job of
    OCUndeclaredVariableWarning to perform this menu and string
    based code reparation and hijack the requestor as if there were friends
    in some abusive relationship? Shouldn’t a notification be just a means
    of sending some kind of signal to the previous method in the call
    stack?

There are a lot of symptoms of schizophrenia in the responsibilities
here.

OpalCompiler>>#parse

Where does ReparseAfterSourceEditing go? I
(intentionally) skipped some steps between
OpalCompiler>>#compile and
OCASTSemanticAnalyzer>>#undeclaredVariable:.

Here is the source code of
OpalCompiler>>#parse:

parse
    | parser |
    [
        parser := self createParser.
        ast := self semanticScope parseASTBy: parser.

        ast methodNode compilationContext: self compilationContext.
        self callParsePlugins.
        self doSemanticAnalysis ]
    on: ReparseAfterSourceEditing do: [:notification |
            self source: notification newSource.
            notification retry ].

    ^ ast

The job of this method (as explained in a previous section) is to do
the frontend part of the compilation and produce a fully annotated and
analyzed AST of the method so that (virtual) machine code can be
generated. The content of the method is mostly straightforward.

What is interesting is the on:do: method call used for
exception (and thus, notification) handling. When a
ReparseAfterSourceEditing is intercepted, we update the
source code and run the protected block from the beginning again (see
Exception>>#retry).

Some potential exits of this loop are:

  • The source code is good enough and no more
    ReparseAfterSourceEditing are signaled.
  • The source code is bad enough that either the failBlock
    is invoked, or another unrecoverable exception occurs. Remember
    SyntaxErrorNotification in
    OpalCompiler>>#compile, for instance.
  • Someone is tired enough and terminates the process.

The first alternative is what happens in our scenario:

  • A declaration of the temporary variable baz is added to
    the source code.
  • The content editor window is updated to reflect that.
  • The new code source is fully parsed and correctly analyzed and a
    legitimate CompiledMethod is produced.

What a happy ending!

Published by Jean Privat

Professor at Université du Québec à Montréal. I like programming languages, compilers, and cybersecurity.

%d bloggers like this: