Undeclared Variable Reparation, An Epic Journey In a Compiler – Part III

Part III –
More fun with OCUndeclaredVariableWarning

Foo is a nice class, but could it become nicer if we
added an instance variable?

Let us click on the Foo tab of the Calypso window. We
get some code that declares the class.

Object subclass: #Foo
    instanceVariableNames: ''
    classVariableNames: ''
    package: ''

It is a legal Pharo expression that creates (or updates) a subclass
(named Foo) of the Object class (that is the
almost root of the class hierarchy). Creating or updating the
class (accept with Ctrl-S) simply evaluates the expression.
This is neat.

Unrelated note: there is a small check box Fluid in the
bottom right corner that switches to the modern fluid class
syntax.

Object << #Foo
    slots: {};
    package: ''

It is just a different syntax for almost the same stuff. It is still
neat, although a little less neat because while the syntax is better, it
is no more a sufficient expression to declare or update classes.

The thing is that to evaluate some random source code, we need to
compile it first. The evaluation of the class definition is done by
either
ClySystemEnvironment>>#defineNewClassFrom:notifying:startingFrom:
or
ClySystemEnvironment>>#defineNewFluidClassOrTraitFrom:notifying:startingFrom:
according to the fluid check box.

Both methods are really similar (and could be factorized), except
that one warned the developer in a comment about for now, a super
ugly patch
, so we shall look at the other one.

But let us play with the compiler a little without looking at the
code yet.

Object subclass: #Foo
    instanceVariableNames: +''
    classVariableNames: ''
    package: ''

I added a syntax error since + is a binary operator and
the left operand is missing. We evaluate (accept with Ctrl-S), and we
get the following.

class-err1

Object subclass: #Foo
    instanceVariableNames:  Variable or expression expected ->+''
    classVariableNames: ''
    package: ''

The -> way to present syntax error is very old school
(and I hate it) but that is not the point here. The point is that the
compilation process behaved in a sane and expected way:

  • OpalCompiler>>#compile is somewhat executed (cf
    the previous section for the content of the method);
  • it tries to parse, but a syntax error is detected, so an exception
    SyntaxErrorNotification is signaled;
  • OpalCompiler>>#compile catch the exception;
  • notify:at:in: is sent to the requestor (that injects
    the error message in the source code);
  • the failBlock is executed that terminates the method
    ClySystemEnvironment>>#defineNewClassFrom:notifying:startingFrom:.

Let us try with something else:

Object subclass: #Foo
    instanceVariableNames: baz
    classVariableNames: ''
    package: ''

The source code is an expression, and in expressions we can use
variables. Except that here baz is an undeclared variable,
so what happens when we evaluate?

The code is updated to:

class-err2

Object  Variable or expression expected ->subclass: #Foo
    instanceVariableNames: baz
    classVariableNames: ''
    package: ''

Wow. This is bad, and wrong, and bad again.

  • An error message is baldy placed and wrong and unrelated to
    baz.
  • There is no menu asking us what to do with the undeclared variable
    baz like we saw inside a method.
  • Since I work with a small screen, I also noticed an ephemeral
    notification popup box in the bottom left corner of the screen
    (poetically called a growl in Morphic) that stated the truthful
    information “Undeclared Variable in Class Definition”. Such
    growls are usually displayed by invoking the method
    Object>>#inform:.

ClySystemEnvironment>>#defineNewClassFrom:notifying:startingFrom:

Here is the code source of the method:

defineNewClassFrom: newClassDefinitionString notifying: aController startingFrom: oldClass

    "Precondition: newClassDefinitionString is not a fluid class"

    | newClass |
    [
    newClass := (self classCompilerFor: oldClass)
                    source: newClassDefinitionString;
                    requestor: aController;
                    failBlock: [ ^ nil ];
                    logged: true;
                    evaluate ]
        on: OCUndeclaredVariableWarning
        do: [ :ex | "we are only interested in class definitions"
            ex compilationContext noPattern ifFalse: [ ex pass ].
            "Undeclared Vars should not lead to the standard dialog to define them but instead should not accept"
            self inform: 'Undeclared Variable in Class Definition'.
            ^ nil ].

    ^ newClass isBehavior
          ifTrue: [ newClass ]
          ifFalse: [ nil ]

I won’t dive into all the details of this one. What is interesting is
the on: OCUndeclaredVariableWarning do: part that
intercepts the notification, thus preventing it from reaching the end of
the call stack, thus preventing it from executing its default action,
thus preventing it from displaying a menu, thus preventing the user to
repair the code having a semantic error.

Here we can see how it is possible to intercept the default error
reparation mechanism in case of undeclared variables in a specific
context where a temporary or an instance variable does not make much
sense.

What behavior do we get instead?

  • ex compilationContext noPattern ifFalse: [ ex pass ].
    if noPattern is false (double negation isn’t not bad) then
    process the notification anyway. Except that, here,
    noPattern isn’t unlikely to not be not true (nested
    negations are annoying, aren’t they?) because it is what distinguishes
    the compilation of an expression from the compilation of a method
    definition: a method definition starts with a name (and potential
    arguments) that is called the method pattern. But the first
    statement of the OpalCompiler>>#evaluate method that
    is called it to override noPattern with true, because one
    can only evaluate expressions, not method definitions.
  • self inform: 'Undeclared Variable in Class Definition'
    is responsible for the growl we get.

    You know what I hate? The -> error message
    insertions. You know what I hate more? Inconsistencies. Here, the source
    code error is reported as a (missable) popup with poor information,
    whereas all other errors are reported with ->.

Nevertheless, is the design legitimate? I’m not a fan of exceptions,
they make code comprehension harder and, in my humble opinion, should be
used with great reserve. Here there is also some breach of
encapsulation. But this is debatable. What is less debatable is that the
whole reparation of undeclared variables is bypassed completely,
including some legitimate needs.

For instance, we get no help if we try to evaluate
(Ctrl-S accept) Objectt subclass: #Bar, which
contains too much t in the identifier of the superclass
Object. All we get is an unhelpful growl and the same
wrongful -> error message insertion. As a matter of
fact, as Pharo users, we can bypass the accept (Ctrl-S) behavior and
just evaluate the code in place by selecting all the text (Ctrl-A) then
Doing It (Ctrl-D). The DoIt simply evaluates the
selected source code without (too much) hacking. So there is no
intercepting OCUndeclaredVariableWarning for instance, and
we get the menu “Unknown variable: Objectt please correct, or
cancel
” that presents Object (with a correct amount of
t) in the list of choice. We can select it and we get a
successful class definition and a new available class
Bar.

But why does the code signal a missable popup instead of a classic
text error insertion? Maybe because the
OCUndeclaredVariableWarning that is caught might not come
from the class definition syntax. When a random string is evaluated, it
can do a lot of things, like signaling exceptions. Unfortunately, a
broad error handling mechanism like exceptions has no way to distinguish
exceptions that come from the analysis of the code (syntactic and
semantic error) from the ones that come from the proper evaluation.

And that happens frequently. If you remove an instance variable
(attribute) from a class definition, then accept the new definition, the
system will recompile all the methods of this class. Methods that use
the removed instance variable will also be compiled and signal
OCUndeclaredVariableWarning. That exception will be caught
and growl “Undeclared Variable in Class Definition”. Note that
the message is misleading since the undeclared variable is not in the
class definition. So maybe it was not the original intention of the
growl and was just a random inconsistency.

Let us discuss the last statement of the method. It is a sanity
check. Because the initial class definition expression could have been
heavily edited by the programmer and replaced by anything else, the
method checks that the final result of the evaluation is a class-like
object. Otherwise, nil is returned.

ClyClassDefinitionEditorToolMorph>>#applyChangesAsClassDefinition

OK, we have an explanation for the absence of the menu, and an
explanation for the presence of the growl, but noting here is related to
the wrong -> syntax error insertion. Where does this one
come from?

First, there is no syntax error in the expression (it is a lie!). The
compiler manages to signal an OCUndeclaredVariableWarning
notification (the growl is the proof!) launched by the
OCASTSemanticAnalyzer, meaning that the parser can produce
an AST and not find any syntax error.

So, what is the deal?

  • We are at
    ClySystemEnvironment>>#defineNewClassFrom:notifying:startingFrom:,
  • that is called by
    ClySystemEnvironment>>#compileANewClassFrom:notifying:startingFrom:,
  • that is called by
    ClyFullBrowserMorph>>#compileANewClassFrom:notifying:startingFrom:,
  • that is called by
    ClyClassDefinitionEditorToolMorph>>#applyChangesAsClassDefinition,
  • that goes like this:
applyChangesAsClassDefinition

    | newClass oldText |
    oldText := self pendingText copy.
    newClass := browser
                    compileANewClassFrom: self pendingText asString
                    notifying: textMorph
                    startingFrom: editingClass.

    "This was indeed a class, however, there is a syntax error somewhere"
    textMorph text = oldText ifFalse: [ ^ true ].

    newClass ifNil: [ ^ false ].

    editingClass == newClass ifFalse: [ self removeFromBrowser ].
    browser selectClass: newClass.
    ^ true

We see the invocation of the compilation (the
newClass := browser compileANewClassFrom: thing). Because
it failed, newClass is nil (instead of a class).

What follows is interesting:
textMorph text = oldText ifFalse: [ ^ true ]. This states
that if the text in the code editor was changed during the compilation,
then there was an error. Interesting and sooo wrooong on sooo many
leveeels:

  • Detecting syntax error should not be done thanks to string
    comparison.
  • The method should not assume that error reporting to the user
    changed the source code (even if it is the old school way and is the
    active tradition of the present and previous millennium). For instance,
    instead of an ugly -> something might have preferred to
    display a growl (even if inconsistencies are bad, and I hate them, here
    the culprit is not the inconsistency).
  • Code change might be related to some code reparation that fixed an
    error, so exactly the opposite of an error.

ClyClassDefinitionEditorToolMorph>>#applyChanges

But wait, there is more, because newClass is nil, the
method returns false to its caller, that is
ClyClassDefinitionEditorToolMorph>>#applyChanges and
is defined by:

applyChanges

    | text |
    text := self pendingText copy.
    ^ self applyChangesAsClassDefinition or: [
          self pendingText: text.
          self applyChangesAsMethodDefinition ]

What the heck is that? I do not even understand what is the objective
of this thing! The defining class is
ClyClassDefinitionEditorToolMorph that is the widget whose
sole job as a text editor is to define new classes and to update
existing classes. And the Pharo way to do that is by evaluating
expressions that define or update classes. It seems to be an easy job
that even an unaware DoIt action can manage
successfully.

So what is this insane method doing:

  • saves the source code’s content (to avoid code change in the editor
    due to syntax error or code reparation);
  • tries to evaluate as a class definition (an expression);
  • if the result is false (and it is, in our case), then
    tries to compile the original pristine source code as a method
    definition.

OK.

Let’s just do that. Remember that the class definition we try to
process is:

Object subclass: #Foo
    instanceVariableNames: baz
    classVariableNames: ''
    package: ''

Let us try to parse this source code as a method definition instead
of as an expression.

  • A method starts with a method pattern that can be many
    things, but for simple unary methods, they are simple and plain
    identifiers. Do we have a simple and plain identifier? Yes, the token
    Object (RBIdentifierToken).
  • A method then have a body, with statements, that usually start with
    an expression. What follows is the token subclass:
    (RBKeywordToken) which is not the beginning or any correct
    expression. But the parser wants an expression right now! So it
    reports the error
    Variable or expression expected ->subclass:.

It’s the beauty of computer science. Whatever insane behavior we
might witness, there is always a rational explanation.

A Final Experiment

Can we bypass the bypass of OCUndeclaredVariableWarning
with the undefined variable baz? Let’s try the
DoIt way, it made wonder with the superfluous t of
Objectt some sections ago.

  • Select the full text (Ctrl-A);
  • Do It (Ctrl-D);
  • The menu “Unknown variable: baz please correct, or cancel
    appears and proposes: a new temporary variable, a new instance variable
    or to cancel;
  • Chose the “temporary variable”;
  • A debugger window appears: “Instance of ClyTextEditingMode did
    not understand #textMorph
    ”. What?
  • The error is caused by the line
    theTextString := self requestor textMorph editor paragraph text.
    of
    OCUndeclaredVariableWarning>>#declareTempAndPaste:.
    We discussed this (rather ugly) line in a previous section, stating that
    it is fragile. What a coincidence

Conclusion and Perspective

During this exploration of the
OCUndeclaredVariableWarning we discovered a lot of classes
and methods with a very variable quality of code and design. Obviously,
the present article focuses on the discussable parts that could be
improved, because it forces us to understand why things are bad, and how
they could be improved. It is also fun to see concrete effects of how
things can go bad when software design is not as tidy as it should
be.

Pharo is a wonderful dynamically typed programming language with
great features, abstractions and powerful semantics. And with great
power comes great responsibility.

An example could be the requestor thing. Adding a dependency between
UI and the compiler work is enough to raise some eyebrows (independently
of the programming language or its paradigm). But in Pharo this
dependency appears as an unwritten API (orality-based API?) with some
inconsistent or fragile hacks: notify:at:in:,
Object>>#bindingOf:,
requestor respondsTo: #interactive,
requestor textMorph,
requestor class name = #RubEditingArea, etc. This also
causes subtitle breakages when someone tries to fix things, breakages
that are often hard to catch because, for instance, they could be only
related to untested or rare UI interactions.

A lot of change is currently ongoing for Pharo 12 on the compiler. We
are in the early part of its development cycle, so it’s the best moment
to try large and disruptive hacks. At the time of publishing, most of
the design issues discussed here are already fixed. But they represent a
specific use case, and a lot of work is still needed. The full
meta-issue is available at
https://github.com/pharo-project/pharo/issues/12883.

Undeclared Variable Reparation, An Epic Journey In a Compiler – Part II

Part II – The Return Journey

undeclvar

Welcome to the next step in the compiler journey.

As a simple recap, we were compiling baz := 42 in a
method bar, except that baz is not declared.
We are currently in
OCSemanticWarning>>#defaultAction, the default action
of an uncaught Notification, that is ready to open a graphical menu by
calling openMenuIn:.

OCUndeclaredVariableWarning>>#openMenuIn:

The method is long; let us review it in small pieces.

openMenuIn: aBlock
    | alternatives labels actions lines caption choice name interval requestor |

A bunch of temporary variables.

    "Turn off suggestions when in RubSmalltalkCommentMode
    This is a workaround, the plan is to not do this as part of the exception"
    requestor := compilationContext requestor.
    ((requestor class name = #RubEditingArea) and: [
        requestor editingMode class name = #RubSmalltalkCommentMode])
                    ifTrue: [ ^UndeclaredVariable named: node name ].

These are some type checks. Type checks are usually bad. Those are
bad.

They prevent the menu thing if the requestor is a
RubEditingArea in a RubSmalltalkCommentMode
“mode”. Where RubSmalltalkCommentMode is used?

  • By ClyRichTextClassCommentEditorToolMorph, which seems
    never used in the system (dead class?)
  • By RubEditingArea>>#beForSmalltalkComment, that
    is called only by FileList, a basic file explorer, but it’s
    not clear when or why the compiler is called by the file explorer, nor
    why a “workaround” is needed here (quite deep) in the compiler
    especially since it’s the only workaround of this type in the whole
    source code for RubEditingArea or
    RubSmalltalkCommentMode.

It could be just dead code, so less problematic: a dead workaround is
a less technical dept than a live one. It also illustrates why type
checks can be bad; it reverses the responsibility:
RubEditingArea and RubSmalltalkCommentMode
here are not involved at all in the workaround, so code evolution
related to one of these two classes might likely miss the present
hack.

Moreover, such a workaround is fragile. The compiler should not care
about specific clients, and especially not care about their names, and
should behave equitably. E.g. imagine renaming classes or using
subclasses of the blacklisted ones, they could likely pass the check and
cause really subtle bugs.

Anyway, let us continue with the method:

    interval := node sourceInterval.
    name := node name.
    alternatives := self possibleVariablesFor: name.
    labels := OrderedCollection new.
    actions := OrderedCollection new.
    lines := OrderedCollection new.

All those are the initialization of the temporary variables.

OCUndeclaredVariableWarning>>#possibleVariablesFor:
provides a list of existing names usable as a replacement (sorted from
the best match to the worst match). See
String>>#correctAgainst:continuedFrom: for the
details and the scoring system.

    name first isLowercase
        ifTrue: [
            labels add: 'Declare new temporary variable'.
            actions add: [ self declareTempAndPaste: name ].
            labels add: 'Declare new instance variable'.
            actions add: [ self declareInstVar: name ] ]

The two first items of the menu. If the name looks like a temporary
or an instance variable, because it starts with a lowercase letter, then
maybe the programmer wants a new temporary or instance variable?

Note that there are two parallel lists, one of the labels (shown in
the menu) and the other of actions (here blocks), that is evaluated if
the user chose the corresponding label.

In the scenario, the option
Declare new temporary variable is selected, so
self declareTempAndPaste: name is eventually called. We
detail it in the next section. For now, we continue to read the
method.

        ifFalse: [
            labels add: 'Leave variable undeclared'.
            actions add: [ self declareUndefined ].
            lines add: labels size.
            labels add: 'Define new class'.
            actions
                add: [
                    [ self defineClass: name ]
                        on: Abort
                        do: [ self openMenuIn: aBlock ] ].
            labels add: 'Declare new global'.
            actions add: [ self declareGlobal ].
            compilationContext requestor isScripting ifFalse:
                [labels add: 'Declare new class variable'.
                actions add: [ self declareClassVar ]].
            labels add: 'Define new trait'.
            actions
                add: [
                    [ self defineTrait: name ]
                        on: Abort
                        do: [ self openMenuIn: aBlock ] ] ].

For names that start with an uppercase, they look like global
variables, and that includes all the named classes, so the proposed
items in the menu are different. A first curiosity, there is the choice
to “leave variable undeclared” that is absent in the previous code
snippet. Another curiosity, defining a new class (or a new trait) opens
a new window, but if, for some reason, the entity creation fails or is
canceled, then a recursive call is used to open the same menu again.

    lines add: labels size.
    alternatives
        do: [ :each |
            labels add: each.
            actions
                add: [
                    ^self substituteVariable: each atInterval: interval ] ].
    lines add: labels size.
    labels add: 'Cancel'.
    caption := 'Unknown variable: ' , name , ' please correct, or cancel:'.

We have the addition of the possible variables (computed at the
beginning of the method), a cancel item, and the window title. The next
last two lines are the fun ones.

    choice := aBlock value: labels value: lines value: caption.
    ^choice ifNotNil: [ self resume: (actions at: choice ifAbsent: [ compilationContext failBlock value ]) value ]
  • aBlock is the parameter of the method, it was more
    than 50 lines ago, so we almost forgot about it. It is always
    [:labels :lines :caption | UIManager default chooseFrom: labels lines: lines title: caption]
    that just calls the UI and returns the index number of the selected item
    starting at 1 (or 0 if the cancel button is used).
  • The selector value:value:value: is used to evaluate
    the block with 3 supplied arguments (it is a Pharo thing, do not
    judge).
  • ^choice ifNotNil: ... returns nil if the choice is
    nil (unlikely according to the API of chooseFrom, but
    better safe than sorry). In the scenario, the first choice is selected
    (declare new temporary variable). Therefore choice is 1, which is not
    nil, so we look at the ifNotNil: part.
  • self resume: cause the signal to finish
    is execution with the given value. Hopefully, a Variable
    object to bind to baz (look back at the section
    OCASTSemanticAnalyzer>>#undeclaredVariable: if you
    need to see the original signal method invocation). Here,
    the call to resume feels superfluous as the result of the
    current method is used as the result of defaultAction that
    is used as the value of the automatic resume call performed
    on Notification objects (see
    UndefinedObject>>#handleSignal:).
  • actions at: choice return the action (the block)
    associated to the corresponding choice number. Ordered collections in
    Pharo are 1-based; therefore 1 is the first block action. Here, the
    block [ self declareTempAndPaste: name ].
  • ifAbsent: is for what to do when there is no
    corresponding action for the given choice. This happens when the user
    chooses the cancel button (no action for 0) or chooses the cancel item
    (no action for 3 in our scenario).
  • compilationContext failBlock value is therefore
    executed on a “cancel”. It evaluates the failBlock that, in the
    scenario, comes from the ClassDescription>>#compile
    method and contains [ ^ nil ] (a non-local return).
    Evaluating this failBlock cause the unwinding of many methods in the
    call stack (something around 30 or 40 frames) and the return of the
    ClassDescription>>#compile method with nil.

    Note that there is a potential weakness here if the failBlock does
    not perform a non-local return, then the result of the block evaluation
    is used as the return of openMenuIn: and eventually used as
    a Variable object to bind baz to. Callers of
    the compiler might forget to do that and just provide
    [nil], for instance (without a ^).

  • value evaluates the action block (since it exists in
    the list), that has the responsibility to provide a
    Variable instance.

OCUndeclaredVariableWarning>>#declareTempAndPaste:

We selected “declare new temporary variable” in the menu, thus
executing this method. We’ll cover this large method (35 lines) piece by
piece.

declareTempAndPaste: name
    | insertion delta theTextString characterBeforeMark tempsMark newMethodNode |

Some temporary variables.

    "Below we are getting the text that is actually seen in the morph.
     This is rather ugly. Maybe there is a better way to do this."
    theTextString := self requestor textMorph editor paragraph text.

Indeed, this is rather ugly. This leads to many
questions:

  • Why is the text (source code) of the method bar
    needed?
  • Why does it assume that the requestor has a textMorph
    method?
  • Why ask for something so deep? Demeter is likely rolling over in its
    grave (it’s a joke on the Law of Demeter. Demeter is not dead
    and is not even a person, it was a project named after the Greek goddess
    of Agriculture).
  • Why? We are still in a (deep) part of the compiler,
    self should have a better way to get the source code
    currently compiled.
    "We parse again the method displayed in the morph.
     The variable methodNode has the first version of the method,
     without temporary declarations. "
    newMethodNode := RBParser parseMethod: theTextString.

Let us take a breath.

We are doing a semantic analysis on an already parsed source code of
a method bar trying to get a variable to bind to
baz. And we parse the full source code again? Don’t we have
it? Just call self node methodNode or something?

The hint might be “without temporary declarations” from the
comment. Does that mean we do not trust the current AST to be genuine?
Why? Maybe the previous interactive code error reparation changed the
current AST? Is this actually true in some possible scenarios? Is this
just leftover code?

Let us just continue… we must continue…

    "We check if there is a declaration of temporary variables"
    tempsMark :=  newMethodNode body  rightBar ifNil: [ self methodNode body start ].

It’s getting warm here, isn’t it?

  • newMethodNode body rightBar gets the position (an
    integer) of the closing | character of the temporary
    variable declaration syntax, or nil if there is no temporary variable
    declared (like in the current scenario) an AST is useful for this task
    since it knows which part of the source code is really a block of
    temporary variable declarations.
  • self methodNode body start is the position (an integer)
    of the beginning of the main body of the method, that position is
    therefore used when there are no declarations of temporary
    variables.
    characterBeforeMark := theTextString at: tempsMark-1 ifAbsent: [$ ].

gets the character before the closing | or before the
main body. The ifAbsent might only occur if the source code
is empty, and the compiler let us progress until here because an empty
method is a syntax error, a method name is minimally needed (the name
(selector) and parameters are called the “method pattern” in Pharo
parlance). But better safe than sorry.

    (theTextString at: tempsMark) = $| ifTrue:  [
        "Paste it before the second vertical bar"
        insertion := name, ' '.

        characterBeforeMark isSeparator ifFalse: [ insertion := ' ', insertion].
        delta := 0.
    ] ifFalse: [

Some temporary variables are declared, and we want to add the new
variable after the last one. The code mainly manages spacing to avoid
concatenating the new variable and a previous one, or injecting
superfluous spaces.

In our scenario, there is no temporary variable (yet), so the
ifFalse: part interests us more.

        "No bars - insert some with CR, tab"
        insertion := '| ' , name , ' |',String cr.
        delta := 2. "the bar and CR"
        characterBeforeMark = Character tab ifTrue: [
            insertion := insertion , String tab.
            delta := delta + 1. "the tab" ]
        ].

Here we prepare the text to insert in the source code and compute a
delta thing, we’ll discuss that later. The code tries to
care about preserving the indentation, if any.

    tempsMark := tempsMark +
        (self substituteWord: insertion
            wordInterval: (tempsMark to: tempsMark-1)
            offset: 0) - delta.

Err… it’s getting cold here, isn’t it?

self substituteWord: insertion wordInterval: (tempsMark to: tempsMark-1) offset: 0
asks to insert the new string in the source code (because the interval
is empty, it is an insertion and not a replacement).

How does
OCUndeclaredVariableWarning>>#substituteWord:wordInterval:offset:
do that? By simply calling #correctFrom:to:with: on the
requestor and doing some math, then doing more math to update
tempsMark.

    " we can not guess at this point where the tempvar should be stored,
    tempvars vs. tempvector therefore -> reparse"
    (ReparseAfterSourceEditing new newSource: self requestor text) signal

And it is the end of the method. I think it’s getting humid here,
isn’t it?

The code is altered. We have no idea what really happened, there is a
new source code in town. The full AST might need to be rebuilt as there
are new potential AST nodes. The semantic analysis might need to be
redone, as the new temporary variable might conflict with other
variables declared further in the code. So at this point, it seems
better to just call it a day and run the compilation again.

It’s the point of the ReparseAfterSourceEditing class
that is a subclass of Notification (we are now experts in
notifications and no afraid no more of them!).

There are still some questions about the behavior of the program and
some of its design decisions:

  • The math thing about offset, delta, and
    tempsMark update is completely unused. Possible leftover of
    previously removed code.
  • Where does the signal on ReparseAfterSourceEditing
    go?
  • Why the new source code should be passed around in the notification?
    We did already update it in the requester.
  • Why does the ugly (indeed)
    self requestor textMorph editor paragraph text at the
    beginning of the method exists, since apparently
    self requestor text give the same damn source code (while
    not ideal, it is still better).
  • What happens when the notification is resumed? The point of a
    notification is to be resumable. Here it clearly appears that such an
    endeavor is not supposed to happen.
  • Why so much coupling?
  • Why so little cohesion?
  • And more specifically, why it is the job of
    OCUndeclaredVariableWarning to perform this menu and string
    based code reparation and hijack the requestor as if there were friends
    in some abusive relationship? Shouldn’t a notification be just a means
    of sending some kind of signal to the previous method in the call
    stack?

There are a lot of symptoms of schizophrenia in the responsibilities
here.

OpalCompiler>>#parse

Where does ReparseAfterSourceEditing go? I
(intentionally) skipped some steps between
OpalCompiler>>#compile and
OCASTSemanticAnalyzer>>#undeclaredVariable:.

Here is the source code of
OpalCompiler>>#parse:

parse
    | parser |
    [
        parser := self createParser.
        ast := self semanticScope parseASTBy: parser.

        ast methodNode compilationContext: self compilationContext.
        self callParsePlugins.
        self doSemanticAnalysis ]
    on: ReparseAfterSourceEditing do: [:notification |
            self source: notification newSource.
            notification retry ].

    ^ ast

The job of this method (as explained in a previous section) is to do
the frontend part of the compilation and produce a fully annotated and
analyzed AST of the method so that (virtual) machine code can be
generated. The content of the method is mostly straightforward.

What is interesting is the on:do: method call used for
exception (and thus, notification) handling. When a
ReparseAfterSourceEditing is intercepted, we update the
source code and run the protected block from the beginning again (see
Exception>>#retry).

Some potential exits of this loop are:

  • The source code is good enough and no more
    ReparseAfterSourceEditing are signaled.
  • The source code is bad enough that either the failBlock
    is invoked, or another unrecoverable exception occurs. Remember
    SyntaxErrorNotification in
    OpalCompiler>>#compile, for instance.
  • Someone is tired enough and terminates the process.

The first alternative is what happens in our scenario:

  • A declaration of the temporary variable baz is added to
    the source code.
  • The content editor window is updated to reflect that.
  • The new code source is fully parsed and correctly analyzed and a
    legitimate CompiledMethod is produced.

What a happy ending!

Undeclared Variable Reparation, An Epic Journey In a Compiler – Part I

In this series of posts, I present how the current implementation of
Pharo handles compilation errors on undeclared variables and the
interactive reparation to fix them. Targeted readers are people
interested in compilers or object-oriented programming. Non-Pharo
developers are welcome since knowledge of the language or the developing
environment is not required. Some parts of Pharo are explained when
needed in the article.

We illustrate with a small and specific corner case of the code
edition and compilation subsystems of Pharo. It shows how complex
software has to deal with complex situations, requirements, usage and
history. And why design choices matter.

Disclaimer, some parts of the presented code can be qualified as
“awesome”, where “awe” still means “terror”. Maybe I should rename the
article “The Code of Cthulhu” or something, but I’m often bad at
names.

The first and the second parts are a deep-down journey. We start from the GUI and do down (go up?) in the call stack, with very few shortcuts or branching. Explanation, comments, and discussion are done during the visit.

Note also that the presented code is the one of Pharo11 and that most issues should be solved (or working on) for Pharo12. The meta-issue that tracks my work in progress is available at https://github.com/pharo-project/pharo/issues/12883 — warning, it contains spoilers.

Special thanks go to Hugo Leblanc for his thorough review.

Undeclared Variables

Compiling a method in Calypso (the current class browser), in
StDebugger (the current debugger) or in any place that accepts the
edition and installation of methods is an everyday task of Pharo
developers, and most of the time an everyminute task. It’s something
Pharoers do naturally without thinking much about it(possibly to
preserve their own sanity).

One specific picturesque experience is having a menu window pop up
when trying to compile code that contains an undefined variable. The
presented menu contains various options depending on the variable name
and the context: new temporary variable (Pharo name for “local
variable”), new instance variable (Pharo name for “attribute” or
“field”), new class if the name starts with an uppercase letter and some
proposal of existing variable (local, global or other) with a similar
name in case of an obvious typing error. Selecting one or the other of
these options updates the code in the editor and resumes the compilation
(or pops up a similar menu if some other undefined variable
remains).

Note that in Pharo, variables can also remain undeclared, for a lot
of good reasons, but it is a story for another day.

Let us illustrate with a single concrete scenario used in this
article’s first parts. You are in a Calypso editor, on the instance
side, on a class Foo trying to implement a new method
bar.

bar
    baz := 42

The method might not be finished yet and baz is not even
declared, but let’s install it with a classic Ctrl-S
(accept). We get the menu window “Unknown variable: baz please
correct, or cancel:
” with some choices:

  • “Declare new temporary variable”;
  • “Declare new instance variable”;
  • “Cancel”;
  • and also an additional “Cancel” button.
undeclvar

We select the first option (temporary variable) and the code is
automatically repaired as

bar
    | baz |
    baz := 42

the method is also compiled, installed in the class Foo
and fully usable.

Note: the | thing is the Pharo syntax to declare
temporary variables (i.e. local variables).

Part I – Falling Down the
Rabbit Hole

Let’s try to understand what just happened. Is the whole thing
(black) magic or simple object-oriented (black) design?

This first post is down from the compiation request to the menu. The
next post will be about code repair.

We have the Calypso window and its nested text editor component. I
skip the complex graphical UI sequence of calls — there are some
observer design patterns and even a sub-process forked (Pharo processes
are, in fact, green threads) — and for the sake of simplicity and
without loss of generality, I start the story at
ClyMethodCodeEditorToolMorph>>#applyChanges.

ClyMethodCodeEditorToolMorph>>#applyChanges

Note: ClyMethodCodeEditorToolMorph>>#applyChanges
means the method applyChanges of the class
ClyMethodCodeEditorToolMorph. Where Cly stands
for Calypso, the name of the tool. And Morph
is the name of the low-level graphical toolkit currently used by Pharo.
So, basically, the current receiver of the method (self) is
a graphical window.

I do not show the full code of the method. The interesting statement
is:

selector := methodClass
    compile: self pendingText
    classified: editingMethod protocol
    notifying: textMorph.

that is a message send (method invocation) of the selector (method
name) compile:classified:notifying: because, in Pharo, and
in most other Smalltalk dialects, arguments can be syntactically placed
inside the name of the method to invoke.

The method asks the class to compile and install a new method.
Receiver and arguments are:

  • methodClass here the class Foo (instance
    of Foo class subclass of ClassDescription that
    implements the called method
    compile:classified:notifying:)
  • self pendingText is the full source code (an instance
    of the Text class).
  • editingMethod protocol is the selected protocol (group
    of methods) to put the new method. It is nil here, so the
    method might remain unclassified, not a big deal.
  • textMorph is the graphical component (widget) that
    corresponds to the part of the tool that contains the source code
    editor. Here, we have an instance of RubScrolledTextMorph
    that is the common morph widget to represent an editable text area.

Now, why would the compiler need to know about some internal UI
component? Well, we shall see.

ClassDescription>>#compile

ClassDescription>>#compile:classified:notifying:
eventually calls
ClassDescription>>#compile:classified:withStamp:notifying:logSource:
that adds two new parameters:

  • changeStamp that is the current time and date (as a
    String, not a DateAndTime)
  • logSource a Boolean flag set to true.

The important statement of this method is:

    method := self compiler
        source: text;
        requestor: requestor;
        failBlock:  [ ^nil ];
        compile.

Where

  • self compiler return a new compiler instance, already
    configured to compile a method of the class Foo and with
    the default environment (Smalltalk globals, the big
    dictionary of global variables and constants of the system that,
    especially, contains all the class names and their associated class
    objects).
  • text the source code of the method to compile.
  • requestor the RubScrolledTextMorph
    instance (the UI component).
  • [ ^nil ] the on error block, which the
    compiler (or one of its minions) might use in case of a fatal error.
    Note: passing blocks (somewhat equivalent to lambdas in other languages)
    is a popular Pharo way to deal with error management. Here, evaluating
    the block might unwind many methods in the call stack and forces the
    method
    ClassDescription>>#compile:classified:withStamp:notifying:logSource:
    to return nil because ^ means
    “return” (this one is called a “non-local return” in Pharo
    parlance).
  • finally, compile that starts the real compilation
    work.

OpalCompiler>>#compile

The Pharo compiler class is named OpalCompiler and the
invoked method is simply OpalCompiler>>#compile. Here
is the full body of the method:

compile
    ^[
        self parse.
        self semanticScope compileMethodFromASTBy: self
    ] on: SyntaxErrorNotification do: [ :exception |
            self compilationContext requestor
                ifNotNil: [
                        self compilationContext requestor
                            notify: exception errorMessage , ' ->'
                            at: exception location
                            in: exception errorCode.
                    ^ self compilationContext failBlock value ]
                ifNil: [ exception pass ]]

Wow. It’s scarier than it is.

  • ^[ aaaa ] on: SyntaxErrorNotification do: [ :exception | bbbb ]
    means return (^) the result of aaaa but if an
    exception SyntaxErrorNotification occurs, return the result
    of bbbb (where exception is the exception
    object, : and | are simply the Pharo syntax
    for block parameters. Exceptions are another popular Pharo way to deal
    with error management.

    Note: the name SyntaxErrorNotification hints that this
    exception is special; it is a Notification. We discuss them
    in a few sections. The management of syntax errors in Pharo also
    deserves its own story (involving adventures, characters and plot
    development).

  • The job of self parse is simple; it calls the
    parser, does the semantic analysis and tries to produce a valid
    annotated AST of the given source code, or might fail trying if there is
    a syntax or a semantic error in the provided code.
  • self semanticScope compileMethodFromASTBy: self is
    more straightforward than the statement suggests. It transforms the AST
    into Pharo bytecode (maybe a story for another day) and produces the
    result of the compilation as an instance of CompiledMethod.
    CompiledMethod is a very important class, as its instances
    are natively executable by the Pharo Virtual Machine.
  • self compilationContext requestor ifNotNil: is a
    simple if that checks (when a
    SyntaxErrorNotification occurs, since we are in the
    do: block of the exception syntax) if the requestor is not
    nil. Here the requestor is the
    RubScrolledTextMorph object, so not nil. The method
    RubScrolledTextMorph>>#notify:at:in: is called and is
    used to present the error to the user.
  • Then self compilationContext failBlock value invokes
    the failBlock (it is [ ^nil ] from the
    previous section) that terminates the method invocation.

Here, we get part of the answer to our design question: The compiler
has the responsibility to explicitly call the text editor (if any) to
present an error message. It might not be the best design decision,
since it is difficult to argue that the compiler’s responsibility is to
notify UI components in case of errors. Especially here since there are
two levels of error management: an exception and a fail block that could
have been used by Calypso to manage errors and decide by
itself of its specific ways to report errors to the user.

We can also notice the string '->' that is
systematically concatenated at the end of the error message associated
with the caught exception. Why? Because Calypso, for historical reasons,
presents the error message as an insertion directly in the text area in
the editor, in front of the location of the error. For instance, the
syntax error in the code 1 + + 3 (we assume the 2 was
fumbled) appears as
1 + Variable or expression expected ->+ 3 in the
editor.

It’s a second bad design decision, as not only was the compiler
responsible for calling the editor, but it also made some presentation
decisions. In fact, the alternative code editor component, provided in
the Spec2-Code package, strips the ->
string before presenting the error in its own and less intrusive way.
See SpCodeInteractionModel>>#notify:at:in:.

OCASTSemanticAnalyzer>>#undeclaredVariable:

Now we enter the classical compilation frontend work: scanning
(lexical analysis, done by RBScanner), parsing (syntactic
analysis, done by RBParser) and finally the semantic
analysis (done by OCASTSemanticAnalyzer, the Opal Compiler
AST Semantic Analyzer).

Our input, the source code of the bar method, is quite
simple and everything is fine, except that, during the semantic
analysis, the variable name baz is analyzed by
OCASTSemanticAnalyzer>>#visitAssignmentNode: (as a
nice compiler, it processes its AST with visitors), that calls
OCASTSemanticAnalyzer>>#resolveVariableNode: but
which cannot resolve baz thus calls
OCASTSemanticAnalyzer>>#undeclaredVariable: whose
responsibility is to deal with the situation of undeclared
variables.

Note: resolving variables can be a complex task because, in Pharo,
methods and expressions can be used in various contexts with, sometimes,
particular rules. For instance, the playground (workspace) has some
specific variables lazily declared; and the debugger has to deal with
methods currently executed, thus runtime contexts (frames) that require
a non-trivial binding process. Under the hood, the requestor can also be
involved in such symbol resolution. However, I chose to skip this
complexity in this article.

Here is its source code of
OCASTSemanticAnalyzer>>#undeclaredVariable:

undeclaredVariable: variableNode
    compilationContext optionSkipSemanticWarnings
        ifTrue: [ ^UndeclaredVariable named: variableNode name asSymbol ].
    ^ OCUndeclaredVariableWarning new
        node: variableNode;
        compilationContext: compilationContext;
        signal

If we are in a specific mode optionSkipSemanticWarnings
then just resolve as a special undefined variable. Since it’s not the
case currently, I won’t give more detail (yet).

What follows is more interesting.

OCUndeclaredVariableWarning is a subclass of
Notification, a basic class of the kernel of the Pharo
language that is a subclass of Exception (the same kind of
exception we discussed in the previous section). Exceptions in Pharo
work more or less like what you get in many other programming languages.
You catch them with the on:do: method of blocks (that we
have already explained) and throw them with the signal
method.

What is noticeable here is the ^ (a return) in
front of the exception signalment. Notification is a
special kind of Exception that have the ability to be
resumed. Once resumed, the execution of the program continues after the
signal message send. The second special feature of
Notification is that when unhandled (no on:do:
catch them and the notification “goes through” the whole call stack)
then signal has no particular effect and just returns
nil. This is explicit in the method
Notification>>#defaultAction:

defaultAction
    "No action is taken. The value nil is returned as the value of
    the message that signaled the exception."

    ^nil

In summary, Notification instances are just
notifications; if nothing cares, then signal has no
effect.

Let’s go back to
OCASTSemanticAnalyzer>>#undeclaredVariable:. A
notification OCUndeclaredVariableWarning is signaled, and
if some method in the call stack cares and catches the notification, it
can choose to do something and possibly resume the execution with a
Variable object that shall be used to bind
baz.

Is this design decision sound? Let’s discuss this.

There are some drawbacks in the use of such notifications. First, the
link between the signaler
(OCASTSemanticAnalyzer>>#undeclaredVariable:) and the
potential catchers is indirect in the code: it is circumstantial.
Second, a given catcher might unwarily catch a notification it did not
expect (from another compiler, for instance), especially with
Notification because they are silent by default. But the
advantage is that some grandparent callers have more latitude to set up
the kind of execution environment it requires and deal with potential
notifications. We shall explore this possibility later.

An alternative design could be callback based: give the compiler some
objects to call when such decisions have to be made. It could be a block
(lambda) or, for instance, the requestor since we already have one. This
design has the advantage of making the subordination relationship more
obvious in the code, but it might require more management (to store and
pass objects around).

A part of another approach could be to have a set of alternative
behaviors in the compiler that can be activated or configured by the
client (with boolean flags, for instance) This offers a certain control
by the client (that sets up the configuration) and gives the
responsibility of implementing them to the compiler. The drawbacks are
that the effect of flags is limited and that the space of available
combinations on configuration can become large with possible complex
interactions or conflicts.

Another approach could be to silently use place-holded for the
variable of baz (let’s call it
UndeclaredVariable), then continue the compilation and
produce a CompiledMethod instance as the result of the
compile method. The caller is then free to inspect this
CompiledMethod instance, detect the presence of undeclared
variables, then choose to act. The obvious issue is that maybe the
compilation (including byte code generation) was just done for nothing,
wasting precious CPU time and Watts. The advantage is that the compiler
is simpler (no need to try to repair or even report errors) and that the
caller can easily manage multiple error conditions at the same time,
whereas the two other approaches basically impose the caller to solve
each error situation one by one.

Readers might look again at the
optionSkipSemanticWarnings at the beginning of the method
and realize that it feels like these two last alternatives are
implemented here. UndeclaredVariable are a real thing and,
for instance, are used when source codes are analysed for highlighting.
UndeclaredVariable are also used in two other cases:
package loading (because cycles in dependecies are hard) and code
invalidation (because you can always remove classes or instance
variable).

OCUndeclaredVariableWarning>>#defaultAction

So, since baz is not declared,
OCASTSemanticAnalyzer signals an
OCUndeclaredVariableWarning hoping that something can catch
it with the task to provide a Variable object to be bound
to the name baz.

But in the scenario, the notification is not caught by anyone. Is
nil associated with baz? This is not what we
need, nor
OCASTSemanticAnalyzer>>#resolveVariableNode: by the
way.

The answer is in
OCUndeclaredVariableWarning>>#defaultAction (see code
below) which overrides the default
Notification>>#defaultAction that is shown in the
previous section.

defaultAction
    | className selector |
    className := self methodClass name.
    selector := self methodNode selector.

    NewUndeclaredWarning signal: node name in: (selector
        ifNotNil: [className, '>>', selector]
            ifNil: ['<unknown>']).

    ^super defaultAction ifNil: [ self declareUndefined ]

The first part just creates a system notification. You can see them
in the Transcript (basically the system console of Pharo),
or in the standard output in command line mode (search them in the build
log produced by Jenkins, they are numerous, thus hard to miss).

The second part delegates to the superclass, and if the superclass
does not care, fallback to
OCUndeclaredVariableWarning>>#declareUndefined that
is:

declareUndefined
    ^UndeclaredVariable registeredWithName: node name

So an UndeclaredVariable object, shall make
OCASTSemanticAnalyzer happy since it is a very acceptable
thing to bind baz to.

OCSemanticWarning>>#defaultAction

The superclass of OCUndeclaredVariableWarning is
OCSemanticWarning, what does it offer?

defaultAction

    compilationContext interactive ifFalse: [ ^nil ].
    ^self openMenuIn:
        [:labels :lines :caption |
        UIManager default chooseFrom: labels lines: lines title: caption]
  • compilationContext interactive is true if
    there is a requestor and is interactive, false otherwise.
    Our requestor is still the instance of RubScrolledTextMorph
    and is interactive, so we continue.
  • UIManager>>#chooseFrom:lines:title: is a standard
    UI abstract method to pop up a selection window according to the current
    system UI (here MorphicUIManager), or a launch a
    command-line menu when in command line mode, or even produce a warning
    and select the default when in non-interactive mode (asking for things
    in non-interactive mode deserves a warning).

What is openMenuIn:? There are 3 implementations:

  • OCSemanticWarning>>#openMenuIn: (the method
    introduction), that just call self subclassResponsibility.
    This is the Pharo way to declare the method abstract (and signals an
    error if executed).
  • OCShadowVariableWarning>>#openMenuIn: (a subclass
    that is not part of the scenario), that just call
    self error: 'should not be called' that also just signal an
    error.
  • OCUndeclaredVariableWarning>>#openMenuIn:, a
    large Pharo method of 55 lines that is discussed in the next
    section.

What uses openMenuIn:? There are 2 senders:

  • OCSemanticWarning>>#defaultAction
    (obviously),
  • OCUndeclaredVariableWarning>>#openMenuIn:. A
    recursive call? We shall see.

This leads to some more questions:

  • Is it reasonable that the compiler cares about the interactiveness
    of the requestor? Note that it could have been a recent addition since
    most requestors are not aware of that part of the API. See
    CompilationContext>>#interactive that uses the
    questionable message respondTo:.
  • Why such polymorphism if there is only one effective implementation?
    Code leftover? Future-proofing?
  • Why pass a block as an argument if no other sender exists? It seems
    superfluous.
  • Is it the responsibility of a Notification object to
    call UI with a menu?

In the next post, we will present the menu, do the reparation and try
to get out of here (the compiler is far away in the call stack) to
finish the compilation successfully.

Sentiment Analysis in Pharo using a real data set

You are a movie reviewer, and a colleague has just sent to you a set of files with hundreds of reviews to determine their sentiments, for example classify them into positive or negative. You read that machine learning can help here processing massive amounts of data by using a classifier. But computers are not good with textual data, so all these reviews needs to be converted into a friendly format for the machine (hint: vectors of numbers). How do we go from hundreds of text files to an object which can predict new inputs? Meet TF-IDF + Naive Bayes: an algorithm which penalizes words that appear frequently in most of the texts, and a machine learning classifier which has proven to be useful for natural language processing.

The whole idea of the TF-IDF invention is to measure the importance of words in documents (so-called “corpus” in the vocabulary). So if we just can “teach” the machine what words are important for sentiment analysis, then we could classify sentiments in your colleague’s reviews. Teaching means that something was learned before. This is our dataset, which was enriched with knowledge. Fortunately, there were people who already annotated sentiments of IMDB reviews to help in our task.

Probably you would also like to do other high-level analysis of text, like hot topics detection, or any quantitative analysis (meaning: which can be ranked). Although there is no all-in-one recipe, most chances are that there is a standardized workflow for you, which could include: Lowercasing words, remove stop words, punctuations, abbreviations, apostrophe, single characters, stemming or term recognition.

So the basic idea is to go from text to vectors (with TF-IDF) so it can be applied to a predictor algorithm. Later, in a second part, we will use Naïve Bayes as classifier and, of course, you can try to generalize to other types of algorithms like SVM or Neural Networks.

Dataset

We are going to use the IMDB Large Movie Review Dataset with 50,000 reviews where 1 review = 1 file. They are divided in two folders: one for training (25k) and another one for testing (25k). Additionally, both the training and testing sets are sub-divided into positive (12,5k) and negative (12,5k) annotated reviews. The reviews here are ranked between 1 and 10 stars. A review is considered positive if it has more than 7 stars, and negative if it has less than 4 stars, there are no reviews with 5 or 6 stars.

The IMDB dataset is commonly used in a Natural Language Processing (NLP) task named “binary sentiment classification”. Summarizing: It is used when you want to build something to identify between two types or classes of “sentiments”, positive or negative. You could also expand the classification into as many classes as you could get. In this case you could consider to classify using up to 8 classes.

To start working with the dataset, download and uncompress the files to the Pharo image directory (which is where your .image file is located) as follows:

wget http://ai.stanford.edu/~amaas/data/sentiment/aclImdb_v1.tar.gz
tar zxvf aclImdb_v1.tar.gz

Now you should have a folder named aclImdb/ with all the files ready to analyze.

Setup

You can launch Pharo, for this article we use Pharo 10 but the process should work for Pharo 11 too:

./pharo-ui Pharo.image &

Let’s first install the AI packages in Pharo:

EpMonitor disableDuring: [ 
  Metacello new
    baseline: 'AIPharo';
    repository: 'github://pharo-ai/ai/src';
    onWarningLog;
    load ]

Data Exploration

To bring some context, we could say that in the Data Science pipeline there are some typical steps for classification tasks. They can be grouped into 3 big stages: Data Engineering (Exploration, Wrangling, Cleansing, Preparation), Machine Learning (Model Learning, Model Validation) and Operations (Model Deployment, Data Visualization).

Now let’s begin the stage commonly named as “Data wrangling”. This is what popular libraries like pandas does. A first step here is data exploration and data sourcing. The uncompressed dataset has the following directory structure:

acImdb\
    test\
        neg\
        pos\
    train\
        neg\
        pos\

With the following expression, open a Pharo Inspector on the result of the train reviews (highlight and evaluate with Cmd + I or Ctrl + I):

('aclImdb/train/' asFileReference childrenMatching: 'neg;pos')
  collect: [ : revFileDir | revFileDir children collect: #contents ].

And it looks like this:

You have there two main containers. One contains the negative reviews (very funny to read indeed), and the other one the positive ones. Hold on this information for later.

Sourcing the annotations

Classification tasks includes some kind of annotation somewhere, which you can use as “predictor” to train a model. Hopefully, your raw data includes a column with it. In this case the stars (i.e. the classes) are in the file name of each review (which has the pattern reviewID_reviewStarRating.txt) so if you want to enrich your classifier with more classes, you could check the file name star rating depending if it’s greater than 7 or lesser than 4. We will adapt our previous expression to add a sentiment polarity value of value 1 (positive sentiment) and with value 0 if it’s negative. But we do not need to check the file name star rating, this information is already available in the directory name, so we adapt our script to associate the polarity to each review:

| reviews |
reviews := (#('train' 'test') collect: [ : setName | 
  (('aclImdb/' , setName , '/') asFileReference childrenMatching: 'neg;pos') 
    collect: [ : revFileDir | 
	| polarity |
        polarity := (revFileDir basename endsWith: 'neg') 
                           ifTrue: [ 0 ] ifFalse: [ 1 ].			
	revFileDir children 
	     collectDisplayingProgress: [ : file | file contents -> polarity ] ] ]).

In a real project now it could be a good time to create a ReviewsCollector class, and create a basic protocol for loading and reading reviews. You could also consider using internally a DataFrame instead of “plain” Collections, specially if you want to augment each review in the dataset with features to be calculated. Here we will concentrate in the raw workflow rather than building an object model.

Note : A Pharo/Smalltalk session to typically involves evaluation of expressions directly in the Inspector evaluator. You can copy & paste scripts from this post and re-evaluate the whole workflow from the start each time (if you have enough time), but I encourage to use the Inspector, which is more in line with Exporatory Data Analysis (EDA). At the end of your working session, you can save the image, or just build a script for reproducibility. In this post we will also checkpoint each step for better reproducibility, using the built-in Pharo serializer.

Duplicates removal

To start cleaning the dataset, one of the first tasks we could do is to check if there are duplicates, and remove them from our dataset. We use the message #asSet to remove duplicates:

| reviews dedupReviews |
reviews := (#('train' 'test') collect: [ : setName | 
  (('aclImdb/' , setName , '/') asFileReference childrenMatching: 'neg;pos') 
    collect: [ : revFileDir | 
	| polarity |
        polarity := (revFileDir basename endsWith: 'neg') 
                           ifTrue: [ 0 ] ifFalse: [ 1 ].			
	revFileDir children 
	     collectDisplayingProgress: [ : file | file contents -> polarity ] ] ]).
dedupReviews := reviews deepFlatten asSet.

Special artifacts removal

After manual inspection we can see our dataset contains artifacts, such as HTML tags. In this case it means the data was scrapped from HTML web pages, so it would not be detected by our word tokenizer which can recognize separators and special characters but not HTML tags. You could discover tags by exploring with the Pharo Inspector (Cmd + I or Ctrl+ I) with a script like this:

| reviews dedupReviews |
reviews := (#('train' 'test') collect: [ : setName | 
  (('aclImdb/' , setName , '/') asFileReference childrenMatching: 'neg;pos') 
    collect: [ : revFileDir | 
	| polarity |
        polarity := (revFileDir basename endsWith: 'neg') 
                           ifTrue: [ 0 ] ifFalse: [ 1 ].			
	revFileDir children 
	     collectDisplayingProgress: [ : file | file contents -> polarity ] ] ]).
dedupReviews := reviews deepFlatten asSet.
dedupReviews anySatisfy: [ : assoc | 
	| reviewText |
	reviewText := assoc key.
	(reviewText findTokens: ' ') anySatisfy: [ : word | word beginsWith: '<br' ] ]

So, if we pick a random review, our idea is to go from:

Working with one of the best Shakespeare sources, this film manages to be creditable to it's source, whilst still appealing to a wider audience.< br/>Branagh steals the film from under Fishburne's nose, and there's a talented cast on good form.

to

Working with one of the best Shakespeare sources, this film manages to be creditable to it's source, whilst still appealing to a wider audience.Branagh steals the film from under Fishburne's nose, and there's a talented cast on good form.

And we can do it with a simple expression which splits the whole sentence String by the HTML BR pattern and then join the splitted substrings:

| reviews dedupReviews cleanedReviews |
reviews := (#('train' 'test') collect: [ : setName | 
	(('aclImdb/' , setName , '/') asFileReference childrenMatching: 'neg;pos') 
		collect: [ : revFileDir | 
			| polarity |
		    polarity := (revFileDir basename endsWith: 'neg') ifTrue: [ 0 ] ifFalse: [ 1 ].			
			revFileDir children 
		     collectDisplayingProgress: [ : file | file contents -> polarity ] ] ]).
dedupReviews := reviews deepFlatten asSet.
cleanedReviews := dedupReviews collectDisplayingProgress: [ : docAssoc | 
	(docAssoc key findBetweenSubstrings: #('<br />')) joinUsing: '' ].

So #findBetweenSubstrings: can detect multiple patterns, tokenize the receiver, and then we join them again to get rid of noise patterns. Of course you can adapt and play with the expression to your own needs. I feel it is a good starting point and it avoids nasty regular expressions. Other non-sense text artifacts you might want to check are: ‘\n’, EOL, ‘^M’, ‘\r’.

To generalize for other artifacts, use the #removeSpecialArtifacts: method.

| reviews dedupReviews cleanedReviews tokenizedReviews |
reviews := (#('train' 'test') collect: [ : setName | 
 (('aclImdb/' , setName , '/') asFileReference childrenMatching: 'neg;pos') 
   collect: [ : revFileDir | 
     | polarity |
     polarity := (revFileDir basename endsWith: 'neg') 
        ifTrue: [ 0 ] ifFalse: [ 1 ].		
     revFileDir children 
	collectDisplayingProgress: [ : file | file contents -> polarity ] ] ]).
dedupReviews := reviews deepFlatten asSet.
cleanedReviews := dedupReviews collectDisplayingProgress: #removeSpecialArtifacts.

"This serialization step is optional and could take some time to complete"
FLSerializer 
	serialize: cleanedReviews 
	toFileNamed: 'acImdb_49582_nodups_noartfcts.fuel'.

cleanedReviews

Notice you cannot clean such artifacts directly with a (typical) tokenizer, because tokenization involves detection of punctuation Characters: If you apply tokenization first, you could lose common (written) language expressions which includes punctuation, for example a smiley 🙂

Punctuation, Special characters (Tokenization)

The next logical step is to transform each of the cleaned reviews Collection (which is composed of Strings “rows”, where a row = a document), into sequences of words, a process called whitespace tokenization, so they only contain words without “noise”.

When it comes to analysis of special characters and punctuation is when things become very interesting. From a näive point of view, just removing all separators would be simple, clean and enough. But language systems are much more complicated, specially when you bring into the analysis variables such as idiom, alphabet types, or even noise. For example: If you are doing more finer semantic (linguistic) analysis then punctuation could be significative, because the target language affects the meaning of a sentence.

Removal of punctuation and special characters is done sending the #tokenize message to any Collection of String. We can see it in action evaluating the following expression :

| reviews dedupReviews cleanedReviews wordTokenizer tokenizedReviews |
reviews := (#('train' 'test') collect: [ : setName | 
	(('aclImdb/' , setName , '/') asFileReference childrenMatching: 'neg;pos') 
		collect: [ : revFileDir | 
			| polarity |
		    polarity := (revFileDir basename endsWith: 'neg') ifTrue: [ 0 ] ifFalse: [ 1 ].			
			revFileDir children 
		     collectDisplayingProgress: [ : file | file contents -> polarity ] ] ]).
dedupReviews := reviews deepFlatten asSet.
wordTokenizer := AIWordTokenizer specialArtifacts.
cleanedReviews := dedupReviews collectDisplayingProgress: [ : docAssoc | 
	(docAssoc key removeSpecialArtifacts: wordTokenizer) -> docAssoc value ].
tokenizedReviews := cleanedReviews collectDisplayingProgress: [ : docAssoc | 
	docAssoc key tokenize -> docAssoc value ].

"This serialization step is optional and could take some time to complete"
FLSerializer 
	serialize: tokenizedReviews 
	toFileNamed: 'acImdb_49582_nodups_noartfcts_tokenized.fuel'.

tokenizedReviews

Stopwords

Words such as “the”, “of”, “a”, etc could be removed in two ways: By hand (using premade stopwords lists) or by the automagical (statistical) use of TF-IDF. But read, here there are two excellent different opinions from the pros and cons of removing stop words. TL;DR: Removing stopwords with TF-IDF depends of the context and the goal of your task. We can check if the TF-IDF algorithm will “automatically” rank low the very frequent terms which appear in many documents.

If you decided to go with the stopword removal, the stopwords package in Pharo which provides multiple stopword premade lists. We can use a default list of stopwords, but you can use another one you prefer.

AIStopwords forEngish.
AIStopwords forSpanish.

To explore other lists

AIStopwords listSummary.

So our script so far with stopword removal:

| reviews dedupReviews cleanedReviews wordTokenizer tokenizedReviews |
reviews := (#('train' 'test') collect: [ : setName | 
 (('aclImdb/' , setName , '/') asFileReference childrenMatching: 'neg;pos') 
   collect: [ : revFileDir | 
     | polarity |
     polarity := (revFileDir basename endsWith: 'neg') 
        ifTrue: [ 0 ] ifFalse: [ 1 ].		
     revFileDir children 
	collectDisplayingProgress: [ : file | file contents -> polarity ] ] ]).
dedupReviews := reviews deepFlatten asSet.
wordTokenizer := AIWordTokenizer specialArtifacts.
cleanedReviews := dedupReviews collectDisplayingProgress: [ : docAssoc | 
	(docAssoc key removeSpecialArtifacts: wordTokenizer) -> docAssoc value ].
tokenizedReviews := cleanedReviews collectDisplayingProgress: [ : docAssoc | 
	docAssoc key tokenizeWithoutStopwords -> docAssoc value ].

"This serialization step is optional and could take some time to complete"
FLSerializer 
	serialize: tokenizedReviews 
	toFileNamed: 'acImdb_49582_nodups_noartfcts_tokenized.fuel'.

tokenizedReviews

To ignore stopwords removal just replace #tokenizeWithoutStopwords with #tokenize.

So far a first part covering reading and cleaning data for a classification task. In a next article we will see how to classify these reviews with a classifier.

Resurrecting Dead Images

Using images to develop software has proven very useful.
However an image may break for multiple reasons and not being able to open anymore.
For example when using a breakpoint in a method used by the system may prevent it to open.
We explain here how we were able to recover such an image.

This blogpost has been used as the base for a academic paper accepter at VMIL’22.
Conference link: https://2022.splashcon.org/details/vmil-2022-papers/1/Ease-VM-Level-Tooling-with-Language-Level-Ordinary-Object-Pointers
Preprint: https://hal.inria.fr/hal-03827632

Introduction

Have you ever had a Pharo image that does not open anymore ?
Hours or even days of work lost forever because you forgot to save your code outside of the image ?
It happens to all Pharo developers at some point.
And we all hate the existing solutions when a lot of code is involved.

Epicea:

Epicea is pretty much the best option we have.
Quite frankly, it saved me multiple times.
It is awesome in a number of cases, because it allows us to reapply the latest changes on the current image.
One image may be dead, but we are able to resurrect it !
However Epicea is not as great when you have to apply back multiple sessions, ones that you forgot, and particularly when it touches sensitive code such as the Compiler.

Real World Case Description

The Murder

In the RMOD team, we currently have a student that is working on having multiple OS windows, rather than only one.
While working on this project, she was modifying very sensitive user interface code.
At some point, she added a breakpoint to explore a method, and saved her image to be sure she would be able to recover if something went wrong.
She then closed the image.
This image still had the breakpoint in the method.

Later, she tries to open the image.\newline
Nothing.
The Pharo Launcher stays silent.

The Investigation

We followed advice from a colleague.

“If you can run images on your laptop but not from Pharo Launcher, I suggest you open Pharo Launcher, select an image you know it can be opened, right-click on it and click on ‘copy launch command’.
You will get the command PL use to run the image.
Then you could try it directly from a shell and see what happens.
It is often the best way to understand what is wrong when an image does not open.”

Following his advice, we found that the breakpoint is triggered by the startup of his image.
Therefore, the image cannot open the UI.
Therefore, the image cannot open.

Debug stack printed by the VM when trying to open the image from a command line.
This clearly shows that (1) the error preventing to open is indeed the halt and (2) the method containing the halt.

The Resurrection

To fix this, we used a recent tool we wrote based on the VM simulator: Polyphemus (https://github.com/hogoww/Polyphemus/).
We opened the dead image inside a VM simulator, running inside a clean Pharo image.
We then explored the classes to find the class with the breakpoint method.

Allows to browse the classes and methods of the dead image.
This allows to bypass the metacircularity of the environment and to browse the methods without executing any of them.

Once we had the method object, we were able to compare the breakpoint version of this method, with one from a fresh Pharo 11, and analyse what we had to fix.

Inspection of the breakpoint compiled method object on the left and the clean compiled method object on the right.
The breakpoint method has 3 more bytecodes sending the haltOnce message.
It also has an extra literal for the selector \textit{#haltOnce} in the first literal.

Of course, we first had to change the bytecodes.
We take the bytecode from the Pharo version and inject it into the image.
We replaced manually the content of this object with bytecodes that would execute the correct behavior.
We replaced the last extra bytecodes with 0.

replacementBytecodes := #(76 76 128 76 129 130 104 147 92 16 152 150 252 0 0 0).
replacementBytecodes doWithIndex: 
    [ :aBytecode :anIndex | self bytecodeAt: anIndex put: aBytecode

This also required fixing the method literals, which are used to express the breakpoint.
To send a message, the bytecode pushes the selector of that message on the execution stack.
To have this selector available, it is stored inside the method literals.
In this case, we can see that the \textit{haltOnce} selector is stored at index one.
Therefore, the literals were off by one, and the indexes used by the bytecodes were wrong.
To fix that, we replace each literal by the next one.
This leaves a duplicated literal in the last index.
This is not really a problem, because it is not used in the bytecode.
This could be a problem because the last and penultimate literals have particular properties.
They hold the class and the selector of the method.
Therefore we advised the student to immediately recompile the method for safety !

0 to: self numberOfLiterals - 1
  do: [ :aLiteralIndex | | nextLiteral |
     nextLiteral := self literalAt: aLiteralIndex + 1.
     self literalAt: aLiteralIndex put: nextLiteral 
     ]

Using the simulator to snapshot the memory onto the hard-drive.
reifiedMemory simulator coInterpreter primitiveSnapshot.

Finally, we snapshot this image.

reifiedMemory simulator coInterpreter primitiveSnapshot.

And poof, dead image is now resurrected !

With the abstractions provided by Polyphemus, it took us less than an hour to fix this image and recover days of work.

Note that this solution was possible because the method with the breakpoint has both more bytecodes and literals than the clean one.
Therefore we were able to keep it simple and just patch it.
More complex solutions would be required for more complex cases.

Closing Words

Although rudimentary, these solutions were achievable because of the initial effort of reifying the objects inside the VM simulator.
We are also investigating how to investigate and repair images suffering from memory corruption.
At this time, Polyphemus is just a base.
A base that took a lot of thinking and effort to create, but a base nonetheless.
The tools are yet to come !
https://github.com/hogoww/Polyphemus/

This blogpost has been used as the base for a academic paper accepter at VMIL’22.
Conference link: https://2022.splashcon.org/details/vmil-2022-papers/1/Ease-VM-Level-Tooling-with-Language-Level-Ordinary-Object-Pointers
Preprint: https://hal.inria.fr/hal-03827632

Creating a new visualization in Roassal

Data visualization is the discipline of trying to understand data information by placing it in a visual context so that patterns and trends that might not be detected easily will appear.

In this article I will describe step by step how to create a visualization using Roassal30 in a beginner friendly way. In addition we will show how we can turn a script in a visualization class with powerful customization hooks. Finally we will show how we can extend the object inspector with a new dedicated visualization. Doing so we show (1) how specific domains can be represented differently and specifically and (2) how domain objects can be navigated using diverse representations (lists, table, visualization).

Who is this article for?

The audience of this article:

  • Data scientists – readers familiar with Pharo.
  • Designers of visualization engines will find valuable resources regarding the design and implementation of a visualization engine.

Getting started with Roassal30

Roassal is a visualization engine developed in Pharo (http://pharo.org). Roassal offers a simple API. It shares similarities of other visualization engines like Mathplotlib and D3.js. Roassal3.0 produces naturally interactive visualizations to directly explore and operate on the represented domain object (https://link.springer.com/book/10.1007/978-1-4842-7161-2). Roassal 2’s documentation is available at (http://agilevisualization.com).

To use, it download the lasted stable version, currently Pharo10. Roassal3.0 is integrated in the Pharo environment, now if you want to use the last version of Roassal, select the menu item

Library >> Roassal3 >> Load full version

Loading Roassal should take a few seconds, depending on your Internet connection.

Visualizing a class hierarchy

Roassal provides examples as scripts directly executable in the Playground. We will start by designing a new visualization as a large script and in the following section we will turn into a class.

The following example, display the full hierarchy of a class with shapes whose size and colors depend on the class they represent. Evaluate the following example with Cmd+D

data := Collection withAllSubclasses.

boxes := data collect: [ :class |
	RSBox new
		model: class;
		popup;
		draggable;
		yourself ].
	
canvas := RSCanvas new.
canvas addAll: boxes.

RSNormalizer size
	from: 10;
	to: 100;
	shapes: boxes;
	normalize: #linesOfCode.
	
RSNormalizer color
	from: Color blue;
	to: Color red;
	shapes: boxes;
	normalize: #linesOfCode.
	
RSLineBuilder orthoVertical
	withVerticalAttachPoint;
	shapes: boxes;
	connectFrom: #superclass.
	
RSTreeLayout on: boxes.

canvas @ RSCanvasController.
canvas.

The execution of the previous script will produce the next image

This visualization shows a hierarchy of boxes, where each box is the graphical representation of a class. The width, height and color represent the size of the lines of code of the class. The lines show the connection or relationship between each class. In this case the connection is to the superclass of each class. The type of these lines is orthogonal vertical line. The position of each box shows a nice tree structure for the Collection with all subclasses.

Let’s analyse each part of the script:

Collecting data: getting a model for the visualization

data := Collection withAllSubclasses.

Usually when you are building a visualization, you use a group of objects, or collection of data. In this case our data is a collection of classes, but you can use CSV files, sql databases, or objects. In some cases you should preprocess your data before visualize it.

Generating shapes

Roassal uses several basic shapes to create the visualization, like RSBox, RSEllipse, RSLabel, RSLine, and RSComposite to combine them.

boxes := data collect: [ :class |
	RSBox new
		model: class;
		popup;
		draggable;
		yourself ].

This script creates a drawable box shape for each class. This box has a model (the class it represents). In addition this code puts a popup and a drag and drop interactions.

Creating the canvas

In Roassal, an instance of RSCanvas is the default container of shapes to show your shapes. When you want to show a shape, create it and then add it into the canvas.

canvas := RSCanvas new.
canvas addAll: boxes.

Normalization

To understand the relation between our data or to stress an aspect, sometimes it is necessary to apply a normalization or transformation for some properties of the visual elements, like the width, height, color, etc.

RSNormalizer size
	from: 10;
	to: 100;
	shapes: boxes;
	normalize: #linesOfCode.
	
RSNormalizer color
	from: Color blue;
	to: Color red;
	shapes: boxes;
	normalize: #linesOfCode.

In this example we are using lines of code for each class to represent the color and the size of each box.

  • RSNormalizer size returns an instance of RSNormalizer, this object can modify the size(width and height) property for a group of shapes.
  • The methods from: and to: define the values from the minimum and the maximum values from the collection of shapes. That means that for the class that have the minimum number of lines of code the size of that box will be 10. And for the class with the maximum number of lines of code the size will be 100.
  • We use normalize: message to apply the transformation on each shape, using a block or method that takes as argument the model of each shape.

Lines

To show links between these boxes we can use lines.

RSLineBuilder orthoVertical
	withVerticalAttachPoint;
	shapes: boxes;
	connectFrom: #superclass.

A line builder has many parameters to create and add new lines to the canvas.

  • RSLineBuilder orthoVertical returns an instance of RSLineBuilder, orthoVertical, is the type of line that we want to use, we can use line, or bezier or etc.
  • withVerticalAttachPoint creates an attach point. This object knows the starting and ending point for the line.
  • This builder interacts with a group of shapes so we set it using the message shapes:.
  • connectFrom: is the method that will apply the builder and will generate new lines between the group of boxes, using #superclass for each box’s model.

If need it you can use directly the basic shapes RSLine or RSPolyline for some specific scenarios.

Layout

Finally a good visualization needs to put each element in the correct position by using a layout.

RSTreeLayout on: boxes.

canvas @ RSCanvasController.
canvas.

Roassal layout is an object that changes the position of shapes when the method on: is executed. You can use RSTreeLayout, RSVerticalLineLayout, RSHorizontalLineLayout, etc. There are many layouts in Roassal that you can use. You can also define your own layouts in case the default ones are not enough.

RSCanvasController puts basic interactions in the canvas, for zoom in or zoom out and navigate into the canvas. And the last line is the canvas itself to inspect it on the playground.

The next step: convert a script into a class

Using a script to prototype a visualization is handy and fast. However, it does not scale in terms of reusability and customization.

To reuse this code we need to create a class, there are many ways to do it. We suggest to create a subclass of RSAbstractContainerBuilder. Builders in Roassal, create a group of shapes set the interaction events and put each element in the correct position and they add the shapes into a canvas.

Create a Demo package in the system browser and then a new class TreeClassBuilder, this new class will create the same visualization as the script.

We should implement the method renderIn:. If we copy the previous script, but we omit the creation of the canvas, and then we declare the local variables and the we use the text code formatter, the method will be:

"hooks"
TreeClassBuilder >> renderIn: canvas

	| data boxes |
	data := Collection withAllSubclasses.

	boxes := data collect: [ :class | 
		         RSBox new
			         model: class;
			         popup;
			         draggable;
			         yourself ].

	canvas addAll: boxes.

	RSNormalizer size
		from: 10;
		to: 100;
		shapes: boxes;
		normalize: #linesOfCode.

	RSNormalizer color
		from: Color blue;
		to: Color red;
		shapes: boxes;
		normalize: #linesOfCode.

	RSLineBuilder orthoVertical
		withVerticalAttachPoint;
		color: Color black;
		shapes: boxes;
		connectFrom: #superclass.

	RSTreeLayout on: boxes.

	canvas @ RSCanvasController.
	canvas

To test it in a playground execute this line: TreeClassBuilder new open. It will open a window with the same visualization as before. That means that open method calls renderIn: method.

Now we would like to customise each part of the script, for example different domain model(data), different shapes, different transformations. or lines, or event layout, using OOP.

Test driven development

It is a good practice to use test cases, to create the test class, right-click over the class name and click over Jump to test class. Then we get the class TreeClassBuilderTest

We will need some tests, like the test that opens the window but this test needs to close/delete the opened window.

"tests"
TreeClassBuilderTest >> testOpen
	| builder window |
	builder := TreeClassBuilder new.
	window := builder open.
	window delete.

Creating new methods with Unit Tests

Create a new test to change the domain classes, because currently the visualization is using Collection withAllSubclasses

"tests"
TreeClassBuilderTest >> testClasses
	| builder classes |
	builder := TreeClassBuilder new.
	self assert: builder classes isEmpty.
	classes := { TestCase. Collection. Array }.
	builder classes: classes.
	self assert: builder classes equals: classes.

If we execute this test will fail. Because we do not have the accessor method for classes.

Using the debugger we can generate this this getter method, using the button Create, first we select the TreeClassBuilder, then we select/write the accessing protocol for the new method.

We return an instance variable classes,

"accessing"
TreeClassBuilder >> classes
	^ classes

Then click on Proceed button on the debugger. We will have a new error: classes is nil. To solve it we can define a default value in the initialize method. Browse the class TreeClassBuilder and create initialize method as follow:

"initialization"
TreeClassBuilder >> initialize
	super initialize.
	classes := #().

Then we define the setter method.

"accessing"
TreeClassBuilder >> classes: aCollectionOfClasses
	classes := aCollectionOfClasses

Finally our test is green, viva viva!

Now we need to change and separate the renderIn: method into different methods. Because:

  • Big methods are hard to understand.
  • Big methods are hard to customise.
  • We can create test for each new method.
  • We can reuse code creating small methods.

Overriding methods allow us to extent the class functionalities, adding new functionalities in the future with new classes.

Then renderIn: method would be:

"hooks"
TreeClassBuilder >> renderIn: canvas
	| boxes lines |
	boxes := self createBoxes.
	self normalize: boxes.
	lines := self createLinesFor: boxes.
	self layout: boxes.
	canvas
		addAll: lines;
		addAll: boxes

As you can see each part of the visualization is now in a method.

"hooks"
TreeClassBuilder >> createBoxes
	^ self classes collect: [ :class | self createBoxFor: class ]
"hooks"
TreeClassBuilder >> createBoxFor: aClass
	^ RSBox new
		model: aClass;
		popup;
		draggable;
		yourself
"hooks"
TreeClassBuilder >> normalize: aGroupOfBoxes
	RSNormalizer size
		from: 10;
		to: 100;
		shapes: aGroupOfBoxes;
		normalize: #linesOfCode.

	RSNormalizer color
		from: Color blue;
		to: Color red;
		shapes: aGroupOfBoxes;
		normalize: #linesOfCode.
"hooks"
TreeClassBuilder >> createLinesFor: boxes
	| lineBuilder |
	lineBuilder := RSLineBuilder orthoVertical.
	lineBuilder
		withVerticalAttachPoint;
		color: Color black;
		shapes: boxes.
	^ lineBuilder connectFrom: #superclass.
"hooks"
TreeClassBuilder >> layout: aGroupOfBoxes
	RSTreeLayout on: aGroupOfBoxes
"accessing - defaults"
TreeClassBuilder >> defaultContainer
	^ RSCanvas new @ RSCanvasController

Now lets try to use it in a playground with different data. Here we display two hierarchies one of ArrayedCollection and other of SliderMorph.

builder := TreeClassBuilder new.
builder classes: ArrayedCollection withAllSubclasses, SliderMorph withAllSubclasses.
builder asPresenter open

Inspector Integration

The inspector is a central tool in Pharo as it allow one to (i) see the internal representation of an object, (ii) intimately interact with an object through an evaluation panel, and (iii) get specific visual representations of an object. These three properties and the fact that developers can define their own visualizations or representations of a given object makes the inspector a really powerful tool. In addition the inspector navigation will let the user smoothly walk through the objects each of them via specific or generic representations.

In this case our data domain are classes, we can define an inspector pane to show the hierarchy. Open the class Class in the system browser, then add the following method:

"*Demo"
Class >> inspectorTreeBuilder
	<inspectorPresentationOrder: 1 title: 'Subclass Hierarchy'>

	^ TreeClassBuilder new
		classes: self withAllSubclasses;
		asPresenter

The previous method is an extension method, for the package Demo, this means that the source code of this method will be save in the Demo package. By using the pragma <inspectorPresentationOrder:title:>, Pharo developers can define a new view pane for that object in the inspector.

Also because our TreeClassBuilder is subclass of RSAbstractContainerBuilder, you can use the method asPresenter. Now execute: Collection inspect

If you click over one box you will see the subclass hierarchy for the selected box in a new inspector pane.

With the approach presented in this article, you can extent any of your domain objects or event the ones offered by the system and have different views of the same object.

The inspector and its specific panes let the user navigation from one visualization to another. Such interaction and navigation constitute a context to ease the manipulation and understanding of domain or objects.

Export the visualization to other formats

To export in other image formats like PNG, PDF, SVG, we need the project Roassal3Exporters, use the next incantation in a playground:

Metacello new
    baseline: 'Roassal3Exporters';
    repository: 'github://ObjectProfile/Roassal3Exporters';
    load.

After that we need to reopen the inspector. And we will find a new toolbar button

The button will open a file chooser window where we can use SVG, PNG or PDF file formats

The next pdf document was created by the exporter

You can use more file formats with different constraints. like html using https://aframe.io/ or mp4 files o video files with transparency using https://ffmpeg.org/.

Share the code in Github

Finally, to share our cool project with the people we will need to create a repository in a git server like Github.

Please visit this link: https://github.com/pharo-vcs/iceberg/wiki/Tutorial, to know how to publish use Iceberg.

In order to publish a project it is good idea to create a baseline check this link https://github.com/pharo-open-documentation/pharo-wiki/blob/master/General/Baselines.md

This project is in the Github repository https://github.com/akevalion/DemoClassBuilder, please follow the readme instructions to install it and used it.

Conclusion

We saw how to create a new visualization from scratch and how it can be scaled into a class. In addition, Pharo provides many tools in the development of basic and complex applications such as: the Inspector, the Debugger, Spec, Roassal, Iceberg, etc.

We show how we can extend the environment to be able to use our new visualization.

We saw that use can use Roassal visualizations to get a different perspective on domain objects. We show that a new visualization can be tested, shared and used by other developers. You also can use projects like: Athens-Cairo, Morphic or other backends such as GTK. But Roassal offers many tools to interact manipulate and talk with your object data.

Custom transmissions in the Inspector

In this post, I will explain what is a transmission, a transformation and how to have custom transmissions in the Pharo Inspector. We will show how transmissions are used in the inspector with a soccer database analysis tool that you can find at https://github.com/akevalion/Futbol/. We used this application because we needed a real domain to navigate within the inspector and to show some non-trivial situations where you want to abstract over the domain to get a smooth navigation flow.

In particular, the inspector is a special object presenter that control the navigation between spec presenters. And we will explain such a specific case.

So let us start by explaining transmissions.

What is a transmission?

Transmissions are a generic way to propagate information between Spec components. They are a way to connect presenters, thinking on the “flow” of information more than the way it is displayed.

For example, let us say that we want to have a presenter with a list and with a text inside. When one clicks on the list, the text inside the text presenter will update.

Imagine we defined the layout as follows:

layout := SpBoxLayout newHorizontal
     add: (list := self newList);
     add: (detail := self newText);
     yourself

But this does not say how list and detail are linked. We could do it by describing action to be raised on specific list events or via transmissions.

The transmission sub-framework solves this in an elegant way: Each presenter defines ”output ports” (ports to send information) and ”input ports” (ports to receive information). Each presenter defines also default input and output ports.

Transmitting

A transmission connects a presenter’s output port with a presenter’s input port using the message transmitTo: as follows:

list transmitTo: detail.

This will connect the list presenter default output port with the detail presenter default input port. This line is equivalent (but a lot simpler) to this one:

list defaultOutputPort transmitTo: detail defaultInputPort.

It is important to remark that a transmission does not connect two components, it connects two component ports. The distinction is important because there can be many ports!
Take for example the class SpListPresenter, it defines two output ports (selection and activation), this means it is possible to define also this transmission (note that we do not use the defaultInputPort but outputActivationPort instead):

list outputActivationPort transmitTo: detail defaultInputPort

Note that some presenters such as SpListPresenter offer selection and activation events (and ports). An activation event is a kind of more generic event. Now the selection event can be mapped to an activation event to provide a more generic layer.

Transforming values during a transmission

The object transmitted from a presenter output port can be inadequate for the input port of the target component. To solve this problem a transmission offers transformations.
This is as simple as using the transform: message as follows:

list
     transmitTo: detail
     transform: [ :aValue | aValue asString ]
list defaultOutputPort
     transmitTo: detail defaultInputPort
     transform: [ :aValue | aValue asString ]

Transforming to arbitrary input

We can also transmit from an output port to an arbitrary input receiver using the message transmitDo: and transmitDo:transform:.
It is possible that the user requires to listen an output port, but instead transmitting the value to another presenter, another operation is needed. The message transmitDo: message handles this situation:

list transmitDo: [ :aValue | aValue crTrace ]

Acting after a transmission

Sometimes after a transmission happens, the user needs to react to modify something given the new status achieved by the presenter such as pre-selecting an item, shading it…
The postTransmission: message handles that situation.

list
    transmitTo: detail
    postTransmission: [ :fromPresenter :toPresenter :value |
       "something to do here" ]

Transforming transmissions inside the Inspector

We will use as an example for this blogpost a football project. You can download it from herehttps://github.com/akevalion/Futbol/. There are instructions in the README of how to install it. The project allows one to visualise a european football database. Note that installing the project may take some minutes because it needs to install the database.

Teaser of the project

First, we can select the country league that we would like to see. In this case we choose France. Then we see a list with the seasons available. We select a season and we can see a table of points of that season of the French league.

The problem

For calculating the statistics of a team, we use a calculator class (TeamStatsCalculator) and not the real model. This is a common situation in which one needs to use a helper class not to pollute the model. So, if we click on a team of the table, instead of seing a soccer team object, which is what we would expect, we see a TeamStatsCalculator instance. In this example, we clicked on the team LOSC of Lille but we get a TemStatsCalculator instead of the LOSC team (yes our team is located at Lille so we took a local team).

What we would like to have is to transform the transmission. The class TeamStatsCalculator knows its real Team and it knows how to convert itself into an instance of Team with the message TeamStatsCalculator >> asTeam. So, we want to transform the object into an instance of Team and then transmit it.

Now there is a catch, as we are inside the inspector, doing the transformation is not the same as before because the inspector takes control of the presenter.

Using SpTActivable trait

SpTActivable is a trait that helps us to use transmission in the Inspector.

We need to use that trait in the class of our presenter. For this example we will use the trait in the class TablePointsPresenter, that is our presenter class that is used inside the inspector.

SpPresenter << #TablePointsPresenter
	traits: { SpTActivable };
	slots: { #table . #teamEvolutionData };
	tag: 'Core';
	package: 'Football-Spec'

Specifying the custom transmition

Once we user that trait, we need to define the method: outputActivationPort on our TablePointsPresenter class.

The method will say how we are going to transform the element. As we said previously, in this case, we only need to send the message asTeam to the object and it will return an instance of its real team when we click on an item of the table. To define the outputActivationPort method, we need to return an instance of SpActivationPort and specify the transform action.

For this example, we will return the output activation port of the table and to specify the transformation.

outputActivationPort

    ^ table outputActivationPort
        transform: [ :selection | selection selectedItem asTeam ];
        yourself

Now, we refresh the inspector, and when we click on a row and we see that now we have the real team object for the LOSC of Lille!

Conclusion

We saw that to transform value during a transmission in the Pharo Inspector, we only need to use the trait SpTActivable and to define the method outputActivationPort. Also, we saw how thanks to the transmissions sub-framework this work is easy to do as Spec takes care of it for us.

Sebastian Jordan-Montano

Setting up your environment for working on the Pharo VM

To work on Pharo’s virtual machine, you’ll need to set up both the virtual machine you’ll be modifying, as well as a Pharo image. Below are the instructions required to set up the environment. You will need both the VM and the image containing the VM compilation chain (VMMaker package).

Building the Pharo VM

  • Clone the pharo-vm repo (https://github.com/pharo-project/pharo-vm) .
    • You may need to switch to a branch different from the default one, depending on top of which version you want to make your changes.
  • Check you have all dependencies needed to build the pharo-vm
    • Tip: When working on macOs, you can install Command Line Tools for Xcode, that already contains many of the dependencies you’ll need.
  • Build the virtual machine to check everything went correctly:
    • To do so, run the following command inside pharo-vm‘s root directory: cmake --build . --target install
    • If compilation completes without error, you’re good to go!

Configuring the Pharo Image with VMMaker

  • Download PharoLauncher from http://pharo.org
  • Create a new Pharo image (if in doubt, choose the latest stable version)
  • Set-up pharo-vm in Pharo
    • Open Pharo image
    • Add the pharo-vm github repository to the Pharo image via Iceberg (Add > Import from existing clone)
    • Right-click repo -> Metacello > Install baseline of VMMaker (Default)
  • Verify that the installation completed correctly
    • Run all VMMakerTests test suites
    • If all tests pass (except from the JIT-related ones), you are good to go!
  • To run a Pharo image with a specific version of the VM, you can use the following command in the root of pharo-vm‘s root directory:
    • build % ./build/dist/Pharo.app/Contents/MacOS/Pharo ~/Documents/Pharo/images/Pharo\ 10.0\ -\ 64bit\ \(stable\)/Pharo\ 10.0\ -\ 64bit\ \(stable\).image --interactive
    • Note: the exact path may differ in your case, depending on your Pharo version and OS.

Now you are ready to go and change the VM code and generate a new image.

What is RBParseTreeSearcher ?

Imagine that you want to find a specific expression and that you want to find it in the complete system. How many classes would you have to look for? How can you be sure that you did not miss any class and being sure that you won’t be frustrated because of the number of issues thrown on compilation or execution? In addition imagine other scenario where you want to transform that expression into another one.

Changing code, removing, or replacing deprecated methods is costly for a developer by doing it manually instead of using an automated feature.

This blog post will explain how to find a specific piece of code we may look for inside a Pharo program, and make it easy for the developers to deal with pattern matching and RBParseTreeSearcher class.

Following work will be about how to replace code using RBParseTreeRewriter and doing the exact same thing automatically using the Rewrite tool (a tool built on top the RBParseTreeRewriter). 

For the moment, we will explain some fundamental definitions and for that the post is structured following below sections:

  1. Pattern code description
  2. RBParseTreeSearcher description
  3. RBParseTreeSearcher examples with pattern code

1. Pattern code description

A pattern expression is very similar to an ordinary Smalltalk expression, but allows one to specify some “wildcards”. The purpose is simple. Imagine that you have a piece of code:

car isNil ifTrue: [ ^ self ].

You can of course compare it with the same piece of code for equality, but wouldn’t it be cool if you could compare something similar, but ignore the fact that the receiver is named car? With pattern rules you can do exactly that. Consider the following code and notice the back-tick before car:

`car isNil ifTrue: [ ^ self ].

Now this expression can match any other expression where isNil ifTrue: [^self] is sent to any variable (or literal). With such a power you can find all the usages of isNil ifTrue: and replace them with ifNil. So what are the “wildcards” that we are using?

(`)Basic pattern nodes

Code prefixed with a back-tick character (`) defines a pattern node. The table below is listing three simple patterns that can be declared with the back-tick:

Pattern typeExampleDescription
Variable`someName asStringThis pattern will match message asString sent to any receiver, disregarding the name of it
MessagePharo globals `someMessageThis pattern will match any unary message sent to Pharo globals.
Method`someMethod ^ nilThis pattern will match any method which returns nil
Selector`sel: aValThis pattern will match any selector followed by aVal.

Example with matches:

`receiver foo 

matches:

  • self foo
  • x foo
  • OrderedCollection foo
(`#) Literal pattern nodes

A back-tick can be followed by the hash sign to ensure that matched receiver will be a literal:

Pattern typePattern nodeDescription
Literal`#literal asArrayThis pattern will match any literal (Number, String, Array of literal ) followed by asArray message

Example:

 `#lit size

matches:

  • 3 size
  • 'foo' size
  • #(a b c) size
(`@) List pattern nodes

To have complete flexibility, there is the possibility to use an at sign @ before the name of a pattern node which turns the node into a list pattern node, which can be empty, returns one or multiple values.

Pattern typePattern nodeDescription
Entity`@expr isPatternVariableThis pattern will match a single or multiple entities followed by isPatternVariable
MessagemyVar `@messageThis pattern will match any message (including unary) sent to myVar
Temporary variable|`temp `@temps|This pattern will match at least one temporary variable which is defined as `temp; For`@temps, the matching can find nil, one or many temporary variables defined.
ArgumentmyDict at: 1 put:`@argsThis pattern will match myDict at: 1 put: followed by a list of arguments `@args that can be nil, one or many args.
List of statements[ `.@statements.
 `var := `myAttribute. ]
We will explain statements later on, but this is to mention that @ can be used also to define a list of statements which can be empty, contain one or many elements.

This expression will match a block which has at first a list of statements, that must be followed by 1 last assignment statement `var := `myAttribute.

Disclaimer:

  • You may write an expression with just args instead of `@args.
  • The list patterns does not make any sense for literal nodes i.e. `#@literal.

Example 1:

`x := `@value

matches:

myVar := OrderedCollection new

Example 2:

`sel1 at: `@args1 `sel2: `@args2

matches:

self at: index putLink: (self linkOf: anObject ifAbsent: [anObject asLink])

Where:

  • `args1 and `args2 have different values
  • `sel1 matches self
  • `@args1 matches index
  • `sel2: matches putLink:
  • `@args2 matches (self linkOf: anObject ifAbsent: [anObject asLink])

Example 3:

`@rcvr `@msg: `@args matches:

(self class deferUpdates: true) ifTrue: [^aBlock value].

Where:

  • `@rcvr matches (self class deferUpdates: true)
  • `@msg: matches ifTrue:
  • `@args matches [^aBlock value]

Example 4:

|`@args1 `myArgument `@args2| matches:

| t1 t2 |

Here we need to have at least 1 argument myArgument , and the example is matching because `@args1 can be empty. So here we have:

  • myArgument is matching with t1
  • `@args2 is matching with t2
(`.) Statement pattern nodes

Back-tick can be followed by a period to match statements. For example:

Pattern typePattern nodeDescription
Statementvar
ifTrue: [`.statement1 ]
ifFalse: [ `.statement2 ]
This pattern will match an ifTrue:ifFalse: message send to any variable, where both blocks have only one statement each.

Example1:

`.Statement1.

is matching:

  • x := 1.
  • myVal:= 'Hello World'.
  • self assert: myVal size equals: 11.

Example2:

|`@temps|
`@.statements1.
`.duplicate.
`@.statements2

matches:

|x|
x := 1.
x := 2

Where:

  • |`@temps| matches |x|
  • `@.statements1. is nil
  • `.duplicate. matches x := 1.
  • `@.statements2

P.S. In the end it does not matter whether you will write `.@Statement or `@.Statement.

(`{ }) Block Pattern Nodes

These are the most exotic of all the nodes. They match any AST nodes like a list pattern and test it with a block. The syntax is similar to the Pharo block, but curly braces are used instead of square brackets and as always the whole expression begins with a back-tick.

Consider the following example:

`{ :node | node isVariable and: [ node isGlobal ] } become: nil

this pattern will match a message #become: with an attribute nil, where the receiver is a variable and it is a global variable. 

There is also a special case called wrapped block pattern node which has the same syntax and follows a normal pattern node. In this case first the node will be matched based on its pattern, and then passed to the block. For example:

`#arr `{ :node | node isLiteralArray } asArray

is a simple way to detect expression like #(1 2 3) asArray. In this case first #(1 2 3) will be matched by the node and then tested by the block.

Naming is Important

The pattern nodes are so that you can match anything in their place. But their naming is also important as the code gets mapped to them by name. For example:

`block value: `@expression value: `@expression

will match only those #value:value: messages that have exactly the same expressions as both arguments. It is like that because we used the same pattern variable name.

2. RBParseTreeSearcher description

So, after figuring out what are the patterns that can be used and what kind of matches they can perform, now we can move forward to discover how RBParseTreeSearcher class works in Pharo , in order to be able to understand in the last section how RBParseTreeSearcher and defined patterns work together to find the matches we are looking for.

RBParseTreeSearcher is supposed to look for a defined pattern using the ‘wildcards’ of a matcher defined as a Tree, and on success (when match is found) a block can be executed.

Basically, when a developper uses this class, the most used instance variables are:

  • #matches:do: which a message that looks for patterns defined in matches: block using the wildcards, and if a match is found the do: block is executed.
    The do block takes two parameters: :aNode and :answer. The aNode refers to each node of the pattern defined, and the answer can be used for example to increment value on each node match.
    The blocks defined in #matches:do: are called rules, and they are stored only in success in instance searches of RBParseTreeSearcher defined below.
  • searches which type is Ordered collection, contains all the successful rules applied whenever using: #matches:do:, #matchesMethod:do … to store rules of type Rule, MethodRule, ArgumentRule, TreeRule …
  • context which type is dictionary: contains all the successfully matched patterns.
  • executeTree: this method takes aParseTree as input parameter, which is the possible matching code that we are looking for, and starts the matching process using the defined pattern.
  • messages of type OrderedCollection, and returns the list of messages found in a match.
  • hasRules returns searches list

Consider the following example which is using the instance sides defined above:

|searcher dict|
searcher := RBParseTreeSearcher new.
searcher
    matches: '`@rcv at:`@arg `sel:`@arg1'
    do: [ :aNode :answer | dict := searcher context ].
searcher executeTree:
    (RBParser parseExpression: 'cache at: each ifAbsentPut: [ each ].').

The method #matches:do: is used to define the pattern that we are looking for, using the ‘wildcards’ defined in first section; In addition of that, the do is running only on match, and in our case it will fill the dictionary dict with the searcher context (which is the pattern defined in matches block).
This execution is fired on executeTree: which defines the matcher that is a String parsed as a Tree using parseExpression, then starts matching it with the pattern.

3. RBParseTreeSearcher examples with pattern code

Finally, in this section we use patterns with the RBParseTreeSearcher class and do some magic by finding some matches in Pharo code !

Consider the following example:

| dict searcher|
searcher := RBParseTreeSearcher new.

searcher  
   matches: '`@receiver assert: `@arg equals: true'
   do: [ :aNode :answer | dict := searcher context ].

searcher 
   executeTree: (RBParser parseExpression: 'self assert: reader storedSettings first realValue equals: true.').

dict 	
   collect: [ :each | each displayString ].

The example is matching successfully and the dictionary dict will return different values during the iteration:

Match 1: (key) `@receiver is matching with (value) self
Match 2: (key) `@arg is matching with (value) reader storedSettings first realValue

If we want to check all the messages in the matcher, we can use searcher messages which will return an array of one item containing message #assert:equals: as it is the only message available in the matched expression.

*********************************

Using styles in Spec applications

In this post we will see how to use custom styles in Spec applications. We will start to present styles and then build a little editor as the one displayed hereafter.

We will show that an application in Spec manages styles and let you adapt the look of a presenter.

How do styles work?

Styles in Spec work like CSS. They are style sheets in which the properties for presenting a presenter are defined. Properties such as colors, width, height, font, and others. As a general principle it is better to use styles instead of fixed constraints, because your application will be more responsive.

For example, you want a button to have a specific width and height. You can do it using constraints with the method add:withConstraints: or using styles. In both cases the result will be this:

But, if you change the size of the fonts of the Pharo image using Settings/Appearance/Standard Fonts/Huge, using fixed constraints, you will obtain the following result. You will for example do not be able to see the icons because the size is not recomputed correctly.

If you use styles, the size of the button will also scale as shown below.

Style format

The styles in Spec format are similar to CSS. Style style sheets are written using STON as format. We need to write the styles as a string and then parse it as a STON file.

Here is an example that we will explain steps by steps below.

'.application [       
    .lightGreen [ Draw { #color: #B3E6B5 } ],          
    .lightBlue [ Draw { #color: #lightBlue } ] ]'

We will go by steps.

SpPropertyStyle has 5 subclasses: SpContainerStyle, SpDrawStyle, SpFontStyle, SpTextStyle, and SpGeometryStyle. These subclasses define the 5 types of properties that exist. On the class side, the method stonName that indicates the name that we must put in the STON file.

  • SpDrawStyle modifies the properties related to the drawing of the presenter, such as the color and the background color.
  • SpFontStyle manipulates all related to fonts.
  • SpGeometryStyle is for sizes, like width, height, minimum height, etc.
  • SpContainerStyle is for the alignment of the presenters, usually with property is changed on the main presenter, which is the one that contains and arranges the other ones.
  • SpTextStyle controls the properties of the SpTextInputFieldPresenter.

If we want to change the color of a presenter, we need to create a string and use the SpDrawStyle property, which STON name is Draw as shown below. For setting the color, we can use either the hexadecimal code of the color or the sender of Color class.

'.application [       
    .lightGreen [ Draw { #color: #B3E6B5 } ],          
    .lightBlue [ Draw { #color: #lightBlue } ] ]'

Now we have two styles: lightGreen and lightBlue that can be applied to any presenter.

We can also use environmental variables to get the values of the predefined colors of the current theme, or the fonts. For example, we can create two styles for changing the fonts of the letters of a presenter:

'.application [
    .codeFont [ Font { #name: EnvironmentFont(#code) } ],
    .textFont [ Font { #name: EnvironmentFont(#default) } ]
]'

Also we can change the styles for all the presenters by default. We can put by default all the letters in bold.

'.application [
	Font { #bold: true }
]'

Defining an Application

To use styles we need to associate the main presenter with an application. The class SpApplication already has default styles. To not redefine all the properties for all the presenters, we can concatenate the default styles (SpStyle defaultStyleSheet) with our own. As said above, the styles are actually STON files that need to be parsed. To parse the string into a STON we can use the class SpStyleVariableSTONReader.

presenter := SpPresenter new.
presenter application: (app := SpApplication new).

styleSheet := SpStyle defaultStyleSheet, 
	(SpStyleVariableSTONReader fromString: 
	'.application [
	     Font { #bold: true },
            .lightGreen [ Draw { #color: #B3E6B5 } ],
            .bgBlack [ Draw { #backgroundColor: #black } ],
	    .blue [ Draw { #color: #blue } ]
]' ).

app styleSheet: styleSheet.

Now, can can add one or more styles to a presenter, like follows:

presenter layout: (SpBoxLayout newTopToBottom
	add: (label := presenter newLabel);
	yourself).

label label: 'I am a label'.
label addStyle: 'lightGreen'.
label addStyle: 'black'.
 
presenter openWithSpec.

Also we can remove and add styles at runtime.

label removeStyle: 'lightGreen'.
label removeStyle: 'bgBlack'.
label addStyle: 'blue'.

Using classes

To properly use styles, it is better to define a custom application as a subclass of SpApplication.

SpApplication << #CustomStylesApplication
	slots: {};
	package: 'Spec-workshop'

In the class we need to override the method styleSheet to return our custom style sheet concatenated with the default one.

CustomStylesApplication >> styleSheet

	^ SpStyle defaultStyleSheet, 
	(SpStyleVariableSTONReader fromString:
'.application [
	Font { #bold: true },

	.lightGreen [ Draw { #color: #B3E6B5 } ],
	.lightBlue [ Draw { #color: #lightBlue } ],
	.container [ Container { #padding: 4, #borderWidth: 2 } ],
	.bgOpaque [ Draw { #backgroundColor: EnvironmentColor(#base) } ],
	.codeFont [ Font { #name: EnvironmentFont(#code) } ],
	.textFont [ Font { #name: EnvironmentFont(#default) } ],
	.bigFontSize [ Font { #size: 20 } ],
	.smallFontSize [ Font { #size: 14 } ],
	.icon [ Geometry { #width: 30 } ],
	.buttonStyle [ Geometry { #width: 110 } ],
	.labelStyle [ 
		Geometry { #height: 25 },
		Font { #size: 12 }	]
]')

We can use different properties in the same style. For example, in labelStyle we are setting the height of the presenter to 25 scaled pixels and the font size to 12 scaled pixels. Also, we are using EnvironmentColor(#base)for obtaining the default background colour according to the current theme. Because the colour will change according to the theme that used in the image.

For the main presenter, we will build a mini-text-viewer in which we will be able to change the size and the font of the text that we are viewing.

SpPresenter << #CustomStylesPresenter
	slots: { #text . #label . #zoomOutButton . #textFontButton . #codeFontButton . #zoomInButton };
	package: 'Spec-workshop'

In the initializePresenters method we will first initialise the presenters, then set the styles for the presenters and finally initialise the layout.

CustomStylesPresenter >> initializePresenters

	self instantiatePresenters.
	self initializeStyles.
	self initializeLayout
CustomStylesPresenter >> instantiatePresenters

	zoomInButton := self newButton.
	zoomInButton icon: (self iconNamed: #glamorousZoomIn).
	zoomOutButton := self newButton.
	zoomOutButton icon: (self iconNamed: #glamorousZoomOut).

	codeFontButton := self newButton.
	codeFontButton
		icon: (self iconNamed: #smallObjects);
		label: 'Code font'.
	textFontButton := self newButton.
	textFontButton
		icon: (self iconNamed: #smallFonts);
		label: 'Text font'.

	text := self newText.
	text
		beNotEditable
		clearSelection;
		text: String loremIpsum.

	label := self newLabel.
	label label: 'Lorem ipsum'
CustomStylesPresenter >> initializeLayout
	
	self layout: (SpBoxLayout newTopToBottom
		add: label expand: false;
		add: (SpBoxLayout newLeftToRight
			add: textFontButton expand: false;
			add: codeFontButton expand: false;
			addLast: zoomOutButton expand: false;		
			addLast: zoomInButton expand: false;
			yourself)
		expand: false;
		add: text;
		yourself)

Finally, we change the window title and size:

CustomStylesPresenter>> initializeWindow: aWindowPresenter

	aWindowPresenter
		title: 'Using styles';
		initialExtent: 600 @ 400

Without setting the custom styles nor using our custom application in the presenter, we have:

We do not want the black background color for the text presenter. We will like to have a sort of muti-line label. We want the zoom buton to be smaller as they only have icons. We want to have the option to change the size and font of the text inside the text presenter. Finally, why not, we want to change the color of the label, change the height and make it a little more bigger.

CustomStylesPresenter >> initializeStyles

    "Change the height and size of the label."
    label addStyle: 'labelStyle'.
    "But the color as light green"
    label addStyle: 'lightGreen'.

    "The default font of the text will be the code font and the size size will be the small one."
    text addStyle: 'codeFont'.
    text addStyle: 'smallFontSize'.
    "Change the background color."
    
    text addStyle: 'bgOpaque'.

    "But a smaller width for the zoom buttons"
    zoomInButton addStyle: 'icon'.
    zoomOutButton addStyle: 'icon'.
	
    codeFontButton addStyle: 'buttonStyle'.
    textFontButton addStyle: 'buttonStyle'.

    "As this presenter is the container, set to self the container
    style to add a padding and border width."
	
    self addStyle: 'container'

Finally, we have to override the start method in the application. With this, we are going to set the application of the presenter and run the presenter from the application.

CustomStylesApplication >> start

	(self new: CustomStylesPresenter) openWithSpec

Now, if we run CustomStylesApplication new start we will have:

The only thing missing is to add the behaviour when pressing the buttons.

For example, if we click on the zoom in button we want to remove the smallFontStyle and add the bigFontSize. Or, if we click on the text font button, we want to remove the style codeFont and add the textFont style. So, in the connectPresenters method we have:

CustomStylesPresenter >> connectPresenters

	zoomInButton action: [
		text removeStyle: 'smallFontSize'.
		text addStyle: 'bigFontSize' ].
	zoomOutButton action: [ 
		text removeStyle: 'bigFontSize'.
		text addStyle: 'smallFontSize'].

	codeFontButton action: [
		text removeStyle: 'textFont'.
		text addStyle: 'codeFont' ].
	textFontButton action: [ 
		text removeStyle: 'codeFont'.
		text addStyle: 'textFont']

Now, if we click on zoom in we will have:

And if we click on text font:

Conclusion

Using styles in Spec is great. It make easier to have a consistent design as we can add the same style to several presenters. If we want to change some style, we only edit the styles sheet. Also, the styles automatically scale if we change the font size of all the image. They are one of the main reason why in Spec we have the notion of an application. We can dynamically change how a presenter looks.

Sebastian Jordan-Montano