Ed: your new Emergency Debugger

“What is the nature of your debugging emergency?”


In this post, we’ll speak about debugging emergencies due to system error handling failures. We’ll cover the current tools available to revert code changes provoking such failures and introduce Ed, the new emergency debugger with improved revert capabilities.

What is a debugging emergency?

Sometimes when you execute code, you get a strange failure that opens that window and blocks your execution. This is the emergency evaluator invite.

It says a lot of things and it is scary, and it is common to just close it as fast as possible to get back to our Pharo image and forget about it.

Unfortunately, this means that there was a system error handling failure, it may happen again, and sometimes it totally breaks your image. This is a debugging emergency!

A debugging emergency is a system error handling failure

A system error handling failure happens when the debugger cannot be opened after an exception is raised. Most of the time, this happens because an error occurs within the debugger itself, which prevents it to work properly. Consequently, the debugger cannot open to debug its own error. This is an error-handling failure.

Because the system failed to handle the error, the whole system crashes. The system falls back to the emergency evaluator invite (see above) and waits for the user to decide what to do. We are in a debugging emergency state because tools are broken (here, the debugger) but nevertheless we need to recover from that error. At this point, the only tool still able to work is the emergency evaluator.

The only tool able to deal with an error-handling failure is the emergency evaluator

The emergency evaluator is a rudimental tool with a super limited user interface. It is started from the emergency evaluator invite. It answers to two commands only:

  • revert: reverts the last modified method to the last known version of that method (in this screenshot, it just failed doing that… why?).
  • exit: quits the emergency evaluator, dropping the original error.

Any other command is treated as Pharo source code: it is compiled and executed.

We need better emergency handling because current tools are limited

Emergency tools are too limited: they do not provide enough information, that information is not explorable and on top of that, they are buggy!

The first limitation is the restricted information that is given by the emergency evaluator invite. It is a printed representation of the last 20 elements of the call stack. All other debugging information is dumped, to only keep that printed stack. Because it is only printed, a lot of contextual information is lost. It is often hard to see in that stack where it crashed. The problem is that users have to rely on that information to devise what to do in the emergency evaluator!

Then comes the emergency evaluator. Its revert command is interesting. It allows us to recover from a fatal method change that provokes the error. But there are many problems. Among others, it is buggy and does not work all the time. Then, it only reverts the last modified method in the system. What if the problematic method is not the last modified one, but another one modified much before?

When reverting does not work, the only option left is to manually recompile the buggy method. This is tedious and error-prone: we have to remember the original class name, the method’s source code and rewrite it without any error. Although this is possible for simple methods, it does not scale. Finally, in case of error, the emergency evaluator will not give us any feedback and we’re left in the dark.

ED: the Emergency Debugger

Ed is an enhanced emergency evaluator that keeps as much debugging features as possible before falling back to the emergency evaluator. Ed provides a structured and navigable call stack from which users decide their debugging actions.

Ed is designed as a minimal, last-resort debugger to handle error handling failures. The goal of Ed is to provide a debugging tool with the less possible system dependencies to ensure that it can work even if almost everything else breaks.

Ed always opens on your domain code failure: it does not show you the debugger error. Your concern is to debug your code. However, it is possible to see a stack trace of the debugger failure, and to debug it with Ed.

Ed’s improved emergency stack view

Ed presents a navigable stack instead of a string representation of the stack. Instead of dropping the debug session when a system error handling failure occurs, Ed keeps it and uses it to provide us with contextual information about the failure. The stack that Ed presents is a stack of execution contexts extracted from the debug session.

The navigable stack helps us understand the error and where it originally happened:

  • We can see and navigate the call stack like in a standard debugger (top pane)
  • We can see the argument value (if any) printed between parentheses near each method in the call stack
  • We can see the source code of the method selected in the stack (middle pane), and see exactly what expression failed or is being executed (in red)

Ed’s method version selector

For a selected method in the call stack, Ed provides access to the different versions of that method. The source code of each version is displayable and navigable. Viewing methods’ versions helps in deciding which one is the method to revert. In addition, it does not matter if the right method was the last modified method in the system. It’s highly probable that this method is somewhere in the call stack, and we can navigate the call stack!

  • We can see and navigate the selected stack method’s versions using left/right arrows: bottom pane shows version 2/15 of the selected method
  • We can revert the method to the selected version by using the revert command
  • Once a method is reverted, the retry command allows you to retry opening a real debugger on your error, and the proceed command will attempt continue execution from there.
  • Any method in the stack can be selected and reverted to a previous version


What else can Ed do?

Ed provides a prompt with basic commands. For now, Ed uses a simple debugging API with minimal commands. Remember: you got here because the debugger failed while opening on an error from your domain code. So Ed just gives you another chance to debug your error. Maybe you are interested in debugging the debugger failure, in which case Ed allows you to switch context to debug that failure. It is also possible to retry the opening of the standard debugger after reverting a method or to terminate all non-vital system processes to kill potential image-breaker running code.

  • Ed provides some help to know all available commands: type "h"
  • The set of commands is provided by the debugging API of Ed
  • Using the showDebuggerError command, Ed shows the debugger failure that led you here, and the debugDebugger command opens a new Ed on the debugger failure.

Can Ed fail?

Yes, it can. Any program can. In case Ed fails, the system fallsback to the original emergency evaluator. For now, a failure in Ed should not block you. For instance, imagine that you get a debugger failure while debugging an exception in your domain code. First, Ed will open on your exception (not the debugger failure) so that you can debug your code. Second, if an exception is raised from within Ed and that it breaks Ed, the emergency evaluator will open on your exception again. Debugging the debugger is not going to be on your soulders, and you can focus on your bugs!

Note that, in the future, this last part is something we are going to improve to bring more flexibility to developers. Settings will enable the debugging of debugger failures or disable them to try to always find a debugger that can debug your exceptions.

Additionaly, the default emergency debugger can now be selected in the settings. You can choose if you wish to experiment Ed or to stay with the old emergency evaluator:

Ed is a prototype, but it moves fast!

Ed is still a prototype, and it needs better testing and experimentation. For instance, it still has system dependencies that can be removed, it has code that can be improved.

In the end, the goal is also to give users a read-eval-print-loop prompt with a strong debugging API. For now, Ed only has it’s minimal emergency API. However, it has been designed in a way that its API is extensible by additional tools, improving Ed’s potential capabilities. In the future, we want to explore remote debugging with Ed. For instance, Ed could act as a minimal debugging infrastructure that you can remotely connect to through an simple http connection. And if your debugging needs grow, Ed could receive a bytecode upload containing new debugging tools and API. We will experiment and explore this in future articles about debugging IOT devices.

About Blocks: Basics 101

This blog post is a first and short of a series where we will explore block-closures. This post will cover basic elements while the other will start to go over more semantical aspects using simple examples. This blog post is extracted from “Deep Into Pharo” and a new book under writing that will revisit chosen pieces of Pharo.

Lexically-scoped block closures, blocks in short, are a powerful and essential feature of Pharo. Without them it would be difficult to have such a small and compact syntax.
The use of blocks is key to get conditionals and loops as library messages and not hardcoded in the language syntax. This is why we can say that blocks work extremely well with the message passing syntax of Pharo.

What is a block?

A block is a lambda expression that captures (or closes over) its environment at creation-time. We will see later what it means exactly. For now, imagine a block as an anonymous function or method. A block is a piece of code whose execution is frozen and can be kicked in using messages. Blocks are defined by square brackets.

If you execute and print the result of the following code, you will not get 3, but a block.
Indeed, you did not ask for the block value, but just for the block itself, and you got it.

[ 1 + 2 ]
>>> [ 1 + 2 ]

A block is executed by sending the message value to it.
More precisely, blocks can be executed using value (when no argument is mandatory), value: (when the block requires one argument), value:value: (for two arguments), value:value:value: (for three) and valueWithArguments: anArray (for more arguments).

These messages are the basic and historical API for block execution.

[ 1 + 2 ] value
[ :y | y + 2 ] value: 5

Some handy extensions

Beyond the value messages, Pharo includes some handy messages
such as cull: and friends to support the execution of blocks even
in the presence of more values than necessary.
cull: will raise an error if the receiver requires more arguments than provided.

The valueWithPossibleArgs: message is similar to cull: but takes
an array of parameters to pass to a block as argument.
If the block requires more arguments than provided, valueWithPossibleArgs:
will fill them with nil .

[ 1 + 2 ] cull: 5
>>> 3
[ 1 + 2 ] cull: 5 cull: 6
>>> 3
[ :a | 2 + a ] cull: 5
>>> 7
[ :a | 2 + a ] cull: 5 cull: 3
>>> 7
[ :a :b | 1 + a + b ] cull: 5 cull: 2
>>> 8
[ :a :b | 1 + a + b ] cull: 5
>>> error because the block needs 2 arguments.
[ :a :b | 1 + a + b ] valueWithPossibleArgs: #(5)
>>> error because 'y' is nil and '+' does not accept nil as a parameter.

Other messages

Some messages are useful to profile execution:

  • bench. Return how many times the receiver block can be executed in 5 seconds.
  •  durationToRun. Answer the duration (instance of class Duration ) taken to execute the receiver block.
  • timeToRun. Answer the number of milliseconds taken to execute this block.

Some messages are related to error handling – we will cover them when we will present exceptions.

  • ensure: terminationBlock. Execute the termination block after evaluating the receiver, regardless of whether the receiver’s evaluation completes.
  •  ifCurtailed: onErrorBlock. Execute the receiver, and, if the evaluation does not complete, execute the error block. If evaluation of the receiver finishes normally, the error block is not executed.
  • on: exception do: catchBlock. Execute the receiver. If an exception exception is raised, execute the catch block.
  • on: exception fork: catchBlock. Execute the receiver. If an exception exception is raised, fork a new process, which will handle the error. The original process will continue running as if the receiver evaluation finished and answered nil, an expression like: [ self error: ‘some error’] on: Error fork: [:ex | 123 ] will always answer nil to the original process. The context stack, starting from the context which sent this message to the receiver and up to the top of the stack will be transferred to the forked process, with the catch block on top. Eventually, the catch block will be executed in the forked process.

Some messages are related to process scheduling. We list the most important ones. Since this Chapter is not about concurrent programming in Pharo, we will not go deep into them but you can read the book on Concurrent Programming in Pharo available at http://books.pharo.org.

  • fork. Create and schedule a Process evaluating the receiver.
  • forkAt: aPriority. Create and schedule a Process evaluating the receiver at the given priority. Answer the newly created process.
  • newProcess. Answer a Process evaluating the receiver. The process is not scheduled.

In the next post we will present some little experiences to show how blocks captures variables.

On the edge of class rules

Pharo has one simple basic rule: everything is an object. But the objects themselves are not entities living in an abstract universe and do not drink the dew of lilies of the valley for breakfast. They exist in the object memory served by the virtual machine. The virtual machine defines an interface that strictly specifies how the objects have to look like and what rules they need to follow.

For objects, the amount of rules is very low – basically just one. Pharo is a class-based object system that requires every object to have an assigned class. Every object is an instance of some class. 

Classic Class rules

For classes, the amount of rules is much wider. Classes are the source of the behaviour of objects, and they shape how the objects will look physically in memory. For this reason, every class needs to be an object with at least three instance variables. The placement and order of these variables are strictly given. These variables are:

  • superclass – can be nil for the class hierarchy roots
  • methodDict – a dictionary of method name symbols and associated compiled methods. It really needs to be a dictionary, and because this data structure is so essential for the virtual machine, the internal structure of it is strictly defined as well
  • format – an integer that encodes the kinds and numbers of variables

These three instance variables are defined in the class named Behavior that is the basic class of all classes in Pharo. If you try to add a new instance variable that will change the order of these three, the virtual machine crashes because it changes the expected layout of the classes in memory. Even if you try to define a superclass of Behavior and add a new instance variable there, it has the same fatal result because it modifies the memory layout as well. 

Classes themselves are objects too, so they need to follow the same rules. Most of the classes have a name and are nicely placed in the Pharo class hierarchy. They are mentioned in the system dictionary (a namespace) and the system browser can work with them. But not all classes need to be like that. Pharo provides an option to create so-called anonymous classes that behave more like standard objects. You can inspect them, but the other development tools do not know about them. They have various usages – mostly in special and strange cases – but they may be handy especially during testing because such classes are not registered in the system, do not need to have a unique name and are very easy to discard. They are created using the messagenewAnonymousSubclass that you send to an existing class.

aClass := Object newAnonymousSubclass.

We said that objects are instances of classes and classes are objects. If you think about such rules while drinking something uplifting, you may ask yourself: Is it possible to have an object that is an instance of itself? Can an object be its own class? Does Pharo allow such an edge case? And if yes, is it any useful?

Funky Classes Objects

How to try it? The most straightforward way how to do that would be to create an object (e.g., as an instance of the class Object) and then assign its class to itself. But here you quickly hit two problems. First, as we said, the virtual machine has some restrictions on the memory layout of classes that you need to follow. The second problem is that Pharo objects have no direct way of how to change their class to another. They do not understand any message like class: that would do this job.

Surprisingly, both of these problems are quite easily solvable! Let’s see how.

First, we will create an anonymous class. Unlike most of the anonymous classes, we will create classes that have Class as a superclass (an anonymous metaclass :)). That means that such anonymous class will define the same instance variables as Class. The three instance variables that make the virtual machine happy plus some other variables that make the tools happy.

aClass := Class newAnonymousSubclass.

Such class looks pretty standard in the Inspector.

Then we create an instance of such a class. 

anObject := aClass basicNew.

Inspecting of this class does not offer a very satisfying view. All instance variables are referencing the default value – nil. Moreover, the print string is broken.

There is an easy fix of this shattering state. We will just copy all instance variables from the original anonymous class.

aClass allInstVarNames do: [ :instVarName |
  anObject instVarNamed: instVarName put: (aClass instVarNamed: instVarName) ].

 Now the object looks the same way as the anonymous class in the Inspector, so there is no need to put the picture here again. They look the same way, but they are not the same objects. And, we still haven’t reached our goal to create an object that is its own class.

(anObject == aClass)
>>> false. (anObject class == anObject)
>>> false.

But we are close. Now, the object is an instance of our anonymous class and the object itself is already quite a suitable class.

(anObject class == aClass) 
>>> true.

So we may do the same step that Pharo does when it migrates object to a new class. For example, when you change the structure of the original class object. We may let our anonymous class become the desired object. 

aClass becomeForward: anObject.

That updates even the low-level pointer in the object memory that defines a class of our object. It was pointing to the anonymous class that becomes our object. The original anonymous class will stop to exist; we do not need it anymore.

(anObject class == anObject) 
>>> true.

The mission accomplished! We have created an object that is its own class.

Now, we can start to play with it. It is a class so we can add methods to it, which can be useful for creating mocks. 

anObject compile: 'ultimateAnswer ^ 42'.

And the object will understand it. As the only object in the object memory.

anObject ultimateAnswer 
>>> 42

So what?

Great, but is it useful? It can be. This technique is not something new. I have seen it in the Prototypes package by Russell Allen from from year 2005 that allowed prototype-based approach in the class based system. His original implementation cannot work in Pharo; however, the principle stays. Like JavaScript and unlike Self, it allows only a single target of delegation – here named superclass

It is more an interesting oddity than something that you would use every day in your programs. But it can imagine it may be useful mocking, in some debugging scenarios or modeling programs. Last but not least, it is a nice and amusing feature that demonstrates the flexibility of the object model of Pharo.

Here is the entire code:

aClass := Class newAnonymousSubclass.
anObject := aClass basicNew.
(aClass allInstVarNames) do: [ :instVarName |
    anObject instVarNamed: instVarName put: (aClass instVarNamed: instVarName) ].
aClass becomeForward: anObject.


We got some fun around classes, and we hope that you enjoyed it. If you want to understand more conceptually objects, classes and metaclasses you should read the little book: ‘A simple reflective object kernel’ http://books.pharo.org/booklet-ReflectiveCore/ . It is not about Pharo, but it shows in Pharo how to define a minimal reflective kernel, and you can learn a lot of fun aspects of OOP (like instantiation, allocation, lookup,…) in the process. 

Implementing Indexes – Who uses all my memory

This is the second entry of the series about implementing full-text search indexes in Pharo. The first entry is Implementing Indexes – A Simple Index. You can start reading from this post, but there are things that we have presented in the previous one. For example, how to install the Trie implementation.

In the first entry, we have seen how we can create a Trie, today we are going to analyze the drawbacks of this first implementation and how to improve it to get an acceptable ratio between used space and access time.

Let’s start with a simple example, let’s create a Trie with all the class names, so we can query them.

behaviors := CTTrie new. 
SystemNavigation default allBehaviorsDo: [ :aBehavior | 
       behaviors at: aBehavior name put: aBehavior name ].

In this example, we are creating a new Trie that indexes the names of all the behaviors in the system and put into the values of the tries the name of each behavior. This looks silly, but it will be clear later, as we want to analyze the performance of the data structure, so we put the simplest payload we can have.

This simple index allows us, as we have already seen before, to find all the classes that start with a given prefix.

behaviors allValuesBeginningWith: 'T'. 
behaviors allValuesBeginningWith: 'Obj'. 
behaviors allValuesBeginningWith: 'Spt'.

Now, we are exactly where we have been before. Let’s now analyze the memory footprint of our trie and where we can improve it.

Analyzing a Graph of Objects

As we all know, the programs and data in Pharo are represented by graphs of objects. Each object has a series of references to other objects, and they collaborate to fulfill a mission. One possible, way of looking at it is a static analysis. Let’s start with a class diagram.


This diagram does not help at all, as we only can see that the trie has a node, and the nodes have a value, a collection of children and a character. This does not help to see the memory occupied by the graph.

We need to have a dynamic view of the graph.

The first approach that we can try is to use the inspector, and use all the advantages of the Pharo live environment.

Screenshot 2020-04-03 at 20.52.30

This is a really powerful tool to explore and to understand data structures and complex object graphs. Although, we are facing a problem where our trie has 18.814 elements. So the task to analyze the size of the Trie with all its instances is not easy. We need another tool. Then, we are going to use the inspector, because it is so powerful to navigate the graph with tip of the fingers.

For this, we are going to analyze the statistics of a given graph of objects. We want to know:

  • Number of instances
  • How much space do they occupy
  • And to see how these numbers are related.

A custom tool for this particular problem

A nice way of working that you see a lot in machinists and carpenters is that they build custom tools for very specific problems. Investing a little time in a small tool that serves a particular problem provides better results than trying to apply a tool that tries to solve all problems in the world.

So we are going to use a little tool, 3 classes, that I have implemented in an hour or so. This tool takes a root object, and traverse the graph, taking all the objects reachable in the graph and then calculates some statistics.

You can get it from github.com/tesonep/spaceAndTime.

So we are going to do:

stats := GraphSpaceStatistics new
	rootObject: behaviors;

stats totalSizeInBytes.
stats totalInstances.
stats statisticsPerClass.	
stats statisticsPerClassCSV.

And get statistics about memory usage. With the last expression, we can generate a CSV of the results of the analysis. I got this table with the results:

Class name # Instances Memory (Bytes) Memory(%)
Array 159.568 7.694.592 36.46%
CTTrieNode 159.568 5.106.176 24.20%
IdentityDictionary 159.568 3.829.632 18.15%
Association 159.567 3.829.608 18.15%
ByteString 9.242 349.872 1.66%
ByteSymbol 9.242 293.928 1.39%
CTTrie 1 16 0.00%
UndefinedObject 1 16 0.00%
Character 64 0 0.00%
SmallInteger 90 0 0.00%
Total   21.103.840 100.00%

Nice data… but we need to extract some knowledge from it.

First, we can start reducing the noise of the table, let’s analyze why we have the last four lines that do not affect the result. UndefinedObject is an easy task. This is the class of nil, and as we only have one instance of it so it is easy to identify. We can see that this single instance occupies 16 bytes, which is the smallest size an object can occupy in Pharo.  The same happens with our sole instance of CTTrie.

Then we have, Character and SmallInteger, those have 0 bytes occupied even if we have a lot of instances. That should be wrong. But no, this is not an error. These objects are immediate. An immediate object is stored encoded in the reference. When you have an object with references only to immediate objects, the only occupied space is the space where the reference is. You can read more about it in Section 6.10 of  Pharo by Example (Page 112).

Now, let’s see the classes that have more impact. If we see the first 4 rows in the table, we can see the number of instances of Array, CTTrieNode, IdentityDictionary, and Association. We have the same number of instances of each of them, the only difference is that we have one less for Association. So, this is not a chance, we have something there. If we inspect the nodes in the Trie, we can see that each node has an instance of  IdentityDictionary to hold its children. That explains the relationship between each node and each dictionary. But the instances of Array and Association, where they came from?

Inspecting an IdentityDictionary
With the Inspector we understand how the IdentityDictionary is built

Doing a deeper analysis, using the inspector, we can see that an instance of IdentityDictionary has inside an array of associations. So we, have an array per IdentityDictionary and an Association per node. As each entry In the Array is a child node represented with an association.

If our theory is correct, where is the missing Association, as we have 159.568 nodes but one Association less. The explanation to this came when we inspect the CTTrie. The CTTrie instance has a direct reference to the root, so that explains the missing association. As the root is not contained in a Dictionary, so it is not referenced by an association.

Finally, to end our analysis, we can see the instances of ByteSymbol and ByteString. We have exactly 9.242 instances of each of them. Why? We need to remember that our trie contains exactly 18.484 entries and we are storing there the name of the classes. But the question is why some are instances of ByteSymbol and others are instances of ByteString. Again, analyzing the graph with the inspector. We see that for example, we have the following elements in the Trie: #Object and ‘Object class’. So we see that the name of the classes are Symbols and the names of the metaclasses are Strings. And of course, we know that for each class in the system we have a metaclass associated with them.

Can we do better?

One of the more interesting numbers that we have avoided so long, is the ratio of nodes and values stored. As we can see we have 18.484 values stored in the trie, but for those, we need 159.568 node instances; a ratio of 9 nodes per value. Why do we have so many nodes?

If we inspect the nodes, we see that there are lots of long chains of nodes with a single child and without node value. Again, we need to take some measures to understand the problem.  We are going to measure something that we will call the occupation rate of the trie. Again, I couldn’t find this is something studied or not, but it is helpful for my problem and to optimize in the future. We defined like this:

occupationRate = (nodes - intermediateNodes) / nodes

The Nodes value is the number of nodes in the trie, and the intermediateNodes value is the number of nodes that only have a child and does not contain a value. This rate gives us the idea of how much of the trie is holding information. In our case, the value is very small 13%.

Thinking about this problem, we got to the root of it (or in this case the branches 🙂 ). The problem is how the trie is structured to store the data. Let’s see a small example with the keys “aaaaa” and “aaaab”. The resulting trie is something like the following picture


In this diagram, it is clear that all the colored nodes are internal nodes. They have only one child and they don’t hold a value. And it is clear that this structure should be compressed in way that those nodes are reduced to a single entity. With the desired result of reducing the 4 of them to a single one.

Another elephant in the room is the space occupied by the use of an IdentityDictionary to hold the children of a node. Around 72% of the memory impact is related to the decision of using an IdentityDictionary. Can it be done using a better strategy?


We are in a great moment of the series, it looks like we have spent a lot of time to understand the problem, but still, we don’t have anything.

This is not true, we have a lot, we have a way of measuring the performance of our solution, we understand where these numbers came from and now every step that we can take we can validate it with the original baseline we have.

The first thing to improve the performance of any solution is to measure if we don’t measure we cannot compare if we cannot compare we make modifications blindly. Optimizing without measuring is like developing without testing… and you know what we think about that.

A Floating GCC Optimization

A couple of months ago, while debugging the build of the VM, I found a problem with the optimization of GCC 8.3 (I am not sure what other versions may show it). Basically, this GCC version does not compile one of our virtual machines functions well, because it unexpectedly removes what looks like valid code. The affected code in the VM was the following:

Spur64BitMemoryManager >> fetchLong32: fieldIndex ofFloatObject: oop
    "index by word size, and return a pointer as long as the word size"
    | bits |

    (self isImmediateFloat: oop) ifFalse:
        [^self fetchLong32: fieldIndex ofObject: oop].
    bits := self smallFloatBitsOf: oop.
        cCode: [(self cCoerceSimple: (self addressOf: bits) to: #'int *') at: fieldIndex]
        inSmalltalk: [
            self flag: #endian.
            fieldIndex = 0
                ifTrue: [bits bitAnd: 16rFFFFFFFF]
                ifFalse: [bits >> 32]]

This method is used to extract the two 32 bits words that are stored in a Float. It handles the cases when the Float number is an immediate value and when it is not. The problem is in the part that handles the immediate value. In the Smalltalk simulator, for extracting the first Int32 it performs and a #bitAnd: and a bit shift (#>>) for the second one.

As you can see, in the last part of the method, there is a conditional that handles a special case differently in the Pharo simulated code and in the translated C code. The C code was being generated form this then:

(self cCoerceSimple: (self addressOf: bits) to: #'int *')      at: fieldIndex

And the generated C code was the following:

 ((int *)(&bits))[fieldIndex];

This is accessing the memory of the bits variable using indirect memory access. Somehow, this pattern of code was marking all usages of the variable bits as dead code, and so all dead code was being removed. To detect the problem it was not easy, basically, I have used the Pharo Tests to detect the failing function and then to understand the generation of code in GCC, we need to debug its generation. After some digging, I understood this was a problem related to some undefined behavior and aliasing of variables in the stack, which you can read more about in here and here. However, in the middle I learned how to debug GCC optimizations, so the post is about that, I hope somebody finds it interesting and useful.

Starting from a simple reproducible case

I managed to create first a simpler case showing the problem:

long __attribute__ ((noinline)) myFunc(long i, int index){
  long v;
  long x = i >> 3;
  v = x;
  return ((int*)(&v))[index];

#include <stdio.h>

int main(){
   long i;
   int x;

   scanf("%ld", &i);
   scanf("%d", &x);


This small function is just taking a long value (64bit integer), then it shifts the value and stores it in a temporary variable (a variable that is in the stack of the function). Finally, it performs the same indirect access that we have seen in the VM code showing the error.

So, to continue the investigation let’s compile the function with different optimization levels, as bigger the number of the level bigger the number of optimizations the compiler will do. 

Understanding the Level 1 generated code

When using the optimization level 1 (-01) it generates the following code:

    sarq $3, %rdi
    movq %rdi, -8(%rsp)
    movslq %esi, %rsi
    movslq -8(%rsp,%rsi,4), %rax

Before continuing let’s analyze this piece of code. We see only the code of the function myFunc.  To understand it, we need to have some little background on how the parameters are passed in the calling convention in System V x86_64 (in this case a Linux 64bits).  Basically, the register rdi is used for the first parameter, the register rsi is used for the second parameter and the register rax is used for holding the return value. The other important register here is the rsp, that holds the address to the top of the execution stack.

However, using the optimization level 2 (-02) generates a much compact version:

    movslq %esi, %rsi
    movslq -8(%rsp,%rsi,4), %rax

As you can see, the more optimized version is losing the bit shift and
the first mov. It keeps only the final access to the memory and the return. 

Generating the snippets

The first snippet is generated using:

$ gcc -O2 prb.c -S -o prb.s -Wall

and the second using:

$ gcc -O1 prb.c -S -o prb.s -Wall

Debugging GCC steps

The GCC compiler is implemented as a pipeline of transformations. Each transformation takes the previous result and applies one of the steps of the process. When you configure different optimization levels and compilation options different pipelines are executed. 

The GCC pipeline has different steps, so I centered myself in the
tree optimizations. To allow debugging of GCC, using the correct options it dumps each of the intermediate states of the compilation. The intermediate transformations work with C like trees. For example to get the optimized version of the program, before generating the Assembler we have to do:

$ gcc -O2 prb.c -S -o prb.s -Wall -fdump-tree-optimized=/dev/stdout

This will generate:

myFunc (long int i, int index)
 long int v;
 long unsigned int _1;
 long unsigned int _2;
 int * _3;
 int _4;
 long int _8;

<bb 2> [local count: 1073741825]:
 _1 = (long unsigned int) index_7(D);
 _2 = _1 * 4;
 _3 = &v + _2;
 _4 = *_3;
 _8 = (long int) _4;
 v ={v} {CLOBBER};
 return _8;

This is the optimized SSA (static single assign) version of the function. As you can see in this version the code is already optimized, and our operations not correctly generated. I expected to see something like:

myFunc (long int i, int index)
 long int x;
 long int v;
 long unsigned int _1;
 long unsigned int _2;
 int * _3;
 int _4;
 long int _10;

<bb 2> [local count: 1073741825]:
 x_6 = i_5(D) >> 3;
 v = x_6;
 _1 = (long unsigned int) index_9(D);
 _2 = _1 * 4;
 _3 = &v + _2;
 _4 = *_3;
 _10 = (long int) _4;
 v ={v} {CLOBBER};
 return _10;


In the first SSA version, we are lacking the shift operation:

x_6 = i_5(D) >> 3;
v = x_6;

So, we need to see which of the optimizations and transformations is losing our code. To see all the steps we should execute:

$ gcc -O2 prb.c -S -o prb.s -Wall -fdump-tree-all

This will generate a lot, really a lot, of files in the same directory with the name prb.c.xxx.yyyy. Where xxx is the number of the step, and yyyy is the name of the step. Each of the files contains the result of applying the changes, so what I have done is looking in a binary search where my code was lost.

I have found that the problem was in the first dead code elimination
step (prb.c.041t.cddce1). GCC does not like that we are copying a stack variable and then accessing directly as memory:

v = x;
return ((int*)(&v))[index];

So, basically, it was assuming that we were only using v and not x, so all the code modifying x is discarded (this optimization is called “dead store code deletion”). Basically, it tries to remove all the code that is affecting stack (temporary) variables that are never used.

Solving the problem

I have fixed the problem by writing the code in a different way. Actually, it was only necessary to remove the special C case in the original code, and let the Slang C code translator do his thing.

Spur64BitMemoryManager >> fetchLong32: fieldIndex ofFloatObject: oop
    "index by word size, and return a pointer as long as the word size"
    | bits |

    (self isImmediateFloat: oop) ifFalse:
        [^self fetchLong32: fieldIndex ofObject: oop].
    bits := self smallFloatBitsOf: oop.
    ^ fieldIndex = 0
          ifTrue: [bits bitAnd: 16rFFFFFFFF]
          ifFalse: [bits >> 32]]


I had published a smaller version of this post in the Pharo mailing list when commenting on the bug.

I wanted to store this knowledge and propagate it before I eventually flush it from my mind. I like these things, but I am not debugging the GCC optimization pipelines every day.

All in all, be careful about undefined behaviors!

Implementing an object-centric breakpoint with Reflectivity

In this post, we describe how we implement a breakpoint that affects only one specific object, and how we implement it using the Reflectivity framework.

What is an object-centric breakpoint?

Let’s take a simple example: imagine two Point instances p1 and p2.

p1 := 0@2.
p2 := 1@3.

Each of these points has two instance variables, x and y, that we can change by calling the setX:setY: method. Imagine that we have a bug related to point p1, and that we want to halt the execution when this object executes the setX:setY: method.

We definitely do not want to put a breakpoint directly in the setX:setY: of class Point method. Points are used all over the system: putting a breakpoint in class Point will halt whenever any point calls that method and the image will freeze.

What we really need is a breakpoint that halts the execution only when setX:setY: is called on p1.

The halt-on-call breakpoint

This breakpoint is an operator called halt-on-call, defined by Jorge Ressia in his Object-Centric debugger. It halts whenever one specific object receives a given message. This is what we need! Ideally, we would like to use an API like this:

p1 haltOnCall: #setX:setY

This installs a breakpoint that halts the method setX:setY: for the point p1 exclusively. By extension, all objects should benefit from that API, so that we can install object-centric breakpoints on any kind of object. Now let’s implement it.

Implementing the halt-on-call breakpoint API

Let’s define an interface for this operator in the Object class, to make it available for all objects. Let’s call it haltOnCall:. It takes a method selector as parameter, which defines the selector of the method to halt. We delegate the definition and the installation of the breakpoint instrumentation to another object named ObjectCentricInstrumenter:

Object >> haltOnCall: methodSelector
  ^ ObjectCentricInstrumenter new halt: methodSelector for: self

This interface installs a halt-on-call breakpoint on its receiver, and returns the object modeling that instrumentation. It is very important to keep a reference to that instrumenter object if we want to uninstall our breakpoint later. This would typically be handled by a higher level tool such as a real debugger.

This method is now our top-level debugging API, available for all objects in the system. Using this API, we can now ask any object to halt when it receives a particular message. Now, we have to create the ObjectCentricInstrumenter class and implement the halt:for: method that is used to instrument the receiver in the code above.

Implementing an object-centric instrumenter using Reflectivity

We use the Reflectivity framework as a support to implement object-centric breakpoints.

What is Reflectivity?

Reflectivity is a reflective framework shipped in the base Pharo distribution. It features annotation objects named Metalink that apply reflective operations at the sub-method level (i.e., at the level of sub-expressions of a method).

A metalink is an annotation of a method AST. It is an object that defines a message selector, a receiver named meta-object, and an optional list of arguments. At run time, when the code corresponding to the annotated AST is reached, the metalink is executed. The message corresponding to the selector is sent to the meta-object, with the previously computed argument list: the corresponding method is executed in the meta-object.

For example, adding logging to Point>>setX:setY:with Reflectivity goes as follows:

  1. Instantiate a metalink
  2. Define a meta-object, for example, a block:
    [Transcript show: 'Hello World']
  3. Define a message selector that will be sent to the block at run time, for example: value
  4. Attach the metalink to the ast node of a method, for example the method setX:setY: of Point

At run time, each time a point will execute setX:setY:,  the metalink will first execute and send value to the block object [Transcript show: 'Hello World'], resulting in the execution of the block. Then the execution of setX:setY: will continue.

Now, back to our original problem, Reflectivity supports object-centric metalinks. This means we can actually scope a metalink to a single, specific object. We will use this feature to implement our object-centric instrumenter and define our object-centric breakpoint.

More details about Reflectivity are available in the latest Reflectivity paper.

Implementing the object-centric instrumenter

Now, we have to create the ObjectCentricInstrumenter class and implement the halt:for: method. This class has three instance variables:

  • targetObject: the target object affected by instrumentation
  • metalink: the instrumentation per se, that is, a metalink
  • methodNode: the AST node of the method we instrument
Object subclass: #ObjectCentricInstrumenter
  instanceVariableNames: 'targetObject metalink methodNode'
  classVariableNames: ''
  package: 'Your-Pharo-Package'

In this class, we have to define how we install the halt-on-call breakpoint on our object. This is done through the halt:for: method. This method takes two parameters: the message selector of the method that will halt and the target object that the breakpoint will affect.

ObjectCentricInstrumenter >> halt: methodSelector for: anObject
  targetObject := anObject.
  metalink := MetaLink new
    metaObject: #object;
    selector: #halt.
  targetObject link: metalink toMethodNamed: methodSelector

First, we store the target object (line 2). We need to keep a reference to that object to uninstall the breakpoint later. Then, we configure a metalink to send the #halt message to #object (lines 3-5). At run time, #object represents the receiver of the current method that is executing. This instrumentation is equivalent to insert a self halt instruction at the beginning of the instrumented method. Finally, we use ourReflectivity’s object-centric API (line 6) to install the breakpoint metalink on the target method, but only for the target object.

When that is done, targetObject will halt whenever it receives the message methodSelector. All other objects from the system remain unaffected by that new breakpoint.

Using our object-centric breakpoint

Now that we have implemented our object-centric breakpoint, we can use it. We instrument our point p1 with an object-centric breakpoint on the setX:setY: method (line 3). We store the instrumenter object in the instrumenter variable so that we can reuse it later. Calling setX:setY: on p1 will now halt the system, while calling it on p2 or any other point will not halt.

p1 := 0@2.
p2 := 1@3.
instrumenter := p1 haltOnCall: #setX:setY:.
p1 setX: 4 setY: 2. "<- halt!"
p2 setX: 5 setY: 3. "<- no halt"

After debugging, we will probably need to uninstall our breakpoint. As we kept a reference to the instrumenter object, we can use it to change or to remove the instrumentation it defines.

Let’s first define an uninstall method in the ObjectCentricInstrumenter class. This method just calls the uninstall behavior of the metalink, removing all instrumentation from the target object.

ObjectCentricInstrumenter >> uninstall
  metalink uninstall

Our little example script now becomes:

point p1 := 0@2.
point p2 := 1@3.

instrumenter := p1 haltOnCall: #setX:setY:.
p1 setX: 4 setY: 2. "<- halt!"
p2 setX: 5 setY: 3. "<- no halt"

instrumenter uninstall.
p1 setX: 4 setY: 2. "<- no halt"
p2 setX: 5 setY: 3. "<- no halt"

More object-centric debugging tools!

We showed how to implement a breakpoint that halts when one single, specific object receives a particular message. And it was simple!

But why stopping there? The Pharo reflective tools provide us with much more power! In our next blog posts, we’ll show how we can implement a breakpoint that halts when the state of one specific object is accessed.

[Pharo features] The fusion of a developed program and IDE

In this series of blog-posts, we are explaining in more detail the Pharo features described in https://pharo.org/features.

Pharo follows several design principles, and one of them says that the interaction between the human and computer should be modeless. The interface needs to be as direct and intuitive as possible. If you try to think for a moment, what are the sources of modality in current user interfaces, you may easily overlook the worst one. Applications. The applications are giant modes, and our goal should be to avoid them.

Most developers prefer to use integrated development environments over the separate tools but, traditionally, they still have several modes of operation. You write code; then you debug it in a debugger and finally, you run the complete program. Every mode swapping is inconvenient and decreases effectivity, so the modern development environments try to blend these modes to allow one to modify the program in the debugger, change values of instance variables and so on. Some allow writing of custom extensions to help visualize the state of the program during debugging in some way. It is great for developers, but it makes the debuggers and whole development environments very complicated beasts that are hard to maintain and understand.

Pharo uses an elegant, straightforward approach. In Pharo, there isn’t any visible difference between code writing, debugging and running of a program. Pharo is almost completely modeless in this regard. There is just one program in Pharo, Pharo itself. When you start to write your own code, you are actually extending Pharo, teaching it a new thing while it runs. If you want to interrupt the program to inspect its state or perform stepping, it does not stop the program. Remember, Pharo is your program. If you would stop your program entirely, Pharo, the whole IDE, would stop. Your program is never really interrupted, which is great because it probably contains a lot of useful code. It may contain code that allows you to print your data structures to a console while debugging or maybe even to do some interactive visualization. To restart a lost connection to a database or open a form to correct broken data that caused an error.

The fact that writing a program means extending the development system means that you never start on an empty field. From the very beginning, you may use all the power and libraries that Pharo has. And when you are finished, you just cut Pharo features you do not want. You may unload the code of the development tools or directly bootstrap a new Pharo system without it, and the users of your program will not be able to use them anymore if you want.
The fusion of your program and the development environment has many positive consequences for the everyday workflow of a developer. It allows creating visualizations embedded into a debugger so you may see your data in a more accessible form. You may embed whole custom editors of your data into the debugger. You may select some expressions in the debugger window and let them inspect in a separate window. You can write the entire program in the debugger while it is being executed.

There is one feature that I particularly like, and that is extremely rare in other development environments. A feature that is very helpful and that I use on a daily basis. Everything that I told about the modeless approach is valid even for the debugger. In Pharo, a debugger is just a tool that uses exceptions and process control to hook into existing execution contexts to step them for a while using a bytecode simulator instead of letting them run at full speed. There is no single reason why there should be only one debugger. You may have dozens of opened debuggers at once and, moreover, in one debugger, you can select a piece of code and let it debug in another debugger. So you can debug a code that the running thread had not yet reached or, the other way around, has already executed.

The short video we created for this feature demonstrates it. It shows a debugger that was opened on a simple circuits designer program.

  • We look at the content of an instance variable that contains a collection of standalone circuit regions.
  • We select some of them and look at the custom visualization of these regions – this visualization with the full interactive circuit editor that displays all circuit cells that do not belong to the selected region as darker.
  • Then we select a piece of code in the debugger and run another debugger on top of it.
  • After a few steps in this new debugger, we just close it and return to the original one. There, we use the embedded circuit editor inside the debugger to actually change the circuit. It, of course, uses a lot of code of the application we are just debugging.
  • After this change, we continue to stepping.

This fusion of a developed program and development environment is optional. In Pharo, you still can work in the old-fashioned way where you modify in an external editor the source files and then let them load and execute in plain Pharo without any development tools. But you should not do it if you do not need to – you will lose one of the greatest advantages that Pharo provides. Moreover, the other developers do not use Pharo this way, so do not expect polished support for such workflow.

Every Pharo feature that we will mention has some disadvantages. This one is not an exception. First, it forces newcomers to learn to use the IDE that is probably significantly different from others that they know. To try to use some third-party editor and tools that the programmer probably mastered before, does not bring any advantage. That raises the primary barrier for entering the Pharo environment.

Next, the merging of the developed program with the environment makes the whole more fragile. Pharo has a very long tradition in dealing with the unwanted consequences of mistakes the programmers can do so only very small percents of issues lead to errors you cannot recover from or crashes of the entire environment. Pharo is built with presumptions that things can go wrong, so it is implausible you will lose some code you wrote without an option to recover it.

Getting a handle on names

There are only two hard things in Computer Science: cache invalidation and naming things.

Phil Karlton

Two hard things, but they are not the same kind of hard. Naming things we do it every day. And we read and re-read the things we name all day long.

This is a post about names. And ambiguity. And pain. And how to iterate on that to get something better at the end of the day.

And I’ll illustrate some ideas with the OSWindow project.

An OSWindow has a handle…

So, an OSWindow has a handle.

An OSWindow has a handle. What is this handle?

But let’s first start by saying what an OSWindow is. An OSWindow is an object representing an operating system window in an uniform way. Behind the scenes, an OSWindow uses some backend like WinForms in Windows or Cocoa in OSX. The main backend implementation we have in Pharo uses SDL2, a cross-platform library, mainly designed for game dev, that can be used for windowing.

So, let’s start again. An OSWindow has a handle. From the explanation above, you may think the handle is a pointer to a real window implemented in the backend framework. Actually, handle is indeed the name we use in pharo to designate a pointer to some externally managed memory region. For those used to FFI, a handle usually designates an ExternalAddress or pointer (And for those who aren’t you may take a look here).

Maybe the handle is an ExternalAddress as we use to do in uFFI?

However, regardless of our first impressions on this, reading the code shows another thing. The code manipulating this handle does not use it as an ExternalAddress, because the only thing we can do with external addresses is to read or write them (or operate with them, they are addresses after all). For example, let’s take the following piece of code from the OSWindow class:

handle show

or even:

handle newOpenGLRenderer

So the handle is finally not an FFI thingy as we thought! It’s more like a backend object representing the real window. Of course, the different native windows are manipulated through different sets of functions. And it seems it was decided to manage these differences as a composition of objects instead of inheritance, which we will not discuss nor argue today. Today we focus on names.

It turns out that these backend objects are also called handles. We have for example OSSDL2WindowHandle, OSVMWindowHandle, OSNullWindowHandle. This seems consistent with what we saw before. So, an OSWindow actually has an OSWindowHandle and not an FFI handle. I think we now understand enough to keep reading more code…

Let’s now dive in the OSSDL2WindowHandle class.

WAT? An OSSDL2WindowHandle also has a handle!

Wait! What!? WAT? The OSSDL2WindowHandle has another handle. But this cannot be an OSWindowHandle, right?

Is the handle of the handle a handle?

Or maybe now we did arrive to an ExternalAddress finally.

Or maybe the handle is an external address (an FFI handle?)

Again, reading the code tells us it is neither… For example the following pieces of code sending messages to the handle that are implemented in none of the classes we expect:

OSSDL2WindowHandle >> toggleBorderOn [
  handle toggleBorder: true.

OSSDL2WindowHandle >> getWMInfo [
  | wmInfo |
  wmInfo := SDL_SysWMinfo new version: SDL_Version bindingVersion.
  handle getWMInfo: wmInfo.
  ^ wmInfo

Following a bit the code, we can see this second handle is actually an SDL2Window implemented as an FFIExternalObject. And then I, used to uFFI conventions, know that FFIExternalObjects have a handle themselves, the real handle we were looking for from the beginning.

The SDL2Window is an FFIExtenralObject has a handle, an ExtenralAddress, finally

So the handle of the handle is an external object, which has a handle. the handle has a handle with a handle. What could go wrong here? 🙂

The full picture , to get a handle on the handles.

I don’t know you, but I felt pretty lost when reading this code, because of this overloaded terminology. The same name is used to designate three different things. And those different things are in overlapping domains! So when reading handle we need to actively think where we are and what it is designating here and now. And for worse, this problem of overloaded terminology does not stop at the name handle, I’ve found also that window could designate an OSWindow or an OSWindowHandle. So what do you think the following expression returns?

window handle

Take your pick 🙂

Getting the names on the handles

I decided to fix this name confusiology in this Pull Request by doing a couple of renames. I chose new names, that you may not agree with btw, following a couple of guidelines:

  • avoid ambiguity: three different names for three different non-interchangeable concepts
  • use generic names when possible, but staying in the domain terminology
  • use concrete names when variables will only reference objects of a single type/class

Reading the code and understanding the domain helped a bit in choosing the names, which now make OSWindow domain read as:

An OSWindow has a backend window, instance of OSBackendWindow. An OSSDL2BackendWindow has an sdl2Window (no need to say the class ;)). Moreover, all objects designate OSWindows as window, and OSBackendWindow as backend windows .

The final overview after renaming

Name the next step!

Do you agree with the names I choose? Please share yours! Discussing names is the only way to find better ones :). And we can always iterate and improve more. No design should be carved in stone.

In the meantime, I’ll keep improving OSWindow, we should have good support for multiple native windows soon.

I hope you enjoyed and that you give good names in your programs. For your future self. Or at least for me, if I have to read your code 🙂

Implementing Indexes – A Simple Index

In Pharo, we have an excellent tool to query, navigate and discover classes and methods in the system. The tool is Spotter, this tool allows us to search the image for different elements:

  • Classes
  • Methods
  • Menu entries
  • Help topics
  • and many more…

To access to it is as easy as doing Shift + Enter,

Spotter allows us to look for classes, methods and other elements doing a text search on the name of the element.

Spotter gives us access to all the classes in the system, giving us the power to learn, discover and increase our productivity.

However, it has a problem. If the image is a large image (it contains a couple of millions of objects and/or thousand of classes — one company has just 30 millions line of code) the lookup of the elements is slow. We need to implement a text search index.

A simple “Begins With” index

Spotter allows us to do full-text searches, looking up the text we want (the needle) in any part of the system. Although, we are going to start with an easier index, one that allows us to look for the entries that begin with the needle.

If we look up for Obj, the index will answer for example Object, and ObjectLayout; but it will not include DisplayableObject.  We will see how to implement a full-text search and even fuzzy logic searches in future entries!

For implementing the index we are going to build a Trie.  From Wikipedia:

A trie also called a digital tree or prefix tree, is a kind of search tree—an ordered tree data structure used to store a dynamic set or associative array where the keys are usually strings. Unlike a binary search tree, no node in the tree stores the key associated with that node; instead, its position in the tree defines the key with which it is associated. All the descendants of a node have a common prefix of the string associated with that node, and the root is associated with the empty string. Keys tend to be associated with leaves, though some inner nodes may correspond to keys of interest. Hence, keys are not necessarily associated with every node [Full Article].

In this case, we are going to start with the implementation that is available in Github Pharo-Containers/Containers-Trie. We thanks Benoit St-Jean for his implementation.

This implementation is based on the first iteration of Benoît St-Jean.

To install it is as easy as executing the following Metacello expression:

Metacello new
      baseline: 'ContainersTrie';
      repository: 'github://pharo-containers/Containers-Trie/src';

Once installed we will have the CTTrie class to create a new trie and use it.

Creating a new Trie is as easy as creating a new instance of CTTrie

aNewTrie := CTTrie new

Once we have an instance we can start to add elements in it and then performing a search on it.  The instances of CTTrie understand a series of messages to access, add and update the nodes in the trie.

A trie is a collection of associations. Each association has a key and a value. The Key should always be a String. The value can be of any type.

  • at: aKey – returns the stored value for this key, throws an error if not found.
  • at: aKey put: aValue – store the association of the key and the value in the trie.
  • at: aKey update: aBlock – updates the existing element in the trie with the result of executing the block passed as parameter. The block has a single argument that is the previous value associated with the key.
  • at: aKey update: anUpdateBlock initial: anInitialBlock – if the trie contains a value associated with the key, the block anUpdateBlock is used to update the value receiving as optional argument the previous value. If the trie does not contain an association for the key, a new association is created with the result of executing anInitialBlock.
  • removeKey: aKey – removes the association with the given key from the trie, throws an error if not found
  • removeKey: aKey ifAbsent: aBlock – removes the association with the given key from the trie. If the key is not found, the block is executed.

These methods allow us to access and update the trie using the full key.

As an example:

aTrie := CTTrie new.
aTrie at: #cat put: 'Meow'.
aTrie at: #dog put: 'Guau'.
aTrie at: #cat.
aTrie at: #fox put: '???'.
aTrie at: #fox.
aTrie at: #fox update: [ 'What does the fox say?' ].
aTrie at: #fox

As far as we are, the Trie is not more powerful than the Dictionary instances in Pharo.  So let’s see, which is the API to do queries to the trie.

  • allValues – it returns a collection with all the values stored in the trie.
  • allValuesBeginningWith: aString – it returns all the values that are stored in the trie and which keys start with the string passed as parameter.
  • withAllValuesDo: aBlock – it iterates all the trie and executes the block on each of the values in the trie. This method does not create the intermediate collection, so it is faster than using allValues and then iterating the collection.
  • withAllValuesBeginningWith: aString do: aBlock – it iterates all the elements that have a key starting with the passed string and executes the block on each of them. This method does not create the intermediate collection, so it is faster than using allValuesBeginningWith: and then iterating the collection.

So now, we are able to query the trie and get all the elements that have keys that begin with a given string.

Let’s index all the classes in the system.

"Let's create a new Trie"
behaviors := CTTrie new.

"We will iterate all the behaviors (classes and traits) and store
them in the trie with their name as key"
SystemNavigation default allBehaviorsDo: [ :aBehavior | 
	behaviors at: aBehavior name put: aBehavior].

"We can access the trie to make queries!"
behaviors at: #Object.
behaviors allValuesBeginningWith: 'Obj'.

What is the advantage?

Using a trie provides a fast way of performing the lookup of keys including a given prefix. Without the use of a trie, we need to iterate the full collection to compare each of the keys (for example using the message beginsWith: that is present in String).

This operation is very expensive when we have thousands and thousands of elements to iterate, in the case of Spotter, thousands and thousands of methods and classes.

Using a trie the lookup effort does not depend on the size of the indexed collection but in the size of the prefix that we are looking for. More precisely, looking up data in a trie is faster in the worst case, O(m) time (where m is the length of a search string).

How is it implemented?

A trie is an unbalanced tree. We have implemented the trie using two classes CTTrie and CTTrieNode.

CTTrie is the frontend of the implementation and it is used by the user to access the associations in it. It contains a single node that is the root of the tree. This class implements all the messages that the user wants to use and hides the implementation details in the nodes of the tree.

Each node of the tree is an instance of CTTrieNode, and each of these instances has:

  • The node value
  • The children of the node
  • And the first character of the subtree that this node starts.

In a Trie, the key of an entry is encoded in the path from the root to the node containing the value.

In the following picture, we can see an example. The code to generate this example is the following:

aTrie := CTTrie new
aTrie at:'bat' put:'bat'.
aTrie at:'cat' put:'cat'.
aTrie at:'cow' put:'cow'.


This example has 3 entries, for the clarity of the example, the keys and the values are the same, but you can store anything in the values.

As we can see from the picture, the keys of the associations are encoded in the path from the root of the tree. For the key bat, we start looking up the first letter of the string to select the path to get the answer, then the second letter and finally the third one.

The same lookup procedure can be executed to find the keys cow and cat.

If we perform the lookup of a key and we do not end in a leaf of the tree (a leaf node is one node with a value), we have found that the key is not present in the trie.

If we want to have all the entries for a given prefix, we will get a subtree. In this case if we want all the keys starting with $c we will get all the right subtree.

With this structure, it is only needed to perform n accesses to memory to get to the result, where n is the length of the prefix or key that we are looking for.

This is just a small introduction, for better understanding, why don’t you try with the Pharo implementation, adding elements to it and inspecting the structure of the resulting trie.

Still, we have problems…

The implementation we have just seen presents two problems:

  • It is not space-efficient, as a lot of nodes are creating and many of them do not contain nor children nor values.
  • It does not allow us to perform searches in the middle of the keys.

We will address both of these problems in the next entry of this series. Where we are going to see the optimized version of this trie and how to implement a suffix trie.

In the meanwhile, you can read the implementation, check the tests and understand how it is implemented. It is a really straight forward implementation, and why not… start reading the implementation of the optimized version!!!

Complishon: efficient heuristics for code completion

A couple of weeks ago I started prototyping a new code completion backend. The one available in Pharo 8.0 is good-enough to complete variables and messages to literal objects. However, the accuracy of the completion was pretty low because auto completion candidates were sorted alphabetically, and this was even producing completion pauses for large images. This code completion prototype engine, that I originally baptised as Complishon, was based on two ideas: generators and heuristics.

Generators are objects that provide a stream-like interface to consume values generated sequentially. Better with an example.

stream := Generator on: [ :generator |
  generator yield: 1.
  generator yield: 'hello'.
stream next.
>>> 1
stream next.
>>> 'hello'

One cool thing about generators is that the generator code will block until we call next. This is particularly interesting when iterating several big collections: we only really iterate them if we need them. And they come by default in Pharo :).

Heuristics, on the other hand, are not funky Pharo objects that come in a library. Heuristics are objects that represent rules. In our case rules for auto completion. Here you have some example of heuristics:

  • chances are I want to autocomplete first the methods defined in my package
  • chances are that a variable called anOrderedCollection will have effectively an OrderedCollection instance

And so on.

If we encode enough of these little rules on objects, we could have a smart enough auto-completion engine that does not require complex algorithms or machine learning models. Not saying that these may not be good, just saying that I wanted a quick solution to make our life easier.

This post talks about the architecture inside the completion engine. What parts are inside it and how they relate. Turns out this prototype grew up and is making it to the big leagues in Pharo 9.0, where it will be called HeuristicsCompletion.

An architecture for lazy code completion

The heuristics completion engine is done out of three main components:

  • lazy fetchers implemented using generators,
  • a completion object that works as a result-set
  • and several completion heuristics that are chosen from the current AST node.


Fetchers are subclasses of CoFetcher. They need to define the method #entriesDo: aBlock iterating a collection. The fetcher framework will take care of creating a streaming API for it using generators. For example, a simple fetcher can be created and used with:

CoFetcher subclass: #MyFetcher
	instanceVariableNames: ''
	classVariableNames: ''
	package: 'Complishon-Core'
MyFetcher >> entriesDo: aBlock [
  1 to: 10000000000 do: aBlock

MyFetcher new next >>> 1.

Basic Fetchers

A simple generic fetcher works on a collection.

CoCollectionFetcher onCollection: #( a b b a c )

And an empty fetcher is useful to provide no results

CoEmptyFetcher new

Fetcher combinators

Fetchers can be combined by using simple combinators. A sequence of fetchers, created with the message #,, fetches first the elements in the first fetcher, then in the second one. This also means that the second fetcher will never be executed until the first one is completely consumed, which is an interesting property if it is expensive to calculate.

CoInstanceVariableFetcher new, CoGlobalVariableFetcher new

filter fetcher, created with the message #select:, decorates another fetcher and filters it with a block.

CoInstanceVariableFetcher new select: [ :e | "a condition..." ]

no-repetition fetcher, created with the message #withoutRepetition, decorates another fetcher and filters those elements that were already returned previously by himself.

aFetcher withoutRepetition

The Juice: Fetchers for code completion

In addition, the engine provides fetchers to iterate the following:

  • CoInstanceVariableFetcher: instance variables of a class
  • CoClassVariableFetcher: class variables of a class
  • CoClassImplementedMessagesFetcher: messages implemented by a class
  • CoGlobalVariableFetcher: global variables in an environment
  • CoMethodVariableFetcher: variables accessible to an AST node (in order)
  • CoPackageImplementedMessagesFetcher: messages sent in a package

For code completion, another combinator shows useful to iterate a fetcher up a class hierarchy: CoHierarchyFetcher. This hierarchy fetcher decorates a ClassBasedComplishonFetcher (i.e., a CoClassImplementedMessagesFetcher, a CoInstanceVariableFetcher or a CoClassImplementedMessagesFetcher), and can be created with the message #forHierarchy.

Completion ResultSet

The CoResultSet object stores a fetcher and lazily fetches and internally store objects on demand:

c := CoResultSet fetcher: (CoInstanceVariableFetcher new
    completionClass: aClass;

"The first time it will fetch 20 elements from the fetcher and store them"
c first: 20.

"The second time it will just return the already stored elements"
c first: 20.

"Now it will fetch some more objects to have its 25"
c first: 25.

The CoResultSet object allows one to:

  • explicit fetch using fetch: and fetchAll
  • retrieve the results using firstfirst: and results
  • filter those results using filterByString:
  • query it with hasMoreElements and notEmpty


When the autocompletion is invoked, it calculates how to autocomplete given the AST node where the caret is in the text editor. The following piece of code shows examples of heuristics implemented in the prototype.

Heuristics for messages

If the AST node is a message whose receiver is self, autocomplete all messages implemented in the hierarchy.

(CoClassImplementedMessagesFetcher new
    completionClass: aClass;
    forHierarchy) withoutRepetition

If the AST node is a message whose receiver is super, autocomplete all messages implemented in the hierarchy starting from the superclass.

(CoClassImplementedMessagesFetcher new
    completionClass: aClass superclass;
    forHierarchy) withoutRepetition

If the AST node is a message whose receiver is a class name like OrderedCollection, autocomplete all messages in the class-side hierarchy of that class.

(ClassImplementedMessagesComplishonFetcher new
    completionClass: (completionContext environmentAt: aRBMessageNode receiver name) classSide;
    forHierarchy) withoutRepetition

If the AST node is a message whose receiver is a variable that has type information in its name name, like anOrderedCollection, autocomplete all messages in the instance-side hierarchy of guessed class. Then continue with normal completion. There are two cases: variables starting with a such as aPoint and variables starting with an such as anASTCache.

  environmentAt: aRBMessageNode receiver name allButFirst asSymbol
  ifPresent: [ :class |
    ^ (ClassImplementedMessagesComplishonFetcher new
        completionClass: class;
        forHierarchy), PackageImplementedMessagesComplishonFetcher new  ].

  environmentAt: (aRBMessageNode receiver name allButFirst: 2) asSymbol
  ifPresent: [ :class |
    ^ (ClassImplementedMessagesComplishonFetcher new
        completionClass: class;
        forHierarchy), PackageImplementedMessagesComplishonFetcher new  ]

If all the above fail, autocomplete all messages used in the current package. Chances are we are going to send them again.

 ^ CoPackageImplementedMessagesFetcher new
    	complishonPackage: aPackage;

Heuristics for Method signature

When autocompleting the name of a new method, chances are we want to override a method in the hierarchy, or to reimplement a method polymorphic with the ones existing in the current package.

self newSuperMessageInHierarchyFetcher,
    (CoPackageImplementedMessagesFetcher new
        complishonPackage: aPackage;

Heuristics for variables

Variables accessed by an instance are first the ones in the method, then the ones in the hierarchy.

instanceAccessible := (CoMethodVariableFetcher new
        complishonASTNode: anAST;
            (CoInstanceVariableFetcher new
                completionClass: aClass)
			forHierarchy ].

Then, variables accessed by an instance are also the ones in the class variables and globals.

globallyAccessible := (CoClassVariableFetcher new
    completionClass: complishonContext complishonClass)
            (CoGlobalVariableFetcher new
	        complishonEnvironment: anEnvironment;

What’s next?

Try it in the lastest Pharo 9.0 dev build!

We would like to have new ideas for heuristics.

For example, I’ve started integrating it with a fast type inferencer that looks at initialize methods, or with a fast type inferencer like roel typer which latest version for Pharo can be found on github.

Why not machine learning?

Statistical models?