A couple of weeks ago I started prototyping a new code completion backend. The one available in Pharo 8.0 is good-enough to complete variables and messages to literal objects. However, the accuracy of the completion was pretty low because auto completion candidates were sorted alphabetically, and this was even producing completion pauses for large images. This code completion prototype engine, that I originally baptised as Complishon, was based on two ideas: generators and heuristics.
Generators are objects that provide a stream-like interface to consume values generated sequentially. Better with an example.
stream := Generator on: [ :generator | generator yield: 1. generator yield: 'hello'. ]. stream next. >>> 1 stream next. >>> 'hello'
One cool thing about generators is that the generator code will block until we call
next. This is particularly interesting when iterating several big collections: we only really iterate them if we need them. And they come by default in Pharo :).
Heuristics, on the other hand, are not funky Pharo objects that come in a library. Heuristics are objects that represent rules. In our case rules for auto completion. Here you have some example of heuristics:
- chances are I want to autocomplete first the methods defined in my package
- chances are that a variable called
anOrderedCollectionwill have effectively an
And so on.
If we encode enough of these little rules on objects, we could have a smart enough auto-completion engine that does not require complex algorithms or machine learning models. Not saying that these may not be good, just saying that I wanted a quick solution to make our life easier.
This post talks about the architecture inside the completion engine. What parts are inside it and how they relate. Turns out this prototype grew up and is making it to the big leagues in Pharo 9.0, where it will be called HeuristicsCompletion.
An architecture for lazy code completion
The heuristics completion engine is done out of three main components:
- lazy fetchers implemented using generators,
- a completion object that works as a result-set
- and several completion heuristics that are chosen from the current AST node.
Fetchers are subclasses of
CoFetcher. They need to define the method
#entriesDo: aBlock iterating a collection. The fetcher framework will take care of creating a streaming API for it using generators. For example, a simple fetcher can be created and used with:
CoFetcher subclass: #MyFetcher instanceVariableNames: '' classVariableNames: '' package: 'Complishon-Core' MyFetcher >> entriesDo: aBlock [ 1 to: 10000000000 do: aBlock ] MyFetcher new next >>> 1.
A simple generic fetcher works on a collection.
CoCollectionFetcher onCollection: #( a b b a c )
And an empty fetcher is useful to provide no results
Fetchers can be combined by using simple combinators. A sequence of fetchers, created with the message
#,, fetches first the elements in the first fetcher, then in the second one. This also means that the second fetcher will never be executed until the first one is completely consumed, which is an interesting property if it is expensive to calculate.
CoInstanceVariableFetcher new, CoGlobalVariableFetcher new
A filter fetcher, created with the message
#select:, decorates another fetcher and filters it with a block.
CoInstanceVariableFetcher new select: [ :e | "a condition..." ]
A no-repetition fetcher, created with the message
#withoutRepetition, decorates another fetcher and filters those elements that were already returned previously by himself.
In addition, the engine provides fetchers to iterate the following:
CoInstanceVariableFetcher: instance variables of a class
CoClassVariableFetcher: class variables of a class
CoClassImplementedMessagesFetcher: messages implemented by a class
CoGlobalVariableFetcher: global variables in an environment
CoMethodVariableFetcher: variables accessible to an AST node (in order)
CoPackageImplementedMessagesFetcher: messages sent in a package
For code completion, another combinator shows useful to iterate a fetcher up a class hierarchy:
CoHierarchyFetcher. This hierarchy fetcher decorates a
ClassBasedComplishonFetcher (i.e., a
CoInstanceVariableFetcher or a
CoClassImplementedMessagesFetcher), and can be created with the message
CoResultSet object stores a fetcher and lazily fetches and internally store objects on demand:
c := CoResultSet fetcher: (CoInstanceVariableFetcher new completionClass: aClass; yourself). "The first time it will fetch 20 elements from the fetcher and store them" c first: 20. "The second time it will just return the already stored elements" c first: 20. "Now it will fetch some more objects to have its 25" c first: 25.
CoResultSet object allows one to:
- explicit fetch using
- retrieve the results using
- filter those results using
- query it with
When the autocompletion is invoked, it calculates how to autocomplete given the AST node where the caret is in the text editor. The following piece of code shows examples of heuristics implemented in the prototype.
If the AST node is a message whose receiver is
self, autocomplete all messages implemented in the hierarchy.
(CoClassImplementedMessagesFetcher new completionClass: aClass; forHierarchy) withoutRepetition
If the AST node is a message whose receiver is
super, autocomplete all messages implemented in the hierarchy starting from the superclass.
(CoClassImplementedMessagesFetcher new completionClass: aClass superclass; forHierarchy) withoutRepetition
If the AST node is a message whose receiver is a class name like
OrderedCollection, autocomplete all messages in the class-side hierarchy of that class.
(ClassImplementedMessagesComplishonFetcher new completionClass: (completionContext environmentAt: aRBMessageNode receiver name) classSide; forHierarchy) withoutRepetition
If the AST node is a message whose receiver is a variable that has type information in its name name, like
anOrderedCollection, autocomplete all messages in the instance-side hierarchy of guessed class. Then continue with normal completion. There are two cases: variables starting with
a such as
aPoint and variables starting with
an such as
completionContext environmentAt: aRBMessageNode receiver name allButFirst asSymbol ifPresent: [ :class | ^ (ClassImplementedMessagesComplishonFetcher new completionClass: class; forHierarchy), PackageImplementedMessagesComplishonFetcher new ]. completionContext environmentAt: (aRBMessageNode receiver name allButFirst: 2) asSymbol ifPresent: [ :class | ^ (ClassImplementedMessagesComplishonFetcher new completionClass: class; forHierarchy), PackageImplementedMessagesComplishonFetcher new ]
If all the above fail, autocomplete all messages used in the current package. Chances are we are going to send them again.
^ CoPackageImplementedMessagesFetcher new complishonPackage: aPackage; yourself
When autocompleting the name of a new method, chances are we want to override a method in the hierarchy, or to reimplement a method polymorphic with the ones existing in the current package.
self newSuperMessageInHierarchyFetcher, (CoPackageImplementedMessagesFetcher new complishonPackage: aPackage; yourself) withoutRepetition
Variables accessed by an instance are first the ones in the method, then the ones in the hierarchy.
instanceAccessible := (CoMethodVariableFetcher new complishonASTNode: anAST; yourself), (CoInstanceVariableFetcher new completionClass: aClass) forHierarchy ].
Then, variables accessed by an instance are also the ones in the class variables and globals.
globallyAccessible := (CoClassVariableFetcher new completionClass: complishonContext complishonClass) forHierarchy, (CoGlobalVariableFetcher new complishonEnvironment: anEnvironment; yourself).
Try it in the lastest Pharo 9.0 dev build!
We would like to have new ideas for heuristics.
For example, I’ve started integrating it with a fast type inferencer that looks at
initialize methods, or with a fast type inferencer like roel typer which latest version for Pharo can be found on github.
Why not machine learning?