What is RBParseTreeSearcher ?

Imagine that you want to find a specific expression and that you want to find it in the complete system. How many classes would you have to look for? How can you be sure that you did not miss any class and being sure that you won’t be frustrated because of the number of issues thrown on compilation or execution? In addition imagine other scenario where you want to transform that expression into another one.

Changing code, removing, or replacing deprecated methods is costly for a developer by doing it manually instead of using an automated feature.

This blog post will explain how to find a specific piece of code we may look for inside a Pharo program, and make it easy for the developers to deal with pattern matching and RBParseTreeSearcher class.

Following work will be about how to replace code using RBParseTreeRewriter and doing the exact same thing automatically using the Rewrite tool (a tool built on top the RBParseTreeRewriter). 

For the moment, we will explain some fundamental definitions and for that the post is structured following below sections:

  1. Pattern code description
  2. RBParseTreeSearcher description
  3. RBParseTreeSearcher examples with pattern code

1. Pattern code description

A pattern expression is very similar to an ordinary Smalltalk expression, but allows one to specify some “wildcards”. The purpose is simple. Imagine that you have a piece of code:

car isNil ifTrue: [ ^ self ].

You can of course compare it with the same piece of code for equality, but wouldn’t it be cool if you could compare something similar, but ignore the fact that the receiver is named car? With pattern rules you can do exactly that. Consider the following code and notice the back-tick before car:

`car isNil ifTrue: [ ^ self ].

Now this expression can match any other expression where isNil ifTrue: [^self] is sent to any variable (or literal). With such a power you can find all the usages of isNil ifTrue: and replace them with ifNil. So what are the “wildcards” that we are using?

(`)Basic pattern nodes

Code prefixed with a back-tick character (`) defines a pattern node. The table below is listing three simple patterns that can be declared with the back-tick:

Pattern typeExampleDescription
Variable`someName asStringThis pattern will match message asString sent to any receiver, disregarding the name of it
MessagePharo globals `someMessageThis pattern will match any unary message sent to Pharo globals.
Method`someMethod ^ nilThis pattern will match any method which returns nil
Selector`sel: aValThis pattern will match any selector followed by aVal.

Example with matches:

`receiver foo 

matches:

  • self foo
  • x foo
  • OrderedCollection foo
(`#) Literal pattern nodes

A back-tick can be followed by the hash sign to ensure that matched receiver will be a literal:

Pattern typePattern nodeDescription
Literal`#literal asArrayThis pattern will match any literal (Number, String, Array of literal ) followed by asArray message

Example:

 `#lit size

matches:

  • 3 size
  • 'foo' size
  • #(a b c) size
(`@) List pattern nodes

To have complete flexibility, there is the possibility to use an at sign @ before the name of a pattern node which turns the node into a list pattern node, which can be empty, returns one or multiple values.

Pattern typePattern nodeDescription
Entity`@expr isPatternVariableThis pattern will match a single or multiple entities followed by isPatternVariable
MessagemyVar `@messageThis pattern will match any message (including unary) sent to myVar
Temporary variable|`temp `@temps|This pattern will match at least one temporary variable which is defined as `temp; For`@temps, the matching can find nil, one or many temporary variables defined.
ArgumentmyDict at: 1 put:`@argsThis pattern will match myDict at: 1 put: followed by a list of arguments `@args that can be nil, one or many args.
List of statements[ `.@statements.
 `var := `myAttribute. ]
We will explain statements later on, but this is to mention that @ can be used also to define a list of statements which can be empty, contain one or many elements.

This expression will match a block which has at first a list of statements, that must be followed by 1 last assignment statement `var := `myAttribute.

Disclaimer:

  • You may write an expression with just args instead of `@args.
  • The list patterns does not make any sense for literal nodes i.e. `#@literal.

Example 1:

`x := `@value

matches:

myVar := OrderedCollection new

Example 2:

`sel1 at: `@args1 `sel2: `@args2

matches:

self at: index putLink: (self linkOf: anObject ifAbsent: [anObject asLink])

Where:

  • `args1 and `args2 have different values
  • `sel1 matches self
  • `@args1 matches index
  • `sel2: matches putLink:
  • `@args2 matches (self linkOf: anObject ifAbsent: [anObject asLink])

Example 3:

`@rcvr `@msg: `@args matches:

(self class deferUpdates: true) ifTrue: [^aBlock value].

Where:

  • `@rcvr matches (self class deferUpdates: true)
  • `@msg: matches ifTrue:
  • `@args matches [^aBlock value]

Example 4:

|`@args1 `myArgument `@args2| matches:

| t1 t2 |

Here we need to have at least 1 argument myArgument , and the example is matching because `@args1 can be empty. So here we have:

  • myArgument is matching with t1
  • `@args2 is matching with t2
(`.) Statement pattern nodes

Back-tick can be followed by a period to match statements. For example:

Pattern typePattern nodeDescription
Statementvar
ifTrue: [`.statement1 ]
ifFalse: [ `.statement2 ]
This pattern will match an ifTrue:ifFalse: message send to any variable, where both blocks have only one statement each.

Example1:

`.Statement1.

is matching:

  • x := 1.
  • myVal:= 'Hello World'.
  • self assert: myVal size equals: 11.

Example2:

|`@temps|
`@.statements1.
`.duplicate.
`@.statements2

matches:

|x|
x := 1.
x := 2

Where:

  • |`@temps| matches |x|
  • `@.statements1. is nil
  • `.duplicate. matches x := 1.
  • `@.statements2

P.S. In the end it does not matter whether you will write `.@Statement or `@.Statement.

(`{ }) Block Pattern Nodes

These are the most exotic of all the nodes. They match any AST nodes like a list pattern and test it with a block. The syntax is similar to the Pharo block, but curly braces are used instead of square brackets and as always the whole expression begins with a back-tick.

Consider the following example:

`{ :node | node isVariable and: [ node isGlobal ] } become: nil

this pattern will match a message #become: with an attribute nil, where the receiver is a variable and it is a global variable. 

There is also a special case called wrapped block pattern node which has the same syntax and follows a normal pattern node. In this case first the node will be matched based on its pattern, and then passed to the block. For example:

`#arr `{ :node | node isLiteralArray } asArray

is a simple way to detect expression like #(1 2 3) asArray. In this case first #(1 2 3) will be matched by the node and then tested by the block.

Naming is Important

The pattern nodes are so that you can match anything in their place. But their naming is also important as the code gets mapped to them by name. For example:

`block value: `@expression value: `@expression

will match only those #value:value: messages that have exactly the same expressions as both arguments. It is like that because we used the same pattern variable name.

2. RBParseTreeSearcher description

So, after figuring out what are the patterns that can be used and what kind of matches they can perform, now we can move forward to discover how RBParseTreeSearcher class works in Pharo , in order to be able to understand in the last section how RBParseTreeSearcher and defined patterns work together to find the matches we are looking for.

RBParseTreeSearcher is supposed to look for a defined pattern using the ‘wildcards’ of a matcher defined as a Tree, and on success (when match is found) a block can be executed.

Basically, when a developper uses this class, the most used instance variables are:

  • #matches:do: which a message that looks for patterns defined in matches: block using the wildcards, and if a match is found the do: block is executed.
    The do block takes two parameters: :aNode and :answer. The aNode refers to each node of the pattern defined, and the answer can be used for example to increment value on each node match.
    The blocks defined in #matches:do: are called rules, and they are stored only in success in instance searches of RBParseTreeSearcher defined below.
  • searches which type is Ordered collection, contains all the successful rules applied whenever using: #matches:do:, #matchesMethod:do … to store rules of type Rule, MethodRule, ArgumentRule, TreeRule …
  • context which type is dictionary: contains all the successfully matched patterns.
  • executeTree: this method takes aParseTree as input parameter, which is the possible matching code that we are looking for, and starts the matching process using the defined pattern.
  • messages of type OrderedCollection, and returns the list of messages found in a match.
  • hasRules returns searches list

Consider the following example which is using the instance sides defined above:

|searcher dict|
searcher := RBParseTreeSearcher new.
searcher
    matches: '`@rcv at:`@arg `sel:`@arg1'
    do: [ :aNode :answer | dict := searcher context ].
searcher executeTree:
    (RBParser parseExpression: 'cache at: each ifAbsentPut: [ each ].').

The method #matches:do: is used to define the pattern that we are looking for, using the ‘wildcards’ defined in first section; In addition of that, the do is running only on match, and in our case it will fill the dictionary dict with the searcher context (which is the pattern defined in matches block).
This execution is fired on executeTree: which defines the matcher that is a String parsed as a Tree using parseExpression, then starts matching it with the pattern.

3. RBParseTreeSearcher examples with pattern code

Finally, in this section we use patterns with the RBParseTreeSearcher class and do some magic by finding some matches in Pharo code !

Consider the following example:

| dict searcher|
searcher := RBParseTreeSearcher new.

searcher  
   matches: '`@receiver assert: `@arg equals: true'
   do: [ :aNode :answer | dict := searcher context ].

searcher 
   executeTree: (RBParser parseExpression: 'self assert: reader storedSettings first realValue equals: true.').

dict 	
   collect: [ :each | each displayString ].

The example is matching successfully and the dictionary dict will return different values during the iteration:

Match 1: (key) `@receiver is matching with (value) self
Match 2: (key) `@arg is matching with (value) reader storedSettings first realValue

If we want to check all the messages in the matcher, we can use searcher messages which will return an array of one item containing message #assert:equals: as it is the only message available in the matched expression.

*********************************

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

%d bloggers like this: