Imagine that you want to find a specific expression and that you want to find it in the complete system. How many classes would you have to look for? How can you be sure that you did not miss any class and being sure that you won’t be frustrated because of the number of issues thrown on compilation or execution? In addition imagine other scenario where you want to transform that expression into another one.
Changing code, removing, or replacing deprecated methods is costly for a developer by doing it manually instead of using an automated feature.
This blog post will explain how to find a specific piece of code we may look for inside a Pharo program, and make it easy for the developers to deal with pattern matching and RBParseTreeSearcher class.
Following work will be about how to replace code using RBParseTreeRewriter and doing the exact same thing automatically using the Rewrite tool (a tool built on top the RBParseTreeRewriter).
For the moment, we will explain some fundamental definitions and for that the post is structured following below sections:
- Pattern code description
- RBParseTreeSearcher description
- RBParseTreeSearcher examples with pattern code
1. Pattern code description
A pattern expression is very similar to an ordinary Smalltalk expression, but allows one to specify some “wildcards”. The purpose is simple. Imagine that you have a piece of code:
car isNil ifTrue: [ ^ self ].
You can of course compare it with the same piece of code for equality, but wouldn’t it be cool if you could compare something similar, but ignore the fact that the receiver is named car? With pattern rules you can do exactly that. Consider the following code and notice the back-tick before car:
`car isNil ifTrue: [ ^ self ].
Now this expression can match any other expression where isNil ifTrue: [^self]
is sent to any variable (or literal). With such a power you can find all the usages of isNil ifTrue:
and replace them with ifNil
. So what are the “wildcards” that we are using?
(`)Basic pattern nodes
Code prefixed with a back-tick character (`) defines a pattern node. The table below is listing three simple patterns that can be declared with the back-tick:
Pattern type | Example | Description |
---|---|---|
Variable | `someName asString | This pattern will match message asString sent to any receiver, disregarding the name of it |
Message | Pharo globals `someMessage | This pattern will match any unary message sent to Pharo globals . |
Method | `someMethod ^ nil | This pattern will match any method which returns nil |
Selector | `sel: aVal | This pattern will match any selector followed by aVal . |
Example with matches:
`receiver foo
matches:
self foo
x foo
OrderedCollection foo
(`#) Literal pattern nodes
A back-tick can be followed by the hash sign to ensure that matched receiver will be a literal:
Pattern type | Pattern node | Description |
---|---|---|
Literal | `#literal asArray | This pattern will match any literal (Number, String, Array of literal ) followed by asArray message |
Example:
`#lit size
matches:
3 size
'foo' size
#(a b c) size
(`@) List pattern nodes
To have complete flexibility, there is the possibility to use an at sign @ before the name of a pattern node which turns the node into a list pattern node, which can be empty, returns one or multiple values.
Pattern type | Pattern node | Description |
---|---|---|
Entity | `@expr isPatternVariable | This pattern will match a single or multiple entities followed by isPatternVariable |
Message | myVar `@message | This pattern will match any message (including unary) sent to myVar |
Temporary variable | |`temp `@temps| | This pattern will match at least one temporary variable which is defined as `temp ; For`@temps , the matching can find nil, one or many temporary variables defined. |
Argument | myDict at: 1 put:`@args | This pattern will match myDict at: 1 put: followed by a list of arguments `@args that can be nil, one or many args. |
List of statements | [ `.@statements. | We will explain statements later on, but this is to mention that @ can be used also to define a list of statements which can be empty, contain one or many elements. This expression will match a block which has at first a list of statements, that must be followed by 1 last assignment statement `var := `myAttribute. |
Disclaimer:
- You may write an expression with just
args
instead of`@args
. - The list patterns does not make any sense for literal nodes i.e.
`#@literal
.
Example 1:
`x := `@value
matches:
myVar := OrderedCollection new
Example 2:
`sel1 at: `@args1 `
sel2:
`@args2
matches:
self at: index putLink: (self linkOf: anObject ifAbsent: [anObject asLink])
Where:
- `
args1 and
`args2
have different values - `
sel1
matchesself
`@args1
matchesindex
`
matchessel2:
putLink:
`@args2
matches(self linkOf: anObject ifAbsent: [anObject asLink])
Example 3:
`@rcvr `@msg: `@args
matches:
(self class deferUpdates: true) ifTrue: [^aBlock value].
Where:
- `
@rcvr
matches(self class deferUpdates: true)
`@msg:
matchesifTrue:
`@args
matches[^aBlock value]
Example 4:
|`@args1 `myArgument `@args2|
matches:
| t1 t2 |
Here we need to have at least 1 argument myArgument
, and the example is matching because `@args1
can be empty. So here we have:
myArgument
is matching witht1
`@args2
is matching witht2
(`.) Statement pattern nodes
Back-tick can be followed by a period to match statements. For example:
Pattern type | Pattern node | Description |
---|---|---|
Statement | var | This pattern will match an ifTrue:ifFalse: message send to any variable, where both blocks have only one statement each. |
Example1:
`.Statement1.
is matching:
x := 1.
myVal:= 'Hello World'.
self assert: myVal size equals: 11.
Example2:
|`@temps|
`@.statements1.
`.duplicate.
`@.statements2
matches:
|x|
x := 1.
x := 2
Where:
|`@temps|
matches|x|
`@.statements1.
is nil`.duplicate.
matchesx := 1.
`@.statements2
P.S. In the end it does not matter whether you will write `.@Statement
or `@.Statement.
(`{ }) Block Pattern Nodes
These are the most exotic of all the nodes. They match any AST nodes like a list pattern and test it with a block. The syntax is similar to the Pharo block, but curly braces are used instead of square brackets and as always the whole expression begins with a back-tick.
Consider the following example:
`{ :node | node isVariable and: [ node isGlobal ] } become: nil
this pattern will match a message #become:
with an attribute nil
, where the receiver is a variable and it is a global variable.
There is also a special case called wrapped block pattern node which has the same syntax and follows a normal pattern node. In this case first the node will be matched based on its pattern, and then passed to the block. For example:
`#arr `{ :node | node isLiteralArray } asArray
is a simple way to detect expression like #(1 2 3) asArray
. In this case first #(1 2 3)
will be matched by the node and then tested by the block.
Naming is Important
The pattern nodes are so that you can match anything in their place. But their naming is also important as the code gets mapped to them by name. For example:
`block value: `@expression value: `@expression
will match only those #value:value:
messages that have exactly the same expressions as both arguments. It is like that because we used the same pattern variable name.
2. RBParseTreeSearcher description
So, after figuring out what are the patterns that can be used and what kind of matches they can perform, now we can move forward to discover how RBParseTreeSearcher class works in Pharo , in order to be able to understand in the last section how RBParseTreeSearcher and defined patterns work together to find the matches we are looking for.
RBParseTreeSearcher is supposed to look for a defined pattern using the ‘wildcards’ of a matcher defined as a Tree, and on success (when match is found) a block can be executed.
Basically, when a developper uses this class, the most used instance variables are:
- #
matches:do:
which a message that looks for patterns defined inmatches:
block using the wildcards, and if a match is found thedo:
block is executed.
Thedo
block takes two parameters::aNode
and:answer
. TheaNode
refers to each node of the pattern defined, and theanswer
can be used for example to increment value on each node match.
The blocks defined in#matches:do:
are called rules, and they are stored only in success in instancesearches
of RBParseTreeSearcher defined below. searches
which type is Ordered collection, contains all the successful rules applied whenever using:#matches:do:
,#matchesMethod:do
… to store rules of type Rule, MethodRule, ArgumentRule, TreeRule …
context
which type is dictionary: contains all the successfully matched patterns.executeTree:
this method takes aParseTree as input parameter, which is the possible matching code that we are looking for, and starts the matching process using the defined pattern.messages
of type OrderedCollection, and returns the list of messages found in a match.hasRules
returnssearches
list
Consider the following example which is using the instance sides defined above:
|searcher dict|
searcher := RBParseTreeSearcher new.
searcher
matches: '`@rcv at:`@arg `sel:`@arg1'
do: [ :aNode :answer | dict := searcher context ].
searcher executeTree:
(RBParser parseExpression: 'cache at: each ifAbsentPut: [ each ].').
The method #matches:do:
is used to define the pattern that we are looking for, using the ‘wildcards’ defined in first section; In addition of that, the do
is running only on match, and in our case it will fill the dictionary dict
with the searcher context
(which is the pattern defined in matches block).
This execution is fired on executeTree:
which defines the matcher that is a String
parsed as a Tree
using parseExpression
, then starts matching it with the pattern.
3. RBParseTreeSearcher examples with pattern code
Finally, in this section we use patterns with the RBParseTreeSearcher class and do some magic by finding some matches in Pharo code !
Consider the following example:
| dict searcher|
searcher := RBParseTreeSearcher new.
searcher
matches: '`@receiver assert: `@arg equals: true'
do: [ :aNode :answer | dict := searcher context ].
searcher
executeTree: (RBParser parseExpression: 'self assert: reader storedSettings first realValue equals: true.').
dict
collect: [ :each | each displayString ].
The example is matching successfully and the dictionary dict
will return different values during the iteration:
Match 1: (key) `@receiver
is matching with (value) self
Match 2: (key) `@arg
is matching with (value) reader storedSettings first realValue
If we want to check all the messages in the matcher, we can use searcher messages
which will return an array of one item containing message #assert:equals:
as it is the only message available in the matched expression.
*********************************