This is the third entry of the series about implementing full-text search indexes in Pharo. The second entry is Implementing Indexes – Who uses all my memory, where we analysed the first version and the problems it has. You can start reading from this post, but there are things that we have presented in the previous one. For example, how we detected the problem with the basic implementation. Also, maybe if you are new you can check the first post: Implementing Indexes – A Simple Index.
Remembering the Problem
In the last post, we have seen that our implementation is not optimal. It was creating a lot of nodes to store the data we are keeping. In that version, we had 18.484 values stored in the trie, but for those, we needed 159.568 node instances; a ratio of 9 nodes per value. This is unacceptable.
This problem was coming for the way of storing the elements in the Trie. We are creating a node for each character in the key, even if these nodes does not include a branch in the Trie. We have the following structure:
As we have told, it is desirable to collapse all these nodes, and reducing the memory impact.
Collapsing useless Nodes
In the data structure we are analysing, a node provides crucial information in any of the following scenarios:
- It maps a path from the root to a given value.
- It has more than a single child, because it has an split in the key path.
So we want to reduce the previous Trie to a new one like this:
In this new implementation, the chain of intermediate nodes that do not provide crucial information is collapsed in a single node. We can see, that all the light blue nodes have been collapsed in a single green one.
For supporting this, we have changed that the node now holds a String instead of a single Character. The key path from the root is now concatenating strings.
With this improvement, we have passed from an occupation rate from 13% to 100%.
This improvement has a little trade-off: We are making the graph simpler in data impact, but we have increased the complexity in the look-up and insertion of new nodes.
This new implementation is available side-by-side with the unoptimized version.
If we create the same example of the previous post, but with the optimized trie:
behaviors := CTOptimizedTrie new. SystemNavigation default allBehaviorsDo: [ :aBehavior | behaviors at: aBehavior name put: aBehavior name ].
To perform the analysis, we have used the same tool, SpaceAndTime, that we have used in the previous post.
stats := GraphSpaceStatistics new
We can see the reduced number of nodes in the Trie, and its memory impact. We have passed from 159.568 instances in the previous version to 22.102 instances in this version. We passed from occupying 5.106.176 bytes to 530,448 bytes in nodes.
However, this is not perfect. As we are keeping a String instead of Characters we have to create such instances in memory. In Pharo, the Strings are objects and are allocated in memory, while Characters are immediate, and they are encoded in the referencing instance. This represents that we have additional 22.099 instances, representing 407.704 additional bytes.
At a first glance it looks like a problem, but if we see that we are just having less than 1MB (adding the nodes and the Strings) against the 5MB of the previous solution. We see that there is an improvement.
This is a nice example, that even having more instances of Strings we have a clear advantage with the new solution. It teaches us that we need to measure before doing an statement.
Splitting Nodes on Insert
As we have said with this new implementation, it is required to split nodes when adding elements in the Trie. We have a more complex insert and delete operation when inserting and removing keys compared with the base implementation.
The following image presents an example.
Adding a new value to the Trie requires to create a new node, and also it might require to split an existing node if part of the key is in a single node.
In this case, the green node, should be split in two, to handle the insertion of the pair ‘ac’ -> 3.
At first glance, it looks that new implementation is slower than the old. But… to assert anything we have to measure it.
We can measure the time to generate the whole Trie of behaviors. For doing so, we will use the #timeToRun message of blocks. So, we execute to measure the times of the new implementation
[ behaviors := nil. behaviors := CTOptimizedTrie new. SystemNavigation default allBehaviorsDo: [ :aBehavior | behaviors at: aBehavior name put: aBehavior name ] ] timeToRun.
And to measure the base implementation:
[ behaviors := nil. behaviors := CTTrie new. SystemNavigation default allBehaviorsDo: [ :aBehavior | behaviors at: aBehavior name put: aBehavior name ] ] timeToRun.
From the results we can see that the new implementation takes 169 ms and the old implementation 210 ms. As we can see, the initial considerations was misleading.
Again, we have to measure. If we don’t measure it is impossible to compare different solutions.
This post presents a technique that we have used to improve the Trie implementation. Although, the most important part of this post is that we show how to measure the qualities of a solution. Also, we have shown that without measuring, it is impossible to compare or even to take good decisions.
Using previous experience to evaluate a solution is important, but it can be misleading. Measuring similar problems shows us the responses.
We still have an entry in this series. In the last entry, we are going to present how we solved the problem with the IdentitySet and how we have finally went to a tenth of the memory consumption.