Debugging the JIT compiler Hotspot detection for ARM64

The other day we were working on the compiler detection of hotspots, originally implemented by Clément Béra during his PhD thesis. In Sista, hotspot detection is implemented as a countdown that looks like the following: the counter is loaded in a register, decremented and then a jump if carry detects if the substraction underflowed.

self MoveA32: counterAddress R: counterReg.
self SubCq: 16r10000 R: counterReg. "Count executed"
countTripped := self JumpCarry: 0.

We were building some unit tests for this functionality, and we were interested at first at seeing how counters increment/decrement when methods execute. We wrote a couple dozen of tests for different cases (because the code for the counters is a bit more complicated, but that’s for another day). The code of one of our tests looked like the following: we compile a method, we execute it on a machine code simulator, then we verify that the counter was effectively incremented (because the public API is in terms of positive counts and not count-downs):

    | nativeMethod counterData jumpCounter |
    nativeMethod := self jitMethod: (self class>>#methodWithAndAndJump:).
        callCogMethod: nativeMethod
        receiver: memory nilObject
        arguments: { memory trueObject }
        returnAddress: callerAddress.

    counterData := interpreter picDataFor: nativeMethod.
    jumpCounter := memory fetchPointer: 1 ofObject: counterData.
        assert: (memory integerValueOf: (memory fetchPointer: 1 ofObject: jumpCounter))
        equals: 1

There was however something fishy about the ARM64 version. In addition of incrementing the counter, the code was taking the carry jump! Which lead our test to fail…

Doing some machine code debugging

So everything was working OK on intel (IA32, X64) but not on ARM (neither 32 or 64 bits). In both ARM versions the jump was __incorrectly__ taken. I first checked the instruction was being correctly assembled. And since that seemed ok, I went on digging in our machine code debugger. I found the corresponding instruction, set the instruction pointer in there and started playing with register values to see what was happening.

As you can see in the screenshot, the code is being compiled into subs x25, x25, x16, which you can read as x25 := x25 - x16. So I started playing with the values of those two registers and the carry flag, which is the flag that activates our jump carry. The first test I did was to check 2 - 1.

self carry: false.
self x25: 2.
self x16: 1.

Substraction was correct, leaving the correct result in x25, but the carry flag was set! That was odd. So I tested a second thing: 0 - 1.

self carry: false.
self x25: 0.
self x16: 1.

In this case, the carry flag was not set, but the negative was set. Which was even more odd. The case that should set carry was not setting it, and vice-versa. It seemed it was inverted! I did a final test just to confirm my assumption: 1-1 should set both the negative and carry flags if the carry flag was inverted.

self carry: false.
self x25: 0.
self x16: 1.

ARM Carry is indeed strange

I was puzzled for a moment, and then I got to look for a culprit: was our assembler that was doing something wrong? was it a bug in Unicorn, our machine code simulator? or was is something else?

After digging for some time I came to find something interesting in the ARM documentation:

For a subtraction, including the comparison instruction CMP and the negate instructions NEGS and NGCS, C is set to 0 if the subtraction produced a borrow (that is, an unsigned underflow), and to 1 otherwise.

And something similar in a stack overflow post:

ARM uses an inverted carry flag for borrow (i.e. subtraction). That’s why the carry is set whenever there is no borrow and clear whenever there is. This design decision makes building an ALU slightly simpler which is why some CPUs do it.

It seems that the carry flag in ARM is set if there is no borrow, so it is indeed inverted! But it is only inverted for substractions!

Extending the Compiler with this

Since carry works different in different architectures, but only for substractions, I created a new instruction factory method JumpSubstractionCarry: that detects carry for substractions and is supposed to be platform specific. Then I replaced the code of the counter by the following:

self MoveA32: counterAddress R: counterReg.
self SubCq: 16r10000 R: counterReg.
countTripped := self JumpSubstractionCarry: 0.

JumpSubstractionCarry: is backend defined:

JumpSubstractionCarry: jumpTarget
    backEnd genJumpSubstractionCarry: jumpTarget

the default implementation just delegates to the original factory method:

genJumpSubstractionCarry: jumpTarget
    ^cogit JumpCarry: jumpTarget

and the ARM (both 64 and 32 bits) do use a jump if no carry instead!

genJumpSubstractionCarry: jumpTarget
    ^cogit JumpNoCarry: jumpTarget

With these, our tests went green for all platforms 🙂

If you want to see the entire related code, you can check the following WIP:…guillep:sista?expand=1

Published by Guille Polito

Pharo dev. Researcher, engineer and father. > If it ain't tested, it does not exist.

Leave a Reply

Fill in your details below or click an icon to log in: Logo

You are commenting using your account. Log Out /  Change )

Google photo

You are commenting using your Google account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

%d bloggers like this: