Tail recursion in Kotlin — with bytecode!

Tail recursion in Kotlin — with bytecode!Stephen LeighBlockedUnblockFollowFollowingMay 7Kotlin Island, St Petersburg.

сергей володин via Google Maps.

Every article on Kotlin starts by saying how it has loads of cool features: null safety, inlining, infix and extension functions, varargs, and many more.

This article is going to look at one of them — tailrec — which is a neat optimisation the compiler can do on recursive functions that follow certain rules.

We’ll go over what tail recursion actually is, then investigate how it’s implemented in bytecode, the low-level instruction set that Kotlin is compiled into.

While it’s rare that anyone actually looks at bytecode — the only reason I ever have is for mutation testing purposes — it’s interesting to see how source code is transformed into the actual instructions being run.

Start Killing Mutants: Mutation testing your codeTesting your tests to test that they test what you think they test.

itnext.

ioThis article assumes a very basic knowledge of the stack (pushing, popping, and operations like addition), as well as basic knowledge of recursive functions in general.

You can find overviews of the stack here and here, and recursion here.

I’ll go over the specifics of tail recursion, and the bytecode examples will be very simple.

I’ll also annotate them for clarity.

Don’t be scared.

What is tail recursion?Photo by Abigail Lynn on UnsplashThe number one technical risk of recursion is that you might run out of memory and cause a stack overflow.

We can see why with a recursive function like this, which adds all numbers below a given ‘limit’ and the limit itself:fun sumOfInts(limit: Int): Int { if (limit == 1) return limit return limit + sumOfInts(limit – 1)}The machine has to wait on line 3 until the call to sumOfInts resolves before adding limit , and that call to sumOfInts has to wait for another call, and so on.

Each successive call adds another stack frame, and if we add too many, the runtime runs out of memory (‘overflows’) and the whole routine fails.

oh noThis is really risky, especially when you can’t be sure how large your parameters are going to be.

Putting in checks, and alternative methods if they’re big enough to risk a stack overflow, defeats the point of having the recursive function in the first place.

Fortunately, there is a solution: tail recursion!tailrec fun sumOfIntsTailRec(limit: Int, runningTotal: Int): Int { if (limit == 1) return runningTotal + limit return sumOfIntsTailRec(limit – 1, limit + runningTotal)}The only differences here are that we’ve introduced a counter to keep track of the running total, and added the modifier tailrec.

Small changes, but they make a huge difference to the compiler.

Now it doesn’t have to wait for another sumOfInts call to resolve with a result before doing something else— a recursive call will just return the result of a call to the same function with different parameters.

We don’t need the extra stack frames created when we were waiting for the whole chain of functions to return, in the previous example, because the JVM doesn’t have to ‘remember’ to do something after each recursive call returns!Because of how this is optimised (which we’ll get into below), a tailrec function must call itself as its last operation, without any sneaky adding-on like in the first example.

You also can’t call it within try/catch blocks.

However, you can call it within an if/else.

Fun with bytecodeHere’s both of our functions in bytecode (full source here):Not so scary after all, but there are a couple of things that might seem odd:Format: bytecode is just a series of instructions.

We have labels (L0, L1) specifying places in the code, and shorthand like `(I)I` showing that the function takes an integer parameter and returns an integer.

Almost all the operations are prefaced by ‘I’ because they act on integers.

Each function name is an entrypoint, and the instructions are executed from the top down, jumping to a different label where appropriate but using the same stack for the whole function.

GOTO statements!.An obvious one on line 37, and also 6 and 23 (‘if the two integers on the top of the stack aren’t equal, go to label L1’).

Because each instruction is executed sequentially, without the usual higher-level constructs like blocks, closures and scopes, bytecode can end up as a spaghetti mess of jumps from one label to another.

We banished GOTO from human-centric languages long ago, but bytecode is almost exclusively for the computer to read, and the computer can handle it better than our brains.

The second function has a very typical way of doing loops using GOTOs.

Our if (limit == 1) in source has become IF_ICMPNE which is short for ‘if integer compare not equal’, so it’s now doing the equivalent of if (limit != 1) .

The compiler could have used IF_ICMPEQ and flipped around the logic in L0 and L1, but doing it this way maintains much clearer parity between the bytecode and our source code, so I’m glad it didn’t.

The extra parameter in the tail-recursive version muddies it slightly, but the key difference in the implementation of tail recursion is clear if you compare lines 13–16 with lines 35–37:ISUB // limit – 1INVOKESTATIC src/tailrec/TailrecKt.

sumOfInts (I)IIADD // add the result of the recursive call to the limitIRETURN// .

ISTORE 1 // variable #1 is now limit + runningTotalISTORE 0 // variable #0 is now limit – 1GOTO L0 // go to label L0As explained in the previous section, a typical recursive function will have to wait for another call to itself to complete before doing something else and returning the result.

INVOKESTATIC is calling the function again in a new stack frame, and the logic then pauses until that gives us back a result.

We need to perform an IADD afterwards, before we return a result.

These stack frames will pile up, and if there are too many of them, we’ll run out of memory and the function will fail.

By contrast, the tail-recursive version changes the variables that the function will use, then runs the whole loop again with a simple GOTO instruction.

Because we don’t need to do anything with the result of the function, we’re not creating any extra stack frames or waiting for results — just manipulating the variables to ‘build up’ the result with a jump-loop.

It’s this little optimisation that saves us from the risk of stack overflow.

Adding a default parameterOne more thing that’s worth looking at is what happens when you add a default parameter.

This is how you would probably write this tailrec function in real life, as you’re unlikely ever to want to specify the running total yourself (although you might manage to think of a better name):tailrec fun sumOfIntsTailRecWithDefaultParam( limit: Int, runningTotal: Int = 0): Int { if (limit == 1) return runningTotal + limit return sumOfIntsTailRecWithDefaultParam( limit – 1, limit + runningTotal )}The only difference here is that we’ve added a default parameter to runningTotal: Int = 0, but let’s have a look at how this has changed the bytecode:This has made quite an interesting change.

We now have a ‘synthetic’ wrapper for our actual recursive function, with a $default modifier.

Looking towards the bottom, the bytecode sets the relevant arguments as variables #0 and #1, and invokes the non-synthetic sumOfInts function.

This means that there’s one more stack frame than the last iteration; invoking the function on line 33 means waiting until it’s run its course, going through all its GOTO loops, and then returning the result.

But this is still tail-call optimised, because we aren’t adding a frame for every call.

We add one frame just to call up to the non-synthetic function, but no more, so there is still no risk of a stack overflow.

The first section of the synthetic function took me a long time and a lot of coffee to get to the bottom of, and I won’t go into too much detail here.

You’ll notice that the synthetic function expects three integer arguments and an object.

The first two integers are the limit and (optional) running total, as you’d expect.

The third is a bitmask, which is passed in by the runtime to indicate which default parameters are being overridden (remember that you can have more than one default parameter for a function).

We have a clever bit of logic using a bitwise AND operator to determine whether we want to override our default parameter.

The final argument is unused, reserved in Kotlin for future use when adding super calls with default values.

Wrap-upThis is a pretty obscure topic.

I chose to write about it partly because there isn’t a huge amount of stuff written about Kotlin compilation, and most of it exists in forum questions and answers, which isn’t particularly accessible.

In addition, it’s occasionally worth understanding how what you write in a higher-level language ultimately boils down to a series of instructions given to a CPU.

While it’s overkill even as far as micro-optimisations go to look at the bytecode for potential areas of improvement, it’s interesting to see the compiler reason about what you’ve written with those lovely higher-order constructs Kotlin provides us with.

After all, none of it is magic.

.

. More details

Leave a Reply