Fast Virtual Functions: Hacking the VTable for Fun and Profit

Until later!Only the overridden functions were called!When are virtual functions slow and how slow are they?We’ve seen how you can call virtual functions directly, now let’s see why you might want to.

Virtual functions are slow when you have a cache miss looking them up.

As we’ll see through benchmarks, they can be very slow.

They can also be very fast when used carefully — to the point where it’s impossible to measure the overhead.

Let’s setup some tests and see for ourselves.

We’ll test several things:Repeatedly calling a non-virtual member function with caching allowed.

This will give us a good performance baseline.

Repeatedly calling a virtual member function with caching allowed.

This will demonstrate the best, albeit unlikely, case with virtual functions.

The same virtual function is called repeatedly in a tight loop, so we’d expect the VTable pointer, the VTable, and the function code itself to all remain cached.

Repeatedly calling a non-virtual member function without caching.

We’ll explicitly flush the cache to achieve another performance baseline.

This will be our target performance for non-cached virtual functions.

Repeatedly calling a virtual member function without caching.

This is the worst case performance scenario for a virtual function.

The VTable pointer, VTable, and function code will not be in the cache.

Repeatedly calling a virtual member function directly without caching.

The hope here is that we’ll gain back some of the performance lost by virtual functions and be close to that of non-virtual functions.

To test this this, we’ll use a small hierarchy of incrementers.

Each will add its value (a randomly initialized int) to a global variable repeatedly via an increment function.

Adding from a member variable ensures that the object is loaded into memory, so we’re playing fair, and being a random number will prevent the compiler from optimizing the member variable out.

We’re now set to run some tests and see just how slow virtual functions are.

To do this, we’ll create many objects of each type and call their increment functions repeatedly.

Below are the results, with number of objects (or calls, since each object’s function is called once) vs runtime in seconds:Amazingly, calling virtual functions directly is the fastest method, though it only marginally beats non-virtual functions.

Without the direct calls, and when these functions aren’t cached, there’s 3x more overhead to the virtual call vs non-virtual.

Keep in mind that we’re only comparing overhead.

For very long running functions, the overhead cost will be moot.

Avoiding the performance hit of virtual functionsAs seen in the previous section, virtual function calls can be quite slow.

To guarantee speed, we’ll want to call virtual functions directly.

To call a virtual function directly, we’ll go to the VTable and grab a pointer directly to the function we want to call.

“But wait,” you say, “won’t that incur the same penalty as a regular virtual function call?” And you’re right, it will.

The key here is that we can reuse this function pointer, either to call it with the same instance in the future or (as in the next section) on other instances of the same class.

To get the function pointer, we can first define a generic function to get the VTable:Then it’s as easy as casting the appropriate entry to the desired function pointer:In the case above, it simply invokes the virtual function in slot 1.

Where this technique really becomes useful is when a function is called repeatedly, either as a batch or on its own.

Running batches of virtual functions efficientlyIn games and other interactive applications, it’s common to run batches of virtual functions at a time.

For example, running thousands of Update() functions from various scene objects each frame.

We can take some of the tools from the previous sections and accomplish this efficiently.

The key insight we can leverage to run these batches efficiently is that cache misses on function calls kill performance.

So how can we minimize cache misses?.We can group our function calls to keep code in the cache!One easy way to group virtual functions is with std::set, which maintains an ordered set of values.

By sorting pairs of { function pointer, object pointer }, the set can easily be iterated over to call everything in sorted order.

This is especially convenient because any scene manager would need this list anyways.

Below is a complete example.

It uses SceneObject as its base class containing the overrideable Update() function.

SceneManager holds these objects and allows all update functions be called as a single batch, grouping update functions.

With a framework like this in place, there’s one more step to take to be truly efficient: don’t call empty functions.

Detecting when virtual functions are overriddenIn the previous example, we batched Update() functions and grouped repeated calls to the same function (but with different objects) for efficiency.

This works fine if every child class of SceneObject needs an update function, but what if some don’t?.What if we have a plethora of overrideable functions like LateUpdate, PhysicsUpdate, and DrawImmediate?.Surely we don’t expect most of those to be overridden in each object, and it would waste cycles to call empty functions.

Enter override checking.

To check if a function is overridden, you simply need to compare the function entries in each VTable.

The caveat here is that you need an instance of the base object (for which the null-object pattern fits nicely) to compare against.

Let’s update the example from the previous section with the new functions we just mentioned:Now we can update the scene manager to conditionally add new objects to each batch only when they have overridden the appropriate function:It’s that easy!.Of course, you’d probably want to start wrapping these up in nicer classes.

You can also use some macro magic to guarantee your VTable struct always matches what’s actually defined in the class.

ConclusionHopefully you’ve seen how useful playing with the VTable can be — and also that it’s not so hard.

I encourage you to try this for yourself.

The end result will be a faster program and happier developers.

There’s something very satisfying about being able to write code like this:and not worry about undue overhead or registering functions.

The full source for all examples and tests is on GitHub.

Happy hacking!.

. More details

Leave a Reply