Bitwise sorting in Ruby

Bitwise sorting in RubyNick BellBlockedUnblockFollowFollowingJan 23Imagine you run a NBA basketball recruiting site.

You want to rank every college basketball player in the country based on three attributes in order:The player’s individual rating(1–10)The player’s team rating (1–10)The player’s conference rating (1–10)The player’s ID (any number)Imagine this rule set applied to these players:Player 1: rating 10, team rating 10, conference rating 10Player 2: rating 10, team rating 10, conference rating 5Player 3: rating 10, team rating 5, conference rating 10Player 4: rating 10, team rating 5, conference rating 5Player 5: rating 5, team rating 10, conference rating 10Player 6: rating 5, team rating 10, conference rating 5Sorting all of the players on all of the teams could be an expensive calculation if you take a naive approach.

For example purposes, let’s assume there are 1,000,000 players to evaluate.

Using a nested sort, we could construct something like this:Player = Struct.

new(:id, :rating, :team_rating, :conf_rating)# setupid = 1.

step@players = 1_000_000.

times.

map do Player.

new(id.

next, rand(0.

10), rand(0.

10), rand(0.

10))end# I want to sort by player rating descending,# then team rating descending,# then conference rating descending,# then by id descending.

def sort_naively @players.

group_by(&:rating).

sort.

reverse.

map do |rb| rb[1].

group_by(&:team_rating).

sort.

reverse.

map do |tb| tb[1].

group_by(&:conf_rating).

sort.

reverse.

map do |cb| cb[1].

group_by(&:id).

sort.

reverse.

map do |ib| ib[1] end end.

flatten end.

flatten end.

flattenendOn average, sorting 1,000,000 records using this method takes ~2000 ms.

Not bad, considering the size of the record set.

However, the deeply nested array structure uses tons of memory, about 219MB.

This is a common issue in Ruby applications, but since we have a GC we’re typically safe.

All in all, it’s a usable algorithm for smaller data sets, but I think we can improve on it.

We could sort using Array.

sort_by , which is prettier to look at, and passing in an array of values to sort on:@players.

sort_by do |p| [p.

rating, p.

team_rating, p.

conf_rating, p.

id]end.

reverse… but this is even slower.

In my tests, it took on average 10 seconds.

However, it used 80MB of memory, so that’s a small improvement.

In low-memory environments this may come in handy, but let’s explore another sorting method, one that saves on both time and memory by utilizing clever bitwise operations.

If you’re not familiar with bitwise operations, they are widely used in lower-level languages like C to perform operations on bits.

In Ruby, the main bitwise operators are:& -> AND| -> OR^ -> XOR~ -> NOT<< -> LEFT SHIFT>> -> RIGHT SHIFTA thorough explanation of what these do can be found here, and is outside the scope of this article.

For this article, we only need an explainer on ^ (XOR) and << (LEFT SHIFT)If you’ve followed along so far, you probably have a pretty good knowledge of how computers work, and that bits are just 1s and 0s.

Bitwise operators operate on these bits.

Bitwise ^ (XOR) basically takes two arguments, a first and second bit, and returns a new bit with the 1s from the first bit, and adds in the 1s from the second bit where they are missing.

For example:integer | bit | operator | other bit | new bit | result 0 | 0000 | ^ | 0001 | 1 | 1 1 | 0001 | ^ | 0001 | 0 | 1 2 | 0010 | ^ | 0001 | 11 | 3 3 | 0011 | ^ | 0001 | 11 | 3 4 | 0100 | ^ | 0001 | 101 | 5 5 | 0101 | ^ | 0010 | 111 | 7 10 | 1010 | ^ | 0101 | 1111 | 15Bitwise << (LEFT SHIFT) , on the other hand, shifts a bit left by a certain number of places, adding 0s to the end of the bit.

For example:integer | bit | operator | number | result | integer 1 | 1 | << | 1 | 10 | 2 2 | 10 | << | 1 | 100 | 4 3 | 11 | << | 1 | 110 | 6 10 | 1010 | << | 2 | 101000 | 40We can utilize these two operations to make our sorting algorithm a little smarter.

Since the sort precedence is always rating > team_rating > conf_rating > id, it follows that a player with rating 10 will always be ranked above a player with ranking 9, and so on, no matter what the other ratings are.

Between players rated the same, the player with a better team rating will be overall rated higher, etc.

To achieve this ranking using bitwise operators, we should add a new parameter, bit_rank, to the Player struct.

The new code looks like this:Player = Struct.

new( :id, :rating, :team_rating, :conf_rating, :bit_rank # <- new attribute)# setupid = 1.

step@players = 1_000_000.

times.

map do Player.

new(id.

next, rand(0.

2), rand(0.

2), rand(0.

2))end#now, calculate the bit rank for each player@players.

each do |p| p.

bit_rank = p.

rating << 30 ^ p.

team_rating << 25 ^ p.

conf_rating << 20 ^ p.

idendIn short, this new bit_rank attribute is a number (a big one, at that) that represents the overall rating of the player.

We’re shifting the rating over 30 places, the team rating 25 places, the conference rating 20 places, and then running a bitwise XOR on all three plus the ID.

For example, a player with ID 1 and rated 10 in all three categories would have a bit_rank of 11_083_448_321 .

This is intuitive when looking at the bit representation of this value, which is:0101001010010100000000000000000000101010 01010 01010 0000000000000000000001^ ^ ^ ^| | | |__player ID = 1| | || | |__player conference rating (10 = 01010)| || |__player team rating (10 = 01010)||__player rating (10 = 01010)That same player with all 5s would have a bit_rank of 5_541_724_161 , which, when seen in the bit lens:0010100101001010000000000000000000100101 00101 00101 00000000000000000001^ ^ ^ ^ | | | |__player ID = 1| | || | |__player conference rating (5 = 00101)| || |__player team rating (5 = 00101)| |__player rating (5 = 00101)…makes sense.

The bit_rank embeds the sort precedence into itself, where the player’s respective ratings are shifted into appropriate areas of the bit rank, and they are all shifted far enough left to still sort by ID at the end.

Now that we understand what the bit_rank is doing, let’s see what code would be needed to run such a complex operation:@players.

sort_by(&:bit_rank)Yep, that’s it.

Since bit_rank is an attribute of the Struct, it can be called using the old Symbol.

to_proc ruby magic.

Using this method, though, really shines when you look at the timing and memory usage.

On average, adding this attribute to each of the million Player objects adds 200ms to the setup stage, but sorting based on this attribute reduces the sort time to 500ms.

In total, what was a 800ms build + 2000ms sort = 2800ms operation is now 1000ms build + 500ms sort = 1500ms!.We cut 1300ms off the sort time, for a 46% improvement!.This amount of efficiency is multiplied the more we add attributes to sort by, as well.

Memory usage, though, is unbelievable.

To refresh, the original naive sort used up 219MB of memory, mostly because it was sorting into 10 + (10*10) + (10*10*10) = 1110 separate, sorted arrays.

When using the bit_rank sort method, our operation only uses 16MB, or 203MB less, a 92% decrease, since we don’t have any nested arrays.

In reality, the whole of the memory footprint comes from building the array of 1,000,000 players.

Using this method is great for simple sort precedences on large data sets, provides an easy way to sort on multiple values, and can really be helpful in memory-deprived environments.

The idea behind exploring bitwise operators came from participation in the Advent of Code 2018, which I HIGHLY recommend participating in.

You can still complete challenges if you haven’t yet, as well.

Hopefully you enjoyed this article.

You can find me on twitter or github.

.

. More details

Leave a Reply