Python and LRU Cache

Python and LRU CacheJose Alberto Torres AgüeraBlockedUnblockFollowFollowingMay 26Since version 3.

2 python we can use a decorator namedfunctools.

lru_cache() , this function implement a built-in LRU cache in Python, so lets take a deep look to this functionalityWhat’s LRU cache?You have a full explanation here LRU Cache (Wikipedia), but to sum up, as its name indicates, LRU Least Recently Used, It will save on memory the last item readed, and every time the process is executed it will check if the result is on cache, in order to not do the full process again.

This cache can save us time when we use for example on an expensive function or function with multiple I/O calls.

The Example:Below, we are going to introduce this functionality with an example of python code and we will explain step by step how it worksimport timeimport urllibimport urllib.

requestdef get_url(resource): try: with urllib.

request.

urlopen(resource) as s: return True except Exception as e: return Falsestart = time.

time()get_url("https://cloud.

google.

com/")print(f’Duration: {time.

time() — start}s’)#Duration: 1.

0348174571990967sFirst callget_url() function makes a call with the urllib to an url.

On the example , we save the time before and after the call, and we have a result of 1.

03 seconds.

If we execute this code again, it will returns more less the same result.

Now imagine we have a list of urls, and we need to check if the site is alive, but most of them are repeatedIs here wherefunctools.

lru_cache() will help us.

Using lru cache, we are going to save the results of function calls on memory, to return them on future calls with the same arguments directly from memory.

In computer science this is called memoization@lru_cache(maxsize=4)def get_url(resource): try: with urllib.

request.

urlopen(resource) as s: return True except Exception as e: return FalseWe will need to import the decorator with:from functools import lru_cacheHow it worksWe are going to explain how it works step by step the code below.

import timeimport urllibimport urllib.

requestfrom functools import lru_cachecloud_platform_map = ["https://cloud.

google.

com/", "https://aws.

amazon.

com/es/", "https://azure.

microsoft.

com/es-es/", "https://www.

digitalocean.

com/"]@lru_cache(maxsize=4)def get_url(resource): try: with urllib.

request.

urlopen(resource) as s: return True except Exception as e: return Falsestart = time.

time()get_url("https://cloud.

google.

com/") #url 1print(get_url.

cache_info())get_url("https://aws.

amazon.

com/es/") #url 2print(get_url.

cache_info())get_url("https://azure.

microsoft.

com/es-es/") #url 3get_url("https://www.

digitalocean.

com/") #url 4print(get_url.

cache_info())get_url("https://cloud.

google.

com/") #url 1print(get_url.

cache_info())get_url("https://www.

ovh.

com/") #url 5print(get_url.

cache_info())print(f'Duration: {time.

time() – start}s')This should be the cache’s initial state.

Cache intial stateNow we make the first request and we check the cache status:get_url(“https://cloud.

google.

com/") #url 1get_url.

cache_info()#CacheInfo(hits=0, misses=1, maxsize=4, currsize=1)#Duration: 0.

9899957180023193sWith the function get_url.

cache_info() we can check the cache information.

hits=0 -> No coincidence was found on the function callmisses=1 -> Sum of cases where the cache was not hittenmaxsize=4 -> The size of the cachecurrsize=1 -> The current size of the cacheAfter that, we make a new function call with a different url:get_url(“https://aws.

amazon.

com/es/") #url 2#CacheInfo(hits=0, misses=2, maxsize=4, currsize=2)The status now indicates that the cache has 2 elements and the order number (green) indicates the last added value, the first position now is for the last item added.

If we call the function again 2 times with different arguments, the status should be like this.

get_url("https://azure.

microsoft.

com/es-es/") #url 3get_url("https://www.

digitalocean.

com/") #url 4#CacheInfo(hits=0, misses=4, maxsize=4, currsize=4)Now if we call the function, with a previous argument cached the status will be like this.

get_url("https://cloud.

google.

com/") #url 1#CacheInfo(hits=1, misses=4, maxsize=4, currsize=4)As we can observe the first element wich was “https://cloud.

google.

com/” has the first position on the cache, because it has been hitten again, and now it will be the last element to be replaced.

If we call the function with a new argument not used until now, An element of the cache will be replaced, because the cache has reached its limit (4), in this case the element with the higher order (4º) will be replaced.

get_url("https://www.

ovh.

com/") #url 5#CacheInfo(hits=1, misses=5, maxsize=4, currsize=4)And this is how the lru cache works, as his name indicates, the least recently used item that has been used, should be replaced.

Performance:Once we have seen how LRU cache works, we are going take a look to the performace.

If we make this sequence of calls without the lru cache decoratorimport timeimport urllibimport urllib.

requestdef get_url(resource): try: with urllib.

request.

urlopen(resource) as s: return True except Exception as e: return Falsestart = time.

time()get_url("https://cloud.

google.

com/")get_url("https://aws.

amazon.

com/es/")get_url("https://azure.

microsoft.

com/es-es/")get_url("https://www.

digitalocean.

com/")get_url("https://cloud.

google.

com/")get_url("https://www.

ovh.

com/")get_url("https://cloud.

google.

com/")get_url("https://cloud.

google.

com/")get_url("https://aws.

amazon.

com/es/")get_url("https://azure.

microsoft.

com/es-es/")get_url("https://aws.

amazon.

com/es/")get_url("https://azure.

microsoft.

com/es-es/")get_url("https://cloud.

google.

com/")get_url("https://aws.

amazon.

com/es/")get_url("https://aws.

amazon.

com/es/")print(f'Duration: {time.

time() – start}s')#Duration: 7.

545874357223511sThe full process has taken 7,5 seconds.

Now using the lru cache.

import timeimport urllibimport urllib.

requestfrom functools import lru_cache@lru_cache(maxsize=4)def get_url(resource): try: with urllib.

request.

urlopen(resource) as s: return True except Exception as e: return Falsestart = time.

time()get_url("https://cloud.

google.

com/")get_url("https://aws.

amazon.

com/es/")get_url("https://azure.

microsoft.

com/es-es/")get_url("https://www.

digitalocean.

com/")get_url("https://cloud.

google.

com/")get_url("https://www.

ovh.

com/")get_url("https://cloud.

google.

com/")get_url("https://cloud.

google.

com/")get_url("https://aws.

amazon.

com/es/")get_url("https://azure.

microsoft.

com/es-es/")get_url("https://aws.

amazon.

com/es/")get_url("https://azure.

microsoft.

com/es-es/")get_url("https://cloud.

google.

com/")get_url("https://aws.

amazon.

com/es/")get_url("https://aws.

amazon.

com/es/")print(get_url.

cache_info())print(f'Duration: {time.

time() – start}s')#CacheInfo(hits=8, misses=7, maxsize=4, currsize=4)#Duration: 2.

5642213821411133sThe process has finished in 2,5 seconds, is 66% faster than the previous process, It’s worth to take a look to this built-inI hope this post has shown you at least how this functionality works.. More details

Leave a Reply