Make your app fast again: caching function calls in Python

(, EN )

It’s the usual pattern: you start with a new webapp project, the app is responsive and fast. But after a while it gets slow and messy. There are many reasons why it can happen. Set aside the obvious reasons such as »technical debt«, messy code and lack of concurrency/parallelism, performance bottlenecks can arise from slow functions that are called frequently with the same (or a small set of different) parameters.

Here caching can be a big plus, because the return value does not need to be recalculated. However, this only works if your function in question is free of side-effects. In particular for functions that retrieve data from the database you need to make sure that the underlying data is not changed in the meantime. The Python documentation describes it best:

… it doesn’t make sense to cache functions with side-effects, functions that need to create distinct mutable objects on each call, or impure functions such as time() or random().

Later we will touch upon options to mitigate this problem.

Python has various caching libraries, some of them are tailored to Flask or other frameworks. Often adapted to specific needs, these solutions might be limiting if used as general approach to caching.

  1. To start simple, you can work with off-the-shelf caching with such as cached_property or lru_cache from the functools module.
  2. One level up would be cachetools
  3. And a more sophisticated solution could be Redis-based caching with the redis or hiredis-py packages

Feel free to combine the different approaches, but in here we will focus on the first and second points.

(n.b. @cached_property is only available in Python 3.8 or later, but you can just copy the code from the 3.8 functools and remove it once you switch to 3.8)

Python off-the-shelf caching

The example from the functools-module looks like:

class DataSet:
    def __init__(self, sequence_of_numbers):
        self._data = sequence_of_numbers

    @cached_property
    def stdev(self):
        return statistics.stdev(self._data)

    @cached_property
    def variance(self):
        return statistics.variance(self._data)

The implementation is straight forward, you just use @cached_property instead of @property. Take care with mutable data:

class DataSetMutable:
    def __init__(self, sequence_of_numbers):
        self.data = sequence_of_numbers

    @cached_property
    def stdev(self):
        return statistics.stdev(self.data)

    @cached_property
    def variance(self):
        return statistics.variance(self.data)

dsm = DataSetMutable(range(10))

assert abs(dsm.stdev - statistics.stdev(range(10))) < 0.00001
dsm.data = range(30,100)
assert abs(dsm.stdev - statistics.stdev(range(30,100))) < 0.00001

The second assert will fail, dsm.stdev returns the initially cached data and not the updated version. In case you work with functions and not properties, @cached_property will not help you further. In this case you can use @lru_cache from functools:

@lru_cache
def count_vowels(sentence):
    sentence = sentence.casefold()
    return sum(sentence.count(vowel) for vowel in 'aeiou')

Once you want to get serious with @lru_cache, you will inevitably run into following problem: @lru_cache can only cache hash()-able, i.e. immutable, data structures. How can we still cache functions with lists and dictionaries as arguments? There are two ways out of the misery:

  1. only use hashable args in your functions
  2. wrap @lru_cache to deal with mutable data structures sets.

Freeze everything

One low hanging fruit is to use immutable data structures. Lists, dictionaries and sets can be converted easily to their immutable counterparts. Luckily the immutable versions behave mostly the same as their mutable counterparts.

def to_immutable(v):
    if isinstance(v, dict):
        return frozendict(v)
    if isinstance(v, list):
        return tuple(v)
    if isinstance(v, set):
        return frozenset(v)
    raise TypeError(f"cannot convert {type(v)} to immutable counterpart")

As experienced Python developer you might have spotted that frozendict does not exist in vanilla Python. Unfortunately dictionaries do not have a immutable counterpart. In my code I use the following definition to create a frozen dictionary:

class frozendict(dict):
    def __hash__(self):
        return hash(frozenset(self.items()))

By using immutable structures as much as possible (if you know you need to cache them), you can get quite sophisticated caching out of the box.

Extending @lru_cache to deal with mutable data structures

Let’s have a look at 2. Other people have had this problem already (here or here). It usually boils down to modify or prepend some type of serialisation or conversion to immutable types.

I found that the serialisation with pickle works quite well, so I adapted a previous solution based on JSON:

import functools
import pickle

def pickled_lru_cache(cache):
    def pickled_lru_cache_decorator(func):
        def func_with_pickled_args(pickled_args, pickled_kwargs):
            args = pickle.loads(pickled_args)
            kwargs = pickle.loads(pickled_kwargs)
            return func(*args, **kwargs)

        cached_func = cache(func_with_pickled_args)

        @functools.wraps(func)
        def pickled_cached_func(*args, **kwargs):
            pickled_args = pickle.dumps(args)
            pickled_kwargs = pickle.dumps(kwargs)
            return cached_func(pickled_args, pickled_kwargs)

        pickled_cached_func.cache_info = cached_func.cache_info
        pickled_cached_func.cache_clear = cached_func.cache_clear

        return pickled_cached_func

    return pickled_lru_cache_decorator

@pickled_lru_cache(functools.lru_cache(maxsize=3))
def function_that_takes_long(*args, **kwargs):
    pass

With this approach, it is possible to also cache mutable objects. The extra functionality comes with a trade-off: time/performance. The extra steps necessary will of course take longer than the plain @lru_cache. What is the difference? A quick check with timethese gives:

# immutable data structures as args

                     Rate  run_py_lru  run_hashable_lru  run_pickled_lru
      run_py_lru  55952/s           .              566%             633%
run_hashable_lru   8401/s        -85%                 .              10%
 run_pickled_lru   7630/s        -86%               -9%                .


# mutable data structures as args (dict & list)

                     Rate  run_pickled_lru2  run_hashable_lru2
 run_pickled_lru2  4332/s                 .               800%
run_hashable_lru2   482/s              -89%                  .

with run_py_lru as plain @lru_cache, run_hashable_lru as the previous JSON-solution and run_pickled_lru as adapted version with pickle instead of json.

run_pickled_lru and run_hashable_lru are a magnitude (if not more) slower than plain @lru_cache. So think twice before you need a cache that can deal with lists and dictionaries. The performance impact can have two causes (1) wrapping everything in an extra decorator and (2) the overhead created by more complex hashing. In my case it was mostly (1).

Making lru_cache self-less

A second issue that might arise frequently is using @lru_cache in a class/object context. Tricky: first, the self object cannot be hashed and second, caching self could introduce a memory leak, as the caching might not release the object for garbage collection. How to deal with it? And more importantly: should we deal with it?

Based on a stackoverflow post, I played around with extending the @lru_cache to

  1. act on class methods and (see omit_self)
  2. as TTL cache (see add_ttl).

It is definitively possible, as you can see in the code. But should we do these kind of “tricks”? My opinion is: probably not. If you start wrapping the @lru_cache with another decorator that prevents self from being included in the cache entry hash, you will create code that is hard to maintain and reason about (see also the comment on the omit_self decorator). What if the object with the cached function changes and this change needs to be incorporated? What if I want to have a self-aware TTL LRU cache or a totally different cache algorithm? I suggest to make it explicit and use e.g. cachetools for that.

import cachetools


class Foobar:
    def __init__(self):
        self.cache = cachetools.LRUCache(maxsize=10)

    @property
    @cachetools.cachedmethod(lambda self: self.cache)
    def cached_method(self):
        # complex calculation or query here
        return ["some", "important", "result"]

This code snippet is basically a slow version of @cached_property.

Beware: if you use a cachetools cache across methods, there might be key conflicts leading to wrong results. It is better to use multiple caches

import cachetools


class Foobar2:
    def __init__(self):
        self.cache1 = cachetools.LRUCache(maxsize=10)
        self.cache2 = cachetools.LRUCache(maxsize=10)

    @cachetools.cachedmethod(lambda self: self.cache1)
    def cached_method1(self, a, b, c):
        # complex calculation or query here
        return ["some", "important", "result", "1"]

    @cachetools.cachedmethod(lambda self: self.cache2)
    def cached_method2(self, a, b, c):
        # complex calculation or query here
        return ["some", "important", "result", "2"]

or tag your cache entries:

import cachetools
from cachetools.keys import hashkey
from functools import partial


class Foobar3:
    def __init__(self):
        self.cache = cachetools.LRUCache(maxsize=10)

    @cachetools.cachedmethod(
        lambda self: self.cache, key=partial(hashkey, "cached_method1")
    )
    def cached_method1(self, a, b, c):
        # complex calculation or query here
        return ["some", "important", "result", "1"]

    @cachetools.cachedmethod(
        lambda self: self.cache, key=partial(hashkey, "cached_method2")
    )
    def cached_method2(self, a, b, c):
        # complex calculation or query here
        return ["some", "important", "result", "2"]

cachetools provides a versatile interface and the possibility to supply your own key functions:

import pickle
import cachetools


def pickled_hashkey(*args, **kwargs):
    pickled_args = pickle.dumps(args)
    pickled_kwargs = pickle.dumps(kwargs)
    return cachetools.keys.hashkey(pickled_args, pickled_kwargs)


@cachetools.cached(cache, key=pickled_hashkey)
def foobar(a, b, c):
    pass

The solution with cachetools is more developer friendly, but introduces additional complexity in the background. In my performance evaluation it was 50% slower than the custom pickled_lru_cache.

ON/OFF button

For performance evaluation it can be handy to disable caching globally. I usually pack all caching-relevant functions and classes into one module, from which I import throughout my app/code.

This construct gives you the advantage that implementation details can be changed at one place only:

# app/caching.py
import uuid
import cachetools
import pickle
import functools

CACHING_DISABLED = False


def pickled_hashkey(*args, **kwargs):
    if CACHING_DISABLED:
        # every time we generate a cache miss
        return uuid.uuid4()

    pickled_args = pickle.dumps(args)
    pickled_kwargs = pickle.dumps(kwargs)
    return cachetools.keys.hashkey(pickled_args, pickled_kwargs)


if CACHING_DISABLED:
    cached_property = property
else:
    cached_property = functools.cached_property

# app/run.py
from app.caching import cached_property
# instead of
# from functools import cached_property

Conclusion

Caching (in Python) is full of trade-offs and you need to pick the solution that fits your use case. I hope this article will help you in finding that solution.

Have fun!

See also