Python functools lru_cache with instance methods: release object

How can I use functools.lru_cache inside classes without leaking memory?

In the following minimal example the foo instance won't be released, although it goes out of scope and has no referrer (other than the lru_cache).
from functools import lru_cache

class BigClass:
    pass

class Foo:
    def __init__(self):
        self.big = BigClass()

    @lru_cache(maxsize=16)
    def cached_method(self, x):
        return x + 5

def fun():
    foo = Foo()
    print(foo.cached_method(10))
    print(foo.cached_method(10))  # use cache
    return 'something'

fun()
But foo, and hence foo.big (a BigClass), are still alive:

import gc; gc.collect()  # collect garbage
len([obj for obj in gc.get_objects() if isinstance(obj, Foo)])  # is 1
That means that the Foo/BigClass instances are still residing in memory. Even deleting Foo (del Foo) will not release them.

Why is lru_cache holding on to the instance at all? Doesn't the cache use some hash and not the actual object?

What is the recommended way to use lru_cache inside classes?

I know of two workarounds: use per-instance caches, or make the cache ignore the object (which might lead to wrong results, though).
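The "why" has a concrete answer: the decorated function is a single object stored on the class, and its cache keys are the full argument tuples, including self, so the class-level cache keeps a strong reference to every instance it has seen. A minimal sketch (same classes as above) shows that clearing that cache is what finally releases the instance:

```python
import gc
from functools import lru_cache

class BigClass:
    pass

class Foo:
    def __init__(self):
        self.big = BigClass()

    @lru_cache(maxsize=16)
    def cached_method(self, x):
        return x + 5

def fun():
    foo = Foo()
    foo.cached_method(10)

fun()
gc.collect()
# The class-level cache still holds the (self, 10) key tuple:
print(sum(isinstance(o, Foo) for o in gc.get_objects()))  # 1

Foo.cached_method.cache_clear()  # drop all cache entries
gc.collect()
print(sum(isinstance(o, Foo) for o in gc.get_objects()))  # 0
```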
Solution 1:[1]
This is not the cleanest solution, but it's entirely transparent to the programmer:
import functools
import weakref

def memoized_method(*lru_args, **lru_kwargs):
    def decorator(func):
        @functools.wraps(func)
        def wrapped_func(self, *args, **kwargs):
            # We're storing the wrapped method inside the instance. If we had
            # a strong reference to self the instance would never die.
            self_weak = weakref.ref(self)

            @functools.wraps(func)
            @functools.lru_cache(*lru_args, **lru_kwargs)
            def cached_method(*args, **kwargs):
                return func(self_weak(), *args, **kwargs)

            setattr(self, func.__name__, cached_method)
            return cached_method(*args, **kwargs)
        return wrapped_func
    return decorator
It takes the exact same parameters as lru_cache and works exactly the same. However, it never passes self to lru_cache and instead uses a per-instance lru_cache.
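For completeness, a self-contained usage sketch (the memoized_method definition is repeated so the snippet runs on its own). The lru_cache now lives on the instance and is keyed only by x, so the instance is released as soon as the last outside reference goes away:

```python
import functools
import gc
import weakref

def memoized_method(*lru_args, **lru_kwargs):
    # copy of the decorator above
    def decorator(func):
        @functools.wraps(func)
        def wrapped_func(self, *args, **kwargs):
            self_weak = weakref.ref(self)

            @functools.wraps(func)
            @functools.lru_cache(*lru_args, **lru_kwargs)
            def cached_method(*args, **kwargs):
                return func(self_weak(), *args, **kwargs)

            setattr(self, func.__name__, cached_method)
            return cached_method(*args, **kwargs)
        return wrapped_func
    return decorator

class Foo:
    @memoized_method(maxsize=16)
    def cached_method(self, x):
        return x + 5

foo = Foo()
print(foo.cached_method(10))  # 15 (computed)
print(foo.cached_method(10))  # 15 (from the per-instance cache)
del foo
gc.collect()
print(sum(isinstance(o, Foo) for o in gc.get_objects()))  # 0
```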
Solution 2:[2]
I will introduce methodtools for this use case.

Install it with pip install methodtools (https://pypi.org/project/methodtools/).

Then your code will work just by replacing functools with methodtools:
from methodtools import lru_cache

class Foo:
    @lru_cache(maxsize=16)
    def cached_method(self, x):
        return x + 5
Of course, the gc test from the question now returns 0 as well.
Solution 3:[3]
Simple wrapper solution
Here's a wrapper that will keep a weak reference to the instance:
import functools
import weakref

def weak_lru(maxsize=128, typed=False):
    'LRU Cache decorator that keeps a weak reference to "self"'
    def wrapper(func):
        @functools.lru_cache(maxsize, typed)
        def _func(_self, *args, **kwargs):
            return func(_self(), *args, **kwargs)

        @functools.wraps(func)
        def inner(self, *args, **kwargs):
            return _func(weakref.ref(self), *args, **kwargs)
        return inner
    return wrapper
Example
Use it like this:
class Weather:
    "Lookup weather information on a government website"

    def __init__(self, station_id):
        self.station_id = station_id

    @weak_lru(maxsize=10)
    def climate(self, category='average_temperature'):
        print('Simulating a slow method call!')
        return self.station_id + category
When to use it
Since the weakrefs add some overhead, you would only want to use this when the instances are large and the application can't wait for the older unused calls to age out of the cache.
Why this is better
Unlike the other answer, we only have one cache for the class and not one per instance. This is important if you want to get some benefit from the least recently used algorithm. With a single cache per method, you can set the maxsize so that the total memory use is bounded regardless of the number of instances that are alive.
Dealing with mutable attributes
If any of the attributes used in the method are mutable, be sure to add __eq__() and __hash__() methods:
class Weather:
    "Lookup weather information on a government website"

    def __init__(self, station_id):
        self.station_id = station_id

    def update_station(self, station_id):
        self.station_id = station_id

    def __eq__(self, other):
        return self.station_id == other.station_id

    def __hash__(self):
        return hash(self.station_id)
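With __eq__() and __hash__() defined, two instances that compare equal also share cache entries, because weakref.ref objects delegate equality and hashing to their referents while those are alive. A self-contained sketch (the weak_lru definition from above is repeated, and a call counter replaces the print so the cache hit is visible):

```python
import functools
import weakref

def weak_lru(maxsize=128, typed=False):
    # copy of the decorator above
    def wrapper(func):
        @functools.lru_cache(maxsize, typed)
        def _func(_self, *args, **kwargs):
            return func(_self(), *args, **kwargs)

        @functools.wraps(func)
        def inner(self, *args, **kwargs):
            return _func(weakref.ref(self), *args, **kwargs)
        return inner
    return wrapper

class Weather:
    call_count = 0  # counts actual (non-cached) method executions

    def __init__(self, station_id):
        self.station_id = station_id

    def __eq__(self, other):
        return self.station_id == other.station_id

    def __hash__(self):
        return hash(self.station_id)

    @weak_lru(maxsize=10)
    def climate(self, category='average_temperature'):
        Weather.call_count += 1
        return self.station_id + category

w1 = Weather('station1')
w2 = Weather('station1')   # equal to w1
w1.climate()
w2.climate()               # cache hit: w2 == w1, so the key matches
print(Weather.call_count)  # 1
```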
Solution 4:[4]
An even simpler solution to this problem is to declare the cache in the constructor and not in the class definition:
from functools import lru_cache
import gc

class BigClass:
    pass

class Foo:
    def __init__(self):
        self.big = BigClass()
        self.cached_method = lru_cache(maxsize=16)(self.cached_method)

    def cached_method(self, x):
        return x + 5

def fun():
    foo = Foo()
    print(foo.cached_method(10))
    print(foo.cached_method(10))  # use cache
    return 'something'

if __name__ == '__main__':
    fun()
    gc.collect()  # collect garbage
    print(len([obj for obj in gc.get_objects() if isinstance(obj, Foo)]))  # is 0
Solution 5:[5]
python 3.8 introduced the cached_property
decorator in the functools
module.
when tested its seems to not retain the instances.
If you don't want to update to python 3.8 you can use the source code.
All you need is to import RLock
and create the _NOT_FOUND
object. meaning:
from threading import RLock

_NOT_FOUND = object()

class cached_property:
    # https://github.com/python/cpython/blob/v3.8.0/Lib/functools.py#L930
    ...
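A sketch of how this looks in practice. One caveat worth stating: cached_property caches a single computed value per instance and takes no arguments, so it only replaces lru_cache when the result does not depend on extra parameters. The value is stored in the instance's own __dict__, so nothing outlives the instance:

```python
import gc
from functools import cached_property

class BigClass:
    pass

class Foo:
    def __init__(self):
        self.big = BigClass()

    @cached_property
    def computed(self):
        # stands in for an expensive computation; note: no arguments
        return 10 + 5

def fun():
    foo = Foo()
    print(foo.computed)  # 15, computed once
    print(foo.computed)  # 15, read back from foo.__dict__

fun()
gc.collect()
print(sum(isinstance(o, Foo) for o in gc.get_objects()))  # 0
```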
Solution 6:[6]
You can move the implementation of the method to a module-level function, pass only the relevant data from self when calling it from the method, and use @lru_cache on the function.

An added benefit of this approach is that even if your classes are mutable, the cache will be correct. And the cache key is more explicit, as just the relevant data is in the signature of the cached function.

To make the example slightly more realistic, let's assume cached_method() needs information from self.big:
from dataclasses import dataclass
from functools import lru_cache

@dataclass
class BigClass:
    base: int

class Foo:
    def __init__(self):
        self.big = BigClass(base=100)

    @lru_cache(maxsize=16)  # the leak is here
    def cached_method(self, x: int) -> int:
        return self.big.base + x

def fun():
    foo = Foo()
    print(foo.cached_method(10))
    print(foo.cached_method(10))  # use cache
    return 'something'

fun()
Now move the implementation outside the class:
from dataclasses import dataclass
from functools import lru_cache

@dataclass
class BigClass:
    base: int

@lru_cache(maxsize=16)  # no leak from here
def _cached_method(base: int, x: int) -> int:
    return base + x

class Foo:
    def __init__(self):
        self.big = BigClass(base=100)

    def cached_method(self, x: int) -> int:
        return _cached_method(self.big.base, x)

def fun():
    foo = Foo()
    print(foo.cached_method(10))
    print(foo.cached_method(10))  # use cache
    return 'something'

fun()
Solution 7:[7]
Solution
Below is a small drop-in replacement for (and wrapper around) lru_cache which puts the LRU cache on the instance (object), not on the class.
Summary
The replacement combines lru_cache with cached_property. It uses cached_property to store the cached method on the instance on first access; this way the lru_cache follows the object, and as a bonus it can be used on unhashable objects like a non-frozen dataclass.
How to use it
Use @instance_lru_cache instead of @lru_cache to decorate a method and you're all set. Decorator arguments are supported, e.g. @instance_lru_cache(maxsize=None).
Comparison with other answers
The result is comparable to the answers provided by pabloi and akaihola, but with a simple decorator syntax. Compared to the answer provided by youknowone, this decorator is type hinted and does not require third-party libraries (result is comparable).
This answer differs from the answer provided by Raymond Hettinger as the cache is now stored on the instance (which means the maxsize is defined per instance and not per class) and it works on methods of unhashable objects.
from functools import cached_property, lru_cache, partial, update_wrapper
from typing import Callable, Optional, TypeVar, Union

T = TypeVar("T")

def instance_lru_cache(
    method: Optional[Callable[..., T]] = None,
    *,
    maxsize: Optional[int] = 128,
    typed: bool = False
) -> Union[Callable[..., T], Callable[[Callable[..., T]], Callable[..., T]]]:
    """Least-recently-used cache decorator for instance methods.

    The cache follows the lifetime of an object (it is stored on the object,
    not on the class) and can be used on unhashable objects. Wrapper around
    functools.lru_cache.

    If *maxsize* is set to None, the LRU features are disabled and the cache
    can grow without bound.

    If *typed* is True, arguments of different types will be cached separately.
    For example, f(3.0) and f(3) will be treated as distinct calls with
    distinct results.

    Arguments to the cached method (other than 'self') must be hashable.

    View the cache statistics named tuple (hits, misses, maxsize, currsize)
    with f.cache_info(). Clear the cache and statistics with f.cache_clear().
    Access the underlying function with f.__wrapped__.
    """
    def decorator(wrapped: Callable[..., T]) -> Callable[..., T]:
        def wrapper(self: object) -> Callable[..., T]:
            return lru_cache(maxsize=maxsize, typed=typed)(
                update_wrapper(partial(wrapped, self), wrapped)
            )
        return cached_property(wrapper)  # type: ignore
    return decorator if method is None else decorator(method)
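A usage sketch, with the decorator repeated in condensed form (type hints and docstring dropped) so the snippet runs on its own. Point is a hypothetical example class; being a non-frozen dataclass, its instances are unhashable, which would make a plain @lru_cache method raise TypeError:

```python
from dataclasses import dataclass
from functools import cached_property, lru_cache, partial, update_wrapper

def instance_lru_cache(method=None, *, maxsize=128, typed=False):
    # condensed copy of the decorator above
    def decorator(wrapped):
        def wrapper(self):
            return lru_cache(maxsize=maxsize, typed=typed)(
                update_wrapper(partial(wrapped, self), wrapped)
            )
        return cached_property(wrapper)
    return decorator if method is None else decorator(method)

@dataclass
class Point:  # non-frozen dataclass with eq=True: __hash__ is None
    x: int
    y: int

    @instance_lru_cache(maxsize=None)
    def shifted(self, dx: int) -> int:
        return self.x + dx

p = Point(1, 2)
print(p.shifted(10))                # 11 (computed)
print(p.shifted(10))                # 11 (cached, even though hash(p) fails)
print(p.shifted.cache_info().hits)  # 1
```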
Solution 8:[8]
The problem with using @lru_cache or @cache on an instance method is that self is passed to the method for caching despite not really being needed. I can't tell you why caching self causes the issue, but I can give you what I think is a very elegant solution to the problem.

My preferred way of dealing with this is to define a static method (with a dunder-style name) that takes all the same arguments as the instance method except for self. The reason this is my preferred way is that it's very clear, minimalistic, and doesn't rely on external libraries.
from functools import lru_cache

class BigClass:
    pass

class Foo:
    def __init__(self):
        self.big = BigClass()

    @staticmethod
    @lru_cache(maxsize=16)
    def __cached_method__(x: int) -> int:
        return x + 5

    def cached_method(self, x: int) -> int:
        return self.__cached_method__(x)
def fun():
    foo = Foo()
    print(foo.cached_method(10))
    print(foo.cached_method(10))  # use cache
    return 'something'

fun()
I have verified that the instance is garbage-collected correctly:
import gc; gc.collect() # collect garbage
len([obj for obj in gc.get_objects() if isinstance(obj, Foo)]) # is 0
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
Solution | Source
---|---
Solution 1 |
Solution 2 | youknowone
Solution 3 |
Solution 4 | pabloi
Solution 5 |
Solution 6 | akaihola
Solution 7 |
Solution 8 |