New in PyXLL 5.11
If you have one or more functions that take a long time to execute, or are called repeatedly with the same arguments, caching the function can speed things up.
PyXLL includes an implementation of a Least Recently Used Cache, or LRU Cache, that will cache function results so that repeated calls with the same arguments return a cached result instead of repeatedly calling the actual Python function.
The Least Recently Used cache stores up to a maximum number of results, keyed by the function arguments. Once the maximum number of results is exceeded, the least recently used result is removed from the cache to make space for the latest result.
The LRU cache is enabled for a function using the lru_cache keyword argument to xl_func:
from pyxll import xl_func
@xl_func(lru_cache=3)
def cached_function(a, b):
...
Excel tracks when each cell needs to be recalculated. To do this, it keeps a record of which cells depend on which other cells - this is referred to as the Excel Dependency Graph. When a cell is updated, Excel knows that any other cells that depend on that cell need to be recalculated.
When in automatic calculation mode, when Excel determines a cell needs to be recalculated it will do so immediately.
In manual calculation mode, when Excel determines a cell nees to be recalculated it will mark the cell as dirty and it will be calculated when Excel next recalculates (usually when the user presses F9).
Excel doesn’t consider the cell value when deciding if a cell needs to be recalculated or not. If a cell is updated with the same value, Excel will still recalculate any dependent cells.
There are other reasons Excel can decide a cell needs to be recalculated. For example, if a function used in the cell,
or if one of the cell’s dependencies, is volatile. Various standard Excel functions are volatile functions, including
TODAY, NOW, RAND, RANDBETWEEN, INDIRECT, OFFSET, INFO, CELL, and SUMIF.
Excel can also decide that a cell needs recalculating after rows or columns are inserted or deleted.
For a complete description of when Excel might determine a cell needs recalculating see the Excel documentation page Excel Recalcuation.
To enable LRU caching for a PyXLL worksheet function you pass the keyword argument lru_cache to the xl_func
decorator.
The value of the lru_cache keyword argument can be an integer, True, or False.
If lru_cache is a positive integer then the LRU cache will cache up to a maximum number of results set by the number.
If lru_cache is True, zero, or a negative integer, the size of the cache will be infinite.
If lru_cache is False or None the function will not be cached.
For example, the following function will cache up to 3 results, keyed by its arguments:
from pyxll import xl_func
@xl_func(lru_cache=3)
def cached_function(a, b):
...
Note
PyXLL’s LRU cache is only used when calling the function from Excel.
If the function is called from Python then the cache is not used.
For caching functions when called from Python, see functools.lru_cache in the
standard Python library for an alternative.
It is possible to query the cache for some basic statistics for each cached function.
The function lru_cache_info when called with a function returns a dictionary
with the following keys:
maxsize - maximum number of cached reuslts, or 0 if unbounded
currsize - current number of cached results
hits - number of times a cached result has been returned
misses - number of times the function was called without finding a cached result
If no results are found an empty dictionary is returned.
lru_cache_info can also be called with no arguments. In that case it returns
a dictionary of function names to dictionaries of the statistics listed above.
The LRU cache is always cleared when the PyXLL add-in is reloaded.
It can also be cleared using the lru_cache_clear function.
If lru_cache_clear is called with a function, only cached results for that function are cleared.
If lru_cache_clear is called with no arguments, all cached results are cleared.
PyXLL’s LRU cache is conceptually very similar to functools.lru_cache from the Python standard library. There are
however some important differences.
Python’s functools.lru_cache requires all function arguments to be hashable. This is required as it needs to
store the arguments efficiently and quickly determine if a function call has been made with the same arguments before.
This can be problematic when using arguments such as DataFrames or other more complex Python objects that are not easily hashable.
PyXLL’s LRU cache has the same requirement as it works in a similar way, but it tests the arguments before they are converted to the final Python values. Because Excel values are always simple primitives (strings, numbers, booleans, etc) they are always easily hashable. Furthermore, when a cached result already exists for a set of arguments the cached result can be returned without doing the full conversion from the primitive Excel values to the final Python arguments, which can reduce the total time taken.
Python’s functools.lru_cache has no knowledge of Excel’s RTD or asynchronous functions. Wrapping one of these
functions using functools.lru_cache takes a little more work. PyXLL’s LRU cache on the other hand knows about these
different types of functions and handles them automatically.
Once final difference is with how PyXLL’s cached objects are handled. When using PyXLL’s LRU cache, if a function returns
a cached object back to Excel the object handle used doesn’t update when a cached result is found. This is intentional as
it means that dependent functions can also be cached and don’t need to be recalculated unless the object handle updates.
When using functools.lru_cache, if the function returns a cached object back to Excel then the object handle will
update each time regardless of whether the result was cached or not.
If you have a sheet with cells dependent on a function using the LRU cache, those dependent cells will still be recalculated even if a cached result is returned.
For example, suppose you have the following:
A1 = 1
A2 = foo(A1)
A3 = bar(A2)
Where foo is a cached function using lru_cache and bar is not.
When A1 is changed, Excel will mark A2 and A3 as being dirty. When Excel next recalculates, either automatically
or when triggered manually, it will recalculate the cells in their dependency order A2 followed by A3.
When recalculating A2 if foo has a cached result for the updated argument that that cached result will be called without
calling the Python function foo.
This doesn’t affect the fact that Excel has already marked A3 as being dirty though. Even if the result returned to cell
A2 hasn’t changed, Excel will still consider A3 as dirty.
Excel will continue its recalculation and still recalculate cell A3 and call the function bar.
While it is not possible to change how Excel recalculates, enabling LRU caching for bar also will mean that if it has
been called with the same arguments previously (and the result is still cached) the cached result can be used.