I have a couple of MATLAB functions that take a long time to compute. Sometimes I remember to save the results, sometimes I forget. Here is the skeleton of a function that automatically caches every input state and output state it sees. That way, if the function is ever called again with the same input, the result is loaded directly from file rather than recomputed. It's written very generically, so you should be able to just paste in your existing function. I save the following in cache_test.m:
function C = cache_test(A,B)
% CACHE_TEST Dummy program that computes C = A+B, but demonstrates how to use
% md5 caching of function results on input parameters
%
% C = cache_test(A,B)
%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Check for cached result, do NOT edit variables until cache is checked,
% your function code comes later. See below
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% get a list of current variables in this scope, this is the input "state"
variables = who;
% get a temporary file's name
tmpf = [tempname('.') '.mat'];
% save the "state" to file, so we can get a md5 checksum
save(tmpf,'-regexp',sprintf('^%s$|',variables{:}));
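% get md5 checksum on input "state", we append .cache.mat to the checksum
% because we'll use the checksum as the cache file name. tail -c +117 skips
% the first 116 bytes of the .mat file (the header text, which contains the
% creation time) so identical inputs give identical checksums. md5 -r is the
% macOS command; on Linux md5sum should work in its place.
[s,cache_name] = system(['tail -c +117 ' tmpf ...
  ' | md5 -r | awk ''{printf "."$1".cache.mat"}''']);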
% clean up
delete(tmpf);
clear s tmpf variables;
% If the checksum cache file exists then we've seen this input "state"
% before, load in cached output "state"
if(exist(cache_name,'file'))
  fprintf('Cache exists. Using cache...\n');
  % use cache
  load(cache_name);
% Otherwise this is the first time we've seen this input "state", so we
% execute the function as usual and save the output "state" to the cache
else
  fprintf('First time. Creating cache...\n');
  %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
  % Your function code goes here
  %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
  C = A+B;
  % get list of variables present in this scope at finish of function code,
  % this is the output "state"
  variables = who;
  % save output "state" to file, using md5 checksum cache file name
  save(cache_name,'-regexp',sprintf('^%s$|',variables{:}));
end
end
If I call:
cache_test(5,4)
I see
First time. Creating cache...
ans =
9
And if I call it again, I see:
Cache exists. Using cache...
ans =
9
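The checksum step in cache_test.m shells out to tail, md5 and awk, so it depends on those tools being installed. As a rough alternative, here is a sketch that computes the same kind of checksum inside MATLAB using Java's MessageDigest. It assumes MATLAB is running with the JVM enabled; the function name mat_md5 and its nskip argument are made up for this illustration and are not part of gptoolbox.

```matlab
function hash = mat_md5(fname, nskip)
% MAT_MD5 Hypothetical helper: md5 checksum of a file, ignoring its first
% nskip bytes (e.g. nskip = 116 to skip the header text of a .mat file).
%
% hash = mat_md5(fname, nskip)
%
  % read the whole file as raw bytes
  fid = fopen(fname,'r');
  assert(fid ~= -1, 'Could not open %s', fname);
  bytes = fread(fid, Inf, '*uint8');
  fclose(fid);
  % drop the header bytes, same effect as: tail -c +(nskip+1)
  bytes = bytes(nskip+1:end);
  % assumption: the JVM is available (MATLAB not started with -nojvm)
  md = java.security.MessageDigest.getInstance('MD5');
  md.update(bytes);
  % digest() returns signed bytes, reinterpret as unsigned and print as hex
  hash = sprintf('%02x', typecast(md.digest(), 'uint8'));
end
```

With something like this saved as mat_md5.m, the system call could be replaced by cache_name = ['.' mat_md5(tmpf,116) '.cache.mat'];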
Update: Weirdly, this doesn't seem to support logical variables or cells.
Update: I'm finally taking the advice below and saving to binary. For this you need to chop off the first 116 bytes of the file (hence tail -c +117 in the code above), so you need tail installed. This supports all variable types; in particular, it supports cells, including varargin. I've updated the code above. But now I recommend checking out the utility/find_cache.m and utility/create_cache.m functions in my gptoolbox. These allow caching with minimal change to your code. Suppose you have an expensive function saved in expensive_function.m looking like:
function X = expensive_function(varargin)
... % DOING SOMETHING THAT TAKES A LONG TIME AND
... % DOES NOT CHANGE FOR IDENTICAL INPUT
end
Then you can cache-ify this file by adding a little bit of code at the very beginning and one line at the very end:
function X = expensive_function(varargin)
[cache_exists,cache_name] = find_cache();
if cache_exists
  fprintf('Using expensive_function cache...\n');
  load(cache_name);
  return;
end
fprintf('Calling expensive_function...\n');
... % DOING SOMETHING THAT TAKES A LONG TIME AND
... % DOES NOT CHANGE FOR IDENTICAL INPUT
create_cache(cache_name);
end
Warning: this will not work if expensive_function sets varargout rather than using named output variables. Probably you just need some logic to separate loading, computing and setting varargout, and then it would.
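One way to separate loading, computing and setting varargout is to store all outputs in a single cell array and copy it into varargout at the end. The following is only a sketch under that assumption: cached_feval is a hypothetical helper, not part of gptoolbox, and it takes the cache file name as an argument (for example one produced by the md5 step above) rather than calling find_cache itself.

```matlab
function varargout = cached_feval(cache_name, fun, varargin)
% CACHED_FEVAL Hypothetical helper: call fun(varargin{:}) with any number of
% outputs, caching them together in the .mat file cache_name.
%
% [out1,out2,...] = cached_feval(cache_name, fun, in1, in2, ...)
%
  % number of outputs the caller asked for (at least one)
  nout = max(nargout,1);
  % loading: try to read previously cached outputs
  outputs = {};
  if exist(cache_name,'file')
    S = load(cache_name);
    if isfield(S,'outputs')
      outputs = S.outputs;
    end
  end
  % computing: only if there is no cache or it holds too few outputs
  if numel(outputs) < nout
    outputs = cell(1,nout);
    [outputs{:}] = fun(varargin{:});
    save(cache_name,'outputs');
  end
  % setting varargout: hand back exactly the requested outputs
  varargout = outputs(1:nout);
end
```

For example, [s,d] = cached_feval(cache_name, @(a,b) deal(a+b,a-b), 5, 4) computes and saves both outputs the first time and loads them afterwards. It is still up to the caller to make cache_name depend on the inputs.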