|
on caching
|
Tom Ritchford
|
May 17, 2001 14:15 PDT
|
at least we can attack subtle issues like caching early and
get a nice clean solution that we can build on later.
here's the issue, first in terms of the chess problem.
consider a view of a chess position.
the actual result might depend on a lot of things:
- the position itself
- information like chat attached to that position
- the protocol of the request
- user settings (ie, classic (P-K4) or algebraic (e5))
we want to cache these views.
Now, we can't just compute all the possible combinations
of protocol and settings when we first visit the node
because 90% of the nodes will only be visited rarely
(whereas a few nodes will get 90% of the visits and be
requested in all protocols).
Let's write it out as as function of view)
result = view( position, positionState, protocol, user );
now there are two possibilities.
positionState can change -- for example, someone adds chat or
revalues the position.
that will invalidate "all" cached views of this position.
a different request can come in a different protocol, say, or from
a different
user setting. this will result in a new description being added but it won't
result in any old items being invalid. it might result in the cache throwing
out some older items... "cache garbage collection".
so we have "cache invalidation" and "cache gc" as our two ways
that an object can leave the cache.
remember, it's an actual error to deliver an invalid entry
from the cache! it's OK not to delete invalid entries immediately
but they must never be released to an unsuspecting consumer.
it's also the case that the best caching strategy isn't just
to dump the oldest item from the cache. For example,
a difficult-to-compute and small object would be much more
important to cache than a simple-to-compute and large object.
finally, the cache interface needs to be transparent.
this means two things:
- the cached and uncached function must have exactly the same interface.
- the actual implementation of the cache could be in memory or database
or disk but the interface will be the same.
so here's what we need:
- an uncached function -- "the view"
- a validation function -- "would any view of this object be valid?"
- an equality function -- "is this the same view of this object?"
For the equality function, we can use Object.equals
- a garbage collection priority function
that I'm a little hazy on right now...
Here's some Java:
public interface Function { Object eval( Object x ); }
public interface Filter { boolean accept( Object x ); }
public class CachedFunction implements Function {
public final Function cachedFunction;
public final Filter isValid;
public CachedFunction(
Function cachedFunction,
Filter isValid
) {
this.cachedFunction = cachedFunction;
this.isValid = isValid;
}
public Object eval( Object x ) {
// not really a cache at all!
// a real cache would actually do something here.
// a real cache would probably know a lot about Object x, too.
//
return cachedFunction.eval( x );
}
public void invalidate( Object x ) {
// object x has become invalid.
// this might invalidate other objects too.
}
}
There is NO reason why specific caches can't know a very
great deal about the cachedFunction and the Object x!
Or, you could make a generic Hashtable-based cache
that would work for ANY cachedFunction and x, but less
effectively.
public class HashedCachedFunction extends CachedFunction {
protected Hashtable hashtable = new Hashtable();
public Object eval( Object x ) {
Object ret = findInCache( x );
if (ret == null) {
ret = super.eval( x );
addToCache( x, ret );
garbageCollect();
}
return ret;
}
public void invalidate( Object x ) {
// etc
}
}
comments?
...electronic a cappella madness <http://volectrix.com>.........
...extreme internet radio <http://extremeNY.com/radio>...
|
|
 |
|