One of the nice things about an interpreted language like PHP is that you don’t need to worry about memory management. The underlying system is able to figure out what objects are no longer needed and get rid of them on your behalf. You’re free to spend your time writing useful code instead of reinventing the memory management wheel.

In theory.

As I wrote about earlier, my walker processes were gobbling up large (and growing) amounts of memory for no good reason. Well, I finally tracked down and fixed the problem. Gory details to follow, so if you’re not a coder you can stop reading now.

Feedwhip’s Blender framework has a built-in caching system. Whenever an object is loaded from the database, it stores a pointer to that object in a global table. The next time somebody wants to use that same object, we can just pull it out of memory instead of hitting the database. This not only reduces the load on the database, but also cuts down on some heavy computation we do to regenerate full web pages based on just the differences between each snapshot.
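
Here is a minimal sketch of that kind of global cache, just to make the idea concrete. It is not Blender's actual code: the class name CacheSketch, the load() method, the fake load_from_database() helper, and the layout of the cache (an array of objects per class name) are all assumptions made for illustration.

```php
<?php
// Hypothetical sketch of a load-through global cache; not Blender's real code.
class CacheSketch
{
    // Assumed layout: class name => array of (id => object).
    public static $global_cache = array();

    // Counts simulated database round trips so the effect is visible.
    public static $db_hits = 0;

    // Stand-in for a real database query.
    private static function load_from_database($class, $id)
    {
        self::$db_hits++;
        $obj = new stdClass();
        $obj->class = $class;
        $obj->id = $id;
        return $obj;
    }

    // Return the cached object if there is one, otherwise load and remember it.
    public static function load($class, $id)
    {
        if (!isset(self::$global_cache[$class][$id])) {
            self::$global_cache[$class][$id] = self::load_from_database($class, $id);
        }
        return self::$global_cache[$class][$id];
    }
}

$first  = CacheSketch::load('User', 42);  // goes to the (fake) database
$second = CacheSketch::load('User', 42);  // served from the cache
```

Both calls hand back the very same object, and the fake database is only touched once. The flip side is that the cache holds a reference to every object it has ever loaded, so nothing in it can be freed until the cache lets go.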

This works well but there’s an obvious problem — if you let the walker run long enough, eventually you’ll have pulled the entire database into memory. Or maybe PHP will just run out of memory and die. Either way, you’ve got a problem. So, I added a function to clear the global cache. This is what it looks like, more or less:


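function clear_global_cache()
{
    // Without this declaration the assignment would only create a local variable.
    global $the_global_cache;
    $the_global_cache = array();
}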

The theory is that by assigning an empty array to the global cache, the system will notice that all the objects it used to contain are no longer needed, and it’ll clean those up. Again, that’s in theory, and in practice nothing was happening. So, I changed the code to a series of array_pop() calls, and that seemed to improve things, but not fix them entirely.

The problem is that in addition to having a global cache, I’ve got a local cache. The local cache is attached to each object, and it stores pointers to objects that it has directly referenced. For example, every subscription has an associated user. The first time you ask a subscription for its user, it’ll go to the database and grab it. The next time, it’ll just use the local cache. This works well, except that I’ve now got all kinds of objects pointing at each other. This is a classic circular reference problem and I had assumed that PHP was smart enough to detect these, but I guess I was wrong.
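
To see why objects that point at each other never go away under plain reference counting, here is a small sketch. Everything in it is an assumption for illustration: SketchObject is a made-up class (not BlenderObject), and the destructor counter is just a way to observe when PHP really frees an object. PHP 5.3 and later include a cycle collector, so the sketch turns it off with gc_disable() to mimic a PHP that relies on reference counting alone.

```php
<?php
// Hypothetical class for illustration; it is not Blender's BlenderObject.
class SketchObject
{
    public static $destroyed = 0;   // how many objects PHP has actually freed
    public $local_cache = array();  // per-object cache of related objects

    public function clear_local_cache()
    {
        $this->local_cache = array();  // drop all outgoing references
    }

    public function __destruct()
    {
        self::$destroyed++;
    }
}

// Disable the cycle collector so only reference counting is at work.
gc_disable();

// Case 1: a subscription and a user cache each other, then both variables go away.
$subscription = new SketchObject();
$user = new SketchObject();
$subscription->local_cache['user'] = $user;
$user->local_cache['subscription'] = $subscription;
unset($subscription, $user);
// Each object is still referenced from the other's cache, so neither is freed.
$freed_without_clearing = SketchObject::$destroyed;

// Case 2: the same cycle, but the local caches are cleared before letting go.
$subscription = new SketchObject();
$user = new SketchObject();
$subscription->local_cache['user'] = $user;
$user->local_cache['subscription'] = $subscription;
$subscription->clear_local_cache();
$user->clear_local_cache();
unset($subscription, $user);
$freed_after_clearing = SketchObject::$destroyed - $freed_without_clearing;
```

In the first case no destructor runs, because each object keeps the other's reference count above zero. In the second case both objects are freed as soon as the variables are unset, since clearing the caches broke the cycle first.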

To fix this problem I used the classic solution to the classic circular reference problem: I created a shutdown() method. Actually, my method is called clear_local_cache(), and it is called recursively on all the objects in all the caches. So, now the clear_global_cache code looks like this:

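static function clear_global_cache()
{
    // Assumes the cache is a static property of the enclosing class;
    // a bare $global_cache here would just be an empty local variable.
    while( count( self::$global_cache ) )
    {
        $obj = array_pop( self::$global_cache );
        if( is_array( $obj ) )
        {
            while( count( $obj ) )
            {
                $obj2 = array_pop( $obj );
                // instanceof is false for NULL, so no separate NULL check is needed.
                if( $obj2 instanceof BlenderObject )
                {
                    // Break the object's outgoing references so it can be freed.
                    $obj2->clear_local_cache();
                }
                $obj2 = NULL;
            }
        }
        $obj = NULL;
    }
}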

The memory usage (according to the memory_get_usage function) now hovers nicely right around 5MB per walker. Sadly, top is still reporting 20-50MB per PHP process after about 20 minutes, and growing, albeit more slowly. Happily, overall bandwidth consumption is higher — which is probably the best metric I have for how efficiently the system is running.
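
For what it's worth, here is a rough sketch of how the numbers could be compared from inside a walker. The memory_report() helper is a made-up name. memory_get_usage() without arguments reports the memory the script is currently using, while passing true reports what PHP's allocator has actually obtained from the system, which is closer to (but still not the same as) what top shows for the whole process. That difference may account for part of the gap, though I haven't verified it.

```php
<?php
// Hypothetical helper for illustration: collects PHP's own view of memory use.
function memory_report()
{
    return array(
        // Bytes currently in use by the script.
        'in_use'   => memory_get_usage(),
        // Bytes PHP's allocator has obtained from the system, including unused pages.
        'reserved' => memory_get_usage(true),
        // Highest value 'reserved' has reached so far in this process.
        'peak'     => memory_get_peak_usage(true),
    );
}

$report = memory_report();
foreach ($report as $label => $bytes) {
    printf("%-8s %.2f MB\n", $label, $bytes / (1024 * 1024));
}
```

Logging this once per walker iteration would show whether the in-use figure stays flat while the reserved figure keeps creeping up.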

The changes I’ve made are good enough for now, I think. Time to move on to more visible features.
