I’ve been grinding away at Feedwhip’s performance issues for almost a month now. I’ve made some huge improvements: connect times are 25x faster, page generation is 10x faster, and overall throughput is more than 50x better.

Here are a few tips and tricks that I can pass on:

  • First of all, know your code: use a profiler. I used APD and it showed me exactly where CPU cycles were being burned — in some surprising places.
  • Calling functions in PHP is expensive. This is surprising and stupid, and all those nicely abstracted classes you created will end up costing you in the long run. PHP benefits greatly from caching values in local variables. So, for example, don’t write this:

    for( $i=0; $i < count( $my_array ); $i++ ) ...

    Instead, do this:

    $c = count( $my_array );
    for( $i = 0; $i < $c; $i++ ) ...

    You can eliminate function calls in surprising ways. For example, instead of doing this:

    if( strlen( $my_string ) > 0 ) ...

    do this:

    if( isset( $my_string[0] ) ) ...

    isset is a language construct, not a function, so it avoids the function-call overhead and runs much faster.

    Pay special attention to function calls inside of loops and sort comparison functions. Remember that if your object implements __get or __set (so that you can use $object->property arbitrarily), then each read or write of those properties is a function call. Cache those values in a local variable if you're looping or sorting.

    As part of my performance improvements, I got rid of lots of function calls which gave me nicely abstracted object properties, and instead I accessed the properties directly in the array in which they were stored. Sucks for overriding functionality in subclasses, but that's the price you've got to pay.

  • Refer to your database server by IP address instead of by name. This one-line change (you do store this value in just one place, right?) gave me a jaw-dropping 10x performance boost. If your database is on the same machine as the PHP code, you can call it
  • To reduce load times, use an opcode cache. I use APC. It is free and extremely easy to use.
  • Even with an opcode cache, you need to reduce the amount of code that is loaded. Use PHP's __autoload function (spl_autoload_register in current PHP versions, where __autoload has been removed) to dynamically pull in only the files you need to use.
  • Cache pages which don't change very often. It is trivial to create a file caching system on the web server -- there are plenty of sample classes available, or you can roll your own in about an hour like I did. Feedwhip tends to have lots of dynamic content, so this doesn't work for every page, but right now I only need to do real work for about 10% of the RSS requests -- the rest of the time I either serve up a 304 (not modified) or dump a file directly out of the cache.
  • Cache compute-expensive data. A bigger hard disk is cheaper than a second server. Feedwhip is caching RSS requests, simplified versions of HTML pages, generated feed items, and the list of most recent feed items for a subscription. We could go back and recalculate all of those values, as needed, off of the original HTML snaps, but that would take a horrific amount of time.
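
For the page-cache tip above, a roll-your-own file cache can be as small as the sketch below. This is not Feedwhip's actual code: the function names (cache_path, cache_get, cache_put), the md5-based file naming, and the max-age expiry rule are all assumptions made for illustration, and the type hints assume PHP 7.1 or later.

```php
<?php
// Minimal file cache sketch. All names and the directory layout here are
// hypothetical; adapt them to your own application.

// Map a cache key (for example, a request URL) to a file path.
function cache_path(string $dir, string $key): string
{
    // Hash the key so arbitrary URLs become safe file names.
    return rtrim($dir, '/') . '/' . md5($key) . '.cache';
}

// Return the cached data, or null if it is missing or older than $max_age seconds.
function cache_get(string $dir, string $key, int $max_age): ?string
{
    $path = cache_path($dir, $key);
    if (!is_file($path)) {
        return null;
    }
    if (time() - filemtime($path) > $max_age) {
        return null; // stale entry: the caller should regenerate it
    }
    $data = file_get_contents($path);
    return $data === false ? null : $data;
}

// Store data under $key. Returns true on success.
function cache_put(string $dir, string $key, string $data): bool
{
    if (!is_dir($dir) && !mkdir($dir, 0775, true)) {
        return false;
    }
    // Write to a temp file first, then rename, so readers never see a partial file.
    $tmp = tempnam($dir, 'tmp');
    if ($tmp === false || file_put_contents($tmp, $data) === false) {
        return false;
    }
    return rename($tmp, cache_path($dir, $key));
}
```

A page script would call cache_get() first and only do the real work (then cache_put() the result) when it gets null back. Hashing the key keeps odd characters in URLs out of the file system, at the cost of file names that are not human-readable.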

Now that Feedwhip's performance is back in the realm of usefulness, I'm going to step away from the code for a week or two. I need to get some perspective on where Feedwhip is and where it needs to go. As always, I love to hear from Feedwhip's users -- it just takes one suggestion to get the feature you've always wanted!