I dropped in support for automatic tagging yesterday. Or was that the day before? Anyway, the code automatically looks for interesting words and tags feeds based on what is in them. It’s a good first try, not bad for a day of work, but it could use some improvement.

User-specified tags are pretty much done as well, but not launched yet. I need to figure out a way to integrate the two tagging systems.

The goal behind the tagging system is to make it easier for people to find existing feeds related to what they’re looking for. And, honestly, the goal behind THAT is to drive more page views which may one day generate more ad revenue. I ain’t running a charity here. 🙂

I’ll be pounding out a bunch of new changes over the next few weeks, and then things will come to a grinding halt. I’m going to work for Picnik later this month, and that won’t leave much time for this little hobby of mine. I’ll have more to say about Picnik in a later post.


Things have been quiet on this blog lately, but it’s not because I’m ignoring Feedwhip. On the contrary: I’ve been getting up early every morning and putting in some hours before switching over to the job that pays the bills.

Feedwhip’s big server died a few months ago, and I had to push the service onto a much less powerful server (from 2GB ram to 512MB and from two fast CPUs to one slower one). This had the expected effect of slowing down the service overall. Instead of running through all the feeds once per hour, it took upwards of four or five hours — even with user limits scaled way back.

Ever since that downgrade I’ve been working on performance issues. I drastically reduced the memory footprint, and rewrote a ton of the feed processing code to make it much more efficient. After several fits and starts, this weekend I finally started seeing this:

Feedwhip bandwidth graph

That’s a graph of the amount of bandwidth Feedwhip consumes. As you can see, it’s got a nice, steady, hourly rhythm to it. Previous incarnations would get stuck at a steady 100 kbps (far too low to be able to process everything in an hour), or slowly trail off over time (due to memory leaks or fatal errors). Now, though, I’ve got an awesome, steady signal. Seriously, I love that graph.

These performance improvements have taken a long time and been occasionally frustrating, so it’s really gratifying to see them finally pay off. The real beneficiaries, though, will be the end users. First of all, I’m now ready to throw the code onto a hosted server somewhere outside my basement — which means everybody will be able to get decent connection speeds and an improved user experience. And even more important, I can start working on a whole slew of new features that I know everyone is going to love.

Kid666.com talks a bit about Feedwhip in a recent blog post.

Technical limitations aside, what I liked best about Feedwhip was the idea of getting a feed for information from anywhere. I’d like to see something which takes this further and produces a usable interface to page scrape directly into a feed. Something like that which could be thrown at Pipes would be incredible. A user could create a feed of any information they wanted without any coding at all and subscribe to it with any feed reader at all.

This is exactly the kind of thing I had in mind when I first created Feedwhip — the ability to take any information from anywhere on the net and turn it into something more useful to you. Progress towards that goal has been slow lately, but it’s gratifying to know that somebody else can sense where we’re heading way, way before we get there.

Although I haven’t been posting much lately, don’t worry — Feedwhip is still alive and well and I’m still slowly pushing it forward. I’ve got a paying job which is taking priority, though, so I don’t have much extra time for blogging right now.

In the meantime, Feedwhip was mentioned in this blog post as a potential acquisition target for next month.

How much of a chance is there for essentialist alternatives in a post-autonomy world? Today’s small startup (e.g. http://www.feedwhip.com/) will be in the pocket of one media giant or the other by next month. And of course, this is not a new development.

Woo hoo! Seriously, though, I am open to M&A discussions…

Feedwhip’s big server has been stumbling heavily the past few days and as a result, no notifications were going out. The culprit is a bad hard disk. To get things running again I have moved all of Feedwhip’s backend operations onto a different, much less powerful server. And so, I’ve also had to dial down everyone’s notification frequencies and subscription limits.

The timing of this move is actually a little convenient, since I’m in the process of acquiring a full-time job, and my opportunities to work on Feedwhip are going to be severely limited in the future. I need to dial back Feedwhip’s growth and bandwidth consumption to a reasonable level. Remember, I’m not making any money off of Feedwhip — the cash is definitely flowing in the opposite direction.

I know this will come as a disappointment to people who have come to depend on Feedwhip for hourly updates of their hundreds of feeds, but it’s just not possible for me to support that level of service going forward. Feedwhip will continue to operate, and will continue to be free, but I can’t afford to be as generous as before.

If you have any questions, I’d be happy to answer them. If you’re interested in moving your feeds to another notification provider, I can help you migrate. Just drop us a line on our contact page.

The database server is struggling with a hard disk which is on the verge of failing. Rebooting seems to fix things only temporarily. The data is safely backed up, so that’s not a concern, but I don’t have another server able to handle the load of the database server. So, I don’t have a good solution at hand. Anyone want to buy me a new server?

As I’ve been clicking around local companies to see who’s hiring for what, I’ve come up with a short list of criteria for the kind of company that I’d like to work at:

  • Inside the Seattle city limits. Sitting in my car in traffic is a colossal waste of time. I’ll most likely be taking the bus or riding my bike to work, and I’d like to keep my commute short. Working in an interesting neighborhood in Seattle would be nice, too.
  • Small. “Small” could mean anything from two to a hundred people. This rules out big players like Microsoft, Amazon, Adobe… The biggest reason I want a small company is that I want to feel like my contribution is important to the bottom line. I never got that feeling at Microsoft — no matter how hard I worked, Windows would still make the company a gazillion dollars. Furthermore, big companies tend to require extreme specialization, whereas I’ve got a breadth of talent that I’d like to see exercised. Finally, big companies tend towards big, proven ideas. I want to be somewhere that will be willing to give small ideas a shot.
  • Flexibility with my time. I’ve been working from home for 3 years now, and I’ve grown accustomed to setting my own hours. I would love to get back in an office and interact with real humans on a daily basis, but I also like taking my daughter to the grocery store every other day. I’m being realistic with this one, though — face time with my coworkers is important to me, personally, and to the company’s overall productivity.
  • A real salary and benefits. For the past three years I’ve worked for free, I’ve worked for equity, and I’ve worked for half-pay. Sadly, stock options don’t pay the bills.
  • A company I believe in. This is really two things: first, there’s got to be a realistic business model in place. Not just an idea for a popular product, but an idea for a money-making product. Secondly, I want the product to be something I can really get excited about, and get excited about telling my friends and family about. My wife, parents, and in-laws use my current project (Feedwhip) on a regular basis, and that is really gratifying.
  • Good people. Despite being last on the list, this one may actually be the most important. I’ll sacrifice a lot of the other bullet points to be working with smart, creative people in a productive, supportive environment. However, since every company talks about how they only hire smart, creative people to work in their amazing workplace, this is just going to be a gut-feeling call. Either I click with the people, or I don’t.

Although I still have a ton of ideas about how to improve Feedwhip, I’m putting things on hold. Feedwhip has been fun, but I’ll be lucky if it ever pays its own bandwidth costs, let alone other bills like, say, the mortgage. I’ve been thinking about getting a “real” job lately, and I’ve done a bit of looking at the local job market.

What’s been surprising (and a little disappointing) is how little demand there is for PHP engineers. ASP.NET/C# and Java dominate the market. Ruby is starting to make some inroads. PHP is really nowhere to be found — especially if you’re looking for advanced engineering work.

To that end, I’ve decided to pick up (yet) another language. As a seasoned engineer, I’ve gotten to the point where the language itself isn’t so important — they’re all more or less the same, although Ruby is a bit of an outlier — but it’s the surrounding framework and tools that take time to learn.

I’ve done some work in C#/.NET in the past, but I’d like to stay away from MS platforms for now (why pay for software when the equivalent is available for free?). Java would be fine, but it appears that a lot of Java developers are choosing to do their new work in Ruby. So, Ruby it is!

I spent about a week playing with Ruby before starting Feedwhip, but found the amount of magic (things which just happened without you really understanding how or why) a little frustrating to dig through. In the end I decided to borrow a bunch of ideas from Rails and create my own framework in PHP, an endeavor which was fun, educational, and pretty darn successful.

Now, I’m diving into Ruby on Rails waters for the second time. After plumbing the depths of PHP and model-view-controller frameworks over the past 18 months, I think I’ve got a better appreciation for just how handy Ruby’s magic is.

I’ve been grinding away at Feedwhip’s performance issues for almost a month now. I’ve made some huge improvements: connect times are 25x faster, page generation is 10x faster, and overall throughput is more than 50x better.

Here are a few tips and tricks that I can pass on:

  • First of all, know your code: use a profiler. I used APD and it showed me exactly where CPU cycles were being burned — in some surprising places.
  • Calling functions in PHP is expensive. This is surprising and stupid and all those nicely abstracted classes you created will end up costing you in the long run. PHP benefits greatly from caching values locally. So, for example, don’t write this:

    for( $i=0; $i < count( $my_array ); $i++ ) ...

    Instead, do this:

    $c = count( $my_array );
    for( $i = 0; $i < $c; $i++ ) ...

    You can eliminate function calls in surprising ways. For example, instead of doing this:

    if( strlen( $my_string ) > 0 ) ...

    do this:

    if( isset( $my_string[0] ) ) ...

    isset is a language construct, not a function, so it skips the function-call overhead and operates much faster.

    Pay special attention to function calls inside of loops and sort comparison functions. Remember that if your object implements __get or __set (so that you can do $object->property arbitrarily), then each read or write of those properties is a function call. Cache those values in a local variable if you're looping or sorting.

    As part of my performance improvements, I got rid of lots of function calls which gave me nicely abstracted object properties, and instead I accessed the properties directly in the array in which they were stored. Sucks for overriding functionality in subclasses, but that's the price you've got to pay.

  • Refer to your database server by IP address instead of by name. This one-line change (you do store this value in just one place, right?) gave me a jaw-dropping 10x performance boost. If your database is on the same machine as the PHP code, you can call it
  • To reduce load times, use an opcode cache. I use APC. It is free and extremely easy to use.
  • Even with an opcode cache, you need to reduce the amount of code that is loaded. Use PHP's __autoload function to dynamically pull in only the files you need to use.
  • Cache pages which don't change very often. It is trivial to create a file caching system on the web server -- there are plenty of sample classes available, or you can roll your own in about an hour like I did. Feedwhip tends to have lots of dynamic content, so this doesn't work for every page, but right now I only need to do real work for about 10% of the RSS requests -- the rest of the time I either serve up a 304 (not modified) or dump a file directly out of the cache.
  • Cache compute-expensive data. A bigger hard disk is cheaper than a second server. Feedwhip is caching RSS requests, simplified versions of HTML pages, generated feed items, and the list of most recent feed items for a subscription. We could go back and recalculate all of those values, as needed, off of the original HTML snaps, but that would take a horrific amount of time.
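
To make the point about magic properties and sort comparison functions concrete, here is a minimal sketch. The FeedItem class, its call counter, and the timestamp property are all made up for illustration (they are not Feedwhip's real classes), and the code assumes PHP 7 or later for the ?? and <=> operators. It counts how many times __get runs when sorting the objects directly versus reading each value once and sorting the plain values.

```php
<?php
// Hypothetical class: properties live in an array and are exposed through __get.
class FeedItem
{
    public static $magicCalls = 0;   // counts how often __get runs
    private $data;

    public function __construct(array $data)
    {
        $this->data = $data;
    }

    public function __get($name)
    {
        self::$magicCalls++;
        return $this->data[$name] ?? null;
    }
}

$items = [];
foreach ([30, 10, 20, 50, 40] as $ts) {
    $items[] = new FeedItem(['timestamp' => $ts]);
}

// Slow path: every comparison triggers two __get calls.
$slow = $items;
usort($slow, function ($a, $b) {
    return $a->timestamp <=> $b->timestamp;
});
$callsWhenSortingObjects = FeedItem::$magicCalls;

// Faster path: read each property once, then sort the plain values.
FeedItem::$magicCalls = 0;
$keys = [];
foreach ($items as $i => $item) {
    $keys[$i] = $item->timestamp;    // one __get per item
}
asort($keys);                        // sorts the integers and keeps the index association
$fast = [];
foreach ($keys as $i => $unused) {
    $fast[] = $items[$i];
}
$callsWhenSortingCachedKeys = FeedItem::$magicCalls;
```

With the cached keys, __get runs exactly once per item no matter how many comparisons the sort needs. The gap grows with the size of the array, which is why it matters most for large lists.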
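
For the page-caching tip, a roll-your-own file cache can be very small. The sketch below is one possible shape, not Feedwhip's actual code: the function names cached_page and is_not_modified, the cache directory, and the time-to-live parameter are all assumptions made for illustration, and the type hints assume PHP 7.1 or later.

```php
<?php
// Hypothetical helper: returns the cached copy if it is younger than $ttl seconds,
// otherwise calls $generate, stores the result on disk, and returns it.
function cached_page(string $dir, string $key, int $ttl, callable $generate): string
{
    $file = $dir . '/' . md5($key) . '.cache';
    if (is_file($file) && (time() - filemtime($file)) < $ttl) {
        return file_get_contents($file);        // cache hit: no real work
    }
    $body = $generate();                        // cache miss: do the expensive work once
    file_put_contents($file, $body, LOCK_EX);   // LOCK_EX guards against interleaved writes
    return $body;
}

// Hypothetical helper: true when the client's copy is still current,
// meaning the response can be a bare "304 Not Modified".
function is_not_modified(?string $ifModifiedSince, int $lastModified): bool
{
    if ($ifModifiedSince === null) {
        return false;                           // client sent no If-Modified-Since header
    }
    $since = strtotime($ifModifiedSince);
    return $since !== false && $since >= $lastModified;
}
```

In a request handler, the second helper would typically be fed $_SERVER['HTTP_IF_MODIFIED_SINCE'] (when present) and the feed's last update time. If it returns true, send the 304 status and exit before touching the cache or the database.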

Now that Feedwhip's performance is back in the realm of usefulness, I'm going to step away from the code for a week or two. I need to get some perspective on where Feedwhip is and where it needs to go. As always, I love to hear from Feedwhip's users -- it just takes one suggestion to get the feature you've always wanted!

When I got back to my computer this afternoon I took a look at the performance of my RSS feeds over the past few hours:

3498.8924 | 1808.4851 |
3522.5980 | 1610.3802 |
2838.9859 | 1510.7083 |
428.4015 | 191.3470 |
588.4480 | 238.6685 |
415.9069 | 123.9909 |
525.5345 | 215.9800 |

Each row is the average time (in milliseconds) it took to generate a page for a given hour. The left column is the average time for each page, and the right column is the amount of time just spent querying the database.
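
To put a number on the difference between the first three rows and the last four, the figures can be averaged in a few lines. This is only a sketch: it assumes the rows run from oldest to newest, and the variable and function names are made up for the example.

```php
<?php
// Hourly averages in milliseconds, copied from the table above: [page, database].
$rows = [
    [3498.8924, 1808.4851],
    [3522.5980, 1610.3802],
    [2838.9859, 1510.7083],
    [428.4015, 191.3470],
    [588.4480, 238.6685],
    [415.9069, 123.9909],
    [525.5345, 215.9800],
];

// Assumption: the first three rows are the slow hours, the last four the fast ones.
$before = array_slice($rows, 0, 3);
$after  = array_slice($rows, 3);

// Mean of one column across a set of rows.
function column_mean(array $rows, int $col): float
{
    return array_sum(array_column($rows, $col)) / count($rows);
}

$pageBefore = column_mean($before, 0);
$pageAfter  = column_mean($after, 0);
$dbBefore   = column_mean($before, 1);
$dbAfter    = column_mean($after, 1);

// Fraction of time saved, e.g. 0.5 means the time was cut in half.
$pageReduction = 1 - $pageAfter / $pageBefore;
$dbReduction   = 1 - $dbAfter / $dbBefore;

printf("Page time:     %.1f ms -> %.1f ms (%.1f%% lower)\n", $pageBefore, $pageAfter, 100 * $pageReduction);
printf("Database time: %.1f ms -> %.1f ms (%.1f%% lower)\n", $dbBefore, $dbAfter, 100 * $dbReduction);
```

On these numbers the database column drops by roughly 88% and the overall page time by roughly 85%.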

Four hours ago, something big changed. At first, I thought that four hours ago something big broke, like maybe all of the crawlers had suddenly stopped working and taken their load off the database, but no, everything is working fine. Almost too fine.

Then I remembered the last thing I’d done before I took Ruby out this morning…

Recalling something that was mentioned at last month’s PHP conference, I changed the web server’s configuration to refer to the database server by IP address instead of by name. So, instead of querying db.feedwhip.com, it was going to 216.172.217.XXX. That one change knocked almost 90% off the average amount of time I spent waiting on the database.
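
A minimal sketch of what that kind of change can look like, assuming the connection string is built in one place and the connection goes through PDO (the post does not say which database API Feedwhip uses). The settings array, the helper names, the database name, and the 192.0.2.10 address are placeholders, not Feedwhip's real configuration.

```php
<?php
// Hypothetical settings, kept in exactly one place.
$config = [
    'db_host' => '192.0.2.10',   // placeholder IP literal, so no DNS lookup is needed to connect
    'db_name' => 'feedwhip',     // placeholder database name
];

// True when the configured host is an IP address rather than a hostname.
function uses_ip_literal(string $host): bool
{
    return filter_var($host, FILTER_VALIDATE_IP) !== false;
}

// Builds a PDO-style MySQL DSN from the settings above.
function build_dsn(array $config): string
{
    return 'mysql:host=' . $config['db_host'] . ';dbname=' . $config['db_name'];
}

$dsn = build_dsn($config);
```

A check like uses_ip_literal could run at startup to warn when someone puts a hostname back into the settings. The trade-off is that the address has to be updated by hand if the database server ever moves.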