
Incremental RSS Feeds

One of the things that never really felt intuitive when I was plunking around with my own aggregator in the hazy days before NewsGator was the notion of RSS as a static feed of post items. While I understood how it worked and why this historically probably came to pass, one of the more tedious parts of pulling in new or modified items from a feed was reconciling it with existing items locally and then indicating what was actually a new item to be read. Not rocket science, sure, but it struck me as inefficient nonetheless.
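For a sense of the chore, here is a minimal Python sketch of the reconciliation a client repeats on every poll of a static feed. It assumes each item carries a stable identifier (such as the RSS `<guid>` or `<link>`) and that the client keeps a local map from identifier to content; `FeedItem` and `reconcile` are hypothetical names used only for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class FeedItem:
    guid: str      # hypothetical stable identifier (e.g. RSS <guid> or <link>)
    content: str   # title + description, used to detect edits


def reconcile(local: dict[str, str], fetched: list[FeedItem]) -> tuple[list[str], list[str]]:
    """Compare a freshly fetched static feed with locally stored items.

    Returns (new_guids, modified_guids) and updates `local` in place.
    Every item in the feed has to be examined, even though most of them
    are usually unchanged since the last poll.
    """
    new: list[str] = []
    modified: list[str] = []
    for item in fetched:
        previous = local.get(item.guid)
        if previous is None:
            new.append(item.guid)          # never seen before: mark unread
        elif previous != item.content:
            modified.append(item.guid)     # seen, but edited since last poll
        local[item.guid] = item.content    # remember the latest version
    return new, modified
```

With a static feed of N items, this loop runs over all N on every poll even when nothing changed, which is the inefficiency described above.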

Brad posted about this earlier, and then Simon followed on and made the comparison to NNTP. Serving RSS incrementally seemed like exactly how it should be done. It wouldn't be that much more work for the folks writing aggregators or blog servers, or so it seemed. The most likely problem is that something like this would never really get adopted; there's probably too much server-side software out there generating static files to serve up.

Like I said, I can't ever see this happening in practical terms, but if I ruled the world for a day I figure it might work like the following:

Incremental Feeds

To support incremental RSS feeds, an aggregator would include a valid If-Modified-Since header when it issues its GET request for the channel's feed. Many, if not most, polite aggregators already do so.

GET /rss/rss2.xml HTTP/1.1
If-Modified-Since: Wed, 14 May 2003 05:00:00 GMT
Connection: Keep-Alive
Host: www.xl8.net

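On the aggregator side, building the conditional request is straightforward. The following Python sketch uses only the standard library (`urllib.request` and `email.utils.format_datetime`); the function names `conditional_headers` and `fetch_feed` are hypothetical, and it assumes the client remembers when it last polled the feed.

```python
from __future__ import annotations

import urllib.error
import urllib.request
from datetime import datetime, timezone
from email.utils import format_datetime


def conditional_headers(last_polled: datetime | None) -> dict[str, str]:
    """Headers for a polite feed GET; omit If-Modified-Since on the first poll."""
    headers: dict[str, str] = {}
    if last_polled is not None:
        # HTTP-dates are always expressed in GMT (RFC 2616, section 3.3.1).
        headers["If-Modified-Since"] = format_datetime(
            last_polled.astimezone(timezone.utc), usegmt=True
        )
    return headers


def fetch_feed(url: str, last_polled: datetime | None) -> tuple[int, bytes, str | None]:
    """Return (status, body, Last-Modified); a 304 yields an empty body."""
    request = urllib.request.Request(url, headers=conditional_headers(last_polled))
    try:
        with urllib.request.urlopen(request) as response:
            return response.status, response.read(), response.headers.get("Last-Modified")
    except urllib.error.HTTPError as err:
        if err.code == 304:
            # Nothing new; the server still reports when a post last changed.
            return 304, b"", err.headers.get("Last-Modified")
        raise
```

Because `urlopen` surfaces a 304 as an `HTTPError`, the sketch catches that case explicitly and hands back the server's Last-Modified value.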
On the server side, weblog applications that generate RSS feeds dynamically could support incremental RSS by checking for an If-Modified-Since header and doing the following. This would generally apply only to weblog servers that return dynamic feeds; sites that use static files as their feed source would use normal 200/304 handling (but would not serve RSS incrementally).
  • If the header is not present, the default RSS feed containing the usual N items is returned to the client.
  • If the header is present, retrieve and parse its value, adjusting for local time as needed. The server's RSS response should include only those post items that were created or modified since the datetime indicated by If-Modified-Since.
  • If no items meet the created-or-modified-since criterion, a standard 304 response is returned to the client along with a valid Last-Modified header indicating when a post was last created or modified. The client application could optionally update its internal If-Modified-Since value to reflect the Last-Modified header returned with the 304 response, but this would have no effect on future transactions (though it could potentially be used for feed meta-information on the client side).
  • If the number of items meeting the created-or-modified-since criterion exceeds the default number of items in the site's feed, the default RSS feed containing N items is returned to the client. This default feed should be the response in all exception cases.
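The four rules above can be sketched as a single decision function. This is a minimal Python illustration, not code from any weblog server: `Post`, `FeedResponse`, and `incremental_feed` are hypothetical names, post timestamps are assumed to be timezone-aware, and RSS serialization is left out. An unparseable header is treated as one of the exception cases and gets the default feed.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime


@dataclass(frozen=True)
class Post:
    title: str
    modified: datetime  # creation or last-edit time; assumed timezone-aware


@dataclass
class FeedResponse:
    status: int                 # 200 or 304
    items: list[Post]           # posts to serialize into the RSS body
    last_modified: str | None   # value for the Last-Modified header


def incremental_feed(posts: list[Post], if_modified_since: str | None,
                     default_n: int) -> FeedResponse:
    newest_first = sorted(posts, key=lambda p: p.modified, reverse=True)
    last_modified = (
        format_datetime(newest_first[0].modified.astimezone(timezone.utc), usegmt=True)
        if newest_first else None
    )
    default = FeedResponse(200, newest_first[:default_n], last_modified)

    # Rule 1: no header (or nothing to compare against) -> the normal N-item feed.
    if if_modified_since is None or not newest_first:
        return default

    # Rule 2: parse the header; HTTP-dates are GMT, so assume UTC if no zone survives.
    try:
        since = parsedate_to_datetime(if_modified_since)
    except (TypeError, ValueError):
        return default  # unparseable header: an exception case
    if since.tzinfo is None:
        since = since.replace(tzinfo=timezone.utc)

    changed = [p for p in newest_first if p.modified > since]

    # Rule 3: nothing new -> 304 plus the time a post was last created or modified.
    if not changed:
        return FeedResponse(304, [], last_modified)

    # Rule 4: too many changes -> fall back to the default feed.
    if len(changed) > default_n:
        return default
    return FeedResponse(200, changed, last_modified)
```

For example, a header of `Wed, 14 May 2003 05:00:00 GMT` against posts stamped 04:00 and 06:00 GMT that day would return only the 06:00 post, while a header later than every post yields the 304.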
Issues
  • Most servers serve feeds statically, in which case there is no real downside, but the gains would be limited without server-side adoption.
  • Overall, this type of solution would be more efficient in terms of bandwidth but would require more server-side processing. In real-world terms, how does this balance out? Do most people have more horsepower than bandwidth/transfer?
  • What is the impact on server caching, if any?
  • I'm not sure whether this would be more RESTful as a POST, since the response will change based on the header value passed. It is straight retrieval, and RFC 2616 excludes "error or expiration issues" from its criteria for idempotence. This smells like an expiration issue, but maybe in HTTP terms that really doesn't hold up.
  • A similar cycle could probably happen with etags, but a pretty good standard header is already available to work within. I'm not enough of an HTTP wonk to see what else toying with If-Modified-Since via the aggregator might blow up.
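For the etag idea, one possible reading (an assumption, not something spelled out above) is for the server to encode the newest post's timestamp in the entity tag it sends, so that the value the client echoes back in If-None-Match can be decoded into the same "since" cutoff the If-Modified-Since rules use. A rough Python sketch with hypothetical helper names `make_etag` and `since_from_if_none_match`:

```python
from __future__ import annotations

from datetime import datetime, timezone

# Hypothetical tag format: the newest post time in compact UTC form.
TAG_FORMAT = "%Y%m%dT%H%M%SZ"


def make_etag(newest: datetime) -> str:
    """Build a quoted entity tag from the newest post's (aware) timestamp."""
    return '"' + newest.astimezone(timezone.utc).strftime(TAG_FORMAT) + '"'


def since_from_if_none_match(header: str | None) -> datetime | None:
    """Recover the cutoff from the first entity tag the client echoed back.

    Returns None when there is no header or the tag is not one of ours,
    in which case the server would fall back to the default feed.
    """
    if not header:
        return None
    tag = header.split(",")[0].strip()   # only the first tag is considered
    if tag.startswith("W/"):
        tag = tag[2:]                    # ignore the weak-validator prefix
    try:
        parsed = datetime.strptime(tag.strip('"'), TAG_FORMAT)
    except ValueError:
        return None
    return parsed.replace(tzinfo=timezone.utc)
```

The recovered datetime would then drive the same filtering as the If-Modified-Since version; one attraction is that the cutoff always comes from the server's own clock rather than whatever date the client chose to send.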
I suppose the real answer is that aggregator-server interaction is probably too big a vessel to turn in a different direction, but it's nice to think about.

Update: Additional discussion about etags and proxy caches in Simon's comments.

Justin's post reminded me that Brad originally mentioned push. Maybe it's just bad memories of PointCast or Marimba, but push still seems flawed to me for distributed publishing. What's the upside of having servers push feeds out automatically vs. clients polling for new feeds? Maybe approximating federation like NNTP or listservs would be a middle ground, but for whatever reason a pull paradigm just seems to make good sense to me given the current state of affairs. The underlying economic question is whether bandwidth costs will be inversely related to the growth of distributed publishing. My guess is no, and if that's the case, I expect the cost of serving a feed will become more relevant with the passage of time.

Published Wednesday, May 14, 2003 2:51 AM by grant