Tuesday, May 15, 2007

Notes: Beyond Blogging: Feeds In Action Technical Session (TS-6029) at JavaOne 2007

Note: These are the notes which I took at the "Beyond Blogging - Feeds in Action" Technical Session at JavaOne 2007 (TS-6029) hosted by Dave Johnson. I will likely expand it with links and more information as time passes but for now I publish it in it's basic form. Please comment in the usual way.

Introduction
Blogging has taken off over recent years. Thesedays everyone has a blog and more and more applications have a feed exposed. Everything with a time stamped, uniquely identified chunk of data with meta data anyway.
  • News stories
  • Search results
  • Uploaded photos
  • Events and meetings
  • Podcasts and vodcasts
  • Bug reports
  • Wiki changes
  • Source code changes
  • Log files
Meanwhile Web Services (WS) standards have "got uppity" and been evolving towards what some have termed WS-(Death Star)* aka WSDl, UDDI, schema etc al. However most developers didn't follow - developers prefered REST. This is because it is often best just to use XML and HTTP for Web Services. Now ATOM and RSS are merging as a foundation for simple Web Services. e.g. Yahoo Pipes, Lucene publishing using APP and the Eclipse Europe Distributed Build System.

RSS X.X
So what is an RSS feed? It is a XML representation of discreet timestamped chunks of data with metadata available at a fixed URL which can be polled for updates.

Feed Attributes: id, title, link, date, author(s)
Item Attribites: id, title, date, author(s), category, summary, content

History
  • RSS (original) came out of Netscape in 1999. This was v.0.90 and was created by Dan Libby. It supported RDF and was known as "RDF Site Summary".
  • v.0.91. Dave Winer simplified things by removing RDF support (so "RSS" become "Really Simple Syndication"). (Later 0.9x formats are obsolete but still used today) A public spat between Libby and Winer occurs.
  • The RDF fork (RSS 1.0). One camp (Libby et al) wanted to put RDF back in. Winer et al wanted to keep it simple. RSS 1.0 put the RDF back in. This completely different standard provided a small set of elements augmented by RDF, and allowed for extension modules (e.g. Apple's iTunes module which provides metadata for song information.) This caught on as it was adopted by MoveableType, the first big blogging engine. RSS v.1.0 is still widely used today.
[Diagram: RSS 1.0 elements] - RDF, Channel, Items (NOTE: items are not in the channel element, unlike in RSS 0.9.x)
  • The Simple Fork - RSS 0.9.x continues. However Winer kept going with v.0.92, 0.93 and finally 2.0. These later releases added metadata and an element (allowing podcasting!). This was still completely incomaptible with RSS 1.x. However, To carry other kinds of data you could extend the format. This led to "Funky RSS" using extension elements instead of the core RSS. e.g Dublin core (dates with ISO 8104(?) format)
[Diagram: RS 0.9.x elements] - RDF, channel (contains the rest of the elements)]

RSS Limitations:
  • The spec is too loose and unclear - which fields can be escaped HTML?, how many enclosures are allowed per item?
  • The Content model is weak - There is no support fior summary and content and content-type and escaping is not specified
  • The specification is final and cannot be clarified
ATOM
So what is ATOM? It is both a feed format and protocol to edit resources on the WWW. It is developed by IETF as RFC-2487. The protocol will be finalized in 2007.

The Feed format
In ATOM a feed contains entries which are time stamped, unique ID'd chuunks of data (a URL is a valid identifier for an item - think REST). These entry types can be any content type (binary data x/Base64 encoding) so as a result it's generic; its not just for blogging and podcasting (e.g. there I heard elsewhere at JavaOne that there are plans to use it to make server logs files available as feeds). ATOM feeds are longer than their RSS precedents as there are more required elements.

[Diagram: RSS and ATOM family tree]

Parsing and Fetching Feeds
RSS is just XMl so you can use your favourite parsing technique. There are many such as Abdera Apache, Python (universal feed parser), Windows RSS platform (Windows only)).

ROME is the most capable and widely used client and parser avaiable for Java. It is extensible and pluggable and is based on the JDOM parser (which means that if it is a large feed then there is lots of memory usage. This needs to be fixed). ROME is free and OSS under the apache licence. ROME supports 3 object models: RSS, ATOM and a bridging SyndFeed model (which can convert to and from the other two)

Usage
Be nice and conserve bandwith; use HTTP conditional get (or Etags) and keep a local copy. Don't poll too much dependent on the update speed of the feed you are polling. ROME does this for you with it's fetcher.

Producing Feeds with ROME - Tips
  • You can either use your favoutire XML tools
  • or use template languages (JSP)
  • or use ROME direct
Serving Feeds
  • Serve your feed up once its created. For example within a servlet. Make sure you use the correct content type (either application/rss+xml or application/atom+xml as applicable)
  • Cache cache cache! On client side via HTTP conditional GET, on proxy servers using the correct HTTP headers and on server side via your ROME application. E.g. in a servlet use LRUCache cache = new LRUCache(5, 5400); This is an in memory cache for feed items
  • If a feed or item has not been modified return 300 ("not modifed") response to HTTP GETs
  • Set the cache control header to tell proxies to cache for how long
Feed Auto discovery
Make it easy for others to find your feeds. Web browsers and others can simply find your feeds by hittin your page if you add a couple of links to the HTML header; put the content types supported (as above) with the feed title and URL.

Serving Valid Feeds
  • Ensure the HTML content within your feed XML is properly escaped
  • Ensure your XML is well formed
  • Use feedvalidator.org (It's OSS, written in Python and can be run on your desktop/server)

Publishing with ROME PROPONO
Propono is an API which supports publishing by feed publishing protocols.

Some history. first there was the Blogger Feed Publishing Protocol API - It was simple and didn't support all the required metadata. Then came the Metaweblog API which extended the Blogger API by adding RSS metadata. Now we have the Atom Publishing Protocol which again updated and replaced these. This is a REST based protocol.

The Metaweblog API
This allows you to connect to a server and have an API to do some operations to publish your post. e.g. getUserBlogs(), newPost(), etc.

ATOM Publishing protocol (APP)
Based on the ATOM feed format APP can do all Metaweblog API can but in a much more generic way. It allows you to manage collections of entries (which can be any form of data) on the web which can contain any kind of data e.g. CRUD entries. It is based on REST so HTTP verbs can be used for all operations.

APP Introspection
APP allows you to do an authenticating GET against the Endpoint URL which tells you what services are availble on the URI.

An ATOM collection feed
The collection of items returned may not be complete. In this case there will be a next and previous links so you can page through resutls effieicntly The server governs how big the chunks are returned.

Creating an Entry
If you create an ATOM entry and post to the collection URL the server will post back the result to you with the blanks filled in (it controls the namespace so generates URI's etc)

Propono
The Propono APP client library makes it easy to build an APP client. Likewise the APP server library makes it easy to add an APP server. The Blog Client library supports both the MetaWeblog and and APPAPIs.

The Future
  • There is now better RSS/ROME support in Java. Is it time for a JSR?
  • We will see more REST based WS in general. This is made easy by the REST API, Restlets, etc
  • We will see more WS based on ATOM using APP as a canonical REST protocol

More info
Summary
  • RSS and ATOM are not just for blogs
  • ROME has the tools you need to produce and consume RSS feeds

No comments: