Four Models for Aggregating and Publishing RSS Headlines

by Ray_Matthews on March 04, 2003 at 11:26 PM

The State of Utah is reviewing options for creating, aggregating, and publishing news from state agencies. The choice of technology for creating RSS feeds can be made independently of the choice of technology for aggregating and publishing (parsing) them. I'll address the latter first and write about the creation/CMS end tomorrow. There seem to me to be at least four models for aggregating and publishing RSS headlines. This lengthy article describes these four models with examples of each.

Publishing Aggregated News on a Single Topic from Multiple Sources

The American Homeowners Resource Center is a great example of a content-managed site that provides resources and collaborative tools for getting things done. It has a News page that offers a newspaper-like presentation of headlines linked to full stories. It also streams a channel of "Breaking News" in a side column on every page. Its news is limited to the topic of the site and appears to be selected and submitted manually and then reviewed for inclusion by the site's editors. The success of this model depends to a great degree on the care given to selecting or filtering headlines for inclusion. Moreover is one such successful aggregator that has for several years created hundreds of topical feeds aggregated from numerous business and news sources.

Aggregating and Presenting News in Multiple Channels

The next step beyond presenting a single feed is presenting multiple channels, usually with a predefined number of headlines per channel. The Novell Portal Server provides this kind of RSS support, allowing the user to select a number of channels to display. This approach is generally effective only for monitoring a small set of channels that are related in some way.
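To make the "predefined number of headlines per channel" idea concrete, here is a minimal Python sketch. It is not how the Novell Portal Server does it; the sample feeds, the example.invalid links, and the function names are all made up for illustration, and a real page would fetch the feeds over HTTP rather than read them from strings.

```python
import xml.etree.ElementTree as ET

# Two tiny hand-written RSS documents standing in for real agency feeds
# (hypothetical content; the example.invalid links do not resolve).
SAMPLE_FEEDS = [
    """<rss version="2.0"><channel><title>Agency A</title>
    <item><title>A1</title><link>http://example.invalid/a1</link></item>
    <item><title>A2</title><link>http://example.invalid/a2</link></item>
    <item><title>A3</title><link>http://example.invalid/a3</link></item>
    </channel></rss>""",
    """<rss version="2.0"><channel><title>Agency B</title>
    <item><title>B1</title><link>http://example.invalid/b1</link></item>
    </channel></rss>""",
]

def top_headlines(rss_text, limit):
    """Return the channel title and its first `limit` (headline, link) pairs."""
    channel = ET.fromstring(rss_text).find("channel")
    items = channel.findall("item")[:limit]
    headlines = [(i.findtext("title"), i.findtext("link")) for i in items]
    return channel.findtext("title"), headlines

def render_channels(feeds, limit=2):
    """Build a plain-text listing with a fixed number of headlines per channel."""
    lines = []
    for rss_text in feeds:
        title, headlines = top_headlines(rss_text, limit)
        lines.append(title)
        lines.extend(f"  {headline} <{link}>" for headline, link in headlines)
    return "\n".join(lines)

print(render_channels(SAMPLE_FEEDS, limit=2))
```

With only a handful of channels the listing stays readable; the per-channel limit is what keeps any one busy feed from crowding out the rest.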

Dave Winer's Weblogs at Harvard is an example of this kind of aggregation using the Radio Userland News Aggregator. Conceivably, a government agency could present a few feeds of particular interest to its visitors. Add a lot of feeds, though, and you can see from this example how this kind of aggregation can easily get out of control. I don't think this approach will go anywhere considering the sophistication now built into news readers such as NetNewsWire, NewsMonster, and NewZCrawler.

Aggregating and Filtering News in Pre-selected Categories and Channels

I have two examples to illustrate this. The first is Network World Fusion.

This site, created by Adam Gaffin with Percussion Rhythmyx content management software, pulls XML content stored in a database and presents it using XSL style sheets. Feeds are created with Movable Type, and the data, as we speak, is being moved from two Oracle databases into MySQL. News is aggregated and filtered in categories related to networking from three publishers (Network World, Network World Fusion, and the IDG News Service), and results are presented in four different views.

The Daily News view presents all of the day's "top" Enterprise Network News aggregated from their dozen or so feeds. The front page has headlines, links, and descriptions for the first five headlines, each with a link to the full article, where, following the article, they stream the "breaking news" for the category. The top page then flows just the headlines for 20 or so more articles of lesser note. In a similar fashion, they provide aggregations of Remote Networking News and Service Provider Network News. They call each of these three displays "channels", and articles are associated with one or more channels through the use of a channel metatag.
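As a rough illustration of how a channel metatag could drive this kind of display, the Python sketch below reads a metatag from each article page and groups article titles by channel. I don't know the exact tag Network World Fusion uses, so the tag name ("channel"), the comma-separated value format, and the sample articles are all assumptions.

```python
from collections import defaultdict
from html.parser import HTMLParser

class ChannelMetaParser(HTMLParser):
    """Collect values of <meta name="channel" content="..."> (assumed format)."""

    def __init__(self):
        super().__init__()
        self.channels = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta" and (attrs.get("name") or "").lower() == "channel":
            content = attrs.get("content") or ""
            # Assumption: several channels are listed comma-separated in one tag.
            self.channels.extend(c.strip() for c in content.split(",") if c.strip())

def channels_for(html):
    """Return the channel names declared in one article page."""
    parser = ChannelMetaParser()
    parser.feed(html)
    return parser.channels

def group_by_channel(articles):
    """Map channel name -> article titles; one article may land in several channels."""
    grouped = defaultdict(list)
    for title, html in articles:
        for channel in channels_for(html):
            grouped[channel].append(title)
    return dict(grouped)

# Hypothetical articles, each a (title, page HTML) pair.
ARTICLES = [
    ("Router roundup", '<head><meta name="channel" content="Enterprise Network, Remote Networking"></head>'),
    ("Carrier pricing", '<head><meta name="channel" content="Service Provider Network"></head>'),
    ("VPN update", '<head><meta name="channel" content="Remote Networking"></head>'),
]
print(group_by_channel(ARTICLES))
```

The point is that the editor tags the article once and every channel view is derived from that tag.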

News by Vendor presents hundreds of articles from the category of vendor news, sortable by either relevance or date. They make a practice of naming companies in headlines to allow you to find news about specific companies. The list provides the headline, a link to the full article, the date, the URL, and the first 100 characters of the article.

Newsletter Archives is an index page that provides access to lists or newsletters similar to the News by Vendor for about 25 other categories.

This Week in Print is an online version of a sectional print publication, complete with front-page news, reviews, columns, news of lesser note, and links to past issues. It displays the headline, link, and description for each article in the issue.

My second interesting example of aggregation and filtering is Julian Bond's and Glenn Watkins' Ecademy: The E-Business Network. It is completely Open Source, using PHP, the Drupal content management system, and MySQL.

Latest News are headlines that link to articles arranged chronologically, newest to oldest. News by Source are parsed headlines for each of more than 100 feeds, not too unlike Dave Winer's Harvard Aggregator. This view shows channel title, channel description, a link to a page that parses each channel, and a link to the RSS feed for each channel.
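A "Latest News" view of this kind amounts to merging the items from every feed and sorting them by date. Here is a minimal Python sketch of that idea, not Drupal's actual code. It assumes each item carries an RSS 2.0 pubDate in the usual RFC 822 form, which not every feed does, and the sample feeds are invented.

```python
import xml.etree.ElementTree as ET
from email.utils import parsedate_to_datetime

# Hypothetical feeds; every item is assumed to have a pubDate.
FEEDS = [
    """<rss version="2.0"><channel><title>Agency A</title>
    <item><title>Budget passes</title><pubDate>Tue, 04 Mar 2003 09:00:00 GMT</pubDate></item>
    <item><title>Road closure</title><pubDate>Mon, 03 Mar 2003 15:30:00 GMT</pubDate></item>
    </channel></rss>""",
    """<rss version="2.0"><channel><title>Agency B</title>
    <item><title>Park opens</title><pubDate>Tue, 04 Mar 2003 12:00:00 GMT</pubDate></item>
    </channel></rss>""",
]

def latest_news(feeds):
    """Merge items from all feeds and return (headline, source) pairs, newest first."""
    entries = []
    for rss_text in feeds:
        channel = ET.fromstring(rss_text).find("channel")
        source = channel.findtext("title")
        for item in channel.findall("item"):
            published = parsedate_to_datetime(item.findtext("pubDate"))
            entries.append((published, item.findtext("title"), source))
    entries.sort(key=lambda entry: entry[0], reverse=True)
    return [(title, source) for _, title, source in entries]

for title, source in latest_news(FEEDS):
    print(f"{title} ({source})")
```

The News by Source view is the same data grouped by channel instead of sorted across channels.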

News by Topic are aggregated "bundles" for eight topic categories. In Drupal you have feeds and bundles. Feeds define news sources, and bundles categorize syndicated content by source, topic, or any other heuristic. Bundles provide a generalized way of creating composite feeds. They allow you, for example, to combine various business-related feeds into one bundle called "Business". You can have any number of government agencies providing news feeds.

You can add a feed by clicking the "add feed" link on the import administration pages. Give the feed a name, then supply the URI and a comma-separated list of attributes that you want to associate the feed with. The update interval defines how often Drupal should go out and try to grab fresh content. The expiration time defines how long syndicated content is kept in the database. So set the update and expiration times and save your settings. You have just defined your first feed. If you have more feeds, repeat as necessary. To verify that your feed works, press "update items" on the overview page. The number of news items that have been successfully fetched should then become visible in the third column of the feed overview.
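The two timing settings are easy to confuse, so here is a small Python sketch of how I understand them to interact. This is not Drupal's code (Drupal is PHP); the Feed class, its field names, and the fetch callable are hypothetical, and times are plain seconds.

```python
from dataclasses import dataclass, field

@dataclass
class Feed:
    name: str
    uri: str
    attributes: list          # terms later matched against bundle keywords
    update_interval: int      # seconds to wait between fetch attempts
    expiration: int           # seconds a fetched item is kept
    last_update: float = 0.0
    items: list = field(default_factory=list)  # (fetched_at, title) pairs

def is_due(feed, now):
    """A feed is fetched again once its update interval has elapsed."""
    return now - feed.last_update >= feed.update_interval

def expire_items(feed, now):
    """Drop items that have been stored for the expiration time or longer."""
    feed.items = [(t, title) for t, title in feed.items if now - t < feed.expiration]

def refresh(feed, now, fetch):
    """Fetch if due, expire old items, and return the number of items kept."""
    if is_due(feed, now):
        feed.items.extend((now, title) for title in fetch(feed.uri))
        feed.last_update = now
    expire_items(feed, now)
    return len(feed.items)

# Hypothetical feed: refreshed hourly, items kept for two hours.
feed = Feed("Utah news", "http://example.invalid/rss", ["government"],
            update_interval=3600, expiration=7200)
print(refresh(feed, now=10000, fetch=lambda uri: ["Sample story"]))
```

A short update interval with a long expiration gives you a deep archive; the reverse gives you a thin, fresh list.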

Now you have to define some bundles. Bundles look for feeds that contain one of the keywords associated with the bundle and display those feeds together. To define a bundle you have to give it a name and a comma-separated list of keywords just like in the case for feeds. Your newly created bundle will now show up in the list of blocks that you can see at the block related administration pages. There you can customize where and when your bundles will be displayed. Julian Bond's Voidstar is another site that uses this same Drupal News Aggregator module.
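The matching rule for bundles described above can be sketched in a few lines of Python. Again, this is my reading of the behavior, not the Drupal module's code; the feed names and attribute lists are invented, and the case-insensitive comparison is my own assumption.

```python
def parse_list(text):
    """Split a comma-separated string into a set of lowercase terms."""
    return {term.strip().lower() for term in text.split(",") if term.strip()}

def feeds_in_bundle(bundle_keywords, feeds):
    """Return names of feeds sharing at least one attribute with the bundle's keywords."""
    keywords = parse_list(bundle_keywords)
    return [name for name, attributes in feeds if keywords & parse_list(attributes)]

# Hypothetical feeds as (name, comma-separated attributes) pairs.
FEEDS = [
    ("Tax Commission", "government, business, tax"),
    ("Tourism Office", "government, travel"),
    ("Commerce Dept", "business, licensing"),
]

print(feeds_in_bundle("business", FEEDS))  # ['Tax Commission', 'Commerce Dept']
```

Because a feed can carry several attributes, the same feed can show up in more than one bundle without being defined twice.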

Aggregating, Filtering, and Customizing News by User Defined Queries

Going back to Network World Fusion, if you look a little deeper, you'll discover that you can use their metatag-"tuned" Verity Ultraseek (formerly Inktomi Enterprise Search) site search engine to create RSS channels for search terms (topics, companies, authors, etc.) of your choosing.

To do this you have to do a bit of a hack. You take their search's query string, i.e.

and replace KEYWORD in this string with the keyword(s) or phrase, in lowercase, that you want to use to create your custom RSS feed.

The keywords should match those that the publisher has put into the article's "keywords" metatag. So, for example, if you wanted a channel to keep you informed of Adam Gaffin's articles about weblogging, you could replace KEYWORD with gaffin+webblogs.

The first step, though, is to enter the Ultraseek queries +keywords:webblogs and +author:gaffin to see whether these terms actually show up in the database as metatags. They do, so we're in luck.

We make the substitutions and now have:

Load this into your browser's address field and it loads the channel, producing a page, in this case, with four results. You can now cut and paste it into an RSS parser and clean it up a bit by adding a channel title and description.

If you want to, you can drop out the metatags (i.e. keywords%3A and author%3A) so that it will search the full text of the articles for terms.
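The %3A in those metatag prefixes is just a URL-encoded colon. Here is a small Python sketch of the substitution, with and without the metatag prefixes. The template URL is a made-up placeholder on example.invalid, not Network World Fusion's real query string, and joining terms with "+" is simply the convention used in the example above.

```python
from urllib.parse import quote

# Hypothetical placeholder only; substitute the site's real query string.
HYPOTHETICAL_TEMPLATE = "http://search.example.invalid/rss?q=KEYWORD"

def encode_terms(terms, use_metatags=True):
    """Encode (metatag, value) pairs, e.g. keywords:webblogs -> keywords%3Awebblogs."""
    parts = []
    for metatag, value in terms:
        term = f"{metatag}:{value}" if use_metatags else value
        parts.append(quote(term.lower(), safe=""))
    return "+".join(parts)

def fill_template(template, terms, use_metatags=True):
    """Replace the KEYWORD marker in a query-string template with encoded terms."""
    return template.replace("KEYWORD", encode_terms(terms, use_metatags))

terms = [("keywords", "webblogs"), ("author", "gaffin")]
print(fill_template(HYPOTHETICAL_TEMPLATE, terms))
print(fill_template(HYPOTHETICAL_TEMPLATE, terms, use_metatags=False))
```

The first call restricts the search to the metatag fields; the second drops the prefixes so the engine searches the full text.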

For more magical tips about creating channels using metatags in query strings, see Adam's article Do-It-Yourself RSS Feed. Since all State of Utah agencies put metatags such as author and keywords into their pages (don't they?), and since we now have a search engine that can pass terms in query strings, we may be able to do something similar to create custom RSS feeds.

Something to consider is that there are several services that now construct RSS feeds directly from search queries. These include Chris Ridings' Fresh Search and Julian Bond's free PHP script, GNews2RSS, for creating an instant clipping service by performing a Google News search and turning the results into an RSS feed.
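Turning search results into a feed is mostly a matter of wrapping each result in an item element. The Python sketch below shows the general shape, including the channel title and description mentioned earlier. It is not how Fresh Search or GNews2RSS work internally; the result dictionaries (title, link, snippet) and the example.invalid links are assumptions, and running the search itself is left out.

```python
import xml.etree.ElementTree as ET

def results_to_rss(title, link, description, results):
    """Wrap a list of search results in a minimal RSS 2.0 document."""
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = title
    ET.SubElement(channel, "link").text = link
    ET.SubElement(channel, "description").text = description
    for result in results:
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = result["title"]
        ET.SubElement(item, "link").text = result["link"]
        ET.SubElement(item, "description").text = result["snippet"]
    return ET.tostring(rss, encoding="unicode")

# Hypothetical results, as a search engine might return them.
RESULTS = [
    {"title": "Agency posts new rules", "link": "http://example.invalid/1", "snippet": "First lines of the story."},
    {"title": "Budget & taxes", "link": "http://example.invalid/2", "snippet": "Another summary."},
]

print(results_to_rss("Custom search: taxes", "http://example.invalid/search",
                     "Results for a saved query", RESULTS))
```

ElementTree escapes characters such as "&" in headlines, which is the sort of cleanup that is easy to forget when building the XML by hand.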



Your readers might like to try Wildgrape NewsDesk, my free RSS reader for Windows:


Posted by: David at March 17, 2003 03:04 AM