Mastering RSS: Control your Inputs, and Improve your Outputs

February 23, 2009 by: Rick Martin

Contents

Tools Mentioned

  • Netvibes, Delicious, Twitter Search
  • Dapper, Page2RSS, Versionista
  • Xfruits, Yahoo Pipes
  • Feed Sifter, Feed Rinse, Filter my RSS
  • Google Reader, Twitterfeed, Feedburner
At Tokyo Barcamp, I gave a short presentation about the content in this post. I hope I can edit this video a little more, and maybe throw in some annotations too. Cheers to Brian Lockwood for filming it. The presentation can be found here: http://prezi.com/70546/ if you want a clearer look at what’s happening on the screen behind me.

Filtering your Internet from YISIT Department on Vimeo.

Too many inputs

People often agonize over their productivity. Whether it’s personal or work related, I think that approaching productivity problems only by slaving over your outputs is a mistake. If you consider a Six Sigma approach (bear with me for a second) to personal productivity, it makes a surprising amount of sense. For those who aren’t familiar with it, Six Sigma is a system created by Motorola to improve quality in manufacturing. While you might find the details rather boring, the gist of Six Sigma entails eliminating defects in products by adjusting all the inputs during production. This is somewhat obvious when thinking about manufacturing, but the same principle can be used to think about personal productivity. Naturally, your output is a direct result of all your varied inputs. So why not try adjusting your inputs for a change? If broadly applied, this might not be a bad rule to live by. Examine each activity you do each day, and figure out how it contributes to your overall output and personal growth.

For geeks like me who spend significant time on the internet, it pays to be especially careful about your inputs. For me, browsing the internet was always a massive time drain, because it’s so easy to lose focus on what you were looking for if you find, for example, a page full of LOLcats. Over the past year or so, I’ve become more and more fascinated with RSS feeds. Feeds allow you to quit browsing the internet, and instead suck in only the information you want to read about. I’ve been especially interested in how these feeds can be created, manipulated, mashed, spliced, filtered, and eventually consumed. There are so many possibilities it makes a vein pop somewhere on the back of my head. I’d like to share a little of what I’ve learned because I think it’s amazingly practical technology, yet obscure enough that not too many people are dabbling with it. With very little effort you can be a far more efficient consumer and producer of internet information. Whether you implement these tools in your personal life or in your work life, the benefits should be enormous.

Consuming Data

Anyone who has been keeping tabs on my 2JPN.com Japan Information site will know that the concept for the site sprang out of my need to find a job in Japan. While that mission has not been accomplished (I have some freelance gigs, but nothing big yet), I have made the task significantly easier by aggregating all the latest Japan job listings from all over the internet, and bringing them into a single browsable page. In the beginning, I did this by building a Netvibes page and subscribing to as many job feeds as possible. Netvibes allows you to build customizable homepages, and it’s a great service to use as an RSS reader too. I use Google Reader as well, but for absolutely essential content that I need to read daily (e.g. job listings, Japanese study podcasts, or world news) I set up a Netvibes tab. I can lay out my content across three or four columns, and it’s much easier to skim. It’s very fast and efficient data consumption. All the benefits of a morning newspaper, but customized to your specific needs. Awesome, hey?

Most websites have RSS feeds these days, but there are a few sites that put out vast numbers of feeds that you can pick through for your daily needs. Delicious online bookmarks provides amazingly useful RSS feeds for its tags. If you need to follow the latest information on Windows 7, for example, just go to the http://delicious.com/tag/windows7 page, and grab the RSS link from the bottom of the page. This will provide you with every single page that anyone bookmarks with the label “windows7.” For broader search terms like “apple,” you might want to try just the most popular bookmarks using the following link structure: http://delicious.com/popular/apple. Of course, you could use Delicious’s search function, but it doesn’t provide any RSS feed at the bottom like we have here.

Update: I just discovered that you can do even more by entering a URL with two tags, like http://www.delicious.com/tag/apple+tablet
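
As a rough illustration of the link structures mentioned above, here is a minimal Python sketch that builds the tag and popular page URLs. The function names are made up for this example, and it only builds the page URLs quoted in this post; the RSS link itself still has to be grabbed from the bottom of each page.

```python
from urllib.parse import quote

# Host as written in the post; treat it as an assumption if the service changes.
DELICIOUS_BASE = "http://delicious.com"


def tag_page_url(*tags):
    """Return the tag page URL for one or more tags, joined with '+'."""
    if not tags:
        raise ValueError("at least one tag is required")
    # Quote each tag on its own so the '+' separator stays literal.
    return DELICIOUS_BASE + "/tag/" + "+".join(quote(t, safe="") for t in tags)


def popular_page_url(tag):
    """Return the 'popular' page URL for a single broad tag."""
    return DELICIOUS_BASE + "/popular/" + quote(tag, safe="")


if __name__ == "__main__":
    print(tag_page_url("windows7"))
    print(tag_page_url("apple", "tablet"))
    print(popular_page_url("apple"))
```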

Delicious’ URL search provides a great way for businesses to keep tabs on what customers are saying about their brand. Have a look at this Dell search result, and what their customers are saying in the notes. There is an RSS feed at the bottom, so if you own a website this is a great tool to monitor when and how many times it is being bookmarked.

The other big source that I look to is Twitter. If you try Twitter’s search page, you’ll find that search results also have an RSS feed that you can subscribe to. This feature is particularly useful for gathering breaking news. Media types and journalists should be all over this, because it allows you to track all the important links and information about any breaking news event. You don’t miss anything. I’ve used this to build my 2JPN Earthquake page. In the event of an earthquake in Tokyo, my page will aggregate all the latest tweets that mention the words “tokyo” and “earthquake.” Hashtags are OK too, but I find that the Twitter search RSS feed is far more reliable.

Scraping Data when there’s no Feed

Collecting a few feeds from some of the major job sites was not difficult. But there were some excellent job sites out there that for some reason or another did not have RSS feeds yet. And I’ll be damned if I’m going to click over to their sites and browse through their job listings every day. I mean, if there are 10 sites of this kind, how much time will I waste?

This is where data scraping (sometimes called web scraping) comes in. There are many ways to scrape the data that you want from a website, and repackage it for consumption in a more convenient form. My tool of choice here is a web service called Dapper. All you do is take the URL of the page that you want to read, and specify which parts of the page you want to scrape. Typically, any job listing page will use an HTML heading tag (maybe h2, h3, or h4) for its job listings. Once you click on one of them in the Dapper factory window, Dapper will try to find the rest for you. Once it finds them all, you can finalize that “Dapp” and have it output an RSS feed (among other outputs).
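
To make the idea concrete, here is a minimal Python sketch of the same kind of scrape: pull the linked headings out of a page and repackage them as an RSS feed. This is not how Dapper works internally (I have no idea what it does under the hood); it is just the general technique, using only the standard library. The class name, function name, sample markup, and example.com URL are all made up for illustration, and it assumes each listing is a link inside an h3 tag.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin
import xml.etree.ElementTree as ET


class HeadingLinkParser(HTMLParser):
    """Collect (title, href) pairs for every heading of one kind."""

    def __init__(self, heading_tag="h3"):
        super().__init__()
        self.heading_tag = heading_tag
        self.in_heading = False
        self.href = None
        self.text = []
        self.items = []

    def handle_starttag(self, tag, attrs):
        if tag == self.heading_tag:
            self.in_heading, self.href, self.text = True, None, []
        elif tag == "a" and self.in_heading:
            self.href = dict(attrs).get("href")

    def handle_data(self, data):
        if self.in_heading:
            self.text.append(data)

    def handle_endtag(self, tag):
        if tag == self.heading_tag and self.in_heading:
            title = "".join(self.text).strip()
            if title:
                self.items.append((title, self.href))
            self.in_heading = False


def headings_to_rss(html, feed_title, page_url, heading_tag="h3"):
    """Turn the headings found in `html` into a bare-bones RSS 2.0 document."""
    parser = HeadingLinkParser(heading_tag)
    parser.feed(html)
    parser.close()
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = feed_title
    ET.SubElement(channel, "link").text = page_url
    ET.SubElement(channel, "description").text = "Scraped from " + page_url
    for title, href in parser.items:
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = title
        # Resolve relative links against the page; fall back to the page itself.
        ET.SubElement(item, "link").text = urljoin(page_url, href) if href else page_url
    return ET.tostring(rss, encoding="unicode")


# Made-up sample markup standing in for a job listing page.
SAMPLE_HTML = """
<h3><a href="/jobs/1">English Teacher</a></h3><p>Details...</p>
<h3><a href="/jobs/2">Web Developer</a></h3><p>Details...</p>
"""

if __name__ == "__main__":
    print(headings_to_rss(SAMPLE_HTML, "Job listings", "http://example.com/jobs"))
```

Real pages are messier than this, which is exactly why a point-and-click tool is so handy.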

Dapper works best for pages that have a steady flow of information but no RSS feed. Another service that you could use is Page2RSS. It’s a great service, but it doesn’t have the same accuracy that Dapper does. It monitors any updates on a given page, and lets you know when something has been added. Versionista is another one to watch, as it monitors a webpage not only for updates, but for revisions as well.
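
The "tell me when something has been added" idea can be sketched in a few lines of Python. This is only the general technique, not a description of how Page2RSS or Versionista actually work. It assumes you have already saved two plain-text snapshots of the same page; the function name and sample text are made up.

```python
def added_lines(old_text, new_text):
    """Return non-empty lines that appear in new_text but not in old_text."""
    seen = set(old_text.splitlines())
    return [line for line in new_text.splitlines()
            if line.strip() and line not in seen]


# Made-up snapshots of the same page taken at two different times.
OLD_SNAPSHOT = "Job A\nJob B"
NEW_SNAPSHOT = "Job C\nJob A\nJob B"

if __name__ == "__main__":
    for line in added_lines(OLD_SNAPSHOT, NEW_SNAPSHOT):
        print("New on the page:", line)
```

A line-by-line comparison like this is crude (any reworded line counts as new), which hints at why the accuracy of these services varies.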

Believe it or not, you can also use Google Spreadsheets to gather data from pages. There is an excellent tutorial on ouseful about how to scrape data from Wikipedia tables and import it into a Google spreadsheet, where you can tweak it a little. In the example, they showed how to scrape data from a table of England’s most populous cities, import it into Google Spreadsheets, and (after some modifications) output it as a Google map. While the example used Wikipedia tables, there’s no reason why you cannot do the same with any table of data anywhere on the net. Similarly, there’s no reason why you can’t upload a regular spreadsheet of data to Google Docs for modification.
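
If you would rather do the table scraping outside of a spreadsheet, the same idea can be sketched in Python. This is a stand-in for the spreadsheet approach, not the method from the tutorial: it reads the rows of an HTML table into lists that you could then save as CSV and upload. The class name and the tiny sample table (with placeholder values) are made up for illustration.

```python
from html.parser import HTMLParser


class TableParser(HTMLParser):
    """Collect the cell text of every table row as a list of strings."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None   # cells of the row being read, or None
        self._cell = None  # text pieces of the cell being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            if self._row:
                self.rows.append(self._row)
            self._row, self._cell = None, None


def table_rows(html):
    """Return all table rows found in `html`, header row included."""
    parser = TableParser()
    parser.feed(html)
    parser.close()
    return parser.rows


# Placeholder table; the values are not real data.
SAMPLE_TABLE = """
<table>
  <tr><th>City</th><th>Population</th></tr>
  <tr><td>City A</td><td>100</td></tr>
  <tr><td>City B</td><td>50</td></tr>
</table>
"""

if __name__ == "__main__":
    for row in table_rows(SAMPLE_TABLE):
        print(row)
```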

Update: Feed43.com is also very useful for creating feeds. It’s not quite as easy as dapper.net, but if you want a little more control over the data you suck in, this might be the ticket. (h/t to Julien Cayzac for this one)

Combining Feeds

There are a few services that you can use to combine RSS feeds. The most well-known of these is Yahoo Pipes. While I’m using Pipes more and more of late, the service that I relied on quite heavily in the past was Xfruits. It will allow you to combine many RSS feeds into one big super-feed. This is especially useful for anyone involved in group projects. For example, you could splice together the RSS feeds of blogs with a common theme to make a blog network. They could all display the feed output in a sidebar widget, all promoting each other, and all benefitting from it. Similarly, you could create a themed Twitter group by taking the RSS feeds of many Twitter users and splicing them together into one. The output is up to you: web page, email newsletter, whatever suits your fancy… Yahoo Pipes is a great way to combine feeds as well, and is probably the best way to go if you’re not a beginner.
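
For the curious, splicing feeds together is simple enough to sketch in Python. This is not what Xfruits or Yahoo Pipes do internally, just a bare-bones version of the idea: collect the items from several RSS 2.0 documents and sort them newest first. It assumes each item carries a standard pubDate; the function name, sample feeds, and example.com link are made up.

```python
import xml.etree.ElementTree as ET
from email.utils import parsedate_to_datetime


def merge_feeds(feed_xml_list, title="Combined feed"):
    """Merge the items of several RSS 2.0 documents into one, newest first."""
    items = []
    for xml_text in feed_xml_list:
        items.extend(ET.fromstring(xml_text).iter("item"))

    def published(item):
        raw = item.findtext("pubDate")
        # Items without a date sort last; dates are assumed to be RFC 822 style.
        return parsedate_to_datetime(raw).timestamp() if raw else 0.0

    items.sort(key=published, reverse=True)
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = title
    ET.SubElement(channel, "link").text = "http://example.com/"  # placeholder
    ET.SubElement(channel, "description").text = "Several feeds spliced into one"
    channel.extend(items)
    return ET.tostring(rss, encoding="unicode")


# Two made-up feeds standing in for real blogs.
FEED_A = """<rss version="2.0"><channel><title>Blog A</title>
<item><title>Post A1</title><pubDate>Mon, 23 Feb 2009 10:00:00 +0000</pubDate></item>
</channel></rss>"""

FEED_B = """<rss version="2.0"><channel><title>Blog B</title>
<item><title>Post B1</title><pubDate>Sun, 22 Feb 2009 09:00:00 +0000</pubDate></item>
<item><title>Post B2</title><pubDate>Mon, 23 Feb 2009 12:00:00 +0000</pubDate></item>
</channel></rss>"""

if __name__ == "__main__":
    print(merge_feeds([FEED_A, FEED_B]))
```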

Applying Filters to Feeds

After you have your feeds ready to go, you can refine things even further by filtering out any information that you don’t want. I find that the simplest way to apply a filter is using FeedSifter.com. Their interface is dummy-proof, but sometimes the feeds get broken. If that ever happens, head on over to Yahoo Pipes, where you can apply numerous filters to any RSS feed. Use the Fetch Feed module, and then choose Filter under the Operators menu. You can also use the Unique module to filter out duplicate results. FilterMyRSS and Feed Rinse are useful feed filtering tools that you may want to explore as well.
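
Here is a minimal Python sketch of what a keyword filter plus a duplicate filter boil down to. It is not the Yahoo Pipes implementation, only the general idea behind the Filter and Unique steps: keep items whose title or description contains every required word, and drop items whose link has already been seen. The function name and sample feed are made up, and "duplicate" is assumed to mean "same link".

```python
import xml.etree.ElementTree as ET


def filter_feed(feed_xml, required_words):
    """Keep items containing all required words; drop repeated links."""
    root = ET.fromstring(feed_xml)
    channel = root.find("channel")
    seen_links = set()
    for item in list(channel.findall("item")):
        text = " ".join([item.findtext("title", ""),
                         item.findtext("description", "")]).lower()
        link = item.findtext("link", "")
        matches = all(word.lower() in text for word in required_words)
        if not matches or link in seen_links:
            channel.remove(item)
        else:
            seen_links.add(link)
    return ET.tostring(root, encoding="unicode")


# Made-up feed: one duplicate link and one item missing a keyword.
SAMPLE_FEED = """<rss version="2.0"><channel><title>Search results</title>
<item><title>Earthquake shakes Tokyo</title><link>http://example.com/1</link></item>
<item><title>Tokyo earthquake update</title><link>http://example.com/1</link></item>
<item><title>Tokyo weather</title><link>http://example.com/2</link></item>
<item><title>Strong EARTHQUAKE felt in tokyo</title><link>http://example.com/3</link></item>
</channel></rss>"""

if __name__ == "__main__":
    print(filter_feed(SAMPLE_FEED, ["tokyo", "earthquake"]))
```

The matching is a plain substring check, so "tokyo" would also match a longer word that happens to contain it; a real filter would probably want something smarter.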

Responsible and Valuable Output

Before I go any further, I should emphasize that with this great power comes great responsibility (h/t to my Uncle Ben for that bit of wisdom). Many evil-doers out there have taken to aggregating RSS feeds, and publishing other people’s content in full and unattributed on their own websites. This is not cool, people, so don’t do it. Google will see right through you. Whatever your output is, you need to contribute some value to your readers. Now, if you can use RSS feeds to do this, then by all means, give it a try. But be responsible about it. Let me outline a few of the ways that I do it.

Shared Google Reader Bookmarks: When I read news in Google Reader, I share the articles that I really like by clicking “share” on the bottom of each article. I then take the RSS feed of my shared items, and I run it through Twitterfeed to my Twitter account. I add the restriction that it cannot post more than one link at a time, because I’ve seen people misuse this feature before, and it’s really annoying when someone posts 5 links all in a row. It’s overwhelming, and I don’t want to do that to my followers. Again, the key is to share something of value with your readers/followers. Note: you could also share your favorite links in this way using Delicious bookmarks or Netvibes.

Feed Blog Posts to Twitter: Again, I use Twitterfeed for this. But before sending my blog feeds to Twitter, I combine them using Xfruits to make one catch-all feed for all the blogs I contribute to. Then, just to clean it up a bit, I send it through Feedburner as well, which allows me to create an animated email signature, offer email subscriptions, or even add Google AdSense if I wanted to.

Other Useful Resources

Change Tracker: Track changes made to a website using Versionista and Yahoo Pipes.
Keyword CSV Files and searching: a two-minute Yahoo Pipes video demo
Notify.me: From the guys who brought us Page2RSS, comes another service for sending you automatic updates. Try the IM feature, it’s pretty good.

Filed under: tools

Comments

5 Responses to “Mastering RSS: Control your Inputs, and Improve your Outputs”
  1. fred says:

    As I’m not particularly knowledgeable about any of this it will take me some time to fully digest and understand the wealth of info and detailed advice in this article. Thanks for laying out so much information so clearly, I know I’ll be referring back to this a lot.

  2. Andrew Coey says:

    This is insightful and indicative of the times ahead which will require effort and study to reach the level you have, thanks very much.

  3. Alex says:

    Dapper looks interesting, where did you hear about it?!

    Totally forgot you were blogging here.

  4. John says:

    Getting a control on RSS is necessary. Good review!
