Monitoring Superfeedr

At Superfeedr, they have built a cost-effective feed delivery infrastructure that works in real time. As a subscriber, you no longer need to constantly poll tons of topics; you just subscribe via Superfeedr and get standard Atom notifications in real time (ok, nearly real time). As a publisher, your cloud server gets fewer and fewer dummy requests, so you can allocate the saved computing resources to serving more users instead. This is achieved by establishing a hub at Superfeedr and redirecting your subscribers to this new hub.

In the last couple of months, I’ve built an external monitoring application for Superfeedr that constantly simulates requests and measures Superfeedr’s performance and reliability from the outside. It has been a good and enjoyable experience working with various subscription protocols, APIs, and powerful Ruby libraries.

Overview

Superfeedr supports both the PubSub (XMPP XEP-0060) and PubSubHubbub (PuSH, as Josh Fraser has coined it) protocols, and it delivers standard Atom notifications regardless of the original feed format. We need to simulate requests for both protocols and measure the min, max, average, and median response time per interval (hourly), as well as the failure rate.

Our monitoring application has two main components: a Worker that constantly simulates requests, from PubSub subscribe to PuSH ping; and a WebServer that acts as the PuSH callback server, processing verification requests from Superfeedr and parsing Atom notifications, as well as providing a visualization interface.

Simulate Requests

The worker is evented by nature: it periodically sends requests and handles responses as they come back, all without blocking. That’s exactly what EventMachine was built for.

Our worker is simply an EventMachine run loop that constantly simulates requests and saves a record of each request into the datastore. When it gets a response, it fetches the request record, computes the elapsed time, and updates the record with the elapsed time and a status indicating the success or failure of the request.
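
The record lifecycle described above can be sketched in a few lines of plain Ruby. This is only an illustration under stated assumptions: RequestLog, its field names, and the in-memory Hash are hypothetical stand-ins for the MongoDB-backed records, and the EventMachine callbacks that would call start and finish are left out.

```ruby
require "securerandom"

# Hypothetical in-memory stand-in for the datastore (the real application uses MongoDB).
class RequestLog
  Record = Struct.new(:id, :type, :started_at, :elapsed, :status)

  def initialize
    @records = {}
  end

  # Called right before a simulated request is sent; returns the record id.
  def start(type, now: Time.now)
    record = Record.new(SecureRandom.hex(8), type, now, nil, :pending)
    @records[record.id] = record
    record.id
  end

  # Called from the response callback: fetch the record, compute the
  # elapsed time, and store it together with a success/failure status.
  def finish(id, success:, now: Time.now)
    record = @records.fetch(id)
    record.elapsed = now - record.started_at
    record.status = success ? :success : :failure
    record
  end
end

log = RequestLog.new
t0 = Time.at(1_000)
id = log.start(:pubsub_subscribe, now: t0)
done = log.finish(id, success: true, now: t0 + 0.25)
puts "#{done.type} #{done.status} in #{done.elapsed}s"
```

Passing the clock in as `now:` keeps the timing logic testable without sleeping.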

With Blather, a great evented XMPP library for Ruby, we could build the PubSub module with ease, even though we only use a subset (pubsub) of Blather. A stream connection is established when the worker is initialized. We then stream XML stanzas over the connection, and when a response is received, we parse it with the excellent XML parsing library Nokogiri. Working with PubSub was significantly easier than I had expected.

Monitoring PubSubHubbub, however, has proven to be much more work. We chose em-http-request to send HTTP requests asynchronously and to deal with the different status codes as responses return from the Superfeedr server. That covers the sync mode of PuSH. For the async mode of PuSH, we have the WebServer to handle requests from Superfeedr.
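
As a rough sketch of the status code handling, the function below classifies a hub's reply to a subscribe or unsubscribe request. It assumes the behaviour laid out in the PubSubHubbub 0.3 spec (204 once the subscription is verified, 202 when verification is deferred); the function name and the symbols it returns are made up for this example, and treating every other code as a failure is a simplification.

```ruby
# Classify the hub's HTTP response to a PuSH subscribe/unsubscribe request.
# Assumption: status codes follow the PubSubHubbub 0.3 spec.
def classify_subscription_response(status_code)
  case status_code
  when 204 then :verified # sync mode: the hub verified the callback before replying
  when 202 then :pending  # async mode: verification will arrive later at the callback
  else          :failure  # simplification: anything else counts as a failed request
  end
end

[204, 202, 500].each do |code|
  puts "#{code} -> #{classify_subscription_response(code)}"
end
```

In the evented worker, a function like this would be called from the HTTP client’s completion callback, and a :pending result would leave the request record open until the WebServer sees the verification request.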

And then there’s the full ping cycle. We publish new entries to subscribed topics, wait for Superfeedr to propagate the notifications to us, and finally measure the elapsed time to test Superfeedr’s real-time delivery ability. When we simulate a full ping command, we generate a request log record with a unique id, build a new entry with this id, and publish the entry to the subscribed topic. When we get Atom notifications from Superfeedr, we immediately parse them for the id, look up the database record by this id, update it with the computed elapsed time, and put it back into the database.
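
The id round trip of the ping cycle might look roughly like the sketch below. Several things here are assumptions for the sake of a self-contained example: it uses REXML rather than Nokogiri, the "urn:monitor:" prefix and both helper names are hypothetical, a plain Hash replaces the database, and the publish step is faked by wrapping the entry in a feed locally.

```ruby
require "rexml/document"
require "securerandom"

ATOM_NS = "http://www.w3.org/2005/Atom"
ID_PREFIX = "urn:monitor:" # hypothetical convention for tagging our own entries

# Build a minimal Atom entry whose <id> carries the request log id.
def build_entry(request_id, now: Time.now.utc)
  entry = REXML::Element.new("entry")
  entry.add_namespace(ATOM_NS)
  entry.add_element("id").text = "#{ID_PREFIX}#{request_id}"
  entry.add_element("title").text = "ping #{request_id}"
  entry.add_element("updated").text = now.strftime("%Y-%m-%dT%H:%M:%SZ")
  entry.to_s
end

# Pull our request ids back out of an Atom notification.
def extract_request_ids(notification_xml)
  doc = REXML::Document.new(notification_xml)
  REXML::XPath.match(doc, "//atom:entry/atom:id", "atom" => ATOM_NS)
              .map { |node| node.text.to_s.strip }
              .select { |id| id.start_with?(ID_PREFIX) }
              .map { |id| id.delete_prefix(ID_PREFIX) }
end

pending = {} # request id => time the entry was published
request_id = SecureRandom.hex(8)
pending[request_id] = Time.now
entry_xml = build_entry(request_id)

# In the real flow the entry is published to the topic and comes back from
# the hub; here it is simply wrapped in a feed to simulate the notification.
notification = "<feed xmlns='#{ATOM_NS}'>#{entry_xml}</feed>"

extract_request_ids(notification).each do |id|
  started_at = pending.delete(id)
  next unless started_at
  puts "ping #{id} took #{Time.now - started_at}s"
end
```

Entries whose id does not carry the prefix are ignored, so notifications for unrelated feed content would not disturb the measurements.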

The WebServer

Originally, we had only a simple EventMachine-based HTTP server for handling Superfeedr’s verification requests as well as Atom notifications. But when I began to work on a web interface for visualizing statistics, I realized a Sinatra module could replace the HTTP server very well.

Now we have only Sinatra to serve HTTP requests. It’s responsible for verifying subscribe and unsubscribe attempts, parsing Atom notifications for the ping command, and providing a Google Visualization interface to the backend statistics, which makes on-demand stats review easier.
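
The verification step can be illustrated without any web framework. The sketch below follows the PubSubHubbub 0.3 rule that a subscriber confirms a request by echoing hub.challenge with a 2xx response and refuses it with a 404; verify_intent and the `expected` set are hypothetical names, and in the application this logic would live inside a Sinatra route rather than a bare function.

```ruby
require "set"

# Decide how to answer a PuSH verification request.
# `expected` is a hypothetical set of [mode, topic] pairs the worker has
# actually requested; anything else is refused.
def verify_intent(params, expected)
  mode, topic, challenge = params.values_at("hub.mode", "hub.topic", "hub.challenge")
  if challenge && expected.include?([mode, topic])
    [200, challenge] # echo the challenge to confirm the request
  else
    [404, ""]        # refuse requests we did not initiate
  end
end

expected = Set[["subscribe", "http://example.com/feed.atom"]]
params = {
  "hub.mode" => "subscribe",
  "hub.topic" => "http://example.com/feed.atom",
  "hub.challenge" => "s3cr3t"
}
status, body = verify_intent(params, expected)
puts "#{status} #{body}"
```

Returning a status and body pair keeps the decision separate from the HTTP layer, so the same function can be exercised directly in tests.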

Computing Stats

We use MongoDB as the datastore, which offers built-in map/reduce. In the Worker’s run loop, map/reduce jobs are scheduled hourly to produce the required statistics.

To produce hourly stats, I’ve employed a small trick: emitting a key made of an hourly timestamp and the request type. The map step runs over all recorded requests that have completed, and the reduce step turns them into a collection of hourly-timestamped, typed docs with the computed min, max, median, average, and failure rate. The whole collection is updated every hour with the newly computed stats.
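
To make the keying trick concrete, here is the same aggregation written in plain Ruby rather than as a MongoDB map/reduce job. It is a sketch only: the record fields (:type, :at, :elapsed, :status) and both helper names are assumptions, and the hour is represented as epoch seconds rounded down to the hour.

```ruby
# Median of an already sorted, non-empty array of numbers.
def median(sorted)
  mid = sorted.size / 2
  sorted.size.odd? ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2.0
end

# "Map": key every completed request by [type, start of its hour].
# "Reduce": collapse each group into one stats document.
def hourly_stats(records)
  records.group_by { |r| [r[:type], r[:at].to_i - r[:at].to_i % 3600] }
         .to_h do |key, group|
    times = group.map { |r| r[:elapsed] }.sort
    failures = group.count { |r| r[:status] == :failure }
    [key, { min: times.first,
            max: times.last,
            average: times.sum / times.size.to_f,
            median: median(times),
            failure_rate: failures / group.size.to_f }]
  end
end

records = [
  { type: :push_ping, at: Time.at(10),    elapsed: 1.0, status: :success },
  { type: :push_ping, at: Time.at(1_800), elapsed: 6.0, status: :failure },
  { type: :push_ping, at: Time.at(3_599), elapsed: 2.0, status: :success },
  { type: :push_ping, at: Time.at(3_600), elapsed: 4.0, status: :success }
]

hourly_stats(records).each do |(type, hour), stats|
  puts "#{type} @ #{Time.at(hour).utc}: #{stats}"
end
```

Because the hour is part of the key, the first three records land in one bucket and the fourth in the next, even though they share a request type.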

With Google’s visualization library, we dynamically download the typed stats to the browser and update the visualization with the new data.

Conclusion

The most difficult part of working with an evented architecture is debugging: unexpected status codes, malformed XML, and even a Nokogiri XPath issue that took me days to work out. It’s usually not finding the solution that takes most of your time, but the trial and error of figuring out where the cause lies.

When anything goes wrong, you need to set up a whole stack of services to emit different sets of events so you can figure out the cause, and that may take days of struggle. curl, hurl, and postbin are your friends as well.

Fortunately, the two main Ruby libraries, EventMachine and Sinatra, as well as MongoDB, have reduced the complexity by an order of magnitude. I’ve really enjoyed working with these beautifully crafted pieces of software.

The application has been running for a couple of weeks now, and shows pretty good uptime and stability. Thanks to the great Ruby community!