Queuing Mechanisms

Even though I read it cover to cover, I regularly reread chapters of Dave and Andy's book The Pragmatic Programmer. They give many tips, among them:

Temporal Coupling
Minimize Coupling Between Modules
Analyze Workflow to Improve Concurrency

This made me think about a project I worked on, where we used a queuing mechanism extensively. With that experience in mind, I wondered: what would I advise someone else on this subject?

Background

Basically, the project consisted of a database with XML files and stylesheets. In the database, a workflow would run, putting work on a message queue, and the application would listen to this queue so it knew what to do. This work mostly consisted of transforming the XML files, packaging them in TAR/GZIP or plain ZIP format, and then pushing them to the customer using FTP. A relatively small front-end let the users create orders for customers, as well as see what was in the database.

The queuing was done using a built-in feature of the particular database that we used. The principle is really simple: one side puts messages on a queue, and the other side takes them off and processes them. There are several advantages to this:

Stability: this type of communication happens asynchronously. The two sides don't have to be running at the same time, so it's possible to upgrade parts of the installation or the application itself while the other side keeps working.

Coupling: if your environment consists of several platforms (Windows, Unix, Java, Oracle, maybe some IBM stuff) and your queuing mechanism has interfaces for each, then this is a way of stringing applications together.

Scaling: the producer puts stuff on a queue and doesn't care who does the dequeuing. So it's easy to scale up the number of processes that dequeue messages and execute the task that's detailed in the message. Also, the producer doesn't have to wait for the consumers to finish; it can go on and add tasks to the queue (up to a certain point, of course).

Priority: it's possible to give certain urgent messages a higher priority by putting them ahead in the queue.

Throttling: Alternatively, there could be messages that contain a task that needs heavy batch-like processing in order to complete. In that case, they can be put last in the queue.
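To make the priority and throttling points concrete, here is a minimal in-memory sketch. It is not the database feature we used; the TaskQueue class and the URGENT, NORMAL and BATCH levels are names I made up for illustration. Urgent messages jump ahead, batch-like work goes last, and messages of equal priority keep their arrival order.

```python
import itertools
import queue

# Hypothetical priority levels for this sketch; a lower number is served first.
URGENT, NORMAL, BATCH = 0, 5, 9


class TaskQueue:
    """A tiny priority queue: urgent work first, batch work last."""

    def __init__(self):
        self._queue = queue.PriorityQueue()
        # The counter is a tie-breaker so equal priorities stay first-in, first-out.
        self._sequence = itertools.count()

    def put(self, task, priority=NORMAL):
        self._queue.put((priority, next(self._sequence), task))

    def get(self):
        _priority, _sequence, task = self._queue.get_nowait()
        return task

    def empty(self):
        return self._queue.empty()


def drain(task_queue):
    """Dequeue everything and return the tasks in the order they were served."""
    served = []
    while not task_queue.empty():
        served.append(task_queue.get())
    return served
```

The producer only calls put and never needs to know who calls get, so one consumer or ten can sit on the other side without the producer changing.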

There are more advantages, but these are the ones I have experience with.

I've mentioned the advantages, but you've probably guessed that there are a few gotchas with this way of working, and I advise you to think about those before you select, design for and use a similar mechanism.

Training

Since most people on the team will have to deal with the queuing parts at some point, it's a good idea to take an hour or two and have one developer explain the basics of the solution to the rest of the development team. It'll save time later on.

Error handling

When modules are decoupled in this way, you can't just throw an exception or return -1; you'll have to think about a mechanism that reports errors. Of course, you can't immediately report an error after the user pounds the button: there's a waiting line between the pressing of the button and the occurrence of the error. So there has to be a way to bring the errors to the user's attention, like a special screen that is shown regularly or upon logging in. Besides warning the users, you might want to warn administrators when a user has retried a message several times but the task still didn't finish successfully.
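One possible shape for such a mechanism is sketched below. Everything in it is an assumption for illustration: the Message and ErrorBoard classes, the MAX_ATTEMPTS threshold, and the idea that the consumer counts attempts itself. The consumer catches the failure, files it under the user who created the message, and raises an administrator alert once the attempt count reaches the threshold.

```python
from collections import defaultdict
from dataclasses import dataclass

MAX_ATTEMPTS = 3  # hypothetical threshold before administrators are alerted


@dataclass
class Message:
    user: str
    task: str
    attempts: int = 0


class ErrorBoard:
    """Collects errors so they can be shown later, e.g. on a screen at login."""

    def __init__(self):
        self.by_user = defaultdict(list)
        self.admin_alerts = []

    def report(self, message, error):
        self.by_user[message.user].append(f"{message.task}: {error}")
        if message.attempts >= MAX_ATTEMPTS:
            self.admin_alerts.append(
                f"{message.task} failed {message.attempts} times"
            )

    def pending_for(self, user):
        """Return and clear the errors waiting for this user."""
        return self.by_user.pop(user, [])


def process(message, handler, board):
    """Run the handler; on failure, record the error instead of raising."""
    message.attempts += 1
    try:
        handler(message.task)
        return True
    except Exception as error:
        board.report(message, error)
        return False
```

The front-end would call pending_for when the user logs in or opens the error screen; the exception never has to travel back across the queue.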

Extra checking

You have no compiler support. The compiler can't check whether you spelled the name and contents of the message correctly; therefore use as many constants as possible and see if you can build a small API layer to make the creation of a message "type safe". The compiler can then check that the queue is used correctly.
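A minimal sketch of such a layer follows, assuming messages travel as JSON text; the QueueName and TransformRequest names and their fields are invented for the example. Python has no compiler in the sense meant above, so here the safety net is a static type checker plus the error raised when a field name is misspelled, but the idea carries over to any language: queue names live in one place as constants, and each message type is a class rather than a hand-built string.

```python
import json
from dataclasses import asdict, dataclass
from enum import Enum
from typing import ClassVar


class QueueName(str, Enum):
    """Queue names as constants, so a typo fails loudly instead of silently."""
    TRANSFORM = "transform_requests"
    DELIVERY = "delivery_requests"


@dataclass(frozen=True)
class TransformRequest:
    """A hypothetical message asking for one XML document to be transformed."""
    document_id: int
    stylesheet: str

    # ClassVar keeps this out of the serialized fields.
    QUEUE: ClassVar[QueueName] = QueueName.TRANSFORM


def encode(message):
    """Turn a message object into (queue name, JSON body)."""
    return message.QUEUE.value, json.dumps(asdict(message), sort_keys=True)


def decode_transform(body):
    """Rebuild the message on the consumer side; unknown fields raise TypeError."""
    return TransformRequest(**json.loads(body))
```

Producer and consumer both import the same definitions, so the message format is described once instead of being retyped on each side.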

Queue management

You have to have management screens or good logging that show what is in the queue, and that let you delete or change messages or add test messages. It can also come in handy to be able to move messages from one queue to another. It's extremely useful in the early stages of development to see what is going on in the queue, because you can't use the traditional tools like a debugger or a few log lines to see what goes wrong.
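The operations such a screen needs are few. The sketch below uses an in-memory stand-in (ManagedQueue and move are hypothetical names, not part of any product) to show the three that mattered most to us: browsing without dequeuing, deleting a specific message, and moving a message to another queue.

```python
from collections import deque


class ManagedQueue:
    """An in-memory queue exposing the operations a management screen needs."""

    def __init__(self, name):
        self.name = name
        self._messages = deque()
        self._next_id = 1

    def put(self, body):
        """Add a message (for instance a test message) and return its id."""
        message_id = self._next_id
        self._next_id += 1
        self._messages.append((message_id, body))
        return message_id

    def browse(self):
        """List the messages without removing them."""
        return list(self._messages)

    def delete(self, message_id):
        """Remove one specific message and return its body."""
        entry = next((m for m in self._messages if m[0] == message_id), None)
        if entry is None:
            raise KeyError(message_id)
        self._messages.remove(entry)
        return entry[1]


def move(message_id, source, target):
    """Take a message off one queue and park it on another."""
    return target.put(source.delete(message_id))
```

Whatever product you use, it is worth checking early that it offers equivalents of these, because building them afterwards is more work than it looks.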

Transactions

Think about how your application handles transactions; the dequeuing probably shouldn't be part of the transaction that does the work. If it is, then when your application takes something off the queue and fails to do whatever task is in the message, the rollback puts the message back on the queue and the whole thing starts all over again.

Timing

The solution you will be using might need a timing mechanism for dequeuing. For some things, it must be possible to retry the task at a later time. Suppose your application must:

Contact a remote server for a file transfer
Run an exceptionally heavy batch job
Notify a user

It's entirely possible that the time of the dequeue isn't the most appropriate time to process the message. The remote server could be down, the application could be under heavy load, or the user could be having lunch with the boss. Your solution must then be able to determine an appropriate time for retrying and put the message back on the queue, passing the calculated time along.
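As an illustration of passing the calculated time along, here is a small sketch in which every message carries a "not before" time. The DelayedQueue class, the process_next function and the doubling delay are my own assumptions; exponential backoff is just one possible policy, and the clock is passed in explicitly to keep the example easy to follow.

```python
import heapq
import itertools


class DelayedQueue:
    """Messages become visible only once their 'not before' time has passed."""

    def __init__(self):
        self._heap = []
        self._sequence = itertools.count()  # tie-breaker for equal times

    def put(self, task, not_before=0.0, attempt=1):
        heapq.heappush(
            self._heap, (not_before, next(self._sequence), task, attempt)
        )

    def get(self, now):
        """Return (task, attempt) if a message is due at time `now`, else None."""
        if self._heap and self._heap[0][0] <= now:
            _due, _sequence, task, attempt = heapq.heappop(self._heap)
            return task, attempt
        return None


def retry_delay(attempt, base=60.0, cap=3600.0):
    """One possible policy: 60 s, 120 s, 240 s, ... capped at one hour."""
    return min(cap, base * 2 ** (attempt - 1))


def process_next(delayed_queue, now, action):
    """Try the next due message; on a connection failure, reschedule it."""
    item = delayed_queue.get(now)
    if item is None:
        return "idle"
    task, attempt = item
    try:
        action(task)
        return "done"
    except ConnectionError:
        delayed_queue.put(
            task, not_before=now + retry_delay(attempt), attempt=attempt + 1
        )
        return "rescheduled"
```

The same structure works for the heavy batch job or the user notification: only the rule that calculates the next suitable time changes.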