Day 8 of 60: Sendmail queues

The time has come to start adding DTrace functionality to Sendmail. Of course, there's no point in just diving in and adding code left, right, and centre, so over the last couple of days I've been thinking about what I should be instrumenting first.



One of the issues I see when I use Sendmail in production is the lack of visibility into, and management of, the Sendmail queues. If you're not familiar with how Sendmail manages queues of mail, here's a quick primer (that skips over some things, so for complete chapter and verse consult the Sendmail documentation).

Sendmail Queue Primer



Sendmail stores mail in one or more queue groups. There's always at least one queue group (the default, called 'mqueue'), but there may be others. Each message can be in only one queue group, and Sendmail can be configured to choose the queue group for a message based on the message's characteristics (which domain is it from, which domain is it going to, and so on).

Each queue group has one or more queue directories associated with it. If you are running a busy site with lots of mail in the queues then you would normally assign multiple directories to a queue group (probably on different disks) to spread the IO load, and to avoid the problems that some filesystems have with directories that contain a large number of files.

Each message in the queue has a unique identifier, or Queue ID, and is represented by at least two files. There is a 'qf' file (named 'qf<QueueID>') which contains meta information about the message (where it's going, who it's from, the delivery priority, the message headers, and so on), and a 'df' file (named 'df<QueueID>') which contains the body of the message.

There may be other files related to a queued message. For example, a message that has been quarantined has an 'hf' file instead of a 'qf' file. You might see a 'tf' file, which is a temporary 'qf' that Sendmail is in the process of writing, or an 'xf' file, which contains a transcript of everything that's happened to a queued message during a session.

See the Sendmail documentation for a complete list of queue file types.
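To make the naming scheme concrete, here's a rough sketch that groups the files in a queue directory listing by Queue ID. It only knows about the five prefixes mentioned above (Sendmail defines more), and the function names and Queue IDs are made up for illustration.

```python
# Prefixes described in the primer above; Sendmail defines other file types too.
QUEUE_FILE_TYPES = {
    "qf": "control file (envelope, priority, headers)",
    "df": "message body",
    "hf": "control file for a quarantined message",
    "tf": "temporary control file being written",
    "xf": "session transcript",
}


def split_queue_filename(name):
    """Return (file_type, queue_id), or None if the prefix is not recognised."""
    prefix, queue_id = name[:2], name[2:]
    if prefix in QUEUE_FILE_TYPES and queue_id:
        return prefix, queue_id
    return None


def group_by_queue_id(filenames):
    """Map each Queue ID to the sorted list of file types present for it."""
    messages = {}
    for name in filenames:
        parsed = split_queue_filename(name)
        if parsed is None:
            continue  # not a queue file we know about
        file_type, queue_id = parsed
        messages.setdefault(queue_id, []).append(file_type)
    return {qid: sorted(types) for qid, types in messages.items()}
```

A message whose list contains 'hf' but no 'qf' would be a quarantined one, going by the description above.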

To spread the IO load even further you can create subdirectories of each queue directory called 'qf', 'df', and 'xf'. If these exist then Sendmail will store the appropriate file in the appropriate subdirectory. So you might choose to put the 'qf' and 'df' files on real storage, and put the 'xf' files on a swap backed filesystem.

Queue Processing



In an ideal world you'd never have any queued mail -- remote systems would always be up and ready to accept mail. However, that's rarely the case, and some fraction of messages are always going to need to be queued, and be scheduled for repeat delivery attempts.

Attempting to redeliver the messages in a mail queue is referred to as processing, or running the queue, and the processes that do this are normally referred to as queue runners.

Sendmail can process the queue in one of two ways.

Normal queue runners



A normal queue runner is a process that's spawned after a certain period of time (say, every five minutes). The queue runner collects information about the entire queue, sorts the messages according to an admin-defined order, and then starts attempting to deliver the messages. Once the runner has finished processing all the messages in its list, it exits.

Persistent queue runners



As their name implies, persistent queue runners do not die. Instead, a single process gathers the data from the queue. Each persistent queue runner can then share that data. This can significantly reduce IO load, and means that you are not repeatedly forking new processes to handle the queue.

The disadvantage is that the queue runners for a group do not finish until all the queue runners for that group have finished. Suppose you have a queue with a stubborn message that will not be delivered, and where the remote system is slow to respond.

If you have ten persistent queue runners processing that queue, nine of them may have finished processing all their messages. However, the tenth is stuck on the slow message, and the other nine cannot continue until the tenth has finished. This may substantially delay the start of the next queue run.
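Put another way, the length of a queue run is set by the slowest runner in the group. Here's a minimal sketch of that arithmetic. The timings are invented, and the functions are purely illustrative rather than anything Sendmail exposes.

```python
def queue_run_duration(runner_seconds):
    """A run for the group ends only when its slowest runner ends."""
    return max(runner_seconds)


def idle_seconds(runner_seconds):
    """How long each runner waits, doing nothing, for the group to finish."""
    finish = max(runner_seconds)
    return [finish - seconds for seconds in runner_seconds]


# Nine runners finish in 30 seconds; the tenth is stuck for ten minutes.
runners = [30] * 9 + [600]
print(queue_run_duration(runners))
print(idle_seconds(runners))
```

With these made-up numbers, nine runners each sit idle for 570 seconds before the next run can begin.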

Queue runner sort order



Regardless of the type of queue runner you choose you also need to decide the order in which messages in the queue are going to be processed.

Your options are:


  1. Host - hostname of the first recipient

  2. Filename - name of the queue file

  3. Time - the time the message was first queued

  4. Random - random order

  5. Modification - modification time of the qf file, older files first

  6. Priority - message priority, a value calculated by Sendmail based on message size, number of delivery attempts, etc

  7. None - don't sort the messages



In terms of load, Host and Priority have the most overhead, as the qf files need to be opened and parsed before ordering can begin.

Time and Modification take fewer resources. The queue runners only need to stat() each file rather than parse its contents.

Filename and Random have even less overhead, as just the directories need to be read.

Finally, None has the least overhead.
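The difference in overhead is easier to see in code. This is a sketch of the cheaper sort orders only, written in Python rather than taken from Sendmail's source, and the function names are my own. Host and Priority are left out because they depend on parsing the qf file, whose format isn't covered here.

```python
import os
import random


def list_qf_files(queue_dir):
    """Read the directory only: enough for None, Filename, and Random."""
    return [e.name for e in os.scandir(queue_dir) if e.name.startswith("qf")]


def order_queue(queue_dir, sort_order):
    """Return qf filenames in the requested order (illustrative subset)."""
    names = list_qf_files(queue_dir)
    if sort_order == "none":
        return names  # whatever order the directory read produced
    if sort_order == "filename":
        return sorted(names)
    if sort_order == "random":
        random.shuffle(names)
        return names
    if sort_order == "modification":
        # One stat() per file, older files first.
        return sorted(
            names,
            key=lambda n: os.stat(os.path.join(queue_dir, n)).st_mtime,
        )
    raise ValueError("sort order not covered by this sketch: " + sort_order)
```

Each step down the list of sort orders removes a per-file cost: parsing, then stat(), and finally the sort itself.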

Queue Runner Mechanics



So, bearing all this in mind, here's an overview of what a queue runner has to do when it starts.


  • If it's a normal queue runner it has to (depending on the queue sort order) scan through the whole queue to gather the information it needs to sort the queue. If it's a persistent queue runner then the master process has already done this.


  • Sort the queue as necessary.


  • Process the messages in sorted order. As it processes messages it locks them, so that another queue runner can't start processing the same message. This means that it also has to skip any queued messages that another queue runner has locked.
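The lock-and-skip behaviour in the last step can be sketched like this. I'm using flock() as a stand-in for whatever locking call Sendmail uses on a given platform, so treat the details as an assumption. The helper names and the deliver callback are hypothetical, and the code is Unix-only.

```python
import fcntl


def try_lock(path):
    """Return an open, locked file object, or None if the lock is already held."""
    handle = open(path, "r+")
    try:
        # Non-blocking exclusive lock: fail immediately instead of waiting.
        fcntl.flock(handle, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except BlockingIOError:
        handle.close()
        return None
    return handle


def run_queue(paths, deliver):
    """Try each message in order; return the ones skipped because they were locked."""
    skipped = []
    for path in paths:
        handle = try_lock(path)
        if handle is None:
            skipped.append(path)  # another queue runner has this message
            continue
        try:
            deliver(path)
        finally:
            handle.close()  # closing the file releases the lock
    return skipped
```

The important property is that a runner never waits on a locked message. It moves on, and the message stays with whichever runner locked it first.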



Processing a message involves trying to connect to the remote site to deliver the message. This can involve another Sendmail feature, the host cache. The host cache is simply a shared area that queue runners use to store information about the last connection they made to a remote site. If the connection failed (perhaps it timed out) other queue runners can use this information to determine whether to connect. For example, you can configure queue runners not to retry a remote site that has failed to respond at some point in the last 10 minutes.
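Here's a minimal sketch of the host cache idea. It keeps everything in an in-memory dictionary, which is a simplification: the real cache has to be shared between processes. The class and method names are invented for the example, as are the hostnames.

```python
import time


class HostCache:
    """Remember when a host last failed, and hold off retrying for a while."""

    def __init__(self, retry_after_seconds=600):
        self.retry_after = retry_after_seconds
        self.last_failure = {}  # hostname -> time of the last failed connection

    def record_failure(self, host, now=None):
        self.last_failure[host] = time.time() if now is None else now

    def record_success(self, host):
        self.last_failure.pop(host, None)

    def should_try(self, host, now=None):
        """True if the host has no recorded failure, or the failure is old enough."""
        now = time.time() if now is None else now
        failed_at = self.last_failure.get(host)
        return failed_at is None or now - failed_at >= self.retry_after


cache = HostCache(retry_after_seconds=600)
cache.record_failure("mx.example.com", now=1000)
print(cache.should_try("mx.example.com", now=1300))  # five minutes later
print(cache.should_try("mx.example.com", now=1600))  # ten minutes later
```

Raising retry_after_seconds trades delivery latency for fewer wasted connection attempts.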

A Production Example



So let's consider a production system with one queue group (the default). The root directory for that queue group has been configured to be /var/spool/mqueue, and 5 queue directories have been created. The directory structure is going to look like this:

/var/spool/mqueue/qdir00
/var/spool/mqueue/qdir01
/var/spool/mqueue/qdir02
/var/spool/mqueue/qdir03
/var/spool/mqueue/qdir04


Suppose that you're responsible for this system, and you see that queue processing is too slow. Your options are (broadly):


  1. Move the qdir* directories to different disks, and either mount them under /var/spool/mqueue or use symlinks.


  2. Create additional directories under /var/spool/mqueue (qdir05, qdir06, etc).


  3. Create qf/, df/, and xf/ subdirectories.


  4. Create additional queue groups based on the messages that you're sending, and segregate the messages that way.


  5. Switch from normal to persistent queue runners (or vice-versa).


  6. If using normal queue runners, decrease the amount of time between queue runs -- for example, if you're starting a new queue runner every ten minutes, try starting a new queue runner every five minutes.


  7. If using persistent queue runners, try increasing the number of persistent queue runners. For example, if you're running ten persistent queue runners, each one is going to have to process 10% of the mail queue. If one of them gets stuck on a slow message then potentially 10% (minus a little bit) of the queue is going to be delayed. If you start twenty persistent queue runners and one of them gets stuck then only 5% (minus a little bit) of the queue is going to be delayed.


  8. Try a different queue sort order.


  9. Make better use of the host cache -- for example, instead of ignoring unresponsive sites for ten minutes, ignore them for thirty minutes instead. This might delay messages, but it also means that you're not using resources connecting to sites you can't deliver messages to.
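The arithmetic behind option 7 is simple enough to sketch. It assumes the queue is split evenly between the runners, which real queues won't guarantee, and it ignores the "minus a little bit" for messages the stuck runner has already handled.

```python
def delayed_fraction(num_runners, stuck_runners=1):
    """Fraction of the queue held up when some runners are stuck.

    Assumes an even split of messages between runners.
    """
    return stuck_runners / num_runners


for runners in (10, 20):
    print(runners, "runners:", format(delayed_fraction(runners), ".0%"), "delayed")
```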



All of these approaches are valid. The problem is that Sendmail doesn't provide you with any tools to easily determine which is the most appropriate change to make -- you have to use whatever facilities the OS provides to determine what the problem is.


  • Are you IO bound, and need to split queues across disks?


  • Are the directories too large, and you're running into filesystem limitations?


  • Is your queue sort order effective? How long are queue runners taking to sort the queue?


  • Is the host cache being used effectively?
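As an example of falling back on OS facilities, the directory-size question can at least be approximated by counting entries. This sketch assumes the layout from the example above (queue directories directly under the queue root). It counts direct entries only, so qf/, df/, and xf/ subdirectories would each count as a single entry.

```python
import os


def queue_dir_sizes(queue_root):
    """Map each queue directory under queue_root to its number of entries."""
    sizes = {}
    for entry in os.scandir(queue_root):
        # is_dir() follows symlinks, so symlinked queue directories count too.
        if entry.is_dir():
            sizes[entry.name] = sum(1 for _ in os.scandir(entry.path))
    return sizes


if __name__ == "__main__":
    root = "/var/spool/mqueue"  # the queue root from the example above
    if os.path.isdir(root):
        for name, count in sorted(queue_dir_sizes(root).items()):
            print(name, count)
```

That tells you how big each directory is, but not whether the size is actually what's slowing the queue runners down.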



You can extract (or infer) some of this information from the Sendmail logs, but it's a tedious process. Ramping up the Sendmail debug level can also reveal some of this, but it tends to produce a lot of data, and it's not necessarily something you want to do on a production server.

This is where DTrace comes in, and it lets me set some goals for the next phase of this work.

So, I've decided that the milestones for the first phase of this work will be as follows.


  • Define a simple Sendmail DTrace provider.



This lets me confirm that I can build Sendmail with DTrace enabled, and that I've got all the build infrastructure in place.

Then I need to define DTrace probes for the following Sendmail events.


  • Queuing a message starts.


  • Queuing a message ends.


  • A queue runner starts reading the queue.


  • A queue runner finishes reading the queue.


  • A queue runner starts sorting the queue.


  • A queue runner finishes sorting the queue.


  • A queue runner starts processing a message.


  • A queue runner locks a message.


  • A queue runner discovers that a message is locked.


  • A queue runner starts to query the host cache.


  • A queue runner finishes querying the host cache.


  • A queue runner starts to update the host cache.


  • A queue runner finishes updating the host cache.



As I add these probes I can write D scripts that record when each event happened, and hopefully generate useful charts that show how long each phase is taking, and identify areas of operation that need adjusting.
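The analysis I have in mind is mostly a matter of pairing each "start" event with its matching "end". Here's a sketch of that pairing in Python rather than D, working on timestamped events that have already been collected. The event tuple layout and the phase names are assumptions for illustration, not the output of any real probe.

```python
from collections import defaultdict


def phase_durations(events):
    """Pair start/end events per process and return durations by phase.

    events: iterable of (timestamp, pid, phase, kind), where kind is
    "start" or "end". The layout is hypothetical.
    """
    started = {}
    durations = defaultdict(list)
    for timestamp, pid, phase, kind in sorted(events):
        key = (pid, phase)
        if kind == "start":
            started[key] = timestamp
        elif kind == "end" and key in started:
            durations[phase].append(timestamp - started.pop(key))
    return dict(durations)


events = [
    (0, 101, "read", "start"),
    (40, 101, "read", "end"),
    (40, 101, "sort", "start"),
    (55, 101, "sort", "end"),
    (5, 102, "read", "start"),
    (65, 102, "read", "end"),
]
print(phase_durations(events))
```

With durations grouped by phase like this, it should be possible to see whether reading, sorting, or the host cache is where the time goes.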

With some specific goals in mind I can start looking at the code in more detail, and that's going to take place over the next few days.

4 comments:

  1. Byran, thanks for the comments.

    I'll be posting something to the DTrace discussion groups (and places like comp.mail.sendmail) when I've got something a little more concrete to show off. At the moment there's no actual code :-) Hopefully that should be in a day or two.

    I know that it's not strictly necessary to add these probes to Sendmail to get some (considerable) value from DTrace. But the stock providers that come with DTrace provide a vocabulary that's at quite a low level. There's nothing wrong with that, of course, but it's going to be interesting to see how much extra value there is in providing an additional application, or problem-domain, specific vocabulary.

  2. [...] Following Monday's info dump about queues, I've spent some time over the last few days reading the DTrace documentation in detail. In particular, the Solaris Dynamic Tracing Guide. This is the DTrace handbook, with a great deal of information about how to use DTrace. [...]

  3. [...] Note: If you're not familiar with sendmail queues, the sendmail queue primer I wrote might be useful. [...]
