Day 33 of 60: Strategies for processing the queue.

Note: If you're not familiar with sendmail queues, the sendmail queue primer I wrote might be useful.

There are two aspects of mail queue management to consider with Sendmail. The first is the process that puts messages in the queue. I've looked at that in some detail already, and written a number of D scripts that should make it easy for you to instrument Sendmail on your production systems so you can decide how best to lay out your queue directories for optimal inbound performance.

The flip side of the coin is to try to answer the question "How do you maximise delivery from the queue?" This is a more complex question to answer, as the number of variables you can control that affect it is much larger. Also, there's more variability when delivering mail, as you are at the mercy, to some extent, of each remote site -- how fast they process mail you send them, whether or not they're actually up, how much latency there is between you and them, the speed of DNS lookups, and so on.

So, what can we test?

  • How many queue directories you have (and therefore how many queue runners you have)

  • What the queue interval is set to

  • Whether to run persistent queue runners, and how many to run

  • The maximum number of messages each queue runner processes

  • The queue sort order

  • Whether the host cache is enabled

I'm not going to be able to effectively test some of these in my (current) testing environment. Or rather, I'll be able to test them and instrument them, but the numbers aren't going to be very relevant (even less relevant than the disclaimer makes clear).

To look at them in turn:

How many queue directories you have

I can test this. It will have an effect on the number of children Sendmail forks to process queues, and the number of queued messages that can be processed in parallel.

What the queue interval is set to

The queue interval is the value of the -q parameter given to Sendmail at startup. For example:

# sendmail -bd -q15m

This configures Sendmail to fork a child to process the queue every 15 minutes (irrespective of how many other children are also processing the queue).

I suspect that this is going to be difficult to test in any way that gives meaningful results, because I think the first queue runner is going to drain the queue quite rapidly, so the second and subsequent ones might never find any work to do.

However, further testing will prove or disprove that, so I'll keep it in mind for the time being.
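To make the overlap question concrete, here is a minimal sketch of a toy model, assuming a new queue runner is forked at every interval and that every run takes the same fixed time. Both assumptions are simplifications for illustration, and the function name is hypothetical; real run times depend on the queue contents and the remote sites.

```python
def peak_concurrent_runners(interval_s: int, run_duration_s: int) -> int:
    """Peak number of queue runners alive at once in this toy model.

    Assumes a new runner is forked every interval_s seconds and that
    every runner takes exactly run_duration_s seconds to finish.
    """
    if interval_s <= 0:
        raise ValueError("interval_s must be positive")
    if run_duration_s <= 0:
        return 0
    # Ceiling division: every runner started within the last
    # run_duration_s seconds is still working.
    return -(-run_duration_s // interval_s)


if __name__ == "__main__":
    interval = 15 * 60  # corresponds to -q15m
    for minutes in (2, 15, 40):
        runners = peak_concurrent_runners(interval, minutes * 60)
        print(f"{minutes} minute runs: {runners} runner(s) at peak")
```

In this model, runners only overlap once a single run takes longer than the interval, which is why a queue that drains in a couple of minutes would never show a second runner at work.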

Whether to run persistent queue runners, and how many to run

Persistent queue runners may speed up queue processing, particularly by reducing IO overhead and the time taken to sort the queue. This should definitely be included in the testing.

Note that persistent queue runners can have problems. The Sendmail documentation notes:

Their disadvantage is that a new queue run is only started after all queue runners belonging to a group finished their tasks. In case one of the queue runners tries delivery to a slow recipient site at the end of the queue run, the next queue run may be substantially delayed.

I don't expect this initial round of testing to highlight issues like this, at least not in my results. However, the testing methodology, if applied at other sites, should show whether this has become an issue.
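The delay described in the documentation can be illustrated with a minimal sketch. It assumes the next run for a group starts only once the slowest runner has finished and the group has then slept for its configured interval. The function name and the figures are invented for illustration and are not measurements.

```python
def next_group_run_start(finish_times_s, sleep_s):
    """Earliest start time of the next queue run for a group (toy model).

    finish_times_s: when each runner in the group finished, in seconds
    since the current run began. sleep_s: pause between runs.
    """
    if not finish_times_s:
        raise ValueError("need at least one runner")
    # The slowest runner gates the whole group.
    return max(finish_times_s) + sleep_s


if __name__ == "__main__":
    # Three runners finish quickly; one is stuck on a slow remote site.
    print(next_group_run_start([20, 25, 30, 600], sleep_s=60))
    # The same group without the slow runner.
    print(next_group_run_start([20, 25, 30], sleep_s=60))
```

One slow delivery moves the start of the next run from 90 seconds to 660 seconds in this example, even though three of the four runners were idle for most of that time.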

The maximum number of messages each queue runner processes

Nick Christenson's book, Sendmail Performance Tuning, says:

The MaxQueueRunSize parameter is normally not set by default. Once a queue runner sorts the queue, it will attempt to deliver only as many messages as given by the value of this parameter. For busy and clogged queues, messages at the end of a queue that could be delivered may not be, because the first N messages cannot currently be delivered. Therefore, this parameter rarely proves useful in practice. When it is used, it is likely to be less detrimental when used in conjunction with the "random" queue sorting strategy.

This is not a parameter I've ever used, so I'll be interested to see what the results are. It's also going to be relatively easy to test.
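The starvation effect described in the quote can be shown with a small simulation. This is a sketch under stated assumptions, not Sendmail's code: the queue is a plain list, the "deliverable" flag and both function names are hypothetical, and a shuffle stands in for the "random" sort order.

```python
import random


def build_clogged_queue(stuck, ok):
    """A sorted queue whose first `stuck` messages cannot be delivered."""
    blocked = [{"id": i, "deliverable": False} for i in range(stuck)]
    fine = [{"id": stuck + i, "deliverable": True} for i in range(ok)]
    return blocked + fine


def deliver_one_run(queue, max_run_size, randomize=False, rng=None):
    """Return the messages delivered in one simulated queue run.

    Only the first max_run_size entries of the (optionally shuffled)
    queue are attempted, mimicking a cap on the run size.
    """
    order = list(queue)
    if randomize:
        (rng or random).shuffle(order)
    attempted = order[:max_run_size]
    return [msg for msg in attempted if msg["deliverable"]]


if __name__ == "__main__":
    queue = build_clogged_queue(stuck=100, ok=100)
    in_order = deliver_one_run(queue, max_run_size=100)
    shuffled = deliver_one_run(queue, max_run_size=100, randomize=True,
                               rng=random.Random(33))
    print("sorted order delivered:", len(in_order))
    print("random order delivered:", len(shuffled))
```

With the cap equal to the number of stuck messages, the in-order run delivers nothing, while the shuffled run reaches some of the deliverable messages behind them.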

The queue sort order

I expect this is likely to make a big difference. To quote from day 8, there are seven possible orderings.

  1. Host - hostname of the first recipient

  2. Filename - name of the queue file

  3. Time - the time the message was first queued

  4. Random - random order

  5. Modification - modification time of the qf file, older files first

  6. Priority - message priority, a value calculated by Sendmail based on message size, number of delivery attempts, etc

  7. None - don't sort the messages

Since all the test messages have one recipient, and they're all going to the same host, I doubt that host ordering will have much effect. But the others all involve differing amounts of IO and CPU consumption to work out the ordering, and I hope that will show up in testing.
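As a rough illustration of the seven orderings listed above, here is a minimal sketch that sorts an in-memory list of messages. The field names, the `order_queue` function, and the choice to sort priority in ascending order are assumptions made for the example; Sendmail's own data structures and priority calculation are not reproduced here.

```python
import random

# Hypothetical per-message fields; these names are illustrative only.
SORT_KEYS = {
    "host": lambda m: m["host"],
    "filename": lambda m: m["filename"],
    "time": lambda m: m["queued_at"],
    "modification": lambda m: m["mtime"],  # older files first
    "priority": lambda m: m["priority"],   # assumed ascending
}


def order_queue(messages, sort_order, rng=None):
    """Return a new list of messages in the requested order."""
    if sort_order == "none":
        return list(messages)
    if sort_order == "random":
        shuffled = list(messages)
        (rng or random).shuffle(shuffled)
        return shuffled
    return sorted(messages, key=SORT_KEYS[sort_order])


if __name__ == "__main__":
    queue = [
        {"id": 1, "host": "b.example", "filename": "qfA",
         "queued_at": 300, "mtime": 310, "priority": 50},
        {"id": 2, "host": "a.example", "filename": "qfC",
         "queued_at": 100, "mtime": 400, "priority": 10},
        {"id": 3, "host": "c.example", "filename": "qfB",
         "queued_at": 200, "mtime": 150, "priority": 30},
    ]
    for name in ("host", "filename", "time", "modification", "priority"):
        print(name, [m["id"] for m in order_queue(queue, name)])
```

The sketch hides the cost difference that matters for testing: here every field is already in memory, whereas a real queue runner has to obtain each value from the filesystem or the queue file before it can sort.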

Whether the host cache is enabled

To quote from day 8 again:

The host cache is simply a shared area that queue runners use to store information about the last connection they made to a remote site. If the connection failed (perhaps it timed out) other queue runners can use this information to determine whether to connect. For example, you can configure queue runners not to try and reconnect to a remote site that didn’t respond within the last 10 minutes.

Initially I was thinking that I couldn't really test this, as all the messages are going to be relayed to one other host.

Then it occurred to me that although I couldn't test the benefit of the host cache, I can at least try and instrument the functions that check the host cache, and show how much overhead there is involved in checking the cache. That may (or may not) turn out to be interesting information.

And even if it's not useful in these results the functionality will probably be useful to someone else.
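The host cache behaviour quoted above can be sketched as follows. This is a minimal in-memory model, assuming a single process and a 10 minute timeout taken from the example in the quote. The `HostCache` class and its method names are hypothetical, and a dictionary stands in for the area that real queue runners share.

```python
class HostCache:
    """Toy record of the last connection result for each remote host."""

    def __init__(self, timeout_s=600):
        self.timeout_s = timeout_s
        self._last = {}  # host -> (succeeded, timestamp in seconds)

    def record(self, host, succeeded, now_s):
        """Store the outcome of the most recent connection attempt."""
        self._last[host] = (succeeded, now_s)

    def should_connect(self, host, now_s):
        """Decide whether a queue runner should try this host now."""
        entry = self._last.get(host)
        if entry is None:
            return True  # nothing known about this host yet
        succeeded, when = entry
        if succeeded:
            return True
        # Skip hosts that failed within the timeout window.
        return now_s - when >= self.timeout_s


if __name__ == "__main__":
    cache = HostCache(timeout_s=600)
    cache.record("mail.example.com", succeeded=False, now_s=0)
    print(cache.should_connect("mail.example.com", now_s=300))
    print(cache.should_connect("mail.example.com", now_s=600))
```

The lookup itself is cheap in this form; the overhead worth instrumenting in a real system comes from sharing that state between separate queue runner processes.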

So that gives me a few things to look at -- time to get on and do it.
