Day 41 of 60: Multiple queues, multiple queue runners (pt 2)

That's odd.



You may recall that my most recent tests have involved sending 30,000 messages, split over 1, 5, 10, 20, 30, and 40 queue directories respectively.

The Sendmail Operations Guide says:

2.3.1. Queue Groups and Queue Directories

There are one or more mail queues. Each mail queue belongs to a queue group. There is always a default queue group that is called "mqueue" (which is where messages go by default unless otherwise specified). The directory or directories which comprise the default queue group are specified by the QueueDirectory option.

[...]

When a message is placed in a queue group, and the queue group has more than one queue, a queue is selected randomly.


This suggests that a reference to a "queue" means a queue directory within a queue group. Later, the documentation says:

2.3.5. Forcing the queue

Sendmail should run the queue automatically at intervals. When using multiple queues, a separate process will by default be created to run each of the queues unless the queue run is initiated by a user with the verbose flag.


I take this to mean that if you have configured five queue directories then a queue run will create five processes, one for each of the queue directories, unless the -v flag is also given, in which case only one queue running process is created.

The book Sendmail Performance Tuning, by Nick Christenson (ISBN 0-321-11570-8), backs up this interpretation, and says:

3.4.2 Multiple Queues

[... text that makes it clear Nick is talking about multiple queues in the same sense that I am ...]

While this mechanism might seem little more than a fairly minor workaround for some filesystem's directory deficiencies, multiple queues offer one other advantage. One can gain effective parallelism from multiple queue runners. Recall that when sendmail starts, one parameter given to it on the command line (typically something like -q30m) specifies how often a queue runner should start. This parameter indicates that every 30 minutes, the master sendmail daemon will fork a child process; this child process scans the entire queue looking for messages to deliver, sorts the messages, and then sequentially tries to deliver them. With multiple queue, one queue runner process exists per queue directory. As most of the time during a queue run is spent waiting for responses from email servers across the Internet (or lack of responses -- remember that most items encountered by a queue runner were undeliverable in a previous attempt), it may take as little as 1/N times as long to complete a queue run over N queues as it does to attempt to deliver every item in a single queue, a significant speedup. This effort helps keep the total number of concurrently queued messages to a minimum. -- Pages 51-52
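To put numbers on the "1/N" claim in that passage, here is a minimal sketch of the idealised model it describes. It assumes one runner per queue directory, all working in parallel, and that waiting on remote servers accounts for the whole run. The function name and the 2-second average wait are hypothetical figures of mine, not measurements.

```python
def estimated_run_seconds(messages: int, avg_wait_s: float, queue_dirs: int) -> float:
    """Idealised queue-run time with one parallel runner per queue directory.

    Assumes the run is dominated by waiting on remote servers, so the
    total waiting time divides evenly across the runners.
    """
    if queue_dirs < 1:
        raise ValueError("need at least one queue directory")
    total_wait_s = messages * avg_wait_s
    return total_wait_s / queue_dirs


# The directory counts used in my tests; 2.0 seconds is a made-up average wait.
for n in (1, 5, 10, 20, 30, 40):
    print(n, estimated_run_seconds(30_000, 2.0, n))
```

If Sendmail behaved as documented, the run time should drop roughly in proportion to the number of queue directories.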


It seems that Sendmail does not behave like this. I've just run repeated tests, kicking off the queue runs by running

sendmail -q30s

No matter how many queue directories I have, Sendmail always creates one queue runner. Then 30 seconds later it creates a second. 30 seconds after that it creates a third. And so on. Each queue runner has to read the entire queue (no matter how many directories it's spread across), and attempt to lock each queue file in turn.
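The behaviour I actually saw can be written down as a tiny sketch: the number of queue runners started depends only on the elapsed time and the -q interval, not on the number of queue directories. The function name is mine, and this only models what I observed, not Sendmail's internals.

```python
def runners_started(elapsed_s: float, interval_s: float = 30.0) -> int:
    """Queue runners started so far: one at time zero, then one per interval.

    Note that the number of queue directories does not appear anywhere
    in this calculation, which is the point.
    """
    if elapsed_s < 0 or interval_s <= 0:
        raise ValueError("elapsed time must be >= 0 and interval must be > 0")
    return int(elapsed_s // interval_s) + 1


# With -q30s: one runner at 0s, a second at 30s, a third at 60s.
for t in (0, 29, 30, 65):
    print(t, runners_started(t))
```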

Obviously, queue runners that start later in the process have less work to do, because more queued mail has been delivered, so the size of the queue is smaller. But that pattern is the same no matter how many queue directories there are.

You can verify this by looking at the results for test 20, which follow the same format as those for test 19. If you run those through ministat you'll see no statistically significant differences for any of the results.
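If you don't have ministat to hand, the kind of comparison it makes can be roughed out in a few lines. As far as I understand it, ministat compares two samples with Student's t-test; the sketch below computes the pooled-variance t statistic for two lists of timings. The function name is mine, and you would still need to look the result up against a t table for the right degrees of freedom.

```python
import math
import statistics


def pooled_t_statistic(a, b):
    """Student's t statistic for two samples, using a pooled variance.

    Each sample needs at least two values, and the samples must not
    both have zero variance (that would mean dividing by zero).
    """
    na, nb = len(a), len(b)
    var_a = statistics.variance(a)  # sample variance (n - 1 denominator)
    var_b = statistics.variance(b)
    pooled_var = ((na - 1) * var_a + (nb - 1) * var_b) / (na + nb - 2)
    std_err = math.sqrt(pooled_var * (1 / na + 1 / nb))
    return (statistics.mean(b) - statistics.mean(a)) / std_err
```

A value close to zero, relative to the critical value for na + nb - 2 degrees of freedom, means the two sets of timings can't be told apart.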

I brought this up on the Sendmail mailing list, explaining that this doesn't match my experience in these tests, and John Beck replied, saying that it didn't match his experience either -- so it's not just me seeing these results. Googling for other references pointed me at this training slide from Gregory Shapiro, which tells the same story.

I did think that Sendmail might only start multiple queue runners if given the daemon option, -bd, so I also tried repeating the test by stopping Sendmail completely, and then running:

sendmail -bd -q30s

with five queue directories. But again, only one queue runner was created, not the expected five.

Accordingly, and despite the documentation, it would appear that spreading a queue over multiple directories offers no advantage when the queue is processed. I'm unclear at this point whether this is a Sendmail bug or a documentation bug.
