sendmail -q
to deliver 30,000 messages to a different zone. There were 10 runs to each test, and the different tests collected data on timings for 1, 5, 10, 20, 30, and 40 queue directories.Highlevel overview of Sendmail queue processing
Before diving in to the results it's worth looking at what Sendmail does when it runs the queue like this. To get specific data, I've written queue-run.d. This shows, at a fairly high level, what happens when Sendmail processes the queue.
Here's some sample output that it produced. In this test I had 5 queue directories. 4 of the directories were empty, while the first directory contained 2 messages (so 4 files in total). I've removed the timestamps and lock count columns from the output.
This first section of output:
16537 runqueue:queue-run-start
16537 runqueue:queue-run-end
16538 gatherq:queue-read-start
16538 Locked qfk7GD224j016534 'sh, nb' 1
16538 Locked qfk7GD224i016534 'sh, nb' 1
16538 gatherq:queue-read-end
shows that the
sendmail -q
process that I started was pid 16537. This process started the queue run, and almost immediately finished. However, it forked process 16538, and it's this process that's going to deliver the messages.Recall that there are 5 queue directories. The
queue-read-*
probes fire at the beginning and end of the section of code that processes queue directories, and we see here one queue directory being processed. 2 qf*
files have been found, and a shared ("sh"), nonblocking ("nb") lock has successfully ("1") been obtained for both files.This next section:
16538 gatherq:queue-read-start
16538 gatherq:queue-read-end
16538 gatherq:queue-read-start
16538 gatherq:queue-read-end
16538 gatherq:queue-read-start
16538 gatherq:queue-read-end
16538 gatherq:queue-read-start
16538 gatherq:queue-read-end
shows another four
queue-read-*
pairs, one for each of the remaining four queue directories. There are no queue files in these directories, so there's no work to be done.The queue runner now sorts the message it has found, to determine the order in which to deliver them.
16538 sortq:queue-sort-start
16538 sortq:queue-sort-end
This has generated a sorted list of work for a queue runner to perform, and one is now started.
16538 runner_work:queue-runner-start
16538 dowork:queue-runjob-start
16538 Locked qdir00/qfk7GD224j016534 'ex, nb' 1
16538 dowork:queue-runjob-end
Each queue entry is a job to be run, and here we see the
queue-runjob-*
probes firing. As part of the job-running process the qf*
file for this job is locked. It's an exclusive lock this time, instead of a shared lock. The queue runner then connects to the remote system and delivers the message.This is repeated for the second job in the queue, and with both messages delivered the queue runner ends.
16538 dowork:queue-runjob-start
16538 Locked qdir00/qfk7GD224i016534 'ex, nb' 1
16538 dowork:queue-runjob-end
16538 runner_work:queue-runner-end
Consequences
You can see from this that there are a number of high-level processes happening here.
- The queues have to be read (
queue-read-*
probes) - The queue entries have to be sorted (
queue-sort-*
probes) - A queue runner is started (
queue-runner-*
probes) - The queue runner runs one or more jobs (
queue-runjob-*
probes)
The time for all these stages might be affected by the number of queue directories, and this is what queue-run-duration.d reports on.
Result format
View raw data
The results are contained in multiple files, one per set of queue directories. They are CSV files, and look a little like this:
PID,queue-run,queue-read,queue-sort,queue-runner,queue-runjob
15510,0,428899305,6936775,40084251281,39863107039
15521,0,435772123,6547168,40390482231,40184604420
15509,32860952,0,0,0,0
15520,4862161,0,0,0,0
Here, PIDs 15510 and 15521 were forked to do the actual queue run. That's why that have time in all columns except
queue-run
. Conversely, PIDs 15509 and 15520 are the parents of these two, in which only the queue-run-*
probes fired.The
queue-run
times aren't especially interesting, as that's carried out by the first process before it forks. It's the other columns where differences in the number of queue directories might show up.Results for queue-read
This probe covers the time it takes to iterate through all the files in the queue. The results for different numbers of queue directories are:
x queue-read.01
+ queue-read.05
* queue-read.10
% queue-read.20
- queue-read.30
@ queue-read.40
: = Mean
M = Median
+----------------------------------------------------------+
|%-+%-%-%@%@@%@-*--*+-@ @ @@@@ x xx xx xx x x|
| |______M:________| |
| |_____:_____| |
| |____M:____| |
| |____M:_____| |
| |____M:______| |
| |________:_M_____| |
+----------------------------------------------------------+
That certainly looks as though there's a benefit to multiple queue directories. The range bars clearly show it somewhat to the right of the other ranges, and therefore taking longer.
The statistical results bear this out (nanoseconds have been converted to microseconds) in these figures.
N | Min | Max | Median | Mean | Stddev | |
---|---|---|---|---|---|---|
x | 10 | 428.89931 | 464.33992 | 443.42052 | 444.47453 | 9.7989508 |
+ | 10 | 402.45654 | 421.85465 | 411.10078 | 411.51274 | 6.8551837 |
Difference at 99.5% confidence | ||||||
-32.9618 +/- 13.652 | ||||||
-7.4159% +/- 3.07148% | ||||||
* | 10 | 405.4721 | 425.66973 | 416.3516 | 416.6921 | 6.0080299 |
Difference at 99.5% confidence | ||||||
-27.7824 +/- 13.1215 | ||||||
-6.25062% +/- 2.95215% | ||||||
% | 10 | 400.15261 | 424.28187 | 407.71897 | 409.16544 | 6.7259599 |
Difference at 99.5% confidence | ||||||
-35.3091 +/- 13.5679 | ||||||
-7.94401% +/- 3.05258% | ||||||
- | 10 | 401.18106 | 423.11366 | 410.45674 | 411.72426 | 7.1450633 |
Difference at 99.5% confidence | ||||||
-32.7503 +/- 13.8443 | ||||||
-7.36831% +/- 3.11476% | ||||||
@ | 10 | 409.59177 | 432.69872 | 425.02406 | 422.27599 | 9.1898042 |
Difference at 99.5% confidence | ||||||
-22.1985 +/- 15.336 | ||||||
-4.99433% +/- 3.45037% |
It would seem that, at least until 30 queue directories, there is a 6-7% reduction in the amount of time it takes to read the queue.
Results for queue-sort
This probe covers the time it takes to sort the queue items once they have been gathered. These results use the default Sendmail queue sort order, "Priority". The results for different numbers of queue directories are:
x queue-sort.01
+ queue-sort.05
* queue-sort.10
% queue-sort.20
- queue-sort.30
@ queue-sort.40
: = Mean
M = Median
+----------------------------------------------------------+
| *x - @@++ % |
|x x x ***-* - %%- x -@@-+-+%@ @ * *@- @ %-%+ %@@|
| |___________M_:_____________| |
| |___M___:_______| |
| |____M_______:_____________| |
| |_________M__:_____________| |
| |_________M_:___________| |
| |________M__:___________| |
+----------------------------------------------------------+
There's no apparent difference between the results on this plot, and the statistical summary from ministat bears this out.
N | Min | Max | Median | Mean | Stddev | |
---|---|---|---|---|---|---|
x | 10 | 6.492041 | 7.044234 | 6.721011 | 6.7525311 | 0.2109344 |
+ | 10 | 6.897024 | 7.236107 | 6.9306245 | 6.9814723 | 0.118952 |
No difference proven at 99.5% confidence | ||||||
* | 10 | 6.607956 | 7.097123 | 6.6564425 | 6.7786936 | 0.20168636 |
No difference proven at 99.5% confidence | ||||||
% | 10 | 6.765709 | 7.319717 | 6.9684315 | 7.0233506 | 0.19899698 |
No difference proven at 99.5% confidence | ||||||
- | 10 | 6.652388 | 7.212086 | 6.885209 | 6.913707 | 0.18356812 |
No difference proven at 99.5% confidence | ||||||
@ | 10 | 6.882107 | 7.346733 | 7.00444 | 7.0520978 | 0.17935072 |
No difference proven at 99.5% confidence |
Results for queue-runner
This probe covers the time it takes to deliver all the messages in the queue. The results for different numbers of queue directories are:
x queue-runner.01
+ queue-runner.05
* queue-runner.10
% queue-runner.20
- queue-runner.30
@ queue-runner.40
: = Mean
M = Median
+----------------------------------------------------------+
| - - %@ %** * ++++ xxx - @ |
| @@-%%-@@@@--** *+++ xxxxxx - - @ @|
| |_:| |
| |:| |
| |_:_| |
| |__:___| |
| |________M______:_______________| |
||__________M_________:_____________________| |
+----------------------------------------------------------+
Now these look quite interesting. There's a clear benefit to be had as queue directories are added, as can be seen by the gradual leftward shift of the range bars. Note, too, that the standard deviation slowly increases as the number of queue directories increases.
This continues until we get to 30 queue directories, and suddenly the range of values increases dramatically. This pattern holds for the 40 queue directory results too. This would seem to indicate that there's a factor that's negligible with 20 or fewer queue directories that suddenly becomes much more significant.
The statistical results show this in more detail.
N | Min | Max | Median | Mean | Stddev | |
---|---|---|---|---|---|---|
x | 10 | 39546.617 | 40595.753 | 40197.703 | 40163.428 | 338.53016 |
+ | 10 | 37840.309 | 38448.061 | 38164.578 | 38170.287 | 209.78723 |
Difference at 99.5% confidence | ||||||
-1993.14 +/- 454.65 | ||||||
-4.96258% +/- 1.132% | ||||||
* | 10 | 36540.079 | 37843.604 | 36954.964 | 37055.675 | 430.47168 |
Difference at 99.5% confidence | ||||||
-3107.75 +/- 625.175 | ||||||
-7.73777% +/- 1.55658% | ||||||
% | 10 | 35043.263 | 36947.376 | 35942.349 | 35993.725 | 697.82562 |
Difference at 99.5% confidence | ||||||
-4169.7 +/- 885.417 | ||||||
-10.3818% +/- 2.20454% | ||||||
- | 10 | 34620.743 | 42961.967 | 36161.556 | 37588.728 | 3451.5012 |
No difference proven at 99.5% confidence | ||||||
@ | 10 | 34589.025 | 46344.242 | 36245.368 | 38587.428 | 4649.2078 |
No difference proven at 99.5% confidence |
5, 10, or 20 queue directories makes the delivery process take 5%, 8%, and 10% faster respectively. But there's no significant benefit with 30 or 40 queue directories. Note that these results do not mean that 30 or 40 queue directories have the same benefit as 20 -- they mean that 30 or 40 queue directories offer no benefit over 1 queue directory.
Interesting. I wonder why.
Results for queue-runjob
This probe covers the time it takes to deliver all the messages in the queue. This is slightly different from the
queue-runner
probe. In this results, queue-runjob
probes will have fired for every message delivery. The times taken by these probes have been summed.The
queue-runner
probe only fires at the start and end of the whole process. So the totall cumulative time for all the queue-runjob
probes is going to be less than that for queue-runner
. These results may or may not show the same interesting features at 30 and 40 queue directores as queue-runner
.The results for different numbers of queue directories are:
x queue-runjob.01
+ queue-runjob.05
* queue-runjob.10
% queue-runjob.20
- queue-runjob.30
@ queue-runjob.40
: = Mean
M = Median
+----------------------------------------------------------+
| - - %@ %** * ++++ xxx - |
| @@-%%-@@@@--** *+++ xxxxx - - @@ @|
| |_:| |
| :| |
| |_:_| |
| |__:___| |
| |________M______:_______________| |
||__________M_________:_____________________| |
+----------------------------------------------------------+
Well, that answers that question. This looks very similar to plot for
queue-runner
, which would imply that whatever is taking additional time in the 30 and 40 queue directory case is bounded by the queue-runjob
probes.The statistical results are similar to
queue-runner
too.N | Min | Max | Median | Mean | Stddev | |
---|---|---|---|---|---|---|
x | 10 | 39351.85 | 40370.584 | 39973.804 | 39950.51 | 330.588 |
+ | 10 | 37657.682 | 38238.619 | 37973.41 | 37977.145 | 204.91243 |
Difference at 99.5% confidence | ||||||
-1973.37 +/- 444.012 | ||||||
-4.93953% +/- 1.1114% | ||||||
* | 10 | 36351.686 | 37631.44 | 36764.907 | 36857.071 | 419.81903 |
Difference at 99.5% confidence | ||||||
-3093.44 +/- 610.011 | ||||||
-7.74318% +/- 1.52692% | ||||||
% | 10 | 34857.73 | 36736.394 | 35747.589 | 35795.814 | 684.95635 |
Difference at 99.5% confidence | ||||||
-4154.7 +/- 868.243 | ||||||
-10.3996% +/- 2.1733% | ||||||
- | 10 | 34440.561 | 42751.05 | 35948.914 | 37393.308 | 3445.2905 |
No difference proven at 99.5% confidence | ||||||
@ | 10 | 34408.408 | 46132.48 | 36035.342 | 38390.119 | 4648.62 |
No difference proven at 99.5% confidence |
There are similar percentage improvements in the 5, 10, and 20 queue directory cases, and then 30 and 40 queue directories completely lose that advantage, and become as slow as a single queue directory.
Conclusions
Please read the disclaimer in conjunction with these conclusions.
When you have a single process running the queue then you can expect that multiple queue directories will have the following effects:
- The time taken to read the queue will be reduced by 6-7%
- With the default queue sort order of "Priority" there is no significant performance gain or loss when using multiple queues.
- Additional queue directories make the delivery processes run faster, up to a point. There's a point somewhere between 20 and 30 queue directories, where additional queue directories is a step backwards, and any benefit from having them is lost.
Hmmm, I'm going to have to read more carefully. I was convinced I was looking at nethack screens, not stats, for a moment. :-)
ReplyDeleteHeh. No amulets here.
ReplyDelete[...] You can verify this by looking at the results for test 20, which follow the same format as those for test 19. If you run those through ministat you’ll see no statistically significant differences for any of the results. [...]
ReplyDelete