Day 38 of 60: Multiple queues, one queue runner

Today I'm looking at the results that I've obtained from the latest round of tests. These tests used sendmail -q to deliver 30,000 messages to a different zone. There were 10 runs to each test, and the different tests collected data on timings for 1, 5, 10, 20, 30, and 40 queue directories.



Highlevel overview of Sendmail queue processing



Before diving in to the results it's worth looking at what Sendmail does when it runs the queue like this. To get specific data, I've written queue-run.d. This shows, at a fairly high level, what happens when Sendmail processes the queue.

Here's some sample output that it produced. In this test I had 5 queue directories. 4 of the directories were empty, while the first directory contained 2 messages (so 4 files in total). I've removed the timestamps and lock count columns from the output.

This first section of output:

16537 runqueue:queue-run-start
16537 runqueue:queue-run-end
16538 gatherq:queue-read-start
16538 Locked qfk7GD224j016534 'sh, nb' 1
16538 Locked qfk7GD224i016534 'sh, nb' 1
16538 gatherq:queue-read-end


shows that the sendmail -q process that I started was pid 16537. This process started the queue run, and almost immediately finished. However, it forked process 16538, and it's this process that's going to deliver the messages.

Recall that there are 5 queue directories. The queue-read-* probes fire at the beginning and end of the section of code that processes queue directories, and we see here one queue directory being processed. 2 qf* files have been found, and a shared ("sh"), nonblocking ("nb") lock has successfully ("1") been obtained for both files.

This next section:

16538 gatherq:queue-read-start
16538 gatherq:queue-read-end
16538 gatherq:queue-read-start
16538 gatherq:queue-read-end
16538 gatherq:queue-read-start
16538 gatherq:queue-read-end
16538 gatherq:queue-read-start
16538 gatherq:queue-read-end


shows another four queue-read-* pairs, one for each of the remaining four queue directories. There are no queue files in these directories, so there's no work to be done.

The queue runner now sorts the message it has found, to determine the order in which to deliver them.

16538 sortq:queue-sort-start
16538 sortq:queue-sort-end


This has generated a sorted list of work for a queue runner to perform, and one is now started.

16538 runner_work:queue-runner-start
16538 dowork:queue-runjob-start
16538 Locked qdir00/qfk7GD224j016534 'ex, nb' 1
16538 dowork:queue-runjob-end


Each queue entry is a job to be run, and here we see the queue-runjob-* probes firing. As part of the job-running process the qf* file for this job is locked. It's an exclusive lock this time, instead of a shared lock. The queue runner then connects to the remote system and delivers the message.

This is repeated for the second job in the queue, and with both messages delivered the queue runner ends.

16538 dowork:queue-runjob-start
16538 Locked qdir00/qfk7GD224i016534 'ex, nb' 1
16538 dowork:queue-runjob-end
16538 runner_work:queue-runner-end


Consequences



You can see from this that there are a number of high-level processes happening here.


  • The queues have to be read (queue-read-* probes)

  • The queue entries have to be sorted (queue-sort-* probes)

  • A queue runner is started (queue-runner-* probes)
  • The queue runner runs one or more jobs (queue-runjob-* probes)



The time for all these stages might be affected by the number of queue directories, and this is what queue-run-duration.d reports on.

Result format



View raw data

The results are contained in multiple files, one per set of queue directories. They are CSV files, and look a little like this:

PID,queue-run,queue-read,queue-sort,queue-runner,queue-runjob
15510,0,428899305,6936775,40084251281,39863107039
15521,0,435772123,6547168,40390482231,40184604420
15509,32860952,0,0,0,0
15520,4862161,0,0,0,0


Here, PIDs 15510 and 15521 were forked to do the actual queue run. That's why that have time in all columns except queue-run. Conversely, PIDs 15509 and 15520 are the parents of these two, in which only the queue-run-* probes fired.

The queue-run times aren't especially interesting, as that's carried out by the first process before it forks. It's the other columns where differences in the number of queue directories might show up.

Results for queue-read



This probe covers the time it takes to iterate through all the files in the queue. The results for different numbers of queue directories are:

x queue-read.01
+ queue-read.05
* queue-read.10
% queue-read.20
- queue-read.30
@ queue-read.40
: = Mean
M = Median
+----------------------------------------------------------+
|%-+%-%-%@%@@%@-*--*+-@ @ @@@@ x xx xx xx x x|
| |______M:________| |
| |_____:_____| |
| |____M:____| |
| |____M:_____| |
| |____M:______| |
| |________:_M_____| |
+----------------------------------------------------------+


That certainly looks as though there's a benefit to multiple queue directories. The range bars clearly show it somewhat to the right of the other ranges, and therefore taking longer.

The statistical results bear this out (nanoseconds have been converted to microseconds) in these figures.































NMinMaxMedianMeanStddev
x10428.89931464.33992443.42052444.474539.7989508
+10402.45654421.85465411.10078411.512746.8551837
Difference at 99.5% confidence
-32.9618 +/- 13.652
-7.4159% +/- 3.07148%
*10405.4721425.66973416.3516416.69216.0080299
Difference at 99.5% confidence
-27.7824 +/- 13.1215
-6.25062% +/- 2.95215%
%10400.15261424.28187407.71897409.165446.7259599
Difference at 99.5% confidence
-35.3091 +/- 13.5679
-7.94401% +/- 3.05258%
-10401.18106423.11366410.45674411.724267.1450633
Difference at 99.5% confidence
-32.7503 +/- 13.8443
-7.36831% +/- 3.11476%
@10409.59177432.69872425.02406422.275999.1898042
Difference at 99.5% confidence
-22.1985 +/- 15.336
-4.99433% +/- 3.45037%


It would seem that, at least until 30 queue directories, there is a 6-7% reduction in the amount of time it takes to read the queue.

Results for queue-sort



This probe covers the time it takes to sort the queue items once they have been gathered. These results use the default Sendmail queue sort order, "Priority". The results for different numbers of queue directories are:

x queue-sort.01
+ queue-sort.05
* queue-sort.10
% queue-sort.20
- queue-sort.30
@ queue-sort.40
: = Mean
M = Median
+----------------------------------------------------------+
| *x - @@++ % |
|x x x ***-* - %%- x -@@-+-+%@ @ * *@- @ %-%+ %@@|
| |___________M_:_____________| |
| |___M___:_______| |
| |____M_______:_____________| |
| |_________M__:_____________| |
| |_________M_:___________| |
| |________M__:___________| |
+----------------------------------------------------------+


There's no apparent difference between the results on this plot, and the statistical summary from ministat bears this out.





















NMinMaxMedianMeanStddev
x106.4920417.0442346.7210116.75253110.2109344
+106.8970247.2361076.93062456.98147230.118952
No difference proven at 99.5% confidence
*106.6079567.0971236.65644256.77869360.20168636
No difference proven at 99.5% confidence
%106.7657097.3197176.96843157.02335060.19899698
No difference proven at 99.5% confidence
-106.6523887.2120866.8852096.9137070.18356812
No difference proven at 99.5% confidence
@106.8821077.3467337.004447.05209780.17935072
No difference proven at 99.5% confidence


Results for queue-runner



This probe covers the time it takes to deliver all the messages in the queue. The results for different numbers of queue directories are:

x queue-runner.01
+ queue-runner.05
* queue-runner.10
% queue-runner.20
- queue-runner.30
@ queue-runner.40
: = Mean
M = Median
+----------------------------------------------------------+
| - - %@ %** * ++++ xxx - @ |
| @@-%%-@@@@--** *+++ xxxxxx - - @ @|
| |_:| |
| |:| |
| |_:_| |
| |__:___| |
| |________M______:_______________| |
||__________M_________:_____________________| |
+----------------------------------------------------------+


Now these look quite interesting. There's a clear benefit to be had as queue directories are added, as can be seen by the gradual leftward shift of the range bars. Note, too, that the standard deviation slowly increases as the number of queue directories increases.

This continues until we get to 30 queue directories, and suddenly the range of values increases dramatically. This pattern holds for the 40 queue directory results too. This would seem to indicate that there's a factor that's negligible with 20 or fewer queue directories that suddenly becomes much more significant.

The statistical results show this in more detail.



























NMinMaxMedianMeanStddev
x1039546.61740595.75340197.70340163.428338.53016
+1037840.30938448.06138164.57838170.287209.78723
Difference at 99.5% confidence
-1993.14 +/- 454.65
-4.96258% +/- 1.132%
*1036540.07937843.60436954.96437055.675430.47168
Difference at 99.5% confidence
-3107.75 +/- 625.175
-7.73777% +/- 1.55658%
%1035043.26336947.37635942.34935993.725697.82562
Difference at 99.5% confidence
-4169.7 +/- 885.417
-10.3818% +/- 2.20454%
-1034620.74342961.96736161.55637588.7283451.5012
No difference proven at 99.5% confidence
@1034589.02546344.24236245.36838587.4284649.2078
No difference proven at 99.5% confidence


5, 10, or 20 queue directories makes the delivery process take 5%, 8%, and 10% faster respectively. But there's no significant benefit with 30 or 40 queue directories. Note that these results do not mean that 30 or 40 queue directories have the same benefit as 20 -- they mean that 30 or 40 queue directories offer no benefit over 1 queue directory.

Interesting. I wonder why.

Results for queue-runjob



This probe covers the time it takes to deliver all the messages in the queue. This is slightly different from the queue-runner probe. In this results, queue-runjob probes will have fired for every message delivery. The times taken by these probes have been summed.

The queue-runner probe only fires at the start and end of the whole process. So the totall cumulative time for all the queue-runjob probes is going to be less than that for queue-runner. These results may or may not show the same interesting features at 30 and 40 queue directores as queue-runner.

The results for different numbers of queue directories are:

x queue-runjob.01
+ queue-runjob.05
* queue-runjob.10
% queue-runjob.20
- queue-runjob.30
@ queue-runjob.40
: = Mean
M = Median
+----------------------------------------------------------+
| - - %@ %** * ++++ xxx - |
| @@-%%-@@@@--** *+++ xxxxx - - @@ @|
| |_:| |
| :| |
| |_:_| |
| |__:___| |
| |________M______:_______________| |
||__________M_________:_____________________| |
+----------------------------------------------------------+


Well, that answers that question. This looks very similar to plot for queue-runner, which would imply that whatever is taking additional time in the 30 and 40 queue directory case is bounded by the queue-runjob probes.

The statistical results are similar to queue-runner too.


























NMinMaxMedianMeanStddev
x1039351.8540370.58439973.80439950.51330.588
+1037657.68238238.61937973.4137977.145204.91243
Difference at 99.5% confidence
-1973.37 +/- 444.012
-4.93953% +/- 1.1114%
*1036351.68637631.4436764.90736857.071419.81903
Difference at 99.5% confidence
-3093.44 +/- 610.011
-7.74318% +/- 1.52692%
%1034857.7336736.39435747.58935795.814684.95635
Difference at 99.5% confidence
-4154.7 +/- 868.243
-10.3996% +/- 2.1733%
-1034440.56142751.0535948.91437393.3083445.2905
No difference proven at 99.5% confidence
@1034408.40846132.4836035.34238390.1194648.62
No difference proven at 99.5% confidence


There are similar percentage improvements in the 5, 10, and 20 queue directory cases, and then 30 and 40 queue directories completely lose that advantage, and become as slow as a single queue directory.

Conclusions



Please read the disclaimer in conjunction with these conclusions.

When you have a single process running the queue then you can expect that multiple queue directories will have the following effects:


  • The time taken to read the queue will be reduced by 6-7%

  • With the default queue sort order of "Priority" there is no significant performance gain or loss when using multiple queues.

  • Additional queue directories make the delivery processes run faster, up to a point. There's a point somewhere between 20 and 30 queue directories, where additional queue directories is a step backwards, and any benefit from having them is lost.

3 comments:

  1. Hmmm, I'm going to have to read more carefully. I was convinced I was looking at nethack screens, not stats, for a moment. :-)

    ReplyDelete
  2. [...] You can verify this by looking at the results for test 20, which follow the same format as those for test 19. If you run those through ministat you’ll see no statistically significant differences for any of the results. [...]

    ReplyDelete