Try before you buy: Day 38 of 60: Multiple queues, one queue runner

Today I'm looking at the results that I've obtained from the latest round of tests. These tests used sendmail -q to deliver 30,000 messages to a different zone. There were 10 runs to each test, and the different tests collected data on timings for 1, 5, 10, 20, 30, and 40 queue directories.

Highlevel overview of Sendmail queue processing

Before diving in to the results it's worth looking at what Sendmail does when it runs the queue like this. To get specific data, I've written queue-run.d. This shows, at a fairly high level, what happens when Sendmail processes the queue.

Here's some sample output that it produced. In this test I had 5 queue directories. 4 of the directories were empty, while the first directory contained 2 messages (so 4 files in total). I've removed the timestamps and lock count columns from the output.

This first section of output:

16537 runqueue:queue-run-start
16537 runqueue:queue-run-end
16538 gatherq:queue-read-start
16538 Locked qfk7GD224j016534           'sh, nb'   1
16538 Locked qfk7GD224i016534           'sh, nb'   1
16538 gatherq:queue-read-end

shows that the sendmail -q process that I started was pid 16537. This process started the queue run, and almost immediately finished. However, it forked process 16538, and it's this process that's going to deliver the messages.

Recall that there are 5 queue directories. The queue-read-* probes fire at the beginning and end of the section of code that processes queue directories, and we see here one queue directory being processed. 2 qf* files have been found, and a shared ("sh"), nonblocking ("nb") lock has successfully ("1") been obtained for both files.

This next section:

16538 gatherq:queue-read-start
16538 gatherq:queue-read-end
16538 gatherq:queue-read-start
16538 gatherq:queue-read-end
16538 gatherq:queue-read-start
16538 gatherq:queue-read-end
16538 gatherq:queue-read-start
16538 gatherq:queue-read-end

shows another four queue-read-* pairs, one for each of the remaining four queue directories. There are no queue files in these directories, so there's no work to be done.

The queue runner now sorts the message it has found, to determine the order in which to deliver them.

16538 sortq:queue-sort-start
16538 sortq:queue-sort-end

This has generated a sorted list of work for a queue runner to perform, and one is now started.

16538 runner_work:queue-runner-start
16538 dowork:queue-runjob-start
16538 Locked qdir00/qfk7GD224j016534    'ex, nb'   1
16538 dowork:queue-runjob-end

Each queue entry is a job to be run, and here we see the queue-runjob-* probes firing. As part of the job-running process the qf* file for this job is locked. It's an exclusive lock this time, instead of a shared lock. The queue runner then connects to the remote system and delivers the message.

This is repeated for the second job in the queue, and with both messages delivered the queue runner ends.

16538 dowork:queue-runjob-start
16538 Locked qdir00/qfk7GD224i016534    'ex, nb'   1
16538 dowork:queue-runjob-end
16538 runner_work:queue-runner-end

Consequences

You can see from this that there are a number of high-level processes happening here.

The queues have to be read (queue-read-* probes)

The queue entries have to be sorted (queue-sort-* probes)

A queue runner is started (queue-runner-* probes)
The queue runner runs one or more jobs (queue-runjob-* probes)

The time for all these stages might be affected by the number of queue directories, and this is what queue-run-duration.d reports on.

Result format

View raw data

The results are contained in multiple files, one per set of queue directories. They are CSV files, and look a little like this:

PID,queue-run,queue-read,queue-sort,queue-runner,queue-runjob
15510,0,428899305,6936775,40084251281,39863107039
15521,0,435772123,6547168,40390482231,40184604420
15509,32860952,0,0,0,0
15520,4862161,0,0,0,0

Here, PIDs 15510 and 15521 were forked to do the actual queue run. That's why that have time in all columns except queue-run. Conversely, PIDs 15509 and 15520 are the parents of these two, in which only the queue-run-* probes fired.

The queue-run times aren't especially interesting, as that's carried out by the first process before it forks. It's the other columns where differences in the number of queue directories might show up.

Results for `queue-read`

This probe covers the time it takes to iterate through all the files in the queue. The results for different numbers of queue directories are:

x queue-read.01
+ queue-read.05
* queue-read.10
% queue-read.20
- queue-read.30
@ queue-read.40
: = Mean
M = Median
+----------------------------------------------------------+
|%-+%-%-%@%@@%@-*--*+-@ @  @@@@  x  xx xx xx     x        x|
|                               |______M:________|         |
|    |_____:_____|                                         |
|         |____M:____|                                     |
|  |____M:_____|                                           |
|    |____M:______|                                        |
|           |________:_M_____|                             |
+----------------------------------------------------------+

That certainly looks as though there's a benefit to multiple queue directories. The range bars clearly show it somewhat to the right of the other ranges, and therefore taking longer.

The statistical results bear this out (nanoseconds have been converted to microseconds) in these figures.

	N	Min	Max	Median	Mean	Stddev
x	10	428.89931	464.33992	443.42052	444.47453	9.7989508
+	10	402.45654	421.85465	411.10078	411.51274	6.8551837
Difference at 99.5% confidence
		-32.9618 +/- 13.652
		-7.4159% +/- 3.07148%
*	10	405.4721	425.66973	416.3516	416.6921	6.0080299
Difference at 99.5% confidence
		-27.7824 +/- 13.1215
		-6.25062% +/- 2.95215%
%	10	400.15261	424.28187	407.71897	409.16544	6.7259599
Difference at 99.5% confidence
		-35.3091 +/- 13.5679
		-7.94401% +/- 3.05258%
-	10	401.18106	423.11366	410.45674	411.72426	7.1450633
Difference at 99.5% confidence
		-32.7503 +/- 13.8443
		-7.36831% +/- 3.11476%
@	10	409.59177	432.69872	425.02406	422.27599	9.1898042
Difference at 99.5% confidence
		-22.1985 +/- 15.336
		-4.99433% +/- 3.45037%

It would seem that, at least until 30 queue directories, there is a 6-7% reduction in the amount of time it takes to read the queue.

Results for `queue-sort`

This probe covers the time it takes to sort the queue items once they have been gathered. These results use the default Sendmail queue sort order, "Priority". The results for different numbers of queue directories are:

x queue-sort.01
+ queue-sort.05
* queue-sort.10
% queue-sort.20
- queue-sort.30
@ queue-sort.40
: = Mean
M = Median
+----------------------------------------------------------+
|        *x          -     @@++  %                         |
|x x x   ***-*   - %%- x  -@@-+-+%@ @ *  *@-  @ %-%+    %@@|
|   |___________M_:_____________|                          |
|                         |___M___:_______|                |
|      |____M_______:_____________|                        |
|                      |_________M__:_____________|        |
|                |_________M_:___________|                 |
|                         |________M__:___________|        |
+----------------------------------------------------------+

There's no apparent difference between the results on this plot, and the statistical summary from ministat bears this out.

	N	Min	Max	Median	Mean	Stddev
x	10	6.492041	7.044234	6.721011	6.7525311	0.2109344
+	10	6.897024	7.236107	6.9306245	6.9814723	0.118952
No difference proven at 99.5% confidence
*	10	6.607956	7.097123	6.6564425	6.7786936	0.20168636
No difference proven at 99.5% confidence
%	10	6.765709	7.319717	6.9684315	7.0233506	0.19899698
No difference proven at 99.5% confidence
-	10	6.652388	7.212086	6.885209	6.913707	0.18356812
No difference proven at 99.5% confidence
@	10	6.882107	7.346733	7.00444	7.0520978	0.17935072
No difference proven at 99.5% confidence

Results for `queue-runner`

This probe covers the time it takes to deliver all the messages in the queue. The results for different numbers of queue directories are:

x queue-runner.01
+ queue-runner.05
* queue-runner.10
% queue-runner.20
- queue-runner.30
@ queue-runner.40
: = Mean
M = Median
+----------------------------------------------------------+
|   - -  %@  %** * ++++      xxx          -       @        |
|   @@-%%-@@@@--** *+++    xxxxxx   -     -       @       @|
|                           |_:|                           |
|                  |:|                                     |
|            |_:_|                                         |
|      |__:___|                                            |
| |________M______:_______________|                        |
||__________M_________:_____________________|              |
+----------------------------------------------------------+

Now these look quite interesting. There's a clear benefit to be had as queue directories are added, as can be seen by the gradual leftward shift of the range bars. Note, too, that the standard deviation slowly increases as the number of queue directories increases.

This continues until we get to 30 queue directories, and suddenly the range of values increases dramatically. This pattern holds for the 40 queue directory results too. This would seem to indicate that there's a factor that's negligible with 20 or fewer queue directories that suddenly becomes much more significant.

The statistical results show this in more detail.

	N	Min	Max	Median	Mean	Stddev
x	10	39546.617	40595.753	40197.703	40163.428	338.53016
+	10	37840.309	38448.061	38164.578	38170.287	209.78723
Difference at 99.5% confidence
		-1993.14 +/- 454.65
		-4.96258% +/- 1.132%
*	10	36540.079	37843.604	36954.964	37055.675	430.47168
Difference at 99.5% confidence
		-3107.75 +/- 625.175
		-7.73777% +/- 1.55658%
%	10	35043.263	36947.376	35942.349	35993.725	697.82562
Difference at 99.5% confidence
		-4169.7 +/- 885.417
		-10.3818% +/- 2.20454%
-	10	34620.743	42961.967	36161.556	37588.728	3451.5012
No difference proven at 99.5% confidence
@	10	34589.025	46344.242	36245.368	38587.428	4649.2078
No difference proven at 99.5% confidence

5, 10, or 20 queue directories makes the delivery process take 5%, 8%, and 10% faster respectively. But there's no significant benefit with 30 or 40 queue directories. Note that these results do not mean that 30 or 40 queue directories have the same benefit as 20 -- they mean that 30 or 40 queue directories offer no benefit over 1 queue directory.

Interesting. I wonder why.

Results for `queue-runjob`

This probe covers the time it takes to deliver all the messages in the queue. This is slightly different from the queue-runner probe. In this results, queue-runjob probes will have fired for every message delivery. The times taken by these probes have been summed.

The queue-runner probe only fires at the start and end of the whole process. So the totall cumulative time for all the queue-runjob probes is going to be less than that for queue-runner. These results may or may not show the same interesting features at 30 and 40 queue directores as queue-runner.

The results for different numbers of queue directories are:

x queue-runjob.01
+ queue-runjob.05
* queue-runjob.10
% queue-runjob.20
- queue-runjob.30
@ queue-runjob.40
: = Mean
M = Median
+----------------------------------------------------------+
|   - -  %@  %** * ++++      xxx          -                |
|   @@-%%-@@@@--** *+++    xxxxx    -     -       @@      @|
|                           |_:|                           |
|                   :|                                     |
|            |_:_|                                         |
|      |__:___|                                            |
| |________M______:_______________|                        |
||__________M_________:_____________________|              |
+----------------------------------------------------------+

Well, that answers that question. This looks very similar to plot for queue-runner, which would imply that whatever is taking additional time in the 30 and 40 queue directory case is bounded by the queue-runjob probes.

The statistical results are similar to queue-runner too.

	N	Min	Max	Median	Mean	Stddev
x	10	39351.85	40370.584	39973.804	39950.51	330.588
+	10	37657.682	38238.619	37973.41	37977.145	204.91243
Difference at 99.5% confidence
		-1973.37 +/- 444.012
		-4.93953% +/- 1.1114%
*	10	36351.686	37631.44	36764.907	36857.071	419.81903
Difference at 99.5% confidence
		-3093.44 +/- 610.011
		-7.74318% +/- 1.52692%
%	10	34857.73	36736.394	35747.589	35795.814	684.95635
Difference at 99.5% confidence
		-4154.7 +/- 868.243
		-10.3996% +/- 2.1733%
-	10	34440.561	42751.05	35948.914	37393.308	3445.2905
No difference proven at 99.5% confidence
@	10	34408.408	46132.48	36035.342	38390.119	4648.62
No difference proven at 99.5% confidence

There are similar percentage improvements in the 5, 10, and 20 queue directory cases, and then 30 and 40 queue directories completely lose that advantage, and become as slow as a single queue directory.

Conclusions

Please read the disclaimer in conjunction with these conclusions.

When you have a single process running the queue then you can expect that multiple queue directories will have the following effects:

The time taken to read the queue will be reduced by 6-7%

With the default queue sort order of "Priority" there is no significant performance gain or loss when using multiple queues.

Additional queue directories make the delivery processes run faster, up to a point. There's a point somewhere between 20 and 30 queue directories, where additional queue directories is a step backwards, and any benefit from having them is lost.

3 comments:

AnonymousAugust 16, 2006 at 11:06 PM
Hmmm, I'm going to have to read more carefully. I was convinced I was looking at nethack screens, not stats, for a moment. :-)
AnonymousAugust 17, 2006 at 1:19 AM
Heh. No amulets here.
AnonymousAugust 19, 2006 at 10:41 PM
[...] You can verify this by looking at the results for test 20, which follow the same format as those for test 19. If you run those through ministat you’ll see no statistically significant differences for any of the results. [...]

Try before you buy

Pages

Day 38 of 60: Multiple queues, one queue runner

Highlevel overview of Sendmail queue processing

Consequences

Result format

Results for `queue-read`

Results for `queue-sort`

Results for `queue-runner`

Results for `queue-runjob`

Conclusions

3 comments:

About Me

Followers

Pages

Day 38 of 60: Multiple queues, one queue runner

Highlevel overview of Sendmail queue processing

Consequences

Result format

Results for queue-read

Results for queue-sort

Results for queue-runner

Results for queue-runjob

Conclusions

3 comments:

Results for `queue-read`

Results for `queue-sort`

Results for `queue-runner`

Results for `queue-runjob`