`sendmail -q`

to deliver 30,000 messages to a different zone. There were 10 runs to each test, and the different tests collected data on timings for 1, 5, 10, 20, 30, and 40 queue directories.## Highlevel overview of Sendmail queue processing

Before diving in to the results it's worth looking at what Sendmail does when it runs the queue like this. To get specific data, I've written queue-run.d. This shows, at a fairly high level, what happens when Sendmail processes the queue.

Here's some sample output that it produced. In this test I had 5 queue directories. 4 of the directories were empty, while the first directory contained 2 messages (so 4 files in total). I've removed the timestamps and lock count columns from the output.

This first section of output:

`16537 runqueue:queue-run-start`

16537 runqueue:queue-run-end

16538 gatherq:queue-read-start

16538 Locked qfk7GD224j016534 'sh, nb' 1

16538 Locked qfk7GD224i016534 'sh, nb' 1

16538 gatherq:queue-read-end

shows that the

`sendmail -q`

process that I started was pid 16537. This process started the queue run, and almost immediately finished. However, it forked process 16538, and it's this process that's going to deliver the messages.Recall that there are 5 queue directories. The

`queue-read-*`

probes fire at the beginning and end of the section of code that processes queue directories, and we see here one queue directory being processed. 2 `qf*`

files have been found, and a shared ("sh"), nonblocking ("nb") lock has successfully ("1") been obtained for both files.This next section:

`16538 gatherq:queue-read-start`

16538 gatherq:queue-read-end

16538 gatherq:queue-read-start

16538 gatherq:queue-read-end

16538 gatherq:queue-read-start

16538 gatherq:queue-read-end

16538 gatherq:queue-read-start

16538 gatherq:queue-read-end

shows another four

`queue-read-*`

pairs, one for each of the remaining four queue directories. There are no queue files in these directories, so there's no work to be done.The queue runner now sorts the message it has found, to determine the order in which to deliver them.

`16538 sortq:queue-sort-start`

16538 sortq:queue-sort-end

This has generated a sorted list of work for a queue runner to perform, and one is now started.

`16538 runner_work:queue-runner-start`

16538 dowork:queue-runjob-start

16538 Locked qdir00/qfk7GD224j016534 'ex, nb' 1

16538 dowork:queue-runjob-end

Each queue entry is a job to be run, and here we see the

`queue-runjob-*`

probes firing. As part of the job-running process the `qf*`

file for this job is locked. It's an exclusive lock this time, instead of a shared lock. The queue runner then connects to the remote system and delivers the message.This is repeated for the second job in the queue, and with both messages delivered the queue runner ends.

`16538 dowork:queue-runjob-start`

16538 Locked qdir00/qfk7GD224i016534 'ex, nb' 1

16538 dowork:queue-runjob-end

16538 runner_work:queue-runner-end

## Consequences

You can see from this that there are a number of high-level processes happening here.

- The queues have to be read (
`queue-read-*`

probes) - The queue entries have to be sorted (
`queue-sort-*`

probes) - A queue runner is started (
`queue-runner-*`

probes) - The queue runner runs one or more jobs (
`queue-runjob-*`

probes)

The time for all these stages might be affected by the number of queue directories, and this is what queue-run-duration.d reports on.

## Result format

View raw data

The results are contained in multiple files, one per set of queue directories. They are CSV files, and look a little like this:

`PID,queue-run,queue-read,queue-sort,queue-runner,queue-runjob`

15510,0,428899305,6936775,40084251281,39863107039

15521,0,435772123,6547168,40390482231,40184604420

15509,32860952,0,0,0,0

15520,4862161,0,0,0,0

Here, PIDs 15510 and 15521 were forked to do the actual queue run. That's why that have time in all columns except

`queue-run`

. Conversely, PIDs 15509 and 15520 are the parents of these two, in which only the `queue-run-*`

probes fired.The

`queue-run`

times aren't especially interesting, as that's carried out by the first process before it forks. It's the other columns where differences in the number of queue directories might show up.## Results for `queue-read`

This probe covers the time it takes to iterate through all the files in the queue. The results for different numbers of queue directories are:

`x queue-read.01`

+ queue-read.05

* queue-read.10

% queue-read.20

- queue-read.30

@ queue-read.40

: = Mean

M = Median

+----------------------------------------------------------+

|%-+%-%-%@%@@%@-*--*+-@ @ @@@@ x xx xx xx x x|

| |______M:________| |

| |_____:_____| |

| |____M:____| |

| |____M:_____| |

| |____M:______| |

| |________:_M_____| |

+----------------------------------------------------------+

That certainly looks as though there's a benefit to multiple queue directories. The range bars clearly show it somewhat to the right of the other ranges, and therefore taking longer.

The statistical results bear this out (nanoseconds have been converted to microseconds) in these figures.

N | Min | Max | Median | Mean | Stddev | |
---|---|---|---|---|---|---|

x | 10 | 428.89931 | 464.33992 | 443.42052 | 444.47453 | 9.7989508 |

+ | 10 | 402.45654 | 421.85465 | 411.10078 | 411.51274 | 6.8551837 |

Difference at 99.5% confidence | ||||||

-32.9618 +/- 13.652 | ||||||

-7.4159% +/- 3.07148% | ||||||

* | 10 | 405.4721 | 425.66973 | 416.3516 | 416.6921 | 6.0080299 |

Difference at 99.5% confidence | ||||||

-27.7824 +/- 13.1215 | ||||||

-6.25062% +/- 2.95215% | ||||||

% | 10 | 400.15261 | 424.28187 | 407.71897 | 409.16544 | 6.7259599 |

Difference at 99.5% confidence | ||||||

-35.3091 +/- 13.5679 | ||||||

-7.94401% +/- 3.05258% | ||||||

- | 10 | 401.18106 | 423.11366 | 410.45674 | 411.72426 | 7.1450633 |

Difference at 99.5% confidence | ||||||

-32.7503 +/- 13.8443 | ||||||

-7.36831% +/- 3.11476% | ||||||

@ | 10 | 409.59177 | 432.69872 | 425.02406 | 422.27599 | 9.1898042 |

Difference at 99.5% confidence | ||||||

-22.1985 +/- 15.336 | ||||||

-4.99433% +/- 3.45037% |

It would seem that, at least until 30 queue directories, there is a 6-7% reduction in the amount of time it takes to read the queue.

## Results for `queue-sort`

This probe covers the time it takes to sort the queue items once they have been gathered. These results use the default Sendmail queue sort order, "Priority". The results for different numbers of queue directories are:

`x queue-sort.01`

+ queue-sort.05

* queue-sort.10

% queue-sort.20

- queue-sort.30

@ queue-sort.40

: = Mean

M = Median

+----------------------------------------------------------+

| *x - @@++ % |

|x x x ***-* - %%- x -@@-+-+%@ @ * *@- @ %-%+ %@@|

| |___________M_:_____________| |

| |___M___:_______| |

| |____M_______:_____________| |

| |_________M__:_____________| |

| |_________M_:___________| |

| |________M__:___________| |

+----------------------------------------------------------+

There's no apparent difference between the results on this plot, and the statistical summary from ministat bears this out.

N | Min | Max | Median | Mean | Stddev | |
---|---|---|---|---|---|---|

x | 10 | 6.492041 | 7.044234 | 6.721011 | 6.7525311 | 0.2109344 |

+ | 10 | 6.897024 | 7.236107 | 6.9306245 | 6.9814723 | 0.118952 |

No difference proven at 99.5% confidence | ||||||

* | 10 | 6.607956 | 7.097123 | 6.6564425 | 6.7786936 | 0.20168636 |

No difference proven at 99.5% confidence | ||||||

% | 10 | 6.765709 | 7.319717 | 6.9684315 | 7.0233506 | 0.19899698 |

No difference proven at 99.5% confidence | ||||||

- | 10 | 6.652388 | 7.212086 | 6.885209 | 6.913707 | 0.18356812 |

No difference proven at 99.5% confidence | ||||||

@ | 10 | 6.882107 | 7.346733 | 7.00444 | 7.0520978 | 0.17935072 |

No difference proven at 99.5% confidence |

## Results for `queue-runner`

This probe covers the time it takes to deliver all the messages in the queue. The results for different numbers of queue directories are:

`x queue-runner.01`

+ queue-runner.05

* queue-runner.10

% queue-runner.20

- queue-runner.30

@ queue-runner.40

: = Mean

M = Median

+----------------------------------------------------------+

| - - %@ %** * ++++ xxx - @ |

| @@-%%-@@@@--** *+++ xxxxxx - - @ @|

| |_:| |

| |:| |

| |_:_| |

| |__:___| |

| |________M______:_______________| |

||__________M_________:_____________________| |

+----------------------------------------------------------+

Now these look quite interesting. There's a clear benefit to be had as queue directories are added, as can be seen by the gradual leftward shift of the range bars. Note, too, that the standard deviation slowly increases as the number of queue directories increases.

This continues until we get to 30 queue directories, and suddenly the range of values increases dramatically. This pattern holds for the 40 queue directory results too. This would seem to indicate that there's a factor that's negligible with 20 or fewer queue directories that suddenly becomes much more significant.

The statistical results show this in more detail.

N | Min | Max | Median | Mean | Stddev | |
---|---|---|---|---|---|---|

x | 10 | 39546.617 | 40595.753 | 40197.703 | 40163.428 | 338.53016 |

+ | 10 | 37840.309 | 38448.061 | 38164.578 | 38170.287 | 209.78723 |

Difference at 99.5% confidence | ||||||

-1993.14 +/- 454.65 | ||||||

-4.96258% +/- 1.132% | ||||||

* | 10 | 36540.079 | 37843.604 | 36954.964 | 37055.675 | 430.47168 |

Difference at 99.5% confidence | ||||||

-3107.75 +/- 625.175 | ||||||

-7.73777% +/- 1.55658% | ||||||

% | 10 | 35043.263 | 36947.376 | 35942.349 | 35993.725 | 697.82562 |

Difference at 99.5% confidence | ||||||

-4169.7 +/- 885.417 | ||||||

-10.3818% +/- 2.20454% | ||||||

- | 10 | 34620.743 | 42961.967 | 36161.556 | 37588.728 | 3451.5012 |

No difference proven at 99.5% confidence | ||||||

@ | 10 | 34589.025 | 46344.242 | 36245.368 | 38587.428 | 4649.2078 |

No difference proven at 99.5% confidence |

5, 10, or 20 queue directories makes the delivery process take 5%, 8%, and 10% faster respectively. But there's no significant benefit with 30 or 40 queue directories. Note that these results do

**not**mean that 30 or 40 queue directories have the same benefit as 20 -- they mean that 30 or 40 queue directories offer no benefit over 1 queue directory.

Interesting. I wonder why.

## Results for `queue-runjob`

This probe covers the time it takes to deliver all the messages in the queue. This is slightly different from the

`queue-runner`

probe. In this results, `queue-runjob`

probes will have fired for every message delivery. The times taken by these probes have been summed.The

`queue-runner`

probe only fires at the start and end of the whole process. So the totall cumulative time for all the `queue-runjob`

probes is going to be less than that for `queue-runner`

. These results may or may not show the same interesting features at 30 and 40 queue directores as `queue-runner`

.The results for different numbers of queue directories are:

`x queue-runjob.01`

+ queue-runjob.05

* queue-runjob.10

% queue-runjob.20

- queue-runjob.30

@ queue-runjob.40

: = Mean

M = Median

+----------------------------------------------------------+

| - - %@ %** * ++++ xxx - |

| @@-%%-@@@@--** *+++ xxxxx - - @@ @|

| |_:| |

| :| |

| |_:_| |

| |__:___| |

| |________M______:_______________| |

||__________M_________:_____________________| |

+----------------------------------------------------------+

Well, that answers that question. This looks very similar to plot for

`queue-runner`

, which would imply that whatever is taking additional time in the 30 and 40 queue directory case is bounded by the `queue-runjob`

probes.The statistical results are similar to

`queue-runner`

too.N | Min | Max | Median | Mean | Stddev | |
---|---|---|---|---|---|---|

x | 10 | 39351.85 | 40370.584 | 39973.804 | 39950.51 | 330.588 |

+ | 10 | 37657.682 | 38238.619 | 37973.41 | 37977.145 | 204.91243 |

Difference at 99.5% confidence | ||||||

-1973.37 +/- 444.012 | ||||||

-4.93953% +/- 1.1114% | ||||||

* | 10 | 36351.686 | 37631.44 | 36764.907 | 36857.071 | 419.81903 |

Difference at 99.5% confidence | ||||||

-3093.44 +/- 610.011 | ||||||

-7.74318% +/- 1.52692% | ||||||

% | 10 | 34857.73 | 36736.394 | 35747.589 | 35795.814 | 684.95635 |

Difference at 99.5% confidence | ||||||

-4154.7 +/- 868.243 | ||||||

-10.3996% +/- 2.1733% | ||||||

- | 10 | 34440.561 | 42751.05 | 35948.914 | 37393.308 | 3445.2905 |

No difference proven at 99.5% confidence | ||||||

@ | 10 | 34408.408 | 46132.48 | 36035.342 | 38390.119 | 4648.62 |

No difference proven at 99.5% confidence |

There are similar percentage improvements in the 5, 10, and 20 queue directory cases, and then 30 and 40 queue directories completely lose that advantage, and become as slow as a single queue directory.

## Conclusions

Please read the disclaimer in conjunction with these conclusions.

When you have a single process running the queue then you can expect that multiple queue directories will have the following effects:

- The time taken to read the queue will be reduced by 6-7%
- With the default queue sort order of "Priority" there is no significant performance gain or loss when using multiple queues.
- Additional queue directories make the delivery processes run faster, up to a point. There's a point somewhere between 20 and 30 queue directories, where additional queue directories is a step backwards, and any benefit from having them is lost.

Hmmm, I'm going to have to read more carefully. I was convinced I was looking at nethack screens, not stats, for a moment. :-)

ReplyDeleteHeh. No amulets here.

ReplyDelete[...] You can verify this by looking at the results for test 20, which follow the same format as those for test 19. If you run those through ministat you’ll see no statistically significant differences for any of the results. [...]

ReplyDelete