Try before you buy: Day 46 of 60: Queue sort strategies

I've been looking at different queue sort strategies to see what their overhead is. Since all the messages are going to be delivered to a single host these results aren't necessarily going to be indicative of what you would see on a production server. However, they should serve to illustrate any inherent speed advantages of one sort strategy over another.

Read on for the resuls.

Methodology

I used my saved tarball of 30,000 messages in one mail queue for each test. smtp-sink(1) was run in the external zone, with the command:

smtp-sink -c external-zone:25 50

Each test run consisted of extracting the tarball, sleeping for 20 seconds to give the filesystem a chance to stabilise, and then running a single queue runner with a command line option to set the priority.

tar xf mqueue.1q.tar
sleep 20
sendmail -OQueueSortOrder=order -q

While this was being run, dtrace(1) was being run with queue-run-duration.d This sequence was run 10 times for each sort order, with the dtrace(1) results redirected to a file.

You can view the full results..

Results

In a single queue runner case there are two areas where a different sort order may be beneficial.

First, it may affect the amount of time taken to sort the queue, depending on the sort algorithm that Sendmail uses and the amount of information that makes up each sortable entry.

Second, it may affect how long the queue takes to be read. For example, sorting on queue file modification time may increase the number of stat(2) calls that need to be made to retrieve the entry's modification time. The results will compare both of these variables.

In all cases, the benchmark is the Priority sort order, as this is the Sendmail default. Numbers in these results are expressed as microseconds.

Priority vs. Filename

queue-read time

ministat confirms that sorting by filename has a positive effect on the time taken to read the queue, being approximately 68% faster in this case.

x queue-read.Priority
+ queue-read.Filename
: = Mean
M = Median
+----------------------------------------------------------+
| +                                                        |
| +                                                     x  |
| ++                                                    xxx|
|+++                                                    xxx|
|+++                                                    xxx|
|                                                       |:||
| :|                                                       |
+----------------------------------------------------------+

	N	Min	Max	Median	Mean	Stddev
x	10	446674	460635	454535.5	454258.5	4333.0399
+	10	134600	144921	142604.5	141415.2	3545.6063
Difference at 99.5% confidence
		-312843 +/- 6391.49
		-68.869% +/- 1.40702%

queue-sort time

However, sorting by filename has greater negative impact on the amount of time it takes to sort the queue.

x queue-sort.Priority
+ queue-sort.Filename
: = Mean
M = Median
+----------------------------------------------------------+
|x                                          +              |
|x                                          +  +           |
|x                                          +  +           |
|xx                                         +  +  + +     +|
|:|                                                        |
|                                          |___M:___|      |
+----------------------------------------------------------+

	N	Min	Max	Median	Mean	Stddev
x	10	6669.11	7137.53	6710.015	6754.011	137.9286
+	10	24412	30223.2	25566.85	25985.28	1901.2515
Difference at 99.5% confidence
19231.3	+/-	2176.14
284.738%	+/-	32.2199%

This goes against my expectations. I had assumed that in the Priority case Sendmail would have to go and open every queue file to read the message's priority, whereas in the Filename case Sendmail could just sort the directory entries.

This assumption is, in part, based on this section from "Sendmail Performance Tuning", by Nick Christenson.

If queue I/O bandwidth is expensive compared to network bandwidth, then the queue sort order of "file" might be a good strategy. This method sorts the qf files based on their file name and starts a delivery attempt on the first one. Using this method, the qf files do not need to be opened and read before delivery is attempted; instead, just a listing of the files in the directory is obtained. This method has a much lower I/O overhead, especially on start-up, than the methods previously described. If all messages are of roughly the same priority, and especially if the queued entries are already sorted by host, then a more sophisticated sorting algorithm offers no special advantages. This algorithm has the shortest start time before the first message delivery is attempted in general, and it works very well if the queue is characterized by heavy I/O contention. -- pages 144-145

A look at the code explains the difference. Priority sorting is carred out by queue.c:workcmpf0(), which looks like this:

static int
workcmpf0(a, b)
        register WORK *a;
        register WORK *b;
{
        long pa = a->w_pri;
        long pb = b->w_pri;

        if (pa == pb)
                return 0;
        else if (pa > pb)
                return 1;
        else
                return -1;
}

As you can see, that's just comparing some longs, a relatively cheap operation. Filename sorting is carried out by workcmpf4(), which looks like:

static int
workcmpf4(a, b)
        register WORK *a;
        register WORK *b;
{
        return strcmp(a->w_name, b->w_name);
}

That entails making a second function call for every comparison, and calling strcmp() for every filename, an operation that takes considerably longer. The opening of the queue files to extract the priority information is carried out, but it happens in the queue-read portion of the code, and the time for that has already been accounted for.

queue-read + queue-sort time

The key test here is whether the queue read time + the queue sort time is significantly different for the two sort options.

x queue-sum.Priority
+ queue-sum.Filename
: = Mean
M = Median
+----------------------------------------------------------+
|  +                                                       |
| ++                                                    xxx|
|+++                                                    xxx|
|++++                                                  xxxx|
|                                                       |:||
| :|                                                       |
+----------------------------------------------------------+

	N	Min	Max	Median	Mean	Stddev
x	10	453369.65	467344.58	461231.78	461012.51	4299.6478
+	10	160266	175144.2	167920.85	167400.48	4218.8583
Difference at 99.5% confidence
		-293612 +/- 6876.62
		-63.6885% +/- 1.49163%

This clearly shows that time spent reading the queue dominates the total processing time for this section of work. Although the actual sorting of the filenames takes considerably longer, it's dwarfed by the amount of extra time taken to open and process all those queue files.

A review of the Sendmail code in queue.c also shows this snippet:

                /* avoid work if possible */
                if ((QueueSortOrder == QSO_BYFILENAME ||
                     QueueSortOrder == QSO_BYMODTIME ||
                     QueueSortOrder == QSO_RANDOM) &&
                    QueueLimitQuarantine == NULL &&
                    QueueLimitSender == NULL &&
                    QueueLimitRecipient == NULL)
                {
                        w->w_qgrp = qgrp;
                        w->w_qdir = qdir;
                        w->w_name = newstr(d->d_name);
                        w->w_host = NULL;
                        w->w_lock = w->w_tooyoung = false;
                        w->w_pri = 0;
                        w->w_ctime = 0;
                        w->w_mtime = sbuf.st_mtime;
                        ++num_ent;
                        continue;
                }

which suggests that the Modification and Random sort options are likely to have the same benefit. Since they won't have the strcmp() overhead they may perform even better than sorting by filename.

Priority vs. Host

I do not expect there to be a significant difference between these two. In both cases Sendmail will have to open and parse the qf file to determine the sorting characteristic. It's possible that sorting by hostname will incur extra penalties at sort time, since it will be a string comparison rather than a numeric comparison. However, I/O concerns are likely to dominate the results, leading to no significant difference.

queue-read time

x queue-read.Priority
+ queue-read.Host
: = Mean
M = Median
+----------------------------------------------------------+
|             x                                   +        |
|x    xxxx    x xx  x      +       +     ++    + ++  +    +|
|    |_____:M____|                                         |
|                                   |________:__M_____|    |
+----------------------------------------------------------+

	N	Min	Max	Median	Mean	Stddev
x	10	446674	460635	454535.5	454258.5	4333.0399
+	10	466307	489130	481739.5	479712.3	6741.291
Difference at 99.5% confidence
25453.8	+/-	9148.36
5.60337%	+/-	2.01391%

There's a small positive difference here, and I'm not entirely sure where it's come from.

queue-sort time

Also, as expected, the extra overhead of comparing hostnames has meant that sorting the queue takes considerably longer in this case.

A review of the code shows that when sorting by host Sendmail attempts to lock all the messages that are destined for that host, which incurs an extra penalty. Once the messages have been locked the queue re-sorted, to take account of any messages that could not be locked.

x queue-sort.Priority
+ queue-sort.Host
: = Mean
M = Median
+----------------------------------------------------------+
| x                                                +      +|
| x                                                ++     +|
| xx   x                                           ++++   +|
||M:|                                                      |
|                                                  |_M:__| |
+----------------------------------------------------------+

	N	Min	Max	Median	Mean	Stddev
x	10	6669.11	7137.53	6710.015	6754.011	137.9286
+	10	10848.5	11442.2	11001.95	11093.15	253.04518
Difference at 99.5% confidence
4339.14	+/-	328.998
64.2454%	+/-	4.87115%

queue-read + queue-sort time

As we can see here, since the original times taken to read the queue were so close, the extra time taken to carry out a host sort is now significant., leading to a result that shows that sorting by host suffers a small, 6% or so penalty when compared to sorting by priority.

x queue-sum.Priority
+ queue-sum.Host
: = Mean
M = Median
+----------------------------------------------------------+
|x   x xx    x x  x           +       +   ++    ++++ +    +|
|    |____:____|                                           |
|                                     |_______:__M____|    |
+----------------------------------------------------------+

	N	Min	Max	Median	Mean	Stddev
x	10	453369.65	467344.58	461231.78	461012.51	4299.6478
+	10	477364.4	500565.1	492723.95	490805.45	6764.702
Difference at 99.5% confidence
29792.9	+/-	9150.35
6.4625%	+/-	1.98484%

Priority vs. Modification

I expect these results to be similar to those for sorting by filename, with the exception that the time taken to sort the messages should be much closer to the time taken when sorting by priority, as there are no string comparisons involved.

queue-read time

x queue-read.Priority
+ queue-read.Modification
: = Mean
M = Median
+----------------------------------------------------------+
| ++                                                    x  |
| ++                                                    xxx|
|+++                                                    xxx|
|+++                                                    xxx|
|                                                       |:||
||:|                                                       |
+----------------------------------------------------------+

	N	Min	Max	Median	Mean	Stddev
x	10	446674	460635	454535.5	454258.5	4333.0399
+	10	132538	145743	138702.5	139517.8	4669.4914
Difference at 99.5% confidence
		-314741 +/- 7272.1
		-69.2867% +/- 1.60087%

This is as expected. The benefit of sorting by modification time is the same as that from sorting by filename.
You can easily see this on a histogram that includes the sort-by-filename results.

x queue-read.Filename
+ queue-read.Modification
: = Mean
M = Median
+----------------------------------------------------------+
|                           +               x   +          |
|++       x    x      + +   +    x          xxx + x + x   +|
|                       |______________:____M__________|   |
|          |________________M__:___________________|       |
+----------------------------------------------------------+

queue-sort time

This is a surprise.

x queue-sort.Priority
+ queue-sort.Modification
: = Mean
M = Median
+----------------------------------------------------------+
| x                                                 + +    |
| x                                                ++ +    |
| xx   x                                           ++ + + +|
||M:|                                                      |
|                                                  |_M:_|  |
+----------------------------------------------------------+

	N	Min	Max	Median	Mean	Stddev
x	10	6669.11	7137.53	6710.015	6754.011	137.9286
+	10	10962.5	11523.8	11097.2	11141.15	184.26849
Difference at 99.5% confidence
4387.14	+/-	262.76
64.9561%	+/-	3.89043%

The code for sorting based on modification time looks like this:

static int
workcmpf6(a, b)
        register WORK *a;
        register WORK *b;
{
        if (a->w_mtime > b->w_mtime)
                return 1;
        else if (a->w_mtime < b->w_mtime)
                return -1;
        else
                return 0;
}

which looks like it's doing very similar work to the comparison function for sorting by priority.

queue-read + queue-sort time

x queue-sum.Priority
+ queue-sum.Modification
: = Mean
M = Median
+----------------------------------------------------------+
| ++                                                    x  |
| ++                                                    xxx|
|+++                                                    xxx|
|+++                                                    xxx|
|                                                       |:||
||:|                                                       |
+----------------------------------------------------------+

	N	Min	Max	Median	Mean	Stddev
x	10	453369.65	467344.58	461231.78	461012.51	4299.6478
+	10	143755.5	156771.2	149799.25	150658.95	4609.4779
Difference at 99.5% confidence
		-310354 +/- 7195.98
		-67.32% +/- 1.56091%

Priority vs. Random

I expect the random sort order to have the same benefits when reading the queue as sorting by filename or modification time. Namely, an almost 70% improvement in the time taken. Times to sort the queue should also match the sort-by-priority results.

Note: If you examine queue-run.1q.Random you'll see that runs 8 and 9 show anomolously large values for the time taken to sort the queue. I've removed those results when charting the results below.

queue-read time

As expected, the time taken to read the queue is on a par with the filename and modification time sorting options.

x queue-read.Priority
+ queue-read.Random
: = Mean
M = Median
+----------------------------------------------------------+
|  +                                                       |
|  +                                                       |
|  +                                                       |
|  +                                                    x  |
|  +                                                    xxx|
|+ +                                                    xxx|
|+++                                                    xxx|
|                                                       |:||
| |:                                                       |
+----------------------------------------------------------+

	N	Min	Max	Median	Mean	Stddev
x	10	446674	460635	454535.5	454258.5	4333.0399
+	10	130338	143557	141671	139220.9	5049.021
Difference at 99.5% confidence
		-315038 +/- 7595.41
		-69.3521% +/- 1.67205%

queue-sort time

This also followed my predictions, being indistinguishable from the time taken to sort messages by priority.

x queue-sort.Priority
+ queue-sort.Random
: = Mean
M = Median
+----------------------------------------------------------+
|    x    +                                                |
|    x  + +                                                |
|   xxx + +        +                               +      +|
|   |:                                                     |
||________M___________:___________________|                |
+----------------------------------------------------------+

	N	Min	Max	Median	Mean	Stddev
x	10	6669.11	7137.53	6710.015	6754.011	137.9286
+	8	7518.86	19070.4	7845.565	10634.789	4767.6163
No difference proven at 99.5% confidence

queue-read + queue-sort time

It's unsurprising that this also showed a significant benefit.

x queue-sum.Priority
+ queue-sum.Random
: = Mean
M = Median
+----------------------------------------------------------+
|  +                                                       |
|  +                                                       |
|  +                                                    x  |
|  +                                                    xxx|
|  +                                                    xxx|
|+ ++                                                   xxx|
|                                                       |:||
| |:|                                                      |
+----------------------------------------------------------+

	N	Min	Max	Median	Mean	Stddev
x	10	453369.65	467344.58	461231.78	461012.51	4299.6478
+	8	140202.98	159080.6	150302.68	150247.41	5210.1916
Difference at 99.5% confidence
		-310765 +/- 8251.99
		-67.4093% +/- 1.78997%

Priority vs. Time

Sorting the queue by time has to carry out much of the same work as sorting by priority or by hostname. The difference is that the queue sort routine does not need to lock any messages in the queue. Accordingly, I expect this to behave similarly to sorting by priority, without the additional overheard incurred by sorting by hostname.

queue-read time

As expected, these results are effectively indistinguishable.

x queue-read.Priority
+ queue-read.Time
: = Mean
M = Median
+----------------------------------------------------------+
|+                +x       +x + xx++x    + +  +   +x      x|
|                           |___________:M__________|      |
|                 |______________:_M___________|           |
+----------------------------------------------------------+

	N	Min	Max	Median	Mean	Stddev
x	10	446674	460635	454535.5	454258.5	4333.0399
+	10	440470	457910	452367.5	451643.5	5196.639
No difference proven at 99.5% confidence

queue-sort time

Again, we see that that the queue sort time is considerably shorter in the sorting-by-priority case. If I get the time I'll investigate this some more, as it's not immediately obvious why they'd be so different. The only thing in the code that stands out as being different is that when sorting by priority it's a comparison between two longs, and in the other cases the comparison is having to indirect through a pointer. I doubt that's responsible for this difference though.

x queue-sort.Priority
+ queue-sort.Time
: = Mean
M = Median
+----------------------------------------------------------+
| x                                                 +      |
| x                                           +     ++     |
| xx  x                                      ++ +   ++    +|
||:_|                                                      |
|                                             |____:M__|   |
+----------------------------------------------------------+

	N	Min	Max	Median	Mean	Stddev
x	10	6669.11	7137.53	6710.015	6754.011	137.9286
+	10	10988.2	12251.4	11694.15	11517.98	411.79028
Difference at 99.5% confidence
4763.97	+/-	495.762
70.5354%	+/-	7.34025%

queue-read + queue-sort time

At least the predictions pan out here, with there being no significant differences between the two.

x queue-sum.Priority
+ queue-sum.Time
: = Mean
M = Median
+----------------------------------------------------------+
|                      x             x   +           +     |
|+   x           x +   x x   +      +x   + xx    +  x+    +|
|               |______________:_____________|             |
|                   |_________________:__M______________|  |
+----------------------------------------------------------+

N	Min	Max	Median	Mean	Stddev
x	10	453369.65	467344.58	461231.78	461012.51	4299.6478
+	10	452181.2	469127.6	464109.2	463161.48	5236.7017
No difference proven at 99.5% confidence

Not investigated

These tests haven't looked at the results of using the Random sort order with multiple queue runners. That could potentially have a benefit, as there is less chance of two or more queue runners racing to lock the same queue entry prior to delivery. This reduction in locking contention may be beneficial, and warrants further research.

Conclusion

Please ensure you have reviewed the disclaimer.

With a large number of queue entries, sorting by filename, modification time, or randomly has a definite advantage over sorting by priority, as the time needed to open the queue files and extract the priority information dominates the processing time.

Carrying out the actual sort is more intensive when sorting by filename or modification time, but this is outweighed by the extra time spent reading the queue when sorting by priority. If your queues are typically small then this may not offer a significant benefit. If it does, queue-run-duration.d will show what it is.

Processing the queue to sort by hostname is slower than sorting by priority, however, this may have network benefits, and may also empty the queue faster (which I have not examined here).

Try before you buy

Pages

Day 46 of 60: Queue sort strategies

Methodology

Results

Priority vs. Filename

queue-read time

queue-sort time

queue-read + queue-sort time

Priority vs. Host

queue-read time

queue-sort time

queue-read + queue-sort time

Priority vs. Modification

queue-read time

queue-sort time

queue-read + queue-sort time

Priority vs. Random

queue-read time

queue-sort time

queue-read + queue-sort time

Priority vs. Time

queue-read time

queue-sort time

queue-read + queue-sort time

Not investigated

Conclusion

No comments:

Post a Comment

About Me

Followers