Day 46 of 60: Queue sort strategies

I've been looking at different queue sort strategies to see what their overhead is. Since all the messages are going to be delivered to a single host these results aren't necessarily going to be indicative of what you would see on a production server. However, they should serve to illustrate any inherent speed advantages of one sort strategy over another.

Read on for the resuls.



Methodology



I used my saved tarball of 30,000 messages in one mail queue for each test. smtp-sink(1) was run in the external zone, with the command:

smtp-sink -c external-zone:25 50

Each test run consisted of extracting the tarball, sleeping for 20 seconds to give the filesystem a chance to stabilise, and then running a single queue runner with a command line option to set the priority.

tar xf mqueue.1q.tar
sleep 20
sendmail -OQueueSortOrder=order -q


While this was being run, dtrace(1) was being run with queue-run-duration.d This sequence was run 10 times for each sort order, with the dtrace(1) results redirected to a file.

You can view the full results..

Results



In a single queue runner case there are two areas where a different sort order may be beneficial.

First, it may affect the amount of time taken to sort the queue, depending on the sort algorithm that Sendmail uses and the amount of information that makes up each sortable entry.

Second, it may affect how long the queue takes to be read. For example, sorting on queue file modification time may increase the number of stat(2) calls that need to be made to retrieve the entry's modification time. The results will compare both of these variables.

In all cases, the benchmark is the Priority sort order, as this is the Sendmail default. Numbers in these results are expressed as microseconds.

Priority vs. Filename



queue-read time



ministat confirms that sorting by filename has a positive effect on the time taken to read the queue, being approximately 68% faster in this case.

x queue-read.Priority
+ queue-read.Filename
: = Mean
M = Median
+----------------------------------------------------------+
| + |
| + x |
| ++ xxx|
|+++ xxx|
|+++ xxx|
| |:||
| :| |
+----------------------------------------------------------+











NMinMaxMedianMeanStddev
x10446674460635454535.5454258.54333.0399
+10134600144921142604.5141415.23545.6063
Difference at 99.5% confidence
-312843 +/- 6391.49
-68.869% +/- 1.40702%



queue-sort time



However, sorting by filename has greater negative impact on the amount of time it takes to sort the queue.

x queue-sort.Priority
+ queue-sort.Filename
: = Mean
M = Median
+----------------------------------------------------------+
|x + |
|x + + |
|x + + |
|xx + + + + +|
|:| |
| |___M:___| |
+----------------------------------------------------------+














NMinMaxMedianMeanStddev
x106669.117137.536710.0156754.011137.9286
+102441230223.225566.8525985.281901.2515
Difference at 99.5% confidence
19231.3+/-2176.14
284.738%+/-32.2199%


This goes against my expectations. I had assumed that in the Priority case Sendmail would have to go and open every queue file to read the message's priority, whereas in the Filename case Sendmail could just sort the directory entries.

This assumption is, in part, based on this section from "Sendmail Performance Tuning", by Nick Christenson.

If queue I/O bandwidth is expensive compared to network bandwidth, then the queue sort order of "file" might be a good strategy. This method sorts the qf files based on their file name and starts a delivery attempt on the first one. Using this method, the qf files do not need to be opened and read before delivery is attempted; instead, just a listing of the files in the directory is obtained. This method has a much lower I/O overhead, especially on start-up, than the methods previously described. If all messages are of roughly the same priority, and especially if the queued entries are already sorted by host, then a more sophisticated sorting algorithm offers no special advantages. This algorithm has the shortest start time before the first message delivery is attempted in general, and it works very well if the queue is characterized by heavy I/O contention. -- pages 144-145


A look at the code explains the difference. Priority sorting is carred out by queue.c:workcmpf0(), which looks like this:

static int
workcmpf0(a, b)
register WORK *a;
register WORK *b;
{
long pa = a->w_pri;
long pb = b->w_pri;

if (pa == pb)
return 0;
else if (pa > pb)
return 1;
else
return -1;
}


As you can see, that's just comparing some longs, a relatively cheap operation. Filename sorting is carried out by workcmpf4(), which looks like:

static int
workcmpf4(a, b)
register WORK *a;
register WORK *b;
{
return strcmp(a->w_name, b->w_name);
}


That entails making a second function call for every comparison, and calling strcmp() for every filename, an operation that takes considerably longer. The opening of the queue files to extract the priority information is carried out, but it happens in the queue-read portion of the code, and the time for that has already been accounted for.

queue-read + queue-sort time



The key test here is whether the queue read time + the queue sort time is significantly different for the two sort options.

x queue-sum.Priority
+ queue-sum.Filename
: = Mean
M = Median
+----------------------------------------------------------+
| + |
| ++ xxx|
|+++ xxx|
|++++ xxxx|
| |:||
| :| |
+----------------------------------------------------------+











NMinMaxMedianMeanStddev
x10453369.65467344.58461231.78461012.514299.6478
+10160266175144.2167920.85167400.484218.8583
Difference at 99.5% confidence
-293612 +/- 6876.62
-63.6885% +/- 1.49163%



This clearly shows that time spent reading the queue dominates the total processing time for this section of work. Although the actual sorting of the filenames takes considerably longer, it's dwarfed by the amount of extra time taken to open and process all those queue files.

A review of the Sendmail code in queue.c also shows this snippet:

/* avoid work if possible */
if ((QueueSortOrder == QSO_BYFILENAME ||
QueueSortOrder == QSO_BYMODTIME ||
QueueSortOrder == QSO_RANDOM) &&
QueueLimitQuarantine == NULL &&
QueueLimitSender == NULL &&
QueueLimitRecipient == NULL)
{
w->w_qgrp = qgrp;
w->w_qdir = qdir;
w->w_name = newstr(d->d_name);
w->w_host = NULL;
w->w_lock = w->w_tooyoung = false;
w->w_pri = 0;
w->w_ctime = 0;
w->w_mtime = sbuf.st_mtime;
++num_ent;
continue;
}


which suggests that the Modification and Random sort options are likely to have the same benefit. Since they won't have the strcmp() overhead they may perform even better than sorting by filename.

Priority vs. Host



I do not expect there to be a significant difference between these two. In both cases Sendmail will have to open and parse the qf file to determine the sorting characteristic. It's possible that sorting by hostname will incur extra penalties at sort time, since it will be a string comparison rather than a numeric comparison. However, I/O concerns are likely to dominate the results, leading to no significant difference.

queue-read time



x queue-read.Priority
+ queue-read.Host
: = Mean
M = Median
+----------------------------------------------------------+
| x + |
|x xxxx x xx x + + ++ + ++ + +|
| |_____:M____| |
| |________:__M_____| |
+----------------------------------------------------------+













NMinMaxMedianMeanStddev
x10446674460635454535.5454258.54333.0399
+10466307489130481739.5479712.36741.291
Difference at 99.5% confidence
25453.8+/-9148.36
5.60337%+/-2.01391%


There's a small positive difference here, and I'm not entirely sure where it's come from.


queue-sort time



Also, as expected, the extra overhead of comparing hostnames has meant that sorting the queue takes considerably longer in this case.

A review of the code shows that when sorting by host Sendmail attempts to lock all the messages that are destined for that host, which incurs an extra penalty. Once the messages have been locked the queue re-sorted, to take account of any messages that could not be locked.

x queue-sort.Priority
+ queue-sort.Host
: = Mean
M = Median
+----------------------------------------------------------+
| x + +|
| x ++ +|
| xx x ++++ +|
||M:| |
| |_M:__| |
+----------------------------------------------------------+













NMinMaxMedianMeanStddev
x106669.117137.536710.0156754.011137.9286
+1010848.511442.211001.9511093.15253.04518
Difference at 99.5% confidence
4339.14+/-328.998
64.2454%+/-4.87115%


queue-read + queue-sort time



As we can see here, since the original times taken to read the queue were so close, the extra time taken to carry out a host sort is now significant., leading to a result that shows that sorting by host suffers a small, 6% or so penalty when compared to sorting by priority.

x queue-sum.Priority
+ queue-sum.Host
: = Mean
M = Median
+----------------------------------------------------------+
|x x xx x x x + + ++ ++++ + +|
| |____:____| |
| |_______:__M____| |
+----------------------------------------------------------+













NMinMaxMedianMeanStddev
x10453369.65467344.58461231.78461012.514299.6478
+10477364.4500565.1492723.95490805.456764.702
Difference at 99.5% confidence
29792.9+/-9150.35
6.4625%+/-1.98484%


Priority vs. Modification



I expect these results to be similar to those for sorting by filename, with the exception that the time taken to sort the messages should be much closer to the time taken when sorting by priority, as there are no string comparisons involved.

queue-read time



x queue-read.Priority
+ queue-read.Modification
: = Mean
M = Median
+----------------------------------------------------------+
| ++ x |
| ++ xxx|
|+++ xxx|
|+++ xxx|
| |:||
||:| |
+----------------------------------------------------------+











NMinMaxMedianMeanStddev
x10446674460635454535.5454258.54333.0399
+10132538145743138702.5139517.84669.4914
Difference at 99.5% confidence
-314741 +/- 7272.1
-69.2867% +/- 1.60087%


This is as expected. The benefit of sorting by modification time is the same as that from sorting by filename.
You can easily see this on a histogram that includes the sort-by-filename results.

x queue-read.Filename
+ queue-read.Modification
: = Mean
M = Median
+----------------------------------------------------------+
| + x + |
|++ x x + + + x xxx + x + x +|
| |______________:____M__________| |
| |________________M__:___________________| |
+----------------------------------------------------------+



queue-sort time



This is a surprise.

x queue-sort.Priority
+ queue-sort.Modification
: = Mean
M = Median
+----------------------------------------------------------+
| x + + |
| x ++ + |
| xx x ++ + + +|
||M:| |
| |_M:_| |
+----------------------------------------------------------+













NMinMaxMedianMeanStddev
x106669.117137.536710.0156754.011137.9286
+1010962.511523.811097.211141.15184.26849
Difference at 99.5% confidence
4387.14+/-262.76
64.9561%+/-3.89043%


The code for sorting based on modification time looks like this:

static int
workcmpf6(a, b)
register WORK *a;
register WORK *b;
{
if (a->w_mtime > b->w_mtime)
return 1;
else if (a->w_mtime < b->w_mtime)
return -1;
else
return 0;
}


which looks like it's doing very similar work to the comparison function for sorting by priority.

queue-read + queue-sort time



x queue-sum.Priority
+ queue-sum.Modification
: = Mean
M = Median
+----------------------------------------------------------+
| ++ x |
| ++ xxx|
|+++ xxx|
|+++ xxx|
| |:||
||:| |
+----------------------------------------------------------+











NMinMaxMedianMeanStddev
x10453369.65467344.58461231.78461012.514299.6478
+10143755.5156771.2149799.25150658.954609.4779
Difference at 99.5% confidence
-310354 +/- 7195.98
-67.32% +/- 1.56091%



Priority vs. Random



I expect the random sort order to have the same benefits when reading the queue as sorting by filename or modification time. Namely, an almost 70% improvement in the time taken. Times to sort the queue should also match the sort-by-priority results.

Note: If you examine queue-run.1q.Random you'll see that runs 8 and 9 show anomolously large values for the time taken to sort the queue. I've removed those results when charting the results below.

queue-read time



As expected, the time taken to read the queue is on a par with the filename and modification time sorting options.

x queue-read.Priority
+ queue-read.Random
: = Mean
M = Median
+----------------------------------------------------------+
| + |
| + |
| + |
| + x |
| + xxx|
|+ + xxx|
|+++ xxx|
| |:||
| |: |
+----------------------------------------------------------+











NMinMaxMedianMeanStddev
x10446674460635454535.5454258.54333.0399
+10130338143557141671139220.95049.021
Difference at 99.5% confidence
-315038 +/- 7595.41
-69.3521% +/- 1.67205%


queue-sort time



This also followed my predictions, being indistinguishable from the time taken to sort messages by priority.

x queue-sort.Priority
+ queue-sort.Random
: = Mean
M = Median
+----------------------------------------------------------+
| x + |
| x + + |
| xxx + + + + +|
| |: |
||________M___________:___________________| |
+----------------------------------------------------------+









NMinMaxMedianMeanStddev
x106669.117137.536710.0156754.011137.9286
+87518.8619070.47845.56510634.7894767.6163
No difference proven at 99.5% confidence


queue-read + queue-sort time



It's unsurprising that this also showed a significant benefit.

x queue-sum.Priority
+ queue-sum.Random
: = Mean
M = Median
+----------------------------------------------------------+
| + |
| + |
| + x |
| + xxx|
| + xxx|
|+ ++ xxx|
| |:||
| |:| |
+----------------------------------------------------------+











NMinMaxMedianMeanStddev
x10453369.65467344.58461231.78461012.514299.6478
+8140202.98159080.6150302.68150247.415210.1916
Difference at 99.5% confidence
-310765 +/- 8251.99
-67.4093% +/- 1.78997%


Priority vs. Time



Sorting the queue by time has to carry out much of the same work as sorting by priority or by hostname. The difference is that the queue sort routine does not need to lock any messages in the queue. Accordingly, I expect this to behave similarly to sorting by priority, without the additional overheard incurred by sorting by hostname.

queue-read time



As expected, these results are effectively indistinguishable.

x queue-read.Priority
+ queue-read.Time
: = Mean
M = Median
+----------------------------------------------------------+
|+ +x +x + xx++x + + + +x x|
| |___________:M__________| |
| |______________:_M___________| |
+----------------------------------------------------------+









NMinMaxMedianMeanStddev
x10446674460635454535.5454258.54333.0399
+10440470457910452367.5451643.55196.639
No difference proven at 99.5% confidence


queue-sort time



Again, we see that that the queue sort time is considerably shorter in the sorting-by-priority case. If I get the time I'll investigate this some more, as it's not immediately obvious why they'd be so different. The only thing in the code that stands out as being different is that when sorting by priority it's a comparison between two longs, and in the other cases the comparison is having to indirect through a pointer. I doubt that's responsible for this difference though.

x queue-sort.Priority
+ queue-sort.Time
: = Mean
M = Median
+----------------------------------------------------------+
| x + |
| x + ++ |
| xx x ++ + ++ +|
||:_| |
| |____:M__| |
+----------------------------------------------------------+













NMinMaxMedianMeanStddev
x106669.117137.536710.0156754.011137.9286
+1010988.212251.411694.1511517.98411.79028
Difference at 99.5% confidence
4763.97+/-495.762
70.5354%+/-7.34025%


queue-read + queue-sort time



At least the predictions pan out here, with there being no significant differences between the two.

x queue-sum.Priority
+ queue-sum.Time
: = Mean
M = Median
+----------------------------------------------------------+
| x x + + |
|+ x x + x x + +x + xx + x+ +|
| |______________:_____________| |
| |_________________:__M______________| |
+----------------------------------------------------------+









NMinMaxMedianMeanStddev
x10453369.65467344.58461231.78461012.514299.6478
+10452181.2469127.6464109.2463161.485236.7017
No difference proven at 99.5% confidence


Not investigated



These tests haven't looked at the results of using the Random sort order with multiple queue runners. That could potentially have a benefit, as there is less chance of two or more queue runners racing to lock the same queue entry prior to delivery. This reduction in locking contention may be beneficial, and warrants further research.

Conclusion



Please ensure you have reviewed the disclaimer.

With a large number of queue entries, sorting by filename, modification time, or randomly has a definite advantage over sorting by priority, as the time needed to open the queue files and extract the priority information dominates the processing time.

Carrying out the actual sort is more intensive when sorting by filename or modification time, but this is outweighed by the extra time spent reading the queue when sorting by priority. If your queues are typically small then this may not offer a significant benefit. If it does, queue-run-duration.d will show what it is.

Processing the queue to sort by hostname is slower than sorting by priority, however, this may have network benefits, and may also empty the queue faster (which I have not examined here).

No comments:

Post a Comment