Day 32 of 60: Complete instrumentation of queue creation

Or: "How do I use DTrace with programs that fork?"

With some help from the dtrace-discuss[1] mailing list I've now written a couple of D scripts that can trace what Sendmail is doing between probe points. There's a writeup, and sample output, below the fold.

[1] Note -- the forum archive doesn't seem to link to the discussion yet. When it does I'll update this link to point to the discussion. The subject was "Using pid provider when process forks".



Ordinarily, DTraces makes it easy to trace function calls in a process. The pid provider enables this functionality, and a D script that looks like this:

#pragma D option flowindent

pid123:::entry
{
self->tracing = 1;
trace(0);
}

pid123:::return
/self->tracing/
{
trace(0);
self->tracing = 0;
}


is enough to provide basic output for the process with pid 123.

This doesn't work if your process forks. DTrace disables itself in forked children, and you can't do the equivalent of


pid123::fork:return
{
interesting_pid = return_value_from_fork();
}

pid*:::entry
/interesting_pid && pid = interesting_pid/
{
...
}


to trace children, as DTrace won't let you.

Since Sendmail forks to accept mail, this is a bit of a problem.

The trick is to use two DTrace scripts. The first traces all Sendmail processes, with a probe that fires when something interesting happens in a child. When that happens, the child process is stopped, and the first DTrace script starts a second DTrace script, telling it the PID of the process that's been stopped.

This is what queue-create-pid.d does. A slightly simplfied version is this:

#pragma D option destructive

sendmail*:::queue-createentry-start
{
stop();
system("dtrace -s queue-create-pid2.d %d", pid);
}


This enables destructive tracing (so that stop()) will work. Then it waits for the queue-createentry-start probe to fire. This probe will fire in one of the child processes that Sendmail creates.

When it fires, the child is stopped, and system() is used to start a second copy of DTrace running queue-create-pid2.d, giving the PID of the current process (which has been stopped, remember) on the command line.

Here's a slighly simplified version of queue-create-pid2.d.

#pragma D option destructive
#pragma D option flowindent

BEGIN
{
system("prun %d", $1);
self->depth = 0;
}

sendmail$1:::queue-createentry-end
{
exit(0);
}

pid$1:::entry
{
trace(self->depth);
self->start[self->depth] = timestamp;
self->depth++;
}

pid$1:::return
/self->start[self->depth - 1]/
{
self->depth--;
trace(self->depth);
trace((timestamp - self->start[self->depth]));
self->start[self->depth] = 0;
}


This:


  1. Uses system() to run prun(1), which starts the process that the first script stopped.

  2. Stops tracing when the queue-createentry-end probe fires.

  3. At each function entry point the current nesting depth of the function call is traced, the time of the function is stored, and the nesting depth is incremented.

  4. When each function returns (and assuming that we've traced the entry in to this function), the depth is decremented and traced, the total time spent in this function (and anything it called) is traced, and then the execution time is cleared so that DTrace can reclaim the space.



The #pragma option D flowindent means that DTrace will produce nicely indented output for us.

Here's a sample of some of the output. These results all include calls made to libraries as well, but the pid provider clause could restrict these to just calls inside the running binary by specifying pid$1:a.out::entry (and the same on the :::return clause) instead.

CPU FUNCTION
1 | :BEGIN DEPTH ELAPSED (ns)
0 -> queuename 0
0 -> queue_letter 1
0 <- queue_letter 1 13211
0 -> sm_strlcpyn 1
0 <- sm_strlcpyn 1 6086
0 <- queuename 0 264388
0 -> sm_strlcpy 0
0 <- sm_strlcpy 0 4607
0 -> queuename 0
0 -> queue_letter 1
0 <- queue_letter 1 2953

[...]

0 -> sm_free_tagged 1
0 -> free 2
0 -> lmutex_lock 3
0 <- lmutex_lock 3 5467
0 -> _free_unlocked 3
0 <- _free_unlocked 3 5258
0 -> lmutex_unlock 3
0 <- lmutex_unlock 3 4261
0 <- free 2 27896
0 <- sm_free_tagged 1 36625
0 <- sm_io_close 0 90448
0 | queueup:queue-createentry-end


You can also see the full output.

Of course, this technique isn't limited to just tracing Sendmail's flow. Since DTrace clauses let you use some regular expression metacharacters when specifying the probes you're interested you can drill down.

For example, if I only wanted to see the functions with names that that start sm_ I'd write:

pid$1::sm_*:entry
...

pid$1::sm_*:return


Or I could use this to produce counts of which internal Sendmail functions are being called most frequently.

For even more power, the syscall and fbt providers let you see what the code is doing, first at the Sendmail level, then to see what system calls Sendmail is making, finally down to see what kernel functions are being called as a result of this. And that's as easy as:

syscall:::entry,
fbt:::entry,
pid$1:::entry
...

syscall:::return,
fbt:::return,
pid$1:::return
...


in the probe specification in each clause.

Bloody hell but that's powerful.

No comments:

Post a Comment