Day 17 of 60: ministat

I spent part of today porting ministat, a useful piece of statistics reporting software, from C to Perl.

ministat reads in two or more files of data and uses Student's t-test to determine whether there is a statistically significant difference between the means of the datasets. This is especially useful when comparing benchmark results.
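To make the idea concrete, here is a minimal sketch of the pooled-variance form of Student's t-test for two samples. It is written in Python purely for illustration and is not taken from ministat or from my port; the function name `pooled_t` and the sample numbers are made up.

```python
import math
from statistics import mean, variance


def pooled_t(a, b):
    """Return (t, degrees_of_freedom) for two independent samples.

    Assumes both samples come from populations with equal variance,
    which is the assumption behind the classic Student's t-test.
    """
    na, nb = len(a), len(b)
    df = na + nb - 2
    # Pooled variance: the two sample variances weighted by their
    # own degrees of freedom.
    sp2 = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / df
    t = (mean(a) - mean(b)) / math.sqrt(sp2 * (1 / na + 1 / nb))
    return t, df


# Hypothetical benchmark timings, invented for this example.
run_a = [10.0, 11.0, 12.0, 13.0, 14.0]
run_b = [12.0, 13.0, 14.0, 15.0, 16.0]

t, df = pooled_t(run_a, run_b)
print(f"t = {t:.3f}, df = {df}")
```

The magnitude of t is then compared with the critical value for the chosen confidence level and the given degrees of freedom. For 8 degrees of freedom at the 95% level (two-sided) that value is about 2.306, so the difference between these two made-up runs would not count as significant.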

For instance, I figure that this will be useful for comparing data from several Sendmail runs, where the number of queue directories differs between the runs. It should highlight whether changing the number of queue directories makes any real difference.

Rather than explain more here, I'll point you at the code. There's documentation towards the end of the file. Note that this still needs some work: there's no proper command line option handling at the moment, and some of the documentation needs fleshing out. I also wouldn't use this as an example of good Perl code, as it still looks far too much like a C program that's been transliterated into Perl.

When I've fixed that I'll put it up on CPAN.

1 comment:

  1. [...] The first thing I did was run the data through ministat to see if there were any glaring discrepancies. To get it in a format suitable for ministat I knocked together “to_ministat.pl”, which extracts the second column from each results file. This is the column that contains the numbers I’m interested in. This converted test1/results.1 to test1/m.1, results.2 to m.2, and so on. [...]
