Improving the performance of your Perl scripts

These days, processors are so powerful that we rarely worry about using the most efficient methods for performing tasks in our scripts, and most of the time there is no need to. Sometimes, though, you will find it's worth taking a little extra time to shave a few tenths, or hundredths, of a second off some commonly used sections of your script.

For example, suppose you are processing a datafile containing 100,000 rows of data, validating each row and storing it elsewhere. On a small file, a 0.1s difference between methods is negligible. However, on our file this equates to a delay of (100,000 * 0.1) 10,000 seconds, or 167 minutes if you prefer. Even if we were only talking about a 0.01s difference, we would still save about 17 minutes.
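As a quick sanity check on the arithmetic, a throwaway perl one-liner does the sums for us:

bash$ perl -e 'printf "%.0f minutes\n", 100_000 * 0.1 / 60'
167 minutes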

Where are the slow bits?

The first step to increasing the efficiency of your code is to work out which sections would benefit most. A useful tool here is the Devel::DProf module, which profiles Perl code: it collects information on the execution time of a script and of the subroutines within it.

A rather contrived example of a script that could be improved is:

#!/usr/bin/perl
# vi:ts=4:sw=4:
use strict;
use warnings;

sub slow_add {
  my $value = shift;
  print $value + 1 . "\n";

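  # pointless busy-loop, here only to burn CPU time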
  while ($value < 100000) {
    $value++;
  }
}

sub add {
  my $value = shift;
  print $value + 1 . "\n";
}

for (my $i=0; $i<=10; $i++) {
  add($i);
  slow_add($i);
}

Looking at the script it should be obvious where the slow-down is occurring. However, imagine that you've got a normal Perl script and there aren't big signposts with flashing neon lights pointing at the inefficient area. How would you use Devel::DProf to help you? Well, assuming you save the above code as profile.pl, you just need to do the following:

bash$ perl -d:DProf profile.pl

This causes the script to be run as usual, with the Perl profiler enabled. Once the execution of the script has completed you should find a new file in your current directory, tmon.out. If you wish you can examine the file with your favourite text editor, but you'll probably find that there's nothing helpful in there. It's more useful to view a processed version of the output. For this we'll use dprofpp:

bash$ dprofpp
Total Elapsed Time = 0.709792 Seconds
  User+System Time = 0.709792 Seconds
Exclusive Times
%Time ExclSec CumulS #Calls sec/call Csec/c  Name
 94.3   0.670  0.670     10   0.0670 0.0670  main::slow_add
 1.41   0.010  0.010      1   0.0100 0.0100  warnings::BEGIN
 1.41   0.010  0.020      2   0.0050 0.0100  main::BEGIN
 0.00       - -0.000      1        -      -  strict::import
 0.00       - -0.000      1        -      -  warnings::import
 0.00       - -0.000      1        -      -  strict::bits
 0.00       - -0.000     10        -      -  main::add

dprofpp gives us a nice indication of where our script spends most of its time. %Time shows the percentage of time spent in each routine. As you can see, we spent 94.3% of the total run time inside the slow_add() function. The remaining columns are explained in the dprofpp manpage. dprofpp also has a number of options that you can use to change the information displayed.
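For example (the full list of switches is in the dprofpp manpage, so check the version installed on your system):

bash$ dprofpp -l    # sort by number of subroutine calls
bash$ dprofpp -u    # sort by user time alone, rather than user+system
bash$ dprofpp -T    # show the subroutine call tree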

If this were a normal Perl script we would now have a list of functions to investigate further. Profiling doesn't tell us how to improve those functions, and it might not even be possible to make some of them more efficient. Making things faster relies on you having some idea about which methods are more efficient than others in Perl.

Looking at slow_add() we decide to remove the pointless while loop, which leaves it doing exactly the same work as add(). Profiling the script again gives us:

bash$ dprofpp   
Total Elapsed Time = 0.039818 Seconds
  User+System Time = 0.029818 Seconds
Exclusive Times
%Time ExclSec CumulS #Calls sec/call Csec/c  Name
 33.5   0.010  0.010      2   0.0050 0.0050  main::BEGIN
 0.00       - -0.000      1        -      -  strict::import
 0.00       - -0.000      1        -      -  strict::bits
 0.00       - -0.000      1        -      -  warnings::BEGIN
 0.00       - -0.000      1        -      -  warnings::import
 0.00       - -0.000     10        -      -  main::slow_add
 0.00       - -0.000     10        -      -  main::add

Just looking at Total Elapsed Time we can see that this one change has made the execution time roughly 17 times smaller (in this case).

Trying things out

Sometimes you will identify a function that you would like to optimise for speed. You have some ideas that might be faster, but how can you be sure? Do you rewrite the function, sit back afterwards and say, "Yeah, that feels much faster!"? Of course not! No-one is going to believe you. You need numbers that you can compare.

This is where the Benchmark module comes in. Benchmark describes itself, succinctly, as a way to "benchmark running times of Perl code".

The way we work with this module is by separating out the area of functionality that we want to measure into a script of its own. The easiest way is usually to write a small subroutine that will perform the functionality. This is probably best explained with another (contrived) example:

Given a hash-reference, what's the quickest way to copy all key-value pairs into another hash?
At the moment our code is using while (... = each % ...) to perform the copy. Are there quicker methods?

Because we're going to be comparing against other methods, each test subroutine should implement only the minimum required to exercise its particular method. We'll work on the assumption that the hash we're copying is defined as a global variable, and that we have a global variable to store the results in. With these assumptions, a subroutine to test while() might look something like this:

sub while_copy {
  undef $results;
  # use a while ... each .. to loop the hashref
  while (my($k,$v) = each %$hashref) {
    $results->{$k} = $v;
  }
}

We undefine the results variable before we start any work just to ensure that each time the function is called we're starting under the same conditions.

We could write a simple script that uses Benchmark's timethis() function, but timing one method in isolation doesn't tell us how fast it is relative to other possible solutions, so we'll skip that and move straight on.
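For reference, a call to timethis() on its own would look something like this minimal sketch (it reuses the while_copy() sub and the global variables described above; the exact output format depends on your version of Benchmark):

use Benchmark qw{timethis};

# time 100,000 calls of while_copy() in isolation
timethis(100_000, \&while_copy, 'while_copy');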

We know we can use map() to iterate over lists of things. Would a solution using map() be faster than our current one? We first need to write a small subroutine to test this:

sub map_copy {
  undef $results;
  # use map to loop the hashref
  map { $results->{$_} = $hashref->{$_} } keys %$hashref;
}

Once again we undefine the results variable before we do anything, then use map() to copy our hash. If we pad this out with a hash to copy and the other pieces needed for a working script, we end up with something like this:

#!/usr/bin/perl
# vi:ts=4:sw=4:
use strict;
use warnings;
use Benchmark qw{:all};

# global variables
my ($hashref, $results);

# set up a hashref with some values
$hashref = {
  'Chisel'  => 'Wright',
  'John'    => 'Smith',
  'Jane'    => 'Doe',
  'Joe'     => 'Public',
};

# test the while( ... = each % ...) method
sub while_copy {
  undef $results;
  # use a while ... each .. to loop the hashref
  while (my($k,$v) = each %$hashref) {
    $results->{$k} = $v;
  }
}

# test the map { ... } keys %... method
sub map_copy {
  undef $results;
  # use map to loop the hashref
  map { $results->{$_} = $hashref->{$_} } keys %$hashref;
}

# run a benchmark over both methods
my $iterations = $ARGV[0] || 100000;
print "Running benchmark with $iterations iterations. Please wait ...\n";

cmpthese($iterations,
  {
    'while_copy'  => \&while_copy,
    'map_copy'    => \&map_copy,
  }
);

The only new section that needs explaining is at the end. First we take the number of iterations to test over either from the command line or default to a reasonably large number. We then call the cmpthese() function imported from the Benchmark module, passing it the iteration count and a hashref of label/coderef pairs, and the Benchmark module does the rest of the work for us.

When we run the script, after a few seconds we see some output:

Running benchmark with 100000 iterations. Please wait ...
              Rate while_copy   map_copy
while_copy 36630/s         --       -25%
map_copy   48780/s        33%         --

According to the documentation for Benchmark, the "chart is sorted from slowest to fastest, and shows the percent speed difference between each pair of tests", so our fastest method is the map one. According to the chart, map_copy is 33% faster than the while loop.
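If you wanted to extend the comparison, another obvious candidate is a hash slice. A sketch of such a test subroutine (not benchmarked here, so measure it yourself) might look like:

sub slice_copy {
  undef $results;
  # copy every key/value pair in one go with a hash slice
  @{$results}{ keys %$hashref } = values %$hashref;
}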

Of course, if we wanted to copy a hash there's a much faster, and hopefully more obvious, method:

sub just_copy {
    undef $results;
    # a straight assignment
    $results = $hashref;
}
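Strictly speaking, just_copy() only copies the reference: $results and $hashref end up pointing at the same underlying hash, so it isn't a true copy at all. If you genuinely need an independent (shallow) copy, a sketch along these lines is still very quick:

sub shallow_copy {
  undef $results;
  # build a brand new anonymous hash from the old one's contents
  $results = { %$hashref };
}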

If we add just_copy() to our script, and change the call to cmpthese() to look like this:

cmpthese($iterations,
    {
        'while_copy'    => \&while_copy,
        'map_copy'      => \&map_copy,
        'just_copy'     => \&just_copy,
    }
);

and then re-run the modified script, our new output appears as follows:

Running benchmark with 100000 iterations. Please wait ...
            (warning: too few iterations for a reliable count)
                Rate while_copy   map_copy  just_copy
while_copy   36900/s         --       -24%       -98%
map_copy     48780/s        32%         --       -97%
just_copy  1666667/s      4417%      3317%         --

The first indication that our new method is faster is the "too few iterations" warning. Looking at the comparisons against the previous two methods we see improvements of over three and four thousand percent!

To try to ensure a fair benchmark we call the script with an argument to modify the number of iterations. Stepping up by 100,000 each call, we find that 800,000 iterations will run the benchmark without the warning. The results of that run look like this:

Running benchmark with 800000 iterations. Please wait ...
                Rate while_copy   map_copy  just_copy
while_copy   37157/s         --       -24%       -98%
map_copy     49140/s        32%         --       -97%
just_copy  1739130/s      4580%      3439%         --

The numbers are similar to the same set of tests run with fewer iterations, but without the warning we can be more confident that the results are accurate. We can see that just_copy is by far the fastest (although, as noted above, it only copies the reference), map() is the second fastest and while() is the slowest. This probably doesn't come as a shock to anyone, but Benchmark lets you present quantitative results rather than just gut feeling.
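As an aside, rather than hunting for an iteration count large enough to silence the warning, Benchmark also accepts a negative count, which it treats as "run each piece of code for at least this many CPU seconds". A sketch of that call would be:

# run each candidate for at least 3 CPU seconds
cmpthese(-3,
  {
    'while_copy'  => \&while_copy,
    'map_copy'    => \&map_copy,
    'just_copy'   => \&just_copy,
  }
);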

This method of benchmarking can be used to test almost anything you care to write comparison tests for. The only thing holding you back is your imagination and your ability to write comparable tests!

Conclusion

You should now have enough information to methodically locate and rectify bottlenecks in your Perl scripts and modules. Bear in mind that if the slowest part of your code is in a function that's rarely called, it's questionable whether it's worth fully optimising that function. Generally your time is better spent optimising the most frequently called functions.

© Chisel Wright, November 2004
