Don Marti
Mon 14 Sep 2009 06:32:58 AM PDT
Spring cleaning for Perl scripts
Here are a couple of notes on things I've been doing to clean up some random Perl scripts. Using Perl to do stuff doesn't have to mean a directory full of scripts that are hard to read and modify, that break on real input because you only tested them on a few files, and that run slowly. (Yes, I used to have a mess of Perl scripts like that, but I'm starting to see the light.)
This isn't about going from decent Perl to very high-quality Perl. It's about going from mess to tolerable.
use modules: This is the obvious one. Usually when I find really ugly or unreliable Perl code, there's a module to replace it. Instead of gross pattern matches that break on real-world CSV, HTML, or XML, there's a module to get the thing you need.
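For instance, a CSV-munging script built on split(/,/) falls over on quoted fields; here's a minimal sketch of doing the same job with Text::CSV instead (the file name and the field being printed are just placeholders):

use strict;
use warnings;
use Text::CSV;

# Text::CSV copes with quoted fields and embedded commas,
# which a naive split(/,/) gets wrong.
my $csv = Text::CSV->new({ binary => 1 })
    or die "Cannot use Text::CSV: " . Text::CSV->error_diag();

open my $fh, '<', 'data.csv' or die "data.csv: $!";
while (my $row = $csv->getline($fh)) {
    print $row->[0], "\n";    # first field of each record
}
close $fh;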
take command-line options: here are three lines that give a script a --verbose or -v option.
use Getopt::Long;
my $VERBOSE = 0;
GetOptions ("verbose|v" => \$VERBOSE);
Saves having to edit the script to change flags.
write perldoc: Instead of making the beginning of a script into a big block comment, add the POD markup (example: sitemap-o-matic). Now you can just say perldoc [script] to remind yourself what it does and how to use it. Details at perldoc perldoc and perldoc perlpod.
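Here's a rough sketch of the kind of POD block that can sit at the top of a script; the script name, options, and description are made up for illustration:

=head1 NAME

fetch-feeds - download and archive a list of RSS feeds

=head1 SYNOPSIS

fetch-feeds [--verbose] [file ...]

=head1 DESCRIPTION

Reads feed URLs from the files named on the command line and
saves a copy of each feed in the current directory.

=cut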
use threads: This is potentially hairy, but if an often-run script is just doing the same thing to a bunch of files, and the result is one value, it's not too hard just to do a File::Find::find over the directory you want to look at, start a new thread per file, and then "join" all the threads. Probably not worth it for most scripts, but here's a podcast client that can kick off multiple downloads while other threads are still parsing XML. It was too slow when it parsed everything and then downloaded one at a time.
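Here's a rough sketch of the thread-per-file pattern (not the podcast client itself): count the lines in every file under a directory, with one thread per file and a join at the end to combine the counts. It assumes a Perl built with thread support.

use strict;
use warnings;
use threads;
use File::Find;

# Hypothetical task: count lines in every file under a directory,
# one thread per file, then join the threads and add up the counts.
my $dir = shift @ARGV or die "usage: $0 directory\n";

my @threads;
find({
    no_chdir => 1,    # keep paths usable while worker threads run
    wanted   => sub {
        my $path = $File::Find::name;
        return unless -f $path;
        push @threads, threads->create(sub {
            open my $fh, '<', $path or return 0;
            my $count = 0;
            $count++ while <$fh>;
            close $fh;
            return $count;
        });
    },
}, $dir);

my $total = 0;
for my $thr (@threads) {
    my ($count) = $thr->join();    # collect each thread's result
    $total += $count;
}
print "$total lines under $dir\n";

One thread per file is fine for a modest directory; for a big tree you'd want to cap the thread count, say with Thread::Queue and a fixed pool of workers.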
use memcache: OK, kind of weird, but a lot of command-line or cron-job Perl scripts do something like this: (1) fetch or read in a bunch of data, (2) build some kind of data structure, (3) spew out some kind of data. So take the result of step 2 and stick it in memcache, and the script will run much faster while you get step 3 right.
use Cache::Memcached;

my $memd = Cache::Memcached->new({
    'servers'   => [ "127.0.0.1:11211" ],
    'debug'     => 0,
    'namespace' => 'my_nifty_web_client_script',
});

...

# $ua is a web user agent (e.g. LWP::UserAgent); $url and $EXPIRE
# are set elsewhere in the script.
my $stuff;
unless ($stuff = $memd->get($url)) {
    my $res = $ua->get($url);
    if ($res->is_success()) {
        $stuff = expensive_operation($res->content);
        $memd->set($url, $stuff, $EXPIRE);
    }
}
This is good for using a site's Web API without getting the webmaster cheesed off at you.