Examples
Some of the following examples, especially the simpler ones, are accompanied
by their source code, which may be of interest, while the more complex examples are only described
textually, with some code snippets. (If you are interested in the source code of one of these more
complex examples, please feel free to contact the author directly by mail.)
Programs for one-time use
Many everyday tasks require that system administrators as well as programmers
solve unexpected problems such as clever pattern matching, parsing log files and the like,
which are not readily handled with standard DCL tools. Many of these problems
can be solved on the fly using a couple of lines of Perl code.
Perl can be used as a command line tool, much like
awk, for example. This can be very
useful when you have a puzzling problem which does not deserve a real program
but nevertheless needs a clever data conversion on the fly or something similar.
Since there is a variety of command line options for Perl which are
useful in this context, only simple examples are given in the following (more
information may be found elsewhere, for example in the Camel Book).
Adapting configuration files to VMS
Once I inherited a configuration file
transfer.ini which looked like this (but contained literally hundreds of
such sections):
[logging]
log = log/transfer.log
ticket = log/ticket.log
[templates]
ticket = templates/ticket.tpl
mail = templates/mail.tpl
Of course, these pathnames are not very OpenVMS-like, and it would have been
quite cumbersome to edit all of them manually. One could write a Perl program
which reads the file, performs the necessary changes using regular expressions and
then writes the result back to disk. Since tasks like these are commonplace,
Perl can also be used as a mighty command line tool for performing in-place edit
operations like transforming the pathnames in the example above into valid
OpenVMS file names. In the example shown, this was accomplished with the
following command line:
$ perl -i -pe "s/^(.*\s*)=(\s*)(.+)\/(.+)/$1=$2\[\.$3\]$4/" transfer.ini
It looks a bit like line noise, right? What does it do? First of all, it loops
over all lines in transfer.ini and matches lines
which contain an equal sign with a string to its left and two strings separated
by a slash to its right. These strings are captured using parentheses, and
each line which matches this expression is then replaced by a new line
constructed out of the parts just captured. Applying this one-line statement
to the configuration file shown above yields a new version of the file with the
following structure
[logging]
log = [.log]transfer.log
ticket = [.log]ticket.log
[templates]
ticket = [.templates]ticket.tpl
mail = [.templates]mail.tpl
which is exactly what was desired.
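If a safety copy of the original file is desired, the -i switch also accepts an optional backup extension, so the same edit could be run as shown below (the .bak extension is only an example; whether it is a sensible choice on a given disk structure has to be checked, of course):
$ perl -i.bak -pe "s/^(.*\s*)=(\s*)(.+)\/(.+)/$1=$2\[\.$3\]$4/" transfer.ini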
Repairing HTML files
Another problem I was faced with was that a user of a WASD
web server insisted on creating her web pages using “modern” tools running on a
Mac. Unfortunately, these particular tools just refused to generate proper HTML
encoding for special German characters like “ä”, which should be coded as “&auml;”
in HTML. Instead, these tools just insert the raw UTF-8 byte sequence (0xC3 0xA4 in this case),
which results in a completely bogus
display of the resulting web page. This problem can be corrected on the fly with
a Perl call like this:
$ perl -i -pe "s/\xC3\xA4/\&auml\;/g; s/\xC3\xB6/\&ouml\;/g;
s/\xC3\xBC/\&uuml\;/g; s/\xC3\x84/\&Auml\;/g; s/\xC3\x96/\&Ouml\;/g;
s/\xC3\x9C/\&Uuml\;/g; s/\xC3\x9F/\&szlig\;/g;" [...]*.html
OK – it really looks like line noise, but it can easily be put into a DCL
procedure which is called after each upload of HTML files by this particular
user, thus correcting the problem on the fly. This very special user (my
beloved wife, to be exact) also wanted to include a background picture in her
web pages, which the tool she used just does not support. Using Perl, it was simple to
extend the generated HTML code with a proper background-image directive,
too.
Making sure that a large LaTeX document is consistent
Another problem came up when I wrote a really large book
using LaTeX without using BibTeX (which is stupid, but the project grew from a
pet project into a major one, and by the time I realized that the simple bibliography
mechanism of basic LaTeX was not really powerful enough to cope with the bibliography, it
was simply too late to switch to BibTeX). Having a very long list of
references, I feared that some entries might have become unused in the
text body due to changes in its structure etc. Although LaTeX tells you when you
cite something which is not defined, it does not tell you if you have
bibliography entries which are never cited, which is annoying.
A typical entry has the form
\bibitem{zachary} %book
G. Pascal Zachary, \emph{Endless Frontier - Vannevar Bush,
Engineer of the American Century},
The MIT Press, 1999
while a citation looks like
cf. \cite{zachary}[p.~142]
Having a document with more than 120,000 lines of LaTeX code, resulting in about
600 pages of text with more than 600 bibliography entries, a solution was
necessary to make sure that no entry went uncited. This was accomplished with
the following Perl program, which reads the complete LaTeX source code with a
single statement and parses it for all citations in a first pass while building
a hash containing these citations. In a second pass over this data all
bibliography entries are processed and a message is printed for every
bibliography entry without a corresponding citation:
use strict;
use warnings;

die "Usage: bib.pl <filename.tex>\n" unless @ARGV + 0;

my $data;
open my $fh, '<', $ARGV[0] or die "Could not open $ARGV[0]: $!\n";
{
    local $/;    # slurp mode - read the whole file with a single statement
    $data = <$fh>;
}
close $fh;

my %cite;
# First pass: collect all citations in a hash.
$cite{$_}++ for $data =~ m/\\cite\{(.+?)\}/g;
# Second pass: print every bibliography entry without a corresponding citation.
$cite{$_} or print "$_\n" for $data =~ m/\\bibitem\{(.+?)\}/g;
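Assuming the program has been saved as bib.pl (the script and file names here are only examples), it is simply run against the top-level LaTeX source file and prints the labels of all uncited bibliography entries:
$ perl bib.pl book.tex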
Parsing a log file and generating some statistics
Some months ago I had to parse a log file containing
entries like these:
[LOG|SYSTEM|2008 May 13, 14:15:26 (886)|ENGINE.batch]
Loaded 16 events in 497 milliSecs
[END]
[LOG|SYSTEM|2008 May 13, 14:15:55 (281)|Risk|BatchJob]
Time to execute Scenario 24902 ms
[END]
[LOG|SYSTEM|2008 May 13, 14:15:55 (283)|Risk|BatchJobThread]
Time to execute Scenario 13662 ms
[END]
I was asked to calculate the arithmetic mean and possibly other values of the
time necessary to execute scenarios and thus wrote the following short Perl
program:
use strict;
use warnings;

die "Usage: stat3 <logfile> \"yyyy mm dd\" \"hh:mm:ss\"\n" if @ARGV != 3;
my ($file, $date, $min_time) = @ARGV;

my @values;
open my $fh, '<', $file or die "Could not open $file: $!\n";
{
    local $/ = '[END]';    # read the log file entry by entry
    while (my $entry = <$fh>)
    {
        my ($time, $duration) = $entry =~
            m/^.+\|.+\|$date,\s(\d\d:\d\d:\d\d)\s.*execute Scenario\s(\d+)\sms/s;
        push (@values, $duration) if $time and $time ge $min_time;
    }
}
close $fh;

print 'Average: ', int(eval(join('+', @values)) / @values) / 1e3,
      ' s (', @values + 0, ")\n" if @values + 0;
Of course, the simple arithmetic mean could have been calculated in the main
loop without the need to store all values captured from the log file in an
array, but since it was not clear whether more complex calculations might become
necessary, it was decided to save all values first and use them for the
calculations in a second step. This led to the funny way of computing the
arithmetic mean by concatenating all array elements into a single long character
string, joined together with '+' characters. This string is then fed into an
eval, which is not efficient but shows what can be done using a dynamic
programming language like Perl.
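For comparison, the same average could be computed without the eval trick, for instance with the sum function from the core module List::Util. The following lines are merely a sketch of this alternative, assuming @values holds the captured durations in milliseconds, and are not part of the original program:
use List::Util qw(sum);

# Arithmetic mean in seconds, computed directly from the captured durations.
printf "Average: %.3f s (%d)\n", sum(@values) / @values / 1e3, scalar @values
    if @values;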
Are there any files with W:WD-rights on my system disk?
Another day I was asked, “How can you be sure there are no
files on your system disk which are writable by WORLD?” Good question – this
calls for a short Perl program, too, which shows how external commands and
functions can be called using backticks while capturing their output in
program variables:
use strict;
use warnings;

my ($fc, $mc) = (0, 0);
for my $line (`dir/prot/width=(file=60) [...]`)
{
    # Extract the file name and the WORLD field of the protection mask.
    my ($file, $w) = $line =~ m/(.+)\s+.+,(.*)\)/;
    next unless $file;
    $fc++;
    print "$file\n" and $mc++ if ($w =~ m/[WD]/);
}
print "$fc files processed, $mc are world writable/deletable!\n";
It turned out that no files were endangered by wrong protection settings – and
yes, the program was tested by deliberately creating a file with W:WD rights.
Migrating a MySQL database to Oracle/RDB
Another one-time script which proved very
useful was written to migrate a MySQL database running on a LINUX machine to an
Oracle/RDB system running on an OpenVMS system (cf. “Bringing Vegan Recipes to
the Web with OpenVMS”, OpenVMS Technical Journal, No. 8, June 2006). All
out-of-the-box attempts to solve this problem failed due to the very
different output and input formats of the generated files. A first attempt to
transform a MySQL output file into a suitable load file for Oracle/RDB proved to
be quite complicated and incurred a lot of overhead, so it was decided to give up this
approach and try an online approach instead, using a simple Perl program to connect to
both databases at once, reading data from MySQL and writing it directly into
Oracle/RDB.
The resulting program turned out to be quite generic and only expects the
necessary database connection parameters as well as a list of tables to be
copied. The copy operation itself was faster than expected and even outperformed
the first attempt using file-based export/import with an external transformation
routine implemented in Perl.
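The original program is too long to be reproduced here, but its basic idea can be sketched with the DBI module. The driver names, connection parameters and table names in the following sketch are merely placeholders, not the ones actually used:
use strict;
use warnings;
use DBI;

# Placeholder connection parameters - the real ones are passed to the program.
my $src = DBI->connect('dbi:mysql:database=recipes;host=linuxbox', 'user', 'password',
                       { RaiseError => 1 });
my $dst = DBI->connect('dbi:RDB:attach recipes_db', 'user', 'password',
                       { RaiseError => 1, AutoCommit => 0 });

for my $table (qw(recipes ingredients))    # placeholder list of tables to be copied
{
    my $select = $src->prepare("SELECT * FROM $table");
    $select->execute();
    my $insert = $dst->prepare('INSERT INTO ' . $table . ' VALUES (' .
                               join(',', ('?') x $select->{NUM_OF_FIELDS}) . ')');
    while (my @row = $select->fetchrow_array())
    {
        $insert->execute(@row);            # write each row directly into Oracle/RDB
    }
    $dst->commit();
}

$src->disconnect();
$dst->disconnect();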
Larger Perl programs
Many problems which occur on a regular basis can be solved
using Perl, too. Examples of such problems are:
- Generating simple web server statistics on a daily basis.
- Fetching stock market data from a web server and storing it into a MySQL database.
- Fetching mail from a POP3 server in regular time intervals and distributing these mails to the OpenVMS mail system.
- Sending outgoing mails to an SMTP server requiring authentication which is not currently supported by OpenVMS's TCPIP stack.
- Caching results from database queries to speed up execution time of programs requesting data from a database etc.
Four of these examples will be described briefly in the following, showing the
power of Perl in larger applications:
Simple web server statistics
After observing that the WASD web server running on an OpenVMS system was
unexpectedly busy, a simple web server statistics program was to be written to
see which files were requested how often. All in all, a result like this
should be generated:
 2734: my_machines/dornier/do80/chapter_1.pdf
  288: my_machines/bbc/tisch_analogrechner/anleitung.pdf
  117: publications/anhyb.pdf
   97: publications/handson.pdf
This was accomplished after only a couple of minutes with the following
short Perl program:
use strict;
use warnings;

die "File name and account name expected!\n" unless @ARGV == 2;
my ($log_file, $account) = @ARGV;

open my $fh, '<', $log_file or die "Unable to open log file $log_file, $!\n";

my %matches;
while (my $line = <$fh>)
{
    my ($ip, $key) = $line =~ m/^(\d+\.\d+\.\d+\.\d+).*"GET \/$account\/(.+?)\s/;
    next if !$ip or $ip =~ '^192.168.31';
    $key =~ s/"//g;
    $key .= 'index.html' if $key =~ m:/$:;
    $matches{$key}++ if $key =~ m/(html|pdf|txt)$/;
}
close $fh;

printf "%5d: %s\n", $matches{$_}, $_
    for (sort {$matches{$b} <=> $matches{$a}} keys(%matches));
A couple of months later, my friend Michael Monscheuer wrote an equivalent
web server statistics script in pure DCL which was much (very much, in fact)
longer than the program shown above. I have to admit that his
solution seemed easier to read at first sight, but due to the sheer
amount of code this impression faded rather quickly.
Fetching mail from a POP3 server
Sometimes it is desirable to fetch mails from a standard POP3 server
and make these mails available in the OpenVMS mail system so the system's
users can access their mails using MAIL or a suitable web interface like
yahmail or soymail. To make this possible, a batch job written in Perl is
required which polls a variety of POP3 servers and their associated mailboxes
at regular intervals, fetches mails and distributes them to the
various users of the OpenVMS system.
The overall Perl code for implementing this batch job consists of only 140
lines, since most of the really complicated subtasks were already implemented
in the following modules, readily found on CPAN:
- Net::POP3 – client interface to the POP3-protocol.
- IO::File – file creation and access methods.
- POSIX qw(tmpnam) – used to create temporary file names.
- VMS::Mail – interface to the OpenVMS mail system.
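The core of such a job is the fetch loop itself. The following is only a rough sketch of that loop with assumed server and account data; for brevity it hands each message to the DCL MAIL command instead of using the VMS::Mail interface of the original program:
use strict;
use warnings;
use Net::POP3;
use POSIX qw(tmpnam);

# Placeholder POP3 account - the real job loops over several such accounts.
my ($host, $user, $password, $vms_user) = ('pop.example.org', 'jdoe', 'secret', 'DOE');

my $pop = Net::POP3->new($host, Timeout => 60) or die "Could not connect to $host\n";
defined $pop->login($user, $password) or die "Login for $user failed\n";

my $messages = $pop->list() || {};          # hash reference: message number => size
for my $msgnum (keys %$messages)
{
    my $lines = $pop->get($msgnum) or next; # array reference holding the raw message

    my $tmpfile = tmpnam();                 # write the message to a temporary file...
    open my $fh, '>', $tmpfile or die "Could not create $tmpfile: $!\n";
    print $fh @$lines;
    close $fh;

    # ...and hand it over to the OpenVMS mail system (simplified placeholder).
    system(qq(MAIL $tmpfile $vms_user/SUBJECT="Fetched from $host"));
    unlink $tmpfile;
    $pop->delete($msgnum);                  # remove the message from the POP3 mailbox
}
$pop->quit();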
Now that it is possible to receive mails, it would be nice to be able to send
mails, too, as the following example shows:
SMTP-Proxy
Almost every current mail provider requires that its clients authenticate themselves
prior to sending mail via their SMTP server(s). Unfortunately, this kind of
authentication is not supported by the TCPIP package for OpenVMS. Since a
requirement was to send outgoing mail directly from the OpenVMS system, i.e.
without an intermediate proxy system like a LINUX host or the like, it was
decided to implement a small SMTP-proxy in Perl running on the OpenVMS
system itself.
This proxy listens on port 25 of the local machine for outgoing
mail, while another connection is maintained to port 25 of the provider.
Every outgoing mail is parsed and enriched with the necessary authentication
information before being sent to the provider, which solved the initial
problem quite easily.
This SMTP-proxy makes use of the following modules, yielding an overall size
of only 68 lines of Perl code:
- Net::ProxyMod – this module allows easy TCPIP-packet modification.
- MIME::Base64 – MIME-encoding and -decoding.
- Tie::RefHash – allows using references as hash keys.
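The essential enrichment step is encoding the credentials for the provider. Assuming the provider accepts AUTH PLAIN, the additional SMTP command injected by the proxy can be built with MIME::Base64 along the following lines (the user name and password are placeholders, and the actual packet rewriting is left to Net::ProxyMod):
use strict;
use warnings;
use MIME::Base64;

# Placeholder credentials for the provider's SMTP server.
my ($user, $password) = ('jdoe', 'secret');

# AUTH PLAIN expects base64("\0user\0password") as a single line.
my $credentials  = encode_base64("\0$user\0$password", '');
my $auth_command = "AUTH PLAIN $credentials\r\n";

# This command is inserted into the outgoing SMTP dialogue right after
# the provider's greeting has been seen by the proxy.
print $auth_command;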
Database-Proxy
Sometimes it is desirable to perform database accesses not directly but via
a proxy which might either contain some business logic and/or caching
mechanisms to reduce database load and to speed up the application at the
cost of some additional memory consumption. Since a former article already
described this Perl-based proxy in detail (cf. “Bringing Vegan Recipes to
the Web with OpenVMS”, OpenVMS Technical Journal No. 8, June 2006), only the
speedup of a factor of 10 obtained with this proxy will be noted
here.
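The caching idea itself can be illustrated in a few lines: query results are stored in a hash keyed by the SQL statement and its bind values, so repeated identical queries are answered from memory instead of hitting the database. This is only a sketch of the principle, with a hypothetical helper and connection string, not the code of the proxy described in the article mentioned above:
use strict;
use warnings;
use DBI;

my %cache;    # query results keyed by statement text and bind values

# Return the result of a SELECT, using the cache when possible.
sub cached_select
{
    my ($dbh, $sql, @bind) = @_;
    my $key = join("\0", $sql, @bind);
    $cache{$key} ||= $dbh->selectall_arrayref($sql, undef, @bind);
    return $cache{$key};
}

# Hypothetical usage:
# my $dbh  = DBI->connect('dbi:mysql:database=recipes', 'user', 'password');
# my $rows = cached_select($dbh, 'SELECT name FROM recipes WHERE id = ?', 42);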