LINUX GAZETTE
...making Linux just a little more fun!
How to E-mail an Encyclopedia
By Graham Jenkins

Why Would You E-mail an Encyclopedia?

OK, so it doesn't have to be an encyclopedia. It might be a movie, or a large directory you have tarred and compressed. You could of course transfer it using FTP. Or perhaps you couldn't: your machine might live within a corporate LAN with no FTP access to the outside world, or the destination machine may have FTP disabled for security reasons. An alternative is to encode the object as a string of ASCII characters and send it via e-mail. You can use the 'uuencode' utility to perform this encoding, or you can use the Base64 Content-Transfer-Encoding described in RFC 2045 "Multipurpose Internet Mail Extensions (MIME) Part One".
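
To make the two encodings concrete, here is a minimal Perl sketch that encodes the same short string both ways. The sample string is made up for illustration; 'MIME::Base64' is a standard module in current Perl distributions, while the 'u' template of 'pack' needs no module at all.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use MIME::Base64 qw(encode_base64 decode_base64);

# Hypothetical sample data; any byte string would do.
my $data = "Hello, encyclopedia!\n";

# Base64 as described in RFC 2045 (lines of at most 76 characters).
my $b64 = encode_base64($data);

# Traditional uuencode body: a length character, then the encoded bytes.
my $uu  = pack("u", $data);

print "base64:   $b64";
print "uuencode: $uu";

# Both encodings are reversible.
print "round trip ok\n"
    if decode_base64($b64) eq $data && unpack("u", $uu) eq $data;
```

Note that 'pack' produces only the encoded lines; the surrounding 'begin' and 'end' lines that 'uudecode' expects have to be added separately.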

How Would You Package an Encyclopedia?

If you were physically mailing an encyclopedia, you might package it entirely within one carton. That would be a good solution, provided all mail handlers along the way would accept a carton of that size and weight. If that were not the case, then you would have to split the encyclopedia into multiple cartons of acceptable size and weight.

In a like manner, when we are e-mailing an encyclopedia, we need to ensure that the size of the e-mail message which contains it doesn't exceed any limit which might be encountered along the way. If it would exceed such a limit, we need to split the message into multiple parts of acceptable size. This can be done in accordance with RFC 2046 "Multipurpose Internet Mail Extensions (MIME) Part Two".

In summary, if we follow the recommendations of RFC 2045 and RFC 2046, we should perform Base64 encoding on our entire encyclopedia, then split the result into as many parts as necessary. The parts to be mailed will then look something like this:

  From grahjenk@au1.ibm.com Tue Dec 31 13:14:34 2002
  Content-Disposition: inline
  Content-Transfer-Encoding: 7bit
  Content-Type: message/partial; id="300870"; number="1"
  Subject: Graham's Encyclopedia
  
  owF1Vb+P3EQUPhLRrBSFlHQjRYCQsthe/1q7CNrbREjocnvK3hHREM3ac7fWeWfM
  zPh2L38ASomEIro0SNBBg2hBSDTwR0BBEwqQaFJF8J499tob0EjW7rzv+96b772x
   ...
  szJb9DUMvKdRUIV+RY5Xu3UkRQqvJCzdzHtHoQL36Ke6elnYLgwH8MfxCU9ymq1Y
  
  --
  From grahjenk@au1.ibm.com Tue Dec 31 13:14:34 2002
  Content-Disposition: inline
  Content-Transfer-Encoding: 7bit
  Content-Type: message/partial; id="300870"; number="9"; total="9"
  Subject: Graham's Encyclopedia
  
  dc45xuruv3m3e8z/OGRD6lxz13GC5m0XbXvcWlyFW4vxbSSK5KEoTOIIuxTFs2JK
  UnZKy1wTAV9TWr2dev7WrLbXkeOHUVQnjuyXEptwm3hBgfT43auvVh/v5mt+48pb
  n+09Hf7+5Nvyx5tf/fP4o+PJ398Xf958cW3v6ejzL17/9YPfPs4unv08efvr68O/
  njz/Fw==
  
  --
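
The "Base64-encode then split" approach shown in the listing above can be sketched in Perl roughly as follows. This is a simplified illustration, not a complete RFC 2046 implementation: the sub name 'partial_messages', its parameters and the sample data are our own inventions, and only the headers that distinguish the parts are generated. A full implementation would also carry the headers of the enclosed message in the first part.

```perl
use strict;
use warnings;
use MIME::Base64 qw(encode_base64);

# Encode the whole object, then cut the encoded text into parts holding at
# most $lines_per_part lines each. Returns a list of message strings.
sub partial_messages {
    my ($data, $id, $subject, $lines_per_part) = @_;
    my @lines = split /^/, encode_base64($data);    # keeps the line endings
    my @chunks;
    push @chunks, join("", splice(@lines, 0, $lines_per_part)) while @lines;

    my $total = @chunks;
    my @messages;
    for my $number (1 .. $total) {
        # RFC 2046 requires the "total" parameter only on the final part.
        my $tot = $number == $total ? qq{; total="$total"} : "";
        push @messages,
            qq{Content-Type: message/partial; id="$id"; number="$number"$tot\n}
          . "Subject: $subject\n\n"
          . $chunks[$number - 1];
    }
    return @messages;
}

# Hypothetical data: 1000 bytes split into parts of five encoded lines.
my @parts = partial_messages("x" x 1000, "300870", "Graham's Encyclopedia", 5);
print scalar(@parts), " parts\n";
print $parts[-1];
```

Because the split happens after encoding, the recipient must join the parts in numerical order before decoding; no part can be decoded reliably on its own unless the split falls on a suitable boundary.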

Another Way of Packaging an Encyclopedia

It's not always easy for a message recipient to assemble parts like those shown above in the correct order, then strip out the header lines and feed the parts into a Base64 decoding program. If he is using an old Unix machine, he may not actually have a Base64 decoder. If he is using a Microsoft machine, he might not be able to appropriately edit the message parts.

So an alternative mechanism is to break the encyclopedia into numbered parts, then separately uuencode each part for sending. Most versions of 'uudecode' are smart enough to skip the header lines which precede the 'begin' line. This approach even works with Microsoft Outlook.

The secret here is to number the component parts in such a fashion that they can easily be selected (e.g. by using 'cat') in the correct sequence, and fed to a pipe (e.g. for uncompress and untar operations) or output file. The output parts now look like:

  From grahjenk@au1.ibm.com Tue Dec 31 13:49:07 2002
  Subject: encyclo part 1/ size/sum 1024/16571
  
  begin 644 001_encyclo
  M<F]O=#IX.C`Z,3I3=7!E<BU5<V5R.B\Z+W-B:6XO<V@*9&%E;6]N.G@Z,3HQ
  M.CHO.@IB:6XZ>#HR.C(Z.B]U<W(O8FEN.@IS>7,Z>#HS.C,Z.B\Z"F%D;3IX
   ...
  M8W)E<',Z+V)I;B]K<V@*=V-O8F%T8V@Z>#HU,#(X.#HQ.D%L97@@=&AE(%=A
  B;FME<CHO97AP;W)T+VAO;64O=V-O8F%T8V@Z+V)I;B]K<P``
  `
  end
  
  --
  From grahjenk@au1.ibm.com Tue Dec 31 13:49:07 2002
  Subject: encyclo part 2/2 size/sum 945/12218
  
  begin 644 002_encyclo
  M:`IC-S0S-#0P.G@Z-38T-C,Z-3`P-#I!;F1R97<@3'5O;F<Z+VAO;64O861M
  M;W!E<F%T;W(Z+V5X<&]R="]H;VUE+V]P8U]O<#HO8FEN+W-H"F,Y,34W.3DZ
  M>#HU,#(Y,#HQ.CHO:&]M92]A9&UI;B]C.3$U-SDY.B]U<W(O8FEN+V)A<V@*
  `
  end
  
  --
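
For a recipient without a 'uudecode' program, the reassembly step can be sketched in Perl. This is only an illustration under our own assumptions: the sub name 'decode_part' and the two tiny demonstration parts are hypothetical, and a real script would read the saved messages from files instead of building them in memory.

```perl
use strict;
use warnings;

# Extract the uuencoded section from one saved message, ignoring the mail
# headers before "begin" and anything after "end".
# Returns (part name, decoded bytes).
sub decode_part {
    my ($message) = @_;
    my ($name, $body) = $message =~ /^begin \d+ (\S+)\n(.*?)^end$/ms
        or die "no uuencoded section found\n";
    return ($name, unpack("u", $body));
}

# Build two hypothetical parts laid out like the examples above.
my @messages;
for my $p ([1, "001_demo", "first half, "], [2, "002_demo", "second half\n"]) {
    my ($n, $name, $text) = @$p;
    push @messages, "Subject: demo part $n\n\nbegin 644 $name\n"
                  . pack("u", $text) . "`\nend\n";
}

# Decode in arbitrary order, then sort by part name to restore the sequence.
my %part;
for my $message (reverse @messages) {
    my ($name, $bytes) = decode_part($message);
    $part{$name} = $bytes;
}
my $whole = join "", map { $part{$_} } sort keys %part;
print $whole;
```

The zero-padded part numbers are what make the plain 'sort' sufficient here, just as they allow a shell wildcard to list the decoded files in the right order.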

You'll notice that the encoded text now contains no lower-case letters, and that it includes a number of bracket and other punctuation symbols. Some of those symbols don't map in an equivalent fashion into other character-set representations, so they can be altered when a message passes through a gateway. That's why RFC 2045 recommends the use of Base64 instead of 'uuencode'.

The Encyclopedia Packer

Here's the packaging program. For simplicity and generality, we use the alternative packaging scheme outlined above. Programs which do this have been around for a long time. They are usually written in 'C', although Bourne-Shell versions are available. And they usually write temporary files.

It is possible to write an elegant implementation of the packaging scheme using the Perl language, without using any temporary files. The resulting program is both simple and portable. So that's what we've done.

#!/usr/local/bin/perl -w
# @(#) filemail.pl      Breaks incoming stream into parts, then encodes
#                       each part and e-mails it to designated recipient.
#                       Vers. 2.05; Graham Jenkins, IBM GSA, December 2002.

use strict;             # Parts are encoded and sent via a double-buffer scheme.
use File::Basename;     # Uuencoding is used to reduce module dependence.
my  $PSize = 700;       # Default (input) part-size, in kb.
my  ($Count,$Sum,$Size,$Total,$InpBuf,$InpLen,$OutBuf,$j);

                        # An optional leading "-N" sets the part-size to N kb.
if ($#ARGV == 2) { if ($ARGV[0] =~ m/^-\d+$/ ) { $PSize=0-$ARGV[0]; shift } }

die "Usage: cat file  |".basename($0)." [-KbPerPart] destination filename\n".
    " e.g.: tar cf - .|".basename($0)." -64 smith\@popser.acme.com mydir.tar\n".
    "(Note: default un-encoded part size = $PSize","kb)\n"  if ($#ARGV != 1);

open(INFILE,"-") || die "Can't read input!\n";
binmode(INFILE);        # Input may be arbitrary binary data.
$Count = 0; $Total = "";# Loop until no further input available.

do { $InpLen = read(INFILE, $InpBuf, 1024 * $PSize);
     die "Error reading input: $!\n" unless defined($InpLen);
     $Total  = $Count if $InpLen < 1;   # Total is known only at end of input.
     do { $Size = length($OutBuf);      # Send the part read on the last pass.
          print STDERR "$ARGV[1] part $Count/$Total => $ARGV[0] $Size bytes\n";
          $Sum  = unpack("%32C*", $OutBuf);     # 32-bit sum of all the bytes,
          foreach $j (1,2) {$Sum = ($Sum & 0xffff) + int($Sum/0x10000)} # folded
                                        # List-form open avoids shell quoting.
          open(PIPE, "|-", "Mail", "-s",
            "$ARGV[1] part $Count/$Total size/sum $Size/$Sum", $ARGV[0])
            || die "Can't start Mail!\n";
          $j = $Count ; while (length($j) < 3 ) { $j = "0" . $j }
          $j = dirname($ARGV[1])."/".$j if dirname($ARGV[1]) ne ".";
          print PIPE "begin 644 ",$j,"_", basename($ARGV[1]),"\n",
            pack("u",$OutBuf),"\`\nend\n";
          close(PIPE)                                   } if $Count > 0;
     $Count++; $OutBuf = $InpBuf                          } until $InpLen < 1;

Perl lends itself to this application through the form of its 'read' function, which allows us to specify the number of bytes which it should try to read into a designated string. As can be seen, we just keep reading from standard input until 'read' returns no more data in '$InpBuf'. After each read, we uuencode whatever content was stored in '$OutBuf' by the previous pass and push it into a mail program. We then store the contents of '$InpBuf' in '$OutBuf' ready for our next iteration. Holding each part back until the following read has completed is what allows the final part to carry the total number of parts in its subject line.
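
The double-buffer idea is easier to see in isolation. The following sketch is our own restatement, not part of filemail.pl: the sub name 'split_stream' and its callback interface are hypothetical, and an in-memory filehandle stands in for standard input.

```perl
use strict;
use warnings;

# Split the stream on $in into parts of at most $part_size bytes and call
# $emit->($number, $total, $bytes) for each one. $total stays "" until the
# read that hits end of input, so only the last part carries the total.
sub split_stream {
    my ($in, $part_size, $emit) = @_;
    my ($count, $total, $pending) = (0, "", undef);
    while (1) {
        my $len = read($in, my $buf, $part_size);
        die "read failed: $!\n" unless defined $len;
        $total = $count if $len == 0;
        $emit->($count, $total, $pending) if $count > 0;   # previous part
        last if $len == 0;
        $count++;
        $pending = $buf;                                   # hold this part
    }
}

# Demonstration on an in-memory filehandle holding 10 bytes.
my $data = "abcdefghij";
open(my $in, "<", \$data) or die "cannot open in-memory handle: $!\n";
my @seen;
split_stream($in, 4, sub { push @seen, "$_[0]/$_[1]:$_[2]" });
print "$_\n" for @seen;
```

With a part size of 4, the ten input bytes come out as parts "1/", "2/" and "3/3", mirroring the subject lines in the examples above.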

Perl is able to perform a uuencode operation on a string by using its 'pack' function with a 'u' template as shown; no additional modules are required. Although it isn't strictly necessary, we also take advantage of the checksum feature of the 'unpack' function to compute a checksum on each part as it is sent.
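
The checksum can be used at the receiving end to confirm that a decoded part arrived intact. The sketch below repeats the calculation from the program and compares it with the "size/sum" field of a subject line. The sub names 'part_sum' and 'part_ok' are hypothetical. As far as we can tell, the folding step matches the System V style checksum printed by 'sum -s' in GNU coreutils, but treat that as an assumption to check on your own system.

```perl
use strict;
use warnings;

# 16-bit checksum as used for the "size/sum" field: add all the bytes into
# a 32-bit total, then fold the high 16 bits into the low 16 bits twice.
sub part_sum {
    my ($bytes) = @_;
    my $sum = unpack("%32C*", $bytes);
    $sum = ($sum & 0xffff) + int($sum / 0x10000) for 1, 2;
    return $sum;
}

# Check a decoded part against the "size/sum" field of its Subject line.
sub part_ok {
    my ($subject, $bytes) = @_;
    my ($size, $sum) = $subject =~ m{size/sum (\d+)/(\d+)} or return 0;
    return length($bytes) == $size && part_sum($bytes) == $sum;
}

my $bytes = "abc";                      # byte values 97 + 98 + 99
print part_sum($bytes), "\n";           # prints 294
print part_ok("demo part 1/1 size/sum 3/294", $bytes) ? "ok\n" : "bad\n";
```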

You may observe that we actually open a pipe into the Unix/Linux 'Mail' program to handle our outgoing mail. For greater portability, we could install and use the Net::SMTP module.
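
A Net::SMTP version of the sending step might look something like the sketch below. It is untested against any particular mail server. The sub names, the addresses and the relay name 'mailhost.example.com' are placeholders of our own, and the actual send is left commented out so that the example can be run without a network connection.

```perl
use strict;
use warnings;
use Net::SMTP;

# Build a minimal message: headers, a blank line, then the uuencoded part.
sub build_message {
    my ($from, $to, $subject, $name, $bytes) = @_;
    return "From: $from\nTo: $to\nSubject: $subject\n\n"
         . "begin 644 $name\n" . pack("u", $bytes) . "`\nend\n";
}

# Hand one message to an SMTP server. $host is a hypothetical relay name.
sub send_message {
    my ($host, $from, $to, $message) = @_;
    my $smtp = Net::SMTP->new($host) or die "cannot connect to $host\n";
    $smtp->mail($from)        or die "sender rejected\n";
    $smtp->to($to)            or die "recipient rejected\n";
    $smtp->data()             or die "DATA command refused\n";
    $smtp->datasend($message);
    $smtp->dataend()          or die "message not accepted\n";
    $smtp->quit;
}

my $message = build_message('sender@example.com', 'smith@example.com',
                            'demo part 1/1 size/sum 3/294', '001_demo', "abc");
print $message;
# send_message('mailhost.example.com', 'sender@example.com',
#              'smith@example.com', $message);    # uncomment to really send
```

Compared with the pipe into 'Mail', this removes the dependence on a local mail program at the cost of needing to know the name of an SMTP server.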

The program can be invoked with an optional part-size parameter to adjust its default un-encoded part-size limit of 700kb.
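
When choosing a part size, remember that the limit applies to the un-encoded data, and that uuencoding makes each part noticeably larger. The following sketch estimates the encoded size. The sub name 'uu_size' is our own, and the figure excludes the mail headers and the 'begin' and 'end' lines.

```perl
use strict;
use warnings;
use POSIX qw(ceil);

# Size in bytes of pack("u", ...) output for $n input bytes. Each line
# carries up to 45 input bytes as one length character, four output
# characters per three input bytes (rounded up), and a newline.
sub uu_size {
    my ($n) = @_;
    my $full = int($n / 45);            # complete 45-byte lines
    my $rest = $n % 45;                 # bytes on the final short line
    my $size = $full * 62;              # 1 + 60 + 1 bytes per full line
    $size += 2 + 4 * ceil($rest / 3) if $rest;
    return $size;
}

my $part = 700 * 1024;                  # the default part size
printf "%d input bytes -> %d encoded bytes (%.0f%% larger)\n",
       $part, uu_size($part), 100 * (uu_size($part) / $part - 1);
```

If a mail gateway along the way imposes a known message-size limit, the part-size parameter should therefore be set comfortably below that limit.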

Programs Which Do Similar Things

Some of you may recognize that this sort of message-splitting is exactly the sort of thing we did in "Secure Printing with PGP". For those of you who are interested, there are updated versions of the programs presented therein at: "CPAN Scripts Repository". Those programs use the RFC-recommended "Base64-encode then split" methodology.

An earlier article "A Linux Client for the Brother Internet Print Protocol" included a shell script which used a "split then send parts" methodology; this also used Base64 encoding.

 

Graham is a Unix Specialist at IBM Global Services, Australia. He lives in Melbourne and has built and managed many flavors of proprietary and open systems on several hardware platforms.


Copyright © 2003, Graham Jenkins. Copying license http://www.linuxgazette.com/copying.html
Published in Issue 86 of Linux Gazette, January 2003
