Lecture 4: Applications #2: Email

Electronic Mail Basics

Internet electronic mail (email) allows a person to compose a message and to send it to another person, usually on a remote system. Most email software also provides facilities for reading, saving, printing and replying to email. Until very recently, electronic mail was the single biggest generator of traffic volume on the Internet.

Email messages are delivered as follows:

[Figure: Email delivery protocols]
Key concepts:

RFC 822 Message Format

RFC822 defines the structure of an email message in the Internet. It has become the generic standard for all email messages. RFC2822 updates RFC822 without substantially changing its approach.

An RFC822-compatible email message consists of lines of ASCII text. It contains two sections:

  1. An email message begins with a set of headers (or header lines), some of which are mandatory and others optional. The headers have a fixed format, consisting of a keyword which starts immediately after a newline (i.e., left-justified), followed by a colon character, followed by a space and a value -- sometimes called "name-value pairs". Some typical headers include:
    From: pscott@ironbark.bendigo.latrobe.edu.au
    To: hjc@redgum.bendigo.latrobe.edu.au
    Reply-To: p.scott@latrobe.edu.au
    Subject: Problems with redgum?
  2. A body, which may contain any plain ASCII text. The body follows the headers, separated from them by a blank line. Note that more recent standards than RFC822 (MIME, see later) extend the range of possible messages which can be sent by email, as enclosures or attachments.
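
The header/body split described above can be sketched in a few lines of Python. This is a simplified illustration rather than a full RFC822 parser: the function name parse_message is an assumption made here, and folded (continued) header lines and repeated headers are not handled.

```python
def parse_message(text):
    """Split an RFC822-style message into (headers, body).

    Simplified sketch: folded (continuation) header lines and repeated
    headers are not handled.
    """
    # The first blank line separates the headers from the body.
    head, _, body = text.partition("\n\n")
    headers = {}
    for line in head.split("\n"):
        # Each header is "Keyword: value"; split on the first colon only.
        name, _, value = line.partition(":")
        headers[name.strip()] = value.strip()
    return headers, body


message = (
    "From: pscott@ironbark.bendigo.latrobe.edu.au\n"
    "To: hjc@redgum.bendigo.latrobe.edu.au\n"
    "Reply-To: p.scott@latrobe.edu.au\n"
    "Subject: Problems with redgum?\n"
    "\n"
    "Do we have a problem with mail on redgum?\n"
)

headers, body = parse_message(message)
print(headers["Subject"])
print(body)
```

In practice, Python's standard email package (for example email.message_from_string) handles the cases this sketch leaves out.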

The SMTP Protocol

The Simple Mail Transfer Protocol defined in RFC821 (and updated, most recently, in RFC2821) specifies how mail is delivered from one system to another. It is a relatively straightforward protocol.

Initially, an email client (usually the delivery agent software on the originating machine) establishes a TCP connection to the SMTP server (at port 25) on the destination machine.

The server responds with an informative message beginning with the 3-digit code 220. The client then sends a HELO command identifying the domain name of the system it is running on.

The client software then transmits one (or more) mail messages to the server. Each message is preceded by a MAIL FROM command and one or more RCPT TO commands. The responses to these commands always begin with a 3-digit number followed by a human-readable message. Then the text of the message itself (including its headers) is transmitted following a DATA command, and is terminated by a line containing only a "." character.

Finally, a QUIT message from the client tells the server to close the TCP connection. An example of this is given on the next slide.

An SMTP Session

NB: Lines beginning with a 3-digit code are sent by the server; all other lines are sent by the client. Some lines are folded for clarity.
220 redgum.bendigo.latrobe.edu.au ESMTP Sendmail SGI-8.9.3/8.9.3;
    Tue, 11 Mar 2004 20:29:37 +1100 (EDT)
HELO bindi.bendigo.latrobe.edu.au
250 redgum.bendigo.latrobe.edu.au Hello bindi.bendigo.latrobe.edu.au
    [], pleased to meet you
Mail from: pscott@bindi.bendigo.latrobe.edu.au
250 pscott@bindi.bendigo.latrobe.edu.au... Sender ok
Rcpt to: hjc@ironbark.bendigo.latrobe.edu.au
250 Recipient ok
DATA
354 Enter mail, end with "." on a line by itself
From: pscott@ironbark.bendigo.latrobe.edu.au
Reply-To: p.scott@latrobe.edu.au
Subject: Problems with redgum?
To: hjc@ironbark.bendigo.latrobe.edu.au

Do we have a problem with mail on redgum?

.
250 NAA17474 Message accepted for delivery
QUIT
221 redgum.bendigo.latrobe.edu.au closing connection
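
The client side of an exchange like the one above can be sketched as a function that lists, in order, the lines the client sends. This is an illustration only: the function names smtp_client_lines and reply_code are assumptions made here, no network connection is opened, and server replies are not checked. The angle brackets around addresses follow the command syntax in RFC821; the session above shows a looser form that the server accepted.

```python
def smtp_client_lines(helo_domain, sender, recipients, message):
    """Return the lines an SMTP client would send, in order, for one message."""
    lines = ["HELO " + helo_domain, "MAIL FROM:<" + sender + ">"]
    for rcpt in recipients:
        lines.append("RCPT TO:<" + rcpt + ">")
    lines.append("DATA")
    for line in message.split("\n"):
        # A message line starting with "." is sent with an extra leading "."
        # so that it cannot be mistaken for the end-of-message marker.
        lines.append("." + line if line.startswith(".") else line)
    lines.append(".")       # a lone "." ends the message text
    lines.append("QUIT")
    return lines


def reply_code(reply_line):
    """Extract the 3-digit code that starts every server reply."""
    return int(reply_line[:3])


commands = smtp_client_lines(
    "bindi.bendigo.latrobe.edu.au",
    "pscott@bindi.bendigo.latrobe.edu.au",
    ["hjc@ironbark.bendigo.latrobe.edu.au"],
    "Subject: Problems with redgum?\n\nDo we have a problem with mail on redgum?",
)
for command in commands:
    print(command)
print(reply_code("250 Recipient ok"))
```

A real client would wait for each reply and stop if its code signalled an error; Python's standard smtplib module implements the full exchange.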

Other Aspects

There are many subtleties involved in electronic mail. These include:

Email Attachments

The Multipurpose Internet Mail Extensions (MIME -- originally RFC1521 and RFC1522, now updated in RFC2045 to RFC2049) specification extends the RFC822 message format to allow the mail message body to contain attachments or enclosures. This allows SMTP to be used to send files of arbitrary type, whilst retaining compatibility with RFC822.

The MIME specification adds several new header types. In the most common usage, the following are added to the basic message header:

MIME-Version: 1.0
Content-Type: Multipart/Mixed; Boundary=NextBitString_8765r443
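The message is then structured into one or more "message parts", using the "Boundary" string as a separator. Each part is introduced by a line consisting of "--" followed by the boundary string, and the last part is followed by the boundary string with "--" at both ends. The following shows an audio attachment to an ordinary text message. Note that non-ASCII data is usually encoded into an ASCII representation.
--NextBitString_8765r443
Content-Type: text/plain

Ordinary email message in plain ASCII text

--NextBitString_8765r443
Content-Type: audio/basic; name="message.au"
Content-Transfer-Encoding: base64

...ASCII encoded data for the audio message
--NextBitString_8765r443--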
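
A message with this structure can be built with Python's standard email package, as in the sketch below. The audio bytes are placeholder data (an assumption, not a real recording), and the library chooses its own boundary string rather than the one shown above.

```python
from email.message import EmailMessage

msg = EmailMessage()
msg["From"] = "pscott@ironbark.bendigo.latrobe.edu.au"
msg["To"] = "hjc@redgum.bendigo.latrobe.edu.au"
msg["Subject"] = "Problems with redgum?"

# The ordinary text becomes the first message part.
msg.set_content("Ordinary email message in plain ASCII text\n")

# Placeholder bytes standing in for real audio data (an assumption).
audio_bytes = bytes(range(256))

# Adding an attachment converts the message to multipart/mixed;
# binary data is Base64 encoded by default.
msg.add_attachment(
    audio_bytes, maintype="audio", subtype="basic", filename="message.au"
)

print(msg.as_string())
```

Printing the message shows the generated headers, the boundary lines between the parts, and the Base64 text of the attachment.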

MIME Types and Encodings

The Content-Type: header in MIME specifies a "MIME type" for the data which follows. The MIME type is used to open a suitable application program to display the attached data. Some standard MIME types include:

text/plain                  lines of ASCII text
text/html                   HTML text
image/gif                   GIF image
video/mpeg                  MPEG video
application/postscript      PostScript document
application/octet-stream    Arbitrary data
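
How a mail client might use the MIME type to pick a program can be sketched as a simple lookup table. The viewer descriptions and the function name choose_viewer are hypothetical, invented for this illustration; real mail clients use their own configuration.

```python
from email.message import Message

# Hypothetical actions for a few MIME types, for illustration only.
VIEWERS = {
    "text/plain": "display as text",
    "text/html": "open in Web browser",
    "image/gif": "open in image viewer",
    "video/mpeg": "open in video player",
    "application/postscript": "open in PostScript viewer",
}


def choose_viewer(content_type_header):
    """Pick an action from the value of a Content-Type: header."""
    part = Message()
    part["Content-Type"] = content_type_header
    # get_content_type() lower-cases the type and drops parameters
    # such as name="message.au".
    mime_type = part.get_content_type()
    # Unknown types are treated like arbitrary data and saved to disk.
    return VIEWERS.get(mime_type, "save to disk")


print(choose_viewer("Image/GIF"))
print(choose_viewer('audio/basic; name="message.au"'))
```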

For non-ASCII (8 bit) data, common encodings include "quoted-printable" and "Base64".
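
As a brief illustration of quoted-printable, the sketch below uses Python's standard quopri module. The sample text is an assumption chosen to contain one 8-bit character (in Latin-1) and one "=" sign, the two kinds of byte that this encoding rewrites as "=" followed by two hexadecimal digits.

```python
import quopri

# Sample data (an assumption): one non-ASCII byte and one "=" sign.
data = "café = coffee".encode("latin-1")

encoded = quopri.encodestring(data)
print(encoded)

# Decoding recovers the original bytes exactly.
decoded = quopri.decodestring(encoded)
print(decoded == data)
```

Most of the text stays readable after encoding, which is why quoted-printable suits data that is mostly ASCII, while Base64 suits arbitrary binary data.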

In Base64 encoding, the message is subdivided into groups of 3 bytes (24 bits) in length. These 24 bits are then subdivided into 4 groups of 6 bits each. Each 6-bit group is represented as one of 64 printable ASCII characters, from the 95 printable characters in ASCII. Finally, each of the printable characters is sent as an 8-bit byte. Thus, 24 bits of data are sent as 32 bits of ASCII data in the encoded message, an increase in size of one third.
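
The grouping described above can be written out directly, as in the following sketch. The function name base64_encode is an assumption made here. One detail not covered above is included: when the data does not end on a 3-byte boundary, the final group is filled out with "=" padding characters.

```python
import base64

# The 64 characters used to represent 6-bit values 0..63.
ALPHABET = (
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "abcdefghijklmnopqrstuvwxyz"
    "0123456789+/"
)


def base64_encode(data):
    """Encode bytes as Base64 text, 3 bytes (24 bits) at a time."""
    out = []
    for i in range(0, len(data), 3):
        group = data[i:i + 3]
        # Pack up to 3 bytes into a 24-bit integer, padding with zero bytes.
        value = int.from_bytes(group + b"\x00" * (3 - len(group)), "big")
        # Split the 24 bits into four 6-bit values, most significant first.
        chars = [ALPHABET[(value >> shift) & 0x3F] for shift in (18, 12, 6, 0)]
        # A short final group yields fewer data characters; "=" fills the rest.
        keep = len(group) + 1
        out.append("".join(chars[:keep]) + "=" * (4 - keep))
    return "".join(out)


print(base64_encode(b"Man"))  # TWFu
# The result agrees with the standard library implementation.
print(base64_encode(b"Man") == base64.b64encode(b"Man").decode("ascii"))
```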

The Post Office Protocol (POP)

SMTP is really only useful to deliver mail to multiuser hosts which are permanently available and connected to the network. It is not normally used to deliver mail directly to, for example, a user's PC or Mac desktop system.

The Post Office Protocol (currently POP3) is designed for this case: mail is delivered by SMTP to a mailbox on, e.g., a Unix host, and the recipient later (at their convenience) downloads the contents of the mailbox to their desktop system.

A POP client (e.g. Eudora, Netscape Mail, MS Outlook) establishes a TCP connection (on port 110) to a server process on the (e.g.) Unix system where the mailbox resides. The user is authenticated (username/password), and the contents of her mailbox are downloaded for processing on her PC or Mac.
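
The authenticate-then-download sequence can be sketched with Python's standard poplib module. The function name download_mailbox is an assumption made here, and the host name and credentials in the usage comment are hypothetical. The underlying POP3 commands (USER, PASS, STAT, RETR and QUIT) are not named in the text above; they are issued by the poplib methods called below.

```python
import poplib


def download_mailbox(conn, username, password):
    """Authenticate, then fetch every message in the mailbox as bytes.

    conn is expected to behave like a poplib.POP3 object.
    """
    conn.user(username)
    conn.pass_(password)
    count, _size = conn.stat()          # number of messages, total size
    messages = []
    for number in range(1, count + 1):  # POP3 numbers messages from 1
        _response, lines, _octets = conn.retr(number)
        messages.append(b"\r\n".join(lines))
    conn.quit()
    return messages


# Usage against a real server (hypothetical host and credentials):
# conn = poplib.POP3("mail.example.org", 110)
# mail = download_mailbox(conn, "hjc", "secret")
```

Passing the connection in as a parameter keeps the sketch testable without a live server. Note that this version leaves the messages on the server; a client that wanted to remove them after download would also delete each one before quitting.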

POP is almost universally used where a user has "dial up" Internet access from a commercial Internet Service Provider - the user's mailbox is maintained by the ISP. The IMAP protocol has superior functionality to POP, but is not (yet) in wide use.

Digression: Web-based Email Clients

A significant trend in email usage in recent years has been the emergence of "Web-based" email systems (of which the most significant is probably hotmail.com). In these systems, the mail is delivered to an SMTP mailbox, as usual. However, the user agent function is provided by a Web server and CGI (see later) combination running on the same system as the mailbox. The great attraction is that the user can access their mailbox from any Web browser and any location.

The system diagram looks like:

[Figure: Web Email System]
The main disadvantage of these systems, compared to POP (and IMAP) based email clients, is their slower performance, and more limited functionality. There are also privacy and/or security considerations, since the user's mailbox is stored on a remote system where the attitudes, ethics and competence of the system manager are unknown.


Copyright 2004 by Philip Scott, La Trobe University.