Yahoo Webmail Creates Illegal Line Ends

Messages sent to the e-mail cloud through the HTML forms of Yahoo's web mail carry hex 0A characters (UNIX line ends) where they shouldn't. Quoted-printable encoding is involved. SMTP agents that follow RFC 2821 reject messages containing the bare 0A characters.

This affects Yahoo web mail and two X-Mailer clients, YahooMailRC and YahooMailClassic. Problems are apparent in systems that use single ASCII control characters to denote line ends. Systems that use CRLF pairs natively - Windows - are not affected.

I first saw the effects in the Microsoft Excel mailing list. My SMTP server, qmail, refused my replies to some messages. After a week or so of detective work, with the much appreciated help of an Excel poster, KC Cheung, who blogs at pynasocas, I tracked the 0A line ends to the use of Yahoo webmail. There were about 1000 affected messages in my collection of postings from mailing lists. Most involved Excel but that's probably because Excel is the only Microsoft product I use.

A partial work-around is to switch to classic mode when using Yahoo's webmail. help.yahoos describes a procedure for that. But even classic mode is not perfect. Yahoo mail seems to apply quoted-printable encoding unpredictably, whether classic or modern mode is selected.

Please have a look at these selections from the RFCs (Request For Comments) that, in principle, are the standards for the Internet.

From RFC-2822 Section 2.3, Body

   CR and LF MUST only occur together as CRLF;
   they MUST NOT appear independently in the body.

From RFC-2821 Section 2.3.7, Lines

   SMTP commands and, unless altered by a service extension, message
   data, are transmitted in "lines".  Lines consist of zero or more data
   characters terminated by the sequence ASCII character "CR" (hex value
   0D) followed immediately by ASCII character "LF" (hex value 0A).
   This termination sequence is denoted as <CRLF> in this document.
   Conforming implementations MUST NOT recognize or generate any other
   character or character sequence as a line terminator.  Limits MAY be
   imposed on line lengths by servers (see section 4.5.3).

   In addition, the appearance of "bare" "CR" or "LF" characters in text
   (i.e., either without the other) has a long history of causing
   problems in mail implementations and applications that use the mail
   system as a tool.  SMTP client implementations MUST NOT transmit
   these characters except when they are intended as line terminators
   and then MUST, as indicated above, transmit them only as a <CRLF>
   sequence.
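
The rule quoted above is easy to check mechanically. What follows is a minimal Python sketch, not part of any mail server; the name find_bare_line_ends is made up for illustration. It reports every CR that is not followed by LF and every LF that is not preceded by CR in a message as it would travel over the wire.

```python
import re

# A CR not followed by LF, or an LF not preceded by CR.
BARE_LINE_END = re.compile(rb"\r(?!\n)|(?<!\r)\n")


def find_bare_line_ends(data: bytes) -> list:
    """Return (offset, name) for every bare CR or bare LF in data."""
    return [
        (match.start(), "CR" if match.group() == b"\r" else "LF")
        for match in BARE_LINE_END.finditer(data)
    ]


if __name__ == "__main__":
    conforming = b"Subject: test\r\n\r\nline one\r\nline two\r\n"
    offending = b"Subject: test\r\n\r\nline one\nline two\r\n"
    print(find_bare_line_ends(conforming))
    print(find_bare_line_ends(offending))
```

An empty list means the data contains only CRLF line ends. Anything else is what a strict SMTP server is entitled to refuse.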

Yahoo web mail causes violations because it encodes carriage returns and line feeds as quoted-printable strings such as =0D=0A. SMTP, POP, and other machine-to-machine transfers do not process those strings, even though they are the agents expected to convert line ends first to the internet standard CRLF and then, on reception, to the standard for the receiving machine.

The result is nonconforming mail, sent out in replies with added quotes, that doesn't work correctly or is even refused by SMTP servers.

Both =0D=0A and shorter =0A strings are observed. It is not clear if they are introduced by Yahoo web mail as it runs on various servers or if the underlying HTML form produces one or the other depending on what OS the calling browser is running under.

Yahoo should, when it introduces quoted printable in a message, refrain from quoting the line ends and it should transmit only CRLF pairs. That way intermediate transfer agents can do their thing properly as text moves from one operating system to another.

Yahoo could, but doesn't, offer a user option to eschew automatic conversion to quoted printable. It is rarely necessary these days to limit line length to 76 characters. 998, plus 2 for the CRLF, is the actual limit and mail clients are encouraged to handle more than that.

When a user uses HTML for his mail the problem is less severe and may not even exist. That may be because of MIME encoding or typing. It isn't a solution for mailing lists that demand text/plain or use the demime tool for digests.

YahooMailClassic/11.3.2 YahooMailWebService/0.8.105.279950 seems not to be so demanding about when to convert to quoted printable. YahooMailRC/470 YahooMailWebService/0.8.105.279950 pretty much always creates problems. In some cases a web user can send a compliant e-mail by changing to classic mode.

The oldest e-mail in my collection that has the problem is from December 2008.

It is likely that a workaround can be accomplished in my qmail server with a global replacement of /=0D=0A/ with 0x0A before storage into a UNIX POP mailbox. I'll be trying that soon but it's hardly a proper solution. Well, I have done that, and it works, at least for me. Have a look at the bottom.

These typical messages are taken from a special open BSD POP mailbox that I created so that I could see messages from Yahoo without any processing by a POP3 server or an e-mail client. Note the use of the = sign to represent characters given in hexadecimal and to indicate soft carriage returns that allow clipping to 76 ASCII characters in a line without worrying about word breaks. That is quoted-printable.
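
For readers who have not met quoted-printable before, here is a short sketch using the quopri module from Python's standard library. The encoded fragment is invented for illustration, in the style of the messages that follow; it is not copied from a real message.

```python
import quopri

# "=" at the end of a line is a soft break, =E2=80=94 is the three UTF-8
# bytes of an em dash, and =0A is an encoded line feed.
encoded = (
    b"have been urge=\n"
    b"d to eat kale=E2=80=94and yogurt.=0AKit"
)

decoded = quopri.decodestring(encoded)
text = decoded.decode("utf-8")

print(repr(text))
```

The soft break disappears and "urge" joins "d" again, while the =0A turns into a real 0A byte in the decoded text. That byte is the one that causes trouble later.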

From blueboy@yahoo.com  Mon Aug 23 17:26:42 2010
Received: from [71.38.2.217] by web112109.mail.gq1.yahoo.com via HTTP; 23 Aug 2010
X-Mailer: YahooMailClassic/11.3.2 YahooMailWebService/0.8.105.279950
Date: Mon, 23 Aug 2010 14:28:24 -0700 (PDT)
From: T Blues blueboy@yahoo.com
Subject: Test Classic 8
To: Doug Test McNutt dougtest@macnauchtan.com
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: quoted-printable

Should You Ditch Your Calcium Supplement?=0AFor years, women have been urge=
d to load up on calcium. Now, a new study finds that older women who take c=
alcium supplements may be increasing their risk of heart attack by a whoppi=
ng 30%. You can get my take on this research and what it means for you on t=
he Nutrition Data Blog.=09=09 =0A=09=09=09=09=09=09=09=09=09=0AEating calci=
um-rich foods=E2=80=94such as kale, yogurt, and Chinese cabbage=E2=80=94may=
 be a better way to protect your bones. Researchers from Washington Univers=
ity in St. Louis found that women who got most of their calcium from foods =
had denser, healthier bones than those who took calcium supplements, even t=
hough their total calcium intake was slightly lower. =0A=0AYou can use the =
tools on Nutrition Data to track your daily nutrient intake and to find foo=
ds rich in calcium or other nutrients you may be missing. And for a physici=
an's perspective on the new calcium research, see Steve Parker's post on th=
e Nutrition Data Heart Health Blog. =0A=0AKit=0A=0A=0A 

NOTE: =09 is the ASCII horizontal tab character. It goes away in modern (YahooMailRC) mode, below. E2 80 94 is the UTF-8 encoding of an em dash.

From blueboy@yahoo.com  Mon Aug 23 17:28:26 2010
Received: from [71.38.2.217] by web112117.mail.gq1.yahoo.com via HTTP; 23 Aug 2010
X-Mailer: YahooMailRC/470 YahooMailWebService/0.8.105.279950
Date: Mon, 23 Aug 2010 14:26:41 -0700 (PDT)
From: T Blues blueboy@yahoo.com
Subject: Test8
To: dougtest@macnauchtan.com
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: quoted-printable

Should You Ditch Your Calcium Supplement?=0AFor years, women have been urge=
d to load up on calcium. Now, a new study finds =0Athat older women who tak=
e calcium supplements may be increasing their risk of =0Aheart attack by a =
whopping 30%. You can get my take on this research and what it =0Ameans for=
 you on the Nutrition Data Blog. =0AEating calcium-rich foods=E2=80=94such =
as kale, yogurt, and Chinese cabbage=E2=80=94may be a =0Abetter way to prot=
ect your bones. Researchers from Washington University in St. =0ALouis foun=
d that women who got most of their calcium from foods had denser, =0Ahealth=
ier bones than those who took calcium supplements, even though their total =
=0Acalcium intake was slightly lower. =0A=0AYou can use the tools on Nutrit=
ion Data to track your daily nutrient intake and =0Ato find foods rich in c=
alcium or other nutrients you may be missing. And for a =0Aphysician's pers=
pective on the new calcium research, see Steve Parker's post on =0Athe Nutr=
ition Data Heart Health Blog. =0A=0AKit=0A=0A=0A      

From: an internet posting Aug 2010, re Lotus Notes

Development FiLMS The name of this FiLMS install (Development FiLMS)=0D=0Ah=
ttp://Internal-IP The URL of this FiLMS install (http://Internal-IP/dev_r=
yan)=0D=0Atest1 The Username for the user=0D=0A Test 1 T=
he text used to describe this User throughout the system=0D=0AEnglish This =
user's primary language.=0D=0ASandbox (Test Organization) The Name of the O=
rganization this User belongs to.=0D=0ALearner The Name of the Role this Us=
er has been assigned.=0D=0A The Hint that is shown to this user if they do =
not enter their password correctly=0D=0ATrue If the User is able to log int=
o the system, and if they appear in reports.=0D=0ANone The time (if any) wh=
en this User will automatically become inactive.=0D=0Ato-address E-Mail=
=0D=0A[Department] Department=0D=0A[GEO] GEO=0D=0A[Position] Position=0D=0A=
[Access Type] Access Type=0D=0A[Direct Manager] Direct Manager=0D=0A The us=
er's Company Name=0D=0A[temp_password] The field where a user's temp passwo=
rd is stored=0D=0A[Job Family] Job Family is required by Nexen Integrity to=
 direct the SOC survey=0D=0A[Contractor User Id] This User's Id in the cont=
ractor portal.=0D=0Afalse If this user should change their password after t=
heir next login.=0D=0A[Employee ID] This User's Employee ID=0D=0A[Bank] Thi=
s User's Bank=0D=0A[Country] This User's country=0D=0A

Note the =0A and =0D=0A encoded characters, which are line ends probably sent to Yahoo via an HTML form in the HTTP interface to Yahoo mail. The test messages, sent using Safari from an OS X (UNIX) Macintosh, contain only =0A encodings. The internet example shows =0D=0A pairs, which are likely the result of access to Yahoo webmail from a Windows operating system.
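
A rough way to tell the two cases apart is to count the encoded line ends in a raw body. This Python sketch is an illustration only; count_encoded_line_ends is a hypothetical helper and the two sample strings are shortened stand-ins for the messages above.

```python
import re

# The CRLF alternative is tried first so that the =0A inside =0D=0A
# is not counted a second time as a lone LF.
ENCODED_EOL = re.compile(r"=0D=0A|=0A|=0D")


def count_encoded_line_ends(qp_body: str) -> dict:
    """Count =0D=0A, lone =0A, and lone =0D tokens in a quoted-printable body."""
    # Remove soft breaks first; one may fall between =0D and =0A.
    joined = qp_body.replace("=\r\n", "").replace("=\n", "")
    counts = {"=0D=0A": 0, "=0A": 0, "=0D": 0}
    for token in ENCODED_EOL.findall(joined):
        counts[token] += 1
    return counts


if __name__ == "__main__":
    unix_style = "Supplement?=0AFor years=0A=0AKit=0A"
    windows_style = "install=0D=0Ah=\nttp://x=0D=\n=0Atest1"
    print(count_encoded_line_ends(unix_style))
    print(count_encoded_line_ends(windows_style))
```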

When my mail client - Eudora on a classic Macintosh - reads the message using the POP3 protocol the Content-Transfer-Encoding: header is recognized and the decoding of quoted printable is carried out on the fly. The = signs at the ends of the lines are followed by CRLF pairs because the POP3 server converts its line ends to internet form. Eudora just removes those line ends and the = sign in accordance with the definition of quoted printable. But Eudora also converts the =0A character string to a bona fide 8-bit character 0x0A which would be recognized as a UNIX line end on a UNIX box but is treated as an unprintable character, represented visually as an open square, on a classic Macintosh. That's not a readability problem but . . . .

If I try to reply to such a message, while including the expected quoted copy of the original text, my SMTP server rejects the message because it contains a character - 0A - that is prohibited by RFC-2822. 0A characters MUST be preceded by a 0D to create an acceptable internet line end.

Because the line end for my OS is a single 0D the mail client converts a simple 0D to a 0D0A pair before transmitting to SMTP but it does not do that for a bare 0A. That is exactly what the RFCs call for.
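
The conversion just described can be modeled in a few lines of Python. This is only a sketch of the behavior, not Eudora's actual code, and classic_mac_to_wire is a name invented here. The sample reply uses 0D for the lines typed on the Macintosh and contains one 0A that arrived in the quoted Yahoo text.

```python
import re


def classic_mac_to_wire(text: bytes) -> bytes:
    """Turn each bare CR (classic Mac line end) into CRLF; touch nothing else."""
    return re.sub(rb"\r(?!\n)", b"\r\n", text)


# Quoted Yahoo text with a decoded 0A in it, followed by a typed answer.
reply = b"> quoted line one\n> quoted line two\rMy answer.\r"
wire = classic_mac_to_wire(reply)

print(wire)
```

Every 0D comes out as a proper CRLF, but the 0A between the two quoted lines is passed along untouched, and that is the character the SMTP server objects to.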


From RFC-2045 Quoted Printable, November 1996

  6.7(1)  (General 8bit representation) Any octet, except a CR or
          LF that is part of a CRLF line break of the canonical
          (standard) form of the data being encoded, may be
          represented by an "=" followed by a two digit
          hexadecimal representation of the octet's value.  The
          digits of the hexadecimal alphabet, for this purpose,
          are "0123456789ABCDEF".  Uppercase letters must be
          used; lowercase letters are not allowed.  Thus, for
          example, the decimal value 12 (US-ASCII form feed) can
          be represented by "=0C", and the decimal value 61 (US-
          ASCII EQUAL SIGN) can be represented by "=3D".  This
          rule must be followed except when the following rules
          allow an alternative encoding.

    (4)   (Line Breaks) A line break in a text body, represented
          as a CRLF sequence in the text canonical form, must be
          represented by a (RFC 822) line break, which is also a
          CRLF sequence, in the Quoted-Printable encoding.  Since
          the canonical representation of media types other than
          text do not generally include the representation of
          line breaks as CRLF sequences, no hard line breaks
          (i.e. line breaks that are intended to be meaningful
          and to be displayed to the user) can occur in the
          quoted-printable encoding of such types.  Sequences
          like "=0D", "=0A", "=0A=0D" and "=0D=0A" will routinely
          appear in non-text data represented in quoted-
          printable, of course.

    (5)   (Soft Line Breaks) The Quoted-Printable encoding
          REQUIRES that encoded lines be no more than 76
          characters long.  If longer lines are to be encoded
          with the Quoted-Printable encoding, "soft" line breaks
          must be used.  An equal sign as the last character on a
          encoded line indicates such a non-significant ("soft")
          line break in the encoded text.

6.7(4) seems to contradict 6.7(1), but the except clause in the first sentence of (1) pretty clearly says that a CRLF pair that is a line break should never be encoded with = and hexadecimal codes. =0A and =0D are allowed as they may occur in binary, non-text, data but that's not what we're talking about.
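
To make that reading concrete, here is a minimal Python sketch of an encoder that follows rules (1), (4) and (5) for text. It is an illustration under stated assumptions, not Yahoo's code and not a complete implementation of RFC 2045; canonicalize and qp_encode_text are names invented here. The key assumption is that the text is converted to canonical CRLF form before encoding, so that no line end is ever left over to be written as =0A or =0D.

```python
import re


def canonicalize(data: bytes) -> bytes:
    """Convert CRLF, bare CR, and bare LF line ends to canonical CRLF."""
    return re.sub(rb"\r\n|\r|\n", b"\r\n", data)


def qp_encode_text(data: bytes, width: int = 76) -> bytes:
    """Encode canonical (CRLF) text; hard line breaks stay as literal CRLF."""
    out_lines = []
    for line in data.split(b"\r\n"):
        tokens = []
        for i, byte in enumerate(line):
            last = i == len(line) - 1
            printable = 33 <= byte <= 126 and byte != 0x3D  # 0x3D is "="
            blank_ok = byte in (0x20, 0x09) and not last  # no trailing blanks
            if printable or blank_ok:
                tokens.append(bytes([byte]))
            else:
                tokens.append(b"=%02X" % byte)  # rule (1)
        # Rule (5): fold with "=" so no encoded line exceeds `width`.
        pieces, current = [], b""
        for token in tokens:
            if len(current) + len(token) > width - 1:
                pieces.append(current + b"=")
                current = b""
            current += token
        pieces.append(current)
        out_lines.append(b"\r\n".join(pieces))
    return b"\r\n".join(out_lines)  # rule (4)


if __name__ == "__main__":
    form_text = b"Supplement?\nFor years\n"  # as a UNIX browser might send it
    print(qp_encode_text(canonicalize(form_text)))
```

If the canonicalize step is skipped, the bare 0A octets are no longer part of a CRLF and the same encoder writes them as =0A. That would produce output like the Yahoo messages above, though whether that is what actually happens inside Yahoo is a guess.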

There are a lot of historical comments in the C part of the RFCs. Some commenters pretty much said that quoted printable should never be used and argued for the then-new Base64 instead. The compromise seems to be that quoted printable should never be used for binary attachments but is worthwhile for blocks of text that need special treatment such as line length or upper ASCII characters. msg01971 from Feb 1992 is the start of a discussion of how quoted printable can mess things up. These are interesting followups: msg01973, msg01975, msg01976, msg01977, msg01980, msg01981.


Added September 15, 2011.

My SMTP/POP server structure allows me to create qmail filters in a scripting language like perl. By placing the filter where the mail users can access it and properly configuring /var/rules/ for a mail account I can intercept messages as they are delivered by the SMTP part and before they are stored in a .mbx file that will later be accessed via POP3 by my mail client at home. For my mailing list address I have long used such a filter, which eliminates spam by accepting only messages from lists that I am subscribed to. There is a provision for accepting personal mail from someone who replies, within a couple of weeks, to something I posted.

I changed that long perl script to repair the offending line ends and I have experienced no problems since. With an open source mail client at home I could have made the changes there but my long time favorite - Eudora - won't allow for that because it decodes quoted-printable on the fly during its download with POP3. Below is the code I added, with some adjustment to remove unrelated stuff and subroutine calls. It's perl. The part you may need help with is the s/ / / construct, which comes from sed: it looks for what is between the first two slashes and replaces it with what is between the second and third.

#!/usr/bin/perl
# http://qmail.linocomm.net/top.html
# http://www.lifewithqmail.org/
########## 
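# $boxpath, the path to the mailbox file, is set by code not shown here.
open(MBX, ">>", $boxpath) or die "Cannot open $boxpath: $!"; # Open (and you should lock) the mailbox file for appending.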
$yahooflag = 0;
$bodystarted = 0;
while (<STDIN>)  # qmail passes message via standard input. Read one line.
    {
    if (length($_) <= 1) # Count the line end which is still there.
        {
        $bodystarted = 1; # Blank line indicates end of headers.
        }
    if (!$bodystarted && /YahooMailWebService/) # Look for this in the headers only.
        {
        $yahooflag = 1;
        print MBX "X-foundyahoo: YahooMailWebService found.\n"; #Tell us about it.
        }
    if ($bodystarted)
        {
        if ($yahooflag) # Change only if in a message body from Yahoo
            {
            # Code assumes it's running under UNIX with \n meaning a 0A line end character.
            # 0A line ends will get converted to the 0D0A internet standard by the POP server.
            s/=0D=0A/\n/gs; # Substitute UNIX line end for quoted printable CRLF. Do this first!
            s/=0A/\n/gs; # Substitute UNIX line end for quoted printable LF
            s/=0D/\n/gs; # Substitute UNIX line end for quoted printable CR
            s/=\n\z//gs; # Kill quoted printable soft returns. There was no chomp of the STDIN line.
            }
        s/^From />From /mg; # Escape accidental lines that start with MBX keyword From, including any created by the substitutions above.
        }
    print MBX; # Write the, possibly modified, line to the mailbox.
    }
close(MBX); # Done with the mailbox file.
########## 


Thu Sep 16 18:15:30 MDT 2010
Douglas P. McNutt, PhD
The MacNauchtan Laboratory
7255 Suntide Place
Colorado Springs, CO 80919-1060
voice 719 593 8192
dmcnutt@macnauchtan.com Subject: Yahoo mail
http://www.macnauchtan.com/
ftp://ftp.macnauchtan.com/