Network Working Group                                          C. Newman
Request for Comments: 2192                                      Innosoft
Category: Standards Track                                 September 1997


                            IMAP URL Scheme


Status of this memo

     This document specifies an Internet standards track protocol for
     the Internet community, and requests discussion and suggestions for
     improvements.  Please refer to the current edition of the "Internet
     Official Protocol Standards" (STD 1) for the standardization state
     and status of this protocol.  Distribution of this memo is
     unlimited.


Abstract

     IMAP [IMAP4] is a rich protocol for accessing remote message
     stores.  It provides an ideal mechanism for accessing public
     mailing list archives as well as private and shared message stores.
     This document defines a URL scheme for referencing objects on an
     IMAP server.


1. Conventions used in this document

     The key words "MUST", "MUST NOT", "SHOULD", "SHOULD NOT", and "MAY"
     in this document are to be interpreted as defined in "Key words for
     use in RFCs to Indicate Requirement Levels" [KEYWORDS].


2. IMAP scheme

     The IMAP URL scheme is used to designate IMAP servers, mailboxes,
     messages, MIME bodies [MIME], and search programs on Internet hosts
     accessible using the IMAP protocol.

     The IMAP URL follows the common Internet scheme syntax as defined
     in RFC 1738 [BASIC-URL] except that clear text passwords are not
     permitted.  If :<port> is omitted, the port defaults to 143.


Newman                      Standards Track                     [Page 1]

RFC 2192                    IMAP URL Scheme               September 1997


     An IMAP URL takes one of the following forms:

         imap://<iserver>/
         imap://<iserver>/<enc_list_mailbox>;TYPE=<list_type>
         imap://<iserver>/<enc_mailbox>[uidvalidity][?<enc_search>]
         imap://<iserver>/<enc_mailbox>[uidvalidity]<iuid>[isection]

     The first form is used to refer to an IMAP server, the second form
     refers to a list of mailboxes, the third form refers to the
     contents of a mailbox or a set of messages resulting from a search,
     and the final form refers to a specific message or message part.
     Note that the syntax here is informal.  The authoritative formal
     syntax for IMAP URLs is defined in section 12.
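
     As an illustration only, the C sketch below tells the four informal
     forms apart by inspecting the path that follows "imap://<iserver>/".
     The enum and classify_imap_url are hypothetical names, and the
     sketch is not a parser for the formal syntax.  It relies on ";"
     being encoded inside <enc_mailbox>, <enc_list_mailbox>, and
     <enc_search> (";" is not a bchar), so a literal ";TYPE=" or
     "/;UID=" can only come from the URL structure itself.

```c
#include <ctype.h>
#include <string.h>

/* Hypothetical labels for the four informal forms of an IMAP URL. */
enum imap_url_form {
    IMAPURL_INVALID,
    IMAPURL_SERVER,        /* imap://<iserver>/                         */
    IMAPURL_MAILBOX_LIST,  /* .../<enc_list_mailbox>;TYPE=<list_type>   */
    IMAPURL_MESSAGE_LIST,  /* .../<enc_mailbox>[uidvalidity][?search]   */
    IMAPURL_MESSAGE_PART   /* .../<enc_mailbox>[uidvalidity]<iuid>[...] */
};

/* Case-insensitive substring search: parameter names such as ";TYPE="
 * are not case sensitive. */
static const char *find_nocase(const char *hay, const char *needle)
{
    size_t n = strlen(needle);

    for (; *hay != '\0'; ++hay) {
        size_t i = 0;
        while (i < n && hay[i] != '\0' &&
               tolower((unsigned char) hay[i]) ==
               tolower((unsigned char) needle[i]))
            ++i;
        if (i == n)
            return hay;
    }
    return NULL;
}

enum imap_url_form classify_imap_url(const char *url)
{
    const char *path;

    if (find_nocase(url, "imap://") != url)
        return IMAPURL_INVALID;
    path = strchr(url + 7, '/');   /* <iserver> never contains a "/" */
    if (path == NULL)
        return IMAPURL_INVALID;
    ++path;                        /* skip the "/" after <iserver> */
    if (*path == '\0')
        return IMAPURL_SERVER;
    if (find_nocase(path, ";TYPE=") != NULL)
        return IMAPURL_MAILBOX_LIST;
    if (find_nocase(path, "/;UID=") != NULL)
        return IMAPURL_MESSAGE_PART;
    return IMAPURL_MESSAGE_LIST;
}
```

     Percent-decoding of the individual fields is deliberately left for
     a later step, after the form is known.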


3. IMAP User Name and Authentication Mechanism

     A user name and/or authentication mechanism may be supplied.  They
     are used in the "LOGIN" or "AUTHENTICATE" commands after making the
     connection to the IMAP server.  If no user name or authentication
     mechanism is supplied, the user name "anonymous" is used with the
     "LOGIN" command and the password is supplied as the Internet e-mail
     address of the end user accessing the resource.  If the URL doesn't
     supply a user name, the program interpreting the IMAP URL SHOULD
     request one from the user if necessary.

     An authentication mechanism can be expressed by adding
     ";AUTH=<enc_auth_type>" to the end of the user name.  When such an
     <enc_auth_type> is indicated, the client SHOULD request appropriate
     credentials from that mechanism and use the "AUTHENTICATE" command
     instead of the "LOGIN" command.  If no user name is specified, one
     SHOULD be obtained from the mechanism or requested from the user as
     appropriate.

     The string ";AUTH=*" indicates that the client SHOULD select an
     appropriate authentication mechanism.  It MAY use any mechanism
     listed in the CAPABILITY response or use an out-of-band security
     service resulting in a PREAUTH connection.  If no user name is
     specified and no appropriate authentication mechanisms are
     available, the client SHOULD fall back to anonymous login as
     described above.  This allows a URL which grants read-write access
     to authorized users, and read-only anonymous access to other users.

     If a user name is included with no authentication mechanism, then
     ";AUTH=*" is assumed.
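
     The defaulting rules above can be summarized in code.  The C sketch
     below is a hypothetical helper (struct imap_login, its field sizes,
     and parse_iuserauth are not defined by this document) that splits
     the text before "@" in <iserver> into a user name and a mechanism.
     It assumes the caller has already isolated that text and leaves
     percent-decoding of the user name to the caller.

```c
#include <ctype.h>
#include <string.h>

/* Hypothetical result of interpreting <iuserauth>. */
struct imap_login {
    char user[64];   /* "" when the URL names no user                   */
    char auth[64];   /* "" = anonymous LOGIN, "*" = client's choice,
                        otherwise the <enc_auth_type> for AUTHENTICATE  */
};

/* `userauth` is the text before "@" in <iserver>, or "" if none.
 * Returns 0 on success, -1 on malformed or over-long input. */
int parse_iuserauth(const char *userauth, struct imap_login *out)
{
    static const char key[] = ";auth=";   /* matched case-insensitively */
    const char *semi = strchr(userauth, ';');
    size_t ulen = semi ? (size_t) (semi - userauth) : strlen(userauth);
    size_t i;

    if (ulen >= sizeof out->user)
        return -1;
    memcpy(out->user, userauth, ulen);
    out->user[ulen] = '\0';

    if (semi == NULL) {
        /* no ";AUTH=": a user name implies ";AUTH=*", none means anonymous */
        strcpy(out->auth, ulen > 0 ? "*" : "");
        return 0;
    }
    for (i = 0; key[i] != '\0'; ++i)
        if (tolower((unsigned char) semi[i]) != key[i])
            return -1;
    semi += i;                            /* now at "*" or <enc_auth_type> */
    if (*semi == '\0' || strlen(semi) >= sizeof out->auth)
        return -1;
    strcpy(out->auth, semi);
    return 0;
}
```

     A caller would then use LOGIN as "anonymous" when auth is empty,
     AUTHENTICATE with the named mechanism, or pick a mechanism from the
     CAPABILITY response when auth is "*".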

     Since URLs can easily come from untrusted sources, care must be
     taken when resolving a URL which requires or requests any sort of
     authentication.  If authentication credentials are supplied to the
     wrong server, this may compromise the security of the user's
     account.  The program resolving the URL should make sure that at
     least one of the following criteria is met in this case:

     (1) The URL comes from a trusted source, such as a referral server
     which the client has validated and trusts according to site policy.
     Note that user entry of the URL may or may not count as a trusted
     source, depending on the experience level of the user and site
     policy.
     (2) Explicit local site policy permits the client to connect to the
     server in the URL.  For example, if the client knows the site
     domain name, site policy may dictate that any hostname ending in
     that domain is trusted.
     (3) The user confirms that connecting to that domain name with the
     specified credentials and/or mechanism is permitted.
     (4) A mechanism is used which validates the server before passing
     potentially compromising client credentials.
     (5) An authentication mechanism is used which will not reveal
     information to the server which could be used to compromise future
     connections.
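
     Criterion (2) is easy to get subtly wrong: a plain string-suffix
     test would treat "evilminbari.org" as ending in "minbari.org".  A
     minimal C sketch of a label-aware check follows; host_in_domain is
     a hypothetical name, and the sketch does not handle trailing dots
     or internationalized host names.

```c
#include <ctype.h>
#include <string.h>

/* Is `host` equal to `domain` or a subdomain of it?  Case-insensitive,
 * and the match must start at a label boundary. */
int host_in_domain(const char *host, const char *domain)
{
    size_t hl = strlen(host), dl = strlen(domain), i;

    if (dl == 0 || hl < dl)
        return 0;
    for (i = 0; i < dl; ++i)
        if (tolower((unsigned char) host[hl - dl + i]) !=
            tolower((unsigned char) domain[i]))
            return 0;
    /* exact match, or the character just before the suffix is a dot */
    return hl == dl || host[hl - dl - 1] == '.';
}
```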

     URLs which do not include a user name must be treated with extra
     care, since they are more likely to compromise the user's primary
     account.  A URL containing ";AUTH=*" must also be treated with
     extra care since it might fall back on a weaker security mechanism.
     Finally, clients are discouraged from using a plain text password
     as a fallback with ";AUTH=*" unless the connection has strong
     encryption (e.g., a key length greater than 56 bits).

     A program interpreting IMAP URLs MAY cache open connections to an
     IMAP server for later re-use.  If a URL contains a user name, only
     connections authenticated as that user may be re-used.  If a URL
     does not contain a user name or authentication mechanism, then only
     an anonymous connection may be re-used.  If a URL contains an
     authentication mechanism without a user name, then any
     non-anonymous connection may be re-used.
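
     The three re-use rules can be read as a single predicate.  The C
     sketch below assumes a hypothetical cached_conn record and a
     hypothetical may_reuse function.  It compares host names and user
     names as exact strings, which is a simplification: host names are
     case-insensitive, ports are ignored here, and the URL's user name
     would have to be percent-decoded first.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical record of an open, already authenticated connection. */
struct cached_conn {
    const char *host;   /* server the connection is open to        */
    bool anonymous;     /* logged in with LOGIN as "anonymous"      */
    const char *user;   /* authenticated user (unused if anonymous) */
};

/* May `c` be re-used for a URL naming `url_host`?  `url_user` and
 * `url_auth` are NULL when the URL has no user name / no ";AUTH=". */
bool may_reuse(const struct cached_conn *c, const char *url_host,
               const char *url_user, const char *url_auth)
{
    if (strcmp(c->host, url_host) != 0)
        return false;                 /* different server */
    if (url_user != NULL)             /* user name given: that user only */
        return !c->anonymous && strcmp(c->user, url_user) == 0;
    if (url_auth == NULL)             /* neither given: anonymous only */
        return c->anonymous;
    return !c->anonymous;             /* mechanism only: any non-anonymous */
}
```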

     Note that if unsafe or reserved characters such as " " or ";" are
     present in the user name or authentication mechanism, they MUST be
     encoded as described in RFC 1738 [BASIC-URL].


4. IMAP server

     An IMAP URL referring to an IMAP server has the following form:

         imap://<iserver>/

     A program interpreting this URL would issue the standard set of
     commands it uses to present a view of the contents of an IMAP
     server.  This is likely to be semantically equivalent to one of the
     following URLs:

         imap://<iserver>/;TYPE=LIST
         imap://<iserver>/;TYPE=LSUB

     The program interpreting this URL SHOULD use the LSUB form if it
     supports mailbox subscriptions.


5. Lists of mailboxes

     An IMAP URL referring to a list of mailboxes has the following
     form:

         imap://<iserver>/<enc_list_mailbox>;TYPE=<list_type>

     The <list_type> may be either "LIST" or "LSUB", and is case
     insensitive.  The field ";TYPE=<list_type>" MUST be included.

     The <enc_list_mailbox> is any argument suitable for the
     list_mailbox field of the IMAP [IMAP4] LIST or LSUB commands.  The
     field <enc_list_mailbox> may be omitted, in which case the program
     interpreting the IMAP URL may use "*" or "%" as the
     <enc_list_mailbox>.  The program SHOULD use "%" if it supports a
     hierarchical view, otherwise it SHOULD use "*".

     Note that if unsafe or reserved characters such as " " or "%" are
     present in <enc_list_mailbox> they MUST be encoded as described in
     RFC 1738 [BASIC-URL].  If the character "/" is present in
     enc_list_mailbox, it SHOULD NOT be encoded.
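
     A client might turn the mailbox-list form into a command as in the
     C sketch below.  build_list_command is a hypothetical helper: it
     expects <enc_list_mailbox> to be percent-decoded already, applies
     the "%"/"*" default described above, and does not quote patterns
     that would need an IMAP quoted string.

```c
#include <ctype.h>
#include <stdio.h>

/* Case-insensitive equality (<list_type> is case insensitive). */
static int eq_nocase(const char *a, const char *b)
{
    while (*a != '\0' &&
           tolower((unsigned char) *a) == tolower((unsigned char) *b)) {
        ++a;
        ++b;
    }
    return tolower((unsigned char) *a) == tolower((unsigned char) *b);
}

/* Build e.g. A002 LIST "" users.* into buf.  `pattern` is the decoded
 * <enc_list_mailbox>, or NULL/"" when the URL omits it.  Returns 0 on
 * success, -1 for an unknown <list_type> or a too-small buffer. */
int build_list_command(char *buf, size_t size, const char *tag,
                       const char *list_type, const char *pattern,
                       int hierarchical)
{
    const char *cmd;
    int n;

    if (eq_nocase(list_type, "LIST"))
        cmd = "LIST";
    else if (eq_nocase(list_type, "LSUB"))
        cmd = "LSUB";
    else
        return -1;
    if (pattern == NULL || *pattern == '\0')
        pattern = hierarchical ? "%" : "*";   /* default from section 5 */
    n = snprintf(buf, size, "%s %s \"\" %s", tag, cmd, pattern);
    return (n < 0 || (size_t) n >= size) ? -1 : 0;
}
```

     With tag "A002", type "list", and pattern "users.*", this yields the
     LIST command shown in the second example of section 10.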


6. Lists of messages

     An IMAP URL referring to a list of messages has the following form:

         imap://<iserver>/<enc_mailbox>[uidvalidity][?<enc_search>]

     The <enc_mailbox> field is used as the argument to the IMAP4
     "SELECT" command.  Note that if unsafe or reserved characters such
     as " ", ";", or "?" are present in <enc_mailbox> they MUST be
     encoded as described in RFC 1738 [BASIC-URL].  If the character "/"
     is present in enc_mailbox, it SHOULD NOT be encoded.

     The [uidvalidity] field is optional.  If it is present, it MUST be
     the argument to the IMAP4 UIDVALIDITY status response at the time
     the URL was created.  This SHOULD be used by the program
     interpreting the IMAP URL to determine if the URL is stale.
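
     A minimal C sketch of the staleness check follows, assuming the
     caller has the URL path and the UIDVALIDITY value the server
     reported after SELECT.  The function names are hypothetical.  A
     missing or malformed [uidvalidity] is reported as 0, which cannot
     collide with a real value because nz_number is never zero.

```c
#include <ctype.h>
#include <stdlib.h>

/* Does `s` start with `prefix`, ignoring case? */
static int prefix_nocase(const char *s, const char *prefix)
{
    for (; *prefix != '\0'; ++s, ++prefix)
        if (tolower((unsigned char) *s) != tolower((unsigned char) *prefix))
            return 0;
    return 1;
}

/* Return the [uidvalidity] value in a URL path such as
 * "gray-council;UIDVALIDITY=385759045/;UID=20", or 0 if absent/malformed. */
unsigned long url_uidvalidity(const char *path)
{
    static const char key[] = ";UIDVALIDITY=";

    for (; *path != '\0'; ++path) {
        if (prefix_nocase(path, key)) {
            const char *digits = path + (sizeof key - 1);
            char *end;
            unsigned long v;

            if (*digits < '1' || *digits > '9')
                return 0;              /* nz_number: no leading zero */
            v = strtoul(digits, &end, 10);
            return (*end == '\0' || *end == '/' || *end == '?') ? v : 0;
        }
    }
    return 0;
}

/* Stale if the URL carries a UIDVALIDITY that differs from the server's. */
int url_is_stale(const char *path, unsigned long server_uidvalidity)
{
    unsigned long v = url_uidvalidity(path);

    return v != 0 && v != server_uidvalidity;
}
```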

     The [?<enc_search>] field is optional.  If it is not present, the
     contents of the mailbox SHOULD be presented by the program
     interpreting the URL.  If it is present, its decoded value SHOULD be
     used as the arguments following an IMAP4 SEARCH command.  Unsafe
     characters such as " ", which are likely to occur in a search
     program, are encoded in <enc_search> as described in RFC 1738
     [BASIC-URL].
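
     Before the search program (or a mailbox name) is sent to the server,
     its %XX escapes have to be undone.  The C sketch below decodes in
     place; url_decode is a hypothetical name, and the sketch rejects
     "%00" rather than produce an embedded NUL.  It does not check that
     the result is a valid search_program.

```c
/* Value of one hex digit, or -1. */
static int hexval(int c)
{
    if (c >= '0' && c <= '9')
        return c - '0';
    if (c >= 'A' && c <= 'F')
        return c - 'A' + 10;
    if (c >= 'a' && c <= 'f')
        return c - 'a' + 10;
    return -1;
}

/* Decode RFC 1738 %XX escapes in place.  Returns 0, or -1 on a
 * truncated or invalid escape (the string is then partly decoded). */
int url_decode(char *s)
{
    char *out = s;

    while (*s != '\0') {
        if (*s == '%') {
            int hi = hexval((unsigned char) s[1]);
            int lo = hi < 0 ? -1 : hexval((unsigned char) s[2]);

            if (lo < 0 || (hi == 0 && lo == 0))
                return -1;
            *out++ = (char) (hi * 16 + lo);
            s += 3;
        } else {
            *out++ = *s++;
        }
    }
    *out = '\0';
    return 0;
}
```

     For the last URL in section 10, "SUBJECT%20shadows" decodes to
     "SUBJECT shadows" and "gray%20council" to "gray council"; the
     mailbox name must still be sent as an IMAP quoted string.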


7. A specific message or message part

     An IMAP URL referring to a specific message or message part has the
     following form:

         imap://<iserver>/<enc_mailbox>[uidvalidity]<iuid>[isection]

     The <enc_mailbox> and [uidvalidity] are as defined above.

     If [uidvalidity] is present in this form, it SHOULD be used by the
     program interpreting the URL to determine if the URL is stale.

     The <iuid> refers to an IMAP4 message UID, and SHOULD be used as
     the <set> argument to the IMAP4 "UID FETCH" command.

     The [isection] field is optional.  If not present, the URL refers
     to the entire Internet message as returned by the IMAP command
     "UID FETCH <uid> BODY.PEEK[]".  If present, the URL refers to the
     object returned by a "UID FETCH <uid> BODY.PEEK[<section>]"
     command.  The type of the object may be determined by issuing a
     "UID FETCH <uid> BODYSTRUCTURE" command and locating the
     appropriate part in the resulting BODYSTRUCTURE.  Note that unsafe
     characters in [isection] MUST be encoded as described in
     [BASIC-URL].
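
     The mapping from <iuid> and [isection] to a fetch command is
     direct.  A minimal C sketch, assuming the caller has already parsed
     and percent-decoded both fields (build_uid_fetch is a hypothetical
     name):

```c
#include <stdio.h>

/* Build the command for imap://.../;UID=<iuid>[/;SECTION=<section>].
 * `section` is the decoded [isection] value, or NULL when absent.
 * Returns 0, or -1 if `buf` is too small. */
int build_uid_fetch(char *buf, size_t size, const char *tag,
                    unsigned long uid, const char *section)
{
    int n = snprintf(buf, size, "%s UID FETCH %lu BODY.PEEK[%s]",
                     tag, uid, section != NULL ? section : "");

    return (n < 0 || (size_t) n >= size) ? -1 : 0;
}
```

     BODY.PEEK is used rather than BODY so that resolving the URL does
     not implicitly set the \Seen flag on the message.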


8. Relative IMAP URLs

     Relative IMAP URLs are permitted and are resolved according to the
     rules defined in RFC 1808 [REL-URL] with one exception.  In IMAP
     URLs, parameters are treated as part of the normal path with
     respect to relative URL resolution.  This is believed to be the
     behavior of the installed base and is likely to be documented in a
     future revision of the relative URL specification.

     The following observations are also important:

     The <iauth> grammar element is considered part of the user name for
     purposes of resolving relative IMAP URLs.  This means that unless a
     new login/server specification is included in the relative URL, the
     authentication mechanism is inherited from a base IMAP URL.

     URLs always use "/" as the hierarchy delimiter for the purpose of
     resolving paths in relative URLs.  IMAP4 permits the use of any
     hierarchy delimiter in mailbox names.  For this reason, relative
     mailbox paths will only work if the mailbox uses "/" as the
     hierarchy delimiter.  Relative URLs may be used on mailboxes which
     use other delimiters, but in that case, the entire mailbox name
     MUST be specified in the relative URL or inherited as a whole from
     the base URL.

     The base URL for a list of mailboxes or messages which was referred
     to by an IMAP URL is always the referring IMAP URL itself.  The
     base URL for a message or message part which was referred to by an
     IMAP URL may be more complicated to determine.  The program
     interpreting the relative URL will have to check the headers of the
     MIME entity and any enclosing MIME entities in order to locate the
     "Content-Base" and "Content-Location" headers.  These headers are
     used to determine the base URL as defined in [HTTP].  For example,
     if the referring IMAP URL contains a "/;SECTION=1.2" parameter,
     then the MIME headers for section 1.2, for section 1, and for the
     enclosing message itself SHOULD be checked in that order for
     "Content-Base" or "Content-Location" headers.
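
     The search order described above, innermost part first, can be
     turned into the item list of a single FETCH command.  The C sketch
     below (base_header_items is a hypothetical name) assumes a purely
     numeric section such as "1.2" and produces the items used in the
     relative URL example of section 10.  Nested message/rfc822 parts
     are not treated specially.

```c
#include <string.h>

/* Append `s` to buf (size bytes, *used already filled); -1 if it won't fit. */
static int append(char *buf, size_t size, size_t *used, const char *s)
{
    size_t n = strlen(s);

    if (*used + n >= size)
        return -1;
    memcpy(buf + *used, s, n + 1);
    *used += n;
    return 0;
}

/* For section "1.2" build "(BODY.PEEK[1.2.MIME] BODY.PEEK[1.MIME]
 * BODY.PEEK[HEADER.FIELDS (Content-Base Content-Location)])" on one line.
 * Returns 0, or -1 on bad input or a too-small buffer. */
int base_header_items(char *buf, size_t size, const char *section)
{
    char part[64];
    size_t used = 0;
    char *dot;

    if (size == 0 || section[0] == '\0' || strlen(section) >= sizeof part)
        return -1;
    strcpy(part, section);
    buf[0] = '\0';
    if (append(buf, size, &used, "(") != 0)
        return -1;
    for (;;) {
        /* MIME header of this part, then of each enclosing part */
        if (append(buf, size, &used, "BODY.PEEK[") != 0 ||
            append(buf, size, &used, part) != 0 ||
            append(buf, size, &used, ".MIME] ") != 0)
            return -1;
        dot = strrchr(part, '.');
        if (dot == NULL)
            break;
        *dot = '\0';                   /* step out to the enclosing part */
    }
    /* finally the header of the enclosing message itself */
    return append(buf, size, &used,
                  "BODY.PEEK[HEADER.FIELDS (Content-Base Content-Location)])");
}
```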


9. Multinational Considerations

     IMAP4 [IMAP4] section 5.1.3 includes a convention for encoding
     non-US-ASCII characters in IMAP mailbox names.  Because this
     convention is private to IMAP, it is necessary to convert IMAP's
     encoding to one that can be more easily interpreted by a URL
     display program.  For this reason, IMAP's modified UTF-7 encoding
     for mailboxes MUST be converted to UTF-8 [UTF8].  Since 8-bit
     characters are not permitted in URLs, the UTF-8 octets are
     hex-encoded as required by the URL specification [BASIC-URL].
     Sample code is included in Appendix A to demonstrate this
     conversion.
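
     As a worked example of the last step, the character U+65E5 is the
     UTF-8 octet sequence E6 97 A5 and therefore appears in the third
     URL of section 10 as "%E6%97%A5".  The C sketch below performs that
     step for a single code point.  ucs4_to_url is a hypothetical name;
     it escapes every octet (including ASCII octets that could appear
     literally) and assumes the code point is valid.

```c
/* Write the UTF-8 encoding of `ucs4` as %XX escapes into dst (room for
 * 13 chars).  Assumes ucs4 <= 0x10FFFF and is not a surrogate.
 * Returns the number of characters written, excluding the NUL. */
int ucs4_to_url(char *dst, unsigned long ucs4)
{
    static const char hex[] = "0123456789ABCDEF";
    unsigned char utf8[4];
    int n, i;

    if (ucs4 <= 0x7FUL) {
        utf8[0] = (unsigned char) ucs4;
        n = 1;
    } else if (ucs4 <= 0x7FFUL) {
        utf8[0] = (unsigned char) (0xC0 | (ucs4 >> 6));
        utf8[1] = (unsigned char) (0x80 | (ucs4 & 0x3F));
        n = 2;
    } else if (ucs4 <= 0xFFFFUL) {
        utf8[0] = (unsigned char) (0xE0 | (ucs4 >> 12));
        utf8[1] = (unsigned char) (0x80 | ((ucs4 >> 6) & 0x3F));
        utf8[2] = (unsigned char) (0x80 | (ucs4 & 0x3F));
        n = 3;
    } else {
        utf8[0] = (unsigned char) (0xF0 | (ucs4 >> 18));
        utf8[1] = (unsigned char) (0x80 | ((ucs4 >> 12) & 0x3F));
        utf8[2] = (unsigned char) (0x80 | ((ucs4 >> 6) & 0x3F));
        utf8[3] = (unsigned char) (0x80 | (ucs4 & 0x3F));
        n = 4;
    }
    /* hex-encode each UTF-8 octet */
    for (i = 0; i < n; ++i) {
        dst[3 * i] = '%';
        dst[3 * i + 1] = hex[utf8[i] >> 4];
        dst[3 * i + 2] = hex[utf8[i] & 0x0F];
    }
    dst[3 * n] = '\0';
    return 3 * n;
}
```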


10. Examples

     The following examples demonstrate how an IMAP4 client program
     might translate various IMAP4 URLs into a series of IMAP4 commands.
     Commands sent from the client to the server are prefixed with "C:",
     and responses sent from the server to the client are prefixed with
     "S:".

     The URL:

      <imap://minbari.org/gray-council;UIDVALIDITY=385759045/;UID=20>

     Results in the following client commands:

         <connect to minbari.org, port 143>
         C: A001 LOGIN ANONYMOUS sheridan@babylon5.org
         C: A002 SELECT gray-council
         <client verifies the UIDVALIDITY matches>
         C: A003 UID FETCH 20 BODY.PEEK[]

     The URL:

      <imap://michael@minbari.org/users.*;type=list>

     Results in the following client commands:

         <client requests password from user>
         <connect to minbari.org imap server, activate strong encryption>
         C: A001 LOGIN MICHAEL zipper
         C: A002 LIST "" users.*

     The URL:

      <imap://psicorp.org/~peter/%E6%97%A5%E6%9C%AC%E8%AA%9E/
      %E5%8F%B0%E5%8C%97>

     Results in the following client commands:

         <connect to psicorp.org, port 143>
         C: A001 LOGIN ANONYMOUS bester@psycop.psicorp.org
         C: A002 SELECT ~peter/&ZeVnLIqe-/&U,BTFw-
         <commands the client uses for viewing the contents of a mailbox>

     The URL:

      <imap://;AUTH=KERBEROS_V4@minbari.org/gray-council/;uid=20/
      ;section=1.2>

     Results in the following client commands:

         <connect to minbari.org, port 143>
         C: A001 AUTHENTICATE KERBEROS_V4
         <authentication exchange>
         C: A002 SELECT gray-council
         C: A003 UID FETCH 20 BODY.PEEK[1.2]

     If the following relative URL is located in that body part:

      <;section=1.4>

     This could result in the following client commands:

         C: A004 UID FETCH 20 (BODY.PEEK[1.2.MIME]
               BODY.PEEK[1.MIME]
               BODY.PEEK[HEADER.FIELDS (Content-Base Content-Location)])
         <Client looks for Content-Base or Content-Location headers in
          result.  If there are no such headers, it does the following>
         C: A005 UID FETCH 20 BODY.PEEK[1.4]

     The URL:

      <imap://;AUTH=*@minbari.org/gray%20council?SUBJECT%20shadows>

     Could result in the following:

         <connect to minbari.org, port 143>
         C: A001 CAPABILITY
         S: * CAPABILITY IMAP4rev1 AUTH=GSSAPI
         S: A001 OK
         C: A002 AUTHENTICATE GSSAPI
         <authentication exchange>
         S: A002 OK user lennier authenticated
         C: A003 SELECT "gray council"
         ...
         C: A004 SEARCH SUBJECT shadows
         S: * SEARCH 8 10 13 14 15 16
         S: A004 OK SEARCH completed
         C: A005 FETCH 8,10,13:16 ALL
         ...

     NOTE: In this final example, the client has
     implementation-dependent choices.  The authentication mechanism
     could be anything, including PREAUTH.  And the final FETCH command
     could fetch more or less information about the messages, depending
     on what it wishes to display to the user.
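
     In the last example, the message numbers returned by SEARCH
     (8 10 13 14 15 16) were collapsed into the sequence set
     "8,10,13:16" for the FETCH command.  One way to do that is sketched
     below in C; to_sequence_set is a hypothetical name, and the input
     is assumed to be sorted in ascending order and free of duplicates
     (a caller would sort the SEARCH results first if necessary).

```c
#include <stdio.h>

/* Collapse an ascending, duplicate-free list of message numbers into an
 * IMAP sequence set, e.g. {8, 10, 13, 14, 15, 16} -> "8,10,13:16".
 * Returns 0, or -1 if `buf` is too small. */
int to_sequence_set(char *buf, size_t size, const unsigned long *nums,
                    size_t count)
{
    size_t i = 0, used = 0;

    if (size == 0)
        return -1;
    buf[0] = '\0';
    while (i < count) {
        size_t j = i;
        int n;

        while (j + 1 < count && nums[j + 1] == nums[j] + 1)
            ++j;                       /* extend a run of consecutive numbers */
        if (j > i)
            n = snprintf(buf + used, size - used, "%s%lu:%lu",
                         used > 0 ? "," : "", nums[i], nums[j]);
        else
            n = snprintf(buf + used, size - used, "%s%lu",
                         used > 0 ? "," : "", nums[i]);
        if (n < 0 || (size_t) n >= size - used)
            return -1;
        used += (size_t) n;
        i = j + 1;
    }
    return 0;
}
```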


11. Security Considerations

     Security considerations discussed in the IMAP specification [IMAP4]
     and the URL specification [BASIC-URL] are relevant.  Security
     considerations related to authenticated URLs are discussed in
     section 3 of this document.

     Many email clients store the plain text password for later use
     after logging into an IMAP server.  Such clients MUST NOT use a
     stored password in response to an IMAP URL without explicit
     permission from the user to supply that password to the specified
     host name.


12. ABNF for IMAP URL scheme

     This uses ABNF as defined in RFC 822 [IMAIL].  Terminals from the
     BNF for IMAP [IMAP4] and URLs [BASIC-URL] are also used.  Strings
     are not case sensitive and free insertion of linear-white-space is
     not permitted.

     achar            = uchar / "&" / "=" / "~"
                             ; see [BASIC-URL] for "uchar" definition

     bchar            = achar / ":" / "@" / "/"

     enc_auth_type    = 1*achar
                             ; encoded version of [IMAP-AUTH] "auth_type"

     enc_list_mailbox = 1*bchar
                             ; encoded version of [IMAP4] "list_mailbox"

     enc_mailbox      = 1*bchar
                             ; encoded version of [IMAP4] "mailbox"

     enc_search       = 1*bchar
                             ; encoded version of search_program below

     enc_section      = 1*bchar
                             ; encoded version of section below

     enc_user         = 1*achar
                             ; encoded version of [IMAP4] "userid"

     imapurl          = "imap://" iserver "/" [ icommand ]

     iauth            = ";AUTH=" ( "*" / enc_auth_type )

     icommand         = imailboxlist / imessagelist / imessagepart

     imailboxlist     = [enc_list_mailbox] ";TYPE=" list_type

     imessagelist     = enc_mailbox [uidvalidity] [ "?" enc_search ]

     imessagepart     = enc_mailbox [uidvalidity] iuid [isection]

     isection         = "/;SECTION=" enc_section

     iserver          = [iuserauth "@"] hostport
                             ; See [BASIC-URL] for "hostport" definition

     iuid             = "/;UID=" nz_number
                             ; See [IMAP4] for "nz_number" definition

     iuserauth        = enc_user [iauth] / [enc_user] iauth

     list_type        = "LIST" / "LSUB"

     search_program   = ["CHARSET" SPACE astring SPACE]
                        search_key *(SPACE search_key)
                             ; IMAP4 literals may not be used
                             ; See [IMAP4] for "astring" and "search_key"

     section          = section_text / (nz_number *("." nz_number)
                        ["." (section_text / "MIME")])
                             ; See [IMAP4] for "section_text" and "nz_number"

     uidvalidity      = ";UIDVALIDITY=" nz_number
                             ; See [IMAP4] for "nz_number" definition
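
     To illustrate how the section rule reads, the C sketch below checks
     a decoded <enc_section> against a simplified form of it.
     valid_section is a hypothetical name, and the simplification is
     that section_text is limited to "HEADER" and "TEXT"; the
     "HEADER.FIELDS" variants, which carry a header list, are rejected.
     The check is purely syntactic and does not consult BODYSTRUCTURE.

```c
#include <ctype.h>

/* Case-insensitive equality; strings in this grammar are not case sensitive. */
static int eq_nocase(const char *a, const char *b)
{
    while (*a != '\0' &&
           tolower((unsigned char) *a) == tolower((unsigned char) *b)) {
        ++a;
        ++b;
    }
    return tolower((unsigned char) *a) == tolower((unsigned char) *b);
}

/* Simplified section_text: "HEADER" or "TEXT" only. */
static int is_text(const char *s)
{
    return eq_nocase(s, "HEADER") || eq_nocase(s, "TEXT");
}

/* Returns 1 if `s` matches the simplified section rule, else 0. */
int valid_section(const char *s)
{
    if (is_text(s))
        return 1;                      /* bare section_text */
    for (;;) {
        if (*s < '1' || *s > '9')
            return 0;                  /* nz_number: no leading zero */
        while (isdigit((unsigned char) *s))
            ++s;
        if (*s == '\0')
            return 1;                  /* ends with a part number */
        if (*s++ != '.')
            return 0;
        if (is_text(s) || eq_nocase(s, "MIME"))
            return 1;                  /* "." (section_text / "MIME") */
    }
}
```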

13. References

     [BASIC-URL] Berners-Lee, Masinter, McCahill, "Uniform Resource
     Locators (URL)", RFC 1738, CERN, Xerox Corporation, University of
     Minnesota, December 1994.

         <ftp://ds.internic.net/rfc/rfc1738.txt>

     [IMAP4] Crispin, M., "Internet Message Access Protocol - Version
     4rev1", RFC 2060, University of Washington, December 1996.

         <ftp://ds.internic.net/rfc/rfc2060.txt>

     [IMAP-AUTH] Myers, J., "IMAP4 Authentication Mechanism", RFC 1731,
     Carnegie-Mellon University, December 1994.

         <ftp://ds.internic.net/rfc/rfc1731.txt>

     [HTTP] Fielding, Gettys, Mogul, Frystyk, Berners-Lee, "Hypertext
     Transfer Protocol -- HTTP/1.1", RFC 2068, UC Irvine, DEC, MIT/LCS,
     January 1997.

         <ftp://ds.internic.net/rfc/rfc2068.txt>

     [IMAIL] Crocker, "Standard for the Format of ARPA Internet Text
     Messages", STD 11, RFC 822, University of Delaware, August 1982.

         <ftp://ds.internic.net/rfc/rfc822.txt>

     [KEYWORDS] Bradner, "Key words for use in RFCs to Indicate
     Requirement Levels", RFC 2119, Harvard University, March 1997.

         <ftp://ds.internic.net/rfc/rfc2119.txt>

     [MIME] Freed, N., Borenstein, N., "Multipurpose Internet Mail
     Extensions", RFC 2045, Innosoft, First Virtual, November 1996.

         <ftp://ds.internic.net/rfc/rfc2045.txt>

     [REL-URL] Fielding, "Relative Uniform Resource Locators", RFC 1808,
     UC Irvine, June 1995.

         <ftp://ds.internic.net/rfc/rfc1808.txt>

     [UTF8] Yergeau, F., "UTF-8, a transformation format of Unicode and
     ISO 10646", RFC 2044, Alis Technologies, October 1996.

         <ftp://ds.internic.net/rfc/rfc2044.txt>

14. Author's Address

     Chris Newman
     Innosoft International, Inc.
     1050 Lakes Drive
     West Covina, CA 91790 USA
     EMail: chris.newman@innosoft.com


    623 Appendix A.  Sample code
    624 
    625 Here is sample C source code to convert between URL paths and IMAP
    626 mailbox names, taking into account mapping between IMAP's modified UTF-7
    627 [IMAP4] and hex-encoded UTF-8 which is more appropriate for URLs.  This
    628 code has not been rigorously tested nor does it necessarily behave
    629 reasonably with invalid input, but it should serve as a useful example.
    630 This code just converts the mailbox portion of the URL and does not deal
    631 with parameters, query or server components of the URL.
    632 
    633 #include <stdio.h>
    634 #include <string.h>
    635 
    636 /* hexadecimal lookup table */
    637 static char hex[] = "0123456789ABCDEF";
    638 
    639 /* URL unsafe printable characters */
    640 static char urlunsafe[] = " \"#%&+:;<=>?@[\\]^`{|}";
    641 
    642 /* UTF7 modified base64 alphabet */
    643 static char base64chars[] =
    644   "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+,";
    645 #define UNDEFINED 64
    646 
    647 /* UTF16 definitions */
    648 #define UTF16MASK       0x03FFUL
    649 #define UTF16SHIFT      10
    650 #define UTF16BASE       0x10000UL
    651 #define UTF16HIGHSTART  0xD800UL
    652 #define UTF16HIGHEND    0xDBFFUL
    653 #define UTF16LOSTART    0xDC00UL
    654 #define UTF16LOEND      0xDFFFUL
    655 
    656 /* Convert an IMAP mailbox to a URL path
    657  *  dst needs to have roughly 4 times the storage space of src
    658  *    Hex encoding can triple the size of the input
    659  *    UTF-7 can be slightly denser than UTF-8
    660  *     (worst case: 8 octets UTF-7 becomes 9 octets UTF-8)
    661  */
    662 void MailboxToURL(char *dst, char *src)
    663 {
    664     unsigned char c, i, bitcount;
    665     unsigned long ucs4, utf16, bitbuf;
    666     unsigned char base64[256], utf8[6];
    667 
    668 
    669 
    670 
    671 
    672 
    673 
    674 Newman                   Standards Track                       [Page 12]
    675 
    676 RFC 2192                    IMAP URL Scheme               September 1997
    677 
    678 
    679     /* initialize modified base64 decoding table */
    680     memset(base64, UNDEFINED, sizeof (base64));
    681     for (i = 0; i < sizeof (base64chars); ++i) {
    682         base64[base64chars[i]] = i;
    683     }
    684 
    685     /* loop until end of string */
    686     while (*src != '\0') {
    687         c = *src++;
    688         /* deal with literal characters and &- */
    689         if (c != '&' || *src == '-') {
    690             if (c < ' ' || c > '~' || strchr(urlunsafe, c) != NULL) {
    691                 /* hex encode if necessary */
    692                 dst[0] = '%';
    693                 dst[1] = hex[c >> 4];
    694                 dst[2] = hex[c & 0x0f];
    695                 dst += 3;
    696             } else {
    697                 /* encode literally */
    698                 *dst++ = c;
    699             }
    700             /* skip over the '-' if this is an &- sequence */
    701             if (c == '&') ++src;
    702         } else {
    703         /* convert modified UTF-7 -> UTF-16 -> UCS-4 -> UTF-8 -> HEX */
    704             bitbuf = 0;
    705             bitcount = 0;
    706             ucs4 = 0;
    707             while ((c = base64[(unsigned char) *src]) != UNDEFINED) {
    708                 ++src;
    709                 bitbuf = (bitbuf << 6) | c;
    710                 bitcount += 6;
    711                 /* enough bits for a UTF-16 character? */
    712                 if (bitcount >= 16) {
    713                     bitcount -= 16;
    714                     utf16 = (bitcount ? bitbuf >> bitcount
    715                              : bitbuf) & 0xffff;
    716                     /* convert UTF16 to UCS4 */
    717                     if
    718                     (utf16 >= UTF16HIGHSTART && utf16 <= UTF16HIGHEND) {
    719                         ucs4 = (utf16 - UTF16HIGHSTART) << UTF16SHIFT;
    720                         continue;
    721                     } else if
    722                     (utf16 >= UTF16LOSTART && utf16 <= UTF16LOEND) {
    723                         ucs4 += utf16 - UTF16LOSTART + UTF16BASE;
    724                     } else {
    725                         ucs4 = utf16;
    726                     }
                    /* convert UTF-16 range of UCS4 to UTF-8 */
                    if (ucs4 <= 0x7fUL) {
                        utf8[0] = ucs4;
                        i = 1;
                    } else if (ucs4 <= 0x7ffUL) {
                        utf8[0] = 0xc0 | (ucs4 >> 6);
                        utf8[1] = 0x80 | (ucs4 & 0x3f);
                        i = 2;
                    } else if (ucs4 <= 0xffffUL) {
                        utf8[0] = 0xe0 | (ucs4 >> 12);
                        utf8[1] = 0x80 | ((ucs4 >> 6) & 0x3f);
                        utf8[2] = 0x80 | (ucs4 & 0x3f);
                        i = 3;
                    } else {
                        utf8[0] = 0xf0 | (ucs4 >> 18);
                        utf8[1] = 0x80 | ((ucs4 >> 12) & 0x3f);
                        utf8[2] = 0x80 | ((ucs4 >> 6) & 0x3f);
                        utf8[3] = 0x80 | (ucs4 & 0x3f);
                        i = 4;
                    }
                    /* convert utf8 to hex */
                    for (c = 0; c < i; ++c) {
                        dst[0] = '%';
                        dst[1] = hex[utf8[c] >> 4];
                        dst[2] = hex[utf8[c] & 0x0f];
                        dst += 3;
                    }
                }
            }
            /* skip over trailing '-' in modified UTF-7 encoding */
            if (*src == '-') ++src;
        }
    }
    /* terminate destination string */
    *dst = '\0';
}

/* Convert hex coded UTF-8 URL path to modified UTF-7 IMAP mailbox
 *  dst should be about twice the length of src to deal with non-hex
 *  coded URLs
 */
void URLtoMailbox(char *dst, char *src)
{
    unsigned int utf8pos, utf8total, i, c, utf7mode, bitstogo, utf16flag;
    unsigned long ucs4, bitbuf;
    unsigned char hextab[256];

    /* initialize hex lookup table */
    memset(hextab, 0, sizeof (hextab));
    for (i = 0; i < sizeof (hex); ++i) {
        hextab[(unsigned char) hex[i]] = i;
        if (isupper((unsigned char) hex[i])) {
            hextab[tolower((unsigned char) hex[i])] = i;
        }
    }

    utf7mode = 0;
    utf8total = 0;
    bitstogo = 0;
    bitbuf = 0;
    /* read bytes as unsigned so 8-bit input is not sign-extended */
    while ((c = (unsigned char) *src) != '\0') {
        ++src;
        /* undo hex-encoding */
        if (c == '%' && src[0] != '\0' && src[1] != '\0') {
            c = (hextab[(unsigned char) src[0]] << 4)
                | hextab[(unsigned char) src[1]];
            src += 2;
        }
        /* normal character? */
        if (c >= ' ' && c <= '~') {
            /* switch out of UTF-7 mode */
            if (utf7mode) {
                /* flush any leftover bits, zero-padded */
                if (bitstogo) {
                    *dst++ = base64chars[(bitbuf << (6 - bitstogo)) & 0x3F];
                }
                /* the flushed bits must not leak into the next run */
                bitstogo = 0;
                *dst++ = '-';
                utf7mode = 0;
            }
            *dst++ = c;
            /* encode '&' as '&-' */
            if (c == '&') {
                *dst++ = '-';
            }
            continue;
        }
        /* switch to UTF-7 mode */
        if (!utf7mode) {
            *dst++ = '&';
            utf7mode = 1;
        }
        /* non-printable US-ASCII: the byte value is the code point */
        if (c < 0x80) {
            ucs4 = c;
            utf8total = 1;
        } else if (utf8total) {
            /* save UTF8 bits into UCS4 */
            ucs4 = (ucs4 << 6) | (c & 0x3FUL);
            if (++utf8pos < utf8total) {
                continue;
            }
        } else {
            /* lead byte: note sequence length, keep its payload bits */
            utf8pos = 1;
            if (c < 0xE0) {
                utf8total = 2;
                ucs4 = c & 0x1F;
            } else if (c < 0xF0) {
                utf8total = 3;
                ucs4 = c & 0x0F;
            } else {
                /* NOTE: can't convert UTF8 sequences longer than 4 */
                /* a 4-byte lead byte (11110xxx) carries 3 payload bits */
                utf8total = 4;
                ucs4 = c & 0x07;
            }
            continue;
        }
        /* loop to split ucs4 into two utf16 chars if necessary */
        utf8total = 0;
        do {
            if (ucs4 >= UTF16BASE) {
                ucs4 -= UTF16BASE;
                bitbuf = (bitbuf << 16) | ((ucs4 >> UTF16SHIFT)
                                           + UTF16HIGHSTART);
                ucs4 = (ucs4 & UTF16MASK) + UTF16LOSTART;
                utf16flag = 1;
            } else {
                bitbuf = (bitbuf << 16) | ucs4;
                utf16flag = 0;
            }
            bitstogo += 16;
            /* emit every complete 6-bit group as base64 */
            while (bitstogo >= 6) {
                bitstogo -= 6;
                *dst++ = base64chars[(bitstogo ? (bitbuf >> bitstogo)
                                                : bitbuf) & 0x3F];
            }
        } while (utf16flag);
    }
    /* if in UTF-7 mode, finish in ASCII */
    if (utf7mode) {
        /* flush any leftover bits, zero-padded */
        if (bitstogo) {
            *dst++ = base64chars[(bitbuf << (6 - bitstogo)) & 0x3F];
        }
        *dst++ = '-';
    }
    /* tie off string */
    *dst = '\0';
}