Conversion to and from External Data
------------------------------------

   When an external function, such as a C library function, returns a
`char' pointer, you should almost never treat it as `Bufbyte'.  Such
strings may contain 8-bit characters that XEmacs can misinterpret,
which may cause a crash.  Likewise, when exporting a piece of internal
text to the outside world, you should always convert it to an
appropriate external encoding, lest internal artifacts (such as the
infamous \201 characters) leak out.

   The interface for converting between the internal and external
representations of text is the set of conversion macros defined in
`buffer.h'.  There used to be a fixed set of external formats supported
by these macros, but now any coding system can be used with them.  The
coding system alias mechanism is used to create the following logical
coding systems, which replace the fixed external formats.  The
(set-symbol-value-handler) mechanism was enhanced to make this possible
(more work on that is still needed).

`Qbinary'
     This is the simplest format and is what we use in the absence of a
     more appropriate format.  This converts according to the `binary'
     coding system:

       a. On input, bytes 0-255 are converted into (implicitly Latin-1)
          characters 0-255.  A non-Mule XEmacs doesn't really know about
          different character sets and the fonts to display them, so
          the bytes can be treated as text in different 1-byte
          encodings by simply setting the appropriate fonts.  So in a
          sense, a non-Mule XEmacs is a multi-lingual editor if, for
          example, different fonts are used to display text in
          different buffers, faces, or windows.  The specifier
          mechanism gives the user complete control over this kind of
          behavior.

       b. On output, characters 0-255 are converted into bytes 0-255
          and other characters are converted into `~'.

`Qfile_name'
     Format used for filenames.  This is user-definable via either the
     `file-name-coding-system' or `pathname-coding-system' (now
     obsolete) variables.

`Qnative'
     Format used for the external Unix environment--`argv[]', stuff
     from `getenv()', stuff from the `/etc/passwd' file, etc.
     Currently this is the same as Qfile_name.  The two should be
     distinguished for clarity and possible future separation.

`Qctext'
     Compound-text format.  This is the standard X11 format used for
     data stored in properties, selections, and the like.  It is an
     8-bit no-lock-shift ISO2022 coding system.  Unlike `Qfile_name',
     which is user-definable, this is a real coding system.
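
   The two `binary' rules listed under `Qbinary' above can be sketched
in standalone C.  This is only an illustration of the mapping, not the
XEmacs implementation: the type `demo_char' and both function names
are hypothetical, and a plain unsigned integer stands in for an
internal character.

```c
#include <stddef.h>

/* Hypothetical stand-in for an internal character code; this is not
   the real XEmacs character type.  */
typedef unsigned int demo_char;

/* Input direction: each external byte 0-255 becomes the character
   with the same code.  */
static void
demo_binary_decode (const unsigned char *src, size_t len, demo_char *dst)
{
  size_t i;

  for (i = 0; i < len; i++)
    dst[i] = src[i];
}

/* Output direction: characters 0-255 map back to the same byte; any
   other character is replaced with '~'.  */
static void
demo_binary_encode (const demo_char *src, size_t len, unsigned char *dst)
{
  size_t i;

  for (i = 0; i < len; i++)
    dst[i] = (src[i] <= 255) ? (unsigned char) src[i]
                             : (unsigned char) '~';
}
```

   Decoding followed by encoding returns the original bytes unchanged,
which is why this format is a safe fallback; only characters outside
the range 0-255 are lost on output.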

   There are two fundamental macros to convert between external and
internal format.

   `TO_INTERNAL_FORMAT' converts external data to internal format, and
`TO_EXTERNAL_FORMAT' converts the other way around.  Each takes the
same arguments: a source type, a source, a sink type, a sink, and a
coding system (or a symbol naming a coding system).

   A typical call looks like
     TO_EXTERNAL_FORMAT (LISP_STRING, str, C_STRING_MALLOC, ptr, Qfile_name);

   which means that the contents of the Lisp string `str' are written
to a malloc'ed memory area that `ptr' points to once the conversion
completes.  The conversion is done using the `file-name' coding system,
which the user controls indirectly by setting or binding the variable
`file-name-coding-system'.

   Some sources and sinks require two C variables to specify.  We use
some preprocessor magic so that each source or sink type can take its
own number of arguments.

   So we can have a call that looks like
     TO_INTERNAL_FORMAT (DATA, (ptr, len),
                         MALLOC, (ptr, len),
                         coding_system);

   The parenthesized argument pairs are required to make the
preprocessor magic work.
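
   One way such preprocessor dispatch can work is shown in the
following standalone sketch.  It is not taken from `buffer.h'; every
name beginning with `demo_' or `DEMO_' is hypothetical.  The point is
that a parenthesized pair counts as a single macro argument, so one
macro signature can accept sources described by one or two C
expressions.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical description of a source: an address and a length.  */
typedef struct
{
  const void *ptr;
  size_t len;
} demo_source;

static demo_source
demo_make_source (const void *ptr, size_t len)
{
  demo_source s;

  s.ptr = ptr;
  s.len = len;
  return s;
}

/* One helper macro per source type.  DATA receives the parenthesized
   pair as a single macro argument and places it directly after the
   function name, where it becomes an ordinary two-argument call.  */
#define DEMO_SOURCE_DATA(pair) demo_make_source pair

/* C_STRING takes one pointer and derives the length, including the
   terminating '\0'.  Note that the argument is evaluated twice.  */
#define DEMO_SOURCE_C_STRING(p) demo_make_source ((p), strlen (p) + 1)

/* Token pasting selects the helper that matches the type keyword.  */
#define DEMO_SOURCE(type, arg) DEMO_SOURCE_##type (arg)
```

   With these definitions, `DEMO_SOURCE (DATA, (buf, 3))' expands to
`demo_make_source (buf, 3)'.  Without the inner parentheses the
preprocessor would see three arguments to a two-parameter macro and
reject the call.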

   Here are the different source and sink types:

``DATA, (ptr, len),''
     input data is a fixed buffer of size LEN at address PTR

``ALLOCA, (ptr, len),''
     output data is placed in an alloca()ed buffer of size LEN pointed
     to by PTR

``MALLOC, (ptr, len),''
     output data is in a malloc()ed buffer of size LEN pointed to by PTR

``C_STRING_ALLOCA, ptr,''
     equivalent to `ALLOCA, (ptr, len_ignored)' on output

``C_STRING_MALLOC, ptr,''
     equivalent to `MALLOC, (ptr, len_ignored)' on output

``C_STRING, ptr,''
     equivalent to `DATA, (ptr, strlen (ptr) + 1)' on input

``LISP_STRING, string,''
     input or output is a Lisp_Object of type string

``LISP_BUFFER, buffer,''
     output is written to `(point)' in Lisp buffer BUFFER

``LISP_LSTREAM, lstream,''
     input or output is a Lisp_Object of type lstream

``LISP_OPAQUE, object,''
     input or output is a Lisp_Object of type opaque

   Often, the data is being converted to a '\0'-byte-terminated string,
which is the format required by many external system C APIs.  For these
purposes, a source type of `C_STRING' or a sink type of
`C_STRING_ALLOCA' or `C_STRING_MALLOC' is appropriate.  Otherwise, we
should try to keep XEmacs '\0'-byte-clean, which means using (ptr, len)
pairs.
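
   The difference between the two conventions shows up as soon as the
data contains an embedded '\0' byte.  The following standalone sketch
uses only the standard C library; both helper names are hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: returns 1 if the LEN bytes at PTR contain no
   '\0' byte, so that a '\0'-terminated copy would preserve all of
   them; returns 0 otherwise.  */
static int
demo_fits_in_c_string (const char *ptr, size_t len)
{
  return memchr (ptr, '\0', len) == NULL;
}

/* Number of bytes a '\0'-terminated API would see: it stops at the
   first '\0'.  The buffer must contain at least one '\0'.  */
static size_t
demo_visible_to_c_api (const char *ptr)
{
  return strlen (ptr);
}
```

   For the five bytes `a', `b', '\0', `c', `d', a '\0'-terminated API
sees only the first two, while a (ptr, len) pair carries all five.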

   The sinks must be specified as lvalues, unless they are the Lisp
object types `LISP_LSTREAM' or `LISP_BUFFER'.

   For the sink types `ALLOCA' and `C_STRING_ALLOCA', the resulting
text is stored in a stack-allocated buffer, which is automatically
freed on returning from the function.  However, the sink types `MALLOC'
and `C_STRING_MALLOC' return `xmalloc()'ed memory.  The caller is
responsible for freeing this memory using `xfree()'.
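
   The ownership rule for the malloc-style sinks can be illustrated
with a standalone sketch.  It uses the standard `malloc()' and
`free()' as stand-ins for `xmalloc()' and `xfree()', and both function
names are hypothetical; it does not call the conversion macros.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a C_STRING_MALLOC-style sink: returns a
   heap-allocated, '\0'-terminated copy of the LEN bytes at SRC, or
   NULL if allocation fails.  The caller owns the result.  */
static char *
demo_to_malloced_c_string (const char *src, size_t len)
{
  char *out = (char *) malloc (len + 1);

  if (out == NULL)
    return NULL;
  memcpy (out, src, len);
  out[len] = '\0';
  return out;
}

/* Caller side: use the buffer, then release it.  Returns the length
   of the copy, or (size_t) -1 if allocation failed.  */
static size_t
demo_use_and_release (const char *src, size_t len)
{
  char *copy = demo_to_malloced_c_string (src, len);
  size_t n;

  if (copy == NULL)
    return (size_t) -1;
  n = strlen (copy);
  free (copy);                  /* In XEmacs this would be xfree().  */
  return n;
}
```

   A stack-allocated result needs no matching release, but it must not
be returned to, or stored for use by, code that runs after the
enclosing function has returned.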

   Note that it doesn't make sense for `LISP_STRING' to be a source for
`TO_INTERNAL_FORMAT' or a sink for `TO_EXTERNAL_FORMAT'.  You'll get an
assertion failure if you try.