A more detailed proposal for avoiding binary file corruption is as follows:

<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
NO BINARY CORRUPTION
====================

Basic idea: A coding system is a filter converting an entire input
stream into an output stream. The resulting stream can be said to
"correspond to" the input stream. Similarly, smaller units (intervals)
of the two streams can correspond. Such correspondences could
potentially include zero-width intervals on either side, but we avoid
this.  Specifically, the coding system works like this:

loop (input) {

 Read bytes until we have enough to generate one or more translated
 characters.

 This establishes a "correspondence" between the input and the
 output, covering the whole of both streams in more or less minimal
 chunks.

}
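
As a minimal sketch of this loop, assuming Python's incremental
decoders stand in for a coding system (the names Correspondence and
decode_with_correspondences are hypothetical, not part of any existing
API), the input can be fed one byte at a time and an interval pair
recorded whenever the decoder emits text:

```python
import codecs
from typing import List, NamedTuple, Tuple


class Correspondence(NamedTuple):
    in_start: int   # byte offset into the input, inclusive
    in_end: int     # byte offset into the input, exclusive
    out_start: int  # character offset into the output, inclusive
    out_end: int    # character offset into the output, exclusive


def decode_with_correspondences(
    data: bytes, encoding: str = "utf-8"
) -> Tuple[str, List[Correspondence]]:
    """Decode DATA and record which input bytes produced which characters."""
    # Strict error handling: malformed input raises UnicodeDecodeError.
    decoder = codecs.getincrementaldecoder(encoding)()
    chunks: List[Correspondence] = []
    pieces: List[str] = []
    in_start = 0
    out_pos = 0
    for i in range(len(data)):
        # Feed one byte; the decoder buffers until it can emit characters.
        piece = decoder.decode(data[i:i + 1])
        if piece:
            chunks.append(
                Correspondence(in_start, i + 1, out_pos, out_pos + len(piece))
            )
            pieces.append(piece)
            out_pos += len(piece)
            in_start = i + 1
    # Flush the decoder. Any text emitted here may map to a zero-width
    # input interval, which a later pass would have to merge away.
    tail = decoder.decode(b"", final=True)
    if tail:
        chunks.append(
            Correspondence(in_start, len(data), out_pos, out_pos + len(tail))
        )
        pieces.append(tail)
    return "".join(pieces), chunks
```

Bytes that produce no output on their own (such as a byte-order mark)
are folded into the next chunk that does produce output, so no
zero-width output interval is recorded for them.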

We then do the following processing:

1. Eliminate correspondences where one or the other of the I/O streams
   has a zero-width interval by combining with an adjacent interval;

2. Group together all adjacent "identity" correspondences into groups
   that are as large as possible;

3. Use text properties to store the non-identity correspondences on
   the characters. For identity correspondences, use a simple text
   property on all of the text that contains no data but just
   indicates that the whole string of text is identity-corresponded.
   (How do we define "identity"? Latin 1, or could it be something
   else, for example Latin 2?)

4. Figure out the procedures to follow when text is inserted or
   deleted, and when it is copied or pasted.

5. Figure out how to save the file out making use of the
   correspondences. Allow ways of saving without correspondences, and
   of doing a "save to buffer" both with and without correspondences.
   Need to be clever when dealing with modal coding systems to parse
   the correspondences to get the internal state right.
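
Steps 1 and 2 above can be sketched as two small passes over a list of
(raw bytes, decoded text) pairs. This is only an illustration: the
function names are hypothetical, and "identity" is assumed here to
mean "decodes as Latin 1 to exactly the same text", which is one
possible answer to the open question in step 3.

```python
from typing import List, Tuple

Chunk = Tuple[bytes, str]  # (input bytes, output text) for one correspondence


def is_identity(chunk: Chunk) -> bool:
    """Assumption: identity means the bytes read as Latin 1 give the text."""
    raw, text = chunk
    return raw.decode("latin-1") == text


def drop_zero_width(chunks: List[Chunk]) -> List[Chunk]:
    """Step 1: merge chunks that are empty on either side into a neighbor."""
    merged: List[Chunk] = []
    for raw, text in chunks:
        if merged and (not raw or not text
                       or not merged[-1][0] or not merged[-1][1]):
            # This chunk or the previous one is zero-width: combine them.
            prev_raw, prev_text = merged[-1]
            merged[-1] = (prev_raw + raw, prev_text + text)
        else:
            merged.append((raw, text))
    return merged


def group_identity(chunks: List[Chunk]) -> List[Chunk]:
    """Step 2: join runs of adjacent identity chunks into single chunks."""
    grouped: List[Chunk] = []
    for chunk in chunks:
        if grouped and is_identity(chunk) and is_identity(grouped[-1]):
            prev_raw, prev_text = grouped[-1]
            grouped[-1] = (prev_raw + chunk[0], prev_text + chunk[1])
        else:
            grouped.append(chunk)
    return grouped
```

A zero-width chunk can only survive the first pass if every chunk in
the list is empty on the same side, in which case there is no
adjacent interval that would fix it.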
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
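
Item 5 of the proposal above (saving with or without correspondences)
might look roughly like the following sketch. It assumes the buffer
has already been split into runs, each carrying the correspondence
stored in its text property, or nothing if the text was newly
inserted. The names Run and save_runs are hypothetical, and the sketch
ignores modal coding systems entirely, so it does not address the
internal-state problem mentioned in item 5.

```python
from typing import List, NamedTuple, Optional, Tuple


class Run(NamedTuple):
    text: str                            # current buffer text of this run
    source: Optional[Tuple[bytes, str]]  # (original bytes, text as decoded),
                                         # or None for newly inserted text


def save_runs(
    runs: List[Run], encoding: str = "utf-8", use_correspondences: bool = True
) -> bytes:
    """Encode RUNS, reusing original bytes where the text is unchanged."""
    out = bytearray()
    for run in runs:
        if (use_correspondences
                and run.source is not None
                and run.source[1] == run.text):
            # Unmodified text: write back exactly the bytes that were read.
            out += run.source[0]
        else:
            # New or edited text, or correspondences disabled: re-encode.
            out += run.text.encode(encoding)
    return bytes(out)
```

With correspondences enabled, a run whose text is unchanged since
loading is written back byte-for-byte, so input that the coding system
would not reproduce on its own (for example a CRLF line ending decoded
to a bare newline) survives a load-and-save cycle.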