A more detailed proposal for avoiding binary file corruption follows:
<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
NO BINARY CORRUPTION
====================
Basic idea: A coding system is a filter converting an entire input
stream into an output stream. The resulting stream can be said to be
"correspondent to" the input stream. Similarly, smaller intervals of
the two streams can correspond. Such a correspondence could in
principle have a zero-width interval on either side, but we avoid
this. Specifically, the coding system works like this:
loop (input) {
  Read bytes until we have enough to generate one or more translated
  characters.
  Each such step establishes a "correspondence"; together they cover
  the whole input and output, more or less in minimal chunks.
}
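
The loop above can be sketched in Python as follows. This is only an
illustration under stated assumptions, not the proposed implementation:
the names `Correspondence` and `decode_with_correspondences` are
hypothetical, and Python's standard incremental decoder (with
errors="replace") stands in for the coding system.

```python
import codecs
from typing import List, NamedTuple


class Correspondence(NamedTuple):
    """Hypothetical record pairing an input interval with an output interval."""
    raw: bytes   # input bytes consumed for this chunk
    text: str    # characters generated from exactly those bytes


def decode_with_correspondences(data: bytes,
                                encoding: str = "utf-8") -> List[Correspondence]:
    # Assumption: undecodable bytes become U+FFFD, but the original bytes
    # are kept in the correspondence so that nothing is lost.
    decoder = codecs.getincrementaldecoder(encoding)(errors="replace")
    chunks: List[Correspondence] = []
    pending = bytearray()
    for i in range(len(data)):
        byte = data[i:i + 1]
        pending += byte
        out = decoder.decode(byte)
        if out:
            # Enough bytes have been read to produce one or more characters.
            chunks.append(Correspondence(bytes(pending), out))
            pending.clear()
    # Flush whatever the decoder still holds (e.g. a truncated sequence).
    tail = decoder.decode(b"", final=True)
    if pending or tail:
        chunks.append(Correspondence(bytes(pending), tail))
    return chunks
```

Concatenating the `raw` fields always reproduces the input exactly, and
concatenating the `text` fields gives the decoded text, which is the
property the rest of the proposal relies on.
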
We then do the following processing:
1. Eliminate correspondences where one or the other of the I/O streams
has a zero-width interval by combining with an adjacent interval;
2. Group together all adjacent "identity" correspondences into as
large groups as possible;
3. Use text properties to store the non-identity correspondences on
the characters. For identity correspondences, use a simple text
property that contains no data and is placed on the whole run, just
indicating that the entire run of text is identity-corresponded.
(How do we define "identity"? Latin 1, or could it be something
else, for example Latin 2?)
4. Figure out the procedures for when text is inserted, deleted,
copied or pasted.
5. Figure out how to save the file out making use of the
correspondences. Allow ways of saving without correspondences, and
of doing a "save to buffer" both with and without correspondences.
We need to be clever when dealing with modal coding systems, parsing
the correspondences to get the internal state right.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
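
Steps 1 and 2 of the proposal, plus the simplest case of step 5 (saving
unedited text), might look roughly like the Python sketch below. All
names here are hypothetical. "Identity" is taken to mean Latin 1 (each
byte maps to the code point with the same value), which is only one
possible answer to the question the proposal leaves open. Text
properties, editing (step 4), and modal state tracking are not modeled.

```python
from typing import List, NamedTuple


class Correspondence(NamedTuple):
    raw: bytes   # interval of the external (file) stream
    text: str    # interval of the internal (buffer) stream


def is_identity(c: Correspondence) -> bool:
    # Assumption: identity means Latin 1, byte N <-> code point N.
    return c.raw.decode("latin-1") == c.text


def _join(a: Correspondence, b: Correspondence) -> Correspondence:
    return Correspondence(a.raw + b.raw, a.text + b.text)


def merge_zero_width(chunks: List[Correspondence]) -> List[Correspondence]:
    """Step 1: fold zero-width correspondences into an adjacent one."""
    merged: List[Correspondence] = []
    carry = None  # leading zero-width chunk waiting for a right neighbor
    for c in chunks:
        if carry is not None:
            c, carry = _join(carry, c), None
        if c.raw and c.text:
            merged.append(c)
        elif merged:
            merged.append(_join(merged.pop(), c))
        else:
            carry = c
    if carry is not None and (carry.raw or carry.text):
        merged.append(carry)  # nothing to combine with; keep it as is
    return merged


def group_identities(chunks: List[Correspondence]) -> List[Correspondence]:
    """Step 2: coalesce adjacent identity correspondences."""
    grouped: List[Correspondence] = []
    for c in chunks:
        if grouped and is_identity(c) and is_identity(grouped[-1]):
            grouped.append(_join(grouped.pop(), c))
        else:
            grouped.append(c)
    return grouped


def save_with_correspondences(chunks: List[Correspondence]) -> bytes:
    """Step 5, unedited text only: write back the original bytes."""
    return b"".join(c.raw for c in chunks)


# Illustrative input in the style of a modal (ISO-2022-like) coding
# system: escape sequences produce no characters by themselves.
example = [
    Correspondence(b"\x1b$B", ""),        # switch to a two-byte charset
    Correspondence(b"\x30\x21", "\u4e9c"),
    Correspondence(b"\x1b(B", ""),        # switch back to ASCII
    Correspondence(b"a", "a"),
    Correspondence(b"b", "b"),
    Correspondence(b"\xff", "\ufffd"),    # undecodable byte
]
result = group_identities(merge_zero_width(example))
```

In this example the two escape sequences are absorbed into the
neighboring two-byte character, "a" and "b" collapse into one identity
run, and the undecodable byte keeps its own non-identity correspondence,
so saving reproduces the original bytes even though the buffer shows a
replacement character.
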