- `Eistring' API for easy manipulation of internally formatted data,
Mule-correct even with a non-ASCII-compatible internal
representation.
<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
(E) For working with Eistrings:
-------------------------------
NOTE: An Eistring is a structure that makes it easy to work with
internally-formatted strings of data. It provides operations similar
in feel to the standard strcpy(), strcat(), strlen(), etc., but
(a) it is Mule-correct
(b) it does dynamic allocation so you never have to worry about size
restrictions (and all allocation is stack-local using alloca(), so
there is no need to explicitly clean up)
(c) it knows its own length, so it does not suffer from standard null
byte brain-damage
(d) it provides a much more powerful set of operations and knows about
all the standard places where string data might reside: Lisp_Objects,
other Eistrings, char * data with or without an explicit length, etc.
(e) it provides easy operations to convert to/from externally-formatted
data, and is much easier to use than the standard TO_INTERNAL_FORMAT
and TO_EXTERNAL_FORMAT macros.
The idea is to make it as easy to write Mule-correct string manipulation
code as it is to write normal string manipulation code. We also make
the API sufficiently general that it can handle multiple internal data
formats (e.g. some fixed-width optimizing formats and a default variable
width format) and allows for *ANY* data format we might choose in the
future for the default format, including UCS2. (In other words, we can't
assume that the internal format is ASCII-compatible and we can't assume
it doesn't have embedded null bytes.) All of this is hidden from the
user.
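A minimal sketch of point (c) above, assuming a hypothetical counted_str
type that is NOT part of the Eistring API. It only illustrates why carrying
an explicit length avoids the null-byte problem; the names counted_from_len,
counted_from_c and counted_zero_bytes are invented for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

// Hypothetical stand-in for an Eistring: a pointer plus an explicit length.
typedef struct {
  const char *data;
  size_t bytelen;
} counted_str;

// Build from a buffer with an explicit length; zero bytes are ordinary data.
static counted_str counted_from_len (const char *data, size_t bytelen)
{
  counted_str s;
  assert (data != NULL);
  s.data = data;
  s.bytelen = bytelen;
  return s;
}

// Build from a null-terminated C string: the length is measured once here.
static counted_str counted_from_c (const char *c_string)
{
  return counted_from_len (c_string, strlen (c_string));
}

// Count zero bytes in the payload; strlen() can never see past the first one.
static size_t counted_zero_bytes (counted_str s)
{
  size_t i, n = 0;
  for (i = 0; i < s.bytelen; i++)
    if (s.data[i] == '\0')
      n++;
  return n;
}
```

Given the six bytes "ab", zero, "cd", zero, counted_from_len keeps all six
while counted_from_c stops after "ab".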
#### It is really too bad that we don't have a real object-oriented
language, or at least a language with polymorphism!
Eistring (name):
Declare a new Eistring. This is a standard local variable declaration
and can go anywhere in the variable declaration section, but note that
you *MUST* supply the parens.
----- Initialization -----
eicpy_* (eistr, ...):
Initialize the Eistring from somewhere:
eicpy_ei (eistr, eistr2):
... from another Eistring
eicpy_str (eistr, lisp_string):
... from a Lisp_Object string
eicpy_str_off (eistr, lisp_string, charpos, charlen):
... from a section of a Lisp_Object string
eicpy_str_off_byte (eistr, lisp_string, bytepos, bytelen):
... from a section of a Lisp_Object string, with offset and length
specified in bytes rather than chars
eicpy_buf (eistr, lisp_buf, charpos, charlen):
... from a Lisp_Object buffer
eicpy_buf_byte (eistr, lisp_buf, bytepos, bytelen):
... from a Lisp_Object buffer, with offset and length specified in
bytes rather than chars
eicpy_raw (eistr, intdata, intlen, intfmt):
... from raw internal-format data in the specified format
eicpy_c (eistr, c_string):
... from an ASCII null-terminated string. Non-ASCII characters in
the string are *ILLEGAL* (read: abort() when error-checking is defined).
eicpy_c_len (eistr, c_string, len):
... from an ASCII string, with length specified. Non-ASCII characters
in the string are *ILLEGAL* (read: abort() when error-checking is defined).
eicpy_ext (eistr, extdata, coding_system):
... from external null-terminated data, with coding system specified.
eicpy_ext_len (eistr, extdata, extlen, coding_system):
... from external data, with length and coding system specified.
eicpy_lstream (eistr, lstream):
... from an lstream; reads data till eof. Data must be in default
internal format; otherwise, interpose a decoding lstream.
----- Getting the data out of the Eistring -----
eirawdata (eistr):
eimake_string (eistr):
eimake_string_sect (eistr, charpos, charlen):
eimake_string_sect_byte (eistr, bytepos, bytelen):
eicpyout_raw_alloca (eistr, intfmt, intlen_out):
eicpyout_raw_malloc (eistr, intfmt, intlen_out):
eicpyout_c_alloca (eistr):
eicpyout_c_malloc (eistr):
eicpyout_c_len_alloca (eistr, len_out):
eicpyout_c_len_malloc (eistr, len_out):
----- Moving to the heap -----
eito_malloc (eistr):
eifree (eistr):
eito_alloca (eistr):
----- Retrieving the length -----
eilen (eistr):
eilen_byte (eistr):
----- Working with positions -----
eicharpos_to_bytepos (eistr, charpos):
eibytepos_to_charpos (eistr, bytepos):
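The two position conversions can be sketched as follows. This is an
illustration only: it assumes UTF-8 as a stand-in for the variable-width
internal format (the real Mule format differs), and the helper names are
hypothetical.

```c
#include <assert.h>
#include <stddef.h>

// Assumption: UTF-8 stands in for the variable-width internal format.
// A byte starts a character unless it is a continuation byte (10xxxxxx).
static int is_char_start (unsigned char b)
{
  return (b & 0xC0) != 0x80;
}

// Byte offset of the character at index CHARPOS.
// DATA must hold at least CHARPOS characters within BYTELEN bytes.
static size_t charpos_to_bytepos (const unsigned char *data, size_t bytelen,
                                  size_t charpos)
{
  size_t bytepos = 0;
  while (charpos > 0)
    {
      assert (bytepos < bytelen);
      bytepos++;                      // step past the lead byte
      while (bytepos < bytelen && !is_char_start (data[bytepos]))
        bytepos++;                    // and past its continuation bytes
      charpos--;
    }
  return bytepos;
}

// Number of characters that start before byte offset BYTEPOS.
static size_t bytepos_to_charpos (const unsigned char *data, size_t bytepos)
{
  size_t charpos = 0;
  size_t i;
  for (i = 0; i < bytepos; i++)
    if (is_char_start (data[i]))
      charpos++;
  return charpos;
}
```

Both conversions are linear in the string length, which is why the API
offers a _byte variant of most operations.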
----- Getting the character at a position -----
eiref (eistr, charpos):
eiref_byte (eistr, bytepos):
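A rough sketch of fetching a character by byte position and by character
position. It again assumes well-formed UTF-8 as a stand-in for the internal
format; ref_byte and ref_char are hypothetical names, and no validation of
the input is attempted.

```c
#include <assert.h>
#include <stddef.h>

// Decode the character whose lead byte is at DATA[BYTEPOS].
static unsigned long ref_byte (const unsigned char *data, size_t bytepos)
{
  unsigned char lead = data[bytepos];
  unsigned long ch;
  int extra;                          // number of continuation bytes

  assert ((lead & 0xC0) != 0x80);     // must point at a lead byte
  if (lead < 0x80)      { ch = lead;        extra = 0; }
  else if (lead < 0xE0) { ch = lead & 0x1F; extra = 1; }
  else if (lead < 0xF0) { ch = lead & 0x0F; extra = 2; }
  else                  { ch = lead & 0x07; extra = 3; }

  while (extra-- > 0)
    ch = (ch << 6) | (data[++bytepos] & 0x3F);
  return ch;
}

// Decode the character at index CHARPOS by walking from the start.
// CHARPOS must be a valid character index into DATA.
static unsigned long ref_char (const unsigned char *data, size_t charpos)
{
  size_t bytepos = 0;
  while (charpos-- > 0)
    {
      bytepos++;                              // past the lead byte
      while ((data[bytepos] & 0xC0) == 0x80)
        bytepos++;                            // past continuation bytes
    }
  return ref_byte (data, bytepos);
}
```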
----- Concatenation -----
eicat_* (eistr, ...):
Concatenate onto the end of the Eistring. Unlike eicpy_*, which accepts
all the sources listed above, functions from here on that take a string
source allow only two possibilities: another Eistring and a simple C
string. For any other source, first create another Eistring from it
with eicpy_*.
eicat_ei (eistr, eistr2):
eicat_c (eistr, c_string):
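Length-aware concatenation can be sketched as below. This is not the
Eistring implementation: it assumes a hypothetical heap-backed counted_buf
(the real Eistring uses alloca() by default), and counted_cat and
counted_cat_c are invented names.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

// Hypothetical heap-backed stand-in for an Eistring.
typedef struct {
  char *data;
  size_t bytelen;
} counted_buf;

// Append LEN bytes from SRC, keeping a trailing zero byte for convenience.
// SRC must not point into DST's own storage, since realloc() may move it.
// Returns 0 on success, -1 if memory could not be allocated.
static int counted_cat (counted_buf *dst, const char *src, size_t len)
{
  char *grown;

  assert (dst != NULL && src != NULL);
  grown = realloc (dst->data, dst->bytelen + len + 1);
  if (grown == NULL)
    return -1;
  memcpy (grown + dst->bytelen, src, len);
  dst->data = grown;
  dst->bytelen += len;
  dst->data[dst->bytelen] = '\0';
  return 0;
}

// Convenience wrapper for a null-terminated C string source.
static int counted_cat_c (counted_buf *dst, const char *c_string)
{
  return counted_cat (dst, c_string, strlen (c_string));
}
```

Starting from an empty buffer { NULL, 0 }, appending "foo" and then four
bytes beginning with a zero byte yields a seven-byte result; the caller
frees data when done.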
----- Replacement -----
eisub_* (eistr, charoff, charlen, ...):
eisub_*_byte (eistr, byteoff, bytelen, ...):
Replace a section of the Eistring.
eisub_ei (eistr, charoff, charlen, eistr2):
eisub_ei_byte (eistr, byteoff, bytelen, eistr2):
eisub_c (eistr, charoff, charlen, c_string):
eisub_c_byte (eistr, byteoff, bytelen, c_string):
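Replacing a section by byte offset and length might look like the sketch
below. The counted_buf type and counted_sub function are hypothetical and
heap-backed; they only show the order of operations (grow, slide the tail,
copy in the new bytes), not the actual eisub_* macros.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

// Hypothetical heap-backed stand-in for an Eistring.
typedef struct {
  char *data;
  size_t bytelen;
  size_t allocated;   // bytes reserved, including the trailing zero byte
} counted_buf;

// Replace LEN bytes at offset OFF in DST with NEWLEN bytes from SRC.
// SRC must not point into DST's own storage.
// Returns 0 on success, -1 if memory could not be allocated.
static int counted_sub (counted_buf *dst, size_t off, size_t len,
                        const char *src, size_t newlen)
{
  size_t taillen, total;

  assert (src != NULL);
  assert (off <= dst->bytelen && len <= dst->bytelen - off);
  taillen = dst->bytelen - off - len;
  total = off + newlen + taillen;

  if (total + 1 > dst->allocated)
    {
      char *grown = realloc (dst->data, total + 1);
      if (grown == NULL)
        return -1;
      dst->data = grown;
      dst->allocated = total + 1;
    }
  // Slide the tail first (the regions may overlap), then drop in the
  // replacement bytes.
  memmove (dst->data + off + newlen, dst->data + off + len, taillen);
  memcpy (dst->data + off, src, newlen);
  dst->bytelen = total;
  dst->data[total] = '\0';
  return 0;
}
```

A character-offset variant would first convert the offset and length to
bytes and then proceed the same way.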
----- Converting to an external format -----
eito_external (eistr, coding_system):
eiextdata (eistr):
eiextlen (eistr):
----- Searching in the Eistring for a character -----
eichr (eistr, chr):
eichr_byte (eistr, chr):
eichr_off (eistr, chr, charpos):
eichr_off_byte (eistr, chr, bytepos):
eirchr (eistr, chr):
eirchr_byte (eistr, chr):
eirchr_off (eistr, chr, charpos):
eirchr_off_byte (eistr, chr, bytepos):
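A sketch of forward and reverse single-byte search over length-counted
data. The return convention (a byte offset, or -1 when absent) and the
meaning of the offset in the reverse search (search strictly before
BYTEPOS) are assumptions made for this example; the text above does not
specify them. The function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

// Byte offset of the first CH at or after BYTEPOS, or -1 if absent.
// Embedded zero bytes are treated as ordinary data.
static long find_byte_off (const char *data, size_t bytelen, int ch,
                           size_t bytepos)
{
  const char *hit;

  assert (bytepos <= bytelen);
  hit = memchr (data + bytepos, ch, bytelen - bytepos);
  return hit ? (long) (hit - data) : -1;
}

// Reverse search: byte offset of the last CH strictly before BYTEPOS,
// or -1 if absent.
static long rfind_byte_off (const char *data, size_t bytelen, int ch,
                            size_t bytepos)
{
  assert (bytepos <= bytelen);
  while (bytepos > 0)
    if ((unsigned char) data[--bytepos] == (unsigned char) ch)
      return (long) bytepos;
  return -1;
}
```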
----- Searching in the Eistring for a string -----
eistr_ei (eistr, eistr2):
eistr_ei_byte (eistr, eistr2):
eistr_ei_off (eistr, eistr2, charpos):
eistr_ei_off_byte (eistr, eistr2, bytepos):
eirstr_ei (eistr, eistr2):
eirstr_ei_byte (eistr, eistr2):
eirstr_ei_off (eistr, eistr2, charpos):
eirstr_ei_off_byte (eistr, eistr2, bytepos):
eistr_c (eistr, c_string):
eistr_c_byte (eistr, c_string):
eistr_c_off (eistr, c_string, charpos):
eistr_c_off_byte (eistr, c_string, bytepos):
eirstr_c (eistr, c_string):
eirstr_c_byte (eistr, c_string):
eirstr_c_off (eistr, c_string, charpos):
eirstr_c_off_byte (eistr, c_string, bytepos):
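Substring search over length-counted data can be sketched with a plain
memcmp() scan. find_sub is a hypothetical helper, and returning a byte
offset or -1 is an assumption of this example, not a documented property
of the functions above.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

// Byte offset of the first occurrence of NEEDLE in HAY at or after START,
// or -1 if there is none. Both strings may contain zero bytes.
static long find_sub (const char *hay, size_t haylen,
                      const char *needle, size_t needlelen, size_t start)
{
  size_t i;

  assert (start <= haylen);
  if (needlelen > haylen - start)
    return -1;                        // needle cannot fit in what is left
  for (i = start; i + needlelen <= haylen; i++)
    if (memcmp (hay + i, needle, needlelen) == 0)
      return (long) i;
  return -1;
}
```

A reverse search would run the same comparison with the index counting
down instead of up.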
----- Comparison -----
eicmp_* (eistr, ...):
eicmp_off_* (eistr, charoff, charlen, ...):
eicmp_off_*_byte (eistr, byteoff, bytelen, ...):
eicasecmp_* (eistr, ...):
eicasecmp_off_* (eistr, charoff, charlen, ...):
eicasecmp_off_*_byte (eistr, byteoff, bytelen, ...):
Compare the Eistring with the other data. Return value same as
from strcmp.
eicmp_ei (eistr, eistr2):
eicmp_off_ei (eistr, charoff, charlen, eistr2):
eicmp_off_ei_byte (eistr, byteoff, bytelen, eistr2):
eicasecmp_ei (eistr, eistr2):
eicasecmp_off_ei (eistr, charoff, charlen, eistr2):
eicasecmp_off_ei_byte (eistr, byteoff, bytelen, eistr2):
eicmp_c (eistr, c_string):
eicmp_off_c (eistr, charoff, charlen, c_string):
eicmp_off_c_byte (eistr, byteoff, bytelen, c_string):
eicasecmp_c (eistr, c_string):
eicasecmp_off_c (eistr, charoff, charlen, c_string):
eicasecmp_off_c_byte (eistr, byteoff, bytelen, c_string):
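A sketch of strcmp-style comparison for length-counted data. counted_cmp
and counted_casecmp are hypothetical names; the case-insensitive variant
relies on tolower(), so it folds only what the current C locale folds
(ASCII letters in the default locale), which is a simplification compared
with Mule-correct case folding.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

// Negative, zero or positive, like strcmp(), but zero bytes do not end
// the comparison; a shorter string sorts before any longer one it prefixes.
static int counted_cmp (const char *a, size_t alen,
                        const char *b, size_t blen)
{
  size_t common = alen < blen ? alen : blen;
  int order;

  assert (a != NULL && b != NULL);
  order = memcmp (a, b, common);
  if (order != 0)
    return order;
  return (alen > blen) - (alen < blen);
}

// Same ordering rule after folding each byte with tolower().
static int counted_casecmp (const char *a, size_t alen,
                            const char *b, size_t blen)
{
  size_t common = alen < blen ? alen : blen;
  size_t i;

  assert (a != NULL && b != NULL);
  for (i = 0; i < common; i++)
    {
      int ca = tolower ((unsigned char) a[i]);
      int cb = tolower ((unsigned char) b[i]);
      if (ca != cb)
        return ca < cb ? -1 : 1;
    }
  return (alen > blen) - (alen < blen);
}
```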
----- Case-changing the Eistring -----
eilwr (eistr):
eiupr (eistr):
*/
--------------------------------------------------------------------------
/* ------------------------------ */
/* (E) For working with Eistrings */
/* ------------------------------ */
/* Note: Unfortunately, we have to write most of the Eistring functions as
macros, because of the use of alloca(). The principle used below to assure
no conflict in local variables is to prefix all local variables with "ei"
plus a number, which should be unique among macros. In practice, when
finding a new number, use one greater than all existing numbers. */
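/* A small illustration of the naming principle described above. The macro
   EI_SWAP_INT_ and the number 99 are made up for this sketch and are not
   part of the Eistring code. Because a macro body is expanded inside the
   caller's scope, a temporary with a common name such as "tmp" could
   capture a caller variable of the same name; a unique "ei" + number
   prefix avoids that. */

```c
#include <assert.h>

/* Swap two ints.  The temporary carries the (hypothetical) unique prefix
   "ei99" so that it cannot collide with a variable owned by the caller. */
#define EI_SWAP_INT_(a, b)              \
do {                                    \
  int ei99tmp = (a);                    \
  (a) = (b);                            \
  (b) = ei99tmp;                        \
} while (0)

/* A caller that happens to own a variable named "tmp"; the macro still
   works because its own temporary is named ei99tmp. */
static int demo_swap (void)
{
  int tmp = 1, other = 5;

  EI_SWAP_INT_ (tmp, other);
  assert (tmp == 5 && other == 1);
  return tmp - other;
}
```

/* Had the macro named its temporary "tmp", the expansion in demo_swap would
   have begun with "int tmp = (tmp);", which reads the new, uninitialized
   variable rather than the caller's. */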
typedef struct
{
void *data;
Bytecount max_size_allocated;
Bytecount bytelen;
Charcount charlen;
int mallocp;
void *extdata;
Extcount extlen;
} Eistring_;
Eistring_ the_eistring_zero_init;
#define Eistring(name) Eistring_ name = the_eistring_zero_init
/* ----- Initialization ----- */
/* Make sure we can hold BYTELEN bytes plus a zero terminator.
Preserve existing data as much as possible. */
#define EI_ALLOC_(ei, charlen, bytelen) \
do { \
int ei1oldeibytelen = (ei).bytelen; \
int ei1newbytelen = bytelen; \
int ei1newcharlen = charlen; \
\
(ei).charlen = ei1newcharlen; \
(ei).bytelen = ei1newbytelen; \
\
if (ei1oldeibytelen != (ei).bytelen) \
{ \
if ((ei).mallocp) \
/* xrealloc always preserves existing data as much as possible */ \
(ei).data = xrealloc ((ei).data, (ei).bytelen + 1); \
else if ((ei).bytelen + 1 > (ei).max_size_allocated) \
{ \
/* We don't have realloc, so just use the existing allocation \
if it's big enough; but remember how big it really is. */ \
void *ei1oldeidata = (ei).data; \
(ei).max_size_allocated = (ei).bytelen + 1; \
(ei).data = alloca ((ei).max_size_allocated); \
memcpy ((ei).data, ei1oldeidata, ei1oldeibytelen); \
} \
((char *) (ei).data)[(ei).bytelen] = '\0'; \
} \
} while (0)
#define EI_ALLOC_AND_COPY_(ei, data, charlen, bytelen) \
do { \
EI_ALLOC_ (ei, charlen, bytelen); \
memcpy ((ei).data, data, (ei).bytelen); \
} while (0)
#define eicpy_ei(ei, eicpy) \
do { \
Eistring_ *ei2 = &(eicpy); \
EI_ALLOC_AND_COPY_ (ei, ei2->data, ei2->charlen, ei2->bytelen); \
} while (0)
#define eicpy_str(ei, lisp_string) \
do { \
Lisp_Object ei3 = (lisp_string); \
EI_ALLOC_AND_COPY_ (ei, XSTRING_DATA (ei3), XSTRING_CHAR_LENGTH (ei3), \
XSTRING_LENGTH (ei3)); \
} while (0)
#ifdef ERROR_CHECK_BUFPOS
#define EI_ASSERT_ASCII_(ptr, len) \
do { \
int ei5; \
\
/* we use PTR and LEN multiply; we assume the callers have macro-protected \
them. */ \
for (ei5 = 0; ei5 < len; ei5++) \
assert (ptr[ei5] >= 0x20 && ptr[ei5] < 0x7F); \
} while (0)
#else
#define EI_ASSERT_ASCII_(ptr, len)
#endif
#define eicpy_c(ei, c_string) \
do { \
char *ei4 = (char *) (c_string); \
\
EI_ASSERT_ASCII_ (ei4, strlen (ei4)); \
eicpy_ext (ei, ei4, Qbinary); \
} while (0)
#define eicpy_c_len(ei, c_string, c_len) \
do { \
char *ei6 = (char *) (c_string); \
int ei6len = (c_len); \
\
EI_ASSERT_ASCII_ (ei6, ei6len); \
eicpy_ext_len (ei, ei6, ei6len, Qbinary); \
} while (0)
#define eicpy_ext_len(ei, extdata, extlen, coding_system) \
do { \
char *ei7 = (char *) (extdata); \
int ei7len = (extlen); \
\
TO_INTERNAL_FORMAT (DATA, (ei7, ei7len), \
ALLOCA, ((ei).data, (ei).bytelen), \
coding_system); \
(ei).max_size_allocated = (ei).bytelen + 1; \
(ei).charlen = bytecount_to_charcount ((ei).data, (ei).bytelen); \
} while (0)
#define eicpy_ext(ei, extdata, coding_system) \
do { \
char *ei8 = (char *) (extdata); \
\
eicpy_ext_len (ei, ei8, strlen (ei8), coding_system); \
} while (0)
/*
eicpy_str_off (eistr, lisp_string, charpos, charlen):
... from a section of a Lisp_Object string
eicpy_str_off_byte (eistr, lisp_string, bytepos, bytelen):
... from a section of a Lisp_Object string, with offset and length
specified in bytes rather than chars
eicpy_buf (eistr, lisp_buf, charpos, charlen):
... from a Lisp_Object buffer
eicpy_buf_byte (eistr, lisp_buf, bytepos, bytelen):
... from a Lisp_Object buffer, with offset and length specified in
bytes rather than chars
eicpy_raw (eistr, intdata, intlen, intfmt):
... from raw internal-format data in the specified format
eicpy_lstream (eistr, lstream):
... from an lstream; reads data till eof. Data must be in default
internal format; otherwise, interpose a decoding lstream.
*/
/* ----- Getting the data out of the Eistring ----- */
#define eirawdata(ei) ((ei).data)
/*
eimake_string (eistr):
eimake_string_sect (eistr, charpos, charlen):
eimake_string_sect_byte (eistr, bytepos, bytelen):
eicpyout_raw_alloca (eistr, intfmt, intlen_out):
eicpyout_raw_malloc (eistr, intfmt, intlen_out):
eicpyout_c_alloca (eistr):
eicpyout_c_malloc (eistr):
eicpyout_c_len_alloca (eistr, len_out):
eicpyout_c_len_malloc (eistr, len_out):
*/
/* ----- Moving to the heap ----- */
/*
eito_malloc (eistr):
eifree (eistr):
eito_alloca (eistr):
*/
/* ----- Retrieving the length ----- */
#define eilen(ei) ((ei).charlen)
#define eilen_byte(ei) ((ei).bytelen)
/* ----- Working with positions ----- */
#define eicharpos_to_bytepos(ei, charpos) \
charcount_to_bytecount ((ei).data, charpos)
#define eibytepos_to_charpos(ei, bytepos) \
bytecount_to_charcount ((ei).data, bytepos)
/* ----- Getting the character at a position ----- */
#define eiref(ei, charpos) charptr_emchar_n ((ei).data, charpos)
#define eiref_byte(ei, bytepos) \
charptr_emchar ((char *) ((ei).data) + (bytepos))
/* ----- Concatenation ----- */
#define eicat_ei(ei, ei2) \
do { \
Eistring_ *ei9 = &(ei2); \
int ei9oldeibytelen = (ei).bytelen; \
EI_ALLOC_ (ei, (ei).charlen + ei9->charlen, \
(ei).bytelen + ei9->bytelen); \
memcpy ((char *) (ei).data + ei9oldeibytelen, ei9->data, \
ei9->bytelen); \
} while (0)
#define eicat_c(ei, c_string) \
do { \
Eistring (ei10); \
\
eicpy_c (ei10, c_string); \
eicat_ei (ei, ei10); \
} while (0)
/* ----- Replacement ----- */
/*
eisub_* (eistr, charoff, charlen, ...):
eisub_*_byte (eistr, byteoff, bytelen, ...):
Replace a section of the Eistring.
eisub_ei (eistr, charoff, charlen, eistr2):
eisub_ei_byte (eistr, byteoff, bytelen, eistr2):
eisub_c (eistr, charoff, charlen, c_string):
eisub_c_byte (eistr, byteoff, bytelen, c_string):
*/
/* ----- Converting to an external format ----- */
#define eito_external(ei, coding_system) \
do { \
TO_EXTERNAL_FORMAT (DATA, ((ei).data, (ei).bytelen), \
ALLOCA, ((ei).extdata, (ei).extlen), \
coding_system); \
} while (0)
#define eiextdata(ei) ((ei).extdata)
#define eiextlen(ei) ((ei).extlen)
/* ----- Searching in the Eistring for a character ----- */
/*
eichr (eistr, chr):
eichr_byte (eistr, chr):
eichr_off (eistr, chr, charpos):
eichr_off_byte (eistr, chr, bytepos):
eirchr (eistr, chr):
eirchr_byte (eistr, chr):
eirchr_off (eistr, chr, charpos):
eirchr_off_byte (eistr, chr, bytepos):
*/
/* ----- Searching in the Eistring for a string ----- */
/*
eistr_ei (eistr, eistr2):
eistr_ei_byte (eistr, eistr2):
eistr_ei_off (eistr, eistr2, charpos):
eistr_ei_off_byte (eistr, eistr2, bytepos):
eirstr_ei (eistr, eistr2):
eirstr_ei_byte (eistr, eistr2):
eirstr_ei_off (eistr, eistr2, charpos):
eirstr_ei_off_byte (eistr, eistr2, bytepos):
eistr_c (eistr, c_string):
eistr_c_byte (eistr, c_string):
eistr_c_off (eistr, c_string, charpos):
eistr_c_off_byte (eistr, c_string, bytepos):
eirstr_c (eistr, c_string):
eirstr_c_byte (eistr, c_string):
eirstr_c_off (eistr, c_string, charpos):
eirstr_c_off_byte (eistr, c_string, bytepos):
*/
/* ----- Comparison ----- */
/*
eicmp_* (eistr, ...):
eicmp_off_* (eistr, charoff, charlen, ...):
eicmp_off_*_byte (eistr, byteoff, bytelen, ...):
eicasecmp_* (eistr, ...):
eicasecmp_off_* (eistr, charoff, charlen, ...):
eicasecmp_off_*_byte (eistr, byteoff, bytelen, ...):
Compare the Eistring with the other data. Return value same as
from strcmp.
eicmp_ei (eistr, eistr2):
eicmp_off_ei (eistr, charoff, charlen, eistr2):
eicmp_off_ei_byte (eistr, byteoff, bytelen, eistr2):
eicasecmp_ei (eistr, eistr2):
eicasecmp_off_ei (eistr, charoff, charlen, eistr2):
eicasecmp_off_ei_byte (eistr, byteoff, bytelen, eistr2):
eicmp_c (eistr, c_string):
eicmp_off_c (eistr, charoff, charlen, c_string):
eicmp_off_c_byte (eistr, byteoff, bytelen, c_string):
eicasecmp_c (eistr, c_string):
eicasecmp_off_c (eistr, charoff, charlen, c_string):
eicasecmp_off_c_byte (eistr, byteoff, bytelen, c_string):
*/
/* ----- Case-changing the Eistring ----- */
int eistr_casefiddle_1 (Bufbyte *olddata, Bytecount len, Bufbyte *newdata,
int downp);
#define EI_CASECHANGE_(ei, downp) \
do { \
int ei11new_allocmax = (ei).charlen * MAX_EMCHAR_LEN + 1; \
Bufbyte *ei11storage = alloca_array (Bufbyte, ei11new_allocmax); \
int ei11newlen = eistr_casefiddle_1 ((ei).data, (ei).bytelen, \
ei11storage, downp); \
\
if (ei11newlen) \
{ \
(ei).max_size_allocated = ei11new_allocmax; \
(ei).data = ei11storage; \
(ei).bytelen = ei11newlen; \
/* charlen is the same. */ \
} \
} while (0)
#define eilwr(ei) EI_CASECHANGE_ (ei, 1)
#define eiupr(ei) EI_CASECHANGE_ (ei, 0)