13.10. Treatment of Newline during Input and Output [sec_13-1-8]

Newlines are written according to the stream's EXT:ENCODING; see the function STREAM-EXTERNAL-FORMAT and the description of EXT:ENCODINGs, in particular the part on line terminators. The default behavior is as follows:

Platform Dependent: Win32 platform only.
When writing to a file, #\Newline is converted to CR/LF. (This is the usual convention on DOS.) For example, #\Return+#\Newline is written as CR/CR/LF.
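
The output rule can be checked with a short sketch. It assumes CLISP, where a line terminator keyword such as :DOS is accepted as an :EXTERNAL-FORMAT (the form used in the rationale in this section). The file name and both helper functions are hypothetical and serve only as illustration.

```lisp
;; Sketch only: assumes CLISP, where :EXTERNAL-FORMAT accepts :DOS.
;; The file name "newline-demo.tmp" and both helpers are hypothetical.
(defun write-dos-text (path string)
  "Write STRING to PATH as text, using DOS line terminators."
  (with-open-file (out path :direction :output
                            :if-exists :supersede
                            :external-format :dos)
    (write-string string out))
  path)

(defun file-bytes (path)
  "Return the raw bytes of PATH as a list of integers."
  (with-open-file (in path :element-type '(unsigned-byte 8))
    (loop for byte = (read-byte in nil nil)
          while byte
          collect byte)))

;; #\Return followed by #\Newline: by the rule above this should be
;; written as CR/CR/LF, i.e. the bytes 13 13 10.
(file-bytes
 (write-dos-text "newline-demo.tmp"
                 (coerce (list #\Return #\Newline) 'string)))
```

Reading the file back in binary mode is deliberate: a text-mode read would convert the line terminators again and hide what was actually written.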

When reading from a file, CR/LF is converted to #\Newline (the usual convention on DOS), and CR not followed by LF is converted to #\Newline as well (the usual convention on MacOS, also used by some programs on Win32). If you do not want this, i.e., if you really want to distinguish LF, CR and CR/LF, you have to resort to binary input (function READ-BYTE).
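
As an illustration of the binary-input route, the following sketch counts the three kinds of line terminator in a file using READ-BYTE. It is portable Common Lisp; the function name is hypothetical, and the byte values 13 (CR) and 10 (LF) assume an ASCII-compatible file.

```lisp
;; Sketch only: portable Common Lisp; the function name is hypothetical.
;; Bytes 13 and 10 are CR and LF in any ASCII-compatible encoding.
(defun count-line-terminators (path)
  "Return three values: the number of CR/LF pairs, lone CRs and lone LFs in PATH."
  (with-open-file (in path :element-type '(unsigned-byte 8))
    (let ((crlf 0) (cr 0) (lf 0)
          (pending-cr nil))             ; true when the previous byte was CR
      (loop for byte = (read-byte in nil nil)
            while byte
            do (cond ((and pending-cr (= byte 10))
                      ;; CR immediately followed by LF: one CR/LF pair
                      (incf crlf)
                      (setf pending-cr nil))
                     (t
                      (when pending-cr (incf cr)) ; previous CR stood alone
                      (setf pending-cr (= byte 13))
                      (when (= byte 10) (incf lf)))))
      (when pending-cr (incf cr))       ; file ended with a lone CR
      (values crlf cr lf))))
```

Text-mode input cannot provide these counts, since all three terminators arrive as the same #\Newline character.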

Justification. The Unicode Newline Guidelines say: "Even if you know which characters represents NLF on your particular platform, on input and in interpretation, treat CR, LF, CRLF, and NEL the same. Only on output do you need to distinguish between them."

Rationale. In CLISP, #\Newline is identical to #\Linefeed (which is specifically permitted by the [ANSI CL standard] in [sec_13-1-7] Character Names). Consider a file containing exactly this string:

(CONCATENATE 'STRING "foo" (STRING #\Linefeed) "bar" (STRING #\Return) (STRING #\Linefeed))

Suppose we open it with (OPEN "foo" :EXTERNAL-FORMAT :DOS). What should READ-LINE return? Right now, it returns "foo" (the second READ-LINE returns "bar" and reaches end-of-stream).

If our I/O were faithful, READ-LINE would have returned the string (CONCATENATE 'STRING "foo" (STRING #\Linefeed) "bar"), i.e., a string with an embedded #\Newline between "foo" and "bar" (because a single #\Linefeed is not a #\Newline in the specified :EXTERNAL-FORMAT, it would not make READ-LINE return, but it is a CLISP #\Newline!). Even though the specification for READ-LINE does not explicitly forbid newlines inside the returned string, such behavior would be quite surprising, to say the least. Moreover, this line (with an embedded #\Newline) would be written as two lines (when writing to a STREAM with :EXTERNAL-FORMAT of :DOS), because the embedded #\Newline would be written as CR/LF.
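
The scenario in the rationale can be reproduced with a sketch along these lines. It assumes CLISP (for :EXTERNAL-FORMAT :DOS); the helper names are hypothetical, and the file is created in binary mode so that its contents are exactly the bytes discussed.

```lisp
;; Sketch only: assumes CLISP. Both helper names are hypothetical.
(defun write-rationale-file (path)
  "Create PATH containing exactly: foo LF bar CR LF."
  (with-open-file (out path :direction :output
                            :if-exists :supersede
                            :element-type '(unsigned-byte 8))
    ;;                   f   o   o   LF b  a  r   CR LF
    (write-sequence (list 102 111 111 10 98 97 114 13 10) out))
  path)

(defun read-all-lines (path)
  "Return the list of lines that READ-LINE yields for PATH opened as :DOS text."
  (with-open-file (in path :external-format :dos)
    (loop for line = (read-line in nil nil)
          while line
          collect line)))

;; By the behavior described in the rationale, this should yield
;; the two lines "foo" and "bar", with no embedded #\Newline.
(read-all-lines (write-rationale-file "foo"))
```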


These notes document CLISP version 2.49. Last modified: 2010-07-07