EFFECTIVE INTERACTIVE USE OF LARGE CHARACTER SETS

The interactive use of large character sets needs to be distinguished from the somewhat simpler task of making them available for the production of printed documents.

The SUN terminal, or an inexpensive version thereof that can replace Datadiscs, will provide the ability to display arbitrary characters, even ones of variable width. Moreover, we are now planning for new computer equipment that might, for example, replace SAIL and SCORE by something more powerful. Nevertheless, unless we plan some standards for using enlarged character sets and make the corresponding software changes, we will lose even the mildly enlarged character set we have on SAIL, a prospect that many who make use of the extra characters will regard with dismay.

[2004 note: SUN originally stood for Stanford University Network and was a Stanford project. I advocated that the project concentrate on a terminal that could be used with time-shared computers and would be inexpensive enough to put on every graduate student's desk. Instead the project designed a rather expensive and powerful single user computer, went off campus, and made fortunes for the participants. It was many years before personal computers became cheap enough to meet my original goal, and it wasn't SUN that did it.]

Here is the result we should strive for; later we'll consider how it might be achieved.

1. Arbitrary characters can inhabit text files. When the files are being edited or read, the characters are visible. When the files are printed, the characters appear on paper. We are not referring here to arbitrary fonts or arbitrary sizes. Still less do we mean an on-line TEX or PUB. The extra characters are treated like the extra characters in the SAIL set, but arbitrary character sets may be used.

[2004 note: Starting in 1966, the Stanford Artificial Intelligence Laboratory acquired a line printer, terminals, and keyboards using an enriched character set. The set included the logical symbols which TeX represents as \land, \lor, \lnot, \equiv, \forall, \exists, the arithmetic symbols \leq and \geq, and a few Greek letters (alpha, beta, epsilon, lambda, pi) that are extensively used in mathematical logic and mathematics generally. Logic was emphasized, because logical AI was my research interest, and I was director of the lab. The M.I.T. AI Lab copied our character set with a few changes, and the enriched set was used on the first versions of Lisp machines. When commercial keyboards became really cheap, the enriched character set was abandoned.]

2. When people at other computers read or print our files they also get all the characters. Of course, this will require a certain amount of standardization. Preferably, we will standardize a character description language rather than trying to agree on a standard set of characters.

3. When a file is being edited, the new characters are treated just like ordinary characters, as the enlargements to the SAIL character set are now. This should be made to work even when the characters are peculiar to a given user.

4. In the best of all possible worlds, the key tops would have LCD displays, and there would be several shift keys like the TOP key on the SAIL keyboards. It seems very unlikely that we will be able to achieve this.

A more likely possibility is to use small keys with space above them for a legend giving the additional characters in the CSD Standard set, which can be covered by a plastic or paper overlay when a different set is being used. It has been suggested that the overlay plastic have a flange that projects down into a slot where it can be machine read, but it isn't likely that this will be available in the foreseeable future either.

There can be one or more TOP keys as on the SAIL keyboard and many calculator keyboards. Alternatively, one can use "escape" keys typed before the key whose interpretation is to be modified. A third alternative is to use a standard keyboard and concoct the modifications out of the control key and whatever else may be available. The third alternative should certainly be available, because, for various reasons, cheap keyboards sometimes have to be used. There is some consensus that extra TOP keys are preferable to prefixed keys, because the latter put the editing process into an intermediate state after the prefix key has been typed and before the other key has been typed. Such states lead to errors.

The proposed facilities should be distinguished from the ability to include arbitrary characters in documents printed by PUB, TEX, etc. and also from proposals to make such languages more interactive by using good displays to allow the user of such a language to see and edit directly the finished form of the document he is producing. Our proposals are entirely independent of formatting languages.

Specifically, we propose to provide the user with the ability to interact directly with programs in arbitrary character sets. No special formatting takes place. We are merely providing the user with the ability to use enlarged sets of symbols.

This system does not provide all the output flexibility of such a formatting language. Specifically, control over character size and font would not be offered as a general system facility. This is for two reasons. First, providing the symbols in standard sizes seems like a difficult enough task. Second, it isn't clear that control over size and font would be worth the costs to the user in learning how to use the system. Of course, programs concerned with formatted output could use the display facilities to give the user interactive control over these parameters of the characters.

Programs that interact with a user could use arbitrary fonts and sizes for output and could readily switch fonts for input so as to distinguish user input from program output.

IMPLEMENTATION CONSIDERATIONS

Here are some ideas for implementation.

1. Editors like E and EMACS can be modified to represent special characters by two- or three-byte strings (7 bits per byte is what they now use). Commands that space through a file must space over the strings representing single characters. If the characters are of variable width, the editors will have to know about that if they have JUSTIFY and FILL commands.

2. As the SAIL keyboards do now, the keyboards will send strings of seven-bit characters to the computer. Therefore, the symbol will not be identified by the key that was typed. The correspondence between keys and symbols will have to be established by the program with which the user is interacting, from a user INIT file, by initial convention, or by a system command.

3. Text files will need to keep information in directories about the character set being used. Only this will permit them to be displayed at remote sites or even by other users of the same system.

4. The display system must be able to maintain a different large character set for each of its users. With the large address space of the M68000 and the use of 64K RAMs for its memory, this should not be a problem. Of course, the user may put additional burdens on this memory by switching among several character sets using facilities analogous to SAIL's R.

5. The basic form of a character must be a drawing made of curves rather than a dot image, since it must be displayable and printable on a variety of devices with different resolutions; that is, Metafont or something like it must be the basic form. The support for display and printing devices must include programs for converting fonts from the standard form to forms suitable for each device.

6. There arise the administrative and technical problems of standardizing and registering characters and character sets. This is in addition to standardizing Metafont or other means of defining character shapes. One could imagine a national registry of characters to which the inventor of a new character or set could send a design and from which he would receive a registration number. This number could then be included in the prefix of a file using the character or characters. Alternatively, the file could refer to a local registry of characters or even to one of the user's own character design files. The preamble of the file could itself contain the designs for exotic characters, though that might make them rather long. All these systems should co-exist, of course.

We can suppose that commercial publishers and organizations that publish journals, like ACM, IEEE and the American Mathematical Society, would keep their own registries of characters. It would be important that these registries be accessible by network or Dialnet to people who prepare manuscripts for them to publish.

7. Most likely, Stanford will have to act before any standardization committee can be formed and do its job. Therefore, we should undertake to make our work as standardizable as possible, and this includes publishing what we are up to. However, it wouldn't hurt to ask whether M.I.T. and C.M.U. and Xerox and maybe even IBM are interested in talking about character standardization. The costs of doing a good job may be large enough to warrant an externally supported project.

8. It seems that the problem of defining characters apart from fonts interacts in various unpleasant ways with the problem of font definition. Perhaps there needs to be a standard (say Times Roman like) style for defining new characters.

9. This draft may not sufficiently take into account work already done in standardizing characters for publishing purposes, but that work probably doesn't take into account the requirements of the interactive use of large character sets. [2004 note: I was right on both counts.]
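
As an illustration of item 1, here is a minimal sketch in Python of how a buffer might hold special characters as three-byte strings of seven-bit bytes, and how a cursor command might space over them. The marker byte PREFIX, the split of a character number into two seven-bit halves, and all the function names are assumptions made for this example; nothing above fixes an encoding.

```python
# Hypothetical marker byte that introduces a three-byte character string.
# The choice of 0x1B is an assumption for this sketch only.
PREFIX = 0x1B


def encode(code: int) -> bytes:
    """Encode a character number as one 7-bit byte or a three-byte string."""
    if 0 <= code < 0x80 and code != PREFIX:
        return bytes([code])
    if not 0 <= code < (1 << 14):
        raise ValueError("character number out of range")
    # Marker, then the high and low seven bits of the character number.
    return bytes([PREFIX, code >> 7, code & 0x7F])


def char_length(buf: bytes, pos: int) -> int:
    """Number of bytes occupied by the character that starts at pos."""
    return 3 if buf[pos] == PREFIX else 1


def forward(buf: bytes, pos: int, count: int = 1) -> int:
    """Move forward by count characters, never stopping inside a string."""
    for _ in range(count):
        if pos >= len(buf):
            break
        pos += char_length(buf, pos)
    return min(pos, len(buf))


def decode(buf: bytes) -> list:
    """Return the character numbers stored in buf, in order."""
    codes = []
    pos = 0
    while pos < len(buf):
        if buf[pos] == PREFIX:
            if pos + 2 >= len(buf):
                raise ValueError("truncated character string")
            codes.append((buf[pos + 1] << 7) | buf[pos + 2])
            pos += 3
        else:
            codes.append(buf[pos])
            pos += 1
    return codes


# Usage: an ordinary "a", one special character (945 is an arbitrary
# number chosen for illustration), then an ordinary "b".
text = b"a" + encode(945) + b"b"
cursor = forward(text, 1)  # spaces over all three bytes of the special character
```

Backward motion is not shown. Under this particular layout a payload byte can have the same value as the marker, so spacing backward would need either a scan from a known character boundary or a restriction on payload values.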

[2004 note: The goal of having character sets good for mathematical and other scientific work is still important and has not been achieved. Here's my 2004 opinion.

Do it in two stages. Stage one uses ordinary ASCII keyboards, but has editors that can react to typing \alpha by putting an alpha on the display as soon as it recognizes the sequence. Otherwise, the editor is ordinary, using a 16-bit representation of characters. Stage two allows for specialized keyboards that emit sequences of ASCII characters to the computer when a specialized key is typed. Ideally, the characters displayed on the keys should be downloadable from the computer.]
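
Stage one of the 2004 note can be sketched in a few lines of Python. This is only a toy under stated assumptions: the table SEQUENCES, the class Buffer, and the helper type_string are hypothetical names, the sequence names follow TeX, and the replacement symbols are the corresponding Unicode characters, all of which fit in 16 bits. The table is chosen so that no name is a prefix of another; a pair such as \le and \leq would need a rule for deciding when a sequence is complete.

```python
# Hypothetical table mapping typed sequences to displayed symbols.
SEQUENCES = {
    "\\alpha": "\u03b1",
    "\\beta": "\u03b2",
    "\\lambda": "\u03bb",
    "\\forall": "\u2200",
    "\\exists": "\u2203",
    "\\land": "\u2227",
    "\\lor": "\u2228",
    "\\lnot": "\u00ac",
}


class Buffer:
    """A toy edit buffer that substitutes symbols while the user types."""

    def __init__(self):
        self.text = ""

    def type_key(self, key: str) -> None:
        """Append one keystroke, then replace a just-completed sequence."""
        self.text += key
        for name, symbol in SEQUENCES.items():
            if self.text.endswith(name):
                # Drop the typed name and put the single symbol in its place.
                self.text = self.text[: -len(name)] + symbol
                break


def type_string(keys: str) -> str:
    """Feed keys to a fresh buffer one at a time and return its contents."""
    buf = Buffer()
    for key in keys:
        buf.type_key(key)
    return buf.text


# Usage: the buffer ends up holding the symbols, not the typed names.
result = type_string("\\forall x. \\alpha")
```

The substitution happens on the keystroke that completes the name, which is one reading of "as soon as it recognizes the sequence". Sequences that are not in the table, such as \leq here, are left in the buffer exactly as typed.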

The TEX source file for this document is KEYBOA.TEX[W81,JMC] at SU-AI.