index

(Transcription of old typewritten and handwritten notes from 1984, to modern HTML. Minor typo and editorial corrections. Use of 'we' is only for writing style; it was just me as lone inventor doing everything.)

Project Galaxy

Daren Scot Wilson, 1984

Background

For a long time we have been trying to produce a functioning NCS Compiler and to produce a better Operating System than Ohio Scientific had supplied. These two goals were combined in Project Magellan, which has failed miserably in practical terms althought as a test project in the management of large complex programs and as a project for learning design methods, it had been a great success.

In an unrelated incident, the Operating System chip in the OS Challenger had been zapped and is no longer functional. A new OS is desperately needed if the OSC is to be of any use.

Before Magellan, we saw the possibility of repalcing not only the original OS ROM, but also the BASIC ROM, allowing us to create our own 10K Operating System and Language. This was considered too grand a project at the time, so it was named "Galaxy" and filed away. Now that no OS is available on this machine, the time seems ripe for this project. We have the experience from Magellan. NCS has grown sophisticated enough on paper and in our minds that we can tackle Project Galaxy.

What Project Galaxy is Proposed to Include

NCS Compiler & Primitive Words. With MBC. Without "longhand" concept
Assembler, capable of recording object code on tape, or compiling directly into memory.
Many glorious I/O modules, including modules for
- Cassette tape for various purposes (software, data, audio processing)
- "Screenwerk" ASCII graphics [ed. note: the Ohio Scientific used an old B&W CRT TV as the display device.]
- Dvorak keyboard layout
Swappable sub-directories as libraries to support different applications
Fixed-point numerical modules, at least as good as found in a typical BASIC.
Plenty of support modules to implement complex data structures.
Debugging aids, such breakpoints, stack trace.

General Requirements

First in order to motivate the general specification of Galaxy, we review the tasks to which OSC is put. These fall into two broad groups: Mathematics, and Hardware Experiments.

Among the former are projects for exploring analytic functions in the complex plane e.g. finding zeros of the texp function, extending work previous done by hand such as finding the higher-order Szekeres polynomials, concocting new ways to solve partial differential equations, exploring function minimization algorithms, and matrix math such as finding eigenvalues for quantum mechanical operators. These applications form on ecategory where more or less regular numbers are used to various precisions.

We also do much research of mathematical nature using non-numerical data types, such as arrays of binary bits for cellular automate (Conway's Game of Life automata) or complex networks of complicated elements such as neural networks, or possibly someday, trees and linked lists and other structures not usually used in mathematical computing.

Among the second broad class of uses are those involving connections to hardware devices to be controlled or read. In the past we have constructed video display devices which require complicated control signals and more often than not are used to display results from mathematical programs. The musical instrument project SeOSI required much consideration at the level of hardware, I/O and clock cycles. These tasks require machine language for time optimization and fine control of bit-twiddling. In contrast, mathematical tasks are best done in high level languages.

As a result of these two categories, a third arises: the Computer Science tasks which involve designing and testing compilers, interpreters, developing new programming techniques and concepts. The development of NCS and of assemblers over the past few years are the prime examples here. In the future we may wan to try making a Lisp interpreter or fiddle around with APL. There is, therefore, great interest in anything that can support the design, construction and testing of high-level language systems.

To help clarify what we use OSC for, we state what OSC is not used for: Business and financial applications, record-keeping of any kind, games (except as toy projects for exploring neural networks and other artificial intelligence), educational programs for children, and whatever else it is that people go out and buy Apple and IBM PCs for these days.

To summarize, we use computers strictly as a tool to manipulate pure ideas, such as mathematical ideas and programming concepts. Sophisticated types and sophisticated manipulations are what we need. When Hardware is to be used, precise control over bits is vital, meaning machine language may be the best level to work in. R&D in computer science projects demands tools to ease work in this area.

Purposes

To provide a flexible reasonably efficient intermediate-level language in which high-level language system may be written (NCS, Monobyte).
To make this intermediate language directly usable as-is.
To provide an assembler so machine language routines may be easily constructed. There should be no obstacles to time-optimized programming.

Design

Each of these shall now be discussed in more detail. Remember, we now have 10K of memory [ROM] available for this system! And much of it can be written in NCS MonoByte Code, and so will be very space-efficient!

NCS

The language NCS is similar to Forth. Many cosmetic changes make NCS more readable than Forth, and there really aren't any profound differences. An NCS program is a collection of modules, each a simple subroutine which performs a simple task and may call other modules. The jargon for this sort of arrangement is "threaded code".

Data is manipulated on a stack sixteen bits wide. Primitive modules allow one to do such actions as add the top two items on the stack, make decisions based on the value of the top item, etc.

A compiler allows one to type (or otherwise enter) in NCS source code to define new modules in terms of the primitives and previously defined modules.

The Stack

Be aware there are several stacks. One is the Return Address Stack [aka Machine Stack] which is governed by the 6502, and exists in page one of memory [addresses $01xx]. The most important stack as far as language design is concerned is the Arithmetic Stack [aka Data Stack] which in Project Magellan was located in page zero to allow time and space optimization. We propose that in Project Galaxy, the AS be located in the high part of RAM, perhaps starting from the highest valid RAM address, and grow downward. Sure, this might be a smidgen slower than a page zero AS, and more complicated to write primitive manipulations for, but in consideration of the sort of applications OSC is used for, it is best we leave page zero free for use in other ways, such as the self-modifying wavetable lookup routines in SeOSI.

MonoByte

Each user-defined module as well as many in the firmware is basically a sequence of JSR xxxx instructions. Modules actually implemented this way are said to be "LongHand". To save space, the sequence could be written as a sequence of specail bytes, one byte for each original call. An interpreter reads these codes and calls the appropriate modules. A module written using this scheme is said to be written in "MonoByte Code", and the interpreter is named the "MonoByte Needle" (MBN).

LongHand is fast. MonoByte saves space by a factor of three but is slower by a factor of about thirty. While it would be nice to have the NCS compilers, one for each type of executable code, there are complications in defining some of the NCS primitive control structures. Two definitions would have to be given for the basic conditionals, and a means provided for telling the compiler which definition to use. There could be compatibility problems.

Therefore, we propose exclusive use of MonoByte, because speed in an OS is not as important as having plenty of useful features. Where fast subroutines are needed, the assembler in the OS can be used to compile LongHand modules from assembly language.

ANother problem area is MonoByte and LongHand are both implemented is back-grabbing. Back-grabbing is when a subroutine call is followed with data that the subroutine fetches by popping off the return address, using it to fetch the data, adjusting the return address and finally pushing it back onto the machine stack. This is fairly easy in machine code and also in LongHand (being machine code of a particular style.)

MonoByte must backgrab in a special way. The Needle and everything else must be designed to work in a certain way, while being time-optimized. Certain words in NCS, such as "SAY", would need two definitions - one for Longhand and one for MBC, only because back-grabbing the text would need to be implemented differently.

It may be possible to get by with one definition which calls a special backgrabbing subroutine which determines the method and takes the appropriate action, but this is complicated and deepens the machine call nesting, which costs time. Complex manipulations of the return address stack is risky. Therefore, we think it wise to use MonoByte solely and avoid trouble. Fast modules written in assembly code may back-grab, but in a way unrelated to the NCS implementation, designed case-by-case.

Control Structures

Having decided to use only MonoByte, control structures in NCS shall be implemented with special branching MB codes followed by relative offsets, much like a regular branch operation in 6502 code. No LongHand equivalents shall be available. There has been much controversy in the past over whether to use a third stack, a Loop Stack [aka Control Stack] to hold data for the Measured Loops [iteration count known before executing loop] or to use the return stack somehow. We favor the latter. This was how Magellan was designed. The Loop Stack can be shown to be unnecessary, and by virtue of the modularity of NCS, there can be no problems (we think!) with nesting loop data within return addresses on the return stack.

Removable Modules

The programming process in NCS consists of creating one module after another. Backward references only. An old module can be replaced by a new module for compiling purposes only, or functionality. [??] To approach the ease of program modification of BASIC, however, we would like to be able to remove old modules completely. This may be impossible, because of the nature of the Dictionary of Modules and the way MBC is defined.

The way MBC works is this: If a code is, say $07, then the MonoByte Needle starts at the beginning of the Dictionary and counts ahead seven modules. It then jumps to the execution entry point for that module. Of course, executing the module for MBC 0xFF will take time, but we're not intending for MonoByte to be fast.

If the 5th module were to be removed completely, then the 7th module becomes the 6th, and everything is thereafter screwed up. Either a removed module should leave a nullified header so the Needle can count correctly, or all the MonoByte-coded modules making use of any module after the 5th should have their codes adjusted.

[In the old typewritten notes, this paragraph is crossed off.] An alternative: Use a table of module entry point addresses, supplying one address for each MonoByte Code. Then the number of modules between any one desired module and the origin of the Dictionary becomes irrelevant. If a module is to be removed, we'd like to salvage the space freed by relocating all the modules following it. The table of addresses would need to be adjusted. This may be complicated and time consuming. [!? Apparently I hadn't seen linkers, loaders or heard of garbage collection yet.] The reason we did not choose to use a table of addresses in Magellan wsa that it was wastefu of space, which had to be highly optimized. Now with 10K ROM and 8K RAM (actually 7K due to a bad chip), perhaps it will work fine.

System of Dictionaries

Here our experience with Project Magellan is not so helpful. In BASIC, it is easy to have several programs in memory at one time - just use wildly different line numbers. One program occupying lines 10 - 19999 and another occupying 40000 - 49999, with no GOTOs between, separate entry points and END points, may use the same variable names without confusion. But if in NCS we use only one long Dictionary, and two independent programs each define a module named "Doorknob", how can the interpreter know which one to use? There is no means of distinction between programs, or collections of modules serving special purposes e.g. a matrix library and an LED controller interface library.

By using several distinct Dictionaries such ambiguities may be avoided. Program #1 uses a common base Dictionary plus its own secondary Dictionary #1, which contains Doorknob #1. Program #2 has its Doorknob #2 in Dictionary #2, while sharing the same common base Dictionary. We would like this capability in Project Galaxy.

We have one idea for how to implement this: It concerns the design of MonoByte Code. Regular MBC are single bytes which refer only to the primitive and secondary modules listed in the OS ROM. To call a user-defined module, a special "escape" code is used followed by the serial number of the desired user module. We would have several MonoByte codes designated as User Escape Codes, each having its own Dictionary. We propose four such codes and secondary Dictionaries.

The origins of each Dictionary shall be kept in page 2 (addresses 02xx) in a special table. This allows over two hundred OS NCS modules, and as many per secondary Dictionary. In Magellan, we would have make the user-defined modules consecutive with the OS base modules, so that the first and last user module would have MonoByte codes #250 and #562 [??]. Any user programs would run slowly. With the escape codes, programs could run quick.

Calling Convention

Some modules will be coded in NCS MBC, while other, especially the primitives, will be coded in 6502 machine. When one module calls another, how does the caller know if the callee is coded in 6502 or MBC? It shall assume 6502. If speed is important, this is good. If MBC is desired for space efficiency, the overhead for a jump to the MonoByte Needle is not a concern. This is the same convention as in Project Magellan.

The opposite convention, having every module start off in MBC, would be costly always, since there'd always be at least a special code to switch to 6502, with overhead likely to cancel the speed advantage of using 6502.

Some modules may alternate sequences of 6502 and MBC, where speed is needed in inner loops and tightness needed for complex higher-level operation. Always, the initial execution is 6502. Special codes switch between modes. The module's executable code may end either way; a 6502 RET, or MBC's RETURN which understands the Return Address Stack.

Daren Scot Wilson's personal site
linkedin