So, over the past little while, for a variety of reasons, I’ve been learning Forth. It’s a bizarre, wrongheaded little language that somehow manages to mostly turn its weaknesses into strengths. I can see how its inventor, Chuck Moore, was seduced by it, and by the philosophy of total simplification — once he simplified the problem of the compiler to ridiculous extremes, it fell apart in his hands and he was left with a rather powerful little language.

Some quick background:
Forth, as a language, is really, really simple. It’s so simple that there isn’t syntax, or even a parse tree. The Forth parser simply reads some sort of input until it comes across a space, looks up this “word” in its internal dictionary, and executes its definition. Argument passing is done implicitly, on the stack. You can think of Forth as a language where even data is code; “1” is literally a word that means, “push ‘1’ onto the stack.”
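To make that loop concrete, here is a minimal sketch in Python rather than Forth. It is only an illustration of the read-a-word, look-it-up, execute-it cycle: the `run` function and its three-word dictionary are made up for this example, and treating anything not in the dictionary as an integer to push is a simplification.

```python
def run(source, stack=None):
    """Interpret whitespace-separated words, Forth style (toy sketch)."""
    stack = [] if stack is None else stack

    # The dictionary maps each word to the code that runs when it is read.
    # Every definition takes its arguments from the stack and leaves its
    # result there; no word has a parameter list.
    dictionary = {
        "+": lambda s: s.append(s.pop() + s.pop()),
        "*": lambda s: s.append(s.pop() * s.pop()),
        "dup": lambda s: s.append(s[-1]),
    }

    for word in source.split():
        if word in dictionary:
            dictionary[word](stack)
        else:
            # Simplification: anything else is assumed to be an integer,
            # i.e. a "word" whose whole meaning is "push me".
            stack.append(int(word))
    return stack


print(run("1 2 +"))    # addition, written postfix
print(run("3 dup *"))  # squares 3 by duplicating it first
```

Note that there is no grammar anywhere in this: splitting on whitespace is the entire parser.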

What this ends up meaning is that Forth is an aggressively postfix language. To say 1 + 2, you pass two arguments to + on the stack by writing 1 2 +. Now, Forth is flexible enough that you could rewrite + to read the next word from the input buffer and execute it first, or parse it as a number, or whatever, but it’s unForthlike and causes problems. For example, what if the word on the right side of + causes two items to be put onto the stack? It just doesn’t play nice with Forth’s model of the world, and addition is too fundamental an operation to not play nice.
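A small Python sketch shows where an infix-style addition goes wrong. Everything here is hypothetical: `plus` stands in for a rewritten `+` that executes the next word before adding, and `pair` is an invented word that pushes two items.

```python
def run(source):
    """Toy interpreter with one well-behaved and one ill-behaved addition."""
    words = iter(source.split())
    stack = []

    def execute(word):
        if word == "+":
            # Conventional postfix addition: both arguments are already there.
            stack.append(stack.pop() + stack.pop())
        elif word == "plus":
            # Hypothetical infix addition: run the right-hand word first,
            # then add whatever happens to be on top of the stack.
            execute(next(words))
            stack.append(stack.pop() + stack.pop())
        elif word == "pair":
            # Hypothetical word that leaves two items on the stack.
            stack.extend([10, 20])
        else:
            stack.append(int(word))

    for word in words:
        execute(word)
    return stack


print(run("1 2 +"))        # [3]
print(run("1 plus 2"))     # [3], looks fine
print(run("1 plus pair"))  # [1, 30], the left-hand 1 was never added
```

The postfix `+` never has this problem, because it makes no assumption about which words produced its arguments. It just takes the top two items.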

What this suggests to me is that the most fundamental choice that every general-purpose programming language must make is how it will provide a way for abstractions to communicate.

Let’s back up a moment and make sure this is clear. The things that make code Forth-like are the things that make it play the nicest with other Forth code. On an informal level, this means taking advantage of the language features available to you when it is appropriate. These features are built to work smoothly with the language’s built-in abstractions; when everybody uses these abstractions, everybody works smoothly with everybody else.

Now, why is this interesting to me? Well, I firmly believe that the next important evolution in software development will only become possible when we have the ability to create domain-specific languages with a hell of a lot more ease than we currently can. We need not only to be able to pick and choose the abstractions that are relevant to the problem at hand, but to create new ones that play well with others. We need the ability to write code in the language of the problem. And rather than merely waiting patiently for XL, IP, or MPS to bear fruit, I like to think about what exactly the problem entails and how it could best be solved.

The biggest concern, in developing these kinds of systems, is — how can you develop a system to be completely flexible and extensible, when all of your components need to interact? You have to decide on some method for your abstractions to communicate, and you are building your entire system on the premise that you’ll never be able to predict all of the needs of your users. If you just pick one, as Forth does, you constrain yourself. You create problems that are unnecessarily difficult to solve, because you have to convert them into this protocol.

Some of you are probably screaming LISP! and LAMBDA CALCULUS! at me, like I was a great big moron for not having considered the mathematically pure options. Why, if your language gives you access to continuations and macros, there’s nothing you can’t build! Except… you know… embedded systems.

Sometimes, I need to be able to talk von Neumann. Sometimes, I need statically-calculated everything and no runtime. Sometimes, I cannot abide by garbage collection. And it kills me that I can’t do that and use real coroutines, which are safer, more efficient, and exhibit way better realtime behaviour than threads.

So I desire an environment that will allow me to drop into low-level abstractions when I need to. Not necessarily “inline assembler”, or “C code that interfaces with Lisp”, or whatever — I’ve written hard-realtime interrupt handlers in SML, just to see if it could be done. I just need the ability to use the appropriate level of abstraction, and I would really, really like to use the appropriate syntax at each level. Imagine an application consisting of a bunch of towers of abstraction, connected only at the bottom levels, and you’ll see the issue that I’m struggling with.

Built-in language constructs have traditionally had three advantages over user-defined abstractions:

  1. A custom syntax that is more comfortable to express problems in.
  2. Access to compile-time abstractions.
  3. Ubiquitousness.

#1 and #2 are fairly self-explanatory, and a lot of work is being poured into making them relatively solved problems (arbitrarily powerful macro systems that can flag domain-specific errors and do static code checking, etc.).

#3, though, is important to think about further. An extensible development environment can take one of two approaches: It can provide a monolithic language core that all extensions build on, or it can attempt to build even the core elements of its language out of reusable, reconfigurable components. While the second approach is the more flexible, it comes with its own set of problems.

The big problem that I see is this — any user-written abstractions are probably going to build on some base-level abstractions. However, you don’t necessarily want to inherit an entire language just to use a single feature, and you want extensions to be able to work well with each other.

In essence, this is literally cross-platform development — building an abstraction that can sit on top of different kinds of abstraction towers, and building bridges between these towers when useful. This already happens routinely on the macro level; a hundred different language communities go and re-implement bindings for their own language for some useful C library. It would be kind of nice if this effort could be reduced or even eliminated, because it doesn’t SEEM like a fundamentally hard problem.

Consider the design difficulty, if you wanted to implement a portable abstraction for lazy evaluation. Do you assume you’ve got continuations? Or do you build it on less powerful abstractions? You could probably build it on top of something coroutine-like, but there are just so many coroutine-like abstractions out there! Python’s got its generators, Ruby’s got its blocks, and some people won’t settle for anything less than Yield the Magnificent. They’re all sort of functionally similar, but there’s no standard conceptual map to build on. In the end, your nice feature that makes things clean and concise ends up being intractably ugly to actually implement.
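Even inside a single language you can see the split. Here is a rough Python sketch of laziness built twice, once on a plain closure and once on a generator. The names `Thunk`, `force`, and `lazy_squares` are invented for this illustration and do not come from any library.

```python
class Thunk:
    """Lazy value built on a closure: computed once, on first use."""

    _UNSET = object()  # sentinel, so that None can be a legitimate value

    def __init__(self, compute):
        self._compute = compute
        self._value = Thunk._UNSET

    def force(self):
        if self._value is Thunk._UNSET:
            self._value = self._compute()
            self._compute = None  # drop the closure once it has run
        return self._value


def lazy_squares(limit):
    """Lazy sequence built on a generator: each square is computed on demand."""
    for n in range(limit):
        yield n * n


calls = []
answer = Thunk(lambda: calls.append("ran") or 42)
assert calls == []           # nothing has been computed yet
assert answer.force() == 42  # first use triggers the computation
assert answer.force() == 42  # second use reuses the cached value
assert calls == ["ran"]

squares = lazy_squares(5)
assert next(squares) == 0
assert next(squares) == 1
```

Both defer work until it is asked for, but a consumer has to know whether to call `force()` or `next()`. A portable lazy-evaluation abstraction would have to paper over exactly that kind of difference, for every substrate it wanted to sit on.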

That can’t be what developers will actually end up doing; if they developed their whole project that way, for total reusability, it would be a colossal waste of effort and wouldn’t make anything even remotely easier. Developers will pick a set of abstraction towers that make sense for their problem, and build on top of them.

In the end, you can’t imagine an ecology of abstractions producing a single instance of an abstraction that will play nicely with all others. It can’t and won’t work that way. You will still see strong semantic platforms emerge, just like our current situation with lots and lots of languages incorporating lots of different ideas. Though abstractions are coupled to each other by nature, decoupling them from a standard language syntax makes it easier to build natural abstractions, without having to resort to writing your own compiler and designing a whole new language from scratch.

Essentially, just as there is no One True Abstraction for everyone to build on (no, everything is not an object), there is no One True Abstraction through which other abstractions should communicate.
