Justin Spencer: Why Concatenative Programming Matters by Jon Purdy

Why Concatenative Programming Matters

Concatenative programming is so called because it uses function composition instead of function application--a non-concatenative language is thus called applicative.
One of the problems with functional programming is that the oft-touted advantages--immutability, referential transparency, mathematical purity, etc.--don't immediately seem to apply in the real world.
In an applicative language, functions are applied to values to get other values.
In comcatenative languages, composition is implicit.
The untyped lambda-calculus is a relatively simply system.
Concatenative languages have a much simpler basis--there are only functions and compositions, and evaluation is just the simplification of functions. It is never necessary to deal with named state--there are no variables.
There is nothing stopping a concatenative languages from having infix operators, but for the sake of consistency, most stick to post fix notation: "f g" means (g of f), i.e., the reverse of function composition. This is actually rather nice notation, because it means that data flows in the order the functions are written in.
Values take no arguments and return themselves.
Applicative languages need to have a defined associativity for function application (almost always from left to right), but here we're free from this restriction.
A compiler for a statically typed concatenative language could literally:

divide the program into arbitrary segments
compile every segment in parallel
compose all the segments at the end

With concatenative programming, a parallel compiler is a plain old map-reduce!
Because all we have are functions and compositions, a concatenative program is a single function--typically an impure one with side effects, but that's by no means a requirement.
This is the basic reason Unix pipes are so powerful: they form a rudimentary string-based concatenative programming language.
At the end of the day, a concatenative program is all about the flow of data from start to finish. And that again is why concatenative composition is written in "reverse" order--because it's actually forward.
One of the very cool things about concatenative languages is that while the are inherently quite functional, they also have a very straightforward and efficient imperative implementation.
The type of a concatenative function is formulated so that it takes any number of inputs, uses only the topmost of these, and returns the unused input followed by actual output. These functions are essentially operating on a list-like data structure, one that allows removal and insertion only at one end. It's a stack.
Another name for post fix is reverse Polish notation.
So a concatenative language is a functional language that is not only easy, but trivial to run efficiently, so much so that most language VMs are essentially concatenative.
Furthermore, it's straightforward to make some very clever optimizations, which are ultimately based on simple pattern matching and replacement.
The JVM and CPython VMs, being stack-based, are also in the business of executing and optimizing concatenative languages, so the paradigm is far from unresearched.
It is considered good functional programing style to write functions in point-free form, omitting the unnecessary mention of variables (points) on which the function operates.
It's more meaningful to write what a function is versus what it does, and point-free functions are more succinct than so-called "pointful" ones.
When implementing composition in terms of application, we must explicitly implement every type of composition.
Writing point-free expressions demands concatenative semantics, which just aren't natural in an applicative language.
In addition to composition, the feature that complete a concatenative language is quotation, which allows deferred composition of functions.
While just composition lets us build descriptions of data flow machines, quotation lets us build machines that operate on descriptions of other machines. Quotation eliminates the distinction between code and data, in a simple, type-safe manner.
A Boolean is a quotation, so it behaves like an ordinary value, but it contains a binary function that chooses one branch and discards another. "If-then-else" is merely the application of that quotation to the particular branches.
Writing seemingly simple math expressions can be difficult and unintuitive, especially using just the functions we've seen so far. To do so exposes all of the underlying complexity of the expression that we're accustomed to deferring to a compiler.
Some things are simply more natural to write with named state rather than a bunch of stack-shuffling. However, the vast majority of programs are not predominated by mathematical expression, so in practice this feature is not used very much.
One of the great strengths of concatenative languages, however, is their ability to refactor complex expressions. Because every sequence of terms is just a function, you can directly pull out commonly used code into its own function, without even needing to rewrite anything.
Concatenative languages are simple and consistent.
They are amenable to data flow programming.
Stack languages are well studied and have good performance.
You can easily roll your own control structures.
Everything is written in point-free style (for better or worse).
If you're interested in trying out a mature, practical concatenative language, check out Factor. Also see Cat for more information on static typing in concatenative languages.

Justin Spencer

Pages

20171017

Why Concatenative Programming Matters by Jon Purdy

No comments:

Post a Comment