Metasyntax
In this post I will look at some metasyntaxes. These are general-purpose syntaxes that don’t have inherent semantics. Because a metasyntax describes a generic data structure instead of an AST, they are very easy to extend within the language through macros.
Note: “metasyntax” actually means a syntax that can be used to describe other syntaxes. I’m reusing that name for a different concept.
Syntax
Most programming languages have a 1:1 relationship between syntax tokens and the AST. You use keywords, delimiters and terminators to change the state of the parser.
def f(x, y) { print(x); print(y); }
def main() { f(1, 2); }
This is very easy to implement with a parser generator.
This approach is fine, and it’s what most programming languages do. However, these languages tend to become more complex over time, and we end up with a very large set of syntax rules. It would be much better if we had a single, predictable way to write expressions.
Metasyntax
S-expressions
With an s-expression syntax, you basically just parse a tree using parentheses. For example:
(def f (x y) (do (print x) (print y)))
(def main () (do (f 1 2)))
Everything is either a tree (f x ...)
or an atom x
.
You can see that even though it has built-in keywords like def
and do
, the parser will also recognise something like (table (1 2 3) (4 5 6))
, and the language can provide some system to parse that tree within a program.
That’s what Lisp’s metaprogramming system is.
Indentation trees
Another way to encode trees is with whitespace:
def f (x y)
do
print x
print y
def main ()
do (f 1 2)
There are a few ways of doing this, but here I’m using these rules:
- Add an opening parenthesis to the start of each line
- Add
max(0, 1 - N)
closing parentheses to the end of each line, whereN
is the change in indentation of the next line
You end up with this tree, which is identical to the previous s-expression:
(def f (x y)
(do
(print x)
(print y)))
(def main ()
(do (f 1 2)))
I like this metasyntax, because we already structure programs with indentation, this just makes it explicit. However based on my research, whitespace-sensitivity is very controversial! I also get the impression that most new languages are completely insensitive to whitespace.
B-expressions
Instead of parsing a rose tree with parentheses, you can also parse a binary tree with binary operators. I call these “b-expressions”.
f = x, y -> print x; print y.
main = () -> f 1 2
Everything is a binary expression f <op> x
or an atom x
.
- We use an “empty operator” for function application
f x
. In Haskell this is sometimes called the “whitespace operator”. - We’re even using a “full-stop operator” to separate definitions, making the entire program one big binary expression.
- We’re using operator precedence and associativity, which removes the need for parentheses:
((f = ((x , y) -> ((print x) ; (print y)))) . (main = (() -> ((f 1) 2))))
This syntax is well-suited for expression-oriented functional programming languages.
I really like this metasyntax, but it does have some quirks.
First of all, it really needs operator precedence, otherwise it’s unwieldy.
And if we get a user-defined operator x >>= y
, we need a precedence declaration to figure out how to parse it.
But we need to parse the program to get the precedence declaration…
This is a bit of a chicken-and-egg problem.
Also, top-level expressions are a bit awkward. In my example I used a full-stop operator, similar to Coq. But unlike Coq, it’s a separator and not a terminator, so the last expression can’t actually end in anything.
et al.?
What other metasyntaxes are out there?