Review of Eco, or a language composition editor

Posted on October 21, 2014 by Takayuki Muranushi

The absolute barriers that separate different programming languages are being neutralized. I witnessed, for the first time since the collapse of the Tower of Babel, multiple languages fused together to achieve what was impossible for each of them alone.

I attended the SPLASH 2014 conference and learned many things. The most interesting talk for me was the one on Language Composition. Here are the slides and here is their paper about the editor.

The demo above shows an HTML document with Python code embedded here and there, which in turn hosts some SQL query strings. The whole program is interpreted as a Python program that reads from a database and renders some HTML.

Another demo showed a game, a variant of tic-tac-toe, played by humans or by AIs. The game AIs are implemented as search algorithms in Prolog, and the GUI is in Python.

Yet another demo showed a fused Python-PHP program, where Python parts can refer to PHP variables by name, and vice versa. Explicit conversions are needed in some cases; for example, PHP arrays are internally represented as hash maps, so when imported into Python they show up as hash maps. Explicit method calls are needed to turn them into Python lists. But these are details.
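To make the array conversion concrete, here is a minimal sketch in plain Python. It does not use the actual method names from the demo (I did not note them down); `php_array_to_list` is a hypothetical helper, and a Python dict stands in for the imported PHP array.

```python
def php_array_to_list(php_array):
    """Turn a PHP-style array (an ordered map) into a Python list.

    Hypothetical helper for illustration only. It succeeds only when the
    keys are exactly 0, 1, ..., n-1 in that order, i.e. when the PHP array
    was being used as a plain list.
    """
    keys = list(php_array.keys())
    if keys != list(range(len(keys))):
        raise ValueError("not a list-like PHP array: keys are %r" % keys)
    return list(php_array.values())


# A dict standing in for a PHP array that arrived in Python as a hash map.
scores = {0: "a", 1: "b", 2: "c"}
as_list = php_array_to_list(scores)
```

The explicit call is the point: the fused program never guesses whether a hash map was meant to be a list.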

How it works

What is so special about language composition? Well, imagine text files that contain pieces of multiple languages. Sometimes we will be able to tell the different language parts apart using our built-in biological neural networks, but it is not hard to imagine ambiguous cases. The situation is worse for compilers, which must behave formally. The authors found that “In summary, when it comes to language composition, parsing approaches are either too limited (LR parsing), allow ambiguity (generalized parsing), or are hard to reason about (PEG parsing).”

The solution? Don’t try to parse text. Instead, edit the syntax tree directly. This technique is called syntax-directed editing; the speaker said he found a brief mention of it on the last page of a grammar book, with the remark “this doesn’t work.” Syntax-directed editing was field-tested in the 70s and 80s, but programmers preferred their favorite text editors over the GUI-based tree editors built on it.

What is innovative about Eco’s approach is the effort to make the syntax-directed editor look and feel much like a usual text editor. Looking at their editor, Eco, I indeed thought it was a text editor. Behind the curtains, however, Eco holds an abstract syntax tree; the cursor is at a certain element of the syntax tree. The structure is saved as a tree and loaded as a tree; it remains a tree all the time. The bad news is that we never get to use our favorite editors such as vi and emacs; the good news is that we now get to edit the fused syntax tree of multiple programming languages.
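As a rough mental model, a fused tree can be pictured as an ordinary tree whose nodes remember which language they belong to. The sketch below is my own illustration, not Eco’s internal data structure; the `Node` class and the `render` and `languages` functions are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    language: str                     # language this node belongs to
    kind: str                         # e.g. "element", "call", "query"
    text: str = ""                    # text shown on screen for this node
    children: List["Node"] = field(default_factory=list)


def render(node):
    """Flatten the tree into the text an editor could show on screen."""
    return node.text + "".join(render(child) for child in node.children)


def languages(node):
    """Collect every language that appears somewhere in the tree."""
    found = {node.language}
    for child in node.children:
        found |= languages(child)
    return found


# An HTML element containing a Python call that contains an SQL query.
page = Node("html", "element", "<p>", [
    Node("python", "call", "print(", [
        Node("sql", "query", "SELECT name FROM users"),
        Node("python", "token", ")"),
    ]),
    Node("html", "token", "</p>"),
])
```

Because each node carries its language, there is never a question of where the SQL ends and the Python resumes, which is exactly the ambiguity a text parser struggles with.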

What you get in the end is a fused tree, and what it means is open to user interpretation. There are vast possibilities.

I saw several demo cases. In an HTML+SQL+Python demo, the tree was interpreted as a Python program that queries an SQL database and prints out HTML. More interesting were the Python+Prolog and PHP+Python demos. In the latter, PHP components can access Python variables by name, and vice versa.

How it interprets

Once you have a syntax tree with fused PHP+Python, the next question is how to interpret it. Their solution is elaborated from around page 28 of the slides. How would you fuse a PHP interpreter with a Python interpreter? They made several attempts.

First, you can take existing interpreters and write a C/C++ interface that connects the two. The interface converts PHP objects to Python objects and vice versa. Sadly, the overhead is too large and execution is too slow to be useful.

Second, if the interpreters are slow, what if we JIT-optimize them? Sadly again, writing a JIT for every language you want to fuse is too much work, and per-language JITs do not solve the interface overhead problem anyway.

Third, can we take a virtual machine such as the JVM, which already has a sophisticated JIT in it? This approach doesn’t work well either. The kind of optimization we are looking for lives at the level of interpreter semantics. For example, looking up a binding table is a very costly operation. We’d like to replace multiple lookups of a variable x with a single lookup if we know that the value of x doesn’t change in the meantime. The JVM is a stack machine. The behavior of the original interpreters becomes too fine-grained when compiled to JVM bytecode, and the JVM JIT is not good at spotting optimizations that require knowledge of the interpreter semantics.
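The lookup example can be sketched in a few lines of Python. This is only an illustration of the optimization being asked for, not code from any of the interpreters in the talk; `Bindings`, `sum_naive` and `sum_hoisted` are hypothetical names, and the lookup counter exists just to make the cost visible.

```python
class Bindings:
    """A toy binding table that counts how often it is consulted."""

    def __init__(self, values):
        self._values = dict(values)
        self.lookups = 0

    def lookup(self, name):
        self.lookups += 1
        return self._values[name]


def sum_naive(env, n):
    """What a plain interpreter does: look x up on every iteration."""
    total = 0
    for _ in range(n):
        total += env.lookup("x")
    return total


def sum_hoisted(env, n):
    """What we want: a single lookup, reused inside the loop.

    This is only valid because nothing rebinds x while the loop runs,
    which is knowledge about the interpreter's semantics.
    """
    x = env.lookup("x")
    total = 0
    for _ in range(n):
        total += x
    return total


naive_env = Bindings({"x": 3})
hoisted_env = Bindings({"x": 3})
naive_total = sum_naive(naive_env, 100)
hoisted_total = sum_hoisted(hoisted_env, 100)
```

Both functions compute the same total, but the first consults the table once per iteration and the second only once. A JIT that sees only low-level bytecode has a hard time proving that the rewrite is safe.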

So what’s their solution? Their solution is a tracing JIT. A tracing JIT is widely applicable; it can be applied to any interpreter once tracing is available.

Tracing JIT works as follows. First, checkpoints are inserted into the program, and then the program is left running for a while. Eventually we can identify the hotspots of the program as the most frequently passed checkpoints. Once the hotspots are detected, on the next entry to a hotspot the JIT mechanism takes a trace of the program execution, recording which preconditions were met and which instructions were run. The JIT compiler treats the trace as a program, optimizes it, and compiles it to native code. After that, the native code is used whenever the preconditions are met.
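The steps above can be sketched as a toy in Python. This is my own simplification, not the mechanism from the talk: the two-instruction “bytecode”, `HOT_THRESHOLD`, `compile_trace` and `run` are all hypothetical, a generated Python function stands in for native code, and the sketch assumes a single loop whose body is straight-line code.

```python
HOT_THRESHOLD = 3  # arbitrary value chosen for this sketch


def compile_trace(ops, guard):
    """Turn a recorded trace into one Python function.

    The generated function stands in for native code in this toy.
    """
    lines = ["def trace(regs):", "    while True:"]
    for _, dst, a, b in ops:
        lines.append(f"        regs[{dst!r}] = regs[{a!r}] + regs[{b!r}]")
    _, a, b, _ = guard
    # Guard: leave the compiled code once the recorded precondition fails.
    lines.append(f"        if not regs[{a!r}] < regs[{b!r}]: return")
    namespace = {}
    exec("\n".join(lines), namespace)
    return namespace["trace"]


def run(program, regs):
    counters = {}     # checkpoint (loop header) -> times the loop jumped back
    compiled = {}     # loop header -> compiled trace
    recording = None  # (header, ops) while a trace is being recorded
    stats = {"interpreted": 0, "compiled_entries": 0}
    pc = 0
    while pc < len(program):
        instr = program[pc]
        stats["interpreted"] += 1
        if instr[0] == "add":
            _, dst, a, b = instr
            regs[dst] = regs[a] + regs[b]
            if recording is not None:
                recording[1].append(instr)
            pc += 1
        elif instr[0] == "jump_if_lt":
            _, a, b, target = instr
            if not regs[a] < regs[b]:
                recording = None       # loop exited; drop any partial trace
                pc += 1
                continue
            if recording is not None and recording[0] == target:
                compiled[target] = compile_trace(recording[1], instr)
                recording = None
            if target in compiled:
                stats["compiled_entries"] += 1
                compiled[target](regs)  # runs until its guard fails
                pc += 1                 # the loop has ended; fall through
                continue
            counters[target] = counters.get(target, 0) + 1
            if counters[target] == HOT_THRESHOLD:
                recording = (target, [])  # hotspot found: trace next pass
            pc = target
    return regs, stats


# Sum the integers 0..9 with a loop written in the toy bytecode.
program = [
    ("add", "total", "total", "i"),
    ("add", "i", "i", "one"),
    ("jump_if_lt", "i", "n", 0),
]
regs, stats = run(program, {"total": 0, "i": 0, "one": 1, "n": 10})
```

In this run the interpreter handles the first few iterations itself, records one pass through the loop once it becomes hot, and hands the remaining iterations to the compiled trace. A real tracing JIT would also optimize the trace and guard on more preconditions, such as the types of the values involved.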

With the Python glue and the tracing JIT, any programming languages can be fused. You can write programs in the interleaved languages as if they were a single composed language, enjoying the merits of both languages without the foreign function interface overhead.

Provided that every language has a Python backend! PHP, Prolog and Python all have Python implementations. Unfortunately, Haskell, my favorite language, doesn’t.

Conclusions

Another thing that struck me at SPLASH’14 was Python code rewriting its own syntax tree using decorators. In summary, my SPLASH’14 experience was full of joy; there is a lot of exciting research (a lot more than I had expected) going on in the field of dynamic languages.