Clojure Decompiled

What actually does execute when you run a Clojure function?

Notes from my talk at the Bay Area Clojure meetup, June 3, 2009

I was looking for a way to make my understanding of Clojure's code generation more concrete. What actually does execute when you run a Clojure function? Disassembling the generated byte code did not help much, so I decided to decompile it instead.

For example, here is the definition of addition, from clojure/core.clj:

(defn +
  "Returns the sum of nums. (+) returns 0."
  {:inline (fn [x y] `(. clojure.lang.Numbers (add ~x ~y)))
   :inline-arities #{2}}
  ([] 0)
  ([x] (cast Number x))
  ([x y] (. clojure.lang.Numbers (add x y)))
  ([x y & more]
   (reduce + (+ x y) more)))

Then I used the excellent JD Java decompiler (http://java.decompiler.free.fr/) to see what Java source it would produce for the compiled class. Here is the corresponding code:

package clojure;

import clojure.lang.IFn;
import clojure.lang.Numbers;
import clojure.lang.RT;
import clojure.lang.RestFn;
import clojure.lang.Var;

public class _PLUS___3946 extends RestFn
{
  public static final Object const__0 = Integer.valueOf(0);
  public static final Var const__1 = (Var)RT.var("clojure.core", "cast");
  public static final Object const__2 = Class.forName("java.lang.Number");
  public static final Var const__3 = (Var)RT.var("clojure.core", "reduce");
  public static final Var const__4 = (Var)RT.var("clojure.core", "+");

  public _PLUS___3946()
  {
    super(2);
  }

  public Object doInvoke(Object x, Object y, Object more)
    throws Exception
  {
    x = null; y = null; more = null; return ((IFn)const__3.get()).invoke(const__4.get(), Numbers.add(x, y), more);
  }

  public Object invoke(Object x, Object y)
    throws Exception
  {
    x = null; y = null; return Numbers.add(x, y);
  }

  public Object invoke(Object x)
    throws Exception
  {
    x = null; return ((IFn)const__1.get()).invoke(const__2, x);
  }

  public Object invoke()
    throws Exception
  {
    return const__0;
  }
}

The _PLUS___3946 class extends RestFn, which is the base class used by functions that can take a rest arg. That makes sense, because one of the arities in the defn does take a rest arg. The rest of _PLUS___3946 consists of a number of constant initializers that run when the class is loaded, followed by a constructor and four methods, one for each of the four arities in the '+' defn.
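Because those constants are static final fields, their initializers run once per class, not once per call. The sketch below is a hand-written stand-in, not Clojure's generated code: the class names ConstInitSketch and PlusLike and the INIT_COUNT counter are hypothetical, added only to make the "computed once" behavior observable.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for a compiled fn class such as _PLUS___3946.
class ConstInitSketch {
    // Counts how many times the constant initializer actually runs.
    static final AtomicInteger INIT_COUNT = new AtomicInteger();

    static final class PlusLike {
        // Like const__0 = Integer.valueOf(0): evaluated during class initialization.
        static final Object CONST_0 = makeZero();

        private static Object makeZero() {
            INIT_COUNT.incrementAndGet();
            return Integer.valueOf(0);
        }

        // Like the zero-arg invoke(): just hands back the pre-built constant.
        Object invoke() {
            return CONST_0;
        }
    }

    public static void main(String[] args) {
        for (int i = 0; i < 3; i++) {
            System.out.println(new PlusLike().invoke()); // 0
        }
        // The initializer ran exactly once, however many calls were made.
        System.out.println("initializer ran " + INIT_COUNT.get() + " time(s)"); // 1
    }
}
```

So calling (+) repeatedly just returns the same boxed zero that was built when the class was initialized.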

The constructor calls 'super(2)', which tells the RestFn constructor that the variadic version of the function takes 2 required (non-rest) args before the rest arg.

Clojure uses the name doInvoke() for a method that takes a rest arg, and invoke() for those that don't. Thus the four '+' functions map to one doInvoke() and three invoke() methods. Note also that each method's args map directly to those in the corresponding Clojure function.
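To see how the required arity and the doInvoke()/invoke() split fit together, here is a much-simplified imitation of a RestFn-style base class. It is a sketch under stated assumptions, not clojure.lang.RestFn: MiniRestFn, its apply(Object...) entry point, and the array-based doInvoke() signature are hypothetical (the real generated method is doInvoke(Object x, Object y, Object more)), and Plus only handles Long values.

```java
import java.util.Arrays;
import java.util.List;

class RestFnSketch {

    // Hypothetical, much-simplified imitation of a RestFn-style base class.
    abstract static class MiniRestFn {
        private final int reqArity;

        MiniRestFn(int reqArity) {
            this.reqArity = reqArity;
        }

        // Fixed arities: subclasses override the ones they support.
        Object invoke() { throw arityError(0); }
        Object invoke(Object x) { throw arityError(1); }
        Object invoke(Object x, Object y) { throw arityError(2); }

        // Variadic body: the first reqArity args, plus the rest as a list.
        abstract Object doInvoke(Object[] required, List<Object> more);

        // Hypothetical entry point for a call with any number of args.
        Object apply(Object... args) {
            if (args.length > reqArity) {
                // More args than the fixed arities take: bundle the extras into "more".
                Object[] required = Arrays.copyOf(args, reqArity);
                List<Object> more = Arrays.asList(args).subList(reqArity, args.length);
                return doInvoke(required, more);
            }
            switch (args.length) {
                case 0: return invoke();
                case 1: return invoke(args[0]);
                case 2: return invoke(args[0], args[1]);
                default: throw arityError(args.length);
            }
        }

        private static RuntimeException arityError(int n) {
            return new IllegalArgumentException("Wrong number of args: " + n);
        }
    }

    // Hypothetical Java counterpart of the four '+' arities (Long-only).
    static final class Plus extends MiniRestFn {
        Plus() { super(2); } // the variadic arity takes 2 args before "more"

        @Override Object invoke() { return 0L; }
        @Override Object invoke(Object x) { return (Number) x; }
        @Override Object invoke(Object x, Object y) {
            return ((Number) x).longValue() + ((Number) y).longValue();
        }
        @Override Object doInvoke(Object[] required, List<Object> more) {
            // Same shape as (reduce + (+ x y) more), written as a loop.
            Object acc = invoke(required[0], required[1]);
            for (Object n : more) {
                acc = invoke(acc, n);
            }
            return acc;
        }
    }

    public static void main(String[] args) {
        Plus plus = new Plus();
        System.out.println(plus.apply());               // 0
        System.out.println(plus.apply(5L));             // 5
        System.out.println(plus.apply(1L, 2L));         // 3
        System.out.println(plus.apply(1L, 2L, 3L, 4L)); // 10
    }
}
```

Calls with up to two args land on a fixed-arity invoke(); only a call with more than two falls through to doInvoke(), which is why the constructor needs to know the number 2.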

The most interesting line in the class is this one, in the doInvoke() method:

    x = null; y = null; more = null; return ((IFn)const__3.get()).invoke(const__4.get(), Numbers.add(x, y), more);

Let's break it down a bit. This portion:

    x = null; y = null; more = null;

seems wrong. The method can't possibly null out the arguments before using them. According to Rich Hickey, this is actually an artifact of the way the byte code is generated. The purpose of the nulling is to clear the arguments so that this method doesn't 'hold the head' of a (possibly lazy) sequence while the function it calls is still working through it. In the byte code, the argument values are loaded onto the operand stack first, and only afterwards are the locals set to null, just before the call. Java source has no way to express that ordering, so the decompiler hoists the assignments to the front of the statement. The nulling definitely does not happen at the beginning of the method.
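To make the ordering concrete, here is a hand-written Java sketch, not decompiler output. It assumes the locals are cleared after the argument values have been evaluated and just before the call is made. Since Java source cannot model the JVM operand stack, an Object[] "frame" stands in for doInvoke()'s local slots [x, y, more]; the names LocalsClearingSketch, Fn3, and clearThenInvoke are hypothetical.

```java
import java.util.Arrays;

// Illustration only: frame stands in for doInvoke()'s locals [x, y, more].
class LocalsClearingSketch {
    interface Fn3 { Object invoke(Object a, Object b, Object c); }

    // Stand-in for Numbers.add (Long-only for brevity).
    static Object add(Object x, Object y) {
        return ((Number) x).longValue() + ((Number) y).longValue();
    }

    // a, b and c were already evaluated (think: pushed on the operand stack)
    // before this runs; only then are the local slots cleared.
    static Object clearThenInvoke(Object[] frame, Fn3 fn, Object a, Object b, Object c) {
        Arrays.fill(frame, null);
        return fn.invoke(a, b, c);
    }

    // Shape of doInvoke(): read every argument from the locals, clear, then call.
    static Object doInvoke(Object[] frame, Fn3 reduceFn, Object plusFn) {
        return clearThenInvoke(frame, reduceFn, plusFn, add(frame[0], frame[1]), frame[2]);
    }

    public static void main(String[] args) {
        Object[] frame = {1L, 2L, Arrays.asList(3L, 4L)};
        Fn3 inspectingReduce = (f, init, coll) -> {
            // While the callee runs, the caller's slots no longer reference anything.
            System.out.println("caller slots: " + Arrays.toString(frame)); // [null, null, null]
            System.out.println("args: " + init + ", " + coll);             // 3, [3, 4]
            return init;
        };
        doInvoke(frame, inspectingReduce, "plus");
    }
}
```

The callee still receives 3 and the list (3 4), yet nothing in the caller's frame keeps that list reachable while the callee walks it.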

The rest of the line:

    return ((IFn)const__3.get()).invoke(const__4.get(), Numbers.add(x, y), more);

basically maps to this line in the defn:

    (reduce + (+ x y) more)

It's a list with four elements. The corresponding Java deals with each of the four. This bit of Java gets the value of the first element, the 'reduce' function:

    const__3.get()

This gets the value of the second element, the '+' function:

    const__4.get()
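Note that const__3 and const__4 hold the vars themselves, not the functions; get() fetches the var's current value each time the line runs. The sketch below shows why that indirection matters: a later redefinition is picked up by existing call sites. MiniVar and its bindRoot() method are hypothetical simplifications, not clojure.lang.Var.

```java
import java.util.function.BinaryOperator;

class VarSketch {
    // Hypothetical, heavily simplified var: a mutable box holding a function.
    static final class MiniVar<T> {
        private volatile T root;
        MiniVar(T root) { this.root = root; }
        T get() { return root; }
        void bindRoot(T newRoot) { root = newRoot; }
    }

    // Like const__3 / const__4: the constant is the var, created once.
    static final MiniVar<BinaryOperator<Long>> OP = new MiniVar<>((a, b) -> a + b);

    // Like the compiled call site: dereference the var every time it runs.
    static long callSite(long a, long b) {
        return OP.get().apply(a, b);
    }

    public static void main(String[] args) {
        System.out.println(callSite(3, 4)); // 7
        OP.bindRoot((a, b) -> a * b);       // "redefine" the function
        System.out.println(callSite(3, 4)); // 12: the call site sees the new value
    }
}
```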

The third element is '(+ x y)'. Here we would have expected to see a call to the two-parameter version of invoke().

Instead, we see this:

    Numbers.add(x, y)

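Looking at the '+' defn again, we see that the two-parameter version is inlined: the :inline metadata, limited by :inline-arities #{2}, tells the compiler to expand (+ x y) directly into (. clojure.lang.Numbers (add x y)). That explains the direct invocation of Numbers.add() here.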
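The difference between the two call shapes can be sketched in plain Java. This is an analogy, not the compiler's output: numbersAdd stands in for Numbers.add, plusVarRoot stands in for the current value of the '+' var, and both names are hypothetical. The inlined form skips the var lookup and the interface call, and, as a consequence, does not see a later redefinition of '+'.

```java
class InlineSketch {
    interface Fn2 { Object invoke(Object x, Object y); }

    // Stand-in for clojure.lang.Numbers.add (Long-only for brevity).
    static Object numbersAdd(Object x, Object y) {
        return ((Number) x).longValue() + ((Number) y).longValue();
    }

    // Stand-in for the '+' var's current value; its 2-arg body delegates to numbersAdd.
    static volatile Fn2 plusVarRoot = InlineSketch::numbersAdd;

    // Without :inline: fetch the function from the var, then call it through the interface.
    static Object viaInvoke(Object x, Object y) {
        return plusVarRoot.invoke(x, y);
    }

    // With :inline: the form was expanded at compile time into a direct static call.
    static Object inlined(Object x, Object y) {
        return numbersAdd(x, y);
    }

    public static void main(String[] args) {
        System.out.println(viaInvoke(1L, 2L)); // 3
        System.out.println(inlined(1L, 2L));   // 3
    }
}
```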

The fourth element, the 'more' parameter, doesn't need to be dereferenced, as it is already a local: it was passed in as the third argument of doInvoke().

Finally, after all the arguments have been evaluated, the three-parameter invoke() method of the reduce function is called to produce the return value.

Not surprisingly, when I step through the code with a Clojure-aware debugger (http://georgejahad.com/clojure/cljdb.html), I see it behaving in exactly this way. First, it steps into the get() calls for the 'reduce' and '+' vars, then the Numbers.add() call, and finally into the three-parameter version of reduce.
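That stepping order is simply Java's evaluation order for a method call: the call target is evaluated first, then the arguments left to right, and only then is the method invoked. The tracing stand-ins below (getReduce, getPlus, add, all hypothetical) replay the decompiled expression and record the order.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical tracing stand-ins for the pieces of
//   ((IFn)const__3.get()).invoke(const__4.get(), Numbers.add(x, y), more)
class EvalOrderSketch {
    static final List<String> trace = new ArrayList<>();

    interface Fn3 { Object invoke(Object a, Object b, Object c); }

    static Fn3 getReduce() {
        trace.add("get reduce var");
        return (f, init, coll) -> { trace.add("reduce.invoke"); return init; };
    }

    static Object getPlus() {
        trace.add("get + var");
        return "plus-fn";
    }

    static Object add(Object x, Object y) {
        trace.add("Numbers.add");
        return ((Number) x).longValue() + ((Number) y).longValue();
    }

    static Object doInvoke(Object x, Object y, Object more) {
        // Target first, then arguments left to right, then the call itself.
        return getReduce().invoke(getPlus(), add(x, y), more);
    }

    public static void main(String[] args) {
        doInvoke(1L, 2L, null);
        System.out.println(trace); // [get reduce var, get + var, Numbers.add, reduce.invoke]
    }
}
```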

I won't detail the rest of the invoke methods in the _PLUS___3946 class because they are fairly straightforward, but it's worth taking a quick look at them to confirm they all work in an analogous fashion.
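As one example, the one-argument invoke() passes const__2 (java.lang.Number) and x to the 'cast' var. Assuming clojure.core/cast delegates to the JDK's Class.cast, which returns its argument unchanged if it is an instance of the class (or null) and throws ClassCastException otherwise, a Java equivalent would look roughly like this. CastSketch and plusOneArg are hypothetical names.

```java
class CastSketch {
    static final Class<?> CONST_2 = Number.class; // like const__2

    // Sketch of (+ x), i.e. (cast Number x): return x if it is a Number, else throw.
    static Object plusOneArg(Object x) {
        return CONST_2.cast(x);
    }

    public static void main(String[] args) {
        System.out.println(plusOneArg(42)); // 42
        try {
            plusOneArg("not a number");
        } catch (ClassCastException e) {
            System.out.println("ClassCastException: " + e.getMessage());
        }
    }
}
```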