A profiler for JavaScript on WebAssembly, Part 1

Javy is a QuickJS-based toolchain that allows building Wasm programs from JS, with minimal effort:

[sh]

javy build index.js -o index.wasm

You can then run the resulting Wasm via your engine of choice:

[sh]

wasmtime run --invoke=_start index.wasm

Simple, right?

Yes and no. Suppose you want to understand how your JS program behaves inside Wasm, e.g., the number of Wasm instructions executed for a particular JS-level opcode or high-level construct. What are your options? Where do you look?

Your Wasm engine can only reason about the JS engine compiled to Wasm, and the JS engine can only reason about the bytecode it’s executing. To understand what your code is actually doing, you need a profiler that can see through both execution layers and show where time is spent.

That’s the problem this series is about. At Wasm I/O 2026 I gave a talk on the past five years of Javy; this writeup picks up from a strand of that talk — work the Javy maintainers did with the WebAssembly Research Center at Carnegie Mellon during 2024 on what we ended up calling the runtime inception problem: how do you observe a dynamic-language virtual machine running inside another virtual machine, and trace cost back to the user’s source code?

A note on scope: throughout this series I focus specifically on QuickJS and Javy. The mechanisms generalize to other bytecode-interpreter-on-Wasm setups, but the concrete fingerprints — opcode counts, struct layouts, etc. — don’t.

Anatomy of a switch-based interpreter

The objective of our guest runtime profiler is to trace execution cost back to the user’s source code. In simpler terms: record the number of Wasm instructions executed per JS opcode.

To get there, it helps to look at the shape of a common bytecode interpreter, similar to QuickJS’ implementation. Here’s some Zig, because why not?

[zig]

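const std = @import("std");

// `Op`, `VM`, and `readI32` are defined elsewhere and omitted here for brevity.
pub fn interpret(bc: []const u8) !void {
    var vm = VM{};
    var pc: usize = 0; // program counter: index of the next byte to read

    while (pc < bc.len) {
        // Fetch the opcode byte and advance the program counter.
        const op_byte = bc[pc];
        pc += 1;
        const op = std.meta.intToEnum(Op, op_byte) catch return error.InvalidOpcode;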

        switch (op) {
            .push => {
                const v = readI32(bc, pc);
                pc += 4;
                try vm.push(v);
            },
            .add => {
                const b = try vm.pop();
                const a = try vm.pop();
                try vm.push(a + b);
            },

            // ...rest of the cases
        }
    }
}


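pub fn main() !void {
    // A small countdown loop. Byte offsets are noted so the jump operands can be checked.
    const bc = [_]u8{
        @intFromEnum(Op.push),        0x05, 0x00, 0x00, 0x00, // offset 0
        @intFromEnum(Op.dup),                                 // offset 5 (loop start)
        @intFromEnum(Op.print),                               // offset 6
        @intFromEnum(Op.push),        0x01, 0x00, 0x00, 0x00, // offset 7
        @intFromEnum(Op.sub),                                 // offset 12
        @intFromEnum(Op.dup),                                 // offset 13
        @intFromEnum(Op.jmp_if_zero), 0x18, 0x00, 0x00, 0x00, // offset 14, target 0x18 = 24 (halt)
        @intFromEnum(Op.jmp),         0x05, 0x00, 0x00, 0x00, // offset 19, target 5 (loop start)
        @intFromEnum(Op.halt),                                // offset 24
    };
    try interpret(&bc);
}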

Once compiled to Wasm, trimmed for brevity:

[lisp]

(func $interpret (param $bc i32) (param $bc_len i32)
  (local $pc i32)
  (local $op i32)

  block $exit
    loop $dispatch
      ;; op = bc[pc];
      local.get $bc
      local.get $pc
      i32.add
      i32.load8_u
      local.set $op

      ;; pc += 1;
      local.get $pc
      i32.const 1
      i32.add
      local.set $pc

      ;; switch (op)
      block $halt
       block $print
        block $jmp_if_zero
         block $jmp
          block $drop
           block $dup
            block $mul
             block $sub
              block $add
               block $push
                 local.get $op
                 i32.const 1
                 i32.sub
                 br_table $push $add $sub $mul $dup
                          $drop $jmp $jmp_if_zero $print $halt
               end
               ;; Op.push
               br $dispatch
              end
              ;; Op.add
              br $dispatch
             end
             ;; Op.sub
             br $dispatch
            end
            ;; Op.mul
            br $dispatch
           end
           ;; Op.dup
           br $dispatch
          end
          ;; Op.drop
          br $dispatch
         end
         ;; Op.jmp
         br $dispatch
        end
        ;; Op.jmp_if_zero
        br $dispatch
       end
       ;; Op.print
       br $dispatch
      end
      ;; Op.halt
      br $exit
    end ;; end loop $dispatch
  end ;; end block $exit
)

Valuable information can be retrieved from two particular elements in the snippet above:

Dispatch targets

The br_table instruction lists one target per opcode in the interpreter: ten labels in the snippet above, with the last one ($halt) doubling as the default target. We can use this heuristic, combined with static analysis, to identify the Wasm function responsible for the dispatch loop, the entry point through which every JS opcode flows.
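As a rough illustration of the heuristic, here is a Python sketch that scans WAT text written in the same style as the snippet above and reports, for each function, the largest number of br_table targets it contains. The names `br_table_targets` and `dispatch_candidates` are hypothetical, the tokenizer ignores string literals, and accepting either N or N + 1 targets (to allow for a separate default label) is an assumption of this sketch, not a description of the actual profiler.

```python
import re


def br_table_targets(wat: str) -> dict:
    """Map each function to the largest br_table target count in its body."""
    # Drop `;; line` and `(; block ;)` comments so they are not read as tokens.
    wat = re.sub(r";;[^\n]*", " ", wat)
    wat = re.sub(r"\(;.*?;\)", " ", wat, flags=re.S)
    tokens = wat.replace("(", " ( ").replace(")", " ) ").split()

    counts = {}
    current = None
    anonymous = 0
    i = 0
    while i < len(tokens):
        tok = tokens[i]
        if tok == "func" and i > 0 and tokens[i - 1] == "(":
            # Use the `$name` if present, otherwise a synthetic name.
            nxt = tokens[i + 1] if i + 1 < len(tokens) else ""
            if nxt.startswith("$"):
                current = nxt
            else:
                current = f"#anon{anonymous}"
                anonymous += 1
            counts.setdefault(current, 0)
        elif tok == "br_table" and current is not None:
            # Targets are the run of `$labels` or numeric depths that follows.
            j = i + 1
            while j < len(tokens) and (tokens[j].startswith("$") or tokens[j].isdigit()):
                j += 1
            counts[current] = max(counts[current], j - (i + 1))
            i = j
            continue
        i += 1
    return counts


def dispatch_candidates(wat: str, opcode_count: int) -> list:
    """Functions whose largest br_table has opcode_count (or one more) targets."""
    accepted = (opcode_count, opcode_count + 1)
    return [name for name, n in br_table_targets(wat).items() if n in accepted]


# Trimmed, illustrative input; not a complete or valid module.
SAMPLE = """
(module
  (func $helper (param i32)
    block $a
      block $b
        local.get 0
        br_table $b $a
      end
    end)
  (func $interpret (param $bc i32) (param $bc_len i32)
    local.get $op
    i32.const 1
    i32.sub
    br_table $push $add $sub $mul $dup   ;; first five targets
             $drop $jmp $jmp_if_zero $print $halt
    end)
)
"""

print(dispatch_candidates(SAMPLE, 10))
```

On a real module the same idea would more likely run over the binary format or a proper parser's output, since tools that print WAT usually emit numeric branch depths and inline comments rather than named labels.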

We’ve found the Wasm function that runs every JS opcode. But that only tells us where dispatch happens — it doesn’t tell us which JS function is currently executing. Two distinct JS functions can have identical opcode sequences, so the bytes themselves aren’t enough.

Program counter

The program counter holds the position of the next bytecode operator to be executed in the JS function’s bytecode stream. Loading that operator emits a memory load:

[lisp]

local.get $bc
local.get $pc
i32.add
i32.load8_u ;; <-- load the bytecode operator
local.set $op

Each JS function’s bytecode lives at a unique address in linear memory, so the effective address of the first bytecode load (the one that reads byte 0 of that function’s buffer) uniquely identifies the function.
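To make that concrete, the following Python sketch models linear memory as a byte array and places the same bytecode at two different base addresses. All values (the memory size, the base addresses 8 and 32, and the bytecode bytes) are made up for illustration; `dispatch_load` is a hypothetical helper that mirrors the `local.get $bc; local.get $pc; i32.add; i32.load8_u` sequence.

```python
# Hypothetical linear memory and bytecode; the values are illustrative only.
memory = bytearray(64)
BYTECODE = bytes([0x01, 0x05, 0x00, 0x00, 0x00, 0x0A])

# Two distinct "JS functions" that happen to share identical bytecode.
FUNC_A_BASE = 8
FUNC_B_BASE = 32
memory[FUNC_A_BASE:FUNC_A_BASE + len(BYTECODE)] = BYTECODE
memory[FUNC_B_BASE:FUNC_B_BASE + len(BYTECODE)] = BYTECODE


def dispatch_load(bc: int, pc: int) -> tuple:
    """Return (effective address, byte loaded) for the dispatch load."""
    address = bc + pc          # local.get $bc; local.get $pc; i32.add
    return address, memory[address]  # i32.load8_u


first_a = dispatch_load(FUNC_A_BASE, 0)
first_b = dispatch_load(FUNC_B_BASE, 0)

# Same opcode byte, different effective address.
print(first_a, first_b)
```

The loaded byte is the same in both cases, so only the effective address tells the two functions apart.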

Where this leaves us

Two observations carry through to the rest of the series:

The Wasm function whose body contains a br_table with one target per opcode is the dispatch function — the entry point for every JS-level function call.
Inside that function, the address of the first bytecode byte read by the dispatch’s i32.load8_u uniquely identifies the JS function whose bytecode is currently executing.

Together these mean we don’t need source-level information to profile JS running on Wasm: a probe attached to the right Wasm instruction, capturing (address, byte) on each execution, is enough to reconstruct a per-JS-function execution trace.
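A minimal sketch of that reconstruction step, in Python, might look like the following. It assumes the probe has already produced a stream of (address, byte) pairs and, purely for illustration, that the base address and length of each function's bytecode buffer are known; the names `attribute_events` and `buffers` are hypothetical, and how the real profiler recovers these boundaries is not covered here.

```python
from bisect import bisect_right
from collections import Counter


def attribute_events(events, buffers):
    """Group probe events by the bytecode buffer that contains their address.

    events:  iterable of (address, byte) pairs captured at the dispatch load.
    buffers: dict mapping a buffer's base address to its length in bytes
             (assumed to be known; this is an assumption of the sketch).
    Returns a dict mapping each base address to a Counter of opcode bytes.
    """
    bases = sorted(buffers)
    trace = {base: Counter() for base in bases}
    for address, byte in events:
        # Find the last base address that is <= the event's address.
        k = bisect_right(bases, address) - 1
        if k < 0:
            continue  # address lies before every known buffer
        base = bases[k]
        if address < base + buffers[base]:
            trace[base][byte] += 1
    return trace


# Illustrative data: two 6-byte buffers and a handful of probe events.
buffers = {8: 6, 32: 6}
events = [(8, 1), (13, 10), (32, 1), (37, 10), (32, 1), (100, 7)]
trace = attribute_events(events, buffers)
for base, opcodes in trace.items():
    print(hex(base), dict(opcodes))
```

The event at address 100 falls outside both buffers and is dropped; everything else is counted under the function whose buffer contains it.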

The missing piece is then to attach the probes to our Wasm program. The next post introduces Whamm, the dynamic-instrumentation tool we’ll use to attach probes and capture the relevant information for our profiler.