A profiler for JavaScript on WebAssembly, Part 1
Javy is a QuickJS-based toolchain that allows building Wasm programs from JS, with minimal effort:
javy build index.js -o index.wasm
You can then run the resulting Wasm via your engine of choice:
wasmtime run --invoke=_start index.wasm
Simple, right?
Yes and no. Suppose you want to understand how the provided JS program behaves inside Wasm, e.g., the number of Wasm instructions associated with a particular JS-level opcode or high-level construct. What are your options? Where do you look?
Your Wasm engine can only reason about the JS engine compiled to Wasm, and the JS engine can only reason about the bytecode it’s executing. To understand what your code is actually doing, we need a profiler that can peek through the execution layers, helping us understand where time is spent.
That’s the problem this series is about. At Wasm I/O 2026 I gave a talk on the past five years of Javy; this writeup picks up from a strand of that talk — work the Javy maintainers did with the WebAssembly Research Center at Carnegie Mellon during 2024 on what we ended up calling the runtime inception problem: how do you observe a dynamic-language virtual machine running inside another virtual machine, and trace cost back to the user’s source code?
A note on scope: throughout this series I focus specifically on QuickJS and Javy. The mechanisms generalize to other bytecode-interpreter-on-Wasm setups, but the concrete fingerprints — opcode counts, struct layouts, etc. — don’t.
Anatomy of a switch-based interpreter
The objective of our guest runtime profiler is to trace execution cost back to the user’s source code. In simpler terms: record the number of Wasm instructions executed per JS opcode.
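To make that goal concrete, here is a minimal Python sketch of the bookkeeping such a profiler performs. The event format (`("dispatch", opcode)` and `("wasm", count)` tuples), the `attribute_cost` name, and the numbers are illustrative assumptions, not something Javy or QuickJS emits.

```python
from collections import Counter

def attribute_cost(events):
    """Charge Wasm instruction counts to the most recently dispatched JS opcode.

    `events` is a hypothetical stream of tuples:
      ("dispatch", opcode_name)  -> the interpreter started a new JS opcode
      ("wasm", n)                -> n Wasm instructions ran since the last event
    """
    cost = Counter()
    current = None  # no JS opcode is active until the first dispatch
    for kind, value in events:
        if kind == "dispatch":
            current = value
        elif kind == "wasm" and current is not None:
            cost[current] += value
    return cost

# Made-up numbers, purely for illustration.
sample_events = [
    ("dispatch", "push"), ("wasm", 7),
    ("dispatch", "add"), ("wasm", 12),
    ("dispatch", "push"), ("wasm", 7),
]
sample_cost = attribute_cost(sample_events)
```

The hard part, and the subject of the rest of this post, is producing the "dispatch" events in the first place: knowing when the interpreter moves on to a new JS opcode.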
To get there, it helps to look at the shape of a common bytecode interpreter, similar to QuickJS’ implementation. Here’s some Zig, because why not?
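// Op, VM, and readI32 are not shown in this snippet.
pub fn interpret(bc: []const u8) !void {
    var vm = VM{};
    var pc: usize = 0;
    while (pc < bc.len) {
        // Fetch the opcode byte and advance the program counter.
        const op_byte = bc[pc];
        pc += 1;
        const op = std.meta.intToEnum(Op, op_byte) catch return error.InvalidOpcode;
        // Dispatch: one switch prong per opcode.
        switch (op) {
            .push => {
                // The operand is a 4-byte immediate that follows the opcode.
                const v = readI32(bc, pc);
                pc += 4;
                try vm.push(v);
            },
            .add => {
                const b = try vm.pop();
                const a = try vm.pop();
                try vm.push(a + b);
            },
            // ...rest of the cases
        }
    }
}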
const bc = [_]u8{
    @intFromEnum(Op.push), 0x05, 0x00, 0x00, 0x00, // offset 0
    @intFromEnum(Op.dup), // offset 5: loop head
    @intFromEnum(Op.print), // offset 6
    @intFromEnum(Op.push), 0x01, 0x00, 0x00, 0x00, // offset 7
    @intFromEnum(Op.sub), // offset 12
    @intFromEnum(Op.dup), // offset 13
    @intFromEnum(Op.jmp_if_zero), 0x18, 0x00, 0x00, 0x00, // offset 14: target 0x18 = 24 (halt)
    @intFromEnum(Op.jmp), 0x05, 0x00, 0x00, 0x00, // offset 19: target 0x05 = 5 (loop head)
    @intFromEnum(Op.halt), // offset 24
};
try interpret(&bc);
Once compiled to Wasm, trimmed for brevity:
(func $interpret (param $bc i32) (param $bc_len i32)
  (local $pc i32)
  (local $op i32)
  block $exit
    loop $dispatch
      ;; op = bc[pc];
      local.get $bc
      local.get $pc
      i32.add
      i32.load8_u
      local.set $op
      ;; pc += 1;
      local.get $pc
      i32.const 1
      i32.add
      local.set $pc
      ;; switch (op)
      block $halt
        block $print
          block $jmp_if_zero
            block $jmp
              block $drop
                block $dup
                  block $mul
                    block $sub
                      block $add
                        block $push
                          local.get $op
                          i32.const 1
                          i32.sub
                          br_table $push $add $sub $mul $dup
                                   $drop $jmp $jmp_if_zero $print $halt
                        end
                        ;; Op.push
                        br $dispatch
                      end
                      ;; Op.add
                      br $dispatch
                    end
                    ;; Op.sub
                    br $dispatch
                  end
                  ;; Op.mul
                  br $dispatch
                end
                ;; Op.dup
                br $dispatch
              end
              ;; Op.drop
              br $dispatch
            end
            ;; Op.jmp
            br $dispatch
          end
          ;; Op.jmp_if_zero
          br $dispatch
        end
        ;; Op.print
        br $dispatch
      end
      ;; Op.halt
      br $exit
    end ;; end loop $dispatch
  end ;; end block $exit
)
Valuable information can be retrieved from two particular elements in the snippet above:
Dispatch targets
The number of targets in the br_table instruction matches the number
of opcodes in the interpreter. We can use this heuristic, combined
with static analysis, to identify the Wasm function responsible for
the dispatch loop — the entry point through which every JS opcode
flows.
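As a rough illustration of that heuristic, the Python sketch below counts the labels of each `br_table` in a function's text-format body and keeps the functions whose count equals the expected number of opcodes. It works on WAT text with a naive tokenizer (no `(; ... ;)` block comments, no string literals); a real tool would more likely inspect the binary. The helper names and the `$helper` function are hypothetical.

```python
import re

def br_table_target_counts(wat_body: str) -> list[int]:
    """Count the labels (default included) of every br_table in a WAT body."""
    text = re.sub(r";;[^\n]*", "", wat_body)  # drop ;; line comments
    tokens = text.replace("(", " ( ").replace(")", " ) ").split()
    counts = []
    i = 0
    while i < len(tokens):
        if tokens[i] != "br_table":
            i += 1
            continue
        j = i + 1
        # Labels are either $names or numeric depths.
        while j < len(tokens) and (tokens[j].startswith("$") or tokens[j].isdigit()):
            j += 1
        counts.append(j - i - 1)
        i = j
    return counts

def dispatch_candidates(functions: dict[str, str], opcode_count: int) -> list[str]:
    """Names of functions holding a br_table with exactly opcode_count labels."""
    return [name for name, body in functions.items()
            if opcode_count in br_table_target_counts(body)]

# Toy module: a made-up helper plus the dispatch excerpt from this post.
functions = {
    "$helper": "local.get 0  br_table $a $b $c",
    "$interpret": """
        local.get $op
        i32.const 1
        i32.sub
        br_table $push $add $sub $mul $dup
                 $drop $jmp $jmp_if_zero $print $halt  ;; ten labels
    """,
}
candidates = dispatch_candidates(functions, opcode_count=10)
```

On the toy interpreter this singles out `$interpret`. On a real module the count alone may match more than one function, which is why the heuristic is paired with static analysis.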
We’ve found the Wasm function that runs every JS opcode. But that only tells us where dispatch happens — it doesn’t tell us which JS function is currently executing. Two distinct JS functions can have identical opcode sequences, so the bytes themselves aren’t enough.
Program counter
The program counter holds the position of the next bytecode operator to be executed in the JS function’s bytecode stream. Loading that operator emits a memory load:
local.get $bc
local.get $pc
i32.add
i32.load8_u ;; <-- load the bytecode operator
local.set $op
Each JS function's bytecode lives at a unique address in linear memory, so the effective address of the first bytecode load (the one reading byte 0 of that function's buffer) uniquely identifies the function.
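The following Python sketch models that load to show why the address, and not the byte, tells functions apart. The memory size, the base addresses `BASE_F` and `BASE_G`, and the bytecode contents are invented for the example.

```python
def dispatch_load(memory, bc: int, pc: int) -> tuple[int, int]:
    """Mimic local.get $bc / local.get $pc / i32.add / i32.load8_u.

    Returns (effective_address, loaded_byte).
    """
    address = bc + pc
    return address, memory[address]

# Hypothetical layout: two functions with byte-for-byte identical bytecode,
# placed at different offsets of a toy 64-byte linear memory.
bytecode = bytes([0x01, 0x05, 0x00, 0x00, 0x00, 0x0A])
memory = bytearray(64)
BASE_F, BASE_G = 8, 32
memory[BASE_F:BASE_F + len(bytecode)] = bytecode
memory[BASE_G:BASE_G + len(bytecode)] = bytecode

# First dispatch load of each function (pc == 0).
first_f = dispatch_load(memory, BASE_F, pc=0)
first_g = dispatch_load(memory, BASE_G, pc=0)
```

Both loads return the same opcode byte, but `first_f` and `first_g` carry different effective addresses, so the address is the part worth recording.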
Where this leaves us
Two observations carry through to the rest of the series:
- The Wasm function whose body contains a br_table with one target per opcode is the dispatch function, the entry point for every JS-level function call.
- Inside that function, the address being read by the dispatch's i32.load8_u uniquely identifies the JS function whose bytecode is currently executing.
Together these mean we don’t need source-level information to profile
JS running on Wasm: a probe attached to the right Wasm instruction,
capturing (address, byte) on each execution, is enough to
reconstruct a per-JS-function execution trace.
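A minimal Python sketch of that reconstruction step, under a loud assumption: the start address of each function's bytecode buffer is already known and passed in as `function_starts`. How those addresses are collected is not covered here, and the labels, addresses, and trace below are made up.

```python
from bisect import bisect_right
from collections import Counter, defaultdict

def per_function_histogram(trace, function_starts):
    """Group (address, byte) probe events by the function owning the address.

    `function_starts` maps a bytecode buffer's start address to a label.
    Assumptions: buffers do not overlap, and every traced address belongs to
    the closest buffer that starts at or before it.
    """
    starts = sorted(function_starts)
    histogram = defaultdict(Counter)
    for address, byte in trace:
        idx = bisect_right(starts, address) - 1
        if idx < 0:
            continue  # address lies below every known buffer: ignore it
        label = function_starts[starts[idx]]
        histogram[label][byte] += 1
    return histogram

# Hypothetical data: two functions and five dispatch-load events.
function_starts = {8: "f", 32: "g"}
trace = [(8, 0x01), (13, 0x0A), (32, 0x01), (37, 0x0A), (8, 0x01)]
histogram = per_function_histogram(trace, function_starts)
```

Here `histogram["f"]` counts opcode `0x01` twice and `0x0A` once, while `histogram["g"]` counts each once, even though both functions execute the same opcode bytes.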
The missing piece is then to attach the probes to our Wasm program. The next post introduces Whamm, the dynamic-instrumentation tool we’ll use to attach probes and capture the relevant information for our profiler.