
Unblock Voxtral Demo #42

Open

r-chong wants to merge 20 commits into staging from extensions/toy-stack

Conversation


@r-chong r-chong commented Apr 13, 2026

Consolidates the toy stack (tts → tts-matmul → tts-nn → tts-vq → tts-fsq) into one PR. Adds the pure-TS primitives the Voxtral demo needs on the browser engine.

Ops / modules

  • sin / cos / sqrt (scalar + tensor, fwd + bwd, WGSL, sqrt clamp)
  • matmul, transpose, div
  • Linear, RMSNorm, softmax, mseLoss, tanh
  • Tensor.randn, Tensor.parameter, Tensor.detach, negative-dim sum/mean
  • VectorQuantize and FiniteScalarQuantize under extensions/quantization/
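For intuition, the grid-rounding idea behind finite scalar quantization can be sketched on a single plain number. This is an illustrative toy, not the PR's FiniteScalarQuantize implementation; the tanh squash and the `levels` parameter are assumptions for the example:

```typescript
// Illustrative sketch of finite scalar quantization on one value.
// Not the PR's FiniteScalarQuantize; tanh squash and `levels` are assumptions.
function fsqQuantize(x: number, levels: number): number {
  const squashed = Math.tanh(x);              // bound the value to (-1, 1)
  const half = (levels - 1) / 2;              // grid half-width
  return Math.round(squashed * half) / half;  // snap to the nearest grid point
}
```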

Rebase fixes

  • TensorHistory/ScalarHistory rewritten to avoid parameter-property constructors (node --strip-types rejects them).
  • webgpu module shim so tsc resolves the import without the runtime polyfill.
  • toy.test.ts wired into test/run.ts.
  • summarize() exits explicitly — the fast_ops worker pool kept the runner hanging.

Stacks on #41. Together they unblock the Voxtral demo.

r-chong added 20 commits April 13, 2026 02:48

Build toy/ to dist/toy/ with a separate tsconfig so it can be imported as @mni-ml/framework/toy without pulling in the native addon. Needed for the browser demo.

Drop parameter-property constructors in TensorHistory/ScalarHistory (node --strip-types doesn't support them). Add a webgpu module shim so tsc resolves the import without the native polyfill. Wire toy.test.ts into the runner and force an explicit exit in summarize() — the fast_ops worker pool was keeping node alive.
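The webgpu shim could, for example, be a minimal ambient module declaration. This is a sketch of a type-only declaration file, not the repo's actual shim; the `create`/`globals` names are taken from the import quoted later in the review, but their signatures here are assumptions:

```typescript
// Hypothetical toy/webgpu-shim.d.ts: type-only declaration so tsc can resolve
// `import { create, globals } from 'webgpu'` without the native polyfill
// installed. The signatures below are assumptions, not the repo's shim.
declare module 'webgpu' {
  export function create(options?: unknown): unknown;
  export const globals: Record<string, unknown>;
}
```

Because a `.d.ts` declaration emits no JavaScript, it only needs to be visible to the compiler (hence the `include` discussion in the review below on tsconfig.toy.json).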
@r-chong r-chong force-pushed the extensions/toy-stack branch from 80decc2 to 865ac86 Compare April 13, 2026 06:48

Copilot AI left a comment


Pull request overview

This PR expands the pure-TypeScript “toy” engine to support the Voxtral browser demo by adding core math ops (sin/cos/sqrt, matmul, div/transpose), basic NN building blocks (Linear/RMSNorm/softmax/etc.), and quantization modules, along with build/test plumbing for the new toy subpath export.

Changes:

  • Added new toy tensor ops + autograd functions (sin/cos/sqrt/matmul, div/transpose, randn/parameter/detach, negative-dim sum/mean) and basic NN utilities (Linear, RMSNorm, softmax, mseLoss, tanh).
  • Introduced quantization extensions (VectorQuantize, FiniteScalarQuantize) and a toy WebGPU typing shim.
  • Updated build/test wiring: added tsconfig.toy.json, expanded package exports, added toy tests to the runner, and ensured tests exit cleanly.

Reviewed changes

Copilot reviewed 16 out of 18 changed files in this pull request and generated 8 comments.

File Description
tsconfig.toy.json Adds a separate TS build targeting toy/dist/toy/.
toy/webgpu-shim.d.ts Declares a TS module shim for the webgpu import used by the toy GPU backend.
toy/tensor.ts Adds new Tensor APIs (randn/parameter/detach, sin/cos/sqrt, matmul/div/transpose, negative-dim sum/mean).
toy/tensor_functions.ts Implements new ops (sin/cos/sqrt/matmul) in the functional + autograd layer; refactors TensorHistory ctor.
toy/scalar_functions.ts Refactors ScalarHistory ctor to avoid parameter properties for Node’s strip-types loader.
toy/optimizer.ts Extends SGD to update Tensor parameters as well as Scalars.
toy/operators.ts Adds scalar operator implementations for sin/cos/sqrt (with clamp semantics for sqrt).
toy/nn.ts Introduces toy NN utilities: Linear, RMSNorm, softmax, mseLoss, tanh.
toy/index.ts Re-exports the new NN module from the toy entrypoint.
toy/gpu_kernels.ts Registers sin/cos/sqrt for GPU unary op generation.
test/toy.test.ts Adds a comprehensive toy test suite covering new ops, autograd, NN blocks, and quantization.
test/run.ts Wires test/toy.test.ts into the suite runner.
test/helpers.ts Ensures summarize() always exits, preventing hangs on green runs.
package.json Adds ./toy export, adds toy build step, and adds @webgpu/types dev dependency.
package-lock.json Updates lockfile to match version/dependency/optional native package matrix changes.
extensions/quantization/vq.ts Adds VectorQuantize module (straight-through quantization) for toy tensors.
extensions/quantization/fsq.ts Adds FiniteScalarQuantize module (grid quantization) for toy tensors.
extensions/quantization/index.ts Adds a small index barrel for quantization exports.


Comment thread tsconfig.toy.json
"outDir": "./dist/toy",
"types": ["node", "@webgpu/types"]
},
"include": ["toy/**/*.ts"],

Copilot AI Apr 13, 2026


tsconfig.toy.json only includes toy/**/*.ts, so it will not include toy/webgpu-shim.d.ts. That means the toy build will still fail type-checking on import { create, globals } from 'webgpu' (no module declarations found). Include toy/**/*.d.ts (or broaden the include glob) so the shim participates in compilation.

Suggested change
- "include": ["toy/**/*.ts"],
+ "include": ["toy/**/*.ts", "toy/**/*.d.ts"],

Comment thread toy/tensor_functions.ts
Comment on lines +371 to +377
ctx.saveForBackward(result);
return result;
}
static backward(ctx: TensorContext, gradOutput: Tensor): Tensor[] {
const [result] = ctx.savedTensors;
// d/dx sqrt(x) = 1 / (2 * sqrt(x)) = grad / (2 * result)
return [gradOutput.mul(result!.mul(Tensor.tensor(2)).inv())];

Copilot AI Apr 13, 2026


Sqrt.backward ignores the forward clamp semantics (operators.sqrt uses sqrt(max(x, EPS))). For inputs below EPS this should have zero gradient (the forward is constant), but the current formula returns a large non-zero gradient (1/(2*sqrt(EPS))). Save the original input in forward and mask the gradient so it’s 0 where a <= EPS, and only apply 1/(2*sqrt(a)) where unclamped.

Suggested change
-     ctx.saveForBackward(result);
-     return result;
- }
- static backward(ctx: TensorContext, gradOutput: Tensor): Tensor[] {
-     const [result] = ctx.savedTensors;
-     // d/dx sqrt(x) = 1 / (2 * sqrt(x)) = grad / (2 * result)
-     return [gradOutput.mul(result!.mul(Tensor.tensor(2)).inv())];
+     ctx.saveForBackward(a);
+     return result;
+ }
+ static backward(ctx: TensorContext, gradOutput: Tensor): Tensor[] {
+     const [a] = ctx.savedTensors;
+     const eps = Tensor.tensor(operators.EPS);
+     const clamped = new Tensor(sqrt(a!.data));
+     const belowEps = a!.lt(eps);
+     const atEps = a!.eq(eps);
+     const unclampedMask = belowEps.add(atEps).neg().add(Tensor.tensor(1));
+     // Forward uses sqrt(max(x, EPS)), so the derivative is 0 where x <= EPS
+     // and 1 / (2 * sqrt(x)) only where x > EPS.
+     const localGrad = clamped.mul(Tensor.tensor(2)).inv().mul(unclampedMask);
+     return [gradOutput.mul(localGrad)];
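The masked-gradient idea can be checked in isolation on plain numbers. This sketch is independent of the toy Tensor API, and the EPS value is an assumption:

```typescript
// Standalone sketch of sqrt-with-clamp and its masked gradient.
// EPS here is an assumed value matching the clamp idea discussed above.
const EPS = 1e-6;

// Forward: sqrt(max(x, EPS)) -- constant for x <= EPS.
function sqrtFwd(x: number): number {
  return Math.sqrt(Math.max(x, EPS));
}

// Backward: zero where the clamp is active, else 1 / (2 * sqrt(x)).
function sqrtBwd(x: number, gradOutput: number): number {
  return x <= EPS ? 0 : gradOutput / (2 * Math.sqrt(x));
}
```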

Comment thread toy/gpu_kernels.ts
inv: 'return 1.0 / x;',
sin: 'return sin(x);',
cos: 'return cos(x);',
sqrt: 'return sqrt(x);',

Copilot AI Apr 13, 2026


WGSL sqrt uses return sqrt(x);, but the CPU op clamps via sqrt(max(x, EPS)). This makes CPU and GPU results diverge for negative/small inputs (CPU returns sqrt(EPS), GPU will produce NaN / different gradients). Update the WGSL op body to match the clamp semantics used by operators.sqrt.

Suggested change
- sqrt: 'return sqrt(x);',
+ sqrt: 'return sqrt(max(x, 0.000001));',

Comment thread toy/tensor.ts
Comment on lines +239 to +242
const order = [...Array(this.dims).keys()];
const tmp = order[dim0]!;
order[dim0] = order[dim1]!;
order[dim1] = tmp;

Copilot AI Apr 13, 2026


transpose doesn’t normalize/validate dim0/dim1. Negative dims (e.g. -1) silently set non-index properties on the order array, resulting in no-op transposes, and out-of-range dims produce confusing behavior. Normalize negative dims like sum/mean do and throw for invalid indices before calling permute.

Suggested change
- const order = [...Array(this.dims).keys()];
- const tmp = order[dim0]!;
- order[dim0] = order[dim1]!;
- order[dim1] = tmp;
+ const normalizedDim0 = dim0 < 0 ? this.dims + dim0 : dim0;
+ const normalizedDim1 = dim1 < 0 ? this.dims + dim1 : dim1;
+ if (normalizedDim0 < 0 || normalizedDim0 >= this.dims) {
+     throw new Error(`transpose dim0 out of range: ${dim0} for tensor with ${this.dims} dims`);
+ }
+ if (normalizedDim1 < 0 || normalizedDim1 >= this.dims) {
+     throw new Error(`transpose dim1 out of range: ${dim1} for tensor with ${this.dims} dims`);
+ }
+ const order = [...Array(this.dims).keys()];
+ const tmp = order[normalizedDim0]!;
+ order[normalizedDim0] = order[normalizedDim1]!;
+ order[normalizedDim1] = tmp;
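The same normalization can be exercised standalone on a plain permutation array. This is a sketch, not the repo's code; `dims` stands in for the tensor rank:

```typescript
// Sketch: normalize a possibly-negative dim against rank `dims`, throwing on
// out-of-range values, then build the transpose permutation by swapping axes.
function normalizeDim(dim: number, dims: number): number {
  const d = dim < 0 ? dims + dim : dim;
  if (d < 0 || d >= dims) {
    throw new Error(`dim out of range: ${dim} for ${dims} dims`);
  }
  return d;
}

function transposeOrder(dims: number, dim0: number, dim1: number): number[] {
  const d0 = normalizeDim(dim0, dims);
  const d1 = normalizeDim(dim1, dims);
  const order = [...Array(dims).keys()];
  const tmp = order[d0]!;
  order[d0] = order[d1]!;
  order[d1] = tmp;
  return order;
}
```

With negative dims handled this way, `transposeOrder(3, -1, 0)` swaps the last and first axes instead of silently doing nothing.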

Comment thread toy/nn.ts
Comment on lines +44 to +46
export function softmax(x: Tensor, dim: number = -1): Tensor {
if (dim < 0) dim = x.dims + dim;
const expX = x.exp();

Copilot AI Apr 13, 2026


softmax is implemented as exp(x) / sum(exp(x)) without subtracting the max, which will overflow to Infinity for moderately large logits and can yield NaNs. Implement a numerically-stable softmax by subtracting max(x) along dim before exponentiating (even if it requires a small helper until a max reduction exists).

Suggested change
- export function softmax(x: Tensor, dim: number = -1): Tensor {
-     if (dim < 0) dim = x.dims + dim;
-     const expX = x.exp();
+ function maxAlongDim(x: Tensor, dim: number): Tensor {
+     return (x as Tensor & { max: (dim: number) => Tensor }).max(dim);
+ }
+
+ export function softmax(x: Tensor, dim: number = -1): Tensor {
+     if (dim < 0) dim = x.dims + dim;
+     const shifted = x.sub(maxAlongDim(x, dim));
+     const expX = shifted.exp();
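The max-shift trick can be illustrated on a flat plain array, independent of the toy Tensor API:

```typescript
// Numerically-stable softmax on a flat array: subtracting the max before
// exponentiating keeps exp() from overflowing to Infinity, and the shift
// cancels in the ratio, so the result is mathematically unchanged.
function stableSoftmax(xs: number[]): number[] {
  const max = Math.max(...xs);
  const exps = xs.map((x) => Math.exp(x - max));
  const sum = exps.reduce((a, b) => a + b, 0);
  return exps.map((e) => e / sum);
}
```

Without the shift, `exp(1000)` overflows and the naive formula returns `Infinity / Infinity = NaN`; with it, `stableSoftmax([1000, 1000])` gives the expected uniform distribution.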

Comment thread package.json
},
"./toy": {
"types": "./dist/toy/index.d.ts",
"import": "./dist/toy/index.js"

Copilot AI Apr 13, 2026


With an exports map present, consumers cannot deep-import extensions/quantization/* unless it is exported. Right now only . and ./toy are exported, so VectorQuantize/FiniteScalarQuantize are unreachable from the published package. Either re-export quantization from ./toy (and ensure it’s emitted into dist/toy) or add an explicit subpath export for the quantization entrypoint(s).

Suggested change
- "import": "./dist/toy/index.js"
+ "import": "./dist/toy/index.js"
+ },
+ "./extensions/quantization/*": {
+     "types": "./dist/extensions/quantization/*.d.ts",
+     "import": "./dist/extensions/quantization/*.js"

Comment thread test/toy.test.ts
Comment on lines +236 to +242
// dL/dA = ones(2,2) @ B^T
// B^T = [[5,7],[6,8]], ones @ B^T = [[11,14],[11,14]]
assertClose(a.grad.get([0, 0]), 11, 1e-5, 'matmul grad A [0,0]');
assertClose(a.grad.get([0, 1]), 15, 1e-5, 'matmul grad A [0,1]');
// dL/dB = A^T @ ones(2,2)
// A^T = [[1,3],[2,4]], A^T @ ones = [[4,4],[6,6]]
assertClose(b.grad.get([0, 0]), 4, 1e-5, 'matmul grad B [0,0]');

Copilot AI Apr 13, 2026


The comment for dL/dA is incorrect: ones(2,2) @ B^T with B^T=[[5,7],[6,8]] yields [[11,15],[11,15]] (not [[11,14],[11,14]]). Please fix the comment so it matches the asserted values.
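The corrected arithmetic can be verified with a tiny plain-array matmul. This is a standalone sketch, not the test suite's code:

```typescript
// For L = sum(A @ B): dL/dA = ones @ B^T and dL/dB = A^T @ ones.
// With A = [[1,2],[3,4]] and B = [[5,6],[7,8]], the comment's claim can be
// checked directly with a naive plain-array matmul.
function matmul2(a: number[][], b: number[][]): number[][] {
  return a.map((row) =>
    b[0]!.map((_, j) => row.reduce((acc, v, k) => acc + v * b[k]![j]!, 0)),
  );
}

const ones = [[1, 1], [1, 1]];
const bT = [[5, 7], [6, 8]]; // B^T
const aT = [[1, 3], [2, 4]]; // A^T
const dA = matmul2(ones, bT); // [[11, 15], [11, 15]]
const dB = matmul2(aT, ones); // [[4, 4], [6, 6]]
```

The [0,1] entry of dA is 7 + 8 = 15, matching the asserted value rather than the 14 in the original comment.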

Comment thread toy/optimizer.ts
Comment on lines 49 to +54
if (p.value instanceof Scalar) {
const grad = p.value.derivative ?? 0;
p.value.data -= this.lr * grad;
} else if (p.value instanceof Tensor) {
const grad = p.value.grad;
if (!grad) continue;

Copilot AI Apr 13, 2026


step() now handles Tensor parameters, but SGD is still constructed as constructor(parameters: Parameter<Scalar>[], ...), which makes Tensor-parameter usage a type error in strict TS. Update the constructor (and/or the Optimizer base class) to accept Parameter<Scalar | Tensor>[] so the API matches the implementation.
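A widened signature along those lines might look like the following sketch. The `Scalar`, `ToyTensor`, and `Parameter` types here are illustrative stand-ins, not the repo's definitions:

```typescript
// Stand-in types to illustrate the widened constructor; the real Scalar,
// Tensor, and Parameter live in the toy package and differ in detail.
class Scalar {
  constructor(public data: number, public derivative: number | null = null) {}
}
class ToyTensor {} // placeholder for the toy Tensor class

interface Parameter<T> {
  value: T;
}

class SGD {
  constructor(
    public parameters: Parameter<Scalar | ToyTensor>[],
    public lr: number = 0.01,
  ) {}

  step(): void {
    for (const p of this.parameters) {
      if (p.value instanceof Scalar) {
        p.value.data -= this.lr * (p.value.derivative ?? 0);
      }
      // The Tensor branch would mirror the diff quoted in this thread.
    }
  }
}
```

Because the constructor accepts `Parameter<Scalar | ToyTensor>[]`, passing Tensor parameters type-checks under strict TS instead of being a type error.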

@r-chong r-chong changed the title from "feat(toy): sin/cos/sqrt, matmul, Linear/RMSNorm/softmax, VQ/FSQ for Voxtral demo" to "Unblock Voxtral Demo" Apr 13, 2026
