Force-pushed from 3756bbb to 80decc2
Build toy/ to dist/toy/ with a separate tsconfig so it can be imported as @mni-ml/framework/toy without pulling in the native addon. Needed for the browser demo.
Drop parameter-property constructors in TensorHistory/ScalarHistory (node --strip-types doesn't support them). Add a webgpu module shim so tsc resolves the import without the native polyfill. Wire toy.test.ts into the runner and force an explicit exit in summarize() — the fast_ops worker pool was keeping node alive.
Force-pushed from 80decc2 to 865ac86
Pull request overview
This PR expands the pure-TypeScript “toy” engine to support the Voxtral browser demo by adding core math ops (sin/cos/sqrt, matmul, div/transpose), basic NN building blocks (Linear/RMSNorm/softmax/etc.), and quantization modules, along with build/test plumbing for the new toy subpath export.
Changes:
- Added new toy tensor ops + autograd functions (sin/cos/sqrt/matmul, div/transpose, randn/parameter/detach, negative-dim sum/mean) and basic NN utilities (Linear, RMSNorm, softmax, mseLoss, tanh).
- Introduced quantization extensions (VectorQuantize, FiniteScalarQuantize) and a toy WebGPU typing shim.
- Updated build/test wiring: added tsconfig.toy.json, expanded package exports, added toy tests to the runner, and ensured tests exit cleanly.
Reviewed changes
Copilot reviewed 16 out of 18 changed files in this pull request and generated 8 comments.
Summary per file:
| File | Description |
|---|---|
| tsconfig.toy.json | Adds a separate TS build targeting toy/ → dist/toy/. |
| toy/webgpu-shim.d.ts | Declares a TS module shim for the webgpu import used by the toy GPU backend. |
| toy/tensor.ts | Adds new Tensor APIs (randn/parameter/detach, sin/cos/sqrt, matmul/div/transpose, negative-dim sum/mean). |
| toy/tensor_functions.ts | Implements new ops (sin/cos/sqrt/matmul) in the functional + autograd layer; refactors TensorHistory ctor. |
| toy/scalar_functions.ts | Refactors ScalarHistory ctor to avoid parameter properties for Node’s strip-types loader. |
| toy/optimizer.ts | Extends SGD to update Tensor parameters as well as Scalars. |
| toy/operators.ts | Adds scalar operator implementations for sin/cos/sqrt (with clamp semantics for sqrt). |
| toy/nn.ts | Introduces toy NN utilities: Linear, RMSNorm, softmax, mseLoss, tanh. |
| toy/index.ts | Re-exports the new NN module from the toy entrypoint. |
| toy/gpu_kernels.ts | Registers sin/cos/sqrt for GPU unary op generation. |
| test/toy.test.ts | Adds a comprehensive toy test suite covering new ops, autograd, NN blocks, and quantization. |
| test/run.ts | Wires test/toy.test.ts into the suite runner. |
| test/helpers.ts | Ensures summarize() always exits, preventing hangs on green runs. |
| package.json | Adds ./toy export, adds toy build step, and adds @webgpu/types dev dependency. |
| package-lock.json | Updates lockfile to match version/dependency/optional native package matrix changes. |
| extensions/quantization/vq.ts | Adds VectorQuantize module (straight-through quantization) for toy tensors. |
| extensions/quantization/fsq.ts | Adds FiniteScalarQuantize module (grid quantization) for toy tensors. |
| extensions/quantization/index.ts | Adds a small index barrel for quantization exports. |
```
    "outDir": "./dist/toy",
    "types": ["node", "@webgpu/types"]
  },
  "include": ["toy/**/*.ts"],
```
tsconfig.toy.json only includes toy/**/*.ts, so it will not include toy/webgpu-shim.d.ts. That means the toy build will still fail type-checking on import { create, globals } from 'webgpu' (no module declarations found). Include toy/**/*.d.ts (or broaden the include glob) so the shim participates in compilation.
Suggested change:

```diff
-  "include": ["toy/**/*.ts"],
+  "include": ["toy/**/*.ts", "toy/**/*.d.ts"],
```
```typescript
    ctx.saveForBackward(result);
    return result;
  }
  static backward(ctx: TensorContext, gradOutput: Tensor): Tensor[] {
    const [result] = ctx.savedTensors;
    // d/dx sqrt(x) = 1 / (2 * sqrt(x)) = grad / (2 * result)
    return [gradOutput.mul(result!.mul(Tensor.tensor(2)).inv())];
```
Sqrt.backward ignores the forward clamp semantics (operators.sqrt uses sqrt(max(x, EPS))). For inputs below EPS this should have zero gradient (the forward is constant), but the current formula returns a large non-zero gradient (1/(2*sqrt(EPS))). Save the original input in forward and mask the gradient so it’s 0 where a <= EPS, and only apply 1/(2*sqrt(a)) where unclamped.
Suggested change:

```diff
-    ctx.saveForBackward(result);
-    return result;
-  }
-  static backward(ctx: TensorContext, gradOutput: Tensor): Tensor[] {
-    const [result] = ctx.savedTensors;
-    // d/dx sqrt(x) = 1 / (2 * sqrt(x)) = grad / (2 * result)
-    return [gradOutput.mul(result!.mul(Tensor.tensor(2)).inv())];
+    ctx.saveForBackward(a);
+    return result;
+  }
+  static backward(ctx: TensorContext, gradOutput: Tensor): Tensor[] {
+    const [a] = ctx.savedTensors;
+    const eps = Tensor.tensor(operators.EPS);
+    const clamped = new Tensor(sqrt(a!.data));
+    const belowEps = a!.lt(eps);
+    const atEps = a!.eq(eps);
+    const unclampedMask = belowEps.add(atEps).neg().add(Tensor.tensor(1));
+    // Forward uses sqrt(max(x, EPS)), so the derivative is 0 where x <= EPS
+    // and 1 / (2 * sqrt(x)) only where x > EPS.
+    const localGrad = clamped.mul(Tensor.tensor(2)).inv().mul(unclampedMask);
+    return [gradOutput.mul(localGrad)];
```
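Outside the Tensor API, the clamp-aware derivative is easy to state on plain numbers. A minimal sketch with hypothetical helpers, assuming `EPS` matches `operators.EPS` (the WGSL suggestion below uses `0.000001`):

```typescript
// Plain-number sketch of the clamped sqrt and its gradient.
const EPS = 1e-6; // assumed to match operators.EPS

// forward: sqrt(max(x, EPS)) — the CPU operator's clamp semantics
function clampedSqrt(x: number): number {
  return Math.sqrt(Math.max(x, EPS));
}

// backward: the clamp makes the forward constant for x <= EPS, so the
// derivative is 0 there and 1 / (2 * sqrt(x)) only where x > EPS
function clampedSqrtGrad(x: number): number {
  return x > EPS ? 1 / (2 * Math.sqrt(x)) : 0;
}
```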
```typescript
  inv: 'return 1.0 / x;',
  sin: 'return sin(x);',
  cos: 'return cos(x);',
  sqrt: 'return sqrt(x);',
```
WGSL sqrt uses return sqrt(x);, but the CPU op clamps via sqrt(max(x, EPS)). This makes CPU and GPU results diverge for negative/small inputs (CPU returns sqrt(EPS), GPU will produce NaN / different gradients). Update the WGSL op body to match the clamp semantics used by operators.sqrt.
Suggested change:

```diff
-  sqrt: 'return sqrt(x);',
+  sqrt: 'return sqrt(max(x, 0.000001));',
```
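The divergence is easy to see with plain JS math standing in for the two op bodies (a sketch; `EPS = 1e-6` is an assumption matching the suggested WGSL literal):

```typescript
// Stand-ins for the two backends: the current WGSL body vs the CPU clamp.
const EPS = 1e-6; // assumed value of operators.EPS
const gpuSqrt = (x: number) => Math.sqrt(x);                // NaN for x < 0
const cpuSqrt = (x: number) => Math.sqrt(Math.max(x, EPS)); // sqrt(EPS) for x <= EPS
```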
```typescript
  const order = [...Array(this.dims).keys()];
  const tmp = order[dim0]!;
  order[dim0] = order[dim1]!;
  order[dim1] = tmp;
```
transpose doesn’t normalize/validate dim0/dim1. Negative dims (e.g. -1) silently set non-index properties on the order array, resulting in no-op transposes, and out-of-range dims produce confusing behavior. Normalize negative dims like sum/mean do and throw for invalid indices before calling permute.
Suggested change:

```diff
-  const order = [...Array(this.dims).keys()];
-  const tmp = order[dim0]!;
-  order[dim0] = order[dim1]!;
-  order[dim1] = tmp;
+  const normalizedDim0 = dim0 < 0 ? this.dims + dim0 : dim0;
+  const normalizedDim1 = dim1 < 0 ? this.dims + dim1 : dim1;
+  if (normalizedDim0 < 0 || normalizedDim0 >= this.dims) {
+    throw new Error(`transpose dim0 out of range: ${dim0} for tensor with ${this.dims} dims`);
+  }
+  if (normalizedDim1 < 0 || normalizedDim1 >= this.dims) {
+    throw new Error(`transpose dim1 out of range: ${dim1} for tensor with ${this.dims} dims`);
+  }
+  const order = [...Array(this.dims).keys()];
+  const tmp = order[normalizedDim0]!;
+  order[normalizedDim0] = order[normalizedDim1]!;
+  order[normalizedDim1] = tmp;
```
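The same normalization can be sketched on a plain index array, independent of the Tensor class (`dims` stands in for `this.dims`; the function name is a placeholder):

```typescript
// Build the permutation order for a transpose, normalizing negative dims
// and rejecting out-of-range indices before any swap happens.
function transposeOrder(dims: number, dim0: number, dim1: number): number[] {
  const normalize = (d: number): number => {
    const n = d < 0 ? dims + d : d;
    if (n < 0 || n >= dims) {
      throw new Error(`transpose dim out of range: ${d} for ${dims} dims`);
    }
    return n;
  };
  const [a, b] = [normalize(dim0), normalize(dim1)];
  const order = [...Array(dims).keys()];
  [order[a], order[b]] = [order[b]!, order[a]!]; // swap the two axes
  return order;
}
```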
```typescript
export function softmax(x: Tensor, dim: number = -1): Tensor {
  if (dim < 0) dim = x.dims + dim;
  const expX = x.exp();
```
softmax is implemented as exp(x) / sum(exp(x)) without subtracting the max, which will overflow to Infinity for moderately large logits and can yield NaNs. Implement a numerically-stable softmax by subtracting max(x) along dim before exponentiating (even if it requires a small helper until a max reduction exists).
Suggested change:

```diff
+function maxAlongDim(x: Tensor, dim: number): Tensor {
+  return (x as Tensor & { max: (dim: number) => Tensor }).max(dim);
+}
+
 export function softmax(x: Tensor, dim: number = -1): Tensor {
   if (dim < 0) dim = x.dims + dim;
-  const expX = x.exp();
+  const shifted = x.sub(maxAlongDim(x, dim));
+  const expX = shifted.exp();
```
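As a reference for the numerics (plain arrays, not the toy Tensor API), the shift-by-max trick looks like this:

```typescript
// Numerically-stable softmax on a flat array: subtracting the max before
// exponentiating keeps every exp() argument <= 0, so nothing overflows.
function stableSoftmax(xs: number[]): number[] {
  const m = Math.max(...xs);
  const exps = xs.map((x) => Math.exp(x - m));
  const sum = exps.reduce((a, b) => a + b, 0);
  return exps.map((e) => e / sum);
}
```

With logits like `[1000, 1000]` the naive `exp(x) / sum(exp(x))` produces `Infinity / Infinity = NaN`, while the shifted form returns `[0.5, 0.5]`.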
```
  },
  "./toy": {
    "types": "./dist/toy/index.d.ts",
    "import": "./dist/toy/index.js"
```
With an exports map present, consumers cannot deep-import extensions/quantization/* unless it is exported. Right now only . and ./toy are exported, so VectorQuantize/FiniteScalarQuantize are unreachable from the published package. Either re-export quantization from ./toy (and ensure it’s emitted into dist/toy) or add an explicit subpath export for the quantization entrypoint(s).
Suggested change:

```diff
     "import": "./dist/toy/index.js"
+  },
+  "./extensions/quantization/*": {
+    "types": "./dist/extensions/quantization/*.d.ts",
+    "import": "./dist/extensions/quantization/*.js"
```
```typescript
  // dL/dA = ones(2,2) @ B^T
  // B^T = [[5,7],[6,8]], ones @ B^T = [[11,14],[11,14]]
  assertClose(a.grad.get([0, 0]), 11, 1e-5, 'matmul grad A [0,0]');
  assertClose(a.grad.get([0, 1]), 15, 1e-5, 'matmul grad A [0,1]');
  // dL/dB = A^T @ ones(2,2)
  // A^T = [[1,3],[2,4]], A^T @ ones = [[4,4],[6,6]]
  assertClose(b.grad.get([0, 0]), 4, 1e-5, 'matmul grad B [0,0]');
```
The comment for dL/dA is incorrect: ones(2,2) @ B^T with B^T=[[5,7],[6,8]] yields [[11,15],[11,15]] (not [[11,14],[11,14]]). Please fix the comment so it matches the asserted values.
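The corrected values can be checked with a few lines of plain TypeScript (a throwaway dense matmul, not the toy engine):

```typescript
// Verify dL/dA = ones @ B^T and dL/dB = A^T @ ones for the test's inputs.
type Mat = number[][];

function matmul(a: Mat, b: Mat): Mat {
  return a.map((row) =>
    b[0]!.map((_, j) => row.reduce((acc, v, k) => acc + v * b[k]![j]!, 0)),
  );
}

const transpose = (m: Mat): Mat => m[0]!.map((_, j) => m.map((row) => row[j]!));

const A: Mat = [[1, 2], [3, 4]];
const B: Mat = [[5, 6], [7, 8]];
const ones: Mat = [[1, 1], [1, 1]];

const gradA = matmul(ones, transpose(B)); // [[11, 15], [11, 15]]
const gradB = matmul(transpose(A), ones); // [[4, 4], [6, 6]]
```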
```typescript
    if (p.value instanceof Scalar) {
      const grad = p.value.derivative ?? 0;
      p.value.data -= this.lr * grad;
    } else if (p.value instanceof Tensor) {
      const grad = p.value.grad;
      if (!grad) continue;
```
step() now handles Tensor parameters, but SGD is still constructed as constructor(parameters: Parameter<Scalar>[], ...), which makes Tensor-parameter usage a type error in strict TS. Update the constructor (and/or the Optimizer base class) to accept Parameter<Scalar | Tensor>[] so the API matches the implementation.
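A sketch of the widened signature with stub classes standing in for the toy API (all names here are placeholders; the stubs avoid parameter properties to stay compatible with `node --strip-types`, per the PR description):

```typescript
// Stub types: just enough shape for instanceof narrowing to work.
class Scalar {
  data: number;
  derivative: number | null;
  constructor(data = 0, derivative: number | null = null) {
    this.data = data;
    this.derivative = derivative;
  }
}
class Tensor {
  grad: Tensor | null = null;
}
interface Parameter<T> {
  value: T;
}

class SGD {
  private parameters: Parameter<Scalar | Tensor>[]; // widened from Parameter<Scalar>[]
  private lr: number;

  constructor(parameters: Parameter<Scalar | Tensor>[], lr = 0.01) {
    this.parameters = parameters;
    this.lr = lr;
  }

  step(): void {
    for (const p of this.parameters) {
      if (p.value instanceof Scalar) {
        p.value.data -= this.lr * (p.value.derivative ?? 0);
      } else if (p.value instanceof Tensor) {
        if (!p.value.grad) continue;
        // Tensor update elided; instanceof narrowing keeps both arms type-safe.
      }
    }
  }
}
```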
Consolidates the toy stack (tts → tts-matmul → tts-nn → tts-vq → tts-fsq) into one PR. Adds the pure-TS primitives the Voxtral demo needs on the browser engine.
Ops / modules
Rebase fixes
Stacks on #41. Together they unblock the Voxtral demo.